A computer-implemented method and system for identifying key images in a document
are provided. The operations include extracting, from the document, one or more
document keywords considered important in describing the document, collecting one
or more images associated with the document, along with information describing each
image, generating, for each pair of a collected image and a document keyword, a
proximity factor that reflects the degree of correlation between the image
and the document keyword, and determining the importance of each image according
to an image metric that combines the proximity factors for each document keyword
and image pair. The operations may further include ordering the document
keywords according to an ordering criterion and weighting the proximity factor
associated with each document keyword and image pair based on the order of the
document keyword.
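
By way of illustration only, the operations described above might be sketched as
follows. This is a minimal sketch under stated assumptions, not the claimed
implementation: the names Image, proximity_factor, rank_weight, image_metric and
rank_images are hypothetical, the proximity factor is assumed to be a simple
word-match ratio over the text describing the image, and the weighting by keyword
order is assumed to be 1/(rank + 1). The document keywords are taken as already
extracted and ordered.

```python
from dataclasses import dataclass


@dataclass
class Image:
    name: str
    # Assumed form of the "information describing each image",
    # e.g. a caption, alt text, or nearby text.
    description: str


def proximity_factor(image: Image, keyword: str) -> float:
    """Hypothetical proximity factor: the fraction of words in the image
    description that match the keyword (case-insensitive, no punctuation
    handling)."""
    words = image.description.lower().split()
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def rank_weight(rank: int) -> float:
    """Hypothetical weighting by keyword order: 1, 1/2, 1/3, ..."""
    return 1.0 / (rank + 1)


def image_metric(image: Image, ordered_keywords: list[str]) -> float:
    """Combine the weighted proximity factors of every keyword/image pair
    into a single importance score (assumed here to be a weighted sum)."""
    return sum(
        rank_weight(rank) * proximity_factor(image, keyword)
        for rank, keyword in enumerate(ordered_keywords)
    )


def rank_images(images: list[Image], ordered_keywords: list[str]) -> list[Image]:
    """Return the images sorted from most to least important."""
    return sorted(
        images,
        key=lambda img: image_metric(img, ordered_keywords),
        reverse=True,
    )


if __name__ == "__main__":
    keywords = ["solar", "roof"]  # assumed already ordered by importance
    images = [
        Image("cat.png", "cat photo"),
        Image("panel.png", "solar panel on roof"),
    ]
    for img in rank_images(images, keywords):
        print(img.name, image_metric(img, keywords))
```

In this sketch, an image whose description mentions an earlier-ordered keyword
scores higher than one that mentions only a later-ordered keyword, which is one
possible reading of weighting the proximity factor based on the order of the
document keyword.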