A method for automatically organizing digitized photographic images into
events based on spoken annotations comprises the steps of: providing
natural-language text based on spoken annotations corresponding to at
least some of the photographic images; extracting predetermined
information from the natural-language text that characterizes the
annotations of the images; segmenting the images into events by examining
each annotation for the presence of certain categories of information
which are indicative of a boundary between events; and identifying each
event by assembling the categories of information into event
descriptions. The method further comprises the step of summarizing
each event by selecting and arranging the event descriptions in a
suitable manner, such as in a photographic album.
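
The extracting, segmenting, and identifying steps above can be illustrated with a minimal sketch. It assumes the spoken annotations have already been converted to text. The keyword lists in CUES, the choice of "time" and "place" as boundary categories, and the names Event, extract, segment, and describe are all hypothetical and chosen only for illustration; the abstract does not specify the categories, their vocabularies, or the boundary test. Simple keyword matching stands in here for whatever information extraction the method actually uses.

```python
from dataclasses import dataclass, field

# Hypothetical cue vocabularies; the abstract does not enumerate categories.
CUES = {
    "time": {"monday", "saturday", "sunday", "christmas", "yesterday"},
    "place": {"beach", "park", "home", "paris", "zoo"},
    "people": {"mom", "dad", "anna", "ben"},
}
# Assumption: a change in time or place signals a boundary between events.
BOUNDARY_CATEGORIES = ("time", "place")


@dataclass
class Event:
    images: list = field(default_factory=list)
    info: dict = field(default_factory=dict)  # category -> set of matched terms


def extract(text):
    """Return the categories of information found in one annotation."""
    words = {w.strip(".,!?;:").lower() for w in text.split()}
    return {cat: words & terms for cat, terms in CUES.items() if words & terms}


def segment(annotated_images):
    """Group (image_id, annotation_text) pairs into events, in capture order."""
    events = []
    for image_id, text in annotated_images:
        found = extract(text)
        current = events[-1] if events else None
        # Start a new event when a boundary category appears with a value
        # that conflicts with what the current event already records.
        is_boundary = current is None or any(
            cat in found
            and cat in current.info
            and not (found[cat] & current.info[cat])
            for cat in BOUNDARY_CATEGORIES
        )
        if is_boundary:
            current = Event()
            events.append(current)
        current.images.append(image_id)
        for cat, terms in found.items():
            current.info.setdefault(cat, set()).update(terms)
    return events


def describe(event):
    """Assemble the collected categories into a one-line event description."""
    parts = [
        f"{cat}: {', '.join(sorted(event.info[cat]))}"
        for cat in CUES
        if cat in event.info
    ]
    return "; ".join(parts) or "(no description)"


photos = [
    ("img1", "Saturday at the beach with Anna"),
    ("img2", "Ben building a sandcastle"),
    ("img3", "Sunday at the zoo with Mom"),
    ("img4", ""),  # not every image carries an annotation
]
events = segment(photos)
for event in events:
    print(event.images, "->", describe(event))
```

In this sketch an image with no annotation, or with an annotation that carries no boundary cue, is simply kept in the current event, which matches the abstract's statement that only some of the images need annotations.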