In a system implementing image retrieval by performing speech recognition
on voice information added to an image, the speech recognition is
triggered by an event, such as an image upload event, that is not an
explicit speech-recognition order event. The system obtains the voice
information added to an image, detects an event, and, when the detected
event is a specific event, performs speech recognition on the obtained
voice information, even if that event is not an explicit
speech-recognition order event.
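
The flow described above (obtain voice information, detect an event, recognize when the event is a specific one) can be sketched roughly as follows. This is only an illustrative sketch, not the claimed implementation: the event names, the `AnnotatedImage` and `RetrievalSystem` classes, and the `fake_recognizer` stand-in are all hypothetical, and a real system would call an actual speech-recognition engine where the stand-in is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional

# Hypothetical event names, chosen for illustration only.
IMAGE_UPLOAD = "image_upload"
RECOGNITION_ORDER = "speech_recognition_order"  # the explicit order event
IMAGE_DELETE = "image_delete"


@dataclass
class AnnotatedImage:
    image_id: str
    voice: bytes                      # voice information added to the image
    transcript: Optional[str] = None  # filled in once recognition has run


def fake_recognizer(voice: bytes) -> str:
    """Stand-in for a real recognizer: treats the bytes as UTF-8 text."""
    return voice.decode("utf-8")


class RetrievalSystem:
    def __init__(
        self,
        recognize: Callable[[bytes], str],
        specific_events: FrozenSet[str] = frozenset({IMAGE_UPLOAD}),
    ) -> None:
        self._recognize = recognize
        # Recognition runs on the explicit order event and on any
        # configured specific event that is not an explicit order.
        self._triggers = set(specific_events) | {RECOGNITION_ORDER}
        self._images: Dict[str, AnnotatedImage] = {}

    def add_image(self, image: AnnotatedImage) -> None:
        self._images[image.image_id] = image

    def on_event(self, event: str, image_id: str) -> bool:
        """Handle a detected event; return True if recognition was performed."""
        if event not in self._triggers:
            return False
        image = self._images[image_id]
        image.transcript = self._recognize(image.voice)
        return True

    def search(self, keyword: str) -> List[str]:
        """Return ids of images whose recognized transcript contains keyword."""
        needle = keyword.lower()
        return [
            img.image_id
            for img in self._images.values()
            if img.transcript is not None and needle in img.transcript.lower()
        ]
```

In this sketch, calling `on_event(IMAGE_UPLOAD, ...)` runs recognition without any explicit order, after which `search` can match the image by its transcript; an event outside the trigger set, such as `IMAGE_DELETE`, leaves the image unrecognized.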