Image data from cameras showing the movements of a number of people,
together with sound data, is archived and processed to determine the
position and orientation of each person's head and to determine at whom
each person is looking.
The speaker is identified as the person at whom most people are looking.
Alternatively, the sound data is processed to determine the direction from
which the sound came, and the speaker is identified as the person whose
head position corresponds to that direction. The personal speech recognition
parameters for the speaker are selected and used to convert the sound
data to text data. Image data to be archived is chosen by selecting the
camera that best shows the speaker and the participant to whom he or she
is speaking. The data is stored in a meeting archive database.
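
The following is a minimal illustrative sketch, not taken from the
disclosure itself, of how a speaker might be identified from head-pose data
by majority gaze, with a fallback that matches head positions against the
direction from which the sound came, and how that speaker's personal speech
recognition parameters might then be selected. All names, data structures,
and parameter sets below are hypothetical assumptions for illustration.

    # Illustrative sketch only; not the disclosed implementation.
    from dataclasses import dataclass
    from collections import Counter
    import math


    @dataclass
    class Person:
        name: str
        head_position: tuple  # (x, y) in room coordinates
        gaze_target: str      # name of the person being looked at


    def speaker_by_gaze(people):
        """Return the person at whom most participants are looking."""
        votes = Counter(p.gaze_target for p in people)
        return votes.most_common(1)[0][0]


    def speaker_by_sound_direction(people, microphone, sound_bearing):
        """Return the person whose head position best matches the
        bearing (radians, from the microphone) of the incoming sound."""
        def angular_error(p):
            dx = p.head_position[0] - microphone[0]
            dy = p.head_position[1] - microphone[1]
            bearing = math.atan2(dy, dx)
            # Wrap the difference into [-pi, pi] before comparing.
            diff = (bearing - sound_bearing + math.pi) % (2 * math.pi) - math.pi
            return abs(diff)

        return min(people, key=angular_error).name


    if __name__ == "__main__":
        people = [
            Person("alice", (0.0, 0.0), gaze_target="carol"),
            Person("bob", (2.0, 0.0), gaze_target="carol"),
            Person("carol", (1.0, 2.0), gaze_target="bob"),
        ]

        # Primary method: majority of gaze directions.
        speaker = speaker_by_gaze(people)  # -> "carol"

        # Alternative method: direction of arrival of the sound.
        speaker_alt = speaker_by_sound_direction(
            people, microphone=(1.0, -1.0), sound_bearing=math.pi / 2)

        # Hypothetical per-speaker recognition parameter sets; the selected
        # set would be passed to the speech-to-text engine.
        recognition_params = {
            "alice": {"acoustic_model": "alice.am"},
            "bob": {"acoustic_model": "bob.am"},
            "carol": {"acoustic_model": "carol.am"},
        }
        params = recognition_params[speaker]
        print(speaker, speaker_alt, params)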