A method for providing name-face/voice-role association includes determining
whether closed-captioned text accompanies a video sequence, applying one of text recognition
and speech to text conversion to the video sequence to generate a role-name versus
actor-name list from the video sequence, extracting face boxes from the video sequence
and generating face models, searching a predetermined portion of text for an entry
on the role-name versus actor-name list, searching video frames for face models/voice
models that correspond to the text searched by using a time code so that the video
frames correspond to portions of the text where role-names are detected, assigning
an equal level of certainty to each of the face models found, using lip reading
to eliminate those face models found that pronounce a role-name corresponding to said
entry on the role-name versus actor-name list, and scanning a remaining portion of
the text provided and updating the level of certainty for each of the face models
previously found. Once the level of certainty for a particular face model/voice model and role-name
association has reached a threshold, the role-name, actor name, and particular face model/voice
model are stored in a database and can be displayed to a user. Thus the user can query information
by entering a role-name, an actor name, a face model, or even words spoken by the role
as a basis for the association. A system provides hardware and software to perform
these functions.
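
The certainty bookkeeping described above can be illustrated with a minimal sketch. It is not the claimed implementation: the function name associate, the additive scoring rule, the threshold value, and the sample identifiers (ROLE_X, face_a, and so on) are all assumptions made for illustration. Each mention of a role-name is taken to arrive with the faces found in the time-coded frames and the subset of those faces that lip reading shows pronouncing the name.

```python
from collections import defaultdict


def associate(mentions, threshold):
    """Return the (role_name, face) pairs whose certainty reaches the threshold.

    mentions: iterable of (role_name, faces_in_frames, speaking_faces) tuples.
    Additive scoring is an assumed rule, chosen only to keep the sketch simple.
    """
    certainty = defaultdict(float)
    for role_name, faces_in_frames, speaking_faces in mentions:
        # Lip-reading step: drop faces seen pronouncing the role-name.
        candidates = [f for f in faces_in_frames if f not in speaking_faces]
        if not candidates:
            continue
        # Equal level of certainty for each remaining face at this mention.
        share = 1.0 / len(candidates)
        for face in candidates:
            certainty[(role_name, face)] += share
    # Only associations at or above the threshold are kept for storage.
    return {pair: score for pair, score in certainty.items() if score >= threshold}


# Hypothetical data: three mentions of the same role-name.
mentions = [
    ("ROLE_X", ["face_a", "face_b"], ["face_b"]),  # face_b speaks the name
    ("ROLE_X", ["face_a", "face_c"], []),
    ("ROLE_X", ["face_a"], []),
]

database = associate(mentions, threshold=2.0)
print(database)  # {('ROLE_X', 'face_a'): 2.5}
```

In this sketch face_a accumulates 1.0 + 0.5 + 1.0 = 2.5 and crosses the assumed threshold of 2.0, while face_c stays at 0.5 and face_b is never credited because it was eliminated by the lip-reading step.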