Indexing, searching, and retrieving the content of speech documents
(including but not limited to recorded books, audio broadcasts, recorded
conversations) is accomplished by finding and retrieving speech documents
that are related to a query term at a conceptual level, even if the
speech documents do not contain the spoken (or textual) query terms.
Concept-based cross-media information retrieval is used. A
term-phoneme/document matrix is constructed from a training set of
documents. Additional documents are then added to the matrix constructed from the
training data. Singular Value Decomposition is used to compute a vector
space from the term-phoneme/document matrix. The result is a
lower-dimensional numerical space where term-phoneme and document vectors
are related conceptually as nearest neighbors. A query engine computes the
cosine similarity between the query vector and every other vector in the
space and returns a list of the term-phonemes and/or documents with the
highest cosine values.
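
The pipeline described above can be illustrated with a minimal sketch in Python using NumPy. It is not the implementation behind this abstract: the toy matrix, the labels in `terms` and `docs`, the helper names `build_space`, `fold_in`, and `cosine`, and the choice of k = 2 dimensions are all assumptions made for illustration. Plain words stand in for term-phoneme units, the query is projected with the standard latent semantic indexing fold-in (query vector times U_k, divided by the singular values), and only documents are ranked.

```python
import numpy as np

# Hypothetical term-phoneme units (rows) and documents (columns).
# Plain words stand in for term-phoneme units in this toy example.
terms = ["boat", "ship", "ocean", "tree", "forest"]
docs = ["d0", "d1", "d2", "d3"]

# Term-phoneme/document matrix: entry [i, j] counts unit i in document j.
# d0 contains "boat" and "ocean"; d1 contains "ship" and "ocean" (no "boat").
A = np.array([
    [1, 0, 0, 0],   # boat
    [0, 1, 0, 0],   # ship
    [1, 1, 0, 0],   # ocean
    [0, 0, 1, 1],   # tree
    [0, 0, 1, 1],   # forest
], dtype=float)


def build_space(matrix, k):
    """Truncated SVD: keep the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T   # term basis, singular values, document vectors


def fold_in(term_counts, U_k, s_k):
    """Project a raw term-count vector into the k-dimensional space."""
    return (term_counts @ U_k) / s_k


def cosine(query_vec, vectors):
    """Cosine between query_vec and each row of vectors."""
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query_vec)
    return (vectors @ query_vec) / np.where(norms == 0, 1.0, norms)


U_k, s_k, doc_vecs = build_space(A, k=2)

# Query consisting of the single unit "boat".
query = np.zeros(len(terms))
query[terms.index("boat")] = 1.0

scores = cosine(fold_in(query, U_k, s_k), doc_vecs)
ranking = [docs[i] for i in np.argsort(-scores)]   # best match first

for name, score in zip(docs, scores):
    print(f"{name}: {score:.3f}")
```

In this toy space, d1 should score about as high as d0 for the query "boat" even though d1 never contains that unit, because both documents share "ocean" and therefore load on the same latent dimension. The two forest documents should score near zero.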