A method and an apparatus for multimedia data management are disclosed.
The method provides an indexing and retrieval scheme for digital photos
with speech annotations based on image-like patterns transformed from the
recognized syllable candidates. For each spoken annotation, the
recognized n-best syllable candidates are transformed into a sequence of
syllable-transformed patterns. Eigen-image analysis is then applied to
extract the most significant information and reduce dimensionality. Vector
quantization is applied to quantize the syllable-transformed patterns
into feature vectors for indexing. The disclosed indexing scheme
reduces the dimensionality and noise of the data and improves
speech-annotated photo retrieval performance by 16.26%.
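The abstract does not specify how the syllable candidates become image-like patterns, how many eigen-images are kept, or how the codebook is trained. The following Python/NumPy sketch chains the four stages together under illustrative assumptions only. The toy syllable inventory `SYLLABLES`, the sliding `window` that stacks n-best score vectors into a small grid, eigen-image analysis implemented as plain PCA, a k-means (Lloyd) codebook, and codeword histograms ranked by cosine similarity are all hypothetical choices. They are not taken from the disclosure.

```python
import numpy as np

# Hypothetical toy syllable inventory; a real recognizer has a much larger one.
SYLLABLES = ["ba", "da", "ka", "ma", "na", "ta"]
SYL_INDEX = {s: i for i, s in enumerate(SYLLABLES)}

def to_patterns(nbest, window=2):
    """Turn a sequence of n-best lists [(syllable, score), ...] into image-like
    patterns: each is a (window x |SYLLABLES|) grid of candidate scores,
    flattened to a vector. The windowing scheme is an assumption."""
    frames = np.zeros((len(nbest), len(SYLLABLES)))
    for t, cands in enumerate(nbest):
        for syl, score in cands:
            frames[t, SYL_INDEX[syl]] += score
    return np.array([frames[t:t + window].ravel()
                     for t in range(len(nbest) - window + 1)])

def eigen_basis(patterns, k):
    """Eigen-image analysis sketched as PCA: top-k covariance eigenvectors."""
    mean = patterns.mean(axis=0)
    centered = patterns - mean
    cov = centered.T @ centered / max(len(patterns) - 1, 1)
    _, vecs = np.linalg.eigh(cov)          # eigenvalues come in ascending order
    return mean, vecs[:, ::-1][:, :k]      # keep the k largest

def quantize(x, code):
    """Index of the nearest codeword (squared Euclidean) for each row of x."""
    d = ((x[:, None, :] - code[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def train_codebook(x, size, iters=20, seed=0):
    """Plain k-means codebook for vector quantization."""
    rng = np.random.default_rng(seed)
    code = x[rng.choice(len(x), size, replace=False)]  # copy of sampled rows
    for _ in range(iters):
        labels = quantize(x, code)
        for j in range(size):
            if np.any(labels == j):        # empty clusters keep their codeword
                code[j] = x[labels == j].mean(axis=0)
    return code

def index_vector(patterns, mean, basis, code):
    """Unit-length histogram of codeword indices, used as the index entry."""
    labels = quantize((patterns - mean) @ basis, code)
    h = np.bincount(labels, minlength=len(code)).astype(float)
    return h / np.linalg.norm(h)

def retrieve(query_vec, index):
    """Rank photos by cosine similarity (all vectors are unit-length)."""
    scores = {name: float(vec @ query_vec) for name, vec in index.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical n-best recognizer output for two annotated photos.
photos = {
    "beach": [[("ba", .7), ("da", .3)], [("ma", .8), ("na", .2)], [("ta", .6), ("ka", .4)]],
    "party": [[("ka", .9), ("ta", .1)], [("na", .6), ("ma", .4)], [("da", .5), ("ba", .5)]],
}
train = np.vstack([to_patterns(nb) for nb in photos.values()])
mean, basis = eigen_basis(train, k=2)
codebook = train_codebook((train - mean) @ basis, size=2)
index = {name: index_vector(to_patterns(nb), mean, basis, codebook)
         for name, nb in photos.items()}
query = index_vector(to_patterns(photos["beach"]), mean, basis, codebook)
ranking = retrieve(query, index)
```

A spoken query passes through the same transform, projection, and quantization as the stored annotations, so both sides are compared in the same reduced, quantized space. Replacing the toy pieces, for example with a larger codebook or a different pattern construction, leaves the overall pipeline shape unchanged.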