A system for automatically enhancing Web pages with annotations expressed
in Extensible Markup Language (XML) that describe the pages' multimedia
content. Each Web page is parsed or scanned to identify markup tags which
contain the URLs of separately stored multimedia data (e.g. image, audio
or video files). Each referenced multimedia data entity is then retrieved
and analyzed by a type-specific process to extract metadata which
describes its content. Additional descriptive metadata may be obtained
from the referencing markup tag, accepted from a human editor, or fetched
from operating system directories which provide access to the multimedia
files. The resulting metadata is expressed in text-based XML format and
inserted into a copy of the Web page to form an enhanced Web page whose
multimedia content may then be processed by conventional text-based
indexing and searching facilities.
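
The flow described above can be sketched in Python as follows. This is a
minimal illustration under stated assumptions, not the system's actual
implementation. The element name media-annotation, its attribute names,
the analyze and annotate functions, and the caller-supplied fetch callable
are all hypothetical. Only PNG dimensions are extracted as an example of
type-specific analysis, and metadata from a human editor or from operating
system directories is omitted.

```python
import struct
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Tags whose src attribute points at separately stored media (illustrative subset).
MEDIA_TAGS = {"img": "image", "audio": "audio", "video": "video"}
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def analyze(kind, data):
    """Hypothetical type-specific analyzer; only PNG dimensions are implemented."""
    meta = {"bytes": str(len(data))}
    if kind == "image" and data.startswith(PNG_SIGNATURE) and len(data) >= 24:
        # The IHDR chunk stores width and height as big-endian 32-bit integers.
        width, height = struct.unpack(">II", data[16:24])
        meta.update(width=str(width), height=str(height))
    return meta


class MediaAnnotator(HTMLParser):
    """Scans a page and records where each XML annotation should be inserted."""

    def __init__(self, html, fetch):
        super().__init__()
        self.fetch = fetch  # assumed callable: URL -> bytes of the media file
        # Offset of each line start, to turn getpos() into an absolute index.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(html) if ch == "\n"]
        self.inserts = []  # (absolute offset, XML text) pairs in document order
        self.feed(html)
        self.close()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url = attrs.get("src")
        if tag not in MEDIA_TAGS or not url:
            return
        kind = MEDIA_TAGS[tag]
        meta = {"type": kind, "src": url}
        if attrs.get("alt"):
            # Descriptive metadata taken from the referencing markup tag.
            meta["description"] = attrs["alt"]
        meta.update(analyze(kind, self.fetch(url)))
        line, col = self.getpos()  # position where the current tag starts
        end = self.line_starts[line - 1] + col + len(self.get_starttag_text())
        xml = ET.tostring(ET.Element("media-annotation", meta), encoding="unicode")
        self.inserts.append((end, xml))


def annotate(html, fetch):
    """Return a copy of html with an XML annotation after each media tag."""
    out = html
    # Splice from the end so that earlier offsets stay valid.
    for offset, xml in reversed(MediaAnnotator(html, fetch).inserts):
        out = out[:offset] + xml + out[offset:]
    return out
```

Calling annotate(page, fetch) leaves the original string untouched and
returns the enhanced copy. Because the annotations are ordinary text in the
page, a text-based indexer can pick up values such as the description or
the dimensions without decoding the media files themselves.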