A system and method for extracting data, hereinafter referred to as
MitoMine, that produces a strongly typed, ontology-defined collection
referencing (and cross-referencing) all extracted records. The input to
the mining process can be any data source, such as a text file delimited
into a set of possibly dissimilar records. MitoMine contains parser
routines and post-processing functions known as "munchers". The parser
routines can be accessed either via a batch mining process or as part of
a running server process connected to a live source. Munchers can be
registered on a per-data-source basis to process the records
produced, possibly writing them to an external database and/or a set of
servers. The present invention also embeds an interpreted ontology-based
language within a compiler/interpreter (for the source format) such that
the statements of the embedded language are executed as a result of the
source compiler "recognizing" a given construct within the source and
extracting the corresponding source content. In this way, the execution
of the statements in the embedded program will occur in a sequence that
is dictated wholly by the source content. This system and method
therefore make it possible to bulk-extract free-form data from such
sources as CD-ROMs, the web, etc., and have the resulting structured data
loaded into an ontology-based system.
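The two mechanisms described above are (1) embedded statements that run only when the source parser recognizes a construct, so that the source content dictates their execution order, and (2) munchers registered per data source that receive each produced record. They can be illustrated with a minimal Python sketch. It rests on stated assumptions: the record format (blank-line-delimited `key: value` lines), the names `MUNCHERS`, `register_muncher`, `make_program`, and `mine` are all hypothetical, and plain Python callables stand in for the embedded ontology-based language. It is not MitoMine's actual interface.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, str]
Muncher = Callable[[Record], None]

# Hypothetical registry: post-processing "munchers" keyed by data source.
MUNCHERS: Dict[str, List[Muncher]] = defaultdict(list)


def register_muncher(source: str, muncher: Muncher) -> None:
    """Register a muncher that receives every record mined from `source`."""
    MUNCHERS[source].append(muncher)


def make_program(trace: List[str]) -> Dict[str, Callable]:
    """Hypothetical stand-in for the embedded language: one action per construct."""
    def on_field(record: Record, key: str, value: str) -> None:
        trace.append(f"field:{key}")
        record[key] = value

    def on_end_record(record: Record) -> None:
        trace.append("end_record")

    return {"field": on_field, "end_record": on_end_record}


def mine(source: str, text: str, program: Dict[str, Callable]) -> List[Record]:
    """Parse blank-line-delimited 'key: value' records from `text`.

    Actions in `program` run only when the parser recognizes the matching
    construct, so their execution order is dictated by the source text.
    Each completed record is handed to the munchers registered for `source`.
    """
    records: List[Record] = []
    current: Record = {}

    def end_record() -> None:
        nonlocal current
        if not current:
            return  # consecutive blank lines do not produce empty records
        program["end_record"](current)
        records.append(current)
        for muncher in MUNCHERS[source]:
            muncher(current)
        current = {}

    for line in text.splitlines():
        if not line.strip():
            end_record()
        else:
            key, _, value = line.partition(":")
            program["field"](current, key.strip(), value.strip())
    end_record()  # flush a final record that has no trailing blank line
    return records


# Usage: a list's append method stands in for, e.g., a database writer.
trace: List[str] = []
loaded: List[Record] = []
register_muncher("contacts.txt", loaded.append)
records = mine("contacts.txt", "name: Ada\nrole: analyst\n\nname: Bob\n", make_program(trace))
print(records)  # [{'name': 'Ada', 'role': 'analyst'}, {'name': 'Bob'}]
print(trace)    # ['field:name', 'field:role', 'end_record', 'field:name', 'end_record']
```

The parser never calls the actions in a fixed order. It invokes `field` or `end_record` only as it encounters those constructs, which mirrors the idea that the embedded program's execution sequence follows the source rather than the program text.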