A Natural Language Understanding system is provided for indexing free
text documents. The system according to the invention utilizes
typographical and functional segmentation of text to identify those
portions of free text that carry meaning. The system then uses words and
multi-word terms and phrases identified in the free text to identify
concepts in the free text. The system uses a lexicon of terms, linked to a
formal ontology that is independent of any specific language, to extract
concepts from the free text based on the words and multi-word terms it
contains. The formal ontology contains both language-independent
domain knowledge concepts and language-dependent linguistic concepts that
govern the relationships between concepts and contain the rules of how
language works. The system according to the current invention may
preferably be used to index medical documents and assign codes from
independent coding systems such as SNOMED, ICD-9 and ICD-10. The system
according to the current invention may also preferably make use of
syntactic parsing to improve the efficiency of the method.
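The indexing flow described above can be illustrated with a minimal sketch. It is not the claimed implementation: the lexicon entries, the concept identifiers (such as `CONCEPT:CHEST_PAIN`), the placeholder codes, the rule that a line ending in a colon is a section heading, and the `SKIP_SECTIONS` set are all hypothetical, and the ontology's linguistic rules and the optional syntactic parsing are omitted. What it does show is typographical segmentation into headed sections and sentences, functional filtering of sections assumed not to carry meaning, greedy longest-match lookup of multi-word terms in a lexicon that maps surface forms to language-independent concept identifiers, and assignment of codes from a separate coding table.

```python
import re

# Hypothetical lexicon: language-specific surface forms (token tuples)
# linked to language-independent concept identifiers.
LEXICON = {
    ("heart", "attack"): "CONCEPT:MYOCARDIAL_INFARCTION",
    ("myocardial", "infarction"): "CONCEPT:MYOCARDIAL_INFARCTION",
    ("chest", "pain"): "CONCEPT:CHEST_PAIN",
    ("pain",): "CONCEPT:PAIN",
}

# Hypothetical concept-to-code table. The codes are placeholders,
# not real SNOMED or ICD-10 codes.
CODE_MAP = {
    "CONCEPT:MYOCARDIAL_INFARCTION": {"ICD-10": "ICD10-PLACEHOLDER-1"},
    "CONCEPT:CHEST_PAIN": {"ICD-10": "ICD10-PLACEHOLDER-2"},
}

# Hypothetical functional rule: sections assumed to carry no clinical meaning.
SKIP_SECTIONS = {"Signature"}

MAX_TERM_LEN = max(len(term) for term in LEXICON)


def segment(text):
    """Split text into (heading, sentence) pairs.

    Assumed typographical cue: a line ending in ':' starts a new section.
    """
    segments = []
    heading = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            heading = line[:-1]
            continue
        # Split the line into sentences at whitespace after . ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", line):
            if sentence:
                segments.append((heading, sentence))
    return segments


def tokenize(sentence):
    """Return lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def extract_concepts(tokens):
    """Greedy longest-match lookup of multi-word terms in LEXICON."""
    concepts = []
    i = 0
    while i < len(tokens):
        # Try the longest possible term first, then shorter ones.
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            concept = LEXICON.get(tuple(tokens[i:i + n]))
            if concept:
                concepts.append(concept)
                i += n
                break
        else:
            # No lexicon term starts at this token; move on.
            i += 1
    return concepts


def index_document(text):
    """Return (heading, concept, codes) triples for meaningful segments."""
    index = []
    for heading, sentence in segment(text):
        if heading in SKIP_SECTIONS:
            continue
        for concept in extract_concepts(tokenize(sentence)):
            index.append((heading, concept, CODE_MAP.get(concept, {})))
    return index


doc = "Findings:\nPatient reports chest pain. History of heart attack.\nSignature:\nDr. Smith."
for entry in index_document(doc):
    print(entry)
# ('Findings', 'CONCEPT:CHEST_PAIN', {'ICD-10': 'ICD10-PLACEHOLDER-2'})
# ('Findings', 'CONCEPT:MYOCARDIAL_INFARCTION', {'ICD-10': 'ICD10-PLACEHOLDER-1'})
```

Because the lexicon maps surface forms to concept identifiers rather than directly to codes, a lexicon for another language could, in this sketch, reuse the same concept and code tables. Longest-match lookup also means "chest pain" is indexed as one concept instead of additionally yielding the broader "pain".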