Method for generating training data for medical text abbreviation and acronym normalization

A method for electronically generating high-quality feature vectors that can be used in connection with electronic data processing systems implementing Maximum Entropy or other statistical models to accurately normalize abbreviations in text such as medical records. An abbreviation database and a training text database are provided. The abbreviation database includes abbreviation data representative of abbreviations and associated expansions to be normalized. The training text database includes a corpus of text having expansions of the abbreviations to be normalized. The corpus of text is processed as a function of the abbreviation data to identify the expansions in the corpus of text. Context information describing the context of the text in which the expansions were identified is generated. A set of feature vectors is also stored. Each feature vector including the context information generated for the associated expansion identified in the corpus of text.

Web www.patentalert.com

< Method and system of typing resources in a distributed system

< Operators for accessing hierarchical data in a relational system

> Creation of customized trees

> Voice interface for a search engine

~ 00251