A document (or a collection of documents) is analyzed to identify entities of interest within it. This is accomplished by constructing n-gram models (for example, bi-gram models) that correspond to different kinds of text entities, such as chemistry-related words and generic English words. Each model can be constructed from training text selected to reflect a particular kind of text entity. The document is tokenized, and each token is scored against the models to determine which kind of text entity is most likely to be associated with it. The entities of interest in the document can then be annotated accordingly.
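
The classification step can be illustrated with a short sketch. This is a minimal, hypothetical example, not the patented implementation: it assumes character-level bi-gram models, add-one smoothing, and log-likelihood scoring, and the tiny word lists and function names are invented placeholders for real training text.

```python
import math
import re
from collections import Counter


def train_bigram_model(words):
    """Count character bi-grams (with start/end padding) over the training words."""
    counts = Counter()
    for word in words:
        padded = "^" + word.lower() + "$"
        counts.update(zip(padded, padded[1:]))
    return counts


def log_likelihood(token, counts):
    """Log-probability of a token under a bi-gram model, with add-one smoothing."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # smoothing denominator; a simplifying assumption
    padded = "^" + token.lower() + "$"
    return sum(
        math.log((counts[bg] + 1) / (total + vocab))
        for bg in zip(padded, padded[1:])
    )


def annotate(text, models):
    """Tokenize the text and label each token with its most likely entity kind."""
    tokens = re.findall(r"\w+", text)
    return [
        (tok, max(models, key=lambda kind: log_likelihood(tok, models[kind])))
        for tok in tokens
    ]


# Illustrative placeholder training words, standing in for real training text.
models = {
    "chemistry": train_bigram_model(
        ["ethanol", "benzene", "methyl", "chloride", "sulfate"]
    ),
    "english": train_bigram_model(
        ["the", "method", "document", "within", "interest"]
    ),
}

print(annotate("The ethanol solution was analyzed", models))
```

In this sketch, tokens whose character patterns look more like the chemistry training words (e.g. "ethanol") score higher under the chemistry model and are labeled accordingly, while ordinary words fall to the generic English model; the labels could then drive annotation of the original document.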

 