A system and method for identifying compounds through iterative analysis of measure of association is disclosed. A limit on a number of tokens per compound is specified. Compounds within a text corpus are iteratively evaluated. A number of occurrences of one or more n-grams within the text corpus is determined. Each n-gram includes up to a maximum number of tokens, which are each provided in a vocabulary for the text corpus. At least one n-gram including a number of tokens equal to the limit based on the number of occurrences is identified. A measure of association between the tokens in the identified n-gram is determined. Each identified n-gram with a sufficient measure of association is added to the vocabulary as a compound token and the limit is adjusted.

 
Web www.patentalert.com

< Providing a topic list

> Conceptualization of job candidate information

> Natural language based search engine for handling pronouns and methods of use therefor

~ 00529