The invention provides a system and method for identifying text in a word
set. The method may include retrieving a target term set including a
plurality of target terms; retrieving the word set including a plurality
of text words; normalizing target terms in the target term set to
generate normalized terms; normalizing text words in the word set to
generate normalized words; and comparing the normalized terms with the
normalized words to determine (1) a first match between a first
normalized term and a first normalized word, and (2) a second match
between a second normalized term and a second normalized word. The method
may further include determining a distance between the text word position
of the first normalized word and the text word position of the second
normalized word to determine whether their relative position satisfies
threshold criteria, and identifying a first text word position and a
second text word position as constituting possible identified text when
the relative position of the text word position of the first normalized
word and the text word position of the second normalized word satisfies
the threshold criteria.
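
The steps summarized above can be illustrated with a minimal sketch. It is not the claimed implementation: the normalization rule (lowercasing and stripping non-alphanumeric characters), the threshold criterion (a maximum word distance), the requirement that the two matched terms differ, and the names normalize, find_possible_text, and max_distance are all assumptions chosen for illustration.

```python
import re
from itertools import combinations


def normalize(token: str) -> str:
    # Assumed normalization: lowercase, then drop non-alphanumeric characters.
    return re.sub(r"[^a-z0-9]", "", token.lower())


def find_possible_text(target_terms, text_words, max_distance=2):
    """Return pairs of text word positions that may constitute identified text.

    max_distance is a hypothetical threshold criterion: the largest allowed
    gap, in word positions, between two matched words.
    """
    normalized_terms = {normalize(t) for t in target_terms}
    normalized_terms.discard("")  # ignore terms that normalize to nothing

    # Record (position, normalized word) for every text word matching a term.
    matches = [
        (pos, normalize(word))
        for pos, word in enumerate(text_words)
        if normalize(word) in normalized_terms
    ]

    results = []
    for (pos1, word1), (pos2, word2) in combinations(matches, 2):
        # Assumption: the first and second matches involve different terms.
        if word1 != word2 and abs(pos2 - pos1) <= max_distance:
            results.append((pos1, pos2))
    return results


target_terms = ["John", "Smith"]
text_words = "Payment sent to JOHN Q. Smith, on Monday by John".split()
print(find_possible_text(target_terms, text_words))
```

In this example, "JOHN" (position 3) and "Smith," (position 5) normalize to matching terms and lie within the assumed distance of 2, so their positions are reported; the later "John" at position 9 is too far from "Smith," to qualify.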