Methods and systems for mapping an optical character recognition (OCR)
text string to a code included in a coding dictionary by supplementing
the Levenshtein Distance Algorithm (LDA) with additional information in
the form of adjustments based on particular character substitutions,
insertions and deletions together with weighting based on multiple
alternatives for the OCR text string. In one embodiment, an OCR text
string mapping method (100) includes: receiving (110) an OCR text string;
comparing (120) it with selected text strings from a coding dictionary;
computing (130) modified Levenshtein distances associated with the
comparisons by determining (140) substitution penalties, determining
(150) insertion penalties, determining (160) deletion penalties, and
combining (170) the penalties; selecting (180) the best matching text
string from the coding dictionary based on the modified Levenshtein
distances; determining (190) whether a maximum threshold distance is met;
and assigning (200) the code associated with the best matching text string
to the OCR text string when the threshold is met, or assigning (210) a
null or no code when it is not.
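
The steps of method (100) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the penalty values, the `CONFUSABLE` table, the function names, and the sample dictionary codes are all hypothetical, and the sketch assumes the threshold is "met" when the best distance does not exceed `max_distance`. The weighting based on multiple alternatives for the OCR text string is not covered.

```python
from typing import Dict, Optional

# Hypothetical penalties: pairs that OCR commonly confuses cost less than
# an arbitrary substitution. All values here are illustrative assumptions.
CONFUSABLE = {("0", "O"): 0.2, ("1", "l"): 0.2, ("5", "S"): 0.3}
LIGHT_CHARS = " .,"  # assumed cheap to insert or delete (specks, dropped marks)


def substitution_penalty(a: str, b: str) -> float:
    """Step 140: cost of reading character a where b was expected."""
    if a == b:
        return 0.0
    return CONFUSABLE.get((a, b), CONFUSABLE.get((b, a), 1.0))


def insertion_penalty(ch: str) -> float:
    """Step 150: cost of adding a dictionary character missing from the OCR text."""
    return 0.5 if ch in LIGHT_CHARS else 1.0


def deletion_penalty(ch: str) -> float:
    """Step 160: cost of dropping a spurious character from the OCR text."""
    return 0.5 if ch in LIGHT_CHARS else 1.0


def modified_distance(ocr: str, entry: str) -> float:
    """Steps 130 and 170: combine the penalties with the usual
    Levenshtein dynamic program, keeping only one row at a time."""
    prev = [0.0]
    for ch in entry:
        prev.append(prev[-1] + insertion_penalty(ch))
    for a in ocr:
        curr = [prev[0] + deletion_penalty(a)]
        for j, b in enumerate(entry, start=1):
            curr.append(min(
                prev[j] + deletion_penalty(a),               # drop a
                curr[j - 1] + insertion_penalty(b),          # add b
                prev[j - 1] + substitution_penalty(a, b),    # replace a with b
            ))
        prev = curr
    return prev[-1]


def map_to_code(ocr: str, dictionary: Dict[str, str],
                max_distance: float) -> Optional[str]:
    """Steps 180 to 210: pick the closest entry, then apply the threshold."""
    if not dictionary:
        return None
    distances = {entry: modified_distance(ocr, entry) for entry in dictionary}
    best = min(distances, key=distances.get)
    if distances[best] <= max_distance:
        return dictionary[best]   # step 200: assign the associated code
    return None                   # step 210: assign no code


# Made-up dictionary mapping text strings to made-up codes.
SAMPLE_DICTIONARY = {"FRACTURE": "A01", "FEVER": "B02"}
```

With the sample dictionary, `map_to_code("FRACTUR3", SAMPLE_DICTIONARY, 1.5)` selects "FRACTURE" through a single ordinary substitution and returns its code, whereas an unrelated string such as "XYZ" exceeds the threshold and yields `None`.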