A method and system for recognizing alphabetic characters that contain
diacritics is described. An image analysis separates the character into
its constituent components. The one or more diacritic components are then
distinguished and isolated from the base portion of the character.
Optical recognition is performed separately on the base portion. The
diacritic is recognized through dedicated image analysis and pattern
recognition algorithms. The image analysis extracts geometric information
from the one or more diacritic components. The extracted information is
used as input for the pattern recognition algorithms. The output is a
code that corresponds to a particular diacritic. The recognized base
portion and diacritic are combined, and a check is performed for
acceptable combinations in a chosen language. By separately recognizing
the base portion and diacritic, the character sets used by the recognizer
can be narrowed, resulting in greater recognition accuracy.
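
The flow described above can be illustrated with a minimal sketch. It is not the claimed implementation. Several things in it are assumptions made only for illustration: the glyph is a small binary grid, the largest connected component is taken to be the base portion, the diacritic classifier is a pair of toy geometric rules, the output code is a Unicode combining code point, and the base recognizer is stubbed out (the caller supplies the base letter). All function names and the ALLOWED_GERMAN set are hypothetical.

```python
import unicodedata
from collections import deque

# Toy binary glyph ('#' = ink): two dots above a u-shaped base.
GLYPH = [
    ".#..#.",
    "......",
    "#....#",
    "#....#",
    "#....#",
    ".####.",
]

# Hypothetical set of acceptable characters for the chosen language.
ALLOWED_GERMAN = set("abcdefghijklmnopqrstuvwxyz") | {"\u00e4", "\u00f6", "\u00fc"}


def connected_components(grid):
    """Return the ink components of the grid as lists of (row, col), 8-connected."""
    rows, cols = len(grid), len(grid[0])
    seen, components = set(), []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "#" or (r, c) in seen:
                continue
            queue, pixels = deque([(r, c)]), []
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == "#" and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def split_base_and_diacritics(components):
    """Assumption: the largest component is the base, the rest are diacritic parts."""
    ordered = sorted(components, key=len, reverse=True)
    return ordered[0], ordered[1:]


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a component."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


def classify_diacritic(parts):
    """Toy rules on geometric features; returns a combining code point or None."""
    if not parts:
        return None
    if len(parts) == 2:
        return "\u0308"  # combining diaeresis
    top, left, bottom, right = bounding_box(parts[0])
    width, height = right - left + 1, bottom - top + 1
    if width >= 3 * height:
        return "\u0304"  # combining macron
    return "\u0307"      # combining dot above


def combine(base_char, diacritic, allowed):
    """Compose base and diacritic (NFC) and accept only combinations in `allowed`."""
    candidate = unicodedata.normalize("NFC", base_char + (diacritic or ""))
    return candidate if candidate in allowed else None


if __name__ == "__main__":
    base, parts = split_base_and_diacritics(connected_components(GLYPH))
    code = classify_diacritic(parts)
    # The base recognizer is stubbed: assume it returned "u" for `base`.
    print(combine("u", code, ALLOWED_GERMAN))
```

Because the diacritic is handled on its own, the stubbed base recognizer in this sketch would only need to distinguish unaccented letters, which reflects the narrowing of character sets mentioned above. The final check rejects pairs such as "e" with a diaeresis, which are not in the assumed German set.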