A linguistic disambiguation system and method create a knowledge base by
training on patterns in strings that contain ambiguity sites. The string
patterns are described by a set of reduced regular expressions (RREs) or
very reduced regular expressions (VRREs). The knowledge base uses the
RREs or VRREs to resolve ambiguity based upon the strings in which the
ambiguity occurs. The system is trained on a training set, such as a
properly labeled corpus. Once trained, the system may apply the RRE- and
VRRE-based knowledge base to raw input strings that contain ambiguity
sites in order to disambiguate those sites.
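
The train-then-apply flow described above can be illustrated with a
minimal sketch. It rests on several assumptions: the abstract does not
define the RRE or VRRE syntax, so ordinary Python regular expressions
stand in for them; the "<?>" site marker, the function names, and the
tiny labeled corpus are all hypothetical; and ranking patterns by how
consistently they predict one label is only one plausible way to build
such a knowledge base, not necessarily the method claimed.

```python
import re
from collections import Counter, defaultdict

SITE = "<?>"  # hypothetical marker for an ambiguity site in a string


def candidate_patterns(text):
    """Yield simple regexes describing the tokens adjacent to the site.

    Plain Python regexes are used here as stand-ins for RREs/VRREs.
    """
    tokens = text.split()
    i = tokens.index(SITE)
    site = re.escape(SITE)
    if i > 0:  # pattern: previous token followed by the site
        yield r"(?<!\S)" + re.escape(tokens[i - 1]) + r"\s+" + site
    if i + 1 < len(tokens):  # pattern: site followed by the next token
        yield site + r"\s+" + re.escape(tokens[i + 1]) + r"(?!\S)"


def train(labeled):
    """Build a knowledge base: an ordered list of (regex, label) rules."""
    counts = defaultdict(Counter)
    for text, label in labeled:
        for pattern in candidate_patterns(text):
            counts[pattern][label] += 1
    scored = []
    for pattern, label_counts in counts.items():
        label, hits = label_counts.most_common(1)[0]
        precision = hits / sum(label_counts.values())
        scored.append((precision, hits, pattern, label))
    # Most consistent patterns first, then most frequently seen.
    scored.sort(key=lambda r: (-r[0], -r[1]))
    return [(re.compile(p), label) for _, _, p, label in scored]


def disambiguate(kb, text, default=None):
    """Return the label of the first rule whose pattern matches the string."""
    for regex, label in kb:
        if regex.search(text):
            return label
    return default


# Hypothetical labeled corpus: each string has one marked ambiguity site.
corpus = [
    ("I want to <?> a flight", "VERB"),
    ("she read the <?> yesterday", "NOUN"),
    ("please try to <?> early", "VERB"),
    ("the <?> was long", "NOUN"),
]
kb = train(corpus)
print(disambiguate(kb, "we need to <?> two rooms"))
print(disambiguate(kb, "he lost the <?> again"))
```

In this sketch the knowledge base is just an ordered rule list, so
applying it to a raw string is a first-match lookup; a string whose
context matches no learned pattern falls back to the supplied default.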