A linguistic disambiguation system and method creates a knowledge base by training
on patterns in strings that contain ambiguity sites. The string patterns are described
by a set of reduced regular expressions (RREs) or very reduced regular expressions
(VRREs). The knowledge base uses these RREs or VRREs to resolve each ambiguity
according to the string in which it occurs. The system is trained on a training
set, such as a properly labeled corpus. Once trained, the system can apply the
knowledge base to raw input strings that contain ambiguity sites, using its
RRE- or VRRE-based patterns to disambiguate those sites.
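The train-then-apply cycle described above can be illustrated with a small sketch. It does not reproduce the RRE or VRRE definitions. It assumes a hypothetical, simplified pattern language over tokens: "." matches exactly one token, "*" matches any run of zero or more tokens, and "_" stands for the ambiguity site. The pattern templates (a word anywhere before or after the site, or immediately adjacent to it), the precision-based rule for choosing among matching patterns, the `min_count` threshold, and the to/too example are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical simplified pattern language (an assumption, not the RRE/VRRE
# definition): "." = exactly one token, "*" = zero or more tokens,
# "_" = the ambiguity site. Any other symbol must match a token literally.
Pattern = Tuple[str, ...]


def matches(pattern: Sequence[str], tokens: Sequence[str]) -> bool:
    """Return True if the entire token sequence matches the pattern."""
    if not pattern:
        return not tokens
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # Let "*" absorb 0..len(tokens) leading tokens.
        return any(matches(rest, tokens[i:]) for i in range(len(tokens) + 1))
    if not tokens:
        return False
    if head == "." or head == tokens[0]:
        return matches(rest, tokens[1:])
    return False


def mark_site(tokens: List[str], site: int) -> List[str]:
    """Replace the ambiguous token with the site marker "_"."""
    return tokens[:site] + ["_"] + tokens[site + 1:]


def candidate_patterns(tokens: List[str], site: int) -> Set[Pattern]:
    """Instantiate the (hypothetical) templates that match this example."""
    ctx = mark_site(tokens, site)
    pats: Set[Pattern] = {("*", "_", "*")}  # matches everything: a prior
    for j, w in enumerate(ctx):
        if j < site:
            pats.add(("*", w, "*", "_", "*"))  # w somewhere to the left
        elif j > site:
            pats.add(("*", "_", "*", w, "*"))  # w somewhere to the right
    if site > 0:
        pats.add(("*", ctx[site - 1], "_", "*"))  # immediate left neighbor
    if site + 1 < len(ctx):
        pats.add(("*", "_", ctx[site + 1], "*"))  # immediate right neighbor
    return pats


def train(examples: List[Tuple[List[str], int, str]]) -> Dict[Pattern, Counter]:
    """Build a knowledge base mapping each pattern to its label counts."""
    kb: Dict[Pattern, Counter] = defaultdict(Counter)
    for tokens, site, label in examples:
        for p in candidate_patterns(tokens, site):
            kb[p][label] += 1
    return kb


def disambiguate(kb: Dict[Pattern, Counter], tokens: List[str], site: int,
                 min_count: int = 2) -> Optional[str]:
    """Pick the label of the most precise matching pattern with enough support."""
    ctx = mark_site(tokens, site)
    best = None  # ((precision, support), label)
    for pattern, counts in kb.items():
        total = sum(counts.values())
        if total < min_count or not matches(pattern, ctx):
            continue
        label, n = counts.most_common(1)[0]
        score = (n / total, total)
        if best is None or score > best[0]:
            best = (score, label)
    return best[1] if best else None
```

In this sketch the knowledge base is simply a table from patterns to label counts learned from a labeled corpus; at disambiguation time every stored pattern is tested against the raw string, and the most reliable match decides the site. Selecting by precision first and support second is one plausible design choice among several, adopted here only for illustration.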