A rules-based grammar is generated. Segmentation ambiguities are
identified in training data. Rewrite rules for the ambiguous
segmentations are enumerated, and a probability is generated for each rule.
The ambiguities are resolved based on the probabilities. In one embodiment,
the probabilities are generated and the ambiguities resolved by applying
the expectation-maximization (EM) algorithm.
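
The steps above can be illustrated with a minimal Python sketch. It is not the claimed implementation, only one way the enumeration and EM estimation might look under simplifying assumptions: each training example is a word sequence paired with an ordered list of preterminals, every preterminal covers a contiguous, non-empty span of words, and a rewrite rule is a (preterminal, span) pair. The names `segmentations`, `em_rule_probs`, `resolve`, `ShowPre`, and `FromPre` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import prod


def segmentations(words, slots):
    """Enumerate every split of `words` into len(slots) contiguous, non-empty spans.

    Each segmentation is a list of candidate rewrite rules (slot, span).
    """
    n, k = len(words), len(slots)
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        yield [(slots[i], tuple(words[bounds[i]:bounds[i + 1]])) for i in range(k)]


def em_rule_probs(examples, iterations=20):
    """Estimate P(span | slot) for each enumerated rewrite rule with EM."""
    candidates = [list(segmentations(w, s)) for w, s in examples]
    prob = defaultdict(lambda: 1.0)  # assumption: uniform starting weights
    for _ in range(iterations):
        # E-step: spread each example's unit count over its segmentations
        # in proportion to their current probability.
        counts = defaultdict(float)
        for segs in candidates:
            scores = [prod(prob[rule] for rule in seg) for seg in segs]
            total = sum(scores)
            if total == 0:
                continue  # no usable segmentation for this example
            for seg, score in zip(segs, scores):
                for rule in seg:
                    counts[rule] += score / total
        # M-step: renormalize the expected counts per slot.
        slot_totals = defaultdict(float)
        for (slot, _), c in counts.items():
            slot_totals[slot] += c
        prob = defaultdict(
            float, {r: c / slot_totals[r[0]] for r, c in counts.items()}
        )
    return prob, candidates


def resolve(segs, prob):
    """Pick the most probable segmentation among the ambiguous candidates."""
    return max(segs, key=lambda seg: prod(prob[rule] for rule in seg))


# Toy training data (hypothetical): the first example is ambiguous,
# the second has only one possible segmentation.
examples = [
    (["show", "flights", "from"], ["ShowPre", "FromPre"]),
    (["from"], ["FromPre"]),
]
prob, candidates = em_rule_probs(examples)
best = resolve(candidates[0], prob)
print(best)
```

In this toy data the unambiguous second example adds evidence for the rule rewriting FromPre as "from", so repeated EM iterations shift probability toward the segmentation that assigns "show flights" to ShowPre, and `resolve` selects it.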