Techniques are described for reducing the false positive rate of regular
expression attribute extractions via a specific data representation and a
machine-learning method that can be trained at a much lower cost (far
fewer labeled examples) than a full-scale machine-learning solution would
require. Attribute determinations made using the regular
expression technique are represented as skeleton tokens. The skeleton
tokens, along with accurate attribute determinations, are used to train a
machine-learning mechanism. Once
trained, the machine-learning mechanism is used to predict the accuracy
of attribute determinations represented by skeleton tokens generated for
not-yet-analyzed input text.
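The train-then-predict flow described above can be illustrated with a small sketch. The abstract does not define the exact form of a skeleton token or name the learning method, so everything here is an assumption: the regex `CAPACITY_RE`, the "storage capacity" attribute, the labeled examples, and the helpers `shape` and `skeleton_tokens` are hypothetical. The sketch assumes a skeleton is the character shape of the regex match plus a few normalized context words, and it substitutes a simple multinomial Naive Bayes classifier as a stand-in for the machine-learning mechanism.

```python
import math
import re
from collections import Counter

# Hypothetical regex extractor for a "storage capacity" attribute. It also
# fires on RAM sizes and other quantities, which produces false positives.
CAPACITY_RE = re.compile(r"\d+\s*GB\b")


def shape(s):
    """Reduce a string to its character shape: digits -> 9, uppercase -> A,
    lowercase -> a, with runs of one symbol collapsed ("128 GB" -> "9 A")."""
    s = re.sub(r"[0-9]", "9", s)
    s = re.sub(r"[A-Z]", "A", s)
    s = re.sub(r"[a-z]", "a", s)
    return re.sub(r"(.)\1+", r"\1", s)


def skeleton_tokens(text, match, window=2):
    """Hypothetical skeleton: the match's shape plus up to `window`
    lowercased, punctuation-stripped words on each side of the match."""
    def clean(w):
        return w.lower().strip(".,;:!?")
    left = [clean(w) for w in text[:match.start()].split()[-window:]]
    right = [clean(w) for w in text[match.end():].split()[:window]]
    return ([f"L:{w}" for w in left] + [f"M:{shape(match.group())}"]
            + [f"R:{w}" for w in right])


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (a stand-in learner)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {c: math.log(y.count(c) / len(y)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for feats, label in zip(X, y):
            self.counts[label].update(feats)
        self.vocab_size = len({f for feats in X for f in feats})
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, feats):
        scores = {}
        for c in self.classes:
            denom = self.totals[c] + self.alpha * self.vocab_size
            scores[c] = self.log_prior[c] + sum(
                math.log((self.counts[c][f] + self.alpha) / denom) for f in feats)
        top = max(scores.values())  # subtract the max for numerical stability
        z = sum(math.exp(v - top) for v in scores.values())
        return {c: math.exp(v - top) / z for c, v in scores.items()}


# Hypothetical labeled data: True means the regex extraction was accurate.
labeled = [
    ("Phone with 128 GB storage", True),
    ("Tablet 64 GB storage, black", True),
    ("Includes 256 GB storage and case", True),
    ("Phone with 8 GB RAM", False),
    ("Tablet 4 GB RAM, silver", False),
    ("Bonus 5 GB cloud credit", False),
]
X = [skeleton_tokens(t, CAPACITY_RE.search(t)) for t, _ in labeled]
y = [ok for _, ok in labeled]
model = NaiveBayes().fit(X, y)


def accuracy_score(text):
    """Predicted probability that the regex extraction from `text` is accurate."""
    m = CAPACITY_RE.search(text)
    return model.predict_proba(skeleton_tokens(text, m))[True]


p_storage = accuracy_score("Phone with 512 GB storage")
p_ram = accuracy_score("Laptop 16 GB RAM")
print(round(p_storage, 2), round(p_ram, 2))  # 0.8 0.25
```

Because the skeleton discards the literal digits, six labeled examples are enough for the classifier to learn that "storage" near a match signals an accurate extraction while "ram" signals a false positive. A real system would use a far richer skeleton and more data.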