A method for repairing a wrapper associated with an information source
includes: defining a classifier based on content features of information
extracted and labeled using the wrapper; using the classifier to
extract content information from a file according to a set of
classifier extraction rules; analyzing the extracted content information
according to the content features and assigning a label to any extracted
content information that satisfies that label's rules; and defining a
repaired wrapper as the classifier and those of the wrapper's labels that
have been assigned to extracted content information. Additional content
information and labels can be extracted by iteratively creating a
classifier based on both content features and structure features of
extracted strings.
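
The content-feature pass described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the feature set, the nearest-centroid classifier, the distance threshold max_dist, and the names content_features, define_classifier, classify, and repair_wrapper are all assumptions chosen for illustration, and the iterative refinement with structure features is not shown.

```python
import re
from collections import defaultdict

def content_features(s):
    """Hypothetical content features: they depend only on the string itself."""
    n = max(len(s), 1)
    return (
        sum(c.isdigit() for c in s) / n,   # fraction of digits
        sum(c.isalpha() for c in s) / n,   # fraction of letters
        min(len(s.split()), 10) / 10,      # token count, capped and scaled
        1.0 if "$" in s else 0.0,          # currency marker
    )

def define_classifier(labeled):
    """Build one feature centroid per label from strings the old wrapper labeled."""
    groups = defaultdict(list)
    for text, label in labeled:
        groups[label].append(content_features(text))
    return {label: tuple(sum(col) / len(vecs) for col in zip(*vecs))
            for label, vecs in groups.items()}

def classify(centroids, text, max_dist=0.35):
    """Return the nearest label, or None if no centroid is within max_dist.

    The threshold stands in for a label's rules (an assumption of this sketch).
    """
    f = content_features(text)
    best, best_d = None, max_dist
    for label, c in centroids.items():
        d = sum((a - b) ** 2 for a, b in zip(f, c)) ** 0.5
        if d < best_d:
            best, best_d = label, d
    return best

def repair_wrapper(labeled, page):
    """Return (repaired wrapper, extracted and labeled strings) for one page."""
    centroids = define_classifier(labeled)
    # Assumed extraction rule: every non-empty text run between tags is a candidate.
    strings = [s.strip() for s in re.split(r"<[^>]+>", page) if s.strip()]
    extracted = [(s, classify(centroids, s)) for s in strings]
    extracted = [(s, label) for s, label in extracted if label is not None]
    # The repaired wrapper keeps only the labels that were actually assigned.
    wrapper = {"classifier": centroids, "labels": {label for _, label in extracted}}
    return wrapper, extracted

# Strings the original wrapper extracted and labeled before the page layout changed.
labeled = [
    ("$12.99", "price"), ("$8.50", "price"),
    ("The Old Man and the Sea", "title"), ("Brave New World", "title"),
    ("0-14-018639-5", "isbn"),
]
# A page in the new layout, which the original extraction rules no longer match.
page = "<div><b>Dune Messiah</b><i>$7.25</i></div>"
wrapper, extracted = repair_wrapper(labeled, page)
```

In this sketch the "isbn" label is learned from the old examples but never assigned on the new page, so it is left out of the repaired wrapper's label set, mirroring the final step of the method.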