A method and system for identifying object information of an information
page is provided. An information extraction system identifies the object
blocks of an information page. The extraction system classifies the
object blocks into object types. Each object type has associated
attributes that define a schema for the information of the object type.
The extraction system identifies object elements within an object block,
each of which may represent the value of an attribute of the object. After the object
elements are identified, the extraction system attempts to identify which
object elements correspond to which attributes of the object type in a
process referred to as "labeling." The extraction system uses an
algorithm to determine the confidence that a given object element
corresponds to a given attribute. The extraction system then selects
the set of labels with the highest confidence as the labels for the
object elements.
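The abstract does not specify how confidence is computed or how candidate label sets are compared. The following is a minimal sketch of the selection step under several assumptions that are not from the source. It relies on a hypothetical `confidence(element, attribute)` scoring function. It scores a label set as the product of its per-element confidences. It allows an element to stay unlabeled (`None`), and it lets each attribute label at most one element. It exhaustively searches all label sets, which is exponential in the number of elements and so only suits small object blocks. The element strings and confidence values are made up for illustration.

```python
from itertools import product
from math import prod
from typing import Callable, Optional, Sequence

# None means "this element does not correspond to any attribute" (assumption).
Label = Optional[str]


def best_labeling(
    elements: Sequence[str],
    attributes: Sequence[str],
    confidence: Callable[[str, Label], float],  # hypothetical scoring function
) -> tuple[list[Label], float]:
    """Return the label set with the highest combined confidence.

    Assumptions: a label set's confidence is the product of its per-element
    confidences, and each attribute labels at most one element.
    """
    candidates: list[Label] = list(attributes) + [None]
    best_labels: list[Label] = []
    best_score = -1.0
    # Enumerate every assignment of one candidate label to each element.
    for labels in product(candidates, repeat=len(elements)):
        used = [a for a in labels if a is not None]
        if len(used) != len(set(used)):
            continue  # skip sets that reuse an attribute
        score = prod(confidence(e, a) for e, a in zip(elements, labels))
        if score > best_score:
            best_labels, best_score = list(labels), score
    return best_labels, best_score


# Hypothetical confidences for a product object with "name" and "price".
TABLE = {
    ("Acme X100", "name"): 0.9, ("Acme X100", "price"): 0.05,
    ("$499", "name"): 0.1,      ("$499", "price"): 0.95,
    ("In stock", "name"): 0.2,  ("In stock", "price"): 0.6,
}


def toy_confidence(element: str, attribute: Label) -> float:
    if attribute is None:
        return 0.5  # flat score for leaving an element unlabeled
    return TABLE.get((element, attribute), 0.0)


labels, score = best_labeling(
    ["Acme X100", "$499", "In stock"], ["name", "price"], toy_confidence
)
print(labels)  # ['name', 'price', None]
```

Labeling each element independently by its highest confidence would assign "price" to both "$499" and "In stock". Comparing whole label sets under the one-value-per-attribute assumption instead keeps "price" for "$499" and leaves "In stock" unlabeled.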