A by-line extraction method detects a set of potential headlines from a
title meta-tag of a crawled document, selects a candidate headline from
the set of potential headlines, and extracts the by-line information from
the document using the location of the selected candidate headline. The
method constructs the set of potential headlines based on the title
meta-tag. The method selects a candidate headline by evaluating the set
of potential headlines in order of the lengths of the potential
headlines. The method extracts the by-line information from the document
by using the location of the selected candidate headline to extract a
string representing a date, a name, or a source located within a minimum
distance from that location.
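
The three steps described above can be illustrated with a minimal sketch in Python. It is not the claimed implementation. It rests on several assumptions that the text does not state: the potential headlines are taken to be the full title plus the parts obtained by splitting it on common separator characters, candidates are tried longest first, and "distance" is measured in characters. The function names, the regular expressions, and the `max_distance` parameter are all hypothetical, and only date and name strings are handled here (source extraction is omitted).

```python
import re

# Hypothetical separators often found between a headline and a site name.
SEPARATORS = re.compile(r"\s+[|\-:»]+\s+")

# Hypothetical, deliberately narrow patterns for by-line strings.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE = re.compile(r"\b(?:" + MONTHS + r") \d{1,2}, \d{4}\b")
NAME = re.compile(r"\bBy ([A-Z][a-z]+(?: [A-Z][a-z]+)+)")


def potential_headlines(title):
    """Build the candidate set from the title: the full title plus its parts."""
    title = title.strip()
    parts = [p.strip() for p in SEPARATORS.split(title) if p.strip()]
    unique = list(dict.fromkeys([title, *parts]))  # de-duplicate, keep order
    # Assumption: evaluate longer candidates before shorter ones.
    return sorted(unique, key=len, reverse=True)


def select_candidate(candidates, text):
    """Return (headline, offset) for the first candidate found in the text."""
    for headline in candidates:
        pos = text.find(headline)
        if pos != -1:
            return headline, pos
    return None


def extract_byline(text, headline, pos, max_distance=200):
    """Pick the date and name strings closest to the headline location."""
    start, end = pos, pos + len(headline)
    result = {}
    for field, pattern in (("date", DATE), ("name", NAME)):
        best = None
        for m in pattern.finditer(text):
            # Character gap between the headline span and the match.
            dist = max(m.start() - end, start - m.end(), 0)
            if dist <= max_distance and (best is None or dist < best[0]):
                # Use the capture group if the pattern has one.
                best = (dist, m.group(m.lastindex or 0))
        if best is not None:
            result[field] = best[1]
    return result


if __name__ == "__main__":
    title = "Storm hits coast - Daily News"
    text = ("Daily News\nHome | World\nStorm hits coast\n"
            "By Jane Doe\nMarch 3, 2009\nHeavy rain fell overnight.")
    headline, pos = select_candidate(potential_headlines(title), text)
    print(headline, pos, extract_byline(text, headline, pos))
```

In this sketch the full title is tried first and is not found in the body text, so the shorter part "Storm hits coast" becomes the selected candidate; the by-line strings are then searched for only near that position.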