A method and system for classifying display pages based on automatically
generated summaries of those pages. A web page classification system
uses a web page summarization system to generate summaries of web pages.
The summary of a web page may include the sentences of the web page that
are most closely related to the primary topic of the web page. The
summarization system may combine the benefits of multiple summarization
techniques to identify the sentences of a web page that represent the
primary topic of the web page. Once the summary is generated, the
classification system may apply a conventional classification technique,
such as a Naive Bayesian classifier or a support vector machine, to the
summary to identify the classifications of the web page.
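
The summarize-then-classify flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names summarize, tokenize, and NaiveBayes are hypothetical, the word-frequency sentence score is a single simple stand-in for the combination of summarization techniques the abstract refers to, and the two training pages are toy data. The classifier is a small multinomial Naive Bayes with add-one smoothing, one of the conventional techniques named above.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on"}


def tokenize(text):
    """Lowercase the text and return its alphabetic, non-stopword tokens."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(page_text, max_sentences=2):
    """Keep the sentences whose words are most frequent across the page.

    Assumption: average word frequency stands in for "closeness to the
    primary topic"; the source does not specify the scoring method.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    freq = Counter(tokenize(page_text))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


class NaiveBayes:
    """Multinomial Naive Bayes over summary words, with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word occurrences
        self.doc_counts = Counter()  # label -> number of training summaries
        self.vocab = set()

    def train(self, summary, label):
        words = tokenize(summary)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, summary):
        words = tokenize(summary)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log prior plus smoothed log likelihood of each summary word
            log_prob = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for w in words:
                log_prob += math.log((counts[w] + 1) / denom)
            if log_prob > best_score:
                best_label, best_score = label, log_prob
        return best_label


# Toy training pages (illustrative data only).
training_pages = [
    ("The team won the match. The striker scored two goals. "
     "Tickets are sold online.", "sports"),
    ("The bank raised interest rates. Markets fell as rates rose. "
     "Lunch was served.", "finance"),
]

classifier = NaiveBayes()
for text, label in training_pages:
    classifier.train(summarize(text), label)  # train on summaries, not full pages

page = ("Interest rates fell again. The bank expects rates to stay low. "
        "Click here to subscribe.")
predicted = classifier.classify(summarize(page))
print(predicted)  # finance
```

In this sketch the classifier only ever sees the summary, so off-topic sentences such as the subscription prompt are dropped before classification. A support vector machine could be substituted for the NaiveBayes class without changing the summarization step.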