A method and system for clustering documents based on generalized sentence
patterns of the topics of the documents are provided. A generalized
sentence patterns ("GSP") system identifies a "sentence" that describes
the topic of a document. To cluster documents, the GSP system generates a
"generalized sentence" form of the sentence that describes the topic of
each document. The generalized sentence is an abstraction of the words of
the sentence. The GSP system identifies clusters of documents from the
patterns of their generalized sentences, placing documents in the same
cluster when the generalized sentences of their topics have a similar
pattern.
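
The flow described above can be illustrated with a minimal sketch. It is not the method as claimed: the lexicon `CATEGORY`, the `STOPWORDS` set, and the functions `generalize` and `cluster_by_pattern` are hypothetical names introduced only for this example. It also assumes that a generalized sentence is the set of abstract terms obtained by replacing words with categories, and it treats "similar pattern" as an exact match of those sets, which is a deliberate simplification.

```python
from collections import defaultdict

# Hypothetical lexicon mapping concrete words to abstract categories.
# The source does not specify how words are abstracted; this is an assumption.
CATEGORY = {
    "monday": "<day>", "friday": "<day>",
    "alice": "<person>", "bob": "<person>",
    "meeting": "<event>", "lunch": "<event>",
}

# Hypothetical stopword list; words in it are dropped before generalization.
STOPWORDS = {"a", "an", "the", "on", "with", "for"}


def generalize(sentence):
    """Return a generalized sentence as a frozenset of abstract terms."""
    words = (w.strip(".,!?").lower() for w in sentence.split())
    return frozenset(
        CATEGORY.get(w, w)  # fall back to the word itself if uncategorized
        for w in words
        if w and w not in STOPWORDS
    )


def cluster_by_pattern(topic_sentences):
    """Group document ids whose generalized topic sentences are identical."""
    clusters = defaultdict(list)
    for doc_id, sentence in topic_sentences.items():
        clusters[generalize(sentence)].append(doc_id)
    return dict(clusters)


# Each document is represented by the sentence that describes its topic.
docs = {
    "doc1": "Meeting with Alice on Monday",
    "doc2": "Lunch with Bob on Friday",
    "doc3": "Quarterly budget report",
}
clusters = cluster_by_pattern(docs)
```

In this sketch, `doc1` and `doc2` generalize to the same set of terms (`<event>`, `<person>`, `<day>`) and fall into one cluster, while `doc3` forms its own. A fuller treatment would compare patterns by similarity rather than equality.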