A method and system are provided for detecting authorship across different
types of information sources, such as documents on the Web. The
method includes obtaining a compression signature for a document, and
determining the similarity between compression signatures of two or more
documents. If the similarity is greater than a threshold measure, the two
or more documents are considered to be by the same author. Scored pairs
of documents are clustered to provide a group of documents by the same
author. The group of documents by the same author can be used for user
profiling, noise reduction, contribution sizing, detecting fraudulent
contributions, obtaining other search results by the same author, or
matching a document with undisclosed authorship to a document of known
authorship.
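
The steps above (signature, pairwise similarity, threshold, clustering) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract names neither the compressor, the similarity measure, nor the clustering algorithm. The sketch assumes zlib as the compressor, uses the compressed size as a stand-in for the compression signature, takes one minus the normalized compression distance as the similarity, and groups scored pairs with a simple union-find. The names compressed_size, similarity, and cluster_by_author are hypothetical.

```python
import zlib
from itertools import combinations
from typing import Dict, List, Set


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (assumed stand-in for a signature)."""
    return len(zlib.compress(data, 9))


def similarity(a: str, b: str) -> float:
    """Similarity as 1 - normalized compression distance (an assumed measure)."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    ncd = (cxy - min(cx, cy)) / max(cx, cy)
    return 1.0 - ncd


def cluster_by_author(docs: Dict[str, str], threshold: float) -> List[Set[str]]:
    """Group document names whose pairwise similarity exceeds the threshold."""
    parent = {name: name for name in docs}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(docs, 2):
        if similarity(docs[a], docs[b]) > threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: Dict[str, Set[str]] = {}
    for name in docs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

A call such as cluster_by_author({"post1": text1, "post2": text2}, threshold=0.5) returns a list of sets of document names. The threshold value here is arbitrary and would need tuning on documents of known authorship.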