A method and system are provided for detecting authorship across different types of information sources, such as documents on the Web. The method includes obtaining a compression signature for a document and determining the similarity between the compression signatures of two or more documents. If the similarity exceeds a threshold measure, the two or more documents are considered to be by the same author. Scored pairs of documents are clustered to provide a group of documents by the same author. The group of documents by the same author can be used for user profiling, noise reduction, contribution sizing, detecting fraudulent contributions, obtaining other search results by the same author, or matching a document with undisclosed authorship to a document of known authorship.
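
The abstract does not say how a compression signature is computed or compared, so the following is only a minimal sketch under stated assumptions, not the patented method. It substitutes a well-known stand-in, the normalized compression distance (NCD) computed with Python's standard `zlib` module, for the signature comparison, and it groups scored pairs with a simple union-find pass. The function names (`similarity`, `cluster_by_author`) and the threshold value are hypothetical and chosen for illustration.

```python
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed form of data (assumed signature proxy)."""
    return len(zlib.compress(data, 9))


def similarity(doc_a: str, doc_b: str) -> float:
    """Return 1 - NCD(doc_a, doc_b); higher means more alike.

    NCD is a stand-in here; the abstract does not name a specific measure.
    """
    a, b = doc_a.encode("utf-8"), doc_b.encode("utf-8")
    size_a, size_b = compressed_size(a), compressed_size(b)
    size_ab = compressed_size(a + b)
    ncd = (size_ab - min(size_a, size_b)) / max(size_a, size_b)
    return 1.0 - ncd


def cluster_by_author(docs: dict[str, str], threshold: float) -> list[set[str]]:
    """Group document names whose pairwise similarity exceeds threshold.

    Pairs above the threshold are linked, and linked documents end up in
    the same group (single-link clustering via union-find).
    """
    parent = {name: name for name in docs}

    def find(name: str) -> str:
        # Follow parent links to the group representative, compressing paths.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for name_a, name_b in combinations(docs, 2):
        if similarity(docs[name_a], docs[name_b]) > threshold:
            parent[find(name_a)] = find(name_b)

    groups: dict[str, set[str]] = {}
    for name in docs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Toy inputs: two copies of one text and one unrelated digit string.
    prose = "the quick brown fox jumps over the lazy dog " * 20
    digits = "".join(str(i * i) for i in range(300))
    docs = {"a1": prose, "a2": prose, "b": digits}
    # The threshold of 0.5 is an arbitrary illustrative value.
    print(cluster_by_author(docs, threshold=0.5))
```

In practice the threshold would have to be tuned on documents of known authorship. Note also that DEFLATE, the algorithm behind `zlib`, only looks back 32 KiB, so this particular stand-in loses sensitivity on long documents; a different compressor or a truncated sample may be needed there.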

 