Methods and computer program products are provided for creating sketches
of a document. These sketches are compared with sketches of other documents
to determine the documents' degree of similarity. A sketch is a digest of
information from random locations within a document. A document is
divided into a set of shingles. Each shingle is converted into a set of
fingerprints. A sketch is determined based on one-bit fingerprints thus
created. In order to create additional sketches of the document, a new
set of fingerprints is created by randomization techniques.
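
The pipeline described above (shingles, then fingerprints, then a sketch, then comparison) can be illustrated with a minimal Python sketch. It is not the claimed method: it assumes word-level shingles of length k, 64-bit salted BLAKE2b hashes as fingerprints, and a min-wise selection rule that keeps the smallest fingerprint per seed. All function names and parameters (shingles, fingerprint, sketch, estimated_similarity, k, num_seeds) are hypothetical and chosen only for illustration. Each seed stands in for one randomization, yielding a fresh set of fingerprints.

```python
import hashlib


def shingles(text, k=4):
    """Return the set of k-word shingles (contiguous word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Assumption: a document shorter than k words forms a single shingle.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(shingle, seed):
    """Return a 64-bit fingerprint of a shingle, randomized by an integer seed."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),  # a different seed gives a different hash
    ).digest()
    return int.from_bytes(digest, "little")


def sketch(text, num_seeds=64, k=4):
    """Build a sketch: for each seed, keep the smallest fingerprint (assumed rule)."""
    shingle_set = shingles(text, k)
    if not shingle_set:
        raise ValueError("cannot sketch an empty document")
    return [
        min(fingerprint(s, seed) for s in shingle_set)
        for seed in range(num_seeds)
    ]


def estimated_similarity(sketch_a, sketch_b):
    """Return the fraction of positions at which two sketches agree."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)
```

Under these assumptions, two documents are compared by calling estimated_similarity(sketch(doc_a), sketch(doc_b)). Identical documents score 1.0, and documents that share no shingles score close to 0. A larger num_seeds gives a steadier estimate at the cost of more hashing.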