Systems and methods are provided for searching a repository for the
location of data that is similar to input data, using a defined measure
of similarity, in a time that is independent of the size of the
repository and linear in the size of the input data, and in a space that
is proportional to a small fraction of the size of the repository. The similar data
segments thus located are further analyzed to determine their common
(identical) data sections, regardless of the order and position of those
sections in the repository and the input, in a time that is linear in
the segment size and in constant space.
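
By way of illustration only, the similarity search described above might be sketched as follows. The abstract does not specify the similarity measure or the index structure, so everything here is an assumption: the repository is split into fixed-size chunks, each chunk is summarized by the K largest hashes of its short byte windows, and only those few values are kept in a hash table. The names `characteristics`, `build_index`, and `find_similar`, the constants `SEED`, `K`, and `CHUNK`, and the choice of BLAKE2b are hypothetical.

```python
import hashlib
from collections import Counter

# All three constants are illustrative assumptions, not values from the source.
SEED = 16     # bytes per hashed window
K = 4         # characteristic hashes kept per chunk
CHUNK = 1024  # chunk size in bytes


def characteristics(chunk, k=K, seed=SEED):
    """Return the k largest 64-bit hashes over all seed-byte windows of chunk."""
    hashes = {
        int.from_bytes(
            hashlib.blake2b(chunk[i:i + seed], digest_size=8).digest(), "big"
        )
        for i in range(len(chunk) - seed + 1)
    }
    return set(sorted(hashes, reverse=True)[:k])


def build_index(repository):
    """Map each characteristic hash to the offset of the first chunk that has it.

    Only K entries are stored per CHUNK bytes, so the index is a small
    fraction of the repository size.
    """
    index = {}
    for offset in range(0, len(repository), CHUNK):
        for h in characteristics(repository[offset:offset + CHUNK]):
            index.setdefault(h, offset)
    return index


def find_similar(index, data):
    """Return the offset of the chunk sharing the most characteristics, or None.

    The work is one pass over data plus at most K hash-table lookups, so the
    expected cost does not grow with the size of the repository.
    """
    votes = Counter(index[h] for h in characteristics(data) if h in index)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

In this sketch a lookup hashes every window of the input once and then performs at most K dictionary probes. For brevity each window is rehashed from scratch; a rolling hash would remove that constant factor. Determining the common sections of the matched segments is a separate step and is not shown here.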