A system for compressing a first file based on a second file used as a
dictionary. The second file is sampled at predetermined locations within
the second file, each sample having a fixed sample length. A dictionary
database is created by storing each sample and a start position within the
second file of the corresponding sample. The first file is compared with
the dictionary database to locate matches between the first file and the
stored samples. The first file is then encoded by outputting a coded
segment for each match located by this comparison and uncoded segments for
all remaining portions of the first file.
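The scheme can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state. The "predetermined locations" are taken to be evenly spaced, non-overlapping positions every SAMPLE_LEN bytes (4 here). A match on a sample is extended forward byte by byte. A coded segment is represented as ("copy", start position in the second file, length), and an uncoded segment as ("literal", raw bytes). The names build_dictionary, encode, decode and SAMPLE_LEN are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical parameter: samples are assumed to start every SAMPLE_LEN bytes.
SAMPLE_LEN = 4

Coded = Tuple[str, int, int]   # ("copy", start in second file, length)
Uncoded = Tuple[str, bytes]    # ("literal", raw bytes from first file)
Segment = Union[Coded, Uncoded]


def build_dictionary(second: bytes, sample_len: int = SAMPLE_LEN) -> Dict[bytes, int]:
    """Store each fixed-length sample with its start position in the second file."""
    db: Dict[bytes, int] = {}
    for start in range(0, len(second) - sample_len + 1, sample_len):
        # If the same sample occurs twice, keep the first start position.
        db.setdefault(second[start:start + sample_len], start)
    return db


def encode(first: bytes, second: bytes, sample_len: int = SAMPLE_LEN) -> List[Segment]:
    """Emit coded segments for matches and uncoded segments for everything else."""
    db = build_dictionary(second, sample_len)
    out: List[Segment] = []
    literal = bytearray()
    i = 0
    while i < len(first):
        start = db.get(first[i:i + sample_len])
        if start is None:
            # No stored sample matches at this position: keep the byte uncoded.
            literal.append(first[i])
            i += 1
            continue
        # Assumed refinement: extend the match past the sample while bytes agree.
        length = sample_len
        while (i + length < len(first) and start + length < len(second)
               and first[i + length] == second[start + length]):
            length += 1
        if literal:
            out.append(("literal", bytes(literal)))
            literal.clear()
        out.append(("copy", start, length))
        i += length
    if literal:
        out.append(("literal", bytes(literal)))
    return out


def decode(segments: List[Segment], second: bytes) -> bytes:
    """Rebuild the first file from its segments and the second (dictionary) file."""
    buf = bytearray()
    for seg in segments:
        if seg[0] == "copy":
            _, start, length = seg
            buf += second[start:start + length]
        else:
            buf += seg[1]
    return bytes(buf)


second = b"the quick brown fox jumps over the lazy dog"
first = b"the quick red fox jumps over the lazy cat"
segments = encode(first, second)
print(segments)
# [('copy', 0, 10), ('literal', b'red '), ('copy', 16, 24), ('literal', b'cat')]
assert decode(segments, second) == first
```

Sampling only at fixed intervals keeps the dictionary database small, roughly one entry per SAMPLE_LEN bytes of the second file. The cost is that a match is found only when the first file contains a stretch aligned with one of the sampled positions. Because the scan of the first file tests every position, the alignment constraint applies only to the second file.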