Methods and data structures are disclosed for identifying differences
between large files comprising many lines (or other units of comparison such as
rows, words, paragraphs, or sentences). The disclosed methods and data structures
carry out a streamlined, yet thorough, comparison of two files to
identify differences between them. The streamlining is achieved by pre-processing
the files before submitting them to any known longest common subsequence (LCS)
search engine. The output of the LCS search engine is then post-processed to
compensate for changes to the sequences introduced by the pre-processing stage.
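
The three stages described above (pre-process, run an LCS search, post-process) can be illustrated with a minimal sketch. The text does not say what the pre-processing does, so this example assumes one plausible reduction: dropping lines that occur in only one of the two files, since such lines can never belong to a common subsequence. The function names `preprocess`, `lcs_pairs`, and `diff` are hypothetical, and the textbook dynamic-programming LCS stands in for "any known" LCS search engine.

```python
def lcs_pairs(a, b):
    """Textbook dynamic-programming LCS; returns matched (i, j) index pairs."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def preprocess(a, b):
    """Assumed pre-processing: keep only lines present in both files.

    Returns the ORIGINAL indices of the kept lines so that the
    post-processing stage can undo the renumbering.
    """
    in_a, in_b = set(a), set(b)
    keep_a = [i for i, line in enumerate(a) if line in in_b]
    keep_b = [j for j, line in enumerate(b) if line in in_a]
    return keep_a, keep_b


def diff(a, b):
    """Pre-process, run the LCS search on the reduced inputs, post-process."""
    keep_a, keep_b = preprocess(a, b)
    reduced_a = [a[i] for i in keep_a]
    reduced_b = [b[j] for j in keep_b]
    pairs = lcs_pairs(reduced_a, reduced_b)
    # Post-processing: translate reduced indices back to original line numbers.
    matched = [(keep_a[i], keep_b[j]) for i, j in pairs]
    matched_a = {i for i, _ in matched}
    matched_b = {j for _, j in matched}
    deleted = [i for i in range(len(a)) if i not in matched_a]
    inserted = [j for j in range(len(b)) if j not in matched_b]
    return matched, deleted, inserted


old = ["a", "b", "c", "d"]
new = ["a", "x", "c", "d", "e"]
matched, deleted, inserted = diff(old, new)
```

In this sketch the LCS search sees only three lines per file instead of four and five, and the index lists returned by `preprocess` are what let the post-processing stage report results in terms of the original line numbers.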