Methods and data structures are disclosed for identifying
differences between large files comprising many lines (or other units of
comparison such as rows, words, paragraphs, or sentences). The
disclosed methods and data structures carry out a streamlined yet
thorough comparison of two files to identify differences between them.
The streamlining is achieved by pre-processing the files
prior to submitting them to any known longest common subsequence (LCS)
search engine. The output of the LCS generator is post-processed to
compensate for changes to the sequences introduced by the pre-processing
stage.
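
The three-stage flow described above (pre-process, run an LCS search, post-process) can be illustrated with a minimal sketch. The abstract does not specify the pre-processing steps, so the two used here are assumptions chosen for illustration: trimming the common prefix and suffix, and dropping lines that never occur in the other file. The names `lcs_pairs` and `diff_lines` are hypothetical, and the plain dynamic-programming LCS merely stands in for "any known" LCS engine.

```python
from typing import List, Tuple


def lcs_pairs(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Stand-in LCS engine: textbook dynamic programming.

    Returns the matched index pairs (i, j) of one longest common subsequence.
    """
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def diff_lines(old: List[str], new: List[str]) -> Tuple[List[int], List[int]]:
    """Return (deleted indices in old, inserted indices in new)."""
    # Pre-processing, assumed step 1: trim the common prefix and suffix.
    start = 0
    while start < len(old) and start < len(new) and old[start] == new[start]:
        start += 1
    end_old, end_new = len(old), len(new)
    while end_old > start and end_new > start and old[end_old - 1] == new[end_new - 1]:
        end_old -= 1
        end_new -= 1

    # Pre-processing, assumed step 2: drop lines absent from the other file.
    # Such lines cannot belong to any common subsequence. The original
    # indices of the surviving lines are kept for the post-processing stage.
    old_set = set(old[start:end_old])
    new_set = set(new[start:end_new])
    keep_old = [i for i in range(start, end_old) if old[i] in new_set]
    keep_new = [j for j in range(start, end_new) if new[j] in old_set]

    # LCS search on the reduced sequences.
    pairs = lcs_pairs([old[i] for i in keep_old], [new[j] for j in keep_new])

    # Post-processing: map reduced indices back to original line numbers and
    # restore the trimmed prefix and suffix as matched lines.
    matched_old = set(range(start)) | set(range(end_old, len(old)))
    matched_new = set(range(start)) | set(range(end_new, len(new)))
    matched_old |= {keep_old[i] for i, _ in pairs}
    matched_new |= {keep_new[j] for _, j in pairs}
    deleted = [i for i in range(len(old)) if i not in matched_old]
    inserted = [j for j in range(len(new)) if j not in matched_new]
    return deleted, inserted
```

In this sketch the quadratic LCS table is built only over the lines that survive both reductions, which is where the saving would come from on large files that differ in a few places. Both assumed reductions leave the LCS length unchanged, so the reported differences are as complete as those from running the LCS engine on the unreduced files.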