A method for performing differencing and updating between electronic files is
provided.
A byte-level file differencing algorithm receives two byte streams corresponding
to an original file and a new file. The new file includes updated and revised versions
of the original file. The file differencing algorithm determines a longest common
sub-string (LCS) between the two byte streams and divides each of the two byte
streams into sub-streams. The sub-streams include the LCS along with prefix and
suffix sub-streams to the LCS. The file differencing algorithm then recursively
determines an LCS and divides each sub-stream until a size of the sub-streams is
less than a pre-specified size. Byte-level differences are then identified between
each of the corresponding sub-streams. Further, the file differencing algorithm
defines a protocol for structuring a delta file by using a set of operation codes
and a variable length integer format to eliminate redundant information in the
delta file. Using this protocol, the file differencing algorithm generates the
delta file including an operation array that codes the identified byte-level differences.