In a dictionary formation aspect of the invention, a computer-based method
of processing a plurality of sequences in a database comprises the
following steps. First, the method includes evaluating each of the
plurality of sequences, including the characters which form each sequence.
Then, at least one pattern of characters is generated that represents at
least a subset of the sequences in the database. The pattern has a
statistical significance associated therewith, the statistical
significance of the pattern being determined by a value representing a
minimum number of sequences that the pattern supports in the database.
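
By way of illustration only, the two steps above can be sketched as follows. The sketch makes simplifying assumptions that are not drawn from the description: patterns are taken to be contiguous runs of characters of one fixed length, and the names `frequent_patterns`, `length`, and `min_support` are hypothetical. The value `min_support` plays the role of the minimum number of sequences associated with a pattern.

```python
from collections import defaultdict


def frequent_patterns(sequences, length, min_support):
    """Return {pattern: support} for contiguous character patterns.

    Assumption for this sketch: a pattern is a fixed-length substring, and
    its support is the number of distinct sequences in which it occurs.
    """
    supporters = defaultdict(set)
    for index, sequence in enumerate(sequences):
        # Step 1: evaluate the characters of each sequence, window by window.
        for start in range(len(sequence) - length + 1):
            supporters[sequence[start:start + length]].add(index)
    # Step 2: keep only patterns that meet the minimum support threshold.
    return {
        pattern: len(indices)
        for pattern, indices in supporters.items()
        if len(indices) >= min_support
    }


# Hypothetical four-sequence database used only for this example.
database = ["ABCDE", "XBCDY", "ZZBCD", "QQQQQ"]
patterns = frequent_patterns(database, length=3, min_support=3)
print(patterns)
```

In this toy database only the pattern `BCD` occurs in at least three sequences, so it is the only pattern retained. Raising `min_support` makes the retained patterns rarer and, in the sense used above, more significant.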