Approximate substring indexing is accomplished by decomposing each string
in a database into overlapping "positional q-grams", sequences of a
predetermined length q, and containing information regarding the
"position" of each q-gram within the string (i.e., 1.sup.st q-gram,
4.sup.th q-gram, etc.). An index is then formed of the tuples of the
positional q-gram data (such as, for example, a B-tree index or a hash
index). Each query applied to the database is similarly parsed into a
plurality of positional q-grams (of the same length), and a candidate set
of matches is found. Position-directed filtering is used to remove the
candidates which have the q-grams in the wrong order and/or too far apart
to form a "verified" output of matching candidates. If errors are
permitted (defined in terms of an edit distance between each candidate
and the query), an edit distance calculation can then be performed to
produce the final set of matching strings.