The invention relates to identification of individual nucleic acid
sequences from a mixed nucleic acid population. A typical application is
to determine the bacteria present in a sample containing a mix of several
different bacteria. Present techniques require initial cultivation of the
mixed bacteria sample and manual separation of the bacteria prior to
sequencing. The invention allows for identification of the different
bacteria by direct sequencing of the mixed bacteria sample without prior
cultivation and separation. One aspect of the invention relates to
generating a degenerate sequence from a chromatogram obtained by
sequencing a mixed bacteria sample. Another aspect relates to
base-calling, i.e. identification of individual sequences making up the
degenerate sequence from the mixed bacteria sample. In this aspect, the
degenerate sequence is divided into degenerate subsequences from which
query subsequence combinations are generated. Each query subsequence
combination is then aligned against target sequences present in a database.
From these alignments, the target sequences present in the database are
assigned an overall score, which is used to determine which individual
sequences were present in the mixed bacteria sample.
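
The base-calling steps described above (dividing the degenerate sequence into subsequences, generating query combinations, aligning them against targets, and scoring) can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: the degenerate sequence is written with standard IUPAC ambiguity codes, subsequences are fixed-length consecutive windows, exact substring search stands in for a real alignment step, and the overall score is simply the number of subsequences a target matches. The function names and the toy target database are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def split_subsequences(degenerate, length):
    """Cut the degenerate sequence into consecutive windows (assumed scheme)."""
    return [degenerate[i:i + length] for i in range(0, len(degenerate), length)]


def query_combinations(subsequence):
    """Expand one degenerate subsequence into every concrete sequence it encodes."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in subsequence))]


def score_targets(degenerate, targets, length=4):
    """Score each target by the number of subsequences it matches.

    Assumption: exact substring search replaces a real alignment, and the
    overall score is a plain count of matched subsequences.
    """
    scores = {name: 0 for name in targets}
    for sub in split_subsequences(degenerate, length):
        combos = query_combinations(sub)
        for name, seq in targets.items():
            if any(combo in seq for combo in combos):
                scores[name] += 1
    return scores


# Hypothetical toy database of target sequences.
targets = {
    "target_a": "ACGTACGA",
    "target_b": "ATGTACTA",
    "target_c": "GGGGCCCC",
}

# Degenerate read of a mix of target_a and target_b:
# position 2 is C or T (Y), position 7 is G or T (K).
scores = score_targets("AYGTACKA", targets, length=4)
print(scores)
```

In this toy run both windows of the degenerate sequence match target_a and target_b, while target_c matches neither, so the two mixed targets receive the highest overall score. A practical implementation would replace the substring test with a scored alignment and would need to bound the number of combinations per subsequence, since it grows multiplicatively with each ambiguous position.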