Compositions and methods for rapid and highly efficient characterization
of genetic diversity in organisms are provided. The methods involve rapid
sequencing and characterization of extrachromosomal DNA, particularly
plasmids, to identify useful nucleotide sequences. The methods include
generating a library of extrachromosomal DNA clones, sequencing a portion
of the clones, comparing the sequences against a database of existing DNA
sequences, using an algorithm to select novel nucleotide sequences based
on the presence or absence of each sequence in the database, and
identifying at least one novel nucleotide sequence. The DNA
sequence can also be translated in all six reading frames, and the
resulting amino acid sequences compared against a database of protein
sequences.
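
As an illustration of the six-frame translation step, the following is a minimal Python sketch. It assumes the standard genetic code and an input consisting only of the bases A, C, G, and T; the function names (reverse_complement, translate, six_frame_translation) are hypothetical and are not taken from the disclosure. Codons containing any other character are reported as "X".

```python
# Minimal sketch: translate a DNA sequence in all six reading frames.
# Assumption: standard genetic code; names below are illustrative only.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Map each codon (TTT, TTC, ...) to its one-letter amino acid; '*' marks a stop.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict:
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse)."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Each of the six resulting amino acid strings could then be submitted to a protein database search; the search itself is outside the scope of this sketch.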
Organisms of particular interest include, but are not limited to,
bacteria, fungi, algae, and the like. Compositions comprise a mini-cosmid
vector comprising a stuffer fragment and at least one cos site.
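
The selection of novel sequences based on presence or absence in a database can be sketched as follows. This is a hedged illustration only: it assumes that the database comparison has already been run by a separate similarity-search tool and that its results are available as (clone identifier, database entry, E-value) tuples. The function name select_novel, the record layout, and the E-value cutoff of 1e-5 are all assumptions chosen for the example and are not specified in the disclosure.

```python
# Minimal sketch: flag clones as novel when the database search returned
# no significant match. All names and the cutoff value are hypothetical.
from typing import Iterable, List, Tuple

# (clone_id, database_entry_id, e_value) as produced by some search tool.
Hit = Tuple[str, str, float]


def select_novel(
    clone_ids: Iterable[str],
    hits: Iterable[Hit],
    max_evalue: float = 1e-5,  # assumed significance cutoff
) -> List[str]:
    """Return clone IDs that have no hit with an E-value at or below the cutoff."""
    matched = {clone_id for clone_id, _entry, e_value in hits if e_value <= max_evalue}
    # Preserve the input order of the clones.
    return [clone_id for clone_id in clone_ids if clone_id not in matched]


if __name__ == "__main__":
    clones = ["clone_001", "clone_002", "clone_003"]
    hits = [
        ("clone_001", "db_seq_A", 1e-50),  # strong match: present in database
        ("clone_003", "db_seq_B", 0.5),    # weak match: treated as absent
    ]
    print(select_novel(clones, hits))  # ['clone_002', 'clone_003']
```

Treating weak matches as absent is a design choice of this sketch; a stricter or looser cutoff would change which clones are reported as novel.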