The invention provides a method, apparatus and system for classifying
and clustering electronic data streams, such as email, images and sound
files, for identification, sorting and efficient storage. The inventive
systems label a document as belonging to a predefined class through
computer methods that comprise the steps of identifying an
electronic data stream using one or more learning machines and comparing
the outputs from the machines to determine the label to associate with
the data. The method further utilizes learning machines in combination
with hashing schemes to cluster and classify documents. In one embodiment,
hash apparatuses and methods taxonomize clusters. In another
embodiment, clusters of documents use a geometric hash to contain the
documents in a data corpus without the overhead of search and storage.
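The abstract does not say which learning machines are used, how their outputs are compared, or what form the geometric hash takes. The following is a minimal Python sketch under stated assumptions, not the patented method: it assumes majority voting as the rule for comparing the machines' outputs, and it swaps in random-hyperplane (sign) hashing as one possible geometric hash for grouping documents into clusters without a pairwise search. The names `label_by_vote`, `make_hyperplanes`, `geometric_key`, `cluster`, and the toy threshold classifiers are all hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Machine = Callable[[Vector], str]  # hypothetical: a trained learning machine mapping features to a label


def label_by_vote(x: Vector, machines: Sequence[Machine]) -> str:
    """Run every machine on x and keep the label most of them agree on.

    Assumption: majority vote is the comparison rule. Ties go to the label
    seen first (Counter.most_common keeps first-encountered order for equal counts).
    """
    votes = Counter(machine(x) for machine in machines)
    return votes.most_common(1)[0][0]


def make_hyperplanes(dim: int, bits: int, seed: int = 0) -> List[Vector]:
    """Draw `bits` random hyperplanes through the origin, each given by a Gaussian normal."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def geometric_key(x: Vector, planes: Sequence[Vector]) -> int:
    """Pack one bit per hyperplane: 1 if x lies on the plane's non-negative side.

    Assumption: random-hyperplane hashing stands in for the unspecified geometric hash.
    """
    key = 0
    for plane in planes:
        dot = sum(a * b for a, b in zip(x, plane))
        key = (key << 1) | (1 if dot >= 0.0 else 0)
    return key


def cluster(docs: Dict[str, Vector], planes: Sequence[Vector]) -> Dict[int, List[str]]:
    """Group documents whose vectors hash to the same key; no pairwise comparison is done."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for name, vec in docs.items():
        buckets[geometric_key(vec, planes)].append(name)
    return dict(buckets)


# Toy usage: three hypothetical threshold "machines" voting on 2-D feature vectors.
machines: List[Machine] = [
    lambda x: "spam" if x[0] > 0.5 else "ham",
    lambda x: "spam" if x[1] > 0.5 else "ham",
    lambda x: "spam" if x[0] + x[1] > 1.2 else "ham",
]
planes = make_hyperplanes(dim=2, bits=4, seed=1)

if __name__ == "__main__":
    print(label_by_vote([0.9, 0.2], machines))  # two of the three machines say "ham"
    print(cluster({"a": [1.0, 2.0], "b": [2.0, 4.0], "c": [-1.0, 0.5]}, planes))
```

Sign hashing places vectors separated by a small angle in the same bucket with high probability, so documents with proportional feature vectors (such as `a` and `b` above) always share a key. This is only one of several hashing schemes that could fill this role.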