Online document clustering using TFIDF and predefined time windows

Documents from a data stream are clustered by first generating a feature vector for each document. A set of cluster centroids (e.g., feature vectors of their corresponding clusters) are retrieved from a memory based on the feature vector of the document and a relative age of each of the cluster centroids. The centroids may be retrieved by retrieving a set of cluster identifiers from a cluster table, the cluster identifiers each indicative of a respective cluster centroid, and retrieving the cluster centroids corresponding to the retrieved cluster identifiers from a memory. A list of cluster identifiers in the cluster table may be maintained based on the relative age of cluster centroids corresponding to the cluster identifiers. Cluster identifiers that correspond to cluster centroids with a relative age exceeding a predetermined threshold are periodically removed from the list of cluster identifiers.

Web www.patentalert.com

< Reduction of memory usage for prime number storage by using a table of differences between a closed form numerical function and prime numbers which bounds a prime numeral between two index values

< Method and system for measuring interest levels of digital messages

> Configurable hierarchical content filtering system

> Agent engine

~ 00602