Documents from a data stream are clustered by first generating a feature
vector for each document. A set of cluster centroids (e.g., the
representative feature vectors of their corresponding clusters) is then
retrieved from a memory based on both the feature vector of the document
and the relative age of each cluster centroid. The centroids may be
retrieved by first reading a set of cluster identifiers from a cluster
table, each identifier indicating a respective cluster centroid, and then
retrieving the cluster centroids corresponding to those identifiers from
the memory. A list of cluster identifiers in the cluster table may be
maintained based on the relative ages of the cluster centroids
corresponding to those identifiers: identifiers whose centroids have a
relative age exceeding a predetermined threshold are periodically removed
from the list.
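The abstract does not say how feature vectors, similarity, or relative age are computed, so the following is only a minimal Python sketch under stated assumptions. It uses hashed bag-of-words feature vectors, cosine similarity to match a document against retrieved centroids, a running-mean centroid update, and a relative age defined as the number of documents processed since a cluster was last updated. The names StreamClusterer, add_vector, add_document, retrieve_centroids, prune, sim_threshold, max_age, and FEATURE_DIM are hypothetical and do not come from the source.

```python
import math
import zlib
from collections import Counter

FEATURE_DIM = 256  # hypothetical dimensionality of the hashed feature space


def feature_vector(text, dim=FEATURE_DIM):
    """Hashed bag-of-words vector, L2-normalized (the hashing trick is an assumption)."""
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        # crc32 is stable across runs, unlike Python's salted built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += count
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class StreamClusterer:
    """Hypothetical streaming clusterer with a cluster table and a centroid memory."""

    def __init__(self, sim_threshold=0.5, max_age=100):
        self.sim_threshold = sim_threshold  # minimum similarity to join a cluster
        self.max_age = max_age              # the "predetermined threshold" on relative age
        self.clock = 0                      # logical time: documents processed so far
        self.memory = {}                    # cluster id -> (centroid, member count)
        self.cluster_table = {}             # cluster id -> clock value of last update
        self._next_id = 0

    def age(self, cid):
        """Relative age: documents processed since the cluster was last updated."""
        return self.clock - self.cluster_table[cid]

    def retrieve_centroids(self, vec, k=5):
        """Return up to k (similarity, cluster id, centroid) triples, best first."""
        # 1. Read cluster ids from the table, skipping ids whose age exceeds max_age.
        ids = [cid for cid in self.cluster_table if self.age(cid) <= self.max_age]
        # 2. Fetch each id's centroid from memory and score it against the document.
        scored = [(cosine(vec, self.memory[cid][0]), cid, self.memory[cid][0]) for cid in ids]
        # Most similar first; ties go to the younger cluster.
        scored.sort(key=lambda s: (-s[0], self.age(s[1])))
        return scored[:k]

    def add_vector(self, vec):
        """Assign a feature vector to a cluster and return the cluster id."""
        self.clock += 1
        candidates = self.retrieve_centroids(vec)
        if candidates and candidates[0][0] >= self.sim_threshold:
            _, cid, centroid = candidates[0]
            n = self.memory[cid][1]
            # Running-mean update of the centroid with the new member.
            centroid = [c + (x - c) / (n + 1) for c, x in zip(centroid, vec)]
            self.memory[cid] = (centroid, n + 1)
        else:
            cid = self._next_id  # no sufficiently similar cluster: start a new one
            self._next_id += 1
            self.memory[cid] = (list(vec), 1)
        self.cluster_table[cid] = self.clock  # touching a cluster resets its age
        return cid

    def add_document(self, text):
        """Generate the document's feature vector, then cluster it."""
        return self.add_vector(feature_vector(text))

    def prune(self):
        """Remove ids whose relative age exceeds max_age; return the removed ids."""
        stale = [cid for cid in self.cluster_table if self.age(cid) > self.max_age]
        for cid in stale:
            del self.cluster_table[cid]
            self.memory.pop(cid, None)  # freeing the centroid too is an assumption
        return stale
```

In a stream, a caller would invoke prune() periodically, for example once every few hundred documents. Because retrieve_centroids already skips clusters older than max_age, pruning in this sketch mainly keeps the cluster table and memory from growing without bound.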