An improved system and method for clustering text or content described by
text is provided. Each text in a set of texts may be represented as a
multi-dimensional vector of words. Singleton texts, which are not similar
to any other text in the set, may be excluded before clustering. Texts
identified as good nearest neighbors may then be grouped in the same
cluster. In addition, metadata describing content may be used for
clustering items of aggregated content from content feeds. Metadata
describing items of content from content feeds may be converted into a
set of texts, and texts identified as good nearest neighbors may then be
clustered together. Items of content described by the clustered texts may
then be grouped into the corresponding clusters. Any type of content item
that may be described by text may be clustered, including audio, images, video,
multimedia content, and so forth.
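
The flow summarized above can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes a bag-of-words count vector for each text, cosine similarity as the similarity measure, and a hypothetical `threshold` parameter that decides whether a nearest neighbor counts as "good"; the abstract itself does not specify any of these. The function names and the `id`, `title`, and `description` metadata fields are likewise illustrative assumptions.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent a text as a word-count vector (assumed bag-of-words model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_texts(texts, threshold=0.5):
    """Group texts with their good nearest neighbors; return lists of indices.

    `threshold` is a hypothetical cutoff: a nearest neighbor is treated as
    "good" only if its similarity is at least this value. Texts without a
    good nearest neighbor are treated as singletons and left out.
    """
    vectors = [to_vector(t) for t in texts]
    nearest = {}
    for i in range(len(texts)):
        best_j, best_sim = None, 0.0
        for j in range(len(texts)):
            if i != j:
                sim = cosine(vectors[i], vectors[j])
                if sim > best_sim:
                    best_j, best_sim = j, sim
        if best_j is not None and best_sim >= threshold:
            nearest[i] = best_j

    # Union-find: merge each text with its good nearest neighbor.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in nearest.items():
        parent[find(i)] = find(j)

    clusters = {}
    for i in sorted(parent):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def cluster_items(items, threshold=0.5):
    """Cluster feed items by the text built from their metadata.

    Each item is assumed to be a dict with hypothetical keys "id",
    "title", and "description".
    """
    texts = [
        " ".join([item.get("title", ""), item.get("description", "")])
        for item in items
    ]
    groups = cluster_texts(texts, threshold)
    return [[items[i]["id"] for i in group] for group in groups]
```

Under these assumptions, `cluster_items` would apply equally to audio, image, or video items, since only the descriptive metadata text is compared. A single nearest neighbor per text is used here for brevity; a k-nearest-neighbor variant would be an equally plausible reading of the abstract.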