Apparatus for identifying topics of document data has: a word ranker (171) for ranking words that are present in or representative of the content of the document data; a co-occurrence ranker (172) for ranking co-occurrences of words that are present in or representative of the content of the document data; a phrase ranker (170) for ranking phrases in the document data; a word selector (174) for selecting the highest ranking words; a co-occurrence identifier (176) for identifying which of the highest ranking co-occurrences contain at least one of the highest ranking words; a phrase identifier (177) for identifying the phrases containing at least one word from the identified co-occurrences; a phrase selector (178) for selecting the highest ranking one or ones of the identified phrases as the topic or topics of the document data; and an output device (40) for outputting data relating to the selected topics.
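
The chain of rankers, selectors and identifiers described above can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not state how words, co-occurrences or phrases are ranked, so the sketch assumes simple frequency counts, treats a co-occurrence as two distinct words appearing in the same sentence, and takes the candidate phrases as an input. The function name `identify_topics`, the helper `count_phrase` and the parameters `top_words`, `top_pairs` and `top_topics` are hypothetical names chosen for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def count_phrase(tokens, phrase_tokens):
    """Count contiguous occurrences of phrase_tokens inside tokens."""
    n = len(phrase_tokens)
    return sum(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))


def top_keys(counts, k):
    """Return the k highest-count keys, breaking ties alphabetically."""
    return [key for key, _ in
            sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def identify_topics(sentences, phrases, top_words=3, top_pairs=3, top_topics=1):
    # Assumption: lower-cased alphabetic tokens, one token list per sentence.
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]

    # Word ranker (171): assumed here to rank by raw frequency.
    word_counts = Counter(w for toks in tokenized for w in toks)

    # Co-occurrence ranker (172): assumed to count unordered pairs of
    # distinct words that share a sentence.
    pair_counts = Counter()
    for toks in tokenized:
        for pair in combinations(sorted(set(toks)), 2):
            pair_counts[pair] += 1

    # Phrase ranker (170): assumed to count occurrences of each candidate phrase.
    phrase_tokens = {p: re.findall(r"[a-z]+", p.lower()) for p in phrases}
    phrase_counts = Counter({
        p: sum(count_phrase(toks, pt) for toks in tokenized)
        for p, pt in phrase_tokens.items()
    })

    # Word selector (174): the highest ranking words.
    best_words = set(top_keys(word_counts, top_words))

    # Co-occurrence identifier (176): highest ranking pairs that contain
    # at least one of the highest ranking words.
    kept_pairs = [pair for pair in top_keys(pair_counts, top_pairs)
                  if best_words & set(pair)]
    pair_words = {w for pair in kept_pairs for w in pair}

    # Phrase identifier (177): phrases containing a word from those pairs.
    candidates = Counter({p: phrase_counts[p] for p in phrases
                          if pair_words & set(phrase_tokens[p])})

    # Phrase selector (178): the highest ranking identified phrases.
    return top_keys(candidates, top_topics)


sentences = [
    "The neural network learns features",
    "A neural network needs training data",
    "Training a neural network takes time",
]
phrases = ["neural network", "training data", "takes time"]
print(identify_topics(sentences, phrases))  # ['neural network']
```

In this toy input the pair (network, neural) is the top co-occurrence and both of its words are among the top-ranked words, so "neural network" is the only candidate phrase to pass the phrase identifier. A practical system would likely filter stop words and use a more discriminating ranking than raw counts, but the abstract leaves those choices open.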

 