A method organizes semi-structured data into a taxonomy, based on Tag-Separated (TS) clustering. The method comprises retrieving documents that include the semi-structured data. The semi-structured data comprises unstructured data and structured data, the structured data including structured data fields and tags. The method selects a structured attribute type, which may be any of a categorical attribute, a numerical attribute, and a tag associated with annotated text, and an unstructured attribute type, which includes a text attribute. The method clusters the semi-structured data from the retrieved documents into a plurality of clusters based on the selected structured attribute type and the selected unstructured attribute type. For a categorical attribute, each category corresponds to a single cluster. For a numerical attribute, a clustering algorithm clusters the numerical data projected onto the range of the numerical attribute. For an annotated text attribute, a monothetic clustering algorithm clusters the annotated text data according to tags associated with a vocabulary for the annotated text data.
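The three structured-attribute cases described above can be sketched in Python as follows. This is only an illustrative sketch, not the claimed implementation: the record layout, the field names (`product`, `price`, `tags`), the function names, and the `max_gap` parameter are all hypothetical. The abstract does not name a specific numerical clustering algorithm, so a simple gap-based split of sorted values stands in for it here. The tag case assumes that each tag defines one cluster and that a document carrying several tags may belong to several clusters.

```python
from collections import defaultdict


def cluster_categorical(records, field):
    """Categorical attribute: each distinct category forms a single cluster."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec[field]].append(rec["id"])
    return dict(clusters)


def cluster_numerical(records, field, max_gap):
    """Numerical attribute: cluster values along the attribute's range.

    Assumption: a gap-based 1-D split. Records are sorted by value and a
    new cluster starts whenever two neighbours differ by more than max_gap.
    """
    ordered = sorted(records, key=lambda r: r[field])
    clusters = []
    for rec in ordered:
        if clusters and rec[field] - clusters[-1][-1][field] <= max_gap:
            clusters[-1].append(rec)
        else:
            clusters.append([rec])
    return [[r["id"] for r in group] for group in clusters]


def cluster_by_tag(records, field):
    """Annotated text: monothetic clustering, one tag defines one cluster.

    Assumption: a record with several tags joins every matching cluster.
    """
    clusters = defaultdict(list)
    for rec in records:
        for tag in sorted(set(rec[field])):
            clusters[tag].append(rec["id"])
    return dict(clusters)


# Hypothetical sample documents, invented for illustration only.
docs = [
    {"id": 1, "product": "laptop", "price": 900.0, "tags": ["battery", "screen"]},
    {"id": 2, "product": "phone", "price": 950.0, "tags": ["battery"]},
    {"id": 3, "product": "laptop", "price": 2400.0, "tags": ["screen"]},
]

print(cluster_categorical(docs, "product"))
print(cluster_numerical(docs, "price", max_gap=100.0))
print(cluster_by_tag(docs, "tags"))
```

Each function returns document identifiers grouped by cluster, so the three results can be combined or nested to form levels of a taxonomy. How the structured clusters are combined with clustering on the unstructured text attribute is not specified in the abstract and is left out of this sketch.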

 