Method and apparatus for scalable probabilistic clustering using decision trees

Some embodiments of the invention include methods for identifying clusters in a database, data warehouse or data mart. Each identified cluster can be meaningfully understood from a list of attributes and the corresponding values for that cluster. Some embodiments include a method for scalable probabilistic clustering using a decision tree. Some embodiments run in time linear in the size of the data set and require only a single access to the data set. Some embodiments produce interpretable clusters that can be described in terms of a set of attributes and the values of those attributes. In some embodiments, a cluster can be interpreted by reading the attributes and attribute values on the path from the root node of the decision tree to the node corresponding to the cluster. In some embodiments, no domain-specific distance function for the attributes is required. In some embodiments, a cluster is determined by identifying the attribute with the highest influence on the distribution of the other attributes. Each value assumed by the identified attribute corresponds to a cluster and to a node in the decision tree. In some embodiments, the CUBE operation is used to access the data set a single time, and its result is used to compute the influence and to perform the other calculations.
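The splitting step described above can be illustrated with a minimal sketch. The abstract does not say how influence is measured, so this example assumes it is the sum of pairwise mutual information between one attribute and every other attribute. All function names and the sample data are hypothetical. The value-pair counts play the role of the CUBE result at the root, but the sketch recounts from in-memory subsets at each level rather than reusing a single pass over the data.

```python
from collections import Counter
from itertools import combinations
from math import log2


def pairwise_counts(rows, attrs):
    """Count each attribute value and each value pair in one pass over rows."""
    single = {a: Counter() for a in attrs}
    pair = {p: Counter() for p in combinations(attrs, 2)}
    for row in rows:
        for a in attrs:
            single[a][row[a]] += 1
        for a, b in pair:
            pair[(a, b)][(row[a], row[b])] += 1
    return single, pair


def mutual_information(n, count_a, count_b, count_ab):
    """Mutual information in bits between two attributes, from their counts."""
    return sum(
        (c / n) * log2(c * n / (count_a[va] * count_b[vb]))
        for (va, vb), c in count_ab.items()
    )


def influence(attr, attrs, n, single, pair):
    """Assumed influence measure: summed mutual information with the others."""
    total = 0.0
    for other in attrs:
        if other == attr:
            continue
        a, b = (attr, other) if (attr, other) in pair else (other, attr)
        total += mutual_information(n, single[a], single[b], pair[(a, b)])
    return total


def build_tree(rows, attrs, min_rows=2, path=()):
    """Split on the most influential attribute; each of its values is a child."""
    leaf = {"cluster": path, "size": len(rows)}
    if len(rows) < min_rows or len(attrs) < 2:
        return leaf
    single, pair = pairwise_counts(rows, attrs)
    scores = {a: influence(a, attrs, len(rows), single, pair) for a in attrs}
    best = max(attrs, key=scores.get)
    if scores[best] <= 1e-12:  # no attribute tells us anything about the rest
        return leaf
    remaining = [a for a in attrs if a != best]
    children = {}
    for value in single[best]:
        subset = [r for r in rows if r[best] == value]
        children[value] = build_tree(
            subset, remaining, min_rows, path + ((best, value),)
        )
    return {"split": best, "children": children}


# Hypothetical data: "model" determines both "family" and "version",
# while "family" and "version" are independent of each other.
rows = [
    {"model": m, "family": m[0], "version": m[1]}
    for m in ["A1", "A2", "B1", "B2"]
    for _ in range(2)
]
tree = build_tree(rows, ["model", "family", "version"])
```

Here the tree splits on `model`, and each leaf's `cluster` field holds the attribute and value pairs on the path from the root, which is how a cluster would be read off the tree.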
