Data clustering using error-tolerant frequent item sets

A generalization of frequent item sets to error-tolerant frequent item sets (ETFs) is disclosed, together with its application to data clustering, where ETFs are used either to build clusters directly or to initialize standard clustering algorithms. Efficient, feasible computational algorithms for computing ETFs from very large databases are presented. In one embodiment, a method determines a plurality of weak ETFs, which are strongly tolerant of errors, and determines from them a plurality of strong ETFs, which are less tolerant of errors. The resulting clusters can be used as an initial model for a standard clustering approach, or may themselves be used as the end clusters. In one embodiment, the data covered by the strong clusters is removed from the data set, and the process is repeated until no more weak clusters can be found. The invention includes methods for constructing ETFs from more general data types: data sets that include categorical discrete, continuous, and binary attributes.
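The weak-then-strong procedure with removal of covered data can be illustrated with a minimal Python sketch. The abstract does not define weak and strong ETFs precisely, so the definitions below are assumptions made for illustration: a row set weakly supports an item set when the overall fraction of missing items across the rows is at most `eps`, and strongly supports it when every row individually misses at most an `eps` fraction. The function names, the `eps` and `min_rows` parameters, and the caller-supplied list of candidate item sets are all hypothetical; the sketch does not attempt the candidate search that the disclosed algorithms perform on large databases.

```python
def strong_support(rows, items, eps):
    """Indices of rows that each miss at most an eps fraction of `items`
    (assumed 'strong' criterion: the error bound holds per row)."""
    items = set(items)
    max_missing = eps * len(items)
    return [i for i, row in enumerate(rows) if len(items - row) <= max_missing]


def weak_support(rows, items, eps):
    """Largest row set whose overall fraction of missing items is <= eps
    (assumed 'weak' criterion: the error bound holds only on average)."""
    items = set(items)
    # Rows with the fewest missing items are the cheapest to include,
    # so taking them in ascending order yields the largest feasible set.
    order = sorted(range(len(rows)), key=lambda i: len(items - rows[i]))
    chosen, missing = [], 0
    for i in order:
        m = len(items - rows[i])
        if missing + m > eps * len(items) * (len(chosen) + 1):
            break
        chosen.append(i)
        missing += m
    return chosen


def peel_clusters(rows, candidates, eps, min_rows):
    """Repeat: find a candidate with enough weak support, keep its strongly
    supporting rows as a cluster, remove them, and try again until no
    candidate has enough weak support on the remaining rows."""
    remaining = dict(enumerate(rows))  # row id -> set of items
    clusters = []
    while True:
        found = False
        for items in candidates:
            ids = list(remaining)
            data = [remaining[i] for i in ids]
            if len(weak_support(data, items, eps)) < min_rows:
                continue
            strong = [ids[j] for j in strong_support(data, items, eps)]
            if not strong:
                continue
            clusters.append((frozenset(items), strong))
            for i in strong:
                del remaining[i]
            found = True
            break
        if not found:
            return clusters, sorted(remaining)


# Toy binary data: each row is the set of items it contains.
rows = [{"a", "b", "c"}, {"a", "b", "c"}, {"a"}, {"d"}, {"d", "e"}]
print(weak_support(rows, {"a", "b", "c"}, 0.4))    # [0, 1, 2]
print(strong_support(rows, {"a", "b", "c"}, 0.4))  # [0, 1]
clusters, leftover = peel_clusters(rows, [{"a", "b", "c"}, {"d", "e"}], 0.4, 2)
print(leftover)                                    # [2, 3]
```

In this toy data, row 2 counts toward the weak support of {a, b, c} because the two complete rows offset its missing items, but it fails the per-row strong test, so it is left uncovered after the clusters are removed. The resulting clusters could equally be passed on as starting points for a standard clustering algorithm, as the abstract describes.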
