Filtering training data for machine learning

Data center activity traces form a corpus used for machine learning. The data in the corpus are putatively normal but may be tainted with latent anomalies. There is a statistical likelihood that the corpus represents predominately legitimate activity, and this likelihood is exploited to allow for a targeted examination of only the data representing possible anomalous activity. The corpus is separated into clusters having members with like features. The clusters having the fewest members are identified, as these clusters represent potential anomalous activities. These clusters are evaluated to determine whether they represent actual anomalous activities. The data from the clusters representing actual anomalous activities are excluded from the corpus. As a result, the machine learning is more effective and the trained system provides better performance, since latent anomalies are not mistaken for normal activity.

Web www.patentalert.com

< Transparent checkpointing and process migration in a distributed system

> System and method for predictive process management

> System and method for creating and implementing community defined presentation structures

~ 00593