A system and method for learning models from scarce and/or skewed training
data includes partitioning a data stream into a sequence of time windows.
A most likely current class distribution to classify portions of the data
stream is determined based on observing training data in a current time
window and based on concept drift probability patterns using historical
information.