A hybrid method of predicting the occurrence of future critical events in
a computer cluster having a series of nodes records system performance
parameters and the occurrence of past critical events. A data filter
filters the logged data to eliminate redundancies and decrease the
data storage requirements of the system. Time-series models and
rule-based classification schemes are used to associate various system
parameters with the past occurrence of critical events and predict the
occurrence of future critical events. Ongoing processing jobs are
migrated to nodes for which no critical events are predicted, and future
jobs are routed to more robust nodes.
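
The flow described above (filter the log, predict with a time-series
model plus a rule-based check, then migrate jobs) can be illustrated with
a minimal sketch. It is not the method as claimed: the function names,
the 60-second deduplication window, the linear-extrapolation forecast,
the event kinds "ecc_warning" and "fan_warning", and the 85.0 limit are
all hypothetical choices made only for illustration.

```python
def filter_redundant(events, window=60):
    """Drop repeats of the same (node, kind) event within `window` seconds.

    `events` is a list of (timestamp, node, kind) tuples (assumed layout).
    """
    last_seen = {}
    kept = []
    for ts, node, kind in sorted(events):
        key = (node, kind)
        if key not in last_seen or ts - last_seen[key] > window:
            kept.append((ts, node, kind))
        last_seen[key] = ts  # a burst of repeats keeps extending the window
    return kept


def forecast_next(series, k=3):
    """Naive time-series model: extrapolate the mean slope of the last k steps."""
    recent = series[-(k + 1):]
    if len(recent) < 2:
        return recent[-1]
    slope = (recent[-1] - recent[0]) / (len(recent) - 1)
    return recent[-1] + slope


def rule_fires(kept_events, node,
               precursors=frozenset({"ecc_warning", "fan_warning"}),
               min_count=2):
    """Rule-based check: enough precursor events were logged for this node."""
    count = sum(1 for _, n, kind in kept_events
                if n == node and kind in precursors)
    return count >= min_count


def at_risk_nodes(metrics, events, limit=85.0):
    """Hybrid prediction: flag a node if either predictor raises a concern."""
    kept = filter_redundant(events)
    return {node for node, series in metrics.items()
            if forecast_next(series) > limit or rule_fires(kept, node)}


def migrate(jobs, risky, all_nodes):
    """Move jobs off at-risk nodes, spreading them over the remaining nodes."""
    safe = sorted(n for n in all_nodes if n not in risky)
    if not safe:
        return dict(jobs)  # nowhere to migrate to; leave placement unchanged
    placement = {}
    for i, (job, node) in enumerate(sorted(jobs.items())):
        placement[job] = safe[i % len(safe)] if node in risky else node
    return placement


# Hypothetical sample data: per-node temperature readings and logged warnings.
events = [(0, "n1", "ecc_warning"), (10, "n1", "ecc_warning"),
          (200, "n1", "ecc_warning"), (5, "n2", "fan_warning")]
metrics = {"n1": [60, 61, 62, 63], "n2": [70, 75, 80, 85], "n3": [50, 50, 50, 50]}
jobs = {"a": "n1", "b": "n3", "c": "n2"}

risky = at_risk_nodes(metrics, events)
print(sorted(risky))
print(migrate(jobs, risky, metrics.keys()))
```

In this sample, one node is flagged by the rule and another by the
forecast, which is the point of combining the two predictors: each can
catch precursors the other misses.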