One embodiment of the present invention provides a system that optimizes
subset selection to facilitate parallel training of a support vector
machine (SVM). During operation, the system receives a dataset comprising
data points. Next, the system evaluates the data points to produce a
class separability measure, and uses the class separability measure to
partition the data points in the dataset into N batches. The system then
performs SVM training computations on the N batches in parallel to
produce support vectors for each of the N batches. Finally, the system
performs a final SVM training computation using an agglomeration of the
support vectors computed for each of the N batches to obtain a
substantially optimal solution to the SVM training problem for the entire
dataset.
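
The flow described above (measure, partition, per-batch training, agglomeration, final training) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the embodiment: the separability score (distance to a point's own class centroid minus distance to the other centroid), the round-robin partitioning rule, the linear dual-coordinate-descent trainer, and all function names are hypothetical stand-ins. Threads are used only to keep the example short; a process pool or separate machines would be needed for real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(points, labels, C=1.0, epochs=100):
    """Dual coordinate descent for a linear soft-margin SVM (labels are +1/-1).

    Returns (w, sv_idx): the weight vector, whose last entry is the bias, and
    the indices of points with a nonzero dual coefficient (support vectors).
    """
    X = [list(p) + [1.0] for p in points]  # constant feature acts as the bias
    alpha = [0.0] * len(X)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(X, labels)):
            grad = y * dot(w, x) - 1.0
            new = min(max(alpha[i] - grad / dot(x, x), 0.0), C)
            step = (new - alpha[i]) * y
            if step:
                w = [wj + step * xj for wj, xj in zip(w, x)]
                alpha[i] = new
    return w, [i for i, a in enumerate(alpha) if a > 1e-9]


def separability_scores(points, labels):
    """Hypothetical measure: distance to own centroid minus distance to the other."""
    cent = {}
    for c in (-1, 1):
        members = [p for p, y in zip(points, labels) if y == c]
        cent[c] = [sum(col) / len(members) for col in zip(*members)]
    return [math.dist(p, cent[y]) - math.dist(p, cent[-y])
            for p, y in zip(points, labels)]


def partition(points, labels, n_batches):
    """Assumed rule: rank each class by score, then deal indices round-robin
    so every batch receives both classes and a mix of easy and hard points."""
    scores = separability_scores(points, labels)
    batches = [[] for _ in range(n_batches)]
    for c in (-1, 1):
        idx = sorted((i for i, y in enumerate(labels) if y == c),
                     key=scores.__getitem__)
        for rank, i in enumerate(idx):
            batches[rank % n_batches].append(i)
    return batches


def parallel_svm(points, labels, n_batches=4, C=1.0):
    batches = partition(points, labels, n_batches)

    def fit_batch(idx):
        # Train on one batch and map its support vectors back to dataset indices.
        _, sv = train_linear_svm([points[i] for i in idx],
                                 [labels[i] for i in idx], C)
        return [idx[j] for j in sv]

    with ThreadPoolExecutor(max_workers=n_batches) as pool:
        sv_lists = list(pool.map(fit_batch, batches))

    # Agglomerate the per-batch support vectors and run the final training pass.
    pooled = sorted({i for svs in sv_lists for i in svs})
    w, _ = train_linear_svm([points[i] for i in pooled],
                            [labels[i] for i in pooled], C)
    return w, pooled


# Synthetic two-class data, used only to exercise the sketch.
random.seed(0)
pts = ([(random.gauss(2, 1), random.gauss(2, 1)) for _ in range(100)]
       + [(random.gauss(-2, 1), random.gauss(-2, 1)) for _ in range(100)])
labels = [1] * 100 + [-1] * 100

w, sv = parallel_svm(pts, labels, n_batches=4)
acc = sum(1 for p, y in zip(pts, labels)
          if y * dot(w, list(p) + [1.0]) > 0) / len(pts)
print(f"pooled support vectors: {len(sv)} of {len(pts)}, accuracy: {acc:.3f}")
```

In this sketch the final training pass sees only the pooled support vectors rather than the entire dataset, which is where the reduction in work would come from. Whether the result is close to the solution for the full dataset depends on the partitioning rule, which here is only an assumed placeholder.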