A computer-implemented method is provided for ranking features within a
large, high-dimensional dataset according to each feature's ability to
separate the data into classes. For each feature, a
support vector machine separates the dataset into two classes and
determines the margin between the extremal points of the two classes. The
margins for all of the features are compared and the features are ranked
based upon the size of the margin, with the highest ranked features
corresponding to the largest margins. A subset of features for
classifying the dataset is selected from a group of the highest ranked
features. In one embodiment, the method is used to identify the best
genes for disease prediction and diagnosis using gene expression data
from microarrays.
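
The per-feature ranking described above can be illustrated with a minimal sketch. It is not the claimed implementation. It rests on several assumptions: labels are +1 and -1; each feature is treated on its own as a one-dimensional hard-margin problem, so that the margin reduces to the gap between the extremal points of the two classes and no SVM solver is called; a feature on which the classes overlap is assigned a margin of zero; and the features are on comparable scales, since a raw gap depends on a feature's units. The names `feature_margin` and `rank_features` and the toy data are hypothetical.

```python
import numpy as np


def feature_margin(x, y):
    """Gap between the extremal points of the two classes on one feature.

    Assumption: labels are +1 / -1. Returns 0.0 when the classes overlap
    on this feature (no separating threshold exists).
    """
    pos, neg = x[y == 1], x[y == -1]
    # Try both orientations: positives above negatives, or the reverse.
    gap = max(pos.min() - neg.max(), neg.min() - pos.max())
    return max(float(gap), 0.0)


def rank_features(X, y, k):
    """Rank the columns of X by margin, largest first, and keep the top k."""
    margins = np.array(
        [feature_margin(X[:, j], y) for j in range(X.shape[1])]
    )
    # A stable sort keeps the original column order among tied margins.
    order = np.argsort(-margins, kind="stable")
    return order[:k], margins


# Toy data: 4 samples, 3 features (hypothetical values).
X = np.array([
    [1.0, 0.2, 5.0],
    [1.2, 0.9, 5.5],
    [3.0, 0.4, 5.9],
    [3.5, 0.8, 9.0],
])
y = np.array([-1, -1, 1, 1])

top, margins = rank_features(X, y, k=2)
print("selected feature indices:", top)
print("margins per feature:", margins)
```

Here the margin is taken as the full width of the gap; under the convention that measures the distance from the separating threshold to the nearest point, each value would be halved, which leaves the ranking unchanged.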