Provided are methods of classifying biological samples based on
high-dimensional data obtained from the samples. The methods are especially
useful for prediction of a class to which the sample belongs under
circumstances in which the data are statistically under-determined.
Microarray technologies, which can measure many different variables (such
as gene expression levels) at once, have led to the generation of
high-dimensional data sets, the analysis of which benefits from the methods
of the present invention. High-dimensional data are data in which the
number of variables, p, exceeds the number of independent observations
(e.g., samples), N. The invention
relies on a dimension reduction step followed by a logistic determination
step. The methods of the invention are applicable to both binary (i.e.,
univariate) and multi-class (i.e., multivariate) classification. Also
provided are data selection techniques that can be
used in accordance with the methods of the invention.
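
The two-step structure described above (dimension reduction, then a logistic determination) can be illustrated with a minimal sketch for the binary case. The text above does not name a particular reduction technique, so the sketch assumes a single supervised component in which each variable is weighted by its covariance with the class label (the first partial least squares direction). The function names, the synthetic data, the learning rate, and the iteration count are all hypothetical choices made for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_reduce_then_logistic(X, y, steps=500, lr=0.1):
    """Fit a one-component reduction followed by logistic regression.

    X is a list of N samples, each a list of p variables (p may exceed N).
    y is a list of N class labels coded 0 or 1.
    """
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n

    # Step 1 (assumed technique): weight each variable by its covariance
    # with the label, then project every sample onto that one direction.
    w = [sum((X[i][j] - col_mean[j]) * (y[i] - y_mean) for i in range(n))
         for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    t = [sum((row[j] - col_mean[j]) * w[j] for j in range(p)) for row in X]
    scale = math.sqrt(sum(v * v for v in t) / n) or 1.0
    t = [v / scale for v in t]

    # Step 2: logistic regression on the single component, fitted by
    # plain gradient ascent on the log-likelihood.
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for ti, yi in zip(t, y):
            resid = yi - sigmoid(b0 + b1 * ti)
            g0 += resid
            g1 += resid * ti
        b0 += lr * g0 / n
        b1 += lr * g1 / n

    return {"mean": col_mean, "w": w, "scale": scale, "b0": b0, "b1": b1}


def predict_proba(model, x):
    """Estimated probability that sample x belongs to class 1."""
    t = sum((xj - mj) * wj
            for xj, mj, wj in zip(x, model["mean"], model["w"]))
    return sigmoid(model["b0"] + model["b1"] * t / model["scale"])


# Synthetic under-determined data: N = 10 samples, p = 50 variables.
# Class 1 is shifted in the first 10 variables (illustrative values only).
random.seed(0)
N, P, SHIFT = 10, 50, 3.0
y_demo = [i % 2 for i in range(N)]
X_demo = [[random.gauss(0.0, 1.0) + (SHIFT if (label == 1 and j < 10) else 0.0)
           for j in range(P)]
          for label in y_demo]

model = fit_reduce_then_logistic(X_demo, y_demo)
train_probs = [predict_proba(model, x) for x in X_demo]
train_pred = [1 if q > 0.5 else 0 for q in train_probs]
```

Because the logistic step operates on one derived component rather than on all p variables, it remains estimable even though p exceeds N. Agreement between train_pred and y_demo reflects fit to the training samples only and says nothing about performance on new samples.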