A method and system for deriving an outcome predictor for a data set in
which a number of complex variables affect outcome. A two step model is
applied that includes application of 1) a flexible nonparametric tool for
modeling complex data, and 2) a recursive partitioning (e.g.,
classification and regression trees) methodology. In one variation, a
determination is made as to whether the data set used is representative
of a population of interest; if not, underrepresented data is replicated
so as to produce a representative data set. In one variation, a holdout
sample of the data is also used with the two step model and the
determined outcome predictor to verify the predictor produced.