The data mining platform comprises a plurality of system modules, each
formed from a plurality of components. Each module has an input data
component, a data analysis engine for processing the input data, an
output data component for outputting the results of the data analysis,
and a web server to access and monitor the other modules within the unit
and to provide communication to other units. Each module processes a
different type of data; for example, a first module processes microarray
(gene expression) data while a second module processes biomedical
literature on the Internet for information on gene functionality and on
relationships between genes and diseases. In the preferred
embodiment, the data analysis engine is a kernel-based learning machine,
and in particular, one or more support vector machines (SVMs). The data
analysis engine includes a pre-processing function for feature selection,
which reduces the amount of data to be processed by selecting the optimum
number of attributes, or "features", relevant to the information to be
discovered.
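
The pre-processing and analysis steps described above can be illustrated with a minimal sketch. It is not the platform's implementation: the ranking criterion (a between-class signal-to-noise score), the linear kernel, the batch subgradient training loop, the toy data, and every name used (rank_features, select, train_linear_svm, predict, n_keep) are assumptions made only for illustration.

```python
from statistics import mean, pstdev


def rank_features(X, y):
    """Rank feature indices by a signal-to-noise score (assumed criterion)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == -1]
    scores = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        spread = pstdev(p) + pstdev(n)
        if spread == 0:
            spread = 1e-12  # avoid division by zero for constant features
        scores.append(abs(mean(p) - mean(n)) / spread)
    # Most discriminative feature first
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select(X, keep):
    """Keep only the chosen feature columns."""
    return [[row[j] for j in keep] for row in X]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via batch subgradient descent on the regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam/2) * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return the predicted class label, +1 or -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy data (hypothetical): feature 0 separates the classes, 1 and 2 do not.
X = [
    [2.0, 0.1, 5.0], [1.8, -0.2, 5.1], [2.2, 0.3, 4.9],
    [-2.0, 0.2, 5.0], [-1.8, -0.1, 5.1], [-2.2, 0.0, 4.9],
]
y = [1, 1, 1, -1, -1, -1]

n_keep = 1  # illustrative choice; in practice chosen by validation
ranking = rank_features(X, y)
X_sel = select(X, ranking[:n_keep])
w, b = train_linear_svm(X_sel, y)
```

In this sketch the selection step discards two of the three attributes before training, so the SVM sees only the column that actually distinguishes the classes. A filter-style score was chosen here only because it is short to write; other selection methods would slot into the same place.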