Data mining techniques are provided which are effective and efficient for discovering
useful information from an amorphous collection or data set of records. For example,
the present invention provides for mining a data set of several or many
records to discover interesting associations between entries of qualitative text,
and covariances between entries of quantitative numerical types, in those records. Although
not limited thereto, the invention has particular application and advantage when
the data is of a type such as clinical, pharmacogenomic, forensic, police, and financial
records, which are characterized by many varied entries. The problem is then
said to be one of "high dimensionality," which has posed mathematical and technical
difficulties for researchers. This is especially true when considering strong negative
associations and negative covariances, i.e., those between items of data which so
rarely occur together that their concurrence is never seen in any record, yet the
fact that this absence is unexpected is of potentially great interest.
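
By way of illustration only, the notion of a negative association whose concurrence is never observed can be sketched as a comparison between the observed co-occurrence count of two items and the count expected if the items were independent. The sketch below is not the method of the invention; the function name `cooccurrence`, the sample records, and the use of a simple independence expectation are assumptions made for the example.

```python
def cooccurrence(records, a, b):
    """Return (observed, expected) co-occurrence counts of items a and b.

    `records` is a list of sets of qualitative entries (hypothetical data).
    The expected count assumes a and b occur independently of each other;
    this is an illustrative assumption, not the invention's own measure.
    """
    n = len(records)
    n_a = sum(1 for r in records if a in r)
    n_b = sum(1 for r in records if b in r)
    observed = sum(1 for r in records if a in r and b in r)
    expected = n_a * n_b / n
    return observed, expected


# Hypothetical clinical-style records, each a set of qualitative entries.
records = [
    {"male", "smoker"},
    {"male"},
    {"male", "diabetic"},
    {"male"},
    {"female", "pregnant"},
    {"female", "pregnant"},
    {"female", "pregnant", "smoker"},
    {"female", "pregnant"},
]

observed, expected = cooccurrence(records, "male", "pregnant")
# An observed count of zero alongside a clearly non-zero expected count
# flags a candidate strong negative association.
is_negative_candidate = observed == 0 and expected > 0
```

In this sketch the pair is flagged even though it appears in no record, because each item is individually common enough that some concurrence would be expected by chance.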