Systems and methods for identifying populations of events in a
multi-dimensional data set, e.g., seven-dimensional flow cytometry data
of a blood sample. The populations may, for example, be sets or clusters
of data representing different white blood cell components in the sample.
The methods use a library consisting of one or more finite mixture
models, each model comprising components whose parameters represent
multi-dimensional Gaussian probability density functions, one density for
each population of events expected in the data set. The methods further
use an expert knowledge set comprising one or more data transformations
for operation on the multi-dimensional data set and one or more logical
statements. The transformations and logical statements encode a priori
expectations as to the relationships between different event populations
in the data set. The methods further use program code comprising
instructions by which a processing unit such as a computer may operate on
the multi-dimensional data, a finite mixture model selected from the
library, and the expert knowledge set to thereby identify populations of
events in the multi-dimensional data set.
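
The interplay of the three elements (model library, expert knowledge set, program code) can be illustrated with a minimal sketch. It is not the claimed implementation. Every name in it (LIBRARY, transform, statement, identify_populations), the two-dimensional toy events, the diagonal covariances, and the use of the logical statement as a post-hoc consistency check are assumptions made only for illustration.

```python
import math

# Hypothetical library holding one finite mixture model. Each component is a
# Gaussian density with a diagonal covariance (an assumption made for brevity).
LIBRARY = {
    "two_population_model": [
        {"name": "A", "weight": 0.5, "mean": [1.0, 1.0], "var": [0.25, 0.25]},
        {"name": "B", "weight": 0.5, "mean": [4.0, 3.0], "var": [0.25, 0.25]},
    ]
}


def log_density(event, component):
    """Log of (mixing weight * diagonal Gaussian density) at one event."""
    total = math.log(component["weight"])
    for x, mu, var in zip(event, component["mean"], component["var"]):
        total += -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)
    return total


# Hypothetical expert knowledge set: one data transformation ...
def transform(event):
    """Project an event onto the difference of its first two dimensions."""
    return event[0] - event[1]


# ... and one logical statement about the transformed populations.
def statement(populations):
    """A priori expectation: B lies above A on the transformed axis."""
    if not all(name in populations for name in ("A", "B")):
        return False
    def mean_of(name):
        values = [transform(e) for e in populations[name]]
        return sum(values) / len(values)
    return mean_of("B") > mean_of("A")


def identify_populations(events, model):
    """Assign each event to its most probable component, then check the
    assignment against the expert knowledge statement."""
    populations = {}
    for event in events:
        best = max(model, key=lambda c: log_density(event, c))
        populations.setdefault(best["name"], []).append(event)
    return populations, statement(populations)


events = [[0.9, 1.1], [1.2, 0.8], [4.1, 2.9], [3.8, 3.2]]
populations, consistent = identify_populations(
    events, LIBRARY["two_population_model"]
)
```

In this sketch the expert knowledge only validates the result after classification. A fuller method could instead let the transformations and logical statements constrain how the mixture model parameters are adapted to the data.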