A system and method for dynamically evaluating latent concepts in
unstructured documents is disclosed. A multiplicity of concepts is
extracted from a set of unstructured documents into a lexicon. The
lexicon uniquely identifies each concept and its frequency of occurrence. A
frequency of occurrence representation is created for the document set.
The frequency representation provides an ordered corpus of the
frequencies of occurrence of each concept. A subset of concepts is
selected from the frequency of occurrence representation filtered against
a pre-defined threshold. A group of weighted clusters of concepts
selected from the concept subset is generated. A matrix of best-fit
approximations is determined for each document weighted against each
group of weighted clusters of concepts.
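
The steps summarized above can be illustrated with a minimal Python sketch. It is not the disclosed implementation; every specific in it is an assumption made for illustration only: a "concept" is taken to be a lowercase word token, the pre-defined threshold is modeled as a hypothetical frequency band (min_count, max_count), clusters are formed from consecutive runs of the ordered subset with weights equal to each concept's share of the cluster's total frequency, and the best-fit approximation is modeled as cosine similarity. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def extract_concepts(text):
    # Assumption: a "concept" is simply a lowercase alphabetic token.
    return re.findall(r"[a-z]+", text.lower())


def build_lexicon(documents):
    # Map each unique concept to its total frequency of occurrence.
    lexicon = Counter()
    for doc in documents:
        lexicon.update(extract_concepts(doc))
    return lexicon


def select_concepts(lexicon, min_count, max_count):
    # Order concepts by descending frequency (ties broken alphabetically),
    # then keep those inside the hypothetical band [min_count, max_count].
    ordered = sorted(lexicon.items(), key=lambda kv: (-kv[1], kv[0]))
    return [c for c, n in ordered if min_count <= n <= max_count]


def build_clusters(selected, lexicon, size):
    # Assumption: clusters are consecutive runs of the ordered subset, and
    # each weight is the concept's share of the cluster's total frequency.
    clusters = []
    for i in range(0, len(selected), size):
        group = selected[i:i + size]
        total = sum(lexicon[c] for c in group)
        clusters.append({c: lexicon[c] / total for c in group})
    return clusters


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_fit_matrix(documents, clusters):
    # One row per document, one column per weighted cluster.
    rows = []
    for doc in documents:
        counts = Counter(extract_concepts(doc))
        rows.append([cosine(counts, cluster) for cluster in clusters])
    return rows


docs = ["cat dog cat", "dog bird", "fish fish fish fish"]
lexicon = build_lexicon(docs)
selected = select_concepts(lexicon, min_count=2, max_count=3)
clusters = build_clusters(selected, lexicon, size=2)
matrix = best_fit_matrix(docs, clusters)
```

In this toy run the band keeps only "cat" and "dog": "bird" is too rare and "fish" too frequent. The third document shares no concept with the single resulting cluster, so its matrix entry is zero. A real system would likely use a richer notion of concept (for example noun phrases) and a proper clustering method.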