A computer-based process retrieves information organized in documents
containing text and/or coded representations of text. The process
involves obtaining and labeling a selected set of documents, and
extracting and selecting features from each document in the selected set.
The extracted and selected features are represented, and models are
constructed using parametric learning algorithms. The constructed models
are capable of assigning a label to each document. The model parameters
are instantiated using a first subset of the selected set of documents.
Parameters are chosen by validating the corresponding model against at
least a second subset of the full document set. The constructed models
can also assign labels and ranks to similar documents outside the
selected set, that is, documents not previously given to the
model-construction process.
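
The process described above can be sketched in a few lines of Python. This is
only an illustrative sketch under stated assumptions, not the method as
claimed: the bag-of-words features, the document-frequency selection rule, the
logistic-regression model, the L2 penalty grid, and every function name below
are hypothetical choices made for the example. It fits parameters on a first
subset, picks among candidate models by validating on a second subset, and
then labels and ranks documents it has not seen.

```python
import math
from collections import Counter


def extract_features(text):
    """Assumed feature extractor: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())


def select_features(feature_dicts, min_df=1):
    """Assumed selection rule: keep terms found in at least min_df documents."""
    doc_freq = Counter()
    for feats in feature_dicts:
        doc_freq.update(feats.keys())
    return sorted(term for term, n in doc_freq.items() if n >= min_df)


def vectorize(feats, vocab):
    """Represent a document as a fixed-length vector over the selected terms."""
    return [float(feats.get(term, 0)) for term in vocab]


def score(weights, bias, x):
    """Logistic score in (0, 1); z is clamped to avoid overflow in exp."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, l2, epochs=200, lr=0.5):
    """Fit the parametric model by stochastic gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = score(weights, bias, x) - label
            weights = [w - lr * (err * v + l2 * w) for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def accuracy(model, X, y):
    weights, bias = model
    hits = sum((score(weights, bias, x) >= 0.5) == bool(t) for x, t in zip(X, y))
    return hits / len(y)


def build_model(train, validation, l2_grid=(0.0, 0.01, 0.1)):
    """Instantiate parameters on `train`; choose among them using `validation`."""
    train_feats = [extract_features(text) for text, _ in train]
    vocab = select_features(train_feats)
    X_tr = [vectorize(f, vocab) for f in train_feats]
    y_tr = [label for _, label in train]
    X_va = [vectorize(extract_features(text), vocab) for text, _ in validation]
    y_va = [label for _, label in validation]
    best_acc, best_model = -1.0, None
    for l2 in l2_grid:
        model = train_logistic(X_tr, y_tr, l2)
        acc = accuracy(model, X_va, y_va)
        if acc > best_acc:  # ties keep the earlier (less regularized) model
            best_acc, best_model = acc, model
    return vocab, best_model


def label_and_rank(vocab, model, documents):
    """Return (document, label, score) triples, highest score first."""
    weights, bias = model
    scored = [(score(weights, bias, vectorize(extract_features(d), vocab)), d)
              for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(d, int(s >= 0.5), s) for s, d in scored]


# Toy usage with made-up documents; label 1 marks the category of interest.
train = [("heart attack treatment trial", 1), ("cardiac surgery outcome study", 1),
         ("football match results", 0), ("stock market news today", 0)]
validation = [("heart surgery trial", 1), ("market football news", 0)]
vocab, model = build_model(train, validation)
ranked = label_and_rank(vocab, model, ["football news today",
                                       "cardiac treatment study"])
```

Keeping the validation documents out of parameter fitting is the point of the
two-subset design: the validation score estimates how a candidate model
behaves on documents it was not trained on, which is the situation it faces
when labeling and ranking new documents.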