A mechanism to classify source documents into one of two categories:
likely to contain desired information or unlikely to contain desired
information. Generally, some form of rule-based classification is used
in conjunction with deeper analysis that applies advanced techniques to
difficult cases. The rule-based classification is generally good for
eliminating cases from further consideration and for identifying
documents of interest based on readily discernible relationships
between data or on the presence or absence of data. The deeper
analysis is used to uncover more complex relationships between data that
may identify documents of interest. Some portions of the process may use
the entire document, while other portions may use only part of it.
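
The two-stage flow described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the source: the keyword lists, the 200-character header window, and the toy "amount near the word due" check that stands in for the deeper analysis. The rule stage inspects only a leading portion of the document and either decides the case or defers; the deeper stage sees the entire document and runs only on deferred cases.

```python
from enum import Enum
from typing import Callable, Optional


class Label(Enum):
    LIKELY = "likely"      # likely to contain desired information
    UNLIKELY = "unlikely"  # unlikely to contain desired information


# Hypothetical values; real rules would be domain-specific.
REQUIRED_TERMS = ("invoice",)
EXCLUDING_TERMS = ("unsubscribe",)
HEADER_CHARS = 200  # the rule stage looks only at this leading portion


def rule_stage(document: str) -> Optional[Label]:
    """Return a label when a rule fires, or None for a difficult case."""
    header = document[:HEADER_CHARS].lower()
    # Presence of an excluding term eliminates the document outright.
    if any(term in header for term in EXCLUDING_TERMS):
        return Label.UNLIKELY
    # Presence of a required term marks the document as of interest.
    if any(term in header for term in REQUIRED_TERMS):
        return Label.LIKELY
    return None


def deep_stage(document: str) -> Label:
    """Toy stand-in for deeper analysis; it uses the entire document."""
    words = document.lower().split()
    for i, word in enumerate(words):
        if word == "due":
            # Look for an amount-like token within five words of "due".
            window = words[max(0, i - 5): i + 6]
            if any(w.strip("$.,").replace(".", "", 1).isdigit() for w in window):
                return Label.LIKELY
    return Label.UNLIKELY


def classify(
    document: str,
    rules: Callable[[str], Optional[Label]] = rule_stage,
    deep: Callable[[str], Label] = deep_stage,
) -> Label:
    """Apply the rules first; fall back to deeper analysis when they defer."""
    decided = rules(document)
    return decided if decided is not None else deep(document)
```

Passing the two stages as parameters keeps the sketch neutral about what the "advanced techniques" are: any function that maps a full document to a label could replace `deep_stage`.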