A computer-implemented information retrieval system is provided. The
system includes a user input configured to receive a user query relative
to a corpus of documents. A machine learning classifier is trained with a first set
of training data comprising anchor text relative to at least some of the
documents in the corpus. A processing unit is adapted to interact with
the machine learning classifier to obtain search results relative to the
query. In some aspects, the classifier is also
trained with a second set of training data. A method of integrating a new
document into a corpus of documents is also provided. A method of
training a machine learning classifier for retrieving documents from a
corpus using two distinct types of training data is also provided.
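
By way of illustration only, the arrangement summarized above may be sketched as follows. The sketch assumes a multinomial naive Bayes classifier in which each document identifier is treated as a class; the abstract does not name a classifier type, so this choice is an assumption. The class name `AnchorTextClassifier`, the sample document identifiers, and the use of document titles as the second type of training data are likewise hypothetical and are not taken from the disclosure.

```python
import math
from collections import Counter, defaultdict


class AnchorTextClassifier:
    """Hypothetical multinomial naive Bayes; every document ID is a class."""

    def __init__(self):
        self.term_counts = defaultdict(Counter)  # doc_id -> term -> count
        self.example_counts = Counter()          # doc_id -> number of examples
        self.vocabulary = set()

    def train(self, examples):
        """Add (text, doc_id) pairs; may be called again with further data."""
        for text, doc_id in examples:
            terms = text.lower().split()
            self.term_counts[doc_id].update(terms)
            self.example_counts[doc_id] += 1
            self.vocabulary.update(terms)

    def search(self, query, top_k=3):
        """Return up to top_k document IDs ranked by log posterior."""
        terms = query.lower().split()
        total_examples = sum(self.example_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for doc_id, counts in self.term_counts.items():
            total_terms = sum(counts.values())
            score = math.log(self.example_counts[doc_id] / total_examples)
            for term in terms:
                # Add-one smoothing so unseen terms do not zero out a document.
                score += math.log((counts[term] + 1) / (total_terms + vocab_size))
            scores[doc_id] = score
        return sorted(scores, key=scores.get, reverse=True)[:top_k]


# First set of training data: anchor text of links pointing at each document.
anchor_text = [
    ("python tutorial", "doc_py"),
    ("learn python programming", "doc_py"),
    ("rust borrow checker guide", "doc_rust"),
]
# Second set (assumed type for this sketch): document titles.
titles = [
    ("official python documentation", "doc_py"),
    ("rust ownership and borrowing", "doc_rust"),
]

clf = AnchorTextClassifier()
clf.train(anchor_text)
clf.train(titles)
results = clf.search("python tutorial")

# Integrating a new document: train further on anchor text that refers to it.
clf.train([("go concurrency patterns", "doc_go")])
results_after = clf.search("go concurrency")
```

In this sketch, integrating a new document amounts to a further call to `train` with anchor text referring to that document, after which the document becomes a retrievable class without rebuilding the classifier.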