A method for searching a document collection includes providing an index
of terms indicating the documents in which the terms appear. A first
statistical distribution of each of at least some of the terms in the
index and a second statistical distribution of each of at least some of
the categories are estimated over the documents in the collection. A
query including one or more of the terms and a category restriction
referring to at least one of the categories is accepted. A modified term
distribution is produced by operating on the first statistical
distribution of at least one of the terms in the query using the second
statistical distribution, responsively to the category restriction. The
query is applied to the index to return a response, in which occurrences
of the at least one of the terms are scored responsively to the modified
term distribution.
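
The flow described above can be illustrated with a minimal sketch. The abstract does not say how the term distribution is modified, so the version below rests on assumptions made only for illustration: the first distribution is a per-term document frequency, the second is a per-category document frequency, and the modified distribution is the term's document frequency scaled by the fraction of documents in the restricted category (that is, term and category are treated as independent). The class name CategoryAwareIndex, its methods, the smoothed inverse document frequency formula, and the sample documents are all hypothetical.

```python
import math
from collections import defaultdict


class CategoryAwareIndex:
    """Toy inverted index with per-term and per-category document counts."""

    def __init__(self):
        self.postings = defaultdict(dict)    # term -> {doc_id: term frequency}
        self.doc_categories = {}             # doc_id -> set of categories
        self.category_df = defaultdict(int)  # category -> number of documents

    def add(self, doc_id, text, categories):
        """Index one document; each doc_id is assumed to be added only once."""
        for term in text.lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1
        self.doc_categories[doc_id] = set(categories)
        for category in self.doc_categories[doc_id]:
            self.category_df[category] += 1

    def modified_df(self, term, category):
        """Estimate the term's document frequency inside the category.

        Assumption: term and category occur independently, so the global
        document frequency is scaled by the category's share of documents.
        """
        total_docs = len(self.doc_categories)
        if total_docs == 0:
            return 0.0
        term_df = len(self.postings.get(term, {}))
        return term_df * self.category_df.get(category, 0) / total_docs

    def search(self, terms, category):
        """Score documents in the category using the modified distribution."""
        category_size = self.category_df.get(category, 0)
        scores = defaultdict(float)
        for term in (t.lower() for t in terms):
            # Smoothed inverse document frequency (an illustrative choice).
            idf = math.log((category_size + 1.0) /
                           (self.modified_df(term, category) + 0.5))
            for doc_id, tf in self.postings.get(term, {}).items():
                if category in self.doc_categories[doc_id]:
                    scores[doc_id] += tf * idf
        # Highest score first; ties broken by document id.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


idx = CategoryAwareIndex()
idx.add(1, "python snake habitat", ["animals"])
idx.add(2, "python programming language", ["software"])
idx.add(3, "python python packaging tools", ["software"])
idx.add(4, "rust programming language", ["software"])

results = idx.search(["python"], "software")
```

Here the modified distribution is derived from two small tables of counts (one per term, one per category) rather than by recounting term occurrences inside every possible category restriction at query time. Whether the described method uses this particular estimate is not stated in the text.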