A data processing method and system for retrieving a subset of k items
from a database of n items (n ≥ k) first determines a limited set
of bk items (b>1) in the database which have the greatest similarity
to an input query t according to a given similarity function S. A result
subset is then constructed by including as a first member the item having
the greatest similarity S to the query t, and iteratively selecting each
successive member of the subset as that remaining item of the bk items
having the highest quality Q, where Q is a given function of both
similarity to the input query t and relative diversity RD with respect to
the items already in the result subset. In this way the diversity of the
result subset is greatly increased relative to a simple selection of the
k most similar items to the query t, with only a modest increase in
processing requirements.
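
The selection procedure described above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the abstract says only that Q is "a given function" of similarity and relative diversity, so the sketch assumes Q = S(t, c) * RD(c, R), with RD taken as the mean dissimilarity (1 - S) between a candidate c and the items already in the result subset R. It also assumes S is symmetric and returns values in [0, 1]. The names bounded_greedy_select, sim and quality are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bounded_greedy_select(
    items: Sequence[T],
    query: T,
    k: int,
    b: int,
    sim: Callable[[T, T], float],
) -> List[T]:
    """Pick k items that balance similarity to the query with diversity.

    Assumptions (not taken from the source): sim is symmetric with values
    in [0, 1]; quality Q is similarity multiplied by relative diversity;
    relative diversity is the mean of (1 - sim) against the chosen items.
    """
    if k <= 0 or not items:
        return []

    # Step 1: bound the search to the b*k items most similar to the query.
    candidates = sorted(items, key=lambda c: sim(query, c), reverse=True)[: b * k]

    # Step 2: the first member is the single most similar item.
    result = [candidates.pop(0)]

    # Step 3: repeatedly add the remaining candidate with the highest quality.
    while candidates and len(result) < k:

        def quality(c: T) -> float:
            relative_diversity = sum(1.0 - sim(c, r) for r in result) / len(result)
            return sim(query, c) * relative_diversity

        best = max(candidates, key=quality)
        candidates.remove(best)
        result.append(best)

    return result


if __name__ == "__main__":
    # Toy data: items are numbers, similarity falls off linearly with distance.
    def toy_sim(a: float, b: float) -> float:
        return 1.0 - abs(a - b) / 10.0

    data = [5.0, 5.1, 5.2, 9.0, 1.0, 5.3]
    print(bounded_greedy_select(data, query=5.0, k=3, b=2, sim=toy_sim))
```

On this toy data a plain top-k selection would return 5.0, 5.1 and 5.2, whereas the sketch keeps 5.0 as the first member and fills the remaining two places with 9.0 and 1.0. The quality function is evaluated only over the bk pre-selected candidates rather than over all n items, which is what keeps the additional cost modest.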