Methods and apparatus, including systems and computer program products, for
clustering users, where each user is represented as a set of elements
representing items, e.g., items the user selected while using a system. In
one aspect, a program operates to obtain a respective interest
set for each of multiple users, each interest set representing items in
which the respective user expressed interest; for each of the users, to
determine k hash values of the respective interest set, the i-th hash value
being the minimum value of the corresponding i-th hash function over the
items in that set; and to assign each user to each of the k clusters
established for that user, the i-th cluster being identified by the i-th
hash value. The assignment of each of the users
to k clusters is done without regard to the assignment of any of the
other users to k clusters.
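The per-user procedure above is a min-hash (MinHash) scheme. The Python sketch below illustrates it under stated assumptions and is not the claimed implementation. The names `item_hash`, `min_hashes`, and `cluster_users` are hypothetical. The i-th hash function is assumed to be a salted 64-bit BLAKE2b digest, and a cluster is assumed to be keyed by the pair (i, i-th minimum hash value). Each user's clusters are computed from that user's own interest set alone, so no other user's assignment is consulted.

```python
import hashlib
from collections import defaultdict


def item_hash(i: int, item: str) -> int:
    """Hypothetical i-th hash function: BLAKE2b salted with the index i (assumption)."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=i.to_bytes(16, "big")
    ).digest()
    return int.from_bytes(digest, "big")


def min_hashes(interest_set: set[str], k: int) -> list[int]:
    """Return k values; the i-th is the minimum of hash function i over the set."""
    if not interest_set:
        raise ValueError("interest set must be non-empty")
    return [min(item_hash(i, item) for item in interest_set) for i in range(k)]


def cluster_users(interest_sets: dict[str, set[str]], k: int) -> dict[tuple[int, int], set[str]]:
    """Assign every user to k clusters keyed by (i, i-th min-hash value).

    Each user's keys depend only on that user's own interest set.
    """
    clusters: dict[tuple[int, int], set[str]] = defaultdict(set)
    for user, items in interest_sets.items():
        for i, h in enumerate(min_hashes(items, k)):
            clusters[(i, h)].add(user)
    return dict(clusters)


users = {"alice": {"a", "b", "c"}, "bob": {"c", "b", "a"}, "carol": {"x", "y"}}
clusters = cluster_users(users, k=4)
```

Users with identical interest sets, such as `alice` and `bob`, always share all k clusters. For ideal random hash functions, two users share the i-th cluster with probability equal to the Jaccard similarity of their interest sets, so similar users tend to co-occur in many of their k clusters.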