The subject disclosure pertains to systems and methods for training
machine learning systems with pairwise training. The number of
computations required during pairwise training is reduced by grouping
the computations, so that work shared by many pairs is done once rather
than once per pair. First, a score is generated for each retrieved data
item. Each data item pair is then processed: the previously generated
scores of its two data items are retrieved and used to generate a
gradient for each of them. Once all of the pairs have been processed,
the gradients for each data item are aggregated, and the aggregated
gradients are used to update the machine learning system.
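The grouped computation can be illustrated with a minimal Python sketch. It rests on assumptions that the disclosure does not state: the model is a linear scorer, the pairwise loss is a RankNet-style logistic loss, and vectors are plain lists. All function names (`grouped_gradient`, `naive_gradient`, `pair_score_grads`, and so on) are hypothetical. The naive version recomputes both scores and a full weight gradient for every pair. The grouped version scores each item once, accumulates per-item score gradients across all pairs, and then maps the aggregated gradients to a single weight update.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[int, int]  # (index of item that should rank higher, index of lower item)


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def pair_score_grads(s_hi: float, s_lo: float) -> Tuple[float, float]:
    """Gradients of the assumed loss log(1 + exp(-(s_hi - s_lo)))
    with respect to s_hi and s_lo. Both branches compute -sigmoid(-d)
    in a numerically stable way."""
    d = s_hi - s_lo
    if d >= 0:
        e = math.exp(-d)
        g = -e / (1.0 + e)
    else:
        g = -1.0 / (1.0 + math.exp(d))
    return g, -g


def grouped_gradient(w: Vector, items: List[Vector], pairs: List[Pair]) -> Vector:
    # Step 1: score every data item exactly once.
    scores = [dot(w, x) for x in items]
    # Step 2: for each pair, reuse the cached scores and accumulate
    # a per-item gradient with respect to that item's score.
    lam = [0.0] * len(items)
    for hi, lo in pairs:
        g_hi, g_lo = pair_score_grads(scores[hi], scores[lo])
        lam[hi] += g_hi
        lam[lo] += g_lo
    # Step 3: after all pairs, map each aggregated gradient back to the
    # weights once per item (for a linear scorer, d score / d w = x).
    grad = [0.0] * len(w)
    for x, l in zip(items, lam):
        if l != 0.0:
            for k, xk in enumerate(x):
                grad[k] += l * xk
    return grad


def naive_gradient(w: Vector, items: List[Vector], pairs: List[Pair]) -> Vector:
    # Baseline: rescore both items and backpropagate separately for every pair.
    grad = [0.0] * len(w)
    for hi, lo in pairs:
        g_hi, g_lo = pair_score_grads(dot(w, items[hi]), dot(w, items[lo]))
        for k in range(len(w)):
            grad[k] += g_hi * items[hi][k] + g_lo * items[lo][k]
    return grad


def total_loss(w: Vector, items: List[Vector], pairs: List[Pair]) -> float:
    loss = 0.0
    for hi, lo in pairs:
        d = dot(w, items[hi]) - dot(w, items[lo])
        # Stable softplus(-d) = log(1 + exp(-d)).
        loss += max(-d, 0.0) + math.log1p(math.exp(-abs(d)))
    return loss


def sgd_step(w: Vector, grad: Vector, lr: float = 0.1) -> Vector:
    return [wi - lr * gi for wi, gi in zip(w, grad)]


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pairs = [(2, 0), (2, 1), (0, 1)]
    w = [0.0, 0.0]
    g = grouped_gradient(w, items, pairs)
    print(g)  # [-1.0, 0.0]
    print(total_loss(w, items, pairs), total_loss(sgd_step(w, g), items, pairs))
```

In this sketch, the grouped version evaluates the model n times (once per item) instead of twice per pair, and it maps gradients back to the weights once per item instead of once per pair occurrence. Because the chain rule is linear in the per-score gradients, both versions yield the same weight gradient. The same grouping would apply to a nonlinear model by running one backward pass per item with its aggregated gradient.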