A method and system for quantifying the quality of search results from a
search engine based on cohesion. The method and system include modeling a
set of search engine search results as a cluster and measuring the
cohesion of the cluster. In an embodiment, the cohesion of the cluster is
the average similarity between the cluster elements and a centroid vector.
The centroid vector is the cell-wise average of the weights of the vectors
in the cluster. The similarity between the centroid vector and each of the
cluster's elements is measured by cosine similarity. Each document in the set of
search results is represented by a vector where each cell of the vector
represents a stemmed word. Each cell has a cell value, which is the
frequency of the corresponding stemmed word in the document multiplied by a
weight that takes into account the location of the stemmed word within
the document.
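
The cohesion measure described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes the words have already been stemmed, that each occurrence is tagged with a location label, and that a cell value is obtained by adding one location weight per occurrence. The names LOCATION_WEIGHTS, document_vector, centroid, cosine_similarity and cohesion, and the weight values themselves, are hypothetical and do not come from the source.

```python
import math
from collections import defaultdict

# Hypothetical location weights; the source does not specify any values.
LOCATION_WEIGHTS = {"title": 2.0, "body": 1.0}


def document_vector(stems):
    """Build a sparse vector from (stemmed_word, location) pairs.

    Assumption: each occurrence adds its location weight to the word's cell,
    so a cell equals frequency times weight when all occurrences of the word
    share one location.
    """
    vec = defaultdict(float)
    for stem, location in stems:
        vec[stem] += LOCATION_WEIGHTS[location]
    return dict(vec)


def centroid(vectors):
    """Cell-wise average of the vectors in the cluster."""
    total = defaultdict(float)
    for vec in vectors:
        for stem, value in vec.items():
            total[stem] += value
    count = len(vectors)
    return {stem: value / count for stem, value in total.items()}


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(stem, 0.0) for stem, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cohesion(vectors):
    """Average cosine similarity between each vector and the cluster centroid."""
    if not vectors:
        return 0.0
    center = centroid(vectors)
    return sum(cosine_similarity(vec, center) for vec in vectors) / len(vectors)


if __name__ == "__main__":
    # Two made-up search results, given as pre-stemmed words with locations.
    results = [
        [("search", "title"), ("engin", "body"), ("rank", "body")],
        [("search", "body"), ("engin", "title"), ("queri", "body")],
    ]
    vectors = [document_vector(doc) for doc in results]
    print(cohesion(vectors))
```

In this sketch a result set whose documents share most of their weighted stems yields a cohesion close to 1, while documents with no stems in common pull the value down. Sparse dictionaries are used so that only stems actually present in a document occupy a cell.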