Described herein are embodiments of a data representation system that
describes specific data sets, such as documents, web pages, or search
engine queries, in terms of the data tokens, such as words or n-grams,
contained in a collection of documents. Such a system can be used in any type of
information retrieval application, such as a document, web page, or
online advertisement serving process, based on an information request,
such as a query executed through an Internet search engine. For example,
when a search is performed at a search engine, a content provider uses
the system to represent the search query and compares the query
representation against representations of a set of content in order to
identify, retrieve, and aggregate the content from the set that is most
relevant to the search query, in the form of a web page or other data
unit for display or access through a web browser.
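As a minimal sketch only, the represent-and-compare step described above can be illustrated with a plain TF-IDF model over words and word n-grams, ranked by cosine similarity. TF-IDF is a swapped-in technique chosen for illustration and is not asserted to be the representation used by the described embodiments. The function names `tokens`, `idf_weights`, `represent`, `cosine`, and `rank`, and the sample documents, are hypothetical.

```python
import math
from collections import Counter

def tokens(text, n=2):
    """Split text into lowercase words plus word n-grams up to length n."""
    words = text.lower().split()
    out = list(words)
    for k in range(2, n + 1):
        out += [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return out

def idf_weights(docs):
    """Smoothed inverse document frequency of each token in the collection."""
    df = Counter()
    for doc in docs:
        df.update(set(tokens(doc)))
    n_docs = len(docs)
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

def represent(text, idf):
    """Sparse TF-IDF vector (token -> weight); tokens unseen in the collection are dropped."""
    tf = Counter(t for t in tokens(text) if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, docs, top_k=3):
    """Represent the query and every document, then return the top_k (score, doc) pairs."""
    idf = idf_weights(docs)
    q = represent(query, idf)
    scored = [(cosine(q, represent(d, idf)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

if __name__ == "__main__":
    docs = ["cheap flights to paris", "paris hotel deals", "learn python programming"]
    for score, doc in rank("flights to paris", docs):
        print(f"{score:.3f}  {doc}")
```

With these sample documents, the query "flights to paris" shares five word and bigram tokens with "cheap flights to paris", only "paris" with "paris hotel deals", and nothing with the third document, so the first document ranks highest. A production system would typically precompute the content representations rather than rebuilding them for every query.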