A method and system that aggregate data associated with one or more
entities from different data sources are provided. The data sources
include documents, web pages, or images that have information about one
or more entities. The information is extracted from the data sources
based on criteria that define the entities. The extracted information is
used to generate, for each entity, a hash identifier that corresponds to
that entity and to one or more storage locations. The storage locations and
associated hash identifiers are utilized to store the extracted
information corresponding to the entities, and the extracted information
for each entity is structured as a virtual page that is stored in an
index having references to the data sources.
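
The aggregation flow described above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that an entity is identified by a tuple of normalized criteria values, that SHA-256 serves as the hash function, and that each hash maps to a single storage location by modulo (the abstract allows one or more). The names `entity_hash`, `storage_location`, `aggregate`, and `NUM_LOCATIONS` are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_LOCATIONS = 4  # assumed number of storage locations (hypothetical)


def entity_hash(criteria_values):
    """Derive a hash identifier from the criteria values that define an entity."""
    # Normalize so the same entity found in different sources hashes identically.
    key = "|".join(v.strip().lower() for v in criteria_values)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def storage_location(hash_id, num_locations=NUM_LOCATIONS):
    """Map a hash identifier to one storage location (assumption: simple modulo)."""
    return int(hash_id, 16) % num_locations


def aggregate(records, num_locations=NUM_LOCATIONS):
    """Merge extracted information into one virtual page per entity.

    records: iterable of (source, criteria_values, extracted_fields).
    Returns {location: {hash_id: {"fields": {...}, "sources": [...]}}}.
    """
    stores = defaultdict(dict)
    for source, criteria, fields in records:
        h = entity_hash(criteria)
        loc = storage_location(h, num_locations)
        page = stores[loc].setdefault(h, {"fields": {}, "sources": []})
        page["fields"].update(fields)       # combine information across sources
        if source not in page["sources"]:
            page["sources"].append(source)  # keep a reference to each data source
    return stores


# Example: two sources describe the same entity; a third describes another.
records = [
    ("http://a.example/doc", ("Acme Corp", "Seattle"), {"phone": "555-0100"}),
    ("http://b.example/page", ("acme corp ", "seattle"), {"hours": "9-5"}),
    ("http://c.example/img", ("Other Co", "Boston"), {"phone": "555-0199"}),
]
stores = aggregate(records)
```

In this sketch, the first two records collapse into a single virtual page because their normalized criteria values produce the same hash identifier, and that page retains references to both originating sources.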