A method and apparatus are provided for retrieving documents from a
collection of documents based on information other than the contents of a
desired document. The collection of documents, which may be a hypertext
system or documents available via the World Wide Web, is indexed. In one
embodiment, an indexing process of a search engine receives one or more
specifications that identify documents, or document locations, and
non-content information such as a tag word or code word. The indexing
process searches the index to identify all indexed documents that
match one or more of the specifications. If a match is found, the tag word
is added to the index, and information about the matching document is
stored in the index in association with the tag word. When a search query is
submitted to the search engine, the query is automatically modified
to add a reference to the tag word, such as a query term that will exclude
any index entry for a document associated with the tag word. The search is
executed against the index, and a set of search results is generated.
Accordingly, the search results automatically exclude all documents
associated with the tag word. These techniques may be used, for example,
to implement a Web search service that produces more accurate search
results or that prevents certain documents, such as pornographic
materials, from appearing in search results.
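
The tagging and query-modification steps described above can be illustrated
with a minimal Python sketch. It is not the claimed implementation. The class
TaggedIndex, the function modify_query, the "tag:blocked" tag word, the
leading "-" exclusion syntax, the example URLs, and the use of shell-style
wildcard patterns as specifications are all assumptions made for illustration.

```python
import fnmatch
from collections import defaultdict


class TaggedIndex:
    """Toy inverted index: each term maps to a set of document locations."""

    def __init__(self):
        self.postings = defaultdict(set)  # term or tag word -> set of URLs
        self.documents = set()            # every indexed URL

    def add_document(self, url, text):
        self.documents.add(url)
        for term in text.lower().split():
            self.postings[term].add(url)

    def apply_tag(self, specifications, tag_word):
        """Associate every indexed URL matching any specification with tag_word.

        Specifications are assumed to be shell-style wildcard patterns.
        """
        matched = {
            url
            for url in self.documents
            if any(fnmatch.fnmatchcase(url, spec) for spec in specifications)
        }
        if matched:  # the tag word enters the index only when a match is found
            self.postings[tag_word] |= matched
        return matched

    def search(self, query):
        """AND all plain terms, then drop documents listed under "-" terms."""
        required, excluded = [], []
        for token in query.lower().split():
            if token.startswith("-"):
                excluded.append(token[1:])
            else:
                required.append(token)
        if not required:
            return set()
        results = set.intersection(
            *(self.postings.get(term, set()) for term in required)
        )
        for term in excluded:
            results -= self.postings.get(term, set())
        return results


def modify_query(query, tag_word):
    """Append an exclusion term for tag_word (assumed "-" syntax)."""
    return f"{query} -{tag_word}"


index = TaggedIndex()
index.add_document("http://example.com/news/a.html", "garden tools review")
index.add_document("http://example.com/blocked/b.html", "garden tools offer")

tagged = index.apply_tag(["http://example.com/blocked/*"], "tag:blocked")
unfiltered = index.search("garden tools")
filtered = index.search(modify_query("garden tools", "tag:blocked"))
```

In this sketch the tag word is stored in the same postings table as ordinary
terms, so excluding tagged documents is an ordinary set difference at query
time. A real system would need to reserve the tag word so that it cannot
collide with a term occurring in document content.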