A system and method in accordance with an embodiment of the invention address
the problems of unlinked or sparsely linked documents by linking them using a set
of automatically extracted content words, the "index terms." Upon receiving a list
of documents for indexing, the system and method in accordance with an embodiment
of the invention automatically select the terms to be indexed and generate a
hypertext concordance (an "HC"). A concordance is an index in which each of the indexed
terms is listed with its surrounding text, i.e., in context. In addition, each of the indexed
terms in the HC is given a hyperlink, instead of a page number, back to the occurrence
of the term in a version of the indexed document. In one embodiment of the invention,
the original document that has been indexed is also revised to include hyperlinks
from the index terms into the HC.
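
By way of illustration only, the two-way linking described above might be sketched as follows. The sketch is not the claimed implementation. It assumes a simple frequency-and-stopword heuristic for selecting index terms, which the text above does not specify, and every name in it (select_index_terms, build_concordance, render_concordance, the file name hc.html, and the occN anchor scheme) is hypothetical.

```python
import html
import re
from collections import Counter

# Hypothetical stopword list; the source does not say how terms are selected.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "with", "for", "on"}
TOKEN = re.compile(r"[A-Za-z]+")


def select_index_terms(docs, min_count=2):
    """Pick content words occurring at least min_count times (assumed heuristic)."""
    counts = Counter(
        m.group().lower()
        for text in docs.values()
        for m in TOKEN.finditer(text)
    )
    return {w for w, n in counts.items() if n >= min_count and w not in STOPWORDS}


def build_concordance(docs, terms, width=20):
    """Return (entries, linked_docs).

    entries maps each term to a list of (href, context) pairs, one per occurrence.
    linked_docs maps each document name to an HTML version in which every
    occurrence carries an anchor and a hyperlink into the concordance.
    """
    entries = {t: [] for t in terms}
    linked_docs = {}
    for name, text in docs.items():
        out, last = [], 0
        for i, m in enumerate(TOKEN.finditer(text)):
            term = m.group().lower()
            if term not in terms:
                continue
            anchor = f"occ{i}"  # hypothetical anchor naming scheme
            start, end = m.span()
            # Keep `width` characters on each side so the term appears in context.
            context = text[max(0, start - width):end + width]
            entries[term].append((f"{name}.html#{anchor}", context))
            out.append(html.escape(text[last:start]))
            out.append(
                f'<a id="{anchor}" href="hc.html#{term}">{html.escape(m.group())}</a>'
            )
            last = end
        out.append(html.escape(text[last:]))
        linked_docs[name] = "".join(out)
    return entries, linked_docs


def render_concordance(entries):
    """Render the concordance as HTML, one heading per term, one link per occurrence."""
    lines = []
    for term in sorted(entries):
        lines.append(f'<h3 id="{term}">{term}</h3>')
        for href, context in entries[term]:
            lines.append(f'<p><a href="{href}">{html.escape(context)}</a></p>')
    return "\n".join(lines)
```

In this sketch, a caller would pass a mapping of document names to their text to select_index_terms, hand the resulting terms to build_concordance, and write the output of render_concordance to hc.html alongside the linked document versions. Each concordance entry links to the anchored occurrence in the document version, and each anchored occurrence links back to its term heading in the concordance.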