A method and system are provided for augmenting a corpus with documents on
concepts that the corpus does not sufficiently cover. The augmentation
system generates a corpus concept graph from the documents of a corpus. A
corpus concept graph represents concepts of the documents as nodes and
relationships between concepts as links between nodes. To generate a corpus concept
graph, the augmentation system identifies the concepts that are related
within each document of the corpus and adds nodes and links to the corpus
concept graph for related concepts. The augmentation system analyzes the
corpus concept graph to determine whether the relatedness of concepts of
the documents of the corpus is sufficient. If the relatedness of a pair
of concepts is not sufficient, then the augmentation system attempts to
identify documents that are not already in the corpus and that relate to
those insufficiently related concepts.
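
The graph construction and sufficiency check described above can be sketched in Python as follows. This is only an illustrative sketch, not the claimed implementation: it assumes that concepts have already been extracted from each document, that two concepts count as related when they co-occur in the same document, and that relatedness is measured by the number of documents in which a pair co-occurs. The names build_concept_graph, insufficient_pairs, and min_weight are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_concept_graph(docs):
    """Build a corpus concept graph from per-document concept lists.

    Assumption: concepts that co-occur in a document are treated as related.
    Returns the set of concept nodes and a Counter mapping each sorted
    concept pair (a link) to the number of documents containing both.
    """
    nodes = set()
    links = Counter()
    for concepts in docs:
        unique = sorted(set(concepts))  # ignore repeats within one document
        nodes.update(unique)
        for pair in combinations(unique, 2):
            links[pair] += 1
    return nodes, links


def insufficient_pairs(links, min_weight=2):
    """Return linked pairs whose weight is below min_weight.

    min_weight is a hypothetical sufficiency threshold chosen for illustration.
    """
    return sorted(pair for pair, weight in links.items() if weight < min_weight)


# Hypothetical corpus: each inner list holds the concepts found in one document.
docs = [
    ["graph", "node", "link"],
    ["graph", "node"],
    ["corpus", "graph"],
]
nodes, links = build_concept_graph(docs)
weak = insufficient_pairs(links)
print(weak)
```

Each pair returned by insufficient_pairs would then serve as a query for documents outside the corpus. How those documents are searched for and selected is not shown here.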