Diagnosis of corruption in interrelated data entities uses a graph of
nodes and edges. Datum nodes represent the data entities, and relationship
nodes represent the relationships among the data entities. The datum
nodes are connected through their relationship nodes by the edges. When
corruption is detected, the relationships are analyzed and each edge
connecting a datum node to a relationship node is removed from the graph
when the corresponding relationship is invalid. The datum nodes that
remain connected to their relationship nodes form a subgraph and the
corresponding data entities are considered correct. In one aspect, if
more than one subgraph is formed, the datum nodes in the largest subgraph
are used. In another aspect, the graph is created by analyzing the data
entities and relationships while the data entities are assumed correct. The data
entities may be data and metadata of various types that can be associated
with the data.
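
The diagnosis described above can be illustrated with a minimal Python sketch. It is one possible reading under stated assumptions, not the implementation behind this abstract: the function name `diagnose`, the representation of each relationship as a tuple of datum names plus a validity predicate, and the sample record fields are all hypothetical. Rather than deleting edges from an existing graph, the sketch simply never adds the edges of an invalid relationship, which yields the same pruned graph.

```python
from collections import defaultdict


def diagnose(data, relationships):
    """Return the names of the data entities considered correct.

    data:          dict mapping datum name -> value (hypothetical layout)
    relationships: dict mapping relationship name ->
                   (tuple of datum names, predicate over their values)
    """
    # Bipartite graph: datum nodes on one side, relationship nodes on the other.
    adjacency = defaultdict(set)
    for rel_name, (members, is_valid) in relationships.items():
        if not is_valid(*(data[m] for m in members)):
            continue  # invalid relationship: its edges are left out of the graph
        rel_node = ("rel", rel_name)
        for member in members:
            datum_node = ("datum", member)
            adjacency[rel_node].add(datum_node)
            adjacency[datum_node].add(rel_node)

    # Collect the connected subgraphs with an iterative depth-first search.
    seen = set()
    subgraphs = []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        subgraphs.append(component)

    # Keep the datum nodes of the largest subgraph (ties go to the first found).
    datum_sets = [
        {name for kind, name in component if kind == "datum"}
        for component in subgraphs
    ]
    return max(datum_sets, key=len, default=set())


# Hypothetical record in which record_count disagrees with the index.
payload = b"abcd"
data = {
    "header_len": 4,
    "payload": payload,
    "checksum": sum(payload) % 256,
    "record_count": 3,
    "index": [0, 1],
}
relationships = {
    "length": (("header_len", "payload"), lambda n, p: n == len(p)),
    "checksum": (("payload", "checksum"), lambda p, c: sum(p) % 256 == c),
    "count": (("record_count", "index"), lambda n, idx: n == len(idx)),
}

trusted = diagnose(data, relationships)
suspect = set(data) - trusted
print(sorted(trusted))  # ['checksum', 'header_len', 'payload']
print(sorted(suspect))  # ['index', 'record_count']
```

In this example the "count" relationship fails, so `record_count` and `index` lose their only edges and fall outside the surviving subgraph. Note that the sketch only identifies which entities remain connected. It does not decide which of the disconnected entities is actually corrupt.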