A distributed fault detection system and method for diagnosing a storage network
fault in a data storage network having plural network access nodes connected to
plural logical storage units. When a fault is detected, the node that detects it
(designated the primary detecting node) issues a fault information broadcast advising
one or more other access nodes (peer nodes) of the fault. The primary detecting
node also sends a fault report pertaining to the fault to a fault diagnosis node.
When the peer nodes receive the fault information broadcast, they attempt to recreate
the fault. Each peer node that successfully recreates the fault (designated a secondary
detecting node) sends its own fault report pertaining to said fault to the fault
diagnosis node. The fault diagnosis node performs fault diagnosis based on all
of the fault reports it receives from the primary and secondary detecting nodes.
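
The message flow described above can be illustrated with a minimal in-memory sketch. It is not the claimed implementation: the class names (`AccessNode`, `DiagnosisNode`, `FaultReport`), the simulated `broken_luns` fault state, and in particular the diagnosis rule in `diagnose` are hypothetical, chosen only to show how reports from a primary detecting node and secondary detecting nodes might be collected and compared.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FaultReport:
    reporter: str  # name of the access node sending the report
    lun: str       # logical storage unit on which the fault was observed
    role: str      # "primary" or "secondary" detecting node


@dataclass
class DiagnosisNode:
    reports: list = field(default_factory=list)

    def receive(self, report):
        self.reports.append(report)

    def diagnose(self, lun, all_nodes):
        """Hypothetical rule: compare the reporting nodes with all access nodes."""
        reporters = {r.reporter for r in self.reports if r.lun == lun}
        if not reporters:
            return "no fault reported"
        if reporters == set(all_nodes):
            return "fault seen by every node"
        return "fault seen only by: " + ", ".join(sorted(reporters))


@dataclass
class AccessNode:
    name: str
    broken_luns: set  # LUNs this node cannot reach (simulated fault state)

    def can_access(self, lun):
        return lun not in self.broken_luns

    def detect_and_broadcast(self, lun, peers, diagnosis):
        """Primary detecting node: advise the peers, then report the fault."""
        if self.can_access(lun):
            return  # nothing detected, so nothing to broadcast or report
        for peer in peers:
            peer.on_fault_broadcast(lun, diagnosis)
        diagnosis.receive(FaultReport(self.name, lun, "primary"))

    def on_fault_broadcast(self, lun, diagnosis):
        """Peer node: attempt to recreate the fault; report only on success."""
        if not self.can_access(lun):
            diagnosis.receive(FaultReport(self.name, lun, "secondary"))


# Example: node-a and node-b cannot reach lun-0, node-c can.
diag = DiagnosisNode()
node_a = AccessNode("node-a", {"lun-0"})
node_b = AccessNode("node-b", {"lun-0"})
node_c = AccessNode("node-c", set())
node_a.detect_and_broadcast("lun-0", [node_b, node_c], diag)
print(diag.diagnose("lun-0", ["node-a", "node-b", "node-c"]))
# prints "fault seen only by: node-a, node-b"
```

In this sketch node-c fails to recreate the fault and therefore sends no report, so the diagnosis node can narrow the fault to whatever node-a and node-b have in common. A real diagnosis step would presumably use path or topology information carried in the fault reports, which the sketch omits.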