A system and method for monitoring the state and operability of components
in distributed computing systems. The present invention indicates whether
a component is operating correctly and distributes the state of all
components among the elements of the system by means of a reliable
multicast protocol. A Life Support Service (LSS) update service enables
clients to record, retrieve, and distribute state information locally and
remotely via table entries. An LSS heartbeat service enables prompt
delivery of failure notices to all components.
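
The two services described above can be illustrated with a minimal single-process sketch. It is not the claimed implementation: the class name LifeSupportTable, its methods, and the timeout parameter are hypothetical names introduced here for illustration, and the reliable multicast distribution between elements is deliberately left out. The sketch only shows table entries being recorded and retrieved, and a heartbeat check that notifies listeners once when a component falls silent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class LifeSupportTable:
    """Hypothetical in-process stand-in for an LSS table (illustrative only)."""

    timeout: float  # assumed: seconds of silence before a component is declared failed
    entries: Dict[str, str] = field(default_factory=dict)      # key -> recorded state
    last_beat: Dict[str, float] = field(default_factory=dict)  # component -> last heartbeat time
    failed: Set[str] = field(default_factory=set)              # components already reported
    listeners: List[Callable[[str], None]] = field(default_factory=list)

    def update(self, key: str, value: str) -> None:
        """Record a state entry (update service, local part only)."""
        self.entries[key] = value

    def lookup(self, key: str) -> Optional[str]:
        """Retrieve a state entry, or None if it was never recorded."""
        return self.entries.get(key)

    def heartbeat(self, component: str, now: float) -> None:
        """Note that a component was alive at time `now`."""
        self.last_beat[component] = now
        self.failed.discard(component)

    def check(self, now: float) -> List[str]:
        """Return components that newly exceeded the timeout and notify listeners."""
        newly_failed = []
        for component, seen in self.last_beat.items():
            if component not in self.failed and now - seen > self.timeout:
                self.failed.add(component)
                newly_failed.append(component)
                for notify in self.listeners:
                    notify(component)
        return newly_failed


if __name__ == "__main__":
    table = LifeSupportTable(timeout=3.0)
    table.listeners.append(lambda name: print("failure notice:", name))
    table.update("node-a/status", "ready")
    table.heartbeat("node-a", now=0.0)
    table.heartbeat("node-b", now=0.0)
    table.heartbeat("node-a", now=2.0)
    table.check(now=4.0)  # node-b has been silent for longer than the timeout
```

Timestamps are passed in explicitly so the behavior is deterministic. In a distributed deployment each update and failure notice would additionally have to be carried to the other elements of the system, which this sketch does not attempt.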