System, method and computer program product for recovering from a failure
of a computing device. Start up of a first component of the device is
monitored and a determination is made whether the first component has
started successfully. If so, a second, higher-level component of the
device is started. Operational data received from the second component is
monitored. If the operational data falls outside of an operational
boundary, an action is performed on the second component to enable the
second component to operate within a preferred operational boundary. If
the first component does not start up successfully, a determination is
made whether start up of the first component is critical to operation of the
second component. If so, a corrective action is performed on the first
component, after which an attempt is made to start up the second
component.
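
The flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function name recover_and_start, its parameters, the fixed number of samples and the log strings are all hypothetical, introduced only for illustration. The sketch also assumes that a non-critical failure of the first component does not block start up of the second component, a case the text above does not address.

```python
from typing import Callable, List, Tuple


def recover_and_start(
    start_first: Callable[[], bool],
    first_is_critical: bool,
    correct_first: Callable[[], None],
    start_second: Callable[[], bool],
    read_operational_data: Callable[[], float],
    boundary: Tuple[float, float],
    act_on_second: Callable[[float], None],
    samples: int = 3,
) -> List[str]:
    """Hypothetical helper: run the start up flow and return a log of steps."""
    log: List[str] = []

    # Monitor start up of the first (lower level) component.
    if start_first():
        log.append("first started")
    else:
        log.append("first failed")
        if first_is_critical:
            # Corrective action on the first component before trying the second.
            correct_first()
            log.append("first corrected")
        # Assumption: a non-critical failure does not block the second component.

    # Start (or attempt to start) the second, higher-level component.
    if not start_second():
        log.append("second failed")
        return log
    log.append("second started")

    # Monitor operational data from the second component against the boundary.
    low, high = boundary
    for _ in range(samples):
        value = read_operational_data()
        if not (low <= value <= high):
            # Action intended to bring the second component back within bounds.
            act_on_second(value)
            log.append(f"second adjusted at {value}")
    return log


# Usage: the first component fails and is critical; one reading is out of bounds.
readings = iter([50.0, 95.0, 60.0])
log = recover_and_start(
    start_first=lambda: False,
    first_is_critical=True,
    correct_first=lambda: None,
    start_second=lambda: True,
    read_operational_data=lambda: next(readings),
    boundary=(0.0, 80.0),
    act_on_second=lambda value: None,
)
print(log)
```

In this sketch the components are passed in as plain callables so that the control flow stays visible; a real device would substitute its own start up, monitoring and corrective routines.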