Techniques are provided for managing a resource in a High Availability
(HA) system. The techniques involve incrementing a count when a
particular type of remedial action is performed on a resource, so that
the count reflects how often the particular type of remedial action
has been performed for the resource. When it is determined that the
resource has been in stable operation, the count is automatically
reduced. After a failure, the count is used to determine whether to
attempt to perform the particular type of remedial action on the
resource. Examples of remedial actions include restarting the resource,
and relocating the resource to another node of a cluster. By using the
count, the system ensures that a faulty resource does not get constantly
"bounced". By reducing the count when a resource has become stable, the
system makes it less likely that an occasional failure of an otherwise
stable resource will require manual intervention.
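As an illustration only, the counting scheme described above might be sketched in Python as follows, with one tracker per resource and per type of remedial action (restart, relocation). The class name RemedialActionTracker, the max_attempts limit, the stable_interval threshold, the injectable clock, and the rule of decrementing the count by one for each full stable interval without a new failure are all hypothetical choices made for this sketch; the abstract does not say how stability is detected or by how much the count is reduced.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RemedialActionTracker:
    """Hypothetical tracker for ONE type of remedial action on ONE resource.

    All parameter names and the decay rule are illustrative assumptions.
    """
    max_attempts: int = 3            # assumed limit before giving up on this action type
    stable_interval: float = 600.0   # assumed seconds without failure that count as "stable"
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    count: int = 0                   # how often the action has been performed (net of decay)
    last_event: Optional[float] = None  # time of last action or last decay step

    def _decay_if_stable(self) -> None:
        # Assumed decay rule: each full stable_interval that elapsed without a
        # new failure reduces the count by one (never below zero).
        if self.last_event is None or self.count == 0:
            return
        elapsed = self.clock() - self.last_event
        periods = int(elapsed // self.stable_interval)
        if periods > 0:
            self.count = max(0, self.count - periods)
            # Advance the reference time by whole intervals so a partial
            # interval still counts toward the next reduction.
            self.last_event += periods * self.stable_interval

    def on_failure(self) -> bool:
        """Called after a failure; returns True if the action should be attempted.

        False means the resource has been "bounced" too often; what happens
        next (another action type, manual intervention) is outside this sketch.
        """
        self._decay_if_stable()
        if self.count >= self.max_attempts:
            return False
        self.count += 1
        self.last_event = self.clock()
        return True


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    tracker = RemedialActionTracker(max_attempts=2, stable_interval=100.0,
                                    clock=lambda: now[0])
    print(tracker.on_failure())  # True  (count becomes 1)
    now[0] = 10.0
    print(tracker.on_failure())  # True  (count becomes 2)
    now[0] = 20.0
    print(tracker.on_failure())  # False (limit reached, stop bouncing)
    now[0] = 220.0               # two stable intervals later, count decays to 0
    print(tracker.on_failure())  # True  (count becomes 1 again)
```

Injecting the clock keeps the decay logic testable without real waiting; a production HA framework would more likely derive "stable" from its own health checks than from elapsed time alone.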