A method and system are provided for monitoring the health of processes running
on a router. A behavior of a process is monitored and the process is killed if
the behavior is abnormal. The behavior may be abnormal if the process is non-responsive,
cannot start, or repeatedly crashes. The system may include a timer to measure
a predetermined time interval for the process to perform a desired action, a counter
to count a number of times the process fails to perform the desired action before
the timer expires, and a controller to kill the process when the counter exceeds
a maximum number of failures. Alternatively, the timer could measure an amount
of uptime, the counter could count the number of times the process crashes, and
the controller could kill the process when a crash rate, calculated as the number
of crashes divided by the amount of uptime, exceeds a maximum crash rate.
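
The two arrangements described above can be illustrated with a minimal sketch. It is not the claimed implementation: the class names, field names, units (seconds), and threshold values are all assumptions made for illustration, and the sketch only returns a kill decision rather than signalling a real process. It also assumes the failure counter is never reset on success and that a crash rate is undefined until some uptime has been measured, neither of which the text specifies.

```python
from dataclasses import dataclass


@dataclass
class DeadlineMonitor:
    """First arrangement: count missed deadlines, kill past a maximum.

    All names here are hypothetical; the source names only a timer,
    a counter, and a controller.
    """
    interval_s: float   # predetermined time interval (assumed to be in seconds)
    max_failures: int   # maximum number of failures
    failures: int = 0   # counter; never reset in this sketch (assumption)

    def observe(self, elapsed_s: float) -> bool:
        """Record one attempt; return True if the process should be killed."""
        if elapsed_s > self.interval_s:
            # The timer expired before the desired action was performed.
            self.failures += 1
        # Controller decision: kill once the counter exceeds the maximum.
        return self.failures > self.max_failures


@dataclass
class CrashRateMonitor:
    """Second arrangement: kill when crashes per unit of uptime is too high."""
    max_crash_rate: float   # crashes per second of uptime (assumed unit)
    crashes: int = 0        # counter of crashes
    uptime_s: float = 0.0   # accumulated uptime measured by the timer

    def add_uptime(self, seconds: float) -> None:
        """Accumulate uptime measured since the last update."""
        self.uptime_s += seconds

    def record_crash(self) -> bool:
        """Record one crash; return True if the process should be killed."""
        self.crashes += 1
        if self.uptime_s <= 0:
            # Assumption: with no measured uptime the rate is undefined,
            # so this sketch does not kill yet.
            return False
        return self.crashes / self.uptime_s > self.max_crash_rate
```

In this sketch a caller would feed `DeadlineMonitor.observe` the time each action took (or `float("inf")` for a non-responsive process) and act on the returned flag; `CrashRateMonitor` is used the same way, with `add_uptime` called as the process runs and `record_crash` called on each crash.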