One embodiment of the present invention provides a system that tunes
state-based scheduling policies, wherein the system contains a number of
central processing units (CPUs). During operation, the system recurrently
estimates a long-term benefit to the system by feeding a system state as
input to a parametric value function and computing an output from the
parametric value function. The system makes scheduling decisions for the
CPUs based on the estimated long-term benefit to the system. The system
also tunes a parameter of the parametric value function based on the
current and previously estimated long-term benefits to the system,
thereby facilitating more effective scheduling policies.
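
For illustration only, the estimate, decide, and tune cycle described above might be sketched as follows. The abstract does not specify the form of the value function or the tuning rule, so the linear form, the temporal-difference style update, and the names LinearValueFunction, choose_cpu, reward, alpha, and gamma are all assumptions made for this sketch rather than features of the embodiment.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LinearValueFunction:
    """Hypothetical parametric value function: V(s) = sum_i weights[i] * s[i]."""
    weights: List[float]

    def estimate(self, state: Sequence[float]) -> float:
        # Feed the system state in, read the estimated long-term benefit out.
        return sum(w * x for w, x in zip(self.weights, state))

    def tune(self, prev_state: Sequence[float], prev_estimate: float,
             reward: float, current_estimate: float,
             alpha: float = 0.1, gamma: float = 0.9) -> float:
        # Assumed update rule (temporal-difference style): compare the
        # previous estimate with the observed reward plus the discounted
        # current estimate, then move each parameter along its feature.
        td_error = reward + gamma * current_estimate - prev_estimate
        self.weights = [w + alpha * td_error * x
                        for w, x in zip(self.weights, prev_state)]
        return td_error


def choose_cpu(vf: LinearValueFunction,
               candidate_states: Sequence[Sequence[float]]) -> int:
    # candidate_states[i] is the (hypothetical) system state that would
    # result from scheduling the next job on CPU i; pick the CPU whose
    # resulting state has the highest estimated long-term benefit.
    estimates = [vf.estimate(s) for s in candidate_states]
    return max(range(len(estimates)), key=estimates.__getitem__)
```

In this sketch a scheduler would call choose_cpu at each decision point and call tune once the reward for the previous decision is known, so that the parameters are adjusted from both the previous and the current estimate.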