The present invention is a method and an apparatus for reward-based
learning of policies for managing or controlling a system or plant. In
one embodiment, a method for reward-based learning includes receiving a
set of one or more exemplars, where at least two of the exemplars
comprise a (state, action) pair for a system, and at least one of the
exemplars includes an immediate reward responsive to a (state, action)
pair. A distance measure between pairs of exemplars is used to compute a
Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action)
pairs into a lower-dimensional representation. The mapping is then
applied to the set of exemplars, and reward-based learning is applied to
the transformed exemplars to obtain a management policy.