A system is controlled by an actor-critic-based fuzzy reinforcement learning
algorithm that provides instructions to a processor of the system for applying
actor-critic-based fuzzy reinforcement learning. The system includes a database of fuzzy-logic
rules for mapping input data to output commands for modifying a system state, and
a reinforcement learning algorithm for updating the fuzzy-logic rules database
based on effects on the system state of the output commands mapped from the input
data. The reinforcement learning algorithm is configured to converge at least one
parameter of the system state to at least approximately an optimum value following
multiple mapping and updating iterations. The reinforcement learning algorithm
may be based on an update equation that includes the derivative, with respect to at
least one parameter, of the logarithm of the probability function for taking a selected
action when a selected state is encountered.
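The abstract does not specify the membership functions, the probability function, or the controlled plant. The Python sketch below is therefore one minimal, hypothetical instance, not the claimed implementation. It assumes five Gaussian membership functions over a one-dimensional state, a Gaussian policy whose mean is the defuzzified output of the rule base, a critic that attaches one value to each fuzzy rule, and a toy plant whose optimum state is x = 0. The actor update follows the form Δw = α·δ·∇_w ln π(a | s; w), where δ is the critic's temporal-difference error. All names (`CENTERS`, `firing_strengths`, `grad_log_pi`, `step`, `train`) and all constants are illustrative assumptions.

```python
import numpy as np

# Hypothetical rule base: five fuzzy sets ("very negative" ... "very positive")
# over a 1-D state, each described by a Gaussian membership function.
CENTERS = np.linspace(-2.0, 2.0, 5)
WIDTH = 1.0
SIGMA = 0.5  # assumed fixed exploration std of the Gaussian policy


def firing_strengths(x):
    """Normalized rule activations phi_i(x): fuzzify x, then normalize."""
    memb = np.exp(-0.5 * ((x - CENTERS) / WIDTH) ** 2)
    return memb / memb.sum()


def action_mean(w, x):
    """Defuzzified output command: activation-weighted average of consequents w_i."""
    return firing_strengths(x) @ w


def grad_log_pi(w, x, a):
    """Derivative w.r.t. w of ln pi(a | x; w), pi = Normal(action_mean(w, x), SIGMA**2)."""
    phi = firing_strengths(x)
    return (a - phi @ w) / SIGMA**2 * phi


def step(x, a):
    """Toy plant (assumed): command a shifts the state; reward penalizes distance from 0."""
    x_next = float(np.clip(x + a, -2.0, 2.0))
    return x_next, -x_next**2


def train(episodes=3000, horizon=10, alpha=0.01, beta=0.05, gamma=0.9, seed=0):
    """Repeated map-and-update iterations; returns actor (w) and critic (v) rule parameters."""
    rng = np.random.default_rng(seed)
    w = np.zeros_like(CENTERS)  # actor: consequent (output command) of each fuzzy rule
    v = np.zeros_like(CENTERS)  # critic: value estimate attached to each fuzzy rule
    for _ in range(episodes):
        x = rng.uniform(-2.0, 2.0)
        for _ in range(horizon):
            phi = firing_strengths(x)
            a = rng.normal(phi @ w, SIGMA)  # sample an action from pi(. | x)
            x_next, r = step(x, a)
            # Critic's temporal-difference error for this transition.
            td_error = r + gamma * firing_strengths(x_next) @ v - phi @ v
            v += beta * td_error * phi  # critic update
            w += alpha * td_error * grad_log_pi(w, x, a)  # actor (policy-gradient) update
            x = x_next
    return w, v
```

Calling `train()` returns the learned rule consequents `w` and rule values `v`. `action_mean(w, x)` then gives the greedy output command for state `x`, which should drive the toy state toward its optimum of 0. A Gaussian policy is chosen here only because the derivative of its log-probability has a simple closed form; the source does not say which probability function it uses.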