The invention provides a novel, highly adaptive agent learning machine
comprising a plurality of learning modules, each having a set consisting of
a reinforcement learning system, which acts on an environment and determines
an action output so as to maximize a reward provided as a result thereof, and
an environment predicting system, which predicts a change in the
environment, wherein a responsibility signal is calculated for each of the
learning modules such that the smaller the prediction error of the
environment predicting system of that module, the larger the value of the
signal, and the action output of each reinforcement learning system is
weighted in proportion to the responsibility signal, thereby providing an
action with regard to the environment. The machine switches and combines
actions optimal for the various
states or operational modes of an environment without using any specific
teacher signal and performs behavior learning flexibly without using any
prior knowledge.
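
The responsibility weighting described above can be illustrated with a minimal sketch. The abstract states only that a smaller prediction error yields a larger responsibility signal and that action outputs are weighted in proportion to it; the Gaussian-style softmax form, the `sigma` parameter, the normalization to a sum of one, and the function names `responsibility_signals` and `combined_action` are assumptions made here for illustration and are not taken from the source.

```python
import math


def responsibility_signals(prediction_errors, sigma=1.0):
    """Map each module's prediction error to a normalized responsibility.

    Assumption: a softmax over negative squared errors scaled by sigma.
    The source only requires that smaller errors give larger values.
    """
    scores = [-(e * e) / (2.0 * sigma * sigma) for e in prediction_errors]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def combined_action(module_actions, responsibilities):
    """Sum the modules' action vectors, each weighted by its responsibility."""
    dim = len(module_actions[0])
    return [
        sum(r * a[i] for r, a in zip(responsibilities, module_actions))
        for i in range(dim)
    ]


# Hypothetical example: three modules, each proposing a 2-D action.
errors = [0.1, 1.0, 2.0]
actions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
lam = responsibility_signals(errors)
action = combined_action(actions, lam)
```

In this sketch the module whose predictor best matches the current environment dominates the combined action, while a smaller `sigma` makes the selection between modules sharper and a larger one blends them more evenly.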