A method and apparatus for providing learning capability to a processing
device, such as a computer game, educational toy, telephone, or
television remote control, are provided to achieve one or more objectives.
One of a plurality of actions (e.g., game actions, educational prompts,
listed phone numbers, or listed television channels) to be performed on
the processing device is selected. A user input indicative of a user
action (e.g., a player action, educational input, called phone number, or
watched television channel) is received. An outcome of the selected
action and/or user action is determined. An action probability
distribution having probability values corresponding to the plurality of
actions is then updated based on the determined outcome. The next action
will then be selected based on this updated action probability
distribution. One or more of the foregoing steps can be modified based on
a performance index to achieve the one or more objectives of the
processing device, so that the device learns.
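
The select, observe, and update cycle described above can be sketched as
follows. This is a minimal illustration only, not the claimed method: the
linear reward-inaction update rule, the function names (select_action,
update_distribution, run), the learning_rate parameter, and the rule that
an outcome counts as a success when the selected action matches the user
action are all assumptions introduced for the example.

```python
import random


def select_action(probs, rng):
    """Sample an action index according to the action probability distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def update_distribution(probs, chosen, success, learning_rate=0.1):
    """Return an updated distribution (assumed linear reward-inaction scheme).

    On success, probability mass shifts toward the chosen action.
    On failure, the distribution is returned unchanged.
    """
    if not success:
        return list(probs)
    # Scale every action down, then give the chosen action the freed mass,
    # so the values still sum to 1.
    updated = [p * (1.0 - learning_rate) for p in probs]
    updated[chosen] = probs[chosen] + learning_rate * (1.0 - probs[chosen])
    return updated


def run(num_steps=200, seed=0, preferred=2, num_actions=4):
    """Simulate a user who always performs the same (hypothetical) action."""
    rng = random.Random(seed)
    probs = [1.0 / num_actions] * num_actions  # start from a uniform distribution
    for _ in range(num_steps):
        action = select_action(probs, rng)     # select one of the actions
        user_action = preferred                # receive the user input
        success = action == user_action        # determine the outcome (assumed rule)
        probs = update_distribution(probs, action, success)
    return probs


if __name__ == "__main__":
    print(run())
```

In this sketch a performance index could, for example, be used to adjust
learning_rate between steps; that choice is likewise an assumption rather
than something the text above specifies.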