Disclosed are a system and method for instructing a computer program to
self-optimize, comprising inputting commands into the computer program and
allowing a learning protocol in the computer program to determine an
approximately optimal policy of operation of the computer program based
on the commands. The commands comprise operational choices for the
computer program to select from including an approximately optimal choice
for optimizing the operation of the program. The commands comprise a
selection command for selecting any function in a list of instructions
input into the program, wherein the function provides a basis for
making an approximately optimal choice. Additionally, the commands
comprise a rule command for instructing the computer program how to
make an approximately optimal choice. Moreover, the commands comprise a
reward command for instructing the program which of the operational
choices results in an approximately optimal choice for optimizing the
operation of the computer program.
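
The three commands described above can be illustrated with a minimal
sketch. The disclosure does not name a particular learning protocol, so
this example assumes a simple epsilon-greedy learner over running-average
rewards. The class SelfOptimizer, its methods choose, rule, reward and
policy, and the functions fast and slow are all hypothetical names
introduced for illustration only; in particular, modelling the rule
command as a predicate that restricts the available options is an
assumption, not something the disclosure states.

```python
import random
from collections import defaultdict


class SelfOptimizer:
    """Hypothetical learner: running-average values, epsilon-greedy choice."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = defaultdict(float)  # (choice point, option) -> mean reward
        self.count = defaultdict(int)    # (choice point, option) -> times rewarded
        self.rules = {}                  # choice point -> predicate over options
        self.pending = []                # choices made since the last reward

    def rule(self, point, allowed):
        """Rule command (assumed form): restrict the options at a choice point."""
        self.rules[point] = allowed

    def _candidates(self, point, options):
        allowed = self.rules.get(point)
        kept = [o for o in options if allowed is None or allowed(o)]
        return kept or list(options)  # ignore a rule that excludes everything

    def choose(self, point, options):
        """Selection command: pick one function from the supplied list."""
        candidates = self._candidates(point, options)
        if self.rng.random() < self.epsilon:
            option = self.rng.choice(candidates)  # explore
        else:
            option = self.policy(point, candidates)  # exploit current estimate
        self.pending.append((point, option))
        return option

    def reward(self, amount):
        """Reward command: credit every choice made since the last reward."""
        for key in self.pending:
            self.count[key] += 1
            self.value[key] += (amount - self.value[key]) / self.count[key]
        self.pending.clear()

    def policy(self, point, options):
        """Current approximately optimal choice at a choice point."""
        candidates = self._candidates(point, options)
        return max(candidates, key=lambda o: self.value[(point, o)])


# Two hypothetical operational choices; each returns the reward it earns.
def fast():
    return 1.0


def slow():
    return 0.2


optimizer = SelfOptimizer(epsilon=0.2, seed=0)
for _ in range(200):
    chosen = optimizer.choose("strategy", [fast, slow])
    optimizer.reward(chosen())

best = optimizer.policy("strategy", [fast, slow])
```

In this sketch the program never states which option is better. It only
reports a reward after each choice, and the learner's estimate settles
on the higher-reward function. A production learning protocol would
likely need delayed-reward credit assignment and state-dependent
choices, which this sketch omits.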