A gene expression programming genetic algorithm for performing symbolic
regression is provided. The algorithm avoids expression bloat and
overfitting by employing a fitness function that depends inversely on the
mathematical expression complexity. Members of a population that are
evolved by the algorithm are represented as a set of arrays (e.g., in the
form of a matrix) of indexes that reference operands and operators, thus
facilitating selection, mutation, and crossover operations conducted in
the course of evolving the population. The algorithm comprises a syntax
checking part that may be applied to population members without their
having to be converted to executable programs first. An object-oriented
programming language data structure is provided for encapsulating basic
data for each codon (e.g., operand, operator) used by the algorithm.
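
The ideas summarized above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the prefix (Polish) ordering of codons, the names `Codon`, `CODONS`, `is_valid_prefix`, `evaluate`, and `fitness`, the `penalty` parameter, and the particular fitness formula are all assumptions chosen for illustration. The sketch shows a codon data structure, population members stored as arrays of indexes into a codon table, a syntax check that works directly on the index arrays, and a fitness value that decreases as expression complexity grows.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Codon:
    """Basic data for one codon (hypothetical layout)."""
    symbol: str
    arity: int                                   # 0 for operands
    func: Optional[Callable[..., float]] = None  # None for operands


# Codon table; population members store indexes into this list.
CODONS = [
    Codon("x", 0),
    Codon("1", 0),
    Codon("+", 2, operator.add),
    Codon("*", 2, operator.mul),
    Codon("neg", 1, operator.neg),
]


def is_valid_prefix(gene: Sequence[int]) -> bool:
    """Check syntax on the index array itself, assuming prefix notation.

    Nothing is compiled or executed: only codon arities are inspected.
    """
    needed = 1  # number of sub-expressions still required
    for idx in gene:
        if needed == 0 or not 0 <= idx < len(CODONS):
            return False  # trailing codon or unknown index
        needed += CODONS[idx].arity - 1
    return needed == 0


def evaluate(gene: Sequence[int], x: float) -> float:
    """Evaluate a syntactically valid prefix gene at the input value x."""
    def walk(pos: int):
        codon = CODONS[gene[pos]]
        pos += 1
        if codon.arity == 0:
            value = x if codon.symbol == "x" else float(codon.symbol)
            return value, pos
        args = []
        for _ in range(codon.arity):
            value, pos = walk(pos)
            args.append(value)
        return codon.func(*args), pos

    result, _ = walk(0)
    return result


def fitness(gene, xs, ys, penalty: float = 0.1) -> float:
    """Fitness falls with squared error and with complexity (assumed form)."""
    if not is_valid_prefix(gene):
        return 0.0
    error = sum((evaluate(gene, x) - y) ** 2 for x, y in zip(xs, ys))
    complexity = len(gene)  # codon count used as the complexity measure
    return 1.0 / ((1.0 + error) * (1.0 + penalty * complexity))


# A small population: each row is one member, stored as codon indexes.
population = [
    [2, 0, 1],        # x + 1
    [2, 0, 3, 1, 1],  # x + (1 * 1), same function but more complex
    [2, 0],           # invalid: "+" is missing an operand
]
xs = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 3.0]  # target function: x + 1
scores = [fitness(member, xs, ys) for member in population]
```

In this sketch the first two members fit the target data equally well, yet the shorter one receives the higher score, and the malformed third member is rejected by the syntax check before any evaluation is attempted. Because members are plain integer arrays, mutation can replace a single index and crossover can swap array slices, with `is_valid_prefix` applied afterwards to screen the offspring.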