Methods, computer systems, and computer program products for biopolymer
engineering are provided. A variant set for a biopolymer of interest is constructed by
identifying, using a plurality of rules, a plurality of positions in the
biopolymer of interest and, for each respective position in the plurality
of positions, substitutions for the respective position. The plurality of
positions and the substitutions for each respective position in the
plurality of positions collectively define a biopolymer sequence space. A
variant set comprising a plurality of variants of the biopolymer of
interest is selected. A property of all or a portion of the variants in
the variant set is measured. A sequence-activity relationship is modeled
between (i) one or more substitutions at one or more positions of the
biopolymer of interest represented by the variant set and (ii) the
property measured for all or the portion of the variants in the variant
set. The variant set is redefined to comprise variants that include
substitutions in the plurality of positions that are selected based on a
function of the sequence-activity relationship.
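
The cycle described above (define a sequence space, select a variant set, measure a property, model the sequence-activity relationship, and redefine the variant set) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed implementation: the positions and residues in SEQUENCE_SPACE, the mock_assay function standing in for a laboratory measurement, and the simple additive (mean-difference) model are all hypothetical choices made for the example.

```python
import itertools
import random

# Hypothetical output of the rules: position -> candidate residues.
# These positions and residues are illustrative assumptions only.
SEQUENCE_SPACE = {
    10: ["A", "V", "L"],
    25: ["S", "T"],
    40: ["K", "R", "E"],
}


def enumerate_space(space):
    """Yield every variant as a tuple of (position, residue) pairs."""
    positions = sorted(space)
    for combo in itertools.product(*(space[p] for p in positions)):
        yield tuple(zip(positions, combo))


def select_variant_set(space, size, rng):
    """Select an initial variant set by sampling the space without replacement."""
    return rng.sample(list(enumerate_space(space)), size)


def mock_assay(variant):
    """Stand-in for a measured property (assumed additive hidden effects)."""
    effects = {(10, "V"): 1.0, (25, "T"): 0.5, (40, "E"): -1.0}
    return sum(effects.get(sub, 0.0) for sub in variant)


def fit_additive_model(variants, measurements):
    """Model the sequence-activity relationship with one weight per substitution.

    Each weight is the mean property of the variants carrying the
    substitution minus the overall mean (an assumed, deliberately simple model).
    """
    overall = sum(measurements) / len(measurements)
    totals, counts = {}, {}
    for variant, y in zip(variants, measurements):
        for sub in variant:
            totals[sub] = totals.get(sub, 0.0) + y
            counts[sub] = counts.get(sub, 0) + 1
    return {sub: totals[sub] / counts[sub] - overall for sub in totals}


def redefine_variant_set(space, weights, size):
    """Keep the variants with the highest predicted property under the model."""
    def score(variant):
        # Substitutions never observed get a neutral weight of 0.0.
        return sum(weights.get(sub, 0.0) for sub in variant)

    return sorted(enumerate_space(space), key=score, reverse=True)[:size]


# One round of the cycle with a fixed seed for reproducibility.
rng = random.Random(0)
variant_set = select_variant_set(SEQUENCE_SPACE, 8, rng)
measurements = [mock_assay(v) for v in variant_set]
weights = fit_additive_model(variant_set, measurements)
next_set = redefine_variant_set(SEQUENCE_SPACE, weights, 4)
```

In practice the measurement step would be an experimental assay and the model could be any regression or machine-learning method; the additive model here was chosen only because it keeps the example short and makes the link between substitutions and the measured property explicit.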