Optimized loading of program data on a device comprises receiving a
program including multiple program units, at least one of which is a main
program unit. A use graph of the program is obtained, in which the root
node is joined to one or more nodes representing the at least one main
program unit. The multiple
program units are ordered based at least in part on a depth-first
traversal of the use graph, and the ordered program is communicated to the
device.
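
The ordering step can be illustrated with a minimal sketch. It rests on several assumptions not stated in the text: the use graph is given as a dictionary mapping each program unit to the units it uses, the function name `order_program_units` and the unit names are hypothetical, and a post-order traversal is chosen so that each unit is placed after the units it uses. The text requires only that the ordering be based at least in part on a depth-first traversal, so other visiting orders would also fit.

```python
from typing import Dict, List, Set


def order_program_units(uses: Dict[str, List[str]],
                        main_units: List[str]) -> List[str]:
    """Order program units by a depth-first traversal of the use graph.

    `uses` maps a unit name to the names of the units it uses (assumed
    representation). The loop over `main_units` plays the role of the
    root node, whose only edges lead to the main program units.
    """
    ordered: List[str] = []
    visited: Set[str] = set()

    def visit(unit: str) -> None:
        # Skip units already placed or currently being visited; this
        # also keeps the traversal finite if the graph has a cycle.
        if unit in visited:
            return
        visited.add(unit)
        for used in uses.get(unit, []):
            visit(used)
        # Post-order (an assumption): emit a unit after the units it uses.
        ordered.append(unit)

    for main in main_units:
        visit(main)
    return ordered


if __name__ == "__main__":
    # Hypothetical program: Main uses Util and Net; Net uses Util.
    use_graph = {"Main": ["Util", "Net"], "Net": ["Util"], "Util": []}
    print(order_program_units(use_graph, ["Main"]))
```

In this sketch, units unreachable from any main program unit are omitted from the result; whether such units are dropped or appended is not specified in the text.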