A processing architecture supports executing instructions in parallel
after identifying at least one level of dependency associated with a set
of traces within a segment of code. Each trace represents a sequence of
logical instructions within the segment of code that can be executed using
a corresponding operand stack. Scheduling information is generated based on
a dependency order identified among the set of traces. Thus, multiple
traces may be scheduled for parallel execution unless a dependency order
indicates that a second trace depends on a first trace. In that case,
the first trace is executed before the second trace. Trace
dependencies may be identified at run-time as well as prior to execution
of traces in parallel. Results produced by executing a trace are held in
a temporary buffer (instead of being written to memory) until it is
confirmed that no data dependency was detected at run-time.
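
The scheduling and buffering described above can be illustrated with a
minimal Python sketch. It is not the claimed implementation. The names
dependency_levels and run_schedule, the representation of a trace as a
function from a memory snapshot to a (locations read, buffered writes)
pair, the use of sorted trace names as a stand-in for program order, and
the policy of re-executing a trace when a run-time dependency is detected
are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def dependency_levels(deps):
    """Group traces by dependency level (level 0 = no dependencies)."""
    levels = {}

    def level_of(trace, visiting=()):
        if trace in levels:
            return levels[trace]
        if trace in visiting:
            raise ValueError(f"cyclic dependency involving {trace!r}")
        below = (level_of(d, visiting + (trace,)) for d in deps[trace])
        levels[trace] = 1 + max(below, default=-1)
        return levels[trace]

    for trace in deps:
        level_of(trace)
    grouped = {}
    for trace, lvl in levels.items():
        grouped.setdefault(lvl, []).append(trace)
    return [sorted(grouped[lvl]) for lvl in sorted(grouped)]


def run_schedule(traces, deps, memory):
    """Run traces level by level; each (hypothetical) trace function maps
    a memory snapshot to (locations_read, buffered_writes)."""
    for level in dependency_levels(deps):
        snapshot = dict(memory)
        with ThreadPoolExecutor() as pool:
            # Traces in one level run in parallel and write only to
            # their own temporary buffer, never to `memory` directly.
            results = list(pool.map(lambda t: traces[t](dict(snapshot)), level))
        committed = set()  # locations committed so far in this level
        # Commit in sorted name order, standing in for program order.
        for name, (reads, buffer) in zip(level, results):
            if committed & (set(reads) | set(buffer)):
                # Run-time dependency detected: discard the buffer and
                # re-execute against up-to-date memory (assumed policy).
                reads, buffer = traces[name](dict(memory))
            memory.update(buffer)
            committed |= set(buffer)
    return memory


# Example: t2 depends on t0 and t1, so t0 and t1 share a level.
traces = {
    "t0": lambda m: ({"x"}, {"a": m["x"] + 1}),
    "t1": lambda m: ({"x"}, {"b": m["x"] * 10}),
    "t2": lambda m: ({"a", "b"}, {"c": m["a"] + m["b"]}),
}
deps = {"t0": set(), "t1": set(), "t2": {"t0", "t1"}}
print(dependency_levels(deps))               # [['t0', 't1'], ['t2']]
print(run_schedule(traces, deps, {"x": 1}))  # {'x': 1, 'a': 2, 'b': 10, 'c': 12}
```

In this sketch a buffer is copied into memory only after the check against
locations already committed in the same level finds no overlap, which
mirrors the idea of withholding results until no run-time data dependency
has been detected.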