Systems, methods and computer program products for hardware assists for
microcoded floating-point divide and square root operations. Exemplary
embodiments include a method including receiving a first microcoded
instruction in a pipeline of a processor, decoding the first microcoded
instruction in a decode stage of the pipeline, and initiating a
microcode engine coupled to the processor, the microcode engine being
configured to process a streamlined microcode routine. During the delay
between detecting the
need to start a microcode routine and the actual issue of the first
microcode instruction, and using the processor cycle intended for the
original instruction, hardware prepares for the microcode by
pre-normalizing the operand, writing the pre-normalized operand to a
scratch register coupled to the processor, conditionally generating a
final result and discarding microcode routine instructions subsequent to
the first microcode routine instruction, and copying the final result
from the scratch register to a floating-point architectural register
associated with the processor.
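
The pre-normalizing step mentioned above can be illustrated in software. The following is a minimal sketch only, assuming IEEE 754 binary64 operands; it is not the hardware described here, and the names prenormalize and reconstruct are hypothetical. It shifts the fraction of a subnormal operand left until the leading significand bit is set and adjusts the exponent to compensate, so that later divide or square root steps can treat every nonzero operand as normalized.

```python
import math
import struct

FRACTION_BITS = 52    # binary64 fraction width
EXPONENT_BIAS = 1023  # binary64 exponent bias


def prenormalize(x: float):
    """Return (sign, exponent, significand) for a finite binary64 value.

    For nonzero inputs the significand is a 53-bit integer with its top
    bit set, and |x| == significand * 2**(exponent - 52).
    Hypothetical helper for illustration; not taken from the source.
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    biased_exp = (bits >> FRACTION_BITS) & 0x7FF
    fraction = bits & ((1 << FRACTION_BITS) - 1)

    if biased_exp == 0x7FF:
        raise ValueError("infinity and NaN are not handled in this sketch")

    if biased_exp == 0:
        if fraction == 0:
            return sign, 0, 0  # signed zero: nothing to normalize
        # Subnormal: shift left until bit 52 (the implicit-one position) is set.
        shift = (FRACTION_BITS + 1) - fraction.bit_length()
        significand = fraction << shift
        exponent = 1 - EXPONENT_BIAS - shift
    else:
        # Normal: make the implicit leading one explicit.
        significand = fraction | (1 << FRACTION_BITS)
        exponent = biased_exp - EXPONENT_BIAS

    return sign, exponent, significand


def reconstruct(sign: int, exponent: int, significand: int) -> float:
    """Rebuild the float from its pre-normalized parts (for checking)."""
    magnitude = math.ldexp(significand, exponent - FRACTION_BITS)
    return -magnitude if sign else magnitude
```

In this sketch the returned exponent may fall below the normal binary64 range (for example, -1074 for the smallest subnormal), which is why a pre-normalized operand would plausibly be held in a wider internal or scratch format rather than in an architectural register; that reading is an assumption, not a statement from the source.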