The present invention provides alignment and ordering of vector elements
for SIMD processing. In the alignment of vector elements for SIMD
processing, a first vector is loaded from a memory unit into a first register
and a second vector is loaded from the memory unit into a second register.
The first vector contains the first byte of an aligned vector to be
generated. Then, a starting byte specifying the first byte of the aligned
vector is determined. Next, a vector is extracted from the first register
and the second register, beginning with the first bit of the starting byte
in the first register and continuing through the bits in the second register.
Finally, the extracted vector is replicated into a third register such
that the third register contains a plurality of elements aligned for SIMD
processing. In the ordering of vector elements for SIMD processing, a
first vector is loaded from a memory unit into a first register and a
second vector is loaded from the memory unit into a second register.
Then, a subset of elements is selected from the first register and the
second register. The elements of the subset are then replicated into
the elements of a third register in a particular order suitable for
subsequent SIMD vector processing.
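The two operations described above can be illustrated with a minimal sketch that models each register as a fixed-width list of byte-sized elements. The register width (`REG_BYTES = 16`, i.e. a 128-bit register) and the helper names `load`, `align_extract`, `load_unaligned`, and `order_select` are hypothetical and do not come from the source; returning a new list stands in for writing the third register. The alignment step behaves like a funnel shift across the concatenation of the two registers, comparable in spirit to instructions such as SSSE3 `PALIGNR`, and the ordering step resembles a two-source permute such as AltiVec `vperm`, though this is not a claim about how the invention itself is implemented.

```python
# Minimal sketch of SIMD alignment and ordering, under stated assumptions.
# ASSUMPTION: a register holds REG_BYTES byte-sized elements (128-bit register).
# All helper names below are hypothetical illustrations.
REG_BYTES = 16


def load(memory, addr):
    """Load REG_BYTES bytes from memory starting at an aligned address."""
    return list(memory[addr:addr + REG_BYTES])


def align_extract(reg1, reg2, start):
    """Extract one register's worth of bytes, beginning at byte `start`
    of reg1 and continuing into reg2. The result plays the role of the
    third register holding the aligned vector."""
    if not 0 <= start < REG_BYTES:
        raise ValueError("start must lie within the first register")
    combined = reg1 + reg2
    return combined[start:start + REG_BYTES]


def load_unaligned(memory, addr):
    """Produce an aligned vector for an arbitrary address using two
    aligned loads plus one extraction."""
    start = addr % REG_BYTES             # the starting byte within reg1
    base = addr - start                  # aligned address of reg1
    reg1 = load(memory, base)            # contains the first wanted byte
    reg2 = load(memory, base + REG_BYTES)
    return align_extract(reg1, reg2, start)


def order_select(reg1, reg2, indices):
    """Copy a subset of elements from reg1 and reg2 into a new register
    in the order given by `indices`. Index i < len(reg1) selects reg1[i];
    larger indices select from reg2."""
    combined = reg1 + reg2
    for i in indices:
        if not 0 <= i < len(combined):
            raise IndexError(f"selector {i} is out of range")
    return [combined[i] for i in indices]
```

For example, with `memory = bytes(range(64))`, `load_unaligned(memory, 5)` combines the aligned loads at addresses 0 and 16 and returns the bytes 5 through 20. `order_select([0, 1, 2, 3], [10, 11, 12, 13], [0, 4, 1, 5])` interleaves the low halves of the two registers into `[0, 10, 1, 11]`, one common ordering needed before element-wise SIMD arithmetic.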