Execution of a single stand-alone instruction manipulates two n-bit
strings of data to pack or align the data. Decoding of the single
instruction identifies two registers of n bits each and a shift value,
preferably as parameters of the instruction. A first subset and a second
subset of the data, each of fewer than n bits, are selected by logical
shifting from the two registers, respectively, based solely upon the
shift value. Then, the
subsets are concatenated, preferably by a logical OR, to obtain an output
of n bits. The output may be aligned data or packed data, which is
particularly useful for performing a single operation on multiple sets of
data through parallel processing with a SIMD processor.
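
The shift-and-concatenate operation described above can be sketched in C as follows. This is only an illustrative software model, not the claimed hardware implementation: the function name shift_concat, the choice of n = 32, the shift directions (first register left, second register right), and the restriction 0 < shift < 32 are all assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical software model of the single instruction, assuming n = 32.
 * ra, rb : the two n-bit source registers
 * shift  : shift value in bits; assumed 0 < shift < 32 so that each
 *          selected subset has fewer than n bits (and so that neither
 *          C shift below is by the full width, which would be undefined)
 */
static uint32_t shift_concat(uint32_t ra, uint32_t rb, unsigned shift)
{
    /* Logical left shift: drops the top `shift` bits of ra and moves
       its remaining low (32 - shift) bits to the high end. */
    uint32_t first = ra << shift;

    /* Logical right shift: drops the low (32 - shift) bits of rb and
       moves its top `shift` bits to the low end. */
    uint32_t second = rb >> (32u - shift);

    /* The two subsets occupy disjoint bit positions, so a logical OR
       concatenates them into one n-bit output. */
    return first | second;
}
```

Under these assumptions, shift_concat(0x11223344, 0xAABBCCDD, 8) yields 0x223344AA, the word that starts one byte into the pair (alignment), and shift_concat(0x0000BEEF, 0xCAFE0000, 16) yields 0xBEEFCAFE, two 16-bit fields packed into one word.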