Pixel values of an image are loaded into main memory and cache of a
computer system. Two different instructions are used to load pixel values
of the image from the cache to a set of registers in a processor of the
system. A first instruction is used when loading an operand (containing
pixel values) that is aligned with a cache line boundary of the cache. A
second instruction is used when loading an operand (containing pixel
values) that is not aligned with the cache line boundary. The second
instruction can execute a load that splits across a cache line boundary
(a cache line split) without a significant
performance penalty relative to execution of the first instruction. Other
embodiments are also described and claimed.
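
The choice between the two load instructions can be illustrated with a minimal sketch in portable C. It is not the claimed implementation: the 64-byte line size, the 16-byte operand width, and every name below (`pixel_operand`, `load_aligned`, `load_unaligned`, `load_pixels`) are assumptions made for illustration, and the two load functions are software stand-ins for the first and second instructions rather than real machine instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values; the abstract does not specify either size. */
#define CACHE_LINE_SIZE 64u   /* bytes per cache line */
#define OPERAND_SIZE    16u   /* bytes per operand (16 pixel values) */

/* Hypothetical register-sized operand holding pixel values. */
typedef struct { uint8_t px[OPERAND_SIZE]; } pixel_operand;

/* True when addr lies exactly on a cache line boundary. */
static int is_line_aligned(uintptr_t addr) {
    return (addr % CACHE_LINE_SIZE) == 0;
}

/* True when an operand starting at addr spans two cache lines. */
static int splits_cache_line(uintptr_t addr) {
    return (addr % CACHE_LINE_SIZE) + OPERAND_SIZE > CACHE_LINE_SIZE;
}

/* Stand-in for the first instruction: requires an aligned operand. */
static pixel_operand load_aligned(const uint8_t *src) {
    pixel_operand v;
    assert(is_line_aligned((uintptr_t)src));
    memcpy(v.px, src, OPERAND_SIZE);
    return v;
}

/* Stand-in for the second instruction: accepts any address,
   including one whose operand splits a cache line. */
static pixel_operand load_unaligned(const uint8_t *src) {
    pixel_operand v;
    memcpy(v.px, src, OPERAND_SIZE);
    return v;
}

/* Select the load path from the operand's alignment.
   *used_second is set to 1 when the second path was taken. */
static pixel_operand load_pixels(const uint8_t *src, int *used_second) {
    if (is_line_aligned((uintptr_t)src)) {
        *used_second = 0;
        return load_aligned(src);
    }
    *used_second = 1;
    return load_unaligned(src);
}
```

With these assumed sizes, an operand at byte offset 56 within a line occupies bytes 56 through 71 and therefore splits across two lines, while one at offset 48 ends exactly at the boundary and does not. On x86 processors with SSE extensions, the two paths might plausibly correspond to an aligned load intrinsic such as `_mm_load_si128` and an unaligned one such as `_mm_lddqu_si128`; that mapping is an assumption, since the abstract names no specific instructions.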