A system and method are provided for vectorizing misaligned references in
compiled code for SIMD architectures that support only aligned loads and
stores. In this framework, a loop is first simdized as if the memory unit
imposes no alignment constraints. The compiler then inserts data
reorganization operations to satisfy the actual alignment requirements of
the hardware. Finally, the code generation algorithm generates SIMD code
based on the data reorganization graph, addressing realistic issues such
as runtime alignments, unknown loop bounds, residual iteration counts,
and multiple statements with arbitrary alignment combinations. Loop
peeling is used to reduce the computational overhead associated with
misaligned data. A loop prologue and a loop epilogue are peeled from
individual iterations of the simdized loop, and vector-splicing
instructions are applied to the peeled iterations, so that the
steady-state loop body incurs no additional computational overhead.
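
The building blocks named above (aligned-only loads and stores, a vector-splice operation, a runtime alignment, and an epilogue for residual iterations) can be illustrated with a small scalar emulation in C. This is a minimal sketch under stated assumptions, not the code generator described here: the type `vec`, the lane count `V`, and the helpers `load_aligned`, `store_aligned`, `splice`, and `shifted_copy` are all hypothetical names introduced for illustration. For simplicity the sketch keeps one splice in the main loop to realign the load stream, so it does not demonstrate the overhead-free steady state; it only shows how a misaligned stream can be served from aligned accesses and how a partial final vector can be stored.

```c
#include <assert.h>
#include <stddef.h>

enum { V = 4 };                       /* lanes per emulated vector register (assumed) */
typedef struct { int lane[V]; } vec;  /* hypothetical stand-in for a SIMD register */

/* Aligned-only load: i must be a multiple of V, mimicking the hardware rule. */
static vec load_aligned(const int *base, size_t i) {
    assert(i % V == 0);
    vec r;
    for (int k = 0; k < V; k++) r.lane[k] = base[i + k];
    return r;
}

/* Aligned-only store, with the same restriction on i. */
static void store_aligned(int *base, size_t i, vec v) {
    assert(i % V == 0);
    for (int k = 0; k < V; k++) base[i + k] = v.lane[k];
}

/* Vector splice: returns lanes shift .. shift+V-1 of the concatenation lo|hi. */
static vec splice(vec lo, vec hi, int shift) {
    vec r;
    for (int k = 0; k < V; k++) {
        int j = k + shift;
        r.lane[k] = (j < V) ? lo.lane[j] : hi.lane[j - V];
    }
    return r;
}

/* Computes a[i] = b[i + off] for 0 <= i < n, where 0 <= off < V is known
 * only at run time. Assumptions: index 0 of a and b is vector-aligned,
 * a holds at least n rounded up to a multiple of V elements, and b holds
 * V more than that, so every aligned access stays in bounds. */
static void shifted_copy(int *a, const int *b, size_t n, int off) {
    assert(off >= 0 && off < V);
    size_t full = n - n % V;          /* elements covered by whole vectors */
    vec cur = load_aligned(b, 0);     /* prologue: prime the first aligned block */
    size_t i = 0;

    /* Main loop: one new aligned load per iteration, reusing the previous one. */
    for (; i < full; i += V) {
        vec next = load_aligned(b, i + V);
        store_aligned(a, i, splice(cur, next, off));
        cur = next;
    }

    /* Epilogue: residual iterations. Merge the new values into the old
     * contents of a so elements at index n and beyond are preserved. */
    if (i < n) {
        vec next = load_aligned(b, i + V);
        vec val = splice(cur, next, off);
        vec old = load_aligned(a, i);
        for (size_t k = 0; k < n - i; k++) old.lane[k] = val.lane[k];
        store_aligned(a, i, old);
    }
}
```

On real hardware the emulated `splice` would correspond to a permute or shift-pair instruction, and the lane-wise merge in the epilogue to a select; the scalar loops here only stand in for those operations.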