X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
Patch by Farhana Aleen
Differential revision: http://reviews.llvm.org/D24681
llvm-svn: 284260