Fix a bug where we would construct shufflevector instructions addressing
invalid elements.
Differential Revision: https://reviews.llvm.org/D29313
llvm-svn: 293673
Summary:
This patch aims to generalize matching of the strided store accesses to more general masks.
The more general rule is to have consecutive accesses based on the stride:
[x, y, ... z, x+1, y+1, ...z+1, x+2, y+2, ...z+2, ...]
All elements in the masks need not form a contiguous space, there may be gaps.
As before, undefs are allowed and filled in with adjacent element loads.
Reviewers: HaoLiu, mssimpso
Subscribers: mkuper, delena, llvm-commits
Differential Revision: https://reviews.llvm.org/D23646
llvm-svn: 289573
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
Patch by Farhana Aleen
Differential revision: http://reviews.llvm.org/D24681
llvm-svn: 284260
When matching an interleaved load to an ldN pattern, the interleaved access
pass checks that all users of the load are shuffles. If the load is used by an
instruction other than a shuffle, the pass gives up and an ldN is not
generated. This patch considers users of the load that are extractelement
instructions. It attempts to modify the extracts to use one of the available
shuffles rather than the load. After the transformation, the load is only used
by shuffles and will then be matched with an ldN pattern.
Differential Revision: http://reviews.llvm.org/D20250
llvm-svn: 270142
Summary:
Interleaved access lowering removes a memory operation and a
sequence of vector shuffles and replaces it with a series of
memory operations. This should be always beneficial.
This pass in only enabled on ARM/AArch64.
Reviewers: rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D12145
llvm-svn: 246540
E.g. An interleaved load (Factor = 2):
%wide.vec = load <8 x i32>, <8 x i32>* %ptr
%v0 = shuffle <8 x i32> %wide.vec, <8 x i32> undef, <0, 2, 4, 6>
%v1 = shuffle <8 x i32> %wide.vec, <8 x i32> undef, <1, 3, 5, 7>
It can be transformed into a ld2 intrinsic in AArch64 backend or a vld2 intrinsic in ARM backend.
E.g. An interleaved store (Factor = 3):
%i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11>
store <12 x i32> %i.vec, <12 x i32>* %ptr
It can be transformed into a st3 intrinsic in AArch64 backend or a vst3 intrinsic in ARM backend.
Differential Revision: http://reviews.llvm.org/D10533
llvm-svn: 240751