This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.
A companion patch (D20684) removes/auto-upgrade the clang intrinsics.
Differential Revision: http://reviews.llvm.org/D20686
llvm-svn: 270973
Most often as not this is what it started out as, the extraction is zero-cost on AVX and the PMOVZX/PMOVSX folding logic is based around 128-bit loads.
llvm-svn: 270858
By making pointer extraction from a vector more expensive in the cost model,
we avoid the vectorization of a loop that is very likely to be memory-bound:
https://llvm.org/bugs/show_bug.cgi?id=27826
There are still bugs related to this, so we may need a more general solution
to avoid vectorizing obviously memory-bound loops when we don't have HW gather
support.
Differential Revision: http://reviews.llvm.org/D20601
llvm-svn: 270729
As noted in the review, there are still problems, so this doesn't the bug completely.
Differential Revision: http://reviews.llvm.org/D20529
llvm-svn: 270718
Followup to D20528 clang patch, this removes the (V)CVTDQ2PD(Y) and (V)CVTPS2PD(Y) llvm intrinsics and auto-upgrades to sitofp/fpext instead.
Differential Revision: http://reviews.llvm.org/D20568
llvm-svn: 270678
This isn't the complete fix, but it handles the trivial examples of duplicate vzero* ops in PR27823:
https://llvm.org/bugs/show_bug.cgi?id=27823
...and amusingly, the bogus cases already exist as regression tests, so let's take this baby step.
We'll need to do more in the general case where there's legitimate AVX usage in the function + there's
already a vzero in the code.
Differential Revision: http://reviews.llvm.org/D20477
llvm-svn: 270378
We performed a number of memory allocations each time getTTI was called,
remove them by using SmallString.
No functionality change intended.
llvm-svn: 270246
This patch is a first step towards a more extendible method of matching combined target shuffle masks.
Initially this just pulls out the existing basic mask matches and adds support for some 256/512 bit equivalents. Future patterns will require a number of features to be added but I wanted to keep this patch simple.
I hope we can avoid duplication between shuffle lowering and combining and share more complex pattern match functions in future commits.
Differential Revision: http://reviews.llvm.org/D19198
llvm-svn: 270230
This refactors the logic in X86 to avoid code duplication. It also
splits it in two steps: it first decides if a symbol is local to the DSO
and then uses that information to decide how to access it.
The first part is implemented by shouldAssumeDSOLocal. It is not in any
way specific to X86. In a followup patch I intend to move it to
somewhere common and reused it in other backends.
llvm-svn: 270209
Since the calls don't return, the instruction afterwards will never run,
and is just taking up unnecessary space in the binary.
Differential Revision: http://reviews.llvm.org/D20406
llvm-svn: 270109