Commit Graph

490 Commits

Author SHA1 Message Date
Matt Arsenault d81f557fe2 AMDGPU: Fold icmp/fcmp into icmp intrinsic
The typical use is a library vote function which
compares to 0. Fold the user condition into the intrinsic.

llvm-svn: 297650
2017-03-13 18:14:02 +00:00
Mikael Holmen 760dc9aba7 Remove sometimes faulty rewrite of memcpy in instcombine.
Summary:
Solves PR 31990.

The bad rewrite could replace a memcpy of one word with
 store i4 -1
while it should actually be
 store i8 -1

Hopefully opt and llc has improved enough so the original optimization
done by the code isn't needed anymore.

One already existing testcase is affected. It originally tested that
the memcpy was replaced with
 load double
but since we now remove that rewrite it will be
 load i64
instead.

Patch suggestion by Eli Friedman.

Reviewers: eli.friedman, majnemer, efriedma

Reviewed By: efriedma

Subscribers: efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D30254

llvm-svn: 296585
2017-03-01 06:45:20 +00:00
Matt Arsenault cdb468c0f9 AMDGPU: Basic folds for fmed3 intrinsic
Constant fold, canonicalize constants to RHS,
reduce to minnum/maxnum when inputs are nan/undef.

llvm-svn: 296409
2017-02-27 23:08:49 +00:00
Matt Arsenault d4bca1e9ef AMDGPU: Replace disabled exp inputs with undef
llvm-svn: 295914
2017-02-23 00:44:03 +00:00
Matt Arsenault f5262256a1 AMDGPU: Add replacement bfe intrinsics
llvm-svn: 295899
2017-02-22 23:04:58 +00:00
Matt Arsenault 1f17c66890 AMDGPU: Add cvt.pkrtz intrinsic
Convert llvm.SI.packf16 test uses

llvm-svn: 295797
2017-02-22 00:27:34 +00:00
Matt Arsenault 920576042d InstCombine: Canonicalize fast fmuladd to fmul + fadd
llvm-svn: 295353
2017-02-16 18:46:24 +00:00
Craig Topper 3731f4d173 [AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus intrinsics like it does 128/256-bit.
llvm-svn: 295294
2017-02-16 07:35:23 +00:00
Igor Laevsky a9b6872908 [InstComobineCalls] Fix buildbot failures after r294453.
Some targets don't support uint64_t options. Change type to unsigned.

Differential Revision: https://reviews.llvm.org/D28909

llvm-svn: 294461
2017-02-08 15:21:48 +00:00
Igor Laevsky 900ffa34c8 [InstCombineCalls] Unfold element atomic memcpy instruction
Differential Revision: https://reviews.llvm.org/D28909

llvm-svn: 294453
2017-02-08 14:32:04 +00:00
Igor Laevsky 4b317fa24e [InstCombineCalls] Remove zero length atomic memcpy intrinsics
Differential Revision: https://reviews.llvm.org/D28909

llvm-svn: 294452
2017-02-08 14:23:47 +00:00
Sanjoy Das e0e5795f6b [InstCombine] Allow InstCombine to merge adjacent guards
Summary:
If there are two adjacent guards with different conditions, we can
remove one of them and include its condition into the condition of
another one. This patch allows InstCombine to merge them by the
following pattern:

    guard(a); guard(b) -> guard(a & b).

Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29378

llvm-svn: 293778
2017-02-01 16:34:55 +00:00
Davide Italiano aec4617dc8 [Instcombine] Combine consecutive identical fences
Differential Revision:  https://reviews.llvm.org/D29314

llvm-svn: 293661
2017-01-31 18:09:05 +00:00
Justin Lebar 25ebe2d767 [NVPTX] [InstCombine] Add llvm_unreachable to appease MSVC.
llvm-svn: 293253
2017-01-27 02:04:07 +00:00
Justin Lebar e3ac0fb948 [NVPTX] Fix use-after-stack-free bug in InstCombineCalls.
Introduced in r293244.

llvm-svn: 293251
2017-01-27 01:49:39 +00:00
Justin Lebar 698c31b8db [NVPTX] Upgrade NVVM intrinsics in InstCombineCalls.
Summary:
There are many NVVM intrinsics that we can't entirely get rid of, but
that nonetheless often correspond to target-generic LLVM intrinsics.

For example, if flush denormals to zero (ftz) is enabled, we can convert
@llvm.nvvm.ceil.ftz.f to @llvm.ceil.f32.  On the other hand, if ftz is
disabled, we can't do this, because @llvm.ceil.f32 will be lowered to a
non-ftz PTX instruction.  In this case, we can, however, simplify the
non-ftz nvvm ceil intrinsic, @llvm.nvvm.ceil.f, to @llvm.ceil.f32.

These transformations are particularly useful because they let us
constant fold instructions that appear in libdevice, the bitcode library
that ships with CUDA and essentially functions as its libm.

Reviewers: tra

Subscribers: hfinkel, majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D28794

llvm-svn: 293244
2017-01-27 00:58:58 +00:00
Sanjoy Das 7516192a71 Revert a couple of InstCombine/Guard checkins
This change reverts:

r293061: "[InstCombine] Canonicalize guards for NOT OR condition"
r293058: "[InstCombine] Canonicalize guards for AND condition"

They miscompile cases like:

```
declare void @llvm.experimental.guard(i1, ...)

define void @test_guard_not_or(i1 %A, i1 %B) {
  %C = or i1 %A, %B
  %D = xor i1 %C, true
  call void(i1, ...) @llvm.experimental.guard(i1 %D, i32 20, i32 30)[ "deopt"() ]
  ret void
}
```

because they do transfer the `i32 20, i32 30` parameters to newly
created guard instructions.

llvm-svn: 293227
2017-01-26 23:38:11 +00:00
Craig Topper b6122122c9 [X86] Add demanded elts support for the inputs to pclmul intrinsic
This intrinsic uses bit 0 and bit 4 of an immediate argument to determine which bits of its inputs to read. This patch uses this information to simplify the demanded elements of the input vectors.

Differential Revision: https://reviews.llvm.org/D28979

llvm-svn: 293151
2017-01-26 05:17:13 +00:00
Artur Pilipenko b85f7a5d99 [InstCombine] Canonicalize guards for NOT OR condition
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine

Reviewed By: apilipenko

Differential Revision: https://reviews.llvm.org/D29075

Patch by Maxim Kazantsev.

llvm-svn: 293061
2017-01-25 14:45:12 +00:00
Simon Pilgrim 6f6b279109 [InstCombine][SSE] Add support for PACKSS/PACKUS constant folding
Differential Revision: https://reviews.llvm.org/D28949

llvm-svn: 293060
2017-01-25 14:37:24 +00:00
Artur Pilipenko 4df4c4a4aa [InstCombine] Canonicalize guards for AND condition
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine

Reviewed By: apilipenko

Differential Revision: https://reviews.llvm.org/D29074

Patch by Maxim Kazantsev.

llvm-svn: 293058
2017-01-25 14:20:52 +00:00
Artur Pilipenko e812ca00bb [InstCombine] Allow InstrCombine to remove one of adjacent guards if they are equivalent
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine

Reviewed By: majnemer, apilipenko

Differential Revision: https://reviews.llvm.org/D29071

Patch by Maxim Kazantsev.

llvm-svn: 293056
2017-01-25 14:12:12 +00:00
Simon Pilgrim 78f8630ac0 [InstCombine][X86] MULDQ/MULUDQ undef -> zero
Added early out for single undef input - we were already supporting (and testing) this in the constant folding code, we just do it quicker now

Drop undef handling from demanded elts code now that we handle it fully in InstCombiner::visitCallInst

llvm-svn: 292913
2017-01-24 11:07:41 +00:00
Matt Arsenault 954a624fb9 SimplifyLibCalls: Replace more unary libcalls with intrinsics
llvm-svn: 292855
2017-01-23 23:55:08 +00:00
Simon Pilgrim f6f3a36159 [InstCombine][X86] Add MULDQ/MULUDQ constant folding support
llvm-svn: 292793
2017-01-23 15:22:59 +00:00
Simon Pilgrim bb13fdabec [InstCombine][X86] MULDQ/MULUDQ undef -> zero
Match generic mul behaviour so that <X x i64> multiply and muldq/muludq pattern act the same

llvm-svn: 292784
2017-01-23 12:07:32 +00:00
Simon Pilgrim a50a93fcd0 [InstCombine][X86] Add MULDQ/MULUDQ undef handling
llvm-svn: 292627
2017-01-20 18:20:30 +00:00
Simon Pilgrim a22c3a1c0f [InstCombine] Remove unnecessary intrinsics demanded elts handling
As discussed on D28777 - we don't need to handle 'all element' shuffles inside InstCombiner::visitCallInst as InstCombiner::SimplifyDemandedVectorElts will do everything we need.

llvm-svn: 292365
2017-01-18 13:44:04 +00:00
Simon Pilgrim d4eb800b03 [InstCombine][X86][AVX] Add DemandedElts support for VPERMILPD/VPERMILPS instructions
Simplify a vpermilvar shuffle mask based on the elements of the mask that are actually demanded.

llvm-svn: 292209
2017-01-17 11:35:03 +00:00
Matt Arsenault 7233344c28 SimplifyLibCalls: Replace fabs libcalls with intrinsics
Add missing fabs(fpext) optimzation that worked with the call,
and also fixes it creating a second fpext when there were multiple
uses.

llvm-svn: 292172
2017-01-17 00:10:40 +00:00
Simon Pilgrim 73a68c25a0 [InstCombine][SSE] Add DemandedElts support for PSHUFB instructions
Simplify a pshufb shuffle mask based on the elements of the mask that are actually demanded.

Differential Revision: https://reviews.llvm.org/D28745

llvm-svn: 292101
2017-01-16 11:30:41 +00:00
Hal Finkel 8a9a783f2c Make processing @llvm.assume more efficient - Add affected values to the assumption cache
Here's my second try at making @llvm.assume processing more efficient. My
previous attempt, which leveraged operand bundles, r289755, didn't end up
working: it did make assume processing more efficient but eliminating the
assumption cache made ephemeral value computation too expensive. This is a
more-targeted change. We'll keep the assumption cache, but extend it to keep a
map of affected values (i.e. values about which an assumption might provide
some information) to the corresponding assumption intrinsics. This allows
ValueTracking and LVI to find assumptions relevant to the value being queried
without scanning all assumptions in the function. The fact that ValueTracking
started doing O(number of assumptions in the function) work, for every
known-bits query, has become prohibitively expensive in some cases.

As discussed during the review, this is a pragmatic fix that, longer term, will
likely be replaced by a more-principled solution (perhaps based on an extended
SSA form).

Differential Revision: https://reviews.llvm.org/D28459

llvm-svn: 291671
2017-01-11 13:24:24 +00:00
Matt Arsenault 3f509042b0 InstCombine: Set operands instead of creating new call
llvm-svn: 291612
2017-01-10 23:17:52 +00:00
Matt Arsenault 3bdd75d01e InstCombine: Fold cos(-x) -> cos(x)
Also cos(fabs(x)) -> cos(x)

llvm-svn: 291022
2017-01-04 22:49:03 +00:00
Matt Arsenault 56ff4839ae InstCombine: Fold fabs on select of constants
llvm-svn: 290913
2017-01-03 22:40:34 +00:00
Sanjay Patel f0d1e77373 [InstCombine] use 'match' to reduce code bloat; NFCI
I wrote this patch before seeing the comment in:
https://reviews.llvm.org/D27114
...that suggests we should actually be canonicalizing the other way.

So just in case we decide this is the right way, we might as well
have a cleaner implementation.

llvm-svn: 290912
2017-01-03 22:25:31 +00:00
Matt Arsenault b264c94963 InstCombine: Add fma with constant transforms
DAGCombine already does these.

llvm-svn: 290860
2017-01-03 04:32:35 +00:00
Matt Arsenault 1cc294c85d InstCombine: Add fma + fabs/fneg transforms
fma (fneg x), (fneg y), z -> fma x, y, z
fma (fabs x), (fabs x), z -> fma x, x, z

llvm-svn: 290859
2017-01-03 04:32:31 +00:00
Craig Topper d00db69227 [InstCombine][AVX-512] Teach InstCombine that llvm.x86.avx512.vcomi.sd and llvm.x86.avx512.vcomi.ss don't use the upper elements of their input.
This was already done for the SSE/SSE2 version of the intrinsics.

llvm-svn: 290776
2016-12-31 00:45:06 +00:00
Craig Topper 991636312b [InstCombine][AVX-512] When turning intrinsics with masking into native IR, don't emit a select if the mask is known to be all ones.
This saves InstCombine the burden of having to optimize the select later.

llvm-svn: 290774
2016-12-30 23:06:28 +00:00
Craig Topper 72f2d4e8d6 [InstCombine][X86] Add DemandedElts support for 512-bit PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.

This builds on r290554 which added supported for 128 and 256-bit.

llvm-svn: 290582
2016-12-27 05:30:09 +00:00
Craig Topper 7f8540b5e7 [AVX-512][InstCombine] Teach InstCombine to turn masked scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.

llvm-svn: 290566
2016-12-27 01:56:30 +00:00
Craig Topper 020b228155 [AVX-512][InstCombine] Teach InstCombine to turn packed add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
llvm-svn: 290559
2016-12-27 00:23:16 +00:00
Simon Pilgrim c9cf7fc7a4 [InstCombine][X86] Add DemandedElts support for PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.

Differential Revision: https://reviews.llvm.org/D28119

llvm-svn: 290554
2016-12-26 23:28:17 +00:00
Craig Topper 7b788ada2d [AVX-512][InstCombine] Teach InstCombine to turn scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.

I'll do the same thing for packed add/sub/mul/div in a future patch.

Reviewers: delena, RKSimon, zvi, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27879

llvm-svn: 290535
2016-12-26 06:33:19 +00:00
Craig Topper e328045711 [AVX-512][InstCombine] Teach InstCombine to converted masked vpermv intrinsics into shufflevector instructions
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.

We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.

Reviewers: zvi, delena, spatel, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27825

llvm-svn: 290530
2016-12-25 23:58:57 +00:00
George Burgess IV 3f08914e7e [Analysis] Centralize objectsize lowering logic.
We're currently doing nearly the same thing for @llvm.objectsize in
three different places: two of them are missing checks for overflow,
and one of them could subtly break if InstCombine gets much smarter
about removing alloc sites. Seems like a good idea to not do that.

llvm-svn: 290214
2016-12-20 23:46:36 +00:00
Daniel Jasper aec2fa352f Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086
2016-12-19 08:22:17 +00:00
Craig Topper ab5f355d8c [AVX-512][InstCombine] Add masked scalar FMA intrinsics to SimplifyDemandedVectorElts.
llvm-svn: 289759
2016-12-15 03:49:45 +00:00
Hal Finkel 3ca4a6bcf1 Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756
2016-12-15 03:02:15 +00:00
Hal Finkel cb9f78e1c3 Make processing @llvm.assume more efficient by using operand bundles
There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).

Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.

At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.

Differential Revision: https://reviews.llvm.org/D27259

llvm-svn: 289755
2016-12-15 02:53:42 +00:00
Craig Topper aeaa52cc11 [X86][InstCombine] Handle demanded elements for operand of AVX-512 scalar floating point to integer conversion intrinsics.
llvm-svn: 289639
2016-12-14 07:46:12 +00:00
Craig Topper 268b3abe6d [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle masked scalar add/sub/mul/div/max/min intrinsics better.
Now we can remove these intrinsics if element 0 isn't used. Also fix undef element tracking.

llvm-svn: 289636
2016-12-14 06:06:58 +00:00
Craig Topper dfd268d76b [X86][InstCombine] Handle scalar fmadd intrinsics correctly in SimplifyDemandedVectorElts.
Now we pass a modified version of DemandedElts to each operand and we calculate undef elts correctly.

llvm-svn: 289632
2016-12-14 05:43:05 +00:00
Craig Topper eb6a20e79e [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar round intrinsics more correctly.
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Similarly we clear bit 0 for optimizing operand 0.

Also calculate UndefElts correctly.

Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support.

llvm-svn: 289629
2016-12-14 03:17:30 +00:00
Craig Topper a0372dec26 [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar min/max/cmp intrinsics more correctly.
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused.

Also calculate UndefElts correctly.

Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support.

llvm-svn: 289628
2016-12-14 03:17:27 +00:00
Craig Topper ac75bca1eb [X86][InstCombine] Fix SimplifyDemandedVectorElts to handle frcz scalar intrinsics correctly.
Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed.

Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support.

llvm-svn: 289523
2016-12-13 07:45:45 +00:00
Craig Topper 61b280e7b0 [X86][InstCombine] Teach InstCombineCalls to simplify demanded elements for scalar FMA intrinsics.
These intrinsics don't read the upper bits of their second and third inputs so we can try to simplify them.

llvm-svn: 289372
2016-12-11 07:42:06 +00:00
Craig Topper d96395365a [AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded for scalar cmp intrinsics with masking and rounding.
These intrinsics don't read the upper elements of their first and second input. These are slightly different the the SSE version which does use the upper bits of its first element as passthru bits since the result goes to an XMM register. For AVX-512 the result goes to a mask register instead.

llvm-svn: 289371
2016-12-11 07:42:04 +00:00
Craig Topper 790d0fa569 [AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded elements for scalar add,div,mul,sub,max,min intrinsics with masking and rounding.
These intrinsics don't read the upper bits of their second input. And the third input is the passthru for masking and that only uses the lower element as well.

llvm-svn: 289370
2016-12-11 07:42:01 +00:00
Craig Topper 58917f3508 [AVX-512][InstCombine] Add 512-bit vpermilvar intrinsics to InstCombineCalls to match 128 and 256-bit.
llvm-svn: 289354
2016-12-11 01:59:36 +00:00
Craig Topper 9a63d7ade5 [X86][InstCombine] Teach InstCombineCalls to turn pshufb intrinsic into a shufflevector if the indices are constant.
llvm-svn: 289348
2016-12-11 00:23:50 +00:00
David Majnemer d5648c7a7d Replace some callers of setTailCall with setTailCallKind
We were a little sloppy with adding tailcall markers.  Be more
consistent by using setTailCallKind instead of setTailCall.

llvm-svn: 287955
2016-11-25 22:35:09 +00:00
Craig Topper 1de753f7f5 [InstCombine][AVX-512] Teach InstCombineCalls how to handle the intrinsics for variable shift with 16-bit elements.
This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional instrinsics to the switches.

llvm-svn: 287316
2016-11-18 06:04:33 +00:00
Craig Topper 6910fa0ef4 [X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul
Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file.

Reviewers: zvi, delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26660

llvm-svn: 287083
2016-11-16 05:24:10 +00:00
Craig Topper b4173a5a70 [InstCombine][AVX-512] Teach InstCombineCalls to handle the new unmasked AVX-512 variable shift intrinsics.
llvm-svn: 286755
2016-11-13 07:26:19 +00:00
Craig Topper 8b831cbb2a [InstCombine][AVX-512] Expand vector shift handling to work on the AVX-512 shift by immediate and shift by single value.
This does not include support for the AVX-512 variable shifts. That will be coming in a future patch.

llvm-svn: 286739
2016-11-13 01:51:55 +00:00
Andrea Di Biagio f3fd316223 [InstCombine][SSE4a] Fix assertion failure in the insertq/insertqi combining logic.
This fixes a similar issue to the one already fixed by r280804
(revieved in D24256). Revision 280804 fixed the problem with unsafe dyn_casts
in the extrq/extrqi combining logic. However, it turns out that even the
insertq/insertqi logic was affected by the same problem.

llvm-svn: 280807
2016-09-07 12:47:53 +00:00
Andrea Di Biagio 8df5b9cf48 [InstCombine][SSE4a] Fix assertion failure caused by unsafe dyn_casts on the operands of extrq/extrqi intrinsic calls.
This patch fixes an assertion failure caused by unsafe dynamic casts on the
constant operands of sse4a intrinsic calls to extrq/extrqi

The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently
checks if the input operands are constants. Internally, that logic relies on
dyn_casts of values returned by calls to method Constant::getAggregateElement.
However, method getAggregateElemet may return nullptr if the constant element
cannot be retrieved. So, all the dyn_casts can potentially fail. This is what
happens for example if a constexpr value is passed in input to an extrq/extrqi
intrinsic call.

This patch fixes the problem by using a dyn_cast_or_null (instead of a simple
dyn_cast) on the result of each call to Constant::getAggregateElement.

Added reproducible test cases to x86-sse4a.ll.

Differential Revision: https://reviews.llvm.org/D24256

llvm-svn: 280804
2016-09-07 12:03:03 +00:00
Dorit Nuzman abd15f69b2 [InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacing
memcpy with ld/st.

When InstCombine replaces a memcpy with loads+stores it does not copy over the
llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes
that.

Differential Revision: https://reviews.llvm.org/D23499

llvm-svn: 280617
2016-09-04 07:49:39 +00:00
Dorit Nuzman 7673ba7ac2 Test commit.
llvm-svn: 280615
2016-09-04 07:06:00 +00:00
Matt Arsenault 46a0382ab2 AMDGPU: Do basic folding of class intrinsic
This allows more of the OCML builtin library to be
constant folded.

llvm-svn: 280586
2016-09-03 07:06:58 +00:00
Amaury Sechet 763c59dc9a Make cltz and cttz zero undef when the operand cannot be zero in InstCombine
Summary: Also add popcount(n) == bitsize(n)  -> n == -1 transformation.

Reviewers: majnemer, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23134

llvm-svn: 279141
2016-08-18 20:43:50 +00:00
Justin Bogner cd1d5aaf2e Replace a few more "fall through" comments with LLVM_FALLTHROUGH
Follow up to r278902. I had missed "fall through", with a space.

llvm-svn: 278970
2016-08-17 20:30:52 +00:00
Eugene Zelenko cdc7161281 Fix some Clang-tidy modernize and Include What You Use warnings.
Differential revision: https://reviews.llvm.org/D23291

llvm-svn: 278364
2016-08-11 17:20:18 +00:00
Sanjay Patel 38ae83de38 fix comment; NFC
llvm-svn: 278342
2016-08-11 15:23:56 +00:00
Sanjay Patel e3c335cbed use auto* with dyn_cast ; NFC
llvm-svn: 278340
2016-08-11 15:21:21 +00:00
Sanjay Patel 5a470950b9 getParent()->getParent() == getFunction() ; NFC
llvm-svn: 278339
2016-08-11 15:16:06 +00:00
Sanjay Patel 8e3ab17c44 [InstCombine] refactor ctlz/cttz folds (NFCI)
Note that this fold really belongs in InstSimplify.
Refactoring here anyway as an intermediate step because
there's a planned addition to this function in D23134.

Differential Revision: https://reviews.llvm.org/D23223

llvm-svn: 277883
2016-08-05 22:42:46 +00:00
Justin Bogner 9979840f59 InstCombine: Replace some never-null pointers with references. NFC
llvm-svn: 277792
2016-08-05 01:06:44 +00:00
Vitaly Buka 0ab23cf1c8 Do not remove empty lifetime.start/lifetime.end ranges
Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.

This is possible for code like this:
for (int i = 0; i < 3; i++) {
  int x;
  p = &x;
}

"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.

PR27453

Reviewers: eugenis

Differential Revision: https://reviews.llvm.org/D22842

llvm-svn: 277072
2016-07-28 22:59:03 +00:00
Vitaly Buka 2fae6a7702 Should be committed as one CL.
This reverts commits r277068 r277067 r277066.

llvm-svn: 277071
2016-07-28 22:59:01 +00:00
Vitaly Buka f0500b6ae5 Do not remove empty lifetime.start/lifetime.end ranges
Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.

This is possible for code like this:
for (int i = 0; i < 3; i++) {
  int x;
  p = &x;
}

"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.

PR27453

Reviewers: eugenis

Differential Revision: https://reviews.llvm.org/D22842

llvm-svn: 277068
2016-07-28 22:50:48 +00:00
Vitaly Buka 3645793872 maned
llvm-svn: 277067
2016-07-28 22:50:45 +00:00
Vitaly Buka caca9da4ff range
llvm-svn: 277066
2016-07-28 22:50:43 +00:00
David Majnemer 666aa945a5 [InstCombine] Masked loads with undef masks can fold to normal loads
We were able to fold masked loads with an all-ones mask to a normal
load.  However, we couldn't turn a masked load with a mask with mixed
ones and undefs into a normal load.

llvm-svn: 275380
2016-07-14 06:58:42 +00:00
David Majnemer d77a3b61eb Move a transform from InstCombine to InstSimplify.
This transform doesn't require any new instructions, it can safely live
in InstSimplify.

llvm-svn: 275344
2016-07-13 23:32:53 +00:00
Sean Silva 45835e731d Remove dead TLI arg of isKnownNonNull and propagate deadness. NFC.
This actually uncovered a surprisingly large chain of ultimately unused
TLI args.
From what I can gather, this argument is a remnant of when
isKnownNonNull would look at the TLI directly.
The current approach seems to be that InferFunctionAttrs runs early in
the pipeline and uses TLI to annotate the TLI-dependent non-null
information as return attributes.

This also removes the dependence of functionattrs on TLI altogether.

llvm-svn: 274455
2016-07-02 23:47:27 +00:00
Matt Arsenault 802ebcb4bb InstCombine: Don't strip convergent from intrinsic callsites
Specific instances of intrinsic calls may want to be convergent, such
as certain register reads but the intrinsic declaration is not.

llvm-svn: 273188
2016-06-20 19:04:44 +00:00
Craig Topper 99d1eab327 [IR] Require ArrayRef of 'uint32_t' instead of 'int' for the mask argument for one of the signatures of CreateShuffleVector. This better emphasises that you can't use it for the -1 as undef behavior.
llvm-svn: 272491
2016-06-12 00:41:19 +00:00
Benjamin Kramer 46e38f3678 Avoid copies of std::strings and APInt/APFloats where we only read from it
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.

llvm-svn: 272126
2016-06-08 10:01:20 +00:00
Simon Pilgrim db9893fb90 [InstCombine][AVX2] Add support for simplifying AVX2 per-element shifts to native shifts
Unlike native shifts, the AVX2 per-element shift instructions VPSRAV/VPSRLV/VPSLLV handle out of range shift values (logical shifts set the result to zero, arithmetic shifts splat the sign bit).

If the shift amount is constant we can sometimes convert these instructions to native shifts:

1 - if all shift amounts are in range then the conversion is trivial.
2 - out of range arithmetic shifts can be clamped to the (bitwidth - 1) (a legal shift amount) before conversion.
3 - logical shifts just return zero if all elements have out of range shift amounts.

In addition, UNDEF shift amounts are handled - either as an UNDEF shift amount in a native shift or as an UNDEF in the logical 'all out of range' zero constant special case for logical shifts.

Differential Revision: http://reviews.llvm.org/D19675

llvm-svn: 271996
2016-06-07 10:27:15 +00:00
Simon Pilgrim 91e3ac8293 [InstCombine][SSE] Add MOVMSK constant folding (PR27982)
This patch adds support for folding undef/zero/constant inputs to MOVMSK instructions.

The SSE/AVX versions can be fully folded, but the MMX version can only handle undef inputs.

Differential Revision: http://reviews.llvm.org/D20998

llvm-svn: 271990
2016-06-07 08:18:35 +00:00
Craig Topper 8287fd8abd [X86] Remove SSE/AVX unaligned store intrinsics as clang no longer uses them. Auto upgrade to native unaligned store instructions.
llvm-svn: 271236
2016-05-30 23:15:56 +00:00
Simon Pilgrim 9602d678cb [X86][SSE] (Reapplied) Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm)
This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.

Reapplied now that the the companion patch (D20684) removes/auto-upgrade the clang intrinsics has been committed.

Differential Revision: http://reviews.llvm.org/D20686

llvm-svn: 271131
2016-05-28 18:03:41 +00:00
Simon Pilgrim 4642a57fbf Revert: r270973 - [X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm)
llvm-svn: 270976
2016-05-27 09:02:25 +00:00
Simon Pilgrim c013e5737b [X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm)
This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.

A companion patch (D20684) removes/auto-upgrade the clang intrinsics.

Differential Revision: http://reviews.llvm.org/D20686

llvm-svn: 270973
2016-05-27 08:49:15 +00:00
Craig Topper a423aa4642 [X86] Add the AVX storeu intrinsics to InstCombine and LoopStrengthReduce in the same places that the SSE/SSE2 storeu intrinsics appear.
I don't really know how to test this. Just seemed like we should be consistent.

llvm-svn: 270819
2016-05-26 04:28:45 +00:00
Arnaud A. de Grandmaison 333ef381b8 [InstCombine] Remove trivially empty va_start/va_end and va_copy/va_end ranges.
When a va_start or va_copy is immediately followed by a va_end (ignoring
debug information or other start/end in between), then it is safe to
remove the pair. As this code shares some commonalities with the lifetime
markers, this has been factored to helper functions.

This InstCombine pattern kicks-in 3 times when running the LLVM test
suite.

llvm-svn: 269033
2016-05-10 09:24:49 +00:00
Simon Pilgrim ca140b17cb [InstCombine][SSE] Added support to VPERMD/VPERMPS to shuffle combine to accept UNDEF elements.
llvm-svn: 268206
2016-05-01 20:43:02 +00:00