Commit Graph

2795 Commits

Author SHA1 Message Date
Sanjay Patel 604cb9e3ed [InstCombine] refactor folds for mul with negated operands; NFCI
This keeps with our current usage of 'match' and is easier to see that
the optional NSW only applies in the non-constant operand case. 

llvm-svn: 325140
2018-02-14 16:50:55 +00:00
Elena Demikhovsky 945b7e5aa6 Adding a width of the GEP index to the Data Layout.
Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout.
p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits.
The index size parameter is optional, if not specified, it is equal to the pointer size.

Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width.
It works fine if you can convert pointer to integer for address calculation and all registered targets do this.
But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout.
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html

I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account.
This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size.

Differential Revision: https://reviews.llvm.org/D42123

llvm-svn: 325102
2018-02-14 06:58:08 +00:00
Sanjay Patel 7558d860af [InstCombine] (lshr X, 31) * Y --> (ashr X, 31) & Y
This replaces the bit-tracking based fold that did the same thing,
but it only worked for scalars and not directly. 

There is no evidence in existing regression tests that the greater 
power of bit-tracking was needed here, but we should be aware of 
this potential loss of optimization.

llvm-svn: 325062
2018-02-13 22:24:37 +00:00
Sanjay Patel cb8ac00f73 [InstCombine] (bool X) * Y --> X ? Y : 0
This is both a functional improvement for vectors and an
efficiency improvement for scalars. The existing code below
the new folds does the same thing for scalars, but in an 
indirect and expensive way.

llvm-svn: 325048
2018-02-13 20:41:22 +00:00
Simon Pilgrim be0dd72620 [InstCombine] Simplify getLogBase2 case for scalar/splats. NFCI.
llvm-svn: 325003
2018-02-13 13:16:26 +00:00
Daniel Neilson 2363da9236 [InstCombine] Simplify MemTransferInst's source and dest alignments separately
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
InstCombine pass to cease using the deprecated MemoryIntrinsic::getAlignment() method, and
instead we use the separate getSourceAlignment and getDestAlignment APIs to simplify
the source and destination alignment attributes separately.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

Reviewers: majnemer, bollu, efriedma

Reviewed By: efriedma

Subscribers: efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D42871

llvm-svn: 324960
2018-02-12 23:06:55 +00:00
Sanjay Patel 4a4f35f324 [InstCombine] X / (X * Y) --> 1.0 / Y
This is similar to the instsimplify fold added with D42385 
( rL323716 )
...but this can't be in instsimplify because we're creating/morphing
a different instruction.

llvm-svn: 324927
2018-02-12 19:39:21 +00:00
Sanjay Patel 1998cc6a47 [InstCombine] various clean-ups for div transforms; NFC
llvm-svn: 324922
2018-02-12 18:38:35 +00:00
Sanjay Patel 39059d2630 [InstCombine] various clean-ups for commonIDivTransforms; NFC
llvm-svn: 324891
2018-02-12 14:14:56 +00:00
Sanjay Patel 510d647a4d [InstCombine] X / (X * Y) -> 1 / Y if the multiplication does not overflow
The related cases for (X * Y) / X were handled in rL124487.

https://rise4fun.com/Alive/6k9

The division in these tests is subsequently eliminated by existing instcombines
for 1/X.

llvm-svn: 324843
2018-02-11 17:20:32 +00:00
Simon Pilgrim 19495198af [InstCombine] Add constant vector support for ~(C >> Y) --> ~C >> Y
Includes adding m_NonNegative constant pattern matcher

llvm-svn: 324825
2018-02-10 21:46:09 +00:00
Simon Pilgrim 9620f4b746 [InstCombine] Add constant vector support for X udiv C, where C >= signbit
llvm-svn: 324728
2018-02-09 10:43:59 +00:00
Simon Pilgrim a54e8e429b [InstCombine] visitSRem - use m_Negative(APInt) helper. NFCI.
llvm-svn: 324636
2018-02-08 19:00:45 +00:00
Simon Pilgrim 1889f26b94 [InstCombine] Add m_Negative pattern matching
Allows us to add non-uniform constant vector support for "X urem C -> X < C ? X : X - C, where C >= signbit."

llvm-svn: 324631
2018-02-08 18:36:01 +00:00
Simon Pilgrim 2a90acd17a [InstCombine] Fix issue with X udiv (POW2_C1 << N) for non-splat constant vectors
foldUDivShl was assuming that the input was a scalar or a splat constant

llvm-svn: 324613
2018-02-08 15:19:38 +00:00
Simon Pilgrim 94cc89d5f2 [InstCombine] Fix issue with X udiv 2^C -> X >> C for non-splat constant vectors
foldUDivPow2Cst was assuming that the input was a scalar or a splat constant

llvm-svn: 324608
2018-02-08 14:46:10 +00:00
Simon Pilgrim 4039dbea77 Fix unused variable warning.
llvm-svn: 324605
2018-02-08 14:24:26 +00:00
Simon Pilgrim 0b9f3912ce [InstCombine] Improve mul(x, pow2) -> shl combine for vector constants
Refactor getLogBase2Vector into getLogBase2 to accept all scalars/vectors. Generalize from ConstantDataVector to support all constant vectors.

llvm-svn: 324603
2018-02-08 14:10:01 +00:00
Sanjay Patel 49aafec2e6 [InstCombine] don't try to evaluate instructions with >1 use (revert r324014)
This example causes a compile-time explosion:

define i16 @foo(i16 %in) {
  %x = zext i16 %in to i32
  %a1 = mul i32 %x, %x
  %a2 = mul i32 %a1, %a1
  %a3 = mul i32 %a2, %a2
  %a4 = mul i32 %a3, %a3
  %a5 = mul i32 %a4, %a4
  %a6 = mul i32 %a5, %a5
  %a7 = mul i32 %a6, %a6
  %a8 = mul i32 %a7, %a7
  %a9 = mul i32 %a8, %a8
  %a10 = mul i32 %a9, %a9
  %a11 = mul i32 %a10, %a10
  %a12 = mul i32 %a11, %a11
  %a13 = mul i32 %a12, %a12
  %a14 = mul i32 %a13, %a13
  %a15 = mul i32 %a14, %a14
  %a16 = mul i32 %a15, %a15
  %a17 = mul i32 %a16, %a16
  %a18 = mul i32 %a17, %a17
  %a19 = mul i32 %a18, %a18
  %a20 = mul i32 %a19, %a19
  %a21 = mul i32 %a20, %a20
  %a22 = mul i32 %a21, %a21
  %a23 = mul i32 %a22, %a22
  %a24 = mul i32 %a23, %a23
  %T = trunc i32 %a24 to i16
  ret i16 %T
}

 

llvm-svn: 324276
2018-02-05 21:50:32 +00:00
Sanjay Patel e9a153f414 [InstCombine] add unsigned saturation subtraction canonicalizations
This is the instcombine part of unsigned saturation canonicalization.
Backend patches already commited: 
https://reviews.llvm.org/D37510
https://reviews.llvm.org/D37534

It converts unsigned saturated subtraction patterns to forms recognized 
by the backend:
(a > b) ? a - b : 0 -> ((a > b) ? a : b) - b)
(b < a) ? a - b : 0 -> ((a > b) ? a : b) - b)
(b > a) ? 0 : a - b -> ((a > b) ? a : b) - b)
(a < b) ? 0 : a - b -> ((a > b) ? a : b) - b)
((a > b) ? b - a : 0) -> - ((a > b) ? a : b) - b)
((b < a) ? b - a : 0) -> - ((a > b) ? a : b) - b)
((b > a) ? 0 : b - a) -> - ((a > b) ? a : b) - b)
((a < b) ? 0 : b - a) -> - ((a > b) ? a : b) - b)

Patch by Yulia Koval!

Differential Revision: https://reviews.llvm.org/D41480

llvm-svn: 324255
2018-02-05 17:53:29 +00:00
Sanjay Patel 2329fcd293 [InstCombine] only allow narrow/wide evaluation of values with >1 use if that user is a binop
There was a logic hole in D42739 / rL324014 because we're not accounting for select and phi
instructions that might have repeated operands. This is likely a source of an infinite loop.
I haven't manufactured a test case to prove that, but it should be safe to speculatively limit
this transform to binops while we try to create that test.

llvm-svn: 324252
2018-02-05 17:16:50 +00:00
David Green 7174023f57 [InstCombine] Allow common type conversions to i8/i16/i32
This, in instcombine, allows conversions to i8/i16/i32 (very
common cases) even if the resulting type is not legal according
to the data layout. This can often open up extra combine
opportunities.

Differential Revision: https://reviews.llvm.org/D42424

llvm-svn: 324174
2018-02-03 16:51:03 +00:00
Daniel Neilson 38af2eed51 [InstCombine] Use getDestAlignment in SimplifyMemSet (NFC)
Summary:
Small NFC change to change the name of the function used getting and setting
the alignment of a memset.

llvm-svn: 324148
2018-02-02 22:03:03 +00:00
Sanjay Patel 1ea8697cdf [InstCombine] simplify logic for swapMayExposeCSEOpportunities; NFCI
llvm-svn: 324122
2018-02-02 19:08:12 +00:00
Sanjay Patel 4ccae1cb2b [InstCombine] fix typos, formatting; NFC
llvm-svn: 324118
2018-02-02 18:39:05 +00:00
Sanjay Patel 3343fcef86 [InstCombine] allow multi-use values in canEvaluate* if all uses are in 1 inst
This is the enhancement suggested in D42536 to fix a shortcoming in 
regular InstCombine's canEvaluate* functionality.
When we have multiple uses of a value, but they're all in one instruction, we can 
allow that expression to be narrowed or widened for the same cost as a single-use 
value.

AFAICT, this can only matter for multiply: sub/and/or/xor/select would be simplified 
away if the operands are the same value; add becomes shl; shifts with a variable shift 
amount aren't handled.

Differential Revision: https://reviews.llvm.org/D42739

llvm-svn: 324014
2018-02-01 21:55:53 +00:00
David Green 184df0c35d Revert commit rL323951
Looks like it's causing timeouts out on at least ppc64le
buildbots.

llvm-svn: 323959
2018-02-01 13:05:25 +00:00
David Green e11f0545db [InstCombine] Allow common type conversions to i8/i16/i32
This, in instcombine, allows conversions to i8/i16/i32 (very
common cases) even if the resulting type is not legal according
to the data layout. This can often open up extra combine
opportunities.

Differential Revision: https://reviews.llvm.org/D42424

llvm-svn: 323951
2018-02-01 11:06:18 +00:00
Marek Olsak 13e4741275 AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D41663

llvm-svn: 323908
2018-01-31 20:18:04 +00:00
Sanjay Patel 1b66deea16 [InstCombine] reduce code duplication for canEvaluate* functions; NFCI
We'd have to make the change suggested in D42536 3x otherwise. 

llvm-svn: 323877
2018-01-31 14:55:53 +00:00
Vedant Kumar e48597a50e [InstCombine] Preserve debug values for eliminable casts
A cast from A to B is eliminable if its result is casted to C, and if
the pair of casts could just be expressed as a single cast. E.g here,
%c1 is eliminable:

  %c1 = zext i16 %A to i32
  %c2 = sext i32 %c1 to i64

InstCombine optimizes away eliminable casts. This patch teaches it to
insert a dbg.value intrinsic pointing to the final result, so that local
variables pointing to the eliminable result are preserved.

Differential Revision: https://reviews.llvm.org/D42566

llvm-svn: 323570
2018-01-26 22:02:52 +00:00
Sanjay Patel 1d68112c4b [InstCombine] narrow masked zexted binops (PR35792)
This is guarded by shouldChangeType(), so the tests show that
we don't do the fold if the narrower type is not legal. Note
that there is a proposal (D42424) that would change the results
for the specific cases shown in these tests. That difference is
also discussed in PR35792:
https://bugs.llvm.org/show_bug.cgi?id=35792

Alive proofs for the cases handled here as well as the bitwise 
logic binops that we should already do better on:
https://rise4fun.com/Alive/c97
https://rise4fun.com/Alive/Lc5E
https://rise4fun.com/Alive/kdf

llvm-svn: 323437
2018-01-25 16:34:36 +00:00
Sanjay Patel 9530f18864 [InstCombine] (X << Y) / X -> 1 << Y
...when the shift is known to not overflow with the matching
signed-ness of the division.

This closes an optimization gap caused by canonicalizing mul
by power-of-2 to shl as shown in PR35709:
https://bugs.llvm.org/show_bug.cgi?id=35709

Patch by Anton Bikineev!

Differential Revision: https://reviews.llvm.org/D42032

llvm-svn: 323068
2018-01-21 16:14:51 +00:00
Daniel Neilson 1e68724d24 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Summary:
 This is a resurrection of work first proposed and discussed in Aug 2015:
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.

 In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
   require that the alignments for source & dest be equal.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

 The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
   source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
        and those that use use MemIntrinsicInst::[get|set]Alignment() to use
        getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
        MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

llvm-svn: 322965
2018-01-19 17:13:12 +00:00
John Brawn 2867bd72c0 [InstCombine] Make foldSelectOpOp able to handle two-operand getelementptr
Three (or more) operand getelementptrs could plausibly also be handled, but
handling only two-operand fits in easily with the existing BinaryOperator
handling.

Differential Revision: https://reviews.llvm.org/D39958

llvm-svn: 322930
2018-01-19 10:05:15 +00:00
Sanjay Patel aa766efd09 [InstCombine] fix demanded-bits propagation for zext/trunc
I was comparing the demanded-bits implementations between InstCombine
and TargetLowering as part of investigating questions in D42088 and
noticed that this was wrong in IR. We were losing all of the prior
known bits when we got back to the 'zext'.

llvm-svn: 322662
2018-01-17 14:39:28 +00:00
Daniel Neilson 2409d24201 [NFC] Change MemIntrinsicInst::setAlignment() to take an unsigned instead of a Constant
Summary:
 In preparation for https://reviews.llvm.org/D41675 this NFC changes this
prototype of MemIntrinsicInst::setAlignment() to accept an unsigned instead
of a Constant.

llvm-svn: 322403
2018-01-12 21:33:37 +00:00
Benjamin Kramer 738e6e7cb0 [InstCombine] Apply the fix from r322284 for sin / cos -> tan too
llvm-svn: 322285
2018-01-11 15:33:21 +00:00
Benjamin Kramer 44993ede60 [InstCombine] For cos/sin -> tan copy attributes from cos instead of the
parent function

Ideally we should merge the attributes from the functions somehow, but
this is obviously an improvement over taking random attributes from the
caller which will trip up the verifier if they're nonsensical for an
unary intrinsic call.

llvm-svn: 322284
2018-01-11 15:19:02 +00:00
Dmitry Venikov e5fbf591a7 [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x)
Summary: This patch enables folding sin(x) / cos(x) -> tan(x), cos(x) / sin(x) -> 1 / tan(x) under -ffast-math flag

Reviewers: hfinkel, spatel

Reviewed By: spatel

Subscribers: andrew.w.kaylor, efriedma, scanon, llvm-commits

Differential Revision: https://reviews.llvm.org/D41286

llvm-svn: 322255
2018-01-11 06:33:00 +00:00
Sanjay Patel 6fb1357c35 [InstCombine] weaken assertions for icmp folds (PR35846)
Because of potential UB (known bits conflicts with an llvm.assume),
we have to check rather than assert here because InstSimplify doesn't
kill the compare:
https://bugs.llvm.org/show_bug.cgi?id=35846

llvm-svn: 322104
2018-01-09 18:56:03 +00:00
Simon Pilgrim 5d909be91b [InstCombine] Check for out of range ashr values using APInt before calling getZExtValue
Reduced from oss-fuzz #5032 test case

llvm-svn: 322078
2018-01-09 14:23:46 +00:00
Sanjay Patel 31b4b76f99 [InstCombine] fold min/max tree with common operand (PR35717)
There is precedence for factorization transforms in instcombine for FP ops with fast-math. 
We also have similar logic in foldSPFofSPF().

It would take more work to add this to reassociate because that's specialized for binops, 
and min/max are not binops (or even single instructions). Also, I don't have evidence that 
larger min/max trees than this exist in real code, but if we find that's true, we might
want to reorganize where/how we do this optimization.

In the motivating example from https://bugs.llvm.org/show_bug.cgi?id=35717 , we have:

int test(int xc, int xm, int xy) {
  int xk;
  if (xc < xm)
    xk = xc < xy ? xc : xy;
  else
    xk = xm < xy ? xm : xy;
  return xk;
}

This patch solves that problem because we recognize more min/max patterns after rL321672

https://rise4fun.com/Alive/Qjne
https://rise4fun.com/Alive/3yg

Differential Revision: https://reviews.llvm.org/D41603

llvm-svn: 321998
2018-01-08 15:05:34 +00:00
Sanjay Patel 26a6fcde83 [InstCombine] relax use constraint for min/max (~a, ~b) --> ~min/max(a, b)
In the minimal case, this won't remove instructions, but it still improves
uses of existing values.

In the motivating example from PR35834, it does remove instructions, and
sets that case up to be optimized by something like D41603:
https://reviews.llvm.org/D41603

llvm-svn: 321936
2018-01-06 17:34:22 +00:00
Sanjay Patel 5b6aacf2c1 [InstCombine] add folds for min(~a, b) --> ~max(a, b)
Besides the bug of omitting the inverse transform of max(~a, ~b) --> ~min(a, b),
the use checking and operand creation were off. We were potentially creating 
repeated identical instructions of existing values. This led to infinite
looping after I added the extra folds.

By using the simpler m_Not matcher and not creating new 'not' ops for a and b,
we avoid that problem. It's possible that not using IsFreeToInvert() here is
more limiting than the simpler matcher, but there are no tests for anything
more exotic. It's also possible that we should relax the use checking further
to handle a case like PR35834:
https://bugs.llvm.org/show_bug.cgi?id=35834
...but we can make that a follow-up if it is needed. 

llvm-svn: 321882
2018-01-05 19:01:17 +00:00
Sanjay Patel c63f9014d6 [InstCombine] safely create a constant of the right type (PR35794)
llvm-svn: 321801
2018-01-04 14:31:56 +00:00
Simon Pilgrim 3bf2d64589 [InstCombine] Check for out of range shift values using APInt before calling getZExtValue
Reduced from oss-fuzz #4871 test case

llvm-svn: 321748
2018-01-03 18:28:20 +00:00
Dmitry Venikov a58d8deb3a [InstCombine] Missed optimization in math expression: squashing sqrt functions
Summary: This patch enables folding under -ffast-math flag sqrt(a) * sqrt(b) -> sqrt(a*b)

Reviewers: hfinkel, spatel, davide

Reviewed By: spatel, davide

Subscribers: davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D41322

llvm-svn: 321637
2018-01-02 05:58:11 +00:00
Simon Pilgrim 472689a159 [InstCombine] Check for isa<Instruction> before using cast<>
Protects against casts from constexpr etc.

Reduced from oss-fuzz #4788 test case

llvm-svn: 321515
2017-12-28 09:35:35 +00:00
Simon Pilgrim e7d032f1d8 [InstCombine] Gracefully handle out of range extractelement indices
InstSimplify is responsible for handling these, but we shouldn't just assert here.

Reduced from oss-fuzz #4808 test case

llvm-svn: 321489
2017-12-27 12:00:18 +00:00
Philip Reames cd13a66381 [instcombine] add powi(x, 2) -> x * x
llvm-svn: 321468
2017-12-27 01:30:12 +00:00
Philip Reames 5000ba69d7 Sink a couple of transforms from instcombine into instsimplify.
llvm-svn: 321467
2017-12-27 01:14:30 +00:00
Philip Reames 7a6db4fc4f [NFC] Extract out a helper function for SimplifyCall(CS, Q)
This simplifies code, but the real motivation is that it lets me clean up some downstream code.

llvm-svn: 321466
2017-12-27 00:16:12 +00:00
Sanjay Patel 14adbacd8a [InstCombine] fix miscompile of frem with 0.0 operand (PR34870)
We might want to select NAN here or do this transform with fast-math,
but this should at least fix the miscompile.

llvm-svn: 321461
2017-12-26 22:12:20 +00:00
Florian Hahn 012c8f97b2 [InstCombine] Add debug location to new caller.
Reviewers: rnk, aprantl, majnemer

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D414

llvm-svn: 321191
2017-12-20 17:16:59 +00:00
Sanjay Patel 5a0cdac174 [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel
We want to do this for 2 reasons:
1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.

More detail about what happens in the backend:
1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs 
   into the shift variant. That is the opposite of this IR canonicalization.
2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs 
   into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2
   into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node 
   when that's legal/custom.
4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a 
   variety of ways.
   a. For #2, the vector path is missing the case for setlt with a '1' constant.
   b. For #3, we are missing a match for commuted versions of the shift variants.
5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel 
   produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the 
   shift sequence when not.
6. In the following examples with this patch applied, we may get conditional moves rather than the shift 
   produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific 
   decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.

define i32 @abs_shifty(i32 %x) {
  %signbit = ashr i32 %x, 31 
  %add = add i32 %signbit, %x  
  %abs = xor i32 %signbit, %add 
  ret i32 %abs
}

define i32 @abs_cmpsubsel(i32 %x) {
  %cmp = icmp slt i32 %x, zeroinitializer
  %sub = sub i32 zeroinitializer, %x
  %abs = select i1 %cmp, i32 %sub, i32 %x
  ret i32 %abs
}

define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
  %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> 
  %add = add <4 x i32> %signbit, %x  
  %abs = xor <4 x i32> %signbit, %add 
  ret <4 x i32> %abs
}

define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
  %cmp = icmp slt <4 x i32> %x, zeroinitializer
  %sub = sub <4 x i32> zeroinitializer, %x
  %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
  ret <4 x i32> %abs
}

> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx 
> abs_shifty:
> 	movl	%edi, %eax
> 	negl	%eax
> 	cmovll	%edi, %eax
> 	retq
> 
> abs_cmpsubsel:
> 	movl	%edi, %eax
> 	negl	%eax
> 	cmovll	%edi, %eax
> 	retq
> 
> abs_shifty_vec:
> 	vpabsd	%xmm0, %xmm0
> 	retq
> 
> abs_cmpsubsel_vec:
> 	vpabsd	%xmm0, %xmm0
> 	retq
> 
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
> abs_shifty:
> 	cmp	w0, #0                  // =0
> 	cneg	w0, w0, mi
> 	ret
> 
> abs_cmpsubsel: 
> 	cmp	w0, #0                  // =0
> 	cneg	w0, w0, mi
> 	ret
>                                        
> abs_shifty_vec: 
> 	abs	v0.4s, v0.4s
> 	ret
> 
> abs_cmpsubsel_vec: 
> 	abs	v0.4s, v0.4s
> 	ret
> 
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le 
> abs_shifty:  
> 	srawi 4, 3, 31
> 	add 3, 3, 4
> 	xor 3, 3, 4
> 	blr
> 
> abs_cmpsubsel:
> 	srawi 4, 3, 31
> 	add 3, 3, 4
> 	xor 3, 3, 4
> 	blr
> 
> abs_shifty_vec:   
> 	vspltisw 3, -16
> 	vspltisw 4, 15
> 	vsubuwm 3, 4, 3
> 	vsraw 3, 2, 3
> 	vadduwm 2, 2, 3
> 	xxlxor 34, 34, 35
> 	blr
> 
> abs_cmpsubsel_vec: 
> 	vspltisw 3, -16
> 	vspltisw 4, 15
> 	vsubuwm 3, 4, 3
> 	vsraw 3, 2, 3
> 	vadduwm 2, 2, 3
> 	xxlxor 34, 34, 35
> 	blr
>

Differential Revision: https://reviews.llvm.org/D40984

llvm-svn: 320921
2017-12-16 16:41:17 +00:00
Fedor Sergeev 83bcc68afa [PM][InstCombine] fixing omission of AliasAnalysis in new-pass-manager's version of InstCombine
Summary:
Passing AliasAnalysis results instead of nullptr appears to work just fine.
A couple new-pass-manager tests updated to align with new order of analyses.

Reviewers: chandlerc, spatel, craig.topper

Reviewed By: chandlerc

Subscribers: mehdi_amini, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D41203

llvm-svn: 320687
2017-12-14 10:36:31 +00:00
Michael Zolotukhin 6af4f232b5 Remove redundant includes from lib/Transforms.
llvm-svn: 320628
2017-12-13 21:31:01 +00:00
Igor Laevsky e0edb66475 Reintroduce r320049, r320014 and r319894.
OpenGL issues should be fixed by now.

llvm-svn: 320568
2017-12-13 11:21:18 +00:00
Alexey Bataev 83c15b1363 [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320525
2017-12-12 20:28:46 +00:00
Alexey Bataev fa0a76dbcc Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast."
This reverts commit r320510 - again sanitizers bbots.

llvm-svn: 320513
2017-12-12 19:12:34 +00:00
Alexey Bataev 195c97e220 [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320510
2017-12-12 18:47:00 +00:00
Alexey Bataev 6132a50d2a Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast."
This reverts commit r320499 again to resolve the problem with the
sanitizers bbots.

llvm-svn: 320501
2017-12-12 17:35:29 +00:00
Alexey Bataev ca4c9a5246 [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320499
2017-12-12 17:19:15 +00:00
Alexey Bataev d19dbe6791 Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast."
This reverts commit r320496 to solve the problems with sanitizer
buildbots.

llvm-svn: 320498
2017-12-12 17:08:48 +00:00
Alexey Bataev d0c3aeb200 [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320496
2017-12-12 16:58:48 +00:00
Alexey Bataev c9f1d2e4a0 Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast."
This reverts commit r320488 because of the failed asan buildbots..

llvm-svn: 320490
2017-12-12 16:05:52 +00:00
Alexey Bataev fb68c48a82 [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320488
2017-12-12 15:54:49 +00:00
Alexey Bataev ca2a8cea2f Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast."
This reverts commit r320483 because of the failed Windows buildbots.

llvm-svn: 320485
2017-12-12 15:24:17 +00:00
Alexey Bataev 1daef8a667 [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320483
2017-12-12 15:03:17 +00:00
Anna Thomas 2dd9835f35 [InstComineLoadStoreAlloca] Optimize stores to GEP off null base
Summary:
Currently, in InstCombineLoadStoreAlloca, we have simplification
rules for the following cases:
  1. load off a null
  2. load off a GEP with null base
  3. store to a null

This patch adds support for the fourth case which is store into a
GEP with null base. Since this is UB as well (and directly analogous to
the load off a GEP with null base), we can substitute the stored val
with undef in instcombine, so that SimplifyCFG can optimize this code
into unreachable code.

Note: Right now, simplifyCFG hasn't been taught about optimizing
this to unreachable and adding an llvm.trap (this is already done for
the above 3 cases).

Reviewers: majnemer, hfinkel, sanjoy, davide

Reviewed by: sanjoy, davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41026

llvm-svn: 320480
2017-12-12 14:12:33 +00:00
Igor Laevsky d63560b817 Revert r320049, r320014 and r319894
They were causing failures of the piglit OpenGL tests with AMD GPUs using the
Mesa radeonsi driver.

llvm-svn: 320466
2017-12-12 10:03:39 +00:00
Hans Wennborg 27d1c00c01 Revert r320407 "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast."
The tests fail (opt asserts) on Windows.

> Summary:
> If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
> &V2)))), bitcast)`, but the load is used in other instructions, it leads
> to looping in InstCombiner. Patch adds additional check that all users
> of the load instructions are stores and then replaces all uses of load
> instruction by the new one with new type.
>
> Reviewers: RKSimon, spatel, majnemer
>
> Subscribers: llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320421
2017-12-11 21:15:27 +00:00
Alexey Bataev ec128ace8a [InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.

Reviewers: RKSimon, spatel, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41072

llvm-svn: 320407
2017-12-11 19:11:16 +00:00
Simon Pilgrim a42a54258e [InstCombine] Fix SimplifyDemandedUseBits SHL handling (PR35515)
Don't assume that the pattern matched SRL can be cast to an Instruction (might be ConstExpr etc.)

llvm-svn: 320270
2017-12-09 23:42:56 +00:00
Evgeniy Stepanov c667c1f47a Hardware-assisted AddressSanitizer (llvm part).
Summary:
This is LLVM instrumentation for the new HWASan tool. It is basically
a stripped down copy of ASan at this point, w/o stack or global
support. Instrumenation adds a global constructor + runtime callbacks
for every load and store.

HWASan comes with its own IR attribute.

A brief design document can be found in
clang/docs/HardwareAssistedAddressSanitizerDesign.rst (submitted earlier).

Reviewers: kcc, pcc, alekseyshl

Subscribers: srhines, mehdi_amini, mgorny, javed.absar, eraman, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D40932

llvm-svn: 320217
2017-12-09 00:21:41 +00:00
Alexey Bataev ec95c6cc0a [InstCombine] PR35354: Convert store(bitcast, load bitcast (select (Cond, &V1, &V2)) --> store (, load (select(Cond, load &V1, load &V2)))
Summary:
If we have the code like this:
```
float a, b;
a = std::max(a ,b);
```
it is converted into something like this:
```
%call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float* nonnull dereferenceable(4) %a.addr, float* nonnull dereferenceable(4) %b.addr)
%1 = bitcast float* %call to i32*
%2 = load i32, i32* %1, align 4
%3 = bitcast float* %a.addr to i32*
store i32 %2, i32* %3, align 4
```
After inlinning this code is converted to the next:
```
%1 = load float, float* %a.addr
%2 = load float, float* %b.addr
%cmp.i = fcmp fast olt float %1, %2
%__b.__a.i = select i1 %cmp.i, float* %a.addr, float* %b.addr
%3 = bitcast float* %__b.__a.i to i32*
%4 = load i32, i32* %3, align 4
%5 = bitcast float* %arrayidx to i32*
store i32 %4, i32* %5, align 4

```
This pattern is not recognized as minmax pattern.
Patch solves this problem by converting sequence
```
store (bitcast, (load bitcast (select ((cmp V1, V2), &V1, &V2))))
```
to a sequence
```
store (,load (select((cmp V1, V2), &V1, &V2)))
```
After this the code is recognized as minmax pattern.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40304

llvm-svn: 320157
2017-12-08 15:32:10 +00:00
Igor Laevsky 4a4f2e8c67 [InstCombine] Don't crash on out of bounds index in the insertelement
Differential Revision: https://reviews.llvm.org/D40390

llvm-svn: 320049
2017-12-07 15:00:52 +00:00
Sanjay Patel b6404a8ca6 [InstCombine] canonicalize constant-minus-boolean to select-of-constants
This restores the half of:
https://reviews.llvm.org/rL75531
that was reverted at:
https://reviews.llvm.org/rL159230

For the x86 case mentioned there, we now produce:
leal 1(%rdi), %eax
subl %esi, %eax

We have target hooks to invert this in DAGCombiner (and x86 is enabled) with:
https://reviews.llvm.org/rL296977
https://reviews.llvm.org/rL311731

AArch64 and possibly other targets would probably benefit from enabling those hooks too. 
See PR30327:
https://bugs.llvm.org/show_bug.cgi?id=30327#c2

Differential Revision: https://reviews.llvm.org/D40612

llvm-svn: 319964
2017-12-06 21:22:57 +00:00
Sanjay Patel 863d494730 [InstCombine] use 'auto' with 'dyn_cast'; NFC
llvm-svn: 319067
2017-11-27 18:19:32 +00:00
Sanjay Patel b3fa94586f [InstCombine] include 'sub' in the list of narrow-able binops
// trunc (binop X, C) --> binop (trunc X, C')
      // trunc (binop (ext X), Y) --> binop X, (trunc Y)

I'm grouping sub with the other binops  because that makes the code simpler
and the transforms are valid:
https://rise4fun.com/Alive/UeF
...so even though we don't expect a sub with constant Op1 or any of the
other opcodes with constant Op0 due to canonicalization rules, we might as
well handle those situations if non-canonical code somehow reaches this
point (it should just make instcombine more efficient in reaching its
end goal).

This should solve the problem that later manifests in the vectorizers in 
PR35295:
https://bugs.llvm.org/show_bug.cgi?id=35295

llvm-svn: 318404
2017-11-16 14:40:51 +00:00
Sanjay Patel 03d0cd6a81 [InstCombine] trunc (binop X, C) --> binop (trunc X, C')
Note that one-use and shouldChangeType() are checked ahead of the switch.

Without the narrowing folds, we can produce inferior vector code as shown in PR35299:
https://bugs.llvm.org/show_bug.cgi?id=35299

llvm-svn: 318323
2017-11-15 19:12:01 +00:00
Reid Kleckner 72b819b8ee [InstCombine] Salvage debug info during initial DCE
InstCombine salvages debug info for every instruction it erases from its
worklist, but it wasn't doing it during its initial DCE when populating
its worklist. This fixes that.

This should help improve availability of 'this' in optimized debug info
when casts are necessary.

llvm-svn: 318320
2017-11-15 18:51:12 +00:00
Craig Topper f7b86728fa [InstCombine] Simplify binops that are only used by a select and are fed by a select with the same condition.
Summary:
This patch optimizes a binop sandwiched between 2 selects with the same condition. Since we know its only used by the select we can propagate the appropriate input value from the earlier select.

As I'm writing this I realize I may need to avoid doing this for division in case the select was protecting a divide by zero?

Reviewers: spatel, majnemer

Reviewed By: majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39999

llvm-svn: 318267
2017-11-15 05:23:02 +00:00
Craig Topper d3e5781e53 [InstCombine] Teach visitICmpInst to not break integer absolute value idioms
Summary:
This patch adds an early out to visitICmpInst if we are looking at a compare as part of an integer absolute value idiom. Similar is already done for min/max.

In the particular case I observed in a benchmark we had an absolute value of a load from an indexed global. We simplified the compare using foldCmpLoadFromIndexedGlobal into a magic bit vector, a shift, and an and. But the load result was still used for the select and the negate part of the absolute valute idiom. So we overcomplicated the code and lost the ability to recognize it as an absolute value.

I've chosen a simpler case for the test here.

Reviewers: spatel, davide, majnemer

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39766

llvm-svn: 317994
2017-11-12 02:28:21 +00:00
Craig Topper 7dd4d32431 Recommit r317510 "[InstCombine] Pull shifts through a select plus binop with constant"
The hexagon test should be fixed now.

Original commit message:

This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select.

This can allow us to get the select closer to other selects to enable removing one.

Differential Revision: https://reviews.llvm.org/D39222

llvm-svn: 317600
2017-11-07 18:47:24 +00:00
Craig Topper 386fc2516c [InstCombine] Update stale comment. NFC
Datalayout is no longer optional so the comment didn't match what the code currently does.

llvm-svn: 317594
2017-11-07 17:37:32 +00:00
Hans Wennborg 8c4b10e84a Revert r317510 "[InstCombine] Pull shifts through a select plus binop with constant"
This broke the CodeGen/Hexagon/loop-idiom/pmpy-mod.ll test on a bunch of buildbots.

> This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select.
>
> This can allow us to get the select closer to other selects to enable removing one.
>
> Differential Revision: https://reviews.llvm.org/D39222
>
> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317510 91177308-0d34-0410-b5e6-96231b3b80d8

llvm-svn: 317518
2017-11-06 22:28:02 +00:00
Craig Topper 8917647333 [InstCombine] Pull shifts through a select plus binop with constant
This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select.

This can allow us to get the select closer to other selects to enable removing one.

Differential Revision: https://reviews.llvm.org/D39222

llvm-svn: 317510
2017-11-06 21:07:22 +00:00
Sanjay Patel 629c411538 [IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more recently:
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html

...this is a step in cleaning up our fast-math-flags implementation in IR to better match
the capabilities of both clang's user-visible flags and the backend's flags for SDNode.

As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 
'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic 
reassociation - 'AllowReassoc'.

We're also adding a bit to allow approximations for library functions called 'ApproxFunc' 
(this was initially proposed as 'libm' or similar).

...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did 
look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), 
but that's apparently already used for other purposes. Also, I don't think we can just 
add a field to FPMathOperator because Operator is not intended to be instantiated. 
We'll defer movement of FMF to another day.

We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
...made me think we want to keep the shortcut synonym.

Finally, this change is binary incompatible with existing IR as seen in the 
compatibility tests. This statement:
"Newer releases can ignore features from older releases, but they cannot miscompile 
them. For example, if nsw is ever replaced with something else, dropping it would be 
a valid way to upgrade the IR." 
( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR 
version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will 
fail to optimize some previously 'fast' code because it's no longer recognized as 
'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.

Note: an inter-dependent clang commit to use the new API name should closely follow 
commit.

Differential Revision: https://reviews.llvm.org/D39304

llvm-svn: 317488
2017-11-06 16:27:15 +00:00
Matthew Simpson b6915fbfa2 [InstCombine] Simplify selects that test cmpxchg instructions
If a select instruction tests the returned flag of a cmpxchg instruction and
selects between the returned value of the cmpxchg instruction and its compare
operand, the result of the select will always be equal to its false value.

Differential Revision: https://reviews.llvm.org/D39383

llvm-svn: 316994
2017-10-31 12:34:02 +00:00
Daniel Neilson f9c7d29c77 Create instruction classes for identifying any atomicity of memory intrinsic. (NFC)
Summary:
For reference, see: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116589.html

This patch fleshes out the instruction class hierarchy with respect to atomic and
non-atomic memory intrinsics. With this change, the relevant part of the class
hierarchy becomes:

IntrinsicInst
  -> MemIntrinsicBase (methods-only class)
    -> MemIntrinsic (non-atomic intrinsics)
      -> MemSetInst
      -> MemTransferInst
        -> MemCpyInst
        -> MemMoveInst
    -> AtomicMemIntrinsic (atomic intrinsics)
      -> AtomicMemSetInst
      -> AtomicMemTransferInst
        -> AtomicMemCpyInst
        -> AtomicMemMoveInst
    -> AnyMemIntrinsic (both atomicities)
      -> AnyMemSetInst
      -> AnyMemTransferInst
        -> AnyMemCpyInst
        -> AnyMemMoveInst

This involves some class renaming:
    ElementUnorderedAtomicMemCpyInst -> AtomicMemCpyInst
    ElementUnorderedAtomicMemMoveInst -> AtomicMemMoveInst
    ElementUnorderedAtomicMemSetInst -> AtomicMemSetInst
A script for doing this renaming in downstream trees is included below.

An example of where the Any* classes should be used in LLVM is when reasoning
about the effects of an instruction (ex: aliasing).

---
Script for renaming AtomicMem* classes:
PREFIXES="[<,([:space:]]"
CLASSES="MemIntrinsic|MemTransferInst|MemSetInst|MemMoveInst|MemCpyInst"
SUFFIXES="[;)>,[:space:]]"

REGEX="(${PREFIXES})ElementUnorderedAtomic(${CLASSES})(${SUFFIXES})"
REGEX2="visitElementUnorderedAtomic(${CLASSES})"

FILES=$( grep -E "(${REGEX}|${REGEX2})" -r . | tr ':' ' ' | awk '{print $1}' | sort | uniq )

SED_SCRIPT="s~${REGEX}~\1Atomic\2\3~g"
SED_SCRIPT2="s~${REGEX2}~visitAtomic\1~g"

for f in $FILES; do
    echo "Processing: $f"
    sed  -i ".bak" -E "${SED_SCRIPT};${SED_SCRIPT2};${EA_SED_SCRIPT};${EA_SED_SCRIPT2}" $f
done

Reviewers: sanjoy, deadalnix, apilipenko, anna, skatkov, mkazantsev

Reviewed By: sanjoy

Subscribers: hfinkel, jholewinski, arsenm, sdardis, nhaehnle, JDevlieghere, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D38419

llvm-svn: 316950
2017-10-30 19:51:48 +00:00
Eugene Zelenko 7f0f9bc5ab [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316503
2017-10-24 21:24:53 +00:00
Marek Olsak ce76ea0394 AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.

Also allow kill in all shader stages.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38544

llvm-svn: 316427
2017-10-24 10:27:13 +00:00
Marek Olsak 2114fc3bcb AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D38543

llvm-svn: 316426
2017-10-24 10:26:59 +00:00
Sanjay Patel 42135beac8 [InstCombine] don't unnecessarily generate a constant; NFCI
llvm-svn: 315910
2017-10-16 14:47:24 +00:00
Nikolai Bozhenov 0e7ebbccc7 Move folding of icmp with zero after checking for min/max idioms.
Summary:
The following transformation for cmp instruction:

  icmp smin(x, PositiveValue), 0 -> icmp x, 0

should only be done after checking for min/max to prevent infinite
looping caused by a reverse canonicalization. That is why this
transformation was moved to place after the mentioned check.

Reviewers: spatel, efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38934

Patch by: Artur Gainullin <artur.gainullin@intel.com>

llvm-svn: 315895
2017-10-16 09:19:21 +00:00
Sanjay Patel 934738a3da revert r314984: revert r314698 - [InstCombine] remove one-use restriction for icmp (shr exact X, C1), C2 --> icmp X, (C2<<C1)
Recommitting r314698. The bug exposed by this change should be fixed with:
https://reviews.llvm.org/rL315579 

llvm-svn: 315857
2017-10-15 15:39:15 +00:00
Sanjay Patel b869f76d85 [InstCombine] use m_Neg() to reduce code; NFCI
llvm-svn: 315762
2017-10-13 21:28:50 +00:00
Sanjay Patel f0242de143 [InstCombine] move code to remove repeated constant check; NFCI
Also, consolidate tests for this fold in one place.

llvm-svn: 315745
2017-10-13 20:29:11 +00:00
Sanjay Patel 28b3aa3663 [InstCombine] recycle adds for better efficiency
Also, clean up unnecessary matcher capture variable initializations.

llvm-svn: 315743
2017-10-13 20:12:21 +00:00
Sanjay Patel 2118952162 [InstCombine] use local var to reduce code duplication; NFCI
llvm-svn: 315728
2017-10-13 18:32:53 +00:00
Sanjay Patel c419c9f640 [InstCombine] add hasOneUse check to add-zext-add fold to prevent increasing instructions
llvm-svn: 315718
2017-10-13 17:47:25 +00:00
Sanjay Patel 76ed9eab29 [InstCombine] use AddOne helper to reduce code; NFC
llvm-svn: 315709
2017-10-13 17:00:47 +00:00
Sanjay Patel 8d810fee43 [InstCombine] rearrange code to remove repeated constant check; NFCI
llvm-svn: 315703
2017-10-13 16:43:58 +00:00
Sanjay Patel 2150651ac3 [InstCombine] allow zext(bool) + C --> select bool, C+1, C for vector types
The backend should be prepared for this transform after:
https://reviews.llvm.org/rL311731

llvm-svn: 315701
2017-10-13 16:29:38 +00:00
Xinliang David Li 4cdc9dab0a Renable r314928
Eliminate inttype phi with inttoptr/ptrtoint.

 This version fixed a bug in finding the matching
 phi -- the order of the incoming blocks may be 
 different (triggered in self build on Windows).
 A new test case is added.

llvm-svn: 315272
2017-10-10 05:07:54 +00:00
Adam Nemet 0965da2055 Rename OptimizationDiagnosticInfo.* to OptimizationRemarkEmitter.*
Sync it up with the name of the class actually defined here.  This has been
bothering me for a while...

llvm-svn: 315249
2017-10-09 23:19:02 +00:00
Sanjay Patel ce36b03b03 [InstCombine] fix formatting; NFC
llvm-svn: 315223
2017-10-09 17:54:46 +00:00
Sanjay Patel 72d339abb7 [InstCombine] use correct type when propagating constant condition in simplifyDivRemOfSelectWithZeroOp (PR34856)
llvm-svn: 315130
2017-10-06 23:43:06 +00:00
Sanjay Patel ae2e3a44d2 [InstCombine] rename SimplifyDivRemOfSelect to be clearer, add comments, simplify code; NFCI
There's at least one bug here - this code can fail with vector types (PR34856).
It's also being called for FREM; I'm still trying to understand how that is valid.

llvm-svn: 315127
2017-10-06 23:20:16 +00:00
Reid Kleckner b6b210e61f Revert "Roll forward r314928"
This appears to be miscompiling Clang, as shown on two Windows bootstrap
bots:
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/7611
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/6870

Nothing else is in the blame list. Both emit errors on this valid code
in the Windows ucrt headers:

C:\...\ucrt\malloc.h:95:32: error: invalid operands to binary expression ('char *' and 'int')
            _Ptr = (char*)_Ptr + _ALLOCA_S_MARKER_SIZE;
                   ~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~

I am attempting to reproduce this now.

This reverts r315044

llvm-svn: 315108
2017-10-06 21:17:51 +00:00
Xinliang David Li bcd36f7c5a Roll forward r314928
Fixed ThinLTO bootstrap failure : track new
bitcast per incomingVal. Added new tests.

llvm-svn: 315044
2017-10-06 05:15:25 +00:00
Sanjay Patel 7ac2db6a48 [InstCombine] improve folds for icmp gt/lt (shr X, C1), C2
We can always eliminate the shift in: icmp gt/lt (shr X, C1), C2 --> icmp gt/lt X, C'
This patch was supposed to just be an efficiency improvement because we were doing this 3-step process to fold:

IC: Visiting:   %c = icmp ugt i4 %s, 1
IC: ADD:   %s = lshr i4 %x, 1
IC: ADD:   %1 = udiv i4 %x, 2
IC: Old =   %c = icmp ugt i4 %1, 1
    New =   <badref> = icmp uge i4 %x, 4
IC: ADD:   %c = icmp uge i4 %x, 4
IC: ERASE   %2 = icmp ugt i4 %1, 1
IC: Visiting:   %c = icmp uge i4 %x, 4
IC: Old =   %c = icmp uge i4 %x, 4
    New =   <badref> = icmp ugt i4 %x, 3
IC: ADD:   %c = icmp ugt i4 %x, 3
IC: ERASE   %2 = icmp uge i4 %x, 4
IC: Visiting:   %c = icmp ugt i4 %x, 3
IC: DCE:   %1 = udiv i4 %x, 2
IC: ERASE   %1 = udiv i4 %x, 2
IC: DCE:   %s = lshr i4 %x, 1
IC: ERASE   %s = lshr i4 %x, 1
IC: Visiting:   ret i1 %c

When we could go directly to canonical icmp form:

IC: Visiting:   %c = icmp ugt i4 %s, 1
IC: Old =   %c = icmp ugt i4 %s, 1
    New =   <badref> = icmp ugt i4 %x, 3
IC: ADD:   %c = icmp ugt i4 %x, 3
IC: ERASE   %1 = icmp ugt i4 %s, 1
IC: ADD:   %s = lshr i4 %x, 1
IC: DCE:   %s = lshr i4 %x, 1
IC: ERASE   %s = lshr i4 %x, 1
IC: Visiting:   %c = icmp ugt i4 %x, 3

...but then I noticed that the folds were incomplete too:
https://godbolt.org/g/aB2hLE

Here are attempts to prove the logic with Alive:
https://rise4fun.com/Alive/92o

Name: lshr_ult
Pre: ((C2 << C1) u>> C1) == C2
%sh = lshr i8 %x, C1
%r = icmp ult i8 %sh, C2
  =>
%r = icmp ult i8 %x, (C2 << C1)

Name: ashr_slt
Pre: ((C2 << C1) >> C1) == C2
%sh = ashr i8 %x, C1
%r = icmp slt i8 %sh, C2
  =>
%r = icmp slt i8 %x, (C2 << C1)

Name: lshr_ugt
Pre: (((C2+1) << C1) u>> C1) == (C2+1)
%sh = lshr i8 %x, C1
%r = icmp ugt i8 %sh, C2
  =>
%r = icmp ugt i8 %x, ((C2+1) << C1) - 1

Name: ashr_sgt
Pre: (C2 != 127) && ((C2+1) << C1 != -128) && (((C2+1) << C1) >> C1) == (C2+1)
%sh = ashr i8 %x, C1
%r = icmp sgt i8 %sh, C2
  =>
%r = icmp sgt i8 %x, ((C2+1) << C1) - 1

Name: ashr_exact_sgt
Pre: ((C2 << C1) >> C1) == C2
%sh = ashr exact i8 %x, C1
%r = icmp sgt i8 %sh, C2
  =>
%r = icmp sgt i8 %x, (C2 << C1)

Name: ashr_exact_slt
Pre: ((C2 << C1) >> C1) == C2
%sh = ashr exact i8 %x, C1
%r = icmp slt i8 %sh, C2
  =>
%r = icmp slt i8 %x, (C2 << C1)

Name: lshr_exact_ugt
Pre: ((C2 << C1) u>> C1) == C2
%sh = lshr exact i8 %x, C1
%r = icmp ugt i8 %sh, C2
  =>
%r = icmp ugt i8 %x, (C2 << C1)

Name: lshr_exact_ult
Pre: ((C2 << C1) u>> C1) == C2
%sh = lshr exact i8 %x, C1
%r = icmp ult i8 %sh, C2
  =>
%r = icmp ult i8 %x, (C2 << C1)

We did something similar for 'shl' in D28406.

Differential Revision: https://reviews.llvm.org/D38514

llvm-svn: 315021
2017-10-05 21:11:49 +00:00
Sanjay Patel f11b5b4f87 revert r314698 - [InstCombine] remove one-use restriction for icmp (shr exact X, C1), C2 --> icmp X, (C2<<C1)
There is a bot failure that appears to be related to this change:
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/2117

...so reverting to confirm that and attempting to keep the bot green while investigating.

llvm-svn: 314984
2017-10-05 14:26:15 +00:00
Craig Topper 17b0c78447 [InstCombine] Fix a vector splat handling bug in transformZExtICmp.
We were using an i1 type and then zero extending to a vector. Instead just create the 0/1 directly as a ConstantInt with the correct type. No need to ask ConstantExpr to zero extend for us.

This bug is a bit tricky to hit because it requires us to visit a zext of an icmp that would normally be simplified to true/false, but that icmp hasnt' been visited yet. In the test case this zext and icmp were created by visiting a udiv and due to worklist ordering we got to the zext first.

Fixes PR34841.

llvm-svn: 314971
2017-10-05 07:59:11 +00:00
Xinliang David Li 04ab11a08a Revert r314928 to investigate thinLTO bootstrap failure
llvm-svn: 314961
2017-10-05 01:40:13 +00:00
Craig Topper 7a93092399 [InstCombine] Improve support for ashr in foldICmpAndShift
We can support ashr similar to lshr, if we know that none of the shifted in bits are used. In that case SimplifyDemandedBits would normally convert it to lshr. But that conversion doesn't happen if the shift has additional users.

Differential Revision: https://reviews.llvm.org/D38521

llvm-svn: 314945
2017-10-04 23:06:13 +00:00
Xinliang David Li 7a73757358 Recommit r314561 after fixing msan build failure
(trial 2) Incoming val defined by terminator instruction which
also requires bitcasts can not be handled.

llvm-svn: 314928
2017-10-04 20:17:55 +00:00
Craig Topper df63b96811 [InstCombine] Use isSignBitCheck to simplify an if statement. Directly create new sign bit compares instead of manipulating the constant. NFCI
Since we no longer had the direct constant compares, manipulating the constant seemeded less clear.

llvm-svn: 314830
2017-10-03 19:14:23 +00:00
Craig Topper 8ed1aa91bd [InstCombine] Change a bunch of methods to take APInts by reference instead of pointer.
This allows us to remove a bunch of dereferences and only have a few dereferences at the call sites.

llvm-svn: 314762
2017-10-03 05:31:07 +00:00
Craig Topper 664c4d0190 [InstCombine] Replace an equality compare of two APInt pointers with a compare of the APInts themselves.
Apparently this works by virtue of the fact that the pointers are pointers to the APInts stored inside of the ConstantInt objects. But I really don't think we should be relying on that.

llvm-svn: 314761
2017-10-03 04:55:04 +00:00
Sanjay Patel 6e17c00a88 [InstCombine] remove one-use restriction for icmp (shr exact X, C1), C2 --> icmp X, (C2<<C1)
llvm-svn: 314698
2017-10-02 18:26:44 +00:00
Dehao Chen f464627f28 Update getMergedLocation to check the instruction type and merge properly.
Summary: If the merged instruction is call instruction, we need to set the scope to the closes common scope between 2 locations, otherwise it will cause trouble when the call is getting inlined.

Reviewers: dblaikie, aprantl

Reviewed By: dblaikie, aprantl

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D37877

llvm-svn: 314694
2017-10-02 18:13:14 +00:00
Craig Topper 6e025a3ecc [InstCombine] Use APInt for all the math in foldICmpDivConstant
Summary: This currently uses ConstantExpr to do its math, but as noted in a TODO it can all be done directly on APInt.

Reviewers: spatel, majnemer

Reviewed By: majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38440

llvm-svn: 314640
2017-10-01 23:53:54 +00:00
Daniel Jasper 3c9c60c727 Revert r314579: "Recommi r314561 after fixing over-debug assertion".
And follow-up r314585.
Leads to segfaults. I'll forward reproduction instructions to the patch
author.

Also, for a recommit, still add the original patch description.
Otherwise, it becomes really tedious to find out what a patch actually
does. The fact that it is a recommit with a fix is somewhat secondary.

llvm-svn: 314622
2017-10-01 09:53:53 +00:00
Xinliang David Li b8aac3ac19 Fix buildbot failure -- tighten type check for matching phi
llvm-svn: 314585
2017-09-30 05:27:46 +00:00
Xinliang David Li 3409d9c07f Recommi r314561 after fixing over-debug assertion
llvm-svn: 314579
2017-09-30 00:46:32 +00:00
Xinliang David Li 455dec098b Revert 314561 due to debug build assertion failure
llvm-svn: 314563
2017-09-29 22:30:34 +00:00
Xinliang David Li 5b9d96825b Eliminate PHI (int typed) which has only one use by intptr
This patch will eliminate redundant intptr/ptrtoint that pessimizes
analyses such as SCEV, AA and will make optimization passes such
as auto-vectorization more powerful.

Differential revision: http://reviews.llvm.org/D37832

llvm-svn: 314561
2017-09-29 22:10:15 +00:00
Craig Topper 0cd25942f7 Revert r314017 '[InstCombine] Simplify check for RHS being a splat constant in foldICmpUsingKnownBits by just checking Op1Min==Op1Max rather than going through m_APInt.'
This reverts r314017 and similar code added in later commits. It seems to not work for pointer compares and is causing a bot failure for the last several days.

llvm-svn: 314360
2017-09-27 22:57:18 +00:00
Chad Rosier d8b4b06f5d [InstCombine] Gating select arithmetic optimization.
These changes faciliate positive behavior for arithmetic based select
expressions that match its translation criteria, keeping code size gated to
neutral or improved scenarios.

Patch by Michael Berg <michael_c_berg@apple.com>!

Differential Revision: https://reviews.llvm.org/D38263

llvm-svn: 314320
2017-09-27 17:16:51 +00:00
Craig Topper 8bf622174d [InstCombine] Remove one use restriction on the shift for calls to foldICmpAndShift.
If this transformation succeeds, we're going to remove our dependency on the shift by rewriting the and. So it doesn't matter how many uses the shift has.

This distributes the one use check to other transforms in foldICmpAndConstConst that do need it.

Differential Revision: https://reviews.llvm.org/D38206

llvm-svn: 314233
2017-09-26 18:47:25 +00:00
Craig Topper 30dc9797e9 [InstCombine] Move an optimization from foldICmpAndConstConst to foldICmpUsingKnownBits
All this optimization cares about is knowing how many low bits of LHS is known to be zero and whether that means that the result is 0 or greater than the RHS constant. It doesn't matter where the zeros in the low bits came from. So we don't need to specifically look for an AND. Instead we can use known bits.

Differential Revision: https://reviews.llvm.org/D38195

llvm-svn: 314153
2017-09-25 21:15:00 +00:00
Sanjay Patel ecb175608f [InstCombine] remove extract-of-select vector transform (2nd try)
The 1st attempt at this:
https://reviews.llvm.org/rL314117
was reverted at:
https://reviews.llvm.org/rL314118

because of bot fails for clang tests that were checking optimized IR. That should be fixed with:
https://reviews.llvm.org/rL314144
...so try again. 

Original commit message:

The transform to convert an extract-of-a-select-of-vectors was added at:
https://reviews.llvm.org/rL194013

And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>

Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.

The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.

The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.

Differential Revision: https://reviews.llvm.org/D38006

llvm-svn: 314147
2017-09-25 20:30:53 +00:00
Sanjay Patel aa7f750bec revert r314117 because there are bogus clang tests that depend on the optimizer
llvm-svn: 314118
2017-09-25 17:00:04 +00:00
Sanjay Patel 9639897d77 [InstCombine] remove extract-of-select vector transform
The transform to convert an extract-of-a-select-of-vectors was added at:
rL194013

And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>

Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.

The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.

The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.

Differential Revision: https://reviews.llvm.org/D38006

llvm-svn: 314117
2017-09-25 16:41:34 +00:00
Craig Topper ea927baee2 [InstCombine] Teach foldICmpUsingKnownBits to simplify SLE/SGE/ULE/UGE to equality comparisons when the min/max ranges intersect in a single value.
This is the inverse of what we do for SGT/SLT/UGT/ULT.

llvm-svn: 314032
2017-09-22 21:47:22 +00:00
Craig Topper 3f364aa908 [InstCombine] Add constant splat handling to one of the ICMP_SLT/SGT cases in foldICmpUsingKnownBits.
llvm-svn: 314025
2017-09-22 19:54:15 +00:00
Craig Topper 3edda87c42 [InstCombine] Move the call to isSignBitCheck into getDemandedBitsLHSMask instead of calling it outside and passing its result through a flag. NFCI
The result of the isSignBitCheck isn't used anywhere else and this allows us to share the m_APInt call in the likely case that it isn't a sign bit check.

llvm-svn: 314018
2017-09-22 18:57:23 +00:00
Craig Topper 5b35b68785 [InstCombine] Simplify check for RHS being a splat constant in foldICmpUsingKnownBits by just checking Op1Min==Op1Max rather than going through m_APInt.
llvm-svn: 314017
2017-09-22 18:57:22 +00:00
Craig Topper 2c9b7d7894 [InstCombine] Make cases for ICMP_UGT/ICMP_ULT use similar formatting since they use similar code. NFC
llvm-svn: 314016
2017-09-22 18:57:20 +00:00
Reid Kleckner 0fe506bc5e Re-land r313825: "[IR] Add llvm.dbg.addr, a control-dependent version of llvm.dbg.declare"
The fix is to avoid invalidating our insertion point in
replaceDbgDeclare:
     Builder.insertDeclare(NewAddress, DIVar, DIExpr, Loc, InsertBefore);
+    if (DII == InsertBefore)
+      InsertBefore = &*std::next(InsertBefore->getIterator());
     DII->eraseFromParent();

I had to write a unit tests for this instead of a lit test because the
use list order matters in order to trigger the bug.

The reduced C test case for this was:
  void useit(int*);
  static inline void inlineme() {
    int x[2];
    useit(x);
  }
  void f() {
    inlineme();
    inlineme();
  }

llvm-svn: 313905
2017-09-21 19:52:03 +00:00
Daniel Jasper 7d2f38d600 Revert r313825: "[IR] Add llvm.dbg.addr, a control-dependent version of llvm.dbg.declare"
.. as well as the two subsequent changes r313826 and r313875.

This leads to segfaults in combination with ASAN. Will forward repro
instructions to the original author (rnk).

llvm-svn: 313876
2017-09-21 12:07:33 +00:00
Craig Topper 18887bf179 [InstCombine] Teach getDemandedBitsLHSMask to handle constant splat vectors
This replaces a ConstantInt dyn_cast with m_APInt

Differential Revision: https://reviews.llvm.org/D38100

llvm-svn: 313840
2017-09-20 23:48:58 +00:00
Reid Kleckner 3f547e87b2 [IR] Add llvm.dbg.addr, a control-dependent version of llvm.dbg.declare
Summary:
This implements the design discussed on llvm-dev for better tracking of
variables that live in memory through optimizations:
  http://lists.llvm.org/pipermail/llvm-dev/2017-September/117222.html

This is tracked as PR34136

llvm.dbg.addr is intended to be produced and used in almost precisely
the same way as llvm.dbg.declare is today, with the exception that it is
control-dependent. That means that dbg.addr should always have a
position in the instruction stream, and it will allow passes that
optimize memory operations on local variables to insert llvm.dbg.value
calls to reflect deleted stores. See SourceLevelDebugging.rst for more
details.

The main drawback to generating DBG_VALUE machine instrs is that they
usually cause LLVM to emit a location list for DW_AT_location. The next
step will be to teach DwarfDebug.cpp how to recognize more DBG_VALUE
ranges as not needing a location list, and possibly start setting
DW_AT_start_offset for variables whose lifetimes begin mid-scope.

Reviewers: aprantl, dblaikie, probinson

Subscribers: eraman, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37768

llvm-svn: 313825
2017-09-20 21:52:33 +00:00
Craig Topper 562bf99ee6 [InstCombine] Handle (X & C2) < C1 --> (X & C2) == 0
We already did (X & C2) > C1 --> (X & C2) != 0, if any bit set in (X & C2) will produce a result greater than C1. But there is an equivalent inverse condition with <= C1 (which will be canonicalized to < C1+1)

Differential Revision: https://reviews.llvm.org/D38065

llvm-svn: 313819
2017-09-20 21:18:17 +00:00
Craig Topper a0c897f634 [InstCombine] Use APInt::getActiveBits() to avoid creating an APInt from a trailing zero count to do a comparison. NFCI
llvm-svn: 313792
2017-09-20 18:49:29 +00:00
Quentin Colombet aa103b3d86 [InstCombine] Add select simplifications
In these cases, two selects have constant selectable operands for
both the true and false components and have the same conditional
expression.
We then create two arithmetic operations of the same type and feed a
final select operation using the result of the true arithmetic for the true
operand and the result of the false arithmetic for the false operand and reuse
the original conditionl expression.
The arithmetic operations are naturally folded as a consequence, leaving
only the newly formed select to replace the old arithmetic operation.

Patch by: Michael Berg <michael_c_berg@apple.com>
Differential Revision: https://reviews.llvm.org/D37019

llvm-svn: 313774
2017-09-20 17:32:16 +00:00
Craig Topper f264fcc704 [X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native shuffles.
I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates.

llvm-svn: 313450
2017-09-16 07:36:14 +00:00