Commit Graph

4946 Commits

JF Bastien 8fc9eea43a Test that volatile load type isn't changed
Summary: As discussed in D75505, it's not particularly useful to change the type of a load to/from floating-point/integer just because it's followed by a bitcast, and it might lead to surprising code generation. Check that this doesn't generally happen.

Reviewers: lebedev.ri

Subscribers: jkorous, dexonsmith, ributzka, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75644
2020-03-09 11:19:23 -07:00
Nikita Popov 45555c3819 [InstSimplify] Simplify calls with "returned" attribute
If a call argument has the "returned" attribute, we can simplify
the call to the value of that argument. The "-inst-simplify" pass
already handled the constant integer argument case via known bits
(invoked from SimplifyInstruction). However, non-constant (or
non-integer) arguments are not handled at all right now.
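
A minimal sketch of the new simplification in IR (function name hypothetical):

  declare i32 @pass(i32 returned)

  define i32 @test(i32 %x) {
    %r = call i32 @pass(i32 %x)
    ret i32 %r  ; %r now simplifies to %x
  }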

This addresses one of the regressions from D75801.

Differential Revision: https://reviews.llvm.org/D75815
2020-03-09 18:53:47 +01:00
Nikita Popov c3ca6876ed [InstCombine] Don't simplify calls without uses
When simplifying a call without uses, replaceInstUsesWith() is
going to do nothing, but we'll skip all following folds. We can
only run into this problem with calls that both simplify and are
not trivially dead if unused, which currently seems to happen only
with calls to undef, as the test diff shows. When extending
SimplifyCall() to handle "returned" attributes, this becomes a much
bigger problem, so I'm fixing this first.

Differential Revision: https://reviews.llvm.org/D75814
2020-03-09 18:47:46 +01:00
Nikita Popov 51a466a61f [InstCombine] Fix known bits handling in SimplifyDemandedUseBits
Fixes a regression from D75801. SimplifyDemandedUseBits() is also
supposed to compute the known bits (of the demanded subset) of the
instruction. For unknown instructions it does so by directly calling
computeKnownBits(). For known instructions it will compute known
bits itself. However, for instructions where only some cases are
handled directly (e.g. a constant shift amount) the known bits
invocation for the unhandled case is sometimes missing. This patch
adds the missing calls and thus removes the main discrepancy with
ExpensiveCombines mode.

Differential Revision: https://reviews.llvm.org/D75804
2020-03-07 18:16:41 +01:00
Nikita Popov f2419adc48 [InstCombine] Regenerate test checks; NFC 2020-03-07 17:58:33 +01:00
Nikita Popov 2904a332fe [InstCombine] Add additional known bits folding tests; NFC 2020-03-07 17:17:03 +01:00
Nikita Popov 4cfb4afb70 [InstCombine] Highlight tests using expensive combines; NFC 2020-03-07 17:16:47 +01:00
Sanjay Patel 89fdee87f7 [InstCombine] regenerate complete test checks; NFC 2020-03-07 10:20:38 -05:00
Sanjay Patel 564f5eed1a [InstCombine] add test for gep (select),... (PR45084); NFC 2020-03-07 10:00:31 -05:00
Roman Lebedev 1badf7c33a
[InstCombine] Forgo the one-use check in `(X - (X & Y)) --> (X & ~Y)` if Y is a constant
Summary:
This is potentially more friendly for further optimizations and
analyses, e.g.: https://godbolt.org/z/G24anE

This resolves a phase-ordering bug that was introduced
in D75145 for https://godbolt.org/z/2gBwF2
https://godbolt.org/z/XvgSua

Reviewers: spatel, nikic, dmgreen, xbolva00

Reviewed By: nikic, xbolva00

Subscribers: hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75757
2020-03-06 21:39:07 +03:00
Roman Lebedev 69ec84f8e7
[NFC][InstCombine] Add 'x - (x & y)' tests with multi-use 'and'
If %y is a constant, we could still perform the fold.
2020-03-06 19:41:19 +03:00
Nikita Popov 9b5de84e27 [InstCombine] Use IRBuilder to create bitcast
This makes sure that the constant expression bitcast goes through
target-dependent constant folding, and thus avoids an additional
iteration of InstCombine.
2020-03-04 18:28:38 +01:00
Roman Lebedev 9e1443e6f6
[NFC][InstCombine] Add test with non-CSE'd casts of load
In @t0 we can still change the type of the load and get rid of the casts.
2020-03-03 11:27:27 +03:00
Reid Kleckner 1adbe86d87 [WinEH] Fix inttoptr+phi optimization in presence of catchswitch
getFirstInsertionPt's return value must be checked for validity before
casting it to Instruction*. Don't attempt to insert casts after a phi in
a catchswitch block.

Fixes PR45033, introduced in D37832.

Reviewed By: davidxl, hfinkel

Differential Revision: https://reviews.llvm.org/D75381
2020-03-01 07:49:28 -08:00
Simon Pilgrim 7e9747b50b [X86][F16C] Remove cvtph2ps intrinsics and use generic half2float conversion (PR37554)
This removes everything but int_x86_avx512_mask_vcvtph2ps_512, which provides the SAE variant, but even this can use the generic fpext if the rounding control is the default.

Differential Revision: https://reviews.llvm.org/D75162
2020-02-29 18:57:35 +00:00
Nikita Popov 4ef272ec9c [InstCombine] DCE instructions earlier
When InstCombine initially populates the worklist, it already
performs constant folding and DCE. However, as the instructions
are initially visited in program order, this DCE can pick up only
the last instruction of a dead chain; the rest would only get
picked up in the main InstCombine run.

To avoid this, we instead perform the DCE in a separate pass over the
collected instructions in reverse order, which will allow us to
pick up full dead instruction chains. We already need to do this
reverse iteration anyway to populate the worklist, so this
shouldn't add extra cost.

This by itself only fixes a small part of the problem though:
The same basic issue also applies during the main InstCombine loop.
We generally always want DCE to occur as early as possible,
because it will allow one-use folds to happen. Address this by also
performing DCE while adding deferred instructions to the main worklist.

This drops the number of tests that perform more than 2 InstCombine
iterations from ~80 to ~40. There are some spurious test changes due
to operand order / icmp toggling.

Differential Revision: https://reviews.llvm.org/D75008
2020-02-27 18:45:59 +01:00
Simon Moll ddd11273d9 Remove BinaryOperator::CreateFNeg
Use UnaryOperator::CreateFNeg instead.

Summary:
With the introduction of the native fneg instruction, the
fsub -0.0, %x idiom is obsolete. This patch makes LLVM
emit fneg instead of the idiom in all places.
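
For illustration, the obsolete idiom next to its replacement (a sketch, not taken from the patch):

  %neg.old = fsub float -0.000000e+00, %x  ; old idiom
  %neg.new = fneg float %x                 ; what LLVM emits now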

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D75130
2020-02-27 09:06:03 -08:00
Simon Pilgrim 080890a9f3 [InstCombine] Add PR14365 test cases + vector equivalents. 2020-02-27 15:54:14 +00:00
Nikita Popov 56f7de5baa [InstCombine] Remove trivially empty ranges from end
InstCombine removes pairs of start+end intrinsics that don't
have anything in between them. Currently this is done by starting
at the start intrinsic and scanning forwards. This patch changes
it to start at the end intrinsic and scan backwards.
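
A minimal sketch of such a pair, using lifetime markers as an example (names hypothetical):

  call void @llvm.lifetime.start.p0i8(i64 8, i8* %p)
  call void @llvm.lifetime.end.p0i8(i64 8, i8* %p)
  ; nothing in between, so both calls can be removed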

The motivation here is as follows: When we process the start
intrinsic, we have not yet looked at the following instructions,
which may still get folded/removed. If they do, we will only be
able to remove the start/end pair on the next iteration. When we
process the end intrinsic, all the instructions before it have
already been visited, and we don't run into this problem.

Differential Revision: https://reviews.llvm.org/D75011
2020-02-26 20:04:11 +01:00
Roman Lebedev 2855c8fed9
[InstCombine] foldShiftIntoShiftInAnotherHandOfAndInICmp(): fix miscompile (PR44802)
Much like with reassociateShiftAmtsOfTwoSameDirectionShifts(),
as input, we have the following pattern:
  icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0
We want to rewrite that as:
  icmp eq/ne (and (x shift (Q+K)), y), 0  iff (Q+K) u< bitwidth(x)

While we know that originally (Q+K) would not overflow
(because 2 * (N-1) u<= iN - 1), we may have looked past extensions of
shift amounts, so it may now overflow in the smaller bitwidth.

To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow would happen, the (Q+K) u< bitwidth(x) check would be bogus.

https://bugs.llvm.org/show_bug.cgi?id=44802
2020-02-25 18:23:58 +03:00
Roman Lebedev 6f807ca00d
[NFC][InstCombine] Add shift amount reassociation in bittest miscompile example from PR44802
https://bugs.llvm.org/show_bug.cgi?id=44802
2020-02-25 18:23:58 +03:00
Roman Lebedev 781d077afb
[InstCombine] reassociateShiftAmtsOfTwoSameDirectionShifts(): fix miscompile (PR44802)
As input, we have the following pattern:
  Sh0 (Sh1 X, Q), K
We want to rewrite that as:
  Sh x, (Q+K)  iff (Q+K) u< bitwidth(x)
While we know that originally (Q+K) would not overflow
(because 2 * (N-1) u<= iN - 1), we may have looked past extensions of
shift amounts, so it may now overflow in the smaller bitwidth.

To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow would happen, the (Q+K) u< bitwidth(x) check would be bogus.

https://bugs.llvm.org/show_bug.cgi?id=44802
2020-02-25 18:23:51 +03:00
Roman Lebedev 425ef99938
[NFC][InstCombine] Add shift amount reassociation miscompile example from PR44802
https://bugs.llvm.org/show_bug.cgi?id=44802
2020-02-25 18:23:16 +03:00
Nuno Lopes 98ac6e7696 [NFC] fix test nan value 2020-02-23 12:42:47 +00:00
Krzysztof Parzyszek d2b7c09e79 [Hexagon] Simplify intrinsic (vandvrt (vandqrt q b) m) -> q if possible
When each byte in b&m is non-zero, this conversion Q->V->Q is a no-op.
2020-02-21 13:56:04 -06:00
Nikita Popov b178555318 [InstCombine] Improve simplify demanded bits worklist management
This fixes a small mistake from D72944: The worklist add should
happen before assigning the new operand, not after.

In case an actual replacement happens, the old operand needs to
be added for DCE. If no actual replacement happens, then old/new
are the same, so it doesn't matter.

This drops one iteration from the annotated test case.
2020-02-21 18:51:41 +01:00
Nikita Popov a8db806d52 [SimplifyLibCalls][IRBuilder] Accept any IRBuilder in SimplifyLibCalls
This changes the SimplifyLibCalls utility to accept an IRBuilderBase,
which allows us to pass through the IRBuilder used by InstCombine.
This will ensure that new instructions get added to the worklist.
The annotated test-case drops from 4 to 2 InstCombine iterations thanks
to this.

To achieve this, I'm adding an IRBuilderBase::OperandBundlesGuard,
which is basically the same as the existing InsertPointGuard and
FastMathFlagsGuard, but for operand bundles. Also add a
setDefaultOperandBundles() method so these can be set outside the
constructor.

Differential Revision: https://reviews.llvm.org/D74792
2020-02-21 18:26:05 +01:00
Nikita Popov f6875c434e Reapply [IRBuilder] Always respect inserter/folder
Some IRBuilder methods that were originally defined on
IRBuilderBase do not respect custom IRBuilder inserters/folders,
because those were not accessible prior to D73835. Fix this by
making use of existing (and now accessible) IRBuilder methods,
which will handle inserters/folders correctly.

There are some changes in OpenMP and Instrumentation tests, where
bitcasts now get constant folded. I've also highlighted one
InstCombine test which now finishes in two rather than three
iterations, thanks to new instructions being inserted into the
worklist.

Differential Revision: https://reviews.llvm.org/D74787
2020-02-19 20:51:38 +01:00
Nikita Popov b92b1701cd Revert "[IRBuilder] Always respect inserter/folder"
This reverts commit f12fb2d99b.

I missed some changes in instrumentation test cases.
2020-02-19 17:51:55 +01:00
Nikita Popov f12fb2d99b [IRBuilder] Always respect inserter/folder
Some IRBuilder methods that were originally defined on
IRBuilderBase do not respect custom IRBuilder inserters/folders,
because those were not accessible prior to D73835. Fix this by
making use of existing (and now accessible) IRBuilder methods,
which will handle inserters/folders correctly.

There are some changes in OpenMP tests, where bitcasts now get
constant folded. I've also highlighted one InstCombine test which
now finishes in two rather than three iterations, thanks to new
instructions being inserted into the worklist.

Differential Revision: https://reviews.llvm.org/D74787
2020-02-19 17:44:43 +01:00
Nikita Popov 1ab37fad61 [InstCombine] Fix worklist management when simplifying demanded bits
When simplifying demanded bits, we currently only report the
instruction on which SimplifyDemandedBits was called as changed.
However, this is a recursive call, and the actually modified
instruction will usually be further up the chain. Additionally,
all the intermediate instructions should also be revisited,
as additional combines may be possible after the demanded bits
simplification. We fix this by explicitly adding them back to the
worklist.

Differential Revision: https://reviews.llvm.org/D72944
2020-02-18 17:55:40 +01:00
Nikita Popov c9540fe59b [InstCombine] Fix multi-use handling in cttz transform
The select-of-cttz transform can currently duplicate cttz intrinsics
and zext/trunc ops. The cause is that it unnecessarily duplicates
the intrinsic and the zext/trunc when setting the "undef_on_zero"
flag to false. However, it's always legal to set the flag from true
to false, so we can make this replacement even if there are extra users.

Differential Revision: https://reviews.llvm.org/D74685
2020-02-18 17:55:00 +01:00
Nikita Popov 9adedd146d [InstCombine] Relax preconditions for ashr+and+icmp fold (PR44754)
Fix for https://bugs.llvm.org/show_bug.cgi?id=44754. We already have
a fold that converts icmp (and (ashr X, C3), C2), C1 into
icmp (and X, C2'), C1', but it imposed overly strict requirements on the
transform.

Relax this by checking that both C2 and C1 don't shift out bits
(in a signed sense) when forming the new constants.

Alive proofs (https://rise4fun.com/Alive/PTz0):

    Name: ashr_legal
    Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) == C1
    %a = ashr i16 %x, C3
    %b = and i16 %a, C2
    %c = icmp i16 %b, C1
    =>
    %d = and i16 %x, C2 << C3
    %c = icmp i16 %d, C1 << C3

    Name: ashr_shiftout_eq
    Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) != C1
    %a = ashr i16 %x, C3
    %b = and i16 %a, C2
    %c = icmp eq i16 %b, C1
    =>
    %c = false

Note that >> corresponds to ashr here. The case of an equality
comparison has some special handling in this transform, because
it will fold to a true/false result if the condition on the comparison
constant is violated.

Differential Revision: https://reviews.llvm.org/D74294
2020-02-18 17:49:46 +01:00
Nikita Popov 9bc6bc2d8c [InstCombine] Add more tests for icmp+and+ashr; NFC 2020-02-18 17:47:48 +01:00
Florian Hahn 6c85e92bcf [InstCombine] Simplify a umul overflow check to a != 0 && b != 0.
This patch adds a simplification if an OR weakens the overflow condition
for umul.with.overflow by treating any non-zero result as overflow. In that
case, we overflow if both umul.with.overflow operands are != 0, as in that
case the result can only be 0 if the multiplication overflows.

Code like this is generated by code using __builtin_mul_overflow with
negative integer constants, e.g.
   bool test(unsigned long long v, unsigned long long *res) {
     return __builtin_mul_overflow(v, -4775807LL, res);
   }

```
----------------------------------------
Name: D74141
  %res = umul_overflow {i8, i1} %a, %b
  %mul = extractvalue {i8, i1} %res, 0
  %overflow = extractvalue {i8, i1} %res, 1
  %cmp = icmp ne %mul, 0
  %ret = or i1 %overflow, %cmp
  ret i1 %ret
=>
  %t0 = icmp ne i8 %a, 0
  %t1 = icmp ne i8 %b, 0
  %ret = and i1 %t0, %t1
  ret i1 %ret
  %res = umul_overflow {i8, i1} %a, %b
  %mul = extractvalue {i8, i1} %res, 0
  %cmp = icmp ne %mul, 0
  %overflow = extractvalue {i8, i1} %res, 1

Done: 1
Optimization is correct!

```

Reviewers: nikic, lebedev.ri, spatel, Bigcheese, dexonsmith, aemerson

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D74141
2020-02-18 09:11:55 +01:00
Florian Hahn b0866f61c1 [InstCombine] Precommit umul.with.overflow sign check test.
Precommit tests for D74141.
2020-02-18 08:46:50 +01:00
Nikita Popov 6cdc36afb2 [InstCombine] Add multiuse tests for cttz transform; NFC
These show incorrect duplication of instructions.
2020-02-16 15:52:09 +01:00
Yuanfang Chen 4ad7685258 Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"""
This reverts commit 80a34ae311 with fixes.

Previously, bots turning on EXPENSIVE_CHECKS essentially turn on
MachineVerifierPass by default on X86, and since
inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll
are not expected to generate functioning machine code, they would go
down to `report_fatal_error` in MachineVerifierPass. Here we pass
`-verify-machineinstrs=0` to make the intent explicit.
2020-02-13 10:16:06 -08:00
Yuanfang Chen 17122ec10a Revert "Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`""""
This reverts commit bb51d24330.
2020-02-13 10:08:05 -08:00
Yuanfang Chen bb51d24330 Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"""
This reverts commit 80a34ae311 with fixes.

On bots llvm-clang-x86_64-expensive-checks-ubuntu and
llvm-clang-x86_64-expensive-checks-debian only,
llc returns 0 for these two tests unexpectedly. I tweaked the RUN line a little
bit in the hope that LIT is the culprit since this change is not in the
codepath these tests are testing.
llvm\test\CodeGen\X86\inline-asm-avx-v-constraint-32bit.ll
llvm\test\CodeGen\X86\inline-asm-avx512vl-v-constraint-32bit.ll
2020-02-13 10:02:53 -08:00
Yuanfang Chen 80a34ae311 Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`""
This reverts commits rGcd5b308b828e and rG8cedf0e2994c.

There are issues to be investigated for polly bots and bots turning on
EXPENSIVE_CHECKS.
2020-02-11 20:41:53 -08:00
Yuanfang Chen 8cedf0e299 Reland "[Support] make report_fatal_error `abort` instead of `exit`"
Summary:
Reland D67847 after D73742 is committed. Replace `sys::Process::Exit(1)`
with `abort` in `report_fatal_error`.

After this patch, for tools turning on `CrashRecoveryContext`,
crash handler installed by `CrashRecoveryContext` is called unless
they installed a non-returning handler using `llvm::install_fatal_error_handler`
like `cc1_main` currently does.

Reviewers: rnk, MaskRay, aganea, hans, espindola, jhenderson

Subscribers: jholewinski, qcolombet, dschuff, jyknight, emaste, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, rupprecht, jocewei, jsji, Jim, dmgreen, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74456
2020-02-11 18:20:40 -08:00
Sanjay Patel 62ce7e650a [InstCombine] fix use check when canonicalizing abs/nabs
We were checking for extra uses of the negated operand even
if we were not going to create it as part of this canonicalization.

This was showing up as a regression when we limit EarlyCSE as
proposed in D74285.
2020-02-10 14:57:37 -05:00
Sanjay Patel 93073e52b1 [InstCombine] add tests for abs with extra use of operand; NFC 2020-02-10 14:57:37 -05:00
David Stenberg 982944525c Revert "[InstCombine][DebugInfo] Fold constants wrapped in metadata"
This reverts commit b54a8ec1bc.

The commit triggered a debug invariance failure (different output with/without
-g). The patch seems to have exposed a pre-existing invariance problem
in GlobalOpt, which I'll write a bug report for.
2020-02-10 17:58:33 +01:00
George Burgess IV f8c9ceb1ce [SimplifyLibCalls] Add __strlen_chk.
Bionic has had `__strlen_chk` for a while. Optimizing that into a
constant is quite profitable, when possible.
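
A hedged sketch of the resulting fold (the global and sizes are hypothetical):

  @.str = private constant [6 x i8] c"hello\00"

  %len = call i64 @__strlen_chk(i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.str, i64 0, i64 0), i64 6)
  ; with a constant string and known object size, this folds to i64 5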

Differential Revision: https://reviews.llvm.org/D74079
2020-02-08 11:51:00 -08:00
Nikita Popov a148b9e990 [InstCombine] Fix infinite min/max canonicalization loop (PR44541)
While D72944 also fixes https://bugs.llvm.org/show_bug.cgi?id=44541,
it does so in a more roundabout manner and there might be other
loopholes to trigger the same issue. This is a more direct fix
that prevents the transform if the min/max is based on a
non-canonical sub X, 0 instruction.

Differential Revision: https://reviews.llvm.org/D73849
2020-02-08 20:42:17 +01:00
Nikita Popov d4627b90a0 [InstCombine] Avoid modifying instructions in-place
As discussed on D73919, this replaces a few cases where we were
modifying multiple operands of instructions in-place with the
creation of a new instruction, which we generally prefer nowadays.

This tends to be more readable and less prone to worklist management
bugs.

Test changes are only superficial (instruction naming and order).
2020-02-08 17:05:56 +01:00
Nikita Popov 23db9724d0 [InstCombine] Fix infinite loop in min/max load/store bitcast combine (PR44835)
Fixes https://bugs.llvm.org/show_bug.cgi?id=44835. Skip the transform
if it wouldn't actually do anything (apart from removing and reinserting
the same instructions).

Note that the test case doesn't loop on current master anymore, only
on the LLVM 10 release branch. The issue is already mitigated on master
due to worklist order fixes, but we should fix the root cause there as well.

As a side note, we should probably assert in combineLoadToNewType()
that it does not combine to the same type. Not doing this here, because
this assertion would also be triggered in another place right now.

Differential Revision: https://reviews.llvm.org/D74278
2020-02-08 16:55:22 +01:00
Florian Hahn 14ef87bda6 [ValueTracking] usub(a, b) cannot overflow if a >= b.
If we know that a >= b (unsigned), usub.with.overflow(a, b) cannot
overflow. Similarly, if b > a, the same expression overflows.
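
A minimal IR sketch (assuming %a uge %b is known from context):

  %s = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 %a, i8 %b)
  %ov = extractvalue { i8, i1 } %s, 1
  ; %ov folds to false here; if %b ugt %a were known instead, it would fold to true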

Reviewers: nikic, RKSimon, lebedev.ri, spatel

Reviewed By: nikic, Gerolf

Differential Revision: https://reviews.llvm.org/D74066
2020-02-07 10:41:18 +00:00
Florian Hahn 89ca4b9ef2 [InstCombine] Precommit usub.with.overflow test for D74066. 2020-02-07 10:30:28 +00:00
Sanjay Patel 2a191cf850 [InstCombine] add more splat tests with undef elements; NFC 2020-02-04 09:13:08 -05:00
Sanjay Patel 5d04e008f7 [InstCombine] add splat tests with undef elements; NFC 2020-02-04 07:59:12 -05:00
Sanjay Patel 0cf0be993c [InstCombine] fix operands of shouldChangeType() for casted phi transform
This is a bug noted in the recent D72733 and seen
in the similar transform just above the changed source code.

I added tests with illegal types and zexts to show the bug -
we could transform legal phi ops to illegal, etc. I did not add
tests with trunc because we won't see any diffs on those patterns.
That is because InstCombiner::SliceUpIllegalIntegerPHI() appears to
do those transforms independently of datalayout. It can also create
more casts than are present in existing code.

There are some existing regression tests that do not include a
datalayout that would be altered by this fix. I assumed that the
lack of a datalayout in those regression files is an oversight, so
I added the minimal layout (make i32 legal) necessary to preserve
behavior on those tests.

Differential Revision: https://reviews.llvm.org/D73907
2020-02-04 07:45:48 -05:00
Sanjay Patel b2e884bee7 [InstCombine] add tests for casted phi; NFC 2020-02-03 11:54:47 -05:00
Sanjay Patel 5c2e6207b7 [InstCombine] regenerate complete test checks; NFC 2020-02-03 10:30:26 -05:00
Sanjay Patel e78fb556c5 [InstCombine] reassociate splatted vector ops
bo (splat X), (bo Y, OtherOp) --> bo (splat (bo X, Y)), OtherOp

This patch depends on the splat analysis enhancement in D73549.
See the test with comment:
; Negative test - mismatched splat elements
...as the motivation for that first patch.

The motivating case for reassociating splatted ops is shown in PR42174:
https://bugs.llvm.org/show_bug.cgi?id=42174

In that example, a slight change in order-of-associative math results
in a big difference in IR and codegen. This patch gets all of the
unnecessary shuffles out of the way, but doesn't address the potential
scalarization (see D50992 or D73480 for that).

Differential Revision: https://reviews.llvm.org/D73703
2020-02-03 09:08:36 -05:00
Simon Pilgrim 105e5c940c [ValueTracking] Add DemandedElts support to computeKnownBits/ComputeNumSignBits (PR36319)
This patch adds initial support for a DemandedElts mask to the internal computeKnownBits/ComputeNumSignBits methods, matching the SelectionDAG and GlobalISel equivalents.

So far only a couple of instructions have been setup to handle the DemandedElts, the remainder still using the existing 'all elements' default. The plan is to extend support as we have test coverage.

Differential Revision: https://reviews.llvm.org/D73435
2020-02-01 12:45:46 +00:00
Nikita Popov ff17da3f75 [InstCombine] Push negation through multiply (PR44234)
Fixes https://bugs.llvm.org/show_bug.cgi?id=44234 by adding
multiply support to freelyNegateValue(). Only one of the operands
needs to be negatable, so this still fits within the framework.
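
A minimal sketch of the multiply case (assuming %x is the freely negatable operand):

  %m = mul i32 %x, %y
  %r = sub i32 0, %m
  ; -->
  %x.neg = sub i32 0, %x  ; folded further when %x is freely negatable
  %r = mul i32 %x.neg, %y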

Differential Revision: https://reviews.llvm.org/D73410
2020-01-31 20:58:55 +01:00
David Stenberg b54a8ec1bc [InstCombine][DebugInfo] Fold constants wrapped in metadata
Summary:
When constant folding, constants that are wrapped in metadata were not
folded. This could lead to dbg.values being the only user of a constant
expression, due to the non-dbg uses having been rewritten, resulting in
the constant later on being removed by some other pass. This occurred
with the attached test case, in which the non-rewritten GEP in the
dbg.value intrinsic was later on removed by globalopt.

This patch makes the code look through metadata and fold such constants.

I guess that in the future we may want to allow dbg.values using GEPs and
other constant expressions to be emittable even if there are no non-dbg
uses, but for example SelectionDAG does not support that.

Reviewers: jmorse, aprantl, vsk, davide

Reviewed By: aprantl, vsk, davide

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D73630
2020-01-30 15:50:16 +01:00
Piotr Sobczak dd7148822b [InstCombine][AMDGPU] Trim components of s_buffer_load
Summary:
Add trimming of unused components of s_buffer_load.

For s_buffer_load and unformatted buffer_load also trim unused
components at the beginning of the vector and update the offset accordingly.

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71785
2020-01-30 10:48:25 +01:00
Nikita Popov 8058196677 [InstCombine] Process newly inserted instructions in the correct order
InstCombine operates on the basic premise that the operands of the
currently processed instruction have already been simplified. It
achieves this by pushing instructions to the worklist in reverse
program order, so that instructions are popped off in program order.
The worklist management in the main combining loop also makes sure
to uphold this invariant.

However, the same is not true for all the code that is performing
manual worklist management. The largest problem (addressed in this
patch) is instructions inserted by InstCombine's IRBuilder. These
will be pushed onto the worklist in order of insertion (generally
matching program order), which means that a) the users of the
original instruction will be visited first, as they are pushed later
in the main loop and b) the newly inserted instructions will be
visited in reverse program order.

This causes a number of problems: First, folds operate on instructions
that have not had their operands simplified, which may result in
optimizations being missed (ran into this in
https://reviews.llvm.org/D72048#1800424, which was the original
motivation for this patch). Additionally, this increases the amount
of folds InstCombine has to perform, both within one iteration, and
by increasing the number of total iterations.

This patch addresses the issue by adding a Worklist.AddDeferred()
method, which is used for instructions inserted by IRBuilder. These
will only be added to the real worklist after the combine finished,
and in reverse order, so they will end up processed in program order.
I should note that the same should also be done to nearly all other
uses of Worklist.Add(), but I'm starting with just this occurrence,
which has by far the largest test fallout.

Most of the test changes are due to
https://bugs.llvm.org/show_bug.cgi?id=44521 or other cases where
we don't canonicalize something. These are neutral. One regression
has been addressed in D73575 and D73647. The remaining regression
in an shl+sdiv fold can't really be fixed without dropping another
transform, but does not seem particularly problematic in the first
place.

Differential Revision: https://reviews.llvm.org/D73411
2020-01-30 09:40:10 +01:00
Sanjay Patel 89195638bf [InstCombine] add splat binop tests; NFC 2020-01-29 15:38:03 -05:00
Nikita Popov e086e23024 [InstCombine] Support non-splat vectors in icmp eq + add/sub fold
For the

    icmp eq (add X, C1), C2 => icmp eq X, C2-C1
    icmp eq (sub C1, X), C2 => icmp eq X, C1-C2

folds, this allows C1 to be non-splat and contain undefs.
C2 is still splat, due to the structure of the code.

This is to address the remaining part of the regression in D73411,
where demanded element analysis replaces some elements with undef.

Differential Revision: https://reviews.llvm.org/D73647
2020-01-29 20:56:58 +01:00
Nikita Popov 5171587a5f [InstCombine] Add undef/non-splat tests for add/sub + icmp eq; NFC 2020-01-29 20:56:58 +01:00
Nikita Popov 6a74641e72 [InstCombine] Regenerate test checks; NFC 2020-01-29 18:22:07 +01:00
Sanjay Patel 87f6314f8c [InstCombine] canonicalize splat shuffle after cmp
cmp (splat V1, M), SplatC --> splat (cmp V1, SplatC'), M

As discussed in PR44588:
https://bugs.llvm.org/show_bug.cgi?id=44588
...we try harder to push shuffles after binops than after compares.

This patch handles the special (but presumably most common) case of
splat shuffles. If both operands are splats, then we can do the
comparison on the non-splat inputs followed by splat of the compare.
That should take care of the regression noted in D73411.

There's another potential fold requested in PR37463 to scalarize the
compare, but that's another patch (and it's not clear if we can do
that without the ability to undo it later):
https://bugs.llvm.org/show_bug.cgi?id=37463

Differential Revision: https://reviews.llvm.org/D73575
2020-01-29 08:34:29 -05:00
Sanjay Patel 276a6b8889 [InstCombine] add tests for cmp with splat operand and splat constant; NFC
See PR44588:
https://bugs.llvm.org/show_bug.cgi?id=44588
2020-01-28 13:42:20 -05:00
Sanjay Patel 747242af8d [InstCombine] allow more narrowing of casted select
D47163 created a rule that we should not change the casted
type of a select when we have matching types in its compare condition.
That was intended to help vector codegen, but it also could create
situations where we miss subsequent folds as shown in PR44545:
https://bugs.llvm.org/show_bug.cgi?id=44545

By using shouldChangeType(), we can continue to get the vector folds
(because we always return false for vector types). But we also solve
the motivating bug because it's ok to narrow the scalar select in that
example.

Our canonicalization rules around select are a mess, but AFAICT, this
will not induce any infinite looping from the reverse transform (but
we'll need to watch for that possibility if committed).

Side note: there's a similar use of shouldChangeType() for phi ops
just below this diff, and the source and destination types appear to
be reversed.

Differential Revision: https://reviews.llvm.org/D72733
2020-01-27 16:35:50 -05:00
Sanjay Patel 242fed9d7f [InstCombine] convert fsub nsz with fneg operand to -(X + Y)
This was noted in D72521 - we need to match fneg specifically to
consistently handle that pattern along with (-0.0 - X).
2020-01-27 14:49:15 -05:00
Nikita Popov bcfa0f592f [InstCombine] Move negation handling into freelyNegateValue()
Followup to D72978. This moves existing negation handling in
InstCombine into freelyNegateValue(), which makes it composable.
In particular, root negations of div/zext/sext/ashr/lshr/sub can
now always be performed through a shl/trunc as well.

Differential Revision: https://reviews.llvm.org/D73288
2020-01-27 20:46:23 +01:00
Nikita Popov 0957748cb7 [InstCombine] Add more negation tests; NFC
Additional test cases for pushing negations through various
instructions.
2020-01-27 20:46:23 +01:00
Simon Pilgrim f99ef5455a [InstCombine] Add extra shift(c1,add(c2,y)) tests for PR15141 2020-01-26 19:04:12 +00:00
Guillaume Chatelet cc034a5883 [IR] masked gather/scatter alignment should be set
Summary: masked_load and masked_store instructions require the alignment to be specified and a power of two. It seems to me that this requirement applies to masked_gather and masked_scatter as well.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73179
2020-01-26 18:51:36 +01:00
Nikita Popov 0b83c5a78f [InstCombine] Combine neg of shl of sub (PR44529)
Fixes https://bugs.llvm.org/show_bug.cgi?id=44529. We already have
a combine to sink a negation through a left-shift, but it currently
only works if the shift operand is negatable without creating any
instructions. This patch introduces freelyNegateValue() as a more
powerful extension of dyn_castNegVal(), which allows negating a
value as long as this doesn't end up increasing instruction count.
Specifically, this patch adds support for negating A-B to B-A.
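
For illustration, a sketch of the A-B to B-A case under a negated shift:

  %sub = sub i32 %a, %b
  %shl = shl i32 %sub, %c
  %neg = sub i32 0, %shl
  ; -->
  %sub.neg = sub i32 %b, %a
  %neg = shl i32 %sub.neg, %c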

This mechanism could in the future be extended to handle general
negation chains that a) start at a proper 0-X negation and b) only
require one operand to be freely negatable. This would end up as a
weaker form of D68408 aimed at the most obviously profitable subset
that eliminates a negation entirely.

Differential Revision: https://reviews.llvm.org/D72978
2020-01-22 23:03:58 +01:00
Nikita Popov 80c34f94ac [InstCombine] Add test for PR44529; NFC 2020-01-22 23:03:58 +01:00
Sanjay Patel 0ade2abdb0 [InstCombine] fneg(X + C) --> -C - X
This is one of the potential folds uncovered by extending D72521.
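
A minimal IR sketch (constant chosen for illustration):

  %a = fadd float %x, 42.0
  %r = fneg float %a
  ; -->
  %r = fsub float -42.0, %x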

We don't seem to do this in the backend either (unless I'm not
seeing some target-specific transform).

icc and gcc (where it appears to be target-specific) do this transform.

Differential Revision: https://reviews.llvm.org/D73057
2020-01-22 09:48:43 -05:00
Sanjay Patel c0f53ed806 [InstCombine] add tests for fneg+fadd; NFC 2020-01-22 08:59:28 -05:00
Sanjay Patel 7bee94410c [InstCombine] form copysign from select of FP constants (PR44153)
This should be the last step needed to solve the problem in the
description of PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153

If we're casting an FP value to int, testing its signbit, and then
choosing between a value and its negated value, that's a
complicated way of saying "copysign":

(bitcast X) <  0 ? -TC :  TC --> copysign(TC,  X)
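
For illustration, a sketch of the matched pattern (constant hypothetical):

  %i = bitcast float %x to i32
  %isneg = icmp slt i32 %i, 0
  %r = select i1 %isneg, float -42.0, float 42.0
  ; -->
  %r = call float @llvm.copysign.f32(float 42.0, float %x)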

Differential Revision: https://reviews.llvm.org/D72643
2020-01-20 10:51:14 -05:00
Matt Arsenault a4451d88ee Consolidate internal denormal flushing controls
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attributes are now emitted. A future change will
switch this and remove the subtarget features.
2020-01-17 20:09:53 -05:00
Nikita Popov 522c030aa9 [InstCombine] Fix worklist management in DSE (PR44552)
Fixes https://bugs.llvm.org/show_bug.cgi?id=44552. We need to make
sure that the store is reprocessed, because performing DSE may
expose more DSE opportunities.

There is a slight caveat here though: We need to make sure that we
add the store back to the worklist first, because that means it will
be processed after the operands of the removed store have been
processed. This is a general bug in InstCombine worklist management
that I hope to address at some point, but for now it means we need
to do this manually rather than just returning the instruction as
changed.

Differential Revision: https://reviews.llvm.org/D72807
2020-01-17 18:10:56 +01:00
Nikita Popov 77befe54f7 [InstCombine] Fix worklist management in return combine
There are two related bugs here: First, we don't add the operand
we're replacing to the worklist, which means it may not get DCEd
(see test change). Second, usually this would just get picked up
in the next iteration, but we also do not report the instruction
as changed. This means that we do not get that extra instcombine
iteration, and more importantly, may break the pass pipeline, as
the function is not marked as changed.

Differential Revision: https://reviews.llvm.org/D72864
2020-01-17 17:59:23 +01:00
Nikita Popov 10d0e2882b [InstCombine] Split assume test in expensive and not; NFC
The IR difference in @icmp1 serves as a test for D72864.
2020-01-17 17:57:59 +01:00
Nikita Popov 2ca092f320 [InstCombine] Support disabling expensive combines in opt
Currently, there is no way to disable ExpensiveCombines when doing
a standalone opt -instcombine run, as that's the default, and the
opt option can currently only be used to force enable, not to force
disable. The only way to disable expensive combines is via -O1 or -O2,
but that of course also runs the rest of the kitchen sink...

This patch allows using opt -instcombine -expensive-combines=0 to
run InstCombine without ExpensiveCombines.

Differential Revision: https://reviews.llvm.org/D72861
2020-01-17 17:56:20 +01:00
Nikita Popov 2d0d4235a2 [InstCombine] Add test for -expensive-combines option; NFC
This shows that -expensive-combines=0 is ignored.
2020-01-17 17:56:20 +01:00
Matt Arsenault 3ef8cdf666 AMDGPU: Do permlane16 vdst_in discard optimization in InstCombine
There's more potential value to discarding the source value earlier,
since we always know the value of the fi/bc bits.
2020-01-16 17:27:53 -05:00
Florian Hahn 0b21d55262 [IR] Mark memset.* intrinsics as IntrWriteMem.
llvm.memset intrinsics only write memory, but are missing
IntrWriteMem, so doesNotReadMemory() returns false for them.

The test change is due to the test checking the fn attribute ids at the
call sites, which got bumped up due to a new combination with writeonly
appearing in the test file.

Reviewers: jdoerfert, reames, efriedma, nlopes, lebedev.ri

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D72789
2020-01-16 10:35:46 +00:00
Yuanfang Chen 6e24c6037f Revert "[Support] make report_fatal_error `abort` instead of `exit`"
This reverts commit 647c3f4e47.

Got bots failure from sanitizer-windows and maybe others.
2020-01-15 17:52:25 -08:00
Yuanfang Chen 647c3f4e47 [Support] make report_fatal_error `abort` instead of `exit`
Summary:
This patch could be treated as a rebase of D33960. It also fixes PR35547.
A fix for `llvm/test/Other/close-stderr.ll` is proposed in D68164. Seems
the consensus is that the test is passing by chance and I'm not
sure how important it is for us. So it is removed like in D33960 for now.
The rest of the test fixes are just adding `--crash` flag to `not` tool.

** The reason it fixes PR35547 is:

`exit` does cleanup, including calling class destructors, whereas `abort`
does not do any cleanup. In a multithreading environment such as ThinLTO or JIT,
threads may share state, most of which is ManagedStatic<>. If a faulting thread
tears down a class while another thread is using it, there is a chance of
memory corruption. This is bad: 1. it will stop error reporting like the pretty
stack printer; 2. the memory corruption is distracting and nondeterministic in
terms of error message and corruption type (depending on the timing, it
could be a double free, heap use-after-free, etc.).

Reviewers: rnk, chandlerc, zturner, sepavloff, MaskRay, espindola

Reviewed By: rnk, MaskRay

Subscribers: wuzish, jholewinski, qcolombet, dschuff, jyknight, emaste, sdardis, nemanjai, jvesely, nhaehnle, sbc100, arichardson, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, lenary, s.egerton, pzheng, cfe-commits, MaskRay, filcab, davide, MatzeB, mehdi_amini, hiraditya, steven_wu, dexonsmith, rupprecht, seiya, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D67847
2020-01-15 17:05:13 -08:00
Sanjay Patel 3180af4362 [InstCombine] reassociate fsub+fsub into fsub+fadd
As discussed in the motivating PR44509:
https://bugs.llvm.org/show_bug.cgi?id=44509

...we can end up with worse code using fast-math than without.
This is because the reassociate pass greedily transforms fsub
into fneg/fadd and apparently (based on the regression tests
seen here) expects instcombine to clean that up if it wasn't
profitable. But we were missing this fold:

(X - Y) - Z --> X - (Y + Z)
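
A minimal IR sketch (reassociation needs the appropriate fast-math flags):

  %t = fsub fast float %x, %y
  %r = fsub fast float %t, %z
  ; -->
  %yz = fadd fast float %y, %z
  %r = fsub fast float %x, %yz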

There's another, more specific case that I think we should
handle as shown in the "fake" fneg test (but missed with a real
fneg), but that's another patch. That may be tricky to get
right without conflicting with existing transforms for fneg.

Differential Revision: https://reviews.llvm.org/D72521
2020-01-15 11:14:13 -05:00
Nikita Popov 04e586151e [InstCombine] Fix worklist management when removing guard intrinsic
When multiple guard intrinsics are merged into one, currently the
result of eraseInstFromFunction() is returned -- however, this
should only be done if the current instruction is being removed.
In this case we're removing a different instruction and should
instead report that the current one has been modified by returning it.

For this test case, this reduces the number of instcombine iterations
from 5 to 2 (the minimum possible).

Differential Revision: https://reviews.llvm.org/D72558
2020-01-14 21:47:48 +01:00
Nikita Popov 65c0805be5 [InstCombine] Fix infinite loop due to bitcast <-> phi transforms
Fix for https://bugs.llvm.org/show_bug.cgi?id=44245.

The optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up
fighting against each other, because optimizeBitCastFromPhi()
assumes that bitcasts of loads will get folded. This doesn't
happen here, because a dangling phi node prevents the one-use
fold in https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering.

This patch fixes the issue by explicitly performing the load
combine as part of the bitcast of phi transform. Other attempts
to force the load to be combined first were ultimately too
unreliable.

Differential Revision: https://reviews.llvm.org/D71164
2020-01-14 20:45:13 +01:00
Nikita Popov 652cd7c100 [InstCombine] Fix user iterator invalidation in bitcast of phi transform
This fixes the issue encountered in D71164. Instead of using a
range-based for, manually iterate over the users and advance the
iterator beforehand, so we do not skip any users due to iterator
invalidation.

Differential Revision: https://reviews.llvm.org/D72657
2020-01-14 20:38:10 +01:00
Nikita Popov fa63234093 [InstCombine] Add test for iterator invalidation bug; NFC 2020-01-14 20:38:10 +01:00
Sanjay Patel 57cb468514 [InstCombine] add test for possible cast-of-select transform; NFC 2020-01-14 14:23:14 -05:00
Juneyoung Lee 3e32b7e127 [InstCombine] Let combineLoadToNewType preserve ABI alignment of the load (PR44543)
Summary:
If the alignment on a `LoadInst` isn't specified, the load is assumed to be ABI-aligned.
And said alignment may be different for different types.
So if we change the load type, but don't pay extra attention to the alignment
(i.e. keep it unspecified), we may either overpromise (if the default alignment
of the new type is higher), or underpromise (if the default alignment
of the new type is smaller).

Thus, if no alignment is specified, we need to manually preserve the implied ABI alignment.

This addresses https://bugs.llvm.org/show_bug.cgi?id=44543 by making combineLoadToNewType preserve ABI alignment of the load.

Reviewers: spatel, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72710
2020-01-15 03:20:53 +09:00
Juneyoung Lee 0877843dda [test] Make data layout of load-bitcast64.ll explicit, use update_test_checks.py 2020-01-15 02:49:44 +09:00
Sanjay Patel cfe2fab708 [InstSimplify] add tests for vector select; NFC 2020-01-14 08:41:06 -05:00
Sanjay Patel 80a094e134 [InstCombine] add FMF to tests for more coverage; NFC 2020-01-13 16:29:20 -05:00
Sanjay Patel 69f4cea413 [InstCombine] add tests for select --> copysign; NFC
This is testing for another (possibly final) transform suggested in:
https://bugs.llvm.org/show_bug.cgi?id=44153
2020-01-13 15:39:24 -05:00
Sanjay Patel 26d2ace9e2 [InstSimplify] move tests for select from InstCombine; NFC
InstCombine has transforms that would enable these simplifications
in an indirect way, but those transforms are unsafe and likely to
be removed.
2020-01-13 09:13:21 -05:00
Nikita Popov 0e322c8a1f [InstCombine] Preserve nuw on sub of geps (PR44419)
Fix https://bugs.llvm.org/show_bug.cgi?id=44419 by preserving the
nuw on sub of geps. We only do this if the offset has a multiplication
as the final operation, as we can't be sure the operation is nuw
in the other cases without more thorough analysis.
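
A hedged sketch of the multiply case (element type chosen so the offset ends in a multiply; names hypothetical):

  %gep = getelementptr inbounds i32, i32* %p, i64 %idx
  %gi = ptrtoint i32* %gep to i64
  %pi = ptrtoint i32* %p to i64
  %d = sub nuw i64 %gi, %pi
  ; -->
  %d = mul nuw i64 %idx, 4  ; the final multiply keeps the nuw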

Differential Revision: https://reviews.llvm.org/D72048
2020-01-11 11:01:12 +01:00
Sanjay Patel 26cdaeb1f0 [InstCombine] add tests for fsub; NFC
Conflicting/missing canonicalizations are visible in PR44509:
https://bugs.llvm.org/show_bug.cgi?id=44509
2020-01-10 12:02:43 -05:00
@raghesh (Raghesh Aloor) 6c04ef472a [InstCombine] Z / (1.0 / Y) => (Y * Z)
This is a special case of Z / (X / Y) => (Y * Z) / X, with X = 1.0.
The m_OneUse check is avoided because even in the case of
multiple uses of 1.0/Y, the number of instructions remains the same
and a division is replaced by a multiplication.

Differential Revision: https://reviews.llvm.org/D72319
2020-01-09 10:52:39 -05:00
Sanjay Patel 032a9393a7 [InstCombine] Use minimal FMF in testcase for Z / (1.0 / Y) => (Y * Z); NFC
Patch by: @raghesh (Raghesh Aloor)

Differential Revision: https://reviews.llvm.org/D72431
2020-01-09 08:21:38 -05:00
Sanjay Patel 5dfd52398f [InstCombine] Adding testcase for Z / (1.0 / Y) => (Y * Z); NFC
The added testcase shows the current transformation for the operation
Z / (1.0 / Y), which remains unchanged. This will be updated to align
with the transformed code (Y * Z) with D72319.

The existing transformation Z / (X / Y) => (Y * Z) / X does not handle
this case as there are multiple uses for (1.0 / Y) in this testcase.

Patch by: @raghesh (Raghesh Aloor)

Differential Revision: https://reviews.llvm.org/D72388
2020-01-08 10:33:44 -05:00
Kadir Cetinkaya b212eb7159
Revert "[InstCombine] fold zext of masked bit set/clear"
This reverts commit a041c4ec6f.

This looks like a non-trivial change and there has been no code
review (at least there were no phabricator revisions attached to the
commit description). It is also causing a regression in one of our
downstream integration tests, we haven't been able to come up with a
minimal reproducer yet.
2020-01-08 11:21:21 +01:00
Sanjay Patel f8962571f7 [InstCombine] try to pull 'not' of select into compare operands
not (select ?, (cmp TPred, ?, ?), (cmp FPred, ?, ?)) -->
     select ?, (cmp TPred', ?, ?), (cmp FPred', ?, ?)

If both sides of the select are cmps, we can remove an instruction.
The case where only one side is a cmp is deferred to a possible
follow-on patch.

We have a more general 'isFreeToInvert' analysis, but I'm not seeing
a way to use that more widely without inducing infinite looping
(opposing transforms).
Here, we flip the compare predicates directly, so we should not have
any danger by creating extra intermediate 'not' ops.

Alive proofs:
https://rise4fun.com/Alive/jKa

Name: both select values are compares - invert predicates
  %tcmp = icmp sle i32 %x, %y
  %fcmp = icmp ugt i32 %z, %w
  %sel = select i1 %cond, i1 %tcmp, i1 %fcmp
  %not = xor i1 %sel, true
=>
  %tcmp_not = icmp sgt i32 %x, %y
  %fcmp_not = icmp ule i32 %z, %w
  %not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not

Name: false val is compare - invert/not
  %fcmp = icmp ugt i32 %z, %w
  %sel = select i1 %cond, i1 %tcmp, i1 %fcmp
  %not = xor i1 %sel, true
=>
  %tcmp_not = xor i1 %tcmp, -1
  %fcmp_not = icmp ule i32 %z, %w
  %not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not

Differential Revision: https://reviews.llvm.org/D72007
2020-01-07 10:44:23 -05:00
Roman Lebedev 772ede3d5d
[InstCombine] Sink sub into hands of select if one hand becomes zero. Part 2 (PR44426)
This decreases the use count of %Op0, makes one hand of the select 0,
and possibly exposes further folding potential.

Name: sub %Op0, (select %Cond, %Op0, %FalseVal) -> select %Cond, 0, (sub %Op0, %FalseVal)
  %Op0 = %TrueVal
  %o = select i1 %Cond, i8 %Op0, i8 %FalseVal
  %r = sub i8 %Op0, %o
=>
  %n = sub i8 %Op0, %FalseVal
  %r = select i1 %Cond, i8 0, i8 %n

Name: sub %Op0, (select %Cond, %TrueVal, %Op0) -> select %Cond, (sub %Op0, %TrueVal), 0
  %Op0 = %FalseVal
  %o = select i1 %Cond, i8 %TrueVal, i8 %Op0
  %r = sub i8 %Op0, %o
=>
  %n = sub i8 %Op0, %TrueVal
  %r = select i1 %Cond, i8 %n, i8 0

https://rise4fun.com/Alive/aHRt

https://bugs.llvm.org/show_bug.cgi?id=44426
2020-01-04 17:30:51 +03:00
Roman Lebedev d2b79c76be
[NFC][InstCombine] 'subtract from one hands of select' pattern tests (PR44426)
https://bugs.llvm.org/show_bug.cgi?id=44426
2020-01-04 17:30:51 +03:00
Roman Lebedev 4d8e47ca18
[InstCombine] Sink sub into hands of select if one hand becomes zero (PR44426)
This decreases the use count of %Op1, makes one hand of the select 0,
and possibly exposes further folding potential.

Name: sub (select %Cond, %Op1, %FalseVal), %Op1 -> select %Cond, 0, (sub %FalseVal, %Op1)
  %Op1 = %TrueVal
  %o = select i1 %Cond, i8 %Op1, i8 %FalseVal
  %r = sub i8 %o, %Op1
=>
  %n = sub i8 %FalseVal, %Op1
  %r = select i1 %Cond, i8 0, i8 %n

Name: sub (select %Cond, %TrueVal, %Op1), %Op1 -> select %Cond, (sub %TrueVal, %Op1), 0
  %Op1 = %FalseVal
  %o = select i1 %Cond, i8 %TrueVal, i8 %Op1
  %r = sub i8 %o, %Op1
=>
  %n = sub i8 %TrueVal, %Op1
  %r = select i1 %Cond, i8 %n, i8 0

https://rise4fun.com/Alive/avL

https://bugs.llvm.org/show_bug.cgi?id=44426
2020-01-04 17:30:51 +03:00
Roman Lebedev 83aa0b6734
[NFC][InstCombine] 'subtract of one hands of select' pattern tests (PR44426)
https://bugs.llvm.org/show_bug.cgi?id=44426
2020-01-04 17:30:51 +03:00
Roman Lebedev 7973aa05f6
[InstCombine] '(Op1 & C) - Op1' -> '-(Op1 & ~C)' fold (PR44427)
This decreases the use count of Op1, potentially allows
us to further hoist said 'neg' later on,
and results in marginally better X86 codegen.

Name: (Op1 & C) - Op1 -> -(Op1 & ~C)
  %o = and i64 %Op1, C1
  %r = sub i64 %o, %Op1
=>
  %n = and i64 %Op1, ~C1
  %r = sub i64 0, %n

https://rise4fun.com/Alive/rwgA

https://godbolt.org/z/R_RMfM

https://bugs.llvm.org/show_bug.cgi?id=44427
2020-01-03 21:25:48 +03:00
Roman Lebedev 6f922dbbea
[NFC][InstCombine] '(Op1 & C) - Op1' pattern tests (PR44427) 2020-01-03 21:25:48 +03:00
Roman Lebedev 9b750cc6ba
[NFC][InstCombine] Autogenerate and2.ll checklines 2020-01-03 21:25:48 +03:00
Roman Lebedev cc0216bedb
[InstCombine] '(X & (- Y)) - X' -> '- (X & (Y - 1))' fold (PR44448)
Name: (X & (- Y)) - X  ->  - (X & (Y - 1))  (PR44448)
  %negy = sub i8 0, %y
  %unbiasedx = and i8 %negy, %x
  %r = sub i8 %unbiasedx, %x
=>
  %ymask = add i8 %y, -1
  %xmasked = and i8 %ymask, %x
  %r = sub i8 0, %xmasked

https://rise4fun.com/Alive/OIpla

This decreases the use count of %x, may allow us to
later hoist said negation even further,
and results in marginally nicer X86 codegen.

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
2020-01-03 20:27:29 +03:00
Roman Lebedev b87a351182
[NFC][InstCombine] '(X & (- Y)) - X' pattern tests (PR44448)
As discussed in https://bugs.llvm.org/show_bug.cgi?id=44448,
we can hoist negation out of the pattern.
2020-01-03 20:27:17 +03:00
Sanjay Patel 1640582743 [InstCombine] replace undef elements in vector constant when doing icmp folds (PR44383)
As shown in PR44383:
https://bugs.llvm.org/show_bug.cgi?id=44383
...we can't safely propagate a vector constant through this icmp fold
if that vector constant contains undefined elements.

We know that each defined element of the constant is safe though, so
find the first of those and replicate it into the formerly undef lanes.
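
For illustration, a sketch of the replacement (values hypothetical):

  %c = icmp sgt <2 x i32> %x, <i32 undef, i32 7>
  ; before propagating the constant through the fold, the undef lane is
  ; replaced with the first defined element: <i32 7, i32 7>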

Differential Revision: https://reviews.llvm.org/D72101
2020-01-03 09:16:57 -05:00
Sanjay Patel 4bb4f5b1d9 [InstCombine] add tests for vector icmp with undef constant elements; NFC 2020-01-02 15:39:45 -05:00
Sanjay Patel 88fc5fdef6 [InstCombine] remove uses before deleting instructions (PR43723)
This is a less ambitious alternative to previous attempts to fix
this bug with:
rG56b2aee1875a
rGef02831f0a4e
...because those all failed bot testing with use-after-free or
other problems.

The original crashing/assert problem is still showing up on
various fuzzers, so I've added a new minimal test based on
another one of those failures.

Instead of trying to manage and coordinate the logic in
isAllocSiteRemovable() with the deletion loops, just loosen
the existing code that handles casts and GEPs by replacing
them with undef, to allow other opcodes. That means that no
instructions with uses should assert on deletion, and there
are hopefully no non-obvious sanitizer bugs induced.
2020-01-02 09:47:36 -05:00
Nikita Popov 8dd9a13619 [InstCombine] Preserve inbounds when merging with zero-index GEP (PR44423)
This addresses https://bugs.llvm.org/show_bug.cgi?id=44423.
If one of the GEPs is inbounds and the other is zero-index,
we can also preserve inbounds.
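
A minimal sketch of the merge (types hypothetical):

  %a = getelementptr i32, i32* %p, i64 0
  %b = getelementptr inbounds i32, i32* %a, i64 %i
  ; -->
  %b = getelementptr inbounds i32, i32* %p, i64 %i  ; the zero-index GEP adds no offset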

Differential Revision: https://reviews.llvm.org/D72060
2020-01-01 23:04:28 +01:00
Nikita Popov 6ba5f8c4ac [InstCombine] Fix incorrect inbounds on GEP of GEP (PR44425)
This fixes https://bugs.llvm.org/show_bug.cgi?id=44425. We need to
drop inbounds if one of the GEPs is not inbounds. This was already
done when creating a new GEP, but not when modifying in place.

Differential Revision: https://reviews.llvm.org/D72059
2020-01-01 22:10:55 +01:00
Nikita Popov 11552433eb [InstCombine] Add tests for PR44423 and PR44425; NFC 2020-01-01 20:27:57 +01:00
Nikita Popov 7f48171d2f [InstCombine] Regenerate test checks; NFC 2020-01-01 20:27:57 +01:00
Nikita Popov 8756cd0963 [InstCombine] Add tests for sub nuw of geps; NFC
Tests for PR44419.
2020-01-01 20:27:57 +01:00
Craig Topper 374e0299cf [X86][InstCombine] Add constant folding and simplification support for pdep and pext
The instructions use a mask to either pack disjoint bits together(pext) or spread bits to disjoint locations(pdep). If the mask is all 0s then no bits are extracted or deposited. If the mask is all ones, then the source value is written to the result since no compression or expansion happens. Otherwise if both the source and mask are constant we can walk the bits in the source/mask and calculate the result.
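
A worked sketch of the constant case for pext (values chosen for illustration):

  %r = call i32 @llvm.x86.bmi.pext.32(i32 255, i32 61680)
  ; the mask 0xF0F0 selects source bits 4-7 (all ones in 0xFF) and 12-15 (all
  ; zeros), packing them into the low bits, so the call folds to i32 15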

There are other, crazier things we could do, like computeKnownBits or turning pext into shift/and if only a single contiguous range of bits is extracted.
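
The trivial mask cases, as a sketch (intrinsic names as in LLVM's BMI
support; the operands are illustrative):

  %a = call i32 @llvm.x86.bmi.pext.32(i32 %x, i32 0)   ; all-zeros mask -> folds to 0
  %b = call i32 @llvm.x86.bmi.pdep.32(i32 %x, i32 -1)  ; all-ones mask  -> folds to %x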

Fixes PR44389

Differential Revision: https://reviews.llvm.org/D71952
2019-12-31 15:06:47 -08:00
Sanjay Patel a041c4ec6f [InstCombine] fold zext of masked bit set/clear
This does not solve PR17101, but it is one of the
underlying diffs noted here:
https://bugs.llvm.org/show_bug.cgi?id=17101#c8

We could ease the one-use checks for the 'clear'
(no 'not' op) half of the transform, but I do not
know if that asymmetry would make things better
or worse.

Proofs:
https://rise4fun.com/Alive/uVB

  Name: masked bit set
  %sh1 = shl i32 1, %y
  %and = and i32 %sh1, %x
  %cmp = icmp ne i32 %and, 0
  %r = zext i1 %cmp to i32
  =>
  %s = lshr i32 %x, %y
  %r = and i32 %s, 1

  Name: masked bit clear
  %sh1 = shl i32 1, %y
  %and = and i32 %sh1, %x
  %cmp = icmp eq i32 %and, 0
  %r = zext i1 %cmp to i32
  =>
  %xn = xor i32 %x, -1
  %s = lshr i32 %xn, %y
  %r = and i32 %s, 1
2019-12-31 12:35:10 -05:00
Sanjay Patel eb5c026ef0 [InstCombine] add/adjust tests for masked bit; NFC 2019-12-31 12:35:10 -05:00
Nikita Popov 7adb5c2aca Revert "[InstCombine] Fix infinite loop due to bitcast <-> phi transforms"
This reverts commit 27a0795943.

Seems to break test-suite.
2019-12-31 17:42:57 +01:00
Sanjay Patel 108645cd0a [InstCombine] add tests for masked bit set/clear; NFC 2019-12-31 10:20:45 -05:00
Nikita Popov 27a0795943 [InstCombine] Fix infinite loop due to bitcast <-> phi transforms
Fix for https://bugs.llvm.org/show_bug.cgi?id=44245.

The optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up
fighting against each other, because optimizeBitCastFromPhi()
assumes that bitcasts of loads will get folded. This doesn't happen
here, because a dangling phi node prevents the one-use fold in
https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering.

This patch fixes the issue by manually removing the old phis.

Differential Revision: https://reviews.llvm.org/D71164
2019-12-31 16:17:14 +01:00
Connor Abbott fb114694e9 [InstCombine] Don't rewrite phi-of-bitcast when the phi has other users
Judging by the existing comments, this was the intention, but the
transform never actually checked if the existing phis would be removed.
See https://bugs.llvm.org/show_bug.cgi?id=44242 for an example where
this causes much worse code generation on AMDGPU.

Differential Revision: https://reviews.llvm.org/D71209
2019-12-31 12:15:02 +01:00
Connor Abbott d04e64a25a [InstCombine] Add tests for PR44242
Differential Revision: https://reviews.llvm.org/D71260
2019-12-31 12:15:02 +01:00
Sanjay Patel ee3eebba0d [InstCombine] remove stale comment on test; NFC 2019-12-30 12:39:10 -05:00
Sanjay Patel 987eb8e26c [InstCombine] propagate sign argument through nested copysigns
This is another optimization suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
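
A minimal sketch of the fold (illustrative, not from the patch):

  %s = call float @llvm.copysign.f32(float %y, float %z)
  %r = call float @llvm.copysign.f32(float %x, float %s)
  ; the sign of %s is just the sign of %z, so:
  ; --> %r = call float @llvm.copysign.f32(float %x, float %z)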
2019-12-30 11:06:02 -05:00
Qiu Chaofan 65661908cb [NFC] Add test for load-insert-store pattern
This patch adds the necessary test cases for the load-update-store
pattern, which only updates a single element of a vector.

Differential Revision: https://reviews.llvm.org/D71886
2019-12-30 16:14:37 +08:00
Fangrui Song 502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
Sanjay Patel 25cf5d97ac [InstCombine] add test for copysign; NFC 2019-12-23 17:54:31 -05:00
Sanjay Patel 9a77c20954 [InstCombine] add tests for not(select ...); NFC 2019-12-23 17:14:32 -05:00
Sanjay Patel 9cdcd81d3f [InstCombine] enhance fold for copysign with known sign arg
This is another optimization suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
2019-12-22 10:07:01 -05:00
Sanjay Patel 79c7fa31f3 [InstCombine] check alloc size in bitcast of geps fold (PR44321)
We missed a constraint in D44833
when folding a bitcast into a GEP with vector/array types.
If the alloc sizes specified by the datalayout don't match,
this could miscompile as shown in:
https://bugs.llvm.org/show_bug.cgi?id=44321

Differential Revision: https://reviews.llvm.org/D71771
2019-12-21 10:31:21 -05:00
Sanjay Patel 19f9f374d9 [SimplifyLibCalls] require fast-math-flags for pow(X, -0.5) transforms
As discussed in PR44330:
https://bugs.llvm.org/show_bug.cgi?id=44330
...the transform from pow(X, -0.5) libcall/intrinsic to
reciprocal square root can result in small deviations from
the expected result due to differences in the pow()
implementation and/or the extra rounding step from the division.

This patch proposes to allow that difference with either the
'approximate functions' or 'reassociate' FMF:
http://llvm.org/docs/LangRef.html#fast-math-flags
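
A hedged sketch of the rewrite (flag spelling illustrative):

  %r = call afn double @llvm.pow.f64(double %x, double -5.000000e-01)
  ; --> with 'afn' (or 'reassoc') this may become:
  %s = call afn double @llvm.sqrt.f64(double %x)
  %r = fdiv afn double 1.000000e+00, %s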

In practice, this likely means that the code is compiled with
all of 'fast' (-ffast-math), but I have preserved the existing
specializations for -0.0/-INF that enable generating safe code
if those special values are allowed simultaneously with
allowing approximation/reassociation.

The question about whether a similar restriction is needed for
the non-reciprocal case -- pow(X, 0.5) -- is deferred. That
transform is allowed without FMF currently, and this patch does
not change that behavior.

Differential Revision: https://reviews.llvm.org/D71706
2019-12-21 10:00:53 -05:00
Jakub Kuderski c431c407eb [InstCombine] Improve infinite loop detection
Summary:
This patch limits the default number of iterations performed by InstCombine. It also exposes a new option that allows specifying how many iterations should be treated as getting stuck in an infinite loop.

Based on experiments performed on real-world C++ programs, InstCombine seems to perform at most ~8-20 iterations, so treating 1000 iterations as an infinite loop seems like a safe choice. See D71145 for details.

The two limits can be specified via command line options.

Reviewers: spatel, lebedev.ri, nikic, xbolva00, grosser

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71673
2019-12-20 16:15:04 -05:00
Sanjay Patel 0b421d842d [InstCombine] add tests for cast+gep; NFC
PR44321:
https://bugs.llvm.org/show_bug.cgi?id=44321
2019-12-20 10:52:23 -05:00
Roman Lebedev 047186cc98
[ValueTracking] isKnownNonZero() should take non-null-ness assumptions into consideration (PR43267)
Summary:
It is pretty common to assume that something is not zero.
Even optimizer itself sometimes emits such assumptions
(e.g. `addAssumeNonNull()` in `PromoteMemoryToRegister.cpp`).

But we currently don't deal with such assumptions :)
The only way `isKnownNonZero()` handles assumptions is
by calling `computeKnownBits()` which calls `computeKnownBitsFromAssume()`.
But `x != 0` does not tell us anything about set bits,
it only says that there are *some* set bits.
So naturally, `KnownBits` does not get populated,
and we fail to make use of this assumption.

I propose to deal with this special case by adding an
`isKnownNonZeroFromAssume()` that returns a boolean answer
when there is an applicable assumption.
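
A minimal sketch of the newly handled case (illustrative):

  %ne = icmp ne i64 %x, 0
  call void @llvm.assume(i1 %ne)
  ; isKnownNonZero(%x) can now return true at points where the
  ; assume is applicable, without going through KnownBits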

While there, we also deal with other predicates,
mainly if the comparison is with constant.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43267 | PR43267 ]].

Differential Revision: https://reviews.llvm.org/D71660
2019-12-20 01:47:57 +03:00
Roman Lebedev 92083a295a
[ValueTracking] isValidAssumeForContext(): CxtI itself also must transfer execution to successor
This is a pretty rare case, when CxtI and assume are
in the same basic block, with assume being located later.

We were already checking that the assumption was guaranteed to be
executed, but we omitted CxtI itself from consideration,
and as the test (miscompile) shows, that is incorrect.

As noted in D71660 review by @nikic.
2019-12-20 01:47:57 +03:00
Roman Lebedev ffcae008d7
[NFC][InstCombine] Add a test for assume-induced miscompile
@escape() may throw here, so we don't know that the assumption, which is
located afterwards in the same block, is executed; therefore the %load arg
of the call to @escape() can not be marked as non-null.

As noted in D71660 review by @nikic.
2019-12-20 01:47:56 +03:00
Sanjay Patel 5889e7823d [InstCombine] add/adjust tests for pow->sqrt; NFC
There's at least 1 bug here as discussed in PR44330.
2019-12-19 09:25:19 -05:00
David Green a59cc5e128 [InstCombine] Canonicalize select immediates
In certain situations after inlining and simplification we end up with
code that is _almost_ a min/max pattern, but contains constants that
have been demand-bit optimised to the wrong values, leaving code like:
  %1 = icmp slt i32 %shr, -128
  %2 = select i1 %1, i32 128, i32 %shr
  %.inv = icmp sgt i32 %shr, 127
  %spec.select.i = select i1 %.inv, i32 127, i32 %2
  %conv7 = trunc i32 %spec.select.i to i8
This should be turned into a min/max pattern, but the -128 in the first
select was instead transformed into 128, as only the bottom byte was
ever demanded.

To fix this, I've put in further canonicalisation for the immediates of
selects, preferring to use the same value as the icmp if available.

Differential Revision: https://reviews.llvm.org/D71516
2019-12-19 12:36:46 +00:00
David Green d38153325f [Instcombine] Add select canonicalization tests. NFC 2019-12-19 12:36:46 +00:00
Piotr Sobczak 40b5a0f7c8 Revert "[InstCombine][AMDGPU] Trim more components of *buffer_load"
Revert D70315, as it breaks gfx8 for some reason.

This reverts commit 65f94b3380.
2019-12-18 22:04:44 +01:00
Jakub Kuderski 3d29c41ad5 [InstCombine] Insert instructions before adding them to worklist
Summary:
This patch adds instructions to the InstCombine worklist after they are properly inserted. This way we don't get `<badref>`s printed when logging added instructions.
It also adds a check in `Worklist::Add` that ensures that all added instructions have parents.

Simple test case that illustrates the difference when run with `--debug-only=instcombine`:

```
define i32 @test35(i32 %a, i32 %b) {
  %1 = or i32 %a, 1135
  %2 = or i32 %1, %b
  ret i32 %2
}
```

Before this patch:
```
INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting:   %1 = or i32 %a, 1135
IC: Visiting:   %2 = or i32 %1, %b
IC: ADD:   %2 = or i32 %a, %b
IC: Old =   %3 = or i32 %1, %b
    New =   <badref> = or i32 %2, 1135
IC: ADD:   <badref> = or i32 %2, 1135
...
```

With this patch:
```
INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting:   %1 = or i32 %a, 1135
IC: Visiting:   %2 = or i32 %1, %b
IC: ADD:   %2 = or i32 %a, %b
IC: Old =   %3 = or i32 %1, %b
    New =   <badref> = or i32 %2, 1135
IC: ADD:   %3 = or i32 %2, 1135
...
```

Reviewers: fhahn, davide, spatel, foad, grosser, nikic

Reviewed By: nikic

Subscribers: nikic, lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71093
2019-12-18 14:55:41 -05:00
Jakub Kuderski 406b6019cd [InstCombine] Allow to limit the max number of iterations
Summary:
This patch teaches InstCombine to accept a new parameter: maximum number of iterations over functions.

InstCombine tries to simplify instructions by iterating over the whole function until the function stops changing. As a consequence, the last iteration before reaching a fixpoint visits all instructions in the worklist and never performs any rewrites.

Bounding the number of iterations can have 2 benefits:
* In case the users of the pass can make a good guess about the number of required iterations, we can save the time normally spent on the last iteration that doesn't change anything.
* When the user wants to use InstCombine as a cleanup pass, it may be enough to run just a few iterations and stop even before reaching a fixpoint. This can also be useful for implementing a lightweight pass pipeline (think `-O1`).

This patch does not change the behavior of opt or Clang -- limiting the number of iterations is entirely opt-in.

Reviewers: fhahn, davide, spatel, foad, nlopes, grosser, lebedev.ri, nikic, xbolva00

Reviewed By: spatel

Subscribers: craig.topper, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71145
2019-12-18 13:48:54 -05:00
stozer 89d19d60ad Reapply: [DebugInfo] Correctly handle salvaged casts and split fragments at ISel
This reverts commit 1f3dd83cc1, reapplying
commit bb1b0bc4e5.

The original commit failed on some builds seemingly due to the use of a
bracketed constructor with an std::array, i.e. `std::array<> arr({...})`.
2019-12-18 16:26:42 +00:00
Roman Lebedev c6a56c9a50
[NFC][InstCombine] Autogenerate assume.ll test 2019-12-18 17:16:19 +03:00
Sanjay Patel c7492fbd4e [InstCombine] add tests for copysign; NFC 2019-12-18 07:56:36 -05:00
stozer 1f3dd83cc1 Revert "[DebugInfo] Correctly handle salvaged casts and split fragments at ISel"
Reverted due to build failure on windows bots.

This reverts commit bb1b0bc4e5.
2019-12-18 11:46:10 +00:00
stozer bb1b0bc4e5 [DebugInfo] Correctly handle salvaged casts and split fragments at ISel
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop
casts impossible. This patch enables the salvaging of casts by using the
DW_OP_LLVM_convert operator for SExt and Trunc instructions.

There is another issue which is exposed by this fix, in which fragment
DIExpressions (which are preserved more readily by this patch) for
values that must be split across registers in ISel trigger an assertion,
as the 'split' fragments extend beyond the bounds of the fragment
DIExpression causing an error. This patch also fixes this issue by
checking the fragment status of DIExpressions which are to be split, and
dropping fragments that are invalid.
2019-12-18 11:09:18 +00:00
Piotr Sobczak 65f94b3380 [InstCombine][AMDGPU] Trim more components of *buffer_load
Summary:
Add trimming of unused components of s_buffer_load.

Extend trimming of *buffer_load to also include
unused components at the beginning of vectors and update offset.

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70315
2019-12-17 17:50:07 +01:00
Craig Topper 02f644c59a [InstCombine] Teach removeBitcastsFromLoadStoreOnMinMax not to change the size of a store.
We can change the type as long as we don't change the size.

Fixes PR44306

Differential Revision: https://reviews.llvm.org/D71532
2019-12-16 12:12:54 -08:00
Sanjay Patel 6080387f13 [InstSimplify] fold splat of inserted constant to vector constant
shuf (inselt ?, C, IndexC), undef, <IndexC, IndexC...> --> <C, C...>
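
For instance (illustrative values):

  %v = insertelement <4 x i32> undef, i32 7, i32 2
  %s = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
  ; every lane reads the inserted constant:
  ; --> %s = <i32 7, i32 7, i32 7, i32 7>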

This is another missing shuffle fold pattern uncovered by the
shuffle correctness fix from D70246.

The problem was visible in the post-commit thread example, but
we managed to overcome the limitation for that particular case
with D71220.

This is something like the inverse of the previous fix - there
we didn't demand the inserted scalar, and here we are only
demanding an inserted scalar.

Differential Revision: https://reviews.llvm.org/D71488
2019-12-15 09:32:03 -05:00
Nicola Zaghen 97572775d2 Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same.
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with InstCombine's visitPtrToInt, but the unit tests were incorrect, so this remained undiscovered.

This fixes the buildbot failures.

Differential Revision: https://reviews.llvm.org/D68328

Patch by Joseph Faulls!
2019-12-13 14:30:21 +00:00
Nicola Zaghen f798eb21ec Temporarily Revert "[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same."
This reverts commit 5f6208778f.

This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll
const: Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types don't agree!"' failed.
2019-12-12 10:29:54 +00:00
Nicola Zaghen 5f6208778f [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same.
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with InstCombine's visitPtrToInt, but the unit tests were incorrect, so this remained undiscovered.

Differential Revision: https://reviews.llvm.org/D68328

Patch by Joseph Faulls!
2019-12-12 10:07:01 +00:00
Nikita Popov 8db5143b1a [InstCombine] Optimize overflow check based on uadd.with.overflow result
Fix for https://bugs.llvm.org/show_bug.cgi?id=40846.

This adds a combine for cases where a (a + b) < a style overflow
check is performed, but with a + b being the result of
uadd.with.overflow, so the overflow result is also already available
and we can just use it. Subsequently GVN/CSE will deduplicate the extracts.
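
A minimal sketch of the pattern (illustrative):

  %wo  = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
  %sum = extractvalue { i32, i1 } %wo, 0
  %cmp = icmp ult i32 %sum, %a        ; manual (a + b) < a check
  ; --> %cmp = extractvalue { i32, i1 } %wo, 1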

We can run into this situation if a function has both a uadd.with.overflow
and a manual add + overflow check (on the same
operands), in which case GVN will rewrite the add to the with.overflow
result and leave you with this pattern.

The implementation is a bit ugly because I'm handling the various
canonicalization edge cases.

This does not yet handle the negated version of this pattern.

Differential Revision: https://reviews.llvm.org/D58644
2019-12-11 20:52:04 +01:00
Danila Kutenin 19e83a9b4c [ValueTracking] Pointer is known nonnull after load/store
If the pointer was loaded/stored before the null check, the check
is redundant and can be removed. For now the optimizers do not
remove the nullptr check, see https://gcc.godbolt.org/z/H2r5GG.
The patch allows using more nonnull constraints. Also, it found
one more optimization in a PowerPC test. This is my first LLVM
review; I am open to any comments.
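
A minimal sketch of the newly exploited fact (illustrative):

  %v = load i32, i32* %p
  %isnull = icmp eq i32* %p, null
  ; %p was just dereferenced, so it is known non-null here:
  ; --> %isnull = false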

Differential Revision: https://reviews.llvm.org/D71177
2019-12-11 20:32:29 +01:00
Sanjay Patel 252d3b9805 [InstSimplify] add tests for insert constant + splat; NFC 2019-12-10 17:16:58 -05:00
Sanjay Patel 396d18aeb6 [InstCombine] replace shuffle's insertelement operand if inserted scalar is not demanded
This pattern is noted as a regression from:
D70246
...where we removed an over-aggressive shuffle simplification.

SimplifyDemandedVectorElts fails to catch this case when the insert has multiple uses,
so I'm proposing to pattern match the minimal sequence directly. This fold does not
conflict with any of our current shuffle undef/poison semantics.

Differential Revision: https://reviews.llvm.org/D71220
2019-12-10 10:10:05 -05:00
Johannes Doerfert a7d992c0f2 [ValueTracking] Allow context-sensitive nullness check for non-pointers
Summary:
Same as D60846 and D69571 but with a fix for the problem encountered
after them. Both times it was a missing context adjustment in the
handling of PHI nodes.

The reproducers created from the bugs that caused the old commits to be
reverted are included.

Reviewers: nikic, nlopes, mkazantsev, spatel, dlrobertson, uabelho, hakzsam, hans

Subscribers: hiraditya, bollu, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71181
2019-12-09 15:15:52 -06:00
Sanjay Patel 92f94b762a [InstCombine] add tests for shuffle with insertelement operand; NFC 2019-12-09 14:27:03 -05:00
Bob Haarman 055779a9ac Revert "[InstCombine] keep assumption before sinking calls"
Summary:
This reverts commit c3b06d0c39.

Reason for revert: Caused miscompiles when inserting assume for undef.

Also adds a test to prevent similar breakage in future.

Fixes PR44154.

Reviewers: rnk, jdoerfert, efriedma, xbolva00

Reviewed By: rnk

Subscribers: thakis, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70933
2019-12-05 10:39:34 -08:00
Roman Lebedev 796fa662f1
[InstCombine] Invert `add A, sext(B) --> sub A, zext(B)` canonicalization (to `sub A, zext B -> add A, sext B`)
Summary:
D68408 proposes to greatly improve our negation sinking abilities.
But in current canonicalization, we produce `sub A, zext(B)`,
which we will consider non-canonical and try to sink that negation,
undoing the existing canonicalization.
So unless we explicitly stop producing the previous canonicalization,
we will have two conflicting folds, and will end up endlessly looping.

This inverts the canonicalization, and adds back the obvious fold
that we'd miss:
* `sub [nsw] Op0, sext/zext (bool Y) -> add [nsw] Op0, zext/sext (bool Y)`
  https://rise4fun.com/Alive/xx4
* `sext(bool) + C -> bool ? C - 1 : C`
  https://rise4fun.com/Alive/fBl
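
As a concrete instance of the second fold (constant illustrative):

  %se = sext i1 %b to i32
  %r  = add i32 %se, 42
  ; sext(true) = -1 and sext(false) = 0, so:
  ; --> %r = select i1 %b, i32 41, i32 42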

It is obvious that `@ossfuzz_9880()` / `@lshr_out_of_range()`/`@ashr_out_of_range()`
(oss-fuzz 4871) are no longer folded as much, though those aren't really worrying.

Reviewers: spatel, efriedma, t.p.northover, hfinkel

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71064
2019-12-05 21:21:30 +03:00
Sanjay Patel 3c6b5d3674 [InstCombine] narrow select with FP casts
Select doesn't change values, so truncate of extended operand cancels out.
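
A minimal sketch (illustrative types):

  %xe  = fpext float %x to double
  %ye  = fpext float %y to double
  %sel = select i1 %c, double %xe, double %ye
  %r   = fptrunc double %sel to float
  ; fptrunc(fpext x) == x, so:
  ; --> %r = select i1 %c, float %x, float %y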
2019-12-05 11:12:44 -05:00
Sanjay Patel 403bb33a2e [InstCombine] add tests for fpext+select+fptrunc; NFC 2019-12-05 10:49:29 -05:00
Roman Lebedev 09311459e3
[InstCombine] Extend `0 - (X sdiv C) -> (X sdiv -C)` fold to non-splat vectors
Split off from https://reviews.llvm.org/D68408
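
A non-splat example (constants illustrative):

  %d = sdiv <2 x i32> %x, <i32 3, i32 5>
  %r = sub <2 x i32> zeroinitializer, %d
  ; --> %r = sdiv <2 x i32> %x, <i32 -3, i32 -5>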
2019-12-05 15:48:29 +03:00
Roman Lebedev b89ba5f939
[NFC][InstCombine] Autogenerate check lines in a few tests
These files are potentially affected by Negator (D68408) patch.
2019-12-05 01:14:03 +03:00
Roman Lebedev cd04e8349b
[NFC][InstCombine] Update sub-of-negatible.ll test 2019-12-04 15:49:36 +03:00
Craig Topper 5ebbabc1af [InstCombine] Revert aafde063aa and 6749dc3446 related to bitcast handling of x86_mmx
This reverts these two commits
[InstCombine] Turn (extractelement <1 x i64/double> (bitcast (x86_mmx))) into a single bitcast from x86_mmx to i64/double.
[InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement

We're seeing at least one internal test failure related to a
bitcast that previously sat before an inline assembly block
containing emms being placed after it. This leads to the mmx
state ending up not empty after the emms. IR has no way to
make any specific guarantees about this. Reverting these patches
to get back to previous behavior which at least worked for this
test.
2019-12-03 14:02:22 -08:00
Sanjay Patel af4e59949c [InstCombine] fix undef propagation for vector urem transform (PR44186)
As described here:
https://bugs.llvm.org/show_bug.cgi?id=44186

The match() code safely allows undef values, but we can't safely
propagate a vector constant that contains an undef to the new
compare instruction.
2019-12-02 12:17:38 -05:00
Simon Tatham 01aefae4a1 [ARM,MVE] Add an InstCombine rule permitting VPNOT.
Summary:
If a user writing C code using the ACLE MVE intrinsics generates a
predicate and then complements it, then the resulting IR will use the
`pred_v2i` IR intrinsic to turn some `<n x i1>` vector into a 16-bit
integer; complement that integer; and convert back. This will generate
machine code that moves the predicate out of the `P0` register,
complements it in an integer GPR, and moves it back in again.

This InstCombine rule replaces `i2v(~v2i(x))` with a direct complement
of the original predicate vector, which we can already instruction-
select as the VPNOT instruction which complements P0 in place.
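
Schematically, the matched round trip (the 0xffff complement constant
in the integer form is an assumption here, as is the overloaded
intrinsic suffix):

  %i = call i32 @llvm.arm.mve.pred.v2i.v8i1(<8 x i1> %p)
  %n = xor i32 %i, 65535
  %v = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %n)
  ; --> %v = the complement of %p, taken directly on the predicate
  ;     vector (selected as VPNOT)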

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70484
2019-12-02 16:20:30 +00:00
Roman Lebedev 0f22e783a0
[InstCombine] Revert rL341831: relax one-use check in foldICmpAddConstant() (PR44100)
rL341831 moved the one-use check higher up, restricting a few folds
that produced a single instruction from two instructions to only the
case where the inner instruction would go away.

Original commit message:
> InstCombine: move hasOneUse check to the top of foldICmpAddConstant
>
> There were two combines not covered by the check before now,
> neither of which actually differed from normal in the benefit analysis.
>
> The most recent seems to be because it was just added at the top of the
> function (naturally). The older is from way back in 2008 (r46687)
> when we just didn't put those checks in so routinely, and has been
> diligently maintained since.

From the commit message alone, there doesn't seem to be a
deeper motivation, a deeper problem it was trying to solve,
other than 'fixing the wrong one-use check'.

As I briefly discussed with Tim on IRC, the original motivation
can no longer be recovered; too much time has passed.

However i believe that the original fold was doing the right thing,
we should be performing such a transformation even if the inner `add`
will not go away - that will still unchain the comparison from `add`,
it will no longer need to wait for `add` to compute.

Doing so doesn't seem to break any particular idioms,
at least as far as I can see.

References https://bugs.llvm.org/show_bug.cgi?id=44100
2019-12-02 18:06:15 +03:00
Sanjay Patel af0babc90a [InstCombine] fold copysign with constant sign argument to (fneg+)fabs
If the sign of the sign argument is known (this could be extended to use ValueTracking),
then we can use fneg+fabs to clear/set the sign bit of the magnitude argument.
http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic
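
As a minimal sketch for a known-negative sign argument (illustrative):

  %r = call float @llvm.copysign.f32(float %x, float -1.000000e+00)
  ; -->
  %a = call float @llvm.fabs.f32(float %x)
  %r = fneg float %a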

This transform is already done in DAGCombiner, but we can do it sooner in IR as
suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153

We have effectively no analysis for copysign in IR, so we are taking the unusual step
of increasing the number of IR instructions for the negative constant case.

Differential Revision: https://reviews.llvm.org/D70792
2019-12-02 09:23:12 -05:00
Bjorn Pettersson a9d6b0e544 [InstCombine] Fix big-endian miscompile of (bitcast (zext/trunc (bitcast)))
Summary:
optimizeVectorResize is rewriting patterns like:
  %1 = bitcast vector %src to integer
  %2 = trunc/zext %1
  %dst = bitcast %2 to vector

Since bitcasting between integer and vector types gives
different integer values depending on endianness, we need
to take endianness into account. As it happens the old
implementation only produced the correct result for little
endian targets.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=44178

Reviewers: spatel, lattner, lebedev.ri

Reviewed By: spatel, lebedev.ri

Subscribers: lebedev.ri, hiraditya, uabelho, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70844
2019-12-02 11:05:25 +01:00
Craig Topper 67298d683c [X86][InstCombine] Move non-X86 specific instcombine test from test/CodeGen/X86/ to test/Transforms/InstCombine/ 2019-12-01 10:31:04 -08:00
Craig Topper 3dd93dc2a1 [X86][InstCombine] Move instcombine test from test/CodeGen/X86 to test/Transforms/InstCombine/ and replace grep with FileCheck 2019-12-01 10:31:04 -08:00
David Green 59b56e5c57 [InstCombine] Expand usub_sat patterns to handle constants
The constants come through as add %x, -C, not a sub as would be
expected. They need some extra matchers to canonicalise them towards
usub_sat.
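
A sketch of the constant form (values illustrative):

  %sub = add i32 %x, -10                  ; i.e. %x - 10
  %cmp = icmp ult i32 %x, 10
  %r   = select i1 %cmp, i32 0, i32 %sub
  ; --> %r = call i32 @llvm.usub.sat.i32(i32 %x, i32 10)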

Differential Revision: https://reviews.llvm.org/D69514
2019-11-30 16:58:01 +00:00
David Green 3a1bef5616 [InstCombine] Adjust usub_sat fold one use checks
This adjusts the one-use checks in the usub_sat fold code to not
increase instruction count, but otherwise do the fold. Reviewed as a
part of D69514.
2019-11-30 16:58:00 +00:00
David Green a46b959ebd [InstCombine] More usub_sat tests. NFC. 2019-11-30 16:58:00 +00:00
Bjorn Pettersson 363cbcc590 [InstCombine] Run the cast.ll test twice, now also testing little endian. NFC
Some tests in test/Transforms/InstCombine/cast.ll depend on
endianness. Added a second run line to run the tests with both
big and little endian. In the past we only compiled for big
endian, and then it was hard to see if any big endian bugfixes
would impact the little endian result etc.
2019-11-29 13:24:13 +01:00
Sanjay Patel 5e6b728763 [InstCombine] add tests for copysign; NFC 2019-11-27 11:32:23 -05:00
Sanjay Patel 8d20dd0b06 [ConstFolding] move tests for copysign; NFC
InstCombine doesn't have any transforms for copysign currently.
2019-11-26 16:54:46 -05:00
Dávid Bolvanský bb7b8540f0 [InstCombine] Optimize some memccpy calls to memcpy/null
Summary:
return memccpy(d, "helloworld", 'r', 20)
=>
return memcpy(d, "helloworld", 8 /* 'r' is at index 7, so 8 bytes */), d + 8

Reviewers: efriedma, jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68089
2019-11-26 10:54:47 +01:00
Sanjay Patel f575f12c64 [InstCombine] remove identity shuffle simplification for mask with undefs
And simultaneously enhance SimplifyDemandedVectorElts() to recognize that
pattern. That preserves some of the old optimizations in IR.

Given a shuffle that includes undef elements in an otherwise identity mask like:

define <4 x float> @shuffle(<4 x float> %arg) {
  %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3>
  ret <4 x float> %shuf
}

We were simplifying that to the input operand.

But as discussed in PR43958:
https://bugs.llvm.org/show_bug.cgi?id=43958
...that means that per-vector-element poison that would be stopped by the shuffle can now
leak to the result.

Also note that we still have (and there are tests for) the same transform with no undef
elements in the mask (a fully-defined identity mask). I don't think there's any
controversy about that case - it's a valid transform under any interpretation of
shufflevector/undef/poison.

Looking at a few of the diffs down through codegen, I don't see any difference in final asm. So
depending on your perspective, that's good (no real loss of optimization power) or bad
(poison exists in the DAG, so we only partially fixed the bug).

Differential Revision: https://reviews.llvm.org/D70246
2019-11-24 10:06:26 -05:00
Davide Italiano c32f0ff92f [InstCombine] Fix call guard difference with dbg
Patch by Chris Ye!

Differential Revision: https://reviews.llvm.org/D68004
2019-11-22 13:35:53 -08:00
Philip Reames 1f4395942f Precommit tests for forthcoming widenable.condition transforms 2019-11-20 17:02:04 -08:00
Simon Tatham f4f77aa53e [ARM,MVE] Add InstCombine rules for pred_i2v / pred_v2i.
If you're writing C code using the ACLE MVE intrinsics that passes the
result of a vcmp as input to a predicated intrinsic, e.g.

  mve_pred16_t pred = vcmpeqq(v1, v2);
  v_out = vaddq_m(v_inactive, v3, v4, pred);

then clang's codegen for the compare intrinsic will create calls to
`@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into an
`mve_pred16_t` integer representation, and then the next intrinsic
will call `@llvm.arm.mve.pred.i2v` to convert it straight back again.
This will be visible in the generated code as a `vmrs`/`vmsr` pair
that move the predicate value pointlessly out of `p0` and back into it again.

To prevent that, I've added InstCombine rules to remove round trips of
the form `v2i(i2v(x))` and `i2v(v2i(x))`. Also I've taught InstCombine
about the known and demanded bits of those intrinsics. As a result,
you now get just the generated code you wanted:

  vpt.u16 eq, q1, q2
  vaddt.u16 q0, q3, q4
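
At the IR level, the removed round trip looks roughly like this (the
overloaded intrinsic suffix is an assumption):

  %i = call i32 @llvm.arm.mve.pred.v2i.v8i1(<8 x i1> %p)
  %v = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %i)
  ; --> %v is just %p, so both calls fold away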

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70313
2019-11-18 10:39:30 +00:00
Sanjay Patel 5d67d81f48 [InstCombine] prevent crashing/assert on shift constant expression (PR44028)
The binary operator cast implies an instruction, but the matcher for shift does not:
https://bugs.llvm.org/show_bug.cgi?id=44028
2019-11-17 17:31:09 -05:00
Florian Hahn 8eeabbaf5d [ConstantFold] Handle identity folds at top of ConstantFoldBinaryInst
Currently we miss folds with undef and identity values for binary ops
that do not fold to undef in general.

We can generalize the identity simplifications and do them before
checking for undef in particular.

Alive checks:
 * OR - https://rise4fun.com/Alive/8OsK
 * AND - https://rise4fun.com/Alive/e3tE

This will also allow us to remove some now redundant cases throughout
the function, but I would like to do this as a follow-up. That should make
tracking down potential issues easier.

Reviewers: spatel, RKSimon, lebedev.ri

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D70169
2019-11-17 21:30:14 +00:00
David Green 08390c52a2 [InstCombine] Canonicalize ssub.with.overflow with clamp to ssub.sat
Working on top of D69252, this adds canonicalisation patterns for ssub.with.overflow to ssub.sat.

Differential Revision: https://reviews.llvm.org/D69753
2019-11-17 10:45:11 +00:00
David Green 03fce6b12e [InstCombine] Canonicalize sadd.with.overflow with clamp to sadd.sat
This adds to D69245, adding extra signed patterns for folding from a
sadd_with_overflow to a sadd_sat. These are more complex than the
unsigned patterns, as the overflow can occur in either direction.

For the add case, the positive overflow can only occur if both of the
values are positive (same for both the values being negative). So there
is an extra select on whether to use the positive or negative overflow
limit.
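
As a schematic sketch (here %limit stands for INT_MAX or INT_MIN,
chosen by the extra select on the operand signs; illustrative only):

  %wo  = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
  %sum = extractvalue { i32, i1 } %wo, 0
  %ovf = extractvalue { i32, i1 } %wo, 1
  %r   = select i1 %ovf, i32 %limit, i32 %sum
  ; --> %r = call i32 @llvm.sadd.sat.i32(i32 %a, i32 %b)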

Differential Revision: https://reviews.llvm.org/D69252
2019-11-17 10:42:39 +00:00