Commit Graph

874 Commits

Author SHA1 Message Date
Craig Topper e7211bb567 [SelectionDAG][X86] Enable iX SimplifyDemandedBits to vXi1 SimplifyDemandedVectorElts simplification. Add a hack to X86 to avoid a regression
Patch showing the effect of enabling bool vector oversimplification.

Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify:

insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i

Preventing the removal of the AND to clear the upper bits of result

Differential Revision: https://reviews.llvm.org/D53022

llvm-svn: 369780
2019-08-23 17:14:58 +00:00
Shiva Chen 72a41e7b0d [TargetLowering] Remove optional arguments passing to makeLibCall
The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497.
The struct contain argument flags which will pass to makeLibCall function.
The patch should not has any functionality changes.

Differential Revision: https://reviews.llvm.org/D65795

llvm-svn: 369622
2019-08-22 04:59:43 +00:00
Roman Lebedev edfaee0811 [TargetLowering] x s% C == 0 fold: vector divisor with INT_MIN handling
Summary:
The general fold is only valid for positive divisors.
Which effectively means, it is invalid for `INT_MIN` divisors,
and we currently bailout if we see them.

But that is too strict, we can just fix-up the results.
For that, let's do a second computation 'in parallel':
```
Name: srem -> and
Pre: isPowerOf2(C)
%o = srem i8 %X, C
%r = icmp eq %o, 0
  =>
%n = and i8 %X, C-1
%r = icmp eq %n, 0
```
https://rise4fun.com/Alive/Sup

And then just blend results: if the divisor was `INT_MIN`,
pick the value we got via bit-test,
else pick the value from general fold.

There's interesting observation - `ISD::ROTR` is set to
`LegalizeAction::Expand` before AVX512, so we should not
treat `INT_MIN` divisor as even; and as it can be seen
while `@test_srem_odd_even_one` improves on all run-lines,
`@test_srem_odd_even_INT_MIN` only improves for AVX512.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66300

llvm-svn: 369268
2019-08-19 15:01:42 +00:00
Daniel Sanders 0c47611131 Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00
Simon Pilgrim 983e9118a2 Remove BitVector.h include. NFCI.
BitVector type isn't used at all in the cpp file.

llvm-svn: 369007
2019-08-15 14:39:28 +00:00
Roman Lebedev 676594305a [CodeGen][SelectionDAG] More efficient code for X % C == 0 (SREM case)
Summary:
This implements an optimization described in Hacker's Delight 10-17:
when `C` is constant, the result of `X % C == 0` can be computed
more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

One huge caveat: this signed case is only valid for positive divisors.

While we can freely negate negative divisors, we can't negate `INT_MIN`,
so for now if `INT_MIN` is encountered, we bailout.
As a follow-up, it should be possible to handle that more gracefully
via extra `and`+`setcc`+`select`.

This passes llvm's test-suite, and from cursory(!) cross-examination
the folds (the assembly) match those of GCC, and manual checking via alive
did not reveal any issues (other than the `INT_MIN` case)

Reviewers: RKSimon, spatel, hermord, craig.topper, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: xbolva00, thakis, javed.absar, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65366

llvm-svn: 368702
2019-08-13 14:57:37 +00:00
Roman Lebedev f4de7eda4a [TargetLowering][NFC] prepareUREMEqFold(): fixup comment
The comment initially matched the code, but the code was incorrect
and was fixed after the initial revert back back when it was introduced,
but the comment was never updated.

llvm-svn: 368701
2019-08-13 14:57:08 +00:00
Hans Wennborg 5390d25f2b Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT"
This introduced a false positive MemorySanitizer warning about use of
uninitialized memory in a vectorized crc function in Chromium. That suggests
maybe something is not right with this transformation. See
https://crbug.com/992853#c7 for a reproducer.

This also reverts the follow-up commits r368307 and r368308 which
depended on this.

> This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
>
> In particular this helps remove some unnecessary scalar->vector->scalar patterns.
>
> The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.
>
> Differential Revision: https://reviews.llvm.org/D65887

llvm-svn: 368660
2019-08-13 09:33:25 +00:00
Simon Pilgrim 05e8209e33 [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::TRUNCATE
llvm-svn: 368553
2019-08-12 10:56:05 +00:00
Simon Pilgrim e2e366797e [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT
This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.

In particular this helps remove some unnecessary scalar->vector->scalar patterns.

The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.

Differential Revision: https://reviews.llvm.org/D65887

llvm-svn: 368276
2019-08-08 10:37:03 +00:00
Simon Pilgrim 0eafe011ca [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::VECTOR_SHUFFLE
In particular this helps the SSE vector shift cvttps2dq+add+shl pattern by avoiding the need for zeros in shuffle style extensions to vXi32 types as we'll be shifting out those bits anyway

llvm-svn: 368155
2019-08-07 11:43:13 +00:00
Aditya Nandakumar c8ac029d0a [GISel]: Add GISelKnownBits analysis
https://reviews.llvm.org/D65698

This adds a KnownBits analysis pass for GISel. This was done as a
pass (compared to static functions) so that we can add other features
such as caching queries(within a pass and across passes) in the future.
This patch only adds the basic pass boiler plate, and implements a lazy
non caching knownbits implementation (ported from SelectionDAG). I've
also hooked up the AArch64PreLegalizerCombiner pass to use this - there
should be no compile time regression as the analysis is lazy.

llvm-svn: 368065
2019-08-06 17:18:29 +00:00
Simon Pilgrim dae5ddad9d [TargetLowering] SimplifyMultipleUseDemandedBits - return UNDEF for undemanded ops
If we demand no bits/elts from an Op, just return UNDEF

llvm-svn: 368043
2019-08-06 14:30:42 +00:00
Craig Topper 5a4989e2ac [TargetLowering][X86] Teach SimplifyDemandedVectorElts to replace the base vector of INSERT_SUBVECTOR with undef if none of the elements are demanded even if the node has other users.
Summary:
The SimplifyDemandedVectorElts function can replace with undef
when no elements are demanded, but due to how it interacts with
TargetLoweringOpts, it can only do this when the node has
no other users.

Remove a now unneeded DAG combine from the X86 backend.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65713

llvm-svn: 367788
2019-08-04 17:30:41 +00:00
Bill Wendling 41a2847a9a Emit diagnostic if an inline asm constraint requires an immediate
Summary:
An inline asm call can result in an immediate after inlining. Therefore emit a
diagnostic here if constraint requires an immediate but one isn't supplied.

Reviewers: joerg, mgorny, efriedma, rsmith

Reviewed By: joerg

Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60942

llvm-svn: 367750
2019-08-03 05:52:47 +00:00
Simon Pilgrim 794f7591ec [TargetLowering] SimplifyMultipleUseDemandedBits - don't assume INSERT_VECTOR_ELT value type is simple.
Noticed by inspection - this was copied from the X86 target equivalent where we can assume its legal/simple.

llvm-svn: 367721
2019-08-02 21:07:07 +00:00
Simon Pilgrim 1d183b407a [TargetLowering] SimplifyMultipleUseDemandedBits - Add ISD::INSERT_VECTOR_ELT handling
Allow us to peek through vector insertions to avoid dependencies on entire insertion chains.

llvm-svn: 367588
2019-08-01 17:46:44 +00:00
Simon Pilgrim 603f94aa2a [TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support (Reapplied)
This allows us to peek through BITCASTs, attempt to simplify the source operand, and then bitcast back.

This reapplies rL367091 which was reverted at rL367118 - we were inconsistently peeking through the bitcasts to the source value.

Fixes PR42777

llvm-svn: 367174
2019-07-27 14:11:59 +00:00
Simon Pilgrim 8a52671782 [SelectionDAG] Check for any recursion depth greater than or equal to limit instead of just equal the limit.
If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit.

This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it. 

This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches.

llvm-svn: 367171
2019-07-27 12:48:46 +00:00
Simon Pilgrim 3ff6126487 [TargetLowering] Add depth limit to SimplifyMultipleUseDemandedBits
We're getting reports of massive compile time increases because SimplifyMultipleUseDemandedBits was losing track of the depth and not earlying-out. No repro yet, but consider this a pre-emptive commit.

llvm-svn: 367169
2019-07-27 12:23:36 +00:00
Nico Weber 13f337c4cb Revert r367091, it caused PR42777.
llvm-svn: 367118
2019-07-26 14:58:42 +00:00
Simon Pilgrim 9758407bf1 [TargetLowering] SimplifyMultipleUseDemandedBits - add SIGN_EXTEND_INREG support.
llvm-svn: 367096
2019-07-26 09:41:08 +00:00
Simon Pilgrim b32ceb79b0 [TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support.
This allows us to peek through BITCASTs and attempt simplify the source operand, and then bitcast back.

llvm-svn: 367091
2019-07-26 08:38:39 +00:00
Roman Lebedev 017e272c3a [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH

InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression
will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as it is seen from the diffs here, if it didn't get hoisted,
the produced assembly is almost universally worse.

Much like with my recent "hoist add/sub by/from const" patches,
we should get almost universal win if we hoist constant,
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by constant, but by something else,
the live-range of that something else may reduce.

Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit`
instruction pattern. And to not get into endless combine loop.

Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm

Reviewed By: spatel

Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62871

llvm-svn: 366955
2019-07-24 22:57:22 +00:00
Sanjay Patel 10dad95a75 [SDAG] convert (sub x, 1) to (add x, -1) in ctpop expansion; NFC
We canonicalize to the add form, so create that directly for efficiency.

llvm-svn: 366914
2019-07-24 15:43:50 +00:00
Simon Pilgrim 0e8359aec1 [TargetLowering] SimplifyMultipleUseDemandedBits - add VECTOR_SHUFFLE support.
If all the demanded elts are from one operand and are inline, then we can use the operand directly.

The changes are mainly from SSE41 targets which has blendvpd but not cmpgtq, allowing the v2i64 comparison to be simplified as we only need the signbit from alternate v4i32 elements.

llvm-svn: 366817
2019-07-23 15:35:55 +00:00
Simon Pilgrim 743d45ee25 [TargetLowering] Add SimplifyMultipleUseDemandedBits
This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.

The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.

We do see a couple of regressions that need to be addressed:

    AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
	
    X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.

The code owners have confirmed its ok for these cases to fixed up in future patches.

Differential Revision: https://reviews.llvm.org/D63281

llvm-svn: 366799
2019-07-23 12:39:08 +00:00
Roman Lebedev cd9b19484b [Codegen][SelectionDAG] X u% C == 0 fold: non-splat vector improvements
Summary:
Four things here:
1. Generalize the fold to handle non-splat divisors. Reasonably trivial.
2. Unban power-of-two divisors. I don't see any reason why they should
   be illegal.
   * There is no ban in Hacker's Delight
   * I think the ban came from the same bug that caused the miscompile
      in the base patch - in `floor((2^W - 1) / D)` we were dividing by
      `D0` instead of `D`, and we **were** ensuring that `D0` is not `1`,
      which made sense.
3. Unban `1` divisors. I no longer believe Hacker's Delight actually says
   that the fold is invalid for `D = 0`. Further considerations:
   * We know that
     * `(X u% 1) == 0`  can be constant-folded to `1`,
     * `(X u% 1) != 0`  can be constant-folded to `0`,
   *  Also, we know that
     * `X u<= -1` can be constant-folded to `1`,
     * `X u>  -1` can be constant-folded to `0`,
   * https://godbolt.org/z/7jnZJX https://rise4fun.com/Alive/oF6p
   * We know will end up with the following:
       `(setule/setugt (rotr (mul N, P), K), Q)`
   * Therefore, for given new DAG nodes and comparison predicates
     (`ule`/`ugt`), we will still produce the correct answer if:
     `Q` is a all-ones constant; and both `P` and `K` are *anything*
     other than `undef`.
   * The fold will indeed produce `Q = all-ones`.
4. Try to re-splat the `P` and `K` vectors - we don't care about
   their values for the lanes where divisor was `1`.

Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00

Reviewed By: RKSimon

Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63963

llvm-svn: 366637
2019-07-20 16:33:15 +00:00
Sanjay Patel 138328e45c [SDAG] commute setcc operands to match a subtract
If we have:

R = sub X, Y
P = cmp Y, X

...then flipping the operands in the compare instruction can allow using a subtract that sets compare flags.

Motivated by diffs in D58875 - not sure if this changes anything there,
but this seems like a good thing independent of that.

There's a more involved version of this transform already in IR (in instcombine
although that seems misplaced to me) - see "swapMayExposeCSEOpportunities()".

Differential Revision: https://reviews.llvm.org/D63958

llvm-svn: 365711
2019-07-10 23:23:54 +00:00
Nick Desaulniers 8728e45706 [TargetLowering] support BlockAddress as "i" inline asm constraint
Summary:
This allows passing address of labels to inline assembly "i" input
constraints.

Fixes pr/42502.

Reviewers: ostannard

Reviewed By: ostannard

Subscribers: void, echristo, nathanchance, ostannard, javed.absar, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64167

llvm-svn: 365664
2019-07-10 17:08:25 +00:00
Simon Pilgrim 9285bf0fb9 [TargetLowering] SimplifyDemandedBits - just call computeKnownBits for BUILD_VECTOR cases.
Don't do this locally, computeKnownBits does this better (and can handle non-constant cases as well).

A next step would be to actually simplify non-constant elements - building on what we already do in SimplifyDemandedVectorElts.

llvm-svn: 365309
2019-07-08 11:00:39 +00:00
Roman Lebedev 7c8ee375d8 [NFC][TargetLowering] Some preparatory cleanups around 'prepareUREMEqFold()' from D63963
llvm-svn: 364921
2019-07-02 13:21:23 +00:00
Benjamin Kramer ed13fef477 [SelectionDAG] Do minnum->minimum at legalization time instead of building time
The SDAGBuilder behavior stems from the days when we didn't have fast
math flags available in SDAG. We do now and doing the transformation in
the legalizer has the advantage that it also works for vector types.

llvm-svn: 364743
2019-07-01 11:00:23 +00:00
Roman Lebedev 29d05c005f [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3)
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...

This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

This is a recommit, the original commit rL364563 was reverted in rL364568
because test-suite detected miscompile - the new comparison constant 'Q'
was being computed incorrectly (we divided by `D0` instead of `D`).

Original patch D50222 by @hermord (Dmytro Shynkevych)

Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
  This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
  the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
  I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.

Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63391

llvm-svn: 364600
2019-06-27 21:52:10 +00:00
Roman Lebedev 0a2b7b79fa Revert "[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)"
*Appears* to break test-suite on
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23790

FAIL: burg.execution_time
FAIL: spiff.execution_time
FAIL: employ.execution_time
FAIL: llu.execution_time
FAIL: gramschmidt.execution_time
FAIL: fdtd-apml.execution_time

This reverts commit r364563.

llvm-svn: 364568
2019-06-27 17:22:31 +00:00
Roman Lebedev 0627b09863 [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych)

This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

Original patch author: @hermord (Dmytro Shynkevych)!

Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
  This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
  the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
  I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.

Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: xbolva00, javed.absar, llvm-commits, hermord

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63391

llvm-svn: 364563
2019-06-27 16:45:42 +00:00
Simon Pilgrim 83e1a1e79b [TargetLowering] SimplifyDemandedVectorElts - add shift/rotate support.
llvm-svn: 364548
2019-06-27 14:25:54 +00:00
Simon Pilgrim c692a8dc51 [TargetLowering] SimplifyDemandedBits - use DemandedElts to better identify partial splat shift amounts
llvm-svn: 364541
2019-06-27 13:48:43 +00:00
Sanjay Patel 685c5cbc65 [SDAG] expand ctpop != 1
Change the generic ctpop expansion to more efficiently handle a
check for not-a-power-of-two value:
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)

This is the inverted predicate sibling pattern that was added with:
D63004

This should have been done before I changed IR canonicalization to
favor this form with:
rL364246
...so if this requires revert/changing, the earlier commit may also
need to modified.

llvm-svn: 364319
2019-06-25 14:46:52 +00:00
Simon Pilgrim 1a18bb6f25 [TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support
Add 'lowest' demanded elt -> bitcast fold to all *_EXTEND_VECTOR_INREG cases.

Reapplies rL363856.

llvm-svn: 364311
2019-06-25 13:25:57 +00:00
Simon Pilgrim 36953ce769 [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.

Matches what we already do for ZERO_EXTEND.

Reapplies rL363850 but now with legality checks added at rL364290

llvm-svn: 364303
2019-06-25 12:57:43 +00:00
Sanjay Patel e4ef62291b [SDAG] improve expansion of ctpop+setcc
This should not cause any visible change in output, but it's
more efficient because we were producing non-canonical 'sub x, 1'
and 'setcc ugt x, 0'. As mentioned in the TODO, we should also
be handling the inverse predicate.

llvm-svn: 364302
2019-06-25 12:49:35 +00:00
Simon Pilgrim 69fc111184 [TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG
Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.

Matches what we already do for SIGN_EXTEND.

Reapplies rL363802 but now with legality checks added at rL364290

llvm-svn: 364299
2019-06-25 12:19:12 +00:00
Simon Pilgrim 49b3778e32 [TargetLowering] SimplifyDemandedBits - legal checks for SIGN/ZERO_EXTEND -> ZERO/ANY_EXTEND
As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations || isOperationLegal cases.

We'll improve X86 legality in future commits.

llvm-svn: 364290
2019-06-25 10:51:15 +00:00
Roman Lebedev cdd43eac4f [Codegen] TargetLowering::SimplifySetCC(): omit urem when possible
Summary:
This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll`
The missing fold, at least partially, looks trivial:
https://rise4fun.com/Alive/Zsln
i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two,
and the `urem` is of something that may at most have a single bit set (or no bits set at all),
the `urem` is not needed.

Reviewers: RKSimon, craig.topper, xbolva00, spatel

Reviewed By: xbolva00, spatel

Subscribers: xbolva00, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63390

llvm-svn: 364286
2019-06-25 10:01:42 +00:00
Craig Topper 079924b0b7 Revert r363802, r363850, and r363856 "[TargetLowering] SimplifyDemandedBits..."
This reverts the following patches.
"[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support"

We can end up with an any_extend_vector_inreg with a 256 bit result type
and a 128 bit result type. This is allowed by the ISD opcode, but the
generic operation legalizer is only able to expand cases where the
total vector width is the same.

The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg.
The SimplifyDemandedBits changes are allowing those nodes to become
aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom
handling and never lets them get to the generic legalizer. We need to do the same
for aext_vec_inreg.

llvm-svn: 364264
2019-06-25 01:32:42 +00:00
Simon Pilgrim f05369768c [TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support
Move 'lowest' demanded elt -> bitcast fold out of ZERO_EXTEND_VECTOR_INREG into ANY_EXTEND_VECTOR_INREG case.

llvm-svn: 363856
2019-06-19 18:34:58 +00:00
Simon Pilgrim 6016fb726c [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.

Matches what we already do for ZERO_EXTEND.

llvm-svn: 363850
2019-06-19 18:00:24 +00:00
Simon Pilgrim c3994f77cb [TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG
Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.

Matches what we already do for SIGN_EXTEND.

llvm-svn: 363802
2019-06-19 13:58:02 +00:00
Simon Pilgrim 5bef886cd8 [TargetLowering] SimplifyDemandedBits - Cleanup ANY_EXTEND handling
Match SIGN_EXTEND + ZERO_EXTEND handling - will be adding ANY_EXTEND_VECTOR_INREG support in a future patch.

llvm-svn: 363716
2019-06-18 18:22:30 +00:00