Commit Graph

162 Commits

Author SHA1 Message Date
Nikita Popov 2b494f85f1 [CVP] Remove -cvp-dont-add-nowrap-flags option
This option was originally added to work around a bug in LFTR.
The bug has long since been fixed.
2021-03-07 18:19:31 +01:00
Nikita Popov afbb6d97b5 [CVP] Simplify and generalize switch handling
CVP currently handles switches by checking an equality predicate
on all edges from predecessor blocks. Of course, this can only
work if the value being switched over is defined in a different block.

Replace this implementation with a call to getPredicateAt(), which
also does the predecessor edge predicate check (if not defined in
the same block), but can also do quite a bit more: It can reason
about phi-nodes by checking edge predicates for incoming values,
it can reason about assumes, and it can reason about block values.

As such, this makes the implementation both simpler and more
powerful. The compile-time impact on CTMark is in the noise.
2020-12-12 21:12:27 +01:00
Philip Reames e46d74b589 [CVP] Allow two transforms in one invocation
For a call site which had both constant deopt operands and nonnull arguments, we were missing the opportunity to recognize the later by bailing early.

This is somewhat of a speculative fix.  Months ago, I'd had a private report of performance and compile time regressions from the deopt operand folding.  I never received a test case.  However, the only possibility I see was that after that change CVP missed the nonnull fold, and we end up with a pass ordering/missed simplification issue.  So, since it's a real issue, fix it and hope.
2020-09-28 15:11:42 -07:00
Nikita Popov fe79061be2 [LVI][CVP] Use block value when simplifying icmps
Add a flag to getPredicateAt() that allows making use of the block
value. This allows us to take into account range information from
the current block, rather than only information that is threaded
over edges, making the icmp simplification in CVP a lot more
powerful.

I'm not changing getPredicateAt() to use the block value
unconditionally to avoid any impact on the JumpThreading pass,
which is somewhat picky about LVI query order.

Most test changes here are just icmps that now get dropped (while
previously only a result used in a return was replaced). The three
tests in icmp.ll show some representative improvements. Some of
the folds this enables have been covered by IPSCCP in the meantime,
but LVI can reason about some cases which are hard to support in
IPSCCP, such as in test_br_cmp_with_offset.

The compile-time time cost of doing this is fairly minimal, with
a ~0.05% CTMark regression for ReleaseThinLTO:
https://llvm-compile-time-tracker.com/compare.php?from=709d03f8af4da4204849a70f01798e7cebba2e32&to=6236fd503761f43c99f4537121e057a01056f185&stat=instructions

This is because the block values will typically already be queried
and cached by other CVP optimizations anyway.

Differential Revision: https://reviews.llvm.org/D69686
2020-09-27 20:25:16 +02:00
Nikita Popov 9b959b59df [LVI] Require context instruction in external API (NFCI)
Require CxtI in getConstant() and getConstantRange() APIs.
Accordingly drop the BB parameter, as it is implied by
CxtI->getParent().

This makes sure we don't forget to pass the context instruction,
and makes the API contract clearer (also clean up the comments to
that effect -- the value holds at the context instruction, not
the end of the block).
2020-09-27 18:07:24 +02:00
Nikita Popov c8abf1c12d [CVP] Pass context instruction when narrowing div/rem
This fold was the only place not passing the context instruction.
The tests worked around that fact by introducing a basic block split,
which is now no longer necessary.
2020-09-27 17:51:30 +02:00
Martin Storsjö b90132399a [CVP] Remove a redundant trailing semicolon, fixing GCC warnings. NFC. 2020-09-23 09:03:01 +03:00
Roman Lebedev b289dc5306
[CVP] Narrow SDiv/SRem to the smallest power-of-2 that's sufficient to contain its operands
This is practically identical to what we already do for UDiv/URem:
  https://rise4fun.com/Alive/04K

Name: narrow udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow exact udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv exact i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow urem
Pre: C0 u<= 255 && C1 u<= 255
%r = urem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = urem i8 %t0, %t1
%r = zext i8 %t2 to i16

... only here we need to look for 'min signed bits', not 'active bits',
and there's an UB to be aware of:
  https://rise4fun.com/Alive/KG86
  https://rise4fun.com/Alive/LwR

Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv exact i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = srem i9 %t0, %t1
%r = sext i9 %t2 to i16


Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv exact i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = srem i8 %t0, %t1
%r = sext i8 %t2 to i16


The ConstantRangeTest.losslessSignedTruncationSignext test sanity-checks
the logic, that we can losslessly truncate ConstantRange to
`getMinSignedBits()` and signext it back, and it will be identical
to the original CR.

On vanilla llvm test-suite + RawSpeed, this fires 1262 times,
while the same fold for UDiv/URem only fires 384 times. Sic!

Additionally, this causes +606.18% (+1079) extra cases of
aggressive-instcombine.NumDAGsReduced, and +473.14% (+1145)
of aggressive-instcombine.NumInstrsReduced folds.
2020-09-22 21:37:30 +03:00
Roman Lebedev 4977eadee5
[NFC][CVP] Give a better name STATISTIC() counting udiv i16 -> udiv i8 xforms 2020-09-22 21:37:30 +03:00
Roman Lebedev ba5afe5588
[NFC][CVP] processUDivOrURem(): refactor to use ConstantRange::getActiveBits()
As an exhaustive test shows, this logic is fully identical to the old
implementation, with exception of the case where both of the operands
had empty ranges:

```
TEST_F(ConstantRangeTest, CVP_UDiv) {
  unsigned Bits = 4;
  EnumerateConstantRanges(Bits, [&](const ConstantRange &CR0) {
    if(CR0.isEmptySet())
      return;
    EnumerateConstantRanges(Bits, [&](const ConstantRange &CR1) {
      if(CR0.isEmptySet())
        return;

      unsigned MaxActiveBits = 0;
      for (const ConstantRange &CR : {CR0, CR1})
        MaxActiveBits = std::max(MaxActiveBits, CR.getActiveBits());

      ConstantRange OperandRange(Bits, /*isFullSet=*/false);
      for (const ConstantRange &CR : {CR0, CR1})
        OperandRange = OperandRange.unionWith(CR);
      unsigned NewWidth = OperandRange.getUnsignedMax().getActiveBits();

      EXPECT_EQ(MaxActiveBits, NewWidth) << CR0 << " " << CR1;
    });
  });
}
```
2020-09-22 21:37:29 +03:00
Roman Lebedev 4eeeb356fc
[CVP] Enhance SRem -> URem fold to work not just on non-negative operands
This is a continuation of 8d487668d0,
the logic is pretty much identical for SRem:

Name: pos pos
Pre: C0 >= 0 && C1 >= 0
%r = srem i8 C0, C1
  =>
%r = urem i8 C0, C1

Name: pos neg
Pre: C0 >= 0 && C1 <= 0
%r = srem i8 C0, C1
  =>
%r = urem i8 C0, -C1

Name: neg pos
Pre: C0 <= 0 && C1 >= 0
%r = srem i8 C0, C1
  =>
%t0 = urem i8 -C0, C1
%r = sub i8 0, %t0

Name: neg neg
Pre: C0 <= 0 && C1 <= 0
%r = srem i8 C0, C1
  =>
%t0 = urem i8 -C0, -C1
%r = sub i8 0, %t0

https://rise4fun.com/Alive/Vd6

Now, this new logic does not result in any new catches
as of vanilla llvm test-suite + RawSpeed.
but it should be virtually compile-time free,
and it may be important to be consistent in their handling,
because if we had a pair of sdiv-srem, and only converted one of them,
-divrempairs will no longer see them as a pair,
and thus not "merge" them.
2020-09-22 21:37:28 +03:00
Nikita Popov 25af353b0e [NewPM][LVI] Abandon LVI after CVP
As mentioned on D70376, LVI can currently cause performance issues
when running under NewPM. The problem is that, unlike the legacy
pass manager, NewPM will not immediately discard the LVI analysis
if the following pass does not need it. This is a problem, because
LVI has a high memory requirement, and mass invalidation of LVI
values is very inefficient. LVI should only be alive during passes
that actively interact with it.

This patch addresses the issue by explicitly abandoning LVI after CVP,
which gets us back to the LegacyPM behavior.

Differential Revision: https://reviews.llvm.org/D84959
2020-08-01 23:47:46 +02:00
Roman Lebedev 9dceb32f30
[NFC][CVP] processSDiv(): pacify gcc compilers 2020-07-18 19:41:43 +03:00
Roman Lebedev 8d487668d0
[CVP] Soften SDiv into a UDiv as long as we know domains of both of the operands.
Yes, if operands are non-positive this comes at the extra cost
of two extra negations. But  a. division is already just
ridiculously costly, two more subtractions can't hurt much :)
and  b. we have better/more analyzes/folds for an unsigned division,
we could end up narrowing it's bitwidth, converting it to lshr, etc.

This is essentially a take two on 0fdcca07ad,
which didn't fix the potential regression i was seeing,
because ValueTracking's computeKnownBits() doesn't make use
of dominating conditions in it's analysis.
While i could teach it that, this seems like the more general fix.

This big hammer actually does catch said potential regression.

Over vanilla test-suite + RawSpeed + darktable
(10M IR instrs, 1M IR BB, 1M X86 ASM instrs), this fires/converts 5 more
(+2%) SDiv's, the total instruction count at the end of middle-end pipeline
is only +6, so out of +10 extra negations, ~half are folded away,
and asm instr count is only +1, so practically speaking all extra
negations are folded away and are therefore free.
Sadly, all these new UDiv's remained, none folded away.
But there are two less basic blocks.

https://rise4fun.com/Alive/VS6

Name: v0
Pre: C0 >= 0 && C1 >= 0
%r = sdiv i8 C0, C1
  =>
%r = udiv i8 C0, C1

Name: v1
Pre: C0 <= 0 && C1 >= 0
%r = sdiv i8 C0, C1
  =>
%t0 = udiv i8 -C0, C1
%r = sub i8 0, %t0

Name: v2
Pre: C0 >= 0 && C1 <= 0
%r = sdiv i8 C0, C1
  =>
%t0 = udiv i8 C0, -C1
%r = sub i8 0, %t0

Name: v3
Pre: C0 <= 0 && C1 <= 0
%r = sdiv i8 C0, C1
  =>
%r = udiv i8 -C0, -C1
2020-07-18 17:59:56 +03:00
Roman Lebedev 45b7388824
[NFC][CVP] Rename predicates - s/positive/non negative/ to better note that zero is ok 2020-07-18 17:59:32 +03:00
Roman Lebedev 2cde6984d8
[NFC][CVP] Refactor isPositive() out of hasPositiveOperands() 2020-07-18 17:59:32 +03:00
Mircea Trofin ceb7f308b8 [llvm][NFC][CallSite] Removed CallSite from few implementation details
Reviewers: dblaikie, craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78724
2020-04-23 10:36:36 -07:00
Florian Hahn b37543750c [ValueLattice] Distinguish between constant ranges with/without undef.
This patch updates ValueLattice to distinguish between ranges that are
guaranteed to not include undef and ranges that may include undef.

A constant range guaranteed to not contain undef can be used to simplify
instructions to arbitrary values. A constant range that may contain
undef can only be used to simplify to a constant. If the value can be
undef, it might take a value outside the range. For example, consider
the snipped below

define i32 @f(i32 %a, i1 %c) {
  br i1 %c, label %true, label %false
true:
  %a.255 = and i32 %a, 255
  br label %exit
false:
  br label %exit
exit:
  %p = phi i32 [ %a.255, %true ], [ undef, %false ]
  %f.1 = icmp eq i32 %p, 300
  call void @use(i1 %f.1)
  %res = and i32 %p, 255
  ret i32 %res
}

In the exit block, %p would be a constant range [0, 256) including undef as
%p could be undef. We can use the range information to replace %f.1 with
false because we remove the compare, effectively forcing the use of the
constant to be != 300. We cannot replace %res with %p however, because
if %a would be undef %cond may be true but the  second use might not be
< 256.

Currently LazyValueInfo uses the new behavior just when simplifying AND
instructions and does not distinguish between constant ranges with and
without undef otherwise. I think we should address the remaining issues
in LVI incrementally.

Reviewers: efriedma, reames, aqjune, jdoerfert, sstefan1

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76931
2020-03-31 12:50:20 +01:00
Enna1 03bc311a16 [CorrelatedValuePropagation] Remove redundant if statement in processSelect()
This statement

    if (ReplaceWith == S) ReplaceWith = UndefValue::get(S->getType());

is introduced in https://reviews.llvm.org/rG35609d97ae89b8e13f40f4e6b9b056954f8baa83
to fix a case where unreachable code can cause select instruction
simplification to fail. In https://reviews.llvm.org/rGd10480657527ffb44ea213460fb3676a6b1300aa,
we begin to perform a depth-first walk of basic blocks. This means
we will not visit unreachable blocks. So we do not need this the
special check any more.

Differential Revision: https://reviews.llvm.org/D76753
2020-03-28 18:01:17 +01:00
Nikita Popov 9d9633fb70 [CVP] Simplify cmp of local phi node
CVP currently does not simplify cmps with instructions in the same
block, because LVI getPredicateAt() currently does not provide
much useful information for that case (D69686 would change that,
but is stuck.) However, if the instruction is a Phi node, then
LVI can compute the result of the predicate by threading it into
the predecessor blocks, which allows it simplify some conditions
that nothing else can handle. Relevant code:
6d6a4590c5/llvm/lib/Analysis/LazyValueInfo.cpp (L1904-L1927)

Differential Revision: https://reviews.llvm.org/D72169
2020-02-26 20:36:41 +01:00
Reid Kleckner 05da2fe521 Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles    touches affected_files  header
  342380        95      3604    llvm/include/llvm/ADT/STLExtras.h
  314730        234     1345    llvm/include/llvm/InitializePasses.h
  307036        118     2602    llvm/include/llvm/ADT/APInt.h
  213049        59      3611    llvm/include/llvm/Support/MathExtras.h
  170422        47      3626    llvm/include/llvm/Support/Compiler.h
  162225        45      3605    llvm/include/llvm/ADT/Optional.h
  158319        63      2513    llvm/include/llvm/ADT/Triple.h
  140322        39      3598    llvm/include/llvm/ADT/StringRef.h
  137647        59      2333    llvm/include/llvm/Support/Error.h
  131619        73      1803    llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 16:34:37 -08:00
Sanjay Patel f2e93d10fe [CVP] prevent propagating poison when substituting edge values into a phi (PR43802)
This phi simplification transform was added with:
D45448

However as shown in PR43802:
https://bugs.llvm.org/show_bug.cgi?id=43802

...we must be careful not to propagate poison when we do the substitution.
There might be some more complicated analysis possible to retain the overflow flag,
but it should always be safe and easy to drop flags (we have similar behavior in
instcombine and other passes).

Differential Revision: https://reviews.llvm.org/D69442
2019-10-28 08:58:28 -04:00
Roman Lebedev 7cd7f4a83b [CVP] No-wrap deduction for `shl`
Summary:
This is the last `OverflowingBinaryOperator` for which we don't deduce flags.
D69217 taught `ConstantRange::makeGuaranteedNoWrapRegion()` about it.

The effect is better than of the `mul` patch (D69203):

| statistic                              |     old |     new | delta | % change |
| correlated-value-propagation.NumAddNUW |    7145 |    7144 |    -1 | -0.0140% |
| correlated-value-propagation.NumAddNW  |   12126 |   12125 |    -1 | -0.0082% |
| correlated-value-propagation.NumAnd    |     443 |     446 |     3 |  0.6772% |
| correlated-value-propagation.NumNSW    |    5986 |    7158 |  1172 | 19.5790% |
| correlated-value-propagation.NumNUW    |   10512 |   13304 |  2792 | 26.5601% |
| correlated-value-propagation.NumNW     |   16498 |   20462 |  3964 | 24.0272% |
| correlated-value-propagation.NumShlNSW |       0 |    1172 |  1172 |          |
| correlated-value-propagation.NumShlNUW |       0 |    2793 |  2793 |          |
| correlated-value-propagation.NumShlNW  |       0 |    3965 |  3965 |          |
| instcount.NumAShrInst                  |   13824 |   13790 |   -34 | -0.2459% |
| instcount.NumAddInst                   |  277584 |  277586 |     2 |  0.0007% |
| instcount.NumAndInst                   |   66061 |   66056 |    -5 | -0.0076% |
| instcount.NumBrInst                    |  709153 |  709147 |    -6 | -0.0008% |
| instcount.NumICmpInst                  |  483709 |  483708 |    -1 | -0.0002% |
| instcount.NumSExtInst                  |   79497 |   79496 |    -1 | -0.0013% |
| instcount.NumShlInst                   |   40691 |   40654 |   -37 | -0.0909% |
| instcount.NumSubInst                   |   61997 |   61996 |    -1 | -0.0016% |
| instcount.NumZExtInst                  |   68208 |   68211 |     3 |  0.0044% |
| instcount.TotalBlocks                  |  843916 |  843910 |    -6 | -0.0007% |
| instcount.TotalInsts                   | 7387528 | 7387448 |   -80 | -0.0011% |

Reviewers: nikic, reames, sanjoy, timshen

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69277

llvm-svn: 375455
2019-10-21 21:31:19 +00:00
Roman Lebedev 2927716277 [CVP] Deduce no-wrap on `mul`
Summary:
`ConstantRange::makeGuaranteedNoWrapRegion()` knows how to deal with `mul`
since rL335646, there is exhaustive test coverage.
This is already used by CVP's `processOverflowIntrinsic()`,
and by SCEV's `StrengthenNoWrapFlags()`

That being said, currently, this doesn't help much in the end:
| statistic                              |     old |     new | delta | percentage |
| correlated-value-propagation.NumMulNSW |       4 |     275 |   271 |   6775.00% |
| correlated-value-propagation.NumMulNUW |       4 |    1323 |  1319 |  32975.00% |
| correlated-value-propagation.NumMulNW  |       8 |    1598 |  1590 |  19875.00% |
| correlated-value-propagation.NumNSW    |    5715 |    5986 |   271 |      4.74% |
| correlated-value-propagation.NumNUW    |    9193 |   10512 |  1319 |     14.35% |
| correlated-value-propagation.NumNW     |   14908 |   16498 |  1590 |     10.67% |
| instcount.NumAddInst                   |  275871 |  275869 |    -2 |      0.00% |
| instcount.NumBrInst                    |  708234 |  708232 |    -2 |      0.00% |
| instcount.NumMulInst                   |   43812 |   43810 |    -2 |      0.00% |
| instcount.NumPHIInst                   |  316786 |  316784 |    -2 |      0.00% |
| instcount.NumTruncInst                 |   62165 |   62167 |     2 |      0.00% |
| instcount.NumUDivInst                  |    2528 |    2526 |    -2 |     -0.08% |
| instcount.TotalBlocks                  |  842995 |  842993 |    -2 |      0.00% |
| instcount.TotalInsts                   | 7376486 | 7376478 |    -8 |      0.00% |
(^ test-suite plain, tests still pass)

Reviewers: nikic, reames, luqmana, sanjoy, timshen

Reviewed By: reames

Subscribers: hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69203

llvm-svn: 375396
2019-10-21 08:21:44 +00:00
Roman Lebedev e695f4c851 [CVP] setDeducedOverflowingFlags(): actually inc per-opcode stats
This is really embarrassing. Those are pointers, so that offsets the
pointers, not the statistics pointed-by the pointer...

llvm-svn: 375290
2019-10-18 21:19:26 +00:00
Roman Lebedev 284b6d7f4d [CVP] After proving that @llvm.with.overflow()/@llvm.sat() don't overflow, also try to prove other no-wrap
Summary:
CVP, unlike InstCombine, does not run till exaustion.
It only does a single pass.

When dealing with those special binops, if we prove that they can
safely be demoted into their usual binop form,
we do set the no-wrap we deduced. But when dealing with usual binops,
we try to deduce both no-wraps.

So if we convert e.g. @llvm.uadd.with.overflow() to `add nuw`,
we won't attempt to check whether it can be `add nuw nsw`.

This patch proposes to call `processBinOp()` on newly-created binop,
which is identical to what we do for div/rem already.

Reviewers: nikic, spatel, reames

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69183

llvm-svn: 375273
2019-10-18 19:32:47 +00:00
Roman Lebedev fa0ac2558e [NFC][CVP] Count all the no-wraps we proved
Summary:
It looks like this is the only missing statistic in the CVP pass.
Since we prove NSW and NUW separately i'd think we should count them separately too.

Reviewers: nikic, spatel, reames

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68740

llvm-svn: 375230
2019-10-18 13:20:16 +00:00
Philip Reames 2d5820cd72 [CVP] Remove a masking operation if range information implies it's a noop
This is really a known bits style transformation, but known bits isn't context sensitive. The particular case which comes up happens to involve a range which allows range based reasoning to eliminate the mask pattern, so handle that case specifically in CVP.

InstCombine likes to generate the mask-by-low-bits pattern when widening an arithmetic expression which includes a zext in the middle.

Differential Revision: https://reviews.llvm.org/D68811

llvm-svn: 374506
2019-10-11 03:48:56 +00:00
Roman Lebedev 354ba6985c [CVP} Replace SExt with ZExt if the input is known-non-negative
Summary:
zero-extension is far more friendly for further analysis.
While this doesn't directly help with the shift-by-signext problem, this is not unrelated.

This has the following effect on test-suite (numbers collected after the finish of middle-end module pass manager):
| Statistic                            |     old |     new | delta | percent change |
| correlated-value-propagation.NumSExt |       0 |    6026 |  6026 |   +100.00%     |
| instcount.NumAddInst                 |  272860 |  271283 | -1577 |     -0.58%     |
| instcount.NumAllocaInst              |   27227 |   27226 | -1    |      0.00%     |
| instcount.NumAndInst                 |   63502 |   63320 | -182  |     -0.29%     |
| instcount.NumAShrInst                |   13498 |   13407 | -91   |     -0.67%     |
| instcount.NumAtomicCmpXchgInst       |    1159 |    1159 |  0    |      0.00%     |
| instcount.NumAtomicRMWInst           |    5036 |    5036 |  0    |      0.00%     |
| instcount.NumBitCastInst             |  672482 |  672353 | -129  |     -0.02%     |
| instcount.NumBrInst                  |  702768 |  702195 | -573  |     -0.08%     |
| instcount.NumCallInst                |  518285 |  518205 | -80   |     -0.02%     |
| instcount.NumExtractElementInst      |   18481 |   18482 |  1    |      0.01%     |
| instcount.NumExtractValueInst        |   18290 |   18288 | -2    |     -0.01%     |
| instcount.NumFAddInst                |  139035 |  138963 | -72   |     -0.05%     |
| instcount.NumFCmpInst                |   10358 |   10348 | -10   |     -0.10%     |
| instcount.NumFDivInst                |   30310 |   30302 | -8    |     -0.03%     |
| instcount.NumFenceInst               |     387 |     387 |  0    |      0.00%     |
| instcount.NumFMulInst                |   93873 |   93806 | -67   |     -0.07%     |
| instcount.NumFPExtInst               |    7148 |    7144 | -4    |     -0.06%     |
| instcount.NumFPToSIInst              |    2823 |    2838 |  15   |      0.53%     |
| instcount.NumFPToUIInst              |    1251 |    1251 |  0    |      0.00%     |
| instcount.NumFPTruncInst             |    2195 |    2191 | -4    |     -0.18%     |
| instcount.NumFSubInst                |   92109 |   92103 | -6    |     -0.01%     |
| instcount.NumGetElementPtrInst       | 1221423 | 1219157 | -2266 |     -0.19%     |
| instcount.NumICmpInst                |  479140 |  478929 | -211  |     -0.04%     |
| instcount.NumIndirectBrInst          |       2 |       2 |  0    |      0.00%     |
| instcount.NumInsertElementInst       |   66089 |   66094 |  5    |      0.01%     |
| instcount.NumInsertValueInst         |    2032 |    2030 | -2    |     -0.10%     |
| instcount.NumIntToPtrInst            |   19641 |   19641 |  0    |      0.00%     |
| instcount.NumInvokeInst              |   21789 |   21788 | -1    |      0.00%     |
| instcount.NumLandingPadInst          |   12051 |   12051 |  0    |      0.00%     |
| instcount.NumLoadInst                |  880079 |  878673 | -1406 |     -0.16%     |
| instcount.NumLShrInst                |   25919 |   25921 |  2    |      0.01%     |
| instcount.NumMulInst                 |   42416 |   42417 |  1    |      0.00%     |
| instcount.NumOrInst                  |  100826 |  100576 | -250  |     -0.25%     |
| instcount.NumPHIInst                 |  315118 |  314092 | -1026 |     -0.33%     |
| instcount.NumPtrToIntInst            |   15933 |   15939 |  6    |      0.04%     |
| instcount.NumResumeInst              |    2156 |    2156 |  0    |      0.00%     |
| instcount.NumRetInst                 |   84485 |   84484 | -1    |      0.00%     |
| instcount.NumSDivInst                |    8599 |    8597 | -2    |     -0.02%     |
| instcount.NumSelectInst              |   45577 |   45913 |  336  |      0.74%     |
| instcount.NumSExtInst                |   84026 |   78278 | -5748 |     -6.84%     |
| instcount.NumShlInst                 |   39796 |   39726 | -70   |     -0.18%     |
| instcount.NumShuffleVectorInst       |  100272 |  100292 |  20   |      0.02%     |
| instcount.NumSIToFPInst              |   29131 |   29113 | -18   |     -0.06%     |
| instcount.NumSRemInst                |    1543 |    1543 |  0    |      0.00%     |
| instcount.NumStoreInst               |  805394 |  804351 | -1043 |     -0.13%     |
| instcount.NumSubInst                 |   61337 |   61414 |  77   |      0.13%     |
| instcount.NumSwitchInst              |    8527 |    8524 | -3    |     -0.04%     |
| instcount.NumTruncInst               |   60523 |   60484 | -39   |     -0.06%     |
| instcount.NumUDivInst                |    2381 |    2381 |  0    |      0.00%     |
| instcount.NumUIToFPInst              |    5549 |    5549 |  0    |      0.00%     |
| instcount.NumUnreachableInst         |    9855 |    9855 |  0    |      0.00%     |
| instcount.NumURemInst                |    1305 |    1305 |  0    |      0.00%     |
| instcount.NumXorInst                 |   10230 |   10081 | -149  |     -1.46%     |
| instcount.NumZExtInst                |   60353 |   66840 |  6487 |     10.75%     |
| instcount.TotalBlocks                |  829582 |  829004 | -578  |     -0.07%     |
| instcount.TotalFuncs                 |   83818 |   83817 | -1    |      0.00%     |
| instcount.TotalInsts                 | 7316574 | 7308483 | -8091 |     -0.11%     |

TLDR: we produce -0.11% less instructions, -6.84% less `sext`, +10.75% more `zext`.
To be noted, clearly, not all new `zext`'s are produced by this fold.

(And now i guess it might have been interesting to measure this for D68103 :S)

Reviewers: nikic, spatel, reames, dberlin

Reviewed By: nikic

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68654

llvm-svn: 374112
2019-10-08 20:29:48 +00:00
Nikita Popov b9e668f2e7 [CVP] Generate simpler code for elided with.overflow intrinsics
Use a { iN undef, i1 false } struct as the base, and only insert
the first operand, instead of using { iN undef, i1 undef } as the
base and inserting both. This is the same as what we do in InstCombine.

Differential Revision: https://reviews.llvm.org/D67034

llvm-svn: 370573
2019-08-31 09:58:37 +00:00
David Bolvansky d2904ccf88 Let CorrelatedValuePropagation preserve LazyValueInfo
Summary:
This patch makes CorrelatedValuePropagation preserve LazyValueInfo by adding LazyValueInfo::eraseValue & calling it whenever an instruction is erased.

Passes `make check` , test-suite, and SPECrate 2017.

Patch by aqjune (Juneyoung Lee)

Reviewers: reames, mzolotukhin

Reviewed By: reames

Subscribers: xbolva00, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59349

llvm-svn: 366942
2019-07-24 20:27:32 +00:00
Rui Ueyama 49a3ad21d6 Fix parameter name comments using clang-tidy. NFC.
This patch applies clang-tidy's bugprone-argument-comment tool
to LLVM, clang and lld source trees. Here is how I created this
patch:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
    -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
$ ninja
$ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
    -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
    ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}

llvm-svn: 366177
2019-07-16 04:46:31 +00:00
Nikita Popov f1ffc4305d [CVP] Reenable nowrap flag inference
Inference of nowrap flags in CVP has been disabled, because it
triggered a bug in LFTR (https://bugs.llvm.org/show_bug.cgi?id=31181).
This issue has been fixed in D60935, so we should be able to reenable
nowrap flag inference now.

Differential Revision: https://reviews.llvm.org/D62776

llvm-svn: 364228
2019-06-24 20:13:13 +00:00
Yevgeny Rouban a3e16719c4 Resubmit "[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst"
This reverts commit 5b32f60ec3.
The fix is in commit 4f9e68148b.

This patch fixes the CorrelatedValuePropagation pass to keep
prof branch_weights metadata of SwitchInst consistent.
It makes use of SwitchInstProfUpdateWrapper.
New tests are added.

Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D62126

llvm-svn: 362583
2019-06-05 05:46:40 +00:00
Nikita Popov 7bafae55c0 Reapply [CVP] Simplify non-overflowing saturating add/sub
If we can determine that a saturating add/sub will not overflow based
on range analysis, convert it into a simple binary operation. This is
a sibling transform to the existing with.overflow handling.

Reapplying this with an additional check that the saturating intrinsic
has integer type, as LVI currently does not support vector types.

Differential Revision: https://reviews.llvm.org/D62703

llvm-svn: 362263
2019-05-31 20:48:26 +00:00
Nikita Popov 23a02f6a5f [CVP] Fix assertion failure on vector with.overflow
Noticed on D62703. LVI only handles plain integers, not vectors of
integers. This was previously not an issue, because vector support
for with.overflow is only a relatively recent addition.

llvm-svn: 362261
2019-05-31 20:42:07 +00:00
Nikita Popov ccb63e0bfe Revert "[CVP] Simplify non-overflowing saturating add/sub"
This reverts commit 1e692d1777.

Causes assertion failure in builtins-wasm.c clang test.

llvm-svn: 362254
2019-05-31 19:04:47 +00:00
Nikita Popov 1e692d1777 [CVP] Simplify non-overflowing saturating add/sub
If we can determine that a saturating add/sub will not overflow
based on range analysis, convert it into a simple binary operation.
This is a sibling transform to the existing with.overflow handling.

Differential Revision: https://reviews.llvm.org/D62703

llvm-svn: 362242
2019-05-31 16:46:05 +00:00
Nikita Popov e906f2a370 [CVP] Generalize willNotOverflow(); NFC
Change argument from WithOverflowInst to BinaryOpIntrinsic, so this
function can also be used for saturating math intrinsics.

llvm-svn: 362152
2019-05-30 21:03:10 +00:00
Nikita Popov 5b32f60ec3 Revert "[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst"
This reverts commit 53f2f32865.

As reported on D62126, this causes assertion failures if the switch
has incorrect branch_weights metadata, which may happen as a result
of other transforms not handling it correctly yet.

llvm-svn: 361881
2019-05-28 21:28:24 +00:00
Yevgeny Rouban 53f2f32865 [CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst
This patch fixes the CorrelatedValuePropagation pass to keep
prof branch_weights metadata of SwitchInst consistent.
It makes use of SwitchInstProfUpdateWrapper.
New tests are added.

Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D62126

llvm-svn: 361808
2019-05-28 11:33:50 +00:00
Nikita Popov 8b1fa07639 [CVP] Remove unnecessary checks for empty GNWR; NFC
The guaranteed no-wrap region is never empty, it always contains at
least zero, so these optimizations don't ever apply.

To make this more obviously true, replace the conversative return
in makeGNWR with an assertion.

llvm-svn: 361698
2019-05-25 14:11:55 +00:00
Nikita Popov 5aacc7a573 Revert "[ConstantRange] Rename make{Guaranteed -> Exact}NoWrapRegion() NFC"
This reverts commit 7bf4d7c07f2fac862ef34c82ad0fef6513452445.

After thinking about this more, this isn't right, the range is not exact
in the same sense as makeExactICmpRegion(). This needs a separate
function.

llvm-svn: 358876
2019-04-22 09:01:38 +00:00
Nikita Popov 5299e25f50 [ConstantRange] Rename make{Guaranteed -> Exact}NoWrapRegion() NFC
Following D60632 makeGuaranteedNoWrapRegion() always returns an
exact nowrap region. Rename the function accordingly. This is in
line with the naming of makeExactICmpRegion().

llvm-svn: 358875
2019-04-22 08:36:05 +00:00
Luqman Aden 2993661cc0 [CorrelatedValuePropagation] Mark subs that we know not to wrap with nuw/nsw.
Summary:
Teach CorrelatedValuePropagation to also handle sub instructions in addition to add. Relatively simple since makeGuaranteedNoWrapRegion already understood sub instructions. Only subtle change is which range is passed as "Other" to that function, since sub isn't commutative.

Note that CorrelatedValuePropagation::processAddSub is still hidden behind a default-off flag as IndVarSimplify hasn't yet been fixed to strip the added nsw/nuw flags and causes a miscompile. (PR31181)

Reviewers: sanjoy, apilipenko, nikic

Reviewed By: nikic

Subscribers: hiraditya, jfb, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60036

llvm-svn: 358816
2019-04-20 13:14:18 +00:00
Roman Lebedev 0080645846 [CVP] processOverflowIntrinsic(): don't crash if constant-holding happened
As reported by Mikael Holmén in post-commit review in
https://reviews.llvm.org/D60791#1469765

llvm-svn: 358559
2019-04-17 06:35:07 +00:00
Nikita Popov 52b24ee932 [CVP] Simplify umulo and smulo that cannot overflow
If a umul.with.overflow or smul.with.overflow operation cannot
overflow, simplify it to a simple mul nuw / mul nsw. After the
refactoring in D60668 this is just a matter of removing an
explicit check against multiplications.

Differential Revision: https://reviews.llvm.org/D60791

llvm-svn: 358521
2019-04-16 20:31:41 +00:00
Nikita Popov 79dffc67b5 [IR] Add WithOverflowInst class
This adds a WithOverflowInst class with a few helper methods to get
the underlying binop, signedness and nowrap type and makes use of it
where sensible. There will be two more uses in D60650/D60656.

The refactorings are all NFC, though I left some TODOs where things
could be improved. In particular we have two places where add/sub are
handled but mul isn't.

Differential Revision: https://reviews.llvm.org/D60668

llvm-svn: 358512
2019-04-16 18:55:16 +00:00
Nikita Popov 00a0d5d1de [CVP] Set NSW/NUW flags when simplifying with.overflow
When CVP determines that a with.overflow intrinsic cannot overflow,
it currently inserts a simple add/sub. As we already determined that
there can be no overflow, we should add the appropriate NUW/NSW flag.

Differential Revision: https://reviews.llvm.org/D60585

llvm-svn: 358298
2019-04-12 18:18:17 +00:00
Chijun Sima 70e97163e0 [DTU] Refine the interface and logic of applyUpdates
Summary:
This patch separates two semantics of `applyUpdates`:
1. User provides an accurate CFG diff and the dominator tree is updated according to the difference of `the number of edge insertions` and `the number of edge deletions` to infer the status of an edge before and after the update.
2. User provides a sequence of hints. Updates mentioned in this sequence might never happened and even duplicated.

Logic changes:

Previously, removing invalid updates is considered a side-effect of deduplication and is not guaranteed to be reliable. To handle the second semantic, `applyUpdates` does validity checking before deduplication, which can cause updates that have already been applied to be submitted again. Then, different calls to `applyUpdates` might cause unintended consequences, for example,
```
DTU(Lazy) and Edge A->B exists.
1. DTU.applyUpdates({{Delete, A, B}, {Insert, A, B}}) // User expects these 2 updates result in a no-op, but {Insert, A, B} is queued
2. Remove A->B
3. DTU.applyUpdates({{Delete, A, B}}) // DTU cancels this update with {Insert, A, B} mentioned above together (Unintended)
```
But by restricting the precondition that updates of an edge need to be strictly ordered as how CFG changes were made, we can infer the initial status of this edge to resolve this issue.

Interface changes:
The second semantic of `applyUpdates`  is separated to `applyUpdatesPermissive`.
These changes enable DTU(Lazy) to use the first semantic if needed, which is quite useful in `transforms/utils`.

Reviewers: kuhar, brzycki, dmgreen, grosser

Reviewed By: brzycki

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58170

llvm-svn: 354669
2019-02-22 13:48:38 +00:00