Commit Graph

15987 Commits

Author SHA1 Message Date
Sanjay Patel 4964d75d70 [InstCombine] add bitwise logic fold tests for D86395; NFC 2020-09-08 09:17:42 -04:00
Max Kazantsev 046f240202 [Test] More tests where IndVars fails to eliminate a range check 2020-09-08 14:43:29 +07:00
Andrew Wei 78071fb524 [LSR] Canonicalize a formula before insert it into the list
In GenerateConstantOffsetsImpl, we may generate non canonical Formula
if BaseRegs of that Formula is updated and includes a recurrent expr reg
related with current loop while its ScaledReg is not.

Patched by: mdchen
Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D86939
2020-09-08 13:14:53 +08:00
Johannes Doerfert 711bf7dcf9 [Attributor][FIX] Don't crash on internalizing linkonce_odr hidden functions
The CloneFunctionInto has implicit requirements with regards to the
linkage and visibility of the function. We now update these after we did
the CloneFunctionInto on the copy with the same linkage and visibility
as the original.
2020-09-07 23:38:09 -05:00
Johannes Doerfert 53e4ef7fc2 [Attributor][NFC] Cleanup internalize test case
One run line was different and probably introduced for the manually
added function attribute & name checks. We can do this with the script
and a check prefix used for the other run lines as well.
2020-09-07 23:38:09 -05:00
Max Kazantsev 247d023965 [Test] Auto-generated checks for some IndVarSimplify tests 2020-09-08 11:15:40 +07:00
Florian Hahn efb8e156da [DSE,MemorySSA] Add an early check for read clobbers to traversal.
Depending on the benchmark, this early exit can save a substantial
amount of compile-time:

http://llvm-compile-time-tracker.com/compare.php?from=505f2d817aa8e07ba98e5fd4a8f6ff0666f89df1&to=eb4e441147f9b4b7a5fcbbc57428cadbe9e01f10&stat=instructions
2020-09-07 23:22:10 +01:00
Roman Lebedev bb7d3af113
Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec218
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.

It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric

So let's just reland this.
Original commit message:


I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

This reverts commit 503deec218.
2020-09-08 00:24:03 +03:00
Nikita Popov 9fb46a452d [SCCP] Compute ranges for supported intrinsics
For intrinsics supported by ConstantRange, compute the result range
based on the argument ranges. We do this independently of whether
some or all of the input ranges are full, as we can often still
constrain the result in some way.

Differential Revision: https://reviews.llvm.org/D87183
2020-09-07 22:16:06 +02:00
Sanjay Patel 8b30067919 [InstCombine] improve fold of pointer differences
This was supposed to be an NFC cleanup, but there's
a real logic difference (did not drop 'nsw') visible
in some tests in addition to an efficiency improvement.

This is because in the case where we have 2 GEPs,
the code was *always* swapping the operands and
negating the result. But if we have 2 GEPs, we
should *never* need swapping/negation AFAICT.

This is part of improving flags propagation noticed
with PR47430.
2020-09-07 15:54:32 -04:00
Sanjay Patel 70207816e3 [InstCombine] add ptr difference tests; NFC 2020-09-07 15:54:32 -04:00
Sanjay Patel 7a6d6f0f70 [InstCombine] improve folds for icmp with multiply operands (PR47432)
Check for no overflow along with an odd constant before
we lose information by converting to bitwise logic.

https://rise4fun.com/Alive/2Xl

  Pre: C1 != 0
  %mx = mul nsw i8 %x, C1
  %my = mul nsw i8 %y, C1
  %r = icmp eq i8 %mx, %my
  =>
  %r = icmp eq i8 %x, %y

  Name: nuw ne
  Pre: C1 != 0
  %mx = mul nuw i8 %x, C1
  %my = mul nuw i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y

  Name: odd ne
  Pre: C1 % 2 != 0
  %mx = mul i8 %x, C1
  %my = mul i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y
2020-09-07 12:40:37 -04:00
Sanjay Patel 11d8eedfa5 [InstCombine] move/add tests for icmp with mul operands; NFC 2020-09-07 12:40:37 -04:00
Sanjay Patel b22910daab [InstCombine] erase instructions leading up to unreachable
Normal dead code elimination ignores assume intrinsics, so we fail to
delete assumes that are not meaningful (and potentially worse if they
cause conflicts with other assumptions).

The motivating example in https://llvm.org/PR47416 suggests that we
might have problems upstream from here (difference between C and C++),
but this should be a cheap way to make sure we remove more dead code.

Differential Revision: https://reviews.llvm.org/D87149
2020-09-07 10:44:08 -04:00
Sanjay Patel 28aa60aae2 [InstCombine] add test with more unreachable insts; NFC
Goes with D87149
2020-09-07 08:19:43 -04:00
Sanjay Patel 3ca8b9a560 [InstCombine] give a name to an intermediate value for easier tracking; NFC
As noted in PR47430, we probably want to conditionally include 'nsw'
here anyway, so we are going to need to fill out the optional args.
2020-09-07 08:19:42 -04:00
Sam Parker 65f78e73ad [SimplifyCFG] Consider cost of combining predicates.
Modify FoldBranchToCommonDest to consider the cost of inserting
instructions when attempting to combine predicates to fold blocks.
The threshold can be controlled via a new option:
-simplifycfg-branch-fold-threshold which defaults to '2' to allow
the insertion of a not and another logical operator.

Differential Revision: https://reviews.llvm.org/D86526
2020-09-07 10:04:50 +01:00
Nikita Popov ff218cbc84 [InstSimplify] Fold degenerate abs of abs form
This addresses the remaining issue from D87188. Due to a series of
folds, we may end up with abs-of-abs represented as
x == 0 ? -abs(x) : abs(x). Rather than recognizing this as a special
abs pattern and doing an abs-of-abs fold on it afterwards,
I'm directly folding this to one of the select operands in InstSimplify.

The general pattern falls into the "select with operand replaced"
category, but that fold is not powerful enough to recognize that
both hands of the select are the same for value zero.

Differential Revision: https://reviews.llvm.org/D87197
2020-09-06 09:43:08 +02:00
Nikita Popov 621b10ca18 [InstSimplify] Add tests for a peculiar abs of abs form (NFC)
This pattern shows up when canonicalizing to spf abs form to
intrinsic abs form.
2020-09-05 21:42:22 +02:00
Florian Hahn 1ddb3a369f [LangRef] Adjust guarantee for llvm.memcpy to also allow equal arguments.
This adjusts the description of `llvm.memcpy` to also allow operands
to be equal. This is in line with what Clang currently expects.

This change is intended to be temporary and followed by re-introduce
a variant with the non-overlapping guarantee for cases where we can
actually ensure that property in the front-end.

See the links below for more details:
http://lists.llvm.org/pipermail/cfe-dev/2020-August/066614.html
and PR11763.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D86815
2020-09-05 19:18:23 +01:00
Nikita Popov 5ad6552a83 [InstCombine] Add tests for known negative abs intrinsic (NFC)
And duplicate tests for known non-negative from InstSimplify.
2020-09-05 17:31:04 +02:00
Nikita Popov 4892d3a198 [InstCombine] Fold abs with dominating condition
Similar to D87168, but for abs. If we have a dominating x >= 0
condition, then we know that abs(x) is x. This fold is in
InstCombine, because we need to create a sub instruction for
the x < 0 case.

Differential Revision: https://reviews.llvm.org/D87184
2020-09-05 16:18:35 +02:00
Nikita Popov 73104b0751 [InstSimplify] Fold min/max based on dominating condition
If we have a dominating condition that x >= y, then umax(x, y) is x,
etc. I'm doing this in InstSimplify as the corresponding transform
for the select form is also done there.

Differential Revision: https://reviews.llvm.org/D87168
2020-09-05 16:16:40 +02:00
Nikita Popov ada8a17d94 [InstCombine] Fold abs intrinsic eq zero
Following the same transform for the select version of abs.
2020-09-05 15:11:38 +02:00
Nikita Popov 94c71d6aa1 [InstCombine] Add tests for abs intrinsic eq zero (NFC) 2020-09-05 15:11:38 +02:00
Nikita Popov 58b28fa7a2 [InstCombine] Fold mul of abs intrinsic
Same as the existing SPF_ABS fold. We don't need to explicitly
handle NABS, as the negs will get folded away first.
2020-09-05 12:37:45 +02:00
Nikita Popov 3ab13348ba [InstCombine] Add tests for mul of abs intrinsic (NFC) 2020-09-05 12:36:27 +02:00
Nikita Popov 10cb23c6ca [InstCombine] Fold cttz of abs intrinsic
Same as the existing fold for SPF_ABS. We don't need to explicitly
handle the NABS variant, as we'll first fold away the neg in that
case.
2020-09-05 12:25:41 +02:00
Nikita Popov 1903a1afd9 [InstCombine] Add tests for cttz of abs intrinsic (NFC) 2020-09-05 12:22:42 +02:00
Nikita Popov d401e376e4 [InstCombine] Test abs with dominating condition (NFC) 2020-09-05 11:10:01 +02:00
Nikita Popov 39caf9e940 [SCCP] Add tests for intrinsic ranges (NFC) 2020-09-05 10:28:13 +02:00
serge-sans-paille 3a6f3fc160 Fix return status of SimplifyCFG
When a switch case is folded into default's case, that's an IR change that
should be reported, update ConstantFoldTerminator accordingly.

Differential Revision: https://reviews.llvm.org/D87142
2020-09-05 07:54:15 +02:00
Nikita Popov 781a438408 [InstSimplify] Add tests for min/max with dominating condition (NFC) 2020-09-04 23:45:54 +02:00
Sanjay Patel 35c6d56c04 [InstCombine] rename tmp values to avoid scripted FileCheck conflicts; NFC 2020-09-04 17:02:18 -04:00
Sanjay Patel c5d6b2b7e5 [InstCombine] add test for assume in block with unreachable (PR47416); NFC 2020-09-04 16:57:35 -04:00
Nikita Popov b3e139444f [BDCE] Add tests for min/max intrinsincs (NFC) 2020-09-04 22:41:52 +02:00
Florian Hahn 00eb6fef08 [DSE,MemorySSA] Check for throwing instrs between killing/killed def.
We also have to check all uses between the killing & killed def and
check if any of them is throwing.
2020-09-04 18:54:59 +01:00
Florian Hahn 51932fc6bd [DSE,MemorySSA] Remove some duplicated test functions.
Some tests from multibuild-malloc-free.ll do not actually use malloc or
free and where split out to multiblock-throwing.ll, but not removed from
the original file. This patch cleans that up. It also moves @test22 to
simple.ll, because it does not involve multiple blocks.
2020-09-04 17:52:59 +01:00
Wei Wang 4eef14f978 [OpenMPOpt] Assume indirect call always changes ICV
When checking call sites, give special handling to indirect call, as the
callee may be unknown and can lead to nullptr dereference later. Assume
conservatively that the ICV always changes in such case.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D87104
2020-09-04 09:05:32 -07:00
Bryan Chan 3404add468 [EarlyCSE] Verify hash code in regression tests
As discussed in D86843, -earlycse-debug-hash should be used in more regression
tests to catch inconsistency between the hashing and the equivalence check.

Differential Revision: https://reviews.llvm.org/D86863
2020-09-04 10:40:35 -04:00
Florian Hahn 6cb54cfe0b [DSE] Move legacy tests to DeadStoreElimination/MemDepAnalysis.
This patch moves the tests for the old MemDepAnalysis based DSE
implementation to the MemDepAnalysis subdirectory and updates them to
pass -enable-dse-memoryssa=false.

This is in preparation for the switch to MemorySSA-backed DSE.
2020-09-04 14:38:03 +01:00
Florian Hahn 6bc5e866bd [MemCpyOpt] Account for case that MemInsertPoint == BI.
In that case, the new MemoryDef needs to be inserted *before*
MemInsertPoint.
2020-09-04 14:04:08 +01:00
Max Kazantsev 8784e9016d [Test] Range fix in test
test02_neg is not testing what it claims to test because its starting
value -1 lies outside of specified range.
2020-09-04 19:28:58 +07:00
Florian Hahn ab86e64a96 [DSE] Remove some dead code from DSE tests.
Some tests depend on DSE removing dead instructions unrelated to any
memory optimization. That's not really DSE's job, remove it.
2020-09-04 09:39:40 +01:00
Florian Hahn e2fc6a31d3 [MemCpyOpt] Preserve MemorySSA.
This patch updates MemCpyOpt to preserve MemorySSA. It uses the
MemoryDef at the insertion point of the builder and inserts the new def
after that def.

In some cases, we just modify a memory instruction. In that case, get
the defining access, then remove the memory access and add a new one.
If the defining access is in a different block, insert a new def at the
beginning of the current block, otherwise after the defining access.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86651
2020-09-04 09:05:33 +01:00
Bryan Chan a09eef113f Replace CRLF with LF; NFC 2020-09-03 15:30:08 -04:00
Wenlei He d1be928d23 SVML support for log2
Although LLVM supports vectorization of loops containing log2, it did not support using SVML implementation of it. Added support so that when clang is invoked with -fveclib=SVML now an appropriate SVML library log2 implementation will be invoked.

Follow up on: https://reviews.llvm.org/D77114

Tests:
Added unit tests to svml-calls.ll, svml-calls-finite.ll. Can be run with llvm-lint.
Created a simple c++ file that tests log2, and used clang+ to build it, and output final assembly.

Reviewed By: wenlei, craig.topper

Differential Revision: https://reviews.llvm.org/D86730
2020-09-03 11:52:29 -07:00
Sanjay Patel 2391a34f9f [InstCombine] canonicalize all commutative intrinsics with constant arg 2020-09-03 12:42:04 -04:00
Sanjay Patel 9bb3a9eebb [InstCombine] add tests for commutative intrinsics; NFC 2020-09-03 12:42:04 -04:00
Sanjay Patel bdd5bfd0e4 [IR][GVN] add/allow commutative intrinsics with >2 args
Follow-up to D86798 and rGe25449f.
2020-09-03 10:14:53 -04:00