Commit Graph

15680 Commits

Author SHA1 Message Date
Florian Hahn 1ac72a0774 [IPConstProp] Regenerate check lines.
Preparation for D84447.
2020-07-30 09:52:16 +01:00
Yuanfang Chen 8224c5047e For some tests targeting SystemZ, -march=z13 ---> -mcpu=z13
z13 is not a target. It is a CPU.
2020-07-29 19:18:01 -07:00
Juneyoung Lee 111a02decd [JumpThreading] Fold br(freeze(undef))
This patch makes JumpThreading fold br(freeze(undef)) if the freeze instruction
is only used by the branch.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84818
2020-07-30 09:38:50 +09:00
Hiroshi Yamauchi ae7589e1f1 Revert "[PGO] Include the mem ops into the function hash."
This reverts commit 120e66b341.

Due to a buildbot failure.
2020-07-29 15:04:57 -07:00
Sanjay Patel fef513f5cc [InstSimplify] fold min/max intrinsic with undef operand 2020-07-29 17:03:50 -04:00
Sanjay Patel 5cd695dd7f [InstSimplify] fold min/max with opposite of limit value 2020-07-29 17:03:50 -04:00
Hiroshi Yamauchi 120e66b341 [PGO] Include the mem ops into the function hash.
To avoid hash collisions when the only difference is in mem ops.

Differential Revision: https://reviews.llvm.org/D84782
2020-07-29 13:59:40 -07:00
Hiroshi Yamauchi cd890944ad [PGO] Remove insignificant function hash values from some tests.
This is to avoid the need to update a bunch of test files when the PGO
instrumentation function hashing changes.

Split off of D84782.

Differential Revision: https://reviews.llvm.org/D84865
2020-07-29 10:23:42 -07:00
Craig Topper 3efc978bae [LV] Add abs/smin/smax/umin/umax intrinsics to isTriviallyVectorizable
This patch adds support for vectorizing these intrinsics.

Differential Revision: https://reviews.llvm.org/D84796
2020-07-29 10:23:07 -07:00
Roman Lebedev 1d51dc38d8
[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108
2020-07-29 20:05:30 +03:00
Arthur Eubanks 4a10029d7e [NewPM][Attributor] Pin tests with -attributor to legacy PM
All these tests already explicitly test against both legacy PM and NPM.

$ sed -i 's/ -attributor / -attributor -enable-new-pm=0 /g' $(rg --path-separator // -l -- -passes=)
$ sed -i 's/ -attributor-cgscc / -attributor-cgscc -enable-new-pm=0 /g' $(rg --path-separator // -l -- -passes=)

Now all tests in Transforms/Attributor/ pass under NPM.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84813
2020-07-29 09:02:30 -07:00
Sanjay Patel 3e8534fbc6 [InstSimplify] allow partial undef constants for vector min/max folds 2020-07-29 11:53:41 -04:00
Sanjay Patel 3c20ede18b [InstSimplify] fold integer min/max intrinsic with same args 2020-07-29 11:53:41 -04:00
David Sherwood 9ad7c980bb [SVE] Don't consider scalable vector types in SLPVectorizerPass::vectorizeChainsInBlock
In vectorizeChainsInBlock we try to collect chains of PHI nodes
that have the same element type, but the code is relying upon
the implicit conversion from TypeSize -> uint64_t. For now, I have
modified the code to ignore PHI nodes with scalable types.

Differential Revision: https://reviews.llvm.org/D83542
2020-07-29 16:29:19 +01:00
Juneyoung Lee 672df0fc67 [InstSimplify] add tests for expandCommutativeBinOp; NFC 2020-07-29 23:21:39 +09:00
David Green 9ddb28964c [ARM] Tune getCastInstrCost for extending masked loads and truncating masked stores
This patch uses the feature added in D79162 to fix the cost of a
sext/zext of a masked load, or a trunc for a masked store.
Previously, those were considered cheap or even free, but it's
not the case as we cannot split the load in the same way we would for
normal loads.

This updates the costs to better reflect reality, and adds a test for it
in test/Analysis/CostModel/ARM/cast.ll.

It also adds a vectorizer test that showcases the improvement: in some
cases, the vectorizer will now choose a smaller VF when
tail-predication is enabled, which results in better codegen. (Because
if it were to use a higher VF in those cases, the code we see above
would be generated, and the vmovs would block tail-predication later in
the process, resulting in very poor codegen overall)

Original Patch by Pierre van Houtryve

Differential Revision: https://reviews.llvm.org/D79163
2020-07-29 13:41:34 +01:00
Matt Arsenault c230965ccf AMDGPU: Make saturating add/sub legal for DAG path 2020-07-29 08:27:31 -04:00
Florian Hahn 2aa2c40d23 [NewGVN] Require asserts for crashing tests.
Without asserts, it might take a long time for the tests to crash.
Only run them with assert builds.
2020-07-29 12:41:05 +01:00
Yevgeny Rouban 5d6cd61904 [LoopSimplifyCFG] Delete landing pads in dead exit blocks
In addition to removing phi nodes this patch removes any
landing pad that the dead exit block might have. Without
this fix Verifier complains about a new switch instruction
jumps to a block with a landing pad.

Differential Revision: https://reviews.llvm.org/D84320
2020-07-29 18:36:51 +07:00
Juneyoung Lee 1ae766e3e0 [InstCombine] Add tests for select(freeze(undef)); NFC 2020-07-29 15:27:09 +09:00
Florian Hahn 8e67982384 [NewGVN] Add test cases for remaining known issues.
This patch adds IR tests for the known NewGVN issues. The intention is
that adding them now will make it easier to keep track of fixes.
2020-07-28 20:33:04 +01:00
Sanjay Patel 3fb13b8484 [InstSimplify] allow undefs in icmp with vector constant folds
This is the main icmp simplification shortcoming seen in D84655.

Alive2 agrees that the basic examples are correct at least:

define <2 x i1> @src(<2 x i8> %x) {
%0:
  %r = icmp sle <2 x i8> { undef, 128 }, %x
  ret <2 x i1> %r
}
=>
define <2 x i1> @tgt(<2 x i8> %x) {
%0:
  ret <2 x i1> { 1, 1 }
}
Transformation seems to be correct!

define <2 x i1> @src(<2 x i32> %X) {
%0:
  %A = or <2 x i32> %X, { 63, 63 }
  %B = icmp ult <2 x i32> %A, { undef, 50 }
  ret <2 x i1> %B
}
=>
define <2 x i1> @tgt(<2 x i32> %X) {
%0:
  ret <2 x i1> { 0, 0 }
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/omt2ee
https://alive2.llvm.org/ce/z/GW4nP_

Differential Revision: https://reviews.llvm.org/D84762
2020-07-28 15:13:53 -04:00
Sanjay Patel f75cf240d6 [InstCombine] avoid crashing on vector constant expression (PR46872) 2020-07-28 15:02:36 -04:00
Sanjay Patel 496fc3f196 [InstSimplify] add tests for icmp with partial undef constant; NFC 2020-07-28 15:00:33 -04:00
Juneyoung Lee 4c9af6d0e0 [JumpThreading] Add a basic support for freeze instruction
This patch adds a basic support for freeze instruction to JumpThreading
by making ComputeValueKnownInPredecessorsImpl look into its operand.

Reviewed By: efriedma, nikic

Differential Revision: https://reviews.llvm.org/D84598
2020-07-29 03:12:14 +09:00
Juneyoung Lee 4887495a3e [JumpThreading] Add tests that have a cast of freeze and vice versa 2020-07-29 02:16:44 +09:00
Arthur Eubanks 2ca6c422d2 [FunctionAttrs] Rename functionattrs -> function-attrs
To match NewPM pass name, and also for readability.
Also rename rpo-functionattrs -> rpo-function-attrs while we're here.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D84694
2020-07-28 09:09:13 -07:00
Florian Hahn be2ea29ee1 [SCEV] Add additional tests.
Increase test coverage for upcoming changes to how SCEV deals with LCSSA
phis.
2020-07-28 16:15:57 +01:00
Jinsong Ji d28f86723f Re-land "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support"
This reverts commit bf544fa1c3.

Fixed the typo in PPCInstrInfo.cpp.
2020-07-28 14:00:11 +00:00
Luofan Chen 5ee07dc53f [Attributor] Track AA dependency using dependency graph
This patch added dependency graph to the attributor so that we can dump the dependencies between AAs more easily. We can also apply general graph algorithms to the graph, making it easier for us to create deep wrappers.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D78861
2020-07-28 18:02:49 +08:00
Roman Lebedev e40315d2b4
[GVN] Rewrite IsValueFullyAvailableInBlock(): no recursion, less false-negatives
While this doesn't appear to help with the perf issue being exposed by
D84108, the function as-is is very weird, convoluted, and what's worse,
recursive.

There was no need for `SpeculativelyAvaliableAndUsedForSpeculation`,
tri-state choice is enough. We don't even ever check for that state.

The basic idea here is that we need to perform a depth-first traversal
of the predecessors of the basic block in question, either finding a
preexisting state for the block in a map, or inserting a "placeholder"
`SpeculativelyAvaliable`,

If we encounter an `Unavaliable` block, then we need to give up search,
and back-propagate the `Unavaliable` state to the each successor of
said block, more specifically to the each `SpeculativelyAvaliable`
we've just created.

However, if we have traversed entirety of the predecessors and have not
encountered an `Unavaliable` block, then it must mean the value is fully
available. We could update each inserted `SpeculativelyAvaliable` into
a `Avaliable`, but we don't need to, as assertion excersizes,
because we can assume that if we see an `SpeculativelyAvaliable` entry,
it is actually `Avaliable`, because during the time we've produced it,
if we would have found that it has an `Unavaliable` predecessor,
we would have updated it's successors, including this block,
into `Unavaliable`

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D84181
2020-07-28 10:19:28 +03:00
Wei Mi a23f62343c Supplement instr profile with sample profile.
PGO profile is usually more precise than sample profile. However, PGO profile
needs to be collected from loadtest and loadtest may not be representative
enough to the production workload. Sample profile collected from production
can be used as a supplement -- for functions cold in loadtest but warm/hot
in production, we can scale up the related function in PGO profile if the
function is warm or hot in sample profile.

The implementation contains changes in compiler side and llvm-profdata side.
Given an instr profile and a sample profile, for a function cold in PGO
profile but warm/hot in sample profile, llvm-profdata will either mark
all the counters in the profile to be -1 or scale up the max count in the
function to be above hot threshold, depending on the zero counter ratio in
the profile. The assumption is if there are too many counters being zero
in the function profile, the profile is more likely to cause harm than good,
then llvm-profdata will mark all the counters to be -1 indicating the
function is hot but the profile is unaccountable. In compiler side, if a
function profile with all -1 counters is seen, the function entry count will
be set to be above hot threshold but its internal profile will be dropped.

In the long run, it may be useful to let compiler support using PGO profile
and sample profile at the same time, but that requires more careful design
and more substantial changes to make two profiles work seamlessly. The patch
here serves as a simple intermediate solution.

Differential Revision: https://reviews.llvm.org/D81981
2020-07-27 20:17:40 -07:00
Jinsong Ji bf544fa1c3 Revert "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support"
This reverts commit adffce7153.

This is breaking test-suite, revert while investigation.
2020-07-27 21:07:00 +00:00
Jinsong Ji adffce7153 [PowerPC] Remove QPX/A2Q BGQ/BGP CNK support
Per RFC http://lists.llvm.org/pipermail/llvm-dev/2020-April/141295.html
no one is making use of QPX/A2Q/BGQ/BGP CNK anymore.

This patch remove the support of QPX/A2Q in llvm, BGQ/BGP in clang,
CNK support in openmp/polly.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D83915
2020-07-27 19:24:39 +00:00
Shinji Okumura bef19abcf7 [Attributor][NFC] Add tests to noalias.ll
Summary: Add tests to `noalias.ll` to make changes in D84665 clear

Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis

Subscribers: uenoku, kuter, bbn, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84688
2020-07-28 03:53:06 +09:00
Roman Lebedev 1da9834557
[JumpThreading] ProcessBranchOnXOR(): bailout if any pred ends in indirect branch (PR46857)
SplitBlockPredecessors() can not split blocks that have such terminators,
and in two other places we already ensure that we don't end up calling
SplitBlockPredecessors() on such blocks. Do so in one more place.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46857
2020-07-27 15:39:03 +03:00
Sanjay Patel 1ebcf03551 [InstSimplify] add tests for min/max intrinsics; NFC 2020-07-27 08:26:27 -04:00
Juneyoung Lee 6701c0bf73 [JumpThreading] Add a test case that has a phi with undef; NFC 2020-07-27 19:08:45 +09:00
Juneyoung Lee c891f519e1 [JumpThreading] Add a test that threads jumps with frozen branch conditions 2020-07-27 19:04:50 +09:00
Juneyoung Lee e1eacf27c6 [InstCombine] Fold freeze into phi if one operand is not undef
This patch adds folding freeze into phi if it has only one operand to target.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84601
2020-07-27 17:07:27 +09:00
Juneyoung Lee 194a4beedd [InstCombine] Add more tests to freeze-phi.ll; NFC 2020-07-27 09:43:00 +09:00
Juneyoung Lee ab4e1be7ab [InstCombine] Add a test for folding freeze into phi; NFC 2020-07-27 02:24:00 +09:00
Sanjay Patel 0481e1ae3c [InstSimplify] fold integer min/max intrinsics with limit constant 2020-07-26 09:41:54 -04:00
Sanjay Patel c6cf71107a [InstSimplify] add tests for min/max intrinsics; NFC 2020-07-26 09:04:37 -04:00
Sanjay Patel b89ae102e6 [InstSimplify] fold fcmp using isKnownNeverInfinity + isKnownNeverNaN
Follow-up to D84035 / rG7393d7574c09.
This sidesteps a question of FMF/poison on fcmp raised in PR46077:
http://bugs.llvm.org/PR46077

https://alive2.llvm.org/ce/z/TCsyzD
  define i1 @src(float %x) {
  %0:
    %x42 = fadd nnan ninf float %x, 42.000000
    %r = fcmp ueq float %x42, inf
    ret i1 %r
  }
  =>
  define i1 @tgt(float %x) {
  %0:
    ret i1 0
  }
  Transformation seems to be correct!

https://alive2.llvm.org/ce/z/FQaH7a
  define i1 @src(i8 %x) {
  %0:
    %cast = uitofp i8 %x to float
    %r = fcmp one float inf, %cast
    ret i1 %r
  }
  =>
  define i1 @tgt(i8 %x) {
  %0:
    ret i1 1
  }
  Transformation seems to be correct!
2020-07-26 09:04:37 -04:00
Sanjay Patel 912e9e5262 [InstSimplify] add tests for fcmp with infinity constant; NFC 2020-07-26 09:04:36 -04:00
Juneyoung Lee 920e267974 [JumpThreading] Add a test for D84598; NFC 2020-07-26 22:00:01 +09:00
Juneyoung Lee 9f074214b7 [ValueTracking] Instruction::isBinaryOp should be used for constexprs
This is a simple patch that makes canCreateUndefOrPoison use
Instruction::isBinaryOp because BinaryOperator inherits Instruction.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84596
2020-07-26 21:48:51 +09:00
Juneyoung Lee 02dadab1b4 NFC; add an example that subtracts pointers to two global vars 2020-07-26 20:47:33 +09:00
Nikita Popov f4199b8f0b [SCCP] Add assume non null test (NFC) 2020-07-25 16:02:15 +02:00
Nikita Popov ad16e71c95 Reapply [SCCP] Directly remove non-feasible edges
Reapply with DTU update moved after CFG update, which is a
requirement of the API.

-----

Non-feasible control-flow edges are currently removed by replacing
the branch condition with a constant and then calling
ConstantFoldTerminator. This happens in a rather roundabout manner,
by inspecting the users (effectively: predecessors) of unreachable
blocks, and further complicated by the need to explicitly materialize
the condition for "forced" edges. I would like to extend SCCP to
discard switch conditions that are non-feasible based on range
information, but this is incompatible with the current approach
(as there is no single constant we could use.)

Instead, this patch explicitly removes non-feasible edges. It
currently only needs to handle the case where there is a single
feasible edge. The llvm_unreachable() branch will need to be
implemented for the aforementioned switch improvement.

Differential Revision: https://reviews.llvm.org/D84264
2020-07-25 14:52:35 +02:00
Florian Hahn 3c1476d26c [IPSCCP] Drop argmemonly after replacing pointer argument.
This patch updates IPSCCP to drop argmemonly and
inaccessiblemem_or_argmemonly if it replaces a pointer argument.

Fixes PR46717.

Reviewers: efriedma, davide, nikic, jdoerfert

Reviewed By: efriedma, jdoerfert

Differential Revision: https://reviews.llvm.org/D84432
2020-07-25 11:52:14 +01:00
Rong Xu 1dd39b1133 [PGO] Fix incorrect function entry count
Function entry count might be zero after the profile counts reset and
before reentry to the function.

Zero profile entry count is very bad as the profile count from BFI will
be wrong.

A simple fix is to set the profile entry count to 1 if there are
non-zero profile counts in this function.

Differential Revision: https://reviews.llvm.org/D84378
2020-07-24 17:39:55 -07:00
Rong Xu 31bd15c562 [PGO][InstrProf] Do not promote count if the exit blocks contains ret instruction
Skip profile count promotion if any of the ExitBlocks contains a ret
instruction. This is to prevent dumping of incomplete profile -- if the
the loop is a long running loop and dump is called in the middle
of the loop, the result profile is incomplete.

ExitBlocks containing a ret instruction is an indication of a long running
loop -- early exit to error handling code.

Differential Revision: https://reviews.llvm.org/D84379
2020-07-24 17:38:31 -07:00
Rong Xu dcf1bca0de Revert "[PGO][InstrProf] Do not promote count if the exit blocks contains ret instruction"
This reverts commit 867ef4472d.
2020-07-24 17:33:49 -07:00
Rong Xu 867ef4472d [PGO][InstrProf] Do not promote count if the exit blocks contains ret instruction
Forgot including the tests in the commit 6fdc6f6c7d.
2020-07-24 17:23:33 -07:00
Johannes Doerfert aa09db495a [SROA] Teach promote to register about droppable instructions
This is the second of two patches to address PR46753. We basically allow
SROA to promote allocas that are used in doppable instructions, for
now that means `llvm.assume`. The (transitive) uses are replaced by
`undef` in the droppable instructions.

See also D83976.

Reviewed By: Tyker

Differential Revision: https://reviews.llvm.org/D83978
2020-07-24 15:15:39 -05:00
Johannes Doerfert ce8928f2e4 [Mem2Reg] Teach promote to register about droppable instructions
This is the first of two patches to address PR46753. We basically allow
mem2reg to promote allocas that are used in doppable instructions, for
now that means `llvm.assume`. The uses of the alloca (or a bitcast or
zero offset GEP from there) are replaced by `undef` in the droppable
instructions.

Reviewed By: Tyker

Differential Revision: https://reviews.llvm.org/D83976
2020-07-24 15:15:38 -05:00
Johannes Doerfert ce2d69b557 [SROA][Mem2Reg] Do not crash on alloca + addrspacecast
SROA knows that it can look through addrspacecast but
PromoteMemoryToRegister did not handle them. This caused an assertion
error for the test case, exposed while running
`Transforms/PhaseOrdering/inlining-alignment-assumptions.ll` with D83978
applied.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D84085
2020-07-24 15:15:38 -05:00
Arthur Eubanks 9bb6ce78be Rename scoped-noalias -> scoped-noalias-aa
Summary: To match NewPM name. Also the new name is clearer and more consistent.

Subscribers: jvesely, nhaehnle, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84542
2020-07-24 12:14:27 -07:00
Roman Lebedev 3319d05630
[NFC][GVN] Improve loadpre-missed-opportunity.ll test again thanks to @fhahn 2020-07-24 20:32:51 +03:00
Florian Hahn 1c7c69c795 [ValueTracking] Check for ConstantExpr before using recursive helpers.
Make sure we do not call
constainsConstantExpression/containsUndefElement on ConstantExpression,
which is not supported.

In particular, containsUndefElement/constainsConstantExpression are only
supported on constants which are supported by getAggregateElement.

Unfortunately there's no convenient way to check if a constant supports
getAggregateElement, so just check for non-constantexpressions with
vector type. Other users of those functions do so too.

Reviewers: spatel, nikic, craig.topper, lebedev.ri, jdoerfert, aqjune

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84512
2020-07-24 17:37:09 +01:00
Roman Lebedev 804622053a
[NFC][GVN] Clean loadpre-missed-opportunity.ll test some more 2020-07-24 12:44:22 +03:00
Florian Hahn 2c1799f892 [IPSCCP] Add another test case with argmemonly callsite attributes. 2020-07-24 10:13:51 +01:00
Fangrui Song 4637daa990 Revert D84264 "[SCCP] Directly remove non-feasible edges" & 5db5b4bc43
It breaks stage-2 build. Clang crashed when compiling
llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp

llvm/Support/GenericDomTree.h eraseNode: Node is not a leaf node
2020-07-23 17:51:48 -07:00
Roman Lebedev 0a5971139a
[NFC][GVN] Add a (horrible) test for D84181 demonstrating non-NFC'ness 2020-07-24 01:28:23 +03:00
Sidharth Baveja 38a8217931 [Loop Fusion] Integrate Loop Peeling into Loop Fusion (re-land after fixing ASAN build failures)
This patch adds the ability to peel off iterations of the first loop in loop
fusion. This can allow for both loops to have the same trip count, making it
legal for them to be fused together.

Here is a simple scenario peeling can be used in loop fusion:

for (i = 0; i < 10; ++i)
  a[i] = a[i] + 3;
for (j = 1; j < 10; ++j)
  b[j] = b[j] + 5;

Here is we can make use of peeling, and then fuse the two loops together. We
can peel off the 0th iteration of the loop i, and then combine loop i and j for
i = 1 to 10.

a[0] = a[0] +3;
for (i = 1; i < 10; ++i) {
  a[i] = a[i] + 3;
  b[i] = b[i] + 5;
}

Currently peeling with loop fusion is only supported for loops with constant
trip counts and a single exit point. Both unguarded and guarded loops are
supported.

Reviewed By: bmahjour (Bardia Mahjour), MaskRay (Fangrui Song)

Differential Revision: https://reviews.llvm.org/D82927
2020-07-23 21:02:04 +00:00
Nikita Popov 183342c0a9 [SCCP] Add another switch+phi test (NFC) 2020-07-23 21:51:09 +02:00
Nikita Popov 9394c3ec88 [SCCP] Directly remove non-feasible edges
Non-feasible control-flow edges are currently removed by replacing
the branch condition with a constant and then calling
ConstantFoldTerminator. This happens in a rather roundabout manner,
by inspecting the users (effectively: predecessors) of unreachable
blocks, and further complicated by the need to explicitly materialize
the condition for "forced" edges. I would like to extend SCCP to
discard switch conditions that are non-feasible based on range
information, but this is incompatible with the current approach
(as there is no single constant we could use.)

Instead, this patch explicitly removes non-feasible edges. It
currently only needs to handle the case where there is a single
feasible edge. The llvm_unreachable() branch will need to be
implemented for the aforementioned switch improvement.

Differential Revision: https://reviews.llvm.org/D84264
2020-07-23 20:32:57 +02:00
Nikita Popov def48b0e88 [PredicateInfo][SCCP] Remove assertion (PR46814)
As long as RenamedOp is not guaranteed to be accurate, we cannot
assert here and should just return false. This was already done
for the other conditions in this function.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46814.
2020-07-23 19:36:51 +02:00
Florian Hahn 0f80d598b0 [IPSCCP] Add test case for PR46717 for argmemonly handling. 2020-07-23 17:25:26 +01:00
Sanjay Patel cfe40acd16 [VectorCombine] add tests for load vectorization; NFC 2020-07-23 11:24:04 -04:00
Florian Hahn 82e35197e6 [LSR] Re-generate check lines for test.
The test is quite frafile, as the check lines match IR numbers and it is
not obvious why only a very small subset is checked.

Re-generate check lines, so further changes are more obvious.
2020-07-23 13:53:53 +01:00
Florian Hahn 09c96a31ef [LoopIdiom] Add additional test cases. 2020-07-23 13:53:26 +01:00
Hamilton Tobon Mosquera 6f0d99d2b9 [OpenMPOpt] Regression test for hiding latency of H2D mem transfers 2020-07-22 20:02:54 -05:00
Fangrui Song 27650ec554 Revert D81682 "[PGO] Extend the value profile buckets for mem op sizes."
This reverts commit 4a539faf74.

There is a __llvm_profile_instrument_range related crash in PGO-instrumented clang:

```
(gdb) bt
llvm::ConstantRange const&, llvm::APInt const&, unsigned int, bool) ()
llvm::ScalarEvolution::getRangeForAffineAR(llvm::SCEV const*, llvm::SCEV
const*, llvm::SCEV const*, unsigned int) ()
```

(The body of __llvm_profile_instrument_range is inlined, so we can only find__llvm_profile_instrument_target in the trace)

```
 23│    0x000055555dba0961 <+65>:    nopw   %cs:0x0(%rax,%rax,1)
 24│    0x000055555dba096b <+75>:    nopl   0x0(%rax,%rax,1)
 25│    0x000055555dba0970 <+80>:    mov    %rsi,%rbx
 26│    0x000055555dba0973 <+83>:    mov    0x8(%rsi),%rsi  # %rsi=-1 -> SIGSEGV
 27│    0x000055555dba0977 <+87>:    cmp    %r15,(%rbx)
 28│    0x000055555dba097a <+90>:    je     0x55555dba0a76 <__llvm_profile_instrument_target+342>
```
2020-07-22 16:08:25 -07:00
Rong Xu 50da55a585 [PGO] Supporting code for always instrumenting entry block
This patch includes the supporting code that enables always
instrumenting the function entry block by default.

This patch will NOT the default behavior.

It adds a variant bit in the profile version, adds new directives in
text profile format, and changes llvm-profdata tool accordingly.

This patch is a split of D83024 (https://reviews.llvm.org/D83024)
Many test changes from D83024 are also included.

Differential Revision: https://reviews.llvm.org/D84261
2020-07-22 15:01:53 -07:00
Nikita Popov e20b3079c1 [SCCP] Add additional multi-edge + phi tests (NFC) 2020-07-22 22:10:23 +02:00
Nikita Popov 33f6542014 [SCCP] Regenerate test checks (NFC)
And adjust the indbrtest4 test to actually test what it's supposed
to. BB1 is supposed to be eliminated here, but isn't, because
BB0 still branches to it. This was lost due to the incomplete CHECK
lines.
2020-07-22 22:10:23 +02:00
Nikita Popov eae6bb3807 [SCCP] Add multi-edge switch + phi test case (NFC) 2020-07-22 20:28:22 +02:00
Anton Afanasyev 56c92bf4b7 [SLP][Test] Precommit tests for D83779. NFC. 2020-07-22 18:25:45 +03:00
Sebastian Neubauer 2a6c871596 [InstCombine] Move target-specific inst combining
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.

D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).

This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic

A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.

This allows to move about 3000 lines out from InstCombine to the targets.

Differential Revision: https://reviews.llvm.org/D81728
2020-07-22 15:59:49 +02:00
Alexey Bataev be37f13e2d [SLP]Add an extra test for vectorization of non-pow-2 trees, NFC. 2020-07-22 09:13:30 -04:00
Sebastian Neubauer 2c659082bd [AMDGPU] Don't combine memory intrs to v3i16
v3i16 and v3f16 currently cannot be legalized and lowered so they should
not be emitted by inst combining.

Moved the check down to still allow extracting 1 or 2 elements via the dmask.

Fixes image intrinsics being combined to return v3x16.

Differential Revision: https://reviews.llvm.org/D84223
2020-07-22 12:44:01 +02:00
Max Kazantsev 360ab70712 [SimplifyCFG] Do not create unneeded PR Phi in block with convergent calls
We do not thread blocks with convergent calls, but this check was missing
when we decide to insert PR Phis into it (which we only do for threading).

Differential Revision: https://reviews.llvm.org/D83936
Reviewed By: nikic
2020-07-22 13:53:50 +07:00
Chen Zheng e8425b27fe [PowerPC] add store (load float*) pattern to isProfitableToHoist
store (load float*) can be optimized to store(load i32*) in InstCombine pass.

Add store (load float*) to isProfitableToHoist to make sure we don't break
the opt in InstCombine pass.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D82341
2020-07-21 20:55:13 -04:00
Juneyoung Lee ace0bf7490 [ValueTracking] Fix incorrect handling of canCreateUndefOrPoison
.. in isGuaranteedNotToBeUndefOrPoison.

This caused early exit of isGuaranteedNotToBeUndefOrPoison, making it return
imprecise result.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84251
2020-07-22 09:31:16 +09:00
Nikita Popov ef868a848e [SCCP] Add switch+range tests (NFC) 2020-07-21 23:07:50 +02:00
Fangrui Song 8a268bec1b Revert D82927 "[Loop Fusion] Integrate Loop Peeling into Loop Fusion"
This reverts commit bb8850d34d.

It broke 3 check-llvm-transforms-loopfusion tests in an ASAN build.

LoopFuse.cpp `for (BasicBlock *Pred : predecessors(BB)) {` may operate on a deleted BB.
2020-07-21 12:24:50 -07:00
Hiroshi Yamauchi 7bedae7dee [PGO][PGSO] Add profile guided size optimization to loop vectorization legality. 2020-07-21 11:16:36 -07:00
Sidharth Baveja bb8850d34d [Loop Fusion] Integrate Loop Peeling into Loop Fusion
Summary:
This patch adds the ability to peel off iterations of the first loop in loop
fusion. This can allow for both loops to have the same trip count, making it
legal for them to be fused together.

Here is a simple scenario peeling can be used in loop fusion:

for (i = 0; i < 10; ++i)
  a[i] = a[i] + 3;
for (j = 1; j < 10; ++j)
  b[j] = b[j] + 5;

Here is we can make use of peeling, and then fuse the two loops together. We can
peel off the 0th iteration of the loop i, and then combine loop i and j for
i = 1 to 10.

a[0] = a[0] +3;
for (i = 1; i < 10; ++i) {
  a[i] = a[i] + 3;
  b[i] = b[i] + 5;
}

Currently peeling with loop fusion is only supported for loops with constant
trip counts and a single exit point. Both unguarded and guarded loops are
supported.

Author: sidbav (Sidharth Baveja)

Reviewers: kbarton, Meinersbur, bkramer, Whitney, skatkov, ashlykov, fhahn, bmahjour

Reviewed By: bmahjour

Subscribers: bmahjour, mgorny, hiraditya, zzheng

Tags: LLVM

Differential Revision: https://reviews.llvm.org/D82927
2020-07-21 15:59:14 +00:00
Jay Foad 5e5bda74b6 [IR] Simplify Use::swap. NFCI.
The new implementation makes it clear that there are exactly two
conditional stores (after the initial no-op optimization). By contrast
the old implementation had seven conditionals, some hidden inside other
functions.

This commit can change the order of operands in operand lists, hence the
tweak to one test case.

Differential Revision: https://reviews.llvm.org/D80116
2020-07-21 12:15:12 +01:00
Florian Hahn 752fea7c27 [SCCP] Add range metadata to call sites with known return ranges.
If we inferred a range for the function return value, we can add !range
at all call-sites of the function, if the range does not include undef.

Reviewers: efriedma, davide, nikic

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D83952
2020-07-21 10:06:54 +01:00
Sanjay Patel 750f4c591d [InstCombine] allow peeking through zext of shift amount to match rotate idioms (PR45701)
We might want to also allow trunc of the shift amount, but that seems less likely?

  define i32 @src(i32 %x, i1 %y) {
  %0:
    %rem = and i1 %y, 1
    %cmp = icmp eq i1 %rem, 0
    %sh_prom = zext i1 %rem to i32
    %sub = sub nsw nuw i1 0, %rem
    %sh_prom1 = zext i1 %sub to i32
    %shr = lshr i32 %x, %sh_prom1
    %shl = shl i32 %x, %sh_prom
    %or = or i32 %shl, %shr
    %r = select i1 %cmp, i32 %x, i32 %or
    ret i32 %r
  }
  =>
  define i32 @tgt(i32 %x, i1 %y) {
  %0:
    %t = zext i1 %y to i32
    %r = fshl i32 %x, i32 %x, i32 %t
    ret i32 %r
  }

  Transformation seems to be correct!

https://alive2.llvm.org/ce/z/xgMvE3

http://bugs.llvm.org/PR45701
2020-07-20 16:18:11 -04:00
Sanjay Patel 92ec0c5da6 [InstCombine] add tests for funnel shift/rotate with narrow shift amount; NFC 2020-07-20 16:18:11 -04:00
Florian Hahn f13a59bcff [Matrix] Use TileInfo to create tiled loop nest for matrix multiply.
This patch uses the TileInfo introduced in D77550 to generate a loop
nest for tiled matrix multiplication, instead of generating the
unrolled code for the whole multiplication. This makes code-generation
more scalable for larger matrixes.

Initially loops are only used if both the number of rows and columns are
divisible by the tile size. Other cases will be added as follow-up.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D81308
2020-07-20 21:11:53 +01:00
Mircea Trofin 70f8d0ac8a [llvm] Development-mode InlineAdvisor
Summary:
This is the InlineAdvisor used in 'development' mode. It enables two
scenarios:

 - loading models via a command-line parameter, thus allowing for rapid
 training iteration, where models can be used for the next exploration
 phase without requiring recompiling the compiler. This trades off some
 compilation speed for the added flexibility.

 - collecting training logs, in the form of tensorflow.SequenceExample
 protobufs. We generate these as textual protobufs, which simplifies
 generation and testing. The protobufs may then be readily consumed by a
 tensorflow-based training algorithm.

To speed up training, training logs may also be collected from the
'default' training policy. In that case, this InlineAdvisor does not
use a model.

RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140763.html

Reviewers: jdoerfert, davidxl

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83733
2020-07-20 11:01:56 -07:00
Matt Arsenault 5e999cbe8d IR: Define byref parameter attribute
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.

This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.

My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.

This is intended avoid some optimization issues with the current
handling of aggregate arguments, as well as fixes inflexibilty in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.

Background:

There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.

It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These argumets are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.

The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.

This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's real no advantage to an aligment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.

Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
betewen arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.

The immediate plan is to start using this new attribute to handle all
aggregate argumets for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.

Additional context is in D79744.
2020-07-20 10:23:09 -04:00
Florian Hahn dc1087d408 [Matrix] Add minimal lowering pass that only requires TTI.
This patch adds a new variant of the matrix lowering pass that only does
a minimal lowering and only depends on TTI. The main purpose of this pass
is to have a pass with minimal dependencies to run as part of the backend
pipeline.

At the moment, the only difference to the regular lowering pass is that it
does not support remarks. But in subsequent patches add support for tiling
to the lowering pass which will require more analysis, which we do not want
to run in the backend, as the lowering should happen in the middle-end in
practice and running it in the backend is mostly for convenience when
running llc.

Reviewers: anemet, Gerolf, efriedma, hfinkel

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D76867
2020-07-20 11:16:11 +01:00
Roman Lebedev 04b729d076
[NFCI][SimplifyCFG] Guard common code hoisting with a (default-on) flag
Common code sinking is already guarded with a (with default-off!) flag,
so add a flag for hoisting, too.

D84108 will hopefully make hoisting off-by-default too.
2020-07-20 10:29:57 +03:00
Roman Lebedev 3de4166325
[NFC][SimplifyCFG] Add standalone test for common code hoisting xform option
Also, move one test into it's correct place
2020-07-20 10:29:29 +03:00
sstefan1 e3d646c699 [Attributor][NFC] applying update_test_checks with --check-attributes
Summary:
All tests are updated, except wrapper.ll since it is not working nicely
with newly created functions.

Reviewers: jdoerfert, uenoku, baziotis, homerdin

Subscribers: arphaman, jfb, kuter, bbn, okura, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84130
2020-07-20 08:17:34 +02:00
Juneyoung Lee 30201d3b61 [ValueTracking] Let isGuaranteedNotToBeUndefOrPoison use canCreateUndefOrPoison
This patch adds support more operations.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D83926
2020-07-20 09:21:39 +09:00
Wenlei He d41d952be9 Revert "[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks"
This reverts commit 2d6ecfa168.
2020-07-19 08:49:04 -07:00
Wenlei He 2d6ecfa168 [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
Summary:
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.

A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.

Subscribers: mgorny, aprantl, hiraditya, llvm-commits

Tags: #llvm

Resubmit for https://reviews.llvm.org/D84086
2020-07-19 08:21:05 -07:00
Nikita Popov c6e13667e7 [PredicateInfo] Add a method to interpret predicate as cmp constraint
Both users of predicteinfo (NewGVN and SCCP) are interested in
getting a cmp constraint on the predicated value. They currently
implement separate logic for this. This patch adds a common method
for this in PredicateBase.

This enables a missing bit of PredicateInfo handling in SCCP: Now
the predicate on the condition itself is also used. For switches
it means we know that the switched-on value is the same as the case
value. For assumes/branches we know that the condition is true or
false.

Differential Revision: https://reviews.llvm.org/D83640
2020-07-19 15:34:32 +02:00
Sanjay Patel 7393d7574c [InstSimplify] fold fcmp with infinity constant using isKnownNeverInfinity
This is a step towards trying to remove unnecessary FP compares
with infinity when compiling with -ffinite-math-only or similar.
I'm intentionally not checking FMF on the fcmp itself because
I'm assuming that will go away eventually.
The analysis part of this was added with rGcd481136 for use with
isKnownNeverNaN. Similarly, that could be an enhancement here to
get predicates like 'one' and 'ueq'.

Differential Revision: https://reviews.llvm.org/D84035
2020-07-19 09:24:52 -04:00
Nikita Popov d12ec0f752 [InstCombine] Fix store merge worklist management (PR46680)
Fixes https://bugs.llvm.org/show_bug.cgi?id=46680.

Just like insertions through IRBuilder, InsertNewInstBefore()
should be using the deferred worklist mechanism, so that processing
of newly added instructions is prioritized.

There's one side-effect of the worklist order change which could be
classified as a regression. An add op gets pushed through a select
that at the time is not a umax. We could add a reverse transform
that tries to push adds in the reverse direction to restore a min/max,
but that seems like a sure way of getting infinite loops... Seems
like something that should best wait on min/max intrinsics.

Differential Revision: https://reviews.llvm.org/D84109
2020-07-19 15:05:45 +02:00
Nikita Popov 13ae440de4 [InstCombine] Add test for PR46680 (NFC) 2020-07-18 23:37:16 +02:00
Joseph Huber 3bbbe4c4b6 [OpenMP] Add Additional Function Attribute Information to OMPKinds.def
Summary:
This patch adds more function attribute information to the runtime function definitions in OMPKinds.def. The goal is to provide sufficient information about OpenMP runtime functions to perform more optimizations on OpenMP code.

Reviewers: jdoerfert

Subscribers: aaron.ballman cfe-commits yaxunl guansong sstefan1 llvm-commits

Tags: #OpenMP #clang #LLVM

Differential Revision: https://reviews.llvm.org/D81031
2020-07-18 12:55:50 -04:00
Roman Lebedev 8d487668d0
[CVP] Soften SDiv into a UDiv as long as we know domains of both of the operands.
Yes, if operands are non-positive this comes at the extra cost
of two extra negations. But  a. division is already just
ridiculously costly, two more subtractions can't hurt much :)
and  b. we have better/more analyzes/folds for an unsigned division,
we could end up narrowing it's bitwidth, converting it to lshr, etc.

This is essentially a take two on 0fdcca07ad,
which didn't fix the potential regression i was seeing,
because ValueTracking's computeKnownBits() doesn't make use
of dominating conditions in it's analysis.
While i could teach it that, this seems like the more general fix.

This big hammer actually does catch said potential regression.

Over vanilla test-suite + RawSpeed + darktable
(10M IR instrs, 1M IR BB, 1M X86 ASM instrs), this fires/converts 5 more
(+2%) SDiv's, the total instruction count at the end of middle-end pipeline
is only +6, so out of +10 extra negations, ~half are folded away,
and asm instr count is only +1, so practically speaking all extra
negations are folded away and are therefore free.
Sadly, all these new UDiv's remained, none folded away.
But there are two less basic blocks.

https://rise4fun.com/Alive/VS6

Name: v0
Pre: C0 >= 0 && C1 >= 0
%r = sdiv i8 C0, C1
  =>
%r = udiv i8 C0, C1

Name: v1
Pre: C0 <= 0 && C1 >= 0
%r = sdiv i8 C0, C1
  =>
%t0 = udiv i8 -C0, C1
%r = sub i8 0, %t0

Name: v2
Pre: C0 >= 0 && C1 <= 0
%r = sdiv i8 C0, C1
  =>
%t0 = udiv i8 C0, -C1
%r = sub i8 0, %t0

Name: v3
Pre: C0 <= 0 && C1 <= 0
%r = sdiv i8 C0, C1
  =>
%r = udiv i8 -C0, -C1
2020-07-18 17:59:56 +03:00
Roman Lebedev 7b16fd8a25
[NFC][CVP] Add tests for possible sdiv->udiv where operands are not non-negative
Currently that fold requires both operands to be non-negative,
but the only real requirement for the fold is that we must know
the domains of the operands.
2020-07-18 17:59:31 +03:00
David Green 2f4c3e8097 [LV] Add additional InLoop redution tests. NFC 2020-07-18 12:14:23 +01:00
Chen Zheng bb07eb944f [PowerPC]add testcase for adding store (load float*) pattern, nfc 2020-07-17 22:57:08 -04:00
Chen Zheng 6d247f980d [SCEV][IndVarSimplify] insert point should not be block front.
Recommit after removing the unused cast instructions.

Differential Revision:  https://reviews.llvm.org/D80975
2020-07-17 22:25:10 -04:00
Arthur Eubanks 0dfa4a83fa Revert "[PGO][PGSO] Add profile guided size optimization to loop vectorization legality."
This reverts commit 30c382a7c6.

See https://crbug.com/1106813.
2020-07-17 16:47:41 -07:00
Eric Christopher 020545d386 Temporarily Revert "[OpenMP] Add Additional Function Attribute Information to OMPKinds.def"
as it's causing a few unused variable warnings via the macro instantiation:

sources/llvm-project/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def:649:17: error: unused variable 'InaccessibleOnlyAttrs' [-Werror,-Wunused-variable]
__OMP_ATTRS_SET(InaccessibleOnlyAttrs,
                ^

This reverts commit 09fe0c5ab9.
2020-07-17 15:05:42 -07:00
Eric Christopher ae08dbc673 Temporarily Revert "[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks"
as it is failing the inline-replay.ll test as well as sanitizers/Werror
from returning a stack local variable.

This reverts commit 029946b112.
2020-07-17 14:58:01 -07:00
Joseph Huber 09fe0c5ab9 [OpenMP] Add Additional Function Attribute Information to OMPKinds.def
Summary:
This patch adds more function attribute information to the runtime function definitions in OMPKinds.def. The goal is to provide sufficient information about OpenMP runtime functions to perform more optimizations on OpenMP code.

Reviewers: jdoerfert

Subscribers: aaron.ballman cfe-commits yaxunl guansong sstefan1 llvm-commits

Tags: #OpenMP #clang #llvm

Differential Revision: https://reviews.llvm.org/D81031
2020-07-17 17:54:01 -04:00
Wenlei He 029946b112 [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
Summary:
This change added a new inline advisor that takes optimization remarks for previous inlining as input, and provide the decision as advice so current inlining can replay inline decision of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites. The change can be useful for Inliner tuning.
A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inliner advisor with SampleProfileLoader's inline decision for replay. The new inline advisor can also be used by regular CGSCC inliner later if needed.

Reviewers: davidxl, mtrofin, wmi, hoy

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83743
2020-07-17 13:30:47 -07:00
Xinan Jiang d8e0baf29d [InstCombine] Fix typo in comment.
Reviewers: fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D83951
2020-07-17 20:57:45 +01:00
Roman Lebedev 0fdcca07ad
[InstCombine] Fold X sdiv (-1 << C) -> -(X u>> Y) iff X is non-negative
This is the one i'm seeing as missed optimization,
although there are likely other possibilities, as usual.

There are 4 variants of a general sdiv->udiv fold:
https://rise4fun.com/Alive/VS6

Name: v0
Pre: C0 >= 0 && C1 >= 0
%r = sdiv i8 C0, C1
  =>
%r = udiv i8 C0, C1

Name: v1
Pre: C0 <= 0 && C1 >= 0
%r = sdiv i8 C0, C1
  =>
%t0 = udiv i8 -C0, C1
%r = sub i8 0, %t0

Name: v2
Pre: C0 >= 0 && C1 <= 0
%r = sdiv i8 C0, C1
  =>
%t0 = udiv i8 C0, -C1
%r = sub i8 0, %t0

Name: v3
Pre: C0 <= 0 && C1 <= 0
%r = sdiv i8 C0, C1
  =>
%r = udiv i8 -C0, -C1


If we really don't like sdiv (more than udiv that is),
and are okay with increasing instruction count (2 new negations),
and we ensure that we don't undo the fold,
then we could just implement these..
2020-07-17 22:50:09 +03:00
Roman Lebedev 66b66988e6
[NFC][InstCombine] Add some tests with sdiv-by-negative-power-of-two 2020-07-17 22:50:09 +03:00
George Rokos 04713f8aa6 Added missing API call to OpenMP test 2020-07-17 10:40:11 -07:00
Sanjay Patel acbc688263 [InstSimplify] add tests for fcmp with infinity; NFC 2020-07-17 11:51:41 -04:00
Sjoerd Meijer 7ebc6bed84 [ARM][MVE] Reorg of the LV tail-folding tests
It was getting difficult to see which test was in which file, so this
reorganises the test files so that now all filenames start with tail-folding-*
followed by a more descriptive name what that group of tests check.
2020-07-17 15:54:15 +01:00
Sidharth Baveja 11e879d4f1 [Loop Simplify] Resolve an issue where metadata is not applied to a loop latch.
Summary:
This patch resolves an issue where the metadata of a loop is not added to the
new loop latch, and not removed from the old loop latch. This issue occurs in
the SplitBlockPredecessors function, which  adds a new block in a loop, and
in the case that the block passed into this function is the header of the loop,
the loop can be modified such that the latch of the loop is replaced.
This patch applies to the Loop Simplify pass since it ensures that each loop
has exit blocks which only have predecessors that are inside of the loop. In
the case that this is not true, the pass will create a new exit block for the
loop. This guarantees that the loop preheader/header will dominate the exit blocks.

Author: sidbav (Sidharth Baveja)

Reviewers: asbirlea (Alina Sbirlea), chandlerc (Chandler Carruth), Whitney (Whitney Tsang), bmahjour (Bardia Mahjour)

Reviewed By:  asbirlea (Alina Sbirlea)

Subscribers: hiraditya (Aditya Kumar), llvm-commits

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D83869
2020-07-17 14:02:14 +00:00
Sam Parker ed48e6fa65 [NFC][ARM] Add SimplifyCFG test 2020-07-17 14:07:40 +01:00
Anna Welker 23c9534515 [LV] Enable the LoopVectorizer to create pointer inductions
This patch enables the LoopVectorizer to build a phi of pointer
type and provide the vector loads and stores with vector type
getelementptrs built from the pointer induction variable, which
produces much less instructions than the previous approach of
creating scalar getelementpointers and glue them together to a
vector.

Differential Revision: https://reviews.llvm.org/D81267
2020-07-17 13:35:07 +01:00
Max Kazantsev df6e185e8f [InstCombine][Test] Test for fix of replacing select with Phis when branch has the same labels
An additional test that allows to check the correctness of handling the case of the same
branch labels in the dominator when trying to replace select with phi-node.

Patch By: Kirill Polushin
Differential Revision: https://reviews.llvm.org/D84006
Reviewed By: mkazantsev
2020-07-17 17:16:28 +07:00
Juneyoung Lee 582901d0b5 [ValueTracking] Let isGuaranteedNotToBeUndefOrPoison consider noundef
This patch adds support for noundef arguments.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D83752
2020-07-17 12:53:08 +09:00
Juneyoung Lee cd4953246b Add a test for D83752 2020-07-17 12:50:40 +09:00
Jon Roelofs a0537fc35f [SimplifyCFG] Fix crash in the EXPENSIVE_CHECKS build
SimplifyCFG was incorrectly reporting to the pass manager that it had not made
changes after folding away a PHI.  This is detected in the EXPENSIVE_CHECKS
build when the function's hash changes.

Differential Revision: https://reviews.llvm.org/D83985
2020-07-16 15:34:41 -06:00
Roman Lebedev b636e7d1fc
[NFC][PhaseOrdering] Add a test demonstrating pitfails of common code hoisting on loop rotation
Depending on the -rotation-max-header-size=?,
hoisting common code early makes loop rotation impossible.
2020-07-16 23:53:26 +03:00
Mircea Trofin 9870f77441 [llvm] Moved InlineSizeEstimatorAnalysis test to .ll
Summary:
Following guidance in
https://llvm.org/docs/TestingGuide.html#testing-analysis

Reviewers: mehdi_amini

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83918
2020-07-16 12:25:16 -07:00
Eric Christopher 7bfaa40086 Temporarily Revert "[AssumeBundles] Use operand bundles to encode alignment assumptions"
due to the performance bugs filed in https://bugs.llvm.org/show_bug.cgi?id=46753.

An SROA change soon may obviate some of these problems.

This reverts commit 8d09f20798.
2020-07-16 11:54:04 -07:00
Matt Arsenault 0347039a6e ValueTracking: Fix isKnownNonZero for non-0 null pointers for byval
The IR doesn't have a proper concept of invalid pointers, and "null"
constants are just all zeros (though it really needs one).

I think it's not possible to break this for AMDGPU due to the copy
semantics of byval. If you have an original stack object at 0, the
byval copy will be placed above it so I don't think it's really
possible to hit a 0 address.
2020-07-16 13:50:49 -04:00
Florian Hahn 037c812191 [SCCP] Add test cases for adding !range to call-sites. 2020-07-16 15:34:58 +01:00
Max Kazantsev 989ee11df6 [Test] Add test that shows how SimplifyCFG may insert redunant Phi
It happens when a block cannot be threaded because of a convergent function.
2020-07-16 16:23:11 +07:00
Max Kazantsev 90798e09e2 Re-enable "[InstCombine] Simplify boolean Phis with const inputs using CFG"
This reverts commit b893822e32.

+ Clang test fixes
+ Insertion point fix for landing pads
2020-07-16 16:09:08 +07:00
Max Kazantsev b893822e32 Revert "[InstCombine] Simplify boolean Phis with const inputs using CFG"
This reverts commit 00472067c3.

Need to fix failing clang tests.
2020-07-16 12:58:39 +07:00
Max Kazantsev 00472067c3 [InstCombine] Simplify boolean Phis with const inputs using CFG
This patch adds simplification for pattern:
```
  if (cond)
  /       \
 ...      ...
  \       /
p = phi [true] [false]
...
br p, succ_1, succ_2
```
If we can prove that top block's branches dominate respective
inputs of a block that has a Phi with constant inputs, we can
use the branch condition (maybe inverted) instead of Phi.
This will make proofs of implication for further jump threading
more transparent.

Differential Revision: https://reviews.llvm.org/D81375
Reviewed By: xbolva00
2020-07-16 12:06:10 +07:00
Craig Topper 00f3579aea Revert "[InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms" and subsequent patches
This reverts most of the following patches due to reports of miscompiles.
I've left the added test cases with comments updated to be FIXMEs.

1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
2020-07-15 22:02:33 -07:00
George Rokos 911fcf382f Fix lit test related to declare mapper patch D67833. 2020-07-15 20:31:36 -07:00
Hongtao Yu f3731d34fa [LoopUnroll] Update branch weight for remainder loop
Unrolling a loop with compile-time unknown trip count results in a remainder loop. The remainder loop executes the remaining iterations of the original loop when the original trip count is not a multiple of the unroll factor. For better profile counts maintenance throughout the optimization pipeline, I'm assigning an artificial weight to the latch branch of the remainder loop.

A remainder loop runs up to as many times as the unroll factor subtracted by 1. Therefore I'm assigning the maximum possible trip count as the back edge weight. This should be more accurate than the default non-profile weight, which assumes the back edge runs much more frequently than the exit edge.

Differential Revision: https://reviews.llvm.org/D83187
2020-07-15 12:33:29 -07:00
Hiroshi Yamauchi 30c382a7c6 [PGO][PGSO] Add profile guided size optimization to loop vectorization legality.
Differential Revision: https://reviews.llvm.org/D83329
2020-07-15 11:49:36 -07:00
Sanjay Patel d8b268680d [InstCombine] prevent infinite looping in or-icmp fold (PR46712)
I'm not sure if the test is truly minimal, but we need to
induce a situation where a value becomes a constant but is
not immediately folded before getting to the 'or' transform.
2020-07-15 14:12:12 -04:00
Sanjay Patel efc30e591b [InstCombine] update datalayout in test file; NFC
We need to specify legal integer widths to trigger PR46712,
so add those here. This doesn't appear to affect any existing
tests, and it's not clear why a datalayout would not include
any legal integer widths.

While here, change some variable names that include 'tmp' to
avoid warnings from the auto-generating script for CHECK lines.
2020-07-15 14:12:12 -04:00
Hiroshi Yamauchi 4a539faf74 [PGO] Extend the value profile buckets for mem op sizes.
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).

Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.
2020-07-15 10:26:15 -07:00
Arthur Eubanks f413b53a67 [NPM][IVUsers] Rename ivusers -> iv-users
LPM passes were named iv-users, which seems nicer than ivusers.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D83803
2020-07-15 09:38:21 -07:00
David Green 24cd66d219 [HardwareLoops] Add sibling loop test.
This missed being part of 9e03547cab.
2020-07-15 16:36:17 +01:00
Tim Northover 37b96d51d0 CodeGenPrep: remove AssertingVH references before deleting dead instructions.
CodeGenPrepare keeps fairly close track of various instructions it's
seen, particularly GEPs, in maps and vectors. However, sometimes those
instructions become dead and get removed while it's still executing.
This triggers AssertingVH references to them in an asserts build and
could lead to miscompiles in a release build (I've only seen a later
segfault though).

So this patch adds a callback to
RecursivelyDeleteTriviallyDeadInstructions which can make sure the
instruction about to be deleted is removed from CodeGenPrepare's data
structures.
2020-07-15 15:19:21 +01:00
John Brawn 20854d85e1 [DSE,MSSA] Recognise init_trampoline in getLocForWriteEx
This fixes an instance where MemorySSA-using Dead Store Elimination is failing
to do a transformation that the non-MemorySSA-using version does.

Differential Revision: https://reviews.llvm.org/D83783
2020-07-15 12:18:58 +01:00
Tim Northover 5165b2b5fd AArch64+ARM: make LLVM consider system registers volatile.
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.

This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
2020-07-15 09:47:36 +01:00
Giorgis Georgakoudis 7f680be593 [OpenMPOpt][NFC] Update checks for parallel_deletion test 2020-07-14 23:36:33 -07:00
Luofan Chen 6db99d18b6 Revert "[Attributor] Track AA dependency using dependency graph"
This reverts commit 8df7af560a.
2020-07-15 11:48:08 +08:00
Johannes Doerfert 64d99a1d04 [CallGraph] Update callback call sites in RefreshCallGraph
Since D82572, we keep "reference" edges for callback call sites. While
not strictly necessary they can improve the traversal order. However, we
did not update them properly in case a pass removed the callback call
site which caused a verification error (PR46687). With this patch we
update these reference edges properly during the invocation of
`CallGraphSCCPass::RefreshCallGraph` in non-checking mode.

Reviewed By: sdmitriev

Differential Revision: https://reviews.llvm.org/D83718
2020-07-14 22:33:57 -05:00
Luofan Chen 8df7af560a [Attributor] Track AA dependency using dependency graph
Summary: This patch added dependency graph to the attributor so that we can dump the dependencies between AAs more easily. We can also apply general graph algorithms to the graph, making it easier for us to create deep wrappers.

Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis

Reviewed By: jdoerfert

Subscribers: jfb, okura, mgrang, kuter, lebedev.ri, hiraditya, uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78861
2020-07-15 10:40:21 +08:00
Luofan Chen e21323a1e9 Revert "[Attributor] [WIP] Track AA dependency using dependency graph"
This reverts commit 6b78ed6070.
2020-07-15 10:33:55 +08:00
Luofan Chen 6b78ed6070 [Attributor] [WIP] Track AA dependency using dependency graph
Summary: This patch added dependency graph to the attributor so that we can dump the dependencies between AAs more easily. We can also apply general graph algorithms to the graph, making it easier for us to create deep wrappers.

Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis

Reviewed By: jdoerfert

Subscribers: jfb, okura, mgrang, kuter, lebedev.ri, hiraditya, uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78861
2020-07-15 10:21:49 +08:00
Christopher Tetreault 9c87c55805 [SVE] Make cstfp_pred_ty and cst_pred_ty work with scalable splats
Reviewers: efriedma, lebedev.ri, fhahn, c-rhodes, david-arm

Reviewed By: efriedma, david-arm

Subscribers: tschuett, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83001
2020-07-14 14:20:39 -07:00
Tyker 0257ba581c Fix tests after 16f777f421 2020-07-14 22:52:26 +02:00
Tyker 16f777f421 [NFC] Add debug and stat counters to assume queries and assume builder
Summary:
Add debug counter and stats counter to assume queries and assume builder
here is the collected stats on a build of check-llvm + check-clang.
  "assume-builder.NumAssumeBuilt": 2720879,
  "assume-builder.NumAssumesMerged": 761396,
  "assume-builder.NumAssumesRemoved": 1576212,
  "assume-builder.NumBundlesInAssumes": 6518809,
  "assume-queries.NumAssumeQueries": 85566380,
  "assume-queries.NumUsefullAssumeQueries": 2727360,
the NumUsefullAssumeQueries stat is actually pessimistic because in a few places queries
ask to keep providing information to try to get better information. and this isn't counted
as a usefull query evem tho it can be usefull

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83506
2020-07-14 21:49:14 +02:00
Teresa Johnson 6014c46c80 Restore "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP"
This restores commit 80d0a137a5, and the
follow on fix in 873c0d0786, with a new
fix for test failures after a 2-stage clang bootstrap, and a more robust
fix for the Chromium build failure that an earlier version partially
fixed. See also discussion on D75201.

Reviewers: evgeny777

Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73242
2020-07-14 12:16:57 -07:00
Sjoerd Meijer 2b3c505d0f [Matrix] Intrinsic descriptions
This changes the matrix load/store intrinsic definitions to load/store from/to
a pointer, and not from/to a pointer to a vector, as discussed in D83477.

This also includes the recommit of "[Matrix] Tighten LangRef definitions and
Verifier checks" which adds improved language reference descriptions of the
matrix intrinsics and verifier checks.

Differential Revision: https://reviews.llvm.org/D83785
2020-07-14 19:58:16 +01:00
Sanjay Patel e6c016420c [ValueTracking] fix library to intrinsic mapping to respect 'nobuiltin' attribute
This is another problem raised in:
http://bugs.llvm.org/PR46627
2020-07-14 10:04:24 -04:00
Sanjay Patel 9300de4d1c [InstSimplify] add test with nobuiltin attribute (PR46627); NFC 2020-07-14 10:04:24 -04:00
serge-sans-paille 1cd1c1d62e Revert "[SCEV][IndVarSimplify] insert point should not be block front."
This reverts commit f1efb8bb4b.

Reverted because it doesn't correctly update the pass return status, see

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/9441/steps/test-check-all/logs/FAIL%3A%20LLVM%3A%3Awiden-i32-i8ptr.ll
2020-07-14 14:24:26 +02:00
Sanjay Patel 34d35d4a42 [ValueTracking] fix miscompile in maxnum case of cannotBeOrderedLessThanZeroImpl (PR46627)
A miscompile with -0.0 is shown in:
http://bugs.llvm.org/PR46627

This is because maxnum(-0.0, +0.0) does not specify a fixed result:
http://llvm.org/docs/LangRef.html#llvm-maxnum-intrinsic

So we need to tighten the constraints for when it is ok to say the
result of maxnum is positive (including +0.0).

Differential Revision: https://reviews.llvm.org/D83601
2020-07-14 08:08:09 -04:00
Sanjay Patel 9cc669d22d [InstCombine][InstSimplify] add tests for sign of maxnum; NFC
More coverage for D83601.
2020-07-14 08:08:09 -04:00
Sam Parker a5405a2f05 [NFC][ARM] Add SimplifyCFG tests 2020-07-14 11:10:11 +01:00
Sjoerd Meijer 959eaa50d6 [ARM][MVE] Only tail-fold integer add reductions
If a vector body has live-out values, it is probably a reduction, which needs a
final reduction step after the loop. MVE has a VADDV instruction to reduce
integer vectors, but doesn't have an equivalent one for float vectors. A
live-out value that is not recognised as reduction later in the optimisation
pipeline will result in the tail-predicated loop to be reverted to a
non-predicated loop and this is very expensive, i.e. it has a significant
performance impact, which is what we hope to avoid with fine tuning the ARM TTI
hook preferPredicateOverEpilogue implementation.

Differential Revision: https://reviews.llvm.org/D82953
2020-07-14 10:15:07 +01:00
Jameson Nash 2c7a07b59d [GVN] teach ConstantFolding correct handling of non-integral addrspace casts
Here we teach the ConstantFolding analysis pass that it is not legal to
replace a load of a bitcast constant (having a non-integral addrspace)
with a bitcast of the value of that constant (with a different
non-integral addrspace).

But also teach it that certain bit patterns are always known and
convertable (a fact it already uses elsewhere). This required us to also
fix a globalopt test, since, after this change, LLVM is able to realize
that the test actually is a valid transform (NULL is always a known
bit-pattern) and so it doesn't need to emit the failure remarks for it.

Also simplify some of the negative tests for transforms by avoiding a
type change in their bitcast, and add positive versions of the same
tests, to show that they otherwise should work.

Differential Revision: https://reviews.llvm.org/D59730
2020-07-13 21:44:17 -04:00
Jameson Nash 19f01a4847 [GVN] add early exit to ConstantFoldLoadThroughBitcast [NFC]
And adds some additional test coverage to ensure later commits don't
introduce regressions.

Differential Revision: https://reviews.llvm.org/D59730
2020-07-13 21:44:17 -04:00
Mircea Trofin 6b109f2f05 [llvm][NFC] Removed unused CHECKs in a ml test
The CHECKs are now in Inputs/test-module.ll
2020-07-13 16:59:14 -07:00
Mircea Trofin 73f02a61df [llvm][NFC] ML InlineAdvisor: Factored CHECKs in common test
The CHECKs are going to be shared with the development mode test
2020-07-13 16:31:07 -07:00
Tyker 8d09f20798 [AssumeBundles] Use operand bundles to encode alignment assumptions
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.

Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71739
2020-07-14 01:05:58 +02:00
Gui Andrade bfa3b627c6 [InstCombine] Erase attribute lists for simplified libcalls
Currently, a transformation like pow(2.0, x) -> exp2(x) copies the pow
attribute list verbatim and applies it to exp2. This works out fine
when the attribute list is empty, but when it isn't clang may error due
due to the mismatch.

The source function and destination don't necessarily have anything
to do with one another, attribute-wise. So it makes sense to remove
the attribute lists (this is similar to what IPO does in this
situation).

This was discovered after implementing the `noundef` param attribute.

Differential Revision: https://reviews.llvm.org/D82820
2020-07-13 22:32:33 +00:00
Vedant Kumar 3d52b1e81b Revert "[InstCombine] Drop debug loc in TryToSinkInstruction (reland)"
This reverts commit 9649c2095f. See
discussion on the llvm-commits thread: if it's OK to preserve the
location when sinking a call, it's probably OK to always preserve the
location.
2020-07-13 15:17:07 -07:00
Nikita Popov 353fa4403a [PredicateInfo] Place predicate info after assume
Place the ssa.copy instructions for assumes after the assume,
instead of before it. Both options are valid, but placing them
afterwards prevents assumes from being replaced with assume(true).
This fixes https://bugs.llvm.org/show_bug.cgi?id=37541 in NewGVN
and will avoid a similar issue in SCCP when we handle more
predicate infos.

Differential Revision: https://reviews.llvm.org/D83631
2020-07-13 21:10:11 +02:00
Nikita Popov 4b626dd949 [NewGVN] Separate passing assume tests (NFC)
Result might not be exactly the same as under GVN, but all the
desired transforms are made.
2020-07-13 21:07:03 +02:00
Nikita Popov 14f738b350 [NewGVN] Rename xfail tests (NFC)
Add an -xfail.ll suffix to tests marked XFAIL, so these files can
be split into passing and failing parts.
2020-07-13 21:07:03 +02:00
Sanne Wouda 7b84045565 [SLPVectorizer] handle vectorizeable library functions
Teaches the SLPVectorizer to use vectorized library functions for
non-intrinsic calls.

This already worked for intrinsics that have vectorized library
functions, thanks to D75878, but schedules with library functions with a
vector variant were being rejected early.

-   assume that there are no load/store dependencies between lib
    functions with a vector variant; this would otherwise prevent the
    bundle from becoming "ready"

-   check during legalization that the vector variant can be used

-   fix-up where we previously assumed that a call would be an intrinsic

Differential Revision: https://reviews.llvm.org/D82550
2020-07-13 15:28:46 +01:00
Sanne Wouda e909f6bc48 Pre-commit tests
Prepare to land D82550
2020-07-13 15:28:46 +01:00
Sjoerd Meijer 595270ae39 [ARM][MVE] Refactor option -disable-mve-tail-predication
This refactors option -disable-mve-tail-predication to take different arguments
so that we have 1 option to control tail-predication rather than several
different ones.

This is also a prep step for D82953, in which we want to reject reductions
unless that is requested with this option.

Differential Revision: https://reviews.llvm.org/D83133
2020-07-13 13:40:33 +01:00
Max Kazantsev e808cab824 [InstCombine] Improve select -> phi canonicalization: consider more blocks
We can try to replace select with a Phi not in its parent block alone,
but also in blocks of its arguments. We benefit from it when select's
argument is a Phi.

Differential Revision: https://reviews.llvm.org/D83284
Reviewed By: nikic
2020-07-13 11:40:32 +07:00
Shinji Okumura c73f425f84 [Attributor] Add AAValueSimplifyCallSiteArgument::manifest
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D82861
2020-07-13 07:01:50 +09:00
Alexey Lapshin 0a01fc96e2 Revert "[TRE] allow TRE for non-capturing calls."
This reverts commit f7907e9d22.

That commit caused error on multi-stage build.
2020-07-13 00:39:48 +03:00
Sanjay Patel 4458973347 [InstCombine] fold mul of zext/sext bools to 'and'
Similar to rG40fcc42:
The base case only worked because we were relying on a
poison-unsafe select transform; if that is fixed, we
would regress on patterns like this.

The extra use tests show that the select transform can't
be applied consistently. So it may be a regression to have
an extra instruction on 1 test, but that result was not
created safely and does not happen reliably.
2020-07-12 15:56:26 -04:00
Ayal Zaks 82a5157ff1 [LV] Fixing versioning-for-unit-stide of loops with small trip count
This patch fixes D81345 and PR46652.

If a loop with a small trip count is compiled w/o -Os/-Oz, Loop Access Analysis
still generates runtime checks for unit strides that will version the loop.

In such cases, the loop vectorizer should either re-run the analysis or bail-out
from vectorizing the loop, as done prior to D81345. The latter is applied for
now as the former requires refactoring.

Differential Revision: https://reviews.llvm.org/D83470
2020-07-12 19:51:47 +03:00
Nikita Popov d589372704 [SCCP] Extend nonnull metadata test (NFC) 2020-07-12 17:48:32 +02:00
Nikita Popov 6634aef71f [SCCP] Add test for predicate info condition handling (NFC) 2020-07-12 10:13:10 +02:00
kuter 4dbe82eef3 [Attributor] Introudce attribute seed allow list. 2020-07-12 02:25:33 +03:00
Nikita Popov 6792069a3f [NewGVN] Regenerate test checks (NFC) 2020-07-11 22:51:49 +02:00
Michael Liao 0b4cf802fa [fix-irreducible] Skip unreachable predecessors.
Summary:
- Skip unreachable predecessors during header detection in SCC. Those
  unreachable blocks would be generated in the switch lowering pass in
  the corner cases or other frontends. Even though they could be removed
  through the CFG simplification, we should skip them during header
  detection.

Reviewers: sameerds

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83562
2020-07-11 10:08:44 -04:00
Alexey Lapshin f7907e9d22 [TRE] allow TRE for non-capturing calls.
The current implementation of Tail Recursion Elimination has a very restricted
pre-requisite: AllCallsAreTailCalls. i.e. it requires that no function
call receives a pointer to local stack. Generally, function calls that
receive a pointer to local stack but do not capture it - should not
break TRE. This fix allows us to do TRE if it is proved that no pointer
to the local stack is escaped.

Reviewed by: efriedma

Differential Revision: https://reviews.llvm.org/D82085
2020-07-11 14:01:48 +03:00
Johannes Doerfert 5b0581aedc [OpenMP] Replace function pointer uses in GPU state machine
In non-SPMD mode we create a state machine like code to identify the
parallel region the GPU worker threads should execute next. The
identification uses the parallel region function pointer as that allows
it to work even if the kernel (=target region) and the parallel region
are in separate TUs. However, taking the address of a function comes
with various downsides. With this patch we will identify the most common
situation and replace the function pointer use with a dummy global
symbol (for identification purposes only). That means, if the parallel
region is only called from a single target region (or kernel), we do not
use the function pointer of the parallel region to identify it but a new
global symbol.

Fixes PR46450.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D83271
2020-07-11 01:44:00 -05:00
Johannes Doerfert e8039ad4de [OpenMP] Identify GPU kernels (aka. OpenMP target regions)
We now identify GPU kernels, that is entry points into the GPU code.
These kernels (can) correspond to OpenMP target regions. With this patch
we identify and on request print them via remarks.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D83269
2020-07-11 01:44:00 -05:00
sstefan1 b8235d2bd8 Reland "[OpenMPOpt] ICV Tracking"
This reverts commit 1d542f0ca8.

`recollectUses()` is added to prevent looking at dead uses after
Attributor run.

This is the first and most basic ICV Tracking implementation. For this
first version, we only support deduplication within the same BB.

Reviewers: jdoerfert, JonChesterfield, hamax97, jhuber6, uenoku,
baziotis, lebedev.ri

Differential Revision: https://reviews.llvm.org/D81788
2020-07-11 02:25:57 +02:00
Sanjay Patel 351f2b3c0a [InstSimplify] add tests for maxnum (PR46627); NFC 2020-07-10 20:20:38 -04:00