This patch updates InstCombine to use poison constant to represent the resulting value of (either semantically or syntactically) unreachable instrs, or a don't-care value of an unreachable store instruction.
This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll .
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104602
This change tries to handle multiple dominating users of the pointer operand
by choosing the most immediately dominating one, if possible. While making
this change I also found that the previous implementation had a missing break
statement, making all loads with an odd number of dominating users emit an
OtherAccess value, so that has also been fixed.
Patch by Henrik G Olsson!
Differential Revision: https://reviews.llvm.org/D79097
This patch allows PRE of the following type of loads:
```
preheader:
br label %loop
loop:
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
br label %merge
merge:
...
br i1 ..., label %loop, label %exit
```
Into
```
preheader:
%x0 = load %p
br label %loop
loop:
%x.pre = phi(x0, x2)
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
%x1 = load %p
br label %merge
merge:
x2 = phi(x.pre, x1)
...
br i1 ..., label %loop, label %exit
```
So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.
The worst overhead from it is, if we always take clobber block, we make 1 more load
overall (in preheader). It only matters if loop has very few iteration. If clobber block is not taken
at least once, the transform is neutral or profitable.
There are several improvements prospect open up:
- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
we don't know if their sum is colder than the header.
Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames
This fixes a "Cached first special instruction is wrong!" assert.
The assert fires because replacing a value with another can cause an
instruction to no longer be "special" to ICF. In this case,
devirtualization happened, turning an indirect call to a
call to a willreturn function which is no longer special.
Reviewed By: nikic, rnk
Differential Revision: https://reviews.llvm.org/D99977
performScalarPREInsertion() inserts instructions into blocks that we
need to tell ImplicitControlFlowTracking about, otherwise the ICF cache
may be invalid.
Fixes PR49193.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99909
Commit 22ce5eb051, changed checks in
GVN/big-endian.ll into CHECK-NOT. The intent was to check that a
succession of lines does not occur but each CHECK-NOT is checked
independently. In other word, one CHECK-NOT uses a variable defined
in a pattern (the one defining the variable) that should not occur in
the input. The bug was then copied over in NewGVN/big-endian.ll.
This commit only checks for the absence of i16 load which rules out the
presence of the whole sequence and does not involve an undefined
variable.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99581
Before this change, the `llvm.access.group` metadata was dropped
when moving a load instruction in GVN. This prevents vectorizing
a C/C++ loop with `#pragma clang loop vectorize(assume_safety)`.
This change propagates the metadata as well as other metadata if
it is safe (the move-destination basic block and source basic
block belong to the same loop).
Differential Revision: https://reviews.llvm.org/D93503
There seems to be an impedance mismatch between what the type
system considers an aggregate (structs and arrays) and what
constants consider an aggregate (structs, arrays and vectors).
Adjust the type check to consider vectors as well. The previous
version of the patch dropped the type check entirely, but it
turns out that getAggregateElement() does require the constant
to be an aggregate in some edge cases: For Poison/Undef the
getNumElements() API is called, without checking in advance that
we're dealing with an aggregate. Possibly the implementation should
avoid doing that, but for now I'm adding an assert so the next
person doesn't fall into this trap.
This was (partially) reverted in cfe8f8e0 because the conversion from readonly to readnone in Intrinsics.td exposed a couple of problems. This change has been reworked to not need that change (via some explicit checks in client code). This is being done to address the original optimization issue and simplify the testing of the readonly changes. I'm working on that piece under 49607.
Original commit message follows:
The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoints multiple return values.)
We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particular useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass.
Differential Revision: https://reviews.llvm.org/D97974
As readnone function they become movable and LICM can hoist them
out of a loop. As a result in LCSSA form phi node of type token
is created. No one is ready that GCRelocate first operand is phi node
but expects to be token.
GVN test were also updated, it seems it does not do what is expected.
Test for LICM is also added.
This reverts commit f352463ade.
Fix bug in MemoryDependence [and thus GVN] for invariant group.
Previously MemDep didn't verify that the store was storing into a
pointer rather than a store simply using a pointer.
Differential Revision: https://reviews.llvm.org/D98267
The code used for propagating equalities (e.g. assume facts) was conservative in two ways - one of which this patch fixes. Specifically, it shifts the code reasoning about whether a use is dominated by the end of the assume block to consider phi uses to exist on the predecessor edge. This matches the dominator tree handling for dominates(Edge, Use), and simply extends it to dominates(BB, Use).
Note that the decision to use the end of the block is itself a conservative choice. The more precise option would be to use the later of the assume and the value, and replace all uses after that. GVN handles that case separately (with the replace operand mechanism) because it used to be expensive to ask dominator questions within blocks. With the new instruction ordering support, we should probably rewrite this code at some point to simplify.
Differential Revision: https://reviews.llvm.org/D98082
When materializing an available load value, do not explicitly
materialize the undef values from dead blocks. Doing so will
will force creation of a phi with an undef operand, even if there
is a dominating definition. The phi will be folded away on
subsequent GVN iterations, but by then we may have already
poisoned MDA cache slots.
Simply don't register these values in the first place, and let
SSAUpdater do its thing.
What this test illustrates is that GVN inserts an unnecessary
phi node initially, which prevents alias analysis from establishing
NoAlias, and MDA caches that result. We would be able to fully fold
this after another -gvn run with clean MDA.
GVN basically doesn't handle phi nodes at all. This is for a reason - we can't value number their inputs since the predecessor blocks have probably not been visited yet.
However, it also creates a significant pass ordering problem. As it stands, instcombine and simplifycfg ends up implementing CSE of phi nodes. This means that for any series of CSE opportunities intermixed with phi nodes, we end up having to alternate instcombine/simplifycfg and gvn to make progress.
This patch handles the simplest case by simply preprocessing the phi instructions in a block, and CSEing them if they are syntactically identical. This turns out to be powerful enough to handle many cases in a single invocation of GVN since blocks which use the cse'd phi results are visited after the block containing the phi. If there's a CSE opportunity in one the phi predecessors required to recognize the phi CSE opportunity, that will require a second iteration on the function. (Still within a single run of gvn though.)
Compile time wise, this could go either way. On one hand, we're potentially causing GVN to iterate over the function more. On the other, we're cutting down on iterations between two passes and potentially shrinking the IR aggressively. So, a bit unclear what to expect.
Note that this does still rely on instcombine to canonicalize block order of the phis, but that's a one time transformation independent of the values incoming to the phi.
Differential Revision: https://reviews.llvm.org/D98080
There seems to be an impedance mismatch between what the type
system considers an aggregate (structs and arrays) and what
constants consider an aggregate (structs, arrays and vectors).
Rather than adjusting the type check, simply drop it entirely,
as getAggregateElement() is well-defined for non-aggregates: It
simply returns null in that case.
The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoints multiple return values.)
We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particular useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass.
Differential Revision: https://reviews.llvm.org/D97974
(Note: Part of the reviewed change was split and landed as f352463a)
For some reason, we had been marking gc.relocates as reading memory. There's no known reason for this, and I suspect it to be a legacy of very early implementation conservatism. gc.relocate and gc.result are simply projections of the return values from the associated statepoint. Note that the LangRef has always declared them readnone.
The EarlyCSE change is simply moving the special casing from readonly to readnone handling.
As noted by the test diffs, this does allow some additional CSE when relocates are separated by stores, but since we generate gc.relocates in batches, this is unlikely to help anything in practice.
This was reviewed as part of https://reviews.llvm.org/D97974, but split at reviewer request before landing. The motivation is to enable the GVN changes in that patch.
Fixes an infinite loop encountered in GVN.
GVN will delay PRE if it encounters critical edges, attempt to split
them later via calls to SplitCriticalEdge(), then restart.
The caller of GVN::splitCriticalEdges() assumed a return value of true
meant that critical edges were split, that the IR had changed, and that
PRE should be re-attempted, upon which we loop infinitely.
This was exposed after D88438, by compiling the Linux kernel for s390,
but the test case is reproducible on x86.
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1261
Reviewed By: void
Differential Revision: https://reviews.llvm.org/D94996
Calling null or undef results in immediate undefined behavior.
Return poison instead of undef in this case, similar to what
we do for immediate UB due to division by zero.
This patch updates GVN to correctly return the modified status, if PRE
is performed on indices. It fixes a crash when building the test-suite
with EXPENSIVE_CHECKS and LTO.