After r333856, opt -debugify would just stop emitting debug value
intrinsics after encountering a musttail call. This wasn't sufficient to
avoid verifier failures.
Debug value intrinicss for all instructions preceding a musttail call
must also be emitted before the musttail call.
llvm-svn: 333866
When we optimize select basing on fact that div by 0 is undef
we should not traverse the instruction which are not guaranteed to
transfer execution to next instruction. Guard intrinsic is an example.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47576
llvm-svn: 333864
There's a patchwork of existing transforms trying to handle
these cases, but as seen in the changed test, we weren't
catching them all.
llvm-svn: 333845
As noted in the review thread for rL333782, we could have
made a bug harder to hit if we were simplifying instructions
before trying other folds.
The shuffle transform in question isn't ever a simplification;
it's just a canonicalization. So I've renamed that to make that
clearer.
This is NFCI at this point, but I've regenerated the test file
to show the cosmetic value naming difference of using
instcombine's RAUW vs. the builder.
Possible follow-ups:
1. Move reassociation folds after simplifies too.
2. Refactor common code; we shouldn't have so much repetition.
llvm-svn: 333820
As noted in the review thread for rL333782, we're lacking coverage
for this transform, so add tests for each binop opcode with constant
operand.
llvm-svn: 333818
Summary:
I noticed this issue because we didn't put the primary cloned loop into
the `NonChildClonedLoops` vector and so never iterated on it. Once
I fixed that, it made it clear why I had to do a really complicated and
unnecesasry dance when updating the loops to remain in canonical form --
I was unwittingly working around the fact that the primary cloned loop
wasn't in the expected list of cloned loops. Doh!
Now that we include it in this vector, we don't need to return it and we
can consolidate the update logic as we correctly have a single place
where it can be handled.
I've just added a test for the iteration order aspect as every time
I changed the update logic partially or incorrectly here, an existing
test failed and caught it so that seems well covered (which is also
evidenced by the extensive working around of this missing update).
Reviewers: asbirlea, sanjoy
Subscribers: mcrosier, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47647
llvm-svn: 333811
Summary:
Getelementptr returns a vector of pointers, instead of a single address,
when one or more of its arguments is a vector. In such case it is not
possible to simplify the expression by inserting a bitcast of operand(0)
into the destination type, as it will create a bitcast between different
sizes.
Reviewers: majnemer, mkuper, mssimpso, spatel
Reviewed By: spatel
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D46379
llvm-svn: 333783
This bug:
https://bugs.llvm.org/show_bug.cgi?id=37648
...was created with the enhancement to this transform with rL332479.
The urem test shows the disaster potential: any undef divisor lane makes
the whole op undef.
The test diffs show that vector demanded elements turns some of the potential,
but not all, unused binop operands back into undef already.
llvm-svn: 333782
Summary:
Emit summaries for bitcode modules that are only destined for the
regular LTO portion of the build so they can participate in
summary-based dead stripping.
This change reduces the size of a nacl_helper build with cfi-icall
enabled by 7%, removing the majority of the overhead due to enabling
cfi-icall. The cfi-icall size increase was caused by compiling in lots
of unused code and cfi-icall generating jumptable references to unused
symbols that could no longer be removed by -Wl,-gc-sections. Increasing
the visibility of summary-based dead stripping prevented jumptable
entries being created for unused symbols from the regular LTO portion
of the build.
Reviewers: pcc
Reviewed By: pcc
Subscribers: dschuff, mehdi_amini, inglorion, eraman, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D47594
llvm-svn: 333768
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
Differential Revision: https://reviews.llvm.org/D45330
llvm-svn: 333740
Summary:
Loop idiom recognize tries to convert loops like
```
int foo(int x) {
int cnt = 0;
while (x) {
x >>= 1;
++cnt;
}
return cnt;
}
```
into calls to ctlz, but if x is initially negative this loop should be infinite.
It happens that the cases that motivated this change have an absolute value of x before the loop. So this patch restricts the transform to cases where we know x is positive. Note: We are relying on the absolute value of INT_MIN to be undefined so we can assume that the result is always positive.
Fixes PR37479
Reviewers: spatel, hfinkel, efriedma, javed.absar
Reviewed By: efriedma
Subscribers: dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D47348
llvm-svn: 333702
This is the planned enhancement to D47163 / rL333611.
We want to match cmp/select sizes because that will be recognized
as min/max more easily and lead to better codegen (especially for
vector types).
As mentioned in D47163, this improves some of the tests that would
also be folded by D46380, so we may want to adjust that patch to
match the new patterns where the extend op occurs after the select.
llvm-svn: 333689
Convert a vector load intrinsic into an llvm load instruction.
This is beneficial when the underlying object being addressed
comes from a constant, since we get constant-folding for free.
Differential Revision: https://reviews.llvm.org/D46273
llvm-svn: 333643
In post-commit review, Eric Christopher notes that many
new MSan warnings are being observed with this patch.
The probable reason is: if 'y' is undef here and we could
evaluate it twice and get different results.
We can't increase the number of uses of a value.
llvm-svn: 333631
Don't always:
cast (select (cmp x, y), z, C) --> select (cmp x, y), (cast z), C'
This is something that came up as far back as D26556, and I lost track of it.
I suspect that this transform is part of the underlying problem that is
inspiring some of the recent proposals that seek to match larger patterns
that include a cast op. Even if that's not true, this transform causes
problems for codegen (particularly with vector types).
A transform to actively match the size of cmp and select operand sizes should
follow. This patch just removes the harmful canonicalization in the other
direction.
Differential Revision: https://reviews.llvm.org/D47163
llvm-svn: 333611
Summary:
Fix PR37625. It's possible for an extern_weak declaration to be emitted
to the merged module when a definition exists in the ThinLTO portion of
the build; discard the linkage on the declaration in that case.
(otherwise we copy the linkage to the alias to the jumptable and fail)
Reviewers: pcc
Reviewed By: pcc
Subscribers: mehdi_amini, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D47494
llvm-svn: 333604
Summary:
The isKnownNonZero() function have checks that abort the recursion when
it reaches the specified max depth. However one of the recursive calls
was placed before the max depth check was done, resulting in a endless
recursion that eventually triggered a segmentation fault.
Fixed the problem by moving the max depth check above the first
recursive call.
Reviewers: Prazek, nlopes, spatel, craig.topper, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, bjope, llvm-commits
Differential Revision: https://reviews.llvm.org/D47531
llvm-svn: 333557
Turning a table lookup intrinsic into a shuffle vector instruction
can be beneficial. If the mask used for the lookup is the constant
vector {7,6,5,4,3,2,1,0}, then the back-end generates byte reverse
instructions instead.
Differential Revision: https://reviews.llvm.org/D46133
llvm-svn: 333550
loop-cleanup passes at the beginning of the loop pass pipeline, and
re-enqueue loops after even trivial unswitching.
This will allow us to much more consistently avoid simplifying code
while doing trivial unswitching. I've also added a test case that
specifically shows effective iteration using this technique.
I've unconditionally updated the new PM as that is always using the
SimpleLoopUnswitch pass, and I've made the pipeline changes for the old
PM conditional on using this new unswitch pass. I added a bunch of
comments to the loop pass pipeline in the old PM to make it more clear
what is going on when reviewing.
Hopefully this will unblock doing *partial* unswitching instead of just
full unswitching.
Differential Revision: https://reviews.llvm.org/D47408
llvm-svn: 333493
be both simpler and substantially more efficient.
Rather than use a hand-rolled iteration technique that isn't quite the
same as RPO, use the pre-built RPO loop body traversal utility.
Once visiting the loop body in RPO, we can assert that we visit defs
before uses reliably. When this is the case, the only need to iterate is
when simplifying a def that is used by a PHI node along a back-edge.
With this patch, the first pass over the loop body is just a complete
simplification of every instruction across the loop body. When we
encounter a use of a simplified instruction that stems from a PHI node
in the loop body that has already been visited (due to some cyclic CFG,
potentially the loop itself, or a nested loop, or unstructured control
flow), we recall that specific PHI node for the second iteration.
Nothing else needs to be preserved from iteration to iteration.
On the second and later iterations, only instructions known to have
simplified inputs are considered, each time starting from a set of PHIs
that had simplified inputs along the backedges.
Dead instructions are collected along the way, but deleted in a batch at
the end of each iteration making the iterations themselves substantially
simpler. This uses a new batch API for recursively deleting dead
instructions.
This alsa changes the routine to visit subloops. Because simplification
is fundamentally transitive, we may need to visit the entire loop body,
including subloops, to handle knock-on simplification.
I've added a basic test file that helps demonstrate that all of these
changes work. It includes both straight-forward loops with
simplifications as well as interesting PHI-structures, CFG-structures,
and a nested loop case.
Differential Revision: https://reviews.llvm.org/D47407
llvm-svn: 333461
This is a simple implementation of the unroll-and-jam classical loop
optimisation.
The basic idea is that we take an outer loop of the form:
for i..
ForeBlocks(i)
for j..
SubLoopBlocks(i, j)
AftBlocks(i)
Instead of doing normal inner or outer unrolling, we unroll as follows:
for i... i+=2
ForeBlocks(i)
ForeBlocks(i+1)
for j..
SubLoopBlocks(i, j)
SubLoopBlocks(i+1, j)
AftBlocks(i)
AftBlocks(i+1)
Remainder
So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now-jammed loops.
To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).
Differential Revision: https://reviews.llvm.org/D41953
llvm-svn: 333358
Reverting this to see if this is causing the failures of the
clang-with-thin-lto-ubuntu bot.
[IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
Differential Revision: https://reviews.llvm.org/D45330
llvm-svn: 333323
Libfuzzer tests have been fixed to prevent being optimized.
Original commit message:
If the nsw flag is used in the absolute value then it is undefined for INT_MIN. For all other value it will produce a positive number. So we can assume the result is positive.
This breaks some InstCombine abs/nabs combining tests because we simplify the second compare from known bits rather than as the whole pattern. Looks like we can probably fix it by adding a neg+abs/nabs combine to just swap the select operands. N
Differential Revision: https://reviews.llvm.org/D47041
llvm-svn: 333300
Summary:
Look past debug intrinsics when querying whether an instruction is the
first instruction in the header block. The commit includes a reproducer
for a case where LICM would not hoist an instruction, due to the presence
of the intrinsic.
A caveat with this commit is that the check will not work properly if
the instruction at hand is a debug intrinsic. I assume that no one
depends on isGuaranteedToExecute() to return true for debug intrinsics
for these cases (and that this might be an indication of another debug
invariant issue), so I thought that it was not worth adding that extra
bit of complexity.
Reviewers: reames, anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47197
llvm-svn: 333274
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
Differential Revision: https://reviews.llvm.org/D45330
llvm-svn: 333268
Setting the "Debug Info Version" module flag makes it possible to pipe
synthetic debug info into llc, which is useful for testing backends.
llvm-svn: 333237
If the nsw flag is used in the absolute value then it is undefined for INT_MIN. For all other value it will produce a positive number. So we can assume the result is positive.
This breaks some InstCombine abs/nabs combining tests because we simplify the second compare from known bits rather than as the whole pattern. Looks like we can probably fix it by adding a neg+abs/nabs combine to just swap the select operands. Need to check alive to make sure there are no corner cases.
Differential Revision: https://reviews.llvm.org/D47041
llvm-svn: 333226
Reassociation of math ops in some contexts (especially vector contexts)
has generally only been happening when the 'fast' FMF was set. This
enables reassoication when only the finer grained controls 'reassoc' and
'nsz' are set.
Differential Revision: https://reviews.llvm.org/D47335
llvm-svn: 333221
Summary:
In LICM, CFG could be changed in splitPredecessorsOfLoopExit(), which update
only DT and LoopInfo. Therefore, we should preserve only DT and LoopInfo specifically,
instead of all analyses that depend on the CFG (setPreservesCFG()).
This change should fix PR37323.
Reviewers: uabelho, davide, dberlin, Ka-Ka
Reviewed By: dberlin
Subscribers: mzolotukhin, bjope, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D46775
llvm-svn: 333198
The ARM/ARM64 AESE and AESD instructions have a builtin XOR as the first step in
the instruction. Therefore, if the AES key is zero and the AES data was
previously XORed, it can be combined into a single instruction.
Differential Revision: https://reviews.llvm.org/D47239
Patch by Michael Brase!
llvm-svn: 333193
Summary:
If NaryReassociate succeed it will, when replacing the old instruction
with the new instruction, also recursively delete trivially
dead instructions from the old instruction. However, if the input to the
NaryReassociate pass contain dead code it is not save to recursively
delete trivially deadinstructions as it might lead to deleting the newly
created instruction.
This patch will fix the problem by using WeakVH to detect this
rare case, when the newly created instruction is dead, and it will then
restart the basic block iteration from the beginning.
This fixes pr37539
Reviewers: tra, meheff, grosser, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47139
llvm-svn: 333155
Summary:
StructurizeCFG::orderNodes basically uses a reverse post-order (RPO) traversal of the region list to get the order.
The only problem with it is that sometimes backedges for outer loops will be visited before backedges for inner loops.
To solve this problem, a loop depth based approach has been used to make sure all blocks in this loop has been visited
before moving on to outer loop.
However, we found a problem for a SubRegion which is a loop itself:
--> BB1 --> BB2 --> BB3 -->
In this case, BB2 is a SubRegion (loop), and thus its loopdepth is different than that of BB1 and BB3. This fact will lead
BB2 to be placed in the wrong order.
In this work, we treat the SubRegion as a special case and use its exit block to determine the loop and its depth
to guard the sorting.
Reviewers:
arsenm, jlebar
Differential Revision:
https://reviews.llvm.org/D46912
llvm-svn: 333111
Summary:
Finally fixes [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]].
Now that the backend is all done, we can finally fold it!
The canonical unfolded masked merge pattern is
```(x & m) | (y & ~m)```
There is a second, equivalent variant:
```(x | ~m) & (y | m)```
Only one of them (the or-of-and's i think) is canonical.
And if the mask is not a constant, we should fold it to:
```((x ^ y) & M) ^ y```
https://rise4fun.com/Alive/ndQw
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: nicholas, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D46814
llvm-svn: 333106
Loop unswitching makes substantial changes to a loop that can also affect cached
SCEV info in its outer loops as well, but it only cares to invalidate SCEV cache for the
innermost loop in case of full unswitching and does not invalidate anything at all in
case of trivial unswitching. As result, we may end up with incorrect data in cache.
Differential Revision: https://reviews.llvm.org/D46045
Reviewed By: mzolotukhin
llvm-svn: 333072
Summary:
Patch for capture tracking broke
bootstrap of clang with -fstict-vtable-pointers
which resulted in debbugging nightmare. It was fixed
https://reviews.llvm.org/D46900 but as it turned
out, there were other parts like inliner (computing of
noalias metadata) that I found after bootstraping with enabled
assertions.
Reviewers: hfinkel, rsmith, chandlerc, amharc, kuhar
Subscribers: JDevlieghere, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D47088
llvm-svn: 333070
Also, produce the canonical IR abs (s<0) to be more efficient.
This is the libcall equivalent of the clang builtin change from:
rL333038
Pasting from that commit message:
The stdlib functions are defined in section 7.20.6.1 of the C standard with:
"If the result cannot be represented, the behavior is undefined."
That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would
be UB/poison.
llvm-svn: 333042