Looks like we intended to compare this->Members with Other->Members
here, but ended up comparing this->Members with this->Members. Oops. :)
Since CongruenceClass::Members is a SmallPtrSet anyway, we can probably
skip building std::sets if we're willing to write a bit more code.
This appears to be no functional change (for sufficiently lax values of
"no"): this equality check was only being called inside of an assert.
So, worst case, we'll catch more bugs in the form of assertion failures.
Thanks to d0k for noting this!
llvm-svn: 333601
loop-cleanup passes at the beginning of the loop pass pipeline, and
re-enqueue loops after even trivial unswitching.
This will allow us to much more consistently avoid simplifying code
while doing trivial unswitching. I've also added a test case that
specifically shows effective iteration using this technique.
I've unconditionally updated the new PM as that is always using the
SimpleLoopUnswitch pass, and I've made the pipeline changes for the old
PM conditional on using this new unswitch pass. I added a bunch of
comments to the loop pass pipeline in the old PM to make it more clear
what is going on when reviewing.
Hopefully this will unblock doing *partial* unswitching instead of just
full unswitching.
Differential Revision: https://reviews.llvm.org/D47408
llvm-svn: 333493
be both simpler and substantially more efficient.
Rather than use a hand-rolled iteration technique that isn't quite the
same as RPO, use the pre-built RPO loop body traversal utility.
Once visiting the loop body in RPO, we can assert that we visit defs
before uses reliably. When this is the case, the only need to iterate is
when simplifying a def that is used by a PHI node along a back-edge.
With this patch, the first pass over the loop body is just a complete
simplification of every instruction across the loop body. When we
encounter a use of a simplified instruction that stems from a PHI node
in the loop body that has already been visited (due to some cyclic CFG,
potentially the loop itself, or a nested loop, or unstructured control
flow), we recall that specific PHI node for the second iteration.
Nothing else needs to be preserved from iteration to iteration.
On the second and later iterations, only instructions known to have
simplified inputs are considered, each time starting from a set of PHIs
that had simplified inputs along the backedges.
Dead instructions are collected along the way, but deleted in a batch at
the end of each iteration making the iterations themselves substantially
simpler. This uses a new batch API for recursively deleting dead
instructions.
This alsa changes the routine to visit subloops. Because simplification
is fundamentally transitive, we may need to visit the entire loop body,
including subloops, to handle knock-on simplification.
I've added a basic test file that helps demonstrate that all of these
changes work. It includes both straight-forward loops with
simplifications as well as interesting PHI-structures, CFG-structures,
and a nested loop case.
Differential Revision: https://reviews.llvm.org/D47407
llvm-svn: 333461
This is a simple implementation of the unroll-and-jam classical loop
optimisation.
The basic idea is that we take an outer loop of the form:
for i..
ForeBlocks(i)
for j..
SubLoopBlocks(i, j)
AftBlocks(i)
Instead of doing normal inner or outer unrolling, we unroll as follows:
for i... i+=2
ForeBlocks(i)
ForeBlocks(i+1)
for j..
SubLoopBlocks(i, j)
SubLoopBlocks(i+1, j)
AftBlocks(i)
AftBlocks(i+1)
Remainder
So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now-jammed loops.
To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).
Differential Revision: https://reviews.llvm.org/D41953
llvm-svn: 333358
Reverting this to see if this is causing the failures of the
clang-with-thin-lto-ubuntu bot.
[IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
Differential Revision: https://reviews.llvm.org/D45330
llvm-svn: 333323
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
Differential Revision: https://reviews.llvm.org/D45330
llvm-svn: 333268
The plan had always been to move towards using this rather than so much
in-pass simplification within the loop pipeline, but we never got around
to it.... until only a couple months after it was removed due to disuse.
=/
This commit is just a pure revert of the removal. I will add tests and
do some basic cleanup in follow-up commits. Then I'll wire it into the
loop pass pipeline.
Differential Revision: https://reviews.llvm.org/D47353
llvm-svn: 333250
Summary:
In LICM, CFG could be changed in splitPredecessorsOfLoopExit(), which update
only DT and LoopInfo. Therefore, we should preserve only DT and LoopInfo specifically,
instead of all analyses that depend on the CFG (setPreservesCFG()).
This change should fix PR37323.
Reviewers: uabelho, davide, dberlin, Ka-Ka
Reviewed By: dberlin
Subscribers: mzolotukhin, bjope, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D46775
llvm-svn: 333198
Summary:
If NaryReassociate succeed it will, when replacing the old instruction
with the new instruction, also recursively delete trivially
dead instructions from the old instruction. However, if the input to the
NaryReassociate pass contain dead code it is not save to recursively
delete trivially deadinstructions as it might lead to deleting the newly
created instruction.
This patch will fix the problem by using WeakVH to detect this
rare case, when the newly created instruction is dead, and it will then
restart the basic block iteration from the beginning.
This fixes pr37539
Reviewers: tra, meheff, grosser, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47139
llvm-svn: 333155
Summary:
StructurizeCFG::orderNodes basically uses a reverse post-order (RPO) traversal of the region list to get the order.
The only problem with it is that sometimes backedges for outer loops will be visited before backedges for inner loops.
To solve this problem, a loop depth based approach has been used to make sure all blocks in this loop has been visited
before moving on to outer loop.
However, we found a problem for a SubRegion which is a loop itself:
--> BB1 --> BB2 --> BB3 -->
In this case, BB2 is a SubRegion (loop), and thus its loopdepth is different than that of BB1 and BB3. This fact will lead
BB2 to be placed in the wrong order.
In this work, we treat the SubRegion as a special case and use its exit block to determine the loop and its depth
to guard the sorting.
Reviewers:
arsenm, jlebar
Differential Revision:
https://reviews.llvm.org/D46912
llvm-svn: 333111
Loop unswitching makes substantial changes to a loop that can also affect cached
SCEV info in its outer loops as well, but it only cares to invalidate SCEV cache for the
innermost loop in case of full unswitching and does not invalidate anything at all in
case of trivial unswitching. As result, we may end up with incorrect data in cache.
Differential Revision: https://reviews.llvm.org/D46045
Reviewed By: mzolotukhin
llvm-svn: 333072
This patch fixes two bugs:
* test1: Previously assume(a >= 5) concluded that a == 5. That's only
valid for assume(a == 5)...
* test2: If operands were swapped, additional users were added to the
wrong cmp operand. This resulted in an "unsettled iteration"
assertion failure.
Patch by Nikita Popov
Differential Revision: https://reviews.llvm.org/D46974
llvm-svn: 333007
Change matchSelectPattern to return X and -X for ABS/NABS in a well defined order. Adjust EarlyCSE to account for this. Ensure the SPF result is some kind of min/max and not abs/nabs in one place in InstCombine that made me nervous.
Prevously we returned the two operands of the compare part of the abs pattern. The RHS is always going to be a 0i, 1 or -1 constant. This isn't a very meaningful thing to return for any one. There's also some freedom in the abs pattern as to what happens when the value is equal to 0. This freedom led to early cse failing to match when different constants were used in otherwise equivalent operations. By returning the input and its negation in a defined order we can ensure an exact match. This also makes sure both patterns use the exact same subtract instruction for the negation. I believe CSE should evebntually make this happen and properly merge the nsw/nuw flags. But I'm not familiar with CSE and what order it does things in so it seemed like it might be good to really enforce that they were the same.
Differential Revision: https://reviews.llvm.org/D47037
llvm-svn: 332865
We were previously using a DT in CVP through SimplifyQuery, but not requiring it in
the new pass manager. Hence it would crash if DT was not already available. This now
gets DT directly and plumbs it through to where it is used (instead of using it
through SQ).
llvm-svn: 332836
In the patch rL329547, we have lifted the over-restrictive limitation on collected range
checks, allowing to work with range checks with the end of their range not being
provably non-negative. However it appeared that the non-negativity of this value was
assumed in the utility function `ClampedSubtract`. In particular, its reasoning is based
on the fact that `0 <= SINT_MAX - X`, which is not true if `X` is negative.
The function `ClampedSubtract` is only called twice, once with `X = 0` (which is OK)
and the second time with `X = IRC.getEnd()`, where we may now see the problem if
the end is actually a negative value. In this case, we may sometimes miscompile.
This patch is the conservative fix of the miscompile problem. Rather than rejecting
non-provably non-negative `getEnd()` values, we will check it for non-negativity in
runtime. For this, we use function `smax(smin(X, 0), -1) + 1` that is equal to `1` if `X`
is non-negative and is equal to 0 if `X` is negative. If we multiply `Begin, End` of safe
iteration space by this function calculated for `X = IRC.getEnd()`, we will get the original
`[Begin, End)` if `IRC.getEnd()` was non-negative (and, thus, `ClampedSubtract` worked
correctly) and the empty range `[0, 0)` in case if ` IRC.getEnd()` was negative.
So we in fact prohibit execution of the main loop if at least one of range checks was
made against a negative value (and we figured it out in runtime). It is still better than
what we have before (non-negativity had to be proved in compile time) and prevents
us from miscompile, however it is sometiles too restrictive for unsigned range checks
against a negative value (which in fact can be eliminated).
Once we re-implement `ClampedSubtract` in a way that it handles negative `X` correctly,
this limitation can be lifted, too.
Differential Revision: https://reviews.llvm.org/D46860
Reviewed By: samparker
llvm-svn: 332809
Summary:
The verifier accepts PHI nodes with multiple entries for the
same basic block, as long as the value is the same.
As seen in PR37203, SROA did not handle such PHI nodes properly
when speculating loads over the PHI, since it inserted multiple
loads in the predecessor block and changed the PHI into having
multiple entries for the same basic block, but with different
values.
This patch teaches SROA to reuse the same speculated load for
each PHI duplicate entry in such situations.
Resolves: https://bugs.llvm.org/show_bug.cgi?id=37203
Reviewers: uabelho, chandlerc, hfinkel, bkramer, efriedma
Reviewed By: efriedma
Subscribers: dberlin, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D46426
llvm-svn: 332577
The current integer widening does not support rewriting partial split slices in rewriteIntegerStore (and rewriteIntegerLoad).
This patch adds explicit checks for this case in isIntegerWideningViableForSlice.
Before r322533, splitting is allowed only for the whole-alloca slice and hence the above case is implicitly rejected by another check `if (DL.getTypeStoreSize(ValueTy) > Size)` because whole-alloca slice is larger than the partition.
Differential Revision: https://reviews.llvm.org/D46750
llvm-svn: 332575
r332057 introduced distance() for ranges. Based on post-commit feedback,
this renames distance() to size(). The new size() is also only enabled
when the operation is O(1).
Differential Revision: https://reviews.llvm.org/D46976
llvm-svn: 332551
Author: Samuel Pitoiset
Without this patch, it appears to me that we are selecting
the wrong operand when inverting conditions. In the attached
test, it will select %tmp3 instead of %tmp4. To fix it, just
use 'A' as everywhere.
This fixes a regression introduced by
"[PatternMatch] define m_Not using m_Xor and cst_pred_ty"
https://reviews.llvm.org/D46351
llvm-svn: 332403
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
Let separate-const-offset-from-gep pass handle trunc() when it calculates
constant offset relative to base. The pass itself may insert trunc()
instructions when it canonicalises array indices to pointer-size integers
and needs to handle trunc() in order to evaluate the offset.
Differential Revision: https://reviews.llvm.org/D46732
llvm-svn: 332142
Phi nodes can reside in live blocks but one of their incoming
arguments can come from a dead block. Dead blocks and reassociate
don't play nice together. In fact, reassociate performs an RPO
as a first step to avoid processing dead blocks.
The reason why Reassociate might not fixpoint when examining
dead blocks is that the following:
%xor0 = xor i16 %xor1, undef
%xor1 = xor i16 %xor0, undef
is perfectly valid LLVM IR (if it appears in a dead block),
so the worklist algorithm keeps pushing the two instructions for
reexamination. Note that this is not Reassociate fault, at least
not entirely. It's llvm that has a weird definition of dominance.
Fixes PR37390.
llvm-svn: 332100
This commit adds a wrapper for std::distance() which works with ranges.
As it would be a common case to write `distance(predecessors(BB))`, this
also introduces `pred_size()` and `succ_size()` helpers to make that
easier to write.
Differential Revision: https://reviews.llvm.org/D46668
llvm-svn: 332057
Summary:
This change teaches DSE that the atomic memory intrinsics can be overwriten
partially in the same way as the non-atomic forms. Specifically, that the
atomic memcpy & memset can be shortened at the end and that the atomic memset
can be shortened at the beginning, if they partially overwritten
by later stores.
Reviewers: mkazantsev, skatkov, apilipenko, efriedma, rsmith, spatel, filcab, sanjoy
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45584
llvm-svn: 331991
Summary:
MergedLoadStoreMotion::mergeStores is using some heuristics
to limit the amount of stores that it tries to sink (see
MagicCompileTimeControl in MergedLoadStoreMotion.cpp). The
heuristic involves counting the number of instructions in
one of the basic blocks that is part of the transformation.
We now ignore dbg intrinsics when counting instruction for
the MagicCompileTimeControl heuristic. This to make sure that
the amount of stores that are sunk doesn't depend on the amount
of debug information (if -g is used or not).
Reviewers: Gerolf, davide, majnemer
Reviewed By: davide
Subscribers: dberlin, bjope, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D46600
llvm-svn: 331852
Computing this property within the existing walk ensures that the cost is linear with the size of the block. If we did this from within isGuaranteedToExecute, it would be quadratic without some very fancy caching.
This allows us to reliably catch a hoistable instruction within a header which may throw at some point *after* our hoistable instruction. It doesn't do anything for non-header cases, but given how common single block loops are, this seems very worthwhile.
llvm-svn: 331557
Two of these are immediately dereferenced on the next line. The other two are passed immediately to the IRBuilder constructor which can't handle a nullptr.
llvm-svn: 331500
These are casts on users of a PHINode to Instruction. I think since PHINode is an Instruction any users would also be Instructions. At least a cast will give us an assertion if its wrong.
llvm-svn: 331498
Summary:
Some of our internal testing detected a major compile time regression which I've
tracked down to:
r278938 - Revert "Reassociate: Reprocess RedoInsts after each inst".
It appears that processing long chains of reassociatable instructions causes
non-linear (potentially exponential) growth in the number of times an
instruction is revisited. For example, the included test revisits instructions
220 times in a 20-instruction test.
It appears that r278938 reversed the order instructions were visited and that
this is preventing scheduled revisits from being cancelled as a result of
visiting the instructions naturally during normal processing. However, simply
reversing the order also harmed the generated code. Upon closer inspection, it
was discovered that revisits occurred in the opposite order to the first pass
(Thanks to escha for spotting that).
This patch makes the revisit order consistent with the first pass which allows
more revisits to be cancelled. This does appear to have a small impact on the
generated code in few cases but it significantly reduces compile-time.
After this patch, our internal test that was most affected by the regression
dropped from ~2 million revisits to ~4k resulting in Reassociate having 0.46%
of the runtime it had before (99.54% improvement).
Here's the summaries reported by lnt for the LLVM test-suite with --benchmarking-only:
| metric | geomean before patch | geomean after patch | delta |
| ----- | ----- | ----- | ----- |
| compile time | 0.1956 | 0.1261 | -35.54% |
| execution time | 0.3240 | 0.3237 | - |
| code size | 7365.4459 | 7365.6079 | - |
The results have a few wins and losses on compile-time, mostly in the +/- 2.5% range. There was one outlier though:
| Performance Regressions - compile_time | Δ | Previous | Current |
| MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk | 9.82% | 2.0473 | 2.2483 |
Reviewers: javed.absar, dberlin
Reviewed By: dberlin
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45734
llvm-svn: 331381
This is a follow-up to r331272.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
https://reviews.llvm.org/D46290
llvm-svn: 331275
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
phi is on lhs of a comparison op.
For the following testcase,
L1:
%t0 = add i32 %m, 7
%t3 = icmp eq i32* %t2, null
br i1 %t3, label %L3, label %L2
L2:
%t4 = load i32, i32* %t2, align 4
br label %L3
L3:
%t5 = phi i32 [ %t0, %L1 ], [ %t4, %L2 ]
%t6 = icmp eq i32 %t0, %t5
br i1 %t6, label %L4, label %L5
We know if we go through the path L1 --> L3, %t6 should always be true. However
currently, if the rhs of the eq comparison is phi, JumpThreading fails to
evaluate %t6 to true. And we know that Instcombine cannot guarantee always
canonicalizing phi to the left hand side of the comparison operation according
to the operand priority comparison mechanism in instcombine. The patch handles
the case when rhs of the comparison op is a phi.
Differential Revision: https://reviews.llvm.org/D46275
llvm-svn: 331266
unswitch and replace it with the amazingly simple update API code.
This addresses piles of FIXMEs around the update logic here and makes
everything substantially simpler.
llvm-svn: 331247
code review.
It turns out this *is* necessary, and I read the comment on the API
correctly the first time. ;]
The `applyUpdates` routine requires that updates are "balanced". This is
in order to cleanly handle cycles like inserting, removing, nad then
re-inserting the same edge. This precludes inserting the same edge
multiple times in a row as handling that would cause the insertion logic
to become *ordered* instead of *unordered* (which is what the API
provides).
It happens that in this specific case nothing (other than an assert and
contract violation) goes wrong because we're never inserting and
removing the same edge. The implementation *happens* to do the right
thing to eliminate redundant insertions in that case.
But the requirement is there and there is an assert to catch it.
Somehow, after the code review I never did another asserts-clang build
testing loop-unswich for a long time. As a consequence, I didn't notice
this despite a bunch of testing going on, but it shows up immediately
with an asserts build of clang itself.
llvm-svn: 331246
See r331124 for how I made a list of files missing the include.
I then ran this Python script:
for f in open('filelist.txt'):
f = f.strip()
fl = open(f).readlines()
found = False
for i in xrange(len(fl)):
p = '#include "llvm/'
if not fl[i].startswith(p):
continue
if fl[i][len(p):] > 'Config':
fl.insert(i, '#include "llvm/Config/llvm-config.h"\n')
found = True
break
if not found:
print 'not found', f
else:
open(f, 'w').write(''.join(fl))
and then looked through everything with `svn diff | diffstat -l | xargs -n 1000 gvim -p`
and tried to fix include ordering and whatnot.
No intended behavior change.
llvm-svn: 331184
The effect of doing so is not disrupting the LoopPassManager when mixing this pass with other loop passes. This should help locality of access substaintially and avoids the cost of computing PostDom.
The assumption here is that the full GuardWidening (which does use PostDom) is run as a canonicalization before loop opts and that this version is just catching cases exposed by other loop passes. (i.e. LoopPredication, IndVarSimplify, LoopUnswitch, etc..)
llvm-svn: 331094