Summary:
Final patch in series to fix inlining between functions with different
nobuiltin attributes/options, which was specifically an issue in LTO.
See discussion on D61634 for background.
The prior patch in this series (D67923) enabled per-Function TLI
construction that identified the nobuiltin attributes.
Here I have allowed inlining to proceed if the callee's nobuiltins are a
subset of the caller's nobuiltins, but not in the reverse case, which
should be conservatively correct. This is controlled by a new option,
-inline-caller-superset-nobuiltin, which is enabled by default.
Reviewers: hfinkel, gchatelet, chandlerc, davidxl
Subscribers: arsenm, jvesely, nhaehnle, mehdi_amini, eraman, hiraditya, haicheng, dexonsmith, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74162
Summary:
This patch makes EarlyCSE fold equivalent freeze instructions.
Another optimization that I think will be useful is to remove freeze if its operand is used as a branch condition or at llvm.assume:
```
%c = ...
br i1 %c, label %A, ..
A:
%d = freeze %c ; %d can be optimized to %c because %c cannot be poison or undef (or 'br %c' would be UB otherwise)
```
If it make sense for EarlyCSE to support this as well, I will make a patch for this.
Reviewers: spatel, reames, lebedev.ri
Reviewed By: lebedev.ri
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75334
SROA will drop the explicit alignment on allocas when the ABI guarantees
enough alignment. Because the alignment on new load/store instructions
are set based on the alloca's alignment, that means SROA would end up
dropping the alignment from atomic loads and stores, which is not
allowed (see bug). For those, make sure to always carry over the
alignment from the previous instruction.
Differential revision: https://reviews.llvm.org/D75266
Summary:
For now just insert the callback for stores, similar to how MSan tracks
origins. In the future we may want to add callbacks for loads, memcpy,
function calls, CMPs, etc.
Reviewers: pcc, vitalybuka, kcc, eugenis
Reviewed By: vitalybuka, kcc, eugenis
Subscribers: eugenis, hiraditya, #sanitizers, llvm-commits, kcc
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D75312
DSE would mistakenly remove store (2):
a = calloc(n+1)
for (int i = 0; i < n; i++) {
store 1, a[i+1] // (1)
store 0, a[i] // (2)
}
The fix is to do PHI transaltion while looking for clobbering
instructions between the store and the calloc.
Reviewed By: efriedma, bjope
Differential Revision: https://reviews.llvm.org/D68006
When InstCombine initially populates the worklist, it already
performs constant folding and DCE. However, as the instructions
are initially visited in program order, this DCE can pick up only
the last instruction of a dead chain, the rest would only get
picked up in the main InstCombine run.
To avoid this, we instead perform the DCE in separate pass over the
collected instructions in reverse order, which will allow us to
pick up full dead instruction chains. We already need to do this
reverse iteration anyway to populate the worklist, so this
shouldn't add extra cost.
This by itself only fixes a small part of the problem though:
The same basic issue also applies during the main InstCombine loop.
We generally always want DCE to occur as early as possible,
because it will allow one-use folds to happen. Address this by also
performing DCE while adding deferred instructions to the main worklist.
This drops the number of tests that perform more than 2 InstCombine
iterations from ~80 to ~40. There's some spurious test changes due
to operand order / icmp toggling.
Differential Revision: https://reviews.llvm.org/D75008
Use UnaryOperator::CreateFNeg instead.
Summary:
With the introduction of the native fneg instruction, the
fsub -0.0, %x idiom is obsolete. This patch makes LLVM
emit fneg instead of the idiom in all places.
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D75130
A recent commit
(https://reviews.llvm.org/rG66c120f02560ef528a60924104ead66f330190f1) changed
the cost for calls to functions that have a vector version for some
vectorization factor. However, no check is performed for whether the
vectorization factor matches the current one being cost modeled. This leads to
attempts to widen call instructions to a vectorization factor for which such a
function does not exist, which in turn leads to an assertion failure.
This patch adds the check for vectorization factor (i.e. not just that the
called function has a vector version for some VF, but that it has a vector
version for this VF).
Differential revision: https://reviews.llvm.org/D74944
As suggested in D75145 -
I'm not sure why, but several passes have this kind of disable/enable flag
implemented at the pass manager level. But that means we have to duplicate
the flag for both pass managers and add code to check the flag every time
the pass appears in the pipeline.
We want a debug option to see if this pass is misbehaving regardless of the
pass managers, so just add a disablement check at the single point before
any transforms run.
Differential Revision: https://reviews.llvm.org/D75204
CVP currently does not simplify cmps with instructions in the same
block, because LVI getPredicateAt() currently does not provide
much useful information for that case (D69686 would change that,
but is stuck.) However, if the instruction is a Phi node, then
LVI can compute the result of the predicate by threading it into
the predecessor blocks, which allows it simplify some conditions
that nothing else can handle. Relevant code:
6d6a4590c5/llvm/lib/Analysis/LazyValueInfo.cpp (L1904-L1927)
Differential Revision: https://reviews.llvm.org/D72169
InstCombine removes pairs of start+end intrinsics that don't
have anything in between them. Currently this is done by starting
at the start intrinsic and scanning forwards. This patch changes
it to start at the end intrinsic and scan backwards.
The motivation here is as follows: When we process the start
intrinsic, we have not yet looked at the following instructions,
which may still get folded/removed. If they do, we will only be
able to remove the start/end pair on the next iteration. When we
process the end intrinsic, all the instructions before it have
already been visited, and we don't run into this problem.
Differential Revision: https://reviews.llvm.org/D75011
DevirtSCCRepeatedPass iteration. Needs ReviewPublic
This aims to fix a missed inlining case.
If there's a virtual call in the callee on an alloca (stack allocated object) in
the caller, and the callee is inlined into the caller, the post-inline cleanup
would devirtualize the virtual call, but if the next iteration of
DevirtSCCRepeatedPass doesn't happen (under the new pass manager), which is
based on a heuristic to determine whether to reiterate, we may miss inlining the
devirtualized call.
This enables inlining in clang/test/CodeGenCXX/member-function-pointer-calls.cpp.
See discussion on PR44792.
This reverts commit 02ce9d8ef5.
It also reverts the follow-up commits
8f46269f0 "[profile] Don't dump counters when forking and don't reset when calling exec** functions"
62c7d8402 "[profile] gcov_mutex must be static"
Summary:
Loop unswitch hoists branches on loop-invariant conditions. However, if this
condition is poison/undef and the branch wasn't originally reachable, loop
unswitch introduces UB (since the optimized code will branch on poison/undef and
the original one didn't)).
We fix this problem by freezing the condition to ensure we don't introduce UB.
We will now transform the following:
while (...) {
if (C) { A }
else { B }
}
Into:
C' = freeze(C)
if (C') {
while (...) { A }
} else {
while (...) { B }
}
This patch fixes the root cause of the following bug reports (which use the old loop unswitch, but can be reproduced with minor changes in the code and -enable-nontrivial-unswitch):
- https://llvm.org/bugs/show_bug.cgi?id=27506
- https://llvm.org/bugs/show_bug.cgi?id=31652
Reviewers: reames, majnemer, chenli, sanjoy, hfinkel
Reviewed By: reames
Subscribers: hiraditya, jvesely, nhaehnle, filcab, regehr, trentxintong, nlopes, llvm-commits, mzolotukhin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D29015
If we deduplicate OpenMP runtime calls we have multiple `ident_t*` that
represent information like source location. So far, we simply kept the
one used by the replacement call. However, as exposed by PR44893, that
can cause problems if we have stack allocated `ident_t` objects. While
we need to revisit the use of these as well, it is clear that we
eventually want to merge source location information in some way. With
this patch we add the infrastructure to do so but without doing the
actual merge. Instead we pick a global `ident_t` from the replaced
calls, if possible, or create a new one with an unknown location
instead.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D74925
body
We started seeing cases where ARC optimizer would move retain calls into
loop bodies, causing imbalance in the number of retain and release
calls, after changes were made to delete inert ARC calls since the inert
calls that used to block code motion are gone.
Fix the bug by setting the CFG hazard flag when visiting a loop header.
rdar://problem/56908836
Summary:
Replacing uses of IV outside of the loop is likely generally useful,
but `rewriteLoopExitValues()` is cautious, and if it isn't told to always
perform the replacement, and there are hard uses of IV in loop,
it doesn't replace.
In [[ https://bugs.llvm.org/show_bug.cgi?id=44668 | PR44668 ]],
that prevents `-indvars` from replacing uses of induction variable
after the loop, which might be one of the optimization failures
preventing that code from being vectorized.
Instead, now that the cost model is fixed, i believe we should be
a little bit more optimistic, and also perform replacement
if we believe it is within our budget.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=44668 | PR44668 ]].
Reviewers: reames, mkazantsev, asbirlea, fhahn, skatkov
Reviewed By: mkazantsev
Subscribers: nikic, hiraditya, zzheng, javed.absar, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73501
Summary:
In future patches`SCEVExpander::isHighCostExpansionHelper()` will respect the budget allocated by performing TTI cost modelling.
This is a fully NFC patch to make things reviewable.
Reviewers: reames, mkazantsev, wmi, sanjoy
Reviewed By: mkazantsev
Subscribers: hiraditya, zzheng, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73705
Summary:
Future patches will make use of TTI to perform cost-model-driven `SCEVExpander::isHighCostExpansionHelper()`
This is a fully NFC patch to make things reviewable.
Reviewers: reames, mkazantsev, wmi, sanjoy
Reviewed By: mkazantsev
Subscribers: hiraditya, zzheng, javed.absar, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73704
This reverts commit 8d22100f66.
There was a functional regression reported (https://bugs.llvm.org/show_bug.cgi?id=44996). I'm not actually sure the patch is wrong, but I don't have time to investigate currently, and this line of work isn't something I'm likely to get back to quickly.
Much like with reassociateShiftAmtsOfTwoSameDirectionShifts(),
as input, we have the following pattern:
icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0
We want to rewrite that as:
icmp eq/ne (and (x shift (Q+K)), y), 0 iff (Q+K) u< bitwidth(x)
While we know that originally (Q+K) would not overflow
(because 2 * (N-1) u<= iN -1), we may have looked past extensions of
shift amounts. so it may now overflow in smaller bitwidth.
To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow would happen, (Q+K) u< bitwidth(x) check would be bogus.
https://bugs.llvm.org/show_bug.cgi?id=44802
As input, we have the following pattern:
Sh0 (Sh1 X, Q), K
We want to rewrite that as:
Sh x, (Q+K) iff (Q+K) u< bitwidth(x)
While we know that originally (Q+K) would not overflow
(because 2 * (N-1) u<= iN -1), we may have looked past extensions of
shift amounts. so it may now overflow in smaller bitwidth.
To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow would happen, (Q+K) u< bitwidth(x) check would be bogus.
https://bugs.llvm.org/show_bug.cgi?id=44802
Code duplication (subsequently removed by refactoring) allowed
a logic discrepancy to creep in here.
We were being conservative about creating a vector binop -- but
not a vector cmp -- in the case where a vector op has the same
estimated cost as the scalar op. We want to be more aggressive
here because that can allow other combines based on reduced
instruction count/uses.
We can reverse the transform in DAGCombiner (potentially with a
more accurate cost model) if this causes regressions.
AFAIK, this does not conflict with InstCombine. We have a
scalarize transform there, but it relies on finding a constant
operand or a matching insertelement, so that means it eliminates
an extractelement from the sequence (so we won't have 2 extracts
by the time we get here if InstCombine succeeds).
Differential Revision: https://reviews.llvm.org/D75062
This patch adds bindings to C and Go for
addCoroutinePassesToExtensionPoints, which is used to add coroutine
passes to the correct locations in PassManagerBuilder.
Differential Revision: https://reviews.llvm.org/D51642
Summary:
There is no need to write out gcdas when forking because we can just reset the counters in the parent process.
Let say a counter is N before the fork, then fork and this counter is set to 0 in the child process.
In the parent process, the counter is incremented by P and in the child process it's incremented by C.
When dump is ran at exit, parent process will dump N+P for the given counter and the child process will dump 0+C, so when the gcdas are merged the resulting counter will be N+P+C.
About exec** functions, since the current process is replaced by an another one there is no need to reset the counters but just write out the gcdas since the counters are definitely lost.
To avoid to have lists in a bad state, we just lock them during the fork and the flush (if called explicitely) and lock them when an element is added.
Reviewers: marco-c
Reviewed By: marco-c
Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits, sylvestre.ledru
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D74953
This version fixes a buildbot failure cause by picking the wrong insert
point for XORs. We cannot pick the XOR binary operator as insert point,
as it is not guaranteed that both input operands for the overflow
intrinsic are defined before it.
This reverts the revert commit
c7fc0e5da6.
Add a map from BasicBlocks to overlap intervals. For partial writes, we
can keep track of those in IOLs. We only add candidates that are valid
for eliminations.
Reviewers: dmgreen, bryant, asbirlea, Tyker
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D73757
Summary:
Blocks in a loop can be in any order as long as the loop header is the
first block in Blocks.
With some order of Blocks, cloneLoopWithPreheader would trigger the
assertion in addBasicBlockToLoop.
Example:
define void @test(i64 %N) {
preheader.i:
br label %header.i
header.i:
%i = phi i64 [ 0, %preheader.i ], [ %inc.i, %latch.i ]
br label %header.j
header.j:
%j = phi i64 [ 0, %header.i ], [ %inc.j, %latch.j ]
br label %header.k
header.k:
%k = phi i64 [ 0, %header.j ], [ %inc.k, %latch.k ]
call void @baz(i64 %i, i64 %j, i64 %k)
br label %latch.k
latch.k:
%inc.k = add nsw i64 %k, 1
%cmp.k = icmp slt i64 %inc.k, %N
br i1 %cmp.k, label %header.k, label %latch.j
latch.j:
%inc.j = add nsw i64 %j, 1
%cmp.j = icmp slt i64 %inc.j, %N
br i1 %cmp.j, label %header.j, label %latch.i
latch.i:
%inc.i = add nsw i64 %i, 1
%cmp.i = icmp slt i64 %inc.i, %N
br i1 %cmp.i, label %header.i, label %exit.i
exit.i:
ret void
}
declare void @baz(i64, i64, i64)
If the blocks of loop-i is in the order: header.i, latch.k, header.k,
header.j, latch.j, latch.i,
then cloneLoopWithPreheader would trigger the assertion in
addBasicBlockToLoop
assert(contains(SameHeader) && getHeader() == SameHeader->getHeader() &&
"Incorrect LI specified for this loop!");
As latch.k is in both loop-j and loop-k, it would be set as the header
of both loops after adding latch.k.
If we update loop headers during cloning blocks, then after adding
header.k,
the header of loop-k would be updated with header.k,
while the header of loop-j stays as latch.k.
When adding header.j, SameHeader is loop-k, SameHeader->getHeader() is
header.k, but getHeader() is latch.k, which trigger the assertion.
Reviewer: jdoerfert, Meinersbur, fhahn, kbarton, hfinkel, bmahjour,
etiotto
Reviewed By: Meinersbur
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D74382
This should be the last step in the current cleanup.
Follow-ups should resolve the TODO about cost calc
and enable the more general case where we extract
different elements.
This fixes a small mistake from D72944: The worklist add should
happen before assigning the new operand, not after.
In case an actual replacement happens, the old operand needs to
be added for DCE. If no actual replacement happens, then old/new
are the same, so it doesn't matter.
This drops one iteration from the annotated test case.
Followup to D73919 with another batch of replacements of
setOperand() -> replaceOperand(), to make sure the old
operand gets DCEd right away.
Differential Revision: https://reviews.llvm.org/D74932
ToVectorTy is defined and used in multiple places. Hoist it to
VectorUtils.h to avoid duplication and improve re-usability.
Reviewers: rengolin, hsaito, Ayal, gilr, fpetrogalli
Reviewed By: fpetrogalli
Differential Revision: https://reviews.llvm.org/D74959
This changes the SimplifyLibCalls utility to accept an IRBuilderBase,
which allows us to pass through the IRBuilder used by InstCombine.
This will ensure that new instructions get added to the worklist.
The annotated test-case drops from 4 to 2 InstCombine iterations thanks
to this.
To achieve this, I'm adding an IRBuilderBase::OperandBundlesGuard,
which is basically the same as the existing InsertPointGuard and
FastMathFlagsGuard, but for operand bundles. Also add a
setDefaultOperandBundles() method so these can be set outside the
constructor.
Differential Revision: https://reviews.llvm.org/D74792
Can be used like
-debug-counter=dse-memoryssa-skip=10,dse-memoryssa-counter-count=20
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72147
For tracked globals that are unknown after solving, we expect all
non-store uses to be replaced.
This is a follow-up to f8045b250d, which removed forcedconstant.
We should not mark unknown loads as overdefined, as they either load
from an unknown pointer or an undef global. Restore the original logic
for loads.
Summary:
After updating cost model in AMDGPU target (47a5c36b37) the pass started to
ignore some BBs since they got all instructions estimated as free.
Reviewers: arsenm, chandlerc, nhaehnle
Reviewed By: nhaehnle
Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74825
We can look through calls with `returned` argument attributes when we
collect subsuming positions. This allows us to get existing attributes
from more places.
We are often interested in an assumed constant and sometimes it has to
be an integer constant. Before we only looked for the latter, now we can
ask for either.
If we propagate function pointers across function boundaries we can
create new call edges. These need to be represented in the CG if we run
as a CGSCC pass. In the new pass manager that is currently not handled
by the CallGraphUpdater so we need to prevent the situation for now.
We usually will ask for liveness of an argument anyway so we ended up
lazily creating the attribute anyway. However, that is not always the
case and even if it is we should go the eager route here. Various tests
show how this can improve the outcome. One test exposed a problem with
type mismatches between argument and call site argument, a fix is
included. For liveness various more tests were added as well.
If a function pointer is casted into a different type the resulting
expression can be a constant. If so, it can be used multiple times which
cannot be handled by the AbstractCallSite constructor alone. Instead, we
follow the cast expression uses now explicitly during the call site
traversal.
In builds with assertions enabled (!NDEBUG), IndVarSimplify does an
additional query to ScalarEvolution which may change future SCEV queries
since it fills the internal cache differently. The result is actually
only used with the -verify-indvars command line option. We fix the issue
by only calling SE->getBackedgeTakenCount(L) if -verify-indvars is
enabled such that only -verify-indvars shows the behavior, but not debug
builds themselves. Also add a remark to the description of
-verify-indvars about this behavior.
Fixes llvm.org/PR44815
Differential Revision: https://reviews.llvm.org/D74810
Instcombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match
the expected pattern in CodeGenPerpare via UAddWithOverflow.
This causes a regression over Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV
This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D74228
Summary:
Depends on https://reviews.llvm.org/D71900.
The fourth in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure. This patch implements
'coro-cleanup'.
No existing regression tests check the behavior of coro-cleanup on its
own, so this patch adds one. (A test named 'coro-cleanup.ll' exists, but
it relies on the entire coroutines pipeline being run. It's updated to
test the new pass manager in the 5th patch of this series.)
Reviewers: GorNishanov, lewissbaker, chandlerc, junparser, deadalnix, wenlei
Reviewed By: wenlei
Subscribers: wenlei, EricWF, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71901
Essentially, fold OrderedBasicBlock into BasicBlock, and make it
auto-invalidate the instruction ordering when new instructions are
added. Notably, we don't need to invalidate it when removing
instructions, which is helpful when a pass mostly delete dead
instructions rather than transforming them.
The downside is that Instruction grows from 56 bytes to 64 bytes. The
resulting LLVM code is substantially simpler and automatically handles
invalidation, which makes me think that this is the right speed and size
tradeoff.
The important change is in SymbolTableTraitsImpl.h, where the numbering
is invalidated. Everything else should be straightforward.
We probably want to implement a fancier re-numbering scheme so that
local updates don't invalidate the ordering, but I plan for that to be
future work, maybe for someone else.
Reviewed By: lattner, vsk, fhahn, dexonsmith
Differential Revision: https://reviews.llvm.org/D51664
Fixes https://bugs.llvm.org/show_bug.cgi?id=44922 (caused by 4698bf145d)
ThreadThroughTwoBasicBlocks assumes PredBBBranch is conditional. The following code can segfault.
AddPHINodeEntriesForMappedBlock(PredBBBranch->getSuccessor(1), PredBB, NewBB,
ValueMapping);
We can also allow unconditional PredBB, but the produced code is not
better.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D74747
The index of an ExtractElementInst is not guaranteed to be a
ConstantInt. It can be any integer value. Check explicitly for
ConstantInts.
The new test cases illustrate scenarios where we crash without
this patch. I've also added another test case to check the matching
of extractelement vector ops works.
Reviewers: RKSimon, ABataev, dtemirbulatov, vporpo
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D74758
When simplifying demanded bits, we currently only report the
instruction on which SimplifyDemandedBits was called as changed.
However, this is a recursive call, and the actually modified
instruction will usually be further up the chain. Additionally,
all the intermediate instructions should also be revisited,
as additional combines may be possible after the demanded bits
simplification. We fix this by explicitly adding them back to the
worklist.
Differential Revision: https://reviews.llvm.org/D72944
The select-of-cttz transform can currently duplicate cttz intrinsics
and zext/trunc ops. The cause is that it unnecessarily duplicates
the intrinsic and the zext/trunc when setting the "undef_on_zero"
flag to false. However, it's always legal to set the flag from true
to false, so we can make this replacement even if there are extra users.
Differential Revision: https://reviews.llvm.org/D74685
Fix for https://bugs.llvm.org/show_bug.cgi?id=44754. We already have
a fold that converts icmp (and (ashr X, C3), C2), C1 into
icmp (and C2'), C1', but it imposed overly strict requirements on the
transform.
Relax this by checking that both C2 and C1 don't shift out bits
(in a signed sense) when forming the new constants.
Alive proofs (https://rise4fun.com/Alive/PTz0):
Name: ashr_legal
Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) == C1
%a = ashr i16 %x, C3
%b = and i16 %a, C2
%c = icmp i16 %b, C1
=>
%d = and i16 %x, C2 << C3
%c = icmp i16 %d, C1 << C3
Name: ashr_shiftout_eq
Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) != C1
%a = ashr i16 %x, C3
%b = and i16 %a, C2
%c = icmp eq i16 %b, C1
=>
%c = false
Note that >> corresponds to ashr here. The case of an equality
comparison has some special handling in this transform, because
it will form to a true/false result if the condition on the comparison
constant it violated.
Differential Revision: https://reviews.llvm.org/D74294
This patch adds a simplification if an OR weakens the overflow condition
for umul.with.overflow by treating any non-zero result as overflow. In that
case, we overflow if both umul.with.overflow operands are != 0, as in that
case the result can only be 0, iff the multiplication overflows.
Code like this is generated by code using __builtin_mul_overflow with
negative integer constants, e.g.
bool test(unsigned long long v, unsigned long long *res) {
return __builtin_mul_overflow(v, -4775807LL, res);
}
```
----------------------------------------
Name: D74141
%res = umul_overflow {i8, i1} %a, %b
%mul = extractvalue {i8, i1} %res, 0
%overflow = extractvalue {i8, i1} %res, 1
%cmp = icmp ne %mul, 0
%ret = or i1 %overflow, %cmp
ret i1 %ret
=>
%t0 = icmp ne i8 %a, 0
%t1 = icmp ne i8 %b, 0
%ret = and i1 %t0, %t1
ret i1 %ret
%res = umul_overflow {i8, i1} %a, %b
%mul = extractvalue {i8, i1} %res, 0
%cmp = icmp ne %mul, 0
%overflow = extractvalue {i8, i1} %res, 1
Done: 1
Optimization is correct!
```
Reviewers: nikic, lebedev.ri, spatel, Bigcheese, dexonsmith, aemerson
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D74141
Summary:
Depends on https://reviews.llvm.org/D71899.
The third in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure. This patch implements 'coro-elide'.
The new pass manager infrastructure does not implicitly repeat CGSCC
pass pipelines when a function is devirtualized, and so the tests
for the new pass manager that rely on that behavior now explicitly
specify `repeat<2>`.
Reviewers: GorNishanov, lewissbaker, chandlerc, jdoerfert, junparser, deadalnix, wenlei
Reviewed By: wenlei
Subscribers: wenlei, EricWF, Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71900
Summary:
This patch has four dependencies:
1. The first in this series of patches that implement coroutine passes in the
new pass manager: https://reviews.llvm.org/D71898.
2. A patch that introduces an API for CGSCC passes to add new reference
edges to a `LazyCallGraph`, `updateCGAndAnalysisManagerForCGSCCPass`:
https://reviews.llvm.org/D72025.
3. A patch that introduces a `CallGraphUpdater` helper class that is
capable of mutating internal `LazyCallGraph` state in order to insert
new function nodes into a specific SCC: https://reviews.llvm.org/D70927.
4. And finally, a small edge case fix for updating `LazyCallGraph` that
patch 3 above happens to run into: https://reviews.llvm.org/D72226.
This is the second in a series of patches that ports the LLVM coroutines
passes to the new pass manager infrastructure. This patch implements
'coro-split'.
Some notes:
* Using the new CGSCC pass manager resulted in IR being printed in the
reverse order in some tests. To prevent FileCheck checks from failing due
to these reversed orders, this patch splits up test files that test
multiple different coroutine functions: specifically
coro-alloc-with-param.ll, coro-split-eh.ll, and coro-eh-aware-edge-split.ll.
* CoroSplit.cpp contained 2 overloads of `splitCoroutine`, one of which
dispatched to the other based on the coroutine ABI being used (C++20
switch-based versus Swift returned-continuation-based). I found this
confusing, especially with the additional branching based on `CallGraph`
vs. `LazyCallGraph`, so I removed the ABI-checking overload of
`splitCoroutine`.
Reviewers: GorNishanov, lewissbaker, chandlerc, jdoerfert, junparser, deadalnix, wenlei
Reviewed By: wenlei
Subscribers: wenlei, qcolombet, EricWF, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71899
Summary:
The first in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure. This patch implements
'coro-early'.
NB: All coroutines passes begin by checking that coroutine intrinsics are
declared within the LLVM IR module they're operating on. To do so, they call
`coro::declaresIntrinsics`. The next 3 patches in this series, which add new
pass manager implementations of the 'coro-split', 'coro-elide', and
'coro-cleanup' passes, use a similar pattern as the one used here: a static
function is shared across both old and new passes to detect if relevant
coroutine intrinsics are delcared. To make this pattern easier to read, this
patch adds `const` keywords to the parameters of `coro::declaresIntrinsics`.
Reviewers: GorNishanov, lewissbaker, junparser, chandlerc, deadalnix, wenlei
Reviewed By: wenlei
Subscribers: ychen, wenlei, EricWF, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71898
Relative to the original commit, this fixes some warnings,
and is based on the deletion of the IRBuilder copy constructor
in D74693. The automatic copy constructor would no longer be
safe.
-----
Related llvm-dev thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/138951.html
This patch moves the IRBuilder from templating over the constant
folder and inserter towards making both of these virtual.
There are a couple of motivations for this:
1. It's not possible to share code between use-sites that use
different IRBuilder folders/inserters (short of templating the code
and moving it into headers).
2. Methods currently defined on IRBuilderBase (which is not templated)
do not use the custom inserter, resulting in subtle bugs (e.g.
incorrect InstCombine worklist management). It would be possible to
move those into the templated IRBuilder, but...
3. The vast majority of the IRBuilder implementation has to live
in the header, because it depends on the template arguments.
4. We have many unnecessary dependencies on IRBuilder.h,
because it is not easy to forward-declare. (Significant parts of
the backend depend on it via TargetLowering.h, for example.)
This patch addresses the issue by making the following changes:
* IRBuilderDefaultInserter::InsertHelper becomes virtual.
IRBuilderBase accepts a reference to it.
* IRBuilderFolder is introduced as a virtual base class. It is
implemented by ConstantFolder (default), NoFolder and TargetFolder.
IRBuilderBase has a reference to this as well.
* All the logic is moved from IRBuilder to IRBuilderBase. This means
that methods can in the future replace their IRBuilder<> & uses
(or other specific IRBuilder types) with IRBuilderBase & and thus
be usable with different IRBuilders.
* The IRBuilder class is now a thin wrapper around IRBuilderBase.
Essentially it only stores the folder and inserter and takes care
of constructing the base builder.
What this patch doesn't do, but should be simple followups after this change:
* Fixing use of the inserter for creation methods originally defined
on IRBuilderBase.
* Replacing IRBuilder<> uses in arguments with IRBuilderBase, where useful.
* Moving code from the IRBuilder header to the source file.
From the user perspective, these changes should be mostly transparent:
The only thing that consumers using a custom inserted may need to do is
inherit from IRBuilderDefaultInserter publicly and mark their InsertHelper
as public.
Differential Revision: https://reviews.llvm.org/D73835
D73835 will make IRBuilder no longer trivially copyable. This patch
deletes the copy constructor in advance, to separate out the breakage.
Currently, the IRBuilder copy constructor is usually used by accident,
not by intention. In rG7c362b25d7a9 I've fixed a number of cases where
functions accepted IRBuilder rather than IRBuilder &, thus performing
an unnecessary copy. In rG5f7b92b1b4d6 I've fixed cases where an
IRBuilder was copied, while an InsertPointGuard should have been used
instead.
The only non-trivial use of the copy constructor is the
getIRBForDbgInsertion() helper, for which I separated construction and
setting of the insertion point in this patch.
Differential Revision: https://reviews.llvm.org/D74693
getOperationCost() is not the cost we wanted; that's not the
throughput value that the rest of the calculation uses.
We may want to switch everything in this code to use the
getInstructionThroughput() wrapper to avoid these kinds of
problems, but I'll look at that as a follow-up because that
can create other logical diffs via using optional parameters
(we'd need to speculatively create the vector instruction to
make a fair(er) comparison).
Rather than mixing creation of new instructions and in-place
modification here, create a new log2 intrinsic. This should be
NFC apart from worklist order changes.
Related llvm-dev thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/138951.html
This patch moves the IRBuilder from templating over the constant
folder and inserter towards making both of these virtual.
There are a couple of motivations for this:
1. It's not possible to share code between use-sites that use
different IRBuilder folders/inserters (short of templating the code
and moving it into headers).
2. Methods currently defined on IRBuilderBase (which is not templated)
do not use the custom inserter, resulting in subtle bugs (e.g.
incorrect InstCombine worklist management). It would be possible to
move those into the templated IRBuilder, but...
3. The vast majority of the IRBuilder implementation has to live
in the header, because it depends on the template arguments.
4. We have many unnecessary dependencies on IRBuilder.h,
because it is not easy to forward-declare. (Significant parts of
the backend depend on it via TargetLowering.h, for example.)
This patch addresses the issue by making the following changes:
* IRBuilderDefaultInserter::InsertHelper becomes virtual.
IRBuilderBase accepts a reference to it.
* IRBuilderFolder is introduced as a virtual base class. It is
implemented by ConstantFolder (default), NoFolder and TargetFolder.
IRBuilderBase has a reference to this as well.
* All the logic is moved from IRBuilder to IRBuilderBase. This means
that methods can in the future replace their IRBuilder<> & uses
(or other specific IRBuilder types) with IRBuilderBase & and thus
be usable with different IRBuilders.
* The IRBuilder class is now a thin wrapper around IRBuilderBase.
Essentially it only stores the folder and inserter and takes care
of constructing the base builder.
What this patch doesn't do, but should be simple followups after this change:
* Fixing use of the inserter for creation methods originally defined
on IRBuilderBase.
* Replacing IRBuilder<> uses in arguments with IRBuilderBase, where useful.
* Moving code from the IRBuilder header to the source file.
From the user perspective, these changes should be mostly transparent:
The only thing that consumers using a custom inserted may need to do is
inherit from IRBuilderDefaultInserter publicly and mark their InsertHelper
as public.
Differential Revision: https://reviews.llvm.org/D73835
This includes a fix for cases where things get marked as overdefined in
ResolvedUndefsIn, but we later discover a constant. To avoid crashing,
we consistently bail out on overdefined values in the visitors. This is
similar to the previous behavior with forcedconstant.
This reverts the revert commit 02b72f564c.
In addition to a single bit per memory locations, e.g., globals and
arguments, we now collect more information about the actual accesses,
e.g., what instruction caused it, was it a read/write/read+write, and
what the underlying base pointer was. Follow up patches will make
explicit use of this.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D73527
While the function return updateImpl did only look at call sites the
manifest method looked at return values. If we don't do this during the
updateImpl we might create new abstract attributes during manifest. This
is a problem when it comes to liveness information.
This caused an error when passes iterated over cached assumptions in the
tracker and assumed them to be `null` or an instruction. I failed to
create a test case so far.
In addition to memory behavior attributes (readonly/writeonly) we now
derive memory location attributes (argmemonly/inaccessiblememonly/...).
The former is part of AAMemoryBehavior and the latter part of
AAMemoryLocation. While they are similar in nature it got messy when
they were put in a single AA. Location attributes for arguments and
floating values will follow later.
Note that both memory attributes kinds can derive readnone. If there are
no accesses AAMemoryBehavior will derive readnone. If there are accesses
but only to stack (=local) locations AAMemoryLocation will derive
readnone.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D73426
Due to the genericValueTraversal we might visit values for which we did
not create an AAValueConstantRange object, e.g., as they are behind a
PHI or select or call with `returned` argument. As a consequence we need
to validate the types as we are about to query AAValueConstantRange for
operands.
Summary:
Potential fix for: https://bugs.llvm.org/show_bug.cgi?id=44889 and https://bugs.llvm.org/show_bug.cgi?id=44408
In the legacy pass manager, loop rotate need not compute MemorySSA when not being in the same loop pass manager with other loop passes.
There isn't currently a way to differentiate between the two cases, so this attempts to limit the usage in LoopRotate to only update MemorySSA when the analysis is already available.
The side-effect of this is that it will split the Loop pipeline.
This issue does not apply to the new pass manager, where we have a flag specifying if all loop passes in that loop pass manager preserve MemorySSA.
Reviewers: dmgreen, fedor.sergeev, nikic
Subscribers: Prazek, hiraditya, george.burgess.iv, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74574
replaceDbgDeclare is used to update the descriptions of stack variables
when they are moved (e.g. by ASan or SafeStack). A side effect of
replaceDbgDeclare is that it moves dbg.declares around in the
instruction stream (typically by hoisting them into the entry block).
This behavior was introduced in llvm/r227544 to fix an assertion failure
(llvm.org/PR22386), but no longer appears to be necessary.
Hoisting a dbg.declare generally does not create problems. Usually,
dbg.declare either describes an argument or an alloca in the entry
block, and backends have special handling to emit locations for these.
In optimized builds, LowerDbgDeclare places dbg.values in the right
spots regardless of where the dbg.declare is. And no one uses
replaceDbgDeclare to handle things like VLAs.
However, there doesn't seem to be a positive case for moving
dbg.declares around anymore, and this reordering can get in the way of
understanding other bugs. I propose getting rid of it.
Testing: stage2 RelWithDebInfo sanitized build, check-llvm
rdar://59397340
Differential Revision: https://reviews.llvm.org/D74517
binop (extelt X, C), (extelt Y, C) --> extelt (binop X, Y), C
This is a transform that has been considered for canonicalization (instcombine)
in the past because it reduces instruction count. But as shown in the x86 tests,
it's impossible to know if it's profitable without a cost model. There are many
potential target constraints to consider.
We have implemented similar transforms in the backend (DAGCombiner and
target-specific), but I don't think we have this exact fold there either (and if
we did it in SDAG, it wouldn't work across blocks).
Note: this patch was intended to handle the more general case where the extract
indexes do not match, but it got too big, so I scaled it back to this pattern
for now.
Differential Revision: https://reviews.llvm.org/D74495
This reverts commit 61b35e4111.
This commit causes a timeout in chromium builds; likely to have a
similar cause to the previous timeout issue caused by this commit (see
6ded69f294 for more details). It is possible that there is no way to
fix this bug that will not cause this issue; further investigations as
to the efficiency of handling large amounts of debug info will be
necessary.
Reapply 8a56d64d76 with minor fixes.
The problem was that cancellation can cause new edges to the parallel
region exit block which is not outlined. The CodeExtractor will encode
the information which "exit" was taken as a return value. The fix is to
ensure we do not return any value from the outlined function, to prevent
control to value conversion we ensure a single exit block for the
outlined region.
This reverts commit 3aac953afa.
In order to fix PR44560 and to prepare for loop transformations we now
finalize a function late, which will also do the outlining late. The
logic is as before but the actual outlining step happens now after the
function was fully constructed. Once we have loop transformations we
can apply them in the finalize step before the outlining.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D74372
We used coarse-grained liveness before, thus we looked if the
instruction was executed, but we did not use fine-grained liveness,
hence if the instruction was needed or could be deleted even if the
surrounding ones are live. This patches introduces this level of
liveness checks together with other liveness queries, e.g., for uses.
For more control we enforce that all liveness queries go through the
Attributor.
Test have been adjusted to reflect the changes or augmented to prevent
deletion of the parts we want to check.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D73313
If we have a replacement for a value, via AAValueSimplify, the original
value will lose all its uses. Thus, as long as a value is simplified we
can skip the uses in checkForAllUses, given that these uses are
transitive uses for the simplified version and will therefore affect the
simplified version as necessary.
Since this allowed us to remove calls without side-effects and a known
return value, we need to make sure not to eliminate `musttail` calls.
Those we keep around, or later remove the entire `musttail` call chain.
We relied on wouldInstructionBeTriviallyDead before but that functions
does not take assumed information, especially for calls, into account.
The replacement, AAIsDead::isAssumeSideEffectFree, does.
This change makes AAIsDeadCallSiteReturn more complex as we can have
a dead call or only dead users.
The test have been modified to include a side effect where there was
none in order to keep the coverage.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D73311
Various parts of the LLVM code generator assume that the address
argument of a dbg.declare is not a `ptrtoint`-of-alloca. ASan breaks
this assumption, and this results in local variables sometimes being
unavailable at -O0.
GlobalISel, SelectionDAG, and FastISel all do not appear to expect
dbg.declares to have a `ptrtoint` as an operand. This means that they do
not place entry block allocas in the usual side table reserved for local
variables available in the whole function scope. This isn't always a
problem, as LLVM can try to lower the dbg.declare to a DBG_VALUE, but
those DBG_VALUEs can get dropped for all the usual reasons DBG_VALUEs
get dropped. In the ObjC test case I'm looking at, the cause happens to
be that `replaceDbgDeclare` has hoisted dbg.declares into the entry
block, causing LiveDebugValues to "kill" the DBG_VALUEs because the
lexical dominance check fails.
To address this, I propose:
1) Have ASan (always) pass an alloca to dbg.declares (this patch). This
is a narrow bugfix for -O0 debugging.
2) Make replaceDbgDeclare not move dbg.declares around. This should be a
generic improvement for optimized debug info, as it would prevent the
lexical dominance check in LiveDebugValues from killing as many
variables.
This means reverting llvm/r227544, which fixed an assertion failure
(llvm.org/PR22386) but no longer seems to be necessary. I was able to
complete a stage2 build with the revert in place.
rdar://54688991
Differential Revision: https://reviews.llvm.org/D74369
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.
I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.
NFC.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74482
This version includes a fix for a set of crashes caused by marking
values depending on a yet unknown & tracked call as overdefined.
In some cases, we would later discover that the call has a constant
result and try to mark a user of it as constant, although it was already
marked as overdefined. Most instruction handlers bail out early if the
instruction is already overdefined. But that is not necessary for
CastInsts for example. By skipping values that depend on skipped
calls, we resolve the crashes and also improve the precision in some
cases (see resolvedundefsin-tracked-fn.ll).
Note that we may not skip PHI nodes that may depend on a skipped call,
but they can be safely marked as overdefined, as we bail out early if
the PHI node is overdefined.
This reverts the revert commit
a74b31a3e9cd844c7ce2087978568e3f5ec8519.
Summary:
Passes ORE, BPI, BFI are not being preserved by Loop passes, hence it
is incorrect to retrieve these passes as cached.
This patch makes the loop passes in question compute a new instance.
In some of these cases, however, it may be beneficial to change the Loop pass to
a Function pass instead, similar to the change for LoopUnrollAndJam.
Reviewers: chandlerc, dmgreen, jdoerfert, reames
Subscribers: mehdi_amini, hiraditya, zzheng, steven_wu, dexonsmith, Whitney, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72891
This reverts commit 636c93ed11.
The original patch caused build failures on TSan buildbots. Commit 6ded69f294
fixes this issue by reducing the rate at which empty debug intrinsics
propagate, reducing the memory footprint and preventing a fatal spike.
This patch is a fix following the revert of 72ce759
(https://reviews.llvm.org/rG72ce759928e6dfee6a9efa310b966c19722352ba)
and fixes the failure that it caused.
The above patch failed on the Thread Sanitizer buildbot with an out of
memory error. After an investigation, the cause was identified as an
explosion in debug intrinsics while running the Jump Threading pass on
ModuleMap.ll. The above patched prevented debug intrinsics from being
dropped when their Basic Block was deleted due to being "empty". In this
case, one of the functions in ModuleMap.ll had (after many optimization
passes) a very large number of debug intrinsics representing a set of
repeatedly inlined variables. Previously the vast majority of these were
silently dropped during Jump Threading when their blocks were deleted,
but as of the above patch they survived for longer, causing a large
increase in the number of debug intrinsics. These intrinsics were then
repeatedly cloned by the Jump Threading pass as edges were threaded,
multiplying the intrinsic count further. The memory consumed by this
process spiralled out of control, crashing the buildbot that uses TSan
(which has an estimated 5-10x memory overhead compared to non-sanitized
builds).
This patch adds RemoveRedundantDbgInstrs to the Jump Threading pass, in
order to reduce the number of debug intrinsics down to a manageable
amount in cases where many intrinsics for the same variable end up
bunched together contiguously, as in this case.
Differential Revision: https://reviews.llvm.org/D73054
This causes a crash for the reproducer below
enum { a };
enum b { c, d };
e;
static _Bool g(struct f *h, enum b i) {
i &&j();
return a;
}
static k(char h, enum b i) {
_Bool l = g(e, i);
l;
}
m(h) {
k(h, c);
g(h, d);
}
This reverts commit aadb635e04.
This is apparently worse than 1-byte alignment. This does not attempt
to decompose 2-byte aligned wide stores, but will stop trying to
produce them.
Also fix bug in LoadStoreVectorizer which was decreasing the alignment
and vectorizing stack accesses. It was assuming a stack object was an
alloca that could have its base alignment changed, which is not true
if the pointer is derived from a function argument.
As an approximation to a dead edge we can check if the terminator is
dead. If so, the corresponding operand use in a PHI node is dead even if
the PHI node itself is not.
This restores commit 748bb5a0f1, along
with a fix for a Chromium test suite build issue (and a new test for
that case).
Differential Revision: https://reviews.llvm.org/D73242
The changeXXXAfterManifest functions are better suited to deal with
changes so we should prefer them. These functions also recursively
delete dead instructions which is why we see test changes.
This is a followup to D73803, which uses the replaceOperand()
helper in more places.
This should be NFC apart from changes to worklist order.
Differential Revision: https://reviews.llvm.org/D73919
This patch removes forcedconstant to simplify things for the
move to ValueLattice, which includes constant ranges, but no
forced constants.
This patch removes forcedconstant and changes ResolvedUndefsIn
to mark instructions with unknown operands as overdefined. This
means we do not do simplifications based on undef directly in SCCP
any longer, but this seems to hardly come up in practice (see stats
below), presumably because InstCombine & others take care
of most of the relevant folds already.
It is still beneficial to keep ResolvedUndefIn, as it allows us delaying
going to overdefined until we propagated all known information.
I also built MultiSource, SPEC2000 and SPEC2006 and compared
sccp.IPNumInstRemoved and sccp.NumInstRemoved. It looks like the impact
is quite low:
Tests: 244
Same hash: 238 (filtered out)
Remaining: 6
Metric: sccp.IPNumInstRemoved
Program base patch diff
test-suite...arks/VersaBench/dbms/dbms.test 4.00 3.00 -25.0%
test-suite...TimberWolfMC/timberwolfmc.test 38.00 34.00 -10.5%
test-suite...006/453.povray/453.povray.test 158.00 155.00 -1.9%
test-suite.../CINT2000/176.gcc/176.gcc.test 668.00 668.00 0.0%
test-suite.../CINT2006/403.gcc/403.gcc.test 1209.00 1209.00 0.0%
test-suite...arks/mafft/pairlocalalign.test 76.00 76.00 0.0%
Tests: 244
Same hash: 238 (filtered out)
Remaining: 6
Metric: sccp.NumInstRemoved
Program base patch diff
test-suite...arks/mafft/pairlocalalign.test 185.00 175.00 -5.4%
test-suite.../CINT2006/403.gcc/403.gcc.test 2059.00 2056.00 -0.1%
test-suite.../CINT2000/176.gcc/176.gcc.test 2358.00 2357.00 -0.0%
test-suite...006/453.povray/453.povray.test 317.00 317.00 0.0%
test-suite...TimberWolfMC/timberwolfmc.test 12.00 12.00 0.0%
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61314
This reverts commit d0c4d4fe09.
Revert "[DSE,MSSA] Move more passing test cases from todo to simple.ll."
This reverts commit 02266e64bb.
Revert "[DSE,MSSA] Adjust mda-with-dbg-values.ll to MSSA backed DSE."
This reverts commit 74f03e4ff0.
The variable was added to the initial commit via copy/paste of existing
code, but it wasn't actually used in the code. We can add it back with
the proper usage if/when that is needed.
As discussed in PR41083:
https://bugs.llvm.org/show_bug.cgi?id=41083
...we can assert/crash in EarlyCSE using the current hashing scheme and
instructions with flags.
ValueTracking's matchSelectPattern() may rely on overflow (nsw, etc) or
other flags when detecting patterns such as min/max/abs composed of
compare+select. But the value numbering / hashing mechanism used by
EarlyCSE intersects those flags to allow more CSE.
Several alternatives to solve this are discussed in the bug report.
This patch avoids the issue by doing simple matching of min/max/abs
patterns that never requires instruction flags. We give up some CSE
power because of that, but that is not expected to result in much
actual performance difference because InstCombine will canonicalize
these patterns when possible. It even has this comment for abs/nabs:
/// Canonicalize all these variants to 1 pattern.
/// This makes CSE more likely.
(And this patch adds PhaseOrdering tests to verify that the expected
transforms are still happening in the standard optimization pipelines.
I left this code to use ValueTracking's "flavor" enum values, so we
don't have to change the callers' code. If we decide to go back to
using the ValueTracking call (by changing the hashing algorithm
instead), it should be obvious how to replace this chunk.
Differential Revision: https://reviews.llvm.org/D74285
Summary: It attempts to devirtualize a call on alloca through vtable loads.
Reviewers: davidxl
Subscribers: mgorny, Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71308
We were checking for extra uses of the negated operand even
if we were not going to create it as part of this canonicalization.
This was showing up as a regression when we limit EarlyCSE as
proposed in D74285.
This reverts commit b54a8ec1bc.
The commit triggered debug invariance (different output with/without
-g). The patch seems to have exposed a pre-existing invariance problem
in GlobalOpt, which I'll write a bug report for.
This patch adds a first version of a MemorySSA based DSE. It is missing
a lot of features, which will get added as follow-ups, to help to keep
the review manageable.
The patch uses the following general approach: given a MemoryDef, walk
upwards to find clobbering MemoryDefs that may be killed by the
starting def. Then check that there are no uses that may read the
location of the original MemoryDef in between both MemoryDefs. A bit
more concretely:
For all MemoryDefs StartDef:
1. Get the next dominating clobbering MemoryDef (DomAccess) by walking upwards.
2. Check that there no reads between DomAccess and the StartDef by checking
all uses starting at DomAccess and walking until we see StartDef.
3. For each found DomDef, check that:
1. There are no barrier instructions between DomDef and StartDef (like
throws or stores with ordering constraints).
2. StartDef is executed whenever DomDef is executed.
3. StartDef completely overwrites DomDef.
4. Erase DomDef from the function and MemorySSA.
The patch uses a very simple approach to guarantee that no throwing
instructions are between 2 stores: We only allow accesses to stack
objects, access that are in the same basic block if the block does not
contain any throwing instructions or accesses in functions that do
not contain any throwing instructions. This will get lifted later.
Besides adding support for the missing cases, there is plenty of additional
potential for improvements as follow-up work, e.g. the way we visit stores
(could be just a traversal of the MemorySSA, rather than collecting them
up-front), using the alias information discovered during walking to optimize
the MemorySSA.
This is loosely based on D40480 by Dave Green.
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72700
This copies the DSE tests into a MSSA subdirectory to test the MemorySSA
backed DSE implementation, without disturbing the original tests.
Differential Revision: https://reviews.llvm.org/D72145
This is a minimal but important advancement over the existing code. A
cast with an operand that is only used in the cast retains the no-alias
property of the operand.
Traversing PHI nodes is natural with the genericValueTraversal but also
a bit tricky. The problem is similar to the ones we have seen in AAAlign
and AADereferenceable, namely that we continue to increase the range in
each iteration. We use a pessimistic approach here to stop the
iterations. Nevertheless, optimistic information can now be propagated
through a PHI node.
The change is performed as stated by the FIXME and the tests are
adjusted. All changes look fine to me and values can be inferred as
undef without it being an error.
Casts can be handled natively by the ConstantRange class. We do limit it
to extends for now as we assume an integer type in different locations.
A TODO and a test case with a FIXME was added to remove that restriction
in the future.
We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.
There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:
InstCombine - can't happen without TTI, but we don't want target-specific
folds there.
SDAG - too late to assist other vectorization passes; TLI is not equipped
for these kind of cost queries; limited to a single basic block.
CGP - too late to assist other vectorization passes; would need to re-implement
basic cleanups like CSE/instcombine.
SLP - doesn't fit with existing transforms; limited to a single basic block.
This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.
We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.
Differential Revision: https://reviews.llvm.org/D73480
The LoopExtractor created new functions (by definition), which violates
the restrictions of a LoopPass.
The correct implementation of this pass should be as a ModulePass.
Includes reverting rL82990 implications on the LoopExtractor.
Fixes PR3082 and PR8929.
Differential Revision: https://reviews.llvm.org/D69069
In addition to the module pass, this patch introduces a CGSCC pass that
runs the Attributor on a strongly connected component of the call graph
(both old and new PM). The Attributor was always design to be used on a
subset of functions which makes this patch mostly mechanical.
The one change is that we give up `norecurse` deduction in the module
pass in favor of doing it during the CGSCC pass. This makes the
interfaces simpler but can be revisited if needed.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D70767
Parallel regions known to be read-only, e.g., after we removed all dead
write accesses, and terminating (`willreturn`) can be removed.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69954
This adds ~27 more runtime calls to the OpenMPKinds.def file, all with
attributes. We deduplicate 16 of those automatically in function =
thread scope. And we annotate all of them automatically during the
OpenMPOpt discovery step. A test with all omp_XXXX runtime calls to
track annotation coverage is included.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69984
The OpenMPOpt pass is a CGSCC pass in which OpenMP specific
optimizations can reside.
The OpenMPOpt pass uses the OpenMPKinds.def file to identify runtime
calls and their uses. This allows targeted transformations and eases
their implementation.
This initial patch deduplicates `__kmpc_global_thread_num` and
`omp_get_thread_num` calls. We can also identify arguments that are
equivalent to such a call result and use it instead. Later we can
determine "gtid" arguments based on the use in kernel functions etc.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69930
The CallGraphUpdater is a helper that simplifies the process of updating
the call graph, both old and new style, while running an CGSCC pass.
The uses are contained in different commits, e.g. D70767.
More functionality is added as we need it.
Reviewed By: modocache, hfinkel
Differential Revision: https://reviews.llvm.org/D70927
Bionic has had `__strlen_chk` for a while. Optimizing that into a
constant is quite profitable, when possible.
Differential Revision: https://reviews.llvm.org/D74079
While D72944 also fixes https://bugs.llvm.org/show_bug.cgi?id=44541,
it does so in a more roundabout manner and there might be other
loopholes to trigger the same issue. This is a more direct fix,
that prevents the transform if the min/max is based on a
non-canonical sub X, 0 instruction.
Differential Revision: https://reviews.llvm.org/D73849
As discussed on D73919, this replaces a few cases where we were
modifying multiple operands of instructions in-place with the
creation of a new instruction, which we generally prefer nowadays.
This tends to be more readable and less prone to worklist management
bugs.
Test changes are only superficial (instruction naming and order).
Fixes https://bugs.llvm.org/show_bug.cgi?id=44835. Skip the transform
if it wouldn't actually do anything (apart from removing and reinserting
the same instructions).
Note that the test case doesn't loop on current master anymore, only
on the LLVM 10 release branch. The issue is already mitigated on master
due to worklist order fixes, but we should fix the root cause there as well.
As a side note, we should probably assert in combineLoadToNewType()
that it does not combine to the same type. Not doing this here, because
this assertion would also be triggered in another place right now.
Differential Revision: https://reviews.llvm.org/D74278
This improves on the following patch, which removed ARC runtime calls
taking inert global variables:
https://reviews.llvm.org/D62433
rdar://problem/59137105
Summary:
This enables it for large working set size cases only.
This does not enable it under sample PGO.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74073
IRCE pass checks that it can calculate loop bounds by checking
SCEV availability at loop entry. However it is possible that loop
bound SCEV is loop invariant, but instruction used to compute it
resides within loop. In such case adjusting loop bound in preheader
using IRBuilder leads to malformed SSA.
Use SCEVExpander instead to generate proper instructions.
Reviewed-by: mkazantsev
Differential Revision: https://reviews.llvm.org/D73496
Summary: This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44388 which incorrectly assigns an ABI alignment to memset when there was no explicit alignment given.
Reviewers: gchatelet, lenary, nikic
Reviewed By: nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74083
Summary:
Tune the profile threshold flag value for instrumentation PGO based on internal
benchmarks.
Also, add flags to allow profile guided size optimizations for non-cold code
to be enabled separately for instrumentation and sample PGSO.
Neither changes the default behavior (yet) as it's disabled for non-cold code.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72937
This reverts commit 41784bed01.
Since the original revision ead815924e,
this revision fixes three issues:
- This revision fixes the Windows build. My original patch improperly
copied EH pads on Windows. This patch disregards jump threading
opportunities having to do with EH pads.
- This revision fixes jump threading to a wrong destination.
Specifically, my original patch treated any Constant other than 0 as 1
while evaluating the branch condition. This bug led to treating
constant expressions like:
icmp ugt i8* null, inttoptr (i64 4 to i8*)
to "true". This patch fixes the bug by calling isOneValue.
- This revision fixes the cost calculation of two basic blocks being
threaded through. Note that getJumpThreadDuplicationCost returns
"(unsigned)~0" for those basic blocks that cannot be duplicated. If
we sum of two return values from getJumpThreadDuplicationCost, we
could have an unsigned overflow like:
(unsigned)~0 + 5 = 4
and mistakenly determine that it's safe and profitable to proceed
with the jump threading opportunity. The patch fixes the bug by
checking each return value before summing them up.
[JumpThreading] Thread jumps through two basic blocks
Summary:
This patch teaches JumpThreading.cpp to thread through two basic
blocks like:
bb3:
%var = phi i32* [ null, %bb1 ], [ @a, %bb2 ]
%tobool = icmp eq i32 %cond, 0
br i1 %tobool, label %bb4, label ...
bb4:
%cmp = icmp eq i32* %var, null
br i1 %cmp, label bb5, label bb6
by duplicating basic blocks like bb3 above. Once we duplicate bb3 as
bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have:
bb3:
%var = phi i32* [ @a, %bb2 ]
%tobool = icmp eq i32 %cond, 0
br i1 %tobool, label %bb4, label ...
bb3.dup:
%var = phi i32* [ null, %bb1 ]
%tobool = icmp eq i32 %cond, 0
br i1 %tobool, label %bb4, label ...
bb4:
%cmp = icmp eq i32* %var, null
br i1 %cmp, label bb5, label bb6
Then the existing code in JumpThreading.cpp can thread edge
bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5.
Reviewers: wmi
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70247
Summary:
Currently type test assume sequences inserted for devirtualization are
removed during WPD. This patch delays their removal until later in the
optimization pipeline. This is an enabler for upcoming enhancements to
indirect call promotion, for example streamlined promotion guard
sequences that compare against vtable address instead of the target
function, when there are small number of possible vtables (either
determined via WPD or by in-progress type profiling). We need the type
tests to correlate the callsites with the address point offset needed in
the compare sequence, and optionally to associated type summary info
computed during WPD.
This depends on work in D71913 to enable invocation of LowerTypeTests to
drop type test assume sequences, which will now be invoked following ICP
in the ThinLTO post-LTO link pipelines, and also after the existing
export phase LowerTypeTests invocation in regular LTO (which is already
after ICP). We cannot simply move the existing import phase
LowerTypeTests pass later in the ThinLTO post link pipelines, as the
comment in PassBuilder.cpp notes (it must run early because when
performing CFI other passes may disturb the sequences it looks for).
This necessitated adding a new type test resolution "Unknown" that we
can use on the type test assume sequences previously removed by WPD,
that we now want LTT to ignore.
Depends on D71913.
Reviewers: pcc, evgeny777
Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73242
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
This is a bug noted in the recent D72733 and seen
in the similar transform just above the changed source code.
I added tests with illegal types and zexts to show the bug -
we could transform legal phi ops to illegal, etc. I did not add
tests with trunc because we won't see any diffs on those patterns.
That is because InstCombiner::SliceUpIllegalIntegerPHI() appears to
do those transforms independently of datalayout. It can also create
more casts than are present in existing code.
There are some existing regression tests that do not include a
datalayout that would be altered by this fix. I assumed that the
lack of a datalayout in those regression files is an oversight, so
I added the minimal layout (make i32 legal) necessary to preserve
behavior on those tests.
Differential Revision: https://reviews.llvm.org/D73907
Adds the global (cl::opt) GVNOption enable-load-in-loop-pre in order
to control whether the optimization will be performed if the load
is part of a loop.
Patch by Hendrik Greving!
Differential Revision: https://reviews.llvm.org/D73804
Summary:
Method appendLoopsToWorklist is duplicate in LoopUnroll and in the
LoopPassManager as an internal method. Make it an utility.
Reviewers: dmgreen, chandlerc, fedor.sergeev, yamauchi
Subscribers: mehdi_amini, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73569
Adds a replaceOperand() helper, which is like Instruction.setOperand()
but adds the old operand to the worklist. This reduces the amount of
missing or incorrect worklist management.
This only applies the helper to a relatively small subset of
setOperand() calls in InstCombine, namely those of the pattern
`I.setOperand(); return &I;`, where it is most obviously applicable.
Differential Revision: https://reviews.llvm.org/D73803
This renames Worklist.AddDeferred() to Worklist.add() and
Worklist.Add() to Worklist.push(). The intention here is that
Worklist.add() should be the go-to method for explicit worklist
management, while the raw Worklist.push() is mostly for
InstCombine internals. I will then migrate uses of Worklist.push()
to Worklist.add() in followup changes.
As suggested by spatel on D73411 I'm also changing the remaining
method names to lowercase first character, in line with current
coding standards.
Differential Revision: https://reviews.llvm.org/D73745
Summary:
A recent change to enable more importing of global variables with
references exposed some efficiency issues with export computation.
See D73724 for more information and detailed analysis.
The first was specific to variable importing. The code was marking every
copy of a referenced value (from possibly thousands of files in the case
of linkonce_odr) as exported, and we only need to mark the copy in the
module containing the variable def being imported as exported. The
reason is that this is tracking what values are newly exported as a
result of importing. Anything that was defined in another module and
simply used in the exporting module is already exported, and would have
been identified by the caller (e.g. the LTO API implementations).
The second issue is that the code was re-adding previously exported
values (along with all references). It is easy to identify when a
variable was already imported into the same module (via the
import list insert call return value), and we already did this for
function importing. However, what we weren't doing for either function
or variable importing was avoiding a re-insertion when it was previously
exported into a different importing module. The reason we couldn't do
this is there was no way of telling from the export list whether it was
previously inserted there because its definition was exported (in which
case we already marked all its references as exported) from when it was
inserted there because it was referenced by another exported value (in
which case we haven't yet inserted its own references).
To address this we can restructure the way the export list is
constructed. This patch only adds the actual imported definitions
(variable or function) to the export list for its module during the
import computation. After import computation is complete, where we were
already post-processing the export list we go ahead and add all
references made by those exported values to the export list.
These changes speed up the thin link not only with constant variable
importing enabled, but also without (due to the efficiency improvement
in function importing).
Some thin link user time measurements for one large application, average
of 5 runs:
With constant variable importing enabled:
- without this patch: 479.5s
- with this patch: 74.6s
Without constant variable importing enabled:
- without this patch: 80.6s
- with this patch: 70.3s
Note I have not re-enabled constant variable importing here, as I would
like to do additional compile time measurements with these fixes first.
Reviewers: evgeny777
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73851
bo (splat X), (bo Y, OtherOp) --> bo (splat (bo X, Y)), OtherOp
This patch depends on the splat analysis enhancement in D73549.
See the test with comment:
; Negative test - mismatched splat elements
...as the motivation for that first patch.
The motivating case for reassociating splatted ops is shown in PR42174:
https://bugs.llvm.org/show_bug.cgi?id=42174
In that example, a slight change in order-of-associative math results
in a big difference in IR and codegen. This patch gets all of the
unnecessary shuffles out of the way, but doesn't address the potential
scalarization (see D50992 or D73480 for that).
Differential Revision: https://reviews.llvm.org/D73703
Duplicating instructions can lead to code size increases but using
a threshold of 3 is good for reducing code size.
Differential Revision: https://reviews.llvm.org/D72916
If all call sites are in `norecurse` functions we can derive `norecurse`
as the ReversePostOrderFunctionAttrsPass does. This should make
ReversePostOrderFunctionAttrsLegacyPass obsolete once the Attributor is
enabled.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D72017
If we know that all call sites have been processed we can derive an
early fixpoint. The use in this patch is likely not to trigger right now
but a follow up patch will make use of it.
Reviewed By: uenoku, baziotis
Differential Revision: https://reviews.llvm.org/D72016
If we had `noalias` on an argument the inliner created alias scope
metadata already. However, the call site `noalias` annotation was not
considered. Since the Attributor can derive such call site `noalias`
annotation we should treat them the same as argument annotations.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D73528
Fix attempt
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72475
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72475
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72475
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72475
Some code gen passes use MBFIWrapper to keep track of the frequency of new
blocks. This was not taken into account and could lead to incorrect frequencies
as MBFI silently returns zero frequency for unknown/new blocks.
Add a variant for MBFIWrapper in the PGSO query interface.
Depends on D73494.
We may calculate reassociable math ops in arbitrary order when creating a shuffle reduction,
so there's no guarantee that things like 'nsw' hold on those intermediate values. Drop all
poison-generating flags for safety.
This change is limited to shuffle reductions because I don't think we have a problem in the
general case (where we intersect flags of each scalar op that goes into a vector op), but if
there's evidence of other cases being wrong, we can extend this fix to cover those cases.
https://bugs.llvm.org/show_bug.cgi?id=44536
Differential Revision: https://reviews.llvm.org/D73727
In line with current conventions, create new instructions rather
than modify two operands in place and performing manual worklist
management.
This should be NFC apart from possible worklist order changes.
from FC0.ExitBlock to FC1.ExitBlock when proven safe.
Summary:
Currently LoopFusion give up when the second loop nest guard
block or the first loop nest exit block is not empty. For example:
if (0 < N) {
for (int i = 0; i < N; ++i) {}
x+=1;
}
y+=1;
if (0 < N) {
for (int i = 0; i < N; ++i) {}
}
The above example should be safe to fuse.
This PR moves instructions in FC1 guard block (e.g. y+=1;) to
FC0 guard block, or instructions in FC0 exit block (e.g. x+=1;) to
FC1 exit block, which then LoopFusion is able to fuse them.
Reviewer: kbarton, jdoerfert, Meinersbur, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D73641
Summary:
When constant folding, constants that are wrapped in metadata were not
folded. This could lead to dbg.values being the only user of a constant
expression, due to the non-dbg uses having been rewritten, resulting in
the constant later on being removed by some other pass. This occurred
with the attached test case, in which the non-rewritten GEP in the
dbg.value intrinsic was later on removed by globalopt.
This patch makes the code look through metadata and fold such constants.
I guess that we in the future may want to allow dbg.values using GEPs and
other constant expressions to be emittable even if there are no non-dbg
uses, but for example SelectionDAG does not support that.
Reviewers: jmorse, aprantl, vsk, davide
Reviewed By: aprantl, vsk, davide
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D73630
Summary:
Add trimming of unused components of s_buffer_load.
For s_buffer_load and unformatted buffer_load also trim unused
components at the beginning of vector and update offset accordingly.
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71785
Summary:
In a release build this variable becomes unused and may break the build
with `-Werror,-Wunused-variable`.
Reviewers: gribozavr2, jdoerfert, sstefan1
Reviewed By: gribozavr2
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73683
InstCombine operates on the basic premise that the operands of the
currently processed instruction have already been simplified. It
achieves this by pushing instructions to the worklist in reverse
program order, so that instructions are popped off in program order.
The worklist management in the main combining loop also makes sure
to uphold this invariant.
However, the same is not true for all the code that is performing
manual worklist management. The largest problem (addressed in this
patch) are instructions inserted by InstCombine's IRBuilder. These
will be pushed onto the worklist in order of insertion (generally
matching program order), which means that a) the users of the
original instruction will be visited first, as they are pushed later
in the main loop and b) the newly inserted instructions will be
visited in reverse program order.
This causes a number of problems: First, folds operate on instructions
that have not had their operands simplified, which may result in
optimizations being missed (ran into this in
https://reviews.llvm.org/D72048#1800424, which was the original
motivation for this patch). Additionally, this increases the amount
of folds InstCombine has to perform, both within one iteration, and
by increasing the number of total iterations.
This patch addresses the issue by adding a Worklist.AddDeferred()
method, which is used for instructions inserted by IRBuilder. These
will only be added to the real worklist after the combine finished,
and in reverse order, so they will end up processed in program order.
I should note that the same should also be done to nearly all other
uses of Worklist.Add(), but I'm starting with just this occurrence,
which has by far the largest test fallout.
Most of the test changes are due to
https://bugs.llvm.org/show_bug.cgi?id=44521 or other cases where
we don't canonicalize something. These are neutral. One regression
has been addressed in D73575 and D73647. The remaining regression
in an shl+sdiv fold can't really be fixed without dropping another
transform, but does not seem particularly problematic in the first
place.
Differential Revision: https://reviews.llvm.org/D73411
Summary:
This patch makes sure that the field VFShape.VF is greater than zero
when demangling the vector function name of scalable vector functions
encoded in the "vector-function-abi-variant" attribute.
This change is required to be able to provide instances of VFShape
that can be used to query the VFDatabase for the vectorization passes,
as such passes always require a positive value for the Vectorization Factor (VF)
needed by the vectorization process.
It is not possible to extract the value of VFShape.VF from the mangled
name of scalable vector functions, because it is encoded as
`x`. Therefore, the VFABI demangling function has been modified to
extract such information from the IR declaration of the vector
function, under the assumption that _all_ vectors in the signature of
the vector function have the same number of lanes. Such assumption is
valid because it is also assumed by the Vector Function ABI
specifications supported by the demangling function (x86, AArch64, and
LLVM internal one).
The unit tests that demangle scalable names have been modified by
adding the IR module that carries the declaration of the vector
function name being demangled.
In particular, the demangling function fails in the following cases:
1. When the declaration of the scalable vector function is not
present in the module.
2. When the value of VFSHape.VF is not greater than 0.
Reviewers: jdoerfert, sdesmalen, andwar
Reviewed By: jdoerfert
Subscribers: mgorny, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73286
A pointer is privatizeable if it can be replaced by a new, private one.
Privatizing pointer reduces the use count, interaction between unrelated
code parts. This is a first step towards replacing argument promotion.
While we can already handle recursion (unlike argument promotion!) we
are restricted to stack allocations for now because we do not analyze
the uses in the callee.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D68852
The helpers AAReturnedFromReturnedValues and
AACallSiteReturnedFromReturned are useful not only to avoid code
duplication but also to avoid recomputation of results. If we have N
call sites we should not recompute the function return information N
times but once. These are mostly straightforward usages with some minor
improvements on the helpers and addition of a new one
(IRPosition::getAssociatedType) that knows about function return types.
For the
icmp eq (add X, C1), C2 => icmp eq X, C2-C1
icmp eq (sub C1, X), C2 => icmp eq X, C1-C2
folds, this allows C1 to be non-splat and contain undefs.
C2 is still splat, due to the structure of the code.
This is to address the remaining part of the regression in D73411,
where demanded element analysis replaces some elements with undef.
Differential Revision: https://reviews.llvm.org/D73647
This commit fixes PR39321.
GlobalExtensions is not guaranteed to be destroyed when optimizer plugins are unloaded. If it is indeed destroyed after a plugin is dlclose-d, the destructor of the corresponding ExtensionFn is not mapped anymore, causing a call to unmapped memory during destruction.
This commit guarantees that extensions coming from external plugins are removed from GlobalExtensions when the plugin is unloaded if GlobalExtensions has not been destroyed yet.
Differential Revision: https://reviews.llvm.org/D71959
proven safe.
Summary:
Currently LoopFusion give up when the second loop nest preheader is
not empty. For example:
for (int i = 0; i < 100; ++i) {}
x+=1;
for (int i = 0; i < 100; ++i) {}
The above example should be safe to fuse.
This PR moves instructions in FC1 preheader (e.g. x+=1; ) to
FC0 preheader, which then LoopFusion is able to fuse them.
Reviewer: kbarton, Meinersbur, jdoerfert, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D71821
cmp (splat V1, M), SplatC --> splat (cmp V1, SplatC'), M
As discussed in PR44588:
https://bugs.llvm.org/show_bug.cgi?id=44588
...we try harder to push shuffles after binops than after compares.
This patch handles the special (but presumably most common case) of
splat shuffles. If both operands are splats, then we can do the
comparison on the non-splat inputs followed by splat of the compare.
That should take care of the regression noted in D73411.
There's another potential fold requested in PR37463 to scalarize the
compare, but that's another patch (and it's not clear if we can do
that without the ability to undo it later):
https://bugs.llvm.org/show_bug.cgi?id=37463
Differential Revision: https://reviews.llvm.org/D73575
There was a TODO in AAValueConstantRangeArgument to reuse
AAArgumentFromCallSiteArguments. We now do this by allowing new States
to be build from the bestState.
If we invalidate an attribute we need to inform all dependent ones even
if the fixpoint state is not invalid. Before we only continued
invalidation if the fixpoint state was invalid, now we signal a change
in case the fixpoint state is valid.
The test case was already included in D71620 but the problem was hiding
because it only manifested with the old PM (for that input).