Only wanting to pass a single SCEV operand to use as the offset of
the GEP is a common operation. Right now this requires creating a
temporary stack array at every call site. Add an overload
that encapsulates that pattern and simplify the call sites.
Suggested-By: sanjoy (in https://reviews.llvm.org/D49832)
llvm-svn: 338072
as well as sext(C + x + ...) -> (D + sext(C-D + x + ...))<nuw><nsw>
similar to the equivalent transformation for zext's
if the top level addition in (D + (C-D + x * n)) could be proven to
not wrap, where the choice of D also maximizes the number of trailing
zeroes of (C-D + x * n), ensuring homogeneous behaviour of the
transformation and better canonicalization of such AddRec's
(indeed, there are 2^(2w) different expressions in `B1 + ext(B2 + Y)` form for
the same Y, but only 2^(2w - k) different expressions in the resulting `B3 +
ext((B4 * 2^k) + Y)` form, where w is the bit width of the integral type)
This patch generalizes sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and
sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformations added in
r209568 relaxing the requirements the following way:
1. C2 doesn't have to be a power of 2, it's enough if it's divisible by 2
a sufficient number of times;
2. C1 doesn't have to be less than C2, instead of extracting the entire
C1 we can split it into 2 terms: (00...0XXX + YY...Y000), keep the
second one that may cause wrapping within the extension operator, and
move the first one that doesn't affect wrapping out of the extension
operator, enabling further simplifications;
3. C1 and C2 don't have to be positive, splitting C1 like shown above
produces a sum that is guaranteed to not wrap, signed or unsigned;
4. in AddExpr case there could be more than 2 terms, and in case of
AddExpr the 2nd and following terms and in case of AddRecExpr the
Step component don't have to be in the C2*X form or constant
(respectively), they just need to have enough trailing zeros,
which in turn could be guaranteed by means other than arithmetics,
e.g. by a pointer alignment;
5. the extension operator doesn't have to be a sext, the same
transformation works and profitable for zext's as well.
Apparently, optimizations like SLPVectorizer currently fail to
vectorize even rather trivial cases like the following:
double bar(double *a, unsigned n) {
double x = 0.0;
double y = 0.0;
for (unsigned i = 0; i < n; i += 2) {
x += a[i];
y += a[i + 1];
}
return x * y;
}
If compiled with `clang -std=c11 -Wpedantic -Wall -O3 main.c -S -o - -emit-llvm`
(!{!"clang version 7.0.0 (trunk 337339) (llvm/trunk 337344)"})
it produces scalar code with the loop not unrolled with the unsigned `n` and
`i` (like shown above), but vectorized and unrolled loop with signed `n` and
`i`. With the changes made in this commit the unsigned version will be
vectorized (though not unrolled for unclear reasons).
How it all works:
Let say we have an AddExpr that looks like (C + x + y + ...), where C
is a constant and x, y, ... are arbitrary SCEVs. Let's compute the
minimum number of trailing zeroes guaranteed of that sum w/o the
constant term: (x + y + ...). If, for example, those terms look like
follows:
i
XXXX...X000
YYYY...YY00
...
ZZZZ...0000
then the rightmost non-guaranteed-zero bit (a potential one at i-th
position above) can change the bits of the sum to the left (and at
i-th position itself), but it can not possibly change the bits to the
right. So we can compute the number of trailing zeroes by taking a
minimum between the numbers of trailing zeroes of the terms.
Now let's say that our original sum with the constant is effectively
just C + X, where X = x + y + .... Let's also say that we've got 2
guaranteed trailing zeros for X:
j
CCCC...CCCC
XXXX...XX00 // this is X = (x + y + ...)
Any bit of C to the left of j may in the end cause the C + X sum to
wrap, but the rightmost 2 bits of C (at positions j and j - 1) do not
affect wrapping in any way. If the upper bits cause a wrap, it will be
a wrap regardless of the values of the 2 least significant bits of C.
If the upper bits do not cause a wrap, it won't be a wrap regardless
of the values of the 2 bits on the right (again).
So let's split C to 2 constants like follows:
0000...00CC = D
CCCC...CC00 = (C - D)
and represent the whole sum as D + (C - D + X). The second term of
this new sum looks like this:
CCCC...CC00
XXXX...XX00
----------- // let's add them up
YYYY...YY00
The sum above (let's call it Y)) may or may not wrap, we don't know,
so we need to keep it under a sext/zext. Adding D to that sum though
will never wrap, signed or unsigned, if performed on the original bit
width or the extended one, because all that that final add does is
setting the 2 least significant bits of Y to the bits of D:
YYYY...YY00 = Y
0000...00CC = D
----------- <nuw><nsw>
YYYY...YYCC
Which means we can safely move that D out of the sext or zext and
claim that the top-level sum neither sign wraps nor unsigned wraps.
Let's run an example, let's say we're working in i8's and the original
expression (zext's or sext's operand) is 21 + 12x + 8y. So it goes
like this:
0001 0101 // 21
XXXX XX00 // 12x
YYYY Y000 // 8y
0001 0101 // 21
ZZZZ ZZ00 // 12x + 8y
0000 0001 // D
0001 0100 // 21 - D = 20
ZZZZ ZZ00 // 12x + 8y
0000 0001 // D
WWWW WW00 // 21 - D + 12x + 8y = 20 + 12x + 8y
therefore zext(21 + 12x + 8y) = (1 + zext(20 + 12x + 8y))<nuw><nsw>
This approach could be improved if we move away from using trailing
zeroes and use KnownBits instead. For instance, with KnownBits we could
have the following picture:
i
10 1110...0011 // this is C
XX X1XX...XX00 // this is X = (x + y + ...)
Notice that some of the bits of X are known ones, also notice that
known bits of X are interspersed with unknown bits and not grouped on
the rigth or left.
We can see at the position i that C(i) and X(i) are both known ones,
therefore the (i + 1)th carry bit is guaranteed to be 1 regardless of
the bits of C to the right of i. For instance, the C(i - 1) bit only
affects the bits of the sum at positions i - 1 and i, and does not
influence if the sum is going to wrap or not. Therefore we could split
the constant C the following way:
i
00 0010...0011 = D
10 1100...0000 = (C - D)
Let's compute the KnownBits of (C - D) + X:
XX1 1 = carry bit, blanks stand for known zeroes
10 1100...0000 = (C - D)
XX X1XX...XX00 = X
--- -----------
XX X0XX...XX00
Will this add wrap or not essentially depends on bits of X. Adding D
to this sum, however, is guaranteed to not to wrap:
0 X
00 0010...0011 = D
sX X0XX...XX00 = (C - D) + X
--- -----------
sX XXXX XX11
As could be seen above, adding D preserves the sign bit of (C - D) +
X, if any, and has a guaranteed 0 carry out, as expected.
The more bits of (C - D) we constrain, the better the transformations
introduced here canonicalize expressions as it leaves less freedom to
what values the constant part of ((C - D) + x + y + ...) can take.
Reviewed By: mzolotukhin, efriedma
Differential Revision: https://reviews.llvm.org/D48853
llvm-svn: 337943
Currently ComputeNumSignBits does early exit while processing some
of the operations (add, sub, mul, and select). This prevents the
function from using AssumptionCacheTracker if passed.
Differential Revision: https://reviews.llvm.org/D49759
llvm-svn: 337936
if the top level addition in (D + (C-D + x + ...)) could be proven to
not wrap, where the choice of D also maximizes the number of trailing
zeroes of (C-D + x + ...), ensuring homogeneous behaviour of the
transformation and better canonicalization of such expressions.
This enables better canonicalization of expressions like
1 + zext(5 + 20 * %x + 24 * %y) and
zext(6 + 20 * %x + 24 * %y)
which get both transformed to
2 + zext(4 + 20 * %x + 24 * %y)
This pattern is common in address arithmetics and the transformation
makes it easier for passes like LoadStoreVectorizer to prove that 2 or
more memory accesses are consecutive and optimize (vectorize) them.
Reviewed By: mzolotukhin
Differential Revision: https://reviews.llvm.org/D48853
llvm-svn: 337859
Summary:
Check if the parent basic block and caller exists
before calling CS.getCaller when constant folding
strip.invariant.group instrinsic.
This avoids a crash when the function containing the intrinsic
is being inlined. The instruction is checked for any simplifiction
but has not yet been added to a basic block.
Reviewers: Prazek, rsmith, efriedma
Reviewed By: efriedma
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D49690
llvm-svn: 337742
Bug fix for PR37445. The underlying problem and its fix are similar to PR37808.
The bug lies in MemorySSAUpdater::getPreviousDefRecursive(), where PhiOps is
computed before the call to tryRemoveTrivialPhi() and it ends up being out of
date, pointing to stale data. We have now turned each of the PhiOps into a
TrackingVH<MemoryAccess>.
Differential Revision: https://reviews.llvm.org/D49425
llvm-svn: 337680
Summary:
This takes 22ms out of ~20s compiling sqlite3.c because we call it
for every unit of compilation and every pass.
Reviewers: paquette, anemet
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D49586
llvm-svn: 337654
Summary:
When splitting predecessors in BasicBlockUtils, we create a new block as an immediate predecessor of the original BB, then we connect a given set of predecessors to the new block.
The API in this patch will be used to update MemoryPhis for this CFG change.
If all predecessors are being moved, we move the MemoryPhi directly. Otherwise we create a new MemoryPhi in the NewBB and populate its incoming values, while deleting them from BB's Phi.
[Split from D45299 for easier review]
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D49156
llvm-svn: 337581
SCEV tries to constant-fold arguments of trunc operands in SCEVAddExpr, and when it does
that, it passes wrong flags into the recursion. It is only valid to pass flags that are proved for
narrow type into a computation in wider type if we can prove that trunc instruction doesn't
actually change the value. If it did lose some meaningful bits, we may end up proving wrong
no-wrap flags for sum of arguments of trunc.
In the provided test we end up with `nuw` where it shouldn't be because of this bug.
The solution is to conservatively pass `SCEV::FlagAnyWrap` which is always a valid thing to do.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D49471
llvm-svn: 337435
Bug fix for PR37808. The regression test is a reduced version of the
original reproducer attached to the bug report. As stated in the report,
the problem was that InsertedPHIs was keeping dangling pointers to
deleted Memory-Phis. MemoryPhis are created eagerly and sometimes get
zapped shortly afterwards. I've used WeakVH instead of an expensive
removal operation from the active workset.
Differential Revision: https://reviews.llvm.org/D48372
llvm-svn: 337149
This fold is repeated/misplaced in instcombine, but I'm
not sure if it's safe to remove that yet because some
other folds appear to be asserting that the transform
has occurred within instcombine itself.
This isn't the best fix for PR37776, but it probably
hides the bug with the given code example:
https://bugs.llvm.org/show_bug.cgi?id=37776
We have another test to demonstrate the more general bug.
llvm-svn: 337127
This reverts commit r336419: use-after-free on CallGraph::FunctionMap elements
due to the use of a stale iterator in CGPassManager::runOnModule.
The iterator may be invalidated if a pass removes a function, ex.:
llvm::LegacyInlinerBase::inlineCalls
inlineCallsImpl
llvm::CallGraph::removeFunctionFromModule
llvm-svn: 337018
Summary:
This commit does two things:
1. modified the existing DivergenceAnalysis::dump() so it dumps the
whole function with added DIVERGENT: annotations;
2. added code to do that dump if the appropriate -debug-only option is
on.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47700
Change-Id: Id97b605aab1fc6f5a11a20c58a99bbe8c565bf83
llvm-svn: 336998
Summary:
The move APIs added in this patch will be used to update MemorySSA when CFG changes merge or split blocks, by moving memory accesses accordingly in MemorySSA's internal data structures.
[Split from D45299 for easier review]
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D48897
llvm-svn: 336860
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: efriedma, george.burgess.iv
Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47895
llvm-svn: 336613
In non-zero address spaces, we were reporting that an object at `null`
always occupies zero bytes. This is incorrect in many cases, so just
return `unknown` in those cases for now.
Differential Revision: https://reviews.llvm.org/D48860
llvm-svn: 336611
This patch ports hasDedicatedExits, getUniqueExitBlocks and
getUniqueExitBlock in Loop to LoopBase so that they can be used
from other LoopBase sub-classes.
Reviewers: chandlerc, sanjoy, hfinkel, fhahn
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D48817
llvm-svn: 336572
It's a bit neater to write T.isIntOrPtrTy() over `T.isIntegerTy() ||
T.isPointerTy()`.
I used Python's re.sub with this regex to update users:
r'([\w.\->()]+)isIntegerTy\(\)\s*\|\|\s*\1isPointerTy\(\)'
llvm-svn: 336462
Previously we only iterated over functions reachable from the set of
external functions in the module. But since some of the passes under
this (notably the always-inliner and coroutine lowerer) are required for
correctness, they need to run over everything.
This just adds an extra layer of iteration over the CallGraph to keep
track of which functions we've already visited and get the next batch of
SCCs.
Should fix PR38029.
llvm-svn: 336419
Summary:
Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change
SCEV is able to prove that the loop doesn't wrap-self (due to zext i16
to i64), disabling the entire loop versioning pass. Removed the zext and
just use i64.
Reviewers: sanjoy
Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits
Differential Revision: https://reviews.llvm.org/D48409
llvm-svn: 336140
Summary:
This patch introduce new intrinsic -
strip.invariant.group that was described in the
RFC: Devirtualization v2
Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar
Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47103
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
Summary:
MemoryPhis now have APIs analogous to BB Phis to remove an incoming value/block.
The MemorySSAUpdater uses the above APIs when updating MemorySSA given a set of dead blocks about to be deleted.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D48396
llvm-svn: 336015
Extends the CFGPrinter and CallPrinter with heat colors based on heuristics or
profiling information. The colors are enabled by default and can be toggled
on/off for CFGPrinter by using the option -cfg-heat-colors for both
-dot-cfg[-only] and -view-cfg[-only]. Similarly, the colors can be toggled
on/off for CallPrinter by using the option -callgraph-heat-colors for both
-dot-callgraph and -view-callgraph.
Patch by Rodrigo Caetano Rocha!
Differential Revision: https://reviews.llvm.org/D40425
llvm-svn: 335996
We can have AddRec with loops having many predecessors.
This changes an assert to an early return.
Differential Revision: https://reviews.llvm.org/D48766
llvm-svn: 335965
Summary:
An alternative to D48597.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=37936 | PR37936 ]].
The problem is as follows:
1. `indvars` marks `%dec` as `NUW`.
2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908)
3. `loop-reduce` tries to do some further modification, but crashes
with an type assertion in cast, because `%dec` is no longer an `Instruction`,
If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`,
store that into a file, and then run `-loop-reduce`, there is no crash.
So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV.
But in this case we can just not crash if it's not an `Instruction`.
This is just a local fix, unlike D48597, so there may very well be other problems.
Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi
Reviewed By: mkazantsev
Subscribers: evstupac, javed.absar, spatel, llvm-commits
Differential Revision: https://reviews.llvm.org/D48599
llvm-svn: 335950
This pass is being added in order to make the information available to BasicAA,
which can't do caching of this information itself, but possibly this information
may be useful for other passes.
Incorporates code based on Daniel Berlin's implementation of Tarjan's algorithm.
Differential Revision: https://reviews.llvm.org/D47893
llvm-svn: 335857
Summary:
AliasSet::print uses `I->printAsOperand` to print UnknownInstructions. The problem is that not all UnknownInstructions have names (e.g. call instructions). When such instructions are printed, they appear as `<badref>` in AliasSets, which is very confusing, as the values are perfectly valid.
This patch fixes that by printing UnknownInstructions without a name using `print` instead of `printAsOperand`.
Reviewers: asbirlea, chandlerc, sanjoy, grosser
Reviewed By: asbirlea
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48609
llvm-svn: 335751
Summary:
Adds a string saver to the ModuleSummaryIndex so it can store value
names in the case of adding a ValueInfo for a GUID when we don't
have the name stored in a Module string table. This is motivated
by the upcoming summary parser patch, where we will read value names
from the summary entry and want to store them, even when a Module
is not available.
Currently this allows us to store the name in the legacy bitcode case,
and I have added a test to show that.
Reviewers: pcc, dexonsmith
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits
Differential Revision: https://reviews.llvm.org/D47842
llvm-svn: 335570
Summary:
I discovered when writing the summary parsing support that the
per-module index builder and writer are computing the GUID from the
value name alone (ignoring the linkage type). This was ok since those
GUID were not emitted in the bitcode, and there are never multiple
conflicting names in a single module.
However, I don't see a reason for making the GUID computation different
for the per-module case. It also makes things simpler on the parsing
side to have the GUID computation consistent. So this patch changes the
summary analysis phase and the per-module summary writer to compute the
GUID using the facility on the GlobalValue.
Reviewers: pcc, dexonsmith
Subscribers: llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D47844
llvm-svn: 335560
This avoids creating unnecessary casts if the IP used to be a dbg info
intrinsic. Fixes PR37727.
Reviewers: vsk, aprantl, sanjoy, efriedma
Reviewed By: vsk, efriedma
Differential Revision: https://reviews.llvm.org/D47874
llvm-svn: 335513
There are quite a few if statements that enumerate all these cases. It gets
even worse in our fork of LLVM where we also have a Triple::cheri (which
is mips64 + CHERI instructions) and we had to update all if statements that
check for Triple::mips64 to also handle Triple::cheri. This patch helps to
reduce our diff to upstream and should also make some checks more readable.
Reviewed By: atanasyan
Differential Revision: https://reviews.llvm.org/D48548
llvm-svn: 335493
We can prove that some delinearized subscripts do not wrap around to become
negative by the fact that they are from inbound geps of load/store locations.
This helps improve the delinearisation in cases where we can't prove that they
are non-negative from SCEV alone.
Differential Revision: https://reviews.llvm.org/D48481
llvm-svn: 335481