We want to insert no-op casts as close as possible to the def. This is
tricky when the cast is of a PHI node and the BasicBlocks between the
def and the use cannot hold any instructions. Iteratively walk EH pads
until we hit a non-EH pad.
This fixes PR25326.
llvm-svn: 251393
Use `getUnsignedMax` directly instead of special casing a wrapped
ConstantRange.
The previous code would have been "buggy" (and this would have been a
semantic change) if LLVM allowed !range metadata to denote full
ranges. E.g. in
%val = load i1, i1* %ptr, !range !{i1 1, i1 1} ;; == full set
ValueTracking would conclude that the high bit (IOW the only bit) in
%val was zero.
Since !range metadata does not allow empty or full ranges, this change
is just a minor stylistic improvement.
llvm-svn: 251380
Summary: This idiom is used elsewhere in LLVM, but was overlooked here.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13628
llvm-svn: 251348
This issue is triggered in PGO mode when bootstrapping LLVM. It seems that it is not guaranteed that edge weights are always greater than zero which are read from profile data.
llvm-svn: 251317
Even though we may not know the value of the shifter operand, it's possible we know the shifter operand is non-zero. This can allow us to infer more known bits - for example:
%1 = load %p !range {1, 5}
%2 = shl %q, %1
We don't know %1, but we do know that it is nonzero so %2[0] is known zero, and importantly %2 is known non-zero.
Calling isKnownNonZero is nontrivially expensive so use an Optional to run it lazily and cache its result.
llvm-svn: 251294
When the target does not support these intrinsics they should be converted to a chain of scalar load or store operations.
If the mask is not constant, the scalarizer will build a chain of conditional basic blocks.
I added isLegalMaskedGather() isLegalMaskedScatter() APIs.
Differential Revision: http://reviews.llvm.org/D13722
llvm-svn: 251237
The loop idiom creating a ConstantRange is repeated twice in the
codebase, time to give it a name and a home.
The loop is also repeated in `rangeMetadataExcludesValue`, but using
`getConstantRangeFromMetadata` there would not be an NFC -- the range
returned by `getConstantRangeFromMetadata` may contain a value that none
of the subranges did.
llvm-svn: 251180
First, the motivation: LLVM currently does not realize that:
((2072 >> (L == 0)) >> 7) & 1 == 0
where L is some arbitrary value. Whether you right-shift 2072 by 7 or by 8, the
lowest-order bit is always zero. There are obviously several ways to go about
fixing this, but the generic solution pursued in this patch is to teach
computeKnownBits something about shifts by a non-constant amount. Previously,
we would give up completely on these. Instead, in cases where we know something
about the low-order bits of the shift-amount operand, we can combine (and
together) the associated restrictions for all shift amounts consistent with
that knowledge. As a further generalization, I refactored all of the logic for
all three kinds of shifts to have this capability. This works well in the above
case, for example, because the dynamic shift amount can only be 0 or 1, and
thus we can say a lot about the known bits of the result.
This brings us to the second part of this change: Even when we know all of the
bits of a value via computeKnownBits, nothing used to constant-fold the result.
This introduces the necessary code into InstCombine and InstSimplify. I've
added it into both because:
1. InstCombine won't automatically pick up the associated logic in
InstSimplify (InstCombine uses InstSimplify, but not via the API that
passes in the original instruction).
2. Putting the logic in InstCombine allows the resulting simplifications to become
part of the iterative worklist
3. Putting the logic in InstSimplify allows the resulting simplifications to be
used by everywhere else that calls SimplifyInstruction (inlining, unrolling,
and many others).
And this requires a small change to our definition of an ephemeral value so
that we don't break the rest case from r246696 (where the icmp feeding the
@llvm.assume, is also feeding a br). Under the old definition, the icmp would
not be considered ephemeral (because it is used by the br), but this causes the
assume to remove itself (in addition to simplifying the branch structure), and
it seems more-useful to prevent that from happening.
llvm-svn: 251146
Instead of checking `(FlagsPresent & ExpectedFlags) != 0`, check
`(FlagsPresent & ExpectedFlags) == ExpectedFlags`. Right now they're
equivalent since `ExpectedFlags` can only be either `FlagNUW` or
`FlagNSW`, but if we ever pass in `ExpectedFlags` as `FlagNUW | FlagNSW`
then checking `(FlagsPresent & ExpectedFlags) != 0` would be wrong.
llvm-svn: 251142
If the loaded type sizes don't match the element type of the sequential type, all bets are off and the addresses may, indeed, overlap.
Surprisingly, this just got caught in one test, on one builder, out of the 30+ builders testing this change. Congratulations go to http://lab.llvm.org:8011/builders/clang-aarch64-lnt/builds/5205.
llvm-svn: 251112
I could not come up a way to test this -- I think this bug is latent
today, and will not actually result in a miscompile.
In `getPreStartForExtend`, SCEV constructs `PreStart` as a sum of all of
`SA`'s operands except `Op`. It also uses `SA`'s no-wrap flags, and
this is problematic because removing an element from an add expression
can make it signed-wrap. E.g. if `SA` was `(127 + 1 + -1)`, then it
could safely be `<nsw>` (since `sext(127) + sext(1) + sext(-1)` ==
`sext(127 + 1 + -1)`), but `(127 + 1)` (== `PreStart` if `Op` is `-1`)
is not `<nsw>`.
Transferring `<nuw>` from `SA` to `PreStart` is safe, as far as I can
tell.
llvm-svn: 251097
In r251064 I removed a logically unreachable call to `redoLoop`, and
now there aren't any callers of this API at all. Remove the needless
complexity.
llvm-svn: 251067
The insertLoop() API is only used to add new loops, and has confusing
ownership semantics. Simplify it by replacing it with addLoop().
llvm-svn: 251064
Summary:
An unsigned comparision is equivalent to is corresponding signed version
if both the operands being compared are positive. Teach SCEV to use
this fact when profitable.
Reviewers: atrick, hfinkel, reames, nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13687
llvm-svn: 251051
Summary:
- A s< (A + C)<nsw> if C > 0
- A s<= (A + C)<nsw> if C >= 0
- (A + C)<nsw> s< A if C < 0
- (A + C)<nsw> s<= A if C <= 0
Right now `C` needs to be a constant, but we can later generalize it to
be a non-constant if needed.
Reviewers: atrick, hfinkel, reames, nlewycky
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13686
llvm-svn: 251050
Summary:
This uses `ScalarEvolution::getRange` and not potentially control
dependent `nsw` and `nuw` bits on the arithmetic instruction.
Reviewers: atrick, hfinkel, nlewycky
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D13613
llvm-svn: 251048
Instead of bailing out when we see loads, analyze them. If we can prove that the loaded-from address must escape, then we can conclude that a load from that address must escape too and therefore cannot alias a non-addr-taken global.
When checking if a Value can alias a non-addr-taken global, if the Value is a LoadInst of a non-global, recurse instead of bailing.
If we can follow a trail of loads up to some base that is captured, we know by inference that all the loads we followed are also captured.
llvm-svn: 251017
If the final indices of two GEPs can be proven to not be equal, and
the GEP is of a SequentialType (not a StructType), then the two GEPs
do not alias.
llvm-svn: 251016
isKnownNonEqual(A, B) returns true if it can be determined that A != B.
At the moment it only knows two facts, that a non-wrapping add of nonzero to a value cannot be that value:
A + B != A [where B != 0, addition is nsw or nuw]
and that contradictory known bits imply two values are not equal.
This patch also hooks this up to InstSimplify; InstSimplify had a peephole for the first fact but not the second so this teaches InstSimplify a new trick too (alas no measured performance impact!)
llvm-svn: 251012
"external" AA wrapper pass.
This is a generic hook that can be used to thread custom code into the
primary AAResultsWrapperPass for the legacy pass manager in order to
allow it to merge external AA results into the AA results it is
building. It does this by threading in a raw callback and so it is
*very* powerful and should serve almost any use case I have come up with
for extending the set of alias analyses used. The only thing not well
supported here is using a *different order* of alias analyses. That form
of extension *is* supportable with the new pass manager, and I can make
the callback structure here more elaborate to support it in the legacy
pass manager if this is a critical use case that people are already
depending on, but the only use cases I have heard of thus far should be
reasonably satisfied by this simpler extension mechanism.
It is hard to test this using normal facilities (the built-in AAs don't
use this for obvious reasons) so I've written a fairly extensive set of
custom passes in the alias analysis unit test that should be an
excellent test case because it models the out-of-tree users: it adds
a totally custom AA to the system. This should also serve as
a reasonably good example and guide for out-of-tree users to follow in
order to rig up their existing alias analyses.
No support in opt for commandline control is provided here however. I'm
really unhappy with the kind of contortions that would be required to
support that. It would fully re-introduce the analysis group
self-recursion kind of patterns. =/
I've heard from out-of-tree users that this will unblock their use cases
with extending AAs on top of the new infrastructure and let us retain
the new analysis-group-free-world.
Differential Revision: http://reviews.llvm.org/D13418
llvm-svn: 250894
We were keeping a reference to an object in a DenseMap then mutating it. At the end of the function we were attempting to clone that reference into other keys in the DenseMap, but DenseMap may well decide to resize its hashtable which would invalidate the reference!
It took an extremely complex testcase to catch this - many thanks to Zhendong Su for catching it in PR25225.
This fixes PR25225.
llvm-svn: 250692
Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case.
Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces.
Differential Revision: http://reviews.llvm.org/D13850
llvm-svn: 250686
With r250345 and r250343, we start to observe the following failure
when bootstrap clang with lto and pgo:
PHI node entries do not match predecessors!
%.sroa.029.3.i = phi %"class.llvm::SDNode.13298"* [ null, %30953 ], [ null, %31017 ], [ null, %30998 ], [ null, %_ZN4llvm8dyn_castINS_14ConstantSDNodeENS_7SDValueEEENS_10cast_rettyIT_T0_E8ret_typeERS5_.exit.i.1804 ], [ null, %30975 ], [ null, %30991 ], [ null, %_ZNK4llvm3EVT13getScalarTypeEv.exit.i.1812 ], [ %..sroa.029.0.i, %_ZN4llvm11SmallVectorIiLj8EED1Ev.exit.i.1826 ], !dbg !451895
label %30998
label %_ZNK4llvm3EVTeqES0_.exit19.thread.i
LLVM ERROR: Broken function found, compilation aborted!
I will re-commit this if the bot does not recover.
llvm-svn: 250366
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
This is the third attempt to submit this patch, while the first two led to failures in some FDO tests. After investigation, it is the edge weight normalization that caused those failures. In this patch the edge weight normalization is fixed so that there is no zero weight in the output and the sum of all weights can fit in 32-bit integer. Several unit tests are added.
Differential revision: http://reviews.llvm.org/D10979
llvm-svn: 250345
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.
The basic idea here is that input bits that are known zero reduce the maximum count that the intrinsic could return. We know that the number of bits required to represent a particular count is at most log2(N)+1.
Differential Revision: http://reviews.llvm.org/D13253
llvm-svn: 250338
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
Differential revision: http://reviews.llvm.org/D10979
llvm-svn: 250204
Weak linkage and friends allow a symbol to be overriden outside the
code generator's model, so GlobalsAA shouldn't assume that anything it
can compute about such a symbol is valid.
llvm-svn: 250156
In a later commit, `SplitBinaryAdd` will be used outside `IsConstDiff`,
so lift that out. And lift out `IsConstDiff` as
`computeConstantDifference` to keep things clean and to avoid playing
C++ access specifier games.
NFC.
llvm-svn: 250143
In JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
Differential revision: http://reviews.llvm.org/D10979
llvm-svn: 250089