Use cast<> instead which will assert that the cast is correct and not just return null - the match() should have already failed if the cast isn't valid anyhow.
Fixes static analysis warning.
We might want to use it when creating SCEV proper in createSCEV(),
now that we don't `forgetValue()` in `SimplifyIndvar::strengthenOverflowingOperation()`,
which might have caused us to loose some optimization potential.
Restructure handling of cfg-hide-unreachable-paths and
cfg-hide-deoptimize-paths options so as to make it easier
to introduce new types of hidden blocks.
Add getDemandedBits method for uses so we can query demanded bits for each use. This can help getting better use information. For example, for the code below
define i32 @test_use(i32 %a) {
%1 = and i32 %a, -256
%2 = or i32 %1, 1
%3 = trunc i32 %2 to i8 (didn't optimize this to 1 for illustration purpose)
... some use of %3
ret %2
}
if we look at the demanded bit of %2 (which is all 32 bits because of the return), we would conclude that %a is used regardless of how its return is used. However, if we look at each use separately, we will see that the demanded bit of %2 in trunc only uses the lower 8 bits of %a which is redefined, therefore %a's usage depends on how the function return is used.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D97074
When rewriting
powf(2.0, itofp(x)) -> ldexpf(1.0, x)
exp2(sitofp(x)) -> ldexp(1.0, sext(x))
exp2(uitofp(x)) -> ldexp(1.0, zext(x))
the wrong type was used for the second argument in the ldexp/ldexpf
libc call, for target architectures with 16 bit "int" type.
The transform incorrectly used a bitcasted function pointer with
a 32-bit argument when emitting the ldexp/ldexpf call for such
targets.
The fault is solved by using the correct function prototype
in the call, by asking TargetLibraryInfo about the size of "int".
TargetLibraryInfo by default derives the size of the int type by
assuming that it is 16 bits for 16-bit architectures, and
32 bits otherwise. If this isn't true for a target it should be
possible to override that default in the TargetLibraryInfo
initializer.
Differential Revision: https://reviews.llvm.org/D99438
Some existing places use getPointerElementType() to create a copy of a
pointer type with some new address space.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D103429
We can look through invariant group intrinsics for the purposes of
simplifying the result of a load.
Since intrinsics can't be constants, but we also don't want to
completely rewrite load constant folding, we convert the load operand to
a constant. For GEPs and bitcasts we just treat them as constants. For
invariant group intrinsics, we treat them as a bitcast.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D101103
When we're remapping an AddRec, the AddRec constructed by a partial
rewrite might not make sense. This triggers an assertion complaining
it's not loop-invariant.
Instead of constructing the partially rewritten AddRec, just skip
straight to calling evaluateAtIteration.
Testcase was automatically reduced using llvm-reduce, so it's a little
messy, but hopefully makes sense.
Differential Revision: https://reviews.llvm.org/D102959
Update isFirstOrderRecurrence to explore all uses of a recurrence phi
and check if we can sink them. If there are multiple users to sink, they
are all mapped to the previous instruction.
Fixes PR44286 (and another PR or two).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D84951
1. Removed redundant includes,
2. Removed never defined and used `releaseMemory()`.
3. Fixed member functions names first letter case.
4. Renamed duplicate (in nested struct `NonLocalPointerInfo`) name
`NonLocalDeps` to `NonLocalDepsMap`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D102358
ExprValueMap is a map from SCEV * to a set-vector of (Value *, ConstantInt *) pair,
and while the map itself will likely be big-ish (have many keys),
it is a reasonable assumption that each key will refer to a small-ish
number of pairs.
In particular looking at n=512 case from
https://bugs.llvm.org/show_bug.cgi?id=50384,
the small-size of 4 appears to be the sweet spot,
it results in the least allocations while minimizing memory footprint.
```
$ for i in $(ls heaptrack.opt.*.gz); do echo $i; heaptrack_print $i | tail -n 6; echo ""; done
heaptrack.opt.0-orig.gz
total runtime: 14.32s.
calls to allocation functions: 8222442 (574192/s)
temporary memory allocations: 2419000 (168924/s)
peak heap memory consumption: 190.98MB
peak RSS (including heaptrack overhead): 239.65MB
total memory leaked: 67.58KB
heaptrack.opt.1-n1.gz
total runtime: 13.72s.
calls to allocation functions: 7184188 (523705/s)
temporary memory allocations: 2419017 (176338/s)
peak heap memory consumption: 191.38MB
peak RSS (including heaptrack overhead): 239.64MB
total memory leaked: 67.58KB
heaptrack.opt.2-n2.gz
total runtime: 12.24s.
calls to allocation functions: 6146827 (502355/s)
temporary memory allocations: 2418997 (197695/s)
peak heap memory consumption: 163.31MB
peak RSS (including heaptrack overhead): 211.01MB
total memory leaked: 67.58KB
heaptrack.opt.3-n4.gz
total runtime: 12.28s.
calls to allocation functions: 6068532 (494260/s)
temporary memory allocations: 2418985 (197017/s)
peak heap memory consumption: 155.43MB
peak RSS (including heaptrack overhead): 201.77MB
total memory leaked: 67.58KB
heaptrack.opt.4-n8.gz
total runtime: 12.06s.
calls to allocation functions: 6068042 (503321/s)
temporary memory allocations: 2418992 (200646/s)
peak heap memory consumption: 166.03MB
peak RSS (including heaptrack overhead): 213.55MB
total memory leaked: 67.58KB
heaptrack.opt.5-n16.gz
total runtime: 12.14s.
calls to allocation functions: 6067993 (499958/s)
temporary memory allocations: 2418999 (199307/s)
peak heap memory consumption: 187.24MB
peak RSS (including heaptrack overhead): 233.69MB
total memory leaked: 67.58KB
```
While that test may be an edge worst-case scenario,
https://llvm-compile-time-tracker.com/compare.php?from=dee85d47d9f15fc268f7b18f279dac2774836615&to=98a57e31b1947d5bcdf4a5605ac2ab32b4bd5f63&stat=instructions
agrees that this also results in improvements in the usual situations.
This is similar to the fix in c590a9880d ( PR49832 ), but
we missed handling the pattern for select of bools (no compare
inst).
We can't substitute a vector value because the equality condition
replacement that we are attempting requires that the condition
is true/false for the entire value. Vector select can be partly
true/false.
I added an assert for vector types, so we shouldn't hit this again.
Fixed formatting while auditing the callers.
https://llvm.org/PR50500
When you try to define a new DEBUG_TYPE in a header file, DEBUG_TYPE
definition defined around the #includes in files include it could
result in redefinition warnings even compile errors.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102594
This reverts commit 1ed7f8ede5.
This change can cause loop-distribute to crash in some cases. Revert
until I have more time to wrap up a fix.
See PR50296, PR5028 and D102266.
MSVC-style RTTI produces loads through a GEP of a local alias which
itself is a GEP. Currently we aren't able to devirtualize any virtual
calls when MSVC RTTI is enabled.
This patch attempts to simplify a load's GEP operand by calling
SymbolicallyEvaluateGEP() with an option to look through local aliases.
Differential Revision: https://reviews.llvm.org/D101100
This patch implements getSmallConstantTripMultiple(L) correctly for multiple exit loops. The previous implementation was both imprecise, and violated the specified behavior of the method. This was fine in practice, because it turns out the function was both dead in real code, and not tested for the multiple exit case.
Differential Revision: https://reviews.llvm.org/D103189
This came up in review for another patch, see https://reviews.llvm.org/D102982#2782407 for full context.
I've reviewed the callers to make sure they can handle multiple exit loops w/non-zero returns. There's two cases in target cost models where results might change (Hexagon and PowerPC), but the results looked legal and reasonable. If a target maintainer wishes to back out the effect of the costing change, they should explicitly check for multiple exit loops and handle them as desired.
Differential Revision: https://reviews.llvm.org/D103182
The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost.
By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928.
Differential Revision: https://reviews.llvm.org/D102934
When memoized values for a SCEV expressions are dropped, we also
drop all BECounts that make use of the SCEV expression. This is done
by iterating over all the ExitNotTaken counts and (recursively)
checking whether they use the SCEV expression. If there are many
exits, this will take a lot of time.
This patch improves the situation by pre-computing a set of all
used operands, so that we can determine whether a certain BEInfo
needs to be invalidated using a simple set lookup. Will still need
to loop over all BEInfos though.
This makes for a mild improvement on non-degenerate cases:
https://llvm-compile-time-tracker.com/compare.php?from=b661a55a253f4a1cf5a0fbcb86e5ba7b9fb1387b&to=be1393f450e594c53f0ad7e62339a6bc831b16f6&stat=instructions
For the degenerate case from https://bugs.llvm.org/show_bug.cgi?id=50384,
for n=128 I'm seeing run time drop from 1.6s to 1.1s.
Differential Revision: https://reviews.llvm.org/D102796
The semantics of select with undefined/poison condition
are not explicitly stated in the LangRef, but this matches
comments in the code and Alive2 appears to concur:
https://alive2.llvm.org/ce/z/KXytmd
We can find this pattern after demanded elements transforms.
As noted in D101191, fuzzers are finding infinite loops because
we may not account for this pattern in other passes.
This follows from the underlying logic for binops and min/max.
Although it does not appear that we handle this for min/max
intrinsics currently.
https://alive2.llvm.org/ce/z/Kq9Xnh
These functions were marked unavailable for MSVC targets before,
within an "T.isOSWindows() && !T.isOSCygMing()" block, but these ones
are unavailable on MinGW targets too.
This avoids generating calls to stpcpy for MinGW targets, which has
been happening since 6dbf0cfcf7 (in
some cases).
This fixes https://github.com/mstorsjo/llvm-mingw/issues/201.
Differential Revision: https://reviews.llvm.org/D102946
Replace use of host floating types with operations on APFloat when it is
possible. Use of APFloat makes analysis more convenient and facilitates
constant folding in the case of non-default FP environment.
Differential Revision: https://reviews.llvm.org/D102672
The previous rule:
(insert_vector _, (extract_vector X, 0), 0) -> X
is not quite correct. The correct fold should be:
(insert_vector Y, (extract_vector X, 0), 0) -> X
where: Y is X, or Y is undef
This commit updates the pattern.
Reviewed By: peterwaller-arm, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D102699
Previously APFloat::convertToDouble may be called only for APFloats that
were built using double semantics. Other semantics like single precision
were not allowed although corresponding numbers could be converted to
double without loss of precision. The similar restriction applied to
APFloat::convertToFloat.
With this change any APFloat that can be precisely represented by double
can be handled with convertToDouble. Behavior of convertToFloat was
updated similarly. It make the conversion operations more convenient and
adds support for formats like half and bfloat.
Differential Revision: https://reviews.llvm.org/D102671
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.
SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D102032
This code was re-implementing the same-BB case of
isPotentiallyReachable(). Historically, this was done because
CaptureTracking used additional caching for local dominance
queries. Now that it is no longer needed, the code is effectively
the same as isPotentiallyReachable().
The only difference are extra checks for invoke/phis. These are
misleading checks related to dominance in the value availability
sense that are not relevant for control reachability. The invoke
check was correct but redundant in that invokes are always
terminators, so `I` could never come before the invoke. The phi
check is a matter of interpretation (should an earlier phi node be
considered reachable from a later phi node in the same block?)
but ultimately doesn't matter because phis don't capture anyway.
Reapply after adjusting the synchronized.m test case, where the
TODO is now resolved. The pointer is only captured on the exception
handling path.
-----
For the CapturesBefore tracker, it is sufficient to check that
I can not reach BeforeHere. This does not necessarily require
that BeforeHere dominates I, it can also occur if the capture
happens on an entirely disjoint path.
This change was previously accepted in D90688, but had to be
reverted due to large compile-time impact in some cases: It
increases the number of reachability queries that are performed.
After recent changes, the compile-time impact is largely mitigated,
so I'm reapplying this patch. The remaining compile-time impact
is largely proportional to changes in code-size.
This reverts commit 6b8b43e7af.
This causes clang test to fail (CodeGenObjC/synchronized.m).
Revert until I can figure out whether that's an expected change.
For the CapturesBefore tracker, it is sufficient to check that
I can not reach BeforeHere. This does not necessarily require
that BeforeHere dominates I, it can also occur if the capture
happens on an entirely disjoint path.
This change was previously accepted in D90688, but had to be
reverted due to large compile-time impact in some cases: It
increases the number of reachability queries that are performed.
After recent changes, the compile-time impact is largely mitigated,
so I'm reapplying this patch. The remaining compile-time impact
is largely proportional to changes in code-size.
Reachability queries are very expensive, and currently performed
for each instruction we look at, even though most of them will
not lead to a capture and are thus ultimately irrelevant. It is
more efficient to walk a few unnecessary instructions than to
perform unnecessary reachability queries.
Theoretically, this may produce worse results, because the additional
instructions considered may cause us to hit the use count limit
earlier. In practice, this does not appear to be a problem, e.g.
on test-suite O3 we report only one more captured-before with this
change, with no resulting codegen differences.
This makes PointerMayBeCapturedBefore() significantly cheaper in
practice, hopefully allowing it to be used in more places.
These checks are not specific to the instruction based variant of
isPotentiallyReachable(), they are equally valid for the basic
block based variant. Move them there, to make sure that switching
between the instruction and basic block variants cannot introduce
regressions.
Use comesBefore() instead of performing an instruction walk. In
line with the previous implementation, instructions are considered
to reach themselves.