Currently LAA uses getScalarSizeInBits to compute the size of an element
when computing the end bound of an access.
This does not work as expected for pointers to pointers, because
getScalarSizeInBits will return 0 for pointer types.
By using DataLayout to get the size of the element we can also correctly
handle pointer element types.
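
A minimal sketch of why the size source matters, using toy stand-ins rather than the real LLVM classes. `ToyType`, `ToyDataLayout` and the helper names are hypothetical, and the end-bound formula (start of the last accessed element plus the element size) is a simplifying assumption:

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for illustration only; these are NOT the LLVM classes.
struct ToyType {
  bool IsPointer;
  unsigned IntBits; // only meaningful when !IsPointer
};

struct ToyDataLayout {
  unsigned PointerSizeInBits; // e.g. 64 on a typical 64-bit target (assumed)
};

// Models the behaviour described above: pointer types report a size of 0.
inline unsigned toyScalarSizeInBits(const ToyType &Ty) {
  return Ty.IsPointer ? 0 : Ty.IntBits;
}

// Layout-aware size: pointers get the target's pointer width.
inline unsigned toyTypeSizeInBits(const ToyDataLayout &DL, const ToyType &Ty) {
  return Ty.IsPointer ? DL.PointerSizeInBits : Ty.IntBits;
}

// End bound in bytes: start of the last accessed element plus its size.
inline uint64_t toyEndBound(uint64_t LastEltStartByte, unsigned EltBits) {
  return LastEltStartByte + EltBits / 8;
}
```

With a size of 0 the computed end coincides with the start of the last element, so the final pointer-sized element falls outside the bound; the layout-aware size extends the bound past it.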
Note the changes to the existing test, which seems to also use the wrong
offset for the end.
Fixes PR47751.
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D88953
In some cases, we can negate an instruction if only one of its operands
negates. Previously, we assumed that constants would have been
canonicalized to RHS already, but that isn't guaranteed to happen,
because of InstCombine worklist visitation order,
as the added test (previously-hanging) shows.
So if we only need to negate a single operand,
we should ensure that we try the constant operand first.
Do that by re-doing the complexity sorting ourselves,
when we actually care about it.
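
The idea of re-doing the ordering locally can be sketched as follows. This is a toy model, not the InstCombine code: `OperandKind`, `ToyOperand` and `pickOperandToNegate` are hypothetical names, and the three-level ranking is an assumed simplification of complexity ranking:

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Assumed ranking for illustration: lower value means "simpler".
enum class OperandKind { Constant = 0, Argument = 1, Instruction = 2 };

struct ToyOperand {
  OperandKind Kind;
  bool Negatible; // whether this operand could be negated on its own
};

// Returns the index of the operand to negate, or -1 if neither qualifies.
// The visiting order is sorted here rather than trusting that constants
// were already canonicalized to the right-hand side.
inline int pickOperandToNegate(const std::array<ToyOperand, 2> &Ops) {
  std::array<int, 2> Order{{0, 1}};
  std::stable_sort(Order.begin(), Order.end(), [&](int A, int B) {
    return Ops[A].Kind < Ops[B].Kind;
  });
  for (int Idx : Order)
    if (Ops[Idx].Negatible)
      return Idx;
  return -1;
}
```

Whichever side the constant sits on, it is tried first, so the result does not depend on whether canonicalization has already run.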
Fixes https://bugs.llvm.org/show_bug.cgi?id=47752
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.
In this case, when load/store uses have conflicting types,
instead of falling back to the iN, we can try to use the allocated sub-type.
As discussed, this isn't the best idea overall (we shouldn't rely on
the allocated type), but it works fine as a temporary measure.
I've measured: at `-O3`, on vanilla llvm test-suite + RawSpeed,
this results in +0.05% bitcasts, -5.51% inttoptr casts
and -1.05% ptrtoint casts (at the end of the middle-end opt pipeline).
See https://bugs.llvm.org/show_bug.cgi?id=47592
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D88788
The old function attribute deduction pass ignores reads of constant
memory and we need to copy this behavior to replace the pass completely.
The first step is constant globals. TBAA can also describe constant
accesses and there are other possibilities. We might want to consider
asking the alias analyses that are available but for now this is simpler
and cheaper.
If the function is not assumed `noreturn` we should not wait for an
update to mark the call site as "may-return".
This has two kinds of consequences:
- We have fewer iterations in many tests.
- We have fewer deductions based on "known information" (since we ask
earlier, point 1, and therefore assumed information is not "known"
yet).
The latter is an artifact that we might want to tackle properly at some
point but which is not easily fixable right now.
The call slot optimization has some home-grown code for checking
whether the destination is dereferenceable. Replace this with the
generic isDereferenceableAndAlignedPointer() helper.
I'm not checking alignment here, because that is currently handled
separately and may be an enforced alignment for allocas. The clean
way of integrating that part would probably be to accept a callback
in isDereferenceableAndAlignedPointer() for the actual isAligned check,
which would then have a chance to use an enforced alignment instead.
This allows the destination to be a GEP (among other things), though
the two open TODOs may prevent it from working in practice.
Differential Revision: https://reviews.llvm.org/D88805
When performing call slot optimization for a non-local destination,
we need to check whether there may be throwing calls between the
call and the copy. Otherwise, the early write to the destination
may be observable by the caller.
This was already done for call slot optimization of load/store,
but not for memcpys. For the sake of clarity, I'm moving this check
into the common optimization function, even if that does need an
additional instruction scan for the load/store case.
As efriedma pointed out, this check is not sufficient due to
potential accesses from another thread. This case is left as a TODO.
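
The throw check described above can be sketched on a toy instruction list. `ToyInst` and both function names are hypothetical, the block is modelled as a flat vector, and this covers only the unwinding condition, not the other legality checks or the cross-thread TODO:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: only records whether it may unwind.
struct ToyInst {
  bool MayThrow;
};

// True if any instruction strictly between the call and the copy may throw.
inline bool mayThrowBetween(const std::vector<ToyInst> &Block,
                            std::size_t CallIdx, std::size_t CopyIdx) {
  for (std::size_t I = CallIdx + 1; I < CopyIdx; ++I)
    if (Block[I].MayThrow)
      return true;
  return false;
}

// For a non-local destination, an unwind between the call and the copy
// would let the caller observe the early write, so the check must pass.
// A local destination is assumed here to be invisible to the caller.
inline bool throwCheckAllowsCallSlotOpt(const std::vector<ToyInst> &Block,
                                        std::size_t CallIdx,
                                        std::size_t CopyIdx,
                                        bool DestIsLocal) {
  if (DestIsLocal)
    return true;
  return !mayThrowBetween(Block, CallIdx, CopyIdx);
}
```

Placing the scan in one shared function means the load/store and memcpy paths cannot disagree, at the cost of the extra scan mentioned above.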
Differential Revision: https://reviews.llvm.org/D88799
Some of these depended on analyses being present that aren't provided
automatically in NPM.
early_dce_clobbers_callgraph.ll was previously inlining a noinline function?
cast-call-combine.ll relied on the legacy always-inline pass being a
CGSCC pass and getting rerun.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D88187
When we assume a return value is dead we might still visit return
instructions via `Attributor::checkForAllReturnedValuesAndReturnInsts(..)`.
When we do so the "returned value" is potentially simplified to `undef`
as it is the assumed "returned value". This is a problem if there was a
preexisting `noundef` attribute that will only be removed as we manifest
the `undef` return value. We should not use this combination to derive
`unreachable` though. Two test cases fixed.
In AAMemoryBehaviorFloating we used to track benign uses in a SetVector.
With this change we look through benign uses eagerly to reduce the
number of elements (=Uses) we look at during an update.
The test does not actually fail prior to this commit, but I had already
written it, so I kept it.
Regarding this bug I posted earlier: https://bugs.llvm.org/show_bug.cgi?id=47035
After reading through the LLVM source code and getting familiar with VPlan, I was able to vectorize the code by enabling the VPlan native path. After I talked with @fhahn, he suggested that I contribute this as a test case. So here it is. I tried to follow the available guides on how to do this as best I could. I modified the IR by hand to use clearer variable names instead of numbers.
One thing I'd like input on: are the current CHECK lines sufficient to verify that the inner loop has been vectorized properly?
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87564
Drop `noundef` for return values that are replaced by void and make it
illegal to put `noundef` on a void value.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87306
Alignment attributes need to be dropped for non-pointer values.
This also introduces a check into the verifier to ensure you don't use
`align` on anything but a pointer. Test needed to be adjusted
accordingly.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87304
Use context to prove that a load can be safely executed at the point to which it is being hoisted.
Postpone the decision about the safety of speculative load execution until we know
where we hoist the load, and check safety in that context.
Reviewers: nikic, fhahn, mkazantsev, lebedev.ri, efriedma, reames
Reviewed By: reames, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D88725
This reverts commit 20797989ea.
This patch (https://reviews.llvm.org/D69257) cannot complete a stage2
build due to the change:
```
CI->getCalledFunction()->getName().contains("longjmp")
```
There are several concrete issues here:
- The callee may not be a function, so `getCalledFunction` can assert.
- The called value may not have a name, so `getName` can assert.
- There's no distinction made between "my_longjmp_test_helper" and the
actual longjmp libcall.
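
A sketch of a check that avoids the three concrete issues above, written against toy types rather than the LLVM API. `ToyFunction`, `ToyCall` and `callsFunctionNamed` are hypothetical; a null callee models an indirect call and an empty string models an unnamed value:

```cpp
#include <cassert>
#include <string>

// Toy model: an empty Name stands for a value without a name.
struct ToyFunction {
  std::string Name;
};

// Toy model: a null Callee stands for a call that is not to a known function.
struct ToyCall {
  const ToyFunction *Callee;
};

inline bool callsFunctionNamed(const ToyCall &CI, const std::string &Expected) {
  if (!CI.Callee)
    return false; // the callee may not be a function
  if (CI.Callee->Name.empty())
    return false; // the called value may not have a name
  // Exact match, so "my_longjmp_test_helper" is not treated as longjmp.
  return CI.Callee->Name == Expected;
}
```

This only guards against the crashes and the substring false positive; it does not address the layering concern of special-casing names at all.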
At a higher level, there's a serious layering problem here. The
splitting pass makes policy decisions in a general way (e.g. based on
attributes or profile data). Special-casing certain names breaks the
layering. It subverts the work of library maintainers (who may now need
to opt-out of unexpected optimization behavior for any affected
functions) and can lead to inconsistent optimization behavior (as not
all llvm passes special-case ".*longjmp.*" in the same way).
The patch may need significant revision to address these issues.
But the immediate issue is that this crashes while compiling llvm's unit
tests in a stage2 build (due to the `getName` problem).
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)
This canonicalization seems dubious.
Most importantly, while it does not create `inttoptr` casts by itself,
it may cause them to appear later, see e.g. D88788.
I think it's pretty obvious that this is an undesirable outcome:
by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts
are not no-ops, and we are no longer eager to look past them.
This means, for example, that given
```
%a = load i32
%b = inttoptr %a
%c = inttoptr %a
```
we likely won't be able to tell that `%b` and `%c` are the same thing.
As we can see in D88789 / D88788 / D88806 / D75505,
we can't really teach SCEV about this (not without https://bugs.llvm.org/show_bug.cgi?id=47592 at least),
and we can't recover the situation post-inlining in instcombine.
So it really does look like this fold is actively breaking
otherwise-good IR, in a way that is not recoverable.
And that means this fold isn't helping the passes
that are otherwise unaware of the patterns it produces.
Thus, I propose to simply not perform such a canonicalization.
The original motivational RFC does not state what larger problem
that canonicalization was trying to solve, so I'm not sure
how this plays out in the larger picture.
On vanilla llvm test-suite + RawSpeed, this increases the number of
asm instructions and the final object size by ~0.05%, and
decreases the final count of bitcasts by 4.79% (-28990),
of ptrtoint casts by 15.41% (-3423),
and of inttoptr casts by 25.59% (-6919, *sic*).
Overall, there are 0.04% fewer IR blocks and 0.39% fewer instructions.
See https://bugs.llvm.org/show_bug.cgi?id=47592
Differential Revision: https://reviews.llvm.org/D88789
When retrying the "simplify with operand replaced" select
optimization without poison flags, also handle inbounds on GEPs.
Of course, this particular example would also be safe to transform
while keeping inbounds, but the underlying machinery does not
know this (yet).
This reverts commit a3caf7f610.
The ReleaseLTO-g test-suite configuration has been failing
to build since this commit, because clang segfaults while
building 7zip.
Added missing test coverage for the shl(add(and(lshr(x,c1),c2),y),c1) -> add(and(x,c2<<c1),shl(y,c1)) combine
Renamed tests, as 'foo' and 'bar' aren't very extensible
Added vector tests with undefs and nonuniform constants
Use SCEV to salvage additional @llvm.dbg.value intrinsics that end up
referencing undef after the transformation (and after traditional
salvageDebugInfo). Before the transformation, compute the SCEV for each
@llvm.dbg.value in the loop body and store it (alongside its current
DIExpression). After the transformation, update those @llvm.dbg.value
intrinsics now referencing undef by comparing their stored SCEV to the
SCEVs of the current loop-header PHI nodes. Allow a match with an offset
by inserting compensation code in the DIExpression.
Fixes: PR38815
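
The matching step can be illustrated with a toy affine recurrence. This is a sketch under stated assumptions, not the pass's code: `ToyAddRec` models a value equal to `Start + n * Step` on iteration `n`, and `SalvageResult` and `salvageFromPhis` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy affine recurrence: value on iteration n is Start + n * Step.
struct ToyAddRec {
  int64_t Start;
  int64_t Step;
};

struct SalvageResult {
  bool Found;
  std::size_t PhiIndex; // which surviving PHI to refer to
  int64_t Offset;       // compensation: saved == phi + Offset
};

// Finds a surviving PHI with the same step as the saved recurrence.
// The two then differ by a constant on every iteration, which is the
// offset the debug expression would need to add back.
inline SalvageResult salvageFromPhis(const ToyAddRec &Saved,
                                     const std::vector<ToyAddRec> &Phis) {
  for (std::size_t I = 0; I < Phis.size(); ++I)
    if (Phis[I].Step == Saved.Step)
      return {true, I, Saved.Start - Phis[I].Start};
  return {false, 0, 0};
}
```

For example, a saved recurrence starting at 4 with step 8 matches a PHI starting at 0 with step 8 at an offset of 4.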
Differential Revision: https://reviews.llvm.org/D87494
Apparently querying dereferenceability of array allocations is
being intentionally penalized (https://reviews.llvm.org/D41398),
so avoid using them in tests.
The case of a destination read between call and memcpy was not
covered anywhere (but is handled correctly).
However, a potentially throwing call between the call and the
memcpy appears to be miscompiled.
We could try to make SROA pickier about the new type,
and/or prevent InstCombine from creating the original problem (converting load-stores to operate on ints),
and/or make InstCombine recover the situation by cleaning up all that cruft.