Follow up to a6d2a8d6f5. This covers all the public interfaces of the bundle related code. I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class. A follow up change will do the same for the newer assumption query/bundle mechanisms.
performScalarPREInsertion() inserts instructions into blocks that we
need to tell ImplicitControlFlowTracking about, otherwise the ICF cache
may be invalid.
Fixes PR49193.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99909
The key change (4f5e92c) to switch gc.result and gc.relocate to being readnone landed nearly two weeks ago, and we haven't seen any fallout. Time to remove the code added to make reverting easy.
When we are able to SROA an alloca, we know all uses of it, meaning we
don't have to preserve the invariant group intrinsics and metadata.
It's possible that we could lose information regarding redundant
loads/stores, but that's unlikely to have any real impact since right
now the only user is Clang and vtables.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99760
When run under valgrind, or with a malloc that poisons freed memory,
this can lead to segfaults or other problems.
To avoid modifying the AdditionalUsers DenseMap while still iterating,
save the instructions to be notified in a separate SmallPtrSet, and use
this to later call OperandChangedState on each instruction.
Fixes PR49582.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D98602
The safepoints being inserted exists to free memory, or coordinate with another thread to do so. Thus, we must strip any inferred attributes and reinfer them after the lowering.
I'm not aware of any active miscompiles caused by this, but since I'm working on strengthening inference of both and leveraging them in the optimization decisions, I figured a bit of future proofing was warranted.
Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.
Reviewed By: mkazantsev, lebedev.ri
Differential Revision: https://reviews.llvm.org/D88287
This is a patch to fix the bug in alignment calculation (see https://reviews.llvm.org/D90529#2619492).
Consider this code:
```
call void @llvm.assume(i1 true) ["align"(i32* %a, i32 32, i32 28)]
%arrayidx = getelementptr inbounds i32, i32* %a, i64 -1
; aligment of %arrayidx?
```
The llvm.assume guarantees that `%a - 28` is 32-bytes aligned, meaning that `%a` is 32k + 28 for some k.
Therefore `a - 4` cannot be 32-bytes aligned but the existing code was calculating the pointer as 32-bytes aligned.
The reason why this happened is as follows.
`DiffSCEV` stores `%arrayidx - %a` which is -4.
`OffSCEV` stores the offset value of “align”, which is 28.
`DiffSCEV` + `OffSCEV` = 24 should be used for `a - 4`'s offset from 32k, but `DiffSCEV` - `OffSCEV` = 32 was being used instead.
Reviewed By: Tyker
Differential Revision: https://reviews.llvm.org/D98759
The code is assuming that having an exact exit count for the loop implies that exit counts for every exit are known. This used to be true, but when we added handling for dead exits we broke this invariant. The new invariant is that an exact loop count implies that any exits non trivially dead have exit counts.
We could have fixed this by either a) explicitly checking for a dead exit, or b) just testing for SCEVCouldNotCompute. I chose the second as it was simpler.
(Debugging this took longer than it should have since I'd mistyped the original assert and it wasn't checking what it was meant to...)
p.s. Sorry for the lack of test case. Getting things into a state to actually hit this is difficult and fragile. The original repro involves loop-deletion leaving SCEV in a slightly inprecise state which lets us bypass other transforms in IndVarSimplify on the way to this one. All of my attempts to separate it into a standalone test failed.
Removes CFGAnalyses from the preserved analyses set
returned by LoopFlattenPass::run().
Reviewed By: Dave Green, Ta-Wei Tu
Differential Revision: https://reviews.llvm.org/D99700
Name GVN uses name 'LI' for two different unrelated things:
LoadInst and LoopInfo. This patch relates the variables with
former meaning into 'Load' to disambiguate the code.
Before this change, the `llvm.access.group` metadata was dropped
when moving a load instruction in GVN. This prevents vectorizing
a C/C++ loop with `#pragma clang loop vectorize(assume_safety)`.
This change propagates the metadata as well as other metadata if
it is safe (the move-destination basic block and source basic
block belong to the same loop).
Differential Revision: https://reviews.llvm.org/D93503
This fixes the miscompilation reported in https://reviews.llvm.org/rG5bb38e84d3d0#986154 .
`select _, true, false` matches both m_LogicalAnd and m_LogicalOr, making later
transformations confused.
Simplify the branch condition to not have the form.
Unswitching a loop on a non-trivial divergent branch is expensive
since it serializes the execution of both version of the
loop. But identifying a divergent branch needs divergence analysis,
which is a function level analysis.
The legacy pass manager handles this dependency by isolating such a
loop transform and rerunning the required function analyses. This
functionality is currently missing in the new pass manager, and there
is no safe way for the SimpleLoopUnswitch pass to depend on
DivergenceAnalysis. So we conservatively assume that all non-trivial
branches are divergent if the target has divergence.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D98958
This is yet another attempt to fix tightlyNested().
Add checks in tightlyNested() for the inner loop exit block,
such that 1) if there is control-flow divergence in between the inner
loop exit block and the outer loop latch, or 2) if the inner loop exit
block contains unsafe instructions, tightlyNested() returns false.
The reasoning behind is that after interchange, the original inner loop
exit block, which was part of the outer loop, would be put into the new
inner loop, and will be executed different number of times before and
after interchange. Thus it should be dealt with appropriately.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D98263
LICM can sink instructions that have uses inside the loop, as
long as these uses are considered "free". However, if there were
only free uses inside the loop, and no uses outside the loop at
all, the instruction would still count towards the NumSunk
statistic. This resulted in a wild inflation of the NumSunk metric.
After this patch it drops down from 1141787 to 5852 on test-suite O3.
FindAvailableLoadedValue() relies on FindAvailablePtrLoadStore() to run
the alias analysis when searching for an equivalent value. However,
FindAvailablePtrLoadStore() calls the alias analysis framework with a
memory location for the load constructed from an address and a size,
which thus lacks TBAA metadata info. This commit modifies
FindAvailablePtrLoadStore() to accept an optional memory location as
parameter to allow FindAvailableLoadedValue() to create it based on the
load instruction, which would then have TBAA metadata info attached.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99206
This patch changes the interface to take a RegisterKind, to indicate
whether the register bitwidth of a scalar register, fixed-width vector
register, or scalable vector register must be returned.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D98874
The `InductionPHI` is not necessarily the increment instruction, as
demonstrated in pr49571.ll.
This patch removes the assertion and instead bails out from the
`LoopFlatten` pass if that happens.
This fixes https://bugs.llvm.org/show_bug.cgi?id=49571
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D99252
The summary remarks are generated on a per-function basis. Using the
first instruction's location is sub-optimal for 2 reasons:
1. Sometimes the first instruction is missing !dbg
2. The location of the first instruction may be mis-leading.
Instead, just use the location of the function directly.
meetBDVState utility may sets the base pointer for the conflict state.
At this moment the base for conflict state does not have any meaning but
is used in comparison of BDV states. This comparison is used as an indicator
of progress done on iteration and RS4GC pass uses infinite loop to reach
fixed point.
As a result for added test on each iteration state for some phi nodes is updated
with other base value for conflict state and it indicates as a progress while
for conflict state there is no any progress more possible.
In reality the base value is transferred from one state to another and pass
detects the progress on these states.
The test is very fragile. The traversal order of states and operands of phi nodes
plays important role.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D99058
This is no-functional-change intended (NFC), but needed to allow
optimizer passes to use the API. See D98898 for a proposed usage
by SimplifyCFG.
I'm simplifying the code by removing the cl::opt. That was added
back with the original commit in D19488, but I don't see any
evidence in regression tests that it was used. Target-specific
overrides can use the usual patterns to adjust as necessary.
We could also restore that cl::opt, but it was not clear to me
exactly how to do it in the convoluted TTI class structure.
08196e0b2e exposed LowerExpectIntrinsic's
internal implementation detail in the form of
LikelyBranchWeight/UnlikelyBranchWeight options to the outside.
While this isn't incorrect from the results viewpoint,
this is suboptimal from the layering viewpoint,
and causes confusion - should transforms also use those weights,
or should they use something else, D98898?
So go back to status quo by making LikelyBranchWeight/UnlikelyBranchWeight
internal again, and fixing all the code that used it directly,
which currently is only clang codegen, thankfully,
to emit proper @llvm.expect intrinsics instead.
Upon reviewing D98898 i've come to realization that these are
implementation detail of LowerExpectIntrinsicPass,
and they should not be exposed to outside of it.
This reverts commit ee8b53815d.
This makes the settings available for use in other passes by housing
them within the Support lib, but NFC otherwise.
See D98898 for the proposed usage in SimplifyCFG
(where this change was originally included).
Differential Revision: https://reviews.llvm.org/D98945
All loop passes should preserve all analyses in LoopAnalysisResults. Add
checks for those when the checks are enabled (which is by default with
expensive checks on).
Note that due to PR44815, we don't check LAR's ScalarEvolution.
Apparently calling SE.verify() can change its results.
This is a reland of https://reviews.llvm.org/D98820 which was reverted
due to unacceptably large compile time regressions in normal debug
builds.
All loop passes should preserve all analyses in LoopAnalysisResults. Add
checks for those.
Note that due to PR44815, we don't check LAR's ScalarEvolution.
Apparently calling SE.verify() can change its results.
Only verify MSSA when VerifyMemorySSA, normally it's very expensive.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D98820
All loop passes should preserve all analyses in LoopAnalysisResults. Add
checks for those.
Note that due to PR44815, we don't check LAR's ScalarEvolution.
Apparently calling SE.verify() can change its results.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D98805
It is not legal to form a phi node with token type. The generic LCSSA construction code handles this correctly - by not forming LCSSA for such cases - but the adhoc fixup implementation in LICM did not.
This was noticed in the context of PR49607, but can be demonstrated on ToT with the tweaked test case. This is not specific to gc.relocate btw, it also applies to usage of the preallocated family of intrinsics as well.
Differential Revision: https://reviews.llvm.org/D98728
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.
This reverts commit 01ac6d1587.
This caused non-deterministic compiler output; see comment on the
code review.
> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232
This reverts commit df69c69427.
Previously we created a new node, then filled in the pieces. Now, we clone the existing node, then change the respective fields. The only change in handling is with phis since we have to handle multiple incoming edges from the same block a bit differently.
Differential Revision: https://reviews.llvm.org/D98316
A broadcast is a shufflevector where only one input is used. Because of the way we handle constants (undef is a constant), the canonical shuffle sees a meet of (some value) and (nullptr). Given this, every broadcast gets treated as a conflict and a new base pointer computation is added.
The other way to tackle this would be to change constant handling specifically for undefs, but this seems easier.
Differential Revision: https://reviews.llvm.org/D98315
RS4GC needs to rewrite the IR to ensure that every relocated pointer has an associated base pointer. The existing code isn't particularly smart about avoiding duplication of existing IR when it turns out the original pointer we were asked to materialize a base pointer for is itself a base pointer.
This patch adds a stage to the algorithm which prunes nodes proven (with a simple forward dataflow fixed point) to be base pointers from the list of nodes considered for duplication. This does require changing some of the later invariants slightly, that's probably the riskiest part of the change.
Differential Revision: D98122
Add MemorySSAWrapperPass as a dependency to MemCpyOptLegacyPass,
since MemCpyOpt now uses MemorySSA by default.
Differential Revision: https://reviews.llvm.org/D98484
This was (partially) reverted in cfe8f8e0 because the conversion from readonly to readnone in Intrinsics.td exposed a couple of problems. This change has been reworked to not need that change (via some explicit checks in client code). This is being done to address the original optimization issue and simplify the testing of the readonly changes. I'm working on that piece under 49607.
Original commit message follows:
The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoints multiple return values.)
We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particular useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass.
Differential Revision: https://reviews.llvm.org/D97974
This fixes a regression from the MemDep-based implementation:
MemDep completely ignores lifetime.start intrinsics that aren't
MustAlias -- this is probably unsound, but it does mean that the
MemDep based implementation successfully eliminated memcpy's from
lifetime.start if the memcpy happens at an offset, rather than
the base address of the alloca.
Add a special case for the case where the lifetime.start spans the
whole alloca (which is pretty much the only kind of lifetime.start
that frontends ever emit), as we don't need to figure out our exact
aliasing relationship in that case, the whole alloca is dead prior
to the call.
If this doesn't cover all practically relevant cases, then it
would be possible to make use of the recently added PartialAlias
clobber offsets to make this more precise.
The added test case crashes before this fix:
```
opt: /repositories/llvm-project/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp:5172: BasicBlock::iterator (anonymous namespace)::LSRInstance::AdjustInsertPositionForExpand(BasicBlock::iterator, const (anonymous namespace)::LSRFixup &, const (anonymous namespace)::LSRUse &, llvm::SCEVExpander &) const: Assertion `!isa<PHINode>(LowestIP) && !LowestIP->isEHPad() && !isa<DbgInfoIntrinsic>(LowestIP) && "Insertion point must be a normal instruction"' failed.
```
This is fully analogous to the previous commit,
with the pointer constant replaced to be something non-null.
The comparison here can be strength-reduced,
but the second operand of the comparison happens to be identical
to the constant pointer in the `catch` case of `landingpad`.
While LSRInstance::CollectLoopInvariantFixupsAndFormulae()
already gave up on uses in blocks ending up with EH pads,
it didn't consider this case.
Eventually, `LSRInstance::AdjustInsertPositionForExpand()`
will be called, but the original insertion point it will get
is the user instruction itself, and it doesn't want to
deal with EH pads, and asserts as much.
It would seem that this basically never happens in-the-wild,
otherwise it would have been reported already,
so it seems safe to take the cautious approach,
and just not deal with such users.
If a memset destination is overwritten by a memcpy and the sizes
are exactly the same, then the memset is simply dead. We can
directly drop it, instead of replacing it with a memset of zero
size, which is particularly ugly for the case of a dynamic size.
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.
There are a still a number of tricky uses left, as well as many
more variation APIs for CreateGEP.
As readnone function they become movable and LICM can hoist them
out of a loop. As a result in LCSSA form phi node of type token
is created. No one is ready that GCRelocate first operand is phi node
but expects to be token.
GVN test were also updated, it seems it does not do what is expected.
Test for LICM is also added.
This reverts commit f352463ade.
Relative to the previous implementation, this always uses
aliasesUnknownInst() instead of aliasesPointer() to correctly
handle atomics. The added test case was previously miscompiled.
-----
Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.
The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.
If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.
This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.
This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.
Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.
Differential Revision: https://reviews.llvm.org/D89264
The isOverwrite function is making sure to identify if two stores
are fully overlapping and ideally we would like to identify all the
instances of OW_Complete as they'll yield possibly killable stores.
The current implementation is incapable of spotting instances where
the earlier store is offsetted compared to the later store, but
still fully overlapped. The limitation seems to lie on the
computation of the base pointers with the
GetPointerBaseWithConstantOffset API that often yields different
base pointers even if the stores are guaranteed to partially overlap
(e.g. the alias analysis is returning AliasResult::PartialAlias).
The patch relies on the offsets computed and cached by BatchAAResults
(available after D93529) to determine if the offsetted overlapping
is OW_Complete.
Differential Revision: https://reviews.llvm.org/D97676
Currently DSE misses cases where the size is a non-const IR value, even
if they match. For example, this means that llvm.memcpy/llvm.memset
calls are not eliminated, even if they write the same number of bytes.
This patch extends isOverwite to try to get IR values for the number of
bytes written from the analyzed instructions. If the values match,
alias checks are performed and the result is returned.
At the moment this only covers llvm.memcpy/llvm.memset. In the future,
we may enable MemoryLocation to also track variable sizes, but this
simple approach should allow us to cover the important cases in DSE.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D98284
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.
Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.
Differential Revision: https://reviews.llvm.org/D88232
Revert 3d8f842712
Revision triggers a miscompile sinking a store incorrectly outside a
threading loop. Detected by tsan.
Reverting while investigating.
Differential Revision: https://reviews.llvm.org/D89264
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.
Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.
The check `tightlyNested()` in `LoopInterchange` is similar to the one in `LoopNest`.
In fact, the former misses some cases where loop-interchange is not feasible and results in incorrect behaviour.
Replacing it with the much robust version provided by `LoopNest` reduces code duplications and fixes https://bugs.llvm.org/show_bug.cgi?id=48113.
`LoopInterchange` has a weaker definition of tightly or perfectly nesting-ness than the one implemented in `LoopNest::arePerfectlyNested()`.
Therefore, `tightlyNested()` is instead implemented with `LoopNest::checkLoopsStructure` and additional checks for unsafe instructions.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D97290
The MemorySSA-based implementation has been enabled without issue
for a while now, so keeping the old implementation around doesn't
seem useful anymore. This drops the MemDep-based implementation.
Differential Revision: https://reviews.llvm.org/D97877
Hello all,
I'm trying to fix unsafe propagation of poison values in and/or conditions by using
equivalent select forms (`select i1 A, i1 B, i1 false` and `select i1 A, i1 true, i1 false`)
instead.
D93065 has links to patches for this.
This patch allows unswitch to happen if the condition is in this form as well.
`collectHomogenousInstGraphLoopInvariants` is updated to keep traversal if
Root and the visiting I matches both m_LogicalOr()/m_LogicalAnd().
Other than this, the remaining changes are almost straightforward and simply replaces
Instruction::And/Or check with match(m_LogicalOr()/m_LogicalAnd()).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D97756
When materializing an available load value, do not explicitly
materialize the undef values from dead blocks. Doing so will
will force creation of a phi with an undef operand, even if there
is a dominating definition. The phi will be folded away on
subsequent GVN iterations, but by then we may have already
poisoned MDA cache slots.
Simply don't register these values in the first place, and let
SSAUpdater do its thing.
GVN basically doesn't handle phi nodes at all. This is for a reason - we can't value number their inputs since the predecessor blocks have probably not been visited yet.
However, it also creates a significant pass ordering problem. As it stands, instcombine and simplifycfg ends up implementing CSE of phi nodes. This means that for any series of CSE opportunities intermixed with phi nodes, we end up having to alternate instcombine/simplifycfg and gvn to make progress.
This patch handles the simplest case by simply preprocessing the phi instructions in a block, and CSEing them if they are syntactically identical. This turns out to be powerful enough to handle many cases in a single invocation of GVN since blocks which use the cse'd phi results are visited after the block containing the phi. If there's a CSE opportunity in one the phi predecessors required to recognize the phi CSE opportunity, that will require a second iteration on the function. (Still within a single run of gvn though.)
Compile time wise, this could go either way. On one hand, we're potentially causing GVN to iterate over the function more. On the other, we're cutting down on iterations between two passes and potentially shrinking the IR aggressively. So, a bit unclear what to expect.
Note that this does still rely on instcombine to canonicalize block order of the phis, but that's a one time transformation independent of the values incoming to the phi.
Differential Revision: https://reviews.llvm.org/D98080
The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoints multiple return values.)
We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particular useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass.
Differential Revision: https://reviews.llvm.org/D97974
(Note: Part of the reviewed change was split and landed as f352463a)
For some reason, we had been marking gc.relocates as reading memory. There's no known reason for this, and I suspect it to be a legacy of very early implementation conservatism. gc.relocate and gc.result are simply projections of the return values from the associated statepoint. Note that the LangRef has always declared them readnone.
The EarlyCSE change is simply moving the special casing from readonly to readnone handling.
As noted by the test diffs, this does allow some additional CSE when relocates are separated by stores, but since we generate gc.relocates in batches, this is unlikely to help anything in practice.
This was reviewed as part of https://reviews.llvm.org/D97974, but split at reviewer request before landing. The motivation is to enable the GVN changes in that patch.
If we have a value live over a call which is used for deopt at the call, we know that the value must be a base pointer. We can avoid potentially inserting IR to materialize a base for this value.
In it's current form, this is mostly a compile time optimization. Building the base pointer graph (and then optimizing it away again) is a relatively expensive operation. We also sometimes end up with better codegen in practice - due to failures in optimizing away the inserted base pointer propogation - but those are optimization bugs we're fixing concurrently.
The alternative to this would be to extend the base pointer inference with the ability to generally reuse multiple-base input instructions (phis and selects). That's somewhat invasive and complicated, so we're defering it a bit longer.
Differential Revision: https://reviews.llvm.org/D97885
explicitly emitting retainRV or claimRV calls in the IR
This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.
Original commit message:
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
Same dangling probes are redundant since they all have the same semantic that is to rely on the counts inference tool to get reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading created tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes though without an observed impact so far.
This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added.
An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed.
Differential Revision: https://reviews.llvm.org/D97482
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.
The optimizations being unblocked are:
1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.
Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D97481
This is an attempt to improve handling of partial overlaps in case of unaligned begin\end.
Existing implementation just bails out if it encounters such cases. Even when it doesn't I believe existing code checking alignment constraints is not quite correct. It tries to ensure alignment of the "later" start/end offset while should be preserving relative alignment between earlier and later start/end.
The idea behind the change is simple. When start/end is not aligned as we wish instead of bailing out let's adjust it as necessary to get desired alignment.
I'll update with performance results as measured by the test-suite...it's still running...
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D93530
See pr46990(https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks which cross coro.suspend intrinsics. This breaks semantic of coro.suspend intrinsic which return to caller directly. Also this leads to use-after-free if the coroutine is freed before control returns to the caller in multithread environment.
This patch disable promotion by check whether loop contains coro.suspend intrinsics.
This is a resubmit of D86190.
Disabling LICM for loops with coroutine suspension is a better option not only for correctness purpose but also for performance purpose.
In most cases LICM sinks memory operations. In the case of coroutine, sinking memory operation out of the loop does not improve performance since coroutien needs to get data from the frame anyway. In fact LICM would hurt coroutine performance since it adds more entries to the frame.
Differential Revision: https://reviews.llvm.org/D96928
Probably should have done this before landing, but I forgot.
Basic idea is to avoid using the SCEV predicate when it doesn't buy us anything. Also happens to set us up for handling non-add recurrences in the future if desired.
LSR goes to some lengths to schedule IV increments such that %iv and %iv.next never need to overlap. This is fairly fundamental to LSRs cost model. LSR assumes that an addrec can be represented with a single register. If %iv and %iv.next have to overlap, then that assumption does not hold.
The bug - which this patch is fixing - is that LSR only does this scheduling for IVs which it inserts, but it's cost model assumes the same for existing IVs that it reuses. It will rewrite existing IV users such that the no-overlap property holds, but will not actually reschedule said IV increment.
As you can see from the relatively lack of test updates, this doesn't actually impact codegen much. The main reason for doing it is to make a follow up patch series which improves post-increment use and scheduling easier to follow.
Differential Revision: https://reviews.llvm.org/D97219
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.
> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
> which indicates the call is implicitly followed by a marker
> instruction and an implicit retainRV/claimRV call that consumes the
> call result. In addition, it emits a call to
> @llvm.objc.clang.arc.noop.use, which consumes the call result, to
> prevent the middle-end passes from changing the return type of the
> called function. This is currently done only when the target is arm64
> and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
> with the operand bundle in the IR and removes the inserted calls after
> processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
> operand bundle. It doesn't remove the operand bundle on the call since
> the backend needs it to emit the marker instruction. The retainRV and
> claimRV calls are emitted late in the pipeline to prevent optimization
> passes from transforming the IR in a way that makes it harder for the
> ARC middle-end passes to figure out the def-use relationship between
> the call and the retainRV/claimRV calls (which is the cause of
> PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
> nothing in the callee prevents it from being paired up with the
> retainRV/claimRV call in the caller. It then inserts a release call if
> claimRV is attached to the call since autoreleaseRV+claimRV is
> equivalent to a release. If it cannot find an autoreleaseRV call, it
> tries to transfer the operand bundle to a function call in the callee.
> This is important since the ARC optimizer can remove the autoreleaseRV
> returning the callee result, which makes it impossible to pair it up
> with the retainRV/claimRV call in the caller. If that fails, it simply
> emits a retain call in the IR if retainRV is attached to the call and
> does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
> constant value if the call has the operand bundle. This ensures the
> call always has at least one user (the call to
> @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
> multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
> calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808
This reverts commit ed4718eccb.
Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.
The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.
If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.
This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.
This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.
Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.
Differential Revision: https://reviews.llvm.org/D89264
This now analyzes calls to both intrinsics and functions.
For intrinsics, grab the ones we know and care about (mem* family) and
analyze the arguments.
For calls, use TLI to get more information about the libcalls, then
analyze the arguments if known.
```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes. [-Rpass-missed=annotation-remarks]
int var[1024];
^
```
Differential Revision: https://reviews.llvm.org/D97489
This adds support for analyzing the instruction with the !annotation
"auto-init" in order to generate a more user-friendly remark.
For now, support the store size, and whether it's atomic/volatile.
Example:
```
auto-init.c:4:7: remark: Store inserted by -ftrivial-auto-var-init.Store size: 4 bytes. [-Rpass-missed=annotation-remarks]
int var;
^
```
Differential Revision: https://reviews.llvm.org/D97412
Using the !annotation metadata, emit remarks pointing to code added by
`-ftrivial-auto-var-init` that survived the optimizer.
Example:
```
auto-init.c:4:7: remark: Initialization inserted by -ftrivial-auto-var-init. [-Rpass-missed=annotation-remarks]
int buf[1024];
^
```
The tests are testing various situations like calls/stores/other
instructions, with debug locations, and extra debug information on
purpose: more patches will come to improve the reporting to make it more
user-friendly, and these tests will show how the reporting evolves.
Differential Revision: https://reviews.llvm.org/D97405
Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D88287
The **IsGuaranteedLoopInvariant** function is making sure to check if the
incoming pointer is guaranteed to be loop invariant, therefore I think
the case where the pointer is defined in the entry block of a function
automatically guarantees the pointer to be loop invariant, as the entry
block of a function cannot have predecessors or be part of a loop.
I implemented this small patch and tested it using
**ninja check-llvm-unit** and **ninja check-llvm**. I added a contained test
file that shows the problem and used **opt -O3 -debug** on it to make sure
the case is not currently handled (in fact the debug log is showing that
the DSE pass is bailing out when testing if the killer store is able to
clobber the dead store).
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D96979
ICMP_NE predicates cannot be directly represented as constraint. But we
can use ICMP_UGT instead ICMP_NE for %x != 0.
See https://alive2.llvm.org/ce/z/XlLCsW
If the call is readnone, then there may not be any MemoryAccess
associated with the call. Bail out in that case.
This fixes the issue reported at
https://reviews.llvm.org/D94376#2578312.
When cloning instructions during jump threading, also clone and
adapt any declared scopes. This is primarily important when
threading loop exits, because we'll end up with two dominating
scope declarations in that case (at least after additional loop
rotation). This addresses a loose thread from
https://reviews.llvm.org/rG2556b413a7b8#975012.
Differential Revision: https://reviews.llvm.org/D97154
This enables use of MemorySSA instead of MemDep in MemCpyOpt. To
allow this without significant compile-time impact, the MemCpyOpt
pass is moved directly before DSE (in the cases where this was not
already the case), which allows us to reuse the existing MemorySSA
analysis.
Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt
can also perform simple optimizations across basic blocks.
Differential Revision: https://reviews.llvm.org/D94376
In both ADCE and BDCE (via DemandedBits) we should not remove
instructions that are not guaranteed to return. This issue was
pointed out by fhahn in the recent llvm-dev thread.
Differential Revision: https://reviews.llvm.org/D96993
This fixes https://bugs.llvm.org/show_bug.cgi?id=49185
When `NDEBUG` is not set, `LPMUpdater` checks if the added loops have the same parent loop as the current one in `addSiblingLoops`.
If multiple loop passes are executed through `LoopPassManager`, `U.ParentL` will be the same across all passes.
However, the parent loop might change after running a loop pass, resulting in assertion failures in subsequent passes.
This patch resets `U.ParentL` after running individual loop passes in `LoopPassManager`.
Reviewed By: asbirlea, ychen
Differential Revision: https://reviews.llvm.org/D96727
SROA does not correctly account for offsets in TBAA/TBAA struct metadata.
This patch creates functionality for generating new MD with the corresponding
offset and updates SROA to use this functionality.
Differential Revision: https://reviews.llvm.org/D95826
This is the preliminary patch of converting `LoopInterchange` pass to a loop-nest pass and has no intended functional change.
Changes that are not loop-nest related are split to D96650.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D96644
This adds a new flag -lsr-preferred-addressing-mode to override the target's
preferred addressing mode. It replaces flag -lsr-backedge-indexing, which is
equivalent to preindexed addressing that is one of the options that
-lsr-preferred-addressing-mode accepts.
Differential Revision: https://reviews.llvm.org/D96855
This is a follow up D96600 and cleans up most calls to
getPreferredAddresingMode. I.e., we really don't need to query the same things
again and again, but get the preferred addressing mode once for each loop. So
this should be a lot friendlier for compile times, especially if we start
implementing getPreferredAddresingMode.
Differential Revision: https://reviews.llvm.org/D96772
This is a split patch of D96644.
Explicitly pass both `InnerLoop` and `OuterLoop` to function `processLoop` to remove the need to swap elements in loop list and allow making loop list an `ArrayRef`.
Also, fix inconsistent spellings of `OuterLoopId` and `Inner Loop Id` in debug log.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D96650
Loop canonicalization may end up deleting blocks from CFG. And
Scalar Evolution may still keep cached referenced to those blocks
unless updated properly.
This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into
getPreferredAddressingMode() so that we have one interface to steer LSR in
generating the preferred addressing mode.
Differential Revision: https://reviews.llvm.org/D96600
explicitly emitting retainRV or claimRV calls in the IR
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
This reverts commit b7d870eae7 and the
subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)"
(commit e6810cab09).
It caused indeterminism in the output, such that e.g. the
polly-x86_64-linux buildbot failed accasionally.
This patch improves the index management during constraint building.
Previously, the code rejected constraints which used values that were not
part of Value2Index, but after combining the coefficients of the new
indices were 0 (if ShouldAdd was 0).
In those cases, no new indices need to be added. Instead of adding to
Value2Index directly, add new indices to the NewIndices map. The caller
can then check if it needs to add any new indices.
This enables checking constraints like `a + x <= a + n` to `x <= n`,
even if there is no constraint for `a` directly.
As discussed in:
https://llvm.org/PR49055
We invert instcombine's add->or transform here
because it makes it easier to identify factorization
transforms like the mul in the motivating test.
This extends the logic added with:
https://reviews.llvm.org/rG70472f3https://reviews.llvm.org/rG93f3d7f
(I intentionally kept the formatting fix in this patch
to provide more context about the calling logic.)
PR49043 exposed a problem when it comes to RAUW llvm.assumes. While
D96106 would fix it for GVNSink, it seems a more general concern. To
avoid future problems this patch moves away from the vector of weak
reference model used in the assumption cache. Instead, we track the
llvm.assume calls with a callback handle which will remove itself from
the cache if the call is deleted.
Fixes PR49043.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D96168
emitting retainRV or claimRV calls in the IR
This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.
Original commit message:
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
emitting retainRV or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
This patch extends the condition collection logic to allow adding
conditions from pre-headers to loop headers, by allowing cases where the
target block dominates some of its predecessors.