This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by linear (non-CFG
aware) nature of the unwind tables.
Unlike the `CFIInstrInserer` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustement and save locations of SVE
registers.
This pass takes advantage of the constraints that LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):
* there is a single basic block, containing the function prologue
* possibly multiple epilogue blocks, where each epilogue block is
complete and self-contained, i.e. CSR restore instructions (and the
corresponding CFI instructions are not split across two or more
blocks.
* prologue and epilogue blocks are outside of any loops
Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:
- "has a call frame", if the function has executed the prologue, or
has not executed any epilogue
- "does not have a call frame", if the function has not executed the
prologue, or has executed an epilogue
These properties can be computed for each basic block by a single RPO
traversal.
In order to accommodate backends which do not generate unwind info in
epilogues we compute an additional property "strong no call frame on
entry" which is set for the entry point of the function and for every
block reachable from the entry along a path that does not execute the
prologue. If this property holds, it takes precedence over the "has a
call frame" property.
From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.
Where these states differ, we insert compensating CFI instructions,
which come in two flavours:
- CFI instructions, which reset the unwind table state to the
initial one. This is done by a target specific hook and is
expected to be trivial to implement, for example it could be:
```
.cfi_def_cfa <sp>, 0
.cfi_same_value <rN>
.cfi_same_value <rN-1>
...
```
where `<rN>` are the callee-saved registers.
- CFI instructions, which reset the unwind table state to the one
created by the function prologue. These are the sequence:
```
.cfi_restore_state
.cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.
Reviewed By: MaskRay, danielkiss, chill
Differential Revision: https://reviews.llvm.org/D114545
The reason why I am making this change is that before this commit,
EmitFuncArgumentDbgValue relied on a boolean flag IsDbgDeclare both to signal
that a DBG_VALUE should be made to be indirect /and/ that the original intrinsic
was a dbg.declare. This is no longer always true if we add support for handling
dbg.addr since we will have an indirect DBG_VALUE that is a different intrinsic
from dbg.declare.
With that in mind, in this NFC patch, we prepare for future fixes by introducing
a 3 case-enum argument to EmitFuncArgumentDbgValue that allows the caller to
explicitly specify how the argument's DBG_VALUE should be emitted. This then
allows us to turn the indirect checks into a != FuncArgumentDbgValueKind::Value
and prepare us for a future where we add support here for llvm.dbg.addr
directly.
rdar://83957028
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D122945
If we expand (uaddo X, 1) we previously expanded the overflow calculation
as (X + 1) <u X. This potentially increases the live range of X and
can prevent X+1 from reusing the register that previously held X.
Since we're adding 1, overflow only occurs if X was UINT_MAX in which
case (X+1) would be 0. So this patch adds a special case to expand
the overflow calculation to (X+1) == 0.
This seems to help with uaddo intrinsics that get introduced by
CodeGenPrepare after LSR. Alternatively, we could block the uaddo
transform in CodeGenPrepare for this case.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122933
D122053 set the ExtendType for ConstantSDNodes in getCopyToRegs to
ZERO_EXTEND to match assumptions in ComputePHILiveOutRegInfo. PHIs
are probably not the only way ConstantSDNodeNodes can get to
getCopyToRegs.
This patch adds an ExtendType parameter to CopyValueToVirtualRegister and
has HandlePHINodesInSuccessorBlocks pass ISD::ZERO_EXTEND for ConstantInts.
This way we only affect ConstantSDNodes for PHIs.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D122171
This is an extension of D70965 to avoid creating a mathlib
call where it did not exist in the original source. Also see
D70852 for discussion about an alternative proposal that was
abandoned.
In the motivating bug report:
https://github.com/llvm/llvm-project/issues/54554
...we also have a more general issue about handling "no-builtin" options.
Differential Revision: https://reviews.llvm.org/D122610
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C
This is the backend version of D122010 and an alternative
suggested in D120648.
There's an extra check to make sure the shift amount is
valid that was not in the rough draft.
I'm not sure if there is a larger motivating case for RISCV (bug report?),
but the ARM diffs show a benefit from having a late version of the
transform (because we do not combine the loads in IR).
Differential Revision: https://reviews.llvm.org/D122655
This patch fixes a (seemingly very rare) crash during vector constant
folding introduced in D113300.
Normally, during legalization, if we create an illegally-typed node during
a failed attempt at constant folding it's cleaned up before being
visited, due to it having no uses.
If, however, an illegally-typed node is created during one round of
legalization and isn't cleaned up, it's possible for a second round of
legalization to create new illegally-typed nodes which add extra uses to
the old illegal nodes. This means that we can end up visiting the old
nodes before they're known to be dead, at which point we crash.
I'm not happy about this fix. Creating illegal types at all seems like a
bad idea, but we all-too-often rely on illegal constants being
successfully folded and being fixed up afterwards. However, we can't
rely on constant folding actually happening, and we don't have a
foolproof way of peering into the future.
Perhaps the correct fix is to revisit the node-iteration order during
legalization, ensuring we visit all uses of nodes before the nodes
themselves. Or alternatively we could try and clean up dead nodes
immediately after failing constant folding.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D122382
Modified DAGCombiner to pass the shift the bittest input and the shift amount
to hasBitTest. This matches the other call to hasBitTest in TargetLowering.h
This is an alternative to D122454.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D122458
We could only do this in limited ways (since we emit the TUs first, we
can't use ref_addr (& we can't use that in Split DWARF either) - so we
had to synthesize declarations into the TUs) and they were ambiguous in
some cases (if the CU type had internal linkage, parsing the TU would
require knowing which CU was referencing the TU to know which type the
declaration was for, which seems not-ideal). So to avoid all that, let's
just not reference types defined in the CU from TUs - instead moving the
TU type into the CU (recursively).
This does increase debug info size (by pulling more things out of type
units, into the compile unit) - about 2% of uncompressed dwp file size
for clang -O0 -g -gsplit-dwarf. (5% .debug_info.dwo section size
increase in the .dwp)
There is a case when a function has pseudo probe intrinsics but the module it resides does not have the probe desc. This could happen when the current module is not built with `-fpseudo-probe-for-profiling` while a function in it calls some other function from a probed module. In thinLTO mode, the callee function could be imported and inlined into the current function.
While this is undefined behavior, I'm fixing the asm printer to not ICE and warn user about this.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D121737
The alignment needs to be part of the folding set hash. This is
handled by getAssertAlign when nodes are created, but needs to repeated here.
No test case as I found it as part of a very early experimental patch.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D122279
Non-static class members declared under #ifndef NDEBUG should be declared
under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to make headers library-friendly and
allow cross-linking, as discussed in D120714.
Differential Revision: https://reviews.llvm.org/D121549
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121736
Instead of using operator[], use DenseMap::find to prevent default
constructing an entry if it isn't already in the map.
Also simplify a condition to check for 0 instead of a virtual register.
I'm pretty sure we can only get 0 or a virtual register out of the value
map.
Sinking must check for interference between the block prologue
and the instruction being sunk.
Specifically check for clobbering of uses by the prologue, and
overwrites to prologue defined registers by the sunk instruction.
Reviewed By: rampitec, ruiling
Differential Revision: https://reviews.llvm.org/D121277
This is only called for instructions and the caller is already holding
an Instruction *. This makes the code more explicit and makes it
obvious the code doesn't make decisions about constants.
Change the implementation of isForwardableRegClassCopy so that it
does not rely on getMinimalPhysRegClass. Instead, iterate over all
classes looking for any that satisfy a required property.
NFCI on current upstream targets, but this copes better with
downstream AMDGPU changes where some new smaller classes have been
introduced, which was breaking regclass equality tests in the old
code like:
if (UseDstRC != CrossCopyRC && CopyDstRC == CrossCopyRC)
Differential Revision: https://reviews.llvm.org/D121903
Trying to reduce the number of masked loads in favour of more unpklo/hi
instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported to extensions
from legal types.
Both of normal and masked loads test cases added to guard compile crash.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D120953
Add shl/srl/sra to the list of ops that we canonicalize with a select to expose an identity merge
Differential Revision: https://reviews.llvm.org/D122070
ComputePHILiveOutRegInfo assumes that constant incoming values to
Phis will be zero extended if they aren't a legal type. To guarantee
that we should zero_extend rather than any_extend constants.
This fixes a bug for RISCV where any_extend of constants can be
treated as a sign_extend.
Differential Revision: https://reviews.llvm.org/D122053
Fixes llvm-clang-x86_64-expensive-checks-debian failure with 2f497ec3.
expandAtomicStore always modifies the function, so make sure we set
MadeChange unconditionally. Not sure how nobody else has stumbled over
this before.
When generating a all-one mask value whose bitwidth is larger than 64, signed extension should be used rather then zero extension.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D120865
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121736
This reverts commit c46aab01c0.
This evidently blocks compiling in some cases that used to work
before. I'm also not fully convinced this is the correct place to fix
this problem.
This patch adjusts what location is picked for a known variable value --
preferring to leave locations on the stack, even when a value is re-loaded
into a register. The benefit is reduced location list entropy, on a
clang-3.4 build I found that .debug_loclists reduces in size by 6%, from
29Mb down to 27Mb.
Testing: a few tests need the stack slot to be written to explicitly, to
force LiveDebugValues into restoring the variable location to a register.
I've added an explicit test for the desired behaviour in
livedebugvalues_recover_clobbers.mir .
Differential Revision: https://reviews.llvm.org/D120732
This fixes a reported bug that caused an infinite loop during the
SelectionDAG optimization phase in ISel, by creating an overridable hook
in `TargetLowering` that allows us to bail out from running
`SimplifyDemandedVectorElts`.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D121869