This is similar to D87251, but for CopyFromRegs nodes.
Even for local statepoint uses we generate CopyToRegs/CopyFromRegs
nodes. When generating CopyFromRegs in visitGCRelocate, we must chain
to current DAG root, not EntryNode, to ensure proper ordering of copy
w.r.t. statepoint node producing result for it.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D88639
Marks constants of an ICmp instruction as free if it's only user is a select
instruction that is part of a min(max()) pattern. Ensures that in loops, in
particular when loop unrolling is turned on, SSAT will still be correctly generated.
Differential Revision: https://reviews.llvm.org/D88662
v128.const was recently implemented in V8, but until it rolls into Chrome
stable, we can't enable it in the WebAssembly backend without breaking origin
trial users. So far we have been lowering build_vectors that would otherwise
have been lowered to v128.const to splats followed by sequences of replace_lane
instructions to initialize each lane individually. That produces large and
inefficient code, so this patch introduces new logic to lower integer vector
constants to a single i64x2.splat where possible, with at most a single
i64x2.replace_lane following it if necessary.
Adapted from a patch authored by @omnisip.
Differential Revision: https://reviews.llvm.org/D88591
For these cases, we already omit the prologue directives, if
(!AFI->hasStackFrame() && !windowsRequiresStackProbe && !NumBytes).
When writing the epilogue (after the prolog has been written), if
the function doesn't have the WinCFI flag set (i.e. if no prologue
was generated), assume that no epilogue will be needed either,
and don't emit any epilog start pseudo instruction. After completing
the epilogue, make sure that it actually matched the prologue.
Previously, when epilogue start/end was generated, but no prologue,
the unwind info for such functions actually was huge; 12 bytes xdata
(4 bytes header, 4 bytes for one non-folded epilogue header, 4 bytes
for padded opcodes) and 8 bytes pdata. Because the epilog consisted of
one opcode (end) but the prolog was empty (no .seh_endprologue), the
epilogue couldn't be folded into the prologue, and thus couldn't be
considered for packed form either.
On a 6.5 MB DLL with 110 KB pdata and 166 KB xdata, this gets rid of
38 KB pdata and 62 KB xdata.
Differential Revision: https://reviews.llvm.org/D88641
Summary: How we copying the CRRC to GRC is using a single MFOCRF to copy the contents of CR field n (CR bits 4×n+32:4×n+35) into bits 4×n+32:4×n+35 of register GRC. That’s not correct because we expect the value of destination register equals to source so we have to put the the contents of CR field in the lowest 4 bits. This patch adds a RLWINM after MFOCRF to achieve that.
The problem came up when adding builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp, as posted in D88278. We need to move the outputs (in CR register) to GRC. However outputs of these instructions may not in a fixed CR# register, so we can’t directly add a rotation instruction in the .td patterns, but need to wait until the CR register is determined. Then we confirmed this should be a bug in POST-RA PSEUDO PASS.
Reviewed By: nemanjai, shchenz
Differential Revision: https://reviews.llvm.org/D88274
Summary:
Some design decision worth noting about:
I've noticed a recent mailing discussing about why string literal is
not affected by -fdata-sections for ELF target:
http://lists.llvm.org/pipermail/llvm-dev/2020-September/145121.html
But on AIX, our linker could not split the mergeable string like other target.
So I think it would make more sense for us to emit separate csect for
every mergeable string in -fdata-sections mode,
as there might not be other ways for linker to do garbage collection
on unused mergeable string.
Reviewed By: daltenty, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D88339
Similar to the FP case in `AArch64TargetLowering::LowerBR_CC`.
Instead of emitting the csets + a tbnz, just emit a compare + bcc
(or two bccs, depending on the condition code)
This improves cases like this: https://godbolt.org/z/v8hebx
This is a 0.1% geomean code size improvement for CTMark at -O3.
Differential Revision: https://reviews.llvm.org/D88624
Partially refactoring, partially fixing a bug.
- We shouldn't use TB(N)ZX unless the bit number is >= 32
- We can fold more than xor using emitTestBit
Also remove a check which isn't relevant anymore + update tests.
Rename select-brcond-of-not.mir to select-brcond-of-binop.mir, since it now
tests more than just G_XOR.
Differential Revision: https://reviews.llvm.org/D88702
Truncating to v8i8 is a case where we want to split the source but also generate
intermediate truncates to reduce the size of the source vector before truncating
down to v8i8. This implements the same strategy as what SelectionDAG does, but
I'm not certain where if anywhere in generic code it should live.
Use it for legalization of v8s8 = G_ICMP v8s32.
Differential Revision: https://reviews.llvm.org/D88191
This matches the corresponding existing case in
AArch64LoadStoreOpt::findMatchingUpdateInsnForward.
Both cases could also be modified to check
MBBI->getFlag(FrameSetup/FrameDestroy) instead of forbidding any
optimization involving SP, but the effect is probably pretty much
the same.
Differential Revision: https://reviews.llvm.org/D88541
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D87544
This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU
ISD nodes, used to lower scalable vector fp_extend/fp_round operations.
fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one.
This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll,
resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND.
Reviewed By: sdesmalen, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D88321
Splitting the operand of a scalable [S|U]INT_TO_FP results in a
concat_vectors operation where the operands are unpacked FP
scalable vectors (e.g. nxv2f32).
This patch adds custom lowering of concat_vectors which
checks that the number of operands is 2, and isel patterns
to match concat_vectors of scalable FP types with uzp1.
Reviewed By: efriedma, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D88033
Before deciding to insert a [W|D]LSTP, check that defining LR with
the element count won't affect any other instructions that should be
taking the iteration count.
Differential Revision: https://reviews.llvm.org/D88549
Unfortunately the leaf SDAG patterns aren't supported yet so we need to do
this manually, but it's not a significant amount of code anyway.
Differential Revision: https://reviews.llvm.org/D87924
This patch fixes a corruption of the stack pointer and several registers in any AVR interrupt with non-empty stack frame. Previously, the callee-saved registers were popped before restoring the stack pointer, causing the pointer math to use the wrong base value while also corrupting the caller's register. This change fixes the code to restore the stack pointer last before exiting the interrupt service routine.
https://bugs.llvm.org/show_bug.cgi?id=47253
Reviewed By: dylanmckay
Differential Revision: https://reviews.llvm.org/D87735
Patch by Andrew Dona-Couch.
Also use this opportunity start to clean up the mess of vector type lists we
have in the LegalizerInfo. Unfortunately since the legalizer rule builders require
std::initializer_list objects as parameters we can't programmatically generate the
type lists.
Refactor this so it's similar to the existing integer comparison code.
Also add some missing 64-bit testcases to select-fcmp.mir.
Refactoring to prep for improving selection for G_FCMP-related conditional
branches etc.
Differential Revision: https://reviews.llvm.org/D88614
This will be further canonicalized to a compare involving 0
which will enable the use of test instructions. Either using
cmovg for signed for cmovne for unsigned.
Fixes more case for PR47049
The patch adds a new TargetMachine member "registerPassBuilderCallbacks" for targets to add passes to the pass pipeline using the New Pass Manager (similar to adjustPassManager for the Legacy Pass Manager).
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D88138
This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf
This patch provides exception support for basic block sections by splitting the call-site table into call-site ranges corresponding to different basic block sections. Still all landing pads must reside in the same basic block section (which is guaranteed by the the core basic block section patch D73674 (ExceptionSection) ). Each call-site table will refer to the landing pad fragment by explicitly specifying @LPstart (which is omitted in the normal non-basic-block section case). All these call-site tables will share their action and type tables.
The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. In the case of basic block section where one section contains all the landing pads, the landing pad offset relative to LPStart could actually be zero. Thus, we avoid zero-offset landing pads by inserting a **nop** operation as the first non-CFI instruction in the exception section.
**Background on Exception Handling in C++ ABI**
https://github.com/itanium-cxx-abi/cxx-abi/blob/master/exceptions.pdf
Compiler emits an exception table for every function. When an exception is thrown, the stack unwinding library queries the unwind table (which includes the start and end of each function) to locate the exception table for that function.
The exception table includes a call site table for the function, which is used to guide the exception handling runtime to take the appropriate action upon an exception. Each call site record in this table is structured as follows:
| CallSite | --> Position of the call site (relative to the function entry)
| CallSite length | --> Length of the call site.
| Landing Pad | --> Position of the landing pad (relative to the landing pad fragment’s begin label)
| Action record offset | --> Position of the first action record
The call site records partition a function into different pieces and describe what action must be taken for each callsite. The callsite fields are relative to the start of the function (as captured in the unwind table).
The landing pad entry is a reference into the function and corresponds roughly to the catch block of a try/catch statement. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of the exception thrown, and executes similar to a switch-case statement. The landing pad field is relative to the beginning of the procedure fragment which includes all the landing pads (@LPStart). The C++ ABI requires all landing pads to be in the same fragment. Nonetheless, without basic block sections, @LPStart is the same as the function @Start (found in the unwind table) and can be omitted.
The action record offset is an index into the action table which includes information about which exception types are caught.
**C++ Exceptions with Basic Block Sections**
Basic block sections break the contiguity of a function fragment. Therefore, call sites must be specified relative to the beginning of the basic block section. Furthermore, the unwinding library should be able to find the corresponding callsites for each section. To do so, the .cfi_lsda directive for a section must point to the range of call-sites for that section.
This patch introduces a new **CallSiteRange** structure which specifies the range of call-sites which correspond to every section:
`struct CallSiteRange {
// Symbol marking the beginning of the precedure fragment.
MCSymbol *FragmentBeginLabel = nullptr;
// Symbol marking the end of the procedure fragment.
MCSymbol *FragmentEndLabel = nullptr;
// LSDA symbol for this call-site range.
MCSymbol *ExceptionLabel = nullptr;
// Index of the first call-site entry in the call-site table which
// belongs to this range.
size_t CallSiteBeginIdx = 0;
// Index just after the last call-site entry in the call-site table which
// belongs to this range.
size_t CallSiteEndIdx = 0;
// Whether this is the call-site range containing all the landing pads.
bool IsLPRange = false;
};`
With N basic-block-sections, the call-site table is partitioned into N call-site ranges.
Conceptually, we emit the call-site ranges for sections sequentially in the exception table as if each section has its own exception table. In the example below, two sections result in the two call site ranges (denoted by LSDA1 and LSDA2) placed next to each other. However, their call-sites will refer to records in the shared Action Table. We also emit the header fields (@LPStart and CallSite Table Length) for each call site range in order to place the call site ranges in separate LSDAs. We note that with -basic-block-sections, The CallSiteTableLength will not actually represent the length of the call site table, but rather the reference to the action table. Since the only purpose of this field is to locate the action table, correctness is guaranteed.
Finally, every call site range has one @LPStart pointer so the landing pads of each section must all reside in one section (not necessarily the same section). To make this easier, we decide to place all landing pads of the function in one section (hence the `IsLPRange` field in CallSiteRange).
| @LPStart | ---> Landing pad fragment ( LSDA1 points here)
| CallSite Table Length | ---> Used to find the action table.
| CallSites |
| … |
| … |
| @LPStart | ---> Landing pad fragment ( LSDA2 points here)
| CallSite Table Length |
| CallSites |
| … |
| … |
…
…
| Action Table |
| Types Table |
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D73739
When pairing loads, we should check if in between the two loads the
base register has been modified. If that is the case then avoid pairing
them because the second load actually loads from a different address.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D86956
It looks like in some circumstances when compiling with `-mcpu=pwr9` we create an EXTSWSLI node when which causes llc to fail. No such error occurs in pwr8 or lower.
This occurs in 32BIT AIX and BE Linux. the cause seems to be that the default return in combineSHL is to create an EXTSWSLI node. Adding a check for whether we are in PPC64 before that fixes the issue.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D87046
This allows LiveDebugValues to insert the proper DBG_VALUEs in live
out blocks if a spill is inserted before the use of a
register. Previously, this would see the register use as the last
DBG_VALUE, even though the stack slot should be treated as the live
out value.
This avoids an lldb test regression when D52010 is re-applied.
After removal of Darwin as a PowerPC subtarget, the VRSAVE
save/restore/spill/update code is no longer needed by any supported
subtarget, so remove it while keeping support for vrsave and related instruction
aliases for inline asm. I've pre-commited tests to document the existing vrsave
handling in relation to @llvm.eh.unwind.init and inline asm usage, as
well as a test which shows a beahviour change on AIX related to
returning vector type as we were wrongly emiting VRSAVE_UPDATE on AIX.
We weren't looking at global uses of a value, so we could happily
overwrite the register incorrectly.
Differential Revision: https://reviews.llvm.org/D88554
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D88398
Following on from D87757 "[SplitKit] Only copy live lanes", it is
possible to split a live range at a point when none of its subranges
are live. This patch handles that case by inserting an implicit def
of the superreg.
Patch by Quentin Colombet!
Differential Revision: https://reviews.llvm.org/D88397
Check if operand of mul is constant value of one for certain atomic
instructions in order to avoid making unnecessary instructions when
-amdgpu-atomic-optimizer is present.
Differential Revision: https://reviews.llvm.org/D88315