Summary:
Emit the correct .toc pseudo op when we change to the TOC and emit
TC entries. Make sure TOC pseudos get the right symbols by overriding
getMCSymbolForTOCPseudoMO on AIX. Add a test for TOC assembly writing
and update tests to include TOC entries.
Also make sure external globals have a csect set, and handle external function descriptors (originally authored by Jason Liu), so we can emit TOC entries for them.
Reviewers: DiggerLin, sfertile, Xiangling_L, jasonliu, hubert.reinterpretcast
Reviewed By: jasonliu
Subscribers: arphaman, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70461
This is a continuation of D70262
The previous patch as listed above added the future CPU in clang. This patch
adds the future CPU in the PowerPC backend. At this point the patch simply
assumes that a future CPU will have the same characteristics as pwr9. Those
characteristics may change with later patches.
Differential Revision: https://reviews.llvm.org/D70333
X86 has some calling conventions where bits 127:0 of a vector register are callee saved, but the upper bits aren't. Previously we could detect that the full ymm register was clobbered when the xmm portion was really preserved. This patch checks the subregisters to make sure they aren't preserved.
Fixes PR44140
Differential Revision: https://reviews.llvm.org/D70699
This is based on what's required for softening fp128 operations on 32-bit X86 assuming f32/f64/f80 are legal. So there could be some things missing.
Differential Revision: https://reviews.llvm.org/D70654
Summary:
While updatePostDominatedByUnreachable attempts to find basic blocks that are post-dominated by unreachable blocks, it currently cannot handle loops precisely, because it doesn't use the actual post-dominator tree analysis but relies on the heuristic of visiting basic blocks in post-order. More precisely, when an entire loop is post-dominated by the unreachable block, the current algorithm fails to detect the entire loop as post-dominated by the unreachable block: when it reaches the loop latch it cannot tell that all of the latch's successors (including the loop header) will "eventually" be post-dominated by the unreachable block, because it hasn't visited the loop header yet. This makes BPI assume for the loop latch that loop backedges are taken with 100% probability. And because of this, block frequency info sometimes marks virtually dead loops (which are post-dominated by unreachable blocks) as super hot, because a 100% backedge-taken probability makes the loop iteration count the max value. updatePostDominatedByColdCall has exactly the same problem.
To address this problem, this patch makes PostDominatedByUnreachable/PostDominatedByColdCall be computed with the actual post-dominator tree.
Reviewers: skatkov, chandlerc, manmanren
Reviewed By: skatkov
Subscribers: manmanren, vsk, apilipenko, Carrot, qcolombet, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70104
This replaces the A32 NEON vqadds, vqaddu, vqsubs and vqsubu intrinsics
with the target independent sadd_sat, uadd_sat, ssub_sat and usub_sat.
This helps generate vqadds from standard IR nodes, which might be
produced by the vectoriser. The old variants are removed in the
process.
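As a rough illustration (not code from this patch), the target-independent saturating intrinsics can be created directly with IRBuilder; the helper name and the operand types here are purely for the example:
```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"

using namespace llvm;

// Emit a target-independent saturating add; on A32 NEON this can now select
// to vqadd without needing the old llvm.arm.neon.vqadds intrinsic.
static Value *emitSaturatingAdd(IRBuilder<> &B, Value *LHS, Value *RHS) {
  return B.CreateBinaryIntrinsic(Intrinsic::sadd_sat, LHS, RHS);
}
```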
Differential Revision: https://reviews.llvm.org/D69350
This reapplies: 8ff85ed905
Original commit message:
As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there.
This change doesn't include any change to move from selection dag to fast isel
and that will come with other numbers that should help inform that decision.
There also haven't been any real debuggability studies with this pipeline yet,
this is just the initial start done so that people could see it and we could start
tweaking after.
Test updates: Outside of the newpm tests, most of the updates come from
optimization passes that are not run anymore (and without a compelling argument for them at the moment)
and that were largely used for canonicalization in clang.
Original post:
http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
This reverts commit c9ddb02659.
This is a follow-up patch to D68854.
This patch adds basic operations of X87 instructions, including +, -, *, /, fp extensions and fp truncations.
Patch by Chen Liu(LiuChen3)
Differential Revision: https://reviews.llvm.org/D68857
MVE has a basic symmetry between its normal load/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.
To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.
This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
legality of masked operations as well as normal ones. This array is
fairly small, so doubling the size still won't make it very large.
Offset masked loads can then be controlled with
setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine loads into indexed loads, such as
CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
the same way.
- The ARM backend is then adjusted to make use of these indexed masked
loads/stores.
- The X86 backend is adjusted so that there are, hopefully, no functional changes.
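As a hedged sketch of how a backend opts in with the new hook named above (the store-side counterpart, the vector types, and the placement inside the target's TargetLowering constructor are assumptions, not code from this patch):
```cpp
// Inside the target's TargetLowering constructor: mark post-incremented
// masked loads/stores as legal for a few illustrative vector types.
for (MVT VT : {MVT::v16i8, MVT::v8i16, MVT::v4i32}) {
  setIndexedMaskedLoadAction(ISD::POST_INC, VT, Legal);
  setIndexedMaskedStoreAction(ISD::POST_INC, VT, Legal);
}
```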
Differential Revision: https://reviews.llvm.org/D70176
Summary: Adds tablegen patterns to explicitly handle fcopysign where the
magnitude and sign arguments have different types, due to the sign value casts
being removed by the DAGCombiner. Support for RV32IF follows in a separate
commit. Adds tests for all relevant scenarios except RV32IF.
Reviewers: lenary
Reviewed By: lenary
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70678
Add some more helper functions to ReachingDefs to query the uses of
a given MachineInstr and also to query whether two MachineInstrs use
the same def of a register.
For Arm, while tail-predicating, these helpers are used in the
low-overhead loops to remove the dead code that calculates the number
of loop iterations.
Differential Revision: https://reviews.llvm.org/D70240
Add several new methods to ReachingDefAnalysis:
- getReachingMIDef, instead of returning an integer, return the
MachineInstr that produces the def.
- getInstFromId, return the MachineInstr that the given integer
corresponds to.
- hasSameReachingDef, return whether two MachineInstrs use the same
def of a register.
- isRegUsedAfter, return whether a register is used after a given
MachineInstr.
These methods have been used in ARMLowOverhead to replace searching
for uses/defs.
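A minimal sketch of how a pass might combine these queries (the exact signatures are assumptions inferred from the method names above):
```cpp
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/ReachingDefAnalysis.h"

using namespace llvm;

// If A and B read the same reaching definition of PhysReg, return the
// MachineInstr that produced it; otherwise return nullptr.
static MachineInstr *getCommonReachingDef(ReachingDefAnalysis &RDA,
                                          MachineInstr &A, MachineInstr &B,
                                          int PhysReg) {
  if (!RDA.hasSameReachingDef(&A, &B, PhysReg))
    return nullptr;
  return RDA.getReachingMIDef(&A, PhysReg);
}
```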
Differential Revision: https://reviews.llvm.org/D70009
When inserting a non-decrementing LE, the basic block was being
resized to take into consideration that a tCMP and tBcc had been
combined into one T1 instruction. This is not true in the LE case
where we generate a CBN?Z and an LE.
Differential Revision: https://reviews.llvm.org/D70536
Currently, the BPF backend creates some global variables with names like
<type_name>:<reloc_type>:<patch_imm>$<access_str>
to carry certain information to the BPF backend.
With direct clang compilation, the following code in
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
is triggered and the above globals are emitted to the ELF file.
(clang enables this because the -faddrsig opt flag is on by default.)
if (TM.Options.EmitAddrsig) {
// Emit address-significance attributes for all globals.
OutStreamer->EmitAddrsig();
for (const GlobalValue &GV : M.global_values())
if (!GV.use_empty() && !GV.isThreadLocal() &&
!GV.hasDLLImportStorageClass() && !GV.getName().startswith("llvm.") &&
!GV.hasAtLeastLocalUnnamedAddr())
OutStreamer->EmitAddrsigSym(getSymbol(&GV));
}
...
10162: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND tcp_sock:0:2048$0:117
10163: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND tcp_sock:0:2112$0:126:0
10164: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND tcp_sock:1:8$0:31:6
...
While in llc, those globals are not emitted since the EmitAddrsig
default option is false for llc. The llc flag "-addrsig" can be used to
enable the above code.
This patch added "llvm." prefix to these internal globals so that
they can be ignored in the above codes and possible other
places.
Differential Revision: https://reviews.llvm.org/D70703
These need to emit a libcall like we do for the non-strict version.
32-bit mode needs SoftenFloat support to be implemented for strict FP nodes.
Differential Revision: https://reviews.llvm.org/D70504
This change doesn't include any change to move from selection dag to fast isel
and that will come with other numbers that should help inform that decision.
There also haven't been any real debuggability studies with this pipeline yet,
this is just the initial start done so that people could see it and we could start
tweaking after.
Test updates: Outside of the newpm tests, most of the updates come from
optimization passes that are not run anymore (and without a compelling argument for them at the moment)
and that were largely used for canonicalization in clang.
Original post:
http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
We already have this simplification at node-creation-time, but
the test from:
https://bugs.llvm.org/show_bug.cgi?id=44139
...shows that we can combine our way to an assert/crash too.
This patch adds instruction selection patterns for the TT, TTT, TTA, and TTAT
instructions and tests for llvm.arm.cmse.tt, llvm.arm.cmse.ttt,
llvm.arm.cmse.tta, and llvm.arm.cmse.ttat intrinsics (added in a previous
patch).
Patch by Javed Absar.
Differential Revision: https://reviews.llvm.org/D70407
// Due to the SystemZ ABI, the DWARF CFA (Canonical Frame Address) is not
// equal to the incoming stack pointer, but to incoming stack pointer plus
// 160. The getOffsetOfLocalArea() returned value is interpreted as "the
// offset of the local area from the CFA".
The immediate offsets into the Register save area returned by
getCalleeSavedSpillSlots() should take this offset into account, which this
patch makes sure of.
Patch and review by Ulrich Weigand.
https://reviews.llvm.org/D70427
If an inline asm statement clobbers a VSX register that overlaps with a
callee-saved Altivec register or FPR, we will not record the clobber and will
therefore violate the ABI. This is clearly a bug so this patch fixes it.
Differential revision: https://reviews.llvm.org/D68576
Summary:
This patch sets up the infrastructure for
1. Associate MCSymbolXCOFF with an MCSectionXCOFF when it could not
be implicitly associated.
2. Generate undefined symbols. The patch itself generates undefined symbols
for external function calls only. Generating undefined symbols for external
global variables and external function descriptors will be handled in
separate patch(es) after this lands.
Differential Revision: https://reviews.llvm.org/D70443
Adds a pattern to ARMInstrMVE.td to use a VQNEG
instruction if an equivalent multi-instruction
construct is found.
Differential Revision: https://reviews.llvm.org/D70491
This patch aims to spill CR[0-7]LT bits on POWER9 using the setb instruction.
The sequence on P9 to spill these bits will be:
setb %reg, %CRREG
stw %reg, $FI
Instead of the typical sequence:
mfocrf %reg, %CRREG
rlwinm %reg1, %reg, $SH, 0, 0
stw %reg1, $FI
Differential Revision: https://reviews.llvm.org/D68443
Fix two problems that popped up after my last patch. One is that the
stitching of prologue/epilogue can be wrong when reading a value from a
previous stage. Also changed how we duplicate phi instructions to avoid
generating extra phis that we delete later.
Differential Revision: https://reviews.llvm.org/D70213
Summary:
The waitcnt pass can overflow the counters when the number of outstanding events
for a type exceeds the capacity of the counter. This can lead to inefficient
insertion of waitcnts, or to waitcnt instructions with max values for each type.
The last situation can cause an instruction which, when disassembled, appears to
be an illegal waitcnt without an operand.
In these cases we should add a wait for the 'counter maximum' - 1, and update the
waitcnt brackets accordingly.
Reviewers: rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70418
The original commit message follows.
This patch adds support for debug_loclists.dwo section in llvm and llvm-dwarfdump.
Also Fixes PR43622, PR43623.
Reviewers: dblaikie, probinson, labath, aprantl, jini.susan.george
Differential Revision: https://reviews.llvm.org/D69462
This patch adds support for debug_loclists.dwo section in llvm and llvm-dwarfdump.
Also Fixes PR43622, PR43623.
Reviewers: dblaikie, probinson, labath, aprantl, jini.susan.george
https://reviews.llvm.org/D69462
Commit a0841dfe85 ("[BPF] Fix a bug in peephole optimization")
fixed a bug in peephole optimization. Recursion is introduced
to handle COPY and PHI instructions.
Unfortunately, multiple PHI instructions may form a cycle,
and this will cause infinite recursion and, eventually, a segfault.
For Commit a0841dfe85, I indeed tried a few loops to ensure
that I won't see the recursion, but I did not try with
complex control flows, which, as demonstrated with the test case
in this patch, may introduce PHI cycles.
This patch fixed the issue by introducing a set to remember
visited PHI instructions. This way, cycles can be properly
detected and handled.
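The cycle-breaking idea, sketched very roughly in isolation (the helper name and return convention are hypothetical, not the actual BPF code):
```cpp
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"

using namespace llvm;

// Carry a set of visited PHIs through the recursive walk so that a cycle of
// PHIs terminates instead of recursing forever. Returns true if every
// definition reaching MI is "safe" for the peephole (the actual safety check
// is elided here).
static bool defsAreSafe(const MachineInstr &MI, const MachineRegisterInfo &MRI,
                        SmallPtrSetImpl<const MachineInstr *> &VisitedPHIs) {
  if (MI.isPHI()) {
    if (!VisitedPHIs.insert(&MI).second)
      return true; // Cycle: this PHI is already being checked elsewhere.
    // PHI operands come in (value, predecessor-block) pairs after the def.
    for (unsigned I = 1, E = MI.getNumOperands(); I < E; I += 2) {
      const MachineInstr *Def = MRI.getVRegDef(MI.getOperand(I).getReg());
      if (!Def || !defsAreSafe(*Def, MRI, VisitedPHIs))
        return false;
    }
    return true;
  }
  // ... COPY and the other opcodes the peephole understands go here ...
  return false;
}
```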
Differential Revision: https://reviews.llvm.org/D70586
Summary:
This patch is a follow-up to the read-only assembly patch D70182.
It intends to enable object file generation for the read-only data section on AIX.
Reviewers: DiggerLin, daltenty
Differential Revision: https://reviews.llvm.org/D70455
Summary:
This is a preparatory cleanup before I add more
of this fold to deal with comparisons with non-zero.
In essence, the current lowering is:
```
Name: (X % C1) == 0 -> X * C3 <= C4
Pre: (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 avaliable in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, 0
=>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%r = icmp ule i8 %n3, %C4
```
https://rise4fun.com/Alive/oqd
It kinda just works, really no weird edge-cases.
But it isn't all that great when comparing with non-zero.
In particular, given `(X % C1) == C2`, there will be problems
in the always-false tautological case where `C2 u>= C1`:
https://rise4fun.com/Alive/pH3
That case is tautological, always-false:
```
Name: (X % Y) u>= Y
%o0 = urem i8 %x, %y
%r = icmp uge i8 %o0, %y
=>
%r = false
```
https://rise4fun.com/Alive/ofu
While we can't/shouldn't get such a tautological case normally,
we do deal with non-splat vectors, so unless we want to give up
in this case, we need to fixup/short-circuit such lanes.
There are a few lowering variants:
1. We can blend between whatever computed result and the correct tautological result
```
Name: (X % C1) == C2 -> X * C3 <= C4 || false
Pre: (C2 == 0 || C1 u<= C2) && (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 avaliable in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
=>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%res = icmp ule i8 %n3, %C4
%r = select i1 %is_tautologically_false, i1 0, i1 %res
```
https://rise4fun.com/Alive/PjT5 https://rise4fun.com/Alive/1KV
2. We can invert the comparison result
```
Name: (X % C1) == C2 -> X * C3 <= C4 || false
Pre: (C2 == 0 || C1 u<= C2) && (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 avaliable in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
=>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4
%res = icmp ule i8 %n3, %C4_fixed
%r = xor i1 %res, %is_tautologically_false
```
https://rise4fun.com/Alive/2xC https://rise4fun.com/Alive/jpb5
3. We can expand into `and`/`or`:
https://rise4fun.com/Alive/WGn https://rise4fun.com/Alive/lcb5
Blend-one is likely better since we avoid having to load the
replacement from the constant pool. `xor` is second best since
it's still pretty general. I'm not adding `and`/`or` variants.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: nick, hiraditya, xbolva00, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70051
Power9 has instructions to implement the semantics of SIGN_EXTEND_INREG for vector types.
Mark it as legal and add the match pattern.
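A hedged sketch of the "mark it as legal" part (the exact guard and set of vector types are assumptions; the calls would live in PPCTargetLowering's constructor):
```cpp
// With the Power9 vector sign-extend instructions available, keep
// SIGN_EXTEND_INREG nodes for these vector types instead of expanding them.
if (Subtarget.hasP9Altivec()) {
  setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::v4i32, Legal);
  setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::v2i64, Legal);
}
```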
Differential Revision: https://reviews.llvm.org/D69601
Add explicit setOperationAction calls for some strict FP nodes to match their
non-strict counterparts. This isn't required, but it makes the code
self-documenting, showing that we didn't forget about strict FP. I've
used LibCall instead of Expand since that's more explicitly what
we want.
Only lrint/llrint/lround/llround are missing now.
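For illustration only, the calls look like this (the particular strict opcodes and types touched by this change are assumptions):
```cpp
// Explicitly state that these strict FP nodes become libcalls, mirroring
// their non-strict counterparts.
setOperationAction(ISD::STRICT_FSIN, MVT::f64, LibCall);
setOperationAction(ISD::STRICT_FCOS, MVT::f64, LibCall);
```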
The regcall was making 32-bit mode pass things in xmm registers
which made 32-bit and 64-bit more similar. But I have upcoming
patches that require them to be separated anyway.
Summary:
The insertion of most CFI instructions during AArch64 frame lowering can
be disabled (e.g. using the function attribute `nounwind`).
This patch enables conditional insertion for one more CFI instruction.
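Roughly, the gating condition looks like the following sketch (the helper name and its placement inside AArch64FrameLowering are assumptions):
```cpp
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/IR/Function.h"

using namespace llvm;

// CFI only needs to be emitted when we must describe the frame for unwinding
// or debugging; `nounwind` functions without debug info can skip it.
static bool needsDwarfCFI(const MachineFunction &MF) {
  return MF.getMMI().hasDebugInfo() ||
         MF.getFunction().needsUnwindTableEntry();
}
```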
Reviewers: t.p.northover, ostannard
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70129
The custom code just emits a libcall, but we can do the same
with generic code. The only difference is that the generic code
can form tail calls where the custom code couldn't. This is
responsible for the test changes.
This avoids needing to modify the Custom handling for strict fp.
Summary:
The fix in BranchFolder for debug-invariance problems,
made in commit ec32dff0b0, actually introduced some new
problems with debug invariance.
Before that patch ComputeCommonTailLength would move iterators
back, past debug instructions, in order to make ProfitableToMerge
make consistent answers "when one block differs from the other
only by whether debugging pseudos are present at the beginning".
But the changes in ec32dff0b0 undid that by moving the iterators
forward again.
This patch refactors ComputeCommonTailLength. The function was
really complex, considering that the SkipTopCFIAndReturn part
always moved the iterators forward to the first "real" instruction
in the found tail after ec32dff0b0.
The patch also restores the logic to "back past possible debugging
pseudos at beginning of block" to make sure ProfitableToMerge
gives consistent answers independent of DBG_VALUE instructions
before the tail. That is now done by ProfitableToMerge instead of
being hidden as a side-effect in ComputeCommonTailLength.
Reviewers: probinson, yechunliang, jmorse
Reviewed By: jmorse
Subscribers: Orlando, mehdi_amini, dexonsmith, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70091
The intrinsic int_hexagon_S2_asr_i_vh was mapped to S2_asr_r_vh, which
is wrong. The testcase vasrh.select.ll was using an invalid immediate
for that intrinsic. This is not a proper testcase, since at the MIR
level such use of this intrinsic should never appear.
Together with 824b25fc02, this completes the fix for llvm.org/PR44090.
Having a generic CPU removes a warning when creating a subtarget without
the CPU being explicitly specified.
Differential Revision: https://reviews.llvm.org/D70490
Summary:
Add a function attribute to allow the target specific default loop unroll threshold
to be specified on a per-function basis. This allows a front-end to give guidance
where it has insight that is not available to the back-end, while still allowing the
target specific heuristics to also have an effect.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68873
One of the current peephole optimizations is to remove SLL/SRL if
the sub-register has been zero extended. This phase has two bugs
and one limitation.
First, for the physical subregister used in pseudo insn COPY
like below, it permits incorrect optimization.
%0:gpr32 = COPY $w0
...
%4:gpr = MOV_32_64 %0:gpr32
%5:gpr = SLL_ri %4:gpr(tied-def 0), 32
%6:gpr = SRA_ri %5:gpr(tied-def 0), 32
The $w0 could be from the return value of a previous function call
and its upper 32-bit value might contain some non-zero values.
The same applies to function arguments.
Second, the current code may permit removing SLL/SRA like below:
%0:gpr32 = COPY $w0
%1:gpr32 = COPY %0:gpr32
...
%4:gpr = MOV_32_64 %1:gpr32
%5:gpr = SLL_ri %4:gpr(tied-def 0), 32
%6:gpr = SRA_ri %5:gpr(tied-def 0), 32
The reason is that it did not follow the def-use chain to skip all
intermediate 32-bit-to-32-bit COPY instructions.
The current implementation is also very conservative for PHI
instructions. If any PHI operand is another PHI or COPY insn,
it will simply keep the SLL/SRA.
This patch fixed the issue as follows:
- During def/use chain traversal, if any physical register is read,
SLL/SRA will be preserved as these physical registers are mostly
from function return values or current function arguments.
- Recursively visit all COPY and PHI instructions.
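A rough sketch of the def-chain walk described above (the helper is hypothetical, not the patch's code): skip intermediate virtual-register COPYs and give up as soon as a physical register is read, since its upper 32 bits may be anything.
```cpp
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"

using namespace llvm;

// Follow vreg-to-vreg COPYs back to the real defining instruction. Returns
// nullptr if the chain reaches a physical register, in which case the
// SLL/SRA pair must be preserved.
static MachineInstr *skipCopies(MachineInstr *Def,
                                const MachineRegisterInfo &MRI) {
  while (Def && Def->isCopy()) {
    Register Src = Def->getOperand(1).getReg();
    if (!Register::isVirtualRegister(Src))
      return nullptr; // Reads a physical register: give up.
    Def = MRI.getVRegDef(Src);
  }
  return Def;
}
```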
This patch lowers jump tables, constant pools and block addresses in assembly.
1. On AIX, jump table index is always relative;
2. Put CPI and JTI into ReadOnlySection until we support unique data sections;
3. Create the temp symbol for block address symbol;
4. Update MIR testcases and add related assembly part;
Differential Revision: https://reviews.llvm.org/D70243
Summary:
Convert (uaddo (uaddo x, y), carryIn) into addcarry x, y, carryIn if-and-only-if the carry flags of the first two uaddo are merged via OR or XOR.
Work remaining: match ADD, etc.
Reviewers: craig.topper, RKSimon, spatel, niravd, jonpa, uweigand, deadalnix, nikic, lebedev.ri, dmgreen, chfast
Reviewed By: lebedev.ri
Subscribers: chfast, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70079
Adds a pattern to ARMInstrMVE.td to use a VQABS
instruction if an equivalent multi-instruction
construct is found.
Differential revision: https://reviews.llvm.org/D70181
Hostcall is a service that allows a kernel to submit requests to the
host using shared buffers, and block until a response is
received. This will eventually replace the shared buffer currently
used for printf, and repurposes the same hidden kernel argument. This
change introduces a new ValueKind in the HSA metadata to represent the
hostcall buffer.
Differential Revision: https://reviews.llvm.org/D70038
AL is only used for varargs on SysV platforms. Don't forward it on
Windows. This allows control flow guard to set up an extra hidden
parameter in RAX, as described in PR44049.
This also has the effect of freeing up RAX for use in virtual member
pointer thunks, which may also be a nice little code size improvement on
Win64.
Fixes PR44049
Reviewers: ajpaverd, efriedma, hans
Differential Revision: https://reviews.llvm.org/D70413
This is a first pass at Custom lowering for these operations. I also updated some of the vector code where it was obviously easy and straightforward. More work needed in follow up.
This enables these operations to be handled with X87 where special rounding control adjustments are needed to perform a truncate.
Still need to fix Promotion in the target independent code in LegalizeDAG.
llrint/llround are split into a separate test file because we can't make a strict libcall properly yet either, and we need to do that when i64 isn't a legal type.
This does not include any isel support. So we still rely on the mutation in SelectionDAGIsel to remove the strict from this stuff later. Except for the X87 stuff which goes through custom nodes that already had chains.
Differential Revision: https://reviews.llvm.org/D70214
Summary:
In particular, 1- and 2-byte loads and stores ignore the pointer tag
when using SP as the base register.
Reviewers: pcc, ostannard
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70341
Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering
for MVE. This works the same way as Neon, recognising the load/shuffles
combination and converting them into intrinsics in a pre-isel pass,
which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad
and lowerInterleavedStore.
The main difference to Neon is that we do not have a VLD3 instruction.
Otherwise most of the code works very similarly, with just some minor
differences in the form of the intrinsics to work around. VLD3 is
disabled by making isLegalInterleavedAccessType return false for those
cases.
We may need some other future adjustments, such as VLD4 taking up half the
available registers, so it should maybe cost more. This patch should get the
basics in, though.
Differential Revision: https://reviews.llvm.org/D69392
This patch implements writing function descriptors and the TOC base into
the data section, and also adds function descriptors (both csect and label)
and TOC base symbols to the symbol table.
This fills in the small family of MVE intrinsics that have nothing to
do with vectors: they implement bit-shift operations on 32- or 64-bit
values held in one or two general-purpose registers. Most of these
shift operations saturate if shifting left, and round to nearest if
shifting right, although LSLL and ASRL behave like ordinary shifts.
When these instructions take a variable shift count in a register,
they pay attention to its sign, so that (for example) LSLL or UQRSHLL
will shift left if given a positive number but right if given a
negative one. That makes even LSLL and ASRL different enough from
standard LLVM IR shift semantics that I couldn't see any better
alternative than to simply model the whole family as a set of
MVE-specific IR intrinsics.
(The //immediate// forms of LSLL and ASRL, on the other hand, do
behave exactly like a standard IR shift of a 64-bit value. In fact,
those forms don't have ACLE intrinsics defined at all, because you can
just write an ordinary C shift operation if you want one of those.)
The 64-bit shifts have to be instruction-selected in C++, because they
deliver two output values. But the 32-bit ones are simple enough that
I could write a DAG isel pattern directly into each Instruction
record.
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70319