This is consistent with the target independent intrinsic handling.
Not sure this really matters since we just pull the constant out
using getZExtValue later.
llvm-svn: 367736
With newly added debuginfo type
metadata for preserve_array_access_index() intrinsic,
this patch did the following two things:
(1). checking validity before adding a new access index
to the access chain.
(2). calculating access byte offset in IR phase
BPFAbstractMemberAccess instead of when BTF is emitted.
For (1), the metadata provided by all preserve_*_access_index()
intrinsics are used to check whether the to-be-added type
is a proper struct/union member or array element.
For (2), with all available metadata, calculating access byte
offset becomes easier in BPFAbstractMemberAccess IR phase.
This enables us to remove the unnecessary complexity in
BTFDebug.cpp.
New tests are added for
. user explicit casting to array/structure/union
. global variable (or its dereference) as the source of base
. multi demensional arrays
. array access given a base pointer
. cases where we won't generate relocation if we cannot find
type name.
Differential Revision: https://reviews.llvm.org/D65618
llvm-svn: 367735
These cases can come up when the extending loads combiner doesn't combine a
zext(load) to a zextload op, due to some other operation being in between, which
then gets simplified at a later stage.
Differential Revision: https://reviews.llvm.org/D65360
llvm-svn: 367723
Summary:
As part of this, define DenseMapInfo for MCRegister (and Register while I'm at it)
Depends on D65599
Reviewers: arsenm
Subscribers: MatzeB, qcolombet, jvesely, wdng, nhaehnle, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65605
llvm-svn: 367719
Add an equivalent ComplexRendererFns function for SelectNegArithImmed. This
allows us to select immediate adds of -1 by turning them into subtracts.
Update select-binop.mir to show that the pattern works.
Differential Revision: https://reviews.llvm.org/D65460
llvm-svn: 367700
This optimisation isn't generally profitable for ARM, because we can
save/restore many registers in the prologue and epilogue using the PUSH
and POP instructions, but mostly use individual LDR/STR instructions for
other spills.
Differential revision: https://reviews.llvm.org/D64910
llvm-svn: 367670
- Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves
with a null RegScavenger. Simply not updating the register scavenger
is fine because IPRA only cares about the SavedRegs vector, the acutal
code of the function has already been generated at this point.
- Add a new hook to TargetRegisterInfo to get the set of registers which
can be clobbered inside a call, even if the compiler can see both
sides, by linker-generated code.
Differential revision: https://reviews.llvm.org/D64908
llvm-svn: 367669
Summary:
When combining `extsw` and `sldi` in `PPCMIPeephole`, we have to check
if `extsw`'s second operand is a virtual register, otherwise we might
get miscompile.
Differential Revision: https://reviews.llvm.org/D65315
llvm-svn: 367645
Summary:
Fixes: https://bugs.llvm.org/show_bug.cgi?id=42441
Used to print:
<unknown>:0: error: Cannot represent a difference across sections
(the location was null).
Now prints:
err.s:20:3: error: Cannot represent a difference across sections
i32.const foo-bar
^
Note: I looked at adding a test for this, but I don't think it is
worth it. We're not testing error formatting in the Wasm backend :)
Reviewers: sbc100, jgravelle-google
Subscribers: dschuff, aheejin, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65602
llvm-svn: 367619
AMDGPU sometimes has legal s16 and <2 x s16> operations, but all
registers are really 32-bit. An unmerge destination really should ben
widened to a 32-bit register. If widening a scalarizing vector with a
target size that matches the vector size, bitcast to integer and
extract the relevant bits with shifts.
I'm not sure if this is the right place for this. This could arguably
be part of widenScalar for the result. I also have a growing feeling
that we're missing a bitcast legalize action.
llvm-svn: 367604
If a type is larger than a legal type and needs to be split, we would previously allow the multiply to be decomposed even if the split multiply is legal. Since the shift + add/sub code would also need to be split, its not any better to decompose it.
This patch figures out what type the mul will eventually be legalized to and then uses that type for the query. I tried just returning false illegal types and letting them get handled after type legalization, but then we can't recognize and i64 constant splat on 32-bit targets since will be destroyed by type legalization. We could special case vectors of i64 to avoid that...
Differential Revision: https://reviews.llvm.org/D65533
llvm-svn: 367601
A TYPE_INDEX operand (as used by call_indirect) used to be represented
by the InstPrinter as a symbol (e.g. .Ltype_index0@TYPE_INDEX) which
was a bit of a mismatch with the WasmObjectWriter which expects an
unnamed symbol, to receive the signature from and then turn into a
reloc.
There was really no good way to round-trip this information. An earlier
version of this patch tried to attach the signature information using
a .functype, but that ran into trouble when the symbol was re-emitted
without a name. Removing the name was a giant hack also.
The current version changes the assembly syntax to have an inline
signature spec for TYPEINDEX operands that is always unnamed, which
is much more elegant both in syntax and in implementation (as now the
assembler is able to follow the same path as the regular backend)
Reviewers: sbc100, dschuff, aheejin, jgravelle-google, sunfish, tlively
Subscribers: arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64758
llvm-svn: 367590
If an operand of the `lw/sw` instructions is a symbol, these instructions
incorrectly lowered using not-position-independent chain of commands.
For PIC code we should use `lw/addiu` instructions with the `R_MIPS_GOT16`
and `R_MIPS_LO16` relocations respectively. Instead of that LLVM generates
position dependent code with the `R_MIPS_HI16` and `R_MIPS_LO16`
relocations.
This patch provides a fix for the bug by handling PIC case separately in
the `MipsAsmParser::expandMemInst`. The main idea is to generate a chain
of PIC instructions to load a symbol address into a register and then
load the address content.
The fix is not optimal and does not fix all PIC-related problems. This
is a task for subsequent patches.
Differential Revision: https://reviews.llvm.org/D65524
llvm-svn: 367580
This adds SimplifyMultipleUseDemandedBitsForTargetNode X86 support and uses it to allow us to peek through vector insertions to avoid dependencies on entire insertion chains.
llvm-svn: 367570
Summary:
GCC Accepts both (reg) and 0(reg) for atomic instruction memory
operands. These instructions do not allow for an offset in their
encoding, so in the latter case, the 0 is silently dropped.
Due to how we have structured the RISCVAsmParser, the easiest way to add
support for parsing this offset is to add a custom AsmOperand and
parser. This parser drops all the parens, and just keeps the register.
This commit also adds a custom printer for these operands, which matches
the GCC canonical printer, printing both `(a0)` and `0(a0)` as `(a0)`.
Reviewers: asb, lewis-revill
Reviewed By: asb
Subscribers: s.egerton, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, psnobl, benna, Jim, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65205
llvm-svn: 367553
Summary:
While there is always a `Value::replaceAllUsesWith()`,
sometimes the replacement needs to be conditional.
I have only cleaned a few cases where `replaceUsesWithIf()`
could be used, to both add test coverage,
and show that it is actually useful.
Reviewers: jdoerfert, spatel, RKSimon, craig.topper
Reviewed By: jdoerfert
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, aheejin, george.burgess.iv, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65528
llvm-svn: 367548
The VREV64 instruction is apparently unpredictable if Qd == Qm, due to the
cross-beat nature of the instruction. This adds an earlyclobber to Qd, which
seems to be the same way we deal with this on other instructions like the
write-back on loads and stores.
Differential Revision: https://reviews.llvm.org/D65502
llvm-svn: 367544
Fix an issue where the compiler still allocates an emergency spill slot even
though it already decided to spill an extra callee-save register to use
as a scratch register.
Reviewers: gberry, thegameg, mstorsjo, t.p.northover
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D65504
llvm-svn: 367540
Fold load/store + G_GEP + G_CONSTANT when
immediate in G_CONSTANT fits into 16 bit signed integer.
Differential Revision: https://reviews.llvm.org/D65507
llvm-svn: 367535
In PowerPC, there is instruction to load vector in big endian element order when it's in little endian target.
So we can combine vector load + reverse into big endian load to eliminate the swap instruction.
Also combine vector reverse + store into big endian store.
Differential Revision: https://reviews.llvm.org/D65063
llvm-svn: 367516
Start migrating to a form that will be compatible with the global isel
emitter. Also should fix some overly lax checks on the memory type,
which allowed mis-selecting some illegal atomics.
llvm-svn: 367506
This is extremely specific, but saves three instructions when it's
legal. I don't think the code can be usefully generalized.
Differential Revision: https://reviews.llvm.org/D65351
llvm-svn: 367492
Thumb1 has very limited immediate modes, so turning an "and" into a
shift can save multiple instructions.
It's possible to simplify the generated code for test2 and test3 in
cmp-and-fold.ll a little more, but I'll implement that as a followup.
Differential Revision: https://reviews.llvm.org/D65175
llvm-svn: 367491
We have custom code that ignores the normal promoting type legalization on less than 128-bit vector types like v4i8 to emit pavgb, paddusb, psubusb since we don't have the equivalent instruction on a larger element type like v4i32. If this operation appears before a store, we can be left with an any_extend_vector_inreg followed by a truncstore after type legalization. When truncstore isn't legal, this will normally be decomposed into shuffles and a non-truncating store. This will then combine away the any_extend_vector_inreg and shuffle leaving just the store. On avx512, truncstore is legal so we don't decompose it and we had no combines to fix it.
This patch adds a new DAG combine to detect this case and emit either an extract_store for 64-bit stoers or a extractelement+store for 32 and 16 bit stores. This makes the avx512 codegen match the avx2 codegen for these situations. I'm restricting to only when -x86-experimental-vector-widening-legalization is false. When we're widening we're not likely to create this any_extend_inreg+truncstore combination. This means we should be able to remove this code when we flip the default. I would like to flip the default soon, but I need to investigate some performance regressions its causing in our branch that I wasn't seeing on trunk.
Differential Revision: https://reviews.llvm.org/D65538
llvm-svn: 367488
Summary:
This will make it possible to improve IPRA by taking into account
register usage in indirect calls.
NFC yet; this is just laying the groundwork to start building
up patches to take advantage of the information for improved register
allocation.
Reviewers: aditya_nandakumar, volkan, qcolombet, arsenm, rovka, aemerson, paquette
Subscribers: sdardis, wdng, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65488
llvm-svn: 367476
This feature instructs the backend to allow locally defined global variable
addresses to contain a pointer tag in bits 56-63 that will be ignored by
the hardware (i.e. TBI), but may be used by an instrumentation pass such
as HWASAN. It works by adding a MOVK instruction to the regular ADRP/ADD
sequence that sets bits 48-63 to the corresponding bits of the global, with
the linker bounds check disabled on the ADRP instruction to prevent the tag
from causing a link failure.
This implementation of the feature omits the MOVK when loading from or storing
to a global, which is sufficient for TBI. If the same approach is extended
to MTE, assuming that 0 is not configured as a catch-all tag, we will most
likely also need the MOVK in this case in order to avoid a tag mismatch.
Differential Revision: https://reviews.llvm.org/D65364
llvm-svn: 367475
This makes the field wider than MachineOperand::SubReg_TargetFlags so that
we don't end up silently truncating any higher bits. We should still catch
any bits truncated from the MachineOperand field as a consequence of the
assertion in MachineOperand::setTargetFlags().
Differential Revision: https://reviews.llvm.org/D65465
llvm-svn: 367474
Summary:
According to the Armv8.1-M manual CSEL, CSINC, CSINV and CSNEG are
"constrained unpredictable" when SP is used as the source register Rn.
The assembler should diagnose this case.
Reviewers: momchil.velikov, dmgreen, ostannard, simon_tatham, t.p.northover
Reviewed By: ostannard
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65505
llvm-svn: 367433
Before combining insert_subvector(insert_subvector(vec, sub0, c0), sub1, c1) patterns, ensure that the subvectors are all the same type. On AVX512 targets especially we might have a mixture of 128/256 subvector insertions.
llvm-svn: 367429
Summary:
This adds the 'f' inline assembly constraint, as supported by GCC. An
'f'-constrained operand is passed in a floating point register. Exactly
which kind of floating-point register (32-bit or 64-bit) is decided
based on the operand type and the available standard extensions (-f and
-d, respectively).
This patch adds support in both the clang frontend, and LLVM itself.
Reviewers: asb, lewis-revill
Reviewed By: asb
Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65500
llvm-svn: 367403
Use a switch instead of many isa<> while checking for supported
values. Also be explicit about which cast instructions are supported;
This allows the removal of SIToFP from GenerateSignBits.
llvm-svn: 367402
Summary:
* Loads and stores in SVE2 are gather/scatter not contiguous, fixed by
renaming multiclasses to reflect this and also updated comments.
* Remove aliases from load/store multiclasses that reflect the behaviour
of the original form.
* Fix bug in scatter store implementation, vector list should be used as
input, not output.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D65392
llvm-svn: 367398
This adds the required extension to RISC-V's getRegForInlineAsmConstraint
in order to be able to correctly distringuish between the 32 and 64-bit
floating point registers when the generic fX name appears in inlineasm
clobber contraints. It also adds a check to validate that callee saved
floating point registers are only saved in this case when a hard-float
ABI is selected.
Differential Revision: https://reviews.llvm.org/D64751
llvm-svn: 367397
Summary:
* Clarify comment with SVE2 for predicated shifts and move next to other
shift instructions.
* Clarify comments for various instructions.
* Move FCVTX instruction next to other fp conversions.
* Move FLOGB to next to other fp instructions and fix description.
* Remove "cons" from non-constructive multiclass for bitwise shift-right
and accumulate instructions.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D65390
llvm-svn: 367396
Summary:
This patch fixes a bug in the following instructions that should have been
implemented as destructive. A destructive instruction is an instruction where
one of the source registers also acts as the destination register. Therefore,
the contents of the source register, when the instruction begins execution, are
replaced by the result of the instruction when the instruction completes
execution [1]:
* SRI/SLI
* EORBT/EORTB
* TBX
* Narrowing top instructions
* FP convert precision instructions
These changes are non-functional from the assembler/diassembler point-of-view
but are necessary for correct codegen.
[1] https://static.docs.arm.com/ddi0584/ae/DDI0584A_e_SVE_supp_armv8A.pdf
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D65389
llvm-svn: 367394
In PowerPC, there is instruction to load vector in big endian element order when it's in little endian target.
So we can combine vector load + reverse into big endian load to eliminate the swap instruction.
Also combine vector reverse + store into big endian store.
llvm-svn: 367382
We had couple places which still return 10 as a maximum
occupancy. Fixed.
Also print comment about occupancy as compiler see it.
Differential Revision: https://reviews.llvm.org/D65423
llvm-svn: 367381
AMDGPU uses some custom code predicates for testing alignments.
I'm still having trouble comprehending the behavior of predicate bits
in the PatFrag hierarchy. Any attempt to abstract these properties
unexpectdly fails to apply them.
llvm-svn: 367373
Summary:
- Use the passed `DL` directly as retrieving data layout from CS by
checking the called function is not reliable. Under indirect function
call, there is no called function.
Subscribers: jholewinski, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65468
llvm-svn: 367349
Summary:
return_call and return_call_indirect are only valid if the return
types of the callee and caller match. We were previously not enforcing
that, which was producing invalid modules.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65246
llvm-svn: 367339
Empty condition strings are considerde always true. This removes a lot
of clutter from the generated matcher tables.
This shrinks the source size of AMDGPUGenDAGISel.inc from 7.3M to
6.1M.
llvm-svn: 367326
Addresses number of comment made on D64652 after commiting:
- Reorders function decls in the TargetLoweringObjectFileXCOFF class.
- Fix comment in MCSectionXCOFF to include description of external reference
csects.
- Convert several llvm_unreachables to report_fatal_error
- Convert several dyn_casts to casts as they are expected not to fail.
- Avoid copying DataLayout object.
llvm-svn: 367324
PR42819 showed an issue that we couldn't handle the case where we demanded a 'sub-sub-vector' of the SUBV_BROADCAST 'sub-vector' source.
This patch recognizes these cases and extracts the sub-sub-vector instead of trying to broadcast to a type smaller than the 'sub-vector' source.
llvm-svn: 367306
The code is now in a good enough state to pass the bunch of tests that
I have run (after fixing the bugs), so let's enable it by default.
Differential Revision: https://reviews.llvm.org/D65277
llvm-svn: 367297
Revert the hardware loop upon finding a LoopEnd that doesn't target
the loop header, instead of asserting a failure.
Differential Revision: https://reviews.llvm.org/D65268
llvm-svn: 367296
Summary:
The LoadStoreOptimizer was creating instructions with 2
MachineMemOperands, which meant they were assumed to alias with all other instructions,
because MachineInstr:mayAlias() returns true when an instruction has multiple
MachineMemOperands.
This was preventing these instructions from being merged again, and was
giving the scheduler less freedom to reorder them.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65036
llvm-svn: 367237
As discussed on rL367171, we have a problem where the depth recursion used in combineX86ShufflesRecursively was subtly different to computeKnownBits etc. - it starts at Depth=1 instead of Depth=0 like the others and has a different maximum recursion depth.
This NFC patch fixes the recursion depth to start at 0, so we can more easily reuse depth values in calls from combineX86ShufflesRecursively and its helper functions in computeKnownBits etc.
llvm-svn: 367232
For llvm/test/MC/RISCV/rv64i-aliases-invalid.s, UBSan reports:
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp:371:9: runtime error:
load of value 3879186881, which is not a valid value for type
'RISCVMCExpr::VariantKind'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp:371:9 in
It turns out that evaluateConstantImm does not set `VK` and it remains
unitialized when doing comparisons in `isImmXLenLI()`.
Differential Revision: https://reviews.llvm.org/D65347
llvm-svn: 367230
Summary:
The existing isDivergent(Value) methods query whether a value is
divergent at its definition. However even if a value is uniform at its
definition, a use of it in another basic block can be divergent because
of divergent control flow between the def and the use.
This patch adds new isDivergent(Use) methods to DivergenceAnalysis,
LegacyDivergenceAnalysis and GPUDivergenceAnalysis.
This might allow D63953 or other similar workarounds to be removed.
Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue
Reviewed By: nhaehnle
Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65141
llvm-svn: 367218
- Remove some unused typedefs.
- Rename BinOpChain struct to MulCandidate.
- Remove the size method of MulCandidate.
- Store only the first input of the ValueList provided to
MulCandidate, as it's the only value we care about. This means we
don't have to perform any ugly (and unnecessary) iterations of the
list later on.
llvm-svn: 367208
Summary:
If isel is presented with <2 x half> vectors then it will correctly select
v_pk_fma style instructions.
If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for
other instruction types (such as fadd, fmul etc.)
Added extra support to enable this. Updated one of the tests to include a test
for this (as well as extending the test to GFX9)
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65325
Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee
llvm-svn: 367206
The pmaddwd inserts a truncate, if that truncate would end up
creating additional instructions instead of making a zext
narrower, then we shouldn't do it.
I've restricted this to only sse4.1 targets since on prior
targets the zext will be done in stages. So the truncate will
probably not create additional instructions. Might need some
more investigation of mul shrinking and the other pmaddwd
transform to be sure this is the right decision.
There might be a slight regression on AVX1 targets due to add
splitting. Hard to say for sure. Maybe we need to look into
using the vector reduction flag to use 2 narrow loads and a
blend instead of extracting and inserting.
llvm-svn: 367198
This reverts r340478 and r340631 and replaces them with a simpler
method of just letting DAG combine revisit the nodes to handle
the other operand.
llvm-svn: 367195
This adds the patterns required to transform xor P0, -1 to a VPNOT. The
instruction operands have to change a little for this, adding an in and an out
VCCR reg and using a custom DecodeMVEVPNOT for the decode.
Differential Revision: https://reviews.llvm.org/D65133
llvm-svn: 367192
These are some better patterns for converting between predicates and floating
points. Much like the extends, we select "1"/"-1" or "0" depending on the
predicate value. Or we perform a compare against 0 to convert to a predicate.
Differential Revision: https://reviews.llvm.org/D65103
llvm-svn: 367191
Recommit rL367100 which was reverted at rL367141. Until PR42777 is fixed, we no longer get the benefits of peeking through bitcasts but it does still remove a GetDemandedBits user and gives us the equivalent combines.
llvm-svn: 367172
Add partial instruction selection for intrinsics like this:
```
declare i32 @llvm.aarch64.stlxr(i64, i32*)
```
(This only handles the case where a G_ZEXT is feeding the intrinsic.)
Also make sure that the added store instruction actually has the memory op from
the original G_STORE.
Update select-stlxr-intrin.mir and arm64-ldxr-stxr.ll.
Differential Revision: https://reviews.llvm.org/D65355
llvm-svn: 367163
Adds machine operand lowering for MCSymbolSDNodes to the PowerPC
backend. This is needed to produce call instructions in assembly for AIX
because the callee operand is a MCSymbolSDNode. The test is XFAIL'ed for
asserts due to a (valid) assertion in PEI that the AIX ABI isn't supported yet.
Differential Revision: https://reviews.llvm.org/D63738
llvm-svn: 367133
Summary:
The bitperm feature flag is now prefixed with SVE2, as it is for all other SVE2
extensions
Patch by Maciej Gabka.
Reviewers: sdesmalen, rovka, chill, SjoerdMeijer, rengolin
Reviewed By: SjoerdMeijer, rengolin
Differential Revision: https://reviews.llvm.org/D65327
llvm-svn: 367124
In preperation for AIX support in FrameLowering: replace a number of literal
'8' that represent the stack offset of the condition register save area with
a member in PPCFrameLowering.
Patch by Chris Bowler.
llvm-svn: 367111
Void return used to have unsigned with value 0 for virtual register
but with addition of Register class and changes to arguments to lowerCall
this is no longer valid.
Check for void return by inspecting the Ty field in OrigRet.
Differential Revision: https://reviews.llvm.org/D65321
llvm-svn: 367107
Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm
only if there is other WQM computation in the shader.
Reviewers: nhaehnle, tpr
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64935
llvm-svn: 367097
Embedded Trace Extension and Trace Buffer Extension are optional
future architecture extensions.
(cf. https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools)
Their system registers are documented here:
https://developer.arm.com/docs/ddi0601/a
ETE shares register names with ETM. One exception is the ETE
TRCEXTINSELR0 register, which has the same encoding as the ETM
TRCEXTINSELR register (but different semantics). This patch treats
them as aliases: the assembler will accept both names, emitting
identical encoding, and the disassembler will keep disassembling
to TRCEXRINSELR.
Differential Revision: https://reviews.llvm.org/D63707
llvm-svn: 367093
Both WhileLoopStart and LoopEnd may get turned into a cmp and br pair,
so add an implicit def to these pseudo instructions in case that WLS
and LE aren't generated.
Differential Revision: https://reviews.llvm.org/D65275
llvm-svn: 367089
Summary:
This is an alternate approach to D57970.
Currently funclets reuse the same stack slots that are used in the
parent function for saving callee-saved xmm registers. If the parent
function modifies a callee-saved xmm register before an excpetion is
thrown, the catch handler will overwrite the original saved value.
This patch allocates space in funclets stack for saving callee-saved xmm
registers and uses RSP instead RBP to access memory.
Reviewers: andrew.w.kaylor, LuoYuanke, annita.zhang, craig.topper,
RKSimon
Subscribers: rnk, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63396
Signed-off-by: pengfei <pengfei.wang@intel.com>
llvm-svn: 367088
handleAssignments gives up pretty easily on structs, and i8 values for
some reason. The other case that doesn't work is when an implicit sret
needs to be inserted if the return size exceeds the number of return
registers.
llvm-svn: 367082
Currently, the CO-RE offset relocation does not work
if any struct/union member or array element is a typedef.
For example,
typedef const int arr_t[7];
struct input {
arr_t a;
};
func(...) {
struct input *in = ...;
... __builtin_preserve_access_index(&in->a[1]) ...
}
The BPF backend calculated default offset is 0 while
4 is the correct answer. Similar issues exist for struct/union
typedef's.
When getting struct/union member or array element type,
we should trace down to the type by skipping typedef
and qualifiers const/volatile as this is what clang did
to generate getelementptr instructions.
(const/volatile member type qualifiers are already
ignored by clang.)
This patch fixed this issue, for each access index,
skipping typedef and const/volatile/restrict BTF types.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D65259
llvm-svn: 367062
Currently, we expect the CO-RE offset relocation records
a string encoding the original getelementptr access index,
so kernel bpf loader can decode it correctly.
For example,
struct s { int a; int b; };
struct t { int c; int d; };
#define _(x) (__builtin_preserve_access_index(x))
int get_value(const void *addr1, const void *addr2);
int test(struct s *arg1, struct t *arg2) {
return get_value(_(&arg1->b), _(&arg2->d));
}
We expect two offset relocations:
reloc 1: type s, access index 0, 1
reloc 2: type t, access index 0, 1
Two globals are created to retain access indexes for the
above two relocations with global variable names.
The first global has a name "0:1:". Unfortunately,
the second global has the name "0:1:.1" as the llvm
internals automatically add suffix ".1" to a global
with the same name. Later on, the BPF peels the last
character and record "0:1" and "0:1:." in the
relocation table.
This is not desirable. BPF backend could use the global
variable suffix knowledge to generate correct access str.
This patch rather took an approach not relying on
that knowledge. It generates "s:0:1:" and "t:0:1:" to
avoid global variable suffixes and later on generate
correct index access string "0:1" for both records.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D65258
llvm-svn: 367030
Summary:
- As LCSSA is turned on just before isel, it may create PHI of the flow,
which is consumed by pseudo structurized CFG instructions. When that
PHIs are eliminated in O0, COPY may be placed wrongly as the these
pseudo structurized CFG instructions are considering prologue of MBB.
- Run extra `unreachable-mbb-elimination` at the end of isel to clean up
PHIs.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64353
llvm-svn: 367023
... for the vector forms of `{SQ,UQ,}{INC,DEC}P` instructions. Also continue
supporting the exsting behaviour of not requiring an explicit size
specifier. The preferred disasembly is *with* the specifier.
This is implemented by redefining intruction forms to require vector predicates
with explicit size and adding aliases, which allow a predicate with no size.
Differential Revision: https://reviews.llvm.org/D65145
llvm-svn: 367019
Summary:
In PostRA phase, we often have to find out the most recent definition
of a register. This patch adds getDefMIPostRA so that other methods
can use it rather than implementing it repeatedly.
Differential Revision: https://reviews.llvm.org/D65131
llvm-svn: 366990
Summary:
Add a new method which tries to compute the target address referenced by an operand.
This patch supports x86_64 RIP-relative addressing for now.
It is necessary to print referenced symbol names in llvm-objdump.
Reviewers: andreadb, MaskRay, grosbach, jgalenson, craig.topper
Reviewed By: MaskRay, craig.topper
Subscribers: bcain, rupprecht, jhenderson, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63847
llvm-svn: 366987
Before, we weren't able to select things like this for G_GEP:
add x0, x8, #8
And instead we'd materialize the 8.
This teaches GISel to do that. It gives some considerable code size savings
on 252.eon-- about 4%!
Differential Revision: https://reviews.llvm.org/D65248
llvm-svn: 366959
Throughout the legalizerinfo we currently make the assumption that the target
has neon and FP target features available. Fixing it will require a refactor of
the whole thing, so until then make sure we fall back.
Works around PR42734
Differential Revision: https://reviews.llvm.org/D65244
llvm-svn: 366957
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH
InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression
will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as it is seen from the diffs here, if it didn't get hoisted,
the produced assembly is almost universally worse.
Much like with my recent "hoist add/sub by/from const" patches,
we should get almost universal win if we hoist constant,
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by constant, but by something else,
the live-range of that something else may reduce.
Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit`
instruction pattern. And to not get into endless combine loop.
Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm
Reviewed By: spatel
Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62871
llvm-svn: 366955
If we have a G_MUL, and either the LHS or the RHS of that mul is the legal
shift value for a load addressing mode, we can fold it into the load.
This gives some code size savings on some SPEC tests. The best are around 2%
on 300.twolf and 3% on 254.gap.
Differential Revision: https://reviews.llvm.org/D65173
llvm-svn: 366954
This introduces a new family of combiner helper routines that re-use the
target specific cost model from SelectionDAG, and generate inline implementations
of the memcpy family of intrinsics.
The combines are only enabled at optimization levels higher than -O0, and give
very substantial performance improvements.
Differential Revision: https://reviews.llvm.org/D65167
llvm-svn: 366951
To support prefetch mode 3 we need to pad current
cacheline and fill 3 cachelines after. Current padding
is only sufficient for mode 2.
Differential Revision: https://reviews.llvm.org/D65236
llvm-svn: 366938
This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, just using two called VCMP and
VCMPZ with an extra operand as the condition code. I believe this will make
some combines simpler, allowing us to just look at these codes and not the
operands. It also helps fill in a missing VCGTUZ MVE selection without adding
extra nodes for it.
Differential Revision: https://reviews.llvm.org/D65072
llvm-svn: 366934
This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector:
e.g. <8 x i32> reduction pattern in a <16 x i32> vector:
<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u>
<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction.
I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think its worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction.
Fixes the x86 partial reduction sum cases in PR33758 and PR42023.
Differential Revision: https://reviews.llvm.org/D65047
llvm-svn: 366933
The prevents us from trying to convert an i1 predicate vector to a float, or
vice-versa. Better patterns are possible, which will follow in a subsequent
commit. For now we just expand them.
Differential Revision: https://reviews.llvm.org/D65066
llvm-svn: 366931
Fix an off-by-one error which made us not look at the last element of the
zero vector. This caused a miscompile in 188.ammp.
Differential Revision: https://reviews.llvm.org/D65168
llvm-svn: 366930
MVE VCMP instructions can use a general purpose register as the second operand.
This adds the combines for it, selecting from a compare of a vdup.
Differential Revision: https://reviews.llvm.org/D65061
llvm-svn: 366924
This adds a DeMorgan combine for OR's of compares to turn them into AND's,
helping prevent them from going into and out of gpr registers. It also fills in
the VCLE and VCLT nodes that MVE can select, allowing it to invert more
compares.
Differential Revision: https://reviews.llvm.org/D65059
llvm-svn: 366920
Add a number of folds to convert and(vcmp, vcmp) into a single VPT block, where
the second vcmp becomes predicated on the first.
The VCMP; VPST; VCMP will eventually be converted to VPT; VCMP in the
VPTBlockPass.
Differential Revision: https://reviews.llvm.org/D65058
llvm-svn: 366910
Much like integers, this adds MVE floating point compares and select. It
requires a lot more buildvector/shuffle code because we may need to expand the
compares without mve.fp, and requires support for and/or because of the way we
lower llvm condition codes.
Some original code by David Sherwood
Differential Revision: https://reviews.llvm.org/D65054
llvm-svn: 366909
This adds some basic, "worst case" handling for MVE predicate Or/And/Xor. It
does this by going into and out of GPRs, doing the operation on scalars.
Code by David Sherwood.
Differential Revision: https://reviews.llvm.org/D65053
llvm-svn: 366907