Commit Graph

65490 Commits

Author SHA1 Message Date
Krzysztof Parzyszek cd997689f2 [Hexagon] Fix isTypeForHVX to recognize floating point types
Co-authored-by: Sumanth Gundapaneni <sgundapa@quicinc.com>
2021-12-30 10:01:05 -08:00
Krzysztof Parzyszek 23423638cc [Hexagon] Handle HVX/FP shuffles, insertion and extraction
Co-authored-by: Anirudh Sundar Subramaniam <quic_sanirudh@quicinc.com>
2021-12-30 08:44:10 -08:00
Krzysztof Parzyszek 95c7dd8810 Revert "[Hexagon] Don't build two halves of HVX vector in parallel"
This reverts commit ba07f300c6.

A build-vector sequence is made of pairs: rotate+insert. When constructing
a single vector, this results in a chain of 2*N instructions. The rotate
operation is a permute operation, but the insert uses a multiplication
resource: insert and rotate can execute in the same cycle, but obviously
they cannot operate on the same vector. The original halving idea is still
beneficial since it does allow for insert/rotate overlap, and for hiding
insert's latency.
2021-12-30 07:57:11 -08:00
jacquesguan 128c6ed73b [RISCV] Teach VSETVLInsert to eliminate redundant vsetvli for vmv.s.x and vfmv.s.f.
Differential Revision: https://reviews.llvm.org/D116307
2021-12-30 17:16:18 +08:00
Sjoerd Meijer 550d90e692 Revert "[AArch64] Add a tablegen pattern for UZP2."
This reverts commit ada028c32f.

A performance regression was reported that we need to investigate:

https://github.com/llvm/llvm-project/issues/52919
2021-12-30 09:15:46 +00:00
jacquesguan 1dd5e6fed5 [RISCV] Use vmv.s.x instead of vfmv.s.f when the floating point scalar is 0.
Use the integer vector scalar move instruction when moving 0, to avoid adding an integer-to-float move instruction.

Differential Revision: https://reviews.llvm.org/D116365
2021-12-30 10:16:54 +08:00
Chenbing.Zheng 43c8296cda [RISCV] Refactor immediate comparison instructions patterns
The patterns for the immediate comparison instructions are rewritten here, with similar code factored into a class.
This does not change the behavior of the original code; it only makes the code more concise.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116215
2021-12-30 09:31:01 +08:00
Krzysztof Parzyszek ba07f300c6 [Hexagon] Don't build two halves of HVX vector in parallel
There can only be one permute operation per packet, so this actually
pessimizes the code (due to the extra "or").
2021-12-29 11:00:01 -08:00
Joshua Herrera 505d57486e [Hexagon] Improve BUILD_VECTOR codegen
For vectors with repeating values, old codegen would rotate and insert
every duplicate element. This patch replaces that behavior with a splat
of the most common element, vinsert/vror only occur when needed.
2021-12-29 10:18:21 -08:00
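The splat-then-insert idea described in the commit above can be sketched as a small planning function. This is only an illustration under assumptions: the names `plan_build_vector` and `apply_plan` are hypothetical, and the real lowering in the Hexagon backend is not reproduced here.

```python
from collections import Counter


def plan_build_vector(elements):
    """Return (splat_value, inserts) for a hypothetical splat-then-insert plan.

    inserts lists (index, value) pairs for the lanes that differ from the
    splatted value; every other lane is already correct after the splat.
    """
    # Splat the most frequent element so that the fewest lanes need fixing.
    splat_value, _ = Counter(elements).most_common(1)[0]
    inserts = [(i, v) for i, v in enumerate(elements) if v != splat_value]
    return splat_value, inserts


def apply_plan(length, splat_value, inserts):
    """Rebuild the vector from a plan, to check that the plan is lossless."""
    vec = [splat_value] * length
    for i, v in inserts:
        vec[i] = v
    return vec
```

For a vector such as `[7, 7, 3, 7, 7, 7, 9, 7]`, the plan splats 7 and needs only two inserts instead of touching every lane.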
Craig Topper 015ff729cb [RISCV] Add a few more instructions to hasAllNBitUsers. 2021-12-29 09:17:47 -08:00
Krzysztof Parzyszek 4df2aba294 [Hexagon] Calling conventions for floating point vectors
They are the same as for the other HVX vectors, but types need to be
listed explicitly. Also, add a detailed codegen testcase.

Co-authored-by: Abhikrant Sharma <quic_abhikran@quicinc.com>
2021-12-29 09:01:07 -08:00
Krzysztof Parzyszek 2ce586bc49 [Hexagon] Handle floating point splats
Co-authored-by: Anirudh Sundar Subramaniam <quic_sanirudh@quicinc.com>
2021-12-29 06:52:24 -08:00
Krzysztof Parzyszek 33fc675e16 [Hexagon] Handle floating point vector loads/stores 2021-12-29 05:52:39 -08:00
Kazu Hirata 8414321bec [Hexagon] Use range-based for loops (NFC) 2021-12-28 23:47:25 -08:00
Krzysztof Parzyszek 6a6ac3b36f [Hexagon] Support BUILD_VECTOR of floating point HVX vectors
Co-authored-by: Anirudh Sundar Subramaniam <quic_sanirudh@quicinc.com>
Co-authored-by: Ankit Aggarwal <aankit@quicinc.com>
2021-12-28 14:59:08 -08:00
Krzysztof Parzyszek 7df136bcf2 [Hexagon] Delete unused declaration of LowerHvxMul, NFC 2021-12-28 11:36:07 -08:00
David Green 43e500d791 [AArch64] Minor AArch64MIPeepholeOpt cleanup. NFC
We should always be in SSA form when running the pass, so turn a check
into an assert.
2021-12-28 19:10:01 +00:00
Brian Cain 1e7bd93ff2 [Hexagon] Add HexagonMCInstrInfo::IsABranchingInst, NFC 2021-12-28 09:51:27 -08:00
Brian Cain c5327137df [Hexagon] Fix for producer operands search w/z-reg
Z-register does not show up in defs, so checks searching
for the def operand must look for a different def index
than they would normally.
2021-12-28 09:19:59 -08:00
Kazu Hirata 5a667c0e74 [llvm] Use nullptr instead of 0 (NFC)
Identified with modernize-use-nullptr.
2021-12-28 08:52:25 -08:00
Krzysztof Parzyszek 648246cce6 [Hexagon] Remove isPredicateRegister in favor of isPredReg, NFC
HexagonMCChecker has its own function isPredicateRegister, which does
the same thing as HexagonMCInstrInfo::isPredReg.
2021-12-28 08:40:40 -08:00
Hsiangkai Wang a1c7ddf926 [RISCV] Support passing scalable vector values through the stack.
After consuming all vector registers, the scalable vector values will be
passed indirectly. The pointer values will be saved in general
registers. If all general registers are used up, we will report an error to
notify users the compiler does not support passing scalable vector
values through the stack. In this patch, we remove the restriction. After
all general registers are used up, we use the stack to save the
pointers which point to the indirect passed scalable vector values.

Differential Revision: https://reviews.llvm.org/D116310
2021-12-28 09:26:36 +08:00
Kazu Hirata 8445883327 [llvm] Drop unnecessary const from return types (NFC)
Identified with readability-const-return-type.
2021-12-27 15:58:03 -08:00
David Green 2ec3ca7477 [ARM] Extend IsCMPZCSINC to handle CMOV
A 'CMOV 1, 0, CC, %cpsr, Cmp' is the same as a 'CSINC 0, 0, CC, Cmp',
and can be treated the same in IsCMPZCSINC added in D114013. This allows
us to remove the unnecessary CMOV in the same way that we could remove a
CSINC.

Differential Revision: https://reviews.llvm.org/D115188
2021-12-27 14:15:03 +00:00
Simon Pilgrim a0a0eb192e [X86] Use WriteVecMove scheduler classes for VPMOVM2* instructions
These match the port behaviour of reg-reg predicated xmm/ymm/zmm moves

Fixes #34958
2021-12-27 13:21:29 +00:00
Simon Pilgrim 29475e0286 [X86] Add scheduler classes for zmm vector reg-reg move instructions
Basic zmm reg-reg moves (with predication) are more port limited than xmm/ymm moves, so we need to add a separate class for them.

We still appear to be missing move-elimination patterns for most of the intel models, which looks to be one of the main diffs for basic codegen analysis between llvm-mca and uops.info

Load/stores are a bit messier and might be better handled as overrides.
2021-12-27 12:13:42 +00:00
Nikita Popov 7c3cf4c2c0 [Inline][X86] Avoid inlining if it would create ABI-incompatible calls (PR52660)
X86 allows inlining functions if the callee target features are a
subset of the caller target features. This ensures that we don't
inline something into a caller that does not support it.

However, this does not account for possible call ABI mismatches as
a result of inlining. If a call passing a vector argument was
originally in a -avx function, calling another -avx function, the
vector is passed in xmm. If we now inline it into a +avx function,
then it will be passed in ymm, even though the callee expects it in xmm.

Fix this by scanning over all calls in the function and checking
whether ABI incompatibility is possible. Calls that only pass scalar
types are excluded, as I believe those always use the same ABI
independent of target features.

Fixes https://github.com/llvm/llvm-project/issues/52660.

Differential Revision: https://reviews.llvm.org/D116036
2021-12-27 09:36:21 +01:00
Kazu Hirata 0a5788ab57 [Target] Use range-based for loops (NFC) 2021-12-26 23:49:38 -08:00
Hsiangkai Wang 5d47e7d768 [RISCV] Convert whole register copies only when the source is defined explicitly.
The implicit defines may come from a partial define in an instruction.
It does not mean the defining instruction and the COPY instruction have
the same vl and vtype. When the source comes from the implicit defines,
do not convert the whole register copies to vmv.v.v.

Differential Revision: https://reviews.llvm.org/D115866
2021-12-27 13:59:49 +08:00
Shao-Ce SUN 70a98008ea [RISCV] Reduce repetitive code in flw, fsw
Trying to improve code reuse in F,D,Zfh *.td files.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116089
2021-12-27 09:29:35 +08:00
Kazu Hirata e7774f499b Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
2021-12-26 14:26:44 -08:00
Kazu Hirata 0542d15211 Remove redundant string initialization (NFC)
Identified with readability-redundant-string-init.
2021-12-26 09:39:26 -08:00
Kazu Hirata 2d303e6781 Remove redundant return and continue statements (NFC)
Identified with readability-redundant-control-flow.
2021-12-24 23:17:54 -08:00
alex-t 8020458c5d [AMDGPU] Changing S_AND_B32 to V_AND_B32_e64 in the divergent 'trunc' to i1 pattern
In the 'trunc' i16/32/64 to i1 pattern, the 'and $src, 1' node supplies an operand to 'setcc'.
The latter is selected to S_CMP_EQ/V_CMP_EQ depending on the divergence. If the 'and' is scalar
and the 'setcc' is divergent, we need a VGPR to SGPR copy to adjust the input operand for V_CMP_EQ.
This patch changes the S_AND_B32 to V_AND_B32_e64 in the 'trunc to i1' divergent patterns.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D116241
2021-12-24 18:24:49 +03:00
Simon Pilgrim 159da56737 [X86] Enable v32i16 ISD::ROTL/ROTR lowering on AVX512BW targets 2021-12-24 13:30:52 +00:00
Alexandros Lamprineas bb84dd8159 [AArch64] Add a tablegen pattern for RADDHN/RADDHN2.
Converts RSHRN/RSHRN2 to RADDHN/RADDHN2 when the shift amount is half
the width of the vector element. The latter has twice the throughput
and half the latency on Arm out-of-order cores. Setting up the zero
register adds no latency.

Differential Revision: https://reviews.llvm.org/D116166
2021-12-24 11:13:25 +00:00
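The equivalence behind the RSHRN to RADDHN conversion above can be checked on a single lane. The two helpers below model the instructions as understood here (rounding shift right narrow, and rounding add returning the high half); they are an assumption about the semantics for illustration, not a copy of the architecture pseudocode.

```python
def rshrn(x, shift, wide_bits):
    """Rounding shift right narrow on one unsigned lane (assumed semantics)."""
    narrow_bits = wide_bits // 2
    rounded = x + (1 << (shift - 1))
    return (rounded >> shift) & ((1 << narrow_bits) - 1)


def raddhn(a, b, wide_bits):
    """Rounding add, keeping the high narrow half (assumed semantics)."""
    narrow_bits = wide_bits // 2
    total = a + b + (1 << (narrow_bits - 1))
    return (total >> narrow_bits) & ((1 << narrow_bits) - 1)


def equivalent_for_all_16bit_lanes():
    """Compare a shift by 8 with RADDHN against zero for every 16-bit lane."""
    return all(rshrn(x, 8, 16) == raddhn(x, 0, 16) for x in range(1 << 16))
```

When the shift amount equals the narrow width, both helpers add the same rounding constant and keep the same bits, which is why a zero second operand is enough.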
Phoebe Wang 24c68ea1eb Reland "[X86][MS-InlineAsm] Use exact conditions to recognize MS global variables"
This reverts commit a954558e87.

Thanks to Yuanfang for the help. I think I found the root cause of the
buildbot failure.

The failed test has both Memory and Immediate X86Operand. All data of
different operand kinds share the same memory space through a union
definition. So there is a chance we get the wrong result if we don't
check the operand kind.

It probably happened to be the correct value in my local environment,
which is why I could not reproduce the failure.

Differential Revision: https://reviews.llvm.org/D116090
2021-12-24 17:42:51 +08:00
Jim Lin 02478a26f2 [RISCV] Use DAG variable directly instead of DCI.DAG
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116087
2021-12-24 13:06:55 +08:00
Craig Topper 0a35211b34 [RISCV] Don't allow vector types to be used with inline asm 'r' constraint
The 'r' constraint uses the GPR class. There is generic support
for bitcasting and extending/truncating non-integer VTs to the
required integer VT. This doesn't work for scalable vectors and
instead crashes.

To prevent this, explicitly reject vectors. Fixed vectors might
work without crashing, but it doesn't seem worthwhile to allow.

While there remove an unnecessary level of indentation in the
"vr" and "vm" constraint handling.

Differential Revision: https://reviews.llvm.org/D115810
2021-12-23 20:32:36 -06:00
Victor Perez 10b3675aa9 [RISCV][VP] Lower mask vector VP AND/OR/XOR to RVV instructions
For fixed and scalable vectors, each intrinsic x is lowered to vmx.mm,
dropping the mask, which is safe to do as masked-off elements are
undef anyway.

Differential Revision: https://reviews.llvm.org/D115339
2021-12-23 15:02:32 -06:00
Brendon Cahoon d45a247998 [AMDGPU] Don't remove VGPR to AGPR dead spills from frame info
Removing dead frame indices for VGPR to AGPR spills is incorrect
when the frame index is shared by multiple objects, which may
occur due to stack slot coloring. The problem is that subsequent
code that processes the other object will assert because the stack
frame index is marked dead.

Removing dead frame indices is needed prior to stack slot
coloring, which is what happens with SGPR to VGPR spills. These
spills are lowered prior to stack slot coloring, but the VGPR
to AGPR spills are processed afterwards during the Prolog/Epilog
Inserter pass. This patch marks the VGPR to AGPR spill slot as
dead if the slot is not used by another object.

Differential Revision: https://reviews.llvm.org/D115996
2021-12-23 11:09:19 -06:00
Krzysztof Parzyszek f766bc093c [Hexagon] Introduce Hexagon v69 ISA 2021-12-23 08:46:03 -08:00
Craig Topper 7704c503ec [RISCV] Use positive 0.0 for the neutral element in fadd reductions if nsz is present.
-0.0 requires a constant pool. +0.0 can be made with vmv.v.x x0.

Not doing this in getNeutralElement for fear of changing other targets.

Differential Revision: https://reviews.llvm.org/D115978
2021-12-23 10:38:00 -06:00
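The reason +0.0 is only acceptable as the start value under nsz can be seen with ordinary IEEE-754 doubles. The sketch below uses hypothetical helper names and a plain sequential reduction, which is an assumption about ordering made only for illustration.

```python
import math


def is_negative_zero(x):
    """True only for -0.0 (x == 0.0 alone cannot tell the two zeros apart)."""
    return x == 0.0 and math.copysign(1.0, x) < 0


def fadd_reduce(values, start):
    """Sequential floating-point add reduction starting from `start`."""
    acc = start
    for v in values:
        acc = acc + v
    return acc
```

Starting from -0.0 preserves a -0.0 input exactly, while starting from +0.0 turns it into +0.0; for any nonzero sum the two start values give the same result.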
Craig Topper b7b260e19a [RISCV] Support strict FP conversion operations.
This adds support for strict conversions between fp types and between
integer and fp.

NOTE: RISCV has static rounding mode instructions, but the constrained
intrinsic metadata is not used to select static rounding modes. Dynamic
rounding mode is always used.

Differential Revision: https://reviews.llvm.org/D115997
2021-12-23 09:40:58 -06:00
Alexandros Lamprineas e70ef6d924 [AArch64] Add a tablegen pattern for SQXTN2.
Converts concat_vectors(Vd, trunc(smin(smax(Vm, -2^n), 2^n-1))) to
sqxtn2(Vd, Vm). Deliberately not handling v2i64 ~> v2i32 as the
min/max nodes are not legal (same thing we did for the SQXTN
patterns in https://reviews.llvm.org/D103263).

Differential Revision: https://reviews.llvm.org/D116105
2021-12-23 15:19:13 +00:00
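The pattern matched above is a signed clamp followed by a truncate, appended to an existing low half. A minimal sketch, assuming n is one less than the narrow element width so that the clamp bounds are the signed range of the narrow type (the helper names are hypothetical):

```python
def sat_narrow(x, narrow_bits):
    """Clamp a signed value to the signed range of the narrow type.

    After the clamp, truncation to narrow_bits does not change the value.
    """
    lo = -(1 << (narrow_bits - 1))
    hi = (1 << (narrow_bits - 1)) - 1
    return min(max(x, lo), hi)


def sqxtn2_model(vd, vm, narrow_bits):
    """Keep vd as the low half and append the saturated lanes of vm."""
    return list(vd) + [sat_narrow(x, narrow_bits) for x in vm]
```

For example, narrowing `[300, -300]` to 8 bits yields `[127, -128]`, placed after the untouched lanes of the first operand.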
Simon Pilgrim 71fc4bbdd2 [X86][SSE] Add ISD::ROTR support
Fix issue in TargetLowering::expandROT where we only attempt to flip a rotation if the other direction has better support - this matches TargetLowering::expandFunnelShift

This allows us to enable ISD::ROTR lowering on SSE targets, which particularly simplifies/improves codegen for splat amount and AVX2 per-element shifts.
2021-12-23 15:07:30 +00:00
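Flipping a rotation to the other direction, as mentioned above, relies on a rotate right by y being a rotate left by the negated amount modulo the bit width. A short sketch on fixed-width integers (function names are illustrative, not LLVM APIs):

```python
def rotl(x, amt, bits):
    """Rotate the low `bits` bits of x left by amt (taken modulo bits)."""
    amt %= bits
    mask = (1 << bits) - 1
    return ((x << amt) | (x >> (bits - amt))) & mask


def rotr(x, amt, bits):
    """Rotate the low `bits` bits of x right by amt (taken modulo bits)."""
    amt %= bits
    mask = (1 << bits) - 1
    return ((x >> amt) | (x << (bits - amt))) & mask


def rotr_via_rotl(x, amt, bits):
    """Express a right rotation as a left rotation by the negated amount."""
    return rotl(x, -amt, bits)
```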
Craig Topper a9486a40f7 [RISCV] Disable interleaving scalar loops in the loop vectorizer.
The loop vectorizer can interleave scalar loops even if it doesn't
vectorize them. I don't believe we intended to enable this when
we enabled interleaving for vector instructions.

Disable interleaving for VF=1 like X86 and AMDGPU already do. Test
lifted from AMDGPU.

Differential Revision: https://reviews.llvm.org/D115975
2021-12-23 08:37:24 -06:00
Simon Pilgrim a3f50fb06d [X86] isVectorShiftByScalarCheap - vXi8 select(shift(x,splat0),shift(x,splat1)) is better than shift(x,select(splat0,splat1))
Even though we don't have vXi8 vector shifts (apart from XOP), it is still better to prefer shift (or funnel-shift/rotate) by scalar where possible.

https://llvm.godbolt.org/z/6ss6ffTxv

Differential Revision: https://reviews.llvm.org/D116191
2021-12-23 14:30:02 +00:00
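The two forms compared in the commit title above compute the same lanes; the commit only argues about which one is cheaper to lower. A lane-wise model of that equivalence, with hypothetical helper names and 8-bit lanes assumed:

```python
def shl_lanes(x, amounts, bits=8):
    """Shift each lane left by its own amount, wrapping at `bits` bits."""
    mask = (1 << bits) - 1
    return [(v << a) & mask for v, a in zip(x, amounts)]


def select(cond, a, b):
    """Per-lane select: take a[i] where cond[i] is true, else b[i]."""
    return [p if c else q for c, p, q in zip(cond, a, b)]


def shift_of_select(x, cond, s0, s1):
    """shift(x, select(splat0, splat1)): a per-lane variable shift."""
    n = len(x)
    return shl_lanes(x, select(cond, [s0] * n, [s1] * n))


def select_of_shifts(x, cond, s0, s1):
    """select(shift(x, splat0), shift(x, splat1)): two shifts by a scalar."""
    n = len(x)
    return select(cond, shl_lanes(x, [s0] * n), shl_lanes(x, [s1] * n))
```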
Petar Avramovic 29f88b93fd [GlobalISel] Rework more/fewer elements for vectors
Artifact combiner is not able to access individual elements after using
LCMTy style merge/unmerge, extract and insert to change vector number of
elements (pad with undef or split to sub-vector instructions).
Use unmerge to individual elements instead and then merge elements into
requested types.
Change argument lowering for vectors and moreElementsVector to use
buildPadVectorWithUndefElements and buildDeleteTrailingVectorElements.
FewerElementsVector had a few helpers that had different behavior,
introduce new helper for most of the opcodes.
FewerElementsVector helper is more flexible since it can create leftover
instruction smaller than requested type (useful in case target wants to
avoid pad with undef and use fewer registers). If target does not want
leftover of different type it should call more elements first.
Some helpers were performing more elements first to have split without
leftover. Opcodes that used this helper use clampMaxNumElementsStrict
(does more elements first) in LegalizerInfo to avoid test changes.
Fixes failures caused by failing to combine artifacts created during
more/fewer elements vector.

Differential Revision: https://reviews.llvm.org/D114198
2021-12-23 14:30:02 +01:00
Jay Foad 74ce7ff5dc [AMDGPU] Remove a TODO that was done by D98081 2021-12-23 10:19:37 +00:00
Phoebe Wang a954558e87 Revert "[X86][MS-InlineAsm] Use exact conditions to recognize MS global variables"
This reverts commit 682d01a1c1.

Reverted because of buildbot failures.
2021-12-23 12:44:33 +08:00
Phoebe Wang 682d01a1c1 [X86][MS-InlineAsm] Use exact conditions to recognize MS global variables
D115225 tried to roll back the effects on symbols of MS inline asm
introduced by D113096. But the combination of the conditions cannot
match all the changes. As a result, there are still failures after the
patch.

This patch fixes the problem by checking the exact conditions for MS
global variables, i.e., variable (by FrontendSize != 0) + non rip/eip
(by DefaultBaseReg == 0), so that we can fully roll back for D113096.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D116090
2021-12-23 11:46:03 +08:00
jacquesguan 28a3e7dea2 [RISCV] Override hasAndNotCompare to use more andn when the Zbb extension is available.
Enable the transforms (X & Y) == Y ---> (~X & Y) == 0 and (X & Y) != Y ---> (~X & Y) != 0 when the Zbb extension is available, so that more andn instructions are used.

Differential Revision: https://reviews.llvm.org/D115922
2021-12-23 10:42:20 +08:00
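The rewrite from (X & Y) == Y to (~X & Y) == 0 holds for all bit patterns, which can be confirmed exhaustively on a small width. The sketch below assumes 8-bit values purely to keep the check cheap; the function names are illustrative.

```python
def and_compare(x, y):
    """Original form: every bit set in y is also set in x."""
    return (x & y) == y


def andn_compare(x, y, bits=8):
    """Rewritten form: no bit of y survives after masking with ~x."""
    mask = (1 << bits) - 1
    return (~x & y & mask) == 0


def forms_agree(bits=8):
    """Exhaustively compare both forms over all pairs of `bits`-wide values."""
    limit = 1 << bits
    return all(and_compare(x, y) == andn_compare(x, y, bits)
               for x in range(limit) for y in range(limit))
```

The != variants follow by negating both sides.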
alex-t e4103c91f8 [AMDGPU] Select build_vector DAG nodes according to the divergence
This change enables divergence-driven instruction selection for the build_vector DAG nodes.
It also enables packed i16 instructions for GFX9.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D116187
2021-12-23 02:27:12 +03:00
Simon Pilgrim 8b58344efb Remove superfluous semicolon.
Missed by MSVC
2021-12-22 17:42:45 +00:00
Simon Pilgrim 4639461531 [DAG][X86] Add TargetLowering::isSplatValueForTargetNode override
Add callback to enable us to test target nodes if they are splat vectors

Added some basic X86ISD::VBROADCAST + X86ISD::VBROADCAST_LOAD handling
2021-12-22 16:57:44 +00:00
Ron Lieberman 09b53296cf Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"
This reverts commit 9075009d1f.

 Failed amdgpu runtime buildbot # 3514
2021-12-22 11:39:28 -05:00
RamNalamothu 9075009d1f [AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvement of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D114652
2021-12-22 20:51:12 +05:30
Nikita Popov 3b0f5a4856 [Mips16HardFloat] Simplify attribute change (NFC)
As we're only removing and adding a single attribute, there is no
need to go through AttrBuilder.
2021-12-22 09:41:11 +01:00
Nikita Popov f5ac23b5ae [ArgPromotion][TTI] Pass types to ABI compatibility hook
The areFunctionArgsABICompatible() hook currently accepts a list of
pointer arguments, though what we're actually interested in is the
ABI compatibility after these pointer arguments have been converted
into value arguments.

This means that a) the current API is incompatible with opaque
pointers (because it requires inspection of pointee types) and
b) it can only be used in the specific context of ArgPromotion.
I would like to reuse the API when inspecting calls during inlining.

This patch converts it into an areTypesABICompatible() hook, which
accepts a list of types. This makes the method more generally usable,
and compatible with opaque pointers from an API perspective (the
actual usage in ArgPromotion/Attributor is still incompatible,
I'll follow up on that in separate patches).

Differential Revision: https://reviews.llvm.org/D116031
2021-12-22 09:37:51 +01:00
Kazu Hirata 9db0e21660 [llvm] Use depth_first (NFC) 2021-12-21 22:28:48 -08:00
Nemanja Ivanovic 1674d9b6b2 [PowerPC] Fix vector equality comparison for v2i64 pre-Power8
The current code makes the assumption that equality
comparison can be performed with a word comparison
instruction. While this is true if the entire 64-bit
results are used, it does not generally work. It is
possible that the low order words and high order
words produce different results and a user of only
one will get the wrong result.

This patch adds an and of the result words so that
each word has the result of the comparison of the
entire doubleword that contains it.

Differential revision: https://reviews.llvm.org/D115678
2021-12-21 14:28:41 -06:00
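The problem and the fix described above can be modeled on one 64-bit lane split into two 32-bit words. This is a sketch with hypothetical helper names; it models only the arithmetic, not the actual PowerPC instruction sequence.

```python
WORD_MASK = 0xFFFFFFFF


def words(doubleword):
    """Split a 64-bit value into (high word, low word)."""
    return (doubleword >> 32) & WORD_MASK, doubleword & WORD_MASK


def cmp_eq_words(a, b):
    """Word-wise equality: all-ones or zero for each of the two words."""
    return tuple(WORD_MASK if x == y else 0
                 for x, y in zip(words(a), words(b)))


def cmp_eq_doubleword(a, b):
    """AND the two word results so each word reflects the whole doubleword."""
    hi, lo = cmp_eq_words(a, b)
    both = hi & lo
    return both, both
```

With inputs that differ only in the low word, the plain word compare still reports all-ones in the high word, which is the wrong answer for a user that reads only that word; after the AND both words are zero.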
Nemanja Ivanovic a3ea9052d6 [PowerPC] Do not increase cost for getUserCost with MMA types
Commit 150681f increases
cost of producing MMA types (vector pair and quad).
However, it increases the cost for getUserCost() which is
used in unrolling. As a result, loops that contain these
types already (from the user code) cannot be unrolled
(even with the user's unroll pragma). This was an unintended
side effect. Reverting that portion of the commit to allow
unrolling such loops.

Differential revision: https://reviews.llvm.org/D115424
2021-12-21 13:36:08 -06:00
Alexandros Lamprineas ada028c32f [AArch64] Add a tablegen pattern for UZP2.
Converts concat_vectors((trunc (lshr)), (trunc (lshr))) to UZP2
when the shift amount is half the width of the vector element.

Differential Revision: https://reviews.llvm.org/D116021
2021-12-21 16:21:46 +00:00
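Why the shifted-and-truncated halves line up with UZP2 can be shown by viewing each 32-bit lane as two 16-bit lanes. The model below assumes little-endian lane numbering (the low half of a wide lane is the even narrow lane) and uses hypothetical helper names.

```python
def as_halfword_lanes(vec32):
    """View 32-bit lanes as 16-bit lanes, low half first (assumed layout)."""
    lanes = []
    for x in vec32:
        lanes.append(x & 0xFFFF)
        lanes.append((x >> 16) & 0xFFFF)
    return lanes


def uzp2_model(a, b):
    """Odd-numbered lanes of a, followed by odd-numbered lanes of b."""
    return (a + b)[1::2]


def trunc_lshr_concat(v0, v1):
    """concat_vectors(trunc(lshr v0, 16), trunc(lshr v1, 16))."""
    return [(x >> 16) & 0xFFFF for x in v0 + v1]
```

Shifting right by half the element width and truncating keeps exactly the high halves, which are the odd-numbered narrow lanes in this layout.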
Simon Pilgrim dfa2ad1ad8 [X86] getTargetVShiftNode - remove shift-by-constant handling.
Remove shift-by-constant handling and move it into its only user (VSHIFT intrinsics lowering).

This is some prep-work for getTargetVShiftNode to no longer take a scalar shift amount - we're introducing temporary ISD::EXTRACT_VECTOR_ELT nodes via SelectionDAG::getSplatValue to accommodate this which can cause various issues, including unnecessary scalarization and xmm->gpr->xmm transfers, and causes problems for 32-bit codegen if we fail to remove an (illegal) i64 scalar extracted from a (legal) vXi64 vector.
2021-12-21 13:16:48 +00:00
Simon Pilgrim 0caf8a3daf [X86] LowerRotate - enable vXi32 splat handling
Pull out the "rotl(x,y) --> (unpack(x,x) << zext(splat(y % bw))) >> bw" special case from vXi8 lowering so we can reuse it for vXi32 types as well.

There are still some regressions with vXi16 to handle before this becomes entirely general.

It also allows us to remove the now unnecessary hack for handling amount-modulo before splatting.
2021-12-21 11:19:23 +00:00
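The special case quoted above, rotl(x,y) --> (unpack(x,x) << zext(splat(y % bw))) >> bw, can be checked on a single lane by placing x in both halves of a double-width value. A sketch with illustrative names, modeling one lane rather than a whole vector:

```python
def rotl_reference(x, amt, bits):
    """Plain rotate left of the low `bits` bits of x."""
    amt %= bits
    mask = (1 << bits) - 1
    return ((x << amt) | (x >> (bits - amt))) & mask


def rotl_via_unpack(x, amt, bits):
    """Rotate left using a double-width shift of the pair (x, x)."""
    mask = (1 << bits) - 1
    wide_mask = (1 << (2 * bits)) - 1
    wide = (x << bits) | x                       # unpack(x, x)
    shifted = (wide << (amt % bits)) & wide_mask  # shift in the wide lane
    return (shifted >> bits) & mask               # keep the high half
```

The bits shifted out of the low copy of x land in the high half, which is where the rotated result is read from.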
Andrew Wei 03dc2975d0 [AArch64][SVE] Lower shuffles to permute instructions: zip1/2, uzp1/2, trn1/2
Attempt to lower a shuffle as a permute instruction (zip/uzp/trn) for fixed length SVE.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D113376
2021-12-21 18:39:09 +08:00
Simon Moll b2cea573c9 [VE] FADD,FSUB,FMUL,FDIV v256f32|f64 isel and tests
Depends on D115940 for the `Binary_rv_vr_vv` pattern class op isel
fragment used for divisions.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D116035
2021-12-21 09:15:31 +01:00
Simon Moll 8c51812913 [VE] U|SDIV v256i32|64 isel and tests
Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D115940
2021-12-21 08:51:01 +01:00
Kazu Hirata 500c4b68dc [llvm] Construct SmallVector with iterator ranges (NFC) 2021-12-20 23:43:24 -08:00
Kazu Hirata c5cf7d910e [ARM] Use range-based for loops (NFC) 2021-12-20 23:06:47 -08:00
Yonghong Song 76b7d73429 BPF: report better error message for BTF_TYPE_ID_REMOTE relo failure
Matteo Croce reported a bpf backend fatal error in
https://github.com/llvm/llvm-project/issues/52779

A simplified case looks like:
  $ cat bug.c
  extern int do_smth(int);
  int test() {
    return __builtin_btf_type_id(*(typeof(do_smth) *)do_smth, 1);
  }
  $ clang -target bpf -O2 -g -c bug.c
  fatal error: error in backend: Empty type name for BTF_TYPE_ID_REMOTE reloc
  ...

The reason for the fatal error is that the relocation is against
a DISubroutineType like type 13 below:
  !10 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
  !11 = !{}
  !12 = !DILocation(line: 3, column: 10, scope: !7)
  !13 = !DISubroutineType(types: !14)
  !14 = !{!10, !10}

The DISubroutineType doesn't have a name and there is no way for
downstream bpfloader/kernel to do proper relocation for it.

But we can make the error message more specific for this case.
With this patch, the error message becomes:
  fatal error: error in backend: SubroutineType not supported for BTF_TYPE_ID_REMOTE reloc

Differential Revision: https://reviews.llvm.org/D116063
2021-12-20 21:06:19 -08:00
Esme-Yi b66328701a [PowerPC][llvm-objdump] enable --symbolize-operands for PowerPC ELF/XCOFF.
Summary: When disassembling, symbolize a branch target operand
to print a label instead of a real address.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D114492
2021-12-21 04:17:57 +00:00
Matt Arsenault c222972442 AMDGPU/GlobalISel: Stop using NarrowScalar/FewerElements for unaligned splitting
These actions should only be used for adjusting the register types
(and the memory type as needed to satisfy the register
type). Unaligned accesses should be split as a type of lowering.

This has the effect of improving the code in many cases since now we
produce zextloads instead of separate loads with ands. The load/store
legality rules still seem far more complicated than necessary though.
2021-12-20 18:07:11 -05:00
Kazu Hirata de90490060 Revert "[ARM] Use range-based for loops (NFC)"
This reverts commit 93d79cac2e.

This patch seems to break
llvm/test/CodeGen/ARM/constant-islands-cfg.mir under asan.
2021-12-20 10:51:36 -08:00
alex-t 19727e31fb [AMDGPU] Enable divergence predicates for ctlz/cttz
ctlz/cttz get lowered to the set of target opcodes
This change enables the ISel to select SALU or VALU form according to the SDNode divergence.
CTLZ - S_FLBIT_I32_B32 if uniform and V_FFBH_U32_e64 if divergent
CTTZ - S_FF1_I32_B32   if uniform and V_FFBL_B32_e64 if divergent
Also @llvm.amdgcn.sffbh.i32 gets lowered to S_FLBIT_I32 if uniform and V_FFBH_I32_e64 if divergent
NOTE: 64bit versions S_FF1_I32_B64 and S_FLBIT_I32_B64 are not currently supported by the DAG ISel.
ctlz/cttz with i64 input are split into two 32bit instructions. Nevertheless, they already have the patterns
and were equipped with the divergence predicates to make sure they will be selected correctly when enabled.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D116044
2021-12-20 20:53:48 +03:00
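The selection rule listed in the commit above amounts to a lookup keyed on the operation and on whether the node is uniform. The table below only restates the opcode names given in the message; the function `select_opcode` and the dictionary layout are hypothetical and do not correspond to how the TableGen patterns are written.

```python
# (operation, is_uniform) -> opcode name, as listed in the commit message.
SELECTION = {
    ("ctlz", True): "S_FLBIT_I32_B32",
    ("ctlz", False): "V_FFBH_U32_e64",
    ("cttz", True): "S_FF1_I32_B32",
    ("cttz", False): "V_FFBL_B32_e64",
    ("llvm.amdgcn.sffbh.i32", True): "S_FLBIT_I32",
    ("llvm.amdgcn.sffbh.i32", False): "V_FFBH_I32_e64",
}


def select_opcode(operation, is_uniform):
    """Pick the SALU form for uniform nodes and the VALU form otherwise."""
    return SELECTION[(operation, is_uniform)]
```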
Sander de Smalen b1ff20fd35 [LV] Enable scalable vectorization by default for SVE cores.
The availability of SVE should be sufficient to enable scalable
auto-vectorization.

This patch adds a new TTI interface to query the target what style of
vectorization it wants when scalable vectors are available. For other
targets than AArch64, this currently defaults to 'FixedWidthOnly'.

Differential Revision: https://reviews.llvm.org/D115651
2021-12-20 16:23:29 +00:00
Nemanja Ivanovic 2fb9029f26 [PowerPC] Support hwsync extended mnemonic
This mnemonic has been supported by GAS for years and
it was added to the PowerPC ISA as of ISA 3.1. We will
support the mnemonic to be compatible with GAS.
2021-12-20 10:08:31 -06:00
Jay Foad 8b997adc64 [AMDGPU] Remove dead code after D109052 2021-12-20 14:20:02 +00:00
alex-t 98d09705e1 [AMDGPU] Re-enabling divergence predicates for min/max
This patch enables divergence predicates for min/max nodes.
It makes ISD::MIN/MAX selected to S_MIN_I(U)32/S_MAX_I(U)32 or V_MIN_I(U)32_e64/V_MAX_I(U)32_e64

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D115954
2021-12-20 16:10:55 +03:00
alex-t 1448aa9dbd [AMDGPU] Expand not pattern according to the XOR node divergence
The "not" is defined as XOR $src -1.
We need to transform this pattern to either S_NOT_B32 or V_NOT_B32_e32
depending on the "xor" node divergence.

Reviewed By: rampitec, foad

Differential Revision: https://reviews.llvm.org/D115884
2021-12-20 14:41:38 +03:00
Kazu Hirata 93d79cac2e [ARM] Use range-based for loops (NFC) 2021-12-20 00:04:53 -08:00
Shao-Ce SUN 68bc6d7cae [RISCV] Remove Zvamo Extension
Based on D111692. Zvamo is not part of the 1.0 V spec. Remove it.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D115709
2021-12-20 10:28:39 +08:00
David Green 4ece4cd77e [ARM] Fold away CMP/CSINC from CMOV
This makes use of the code in D114013 to fold away unnecessary
CMPZ/CSINC starting from a CMOV, in a similar way to how we fold away
CSINV/CSINC/etc

Differential Revision: https://reviews.llvm.org/D115185
2021-12-19 21:53:50 +00:00
Simon Pilgrim 47bd9ebda4 [X86][AVX512] cvt_by_vec_width - don't hardcode the schedule class. NFC.
Pull out the WriteMove schedule class into the cvt_mask_by_elt_width wrapper
2021-12-19 16:40:12 +00:00
Simon Pilgrim 67cce1ceee [X86] Adjust some IceLake fp shuffle schedule classes (PR48110)
The IceLake scheduler model is still mainly a copy of the SkylakeServer model.

This patch adjusts the fp shuffle classes to account for most instructions now working on Port 1 as well as Port 5.

This is based off Agner + uops.info as well as the PR48110 report.

Differential Revision: https://reviews.llvm.org/D115752
2021-12-19 13:00:11 +00:00
Jakub Kuderski 1e93f3895f [AMDGPU] Use enum_seq to iterate over InstCounterTypes. NFC.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D115900
2021-12-18 16:07:28 -05:00
Jakub Kuderski d9ae852fcc [AMDGPU] Fix data race in SIInsertWaitcnts
The race condition happened when two pass managers ran on two different modules but modified/read the global variables.

To address this, I considered using singletons and freestanding functions to allow getting/setting `HardwareLimits` and `RegisterEncoding`, or making it local to the pass. I chose the latter and made it a member of `WaitcntsBrackets`, to minimize the amount of global state.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D115896
2021-12-18 16:03:09 -05:00
Michael Berg f95ee6074a [RISCV] Add target specific loop unrolling and peeling preferences
Both these preference helper functions have initial support with
this change. The loop unrolling preferences are set with initial
settings to control thresholds, size and attributes of loops to
unroll with some tuning done.  The peeling preferences may need
some tuning as well as the initial support looks much like what
other architectures utilize.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D113798
2021-12-18 12:54:50 -08:00
Kazu Hirata 766d32f582 [Hexagon] Use is_contained (NFC) 2021-12-17 14:34:30 -08:00
Kazu Hirata 26bd534a79 [llvm] Use none_of instead of !any_of (NFC) 2021-12-17 13:48:57 -08:00
Kazu Hirata 2b7be47b22 [llvm] Strip redundant lambda (NFC) 2021-12-17 10:51:40 -08:00
Kazu Hirata f78c1b07cb [Target] Use range-based for loops (NFC) 2021-12-17 10:11:08 -08:00
Craig Topper be41996f4f [RISCV] Add FSGNJ_H to isAsCheapAsAMove and isCopyInstrImpl.
This matches FSGNJ_S and FSGNJ_D.
2021-12-17 09:14:20 -08:00
David Truby 7e44eb079d [AArch64][SVE] Improve code generation for VLS i1 masks
This patch partially resolves an issue for VLS code generation
where a mask is generated from a smaller width integer comparison
than the instruction using the mask requires.

Instead of sign extending a p register by converting it to a z
register, extending that, and converting back, we instead just
do an unpack of the p register.

A separate issue causes the code generation to still be poor when
the mask generation would fit in a neon register, as we then use
a neon comparison operation and have to convert that to a p register.
This will be resolved in a separate patch.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D111221
2021-12-17 16:26:49 +00:00
Matthew Devereau e00f22c1b1 [AArch64][SVE] Teach cost model that masked loads/stores are cheap
Reduce the cost of VLS masked loads/stores to make the vectorizer emit them more frequently.
2021-12-17 15:04:45 +00:00
David Truby 5c9684704d [DAG][sve] Lowering for VLS masked truncating stores
This extends the custom lowering for truncating stores on
fixed length vectors in SVE to support masked truncating stores.
It also adds a DAG combine for truncates followed by masked
stores.

Reviewed By: peterwaller-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D108115
2021-12-17 15:04:45 +00:00
Paul Walker 22370530a3 [NFC][SVE] Add missing tests for i32 INC/DEC patterns.
D111441 included trunc isel patterns for sve_int_pred_pattern_a
but no accompanying tests. This patch adds the missing tests and
also simplifies the isel patterns that use sve_cnt_shl_imm.

Differential Revision: https://reviews.llvm.org/D115512
2021-12-17 13:13:36 +00:00
rkorsa c680fb69d6 [AMDGPU] Fixes in ISelDAG path and GlobalISel path for 'bias' operand with A16 bit on
The LOD bias operand is of type 'half' when the A16 bit is ON for MIMG instructions.
'bias' is only 16 bits wide but occupies 32 bits, with the upper 16 bits containing junk.
The patch fixes both paths (ISelDAG and GlobalISel) so that the LOD bias operand is encoded properly.

Differential Revision: https://reviews.llvm.org/D111754
2021-12-17 16:11:51 +05:30
David Green 6bd8f114c8 [ARM] Handle splats of constants for MVE qr instruction
Some MVE instructions have qr variants that take a Q and R register,
splatting the R register for each lane. This is usually handled fine for
standard splats as we sink the splat into the loop and combine the
resulting dup into the qr instruction. It does not work for constant
splats though, as we generate a vmovimm or constant pool load instead.

This intercepts that, generating a vdup of the constant instead where we
can turn the result into a qr instruction variant.

Differential Revision: https://reviews.llvm.org/D115242
2021-12-17 09:16:28 +00:00
Heejin Ahn 4625b84879 [WebAssembly] Support clang -fwasm-exceptions for bitcode
This supports bitcode compilation using `clang -fwasm-exceptions`.

---

The current situation:

Currently the backend requires two options for Wasm EH:
`-wasm-enable-eh` and `-exception-model=wasm`. Wasm SjLj requires two
options as well: `-wasm-enable-sjlj` and `-exception-model=wasm`. When
using Wasm EH via Emscripten, you only need to pass `-fwasm-exceptions`,
and these options will be added within the clang driver. This
description will focus on the case of Wasm EH going forward, but Wasm
SjLj's case is similar.

When you pass `-fwasm-exceptions` to emcc and clang driver, the clang
driver adds these options to the command line that calls the clang
frontend (`clang -cc1`): `-mllvm -wasm-enable-eh` and
`-exception-model=wasm`. `-wasm-enable-eh` is prefixed with `-mllvm`, so
it is passed as is to the backend. But `-exception-model` is parsed and
processed within the clang frontend and stored in `LangOptions` class.
This info is later transferred to `TargetOptions` class, and then
eventually passed to `MCAsmInfo` class. All LLVM code queries this
`MCAsmInfo` to get the exception model.

---

Problem:

The problem is the whole `LangOptions` processing is bypassed when
compiling bitcode, so the information transfer of `LangOptions` ->
`TargetOptions` -> `MCAsmInfo` does not happen. They are all set to
`ExceptionHandling::None`, which is the default value.

---

What other targets do, and why we can't do the same:

Other targets support bitcode compilation by the clang driver, but they
can do that by using different triples. For example, X86 target supports
multiple triples, each of which has its own subclass of `MCAsmInfo`, so
it can hardcode the appropriate exception model within those subclasses'
constructors. But we don't have separate triples for each exception
mode: none, emscripten, and wasm.

---

What this CL does:

If we can figure out whether `-wasm-enable-eh` is passed to the backend,
we can programmatically set the exception model from the backend, rather
than requiring it to be passed.

So we check `WasmEnableEH` and `WasmEnableSjLj` variables, which are
`cl::opt` for `-wasm-enable-eh` and `-wasm-enable-sjlj`, in
`WebAssemblyMCAsmInfo` constructor, and if either of them is set, we set
`MCAsmInfo.ExceptionType` to Wasm. `TargetOptions` cannot be updated
there, so we make sure they are the same later.

Fixes https://github.com/emscripten-core/emscripten/issues/15712.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D115893
2021-12-16 16:49:24 -08:00
Ron Lieberman 8a85be807b Revert "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass"
Offload abort in Nekbone

This reverts commit 2b48761575.
2021-12-16 21:21:32 +00:00
Craig Topper 66bbefeb13 [RISCV] Revert Zfhmin related changes that aren't tested and depend on f16 being a legal type.
Our Zfhmin support is only MC layer, but these are CodeGen layer
interfaces. If f16 isn't a Legal type for CodeGen with Zfhmin, then
these interfaces should keep their non-Zfh behavior.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D115822
2021-12-16 08:55:28 -08:00
Simon Pilgrim a640f16ca2 [X86] combineAnd - don't demand operand vector elements if the other operand element is zero
If either operand has a zero element, then we don't need the equivalent element from the other operand, as no bits will be set.
2021-12-16 16:54:27 +00:00
Matt Arsenault 4132dc917e AMDGPU: Return result from indicatePessimisticFixpoint
I don't think this fixes anything.
2021-12-16 11:26:30 -05:00
Simon Pilgrim 3267de7215 [X86] combineAnd - pull out repeated getOperand() and SDLoc() calls. NFCI. 2021-12-16 16:22:39 +00:00
Simon Pilgrim 4712a71415 [X86] Rename LowerScalarImmediateShift/LowerScalarVariableShift helpers. NFC.
Rename them to LowerShiftByScalarImmediate/LowerShiftByScalarVariable to make it easier to find them wrt LowerShift()
2021-12-16 16:01:14 +00:00
Neumann Hon 9a35844990 [z/OS] Implement prologue and epilogue generation for z/OS target.
This patch adds support for prologue and epilogue generation for the z/OS target under the XPLINK64 ABI for functions with a stack size of less than 1048576 bytes (the threshold for huge stack frames).

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D114457
2021-12-16 09:04:05 -05:00
Paulo Matos c92d45913c [WebAssembly] Fix typechecking for else MCInst
When hitting an else clause, the type Stack should be reset to its state at the start of the if, without taking into account the Types pushed onto the Stack during the then branch of the if.

Reviewed By: aardappel

Differential Revision: https://reviews.llvm.org/D115748
2021-12-16 11:18:01 +01:00
jacquesguan 7dfbf0b60f [RISCV] Fold (and (not (srl X, C)), 1) to (xor (bexti X, C), 1) when the Zbs extension is available.
When the Zbs extension is available, we can use bexti to fold (and (not (srl X, C)), 1) to (xor (bexti X, C), 1).

Differential Revision: https://reviews.llvm.org/D115629
2021-12-16 15:01:05 +08:00
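The fold in 7dfbf0b60f rests on a simple bit identity, which can be sanity-checked on plain integers. The following is a minimal Python sketch, not LLVM code: `bexti` is a hypothetical helper that only mimics single-bit extraction, and the 8-bit width (`xlen=8`) is an assumption chosen to keep the exhaustive check small.

```python
def bexti(x: int, c: int) -> int:
    """Hypothetical model of a single-bit extract: bit c of x, as 0 or 1."""
    return (x >> c) & 1


def original(x: int, c: int, xlen: int = 8) -> int:
    """(and (not (srl X, C)), 1) on an xlen-bit unsigned value."""
    mask = (1 << xlen) - 1
    return (~(x >> c) & mask) & 1


def folded(x: int, c: int) -> int:
    """(xor (bexti X, C), 1)."""
    return bexti(x, c) ^ 1


# Exhaustive comparison over every 8-bit value and every shift amount.
EQUIVALENT = all(
    original(x, c) == folded(x, c) for x in range(256) for c in range(8)
)
```

Both forms return 1 exactly when bit C of X is clear, so the fold changes the instruction sequence but not the result.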
jacquesguan d3c2ad154e [RISCV] Fix whole vector register move instruction's vector register constraint.
According to the v-spec, the source and destination VR of vmv<nr>r.v should be aligned for the VR group size.

Differential Revision: https://reviews.llvm.org/D115720
2021-12-16 10:58:55 +08:00
Matt Arsenault f0cc43cc91 AMDGPU: Use v_accvgpr_mov_b32 when copying AGPR tuples on gfx90a
This is an optimization, but also fixes a compile failure when no free
VGPRs are available. The problem still exists for gfx908 where a
scratch register is still required. This also still exists for the
SGPR to AGPR case.
2021-12-15 18:20:49 -05:00
Matt Arsenault 45f16eabd6 AMDGPU: Combine is.shared/is.private of null/undef 2021-12-15 18:20:49 -05:00
Matt Arsenault 2b48761575 AMDGPU: Remove AMDGPUFixFunctionBitcasts pass
This was a workaround for not supporting indirect calls when
instcombine didn't eliminate constant expression casts of the callee
at -O0. Indirect calls are supposed to work now, so drop the hack.
2021-12-15 18:20:48 -05:00
Arthur Eubanks 5a81a60391 [NFC] Remove more calls to getAlignment()
These are deprecated and should be replaced with getAlign().

Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.
2021-12-15 14:40:57 -08:00
Arnold Schwaighofer d87e617048 Teach the backend to make references to swift_async_extendedFramePointerFlags weak if it emits it
When references to the symbol `swift_async_extendedFramePointerFlags`
are emitted they have to be weak.

References to the symbol `swift_async_extendedFramePointerFlags` get
emitted only by frame lowering code. Therefore, the backend needs to track
references to the symbol and mark them weak.

Differential Revision: https://reviews.llvm.org/D115672
2021-12-15 10:02:06 -08:00
Joe Nash da9c6ea007 [AMDGPU] Extract helper function in AsmParser. NFC
NFC refactor to extract useful helper function isRegOrInline.

Reviewed By: rampitec, dp

Differential Revision: https://reviews.llvm.org/D115753

Change-Id: Ief52db9a62615c053fb5f429248657b97cb41453
2021-12-15 09:53:23 -05:00
Simon Pilgrim 52cb0bbec3 [X86] LowerRotate - use vXi8 custom lowering for non-uniform constant amounts
Instead of bailing and using the default expansion, we can more efficiently use the shl(unpack(x,x),unpack(amt,zero)) pattern for vXi8 rotl, as we'll then use vXi16 fast PMULLW (or PSLLVW).

This required some minor changes to improve constant folding during unpack shuffle creation and convertShiftLeftToScale to support constants that have already been lowered to constant pools.
2021-12-15 14:51:15 +00:00
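The shl(unpack(x,x),unpack(amt,zero)) idea from 52cb0bbec3 can be illustrated on a single lane. This is a scalar Python sketch under stated assumptions, not the X86 lowering itself: the function names are hypothetical, one 16-bit lane stands in for the unpacked vector element, and the shift is written as a multiply by a power of two to mirror the PMULLW form mentioned in the message.

```python
def rotl8_reference(x: int, amt: int) -> int:
    """Plain 8-bit rotate left."""
    amt &= 7
    return ((x << amt) | (x >> (8 - amt))) & 0xFF


def rotl8_via_unpack(x: int, amt: int) -> int:
    """Rotate by duplicating the byte into a 16-bit lane and shifting."""
    amt &= 7
    wide = (x << 8) | x                      # unpack(x, x): byte in both halves
    shifted = (wide * (1 << amt)) & 0xFFFF   # 16-bit multiply by 2**amt == shl
    return shifted >> 8                      # the high byte holds the rotated value


# Exhaustive comparison over every byte value and rotate amount.
MATCHES = all(
    rotl8_reference(x, a) == rotl8_via_unpack(x, a)
    for x in range(256)
    for a in range(8)
)
```

Because the low half of the widened lane is a copy of the byte, the bits shifted out of the high half are replaced by exactly the bits a rotate would wrap around.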
Andrew Wei dc7b672f96 [AArch64][SVE] Lower shuffles to permute instructions: rev/revb/revh/revw
Attempt to lower a shuffle as a permute instruction (rev/revb/revh/revw) for fixed length SVE.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D114960
2021-12-15 21:53:00 +08:00
Jay Foad 54fc9eb9b3 [AMDGPU] Use v_fma_f16 on GFX10
Teach convertToThreeAddress to use the V_FMA_F16_gfx9 pseudo (i.e. the
standard instruction in GFX9 onwards) instead of V_FMA_F16 (the legacy
pseudo for GFX8 compatibility, which is no longer supported in GFX10).
This follows the example of macToMad in SIFoldOperands.

Differential Revision: https://reviews.llvm.org/D115731
2021-12-15 13:14:48 +00:00
Jay Foad 4db7422771 [AMDGPU] Improve zeroesHigh16BitsOfDest for GFX9 legacy opcodes
Pseudos like V_MAD_U16 and V_FMA_F16 map down to what GFX9 calls
v_mad_legacy_u16 and v_fma_legacy_f16, which are documented to have the
same zeroing behaviour as on GFX8.

Differential Revision: https://reviews.llvm.org/D115729
2021-12-15 13:14:48 +00:00
Jay Foad 6a7db0dc8e [AMDGPU] Skip some work on subtargets without scalar stores. NFC. 2021-12-15 12:46:33 +00:00
Simon Pilgrim 36b0325c44 [X86] Enable v16i8/v32i8/v64i8 rotation on AVX512 targets
We currently rely on generic promotion to vXi16/vXi32 types for rotation lowering on various AVX512 targets.

We can more efficiently perform this by making use of the shl(unpack(x,x),amt) style pattern that we already use for vXi8 rotation by splat amounts, either by widening to a larger vector type or unpacking lo/hi halves of the subvectors so we can access whatever vXi16/vXi32 per-element shifts are supported.

This uncovered an issue in the supportedVectorShiftWithImm/supportedVectorVarShift legality checkers which was using hasAVX512() instead of useAVX512Regs() to detect support for 512-bit vector shifts.

NOTE: I'm actually hoping to eventually reuse this code for shl(unpack(y,x),amt) funnel shift lowering (vXi8 and wider), but initially I just want to ensure we have efficient ISD::ROTL lowering for all targets.

Differential Revision: https://reviews.llvm.org/D115180
2021-12-15 11:17:45 +00:00
Simon Moll 676af1272b [VE] SHL,SRA,SRL v256i32|64 isel and tests
Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D115734
2021-12-15 11:32:18 +01:00
Jon Chesterfield 624f12d34f [amdgpu] Drop lowering of LDS used by global variables
Approximately revert D103431.

LDS variables are allocated at kernel launch and deallocated at kernel exit.
The address is therefore kernel execution dependent. Global variables are
initialized by values written to .data, which can't be done for a LDS variable
as there is no kernel running, or by a global constructor. Initializing the
global to the address of some LDS allocated by a global constructor is possible
but indistinguishable from undef.

Assigning the address of a LDS variable to a global should be a sema error. It
isn't for openmp, haven't checked other languages. Failing that it could be set
to undef, perhaps in this pass.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D115413
2021-12-14 21:59:26 +00:00
Muiez Ahmed ebf5497b26 Revert "[z/OS] Implement prologue and epilogue generation for z/OS target."
This reverts commit ffad4d777b because it introduced buildbot failures.
2021-12-14 14:22:11 -05:00
Simon Pilgrim 74d1fc742a [X86] Adjust some IceLake integer shuffle schedule classes (PR48110)
The IceLake scheduler model is still mainly a copy of the SkylakeServer model.

This patch adjusts the integer shuffle classes to account for most instructions now working on Port 1 as well as Port 5.

This is based off Agner + uops.info as well as the PR48110 report.

Differential Revision: https://reviews.llvm.org/D115547
2021-12-14 18:56:13 +00:00
Craig Topper 3926893439 [RISCV] Add isel support for scalar STRICT_FADD/FSUB/FMUL/FDIV/FSQRT.
Test that STRICT_FMINNUM/FMAXNUM are lowered to libcalls for f32/f64.
The RISC-V instructions don't match the behavior of fmin/fmax libcalls
with respect to SNaN.

Promoting FMINNUM/FMAXNUM for f16 needs more work outside of the
RISC-V backend.

Reviewed By: asb, arcbbb

Differential Revision: https://reviews.llvm.org/D115680
2021-12-14 10:50:55 -08:00
Kazu Hirata ee99426c77 [AArch64] Revise a warning fix
This patch revises the warning fix done in
a93b1792f1.  Specifically, it rolls the
MRI.getType call into the assert, thereby avoiding the named variable.
2021-12-14 10:46:57 -08:00
Craig Topper 3f1c403a2b [RISCV] Use AdjustInstrPostInstrSelection to insert a FRM dependency for scalar FP instructions with dynamic rounding mode.
In order to support constrained FP intrinsics we need to model FRM
dependency. Whether or not an instruction uses FRM is based on a 3-bit
field in the instruction. Because of this we can't add
'Uses = [FRM]' to the tablegen descriptions.

This patch examines the immediate after isel and adds an implicit
use of FRM. This idea came from Roger Ferrer Ibanez.

Other ideas:
We could be overly conservative and just pretend all instructions with
frm field read the FRM register. Or we could have pseudoinstructions
for CodeGen with rounding mode.

Reviewed By: asb, frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D115555
2021-12-14 10:17:57 -08:00
Craig Topper d4d76409d1 [RISCV] Add mayRaiseFPException to RISCV scalar FP instructions.
FRM dependency will be added in a future patch.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D115540
2021-12-14 09:53:30 -08:00
Craig Topper 7598ac5ec5 [RISCV] Convert (splat_vector (load)) to vlse with 0 stride.
We already do this for splat nodes that carry a VL, but not for
splats that use VLMAX.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D115483
2021-12-14 09:14:03 -08:00
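The reason a zero-stride strided load can replace a load followed by a splat (7598ac5ec5) is easy to see in a toy model. The Python sketch below is an illustration only: `vlse` and `splat` are hypothetical helpers, memory is modelled as a list indexed by element rather than by byte, and masking and tail policy are ignored.

```python
def vlse(memory: list, base: int, stride: int, vl: int) -> list:
    """Toy strided load: element i comes from memory[base + i * stride]."""
    return [memory[base + i * stride] for i in range(vl)]


def splat(value: int, vl: int) -> list:
    """Toy splat: broadcast one scalar to vl elements."""
    return [value] * vl


memory = [10, 20, 30, 40, 50]
vl = 4

# Scalar load followed by a splat ...
via_splat = splat(memory[2], vl)
# ... versus a single strided load whose stride is zero.
via_vlse = vlse(memory, base=2, stride=0, vl=vl)
```

With a stride of zero every element reads the same address, so the strided load produces the broadcast directly.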
Jing Bao 2a4a229d6d [WebAssembly] Custom optimization for truncate
When possible, optimize TRUNCATE to generate Wasm SIMD narrow
instructions (i16x8.narrow_i32x4_u, i8x16.narrow_i16x8_u), rather than generate
lots of extract_lane and replace_lane.

Closes #50350.
2021-12-14 08:42:39 -08:00
Alexandros Lamprineas 61bb8b5d40 [AArch64] Convert sra(X, elt_size(X)-1) to cmlt(X, 0)
CMLT has twice the execution throughput of SSHR on Arm out-of-order cores.

Differential Revision: https://reviews.llvm.org/D115457
2021-12-14 16:03:02 +00:00
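The equivalence used by 61bb8b5d40 can be checked on a single lane. This Python sketch is a model under assumptions, not AArch64 code: `sra` and `cmlt_zero` are hypothetical helpers operating on one two's-complement lane, and the 8-bit lane width is chosen only so the check can be exhaustive.

```python
def sra(x: int, shift: int, bits: int = 8) -> int:
    """Arithmetic shift right of a bits-wide two's-complement lane."""
    mask = (1 << bits) - 1
    signed = x - (1 << bits) if x & (1 << (bits - 1)) else x
    return (signed >> shift) & mask


def cmlt_zero(x: int, bits: int = 8) -> int:
    """Compare-less-than-zero mask: all ones if the lane is negative, else 0."""
    mask = (1 << bits) - 1
    return mask if x & (1 << (bits - 1)) else 0


# sra(X, elt_size - 1) smears the sign bit across the lane, which is the
# same mask that a signed "less than zero" comparison produces.
SAME_MASK = all(sra(x, 7) == cmlt_zero(x) for x in range(256))
```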
Matt Devereau fb47725d14 [AArch64][SVE] Instcombine SDIV to ASRD
Instcombine SDIV to ASRD when the third operand of SDIV is a power of 2

Differential Revision: https://reviews.llvm.org/D115448
2021-12-14 15:58:28 +00:00
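Replacing a signed divide by a power of two with a shift (fb47725d14) only works if the shift rounds toward zero the way signed division does. The Python sketch below models that arithmetic on scalars; `asrd` is a hypothetical helper written from the usual bias-then-shift formulation, it ignores SVE predication entirely, and the value range is an assumption picked for a small exhaustive check.

```python
def sdiv_trunc(x: int, d: int) -> int:
    """Signed division that rounds toward zero, like a hardware SDIV."""
    q = abs(x) // abs(d)
    return -q if (x < 0) != (d < 0) else q


def asrd(x: int, k: int) -> int:
    """Shift right for divide: bias negative inputs by 2**k - 1, then shift."""
    bias = (1 << k) - 1 if x < 0 else 0
    return (x + bias) >> k


# Compare against true division for all 8-bit signed values and shifts 1..7.
AGREES = all(
    asrd(x, k) == sdiv_trunc(x, 1 << k)
    for x in range(-128, 128)
    for k in range(1, 8)
)
```

A plain arithmetic shift would round negative values toward minus infinity, so the bias is what makes the shift a faithful stand-in for the divide.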
Zaara Syeda 3f066ac648 Test commit 2021-12-14 15:37:28 +00:00
Simon Pilgrim 4f2e183229 [X86] combineOr - don't demand operand elements if the other operand element is 'allones'
If either operand has an element with allbits set, then we don't need the equivalent element from the other operand, as allbits are guaranteed to be set.
2021-12-14 15:36:33 +00:00
Simon Pilgrim a9d811405f [X86] combineOr - pull out repeated SDLoc(). NFCI. 2021-12-14 15:36:32 +00:00
David Green 26f6fbe2be [ARM] Add AddrModeT2_i8neg addressing mode support for frame lowering.
As reported from a failing firefox build, we can sometimes get frame
indices with negative offsets from a t2LDRi8. This adds support for
them, to prevent the crash.
2021-12-14 12:49:27 +00:00
Simon Moll 6847379e89 [VE] MUL,SUB,OR,XOR v256i32|64 isel
v256i32|i64 isel patterns and tests.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D115643
2021-12-14 13:23:48 +01:00
Alexandros Lamprineas 65033ef9e8 [AArch64] Add a tablegen pattern for UZP1.
Converts concat_vectors(V64 (trunc V128), V64 (trunc V128)), which
would otherwise be lowered as xtn followed by xtn2, to uzp1.

Differential Revision: https://reviews.llvm.org/D115435
2021-12-14 11:51:05 +00:00
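Why a concat of two truncations (65033ef9e8) can be expressed as a single uzp1 is easiest to see by looking at lanes. The following Python sketch is a model under assumptions rather than AArch64 code: the helpers are hypothetical, each 128-bit input is taken to be four 32-bit lanes narrowed to 16 bits, and lanes are laid out little-endian.

```python
import struct


def trunc_concat(a: list, b: list) -> list:
    """concat_vectors(trunc a, trunc b): keep the low 16 bits of every lane."""
    return [v & 0xFFFF for v in a + b]


def as_i16_lanes(v: list) -> list:
    """Reinterpret four little-endian 32-bit lanes as eight 16-bit lanes."""
    return list(struct.unpack("<8H", struct.pack("<4I", *v)))


def uzp1(a16: list, b16: list) -> list:
    """Model of uzp1: even-indexed elements of a, followed by those of b."""
    return (a16 + b16)[0::2]


a = [0x11112222, 0x33334444, 0x55556666, 0x77778888]
b = [0x9999AAAA, 0xBBBBCCCC, 0xDDDDEEEE, 0xFFFF0000]

narrowed = trunc_concat(a, b)
unzipped = uzp1(as_i16_lanes(a), as_i16_lanes(b))
```

In a little-endian layout the low half of each 32-bit lane sits at an even 16-bit index, so picking the even elements is the same as truncating each lane.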
John Brawn dc9f65be45 [AArch64][SVE] Fix handling of stack protection with SVE
Fix a couple of things that were causing stack protection to not work
correctly in functions that have scalable vectors on the stack:
 * Use TypeSize when determining if accesses to a variable are
   considered out-of-bounds so that the behaviour is correct for
   scalable vectors.
 * When stack protection is enabled move the stack protector location
   to the top of the SVE locals, so that any overflow in them (or the
   other locals which are below that) will be detected.

Fixes: https://github.com/llvm/llvm-project/issues/51137

Differential Revision: https://reviews.llvm.org/D111631
2021-12-14 11:30:48 +00:00
Florian Hahn ff3b085ab0 [X86] Use bundle for CALL_RVMARKER expansion.
This patch updates expandCALL_RVMARKER to wrap the call, marker and
objc runtime call in an instruction bundle. This ensures later passes,
like machine block placement, cannot break them up.

On AArch64, the instruction sequence is already wrapped in a bundle.
Keeping the whole instruction sequence together is highly desirable for
performance and outweighs potential other benefits from breaking the
sequence up.

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D115230
2021-12-14 10:53:22 +00:00
Craig Topper 3cda38796c [RISCV] Add rs2 encoding to the FPUnaryOp_r and FPUnaryOp_r_frm template arguments.
Instead of having unary instruction include a 'let' in their class
body, add rs2val as a template parameter. Then we can use a let
in FPUnaryOp_r and FPUnaryOp_r_frm. This reduces the overall
verbosity of the FP files.

Reviewed By: achieveartificialintelligence

Differential Revision: https://reviews.llvm.org/D115537
2021-12-13 21:38:42 -08:00
Nelson Chu 10a71981e9 [RISCV] Support named opcodes in .insn directive.
This patch addresses one of the TODOs from commit 283879793d.

We build the GenericTable for these opcodes, and also extend class RISCVOpcode to store the names of opcodes.  Then we call parseInsnDirectiveOpcode to parse the opcode field in the .insn directive.  Users may write only recognized opcode names, or an immediate value that fits in the 7-bit range.

Documentation: https://sourceware.org/binutils/docs-2.37/as/RISC_002dV_002dFormats.html

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D115224
2021-12-13 20:59:33 -08:00
Simon Atanasyan 2de9338587 [MIPS] Allow i1 values for 'r' constraint in inline-asm
The bug was reported in the issue #52638.
2021-12-14 01:19:34 +03:00
Neumann Hon ffad4d777b [z/OS] Implement prologue and epilogue generation for z/OS target.
This patch adds support for prologue and epilogue generation for
the z/OS target under the XPLINK64 ABI for functions with a stack
size of less than 1048576 bytes (the threshold for huge stack frames).

Reviewed by: uweigand, Kai

Differential Revision: https://reviews.llvm.org/D114457
2021-12-13 17:03:23 -05:00
Fangrui Song a6a07a514b [MachineOutliner] Don't outline functions starting with PATCHABLE_FUNCTION_ENTER/FENTRY_CALL
MachineOutliner may outline a "patchable-function-entry" function whose body has
a TargetOpcode::PATCHABLE_FUNCTION_ENTER MachineInstr. This is incorrect because
the special code sequence must stay unchanged to be used at run-time.
Avoid outlining PATCHABLE_FUNCTION_ENTER. While here, avoid outlining FENTRY_CALL too
(which doesn't reproduce currently) to allow phase ordering flexibility.

Fixes #52635

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D115614
2021-12-13 13:24:29 -08:00
Paulo Matos b5b5f0ac77 [WebAssembly] Lower global syms representing tables with .tabletype
This patch implements a fix to recognize global symbols that represent
WebAssembly tables and generate the necessary .tabletype
directives.

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D115511
2021-12-13 18:17:03 +01:00
Craig Topper b18b2a01ef [RISCV] Don't use VLMAX for start value splat in reduction lowering.
The reduction instructions only read the first element. The
execution time for a splat may take longer with a larger VL.
We should use the smallest VL we can.

Reviewed By: frasercrmck, HsiangKai

Differential Revision: https://reviews.llvm.org/D115536
2021-12-13 09:06:42 -08:00
Kirill Stoimenov 89577be895 [ASan] Replace IR based callbacks with shared assembly code callbacks.
This change moves optimized callbacks from each .o file to compiler-rt.

Reviewed By: vitalybuka, morehouse

Differential Revision: https://reviews.llvm.org/D115396
2021-12-13 16:33:06 +00:00
Nikita Popov 220815a91a [AMDGPUPerfHintAnalysis] Avoid getPointerElementType()
Extract the load/store type from the instruction rather than
fetching it from the pointer element type.
2021-12-13 16:48:21 +01:00
Neubauer, Sebastian 26924b57e8 [AMDGPU] Ignore special ABI registers for graphics
Fixed ABI arguments are compute specific and should not be added to
graphics shaders or functions, so do not try to add them.

Differential Revision: https://reviews.llvm.org/D115344
2021-12-13 16:44:37 +01:00
Jay Foad 16de2c09dd [AMDGPU] SIShrinkInstructions: sink code to where it's used. NFC. 2021-12-13 14:46:40 +00:00
Jay Foad 63681527ee [AMDGPU] SIShrinkInstructions: remove redundant check
canShrink already calls hasVALU32BitEncoding, so there is no need
to call it again here.
2021-12-13 14:46:40 +00:00
Jay Foad 61f8af2657 [AMDGPU] Remove a FIXME implemented in D11061 2021-12-13 14:46:40 +00:00
Daniil Fukalov e5c64b45be [CostModel][AMDGPU] Fix intrinsics costs estimations.
1. Fixed cost inconsistency for llvm.fma.vXf16 intrinsics.
2. Added tests for llvm.sadd.sat, llvm.ssub.sat, llvm.uadd.sat, llvm.usub.sat
   intrinsics since they have special processing in the cost model.
3. Minor updates and refinements to the intrinsics' cost tests.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D115385
2021-12-13 17:17:34 +03:00
Peter Waller 921e89c59a [SVE] Only combine (fneg (fma)) => FNMLA with nsz
-(Za + Zm * Zn) != (-Za + Zm * (-Zn))
when the FMA produces a zero output (e.g. all zero inputs can produce -0
output)

Add a PatFrag to check presence of nsz on the fneg, add tests which
ensure the combine does not fire in the absence of nsz.

See https://reviews.llvm.org/D90901 for a similar discussion on X86.

Differential Revision: https://reviews.llvm.org/D109525
2021-12-13 11:33:07 +00:00
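The signed-zero mismatch that 921e89c59a guards against can be reproduced with ordinary IEEE doubles. This is a minimal Python sketch, not SVE code: the two helper functions are hypothetical stand-ins for the two expressions in the message, and the arithmetic is unfused, which does not matter here because the products involved are exact.

```python
import math


def neg_of_fma(a: float, m: float, n: float) -> float:
    """-(Za + Zm * Zn): negate the result of the multiply-add."""
    return -(a + m * n)


def negated_operands(a: float, m: float, n: float) -> float:
    """(-Za + Zm * (-Zn)): push the negation into the operands."""
    return (-a) + m * (-n)


# 1 + 1 * (-1) cancels exactly to +0.0, so negating afterwards gives -0.0,
# while -1 + 1 * 1 cancels to +0.0 with no later negation.
lhs = neg_of_fma(1.0, 1.0, -1.0)
rhs = negated_operands(1.0, 1.0, -1.0)

lhs_sign = math.copysign(1.0, lhs)
rhs_sign = math.copysign(1.0, rhs)
```

The two results compare equal with `==` but carry different signs, which is exactly the difference the nsz flag gives the compiler permission to ignore.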
Kazushi (Jam) Marukawa cffce86a1c [VE] Support srel32 in symbol reference
Support R_VE_SREL32 in symbol references in MC layer.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D115591
2021-12-13 20:29:17 +09:00
Matt Devereau 2e585dd91a [AArch64][SVE] Lower vector.insert to predicated merged MOV
Use predicated SEL for vector.insert instead of going through memory

Differential Revision: https://reviews.llvm.org/D115259
2021-12-13 11:17:55 +00:00
Simon Moll 9feeb2fb61 [VE][NFC] Cleanup vector patterns
Cleanup VE vector isel patterns and follow the downstream LLVM-VE
pattern naming convention.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D115516
2021-12-13 10:12:27 +01:00
Kazu Hirata bb6447a78c [llvm] Use llvm::reverse (NFC) 2021-12-12 16:13:49 -08:00
Simon Pilgrim 9ad5969b5e [X86][Atom] Fix CVT uops + port usage
Fix overrides to use both ports. Update the uops counts + port usage based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner reports as well.
2021-12-12 22:57:53 +00:00
Jon Chesterfield 24b28db8cc [amdgpu] Increase alignment of all LDS variables
Currently the superalign option only increases the alignment of
variables that are moved into the module.lds block. Change that to all LDS
variables. Also only increase the alignment once, instead of once per function.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D115488
2021-12-12 19:30:32 +00:00
Kazu Hirata d2377f24e1 Ensure newlines at the end of files (NFC) 2021-12-12 11:04:44 -08:00
Simon Pilgrim 41052fd699 [X86][MMX] Remove superfluous 'i' from MMX cvt opnames. NFCI.
This is a very old copy+paste typo - none of these cvt ops have an immediate operand.

Noticed while trying to merge MMX instructions into some existing SSE instruction scheduler instregex patterns.
2021-12-12 17:59:16 +00:00
Simon Pilgrim 0a08813cad [X86][MMX] Remove superfluous 'i' from MMX binop opnames. NFCI.
This is a very old copy+paste typo - none of these binops have an immediate operand.

Noticed while trying to merge MMX instructions into some existing SSE instruction scheduler instregex patterns.
2021-12-12 17:59:16 +00:00
Kazu Hirata 483499670e [Target] Use llvm::reverse (NFC) 2021-12-12 08:34:24 -08:00
Simon Pilgrim c02f9791c6 [X86][AVX512] Remove xmm->xmm vpmovsx/vpmovzx rm overrides
The XMM evex cases have the same behaviour as the SSE41 versions, which already uses WriteShuffleX.Folded
2021-12-12 16:08:10 +00:00
Simon Pilgrim fc02ceb12a [X86][AVX512] Use WriteShuffleX for xmm->xmm extensions
The XMM evex cases have the same behaviour as the SSE41 versions, which already uses WriteShuffleX
2021-12-12 15:22:32 +00:00
Simon Pilgrim 8e833d081b [X86][AVX512] Use WriteVPMOV256 sched class for all truncations/extensions.
At the moment these are identical to WriteShuffle256 (which we were using), but it should be WriteVPMOV256 to match the AVX2 instruction, plus it will help us remove some unnecessary overrides by tweaking the WriteVPMOV256 class.

Also, as D115547 shows, we still need to split off 128-bit extensions/truncations sched classes to remove some other overrides.
2021-12-12 13:24:26 +00:00
Kazu Hirata 67aeae0138 [llvm] Use range-based for loops (NFC) 2021-12-11 22:34:07 -08:00
Kazu Hirata c2bb9637d9 Use llvm::any_of and llvm::all_of (NFC) 2021-12-11 11:54:37 -08:00
Kazu Hirata 36b8a4f9f3 [llvm] Use llvm::is_contained (NFC) 2021-12-11 11:42:09 -08:00
Kazu Hirata d395befa65 [llvm] Use range-based for loops (NFC) 2021-12-11 11:29:12 -08:00
Matt Arsenault 6bcf1f9181 AMDGPU: Indicate pessimistic fixpoint for entry functions
There aren't going to be any callers for these, so avoid running
through the machinery to look at the callers.
2021-12-11 11:42:34 -05:00
Kirill Stoimenov a55e51f9a6 Revert "[ASan] Replace IR based callbacks with shared assembly code callbacks."
This reverts commit db124df495.

Broke some builds:
https://lab.llvm.org/buildbot/#/builders/98/builds/9895
https://lab.llvm.org/buildbot/#/builders/91/builds/434

Reviewed By: kstoimenov

Differential Revision: https://reviews.llvm.org/D115564
2021-12-11 00:42:18 +00:00
Matt Arsenault 06b90175e7 AMDGPU: Remove fixed function ABI option 2021-12-10 19:41:19 -05:00
Jon Chesterfield 86caf517bf Revert "[amdgpu][nfc] Delete dead code in LowerModuleLDS"
This reverts commit 7b9ab06d10.
Said code is better removed as part of a larger change.
2021-12-11 00:31:51 +00:00
Kirill Stoimenov db124df495 [ASan] Replace IR based callbacks with shared assembly code callbacks.
This change moves optimized callbacks from each .o file to compiler-rt.

Reviewed By: vitalybuka, morehouse

Differential Revision: https://reviews.llvm.org/D115396
2021-12-11 00:02:32 +00:00
Bogdan Graur ea81cea816 Revert "X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr"
This reverts commit 847a680733.

The reverted revision was causing miscompiles that manifest on AMD
machines.

Differential Revision: https://reviews.llvm.org/D115528
2021-12-10 23:01:24 +01:00
Craig Topper 80ed2f6b36 [RISCV] Share tablegen classes for F, D, and Zfh. Other simplifications. NFC
By adding the register class and funct as template parameters we
can share the classes with all 3 extensions.

I've used "let SchedRW =" to avoid repeating scheduler classes on
multiple lines where we previously inherited from the Sched class.

A subsequent patch will add mayRaiseFPException and FRM dependencies.
Reducing the number of classes means less repeating for those changes.

This of course conflicts with the Zfinx patch D93298.

Reviewed By: achieveartificialintelligence

Differential Revision: https://reviews.llvm.org/D115469
2021-12-10 09:35:51 -08:00
Craig Topper 5861cf77da [RISCV] Remove FCSR from RISCVRegisterInfo.
We only used this to mark it as a reserved register. But that's not
important if we don't do anything else with it.

I think if we were ever to do anything with it, we would need to
model it as a super register of FRM and FFLAGS. But it might be
easier to reference both FRM and FFLAGS in implicit defs/uses
for anything we were to do with "fcsr".

Reviewed By: sepavloff

Differential Revision: https://reviews.llvm.org/D115455
2021-12-10 09:24:13 -08:00
Kazu Hirata a93b1792f1 [AArch64] Fix a warning
This patch fixes:

  llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp:315:7:
  error: unused variable 'ValTy' [-Werror,-Wunused-variable]
2021-12-10 08:33:07 -08:00
Archibald Elliott 52faad83c9 [AArch64] Use Feature for A53 Erratum 835769 Fix
When this pass was originally implemented, the fix pass was enabled
using a llvm command-line flag. This works fine, except in the case of
LTO, where the flag is not passed into the linker plugin in order to
enable the function pass in the LTO backend.

Now that LTO exists, the expectation is to use target features rather
than command-line arguments to control code generation, as this ensures
that different command-line arguments in different files are correctly
represented, and target-features always get to the LTO plugin as they
are encoded into LLVM IR.

The fall-out of this change is that the fix pass has to always be added
to the backend pass pipeline, so now it makes no changes if the function
does not have the right target feature to enable it. This should make a
minimal difference to compile time.

One advantage is it's now much easier to enable when compiling for a
Cortex-A53, as CPUs imply their own individual sets of target-features,
in a more fine-grained way. I haven't done this yet, but it is an
option, if the fix should be enabled in more places.

Existing tests of the user interface are unaffected; the changes
reflect that the argument is now turned into a target feature.
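
The "always in the pipeline, gated per function" behaviour can be sketched
with a toy model in C++. This is not the LLVM pass: `ToyFunction`,
`runErratumFixPass`, and the feature string used here are assumptions made
for illustration only.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for a function that carries its own target features,
// the way features are encoded per function in LLVM IR.
struct ToyFunction {
  std::set<std::string> targetFeatures;
  int patchedSites = 0; // counts how often the "fix" was applied
};

// Assumed feature name, for illustration only.
const std::string FixFeature = "fix-cortex-a53-835769";

// The pass is always scheduled. It returns true only if it changed the
// function, and it changes nothing when the feature is absent.
inline bool runErratumFixPass(ToyFunction &f) {
  if (f.targetFeatures.count(FixFeature) == 0)
    return false;   // early exit: cheap for functions without the feature
  ++f.patchedSites; // stand-in for the real instruction rewrite
  return true;
}
```

Since the decision travels with each function rather than with the tool
invocation, it survives into the LTO backend without any extra flag.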

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D114703
2021-12-10 15:09:59 +00:00
Min-Yih Hsu d2b68c4476 [M68k][NFC] Fixed unused argument warnings in M68kInstrControl.td
Removed those unused template arguments. NFC.
2021-12-10 22:06:29 +08:00
Brian Cain 1e68c79987 Reapply [xray] add support for hexagon
Adds XRay support for Hexagon to LLVM codegen, the clang driver, and
the compiler-rt libs.

Differential Revision: https://reviews.llvm.org/D113638

Reapplying this after 543a9ad7c4,
which fixes the leak introduced there.
2021-12-10 05:32:28 -08:00
Christudasan Devadasan cf58b9ce98 [AMDGPU] Add AV class spill pseudo instructions
While enabling vector superclasses with D109301,
the AV spills were converted into VGPR spills by
introducing appropriate copies. The whole thing
ended up adding two instructions per spill (a copy
+ vgpr spill pseudo) and caused an incorrect
live-range update in the inline spiller.

This patch adds the pseudo instructions for all
AV spills from 32b to 1024b and handles them in
the way all other spills are lowered.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D115439
2021-12-10 03:10:34 -05:00
eopXD a4bf1b449d [RISCV] Unify dependency check and extension implication parsing logic
Originally there were two places that did parsing, `parseArchString` and
`parseFeatures`, each with its own code for dependency checks and implication.
This patch extracts the common parts of the two as functions of `RISCVISAInfo`
and lets both of them use those functions.
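
The shape of that refactoring can be sketched as follows. This is a toy
model and not the `RISCVISAInfo` code: all function names and the contents
of the implication table are assumptions for illustration. The only facts
relied on are from the RISC-V specification: D requires F, and Zfh is a
superset of Zfhmin.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using ExtSet = std::set<std::string>;

// Illustrative implication table: enabling the key also enables the value.
inline const std::map<std::string, std::string> &impliedTable() {
  static const std::map<std::string, std::string> table = {{"zfh", "zfhmin"}};
  return table;
}

// Shared step 1: add implied extensions until nothing new appears.
inline void addImplied(ExtSet &exts) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto &kv : impliedTable())
      if (exts.count(kv.first) && exts.insert(kv.second).second)
        changed = true;
  }
}

// Shared step 2: reject sets that violate a dependency (D requires F).
inline bool dependenciesHold(const ExtSet &exts) {
  return !(exts.count("d") && !exts.count("f"));
}

// Both entry points funnel into the same two shared steps.
inline bool finalize(ExtSet &exts) {
  addImplied(exts);
  return dependenciesHold(exts);
}

// Entry point A: a plain list of extension names.
inline bool fromExtensionList(const std::vector<std::string> &names,
                              ExtSet &out) {
  out.insert(names.begin(), names.end());
  return finalize(out);
}

// Entry point B: "+ext" enables an extension; other entries are ignored.
inline bool fromFeatureList(const std::vector<std::string> &feats,
                            ExtSet &out) {
  for (const auto &s : feats)
    if (s.size() > 1 && s[0] == '+')
      out.insert(s.substr(1));
  return finalize(out);
}
```

With the checks in one place, a new dependency or implication rule only has
to be added once to apply to both input formats.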

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D112359
2021-12-09 21:16:04 -08:00
Kazu Hirata f829630d2e [llvm] Use llvm::count (NFC) 2021-12-09 20:50:38 -08:00
Amara Emerson 98095afbcb [AArch64][GlobalISel] Split vector stores of zero.
This results in a very minor improvement in most cases, generating
stores of xzr instead of moving zero to a vector register.

Differential Revision: https://reviews.llvm.org/D115479
2021-12-09 19:04:48 -08:00
Phoebe Wang d7c07f60b3 [X86][MS-InlineAsm] Make the constraint *m a simple placeholder
D113096 solved the "undefined reference to xxx" issue by adding the
constraint *m for the global variable. But it has a strong side effect,
because the symbol in the assembly is replaced with a constraint variable.
This leads to some lowering failures. https://godbolt.org/z/h3nWoerPe

This patch fixes the problem by using the constraint *m as a placeholder
rather than a real constraint. It has a negligible effect on existing
code generation.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D115225
2021-12-10 09:29:38 +08:00
Jon Chesterfield 7b9ab06d10 [amdgpu][nfc] Delete dead code in LowerModuleLDS 2021-12-10 00:43:46 +00:00
Jessica Paquette afdec434d4 [AArch64][GlobalISel] Add regbankselect support for G_FMAXIMUM/G_FMINIMUM
These always use FPRs only.

Differential Revision: https://reviews.llvm.org/D115376
2021-12-09 12:52:32 -08:00
Craig Topper b3db7dde79 [TargetInstrInfo][PowerPC] Remove virtual function that is only called from PPC specific code.
There are two signatures of setSpecialOperandAttr in TargetInstrInfo.
One of them is only called from PPCInstrInfo which has an override
of it.

Remove it from TargetInstrInfo and make it a non-virtual method in
PPCInstrInfo.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D115404
2021-12-09 12:51:27 -08:00
Jessica Paquette 47e1f672e1 [AArch64][GlobalISel] Legalize scalar G_FMAXIMUM + G_FMINIMUM
Necessary for implementing some combines on floating point selects.

Differential Revision: https://reviews.llvm.org/D115372
2021-12-09 11:54:14 -08:00
Craig Topper 6f7de819b9 [RISCV] Use MULHU for more division by constant cases.
D113805 improved handling of i32 divu/remu on RV64. The basic idea
from that can be extended to (mul (and X, C2), C1) where C2 is any
mask constant.

We can replace the and with an SLLI by shifting by the number of
leading zeros in C2 if we also shift C1 left by XLen - lzcnt(C2)
bits. This will give the full product XLen additional trailing zeros,
putting the result in the output of MULHU. If we can't use ANDI,
ZEXT.H, or ZEXT.W, this will avoid materializing C2 in a register.

The downside is it may take 1 additional instruction to create C1.
But since that's not on the critical path, it can hopefully be
interleaved with other operations.

The previous tablegen pattern is replaced by custom isel code.
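
The arithmetic behind the rewrite can be checked with a small C++ model for
XLen = 64. This is a sketch and not the isel code: the function names are
hypothetical, `unsigned __int128` is a GCC/Clang extension used only to
model MULHU, and the sketch assumes 0 < lz < 64 and C1 < 2^lz so that no
bits of C1 are shifted out.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned XLen = 64;

// High XLen bits of the unsigned 2*XLen-bit product, as MULHU returns.
inline uint64_t mulhu(uint64_t a, uint64_t b) {
  return static_cast<uint64_t>((static_cast<unsigned __int128>(a) * b) >> 64);
}

// Mask constant C2 with `lz` leading zeros and all lower bits set.
inline uint64_t lowMask(unsigned lz) { return ~uint64_t{0} >> lz; }

// Reference form: (mul (and X, C2), C1), truncated to XLen bits.
inline uint64_t mulAndReference(uint64_t x, uint64_t c2, uint64_t c1) {
  return (x & c2) * c1;
}

// Rewritten form, where lz = lzcnt(C2).
// Preconditions (assumed): 0 < lz < 64 and c1 < 2^lz.
inline uint64_t mulAndViaMulhu(uint64_t x, unsigned lz, uint64_t c1) {
  uint64_t xs = x << lz;             // SLLI drops the bits the AND would clear
  uint64_t c1s = c1 << (XLen - lz);  // C1 shifted left by XLen - lzcnt(C2)
  // The two shifts add up to XLen, so the full product has XLen trailing
  // zeros and the wanted value sits entirely in the high half.
  return mulhu(xs, c1s);
}
```

For example, with lz = 32 (C2 = 0xFFFFFFFF) and C1 = 0xCCCCCCCD, both forms
produce the same value for any X, and C2 never has to be materialized in a
register.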

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D115310
2021-12-09 09:10:14 -08:00
Jon Chesterfield 04b2f6ea8a [amdgpu][nfc] Drop dead PtrSet, fix a comment 2021-12-09 17:05:20 +00:00
Arthur Eubanks 1172712f46 [NFC] Replace some deprecated getAlignment() calls with getAlign()
Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D115370
2021-12-09 08:43:19 -08:00
David Sherwood 8b0448ce5d [AArch64][Analysis] Add on overhead costs for SVE gathers and scatters
This patch adds on an overhead cost for gathers and scatters, which
is a rough estimate based on performance investigations I have
performed on SVE hardware for various micro-benchmarks.

Differential Revision: https://reviews.llvm.org/D115143
2021-12-09 16:02:59 +00:00