Commit Graph

60949 Commits

Author SHA1 Message Date
Luo, Yuanke 20013d02f3 [X86][AMX] Fix tile config register spill issue.
Previous code build the model that tile config register is the user of
each AMX instruction. There is a problem for the tile config register
spill. When across function, the ldtilecfg instruction may be inserted
on each AMX instruction which use tile config register. This cause all
tile data register clobber.
To fix this issue, we remove the model of tile config register. We
analyze the regmask of call instruction and insert ldtilecfg if there is
any tile data register live across the call. Inserting the sttilecfg
before the call is unneccessary, because the tile config doesn't change
and we can just reload the config.
Besides we also need check tile config register interference. Since we
don't model the config register we should check interference from the
ldtilecfg to each tile data register def.
             ldtilecfg
             /       \
            BB1      BB2
            /         \
           call       BB3
           /           \
       %1=tileload   %2=tilezero
We can start from the instruction of each tile def, and backward to
ldtilecfg. If there is any call instruction, and tile data register is
not preserved, we should insert ldtilecfg after the call instruction.

Differential Revision: https://reviews.llvm.org/D94155
2021-01-21 16:01:50 +08:00
madhur13490 dd8ae42674 [IndirectFunctions] Skip propagating attributes to address taken functions
In case of indirect calls or address taken functions,
skip propagating any attributes to them. We just
propagate features to such functions.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94585
2021-01-21 07:04:28 +00:00
Kazu Hirata 8f5da41c4d [llvm] Construct SmallVector with iterator ranges (NFC) 2021-01-20 21:35:52 -08:00
Max Kazantsev d6bb96e677 [X86] Add experimental option to separately tune alignment of innermost loops
We already have an experimental option to tune loop alignment. Its impact
is very wide (and there is a suspicion that it's not always profitable). We want
to have something more narrow to play with. This patch adds similar option that
overrides preferred alignment for innermost loops. This is for experimental
purposes, default values do not change the existing behavior.

Differential Revision: https://reviews.llvm.org/D94895
Reviewed By: pengfei
2021-01-21 11:15:16 +07:00
Hsiangkai Wang a8b96eadfd [RISCV] Implement vssseg intrinsics.
Define vlsseg intrinsics and pseudo instructions. Lower vlsseg
intrinsics to pseudo instructions in RISCVDAGToDAGISel.

Differential Revision: https://reviews.llvm.org/D94863
2021-01-21 11:51:35 +08:00
Hsiangkai Wang e5e329023b [RISCV] Implement vlsseg intrinsics.
Define vlsseg intrinsics and pseudo instructions. Lower vlsseg intrinsics
to pseudo instructions in RISCVDAGToDAGISel.

Differential Revision: https://reviews.llvm.org/D94763
2021-01-21 11:51:35 +08:00
Hsiangkai Wang 47228f7854 [RISCV] Implement vsseg intrinsics.
Define vsseg intrinsics and pseudo instructions. Lower vsseg intrinsics
to pseudo instructions in RISCVDAGToDAGISel.

Differential Revision: https://reviews.llvm.org/D94688
2021-01-21 11:51:35 +08:00
Craig Topper e996f1d419 [RISCV] Add another isel pattern for slliu.w.
Previously we only matched (and (shl X, C1), 0xffffffff << C1)
which matches the InstCombine canonicalization order. But its
possible to see (shl (and X, 0xffffffff), C1) if the pattern
is introduced in SelectionDAG. For example, through expansion of
a GEP.
2021-01-20 14:54:40 -08:00
Thomas Lively 11802eced5 [WebAssembly] Prototype new f64x2 conversions
As proposed in https://github.com/WebAssembly/simd/pull/383.

Differential Revision: https://reviews.llvm.org/D95012
2021-01-20 11:28:06 -08:00
dfukalov 560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
Hans Wennborg a51226057f Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE"
It caused "Vector shift amounts must be in the same as their first arg"
asserts in Chromium builds. See the code review for repro instructions.

> Add DemandedElts support inside the TRUNCATE analysis.
>
> Differential Revision: https://reviews.llvm.org/D56387

This reverts commit cad4275d69.
2021-01-20 20:06:55 +01:00
Craig Topper 9d792fef57 [RISCV] Remove unnecessary APInt copy. NFC
getAPIntValue returns a const APInt& so keep it as a reference.
2021-01-20 10:33:09 -08:00
Simon Pilgrim b8b5e87e6b [X86][AVX] Handle vperm2x128 shuffling of a subvector splat.
We already handle "vperm2x128 (ins ?, X, C1), (ins ?, X, C1), 0x31" for shuffling of the upper subvectors, but we weren't dealing with the case when we were splatting the upper subvector from a single source.
2021-01-20 18:16:33 +00:00
Fangrui Song 36e62b1ff7 [AArch64] Fix -Wunused-but-set-variable in GCC -DLLVM_ENABLE_ASSERTIONS=off build 2021-01-20 10:14:11 -08:00
Albion Fung 719b563ecf [PowerPC][Power10] Exploit splat instruction xxsplti32dx in Power10
Exploits the instruction xxsplti32dx.

It can be used to materialize any 64 bit scalar/vector splat by using two instances, one for the upper 32 bits and the other for the lower 32 bits. It should not materialize the cases which can be materialized by using the instruction xxspltidp.

Differential Revision: https://https://reviews.llvm.org/D90173
2021-01-20 12:55:52 -05:00
Craig Topper b11b6ab3e0 [RISCV] Add way to mark CompressPats that should only be used for compressing.
There can be muliple patterns that map to the same compressed
instruction. Reversing those leads to multiple ways to uncompress
an instruction, but its not easily controllable which one will
be chosen by the tablegen backend.

This patch adds a flag to mark patterns that should only be used
for compressing. This allows us to leave one canonical pattern
for uncompressing.

The obvious benefit of this is getting c.mv to uncompress to
the addi patern that is aliased to the mv pseudoinstruction. For
the add/and/or/xor/li patterns it just removes some unreachable
code from the generated code.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D94894
2021-01-20 09:20:15 -08:00
Simon Pilgrim cad4275d69 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE
Add DemandedElts support inside the TRUNCATE analysis.

Differential Revision: https://reviews.llvm.org/D56387
2021-01-20 15:39:58 +00:00
Simon Pilgrim 19d02842ee [X86][AVX] Fold extract_subvector(VSRLI/VSHLI(x,32)) -> VSRLI/VSHLI(extract_subvector(x),32)
As discussed on D56387, if we're shifting to extract the upper/lower half of a vXi64 vector then we're actually better off performing this at the subvector level as its very likely to fold into something.

combineConcatVectorOps can perform this in reverse if necessary.
2021-01-20 14:34:54 +00:00
Amanieu d'Antras 21bfd068b3 [AArch64] Add support for the GNU ILP32 ABI
Add the aarch64[_be]-*-gnu_ilp32 targets to support the GNU ILP32 ABI for AArch64.

The needed codegen changes were mostly already implemented in D61259, which added support for the watchOS ILP32 ABI. The main changes are:
- Wiring up the new target to enable ILP32 codegen and MC.
- ILP32 va_list support.
- ILP32 TLSDESC relocation support.

There was existing MC support for ELF ILP32 relocations from D25159 which could be enabled by passing "-target-abi ilp32" to llvm-mc. This was changed to check for "gnu_ilp32" in the target triple instead. This shouldn't cause any issues since the existing support was slightly broken: it was generating ELF64 objects instead of the ELF32 object files expected by the GNU ILP32 toolchain.

This target has been tested by running the full rustc testsuite on a big-endian ILP32 system based on the GCC ILP32 toolchain.

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D94143
2021-01-20 13:34:47 +00:00
Mark Murray cab20f6105 [AArch64] Add missing "flagm" feature to the .arch_extension directive.
Depends on D94970

Differential Revision: https://reviews.llvm.org/D94971
2021-01-20 11:57:39 +00:00
Mark Murray f344c028de [AArch64] Add missing "pauth" feature to the .arch_extension directive.
Differential Revision: https://reviews.llvm.org/D94970
2021-01-20 11:57:39 +00:00
Mirko Brkusanin a6a72dfdf2 [AMDGPU][GlobalISel] Avoid selecting S_PACK with constants
If constants are hidden behind G_ANYEXT we can treat them same way as G_SEXT.
For that purpose we extend getConstantVRegValWithLookThrough with option
to handle G_ANYEXT same way as G_SEXT.

Differential Revision: https://reviews.llvm.org/D92219
2021-01-20 11:54:53 +01:00
Petar Avramovic 4ab704d628 [AMDGPU][MC] Add tfe disassembler support MIMG opcodes
With tfe on there can be a vgpr write to vdata+1.
Add tablegen support for 5 register vdata store.
This is required for 4 register vdata store with tfe.

Differential Revision: https://reviews.llvm.org/D94960
2021-01-20 10:37:09 +01:00
Gabriel Hjort Åkerlund 2aeaaf841b [GlobalISel] Add missing operand update when copy is required
When constraining an operand register using constrainOperandRegClass(),
the function may emit a COPY in case the provided register class does
not match the current operand register class. However, the operand
itself is not updated to make use of the COPY, thereby resulting in
incorrect code. This patch fixes that bug by updating the machine
operand accordingly.

Reviewed By: dsanders

Differential Revision: https://reviews.llvm.org/D91244
2021-01-20 10:32:52 +01:00
Bill Wendling e22295385c [X86] Add segment and address-size override prefixes
X86 allows for the "addr32" and "addr16" address size override prefixes.
Also, these and the segment override prefixes should be recognized as
valid prefixes.

Differential Revision: https://reviews.llvm.org/D94726
2021-01-19 23:54:31 -08:00
Hsiangkai Wang 8ca4b174d7 [RISCV] Implement vlseg intrinsics.
For Zvlsseg, we need continuous vector registers for the values. We need
to define new register classes for the different combinations of (number
of fields and LMUL). For example,

when the number of fields(NF) = 3, LMUL = 2, the values will be assigned
to (V0M2, V2M2, V4M2), (V2M2, V4M2, V6M2), (V4M2, V6M2, V8M2), ...

We define the vlseg intrinsics with multiple outputs. There is no way to
describe the codegen patterns with multiple outputs in the tablegen
files. We do the codegen in RISCVISelDAGToDAG and use EXTRACT_SUBREG to
extract the values of output.

The multiple scalable vector values will be put into a struct. This
patch is depended on the support for scalable vector struct.

Differential Revision: https://reviews.llvm.org/D94229
2021-01-20 14:26:04 +08:00
Kazu Hirata 8857202489 [llvm] Use llvm::find (NFC) 2021-01-19 20:19:14 -08:00
ShihPo Hung 4dae2247fd [RISCV] refactor VPatBinary (NFC)
Make it easier to reuse for intrinsic vrgatherei16
which needs to encode both LMUL & EMUL in the instruction name,
like PseudoVRGATHEREI16_VV_M1_M1 and PseudoVRGATHEREI16_VV_M1_M2.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94951
2021-01-19 19:09:56 -08:00
Sam Clegg 96ef4f307d Revert "[WebAssembly] call_indirect issues table number relocs"
This reverts commit 418df4a6ab.

This change broke emscripten tests, I believe because it started
generating 5-byte a wide table index in the call_indirect instruction.
Neither v8 nor wabt seem to be able to handle that.  The spec
currently says that this is single 0x0 byte and:

"In future versions of WebAssembly, the zero byte occurring in the
encoding of the call_indirectcall_indirect instruction may be used to
index additional tables."

So we need to revisit this change.  For backwards compat I guess
we need to guarantee that __indirect_function_table is always at
address zero.   We could also consider making this a single-byte
relocation with and assert if have more than 127 tables (for now).

Differential Revision: https://reviews.llvm.org/D95005
2021-01-19 15:06:07 -08:00
Craig Topper e75a4b6ea9 [RISCV] Remove NotHasStdExtZbb predicate from zext.h/sext.b/sext.h InstAliases. NFC
NotHasStdExtZbb doesn't have an AssemblerPredicate associated with it
so it didn't do anything. We don't need it either because the sorting
rules in tablegen prioritize by number of predicates. So the
dedicated instructions in the B extension that have predicates
will be prioritized automatically.
2021-01-19 14:31:48 -08:00
Craig Topper ce8b3937dd [RISCV] Add DAG combine to turn (setcc X, 1, setne) -> (setcc X, 0, seteq) if we can prove X is 0/1.
If we are able to compare with 0 instead of 1, we might be able
to fold the setcc into a beqz/bnez.

Often these setccs start life as an xor that gets converted to
a setcc by DAG combiner's rebuildSetcc. I looked into a detecting
(xor X, 1) and converting to (seteq X, 0) based on boolean contents
being 0/1 in rebuildSetcc instead of using computeKnownBits. It was
very perturbing to AMDGPU tests which I didn't look closely at.
It had a few changes on a couple other targets, but didn't seem
to be much if any improvement.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D94730
2021-01-19 11:21:48 -08:00
Brendon Cahoon 57443bfb4a [Hexagon] Fix segment start to adjust for gaps between segments
The Hexagon Vector Combine pass genertes stores for a complete
aligned vector. The start of each section is a multiple of the
vector size, so that value is passed to normalize to compute
the offset of the stores in the section.  The first store may
not occur at offset 0 when there is a gap between sections.
2021-01-19 12:49:39 -06:00
Jay Foad 18cb7441b6 [AMDGPU] Simpler names for arch-specific ttmp registers. NFC.
Rename the *_gfx9_gfx10 ttmp registers to *_gfx9plus for simplicity,
and use the corresponding isGFX9Plus predicate to decide when to use
them instead of the old *_vi versions.

Differential Revision: https://reviews.llvm.org/D94975
2021-01-19 18:47:14 +00:00
Fraser Cormack 9c6a00fe99 [RISCV] Add ISel patterns for scalable mask exts & truncs
Original patch by @rogfer01.

This patch adds support for sign-, zero-, and any-extension from
scalable mask vector types to integer vector types, as well as
truncation in the opposite direction.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94590
2021-01-19 18:13:15 +00:00
David Green 6a563eef13 [ARM] Expand vXi1 VSELECT's
We have no lowering for VSELECT vXi1, vXi1, vXi1, so mark them as
expanded to turn them into a series of logical operations.

Differential Revision: https://reviews.llvm.org/D94946
2021-01-19 17:56:50 +00:00
Fraser Cormack 15fd6bae0e [RISCV] Extend RVV VType info with the type's AVL (NFC)
This patch factors out the "VLMax" operand passed to most
scalable-vector ISel patterns into a property of each VType.

This is seen as a preparatory change to allow RVV in the future to
more easily support fixed-length vector types with constrained vector
lengths, with the AVL operand set to the length of the fixed-length
vector. It has no effect on the scalable code generation path.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D94594
2021-01-19 15:46:56 +00:00
David Green f373b30923 [ARM] Add MVE add.sat costs
This adds some basic MVE sadd_sat/ssub_sat/uadd_sat/usub_sat costs,
based on when the instruction is legal. With smaller than legal types
that are promoted we generate shr(qadd(shl, shl)), so the cost is 4
appropriately.

Differential Revision: https://reviews.llvm.org/D94958
2021-01-19 15:38:46 +00:00
Victor Huang 909d6c86ea [PowerPC] Fix the check for the instruction using FRSP/XSRSP output register
When performing peephole optimization to simplify the code, after removing
passed FPSP/XSRSP instruction we will set any uses of that FRSP/XSRSP to the
source of the FRSP/XSRSP.

We are finding the machine instruction using virtual register holding FRSP/XSRSP
results by searching all following instructions and encountering an issue
that the first use of the virtual register is a debug MI causing:
1. virtual register in the debug MI removed unexpectedly.
2. virtual register used in non-debug MI not replaced with the source of
  FRSP/XSRSP. which stays in a undef status.

This patch fix the issue by only searching non-debug machine instruction using
virtual register holding FRSP/XSRSP results when the vr only has one non debug
usage.

Differential Revisien: https://reviews.llvm.org/D94711
Reviewed by: nemanjai
2021-01-19 09:20:03 -06:00
Tim Northover 6259fbd8b6 AArch64: add apple-a14 as a CPU
This CPU supports all v8.5a features except BTI, and so identifies as v8.5a to
Clang. A bit weird, but the best way for things like xnu to detect the new
features it cares about.
2021-01-19 14:04:53 +00:00
Caroline Concatto 172f1f8952 [AArch64][SVE]Add cost model for vector reduce for scalable vector
This patch computes the cost for vector.reduce<operand> for scalable vectors.
The cost is split into two parts:  the legalization cost and the horizontal
reduction.

Differential Revision: https://reviews.llvm.org/D93639
2021-01-19 11:54:16 +00:00
Simon Pilgrim 5626adcd6b [X86][SSE] combineVectorSignBitsTruncation - fold trunc(srl(x,c)) -> packss(sra(x,c))
If a srl doesn't introduce any sign bits into the truncated result, then replace with a sra to let us use a PACKSS truncation - fixes a regression noticed in D56387 on pre-SSE41 targets that don't have PACKUSDW.
2021-01-19 11:04:13 +00:00
Jay Foad 49dce85584 [AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC.
Change-Id: Idd7f47647bc0faa3ad6f61f44728c0f20540ec00
2021-01-19 10:39:56 +00:00
Yvan Roux 244ad228f3 [ARM][MachineOutliner] Add stack fixup feature
This patch handles cases where we have to save/restore the link register
into the stack and and load/store instruction which use the stack are
part of the outlined region. It checks that there will be no overflow
introduced by the new offset and fixup these instructions accordingly.

Differential Revision: https://reviews.llvm.org/D92934
2021-01-19 10:59:09 +01:00
Fraser Cormack c81ea9429f [RISCV] Add scalable-vector integer extension patterns
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94694
2021-01-19 09:30:36 +00:00
Tres Popp 170199f562 [llvm][nvptx] add atomicity to counter in ISelLowering
Previously uniqueCallSite could have race conditions between different
threads. Now it is accessed with an atomic RMW and will be unique
between different threads.

Differential Revision: https://reviews.llvm.org/D94784
2021-01-19 10:20:20 +01:00
Andy Wingo 418df4a6ab [WebAssembly] call_indirect issues table number relocs
This patch changes to make call_indirect explicitly refer to the
corresponding function table, residualizing TABLE_NUMBER relocs against
it.

With this change, wasm-ld now sees all references to tables, and can
link multiple tables.

Differential Revision: https://reviews.llvm.org/D90948
2021-01-19 09:32:45 +01:00
ShihPo Hung 9cf511aa08 [RISCV] Add intrinsics for vector AMO operations
Add vamoswap, vamoadd, vamoxor, vamoand, vamoor,
    vamomin, vamomax, vamominu, vamomaxu intrinsics.

Reviewed By: craig.topper, khchen

Differential Revision: https://reviews.llvm.org/D94589
2021-01-18 23:11:10 -08:00
Nemanja Ivanovic 61f69153e8 [PowerPC] Sign extend comparison operand for signed atomic comparisons
As of 8dacca943a, we sign extend the atomic loaded
operand for signed subword comparisons. However, the assumption that the other
operand is correctly sign extended doesn't always hold. This patch sign extends
the other operand if it needs to be sign extended.

This is a second fix for https://bugs.llvm.org/show_bug.cgi?id=30451

Differential revision: https://reviews.llvm.org/D94058
2021-01-18 21:19:25 -06:00
Sanjay Patel d27bb5c375 [x86] add cast to avoid compile-time warning; NFC 2021-01-18 17:47:04 -05:00
Craig Topper 1c31459153 [RISCV] Remove empty Sched instantiations from the end of InstAlias defs. NFCI
InstAliases don't need scheduling information so I'm not sure what
these lines were even doing. Especially since the records don't
have names.
2021-01-18 14:08:29 -08:00