Commit Graph

64734 Commits

Author SHA1 Message Date
Simon Pilgrim 82e0eb22af [X86][AVX] combineConcatVectorOps - use getBROADCAST_LOAD helper. NFCI.
This is part of rG1cfecf4fc427 that was reverted to fix PR51226 - concatenating the broadcasts is OK, it's the splatted loads that crash (we're not detecting extloads). I'm still creating a reduced test case, so I haven't added the load handling again yet.
2021-11-02 18:04:35 +00:00
Fraser Cormack d065b03801 [RISCV] Optimize vp.load with an all-ones mask
Similar to D110206, this patch optimizes unmasked vp.load intrinsics to
avoid the need for a vmset instruction to set the mask. It does so by
selecting a riscv_vle intrinsic rather than a riscv_vle_mask intrinsic.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D113022
2021-11-02 17:23:39 +00:00
Jay Foad be1a8f8834 [AMDGPU] Really preserve LiveVariables in SILowerControlFlow
https://bugs.llvm.org/show_bug.cgi?id=52204

Differential Revision: https://reviews.llvm.org/D112731
2021-11-02 15:03:37 +00:00
Matt 895145aacb Revert "[AArch64][SVE] Combine predicated FMUL/FADD into FMA"
This reverts commit fc28a2f8ce.
2021-11-02 14:56:01 +00:00
Simon Pilgrim e173631dd1 [X86][AVX] SimplifyDemandedVectorEltsForTargetNode - use getBROADCAST_LOAD helper. NFCI.
Reduce width of X86ISD::SUBV_BROADCAST_LOAD node.
2021-11-02 14:07:22 +00:00
Simon Pilgrim 8ca666a280 [X86][AVX] lowerV2X128Shuffle - use getBROADCAST_LOAD helper. NFCI. 2021-11-02 14:07:21 +00:00
Martin Liska c5029023fb Fix building with GCC 12:
Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380

Differential Revision: https://reviews.llvm.org/D112990
2021-11-02 14:28:00 +01:00
David Callahan 4ec1b8eeac [RISCV] Fix invalid kill on callee save
A callee save may be live (specifically X1) on entry and so a spill
should not mark it killed.

Differential Revision: https://reviews.llvm.org/D111285
2021-11-02 11:56:54 +00:00
Wouter van Oortmerssen ac65366485 [WebAssembly] support "return" and unreachable code in asm type checker
To support return (its poor support was the root cause of
https://github.com/WebAssembly/wasi-sdk/issues/200) we also have to have
at least a basic notion of unreachable, which in this case just means to stop
type checking until there is an end_block (an incoming control flow edge).
This is conservative (it may miss some type checking opportunities) but is
simple and an improvement over what we had before.

Differential Revision: https://reviews.llvm.org/D112953
2021-11-01 15:42:58 -07:00
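The "stop type checking until end_block" rule described in ac65366485 above can be illustrated with a toy checker. This is only a minimal sketch under stated assumptions: the `Op` enum, the `typeCheck` function, and the single-type operand stack are hypothetical and are not taken from the WebAssembly asm type checker itself.

```cpp
#include <vector>

// Hypothetical toy instruction set; not the real WebAssembly opcodes.
enum class Op { ConstI32, Drop, Return, EndBlock };

// Returns true if the sequence type checks under the conservative rule:
// after a Return, checking is suspended until the next EndBlock.
bool typeCheck(const std::vector<Op> &Code) {
  unsigned Depth = 0;       // number of i32 values on the modelled stack
  bool Unreachable = false; // set once control cannot reach this point
  for (Op O : Code) {
    if (O == Op::EndBlock) {
      // An incoming control flow edge: resume checking.
      Unreachable = false;
      continue;
    }
    if (Unreachable)
      continue; // conservative: skip everything in dead code
    switch (O) {
    case Op::ConstI32:
      ++Depth;
      break;
    case Op::Drop:
      if (Depth == 0)
        return false; // popping from an empty stack is a type error
      --Depth;
      break;
    case Op::Return:
      Unreachable = true;
      break;
    case Op::EndBlock:
      break; // handled above
    }
  }
  return true;
}
```

In this sketch a `Drop` directly after a `Return` is accepted because it is never checked, which shows both the simplicity of the rule and the kind of type checking opportunity it gives up.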
Yonghong Song f63405f6e3 BPF: Workaround an InstCombine ICmp transformation with llvm.bpf.compare builtin
Commit acabad9ff6 ("[InstCombine] try to canonicalize icmp with
trunc op into mask and cmp") added a transformation to
convert "(conv)a < power_2_const" to "a & <const>" in certain
cases. The bpf kernel verifier has to handle the resulting code
conservatively, and this may reject otherwise legitimate programs.

This commit tries to prevent such a transformation. A bpf backend
builtin llvm.bpf.compare is added. The ICMP insn, which is subject to
the above InstCombine transformation, is converted to the builtin
function. The builtin function is later lowered to the original ICMP
insn, after the InstCombine pass has run.

With this change, all affected bpf strobemeta* selftests now
pass.

Differential Revision: https://reviews.llvm.org/D112938
2021-11-01 14:46:20 -07:00
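The equivalence behind the InstCombine canonicalization mentioned in f63405f6e3 above can be seen on plain integers. This is a sketch under assumptions: a 64-bit value truncated to 32 bits and the constant 8 are chosen for illustration, and the function names are hypothetical; it shows the arithmetic only, not the IR the pass produces.

```cpp
#include <cstdint>

// "(conv)a < power_2_const": truncate to 32 bits, then compare against 8.
bool viaTruncCompare(uint64_t A) {
  return static_cast<uint32_t>(A) < 8u;
}

// "a & <const>": the low 32 bits must have no bit set at position 3 or above.
// The mask covers bits 3..31; bits 32..63 are ignored, as with the truncation.
bool viaMaskCompare(uint64_t A) {
  return (A & 0xFFFFFFF8ull) == 0;
}
```

Both functions agree for every input, but the masked form no longer contains an explicit comparison against a bound, which is consistent with the commit's remark that the verifier has to treat the result conservatively.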
Cameron McInally 702fd3d323 [SVE] Fix VLS FMA matching for CodeGenOpt::Aggressive.
For NEON, FMA matching is done in the MachineCombiner, and not the
DAGCombiner. That causes problems with VLS lowering, since the
vectors are fixed width at the DAGCombiner, but are scalable in
the MachineCombiner. This patch corrects it by matching FMAs for
VLS vectors in the DAGCombiner.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D112557
2021-11-01 10:43:52 -07:00
Kazu Hirata d000431fb2 [X86] Remove X86ELFObjectWriter in X86AsmBackend.cpp (NFC)
Note that the identically named class is defined in an anonymous
namespace in X86ELFObjectWriter.cpp.
2021-11-01 08:31:54 -07:00
Jay Foad 7afef22926 [AMDGPU] Use MachineInstrBuilder::addReg. NFC. 2021-11-01 15:29:51 +00:00
Jay Foad 2b548b18c1 [AMDGPU] Shrink v_mac_legacy_f32 and v_fmac_legacy_f32
Differential Revision: https://reviews.llvm.org/D112917
2021-11-01 13:55:53 +00:00
Mubashar Ahmad 0b83a18a2b [AArch64] Enablement of Cortex-X2
Enables support for Cortex-X2 cores.

Differential Revision: https://reviews.llvm.org/D112459
2021-11-01 11:55:24 +00:00
Simon Pilgrim 6fc50e531d [CostModel][X86] Remove old FIXME comments for AVX512F vector splitting
Similar to AVX1, the cost of splitting/merging 512-bit -> 256-bit vectors for arithmetic operations is typically hidden due to the different ports used etc.
2021-11-01 11:11:11 +00:00
Simon Pilgrim fd485d8cda [X86][AVX] Prefer VINSERTF128 over VPERM2F128 for 128->256 subvector concatenations
The VINSERTF128 instruction is often much quicker, and never slower, than the more general VPERM2F128 instruction, so we should try to use that in more circumstances.

This requires a fallback to a commuted VPERM2F128 for the case where we need to fold the 256-bit vector source instead of the 128-bit subvector source.

There is one interesting side effect - DAGCombine's narrowExtractedVectorLoad combine gets called in a number of locations, and it often creates an extracted subvector load without regard to other uses of the original wider load. I'm expecting AVX CPUs to be capable of merging such aliased loads, but I do wonder whether narrowExtractedVectorLoad's call to X86TargetLowering::shouldReduceLoadWidth needs to be altered to check for more partial uses?

Noticed while investigating the quality of interleaved load/store codegen.

Differential Revision: https://reviews.llvm.org/D111960
2021-11-01 10:45:50 +00:00
Kazu Hirata 476e1ee3da [AArch64] Remove unused declaration hasSwiftExtendedFrame (NFC) 2021-10-31 22:58:56 -07:00
Chen Zheng eeed1545b2 [PowerPC] turn off chain commoning by default. 2021-11-01 04:11:10 +00:00
Zi Xuan Wu cf78715cae [CSKY] First patch to construct codegen infra and generate first add instruction
It constructs the codegen infra and provides only the basic code needed to generate the first add instruction successfully.

Differential Revision: https://reviews.llvm.org/D112206
2021-11-01 10:06:56 +08:00
Craig Topper ada5458521 [RISCV] Expand scalable vector bswap. Fix crash for bitreverse.
Fix LegalizeVectorOps to not try shuffle or unrolling expansions for
scalable vectors.

Differential Revision: https://reviews.llvm.org/D112236
2021-10-31 10:01:27 -07:00
Kazu Hirata 72710af233 [CodeGen, Target] Use MachineBasicBlock::terminators (NFC) 2021-10-31 07:57:34 -07:00
Kazu Hirata 5970249439 [Hexagon] Remove chksetELFHeaderEFlags (NFC)
The function was introduced without any use on Nov 9, 2015 in commit
7cd0892729.
2021-10-30 08:43:43 -07:00
Kazu Hirata c3d63a0697 [Hexagon] Remove ValidArch (NFC)
This function seems to be unused for at least one year.
2021-10-30 08:43:41 -07:00
Kazu Hirata c5cd371cc9 [Hexagon] Remove unused struct InstTy (NFC) 2021-10-30 08:43:39 -07:00
Christudasan Devadasan aa2d3b59ce GlobalISel/Utils: Use incoming regbank while constraining the superclasses
Register operands with superclasses can possibly have multiple regBanks
if they have different register types. The regBank ambiguity resolved
during regbankselect should be used to constrain the operand regclass
instead of obtaining one from the MCInstrDesc.

This is a prerequisite patch for D109300, which introduces allocatable AV_*
superclasses for AMDGPU by combining both VGPRs and AGPRs; we want to
constrain the regclass to either A or V based on the incoming regbank.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112323
2021-10-30 07:20:45 -04:00
Stanislav Mekhanoshin e5340ed30c [AMDGPU] Fix global isel for kernels using agprs on gfx90a
With GlobalISel, getReservedRegs() is called before the function is
regbank-selected for the first time. Defer caching of usesAGPRs()
in this case.

Differential Revision: https://reviews.llvm.org/D112644
2021-10-29 14:23:14 -07:00
Sam Clegg 3b039c68f2 Revert "[WebAssembly] Fix debug locations for ExplicitLocals pass"
This reverts commit a66451ebbe.

This caused a failure when integrated with emscripten:
https://ci.chromium.org/ui/p/emscripten-releases/builders/try/linux/b8832019855439718609/overview
2021-10-29 13:34:18 -07:00
Nick Desaulniers 39e5dd113f [SparcISelLowering] avoid emitting libcalls to __muloti4 and __mulodi4
These compiler-rt-only symbols aren't available in libgcc.  Similar to
D108842, D108844, and D108926.

Fixes: pr/52043

Reviewed By: craig.topper, rengolin

Differential Revision: https://reviews.llvm.org/D112750
2021-10-29 13:14:09 -07:00
Sanjay Patel 285b8abce4 [x86] limit vector increment fold to allow load folding
The tests are based on the example from:
https://llvm.org/PR52032

I suspect that it looks worse than it actually is. :)
That is, llvm-mca says there's no uop/timing difference with the
load folding and pcmpeq vs. broadcast on Haswell (and probably
other targets).
The load-folding definitely makes the code smaller, so it's good
for that at least. So this requires carving a narrow hole in the
transform to get just this case without changing others that look
good as-is (in other words, the transform still seems good for
most examples).

Differential Revision: https://reviews.llvm.org/D112464
2021-10-29 15:48:35 -04:00
Sanjay Patel 837518d6a0 [x86] make mayFold* helpers visible to more files; NFC
The first function is needed for D112464, but we might
as well keep these together in case the others can be
used someday.
2021-10-29 15:48:35 -04:00
Amara Emerson 5dd9e019dd [AArch64][GlobalISel] Fix a crash in RBS due to a new regclass being added.
rdar://84674985
2021-10-29 11:47:00 -07:00
Matt Morehouse 33cc0cfd46 [X86] Don't affect jump tables under +tagged-globals.
`classifyLocalReference(nullptr)` is called to get the appropriate
relocation type for jump tables.  We should not use @GOTPCREL for this
case.

The new test cases trigger assertions without this patch.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D112832
2021-10-29 10:37:43 -07:00
Craig Topper aefcd59895 [RISCV] Teach RISCVInsertVSETVLI::needVSETVLI to handle mask register instructions better.
If the VL operand of a mask register instruction comes from an
explicit vsetvli with a different VTYPE, we can still avoid needing
a vsetvli as long as the SEW/LMUL ratio is the same and policy bits
match.

Differential Revision: https://reviews.llvm.org/D112762
2021-10-29 09:49:36 -07:00
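The compatibility condition stated in aefcd59895 above (same SEW/LMUL ratio and matching policy bits) can be sketched as a standalone predicate. The `VTypeModel` struct and the function names below are hypothetical and do not mirror the data structures in RISCVInsertVSETVLI; LMUL is modelled as a fraction so that fractional settings such as mf2 can be expressed. The rationale assumed here is that VLMAX is VLEN * LMUL / SEW, so two settings with the same SEW/LMUL ratio give the same VLMAX.

```cpp
// Hypothetical model of a vtype setting, for illustration only.
struct VTypeModel {
  unsigned SEW;      // element width in bits
  unsigned LMulNum;  // LMUL numerator (e.g. 2 for m2)
  unsigned LMulDen;  // LMUL denominator (e.g. 2 for mf2)
  bool TailAgnostic; // tail policy bit
  bool MaskAgnostic; // mask policy bit
};

// SEW/LMUL equality, cross-multiplied to avoid fractions:
// A.SEW * A.Den / A.Num == B.SEW * B.Den / B.Num
bool sameSEWLMULRatio(const VTypeModel &A, const VTypeModel &B) {
  return A.SEW * A.LMulDen * B.LMulNum == B.SEW * B.LMulDen * A.LMulNum;
}

bool samePolicy(const VTypeModel &A, const VTypeModel &B) {
  return A.TailAgnostic == B.TailAgnostic && A.MaskAgnostic == B.MaskAgnostic;
}

// True if, under the rule described in the commit message, a mask register
// instruction wanting `Wanted` could reuse an existing `Current` setting.
bool maskOpCanReuse(const VTypeModel &Current, const VTypeModel &Wanted) {
  return sameSEWLMULRatio(Current, Wanted) && samePolicy(Current, Wanted);
}
```

For example, SEW=8 with LMUL=1 and SEW=16 with LMUL=2 both have a ratio of 8, so this sketch treats them as interchangeable when the policy bits agree.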
Simon Pilgrim 6102e5d56b [CostModel][X86] Remove old TODO comment
BMI (TZCNT) scalar handling was added at rGa2db388dce77c2f23f2009d7363a0b63bb54523c
2021-10-29 17:28:45 +01:00
Bradley Smith 86972f1114 [AArch64][SVE] Use TargetFrameIndex in more SVE load/store addressing modes
Add support for generating TargetFrameIndex in complex patterns for
indexed addressing modes in SVE. Additionally, add missing load/stores
to getMemOpInfo and getLoadStoreImmIdx.

Differential Revision: https://reviews.llvm.org/D112617
2021-10-29 14:44:16 +00:00
Jay Foad 1b758925ad [IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction
createReplacementInstr was a trivial wrapper around
ConstantExpr::getAsInstruction, which also inserted the new instruction
into a basic block. Implement this directly in getAsInstruction by
adding an InsertBefore parameter and change all callers to use it. NFC.

A follow-up patch will remove createReplacementInstr.

Differential Revision: https://reviews.llvm.org/D112791
2021-10-29 15:02:58 +01:00
Jay Foad 21a1d4cf71 [AMDGPU] Change numBitsSigned for simplicity and document it. NFC.
Change numBitsSigned to return the minimum size of a signed integer that
can hold the value. This is different by one from the previous result
but is more consistent with numBitsUnsigned. Update all callers. All
callers are now more consistent between the signed and unsigned cases,
and some callers get simpler, especially the ones that deal with
quantities like numBitsSigned(LHS) + numBitsSigned(RHS).

Differential Revision: https://reviews.llvm.org/D112813
2021-10-29 14:22:06 +01:00
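The definition adopted in 21a1d4cf71 above, "the minimum size of a signed integer that can hold the value", can be made concrete on constants. This is a sketch only: the real AMDGPU helpers operate on values known to the compiler rather than on literal integers, and `minUnsignedBits` and `minSignedBits` are hypothetical names introduced here.

```cpp
#include <cstdint>

// Minimum width of an unsigned integer that can hold V (0 needs 0 bits).
unsigned minUnsignedBits(uint64_t V) {
  unsigned Bits = 0;
  while (V != 0) {
    ++Bits;
    V >>= 1;
  }
  return Bits;
}

// Minimum width of a two's-complement signed integer that can hold V:
// the magnitude bits plus one sign bit.
unsigned minSignedBits(int64_t V) {
  // For negative V, ~V is non-negative and needs the same number of
  // magnitude bits (e.g. -128 -> 127 -> 7 bits).
  uint64_t Magnitude =
      V < 0 ? ~static_cast<uint64_t>(V) : static_cast<uint64_t>(V);
  return minUnsignedBits(Magnitude) + 1;
}
```

With this definition 127 and -128 both need 8 signed bits, while 128 needs 9, which parallels the unsigned case where 255 needs 8 bits and 256 needs 9.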
Chen Zheng 7591d21032 [PowerPC] fix a miscompile for Solaris build 2021-10-29 12:06:25 +00:00
Bradley Smith bf72a469ba [AArch64][SVE] Fix build failure introduced in 13faa5f440 2021-10-29 11:57:02 +00:00
Simon Pilgrim 154c036ebb [X86] combineX86GatherScatter - only fold scale if the index isn't extended
As mentioned on D108539, when the gather indices are smaller than the pointer size, they are sign-extended BEFORE scale is applied, making the general fold unsafe.

If the index has sufficient sign bits then folding the scale could be safe - I'll investigate this.
2021-10-29 11:48:05 +01:00
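The hazard described in 154c036ebb above can be reproduced with scalar arithmetic. This sketch assumes a 32-bit index, a 64-bit address, and a shift by 1 being folded into the scale; the function names are hypothetical, and the code models the address computation rather than the DAG combine itself.

```cpp
#include <cstdint>

// Unfolded form: the index is shifted in 32 bits, then sign-extended,
// then multiplied by a scale of 1.
int64_t offsetShiftThenExtend(int32_t Idx) {
  // Shift as unsigned to avoid signed-overflow UB; the conversion back to
  // int32_t wraps modulo 2^32 (guaranteed since C++20, universal in practice).
  int32_t Shifted = static_cast<int32_t>(static_cast<uint32_t>(Idx) << 1);
  return static_cast<int64_t>(Shifted) * 1;
}

// Folded form: the original index is sign-extended, and the shift has been
// absorbed into a scale of 2.
int64_t offsetExtendThenScale(int32_t Idx) {
  return static_cast<int64_t>(Idx) * 2;
}
```

For small indices both forms agree, but for an index of 0x40000000 the 32-bit shift wraps to a negative value before the extension, so the two offsets differ. An index with enough sign bits cannot wrap in this way, which matches the follow-up idea mentioned in the commit message.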
Bradley Smith 13faa5f440 [AArch64][SVE] Generate SVE >1 element structured load/stores from fixed types
This adds support for SVE structured loads/stores to the relevant target
hooks, such that we can support these instructions in the InterleavedAccess
pass.

Depends on D112078

Differential Revision: https://reviews.llvm.org/D112303
2021-10-29 09:35:57 +00:00
Cullen Rhodes 8686626244 [Sparc] NFC: Remove unused tblgen template args
Identified in D109359.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D109712
2021-10-29 09:16:15 +00:00
Vang Thao 52b43d1549 [AMDGPU] Fix cvt_f32_ubyte combine with shl
Shift node is still needed to check if the shift is shr or shl to increment/decrement offset. Do not override the node.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112733
2021-10-28 21:43:06 -07:00
Kazu Hirata 01b4789b62 [AMDGPU] Remove hasDefinedInitializer (NFC)
The last use was removed on Sep 16, 2021 in commit
7a62a5b56d.
2021-10-28 20:33:34 -07:00
Kazu Hirata dd5d46b009 [AMDGPU] Remove unused BBSelectRegister in AMDGPUMachineCFGStructurizer (NFC)
This field seems to be unused for at least one year.
2021-10-28 20:33:32 -07:00
Kazu Hirata 309357c01a [AMDGPU] Remove unused declaration eliminateDeadBranchOperands (NFC) 2021-10-28 20:33:30 -07:00
Thomas Lively fb67f3d969 [WebAssembly] Add prototype relaxed float to int trunc instructions
Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u,
i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero.

These are only exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112186
2021-10-28 14:01:53 -07:00
Daniel Kiss d8075e8781 Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This is relanding commit da1d1a0869.
This patch additionally addresses failures found in buildbots & post review comments.

ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-28 21:45:09 +02:00
Wouter van Oortmerssen a66451ebbe [WebAssembly] Fix debug locations for ExplicitLocals pass
Differential Revision: https://reviews.llvm.org/D112487
2021-10-28 12:35:46 -07:00