llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	82e0eb22af	[X86][AVX] combineConcatVectorOps - use getBROADCAST_LOAD helper. NFCI. This is part of rG1cfecf4fc427 that was reverted to fix PR51226 - concating the broadcasts is OK, its the splatted loads that crash (we're not detecting extloads). I'm still creating a reduced test case so haven't added the load handling again yet.	2021-11-02 18:04:35 +00:00
Fraser Cormack	d065b03801	[RISCV] Optimize vp.load with an all-ones mask Similar to D110206, this patch optimizes unmasked vp.load intrinsics to avoid the need of a vmset instruction to set the mask. It does so by selecting a riscv_vle intrinsic rather than a riscv_vle_mask intrinsic. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113022	2021-11-02 17:23:39 +00:00
Jay Foad	be1a8f8834	[AMDGPU] Really preserve LiveVariables in SILowerControlFlow https://bugs.llvm.org/show_bug.cgi?id=52204 Differential Revision: https://reviews.llvm.org/D112731	2021-11-02 15:03:37 +00:00
Matt	895145aacb	Revert "[AArch64][SVE] Combine predicated FMUL/FADD into FMA" This reverts commit `fc28a2f8ce`.	2021-11-02 14:56:01 +00:00
Simon Pilgrim	e173631dd1	[X86][AVX] SimplifyDemandedVectorEltsForTargetNode - use getBROADCAST_LOAD helper. NFCI. Reduce width of X86ISD::SUBV_BROADCAST_LOAD node.	2021-11-02 14:07:22 +00:00
Simon Pilgrim	8ca666a280	[X86][AVX] lowerV2X128Shuffle - use getBROADCAST_LOAD helper. NFCI.	2021-11-02 14:07:21 +00:00
Martin Liska	c5029023fb	Fix building with GCC 12: Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380 Differential Revision: https://reviews.llvm.org/D112990	2021-11-02 14:28:00 +01:00
David Callahan	4ec1b8eeac	[RISCV] Fix invalid kill on callee save A callee save may be live (specifically X1) on entry and so a spill should not mark it killed. Differential Revision: https://reviews.llvm.org/D111285	2021-11-02 11:56:54 +00:00
Wouter van Oortmerssen	ac65366485	[WebAssembly] support "return" and unreachable code in asm type checker To support return (it not being supported well was the ground cause for https://github.com/WebAssembly/wasi-sdk/issues/200) we also have to have at least a basic notion of unreachable, which in this case just means to stop type checking until there is an end_block (an incoming control flow edge). This is conservative (may miss on some type checking opportunities) but is simple and an improvement over what we had before. Differential Revision: https://reviews.llvm.org/D112953	2021-11-01 15:42:58 -07:00
Yonghong Song	f63405f6e3	BPF: Workaround an InstCombine ICmp transformation with llvm.bpf.compare builtin Commit `acabad9ff6` ("[InstCombine] try to canonicalize icmp with trunc op into mask and cmp") added a transformation to convert "(conv)a < power_2_const" to "a & <const>" in certain cases and bpf kernel verifier has to handle the resulted code conservatively and this may reject otherwise legitimate program. This commit tries to prevent such a transformation. A bpf backend builtin llvm.bpf.compare is added. The ICMP insn, which is subject to above InstCombine transformation, is converted to the builtin function. The builtin function is later lowered to original ICMP insn, certainly after InstCombine pass. With this change, all affected bpf strobemeta* selftests are passed now. Differential Revision: https://reviews.llvm.org/D112938	2021-11-01 14:46:20 -07:00
Cameron McInally	702fd3d323	[SVE] Fix VLS FMA matching for CodeGenOpt::Aggressive. For NEON, FMA matching is done in the MachineCombiner, and not the DAGCombiner. That causes problems with VLS lowering, since the vectors are fixed width at the DAGCombiner, but are scalable in the MachineCombiner. This patch corrects it by matching FMAs for VLS vectors in the DAGCombiner. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D112557	2021-11-01 10:43:52 -07:00
Kazu Hirata	d000431fb2	[X86] Remove X86ELFObjectWriter in X86AsmBackend.cpp (NFC) Note that the identically named class is defined in an anonymous namespace in X86ELFObjectWriter.cpp.	2021-11-01 08:31:54 -07:00
Jay Foad	7afef22926	[AMDGPU] Use MachineInstrBuilder::addReg. NFC.	2021-11-01 15:29:51 +00:00
Jay Foad	2b548b18c1	[AMDGPU] Shrink v_mac_legacy_f32 and v_fmac_legacy_f32 Differential Revision: https://reviews.llvm.org/D112917	2021-11-01 13:55:53 +00:00
Mubashar Ahmad	0b83a18a2b	[AArch64] Enablement of Cortex-X2 Enables support for Cortex-X2 cores. Differential Revision: https://reviews.llvm.org/D112459	2021-11-01 11:55:24 +00:00
Simon Pilgrim	6fc50e531d	[CostModel][X86] Remove old FIXME comments for AVX512F vector splitting Similar to AVX1, the cost of splitting/merging 512-bit -> 256-bits vectors for arithmetic operations are typically hidden due to different used ports etc.	2021-11-01 11:11:11 +00:00
Simon Pilgrim	fd485d8cda	[X86][AVX] Prefer VINSERTF128 over VPERM2F128 for 128->256 subvector concatenations The VINSERTF128 instruction is often much quicker, and never slower, than the more general VPERM2F128 instruction, so we should try to use that in more circumstances. This requires a fallback to a commuted VPERM2F128 for the case where we need to fold the 256-bit vector source instead of the 128-bit subvector source. There is one interesting side effect - DAGCombine's narrowExtractedVectorLoad combine gets called in a number of locations, this often creates an extracted subvector load without regard to other uses of the original wider load. I'm expecting AVX cpus to be capable of merging such aliased loads, but I do wonder whether narrowExtractedVectorLoad's call to X86TargetLowering::shouldReduceLoadWidth needs to be altered to check for more partial uses? Noticed while investigating the quality of interleaved load/store codegen. Differential Revision: https://reviews.llvm.org/D111960	2021-11-01 10:45:50 +00:00
Kazu Hirata	476e1ee3da	[AArch64] Remove unused declaration hasSwiftExtendedFrame (NFC)	2021-10-31 22:58:56 -07:00
Chen Zheng	eeed1545b2	[PowerPC] turn off chain commoning by default.	2021-11-01 04:11:10 +00:00
Zi Xuan Wu	cf78715cae	[CSKY] First patch to construct codegen infra and generate first add instruction Ooops. It constructs codegen infra and provide only basic code to generate first add instruction successfully. Differential Revision: https://reviews.llvm.org/D112206	2021-11-01 10:06:56 +08:00
Craig Topper	ada5458521	[RISCV] Expand scalable vector bswap. Fix crash for bitreverse. Fix LegalizeVectorOps to not try shuffle or unrolling expansions for scalable vectors. Differential Revision: https://reviews.llvm.org/D112236	2021-10-31 10:01:27 -07:00
Kazu Hirata	72710af233	[CodeGen, Target] Use MachineBasicBlock::terminators (NFC)	2021-10-31 07:57:34 -07:00
Kazu Hirata	5970249439	[Hexagon] Remove chksetELFHeaderEFlags (NFC) The function was introduced without any use on Nov 9, 2015 in commit `7cd0892729`.	2021-10-30 08:43:43 -07:00
Kazu Hirata	c3d63a0697	[Hexagon] Remove ValidArch (NFC) This function seems to be unused for at least one year.	2021-10-30 08:43:41 -07:00
Kazu Hirata	c5cd371cc9	[Hexagon] Remove unused struct InstTy (NFC)	2021-10-30 08:43:39 -07:00
Christudasan Devadasan	aa2d3b59ce	GlobalISel/Utils: Use incoming regbank while constraining the superclasses Register operands with superclasses can possibly have multiple regBanks if they have different register types. The regBank ambiguity resolved during regbankselect should be used to constrain the operand regclass instead of obtaining one from the MCInstrDesc. This is a prerequisite patch for D109300 that introduces allocatable AV_* Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to restrain the regclass to either A or V based on the incoming regbank. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112323	2021-10-30 07:20:45 -04:00
Stanislav Mekhanoshin	e5340ed30c	[AMDGPU] Fix global isel for kernels using agprs on gfx90a With Global ISel getReservedRegs() is called before function is regbank selected for the first time. Defer caching of usesAGPRs() in this case. Differential Revision: https://reviews.llvm.org/D112644	2021-10-29 14:23:14 -07:00
Sam Clegg	3b039c68f2	Revert "[WebAssembly] Fix debug locations for ExplicitLocals pass" This reverts commit `a66451ebbe`. This caused a failure when integrated with emscripten: https://ci.chromium.org/ui/p/emscripten-releases/builders/try/linux/b8832019855439718609/overview	2021-10-29 13:34:18 -07:00
Nick Desaulniers	39e5dd113f	[SparcISelLowering] avoid emitting libcalls to __muloti4 and __mulodi4 These compiler-rt-only symbols aren't available in libgcc. Similar to D108842, D108844, and D108926. Fixes: pr/52043 Reviewed By: craig.topper, rengolin Differential Revision: https://reviews.llvm.org/D112750	2021-10-29 13:14:09 -07:00
Sanjay Patel	285b8abce4	[x86] limit vector increment fold to allow load folding The tests are based on the example from: https://llvm.org/PR52032 I suspect that it looks worse than it actually is. :) That is, llvm-mca says there's no uop/timing difference with the load folding and pcmpeq vs. broadcast on Haswell (and probably other targets). The load-folding definitely makes the code smaller, so it's good for that at least. So this requires carving a narrow hole in the transform to get just this case without changing others that look good as-is (in other words, the transform still seems good for most examples). Differential Revision: https://reviews.llvm.org/D112464	2021-10-29 15:48:35 -04:00
Sanjay Patel	837518d6a0	[x86] make mayFold* helpers visible to more files; NFC The first function is needed for D112464, but we might as well keep these together in case the others can be used someday.	2021-10-29 15:48:35 -04:00
Amara Emerson	5dd9e019dd	[AArch64][GlobalISel] Fix an crash in RBS due to a new regclass being added. rdar://84674985	2021-10-29 11:47:00 -07:00
Matt Morehouse	33cc0cfd46	[X86] Don't affect jump tables under +tagged-globals. `classifyLocalReference(nullptr)` is called to get the appropriate relocation type for jump tables. We should not use @GOTPCREL for this case. The new test cases trigger assertions without this patch. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D112832	2021-10-29 10:37:43 -07:00
Craig Topper	aefcd59895	[RISCV] Teach RISCVInsertVSETVLI::needVSETVLI to handle mask register instructions better. If the VL operand of a mask register instruction comes from an explicit vsetvli with a different VTYPE, we can still avoid needing a vsetvli as long as the SEW/LMUL ratio is the same and policy bits match. Differential Revision: https://reviews.llvm.org/D112762	2021-10-29 09:49:36 -07:00
Simon Pilgrim	6102e5d56b	[CostModel][X86] Remove old TODO comment BMI (TZCNT) scalar handling was added at rGa2db388dce77c2f23f2009d7363a0b63bb54523c	2021-10-29 17:28:45 +01:00
Bradley Smith	86972f1114	[AArch64][SVE] Use TargetFrameIndex in more SVE load/store addressing modes Add support for generating TargetFrameIndex in complex patterns for indexed addressing modes in SVE. Additionally, add missing load/stores to getMemOpInfo and getLoadStoreImmIdx. Differential Revision: https://reviews.llvm.org/D112617	2021-10-29 14:44:16 +00:00
Jay Foad	1b758925ad	[IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction createReplacementInstr was a trivial wrapper around ConstantExpr::getAsInstruction, which also inserted the new instruction into a basic block. Implement this directly in getAsInstruction by adding an InsertBefore parameter and change all callers to use it. NFC. A follow-up patch will remove createReplacementInstr. Differential Revision: https://reviews.llvm.org/D112791	2021-10-29 15:02:58 +01:00
Jay Foad	21a1d4cf71	[AMDGPU] Change numBitsSigned for simplicity and document it. NFC. Change numBitsSigned to return the minimum size of a signed integer that can hold the value. This is different by one from the previous result but is more consistent with numBitsUnsigned. Update all callers. All callers are now more consistent between the signed and unsigned cases, and some callers get simpler, especially the ones that deal with quantities like numBitsSigned(LHS) + numBitsSigned(RHS). Differential Revision: https://reviews.llvm.org/D112813	2021-10-29 14:22:06 +01:00
Chen Zheng	7591d21032	[PowerPC] fix a miscompile for Solaris build	2021-10-29 12:06:25 +00:00
Bradley Smith	bf72a469ba	[AArch64][SVE] Fix build failure introduced in `13faa5f440`	2021-10-29 11:57:02 +00:00
Simon Pilgrim	154c036ebb	[X86] combineX86GatherScatter - only fold scale if the index isn't extended As mentioned on D108539, when the gather indices are smaller than the pointer size, they are sign-extended BEFORE scale is applied, making the general fold unsafe. If the index have sufficient sign-bits then folding the scale could be safe - I'll investigate this.	2021-10-29 11:48:05 +01:00
Bradley Smith	13faa5f440	[AArch64][SVE] Generate SVE >1 element structured load/stores from fixed types This adds support for SVE structured loads/stores to the relevant target hooks, such that we can support these instructions in the InterleavedAccess pass. Depends on D112078 Differential Revision: https://reviews.llvm.org/D112303	2021-10-29 09:35:57 +00:00
Cullen Rhodes	8686626244	[Sparc] NFC: Remove unused tblgen template args Identified in D109359. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D109712	2021-10-29 09:16:15 +00:00
Vang Thao	52b43d1549	[AMDGPU] Fix cvt_f32_ubyte combine with shl Shift node is still needed to check if the shift is shr or shl to increment/decrement offset. Do not override the node. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112733	2021-10-28 21:43:06 -07:00
Kazu Hirata	01b4789b62	[AMDGPU] Remove hasDefinedInitializer (NFC) The last use was removed on Sep 16, 2021 in commit `7a62a5b56d`.	2021-10-28 20:33:34 -07:00
Kazu Hirata	dd5d46b009	[AMDGPU] Remove unused BBSelectRegister in AMDGPUMachineCFGStructurizer (NFC) This field seems to be unused for at least one year.	2021-10-28 20:33:32 -07:00
Kazu Hirata	309357c01a	[AMDGPU] Remove unused declaration eliminateDeadBranchOperands (NFC)	2021-10-28 20:33:30 -07:00
Thomas Lively	fb67f3d969	[WebAssembly] Add prototype relaxed float to int trunc instructions Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u, i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero. These are only exposed as builtins, and require user opt-in. Differential Revision: https://reviews.llvm.org/D112186	2021-10-28 14:01:53 -07:00
Daniel Kiss	d8075e8781	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869` . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 21:45:09 +02:00
Wouter van Oortmerssen	a66451ebbe	[WebAssembly] Fix debug locations for ExplicitLocals pass Differential Revision: https://reviews.llvm.org/D112487	2021-10-28 12:35:46 -07:00

1 2 3 4 5 ...

64734 Commits