llvm-project

Commit Graph

Author	SHA1	Message	Date
Krzysztof Parzyszek	cd997689f2	[Hexagon] Fix isTypeForHVX to recognize floating point types Co-authored-by: Sumanth Gundapaneni <sgundapa@quicinc.com>	2021-12-30 10:01:05 -08:00
Krzysztof Parzyszek	23423638cc	[Hexagon] Handle HVX/FP shuffles, insertion and extraction Co-authored-by: Anirudh Sundar Subramaniam <quic_sanirudh@quicinc.com>	2021-12-30 08:44:10 -08:00
Krzysztof Parzyszek	95c7dd8810	Revert "[Hexagon] Don't build two halves of HVX vector in parallel" This reverts commit `ba07f300c6`. A build-vector sequence is made of pairs: rotate+insert. When constructing a single vector, this results in a chain of 2*N instructions. The rotate operation is a permute operation, but the insert uses a multiplication resource: insert and rotate can execute in the same cycle, but obviously they cannot operate on the same vector. The original halving idea is still beneficial since it does allow for insert/rotate overlap, and for hiding insert's latency.	2021-12-30 07:57:11 -08:00
jacquesguan	128c6ed73b	[RISCV] Teach VSETVLInsert to eliminate redundant vsetvli for vmv.s.x and vfmv.s.f. Differential Revision: https://reviews.llvm.org/D116307	2021-12-30 17:16:18 +08:00
Sjoerd Meijer	550d90e692	Revert "[AArch64] Add a tablegen pattern for UZP2." This reverts commit `ada028c32f`. A performance regression was reported that we need to investigate: https://github.com/llvm/llvm-project/issues/52919	2021-12-30 09:15:46 +00:00
jacquesguan	1dd5e6fed5	[RISCV] Use vmv.s.x instead of vfmv.s.f when the floating point scalar is 0. Use integer vector scalar move instruction when move 0 to avoid add a integer-float move instruction. Differential Revision: https://reviews.llvm.org/D116365	2021-12-30 10:16:54 +08:00
Chenbing.Zheng	43c8296cda	[RISCV] Refactor immediate comparison instructions patterns The patterns of the immediate comparison instruction is rewrite here, and put similar code to a class. Do not change any function of the original code, making the code more concise. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116215	2021-12-30 09:31:01 +08:00
Krzysztof Parzyszek	ba07f300c6	[Hexagon] Don't build two halves of HVX vector in parallel There can only be one permute operations per packet, so this actually pessimizes the code (due to the extra "or").	2021-12-29 11:00:01 -08:00
Joshua Herrera	505d57486e	[Hexagon] Improve BUILD_VECTOR codegen For vectors with repeating values, old codegen would rotate and insert every duplicate element. This patch replaces that behavior with a splat of the most common element, vinsert/vror only occur when needed.	2021-12-29 10:18:21 -08:00
Craig Topper	015ff729cb	[RISCV] Add a few more instructions to hasAllNBitUsers.	2021-12-29 09:17:47 -08:00
Krzysztof Parzyszek	4df2aba294	[Hexagon] Calling conventions for floating point vectors They are the same as for the other HVX vectors, but types need to be listed explicitly. Also, add a detailed codegen testcase. Co-authored-by: Abhikrant Sharma <quic_abhikran@quicinc.com>	2021-12-29 09:01:07 -08:00
Krzysztof Parzyszek	2ce586bc49	[Hexagon] Handle floating point splats Co-authored-by: Anirudh Sundar Subramaniam <quic_sanirudh@quicinc.com>	2021-12-29 06:52:24 -08:00
Krzysztof Parzyszek	33fc675e16	[Hexagon] Handle floating point vector loads/stores	2021-12-29 05:52:39 -08:00
Kazu Hirata	8414321bec	[Hexagon] Use range-based for loops (NFC)	2021-12-28 23:47:25 -08:00
Krzysztof Parzyszek	6a6ac3b36f	[Hexagon] Support BUILD_VECTOR of floating point HVX vectors Co-authored-by: Anirudh Sundar Subramaniam <quic_sanirudh@quicinc.com> Co-authored-by: Ankit Aggarwal <aankit@quicinc.com>	2021-12-28 14:59:08 -08:00
Krzysztof Parzyszek	7df136bcf2	[Hexagon] Delete unused declaration of LowerHvxMul, NFC	2021-12-28 11:36:07 -08:00
David Green	43e500d791	[AArch64] Minor AArch64MIPeepholeOpt cleanup. NFC We should always be in SSA form when running the pass, so turn a check into an assert.	2021-12-28 19:10:01 +00:00
Brian Cain	1e7bd93ff2	[Hexagon] Add HexagonMCInstrInfo::IsABranchingInst, NFC	2021-12-28 09:51:27 -08:00
Brian Cain	c5327137df	[Hexagon] Fix for producer operands search w/z-reg Z-register does not show up in defs, so checks searching for the def operand must look for a different def index than they would normally.	2021-12-28 09:19:59 -08:00
Kazu Hirata	5a667c0e74	[llvm] Use nullptr instead of 0 (NFC) Identified with modernize-use-nullptr.	2021-12-28 08:52:25 -08:00
Krzysztof Parzyszek	648246cce6	[Hexagon] Remove isPredicateRegister in favor of isPredReg, NFC HexagonMCChecker has its own function isPredicateRegister, which does the same thing as HexagonMCInstrInfo::isPredReg.	2021-12-28 08:40:40 -08:00
Hsiangkai Wang	a1c7ddf926	[RISCV] Support passing scalable vectur values through the stack. After consuming all vector registers, the scalable vector values will be passed indirectly. The pointer values will be saved in general registers. If all general registers are used up, we will report an error to notify users the compiler does not support passing scalable vector values through the stack. In this patch, we remove the restriction. After all general registers are used up, we use the stack to save the pointers which point to the indirect passed scalable vector values. Differential Revision: https://reviews.llvm.org/D116310	2021-12-28 09:26:36 +08:00
Kazu Hirata	8445883327	[llvm] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-12-27 15:58:03 -08:00
David Green	2ec3ca7477	[ARM] Extend IsCMPZCSINC to handle CMOV A 'CMOV 1, 0, CC, %cpsr, Cmp' is the same as a 'CSINC 0, 0, CC, Cmp', and can be treated the same in IsCMPZCSINC added in D114013. This allows us to remove the unnecessary CMOV in the same way that we could remove a CSINC. Differential Revision: https://reviews.llvm.org/D115188	2021-12-27 14:15:03 +00:00
Simon Pilgrim	a0a0eb192e	[X86] Use WriteVecMove scheduler classes for VPMOVM2* instructions These match the port behaviour of reg-reg predicated xmm/ymm/zmm moves Fixes #34958	2021-12-27 13:21:29 +00:00
Simon Pilgrim	29475e0286	[X86] Add scheduler classes for zmm vector reg-reg move instructions Basic zmm reg-reg moves (with predication) are more port limited than xmm/ymm moves, so we need to add a separate class for them. We still appear to be missing move-elimination patterns for most of the intel models, which looks to be one of the main diffs for basic codegen analysis between llvm-mca and uops.info Load/stores are a bit messier and might be better handled as overrides.	2021-12-27 12:13:42 +00:00
Nikita Popov	7c3cf4c2c0	[Inline][X86] Avoid inlining if it would create ABI-incompatible calls (PR52660) X86 allows inlining functions if the callee target features are a subset of the caller target features. This ensures that we don't inline something into a caller that does not support it. However, this does not account for possible call ABI mismatches as a result of inlining. If a call passing a vector argument was originally in a -avx function, calling another -avx function, the vector is passed in xmm. If we now inline it into a +avx function, then it will be passed in ymm, even though the callee expects it in xmm. Fix this by scanning over all calls in the function and checking whether ABI incompatibility is possible. Calls that only pass scalar types are excluded, as I believe those always use the same ABI independent of target features. Fixes https://github.com/llvm/llvm-project/issues/52660. Differential Revision: https://reviews.llvm.org/D116036	2021-12-27 09:36:21 +01:00
Kazu Hirata	0a5788ab57	[Target] Use range-based for loops (NFC)	2021-12-26 23:49:38 -08:00
Hsiangkai Wang	5d47e7d768	[RISCV] Convert whole register copies as the source defined explicitly. The implicit defines may come from a partial define in an instruction. It does not mean the defining instruction and the COPY instruction have the same vl and vtype. When the source comes from the implicit defines, do not convert the whole register copies to vmv.v.v. Differential Revision: https://reviews.llvm.org/D115866	2021-12-27 13:59:49 +08:00
Shao-Ce SUN	70a98008ea	[RISCV] Reduce repetitive codes in flw, fsw Trying to improve code reuse in F,D,Zfh *.td files. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116089	2021-12-27 09:29:35 +08:00
Kazu Hirata	e7774f499b	Use static_assert instead of assert (NFC) Identified with misc-static-assert.	2021-12-26 14:26:44 -08:00
Kazu Hirata	0542d15211	Remove redundant string initialization (NFC) Identified with readability-redundant-string-init.	2021-12-26 09:39:26 -08:00
Kazu Hirata	2d303e6781	Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-12-24 23:17:54 -08:00
alex-t	8020458c5d	[AMDGPU] Changing S_AND_B32 to V_AND_B32_e64 in the divergent 'trunc' to i1 pattern In 'trunc' i16/32/64 to i1 pattern the 'and $src, 1' node supply operand to 'setcc'. The latter is selected to S_CMP_EQ/V_CMP_EQ dependent on the divergence. In case the 'and' is scalar and 'setcc' is divergent, we need VGPR to SGPR copy to adjust input operand for V_CMP_EQ. This patch changes the S_AND_B32 to V_AND_B32_e64 in the 'trunc to i1' divergent patterns. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116241	2021-12-24 18:24:49 +03:00
Simon Pilgrim	159da56737	[X86] Enable v32i16 ISD::ROTL/ROTR lowering on AVX512BW targets	2021-12-24 13:30:52 +00:00
Alexandros Lamprineas	bb84dd8159	[AArch64] Add a tablegen pattern for RADDHN/RADDHN2. Converts RSHRN/RSHRN2 to RADDHN/RADDHN2 when the shift amount is half the width of the vector element. The latter has twice the throughput and half the latency on Arm out-of-order cores. Setting up the zero register adds no latency. Differential Revision: https://reviews.llvm.org/D116166	2021-12-24 11:13:25 +00:00
Phoebe Wang	24c68ea1eb	Reland "[X86][MS-InlineAsm] Use exact conditions to recognize MS global variables" This reverts commit `a954558e87`. Thanks Yuanfang's help. I think I found the root cause of the buildbot fail. The failed test has both Memory and Immediate X86Operand. All data of different operand kinds share the same memory space by a union definition. So it has chance we get the wrong result if we don't check the operand kind. It's probably it happen to be the correct value in my local environment so that I can't reproduce the fail. Differential Revision: https://reviews.llvm.org/D116090	2021-12-24 17:42:51 +08:00
Jim Lin	02478a26f2	[RISCV] Use DAG variable directly instead of DCI.DAG Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116087	2021-12-24 13:06:55 +08:00
Craig Topper	0a35211b34	[RISCV] Don't allow vector types to be used with inline asm 'r' constraint The 'r' constraint uses the GPR class. There is generic support for bitcasting and extending/truncating non-integer VTs to the required integer VT. This doesn't work for scalable vectors and instead crashes. To prevent this, explicitly reject vectors. Fixed vectors might work without crashing, but it doesn't seem worthwhile to allow. While there remove an unnecessary level of indentation in the "vr" and "vm" constraint handling. Differential Revision: https://reviews.llvm.org/D115810	2021-12-23 20:32:36 -06:00
Victor Perez	10b3675aa9	[RISCV][VP] Lower mask vector VP AND/OR/XOR to RVV instructions For fixed and scalable vectors, each intrinsic x is lowered to vmx.mm, dropping the mask, which is safe to do as masked-off elements are undef anyway. Differential Revision: https://reviews.llvm.org/D115339	2021-12-23 15:02:32 -06:00
Brendon Cahoon	d45a247998	[AMDGPU] Don't remove VGPR to AGPR dead spills from frame info Removing dead frame indices for VGPR to AGPR spills is incorrect when the frame index is shared by multiple objects, which may occur due to stack slot coloring. The problem is that subsequent code that processes the other object will assert because the stack frame index is marked dead. Removing dead frame indices is needed prior to stack slot coloring, which is what happens with SGPR to VGPR spills. These spills are lowered prior to stack slot coloring, but the VGPR to AGPR spills are processed afterwards during the Prolog/Epilog Inserter pass. This patch marks the VGPR to AGPR spill slot as dead if the slot is not used by another object. Differential Revision: https://reviews.llvm.org/D115996	2021-12-23 11:09:19 -06:00
Krzysztof Parzyszek	f766bc093c	[Hexagon] Introduce Hexagon v69 ISA	2021-12-23 08:46:03 -08:00
Craig Topper	7704c503ec	[RISCV] Use positive 0.0 for the neutral element in fadd reductions if nsz is present. -0.0 requires a constant pool. +0.0 can be made with vmv.v.x x0. Not doing this in getNeutralElement for fear of changing other targets. Differential Revision: https://reviews.llvm.org/D115978	2021-12-23 10:38:00 -06:00
Craig Topper	b7b260e19a	[RISCV] Support strict FP conversion operations. This adds support for strict conversions between fp types and between integer and fp. NOTE: RISCV has static rounding mode instructions, but the constrainted intrinsic metadata is not used to select static rounding modes. Dynamic rounding mode is always used. Differential Revision: https://reviews.llvm.org/D115997	2021-12-23 09:40:58 -06:00
Alexandros Lamprineas	e70ef6d924	[AArch64] Add a tablegen pattern for SQXTN2. Converts concat_vectors(Vd, trunc(smin(smax Vm, -2^n), 2^n-1) to sqxtn2(Vd, Vm). Deliberately not handling v2i64 ~> v2i32 as the min/max nodes are not legal (same thing we did for the SQXTN patterns in https://reviews.llvm.org/D103263). Differential Revision: https://reviews.llvm.org/D116105	2021-12-23 15:19:13 +00:00
Simon Pilgrim	71fc4bbdd2	[X86][SSE] Add ISD::ROTR support Fix issue in TargetLowering::expandROT where we only attempt to flip a rotation if the other direction has better support - this matches TargetLowering::expandFunnelShift This allows us to enable ISD::ROTR lowering on SSE targets, which particularly simplifies/improves codegen for splat amount and AVX2 per-element shifts.	2021-12-23 15:07:30 +00:00
Craig Topper	a9486a40f7	[RISCV] Disable interleaving scalar loops in the loop vectorizer. The loop vectorizer can interleave scalar loops even if it doesn't vectorize them. I don't believe we intended to enable this when we enabled interleaving for vector instructions. Disable interleaving for VF=1 like X86 and AMDGPU already do. Test lifted from AMDGPU. Differential Revision: https://reviews.llvm.org/D115975	2021-12-23 08:37:24 -06:00
Simon Pilgrim	a3f50fb06d	[X86] isVectorShiftByScalarCheap - vXi8 select(shift(x,splat0),shift(x,splat1)) is better than shift(x,select(splat0,splat1)) Even though we don't have vXi8 vector shifts (apart from XOP), it is still better to prefer shift (or funnel-shift/rotate) by scalar where possible. https://llvm.godbolt.org/z/6ss6ffTxv Differential Revision: https://reviews.llvm.org/D116191	2021-12-23 14:30:02 +00:00
Petar Avramovic	29f88b93fd	[GlobalISel] Rework more/fewer elements for vectors Artifact combiner is not able to access individual elements after using LCMTy style merge/unmerge, extract and insert to change vector number of elements (pad with undef or split to sub-vector instructions). Use unmerge to individual elements instead and then merge elements into requested types. Change argument lowering for vectors and moreElementsVector to use buildPadVectorWithUndefElements and buildDeleteTrailingVectorElements. FewerElementsVector had a few helpers that had different behavior, introduce new helper for most of the opcodes. FewerElementsVector helper is more flexible since it can create leftover instruction smaller then requested type (useful in case target wants to avoid pad with undef and use fewer registers). If target does not want leftover of different type it should call more elements first. Some helpers were performing more elements first to have split without leftover. Opcodes that used this helper use clampMaxNumElementsStrict (does more elements first) in LegalizerInfo to avoid test changes. Fixes failures caused by failing to combine artifacts created during more/fewer elements vector. Differential Revision: https://reviews.llvm.org/D114198	2021-12-23 14:30:02 +01:00
Jay Foad	74ce7ff5dc	[AMDGPU] Remove a TODO that was done by D98081	2021-12-23 10:19:37 +00:00
Phoebe Wang	a954558e87	Revert "[X86][MS-InlineAsm] Use exact conditions to recognize MS global variables" This reverts commit `682d01a1c1`. Revert for buildbot fails.	2021-12-23 12:44:33 +08:00
Phoebe Wang	682d01a1c1	[X86][MS-InlineAsm] Use exact conditions to recognize MS global variables D115225 tried to roll back the effects on symbols of MS inline asm introduced by D113096. But the combination of the conditions cannot match all the changes. As a result, there are still fails after the patch. This patch fixes the problem by checking the exact conditions for MS global variables, i.e., variable (by FrontendSize != 0) + non rip/eip (by DefaultBaseReg == 0), so that we can fully roll back for D113096. Reviewed By: skan Differential Revision: https://reviews.llvm.org/D116090	2021-12-23 11:46:03 +08:00
jacquesguan	28a3e7dea2	[RISCV] Override hasAndNotCompare to use more andn when have Zbb extension. Enable transform (X & Y) == Y ---> (~X & Y) == 0 and (X & Y) != Y ---> (~X & Y) != 0 when have Zbb extension to use more andn instruction. Differential Revision: https://reviews.llvm.org/D115922	2021-12-23 10:42:20 +08:00
alex-t	e4103c91f8	[AMDGPU] Select build_vector DAG nodes according to the divergence This change enables divergence-driven instruction selection for the build_vector DAG nodes. It also enables packed i16 instructions for GFX9. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116187	2021-12-23 02:27:12 +03:00
Simon Pilgrim	8b58344efb	Remove superfluous semicolon. Missed by MSVC	2021-12-22 17:42:45 +00:00
Simon Pilgrim	4639461531	[DAG][X86] Add TargetLowering::isSplatValueForTargetNode override Add callback to enable us to test target nodes if they are splat vectors Added some basic X86ISD::VBROADCAST + X86ISD::VBROADCAST_LOAD handling	2021-12-22 16:57:44 +00:00
Ron Lieberman	09b53296cf	Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range" This reverts commit `9075009d1f`. Failed amdgpu runtime buildbot # 3514	2021-12-22 11:39:28 -05:00
RamNalamothu	9075009d1f	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D114652	2021-12-22 20:51:12 +05:30
Nikita Popov	3b0f5a4856	[Mips16HardFloat] Simplify attribute change (NFC) As we're only removing and adding a single attribute, there is no need to go through AttrBuilder.	2021-12-22 09:41:11 +01:00
Nikita Popov	f5ac23b5ae	[ArgPromotion][TTI] Pass types to ABI compatibility hook The areFunctionArgsABICompatible() hook currently accepts a list of pointer arguments, though what we're actually interested in is the ABI compatibility after these pointer arguments have been converted into value arguments. This means that a) the current API is incompatible with opaque pointers (because it requires inspection of pointee types) and b) it can only be used in the specific context of ArgPromotion. I would like to reuse the API when inspecting calls during inlining. This patch converts it into an areTypesABICompatible() hook, which accepts a list of types. This makes the method more generally usable, and compatible with opaque pointers from an API perspective (the actual usage in ArgPromotion/Attributor is still incompatible, I'll follow up on that in separate patches). Differential Revision: https://reviews.llvm.org/D116031	2021-12-22 09:37:51 +01:00
Kazu Hirata	9db0e21660	[llvm] Use depth_first (NFC)	2021-12-21 22:28:48 -08:00
Nemanja Ivanovic	1674d9b6b2	[PowerPC] Fix vector equality comparison for v2i64 pre-Power8 The current code makes the assumption that equality comparison can be performed with a word comparison instruction. While this is true if the entire 64-bit results are used, it does not generally work. It is possible that the low order words and high order words produce different results and a user of only one will get the wrong result. This patch adds an and of the result words so that each word has the result of the comparison of the entire doubleword that contains it. Differential revision: https://reviews.llvm.org/D115678	2021-12-21 14:28:41 -06:00
Nemanja Ivanovic	a3ea9052d6	[PowerPC] Do not increase cost for getUserCost with MMA types Commit `150681f` increases cost of producing MMA types (vector pair and quad). However, it increases the cost for getUserCost() which is used in unrolling. As a result, loops that contain these types already (from the user code) cannot be unrolled (even with the user's unroll pragma). This was an unintended sideeffect. Reverting that portion of the commit to allow unrolling such loops. Differential revision: https://reviews.llvm.org/D115424	2021-12-21 13:36:08 -06:00
Alexandros Lamprineas	ada028c32f	[AArch64] Add a tablegen pattern for UZP2. Converts concat_vectors((trunc (lshr)), (trunc (lshr))) to UZP2 when the shift amount is half the width of the vector element. Differential Revision: https://reviews.llvm.org/D116021	2021-12-21 16:21:46 +00:00
Simon Pilgrim	dfa2ad1ad8	[X86] getTargetVShiftNode - remove shift-by-constant handling. Move shift-by-constant handling and move it into its only user (VSHIFT intrinsics lowering). This is some prep-work for getTargetVShiftNode to no longer take a scalar shift amount - we're introducing temporary ISD::EXTRACT_VECTOR_ELT nodes via SelectionDAG::getSplatValue to accommodate this which can cause various issues, including unnecessary scalarization and xmm->gpr->xmm transfers, and causes problems for 32-bit codegen if we fail to remove an (illegal) i64 scalar extracted from a (legal) vXi64 vector.	2021-12-21 13:16:48 +00:00
Simon Pilgrim	0caf8a3daf	[X86] LowerRotate - enable vXi32 splat handling Pull out the "rotl(x,y) --> (unpack(x,x) << zext(splat(y % bw))) >> bw" special case from vXi8 lowering so we can reuse it for vXi32 types as well. There's still some regressions with vXi16 to handle before this becomes entirely general. It also allows us to remove the now unnecessary hack for handling amount-modulo before splatting.	2021-12-21 11:19:23 +00:00
Andrew Wei	03dc2975d0	[AArch64][SVE] Lower shuffles to permute instructions: zip1/2, uzp1/2, trn1/2 Attempt to lower a shuffle as a permute instruction(zip/uzp/trn) for fixed length SVE. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D113376	2021-12-21 18:39:09 +08:00
Simon Moll	b2cea573c9	[VE] FADD,FSUB,FMUL,FDIV v256f32\|f64 isel and tests Depends on D115940 for the `Binary_rv_vr_vv` pattern class op isel fragment used for divisions. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D116035	2021-12-21 09:15:31 +01:00
Simon Moll	8c51812913	[VE] U\|SDIV v256i32\|64 isel and tests Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D115940	2021-12-21 08:51:01 +01:00
Kazu Hirata	500c4b68dc	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-12-20 23:43:24 -08:00
Kazu Hirata	c5cf7d910e	[ARM] Use range-based for loops (NFC)	2021-12-20 23:06:47 -08:00
Yonghong Song	76b7d73429	BPF: report better error message for BTF_TYPE_ID_REMOTE relo failure Matteo Croce reported a bpf backend fatal error in https://github.com/llvm/llvm-project/issues/52779 A simplified case looks like: $ cat bug.c extern int do_smth(int); int test() { return __builtin_btf_type_id((typeof(do_smth) )do_smth, 1); } $ clang -target bpf -O2 -g -c bug.c fatal error: error in backend: Empty type name for BTF_TYPE_ID_REMOTE reloc ... The reason for the fatal error is that the relocation is against a DISubroutineType like type 13 below: !10 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed) !11 = !{} !12 = !DILocation(line: 3, column: 10, scope: !7) !13 = !DISubroutineType(types: !14) !14 = !{!10, !10} The DISubroutineType doesn't have a name and there is no way for downstream bpfloader/kernel to do proper relocation for it. But we can improve error message to be more specific for this case. The patch improved the error message to be: fatal error: error in backend: SubroutineType not supported for BTF_TYPE_ID_REMOTE reloc Differential Revision: https://reviews.llvm.org/D116063	2021-12-20 21:06:19 -08:00
Esme-Yi	b66328701a	[PowerPC][llvm-objdump] enable --symbolize-operands for PowerPC ELF/XCOFF. Summary: When disassembling, symbolize a branch target operand to print a label instead of a real address. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D114492	2021-12-21 04:17:57 +00:00
Matt Arsenault	c222972442	AMDGPU/GlobalISel: Stop using NarrowScalar/FewerElements for unaligned splitting These actions should only be used for adjusting the register types (and the memory type as needed to satisfy the register type). Unaligned accesses should be split as a type of lowering. This has the effect of improving the code in many cases since now we produce zextloads instead of separate loads with ands. The load/store legality rules still seem far more complicated than necessary though.	2021-12-20 18:07:11 -05:00
Kazu Hirata	de90490060	Revert "[ARM] Use range-based for loops (NFC)" This reverts commit `93d79cac2e`. This patch seems to break llvm/test/CodeGen/ARM/constant-islands-cfg.mir under asan.	2021-12-20 10:51:36 -08:00
alex-t	19727e31fb	[AMDGPU] Enable divergence predicates for ctlz/cttz ctlz/cttz get lowered to the set of target opcodes This change enables the ISel to select SALU or VALU form according to the SDNode divergence. CTLZ - S_FLBIT_I32_B32 if uniform and V_FFBH_U32_e64 if divergent CTTZ - S_FF1_I32_B32 if uniform and V_FFBL_B32_e64 if divergent Also @llvm.amdgcn.sffbh.i32 gets lowered to S_FLBIT_I32 if uniform and V_FFBH_I32_e64 if divergent NOTE: 64bit versions S_FF1_I32_B64 and S_FLBIT_I32_B64 are not currently supported by the DAG ISel. ctlz/cttz with i64 input are split into two 32bit instructions. Nevertheless, they already have the patterns and were equipped with the divergence predicates to make sure they will be selected correctly when enabled. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116044	2021-12-20 20:53:48 +03:00
Sander de Smalen	b1ff20fd35	[LV] Enable scalable vectorization by default for SVE cores. The availability of SVE should be sufficient to enable scalable auto-vectorization. This patch adds a new TTI interface to query the target what style of vectorization it wants when scalable vectors are available. For other targets than AArch64, this currently defaults to 'FixedWidthOnly'. Differential Revision: https://reviews.llvm.org/D115651	2021-12-20 16:23:29 +00:00
Nemanja Ivanovic	2fb9029f26	[PowerPC] Support hwsync extended mnemonic This mnemonic has been supported by GAS for years and it was added to the PowerPC ISA as of ISA 3.1. We will support the mnemonic to be compatible with GAS.	2021-12-20 10:08:31 -06:00
Jay Foad	8b997adc64	[AMDGPU] Remove dead code after D109052	2021-12-20 14:20:02 +00:00
alex-t	98d09705e1	[AMDGPU] Re-enabling divergence predicates for min/max This patch enables divergence predicates for min/max nodes. It makes ISD::MIN/MAX selected to S_MIN_I(U)32/S_MAX_I(U)32 or V_MIN_I(U)32_e64/V_MAX_I(U)32_e64 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D115954	2021-12-20 16:10:55 +03:00
alex-t	1448aa9dbd	[AMDGPU] Expand not pattern according to the XOR node divergence The "not" is defined as XOR $src -1. We need to transform this pattern to either S_NOT_B32 or V_NOT_B32_e32 dependent on the "xor" node divergence. Reviewed By: rampitec, foad Differential Revision: https://reviews.llvm.org/D115884	2021-12-20 14:41:38 +03:00
Kazu Hirata	93d79cac2e	[ARM] Use range-based for loops (NFC)	2021-12-20 00:04:53 -08:00
Shao-Ce SUN	68bc6d7cae	[RISCV] Remove Zvamo Extention Based on D111692. Zvamo is not part of the 1.0 V spec. Remove it. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D115709	2021-12-20 10:28:39 +08:00
David Green	4ece4cd77e	[ARM] Fold away CMP/CSINC from CMOV This makes use of the code in D114013 to fold away unnecessary CMPZ/CSINC starting from a CMOV, in a similar way to how we fold away CSINV/CSINC/etc Differential Revision: https://reviews.llvm.org/D115185	2021-12-19 21:53:50 +00:00
Simon Pilgrim	47bd9ebda4	[X86][AVX512] cvt_by_vec_width - don't hardcode the schedule class. NFC. Pull out the WriteMove schedule class into the cvt_mask_by_elt_width wrapper	2021-12-19 16:40:12 +00:00
Simon Pilgrim	67cce1ceee	[X86] Adjust some IceLake fp shuffle schedule classes (PR48110) The IceLake scheduler model is still mainly a copy of the SkylakeServer model. This patch adjusts the fp shuffle classes to account for most instructions now working on Port 1 as well as Port 5. This is based off Agner + uops.info as well as the PR48110 report. Differential Revision: https://reviews.llvm.org/D115752	2021-12-19 13:00:11 +00:00
Jakub Kuderski	1e93f3895f	[AMDGPU] Use enum_seq to iterator over InstCounterTypes. NFC. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D115900	2021-12-18 16:07:28 -05:00
Jakub Kuderski	d9ae852fcc	[AMDGPU] Fix data race in SIInsertWaitcnts The race condition happened when two pass managers ran on two different modules but modified/read the global variables. To address this, I considered using singletons and freestanding functions to allow getting/setting `HardwareLimits` and `RegisterEncoding`, or making it local to the pass. I chose the latter and made it a member of `WaitcntsBrackets`, to minimizes the amount of global state. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D115896	2021-12-18 16:03:09 -05:00
Michael Berg	f95ee6074a	[RISCV] Add target specific loop unrolling and peeling preferences Both these preference helper functions have initial support with this change. The loop unrolling preferences are set with initial settings to control thresholds, size and attributes of loops to unroll with some tuning done. The peeling preferences may need some tuning as well as the initial support looks much like what other architectures utilize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113798	2021-12-18 12:54:50 -08:00
Kazu Hirata	766d32f582	[Hexagon] Use is_contained (NFC)	2021-12-17 14:34:30 -08:00
Kazu Hirata	26bd534a79	[llvm] Use none_of instead of \!any_of (NFC)	2021-12-17 13:48:57 -08:00
Kazu Hirata	2b7be47b22	[llvm] Strip redundant lambda (NFC)	2021-12-17 10:51:40 -08:00
Kazu Hirata	f78c1b07cb	[Target] Use range-based for loops (NFC)	2021-12-17 10:11:08 -08:00
Craig Topper	be41996f4f	[RISCV} Add FSGNJ_H to isAsCheapAsAMove and isCopyInstrImpl. This matches FSGNJ_S and FSGNJ_D.	2021-12-17 09:14:20 -08:00
David Truby	7e44eb079d	[AArch64][SVE] Improve code generation for VLS i1 masks This patch partially resolves an issue for VLS code generation where a mask is generated from a smaller width integer comparison than the instruction using the mask requires. Instead of sign extending a p register by converting it to a z register, extending that, and converting back, we instead just do an unpack of the p register. A separate issue causes the code generation to still be poor when the mask generation would fit in a neon register, as we then use a neon comparison operation and have to convert that to a p register. This will be resolved in a separate patch. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D111221	2021-12-17 16:26:49 +00:00
Matthew Devereau	e00f22c1b1	[AArch64][SVE] Teach cost model that masked loads/stores are cheap Reduce the cost of VLS masked loads/stores to make the vectorizor emit them more frequently.	2021-12-17 15:04:45 +00:00
David Truby	5c9684704d	[DAG][sve] Lowering for VLS masked truncating stores This extends the custom lowering for truncating stores on fixed length vectors in SVE to support masked truncating stores. It also adds a DAG combine for truncates followed by masked stores. Reviewed By: peterwaller-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D108115	2021-12-17 15:04:45 +00:00
Paul Walker	22370530a3	[NFC][SVE] Add missing tests for i32 INC/DEC patterns. D111441 included trunc isel patterns for sve_int_pred_pattern_a but no accompanying tests. This patch adds the missing tests and also simplifies the isel patterns that use sve_cnt_shl_imm. Differential Revision: https://reviews.llvm.org/D115512	2021-12-17 13:13:36 +00:00
rkorsa	c680fb69d6	[AMDGPU] Fixes in ISelDAG path and GlobalISel path for 'bias' operand with A16 bit on The LOD bias operand is of type 'half' when the A16-bit is ON' for MIMG instructions. 'bias' is only 16-bit but occupies 32-bits with upper 16-bits containing junk. The patch fixes both the paths(ISelDAG and GlobalISel) for proper encoding of LOD bias operand. Differential Revision: https://reviews.llvm.org/D111754	2021-12-17 16:11:51 +05:30
David Green	6bd8f114c8	[ARM] Handle splats of constants for MVE qr instruction Some MVE instructions have qr variants that take a Q and R register, splatting the R register for each lane. This is usually handled fine for standard splats as we sink the splat into the loop and combine the resulting dup into the qr instruction. It does not work for constant splats though, as we generate a vmovimm or constant pool load instead. This intercepts that, generating a vdup of the constant instead where we can turn the result into a qr instruction variant. Differential Revision: https://reviews.llvm.org/D115242	2021-12-17 09:16:28 +00:00
Heejin Ahn	4625b84879	[WebAssembly] Support clang -fwasm-exceptions for bitcode This supports bitcode compilation using `clang -fwasm-exceptions`. --- The current situation: Currently the backend requires two options for Wasm EH: `-wasm-enable-eh` and `-exception-model=wasm`. Wasm SjLj requires two options as well: `-wasm-enable-sjlj` and `-exception-model=wasm`. When using Wasm EH via Emscripten, you only need to pass `-fwasm-exceptions`, and these options will be added within the clang driver. This description will focus on the case of Wasm EH going forward, but Wasm SjLj's case is similar. When you pass `-fwasm-exceptions` to emcc and clang driver, the clang driver adds these options to the command line that calls the clang frontend (`clang -cc1`): `-mllvm -wasm-enable-eh` and `-exception-model=wasm`. `-wasm-enable-eh` is prefixed with `-mllvm`, so it is passed as is to the backend. But `-exception-model` is parsed and processed within the clang frontend and stored in `LangOptions` class. This info is later transferred to `TargetOptions` class, and then eventually passed to `MCAsmInfo` class. All LLVM code queries this `MCAsmInfo` to get the exception model. --- Problem: The problem is the whole `LangOptions` processing is bypassed when compiling bitcode, so the information transfer of `LangOptions` -> `TargetOptions` -> `MCAsmInfo` does not happen. They are all set to `ExceptionHandling::None`, which is the default value. --- What other targets do, and why we can't do the same: Other targets support bitcode compilation by the clang driver, but they can do that by using different triples. For example, X86 target supports multiple triples, each of which has its own subclass of `MCAsmInfo`, so it can hardcode the appropriate exception model within those subclasses' constructors. But we don't have separate triples for each exception mode: none, emscripten, and wasm. --- What this CL does: If we can figure out whether `-wasm-enable-eh` is passed to the backend, we can programatically set the exception model from the backend, rather than requiring it to be passed. So we check `WasmEnableEH` and `WasmEnableSjLj` variables, which are `cl::opt` for `-wasm-enable-eh` and `-wasm-enable-sjlj`, in `WebAssemblyMCAsmInfo` constructor, and if either of them is set, we set `MCAsmInfo.ExceptionType` to Wasm. `TargetOptions` cannot be updated there, so we make sure they are the same later. Fixes https://github.com/emscripten-core/emscripten/issues/15712. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D115893	2021-12-16 16:49:24 -08:00
Ron Lieberman	8a85be807b	Revert "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass" Offload abort in Nekbone This reverts commit `2b48761575`.	2021-12-16 21:21:32 +00:00
Craig Topper	66bbefeb13	[RISCV] Revert Zfhmin related changes that aren't tested and depend on f16 being a legal type. Our Zfhmin support is only MC layer, but these are CodeGen layer interfaces. If f16 isn't a Legal type for CodeGen with Zfhmin, then these interfaces should keep their non-Zfh behavior. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D115822	2021-12-16 08:55:28 -08:00
Simon Pilgrim	a640f16ca2	[X86] combineAnd - don't demand operand vector elements if the other operand element is zero If either operand has a zero element, then we don't need the equivalent element from the other operand, as no bits will be set.	2021-12-16 16:54:27 +00:00
Matt Arsenault	4132dc917e	AMDGPU: Return result from indicatePessimisticFixpoint I don't think this fixes anything.	2021-12-16 11:26:30 -05:00
Simon Pilgrim	3267de7215	[X86] combineAnd - pull out repeated getOperand() and SDLoc() calls. NFCI.	2021-12-16 16:22:39 +00:00
Simon Pilgrim	4712a71415	[X86] Rename LowerScalarImmediateShift/LowerScalarVariableShift helpers. NFC. Rename them to LowerShiftByScalarImmediate/LowerShiftByScalarVariable to make it easier to find them wrt LowerShift()	2021-12-16 16:01:14 +00:00
Neumann Hon	9a35844990	[z/OS] Implement prologue and epilogue generation for z/OS target. This patch adds support for prologue and epilogue generation for the z/OS target under the XPLINK64 ABI for functions with a stack size of less than 1048576 bytes (huge stack frames). Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D114457	2021-12-16 09:04:05 -05:00
Paulo Matos	c92d45913c	[WebAssembly] Fix typechecking for else MCInst When hitting an else clause the type Stack should be reset to as it was at the start of the if, without taking into account the Type inserted into the Stack during the then branch of the if. Reviewed By: aardappel Differential Revision: https://reviews.llvm.org/D115748	2021-12-16 11:18:01 +01:00
jacquesguan	7dfbf0b60f	[RISCV] Fold (and (not (srl X, C)), 1) to (xor (bexti X, C), 1) when have Zbs extension. When have Zbs extension, we could use bexti to fold (and (not (srl X, C)), 1) to (xor (bexti X, C), 1). Differential Revision: https://reviews.llvm.org/D115629	2021-12-16 15:01:05 +08:00
jacquesguan	d3c2ad154e	[RISCV] Fix whole vector register move instruction's vector register constraint. According to the v-spec, the source and destination VR of vmv<nr>r.v should be aligned for the VR group size. Differential Revision: https://reviews.llvm.org/D115720	2021-12-16 10:58:55 +08:00
Matt Arsenault	f0cc43cc91	AMDGPU: Use v_accvgpr_mov_b32 when copying AGPR tuples on gfx90a This is an optimization, but also fixes a compile failure when no free VGPRs are available. The problem still exists for gfx908 where a scratch register is still required. This also still exists for the SGPR to AGPR case.	2021-12-15 18:20:49 -05:00
Matt Arsenault	45f16eabd6	AMDGPU: Combine is.shared/is.private of null/undef	2021-12-15 18:20:49 -05:00
Matt Arsenault	2b48761575	AMDGPU: Remove AMDGPUFixFunctionBitcasts pass This was a workaround for not supporting indirect calls when instcombine didn't eliminate constant expression casts of the callee at -O0. Indirect calls are supposed to work now, so drop the hack.	2021-12-15 18:20:48 -05:00
Arthur Eubanks	5a81a60391	[NFC] Remove more calls to getAlignment() These are deprecated and should be replaced with getAlign(). Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.	2021-12-15 14:40:57 -08:00
Arnold Schwaighofer	d87e617048	Teach the backend to make references to swift_async_extendedFramePointerFlags weak if it emits it When references to the symbol `swift_async_extendedFramePointerFlags` are emitted they have to be weak. References to the symbol `swift_async_extendedFramePointerFlags` get emitted only by frame lowering code. Therefore, the backend needs to track references to the symbol and mark them weak. Differential Revision: https://reviews.llvm.org/D115672	2021-12-15 10:02:06 -08:00
Joe Nash	da9c6ea007	[AMDGPU] Extract helper function in AsmParser. NFC NFC refactor to extract useful helper function isRegOrInline. Reviewed By: rampitec, dp Differential Revision: https://reviews.llvm.org/D115753 Change-Id: Ief52db9a62615c053fb5f429248657b97cb41453	2021-12-15 09:53:23 -05:00
Simon Pilgrim	52cb0bbec3	[X86] LowerRotate - use vXi8 custom lowering for non-uniform constant amounts Instead of bailing and using the default expansion, we can more efficiently use the shl(unpack(x,x),unpack(amt,zero)) pattern for vXi8 rotl, as we'll then use vXi16 fast PMULLW (or PSLLVW). This required some minor changes to improve constant folding during unpack shuffle creation and convertShiftLeftToScale to support constants that have already been lowered to constant pools.	2021-12-15 14:51:15 +00:00
Andrew Wei	dc7b672f96	[AArch64][SVE] Lower shuffles to permute instructions: rev/revb/revh/revw Attempt to lower a shuffle as a permute instruction(rev/revb/revh/revw) for fixed length SVE. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D114960	2021-12-15 21:53:00 +08:00
Jay Foad	54fc9eb9b3	[AMDGPU] Use v_fma_f16 on GFX10 Teach convertToThreeAddress to use the V_FMA_F16_gfx9 pseudo (i.e. the standard instruction in GFX9 onwards) instead of V_FMA_F16 (the legacy pseudo for GFX8 compatibility, which is no longer supported in GFX10). This follows the example of macToMad in SIFoldOperands. Differential Revision: https://reviews.llvm.org/D115731	2021-12-15 13:14:48 +00:00
Jay Foad	4db7422771	[AMDGPU] Improve zeroesHigh16BitsOfDest for GFX9 legacy opcodes Pseudos like V_MAD_U16 and V_FMA_F16 map down to what GFX9 calls v_mad_legacy_u16 and v_fma_legacy_f16, which are documented to have the same zeroing behaviour as on GFX8. Differential Revision: https://reviews.llvm.org/D115729	2021-12-15 13:14:48 +00:00
Jay Foad	6a7db0dc8e	[AMDGPU] Skip some work on subtargets without scalar stores. NFC.	2021-12-15 12:46:33 +00:00
Simon Pilgrim	36b0325c44	[X86] Enable v16i8/v32i8/v64i8 rotation on AVX512 targets We currently rely on generic promotion to vXi16/vXi32 types for rotation lowering on various AVX512 targets. We can more efficiently perform this by making use of the shl(unpack(x,x),amt) style pattern that we already use for vXi8 rotation by splat amounts, either by widening to a larger vector type or unpacking lo/hi halves of the subvectors so we can access whatever vXi16/vXi32 per-element shifts are supported. This uncovered an issue in the supportedVectorShiftWithImm/supportedVectorVarShift legality checkers which was using hasAVX512() instead of useAVX512Regs() to detect support for 512-bit vector shifts. NOTE: I'm actually hoping to eventually reuse this code for shl(unpack(y,x),amt) funnel shift lowering (vXi8 and wider), but initially I just want to ensure we have efficient ISD::ROTL lowering for all targets. Differential Revision: https://reviews.llvm.org/D115180	2021-12-15 11:17:45 +00:00
Simon Moll	676af1272b	[VE] SHL,SRA,SRL v256i32\|64 isel and tests Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D115734	2021-12-15 11:32:18 +01:00
Jon Chesterfield	624f12d34f	[amdgpu] Drop lowering of LDS used by global variables Approximately revert D103431. LDS variables are allocated at kernel launch and deallocated at kernel exit. The address is therefore kernel execution dependent. Global variables are initialized by values written to .data, which can't be done for a LDS variable as there is no kernel running, or by a global constructor. Initializing the global to the address of some LDS allocated by a global constructor is possible but indistinguishable from undef. Assigning the address of a LDS variable to a global should be a sema error. It isn't for openmp, haven't checked other languages. Failing that it could be set to undef, perhaps in this pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D115413	2021-12-14 21:59:26 +00:00
Muiez Ahmed	ebf5497b26	Revert "[z/OS] Implement prologue and epilogue generation for z/OS target." This reverts commit `ffad4d777b` because it introduced buildbot failures.	2021-12-14 14:22:11 -05:00
Simon Pilgrim	74d1fc742a	[X86] Adjust some IceLake integer shuffle schedule classes (PR48110) The IceLake scheduler model is still mainly a copy of the SkylakeServer model. This patch adjusts the integer shuffle classes to account for most instructions now working on Port 1 as well as Port 5. This is based off Agner + uops.info as well as the PR48110 report. Differential Revision: https://reviews.llvm.org/D115547	2021-12-14 18:56:13 +00:00
Craig Topper	3926893439	[RISCV] Add isel support for scalar STRICT_FADD/FSUB/FMUL/FDIV/FSQRT. Test that STRICT_FMINNUM/FMAXNUM are lowered to libcalls for f32/f64. The RISC-V instructions don't match the behavior of fmin/fmax libcalls with respect to SNaN. Promoting FMINNUM/FMAXNUM for f16 needs more work outside of the RISC-V backend. Reviewed By: asb, arcbbb Differential Revision: https://reviews.llvm.org/D115680	2021-12-14 10:50:55 -08:00
Kazu Hirata	ee99426c77	[AArch64] Revise a warning fix This patch revises the warning fix done in `a93b1792f1`. Specifically, it rolls the MRI.getType call into the assert, thereby avoiding the named variable.	2021-12-14 10:46:57 -08:00
Craig Topper	3f1c403a2b	[RISCV] Use AdjustInstrPostInstrSelection to insert a FRM dependency for scalar FP instructions with dynamic rounding mode. In order to support constrained FP intrinsics we need to model FRM dependency. Whether or not a instruction uses FRM is based on a 3 bit field in the instruction. Because of this we can't add 'Uses = [FRM]' to the tablegen descriptions. This patch examines the immediate after isel and adds an implicit use of FRM. This idea came from Roger Ferrer Ibanez. Other ideas: We could be overly conservative and just pretend all instructions with frm field read the FRM register. Or we could have pseudoinstructions for CodeGen with rounding mode. Reviewed By: asb, frasercrmck, arcbbb Differential Revision: https://reviews.llvm.org/D115555	2021-12-14 10:17:57 -08:00
Craig Topper	d4d76409d1	[RISCV] Add mayRaiseFPException to RISCV scalar FP instructions. FRM dependency will be added in a future patch. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D115540	2021-12-14 09:53:30 -08:00
Craig Topper	7598ac5ec5	[RISCV] Convert (splat_vector (load)) to vlse with 0 stride. We already do this for splat nodes that carry a VL, but not for splats that use VLMAX. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D115483	2021-12-14 09:14:03 -08:00
Jing Bao	2a4a229d6d	[WebAssembly] Custom optimization for truncate When possible, optimize TRUNCATE to generate Wasm SIMD narrow instructions (i16x8.narrow_i32x4_u, i8x16.narrow_i16x8_u), rather than generate lots of extract_lane and replace_lane. Closes #50350.	2021-12-14 08:42:39 -08:00
Alexandros Lamprineas	61bb8b5d40	[AArch64] Convert sra(X, elt_size(X)-1) to cmlt(X, 0) CMLT has twice the execution throughput of SSHR on Arm out-of-order cores. Differential Revision: https://reviews.llvm.org/D115457	2021-12-14 16:03:02 +00:00
Matt Devereau	fb47725d14	[AArch64][SVE] Instcombine SDIV to ASRD Instcombine SDIV to ASRD when the third operand of SDIV is a power of 2 Differential Revision: https://reviews.llvm.org/D115448	2021-12-14 15:58:28 +00:00
Zaara Syeda	3f066ac648	Test commit	2021-12-14 15:37:28 +00:00
Simon Pilgrim	4f2e183229	[X86] combineOr - don't demand operand elements if the other operand element is 'allones' If either operand has an element with allbits set, then we don't need the equivalent element from the other operand, as allbits are guaranteed to be set.	2021-12-14 15:36:33 +00:00
Simon Pilgrim	a9d811405f	[X86] combineOr - pull out repeated SDLoc(). NFCI.	2021-12-14 15:36:32 +00:00
David Green	26f6fbe2be	[ARM] Add AddrModeT2_i8neg addressing mode support for frame lowering. As reported from a failing firefox build, we can sometimes get frame indices with negative offsets from a t2LDRi8. This adds support for them, to prevent the crash.	2021-12-14 12:49:27 +00:00
Simon Moll	6847379e89	[VE] MUL,SUB,OR,XOR v256i32\|64 isel v256i32\|i64 isel patterns and tests. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D115643	2021-12-14 13:23:48 +01:00
Alexandros Lamprineas	65033ef9e8	[AArch64] Add a tablegen pattern for UZP1. Converts concat_vectors(V64 (trunc V128), V64 (trunc V128)), which would otherwise be lowered as xtn followed by xtn2, to uzp1. Differential Revision: https://reviews.llvm.org/D115435	2021-12-14 11:51:05 +00:00
John Brawn	dc9f65be45	[AArch64][SVE] Fix handling of stack protection with SVE Fix a couple of things that were causing stack protection to not work correctly in functions that have scalable vectors on the stack: * Use TypeSize when determining if accesses to a variable are considered out-of-bounds so that the behaviour is correct for scalable vectors. * When stack protection is enabled move the stack protector location to the top of the SVE locals, so that any overflow in them (or the other locals which are below that) will be detected. Fixes: https://github.com/llvm/llvm-project/issues/51137 Differential Revision: https://reviews.llvm.org/D111631	2021-12-14 11:30:48 +00:00
Florian Hahn	ff3b085ab0	[X86] Use bundle for CALL_RVMARKER expansion. This patch updates expandCALL_RVMARKER to wrap the call, marker and objc runtime call in an instruction bundle. This ensures later passes, like machine block placement, cannot break them up. On AArch64, the instruction sequence is already wrapped in a bundle. Keeping the whole instruction sequence together is highly desirable for performance and outweighs potential other benefits from breaking the sequence up. Reviewed By: ahatanak Differential Revision: https://reviews.llvm.org/D115230	2021-12-14 10:53:22 +00:00
Craig Topper	3cda38796c	[RISCV] Add rs2 encoding to the FPUnaryOp_r and FPUnaryOp_r_frm template arguments. Instead of having unary instruction include a 'let' in their class body, add rs2val as a template parameter. Then we can use a let in FPUnaryOp_r and FPUnaryOp_r_frm. This reduces the overall verbosity of the FP files. Reviewed By: achieveartificialintelligence Differential Revision: https://reviews.llvm.org/D115537	2021-12-13 21:38:42 -08:00
Nelson Chu	10a71981e9	[RISCV] Support named opcodes in .insn directive. This patch is one of the TODO of commit, `283879793d` We build the GenericTable for these opcodes, and also extend class RISCVOpcode, to store the names of opcodes. Then we call the parseInsnDirectiveOpcode to parse the opcode filed in .insn directive. We only allow users to write the recognized opcode names, or just write the immediate values in the 7 bits range. Documentation: https://sourceware.org/binutils/docs-2.37/as/RISC_002dV_002dFormats.html Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115224	2021-12-13 20:59:33 -08:00
Simon Atanasyan	2de9338587	[MIPS] Allow i1 values for 'r' constraint in inline-asm The bug was reported in the issue #52638.	2021-12-14 01:19:34 +03:00
Neumann Hon	ffad4d777b	[z/OS] Implement prologue and epilogue generation for z/OS target. This patch adds support for prologue and epilogue generation for the z/OS target under the XPLINK64 ABI for functions with a stack size of less than 1048576 bytes (huge stack frames). Reviewed by: uweigand, Kai Differential Revision: https://reviews.llvm.org/D114457	2021-12-13 17:03:23 -05:00
Fangrui Song	a6a07a514b	[MachineOutliner] Don't outline functions starting with PATCHABLE_FUNCTION_ENTER/FENTRL_CALL MachineOutliner may outline a "patchable-function-entry" function whose body has a TargetOpcode::PATCHABLE_FUNCTION_ENTER MachineInstr. This is incorrect because the special code sequence must stay unchanged to be used at run-time. Avoid outlining PATCHABLE_FUNCTION_ENTER. While here, avoid outlining FENTRY_CALL too (which doesn't reproduce currently) to allow phase ordering flexibility. Fixes #52635 Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D115614	2021-12-13 13:24:29 -08:00
Paulo Matos	b5b5f0ac77	[WebAssembly] Lower global syms representing tables with .tabletype This patch implements a fix to recognize global symbols that represent WebAssembly appropriately and generate the necessary .tabletype directives. Reviewed By: sbc100 Differential Revision: https://reviews.llvm.org/D115511	2021-12-13 18:17:03 +01:00
Craig Topper	b18b2a01ef	[RISCV] Don't use VLMAX for start value splat in reduction lowering. The reduction instructions only reads the first element. The execution time for a splat may take longer with a larger VL. We should use the smallest VL we can. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D115536	2021-12-13 09:06:42 -08:00
Kirill Stoimenov	89577be895	[ASan] Replace IR based callbacks with shared assembly code callbacks. This change moves optimized callbacks from each .o file to compiler-rt. Reviewed By: vitalybuka, morehouse Differential Revision: https://reviews.llvm.org/D115396	2021-12-13 16:33:06 +00:00
Nikita Popov	220815a91a	[AMDGPUPerfHintAnalysis] Avoid getPointerElementType() Extract the load/store type from the instruction rather than fetching it from the pointer element type.	2021-12-13 16:48:21 +01:00
Neubauer, Sebastian	26924b57e8	[AMDGPU] Ignore special ABI registers for graphics Fixed ABI arguments are compute specific and should not be added to graphics shaders or functions, so do not try to add them. Differential Revision: https://reviews.llvm.org/D115344	2021-12-13 16:44:37 +01:00
Jay Foad	16de2c09dd	[AMDGPU] SIShrinkInstructions: sink code to where it's used. NFC.	2021-12-13 14:46:40 +00:00
Jay Foad	63681527ee	[AMDGPU] SIShrinkInstructions: remove redundant check canShrink already calls hasVALU32BitEncoding, so there is no need to call it again here.	2021-12-13 14:46:40 +00:00
Jay Foad	61f8af2657	[AMDGPU] Remove a FIXME implemented in D11061	2021-12-13 14:46:40 +00:00
Daniil Fukalov	e5c64b45be	[CostModel][AMDGPU] Fix intrinsics costs estimations. 1. Fixed costs inconsistency for llvm.fma.vXf16 instinsiscs. 2. Added tests for llvm.sadd.sat, llvm.ssub.sat, llvm.uadd.sat, llvm.usub.sat intrisics since they have special processing in cost model. 3. Minor intrisics' costs tests updat and refinement. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D115385	2021-12-13 17:17:34 +03:00
Peter Waller	921e89c59a	[SVE] Only combine (fneg (fma)) => FNMLA with nsz -(Za + Zm * Zn) != (-Za + Zm * (-Zn)) when the FMA produces a zero output (e.g. all zero inputs can produce -0 output) Add a PatFrag to check presence of nsz on the fneg, add tests which ensure the combine does not fire in the absense of nsz. See https://reviews.llvm.org/D90901 for a similar discussion on X86. Differential Revision: https://reviews.llvm.org/D109525	2021-12-13 11:33:07 +00:00
Kazushi (Jam) Marukawa	cffce86a1c	[VE] Support srel32 in symbol reference Support R_VE_SREL32 in symbol references in MC layer. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D115591	2021-12-13 20:29:17 +09:00
Matt Devereau	2e585dd91a	[AArch64][SVE] Lower vector.insert to predicated merged MOV Use predicated SEL for vector.insert instead of going through memory Differential Revision: https://reviews.llvm.org/D115259	2021-12-13 11:17:55 +00:00
Simon Moll	9feeb2fb61	[VE][NFC] Cleanup vector patterns Cleanup VE vector isel patterns and follow the downstream LLVM-VE pattern naming convention. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D115516	2021-12-13 10:12:27 +01:00
Kazu Hirata	bb6447a78c	[llvm] Use llvm::reverse (NFC)	2021-12-12 16:13:49 -08:00
Simon Pilgrim	9ad5969b5e	[X86][Atom] Fix CVT uops + port usage Fix overrides to use both ports. Update the uops counts + port usage based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner reports as well.	2021-12-12 22:57:53 +00:00
Jon Chesterfield	24b28db8cc	[amdgpu] Increase alignment of all LDS variables Currently the superalign option only increases the alignment of variables that are moved into the module.lds block. Change that to all LDS variables. Also only increase the alignment once, instead of once per function. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D115488	2021-12-12 19:30:32 +00:00
Kazu Hirata	d2377f24e1	Ensure newlines at the end of files (NFC)	2021-12-12 11:04:44 -08:00
Simon Pilgrim	41052fd699	[X86][MMX] Remove superfluous 'i' from MMX cvt opnames. NFCI. This is a very old copy+paste typo - none of these cvt ops have an immediate operand. Noticed while trying to merge MMX instructions into some existing SSE instruction scheduler instregex patterns.	2021-12-12 17:59:16 +00:00
Simon Pilgrim	0a08813cad	[X86][MMX] Remove superfluous 'i' from MMX binop opnames. NFCI. This is a very old copy+paste typo - none of these binops have an immediate operand. Noticed while trying to merge MMX instructions into some existing SSE instruction scheduler instregex patterns.	2021-12-12 17:59:16 +00:00
Kazu Hirata	483499670e	[Target] Use llvm::reverse (NFC)	2021-12-12 08:34:24 -08:00
Simon Pilgrim	c02f9791c6	[X86][AVX512] Remove xmm->xmm vpmovsx/vpmovzx rm overrides The XMM evex cases have the same behaviour as the SSE41 versions, which already uses WriteShuffleX.Folded	2021-12-12 16:08:10 +00:00
Simon Pilgrim	fc02ceb12a	[X86][AVX512] Use WriteShuffleX for xmm->xmm extensions The XMM evex cases have the same behaviour as the SSE41 versions, which already uses WriteShuffleX	2021-12-12 15:22:32 +00:00
Simon Pilgrim	8e833d081b	[X86][AVX512] Use WriteVPMOV256 sched class for all truncations/extensions. At the moment these are identical to WriteShuffle256 (which we were using), but it should be WriteVPMOV256 to match the AVX2 instruction, plus it will help us remove some unnecessary overrides by tweaking the WriteVPMOV256 class. Also, as D115547 shows, we still need to split off 128-bit extensions/truncations sched classes to remove some other overrides.	2021-12-12 13:24:26 +00:00
Kazu Hirata	67aeae0138	[llvm] Use range-based for loops (NFC)	2021-12-11 22:34:07 -08:00
Kazu Hirata	c2bb9637d9	Use llvm::any_of and llvm::all_of (NFC)	2021-12-11 11:54:37 -08:00
Kazu Hirata	36b8a4f9f3	[llvm] Use llvm::is_contained (NFC)	2021-12-11 11:42:09 -08:00
Kazu Hirata	d395befa65	[llvm] Use range-based for loops (NFC)	2021-12-11 11:29:12 -08:00
Matt Arsenault	6bcf1f9181	AMDGPU: Indicate pessimistic fixpoint for entry functions There aren't going to be any callers for these, so avoid running through the machinery to look at the callers.	2021-12-11 11:42:34 -05:00
Kirill Stoimenov	a55e51f9a6	Revert "[ASan] Replace IR based callbacks with shared assembly code callbacks." This reverts commit `db124df495`. Broke some builds: https://lab.llvm.org/buildbot/#/builders/98/builds/9895 https://lab.llvm.org/buildbot/#/builders/91/builds/434 Reviewed By: kstoimenov Differential Revision: https://reviews.llvm.org/D115564	2021-12-11 00:42:18 +00:00
Matt Arsenault	06b90175e7	AMDGPU: Remove fixed function ABI option	2021-12-10 19:41:19 -05:00
Jon Chesterfield	86caf517bf	Revert "[amdgpu][nfc] Delete dead code in LowerModuleLDS" This reverts commit `7b9ab06d10`. Said code is better removed as part of a larger change.	2021-12-11 00:31:51 +00:00
Kirill Stoimenov	db124df495	[ASan] Replace IR based callbacks with shared assembly code callbacks. This change moves optimized callbacks from each .o file to compiler-rt. Reviewed By: vitalybuka, morehouse Differential Revision: https://reviews.llvm.org/D115396	2021-12-11 00:02:32 +00:00
Bogdan Graur	ea81cea816	Revert "X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr" This reverts commit `847a680733`. The reverted revision was causing miscompiles that manifest on AMD machines. Differential Revision: https://reviews.llvm.org/D115528	2021-12-10 23:01:24 +01:00
Craig Topper	80ed2f6b36	[RISCV] Share tablegen classes for F, D, and Zfh. Other simplifications. NFC By adding the register class and funct as template parameters we can share the classes with all 3 extensions. I've used "let SchedRW =" to avoid repeating scheduler classes on multiple lines where we previously inherited from the Sched class. A subsequent patch will add mayRaiseFPException and FRM dependencies. Reducing the number of classes means less repeating for those changes. This of course conflicts with the Zfinx patch D93298. Reviewed By: achieveartificialintelligence Differential Revision: https://reviews.llvm.org/D115469	2021-12-10 09:35:51 -08:00
Craig Topper	5861cf77da	[RISCV] Remove FCSR from RISCVRegisterInfo. We only used this to mark it as a reserved register. But that's not important if we don't do anything else with it. I think if we were ever to do anything with it, we would need to model it as a super register of FRM and FFLAGS. But it might be easier to reference both FRM and FFLAGS in implicit defs/uses for anything we were to do with "fcsr". Reviewed By: sepavloff Differential Revision: https://reviews.llvm.org/D115455	2021-12-10 09:24:13 -08:00
Kazu Hirata	a93b1792f1	[AArch64] Fix a warning This patch fixes: llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp:315:7: error: unused variable 'ValTy' [-Werror,-Wunused-variable]	2021-12-10 08:33:07 -08:00
Archibald Elliott	52faad83c9	[AArch64] Use Feature for A53 Erratum 835769 Fix When this pass was originally implemented, the fix pass was enabled using a llvm command-line flag. This works fine, except in the case of LTO, where the flag is not passed into the linker plugin in order to enable the function pass in the LTO backend. Now LTO exists, the expectation now is to use target features rather than command-line arguments to control code generation, as this ensures that different command-line arguments in different files are correctly represented, and target-features always get to the LTO plugin as they are encoded into LLVM IR. The fall-out of this change is that the fix pass has to always be added to the backend pass pipeline, so now it makes no changes if the function does not have the right target feature to enable it. This should make a minimal difference to compile time. One advantage is it's now much easier to enable when compiling for a Cortex-A53, as CPUs imply their own individual sets of target-features, in a more fine-grained way. I haven't done this yet, but it is an option, if the fix should be enabled in more places. Existing tests of the user interface are unaffected, the changes are to reflect that the argument is now turned into a target feature. Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D114703	2021-12-10 15:09:59 +00:00
Min-Yih Hsu	d2b68c4476	[M68k][NFC] Fixed unused argument warnings in M68kInstrControl.td Removed those unused template arguments. NFC.	2021-12-10 22:06:29 +08:00
Brian Cain	1e68c79987	Reapply [xray] add support for hexagon Adds x-ray support for hexagon to llvm codegen, clang driver, compiler-rt libs. Differential Revision: https://reviews.llvm.org/D113638 Reapplying this after `543a9ad7c4`, which fixes the leak introduced there.	2021-12-10 05:32:28 -08:00
Christudasan Devadasan	cf58b9ce98	[AMDGPU] Add AV class spill pseudo instructions While enabling vector superclasses with D109301, the AV spills are converted into VGPR spills by introducing appropriate copies. The whole thing ended up adding two instructions per spill (a copy + vgpr spill pseudo) and caused an incorrect liverange update during inline spiller. This patch adds the pseudo instructions for all AV spills from 32b to 1024b and handles them in the way all other spills are lowered. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D115439	2021-12-10 03:10:34 -05:00
eopXD	a4bf1b449d	[RISCV] Unify depedency check and extension implication parsing logics Originially there are two places that does parsing - `parseArchString` and `parseFeatures`, each with its code on dependency check and implication. This patch extracts common parts of the two as functions of `RISCVISAInfo` and let them 2 use it. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D112359	2021-12-09 21:16:04 -08:00
Kazu Hirata	f829630d2e	[llvm] Use llvm::count (NFC)	2021-12-09 20:50:38 -08:00
Amara Emerson	98095afbcb	[AArch64][GlobalISel] Split vector stores of zero. This results in a very minor improvement in most cases, generating stores of xzr instead of moving zero to a vector register. Differential Revision: https://reviews.llvm.org/D115479	2021-12-09 19:04:48 -08:00
Phoebe Wang	d7c07f60b3	[X86][MS-InlineAsm] Make the constraint m to be simple place holder D113096 solved the "undefined reference to xxx" issue by adding constraint m for the global var. But it has strong side effect due to the symbol in the assembly being replaced with constraint variable. This leads to some lowering fails. https://godbolt.org/z/h3nWoerPe This patch fix the problem by use the constraint *m as place holder rather than real constraint. It has negligible effect for the existing code generation. Reviewed By: skan Differential Revision: https://reviews.llvm.org/D115225	2021-12-10 09:29:38 +08:00
Jon Chesterfield	7b9ab06d10	[amdgpu][nfc] Delete dead code in LowerModuleLDS	2021-12-10 00:43:46 +00:00
Jessica Paquette	afdec434d4	[AArch64][GlobalISel] Add regbankselect support for G_FMAXIMUM/G_FMINIMUM These always use FPRs only. Differential Revision: https://reviews.llvm.org/D115376	2021-12-09 12:52:32 -08:00
Craig Topper	b3db7dde79	[TargetInstrInfo][PowerPC] Remove virtual function that is only called from PPC specific code. There are two signatures of setSpecialOperandAttr in TargetInstrInfo. One of them is only called from PPCInstrInfo which has an override of it. Remove it from TargetInstrInfo and make it a non-virtual method in PPCInstrInfo. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D115404	2021-12-09 12:51:27 -08:00
Jessica Paquette	47e1f672e1	[AArch64][GlobalISel] Legalize scalar G_FMAXIMUM + G_FMINIMUM Necessary for implementing some combines on floating point selects. Differential Revision: https://reviews.llvm.org/D115372	2021-12-09 11:54:14 -08:00
Craig Topper	6f7de819b9	[RISCV] Use MULHU for more division by constant cases. D113805 improved handling of i32 divu/remu on RV64. The basic idea from that can be extended to (mul (and X, C2), C1) where C2 is any mask constant. We can replace the and with an SLLI by shifting by the number of leading zeros in C2 if we also shift C1 left by XLen - lzcnt(C1) bits. This will give the full product XLen additional trailing zeros, putting the result in the output of MULHU. If we can't use ANDI, ZEXT.H, or ZEXT.W, this will avoid materializing C2 in a register. The downside is it make take 1 additional instruction to create C1. But since that's not on the critical path, it can hopefully be interleaved with other operations. The previous tablegen pattern is replaced by custom isel code. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D115310	2021-12-09 09:10:14 -08:00
Jon Chesterfield	04b2f6ea8a	[amdgpu][nfc] Drop dead PtrSet, fix a comment	2021-12-09 17:05:20 +00:00
Arthur Eubanks	1172712f46	[NFC] Replace some deprecated getAlignment() calls with getAlign() Reviewed By: gchatelet Differential Revision: https://reviews.llvm.org/D115370	2021-12-09 08:43:19 -08:00
David Sherwood	8b0448ce5d	[AArch64][Analysis] Add on overhead costs for SVE gathers and scatters This patch adds on an overhead cost for gathers and scatters, which is a rough estimate based on performance investigations I have performed on SVE hardware for various micro-benchmarks. Differential Revision: https://reviews.llvm.org/D115143	2021-12-09 16:02:59 +00:00

... 2 3 4 5 6 ...

65490 Commits