These instructions read their inputs from fixed registers rather
than using a modrm byte. We shouldn't require the user to list them
when parsing assembly. This matches the GNU assembler.
This patch adds InstAliases so we can accept either form. It also
changes the printing code to use the form without registers. This
will change the behavior of llvm-objdump, but should be consistent
with binutils objdump. This also matches what we already do in LLVM for
clzero and monitorx, which also use fixed registers.
I need to add and improve tests before this can be committed. The
disassembler tests exist, but weren't checking the fixed register,
so they pass both before and after this change.
Fixes https://github.com/ClangBuiltLinux/linux/issues/1216
Differential Revision: https://reviews.llvm.org/D93524
Define vlxe/vsxe intrinsics and lower to vlxei<EEW>/vsxei<EEW>
instructions.
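As an IR-level sketch (the intrinsic name mangling, pointer types and
operand order here are assumptions for illustration and may differ
from the final definitions):
; indexed load: gathers elements from %base addressed by the %index vector
%v = call <vscale x 1 x i32> @llvm.riscv.vlxe.nxv1i32.nxv1i32(<vscale x 1 x i32>* %base, <vscale x 1 x i32> %index, i64 %vl)
; with a 32-bit index EEW this would lower to vlxei32.v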
We worked with @rogfer01 from BSC to develop this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Differential Revision: https://reviews.llvm.org/D93471
As a linker is allowed to clobber r12 on function calls, the code
transformation that hardens indirect calls is not correct if a linker
does so. Similarly, the transformation is not correct when register lr
is used.
This patch makes sure that r12 or lr are not used for indirect calls
when harden-sls-blr is enabled.
Differential Revision: https://reviews.llvm.org/D92469
To make sure that no barrier gets placed on the architectural execution
path, each indirect call through register rN gets transformed into a
direct call to __llvm_slsblr_thunk_mode_rN, where mode is either arm or
thumb, depending on the mode in which the indirect call happens.
The __llvm_slsblr_thunk_mode_rN thunk contains:
bx rN
<speculation barrier>
Therefore, the indirect call gets split into two: one direct call and
one indirect jump.
This transformation results in not inserting a speculation barrier on
the architectural execution path.
The mitigation is off by default and can be enabled by the
harden-sls-blr subtarget feature.
As a linker is allowed to clobber r12 on function calls, the above
code transformation is not correct if a linker does so. Similarly,
the transformation is not correct when register lr is used. Making
sure that r12 and lr are not used is done in a follow-on patch, to
make reviewing this code easier.
Differential Revision: https://reviews.llvm.org/D92468
The only non-trivial consideration in this patch is that the formation
of TBB/TBH instructions, which is done in the constant island pass, does
not understand the speculation barriers inserted by the SLSHardening
pass. As such, when harden-sls-retbr is enabled for a function, the
formation of TBB/TBH instructions in the constant island pass is
disabled.
Differential Revision: https://reviews.llvm.org/D92396
Some processors may speculatively execute the instructions immediately
following indirect control flow, such as returns, indirect jumps and
indirect function calls.
To avoid a potential mis-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every indirect control flow
instruction where control flow doesn't return to the next instruction,
such as returns and indirect jumps, but not indirect function calls.
Hardening of indirect function calls will be done in a later,
independent patch.
This patch implements the same functionality as the AArch64
counterpart implemented in https://reviews.llvm.org/D81400.
For AArch64, returns and indirect jumps only occur on RET and BR
instructions and hence the function attribute to control the hardening
is called "harden-sls-retbr" there. On AArch32, there is a much wider
variety of instructions that can trigger an indirect unconditional
control flow change. I've decided to stick with the name
"harden-sls-retbr" as introduced for the corresponding AArch64
mitigation.
This patch implements this for ARM mode. A future patch will extend this
to also support Thumb mode.
The inserted barriers are never on the correct, architectural execution
path, so the performance overhead of this is expected to be low.
To ensure these barriers are never on an architecturally executed path,
when the harden-sls-retbr function attribute is present, indirect
control flow is never conditionalized/predicated.
On targets that implement the Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by an ISB is emitted to act as a
speculation barrier.
These speculation barriers are implemented as pseudo instructions to
prevent later passes from analyzing them and potentially removing them.
The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.
Differential Revision: https://reviews.llvm.org/D92395
Support VM and VMP registers in the copyPhysReg() function. Also add
regression tests.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D93547
We need to make sure not to emit R_X86_64_GOTPCRELX relocations for
instructions that use a REX prefix. If a REX prefix is present, we need to
instead use a R_X86_64_REX_GOTPCRELX relocation. The existing logic for
CALL64m, JMP64m, etc. already handles this by checking the HasREX parameter
and using it to determine which relocation type to use. Do this for all
instructions that can use relaxed relocations.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D93561
To support OpenCL, which typically uses SPIR as an IR, non-zero address
spaces must be accounted for. This patch makes the RISC-V target assume
no-op address space casts across the board, which effectively removes
the need to support addrspacecast instructions in the backend.
For a RISC-V implementation with different configurations or specialized
address spaces where casts aren't no-ops, the function can be adjusted
as required.
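For example, with this change a cast like the following is treated as
a no-op by the backend (address space 1 chosen purely for
illustration):
%q = addrspacecast i32 addrspace(1)* %p to i32*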
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D93536
This patch adds two IR intrinsics for vsetvli instruction. One to set the vector length to a user specified value and one to set it to vlmax. The vlmax uses the X0 source register encoding.
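As a sketch of their use (the exact mangling and the operand
encodings here, SEW=32 encoded as 2 and LMUL=1 encoded as 0, are
assumptions):
; set vl to min(%avl, VLMAX)
%vl = call i64 @llvm.riscv.vsetvli.i64(i64 %avl, i64 2, i64 0)
; set vl to VLMAX, using the X0 source register encoding
%vlmax = call i64 @llvm.riscv.vsetvlimax.i64(i64 2, i64 0)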
Clang builtins will follow in a separate patch
Differential Revision: https://reviews.llvm.org/D92973
The default behavior for any_extend of a constant is to zero extend.
This occurs inside getNode rather than letting type legalization
promote the constant, which would sign extend it. By using sign_extend
with getNode, the constant is sign extended instead. This gives isel a
better chance to find a simm5 immediate, since all xlen bits are
examined there.
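A hand-written sketch of the kind of case this helps (RV64, SEW=8):
%head = insertelement <vscale x 4 x i8> undef, i8 -5, i32 0
%splat = shufflevector <vscale x 4 x i8> %head, <vscale x 4 x i8> undef, <vscale x 4 x i32> zeroinitializer
%y = add <vscale x 4 x i8> %x, %splat
; when -5 is promoted to an xlen-wide constant, zero extension yields
; 251 (not a simm5), while sign extension yields -5, letting isel pick
; vadd.vi with a simm5 immediate.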
For instructions that use a uimm5 immediate, this change only affects
constants >= 128 for i8 or >= 32768 for i16. Constants that large
already wouldn't have been eligible for uimm5 and would need to use a
scalar register.
If the instruction isn't able to use simm5 or the immediate is
too large, we'll need to materialize the immediate in a register.
As far as I know, constants with all 1s in the upper bits should
materialize as well as or better than all 0s.
Longer term we should probably have a SEW aware PatFrag to ignore
the bits above SEW before checking simm5.
I updated about half the test cases in some tests to use a negative
constant to get coverage for this.
Reviewed By: evandro
Differential Revision: https://reviews.llvm.org/D93487
This time with tests.
Original message:
Similar to D93365, but for floating point. No need for special ISD opcodes
though. We can directly isel these from intrinsics. I had to use anyfloat_ty
instead of anyvector_ty in the intrinsics to make LLVMVectorElementType not
crash when imported into the -gen-dag-isel tablegen backend.
Differential Revision: https://reviews.llvm.org/D93426
Similar to D93365, but for floating point. No need for special ISD opcodes
though. We can directly isel these from intrinsics. I had to use anyfloat_ty
instead of anyvector_ty in the intrinsics to make LLVMVectorElementType not
crash when imported into the -gen-dag-isel tablegen backend.
Differential Revision: https://reviews.llvm.org/D93426
This adds intrinsics for vmv.x.s and vmv.s.x.
I've used stricter type constraints on these intrinsics than what we've been doing on the arithmetic intrinsics so far. This allows us to avoid passing the scalar type to the Intrinsic::getDeclaration call when creating these intrinsics.
A custom ISD is used for vmv.x.s in order to implement the change in computeNumSignBitsForTargetNode which can remove sign extends on the result.
I also modified the MC layer description of these instructions to show the tied source/dest operand. This is different from what we do for masked instructions, where we drop the tied source operand when converting to MC. But it is a more accurate description of the instruction. We can't do this for masked instructions since we use the same MC instruction for masked and unmasked. Tools like llvm-mca operate in the MC layer and rely on ins/outs and Uses/Defs for analysis, so I don't know if we'll be able to maintain the current behavior for masked instructions. So I went with the accurate description here since it was easy.
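A sketch of how these might be used (the exact signatures, in
particular the merge and vl operands on vmv.s.x, are assumptions):
; read element 0 of the vector into a scalar
%s = call i64 @llvm.riscv.vmv.x.s.nxv1i64(<vscale x 1 x i64> %vec)
; write a scalar into element 0, preserving the other elements
%r = call <vscale x 1 x i64> @llvm.riscv.vmv.s.x.nxv1i64(<vscale x 1 x i64> %merge, i64 %scalar, i64 %vl)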
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D93365
We worked with @rogfer01 from BSC to develop this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Craig Topper <craig.topper@sifive.com>
Differential Revision: https://reviews.llvm.org/D93514
MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
vmov q0[2], q0[0], r2, r0
vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.
This patch adds some tablegen patterns for them, selecting from
vector insert element nodes. Because the insert_elements are known to
be canonicalized to ascending order, there are several patterns that
we need to select. These lane indices are:
3 2 1 0 -> vmovqrr 31; vmovqrr 20
3 2 1 -> vmovqrr 31; vmov 2
3 1 -> vmovqrr 31
2 1 0 -> vmovqrr 20; vmov 1
2 0 -> vmovqrr 20
With the top one being the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.
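For example, the most common 3 2 1 0 case corresponds to an IR insert
chain along these lines (a sketch):
%v0 = insertelement <4 x i32> undef, i32 %a, i32 0
%v1 = insertelement <4 x i32> %v0, i32 %b, i32 1
%v2 = insertelement <4 x i32> %v1, i32 %c, i32 2
%v3 = insertelement <4 x i32> %v2, i32 %d, i32 3
; selected as: vmov q0[2], q0[0], r2, r0  then  vmov q0[3], q0[1], r3, r1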
This is a recommit of 6cc3d80a84 after
fixing the backward instruction definitions.
This adds support for the 'ls64' AArch64 extension to the `.arch_extension`
asm directive.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D92574
Followup to D92645 - remove the remaining places where we create X86ISD::SUBV_BROADCAST, and fold splatted vector loads to X86ISD::SUBV_BROADCAST_LOAD instead.
Remove all the X86SubVBroadcast isel patterns, including all the fallbacks for if memory folding failed.
Redundant Copy Elimination was eliminating a MOVi32imm -1 when it
determined that the value of the destination register is already -1.
However, it didn't take into account that the MOVi32imm zeroes the
upper 32 bits, which currently hold 0xFFFFFFFF, and therefore the
instruction cannot be eliminated.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D93100
During isel there's no need to protect illegal types. The patch also
adds a missing unit test for the tbl2 intrinsic using bfloat types.
Differential Revision: https://reviews.llvm.org/D93404
This patch extends LowerMGATHER/MSCATTER to make use of the vector + reg/immediate
addressing modes for scalable masked gathers & scatters.
selectGatherScatterAddrMode checks if the base pointer is null, in which case
we can swap the base pointer and the index, e.g.
getelementptr nullptr, <vscale x N x T> (splat(%offset) + %indices)
-> getelementptr %offset, <vscale x N x T> %indices
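In IR terms, a sketch of the pattern handled (the types and intrinsic
mangling are chosen here purely for illustration):
%ins = insertelement <vscale x 4 x i64> undef, i64 %offset, i32 0
%splat = shufflevector <vscale x 4 x i64> %ins, <vscale x 4 x i64> undef, <vscale x 4 x i32> zeroinitializer
%idx = add <vscale x 4 x i64> %splat, %indices
%ptrs = getelementptr i32, i32* null, <vscale x 4 x i64> %idx
%v = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0i32(<vscale x 4 x i32*> %ptrs, i32 4, <vscale x 4 x i1> %mask, <vscale x 4 x i32> undef)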
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D93132
This is an addition to the existing Statistical Profiling extension, which
introduces an extra system register that is enabled by the new 'spe-eef'
subtarget feature.
Patch written by Simon Tatham.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D92391
This introduces asm support for the Branch Record Buffer extension, through
the new 'brbe' subtarget feature. It consists of a new set of system registers
that enable the handling of branch records.
Patch written by Simon Tatham.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D92389
This is split off from D91718 and adds a new target hook
supportsScalableVectors that can be queried to check if scalable vectors
are supported by the backend. For AArch64 this returns true if SVE is
enabled.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D93060
The LD/STD-like instructions are selected only when the alignment of
the load/store is >= 4, to deal with the case that the offset might
not be known (i.e. relocations). That means we have to select the
X-Form load for
%0 = load i64, i64* %arrayidx, align 2
In fact, we can still select the D-Form load if the offset is known.
So we only query the load/store alignment when we don't know whether
the offset is a multiple of 4.
Reviewed By: jji, Nemanjai
Differential Revision: https://reviews.llvm.org/D93099
We worked with @rogfer01 from BSC to develop this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Co-Authored-by: Monk Chiang <monk.chiang@sifive.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93366
Define vlse/vsse intrinsics and lower to V instructions.
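As an IR-level sketch (the pointer types and signature details here
are assumptions and may differ from the final definitions):
; strided load: vl elements from %base, consecutive elements %stride bytes apart
%v = call <vscale x 1 x i32> @llvm.riscv.vlse.nxv1i32(<vscale x 1 x i32>* %base, i64 %stride, i64 %vl)
; strided store of the same shape
call void @llvm.riscv.vsse.nxv1i32(<vscale x 1 x i32> %v, <vscale x 1 x i32>* %base, i64 %stride, i64 %vl)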
We worked with @rogfer01 from BSC to develop this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93445
On PPC, the vector pair instructions are independent from MMA.
This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix with _vsx_ in their names.
We also move the vector pair type/intrinsic/builtin tests to their own files.
Differential Revision: https://reviews.llvm.org/D91974
The PPCCTRLoop pass has been moved to HardwareLoops, so the related
comments and some now-useless code are obsolete.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D93336
This extends the command-line support for the 'armv8.7-a' architecture
name to the ARM target.
Based on a patch written by Momchil Velikov.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D93231
This introduces command-line support for the 'armv8.7-a' architecture name
(and an alias without the '-', as usual), and for the 'ls64' extension name.
Based on patches written by Simon Tatham.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D91776
This adds support for the v8.7-A LD64B/ST64B Accelerator extension
through a subtarget feature called "ls64". It adds four 64-byte
load/store instructions with an operand in the new GPR64x8 register
class, and one system register that's part of the same extension.
Based on patches written by Simon Tatham.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D91775
This adds a GPR64x8 register class that will be needed as the data
operand to the LD64B/ST64B family of instructions in the v8.7-A
Accelerator Extension, which load or store a contiguous range of eight
x-regs. It has to be its own register class so that register allocation
will have visibility of the full set of registers actually read/written
by the instructions, which will be needed when we add intrinsics and/or
inline asm access to this piece of architecture.
Patch written by Simon Tatham.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D91774
This introduces support for the v8.7-A architecture through a new
subtarget feature called "v8.7a". It adds two new "WFET" and "WFIT"
instructions, the nXS limited-TLB-maintenance qualifier for DSB and TLBI
instructions, a new CPU id register, ID_AA64ISAR2_EL1, and the new
HCRX_EL2 system register.
Based on patches written by Simon Tatham and Victor Campos.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D91772
This enables the capturing of multiple required features in the AArch64
AsmParser's SysAlias error messages.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D92388
This removes the general forms of the AArch64 MSR and MRS instructions
from the same decoding table that contains many more specific
instructions that supersede them. They're now in a separate decoding
table of their own, called "Fallback", which is only consulted in the
event of the main decoder table failing to produce an answer.
This should avoid decoding conflicts on future specialized instructions
in the MSR space.
Patch written by Simon Tatham.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D91771
This was needed in an earlier version of D92645, but isn't now - and I've just noticed that it was potentially flawed depending on the relevant widths of the broadcasted and extracted subvectors.
Subvector broadcasts are only load instructions, yet X86ISD::SUBV_BROADCAST treats them more generally, requiring a lot of fallback tablegen patterns.
This initial patch replaces constant vector lowering inside lowerBuildVectorAsBroadcast with direct X86ISD::SUBV_BROADCAST_LOAD loads which helps us merge a number of equivalent loads/broadcasts.
As well as general plumbing/analysis additions for SUBV_BROADCAST_LOAD, I needed to wrap SelectionDAG::makeEquivalentMemoryOrdering so it can handle result chains from non-generic LoadSDNode nodes.
Later patches will continue to replace X86ISD::SUBV_BROADCAST usage.
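For reference, a sketch of the kind of subvector broadcast involved
(the instruction chosen depends on the target's available features):
%sub = load <2 x double>, <2 x double>* %p, align 16
%bcst = shufflevector <2 x double> %sub, <2 x double> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
; with AVX this can become a single vbroadcastf128 load via
; X86ISD::SUBV_BROADCAST_LOAD instead of a separate load + broadcast.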
Differential Revision: https://reviews.llvm.org/D92645
X86 and AArch64 expand it as a libcall inside the target, and PowerPC
also wants to expand it as a libcall for P8. So this patch proposes an
implementation in the legalizer to common up the logic, and removes
the X86/AArch64 code to avoid duplication.
Reviewed By: Craig Topper
Differential Revision: https://reviews.llvm.org/D91331