llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	28124a0a63	AMDGPU/GlobalISel: Stop using G_EXTRACT in argument lowering We really need to put this undef padding stuff into a helper somewhere, but leave that for when this is moved to generic code.	2020-08-06 09:55:35 -04:00
Matt Arsenault	37894ba661	AMDGPU/GlobalISel: Make s16 phi legal If we were to have an operation with an s16 def that needs to be executed in a waterfall loop, not having s16 legal would place an avoidable burden on RegBankSelect to widen it.	2020-08-06 09:41:14 -04:00
Matt Arsenault	5316256709	AMDGPU/GlobalISel: Fix assert on copy to vcc This was trying to constrain a physical register. By the verifier's understanding, it's impossible to have a 1-bit copy to vcc/vcc_lo so don't try to handle physregs.	2020-08-06 09:41:14 -04:00
Petar Avramovic	d893278bba	[GlobalISel][InlineAsm] Fix matching input constraint to physreg Add given input and mark it as tied. Doesn't create additional copy compared to matching input constraint to virtual register. Differential Revision: https://reviews.llvm.org/D85122	2020-08-06 14:35:51 +02:00
Simon Pilgrim	13b4db4ec2	[X86][SSE] Expose all memory offsets in expand load tests Since we're messing with individual element loads we need to expose this to show whats going on. Part of the work to fix the masked_expandload.ll regressions in D66004	2020-08-06 11:28:42 +01:00
Paul Walker	0d33a8ef5b	[SVE] Lower scalable vector mul operations. This allows us to remove extra patterns from AArch64SVEInstrInfo.td because we can reuse those required for fixed length vectors. Differential Revision: https://reviews.llvm.org/D85328	2020-08-06 11:15:35 +01:00
Paul Walker	3ed59b775d	[SVE] Implement lowering for fixed length vector multiplication. NOTE: Also uses SVE code generation for NEON size vectors, instead of expanding i64 based vector multiplications. Differential Revision: https://reviews.llvm.org/D85327	2020-08-06 11:01:39 +01:00
Ruiling Song	5ddc8b49ba	[AMDGPU] add buffer_atomic_swap for float The functionality is used when calling imageAtomicExhange() on float type imageBuffer in Graphics shaders. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85187	2020-08-06 09:45:48 +08:00
Craig Topper	978165bf02	[X86] Rename mod128.ll to divmod128.ll and add test cases for sdiv/udiv/urem. This improves code coverage on the switch in LowerWin64_i128OP.	2020-08-05 16:04:00 -07:00
Craig Topper	08b2d0a963	[X86] Disable copy elision in LowerMemArgument for scalarized vectors when the loc VT is a different size than the original element. For example a v4f16 argument is scalarized to 4 i32 values. So the values are spread out instead of being packed tightly like in the original vector. Fixes PR47000.	2020-08-05 15:44:54 -07:00
Craig Topper	13796d1423	[X86] Add test case for PR47000. NFC	2020-08-05 15:44:53 -07:00
Stanislav Mekhanoshin	0bcda1a261	[AMDGPU] Scavenge temp reg for AGPR spill Differential Revision: https://reviews.llvm.org/D85234	2020-08-05 13:29:19 -07:00
Rahman Lavaee	20a568c29d	[Propeller]: Use a descriptive temporary symbol name for the end of the basic block. This patch changes the functionality of AsmPrinter to name the basic block end labels as LBB_END${i}_${j}, with ${i} being the identifier for the function and ${j} being the identifier for the basic block. The new naming scheme is consistent with how basic block labels are named (.LBB${i}_{j}), and how function end symbol are named (.Lfunc_end${i}) and helps to write stronger tests for the upcoming patch for BB-Info section (as proposed in https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html). The end label is used with basicblock-labels (BB-Info section in future) and basicblock-sections to compute the size of basic blocks and basic block sections, respectively. For BB sections, the section containing the entry basic block will not have a BB end label since it already gets the function end-label. This label is cached for every basic block (CachedEndMCSymbol) like the label for the basic block (CachedMCSymbol). Differential Revision: https://reviews.llvm.org/D83885	2020-08-05 13:17:19 -07:00
Matt Arsenault	ec8c172d01	AMDGPU: Correct prolog SP initialization logic Having callees that will read SP is not the only reason we need to reference the stack pointer.	2020-08-05 15:47:53 -04:00
Stanislav Mekhanoshin	ea7d0e2996	[AMDGPU] gfx1031 target Differential Revision: https://reviews.llvm.org/D85337	2020-08-05 12:36:26 -07:00
Matt Arsenault	b1dac0cfcd	AMDGPU: Remove leftover test	2020-08-05 14:43:21 -04:00
Matt Arsenault	3e52667433	AMDGPU: Fix verifier error with undef source producing s_bitset* This needs to preserve the undef flag.	2020-08-05 14:42:20 -04:00
Simon Pilgrim	b60f998859	[X86][SSE] Fold 128-bit PACK(EXTEND(X),EXTEND(Y)) -> CONCAT(X,Y) subvectors This is seen in the sub-128-bit vector trunc(ext()) of comparison results Fixes pr46585.ll regression in D66004	2020-08-05 18:27:40 +01:00
Simon Pilgrim	a57bfb44bc	[X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for integer types	2020-08-05 15:09:51 +01:00
Denis Antrushin	d21ce40821	[Statepoints] Operand folding in presense of tied registers. Implement proper folding of statepoint meta operands (deopt and GC) when statepoint uses tied registers. For deopt operands it is just about properly preserving tiedness in new instruction. For tied GC operands folding is a little bit more tricky. We can fold tied GC operands only from InlineSpiller, because it knows how to properly reload tied def after it was turned into memory operand. Other users (e.g. peephole) cannot properly fold such operands as they do not know how (or when) to reload them from memory. We do this by un-tieing operand we want to fold in InlineSpiller and allowing to fold only untied operands in foldPatchpoint.	2020-08-05 20:18:28 +07:00
Roman Lebedev	f5df5cd558	Recommit "[InstCombine] Negator: -(X << C) --> X * (-1 << C)" This reverts commit `ac70b37a00` which reverted commit `8aeb2fe13a` because codegen tests got broken and i needed time to investigate. This shows some regressions in tests, but they are all around GEP's, so i'm not really sure how important those are. https://rise4fun.com/Alive/1Gn	2020-08-05 15:59:13 +03:00
Simon Pilgrim	300899b9c4	[X86][AVX] Add test showing unnecessary duplicate HADD instructions Taken from internal fuzz test	2020-08-05 12:00:27 +01:00
Paul Walker	927fc536ca	[SVE] Add lowering for fixed length vector and, or & xor operations. Since there are no ill effects when performing these operations with undefined elements, they are lowered to the already supported unpredicated scalable vector equivalents. Differential Revision: https://reviews.llvm.org/D85117	2020-08-05 11:28:34 +01:00
Simon Pilgrim	4aaf301fb8	[DAG] Fold vector (aext (load x)) -> (zext (truncate (zextload x))) We currently don't do anything to fold any_extend vector loads as no target has such an instruction. Instead I've added support for folding to a zextload, SimplifyDemandedBits does a good job of adjusting the zext(truncate(()) stages as required later on. We still need the custom scalar extload handling instead of using the tryToFoldExtOfLoad helper as it has different legality tests - we can probably tweak that to reduce most of the code duplication. Fixes the regression I mentioned in rG99a971cadff7 Differential Revision: https://reviews.llvm.org/D85129	2020-08-05 11:22:23 +01:00
Sander de Smalen	f2916636f8	[AArch64][SVE] Disable tail calls if callee does not preserve SVE regs. This fixes an issue triggered by the following code, where emitEpilogue got confused when trying to restore the SVE registers after the call, whereas the call to bar() is implemented as a TCReturn: int non_sve(); int sve(svint32_t x) { return non_sve(); } Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84869	2020-08-05 09:38:54 +01:00
Jay Foad	8cbf4a17ac	[AMDGPU] Propagate fast math flags in frem lowering Differential Revision: https://reviews.llvm.org/D84518	2020-08-05 09:09:38 +01:00
Jay Foad	1bb07e1b91	[AMDGPU] Precommit tests for D84518 Propagate fast math flags in frem lowering	2020-08-05 09:09:02 +01:00
Jay Foad	04cf4a5a65	[AMDGPU] Lower frem f16 Without this it would fail to select on subtargets that have 16-bit instructions. Differential Revision: https://reviews.llvm.org/D84517	2020-08-05 09:08:40 +01:00
Yonghong Song	00602ee7ef	BPF: simplify IR generation for __builtin_btf_type_id() This patch simplified IR generation for __builtin_btf_type_id(). For __builtin_btf_type_id(obj, flag), previously IR builtin looks like if (obj is a lvalue) llvm.bpf.btf.type.id(obj.ptr, 1, flag) !type else llvm.bpf.btf.type.id(obj, 0, flag) !type The purpose of the 2nd argument is to differentiate __builtin_btf_type_id(obj, flag) where obj is a lvalue vs. __builtin_btf_type_id(obj.ptr, flag) Note that obj or obj.ptr is never used by the backend and the `obj` argument is only used to derive the type. This code sequence is subject to potential llvm CSE when - obj is the same .e.g., nullptr - flag is the same - metadata type is different, e.g., typedef of struct "s" and strust "s". In the above, we don't want CSE since their metadata is different. This patch change IR builtin to llvm.bpf.btf.type.id(seq_num, flag) !type and seq_num is always increasing. This will prevent potential llvm CSE. Also report an error if the type name is empty for remote relocation since remote relocation needs non-empty type name to do relocation against vmlinux. Differential Revision: https://reviews.llvm.org/D85174	2020-08-04 16:29:42 -07:00
Arthur Eubanks	f50b3ff02e	[Hexagon] Use InstSimplify instead of ConstantProp This is the last remaining use of ConstantProp, migrate it to InstSimplify in the goal of removing ConstantProp. Add -hexagon-instsimplify option to enable skipping of instsimplify in tests that can't handle the extra optimization. Differential Revision: https://reviews.llvm.org/D85047	2020-08-04 15:42:39 -07:00
Eli Friedman	4a47f1c4ce	[SelectionDAG][SVE] Support scalable vectors in getConstantFP() Differential Revision: https://reviews.llvm.org/D85249	2020-08-04 15:32:43 -07:00
Matt Arsenault	89011fc3c9	AMDGPU/GlobalISel: Select llvm.returnaddress	2020-08-04 17:14:38 -04:00
Matt Arsenault	f8fb7835d6	GlobalISel: Add utilty for getting function argument live ins Get the argument register and ensure there's a copy to the virtual register. AMDGPU and AArch64 have similarish code to get the livein value, and I also want to use this in multiple places. This is a bit more aggressive about setting the register class than the original function, but that's probably OK. I think we're missing a few verifier checks for function live ins. I noticed AArch64's calling convention code is not actually adding liveins to functions, only the entry block (which apparently might not matter that much?). There should probably be a verifier check that entry block live ins are also live into the function. We also might need a verifier check that the copy to the livein virtual register is in the entry block.	2020-08-04 16:55:55 -04:00
Eli Friedman	95efea4b93	[AArch64][SVE] Widen narrow sdiv/udiv operations. The SVE instruction set only supports sdiv/udiv for 32-bit and 64-bit integers. If we see an 8-bit or 16-bit divide, widen the operands to 32 bits, and narrow the result. Differential Revision: https://reviews.llvm.org/D85170	2020-08-04 13:22:15 -07:00
Cameron McInally	0f2b47b6da	[FastISel] Don't transform FSUB(-0, X) -> FNEG(X) in FastISel This corresponds with the SelectionDAGISel change in D84056. Also, rename some poorly named tests in CodeGen/X86/fast-isel-fneg.ll with NFC. Differential Revision: https://reviews.llvm.org/D85149	2020-08-04 14:42:53 -05:00
Yonghong Song	6d218b4adb	BPF: support type exist/size and enum exist/value relocations Four new CO-RE relocations are introduced: - TYPE_EXISTENCE: whether a typedef/record/enum type exists - TYPE_SIZE: the size of a typedef/record/enum type - ENUM_VALUE_EXISTENCE: whether an enum value of an enum type exists - ENUM_VALUE: the enum value of an enum type These additional relocations will make CO-RE bpf programs more adaptive for potential kernel internal data structure changes. Differential Revision: https://reviews.llvm.org/D83878	2020-08-04 12:35:39 -07:00
Matt Arsenault	3e16e2152c	GlobalISel: Handle llvm.localescape This one is pretty easy and shrinks the list of unhandled intrinsics. I'm not sure how relevant the insert point is. Using the insert position of EntryBuilder will place this after constants. SelectionDAG seems to end up emitting these after argument copies and before anything else, but I don't think it really matters. This also ends up emitting these in the opposite order from SelectionDAG, but I don't think that matters either. This also needs a fix to stop the later passes dropping this as a dead instruction. DeadMachineInstructionElim's version of isDead special cases LOCAL_ESCAPE for some reason, and I'm not sure why it's excluded from MachineInstr::isLabel (or why isDead doesn't check it). I also noticed DeadMachineInstructionElim never considers inline asm as dead, but GlobalISel will drop asm with no constraints.	2020-08-04 15:19:02 -04:00
Matt Arsenault	14ed5cf5c4	AMDGPU/GlobalISel: Add baseline tests for andn2/orn2 matching	2020-08-04 15:13:49 -04:00
Cameron McInally	724b035fe4	[GlobalISel] Remove redundant FNEG tests. These tests were made redundant by D85139.	2020-08-04 11:32:15 -05:00
Matt Arsenault	0de547ed4a	AMDGPU/GlobalISel: Ensure subreg is valid when selecting G_UNMERGE_VALUES Fixes verifier error with SGPR unmerges with 96-bit result types.	2020-08-04 12:27:34 -04:00
Cameron McInally	23adbac9ee	[GlobalISel] Don't transform FSUB(-0, X) -> FNEG(X) in GlobalISel. This patch stops unconditionally transforming FSUB(-0, X) into an FNEG(X) while building the MIR. This corresponds with the SelectionDAGISel change in D84056. Differential Revision: https://reviews.llvm.org/D85139	2020-08-04 11:27:09 -05:00
Matt Arsenault	444401c31f	GlobalISel: Hack a test to avoid a bug introducing a verifier error There seems to be an unrelated CSEMIRBuilder bug that was causing expensive checks failures in this case. Hack the test to avoid this problem for now until that's fixed.	2020-08-04 11:57:04 -04:00
Nemanja Ivanovic	14d726acd6	[PowerPC] Don't remove single swap between the load and store The swap removal pass looks to remove swaps when a loaded value is swapped, some number of lane-insensitive operations are performed and then the value is swapped again and stored. However, in a situation where we load the value, swap it and then store it without swapping again, the pass erroneously removes the single swap. The reason is that both checks in the same equivalence class: - load feeds a swap - swap feeds a store pass. However, there is no check that the two swaps are actually a single swap. This patch just fixes that. Differential revision: https://reviews.llvm.org/D84785	2020-08-04 10:38:15 -05:00
Jay Foad	28e322ea93	[PowerPC] Custom lowering for funnel shifts The custom lowering saves an instruction over the generic expansion, by taking advantage of the fact that PowerPC shift instructions are well defined in the shift-by-bitwidth case. Differential Revision: https://reviews.llvm.org/D83948	2020-08-04 16:30:49 +01:00
Jay Foad	8ec8ad868d	[AMDGPU] Use fma for lowering frem This gives shorter f64 code and perhaps better accuracy. Differential Revision: https://reviews.llvm.org/D84516	2020-08-04 16:18:23 +01:00
Jay Foad	ee75cf36bb	[AMDGPU] Generate frem test checks Differential Revision: https://reviews.llvm.org/D84515	2020-08-04 16:18:23 +01:00
Simon Pilgrim	36750ba5bd	[X86][AVX] isHorizontalBinOp - relax lane-crossing limits for AVX1-only targets. Permit lane-crossing post shuffles on AVX1 targets as long as every element comes from the same source lane, which for v8f32/v4f64 cases can be efficiently lowered with the LowerShuffleAsLanePermuteAnd* style methods.	2020-08-04 14:27:01 +01:00
Sander de Smalen	bb3344c7d8	[AArch64][SVE] Add missing unwind info for SVE registers. This patch adds a CFI entry for each SVE callee saved register that needs unwind info at an offset from the CFA. The offset is a DWARF expression because the offset is partly scalable. The CFI entries only cover a subset of the SVE callee-saves and only encodes the lower 64-bits, thus implementing the lowest common denominator ABI. Existing unwinders may support VG but only restore the lower 64-bits. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84044	2020-08-04 11:47:06 +01:00
Sander de Smalen	fd6584a220	[AArch64][SVE] Fix CFA calculation in presence of SVE objects. The CFA is calculated as (SP/FP + offset), but when there are SVE objects on the stack the SP offset is partly scalable and should instead be expressed as the DWARF expression: SP + offset + scalable_offset * VG where VG is the Vector Granule register, containing the number of 64bits 'granules' in a scalable vector. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84043	2020-08-04 11:47:06 +01:00
Paul Walker	4be13b15d6	[SVE] Replace remaining _MERGE_OP1 nodes with _PRED variants. This is the final bit of work to relax the register allocation requirements when code generating normal LLVM IR, which rarely care about the result of inactive lanes. By using _PRED nodes we can make better use of SVE's reversed instructions. Also removes a redundant parameter from the min/max tests. Differential Revision: https://reviews.llvm.org/D85142	2020-08-04 11:19:17 +01:00

1 2 3 4 5 ...

35152 Commits