llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	7cdb1df8c7	[AMDGPU] Divergence driven selection for fused bitlogic The change adds divergence predicates for fused logical operations. The problem with selecting a scalar fused op such as S_NOR_B32 is that it does not have a VALU counterpart and will be split in moveToVALU. At the same time it prevents selection of a better opcode on the VALU side (such as V_OR3_B32) which does not have a counterpart on the SALU side. XNOR opcodes are left as is and selected as scalar to take advantage of the SIInstrInfo::lowerScalarXnor() code, which can commute operations to keep one of two opcodes on SALU if possible. See xnor.ll test for this. Differential Revision: https://reviews.llvm.org/D111907	2021-10-18 01:44:25 -07:00
Jay Foad	3828ea6181	[AMDGPU] Divergence-driven instruction selection for mul i32 Differential Revision: https://reviews.llvm.org/D109881	2021-09-22 09:36:34 +01:00
Jay Foad	128a49727a	[AMDGPU] Fix upcoming TableGen warnings on unused template arguments. NFC. The warning is implemented by D109359 which is still in review. Differential Revision: https://reviews.llvm.org/D109826	2021-09-16 09:07:18 +01:00
Stanislav Mekhanoshin	fe197ef9f1	[AMDGPU] Mark relevant rematerializable VOP3 instructions Differential Revision: https://reviews.llvm.org/D106110	2021-07-21 14:44:13 -07:00
Jay Foad	472f856714	[AMDGPU] Tweak VOP3_INTERP16 profile Set the output register class based on the output type, instead of hard-coding VGPR_32. I think this is more correct. It doesn't make any difference at the moment because we use the same class for 16- and 32-bit results, but it might in future if we make more use of true 16-bit register classes. Differential Revision: https://reviews.llvm.org/D102622	2021-05-17 15:28:00 +01:00
Joe Nash	168228d76a	[AMDGPU] Make some VOP3 insts commutable Note, only src0 and src1 will be commuted if the isCommutable flag is set. This patch does not change that, it just makes it possible to commute src0 and src1 of some U/I/B vop3 instructions. This patch revises `d35d8da7d6`. It contains the commute opportunities excluding float insts Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D101474 Change-Id: I62938173d750453839f2457a3851661a29135faf	2021-04-28 13:59:08 -04:00
Dmitry Preobrazhensky	cd953434f2	[AMDGPU][MC][GFX10][GFX90A] Corrected _e32/_e64 suffixes Fixed bugs https://bugs.llvm.org//show_bug.cgi?id=49643, https://bugs.llvm.org//show_bug.cgi?id=49644, https://bugs.llvm.org//show_bug.cgi?id=49645. Differential Revision: https://reviews.llvm.org/D99413	2021-04-01 14:21:00 +03:00
Dmitry Preobrazhensky	0f5ebbcc7f	[AMDGPU][MC] Added flag to identify VOP instructions which have a single variant By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086. This fix simplifies handling of single VOP instructions by adding a dedicated flag. Differential Revision: https://reviews.llvm.org/D99408	2021-04-01 13:53:12 +03:00
Joe Nash	45fd7c02af	Revert "[AMDGPU] Mark additional VOP3 as commutable" This reverts commit `d35d8da7d6`.	2021-03-29 14:48:11 -04:00
Joe Nash	d35d8da7d6	[AMDGPU] Mark additional VOP3 as commutable Note, only src0 and src1 will be commuted if the isCommutable flag is set. This patch does not change that, it just makes it possible to commute src0 and src1 of more instructions. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D99376 Change-Id: I61e20490962d95ea429beb355c55f55c024dafdc	2021-03-29 14:22:20 -04:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Mirko Brkusanin	608ac62540	[AMDGPU] Fix use of HasModifiers in VopProfile HasModifiers should be true if at least one modifier is used. This should make the use of this field a bit more consistent. Differential Revision: https://reviews.llvm.org/D94795	2021-01-26 15:21:11 +01:00
Joe Nash	314e29ed2b	[AMDGPU] Add _e64 suffix to VOP3 Insts Previously, instructions which could be expressed as VOP3 in addition to another encoding had a _e64 suffix on the tablegen record name, while those only available as VOP3 did not. With this patch, all VOP3s will have the _e64 suffix. The assembly does not change, only the mir. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D94341 Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423	2021-01-12 18:33:18 -05:00
Joe Nash	bcec0f27a2	[AMDGPU] Deduplicate VOP tablegen asm & ins VOP3 and VOP DPP subroutines to generate input operands and asm strings were essentially copy pasted several times. They are deduplicated to reduce the maintenance burden and allow faster development. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D94102 Change-Id: I76225eed3c33239d9573351e0c8a0abfad0146ea	2021-01-11 13:49:26 -05:00
Joe Nash	60466fad2d	[AMDGPU] Remove deprecated V_MUL_LO_I32 from GFX10 It was removed in GFX10 GPUs, but LLVM could generate it. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D94020 Change-Id: Id1c716d71313edcfb768b2b175a6789ef9b01f3c	2021-01-05 11:59:57 -05:00
Carl Ritson	62c246eda2	[AMDGPU][NFC] Rename opsel/opsel_hi/neg_lo/neg_hi with suffix 0 These parameters set a default value of 0, so I believe they should include a 0 suffix. This allows for versions which do not set a default value in future. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D93187	2020-12-14 20:01:56 +09:00
Jay Foad	5b91a6a88b	[AMDGPU] Allow some modifiers on VOP3B instructions V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src modifier, but they can still use NEG and the usual output modifiers. This partially reverts `3b99f12a4e` "AMDGPU: Remove modifiers from v_div_scale_*". Differential Revision: https://reviews.llvm.org/D90296	2020-10-28 21:54:14 +00:00
Stanislav Mekhanoshin	78ae1f6c90	[AMDGPU] Change predicate for fma/fmac legacy I do not exactly like the use of a negative predicate to enable instructions' support. Change HasNoMadMacF32Insts with HasFmaLegacy32. Differential Revision: https://reviews.llvm.org/D90250	2020-10-27 12:03:52 -07:00
Jay Foad	f6a5699c6c	[AMDGPU][TableGen] Make more use of !ne !not !and !or. NFC.	2020-10-21 09:56:43 +01:00
Jay Foad	1417abe54c	[AMDGPU] Add new llvm.amdgcn.fma.legacy intrinsic Differential Revision: https://reviews.llvm.org/D89558	2020-10-16 17:10:21 +01:00
Petar Avramovic	09b8871f8d	AMDGPU/GlobalISel/Emitter Support for predicate code that uses operands Predicates with 'let PredicateCodeUsesOperands = 1' want to examine matched operands. When we encounter predicate code that uses operands, analyze its named operand arguments and create a map between argument index and name. Later, when leaf node with name is encountered, emit GIM_RecordNamedOperand that will store that operand at its argument index in operand list. This operand list will be an argument to c++ code of the predicate. Differential Revision: https://reviews.llvm.org/D87285	2020-09-14 10:39:56 +02:00
Matt Arsenault	a4edc04693	AMDGPU/GlobalISel: Use clamp modifier for [us]addsat/[us]subsat We also have never handled this for SelectionDAG, which needs additional work.	2020-07-28 11:18:05 -04:00
Matt Arsenault	219a9fea14	AMDGPU: Rename gfx9 version of v_add_i32/v_sub_i32 The carry-out opcode is renamed, so eliminate the deceptive _gfx9, which looked like the encoded instruction. The real encoded version was named _gfx9_gfx9. Move it into the VI encoding namespace. The gfx9 namespace is just to deal with the renamed instructions that reinterpret the opcode. When codegened, it would fail to find the real instruction since it wasn't in the right namespace.	2020-07-16 13:32:05 -04:00
Matt Arsenault	c5c58fd6b5	AMDGPU: Remove intermediate DAG node for trig_preop intrinsic We weren't doing anything with this, and keeping it would just add more boilerplate for GlobalISel.	2020-06-16 21:06:25 -04:00
Stanislav Mekhanoshin	9ee272f13d	[AMDGPU] Add gfx1030 target Differential Revision: https://reviews.llvm.org/D81886	2020-06-15 16:18:05 -07:00
Matt Arsenault	483d4daa5e	AMDGPU: Select strict_fma Like with strict_fadd, the legalization is scalarizing the v4f16 when it should split.	2020-06-04 17:49:00 -04:00
Matt Arsenault	ae26c064ce	AMDGPU: Select strict_fadd	2020-06-04 17:49:00 -04:00
Matt Arsenault	d259668731	AMDGPU: Set mayRaiseFPException This may be missing a few overrides to set it off still in some special cases. Since the flags set during selection should now be reliably preserved, this should not change codegen for non-strictfp functions.	2020-06-04 17:35:27 -04:00
Dmitry Preobrazhensky	45251ef534	[AMDGPU][MC] Corrected v_writelane_b32 to fix a decoding bug Corrected vdst_in to match vdst operand type. See bug 45193: https://bugs.llvm.org/show_bug.cgi?id=45193 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D80636	2020-05-28 14:43:49 +03:00
Matt Arsenault	4b4496312e	AMDGPU: Start adding MODE register uses to instructions This is the groundwork required to implement strictfp. For now, this should be NFC for regular instructions (many instructions just gain an extra use of a reserved register). Regalloc won't rematerialize instructions with reads of physical registers, but we were suffering from that anyway with the exec reads. Should add it for all the related FP uses (possibly with some extras). I did not add it to either the gpr index mode instructions (or every single VALU instruction) since it's a ridiculous feature already modeled as an arbitrary side effect. Also work towards marking instructions with FP exceptions. This doesn't actually set the bit yet since this would start to change codegen. It seems nofpexcept is currently not implied from the regular IR FP operations. Add it to some MIR tests where I think it might matter.	2020-05-27 14:47:00 -04:00
Dmitry Preobrazhensky	18a5428e60	[AMDGPU][MC][GFX9+] Enabled clamp for v_add_i32 and v_sub_i32 See bug 45830: https://bugs.llvm.org/show_bug.cgi?id=45830 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D79585	2020-05-13 14:17:20 +03:00
Kazuaki Ishizaki	0312b9f550	[llvm] NFC: Fix trivial typo in rst and td files Differential Revision: https://reviews.llvm.org/D77469	2020-04-23 14:26:32 +09:00
Matt Arsenault	e87ec66762	AMDGPU/GlobalISel: Fix llvm.amdgcn.div.fmas.ll	2020-04-06 11:50:16 -04:00
Matt Arsenault	cbf719b568	AMDGPU: Use DAG patterns for div_fmas	2020-04-06 09:28:30 -04:00
Matt Arsenault	a950e3beef	AMDGPU: Move towards deprecating alignbit intrinsic This is equivalent to llvm.fshr, so legalize the intrinsic to the generic node.	2020-03-20 11:03:04 -04:00
alex-t	48a9cf9043	[AMDGPU] Enable SEXT divergence driven selection. Summary: This change enables the divergence driven selection for the SEXT DAG opcode. Reviewers: vpykhtin, rampitec Reviewed By: vpykhtin Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Differential Revision: https://reviews.llvm.org/D76230	2020-03-17 17:30:11 +03:00
Matt Arsenault	15bf916b54	AMDGPU: Remove VOP3OpSelMods0 complex pattern Use default operand of 0 instead.	2020-03-04 17:18:22 -05:00
Jay Foad	7d973307d5	[AMDGPU] Fix scheduling model for V_MULLIT_F32 This was incorrectly marked as a half rate 64-bit instruction by D45073.	2020-02-28 23:22:58 +00:00
Jay Foad	addcbc401c	[AMDGPU] Update a comment missed in `74e2974ac6`	2020-02-28 13:35:55 +00:00
Matt Arsenault	db06870dbd	AMDGPU: Move dot intrinsic patterns to instruction def I tried to use some of the new tablegen features to avoid creating different operand list permutations, but I still don't see a way to programmatically build a source pattern dag. Also add GlobalISel tests, which now all import successfully. Some of the fneg fold tests are incorrect, which need to be fixed in a future commit	2020-02-21 13:35:40 -05:00
Matt Arsenault	60023e3471	AMDGPU: Use default operand for VOP3P clamp We don't use this, and matching from the def doesn't make much sense. There are multiple tablegen bugs with default operand handling. undef_tied_input should work to handle the vdst_in correctly, but this breaks the operand register class constraint which it should be able to infer.	2020-02-21 12:14:18 -05:00
alex-t	5df1ac7846	[AMDGPU] fixed divergence driven shift operations selection Differential Revision: https://reviews.llvm.org/D73483 Reviewers: rampitec	2020-01-31 20:49:56 +03:00
Matt Arsenault	7f3280ecdd	AMDGPU/GlobalISel: Select permlane16/permlanex16	2020-01-29 17:55:31 -05:00
Matt Arsenault	68b102b97a	AMDGPU: Directly select 16-bank LDS case of llvm.amdgcn.interp.p1.f16 Manually select this as a tablegen workaround. Both SelectionDAG and GlobalISel end up misplacing the copy to m0 when both instructions in the output need it. Neither considers that both output instructions depend on m0. I don't know of any other pattern we need to handle this case, so it's less effort to just work around this for now.	2020-01-29 08:24:31 -08:00
Matt Arsenault	618fa77ae4	AMDGPU/GlobalISel: Select V_ADD3_U32/V_XOR3_B32 The other 3-op patterns should also be theoretically handled, but currently there's a bug in the inferred pattern complexity. I'm not sure what the error handling strategy should be for potential constant bus violations. I think the correct strategy is to never produce mixed SGPR and VGPR operands in a typical VOP instruction, which will trivially avoid them. However, it's possible to still have hand written MIR (or erroneously transformed code) with these operands. When these fold, the restriction will be violated. We currently don't have any verifiers for reg bank legality. For now, just ignore the restriction. It might be worth triggering a DAG fallback on verifier error.	2020-01-23 12:04:20 -05:00
Matt Arsenault	91e758b732	AMDGPU: Move permlane discard vdst_in optimization This case can be handled as a regular selection pattern, so move it out of the weird post-isel folding code which doesn't have an exactly equivalent place in GlobalISel. I think it doesn't make much sense to do this optimization here though, and it would be more useful in instcombine. There's not really any new information that will be gained during lowering since these inputs were known from the beginning.	2020-01-16 17:27:53 -05:00
Matt Arsenault	bd7658a212	AMDGPU: Partially directly select llvm.amdgcn.interp.p1.f16 The 16 bank LDS case is complicated due to using multiple instructions. If I attempt to write a pattern for it, the generated selector incorrectly places the copy to m0 after the first instruction, so that needs to be separately addressed. Also fix not gluing the copy to m0 to the second operation in the second half of the 16 bank lowering.	2020-01-15 08:58:58 -05:00
Matt Arsenault	452f6243c9	AMDGPU: Select llvm.amdgcn.interp.p2.f16 directly This will enable automatic GlobalISel support in a future commit.	2020-01-06 20:34:21 -05:00
Stanislav Mekhanoshin	4312c4afd4	[AMDGPU] deduplicate tablegen predicates We are duplicating predicates if several parts of the combined predicate list contain the same condition. Added code to deduplicate the list. We have AssemblerPredicates and AssemblerPredicate in the PredicateControl, but we never use AssemblerPredicates with an actual list, so this one is dropped. This addresses the first part of the llvm bug 43886: https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69815	2019-11-04 12:19:17 -08:00
Dmitry Preobrazhensky	b8042dbe2b	[AMDGPU][MC][GFX10] Added v_interp_[p1/p2/mov]_f32_e64 See https://bugs.llvm.org/show_bug.cgi?id=43747 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D69348	2019-10-28 15:03:43 +03:00

118 Commits