llvm-project


Author	SHA1	Message	Date
Matt Arsenault	592de0009f	AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp The existing test is overly reliant on -mattr=-flat-for-global, and some missing optimizations to re-use.	2020-01-17 20:09:53 -05:00
Matt Arsenault	eef92f25cc	AMDGPU: Remove custom node for exports I'm mildly worried about potentially reordering exp/exp_done with IntrWriteMem on the intrinsic. Requires hacking out the illegal type on SI, so manually select that case during lowering.	2020-01-15 18:33:15 -05:00
Matt Arsenault	b4a647449f	TableGen/GlobalISel: Add way for SDNodeXForm to work on timm The current implementation assumes there is an instruction associated with the transform, but this is not the case for timm/TargetConstant/immarg values. These transforms should directly operate on a specific MachineOperand in the source instruction. TableGen would assert if you attempted to define an equivalent GISDNodeXFormEquiv using timm when it failed to find the instruction matcher. Specially recognize SDNodeXForms on timm, and pass the operand index to the render function. Ideally this would be a separate render function type that looks like void renderFoo(MachineInstrBuilder, const MachineOperand&), but this proved to be somewhat mechanically painful. Add an optional operand index which will only be passed if the transform should only look at the one source operand. Theoretically it would also be possible to only ever pass the MachineOperand, and the existing renderers would check the parent. I think that would be somewhat ugly for the standard usage which may want to inspect other operands, and I also think MachineOperand should eventually not carry a pointer to the parent instruction. Use it in one sample pattern. This isn't a great example, since the transform exists to satisfy DAG type constraints. This could also be avoided by just changing the MachineInstr's arbitrary choice of operand type from i16 to i32. Other patterns have nontrivial uses, but this serves as the simplest example. One flaw this still has is if you try to use an SDNodeXForm defined for imm, but the source pattern uses timm, you still see the "Failed to lookup instruction" assert. However, there is now a way to avoid it.	2020-01-09 17:37:52 -05:00
Matt Arsenault	c1d4963b44	AMDGPU: Use new PatFrag system for d16 load nodes	2020-01-09 10:29:32 -05:00
Matt Arsenault	c3a10faadc	AMDGPU: Remove VOP3Mods0Clamp0OMod Now that overridable default operands work, there's no reason to use complex patterns to just produce 0s.	2020-01-07 15:10:08 -05:00
Matt Arsenault	e93b1ffc84	AMDGPU: Use default operands for clamp/omod We have a lot of complex pattern variants that just set the source modifiers that are really handled, and then set the output modifiers to 0. We're unlikely to ever match output modifiers from the use instruction side, and we already match clamp/omod in a separate pass.	2020-01-06 20:22:13 -05:00
Matt Arsenault	a506efff18	AMDGPU: Use ImmLeaf This solves one GlobalISel importer error, but the pattern still fails for another reason.	2020-01-06 17:21:51 -05:00
Matt Arsenault	14d25052a2	AMDGPU: Use ImmLeaf for inline immediate predicates	2020-01-06 17:21:51 -05:00
Matt Arsenault	e16a71382d	AMDGPU: Select global atomicrmw fadd This only works if there is no use of the return value.	2019-11-06 16:06:38 -08:00
Stanislav Mekhanoshin	4312c4afd4	[AMDGPU] deduplicate tablegen predicates We are duplicating predicates if several parts of the combined predicate list contain the same condition. Added code to deduplicate the list. We have AssemblerPredicates and AssemblerPredicate in the PredicateControl, but we never use AssemblerPredicates with an actual list, so this one is dropped. This addresses the first part of the llvm bug 43886: https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69815	2019-11-04 12:19:17 -08:00
Matt Arsenault	1aad3835f8	AMDGPU: Fix missing OPERAND_IMMEDIATE llvm-svn: 375365	2019-10-20 16:56:10 +00:00
Stanislav Mekhanoshin	befab66a2c	[AMDGPU] drop getIsFP td helper We already have isFloatType helper, and they are out of sync. Drop one and merge the type list. Differential Revision: https://reviews.llvm.org/D69138 llvm-svn: 375175	2019-10-17 21:46:56 +00:00
Piotr Sobczak	265e94e657	[AMDGPU] Extend buffer intrinsics with swizzling Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491	2019-10-02 17:22:36 +00:00
Matt Arsenault	59b91aa93e	AMDGPU/GlobalISel: Add support for init.exec intrinsics The existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296	2019-10-01 02:07:25 +00:00
Stanislav Mekhanoshin	1fb584f7a2	[AMDGPU] Added MI bit IsDOT NFC, needed for future commit. Differential Revision: https://reviews.llvm.org/D67669 llvm-svn: 372151	2019-09-17 17:56:13 +00:00
Matt Arsenault	4a73c6eada	AMDGPU/GlobalISel: Select G_CTPOP llvm-svn: 371798	2019-09-13 00:11:20 +00:00
Matt Arsenault	63e6d8db1c	AMDGPU/GlobalISel: Select atomic loads A new check for an explicitly atomic MMO is needed to avoid incorrectly matching pattern for non-atomic loads llvm-svn: 371418	2019-09-09 16:18:07 +00:00
Matt Arsenault	ebbd6e4976	AMDGPU: Remove code address space predicates Fixes 8-byte, 8-byte aligned LDS loads. 16-byte case still broken due to not being reported as legal. llvm-svn: 371413	2019-09-09 16:02:07 +00:00
Matt Arsenault	cfdc2b9bd9	AMDGPU: Disambiguate v3f16 format in load/store tables Currently the searchable tables report the number of dwords. These round to the same number for 3 and 4 component d16 instructions. Change this to report the number of elements so this isn't ambiguous. llvm-svn: 369202	2019-08-18 00:20:43 +00:00
Austin Kerbow	a05c384132	Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10 Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367969	2019-08-06 02:16:11 +00:00
Dmitri Gribenko	37aa8ad663	Revert "[AMDGPU] Use S_DENORM_MODE for gfx10" This reverts commit r367882. It broke the test MC/Disassembler/AMDGPU/gfx10_dasm_all.txt. llvm-svn: 367904	2019-08-05 18:36:43 +00:00
Austin Kerbow	8d229dbb47	[AMDGPU] Use S_DENORM_MODE for gfx10 Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367882	2019-08-05 16:09:49 +00:00
Nicolai Haehnle	e204786b6c	AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec} Summary: Wrapping increment/decrement. These aren't exposed by many APIs... Change-Id: I1df25c7889de5a5ba76468ad8e8a2597efa9af6c Reviewers: arsenm, tpr, dstuttard Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65283 llvm-svn: 367821	2019-08-05 09:36:06 +00:00
Matt Arsenault	ae87b9f2c2	AMDGPU/GlobalISel: Select local atomic cmpxchg llvm-svn: 367511	2019-08-01 03:41:41 +00:00
Matt Arsenault	e6ce48422c	AMDGPU: Start redefining atomic PatFrags Start migrating to a form that will be compatible with the global isel emitter. Also should fix some overly lax checks on the memory type, which allowed mis-selecting some illegal atomics. llvm-svn: 367506	2019-08-01 03:25:52 +00:00
Matt Arsenault	70e20c0f08	AMDGPU: Correct FP atomic patterns These need to use an fadd, not an add. Also make the noret part clear in the name. llvm-svn: 367505	2019-08-01 03:22:40 +00:00
Matt Arsenault	3baf4d3418	AMDGPU/GlobalISel: Select simple local stores llvm-svn: 367504	2019-08-01 03:09:15 +00:00
Matt Arsenault	3594011de0	AMDGPU/GlobalISel: Select local loads llvm-svn: 367498	2019-08-01 00:53:38 +00:00
Matt Arsenault	52c262484f	TableGen: Add MinAlignment predicate AMDGPU uses some custom code predicates for testing alignments. I'm still having trouble comprehending the behavior of predicate bits in the PatFrag hierarchy. Any attempt to abstract these properties unexpectedly fails to apply them. llvm-svn: 367373	2019-07-31 00:14:43 +00:00
Matt Arsenault	5e23f42820	AMDGPU: Avoid custom predicates for stores with glue llvm-svn: 366613	2019-07-19 21:01:30 +00:00
Stanislav Mekhanoshin	1dfae6fe50	[AMDGPU] use v32f32 for 3 mfma intrinsics These should really use v32f32, but were defined as v32i32 due to the lack of the v32f32 type. Differential Revision: https://reviews.llvm.org/D64667 llvm-svn: 365972	2019-07-12 22:42:01 +00:00
Jay Foad	7816ad918f	[AMDGPU] Restrict v_cndmask_b32 abs/neg modifiers to f32 Summary: D64497 allowed abs/neg source modifiers on v_cndmask_b32 but it doesn't make any sense to apply them to f16 operands; they would interpret the bits of the value as an f32, giving nonsensical results. This patch restricts them to f32 operands. Reviewers: arsenm, hakzsam Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64636 llvm-svn: 365904	2019-07-12 15:02:59 +00:00
Matt Arsenault	e5fb434d92	AMDGPU: s_waitcnt field should be treated as unsigned Also make it an ImmLeaf, so it should work with global isel as well, which was part of the point of moving it in the first place. llvm-svn: 365842	2019-07-11 23:42:57 +00:00
Stanislav Mekhanoshin	e93279fd1b	[AMDGPU] gfx908 atomic fadd and atomic pk_fadd Differential Revision: https://reviews.llvm.org/D64435 llvm-svn: 365717	2019-07-11 00:10:17 +00:00
Stanislav Mekhanoshin	50d7f46460	[AMDGPU] gfx908 mAI instructions, MC part Differential Revision: https://reviews.llvm.org/D64446 llvm-svn: 365563	2019-07-09 21:43:09 +00:00
Stanislav Mekhanoshin	c776dc0b60	[AMDGPU] Added td definitions for HW regs Infrastructure work for future commit. NFC. Differential Revision: https://reviews.llvm.org/D64370 llvm-svn: 365432	2019-07-09 03:20:33 +00:00
Matt Arsenault	9e7cbc0e7d	AMDGPU: Split extload/zextload local load patterns This will help removing the custom load predicates, allowing the global isel emitter to handle them. llvm-svn: 365398	2019-07-08 22:08:23 +00:00
Matt Arsenault	430b0497e7	AMDGPU: Move waitcnt intrinsic to instruction definition pattern llvm-svn: 365349	2019-07-08 16:53:48 +00:00
Dmitry Preobrazhensky	2eff0318c6	[AMDGPU][MC] Corrected parsing of FLAT offset modifier Summary of changes: - simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset; - provided information about error position for pre-gfx9 targets; - improved errors handling. Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D64244 llvm-svn: 365321	2019-07-08 14:27:37 +00:00
Nicolai Haehnle	4dc3b2bf95	AMDGPU: Support GDS atomics Summary: Original patch by Marek Olšák Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63452 llvm-svn: 364814	2019-07-01 17:17:45 +00:00
Ryan Taylor	9ab812d475	[AMDGPU] Fix for branch offset hardware workaround Summary: This fixes a hardware bug that makes a branch offset of 0x3f unsafe. This replaces the 32 bit branch with offset 0x3f to a 64 bit instruction that includes the same 32 bit branch and the encoding for a s_nop 0 to follow. The relaxer then modifies the offsets accordingly. Change-Id: I10b7aed99d651f8159401b01bb421f105fa6288e Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63494 llvm-svn: 364451	2019-06-26 17:34:57 +00:00
Nicolai Haehnle	2710171a15	AMDGPU: Write LDS objects out as global symbols in code generation Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted. Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader. Some notes: - The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile times, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now. - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check. Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275 Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61494 llvm-svn: 364297	2019-06-25 11:52:30 +00:00
Stanislav Mekhanoshin	bdf7f81b89	[AMDGPU] hazard recognizer for fp atomic to s_denorm_mode This requires 3 wait states unless there is a wait or VALU in between. Differential Revision: https://reviews.llvm.org/D63619 llvm-svn: 364074	2019-06-21 16:30:14 +00:00
Matt Arsenault	b7f87c0ecf	AMDGPU: Treat undef as an inline immediate This should only matter in vectors with an undef component, since a full undef vector would have been folded out. llvm-svn: 363941	2019-06-20 16:01:09 +00:00
Stanislav Mekhanoshin	0846c125f9	[AMDGPU] gfx1010 core wave32 changes Differential Revision: https://reviews.llvm.org/D63204 llvm-svn: 363934	2019-06-20 15:08:34 +00:00
Matt Arsenault	e24b34e9c9	AMDGPU: Undo sub x, c canonicalization for v2i16 Should avoid regression from D62341 llvm-svn: 363899	2019-06-19 23:37:43 +00:00
Nicolai Haehnle	490e83cd43	AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539 Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62486 Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0 llvm-svn: 363514	2019-06-16 17:14:12 +00:00
Stanislav Mekhanoshin	cdf339266b	[AMDGPU] gfx1010 BoolReg definition. NFC. Earlier commit has added AMDGPUOperand::isBoolReg(). Turns out gcc issues warning about unused function since D63204 is not yet submitted. Added NFC part of D63204 to have a use of that function and mute the warning. llvm-svn: 363416	2019-06-14 16:25:46 +00:00
Stanislav Mekhanoshin	8bcc9bb595	[AMDGPU] gfx1010 base changes for wave32 Differential Revision: https://reviews.llvm.org/D63293 llvm-svn: 363299	2019-06-13 19:18:29 +00:00
Stanislav Mekhanoshin	245b5ba344	[AMDGPU] gfx1010 dpp16 and dpp8 Differential Revision: https://reviews.llvm.org/D63203 llvm-svn: 363186	2019-06-12 18:02:41 +00:00


319 Commits