llvm-project

Commit Graph

Author	SHA1	Message	Date
Scott Linder	60b1967c39	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138	2020-03-19 15:35:16 -04:00
alex-t	39e1a90784	[AMDGPU] SI_INDIRECT_DST_V* pseudos expansion should place EXEC restore to separate basic block Summary: When SI_INDIRECT_DST_V* pseudos has indexes in VGPR, they get expanded into the self-looped basic block that modifies EXEC in a loop. To keep EXEC consistent it is stored before and then re-stored after the pseudo expansion result. %95:vreg_512 = SI_INDIRECT_DST_V16 %93:vreg_512(tied-def 0), %94:sreg_32, 0, killed %1500:vgpr_32 results to s_mov_b64 s[6:7], exec BB0_16: v_readfirstlane_b32 s8, v28 v_cmp_eq_u32_e32 vcc, s8, v28 s_and_saveexec_b64 vcc, vcc s_set_gpr_idx_on s8, gpr_idx(DST) v_mov_b32_e32 v6, v25 s_set_gpr_idx_off s_xor_b64 exec, exec, vcc s_cbranch_execnz BB0_16 ; %bb.17: s_mov_b64 exec, s[6:7] The bug appeared in case this expansion occurs in the ELSE block of the CF. Originally %110:vreg_512 = SI_INDIRECT_DST_V16 %103:vreg_512(tied-def 0), %85:vgpr_32, 0, %107:vgpr_32, %112:sreg_64 = SI_ELSE %108:sreg_64, %bb.19, 0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec expanded to ****************** <== here exec has "THEN" context s_mov_b64 s[6:7], exec BB0_16: v_readfirstlane_b32 s8, v28 v_cmp_eq_u32_e32 vcc, s8, v28 s_and_saveexec_b64 vcc, vcc s_set_gpr_idx_on s8, gpr_idx(DST) v_mov_b32_e32 v6, v25 s_set_gpr_idx_off s_xor_b64 exec, exec, vcc s_cbranch_execnz BB0_16 ; %bb.17: s_or_saveexec_b64 s[4:5], s[4:5] <-- exec mask is restored for "ELSE" but immediately overwritten. s_mov_b64 exec, s[6:7] The rest of the "ELSE" block is executed not by the workitems which constitute the "else mask" but by those which constitute "then mask" SILowerControlFlow::emitElse always considers the basic block begin() as an insertion point for s_or_saveexec. Proposed fix: The SI_INDIRECT_DST_V* procedure should split the reminder block to create landing pad for the EXEC restoration. Reviewers: rampitec, vpykhtin, nhaehnle Reviewed By: vpykhtin Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75472	2020-03-10 14:04:22 +03:00
Matt Arsenault	dd6cf159ba	AMDGPU: Stop adding m0 implicit def to SGPR spills r375293 removed the SGPR spilling with scalar stores path, so this is no longer necessary. This also always had the defect of adding the def even when this path wasn't in use. llvm-svn: 375448	2019-10-21 19:42:29 +00:00
Piotr Sobczak	265e94e657	[AMDGPU] Extend buffer intrinsics with swizzling Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491	2019-10-02 17:22:36 +00:00
Matt Arsenault	828b685ebe	RegAllocFast: Improve hinting heuristic Trace through multiple COPYs when looking for a physreg source. Add hinting for vregs that will be copied into physregs (we only hinted for vregs getting copied to a physreg previously). Give hinted a register a bonus when deciding which value to spill. This is part of my rewrite regallocfast series. In fact this one doesn't even have an effect unless you also flip the allocation to happen from back to front of a basic block. Nonetheless it helps to split this up to ease review of D52010 Patch by Matthias Braun llvm-svn: 360887	2019-05-16 12:50:39 +00:00
Matt Arsenault	b6c599afd3	Reapply r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block" This reverts commit r359912. This should pass now, since the clang test was made less fragile in r359918. llvm-svn: 359919	2019-05-03 19:06:57 +00:00
Nico Weber	bb852a9672	Revert r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block" Makes clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp fail. llvm-svn: 359912	2019-05-03 18:08:03 +00:00
Matt Arsenault	daf2d653fa	RegAllocFast: Add heuristic to detect values not live-out of a block Add an improved/new heuristic to catch more cases when values are not live out of a basic block. Patch by Matthias Braun llvm-svn: 359906	2019-05-03 17:03:24 +00:00
Stanislav Mekhanoshin	a6322941ff	[AMDGPU] gfx1010 VMEM and SMEM implementation Differential Revision: https://reviews.llvm.org/D61330 llvm-svn: 359621	2019-04-30 22:08:23 +00:00
Neil Henning	0a30f33ce2	[AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure. This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactive lanes. Previously, the SIFixWWMLiveness pass would work out which registers were being trashed within WWM regions, and ensure that the register allocator did not have any values it was depending on resident in those registers if the WWM section would trash them. This worked perfectly well, but would cause sometimes severe register pressure when the WWM section resided before divergent control flow (or at least that is where I mostly observed it). This fix instead runs through the WWM sections and pre allocates some registers for WWM. It then reserves these registers so that the register allocator cannot use them. This results in a significant register saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just this change!). Differential Revision: https://reviews.llvm.org/D59295 llvm-svn: 357400	2019-04-01 15:19:52 +00:00
Scott Linder	e2c5847414	[AMDGPU] Consider XOR in waterfall loop as a terminator Ensure the XOR in the waterfall loop for indirect addressing is considered a terminator. Differential Revision: https://reviews.llvm.org/D57703 llvm-svn: 353207	2019-02-05 19:50:32 +00:00

11 Commits