llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	32e90cbcd1	[AMDGPU] Disable endcf collapse There are some functional regressions and I suspect our scopes are not as perfectly enclosed as I expected. Disable it for now. Differential Revision: https://reviews.llvm.org/D76148	2020-03-13 12:33:22 -07:00
Stanislav Mekhanoshin	360aff0493	[AMDGPU] Simplify nested SI_END_CF This is to replace the optimization from the SIOptimizeExecMaskingPreRA. We have less opportunities in the control flow lowering because many VGPR copies are still in place and will be removed later, but we know for sure an instruction is SI_END_CF and not just an arbitrary S_OR_B64 with EXEC. The subsequent change needs to convert s_and_saveexec into s_and and address new TODO lines in tests, then code block guarded by the -amdgpu-remove-redundant-endcf option in the pre-RA exec mask optimizer will be removed. Differential Revision: https://reviews.llvm.org/D76033	2020-03-12 11:25:07 -07:00
Stanislav Mekhanoshin	d4757a6cf1	[AMDGPU] pre-commit collapse-endcf.mir. NFC. Pre commit test before D76033.	2020-03-11 16:15:29 -07:00
Stanislav Mekhanoshin	9801e5469b	[AMDGPU] Disable nested endcf collapse The assumption is that conditional regions are perfectly nested and a mask restored at the exit from the inner block will be completely covered by a mask restored in the outer. It turns out with our current structurizer this is not always the case. Disable the optimization for now, but I want to keep it around for a while to either try after further structurizer changes or to move it into control flow lowering where we have more info and reuse the test. Differential Revision: https://reviews.llvm.org/D75958	2020-03-11 11:24:20 -07:00
Piotr Sobczak	265e94e657	[AMDGPU] Extend buffer intrinsics with swizzling Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491	2019-10-02 17:22:36 +00:00
Matt Arsenault	4b7fc85c0b	Revert "AMDGPU: Fix iterator error when lowering SI_END_CF" This reverts r367500 and r369203. This is causing various test failures. llvm-svn: 369417	2019-08-20 17:45:25 +00:00
Matt Arsenault	d48324ff6f	Reapply "AMDGPU: Split block for si_end_cf" This reverts commit r359363, reapplying r357634 llvm-svn: 367500	2019-08-01 01:25:27 +00:00
Stanislav Mekhanoshin	a6322941ff	[AMDGPU] gfx1010 VMEM and SMEM implementation Differential Revision: https://reviews.llvm.org/D61330 llvm-svn: 359621	2019-04-30 22:08:23 +00:00
Mark Searles	76c5b62988	Revert "AMDGPU: Split block for si_end_cf" This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4. We discovered some internal test failures, so reverting for now. Differential Revision: https://reviews.llvm.org/D61213 llvm-svn: 359363	2019-04-27 00:51:18 +00:00
Matt Arsenault	396653f8a1	AMDGPU: Split block for si_end_cf Relying on no spill or other code being inserted before this was precarious. It relied on code diligently checking isBasicBlockPrologue which is likely to be forgotten. Ideally this could be done earlier, but this doesn't work because of phis. Any other instruction can't be placed before them, so we have to accept the position being incorrect during SSA. This avoids regressions in the fast register allocator rewrite from inverting the direction. llvm-svn: 357634	2019-04-03 20:53:20 +00:00
Matt Arsenault	a353fd572a	AMDGPU: Make exec mask optimzations more resistant to block splits Also improve the check for SALU instructions to also ignore implicit_def and other fake instructions. llvm-svn: 357170	2019-03-28 14:01:39 +00:00
Matt Arsenault	4d47ac3b30	AMDGPU: Add additional MIR tests for exec mask optimizations Also includes one example of how this transform is unsound. This isn't verifying the copies are used in the control flow intrinisic patterns. Also add option to disable exec mask opt pass. Since this pass is unsound, it may be useful to turn it off until it is fixed. llvm-svn: 357091	2019-03-27 16:58:30 +00:00
Matt Arsenault	4ab28b64b4	AMDGPU: Skip debug_instr when collapsing end_cf Based on how these are inserted, I doubt this was causing a problem in practice. llvm-svn: 357090	2019-03-27 16:58:27 +00:00

13 Commits