llvm-project

Commit Graph

Author	SHA1	Message	Date
Changpeng Fang	da38b5fd49	AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested instruction. Summary: In the current implementation of GPR Indexing Mode when the index is of non-uniform, the s_set_gpr_idx_off instruction is incorrectly inserted after the loop. This will lead the instructions with vgpr operands (v_readfirstlane for example) to read incorrect vgpr. In this patch, we fix the issue by inserting s_set_gpr_idx_on/off immediately around the interested instruction. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D43297 llvm-svn: 325355	2018-02-16 16:31:30 +00:00
Philip Reames	232951dfb2	2nd attempt at "fixing" amdgpu tests after r321575 The test needs to be changed; it was exercising UB and that likely wasn't the intent of the test author. I simply removed the checks because I have absolutely no idea what this test was trying to accomplish. With multiple check patterns, no explanation, and no familiarity on my part with the ISA a true fix is going to have to come from someone familiar with the target. llvm-svn: 321591	2017-12-31 03:34:36 +00:00
Philip Reames	3580c90458	Test fix after r321575 The test in question was checking for a particular intepretation of undefined behavior. Relax the test to check that we simply don't crash. Sorry for the breakage, I don't generally build AMDGPU locally and just saw the failure this morning. llvm-svn: 321589	2017-12-30 18:42:37 +00:00
Zvi Rackover	ffaed72089	AMDGPU Tests: Change a case to be run with -O0 D40231 requires to run case with -O0 to prevent InstructionSimplify from transforming an extractelement with undef index. llvm-svn: 319907	2017-12-06 17:40:09 +00:00
Matt Arsenault	3f71c0e3ee	AMDGPU: Select DS insts without m0 initialization GFX9 stopped using m0 for most DS instructions. Select a different instruction without the use. I think this will be less error prone than trying to manually maintain m0 uses as needed. llvm-svn: 319270	2017-11-29 00:55:57 +00:00
Matt Arsenault	8728c5f2db	AMDGPU: Cleanup subtarget features Try to avoid mutually exclusive features. Don't use a real default GPU, and use a fake "generic". The goal is to make it easier to see which set of features are incompatible between feature strings. Most of the test changes are due to random scheduling changes from not having a default fullspeed model. llvm-svn: 310258	2017-08-07 14:58:04 +00:00
Matt Arsenault	4e309b0861	AMDGPU: Start selecting global instructions llvm-svn: 309470	2017-07-29 01:03:53 +00:00
Alexander Timofeev	982aee6a38	[AMDGPU] Switch scalarize global loads ON by default Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097	2017-07-04 17:32:00 +00:00
NAKAMURA Takumi	e4a741376b	Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default" It broke a testcase. Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll llvm-svn: 307054	2017-07-04 02:14:18 +00:00
Alexander Timofeev	ea7f08bee5	[AMDGPU] Switch scalarize global loads ON by default Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307026	2017-07-03 14:54:11 +00:00
Matt Arsenault	3c7581bbeb	AMDGPU: Use correct register names in inline assembly Fixes using physical registers in inline asm from clang. llvm-svn: 305004	2017-06-08 19:03:20 +00:00
Mark Searles	70359ac60d	[AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests. -enable-si-insert-waitcnts=1 becomes the default -enable-si-insert-waitcnts=0 to use old pass Differential Revision: https://reviews.llvm.org/D33730 llvm-svn: 304551	2017-06-02 14:19:25 +00:00
Matt Arsenault	3dbeefa978	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444	2017-03-21 21:39:51 +00:00
Marek Olsak	e22fdb9cac	AMDGPU: Always use VGPR indexing on GFX9 Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31157 llvm-svn: 298396	2017-03-21 17:00:32 +00:00
Matt Arsenault	7aad8fd8f4	Enable FeatureFlatForGlobal on Volcanic Islands This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982	2017-01-24 22:02:15 +00:00
Matt Arsenault	ad55ee5869	AMDGPU: Don't required structured CFG The structured CFG is just an aid to inserting exec mask modification instructions, once that is done we don't really need it anymore. We also do not analyze blocks with terminators that modify exec, so this should only be impacting true branches. llvm-svn: 288744	2016-12-06 01:02:51 +00:00
Konstantin Zhuravlyov	0a1a7b6b23	Revert "AMDGPU: Enable ConstrainCopy DAG mutation" This reverts commit r287146. This breaks few conformance tests. llvm-svn: 287233	2016-11-17 16:41:49 +00:00
Matt Arsenault	3b36bb1d87	AMDGPU: Enable ConstrainCopy DAG mutation This fixes a probably unintended divergence from the default scheduler behavior. llvm-svn: 287146	2016-11-16 20:35:23 +00:00
Matthias Braun	325cd2c98a	ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps() addSchedBarrierDeps() is supposed to add use operands to the ExitSU node. The current implementation adds uses for calls/barrier instruction and the MBB live-outs in all other cases. The use operands of conditional jump instructions were missed. Also added code to macrofusion to set the latencies between nodes to zero to avoid problems with the fusing nodes lingering around in the pending list now. Differential Revision: https://reviews.llvm.org/D25140 llvm-svn: 286544	2016-11-11 01:34:21 +00:00
Matt Arsenault	d486d3f8d1	AMDGPU: Initial implementation of VGPR indexing mode This is the most basic handling of the indirect access pseudos using GPR indexing mode. This currently only enables the mode for a single v_mov_b32 and then disables it. This is much more complicated to use than the movrel instructions, so a new optimization pass is probably needed to fold the access into the uses and keep the mode enabled for them. llvm-svn: 284031	2016-10-12 18:49:05 +00:00
Matt Arsenault	93401f4b5e	AMDGPU: Change check prefix in test llvm-svn: 283521	2016-10-07 03:55:04 +00:00
Matt Arsenault	5d8eb25e78	AMDGPU: Use unsigned compare for eq/ne For some reason there are both of these available, except for scalar 64-bit compares which only has u64. I'm not sure why there are both (I'm guessing it's for the one bit inputs we don't use), but for consistency always using the unsigned one. llvm-svn: 282832	2016-09-30 01:50:20 +00:00
Tom Stellard	0d23ebe888	AMDGPU/SI: Implement a custom MachineSchedStrategy Summary: GCNSchedStrategy re-uses most of GenericScheduler, it's just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default, to enable it you need to pass -misched=gcn to llc. Shader DB stats: 32464 shaders in 17874 tests Totals: SGPRS: 1542846 -> 1643125 (6.50 %) VGPRS: 1005595 -> 904653 (-10.04 %) Spilled SGPRs: 29929 -> 27745 (-7.30 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 36688188 -> 37034900 (0.95 %) bytes LDS: 1913 -> 1913 (0.00 %) blocks Max Waves: 254101 -> 265125 (4.34 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 1338220 -> 1438499 (7.49 %) VGPRS: 886221 -> 785279 (-11.39 %) Spilled SGPRs: 29869 -> 27685 (-7.31 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 34315716 -> 34662428 (1.01 %) bytes LDS: 1551 -> 1551 (0.00 %) blocks Max Waves: 188127 -> 199151 (5.86 %) Wait states: 0 -> 0 (0.00 %) Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995	2016-08-29 19:42:52 +00:00
Matt Arsenault	f403df38eb	Replace subregister uses when processing tied operands This was for some reason skipping operands that are subregisters instead of keeping the same subregister index. v_movreld_b32 expects src0 to be the subregister of the tied super register use/def. e.g. v_movreld_b32 v0, v9, <imp-def, tied3> v[0:3], <imp-use, tied2> v[0:3] was being replaced with v[4:7] = copy v[0:3] v_movreld_b32 v0, v9, <imp-def, tied3> v[4:7], <imp-use, tied2> v[4:7], which really writes to v[0:3] llvm-svn: 279804	2016-08-26 06:31:32 +00:00
Matt Arsenault	f0ba86a4d5	AMDGPU: Fix phis from blocks split due to register indexing llvm-svn: 276257	2016-07-21 09:40:57 +00:00
Matt Arsenault	cb540bc03c	AMDGPU: Expand register indexing pseudos in custom inserter This is to help moveSILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32. The disadvantage is the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion. v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register. llvm-svn: 275934	2016-07-19 00:35:03 +00:00
Matt Arsenault	1322b6f8bb	AMDGPU: Improve offset folding for register indexing llvm-svn: 274954	2016-07-09 01:13:56 +00:00
Matt Arsenault	b4d9503171	AMDGPU: Fix out of bounds indirect indexing errors This was producing acceses to registers beyond the super register's limits, resulting in verifier failures. llvm-svn: 273977	2016-06-28 01:09:00 +00:00
Matt Arsenault	21a4625a16	AMDGPU: Fix verifier errors with undef vector indices Also fix pointlessly adding exec to liveins. llvm-svn: 273916	2016-06-27 19:57:44 +00:00
Matthias Braun	6ad3d05b68	MachineScheduler: Fully compare top/bottom candidates In bidirectional scheduling this gives more stable results than just comparing the "reason" fields of the top/bottom node because the reason field may be higher depending on what other nodes are in the queue. Differential Revision: http://reviews.llvm.org/D19401 llvm-svn: 273755	2016-06-25 00:23:00 +00:00
Matt Arsenault	3cb4ddeb4e	AMDGPU: Fix liveness when expanding m0 loop llvm-svn: 273514	2016-06-22 23:40:57 +00:00
Matt Arsenault	9babdf4265	AMDGPU: Fix verifier errors in SILowerControlFlow The main sin this was committing was using terminator instructions in the middle of the block, and then not updating the block successors / predecessors. Split the blocks up to avoid this and introduce new pseudo instructions for branches taken with exec masking. Also use a pseudo instead of emitting s_endpgm and erasing it in the special case of a non-void return. llvm-svn: 273467	2016-06-22 20:15:28 +00:00
Matt Arsenault	9c47dd583a	AMDGPU: Remove some old intrinsic uses from tests llvm-svn: 260493	2016-02-11 06:02:01 +00:00
Matt Arsenault	284192730a	AMDGPU: Use explicit register size indirect pseudos This stops using an unknown reg class operand. Currently build_vector selection has a broken looking check where it tries to use a VGPR reg class and an SGPR one if it sees an SGPR use. With the source operand has an explicit VGPR class, illegal copies will be inserted that SIFixSGPRCopies will take care of normally later, which will allow removing the weird check of build_vector users. Without this, when removed v_movrels_b32 would still be emitted even though all of the values were only stored in SGPRs. llvm-svn: 249494	2015-10-07 00:42:51 +00:00
Tom Stellard	45bb48ea19	R600 -> AMDGPU rename llvm-svn: 239657	2015-06-13 03:28:10 +00:00

35 Commits