llvm-project

Commit Graph

Author	SHA1	Message	Date
Austin Kerbow	2f41a023af	AMDGPU: Fix SMEM WAR hazard for gfx10 readlane Summary: Hazard recognizer fails to see hazard with V_READLANE_B32_gfx10. Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69172 llvm-svn: 375265	2019-10-18 18:20:30 +00:00
Matt Arsenault	34ed76e180	GlobalISel: Implement lower for G_SADDO/G_SSUBO Port directly from SelectionDAG, minus the path using ISD::SADDSAT/ISD::SSUBSAT. llvm-svn: 375042	2019-10-16 20:46:32 +00:00
Stanislav Mekhanoshin	edcd5815ce	[AMDGPU] Do not combine dpp mov reading physregs We cannot be sure physregs will stay unchanged. Differential Revision: https://reviews.llvm.org/D69065 llvm-svn: 375033	2019-10-16 19:28:25 +00:00
Stanislav Mekhanoshin	3d99310c15	[AMDGPU] Do not combine dpp with physreg def We will remove dpp mov along with the physreg def otherwise. Differential Revision: https://reviews.llvm.org/D69063 llvm-svn: 375030	2019-10-16 18:48:54 +00:00
David Stuttard	2d6a2303f8	[AMDGPU] Fix-up cases where writelane has 2 SGPR operands Summary: Even though writelane doesn't have the same constraints as other VALU instructions, it still can't violate the >1 SGPR operand constraint. Due to later register propagation (e.g. fixing up VGPR operands via readfirstlane), changing writelane to only have a single SGPR is tricky. This implementation puts a new check after SIFixSGPRCopies that prevents multiple SGPRs being used in any writelane instructions. The algorithm used is to check for trivial copy prop of suitable constants into one of the SGPR operands and perform that if possible. If this isn't possible, put an explicit copy of Src1 SGPR into M0 and use that instead (this is allowable for writelane as the constraint is for SGPR read-port and not constant-bus access). Reviewers: rampitec, tpr, arsenm, nhaehnle Reviewed By: rampitec, arsenm, nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, mgorny, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D51932 Change-Id: Ic7553fa57440f208d4dbc4794fc24345d7e0e9ea llvm-svn: 375004	2019-10-16 14:37:39 +00:00
Piotr Sobczak	02baaca742	[AMDGPU] Extend the SI Load/Store optimizer Summary: Extend the SI Load/Store optimizer to merge MIMG load instructions. Handle different flavours of image_load and image_sample instructions. When the instructions of the same subclass differ only in dmask, merge them and update dmask accordingly. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64911 llvm-svn: 374984	2019-10-16 10:17:02 +00:00
Austin Kerbow	527e9f9a3f	AMDGPU: Fix infinite searches in SIFixSGPRCopies Summary: Two conditions could lead to infinite loops when processing PHI nodes in SIFixSGPRCopies. The first condition involves a REG_SEQUENCE that uses registers defined by both a PHI and a COPY. The second condition arises when a physical register is copied to a virtual register which is then used in a PHI node. If the same virtual register is copied to the same physical register, the result is an endless loop. %0:sgpr_64 = COPY $sgpr0_sgpr1 %2 = PHI %0, %bb.0, %1, %bb.1 $sgpr0_sgpr1 = COPY %0 Reviewers: alex-t, rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68970 llvm-svn: 374944	2019-10-15 19:59:45 +00:00
Stanislav Mekhanoshin	1184c27fa5	[AMDGPU] Support mov dpp with 64 bit operands We define mov/update dpp intrinsics as overloaded but do not support i64, which is a practically useful type. Fix the selection and lowering. Differential Revision: https://reviews.llvm.org/D68673 llvm-svn: 374910	2019-10-15 16:41:15 +00:00
Stanislav Mekhanoshin	6e8599d939	[AMDGPU] Allow DPP combiner to work with REG_SEQUENCE Differential Revision: https://reviews.llvm.org/D68828 llvm-svn: 374908	2019-10-15 16:17:50 +00:00
Roman Tereshin	044297ccbf	[update_mir_test_checks] Handle MI flags properly previously we would generate literal check lines w/ no reg-exps for vregs as MI flags (nsw, ninf, etc.) won't be recognized as a part of MI. Fixing that. Includes updating the MIR tests that suffered from the problem. Reviewed By: bogner Differential Revision: https://reviews.llvm.org/D68905 llvm-svn: 374829	2019-10-14 22:01:58 +00:00
Matt Arsenault	e8f1ad2ad8	AMDGPU: Remove unnecessary IR from test llvm-svn: 374800	2019-10-14 18:30:29 +00:00
Cameron McInally	20b8ed2c2b	[IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator Reapply r374240 with fix for Ocaml test, namely Bindings/OCaml/core.ml. Differential Revision: https://reviews.llvm.org/D61675 llvm-svn: 374782	2019-10-14 15:35:01 +00:00
Alexander Timofeev	c4d256a590	[AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.' Detailed description: After https://reviews.llvm.org/D59990 submit several issues were discovered. Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly. Discovered issues were addressed in the following commits: https://reviews.llvm.org/D67662 https://reviews.llvm.org/D67101 https://reviews.llvm.org/D63953 https://reviews.llvm.org/D63731 This change brings back AMDGPU specific changes. Reviewed by: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D68635 llvm-svn: 374767	2019-10-14 12:01:10 +00:00
Stanislav Mekhanoshin	f87fe45d5c	[AMDGPU] Use GCN prefix in dpp_combine.mir. NFC. llvm-svn: 374607	2019-10-11 22:28:04 +00:00
Stanislav Mekhanoshin	e2d104f64c	[AMDGPU] link dpp pseudos and real instructions on gfx10 This defaults to zero fi operand, but we do not expose it anyway. Should we expose it later it needs to be added to the pseudo. This enables dpp combining on gfx10. Differential Revision: https://reviews.llvm.org/D68888 llvm-svn: 374604	2019-10-11 22:03:36 +00:00
Marcello Maggioni	0112123eea	[GISel] Allow getConstantVRegVal() to return G_FCONSTANT values. In GISel we have both G_CONSTANT and G_FCONSTANT, but because in GISel we don't really have a concept of Float vs Int value the only difference between the two is where the data originates from. What both G_CONSTANT and G_FCONSTANT return is just a bag of bits with the constant representation in it. By making getConstantVRegVal() return G_FCONSTANT's bit representation as well we allow ConstantFold and other things to operate with G_FCONSTANT. Adding tests that show ConstantFolding to work on mixed G_CONSTANT and G_FCONSTANT sources. Differential Revision: https://reviews.llvm.org/D68739 llvm-svn: 374458	2019-10-10 21:46:26 +00:00
Stanislav Mekhanoshin	19a1a739b1	[AMDGPU] Handle undef old operand in DPP combine It was missing an undef flag. Differential Revision: https://reviews.llvm.org/D68813 llvm-svn: 374455	2019-10-10 21:32:41 +00:00
Stanislav Mekhanoshin	cbe55c7caf	[AMDGPU] Fixed dpp_combine.mir with expensive checks. NFC. llvm-svn: 374365	2019-10-10 15:28:52 +00:00
Dmitri Gribenko	eaf6dd482b	Revert "[IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator" This reverts commit r374240. It broke OCaml tests: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19014 llvm-svn: 374354	2019-10-10 14:13:54 +00:00
Matt Arsenault	12994a70cf	AMDGPU: Use SGPR_128 instead of SReg_128 for vregs SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284	2019-10-10 07:11:33 +00:00
Matt Arsenault	f8bf7d7f42	AMDGPU: Don't fold copies to physregs In a future patch, this will help clean up m0 handling. The register coalescer handles copies from a register that materializes an immediate, but doesn't handle move immediates itself. The virtual register uses will often be allocated to the same register, so there ends up being no real copy. llvm-svn: 374257	2019-10-09 22:51:42 +00:00
Matt Arsenault	85dfa82302	AMDGPU/GlobalISel: Fix crash on wide constant load with VGPR pointer This was ignoring the register bank of the input pointer, and isUniformMMO seems overly aggressive. This will now conservatively assume a VGPR in cases where the incoming bank hasn't been determined yet (i.e. is from a loop phi). llvm-svn: 374255	2019-10-09 22:44:49 +00:00
Matt Arsenault	3cd3959fe2	GlobalISel: Implement fewerElementsVector for G_BUILD_VECTOR Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR. llvm-svn: 374252	2019-10-09 22:44:43 +00:00
Stanislav Mekhanoshin	c6dec1d828	[AMDGPU] Fixed dpp combine of VOP1 If original instruction did not have source modifiers they were not added to the new DPP instruction as well, even if needed. Differential Revision: https://reviews.llvm.org/D68729 llvm-svn: 374241	2019-10-09 22:02:58 +00:00
Cameron McInally	47363a148f	[IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator Also update Clang to call Builder.CreateFNeg(...) for UnaryMinus. Differential Revision: https://reviews.llvm.org/D61675 llvm-svn: 374240	2019-10-09 21:52:15 +00:00
Matt Arsenault	190a17bbd1	AMDGPU: Fix i16 arithmetic pattern redundancy There were 2 problems here. First, these patterns were duplicated to handle the inverted shift operands instead of using the commuted PatFrags. Second, the zext folding patterns don't apply to subtargets whose 16-bit instructions don't zero the high bits. They should be skipped instead of inserting the extension. The high-bit zeroing code would be emitted when necessary anyway. This was also emitting unnecessary zexts in cases where the high bits were undefined. llvm-svn: 374092	2019-10-08 17:36:38 +00:00
Tom Stellard	3a8d80944b	AMDGPU: Add offsets to MMO when lowering buffer intrinsics Summary: Without offsets on the MachineMemOperands (MMOs), MachineInstr::mayAlias() will return true for all reads and writes to the same resource descriptor. This leads to O(N^2) complexity in the MachineScheduler when analyzing dependencies of buffer loads and stores. It also limits the SILoadStoreOptimizer from merging more instructions. This patch reduces the compile time of one pathological compute shader from 12 seconds to 1 second. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65097 llvm-svn: 374087	2019-10-08 17:04:51 +00:00
Amaury Sechet	7df5b2f79f	(Re)generate various tests. NFC llvm-svn: 374074	2019-10-08 16:16:26 +00:00
Nicolai Haehnle	df6e67697b	AMDGPU: Propagate undef flag during pre-RA exec mask optimizations Summary: Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68184 llvm-svn: 374041	2019-10-08 12:46:32 +00:00
Nicolai Haehnle	7febdb7f27	MachineSSAUpdater: insert IMPLICIT_DEF at top of basic block Summary: When getValueInMiddleOfBlock happens to be called for a basic block that has no incoming value at all, an IMPLICIT_DEF is inserted in that block via GetValueAtEndOfBlockInternal. This IMPLICIT_DEF must be at the top of its basic block or it will likely not reach the use that the caller intends to insert. Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204 Reviewers: arsenm, rampitec Subscribers: jvesely, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68183 llvm-svn: 374040	2019-10-08 12:46:20 +00:00
Matt Arsenault	c8a6df7130	AMDGPU/GlobalISel: Clamp G_SITOFP/G_UITOFP sources llvm-svn: 373989	2019-10-07 23:33:08 +00:00
Matt Arsenault	538b73b797	AMDGPU/GlobalISel: Handle more G_INSERT cases Start manually writing a table to get the subreg index. TableGen should probably generate this, but I'm not sure what it looks like in the arbitrary case where subregisters are allowed to not fully cover the super-registers. llvm-svn: 373947	2019-10-07 19:16:26 +00:00
Matt Arsenault	4bcdcad91b	GlobalISel: Partially implement lower for G_INSERT llvm-svn: 373946	2019-10-07 19:13:27 +00:00
Matt Arsenault	1237aa2996	AMDGPU/GlobalISel: Fix selection of 16-bit shifts llvm-svn: 373945	2019-10-07 19:10:44 +00:00
Matt Arsenault	09ec6918bc	AMDGPU/GlobalISel: Select VALU G_AMDGPU_FFBH_U32 llvm-svn: 373944	2019-10-07 19:10:43 +00:00
Matt Arsenault	0b2ea91d6d	AMDGPU/GlobalISel: Use S_MOV_B64 for inline constants This hides some defects in SIFoldOperands when the immediates are split. llvm-svn: 373943	2019-10-07 19:07:19 +00:00
Matt Arsenault	578fa2819f	AMDGPU/GlobalISel: Widen 16-bit G_MERGE_VALUEs sources Continue making a mess of merge/unmerge legality. llvm-svn: 373942	2019-10-07 19:05:58 +00:00
Matt Arsenault	b4cbf9862c	AMDGPU/GlobalISel: Select more G_INSERT cases At minimum handle the s64 insert type, which is emitted in real cases during legalization. We really need TableGen to emit something like the inverse of composeSubRegIndices to determine the subreg index to use. llvm-svn: 373938	2019-10-07 18:43:31 +00:00
Matt Arsenault	27269054d2	GlobalISel: Add target pre-isel instructions Allows targets to introduce regbankselectable pseudo-instructions. Currently the closest feature to this is an intrinsic. However, this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937	2019-10-07 18:43:29 +00:00
Jay Foad	301decd93d	[AMDGPU] Fix test checks The GFX10-DENORM-STRICT checks were only passing by accident. Fix them to make the test more robust in the face of scheduling or register allocation changes. llvm-svn: 373893	2019-10-07 10:57:41 +00:00
Matt Arsenault	c0ec72d4f8	AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsics llvm-svn: 373840	2019-10-06 01:37:38 +00:00
Matt Arsenault	bcd6b1d209	AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS llvm-svn: 373839	2019-10-06 01:37:37 +00:00
Matt Arsenault	a5b9c75674	GlobalISel: Partially implement lower for G_EXTRACT Turn into shift and truncate. Doesn't yet handle pointers. llvm-svn: 373838	2019-10-06 01:37:35 +00:00
Matt Arsenault	69c65a8609	AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsics This wasn't updated for the immarg handling change. llvm-svn: 373837	2019-10-06 01:37:34 +00:00
Matt Arsenault	d7cad4fb41	AMDGPU/GlobalISel: Fix using wrong addrspace for aperture This was always passing the destination flat address space, when it should be picking between the two valid source options. llvm-svn: 373716	2019-10-04 08:35:38 +00:00
Matt Arsenault	412e0bf8f3	AMDGPU/GlobalISel: Select G_PTRTOINT llvm-svn: 373715	2019-10-04 08:35:37 +00:00
Matt Arsenault	be9521acaa	AMDGPU/GlobalISel: Support wave32 waterfall loops llvm-svn: 373714	2019-10-04 08:35:35 +00:00
Matt Arsenault	ed77b27441	AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT llvm-svn: 373639	2019-10-03 17:59:03 +00:00
Matt Arsenault	233ff982c7	AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect Register indexing 64-bit elements is possible on the SALU, but not the VALU. Handle splitting this into two 32-bit indexes. Extend waterfall loop handling to allow moving a range of instructions. llvm-svn: 373638	2019-10-03 17:55:27 +00:00
Matt Arsenault	56271fe180	AMDGPU/GlobalISel: Allow VGPR to index SGPR register We can still do a waterfall loop over the index if using a VGPR to index an SGPR. The result will still be a VGPR, but we can avoid the wide copy of the source register to a VGPR. llvm-svn: 373637	2019-10-03 17:50:32 +00:00
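Commit 34ed76e180 says the G_SADDO/G_SSUBO lowering is ported directly from SelectionDAG. The following scalar C++ sketch illustrates that overflow test. It assumes the SelectionDAG scheme in which the wrapped result is compared against the LHS and the outcome is XORed with the sign condition on the RHS. It works on plain 32-bit integers and does not use LLVM's MachineIRBuilder API. The names `OverflowResult`, `wrapAdd`, `wrapSub`, `saddo` and `ssubo` are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical scalar model of G_SADDO / G_SSUBO:
// the wrapped 32-bit result plus the signed-overflow flag.
struct OverflowResult {
  int32_t value;
  bool overflow;
};

// Two's-complement wrapping add/sub done in unsigned arithmetic to avoid
// signed-overflow UB. The uint32_t -> int32_t conversion wraps modulo 2^32
// (guaranteed since C++20; mainstream compilers already behave this way).
constexpr int32_t wrapAdd(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) + static_cast<uint32_t>(b));
}
constexpr int32_t wrapSub(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) - static_cast<uint32_t>(b));
}

// SADDO: without overflow, LHS + RHS < LHS holds exactly when RHS < 0.
// A mismatch between the two conditions (their XOR) signals overflow.
constexpr OverflowResult saddo(int32_t lhs, int32_t rhs) {
  const int32_t res = wrapAdd(lhs, rhs);
  return {res, (res < lhs) != (rhs < 0)};
}

// SSUBO: without overflow, LHS - RHS < LHS holds exactly when RHS > 0.
constexpr OverflowResult ssubo(int32_t lhs, int32_t rhs) {
  const int32_t res = wrapSub(lhs, rhs);
  return {res, (res < lhs) != (rhs > 0)};
}
```

For example, `saddo(INT32_MAX, 1)` wraps to `INT32_MIN` and reports overflow, whereas `saddo(5, -7)` yields `-2` without overflow. The design replaces a widened add or a saturating-op comparison with one compare, one sign test and one XOR, which matches the commit's note that the SADDSAT/SSUBSAT path was left out.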


2808 Commits