llvm-project

Commit Graph

Author	SHA1	Message	Date
Austin Kerbow	c35b358b74	AMDGPU/GlobalISel: Legalize FDIV16 Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69347	2019-10-25 11:07:17 -07:00
Craig Topper	a5376f6322	[GlobalISel][AArch64][AMDGPU][X86] Teach LegalizationArtifactCombiner to combine trunc(g_constant). This allows X86 to properly form shift by immediate instructions since we require an 8-bit constant to match the imported SelectionDAG patterns.	2019-10-24 12:59:26 -07:00
Austin Kerbow	97263fa2dd	AMDGPU/GlobalISel: Legalize fast unsafe FDIV Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69231 llvm-svn: 375460	2019-10-21 22:18:26 +00:00
Matt Arsenault	f9a42ed0a7	AMDGPU: Relax 32-bit SGPR register class Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This will allow the register coalescer to do a better job eliminating copies to m0. For GlobalISel, as a terrible hack, use SGPR_32 for things that should use SCC until booleans are solved. llvm-svn: 375267	2019-10-18 18:26:37 +00:00
Matt Arsenault	34ed76e180	GlobalISel: Implement lower for G_SADDO/G_SSUBO Port directly from SelectionDAG, minus the path using ISD::SADDSAT/ISD::SSUBSAT. llvm-svn: 375042	2019-10-16 20:46:32 +00:00
Roman Tereshin	044297ccbf	[update_mir_test_checks] Handle MI flags properly previously we would generate literal check lines w/ no reg-exps for vregs as MI flags (nsw, ninf, etc.) won't be recognized as a part of MI. Fixing that. Includes updating the MIR tests that suffered from the problem. Reviewed By: bogner Differential Revision: https://reviews.llvm.org/D68905 llvm-svn: 374829	2019-10-14 22:01:58 +00:00
Marcello Maggioni	0112123eea	[GISel] Allow getConstantVRegVal() to return G_FCONSTANT values. In GISel we have both G_CONSTANT and G_FCONSTANT, but because in GISel we don't really have a concept of Float vs Int value the only difference between the two is where the data originates from. What both G_CONSTANT and G_FCONSTANT return is just a bag of bits with the constant representation in it. By making getConstantVRegVal() return G_FCONSTANTs bit representation as well we allow ConstantFold and other things to operate with G_FCONSTANT. Adding tests that show ConstantFolding to work on mixed G_CONSTANT and G_FCONSTANT sources. Differential Revision: https://reviews.llvm.org/D68739 llvm-svn: 374458	2019-10-10 21:46:26 +00:00
Matt Arsenault	12994a70cf	AMDGPU: Use SGPR_128 instead of SReg_128 for vregs SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284	2019-10-10 07:11:33 +00:00
Matt Arsenault	85dfa82302	AMDGPU/GlobalISel: Fix crash on wide constant load with VGPR pointer This was ignoring the register bank of the input pointer, and isUniformMMO seems overly aggressive. This will now conservatively assume a VGPR in cases where the incoming bank hasn't been determined yet (i.e. is from a loop phi). llvm-svn: 374255	2019-10-09 22:44:49 +00:00
Matt Arsenault	3cd3959fe2	GlobalISel: Implement fewerElementsVector for G_BUILD_VECTOR Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR. llvm-svn: 374252	2019-10-09 22:44:43 +00:00
Matt Arsenault	190a17bbd1	AMDGPU: Fix i16 arithmetic pattern redundancy There were 2 problems here. First, these patterns were duplicated to handle the inverted shift operands instead of using the commuted PatFrags. Second, the point of the zext folding patterns doesn't apply to the non-zeroing high subtargets. They should be skipped instead of inserting the extension. The zeroing high code would be emitted when necessary anyway. This was also emitting unnecessary zexts in cases where the high bits were undefined. llvm-svn: 374092	2019-10-08 17:36:38 +00:00
Matt Arsenault	c8a6df7130	AMDGPU/GlobalISel: Clamp G_SITOFP/G_UITOFP sources llvm-svn: 373989	2019-10-07 23:33:08 +00:00
Matt Arsenault	538b73b797	AMDGPU/GlobalISel: Handle more G_INSERT cases Start manually writing a table to get the subreg index. TableGen should probably generate this, but I'm not sure what it looks like in the arbitrary case where subregisters are allowed to not fully cover the super-registers. llvm-svn: 373947	2019-10-07 19:16:26 +00:00
Matt Arsenault	4bcdcad91b	GlobalISel: Partially implement lower for G_INSERT llvm-svn: 373946	2019-10-07 19:13:27 +00:00
Matt Arsenault	1237aa2996	AMDGPU/GlobalISel: Fix selection of 16-bit shifts llvm-svn: 373945	2019-10-07 19:10:44 +00:00
Matt Arsenault	09ec6918bc	AMDGPU/GlobalISel: Select VALU G_AMDGPU_FFBH_U32 llvm-svn: 373944	2019-10-07 19:10:43 +00:00
Matt Arsenault	0b2ea91d6d	AMDGPU/GlobalISel: Use S_MOV_B64 for inline constants This hides some defects in SIFoldOperands when the immediates are split. llvm-svn: 373943	2019-10-07 19:07:19 +00:00
Matt Arsenault	578fa2819f	AMDGPU/GlobalISel: Widen 16-bit G_MERGE_VALUEs sources Continue making a mess of merge/unmerge legality. llvm-svn: 373942	2019-10-07 19:05:58 +00:00
Matt Arsenault	b4cbf9862c	AMDGPU/GlobalISel: Select more G_INSERT cases At minimum handle the s64 insert type, which are emitted in real cases during legalization. We really need TableGen to emit something like the inverse of composeSubRegIndices to determine the subreg index to use. llvm-svn: 373938	2019-10-07 18:43:31 +00:00
Matt Arsenault	27269054d2	GlobalISel: Add target pre-isel instructions Allows targets to introduce regbankselectable pseudo-instructions. Currently the closest feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937	2019-10-07 18:43:29 +00:00
Matt Arsenault	c0ec72d4f8	AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsics llvm-svn: 373840	2019-10-06 01:37:38 +00:00
Matt Arsenault	bcd6b1d209	AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS llvm-svn: 373839	2019-10-06 01:37:37 +00:00
Matt Arsenault	a5b9c75674	GlobalISel: Partially implement lower for G_EXTRACT Turn into shift and truncate. Doesn't yet handle pointers. llvm-svn: 373838	2019-10-06 01:37:35 +00:00
Matt Arsenault	69c65a8609	AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsics This wasn't updated for the immarg handling change. llvm-svn: 373837	2019-10-06 01:37:34 +00:00
Matt Arsenault	d7cad4fb41	AMDGPU/GlobalISel: Fix using wrong addrspace for aperture This was always passing the destination flat address space, when it should be picking between the two valid source options. llvm-svn: 373716	2019-10-04 08:35:38 +00:00
Matt Arsenault	412e0bf8f3	AMDGPU/GlobalISel: Select G_PTRTOINT llvm-svn: 373715	2019-10-04 08:35:37 +00:00
Matt Arsenault	be9521acaa	AMDGPU/GlobalISel: Support wave32 waterfall loops llvm-svn: 373714	2019-10-04 08:35:35 +00:00
Matt Arsenault	ed77b27441	AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT llvm-svn: 373639	2019-10-03 17:59:03 +00:00
Matt Arsenault	233ff982c7	AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect Register indexing 64-bit elements is possible on the SALU, but not the VALU. Handle splitting this into two 32-bit indexes. Extend waterfall loop handling to allow moving a range of instructions. llvm-svn: 373638	2019-10-03 17:55:27 +00:00
Matt Arsenault	56271fe180	AMDGPU/GlobalISel: Allow VGPR to index SGPR register We can still do a waterfall loop over the index if using a VGPR to index an SGPR. The result will still be a VGPR, but we can avoid the wide copy of the source register to a VGPR. llvm-svn: 373637	2019-10-03 17:50:32 +00:00
Matt Arsenault	9256183994	AMDGPU/GlobalISel: Add some more tests for G_INSERT legalization llvm-svn: 373636	2019-10-03 17:50:31 +00:00
Matt Arsenault	3d23e58dbe	AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and This would try to do FewerElements to v9s8 llvm-svn: 373635	2019-10-03 17:50:29 +00:00
Matt Arsenault	1c135a39aa	AMDGPU/GlobalISel: Expand G_BITCAST legality llvm-svn: 373567	2019-10-03 05:46:08 +00:00
Piotr Sobczak	265e94e657	[AMDGPU] Extend buffer intrinsics with swizzling Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491	2019-10-02 17:22:36 +00:00
Matt Arsenault	cdfe5efe9b	AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEX In principle this should behave as any other constant. However eliminateFrameIndex currently assumes a VALU use and uses a vector shift. Work around this by selecting to VGPR for now until eliminateFrameIndex is fixed. llvm-svn: 373415	2019-10-02 01:02:24 +00:00
Matt Arsenault	bfce0c2664	AMDGPU/GlobalISel: Private loads always use VGPRs llvm-svn: 373414	2019-10-02 01:02:21 +00:00
Matt Arsenault	05aa8a733e	AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTOR This will be needed to support AGPR operations. llvm-svn: 373413	2019-10-02 01:02:18 +00:00
Matt Arsenault	3a657afb3a	AMDGPU/GlobalISel: Fix RegBankSelect for 1024-bit values llvm-svn: 373412	2019-10-02 01:02:14 +00:00
Matt Arsenault	9dba603748	AMDGPU/GlobalISel: Increase max legal size to 1024 There are 1024 bit register classes defined for AGPRs. Additionally OpenCL defines vectors up to 16 x i64, and this helps those tests legalize. llvm-svn: 373350	2019-10-01 16:35:06 +00:00
Dmitri Gribenko	827a7fab78	Revert "GlobalISel: Handle llvm.read_register" This reverts commit r373294. It broke Clang's CodeGen/arm64-microsoft-status-reg.cpp: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/18483 llvm-svn: 373310	2019-10-01 08:24:01 +00:00
Matt Arsenault	fdea5e02ce	AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP llvm-svn: 373298	2019-10-01 02:23:20 +00:00
Matt Arsenault	59b91aa93e	AMDGPU/GlobalISel: Add support for init.exec intrinsics The existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296	2019-10-01 02:07:25 +00:00
Matt Arsenault	bdcc6d3d26	GlobalISel: Handle llvm.read_register SelectionDAG has a bunch of machinery to defer this to selection time for some reason. Just directly emit a copy during IRTranslator. The x86 usage does somewhat questionably check hasFP, which could depend on the whole function being at minimum translated. This does lose the convergent bit if the callsite had it, which may be a problem. We also lose that in general for intrinsics, which may also be a problem. llvm-svn: 373294	2019-10-01 02:07:16 +00:00
Matt Arsenault	8f6bdb7668	AMDGPU/GlobalISel: Avoid creating shift of 0 in arg lowering This is sort of papering over the fact that we don't run a combiner anywhere, but avoiding creating 2 instructions in the first place is easy. llvm-svn: 373293	2019-10-01 01:44:46 +00:00
Matt Arsenault	54167ea316	AMDGPU/GlobalISel: Select G_UADDO/G_USUBO llvm-svn: 373288	2019-10-01 01:23:13 +00:00
Matt Arsenault	ed85b0cee6	GlobalISel: Implement widenScalar for G_SITOFP/G_UITOFP sources Legalize 16-bit G_SITOFP/G_UITOFP for AMDGPU. llvm-svn: 373287	2019-10-01 01:06:48 +00:00
Matt Arsenault	77ac400117	AMDGPU/GlobalISel: Legalize G_GLOBAL_VALUE Handle other cases besides LDS. Mostly a straight port of the existing handling, without the intermediate custom nodes. llvm-svn: 373286	2019-10-01 01:06:43 +00:00
Matt Arsenault	317d991fa5	AMDGPU/GlobalISel: Fix select for v2s16 and/or/xor llvm-svn: 373180	2019-09-30 06:31:30 +00:00
Matt Arsenault	eb6eb694e4	AMDGPU/GlobalISel: Allow selection of scalar min/max I believe all of the uniform/divergent pattern predicates are redundant and can be removed. The uniformity bit already influences the register class, and nothing has broken when I've removed this and others. llvm-svn: 372450	2019-09-21 02:37:33 +00:00
Nico Weber	03475adcf7	Revert r372366 "Use getTargetConstant for BLENDI, and add a test to catch it." This reverts commit `52621307bc`. Tests have been failing all night with [0/2] ACTION //llvm/test:check-llvm(//llvm/utils/gn/build/toolchain:unix) -- Testing: 33647 tests, 64 threads -- Testing: 0 .. 10.. UNRESOLVED: LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll (6943 of 33647) ****************** TEST 'LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll' FAILED **************** Test has no run line! ****************** Since there were other concerns on https://reviews.llvm.org/D67785, I'm just reverting for now. llvm-svn: 372383	2019-09-20 12:05:29 +00:00
Sterling Augustine	52621307bc	Use getTargetConstant for BLENDI, and add a test to catch it. Summary: This fixes a crasher introduced by r372338. Reviewers: echristo, arsenm Subscribers: jvesely, wdng, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67785 Tighten up the test case. llvm-svn: 372366	2019-09-20 02:29:16 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00
Hans Wennborg	13bdae8541	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > > Since intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. 
> > Most of the work here is to clean up target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, similar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_ instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372314	2019-09-19 12:33:07 +00:00
Matt Arsenault	bffbeecb44	AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.ds.swizzle llvm-svn: 372297	2019-09-19 04:11:17 +00:00
Matt Arsenault	494243597b	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store.format This needs special handling due to some subtargets that have a nonstandard register layout for f16 vectors. Also reject some illegal types on other targets. llvm-svn: 372293	2019-09-19 02:35:08 +00:00
Matt Arsenault	67f1f6ff8c	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store llvm-svn: 372292	2019-09-19 02:30:27 +00:00
Matt Arsenault	838ff36553	AMDGPU/GlobalISel: RegBankSelect struct buffer load/store llvm-svn: 372291	2019-09-19 02:26:53 +00:00
Matt Arsenault	a62ef58346	AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.raw.buffer.{load|store} llvm-svn: 372290	2019-09-19 02:25:09 +00:00
Matt Arsenault	a30d022db6	AMDGPU/GlobalISel: Attempt to RegBankSelect image intrinsics Images should always have 2 consecutive, mandatory SGPR arguments. llvm-svn: 372289	2019-09-19 02:23:06 +00:00
Matt Arsenault	22e2c09515	AMDGPU/GlobalISel: Fix RegBankSelect G_SMULH/G_UMULH pre-gfx9 The scalar versions were only introduced in gfx9. llvm-svn: 372286	2019-09-19 01:42:34 +00:00
Matt Arsenault	d8399d12cd	GlobalISel: Don't materialize immarg arguments to intrinsics Encode them directly as an imm argument to G_INTRINSIC. Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands. 
This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372285	2019-09-19 01:33:14 +00:00
Amara Emerson	9d64721ca5	[GlobalISel] Partially revert r371901. r371901 was overeager and widenScalarDst() and the like in the legalizer attempt to increment the insert point given in order to add new instructions after the currently legalizing inst. In cases where the insertion point is not exactly the current instruction, then callers need to de-compensate for the behaviour by decrementing the insertion iterator before calling them. It's not a nice state of affairs, for now just undo the problematic parts of the change. llvm-svn: 372050	2019-09-16 23:46:03 +00:00
Matt Arsenault	07b8597656	AMDGPU/GlobalISel: Fix some broken run lines llvm-svn: 371992	2019-09-16 14:14:40 +00:00
Matt Arsenault	1fc07d6648	AMDGPU/GlobalISel: Fix RegBankSelect for G_FRINT and G_FCEIL llvm-svn: 371991	2019-09-16 14:14:37 +00:00
Matt Arsenault	bf7524db35	AMDGPU/GlobalISel: Remove another illegal select test llvm-svn: 371990	2019-09-16 14:14:31 +00:00
Matt Arsenault	255d157672	AMDGPU/GlobalISel: Remove illegal select tests These fail in a release build. llvm-svn: 371955	2019-09-16 04:21:10 +00:00
Matt Arsenault	bc8de8a8da	AMDGPU/GlobalISel: Select SMRD loads for more types llvm-svn: 371954	2019-09-16 00:54:07 +00:00
Matt Arsenault	48b158acae	AMDGPU/GlobalISel: RegBankSelect for kill llvm-svn: 371953	2019-09-16 00:48:37 +00:00
Matt Arsenault	01c7f40de3	AMDGPU/GlobalISel: Legalize s1 source G_[SU]ITOFP llvm-svn: 371952	2019-09-16 00:37:10 +00:00
Matt Arsenault	60169ed613	AMDGPU/GlobalISel: Set type on vgpr live in special arguments Fixes assertion with workitem ID intrinsics used in non-kernel functions. llvm-svn: 371951	2019-09-16 00:33:00 +00:00
Matt Arsenault	9f52c1ea58	AMDGPU/GlobalISel: Select S16->S32 fptoint llvm-svn: 371950	2019-09-16 00:32:56 +00:00
Matt Arsenault	0a6123595f	AMDGPU/GlobalISel: Select s32->s16 G_[US]ITOFP llvm-svn: 371949	2019-09-16 00:29:12 +00:00
Matt Arsenault	f5d5cd205e	AMDGPU/GlobalISel: Fix VALU s16 fneg llvm-svn: 371948	2019-09-16 00:20:54 +00:00
Amara Emerson	02bcc86b08	[GlobalISel] Fix insertion point of new instructions to be after PHIs. For some reason we sometimes insert new instructions one instruction before the first non-PHI when legalizing. This can result in having non-PHI instructions before PHIs, which means that PHI elimination doesn't catch them. Differential Revision: https://reviews.llvm.org/D67570 llvm-svn: 371901	2019-09-13 21:49:24 +00:00
Matt Arsenault	a4be3eff5c	AMDGPU/GlobalISel: Legalize s32->s16 G_SITOFP/G_UITOFP llvm-svn: 371811	2019-09-13 04:04:55 +00:00
Matt Arsenault	67d9349dad	AMDGPU/GlobalISel: Fix RegBankSelect for amdgcn.else llvm-svn: 371808	2019-09-13 03:55:49 +00:00
Matt Arsenault	638f802381	AMDGPU/GlobalISel: Select 16-bit VALU bit ops llvm-svn: 371807	2019-09-13 03:55:43 +00:00
Matt Arsenault	f457dd2bd4	AMDGPU/GlobalISel: Legalize G_FFLOOR llvm-svn: 371803	2019-09-13 01:48:15 +00:00
Matt Arsenault	4d33918034	AMDGPU/GlobalISel: Legalize G_FMAD Unlike SelectionDAG, treat this as a normally legalizable operation. In SelectionDAG this is supposed to only ever be formed if it's legal, but I've found that to be restricting. For AMDGPU this is contextually legal depending on whether denormal flushing is allowed in the use function. Technically we currently treat the denormal mode as a subtarget feature, so custom lowering could be avoided. However I consider this to be a defect, and this should be contextually dependent on the controllable rounding mode of the parent function. llvm-svn: 371800	2019-09-13 00:44:35 +00:00
Matt Arsenault	4a73c6eada	AMDGPU/GlobalISel: Select G_CTPOP llvm-svn: 371798	2019-09-13 00:11:20 +00:00
Guillaume Chatelet	48904e9452	[Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing Summary: This catches malformed mir files which specify alignment as log2 instead of pow2. See https://reviews.llvm.org/D65945 for reference. This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67433 llvm-svn: 371608	2019-09-11 11:16:48 +00:00
Matt Arsenault	4a23ae5e78	GlobalISel/TableGen: Handle REG_SEQUENCE patterns The scalar f64 patterns don't work yet because they fail on multiple results from the unused implicit def of scc in the result bit operation. llvm-svn: 371542	2019-09-10 17:57:33 +00:00
Matt Arsenault	e1895aba3d	AMDGPU/GlobalISel: Select G_FABS/G_FNEG f64 doesn't work yet because tablegen currently doesn't handle REG_SEQUENCE. This does regress some multi use VALU fneg cases since now the immediate remains in an SGPR, and more moves are used for legalizing the xor. This is a SIFixSGPRCopies deficiency. llvm-svn: 371540	2019-09-10 17:19:46 +00:00
Matt Arsenault	7df5b3fd26	AMDGPU/GlobalISel: Select cvt pk intrinsics llvm-svn: 371539	2019-09-10 17:17:05 +00:00
Matt Arsenault	37d1bda4f6	AMDGPU/GlobalISel: Select llvm.amdgcn.sffbh llvm-svn: 371538	2019-09-10 17:16:59 +00:00
Matt Arsenault	da027275c6	AMDGPU/GlobalISel: RegBankSelect for G_ZEXTLOAD/G_SEXTLOAD llvm-svn: 371536	2019-09-10 16:42:37 +00:00
Matt Arsenault	ad6a8b83cd	AMDGPU/GlobalISel: Legalize constant 32-bit loads Legalize by casting to a 64-bit constant address. This isn't how the DAG implements it, but it should. llvm-svn: 371535	2019-09-10 16:42:31 +00:00
Matt Arsenault	c0ceca5883	AMDGPU/GlobalISel: First pass at attempting to legalize load/stores There's still a lot more to do, but this handles decomposing due to alignment. I've gotten it to the point where nothing crashes or infinite loops the legalizer. llvm-svn: 371533	2019-09-10 16:20:14 +00:00
Matt Arsenault	a91f017ae3	AMDGPU/GlobalISel: Fix insert point when lowering fminnum/fmaxnum llvm-svn: 371471	2019-09-09 23:30:11 +00:00
Matt Arsenault	a0933e6df7	AMDGPU/GlobalISel: Legalize G_BUILD_VECTOR v2s16 Handle it the same way as G_BUILD_VECTOR_TRUNC. Arguably only G_BUILD_VECTOR_TRUNC should be legal for this, but G_BUILD_VECTOR will probably be more convenient in most cases. llvm-svn: 371440	2019-09-09 18:57:51 +00:00
Matt Arsenault	77e3e9cafd	AMDGPU/GlobalISel: Select llvm.amdgcn.class Also fixes missing SubtargetPredicate on f16 class instructions. llvm-svn: 371436	2019-09-09 18:29:45 +00:00
Matt Arsenault	d6c1f5bb15	AMDGPU/GlobalISel: Select fmed3 llvm-svn: 371435	2019-09-09 18:29:37 +00:00
Matt Arsenault	6ebf605851	AMDGPU: Use PatFrags to allow selecting custom nodes or intrinsics This enables GlobalISel to handle various intrinsics. The custom node pattern will be ignored, and the intrinsic will work. This will also allow SelectionDAG to directly select the intrinsics, but as they are all custom lowered to the nodes, this ends up leaving dead code in the table. Eventually either GlobalISel should add the equivalent of custom nodes, or intrinsics should be directly used. These each have different tradeoffs. There are a few more to handle, but these are easy to handle ones. Some others fail for other reasons. llvm-svn: 371432	2019-09-09 18:10:31 +00:00
Matt Arsenault	d2a9516a6d	AMDGPU: Move MnemonicAlias out of instruction def hierarchy Unfortunately MnemonicAlias defines a "Predicates" field just like an instruction or pattern, with a somewhat different interpretation. This ends up overriding the intended Predicates set by PredicateControl on the pseudoinstruction definitions with an empty list. This allowed incorrectly selecting instructions that should have been rejected due to the SubtargetPredicate from patterns on the instruction definition. This does remove the divergent predicate from the 64-bit shift patterns, which were already not used for the 32-bit shift, so I'm not sure what the point was. This also removes a second, redundant copy of the 64-bit divergent patterns. llvm-svn: 371427	2019-09-09 17:25:35 +00:00
Matt Arsenault	64ecca90d4	AMDGPU/GlobalISel: Implement LDS G_GLOBAL_VALUE Handle the simple case that lowers to a constant. llvm-svn: 371424	2019-09-09 17:13:44 +00:00
Matt Arsenault	182f9248e8	AMDGPU/GlobalISel: Legalize G_BUILD_VECTOR_TRUNC Treat this as legal on gfx9 since it can use S_PACK_* instructions for this. This isn't used by anything yet. The same will probably apply to 16-bit G_BUILD_VECTOR without the trunc. llvm-svn: 371423	2019-09-09 17:04:18 +00:00
Matt Arsenault	63e6d8db1c	AMDGPU/GlobalISel: Select atomic loads A new check for an explicitly atomic MMO is needed to avoid incorrectly matching pattern for non-atomic loads llvm-svn: 371418	2019-09-09 16:18:07 +00:00
Matt Arsenault	d8409b178e	AMDGPU/GlobalISel: Fix RegBankSelect for unaligned, uniform constant loads llvm-svn: 371416	2019-09-09 16:06:37 +00:00
Matt Arsenault	02eb308387	AMDGPU/GlobalISel: Fix regbankselect for uniform extloads There are no scalar extloads. llvm-svn: 371414	2019-09-09 16:03:45 +00:00
Matt Arsenault	ebbd6e4976	AMDGPU: Remove code address space predicates Fixes 8-byte, 8-byte aligned LDS loads. 16-byte case still broken due to not being reported as legal. llvm-svn: 371413	2019-09-09 16:02:07 +00:00
Matt Arsenault	c34b4036ff	AMDGPU/GlobalISel: Select G_PTR_MASK llvm-svn: 371412	2019-09-09 15:46:13 +00:00
Matt Arsenault	fdb7030117	AMDGPU/GlobalISel: Fix reg bank for uniform LDS loads The pointer is always a VGPR. Also fix hardcoding the pointer size to 64. llvm-svn: 371411	2019-09-09 15:44:16 +00:00
Matt Arsenault	2dd088ec7d	AMDGPU/GlobalISel: Use known bits for selection llvm-svn: 371409	2019-09-09 15:39:32 +00:00
Matt Arsenault	8e3bc9b572	AMDGPU/GlobalISel: Legalize wavefrontsize intrinsic llvm-svn: 371407	2019-09-09 15:20:49 +00:00
Matt Arsenault	3e45c70288	GlobalISel: Support physical register inputs in patterns llvm-svn: 371253	2019-09-06 20:32:37 +00:00
Matt Arsenault	4d90625271	AMDGPU/GlobalISel: Fix load/store of types in other address spaces There should probably be a size only matcher. llvm-svn: 371155	2019-09-06 00:36:06 +00:00
Matt Arsenault	f581d575ce	AMDGPU: Add intrinsics for address space identification The library currently uses ptrtoint and directly checks the queue ptr for this, which counts as a pointer capture. llvm-svn: 371009	2019-09-05 02:20:39 +00:00
Matt Arsenault	69b1a2ae65	AMDGPU/GlobalISel: Restore insert point when getting aperture Avoids SSA violations in a future patch. llvm-svn: 371008	2019-09-05 02:20:32 +00:00
Matt Arsenault	25156ae7ea	AMDGPU/GlobalISel: Fix placeholder value used for addrspacecast llvm-svn: 371007	2019-09-05 02:20:29 +00:00
Matt Arsenault	d51a3746d0	AMDGPU/GlobalISel: Fix assert on load from constant address llvm-svn: 371006	2019-09-05 02:20:25 +00:00
Matt Arsenault	2df41a8e38	AMDGPU/GlobalISel: Select G_BITREVERSE llvm-svn: 370980	2019-09-04 20:46:31 +00:00
Matt Arsenault	5ff310e298	GlobalISel: Add basic legalization for G_BITREVERSE llvm-svn: 370979	2019-09-04 20:46:15 +00:00
Matt Arsenault	d9af712da4	AMDGPU/GlobalISel: Make 16-bit constants legal This is mostly for the benefit of patterns which use 16-bit constants. llvm-svn: 370921	2019-09-04 16:19:45 +00:00
Matt Arsenault	cbd1782c79	AMDGPU/GlobalISel: Legalize sin/cos llvm-svn: 370402	2019-08-29 20:06:48 +00:00
Matt Arsenault	8ec5c10042	GlobalISel/TableGen: Handle setcc patterns This is a special case because one node maps to two different G_ instructions, and the operand order is changed. This mostly enables G_FCMP for AMDGPU. G_ICMP is still manually selected for now since it has the SALU and VALU complication to deal with. llvm-svn: 370280	2019-08-29 01:13:41 +00:00
Matt Arsenault	a8bbcbd006	AMDGPU/GlobalISel: Fix constraining scalar and/or/xor If the result register already had a register class assigned, the sources may not have been properly constrained. llvm-svn: 370150	2019-08-28 02:11:03 +00:00
Matt Arsenault	5c7e96dc26	AMDGPU/GlobalISel: Implement addrspacecast for 32-bit constant addrspace llvm-svn: 370140	2019-08-28 00:58:24 +00:00
Petar Avramovic	d568ed40e0	[GlobalISel] Fix narrowScalar for shifts to match algorithm from SDAG Fix typos. Use Hi and Lo prefixes for Or instead of LHS and RHS to match names of surrounding variables. Differential Revision: https://reviews.llvm.org/D66587 llvm-svn: 370062	2019-08-27 14:22:32 +00:00
Volkan Keles	277631e3b8	[GlobalISel] Legalizer: Retry combining illegal artifacts as long as there are new artifacts Summary: Currently, Legalizer aborts if it’s unable to legalize artifacts. However, it’s possible to combine them after processing the rest of the instructions because the legalization is likely to generate more artifacts that allow ArtifactCombiner to combine them away. Instead, move illegal artifacts to another list called RetryList and wait until all of the instructions in InstList are legalized. After that, check if there are any new artifacts and try to combine them again if that’s the case. If not, abort. The idea is similar to D59339, but the approach is a bit different. This patch fixes the issue described above, but the legalizer still may be unable to handle some cases depending on when to legalize artifacts. So, in the long run, we probably need a different legalization strategy that handles this dependency in a better way. Reviewers: dsanders, aditya_nandakumar, qcolombet, arsenm, aemerson, paquette Reviewed By: dsanders Subscribers: jvesely, wdng, nhaehnle, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65894 llvm-svn: 369805	2019-08-23 20:30:35 +00:00
Matt Arsenault	fba82858f2	GlobalISel: Don't create G_UADDE with constant false carry in The x86 tests are now broken (in particular add-scalar.ll now hits the DAG fallback) due to not handling G_UADDO. The DAG x86 backend has a custom lowering for this, so that will need to be implemented. llvm-svn: 369673	2019-08-22 17:29:17 +00:00
Matt Arsenault	954a012b4c	GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sources This is necessary for handling <3 x s16> on AMDGPU, assuming this should be handled as 2 separate legalization actions. The alternative would be for fewerElementsVector to handle 3->2. llvm-svn: 369547	2019-08-21 16:59:10 +00:00
Aditya Nandakumar	c65ac865c3	[GlobalISel]: Fix lowering of G_Shuffle_vector where we pick up the wrong source index https://reviews.llvm.org/D66182 llvm-svn: 368781	2019-08-14 01:23:33 +00:00
Aditya Nandakumar	615eee6402	[GlobalISel]: Fix lowering of G_SHUFFLE_VECTOR with scalar sources https://reviews.llvm.org/D66171 llvm-svn: 368753	2019-08-13 21:49:11 +00:00
Matt Arsenault	28215caa60	GlobalISel: Partially implement fewerElementsVector G_UNMERGE_VALUES Odd sized vectors aren't handled yet. llvm-svn: 368713	2019-08-13 16:26:28 +00:00
Matt Arsenault	690645bda0	GlobalISel: Implement lower for G_SHUFFLE_VECTOR llvm-svn: 368709	2019-08-13 16:09:07 +00:00
Daniel Sanders	e9a57c2b23	[globalisel] Add G_SEXT_INREG Summary: Targets often have instructions that can sign-extend certain cases faster than the equivalent shift-left/arithmetic-shift-right. Such cases can be identified by matching a shift-left/shift-right pair but there are some issues with this in the context of combines. For example, suppose you can sign-extend 8-bit up to 32-bit with a target extend instruction. %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity) %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_ASHR %2:_(s32), i32 1 would reasonably combine to: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 25 which no longer matches the special case. If your shifts and extend are equal cost, this would break even as a pair of shifts but if your shift is more expensive than the extend then it's cheaper as: %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8 %3:_(s32) = G_ASHR %2:_(s32), i32 1 It's possible to match the shift-pair in ISel and emit an extend and ashr. However, this is far from the only way to break this shift pair and make it hard to match the extends. Another example is that with the right known-zeros, this: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_MUL %2:_(s32), i32 2 can become: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 23 All upstream targets have been configured to lower it to the current G_SHL,G_ASHR pair but will likely want to make it legal in some cases to handle their faster cases. To follow-up: Provide a way to legalize based on the constant. At the moment, I'm thinking that the best way to achieve this is to provide the MI in LegalityQuery but that opens the door to breaking core principles of the legalizer (legality is not context sensitive). That said, it's worth noting that looking at other instructions and acting on that information doesn't violate this principle in itself. 
It's only a violation if, at the end of legalization, a pass that checks legality without being able to see the context would say an instruction might not be legal. That's a fairly subtle distinction so to give a concrete example, saying %2 in: %1 = G_CONSTANT 16 %2 = G_SEXT_INREG %0, %1 is legal is in violation of that principle if the legality of %2 depends on %1 being constant and/or being 16. However, legalizing to either: %2 = G_SEXT_INREG %0, 16 or: %1 = G_CONSTANT 16 %2:_(s32) = G_SHL %0, %1 %3:_(s32) = G_ASHR %2, %1 depending on whether %1 is constant and 16 does not violate that principle since both outputs are genuinely legal. Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61289 llvm-svn: 368487	2019-08-09 21:11:20 +00:00
Matt Arsenault	ff6b007772	AMDGPU/GlobalISel: Alternative mappings for constants Without context we assume SGPR. Allowing VGPR constants theoretically helps avoid a copy. This seems to not actually work now, and the choice isn't based on the use bank. llvm-svn: 367871	2019-08-05 14:40:26 +00:00
Matt Arsenault	d9d30a408e	GlobalISel: Lower scalarizing unmerge of a vector to shifts AMDGPU sometimes has legal s16 and <2 x s16> operations, but all registers are really 32-bit. An unmerge destination really should be widened to a 32-bit register. If widening a scalarizing vector with a target size that matches the vector size, bitcast to integer and extract the relevant bits with shifts. I'm not sure if this is the right place for this. This could arguably be part of widenScalar for the result. I also have a growing feeling that we're missing a bitcast legalize action. llvm-svn: 367604	2019-08-01 19:10:05 +00:00
Matt Arsenault	5faa533e47	GlobalISel: Fix widenScalar for G_MERGE_VALUES to pointer AMDGPU testcase isn't broken now, but will be in a future patch without this. llvm-svn: 367591	2019-08-01 18:13:16 +00:00
Fangrui Song	67a8d6c795	AMDGPU/GlobalISel: fix inst-select-load-local.mir in -DLLVM_ENABLE_ASSERTIONS=off builds after r367498 llvm-svn: 367514	2019-08-01 04:03:06 +00:00
Matt Arsenault	9952f46407	AMDGPU/GlobalISel: Fix flat load/store of pointer types llvm-svn: 367513	2019-08-01 03:57:42 +00:00
Matt Arsenault	57495268ac	AMDGPU/GlobalISel: Remove manual store select code This regresses the weird types that are newly treated as legal load types, but fixes incorrectly using flat instructions on SI. llvm-svn: 367512	2019-08-01 03:52:40 +00:00
Matt Arsenault	ae87b9f2c2	AMDGPU/GlobalISel: Select local atomic cmpxchg llvm-svn: 367511	2019-08-01 03:41:41 +00:00
Matt Arsenault	26cb53b260	AMDGPU/GlobalISel: Handle G_ATOMICRMW_FADD llvm-svn: 367509	2019-08-01 03:33:15 +00:00
Matt Arsenault	da5b9bfa95	AMDGPU/GlobalISel: Allow selection of DS atomicrmw llvm-svn: 367507	2019-08-01 03:29:01 +00:00
Matt Arsenault	3baf4d3418	AMDGPU/GlobalISel: Select simple local stores llvm-svn: 367504	2019-08-01 03:09:15 +00:00
Matt Arsenault	7bedceb5b2	GlobalISel: moreElementsVector for G_LOAD/G_STORE AMDGPU change and test is a placeholder until a future patch with complete handling. llvm-svn: 367503	2019-08-01 01:44:22 +00:00
Matt Arsenault	3594011de0	AMDGPU/GlobalISel: Select local loads llvm-svn: 367498	2019-08-01 00:53:38 +00:00
Matt Arsenault	9cf980d4a7	GlobalISel: Add G_ATOMICRMW_{FADD|FSUB} llvm-svn: 367369	2019-07-30 23:56:30 +00:00
Austin Kerbow	c99f62e313	[AMDGPU/GlobalISel] Add llvm.amdgcn.fdiv.fast legalization. Reviewers: arsenm Reviewed By: arsenm Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64966 llvm-svn: 367344	2019-07-30 18:49:16 +00:00
Matt Arsenault	a9ea8a9aae	AMDGPU/GlobalISel: Handle most function return types handleAssignments gives up pretty easily on structs, and i8 values for some reason. The other case that doesn't work is when an implicit sret needs to be inserted if the return size exceeds the number of return registers. llvm-svn: 367082	2019-07-26 02:36:05 +00:00
Matt Arsenault	51d795d941	GlobalISel: Fold out unmerge to scalars from concat_vector Removes illegal intermediate vectors if an operation was lowering to concat_vectors, and the next operation is scalarized. llvm-svn: 367081	2019-07-26 02:22:23 +00:00
Matt Arsenault	0e7d8698b5	AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting exts The G_ANYEXT handling can end up reaching selectCOPY, which mutates the instruction in place. llvm-svn: 366915	2019-07-24 16:05:53 +00:00
Matt Arsenault	4668ea4072	AMDGPU/GlobalISel: Fix broken tests llvm-svn: 366688	2019-07-22 13:33:11 +00:00
Matt Arsenault	8d372008b1	AMDGPU/GlobalISel: Fix tests without asserts The legality check is only done under NDEBUG, so the failure cases are different in a release build. llvm-svn: 366680	2019-07-22 12:43:41 +00:00
Matt Arsenault	f3bfb85bce	AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spaces llvm-svn: 366621	2019-07-19 22:28:44 +00:00
Matt Arsenault	7df225dfc2	AMDGPU/GlobalISel: Fix MMO flags for kernel argument loads The DAG lowering sets dereferenceable and invariant, not nontemporal. llvm-svn: 366597	2019-07-19 17:52:56 +00:00
Matt Arsenault	08494f6231	AMDGPU/GlobalISel: Selection for fminnum/fmaxnum v2f16 case doesn't work yet because the VOP3P complex patterns haven't been ported yet. llvm-svn: 366585	2019-07-19 14:42:40 +00:00
Matt Arsenault	b60a2ae40e	AMDGPU/GlobalISel: Support arguments with multiple registers Handles structs used directly in argument lists. llvm-svn: 366584	2019-07-19 14:29:30 +00:00
Matt Arsenault	fecf43eba3	AMDGPU/GlobalISel: Rewrite lowerFormalArguments This should now handle everything except structs passed as multiple registers. I think most of the packing logic should be handled by handleAssignments, but I'm unclear on what the contract is for multiple registers. This is copying how x86 handles this. This does change the behavior of the test_sgpr_alignment0 amdgpu_vs test. I don't think shader arguments should try to follow the alignment, and registers need to be repacked. I also don't think it matters, since I think the pointers are packed to the beginning of the argument list anyway. llvm-svn: 366582	2019-07-19 14:15:18 +00:00
Matt Arsenault	1022c0dfde	AMDGPU: Decompose all values to 32-bit pieces for calling conventions This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit. This should also help avoid issues graphics shaders have had with 64-bit values, and simplify argument lowering in globalisel. llvm-svn: 366578	2019-07-19 13:57:44 +00:00
Matt Arsenault	0966dd0d69	GlobalISel: Handle widenScalar of arbitrary G_MERGE_VALUES sources Extract the sources to the GCD of the original size and target size, padding with implicit_def as necessary. Also fix the case where the requested source type is wider than the original result type. This was ignoring the type, and just using the destination. Do the operation in the requested type and truncate back. llvm-svn: 366367	2019-07-17 20:22:44 +00:00
Matt Arsenault	914a59cad8	GlobalISel: Handle more cases for widenScalar of G_MERGE_VALUES Use an anyext to the requested type for the leftover operand to produce a slightly wider type, and then truncate the final merge. I have another implementation almost ready which handles arbitrary widens, but I think it produces worse code in this example (which I think is 90% due to not folding redundant copies or folding out implicit_def users), so I wanted to add this as a baseline first. llvm-svn: 366366	2019-07-17 20:22:38 +00:00
Nicolai Haehnle	8b7041a5c6	AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630 Reviewers: rampitec, mareko Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64807 Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc llvm-svn: 366314	2019-07-17 11:22:57 +00:00
Matt Arsenault	f8c8284455	AMDGPU/GlobalISel: Select G_ASHR llvm-svn: 366257	2019-07-16 20:31:25 +00:00
Matt Arsenault	e5b28b98e9	AMDGPU/GlobalISel: Select G_LSHR llvm-svn: 366256	2019-07-16 20:25:43 +00:00
Matt Arsenault	1b69fd275d	AMDGPU/GlobalISel: Select G_SHL I think this manages to not break the DAG handling with the divergent predicates because the standalone divergent patterns end up with a higher priority than the pattern on the instruction definition. The 16-bit versions don't work yet. llvm-svn: 366254	2019-07-16 20:15:30 +00:00
Matt Arsenault	2d10407719	AMDGPU/GlobalISel: Fix selection of private stores llvm-svn: 366249	2019-07-16 19:27:44 +00:00
Matt Arsenault	7161fb0be5	AMDGPU/GlobalISel: Select private loads llvm-svn: 366248	2019-07-16 19:22:21 +00:00
Matt Arsenault	dad1f89210	AMDGPU/GlobalISel: Select flat stores llvm-svn: 366246	2019-07-16 18:42:53 +00:00
Matt Arsenault	35c96598b1	AMDGPU/GlobalISel: Select flat loads Now that the patterns use the new PatFrag address space support, the only blocker to importing most load patterns is the addressing mode complex patterns. llvm-svn: 366237	2019-07-16 18:05:29 +00:00
Matt Arsenault	22c4a147a9	AMDGPU/GlobalISel: Fix test failures in release build Apparently the check for legal instructions during instruction select does not happen without an asserts build, so these would successfully select in release, and fail in debug. Make s16 and/or/xor legal. These can just be selected directly to the 32-bit operation, as is already done in SelectionDAG, so just make them legal. llvm-svn: 366210	2019-07-16 14:28:30 +00:00
Matt Arsenault	66ee934440	AMDGPU/GlobalISel: Allow scalar s1 and/or/xor If a 1-bit value is in a 32-bit VGPR, the scalar opcodes set SCC to whether the result is 0. If the inputs are SCC, these can be copied to a 32-bit SGPR to produce an SCC result. llvm-svn: 366125	2019-07-15 20:20:18 +00:00
Matt Arsenault	c8291c94f8	AMDGPU/GlobalISel: Select G_AND/G_OR/G_XOR llvm-svn: 366121	2019-07-15 19:50:07 +00:00
Matt Arsenault	ad19b50c00	AMDGPU/GlobalISel: Don't constrain source register of VCC copies This is a hack until I come up with a better way of dealing with the pseudo-register banks used for boolean values. If the use instruction constrains the register, the selector for the def instruction won't see that the bank was VCC. A 1-bit SReg_32 could ambiguously have been SCCRegBank or VCCRegBank in wave32. This is necessary to successfully select branches with an and/or/xor condition. llvm-svn: 366120	2019-07-15 19:48:36 +00:00
Matt Arsenault	e1b52f4180	AMDGPU/GlobalISel: Fix selecting vcc->vcc bank copies The extra test change is correct, although how it arrives there is a bug that needs work. With wave32, the test for isVCC ambiguously reports true for an SCC or VCC source. A new allocatable pseudo register class for SCC may be necessary. llvm-svn: 366119	2019-07-15 19:46:48 +00:00
Matt Arsenault	3bfdb54d88	AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC llvm-svn: 366118	2019-07-15 19:45:49 +00:00
Matt Arsenault	18b7133843	AMDGPU/GlobalISel: Fix handling of sgpr (not scc bank) s1 to VCC This was emitting a copy from a 32-bit register to a 64-bit. llvm-svn: 366117	2019-07-15 19:44:07 +00:00
Matt Arsenault	6ed315f89b	AMDGPU/GlobalISel: Custom legalize G_INSERT_VECTOR_ELT llvm-svn: 366116	2019-07-15 19:43:04 +00:00
Matt Arsenault	b0e04c018c	AMDGPU/GlobalISel: Custom legalize G_EXTRACT_VECTOR_ELT Turn the constant cases into G_EXTRACTs. llvm-svn: 366115	2019-07-15 19:40:59 +00:00
Matt Arsenault	5dfd466032	AMDGPU/GlobalISel: Fix G_ICMP for wave32 llvm-svn: 366114	2019-07-15 19:39:31 +00:00
Matt Arsenault	434d664095	GlobalISel: Implement narrowScalar for vector extract/insert indexes llvm-svn: 366113	2019-07-15 19:37:34 +00:00
Matt Arsenault	90bdfb3daf	AMDGPU/GlobalISel: Widen vector extracts llvm-svn: 366103	2019-07-15 18:31:10 +00:00
Matt Arsenault	53fa759ff5	AMDGPU/GlobalISel: Handle llvm.amdgcn.if.break llvm-svn: 366102	2019-07-15 18:25:24 +00:00
Matt Arsenault	b390121efb	AMDGPU/GlobalISel: Select llvm.amdgcn.end.cf llvm-svn: 366099	2019-07-15 18:18:46 +00:00
Matt Arsenault	a65913e752	AMDGPU/GlobalISel: Select easy cases for G_BUILD_VECTOR llvm-svn: 366087	2019-07-15 17:26:43 +00:00
Matt Arsenault	cc02b17082	AMDGPU/GlobalISel: RegBankSelect for G_CONCAT_VECTORS llvm-svn: 366086	2019-07-15 17:20:40 +00:00
Matt Arsenault	51a05d72ae	AMDGPU: Drop remnants of byval support for shaders Before 2018, mesa used to use byval interchangeably with inreg, which didn't really make sense. Fix tests still using it to avoid breaking in a future commit. llvm-svn: 365953	2019-07-12 20:12:17 +00:00
Matt Arsenault	6ce1b4fec5	GlobalISel: Legalization for G_FMINNUM/G_FMAXNUM llvm-svn: 365658	2019-07-10 16:31:19 +00:00
Tom Stellard	d0ba79fe7b	AMDGPU/GlobalISel: Add support for wide loads >= 256-bits Summary: This adds support for the most commonly used wide load types: <8xi32>, <16xi32>, <4xi64>, and <8xi64> Reviewers: arsenm Reviewed By: arsenm Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57399 llvm-svn: 365586	2019-07-10 00:22:41 +00:00
Matt Arsenault	b1843e130a	GlobalISel: Implement lower for G_FCOPYSIGN In SelectionDAG AMDGPU treated these as legal, but this was mostly because the bitcasts required for FP types were painful. Theoretically the bitpattern should eventually match to bfi, so don't bother trying to get the patterns to import. llvm-svn: 365583	2019-07-09 23:34:29 +00:00
Matt Arsenault	3f1a34546c	AMDGPU/GlobalISel: Fix legality for G_BUILD_VECTOR llvm-svn: 365575	2019-07-09 22:48:04 +00:00
Matt Arsenault	14a4495155	GlobalISel: Combine unmerge of merge with intermediate cast This eliminates some illegal intermediate vectors when operations are scalarized. llvm-svn: 365566	2019-07-09 22:19:13 +00:00
Matt Arsenault	fdd761af15	AMDGPU/GlobalISel: Prepare some tests for store selection Mostly these would fail due to trying to use SI with a flat operation. Implementing global loads with MUBUF is more work than flat, so these won't be handled in the initial load selection. Others fail because store of s64 won't initially work, as the current set of patterns expect everything to be turned into v2i32. llvm-svn: 365493	2019-07-09 14:30:57 +00:00
Matt Arsenault	85ad662dfd	AMDGPU/GlobalISel: Fix test llvm-svn: 365491	2019-07-09 14:30:02 +00:00
Matt Arsenault	4dd5755d01	AMDGPU/GlobalISel: Legalize more concat_vectors llvm-svn: 365488	2019-07-09 14:17:31 +00:00
Matt Arsenault	6bdb92d833	AMDGPU/GlobalISel: Improve regbankselect for icmp s16 Account for 64-bit scalar eq/ne when available. llvm-svn: 365487	2019-07-09 14:13:09 +00:00
Matt Arsenault	8b8eee5904	AMDGPU/GlobalISel: Make s16 G_ICMP legal llvm-svn: 365486	2019-07-09 14:10:43 +00:00
Matt Arsenault	e6d10f97dd	AMDGPU/GlobalISel: Select G_SUB llvm-svn: 365484	2019-07-09 14:05:11 +00:00
Matt Arsenault	872f38be7e	AMDGPU/GlobalISel: Select G_UNMERGE_VALUES llvm-svn: 365483	2019-07-09 14:02:26 +00:00
Matt Arsenault	9b7ffc4e55	AMDGPU/GlobalISel: Select G_MERGE_VALUES llvm-svn: 365482	2019-07-09 14:02:20 +00:00
Matt Arsenault	43cbca50e4	GlobalISel: Fix widenScalar for pointer typed G_MERGE_VALUES llvm-svn: 365093	2019-07-03 23:08:06 +00:00
Amara Emerson	cac1151845	[AArch64][GlobalISel] Overhaul legalization & isel of shifts to select immediate forms. There are two main issues preventing us from generating immediate form shifts: 1) We have partial SelectionDAG imported support for G_ASHR and G_LSHR shift immediate forms, but they currently don't work because the amount type is expected to be an s64 constant, but we only legalize them to have homogenous types. To deal with this, first we introduce a custom legalizer to only custom legalize s32 shifts which have a constant operand into a s64. There is also an additional artifact combiner to fold zexts(g_constant) to a larger G_CONSTANT if it's legal, a counterpart to the anyext version committed in an earlier patch. 2) For G_SHL the importer can't cope with the pattern. For this I introduced an early selection phase in the arm64 selector to select these forms manually before the tablegen selector pessimizes it to a register-register variant. Differential Revision: https://reviews.llvm.org/D63910 llvm-svn: 364994	2019-07-03 01:49:06 +00:00
Matt Arsenault	50be3481d4	AMDGPU/GlobalISel: Try generated matcher with intrinsics llvm-svn: 364933	2019-07-02 14:52:16 +00:00
Matt Arsenault	a8bff4b963	AMDGPU/GlobalISel: Select mul llvm-svn: 364932	2019-07-02 14:52:14 +00:00
Matt Arsenault	dd7ca4faa5	GlobalISel: Define GINodeEquiv for G_UMULH/G_SMULH llvm-svn: 364931	2019-07-02 14:49:29 +00:00
Matt Arsenault	70a4d3f67c	AMDGPU/GlobalISel: Fix G_GEP with mixed SGPR/VGPR operands The register bank for the destination of the sample argument copy was wrong. We shouldn't be constraining each source to the result register bank. Allow constraining the original register to the right size. llvm-svn: 364928	2019-07-02 14:40:22 +00:00
Matt Arsenault	ed63399244	AMDGPU/GlobalISel: Select G_FENCE Manually select to workaround tablegen emitter emitting checks for G_CONSTANT. llvm-svn: 364927	2019-07-02 14:17:38 +00:00
Matt Arsenault	ce690544a6	GlobalISel: Add G_FENCE The pattern importer is for some reason emitting checks for G_CONSTANT for the immediate operands. llvm-svn: 364926	2019-07-02 14:16:39 +00:00
Matt Arsenault	c9f14f29f5	GlobalISel: Try to widen merges with other merges If the requested source type can be used as a merge source type, create a merge of merges. This avoids creating large, illegal extensions and bit-ops directly to the result type. llvm-svn: 364841	2019-07-01 19:36:10 +00:00

658 Commits