llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	eb6eb694e4	AMDGPU/GlobalISel: Allow selection of scalar min/max I believe all of the uniform/divergent pattern predicates are redundant and can be removed. The uniformity bit already influences the register class, and nothing has broken when I've removed this and others. llvm-svn: 372450	2019-09-21 02:37:33 +00:00
Nico Weber	03475adcf7	Revert r372366 "Use getTargetConstant for BLENDI, and add a test to catch it." This reverts commit `52621307bc`. Tests have been failing all night with [0/2] ACTION //llvm/test:check-llvm(//llvm/utils/gn/build/toolchain:unix) -- Testing: 33647 tests, 64 threads -- Testing: 0 .. 10.. UNRESOLVED: LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll (6943 of 33647) ****************** TEST 'LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll' FAILED **************** Test has no run line! ****************** Since there were other concerns on https://reviews.llvm.org/D67785, I'm just reverting for now. llvm-svn: 372383	2019-09-20 12:05:29 +00:00
Sterling Augustine	52621307bc	Use getTargetConstant for BLENDI, and add a test to catch it. Summary: This fixes a crasher introduced by r372338. Reviewers: echristo, arsenm Subscribers: jvesely, wdng, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67785 Tighten up the test case. llvm-svn: 372366	2019-09-20 02:29:16 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00
Hans Wennborg	13bdae8541	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > > Since intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. 
> > Most of the work here is to clean up target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, similar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_ instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372314	2019-09-19 12:33:07 +00:00
Matt Arsenault	bffbeecb44	AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.ds.swizzle llvm-svn: 372297	2019-09-19 04:11:17 +00:00
Matt Arsenault	494243597b	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store.format This needs special handling due to some subtargets that have a nonstandard register layout for f16 vectors Also reject some illegal types on other targets. llvm-svn: 372293	2019-09-19 02:35:08 +00:00
Matt Arsenault	67f1f6ff8c	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store llvm-svn: 372292	2019-09-19 02:30:27 +00:00
Matt Arsenault	838ff36553	AMDGPU/GlobalISel: RegBankSelect struct buffer load/store llvm-svn: 372291	2019-09-19 02:26:53 +00:00
Matt Arsenault	a62ef58346	AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.raw.buffer.{load\|store} llvm-svn: 372290	2019-09-19 02:25:09 +00:00
Matt Arsenault	a30d022db6	AMDGPU/GlobalISel: Attempt to RegBankSelect image intrinsics Images should always have 2 consecutive, mandatory SGPR arguments. llvm-svn: 372289	2019-09-19 02:23:06 +00:00
Matt Arsenault	22e2c09515	AMDGPU/GlobalISel: Fix RegBankSelect G_SMULH/G_UMULH pre-gfx9 The scalar versions were only introduced in gfx9. llvm-svn: 372286	2019-09-19 01:42:34 +00:00
Matt Arsenault	d8399d12cd	GlobalISel: Don't materialize immarg arguments to intrinsics Encode them directly as an imm argument to G_INTRINSIC. Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands. 
This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372285	2019-09-19 01:33:14 +00:00
Amara Emerson	9d64721ca5	[GlobalISel] Partially revert r371901. r371901 was overeager and widenScalarDst() and the like in the legalizer attempt to increment the insert point given in order to add new instructions after the currently legalizing inst. In cases where the insertion point is not exactly the current instruction, then callers need to de-compensate for the behaviour by decrementing the insertion iterator before calling them. It's not a nice state of affairs, for now just undo the problematic parts of the change. llvm-svn: 372050	2019-09-16 23:46:03 +00:00
Matt Arsenault	07b8597656	AMDGPU/GlobalISel: Fix some broken run lines llvm-svn: 371992	2019-09-16 14:14:40 +00:00
Matt Arsenault	1fc07d6648	AMDGPU/GlobalISel: Fix RegBankSelect for G_FRINT and G_FCEIL llvm-svn: 371991	2019-09-16 14:14:37 +00:00
Matt Arsenault	bf7524db35	AMDGPU/GlobalISel: Remove another illegal select test llvm-svn: 371990	2019-09-16 14:14:31 +00:00
Matt Arsenault	255d157672	AMDGPU/GlobalISel: Remove illegal select tests These fail in a release build. llvm-svn: 371955	2019-09-16 04:21:10 +00:00
Matt Arsenault	bc8de8a8da	AMDGPU/GlobalISel: Select SMRD loads for more types llvm-svn: 371954	2019-09-16 00:54:07 +00:00
Matt Arsenault	48b158acae	AMDGPU/GlobalISel: RegBankSelect for kill llvm-svn: 371953	2019-09-16 00:48:37 +00:00
Matt Arsenault	01c7f40de3	AMDGPU/GlobalISel: Legalize s1 source G_[SU]ITOFP llvm-svn: 371952	2019-09-16 00:37:10 +00:00
Matt Arsenault	60169ed613	AMDGPU/GlobalISel: Set type on vgpr live in special arguments Fixes assertion with workitem ID intrinsics used in non-kernel functions. llvm-svn: 371951	2019-09-16 00:33:00 +00:00
Matt Arsenault	9f52c1ea58	AMDGPU/GlobalISel: Select S16->S32 fptoint llvm-svn: 371950	2019-09-16 00:32:56 +00:00
Matt Arsenault	0a6123595f	AMDGPU/GlobalISel: Select s32->s16 G_[US]ITOFP llvm-svn: 371949	2019-09-16 00:29:12 +00:00
Matt Arsenault	f5d5cd205e	AMDGPU/GlobalISel: Fix VALU s16 fneg llvm-svn: 371948	2019-09-16 00:20:54 +00:00
Amara Emerson	02bcc86b08	[GlobalISel] Fix insertion point of new instructions to be after PHIs. For some reason we sometimes insert new instructions one instruction before the first non-PHI when legalizing. This can result in having non-PHI instructions before PHIs, which mean that PHI elimination doesn't catch them. Differential Revision: https://reviews.llvm.org/D67570 llvm-svn: 371901	2019-09-13 21:49:24 +00:00
Matt Arsenault	a4be3eff5c	AMDGPU/GlobalISel: Legalize s32->s16 G_SITOFP/G_UITOFP llvm-svn: 371811	2019-09-13 04:04:55 +00:00
Matt Arsenault	67d9349dad	AMDGPU/GlobalISel: Fix RegBankSelect for amdgcn.else llvm-svn: 371808	2019-09-13 03:55:49 +00:00
Matt Arsenault	638f802381	AMDGPU/GlobalISel: Select 16-bit VALU bit ops llvm-svn: 371807	2019-09-13 03:55:43 +00:00
Matt Arsenault	f457dd2bd4	AMDGPU/GlobalISel: Legalize G_FFLOOR llvm-svn: 371803	2019-09-13 01:48:15 +00:00
Matt Arsenault	4d33918034	AMDGPU/GlobalISel: Legalize G_FMAD Unlike SelectionDAG, treat this as a normally legalizable operation. In SelectionDAG this is supposed to only ever be formed if it's legal, but I've found that to be restricting. For AMDGPU this is contextually legal depending on whether denormal flushing is allowed in the use function. Technically we currently treat the denormal mode as a subtarget feature, so custom lowering could be avoided. However I consider this to be a defect, and this should be contextually dependent on the controllable rounding mode of the parent function. llvm-svn: 371800	2019-09-13 00:44:35 +00:00
Matt Arsenault	4a73c6eada	AMDGPU/GlobalISel: Select G_CTPOP llvm-svn: 371798	2019-09-13 00:11:20 +00:00
Guillaume Chatelet	48904e9452	[Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing Summary: This catches malformed mir files which specify alignment as log2 instead of pow2. See https://reviews.llvm.org/D65945 for reference. This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67433 llvm-svn: 371608	2019-09-11 11:16:48 +00:00
Matt Arsenault	4a23ae5e78	GlobalISel/TableGen: Handle REG_SEQUENCE patterns The scalar f64 patterns don't work yet because they fail on multiple results from the unused implicit def of scc in the result bit operation. llvm-svn: 371542	2019-09-10 17:57:33 +00:00
Matt Arsenault	e1895aba3d	AMDGPU/GlobalISel: Select G_FABS/G_FNEG f64 doesn't work yet because tablegen currently doesn't handle REG_SEQUENCE. This does regress some multi use VALU fneg cases since now the immediate remains in an SGPR, and more moves are used for legalizing the xor. This is a SIFixSGPRCopies deficiency. llvm-svn: 371540	2019-09-10 17:19:46 +00:00
Matt Arsenault	7df5b3fd26	AMDGPU/GlobalISel: Select cvt pk intrinsics llvm-svn: 371539	2019-09-10 17:17:05 +00:00
Matt Arsenault	37d1bda4f6	AMDGPU/GlobalISel: Select llvm.amdgcn.sffbh llvm-svn: 371538	2019-09-10 17:16:59 +00:00
Matt Arsenault	da027275c6	AMDGPU/GlobalISel: RegBankSelect for G_ZEXTLOAD/G_SEXTLOAD llvm-svn: 371536	2019-09-10 16:42:37 +00:00
Matt Arsenault	ad6a8b83cd	AMDGPU/GlobalISel: Legalize constant 32-bit loads Legalize by casting to a 64-bit constant address. This isn't how the DAG implements it, but it should. llvm-svn: 371535	2019-09-10 16:42:31 +00:00
Matt Arsenault	c0ceca5883	AMDGPU/GlobalISel: First pass at attempting to legalize load/stores There's still a lot more to do, but this handles decomposing due to alignment. I've gotten it to the point where nothing crashes or infinite loops the legalizer. llvm-svn: 371533	2019-09-10 16:20:14 +00:00
Matt Arsenault	a91f017ae3	AMDGPU/GlobalISel: Fix insert point when lowering fminnum/fmaxnum llvm-svn: 371471	2019-09-09 23:30:11 +00:00
Matt Arsenault	a0933e6df7	AMDGPU/GlobalISel: Legalize G_BUILD_VECTOR v2s16 Handle it the same way as G_BUILD_VECTOR_TRUNC. Arguably only G_BUILD_VECTOR_TRUNC should be legal for this, but G_BUILD_VECTOR will probably be more convenient in most cases. llvm-svn: 371440	2019-09-09 18:57:51 +00:00
Matt Arsenault	77e3e9cafd	AMDGPU/GlobalISel: Select llvm.amdgcn.class Also fixes missing SubtargetPredicate on f16 class instructions. llvm-svn: 371436	2019-09-09 18:29:45 +00:00
Matt Arsenault	d6c1f5bb15	AMDGPU/GlobalISel: Select fmed3 llvm-svn: 371435	2019-09-09 18:29:37 +00:00
Matt Arsenault	6ebf605851	AMDGPU: Use PatFrags to allow selecting custom nodes or intrinsics This enables GlobalISel to handle various intrinsics. The custom node pattern will be ignored, and the intrinsic will work. This will also allow SelectionDAG to directly select the intrinsics, but as they are all custom lowered to the nodes, this ends up leaving dead code in the table. Eventually either GlobalISel should add the equivalent of custom nodes, or intrinsics should be directly used. These each have different tradeoffs. There are a few more to handle, but these are easy to handle ones. Some others fail for other reasons. llvm-svn: 371432	2019-09-09 18:10:31 +00:00
Matt Arsenault	d2a9516a6d	AMDGPU: Move MnemonicAlias out of instruction def hierarchy Unfortunately MnemonicAlias defines a "Predicates" field just like an instruction or pattern, with a somewhat different interpretation. This ends up overriding the intended Predicates set by PredicateControl on the pseudoinstruction definitions with an empty list. This allowed incorrectly selecting instructions that should have been rejected due to the SubtargetPredicate from patterns on the instruction definition. This does remove the divergent predicate from the 64-bit shift patterns, which were already not used for the 32-bit shift, so I'm not sure what the point was. This also removes a second, redundant copy of the 64-bit divergent patterns. llvm-svn: 371427	2019-09-09 17:25:35 +00:00
Matt Arsenault	64ecca90d4	AMDGPU/GlobalISel: Implement LDS G_GLOBAL_VALUE Handle the simple case that lowers to a constant. llvm-svn: 371424	2019-09-09 17:13:44 +00:00
Matt Arsenault	182f9248e8	AMDGPU/GlobalISel: Legalize G_BUILD_VECTOR_TRUNC Treat this as legal on gfx9 since it can use S_PACK_* instructions for this. This isn't used by anything yet. The same will probably apply to 16-bit G_BUILD_VECTOR without the trunc. llvm-svn: 371423	2019-09-09 17:04:18 +00:00
Matt Arsenault	63e6d8db1c	AMDGPU/GlobalISel: Select atomic loads A new check for an explicitly atomic MMO is needed to avoid incorrectly matching pattern for non-atomic loads llvm-svn: 371418	2019-09-09 16:18:07 +00:00
Matt Arsenault	d8409b178e	AMDGPU/GlobalISel: Fix RegBankSelect for unaligned, uniform constant loads llvm-svn: 371416	2019-09-09 16:06:37 +00:00
Matt Arsenault	02eb308387	AMDGPU/GlobalISel: Fix regbankselect for uniform extloads There are no scalar extloads. llvm-svn: 371414	2019-09-09 16:03:45 +00:00
Matt Arsenault	ebbd6e4976	AMDGPU: Remove code address space predicates Fixes 8-byte, 8-byte aligned LDS loads. 16-byte case still broken due to not being reported as legal. llvm-svn: 371413	2019-09-09 16:02:07 +00:00
Matt Arsenault	c34b4036ff	AMDGPU/GlobalISel: Select G_PTR_MASK llvm-svn: 371412	2019-09-09 15:46:13 +00:00
Matt Arsenault	fdb7030117	AMDGPU/GlobalISel: Fix reg bank for uniform LDS loads The pointer is always a VGPR. Also fix hardcoding the pointer size to 64. llvm-svn: 371411	2019-09-09 15:44:16 +00:00
Matt Arsenault	2dd088ec7d	AMDGPU/GlobalISel: Use known bits for selection llvm-svn: 371409	2019-09-09 15:39:32 +00:00
Matt Arsenault	8e3bc9b572	AMDGPU/GlobalISel: Legalize wavefrontsize intrinsic llvm-svn: 371407	2019-09-09 15:20:49 +00:00
Matt Arsenault	3e45c70288	GlobalISel: Support physical register inputs in patterns llvm-svn: 371253	2019-09-06 20:32:37 +00:00
Matt Arsenault	4d90625271	AMDGPU/GlobalISel: Fix load/store of types in other address spaces There should probably be a size only matcher. llvm-svn: 371155	2019-09-06 00:36:06 +00:00
Matt Arsenault	f581d575ce	AMDGPU: Add intrinsics for address space identification The library currently uses ptrtoint and directly checks the queue ptr for this, which counts as a pointer capture. llvm-svn: 371009	2019-09-05 02:20:39 +00:00
Matt Arsenault	69b1a2ae65	AMDGPU/GlobalISel: Restore insert point when getting aperture Avoids SSA violations in a future patch. llvm-svn: 371008	2019-09-05 02:20:32 +00:00
Matt Arsenault	25156ae7ea	AMDGPU/GlobalISel: Fix placeholder value used for addrspacecast llvm-svn: 371007	2019-09-05 02:20:29 +00:00
Matt Arsenault	d51a3746d0	AMDGPU/GlobalISel: Fix assert on load from constant address llvm-svn: 371006	2019-09-05 02:20:25 +00:00
Matt Arsenault	2df41a8e38	AMDGPU/GlobalISel: Select G_BITREVERSE llvm-svn: 370980	2019-09-04 20:46:31 +00:00
Matt Arsenault	5ff310e298	GlobalISel: Add basic legalization for G_BITREVERSE llvm-svn: 370979	2019-09-04 20:46:15 +00:00
Matt Arsenault	d9af712da4	AMDGPU/GlobalISel: Make 16-bit constants legal This is mostly for the benefit of patterns which use 16-bit constants. llvm-svn: 370921	2019-09-04 16:19:45 +00:00
Matt Arsenault	cbd1782c79	AMDGPU/GlobalISel: Legalize sin/cos llvm-svn: 370402	2019-08-29 20:06:48 +00:00
Matt Arsenault	8ec5c10042	GlobalISel/TableGen: Handle setcc patterns This is a special case because one node maps to two different G_ instructions, and the operand order is changed. This mostly enables G_FCMP for AMDGPU. G_ICMP is still manually selected for now since it has the SALU and VALU complication to deal with. llvm-svn: 370280	2019-08-29 01:13:41 +00:00
Matt Arsenault	a8bbcbd006	AMDGPU/GlobalISel: Fix constraining scalar and/or/xor If the result register already had a register class assigned, the sources may not have been properly constrained. llvm-svn: 370150	2019-08-28 02:11:03 +00:00
Matt Arsenault	5c7e96dc26	AMDGPU/GlobalISel: Implement addrspacecast for 32-bit constant addrspace llvm-svn: 370140	2019-08-28 00:58:24 +00:00
Petar Avramovic	d568ed40e0	[GlobalISel] Fix narrowScalar for shifts to match algorithm from SDAG Fix typos. Use Hi and Lo prefixes for Or instead of LHS and RHS to match names of surrounding variables. Differential Revision: https://reviews.llvm.org/D66587 llvm-svn: 370062	2019-08-27 14:22:32 +00:00
Volkan Keles	277631e3b8	[GlobalISel] Legalizer: Retry combining illegal artifacts as long as there are new artifacts Summary: Currently, Legalizer aborts if it’s unable to legalize artifacts. However, it’s possible to combine them after processing the rest of the instructions because the legalization is likely to generate more artifacts that allow ArtifactCombiner to combine them away. Instead, move illegal artifacts to another list called RetryList and wait until all of the instructions in InstList are legalized. After that, check if there are any new artifacts and try to combine them again if that’s the case. If not, abort. The idea is similar to D59339, but the approach is a bit different. This patch fixes the issue described above, but the legalizer still may be unable to handle some cases depending on when to legalize artifacts. So, in the long run, we probably need a different legalization strategy that handles this dependency in a better way. Reviewers: dsanders, aditya_nandakumar, qcolombet, arsenm, aemerson, paquette Reviewed By: dsanders Subscribers: jvesely, wdng, nhaehnle, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65894 llvm-svn: 369805	2019-08-23 20:30:35 +00:00
Matt Arsenault	fba82858f2	GlobalISel: Don't create G_UADDE with constant false carry in The x86 tests are now broken (in particular add-scalar.ll now hits the DAG fallback) due to not handling G_UADDO. The DAG x86 backend has a custom lowering for this, so that will need to be implemented. llvm-svn: 369673	2019-08-22 17:29:17 +00:00
Matt Arsenault	954a012b4c	GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sources This is necessary for handling <3 x s16> on AMDGPU, assuming this should be handled as 2 separate legalization actions. The alternative would be for fewerElementsVector to handle 3->2. llvm-svn: 369547	2019-08-21 16:59:10 +00:00
Aditya Nandakumar	c65ac865c3	[GlobalISel]: Fix lowering of G_Shuffle_vector where we pick up the wrong source index https://reviews.llvm.org/D66182 llvm-svn: 368781	2019-08-14 01:23:33 +00:00
Aditya Nandakumar	615eee6402	[GlobalISel]: Fix lowering of G_SHUFFLE_VECTOR with scalar sources https://reviews.llvm.org/D66171 llvm-svn: 368753	2019-08-13 21:49:11 +00:00
Matt Arsenault	28215caa60	GlobalISel: Partially implement fewerElementsVector G_UNMERGE_VALUES Odd sized vectors aren't handled yet. llvm-svn: 368713	2019-08-13 16:26:28 +00:00
Matt Arsenault	690645bda0	GlobalISel: Implement lower for G_SHUFFLE_VECTOR llvm-svn: 368709	2019-08-13 16:09:07 +00:00
Daniel Sanders	e9a57c2b23	[globalisel] Add G_SEXT_INREG Summary: Targets often have instructions that can sign-extend certain cases faster than the equivalent shift-left/arithmetic-shift-right. Such cases can be identified by matching a shift-left/shift-right pair but there are some issues with this in the context of combines. For example, suppose you can sign-extend 8-bit up to 32-bit with a target extend instruction. %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity) %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_ASHR %2:_(s32), i32 1 would reasonably combine to: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 25 which no longer matches the special case. If your shifts and extend are equal cost, this would break even as a pair of shifts but if your shift is more expensive than the extend then it's cheaper as: %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8 %3:_(s32) = G_ASHR %2:_(s32), i32 1 It's possible to match the shift-pair in ISel and emit an extend and ashr. However, this is far from the only way to break this shift pair and make it hard to match the extends. Another example is that with the right known-zeros, this: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_MUL %2:_(s32), i32 2 can become: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 23 All upstream targets have been configured to lower it to the current G_SHL,G_ASHR pair but will likely want to make it legal in some cases to handle their faster cases. To follow-up: Provide a way to legalize based on the constant. At the moment, I'm thinking that the best way to achieve this is to provide the MI in LegalityQuery but that opens the door to breaking core principles of the legalizer (legality is not context sensitive). That said, it's worth noting that looking at other instructions and acting on that information doesn't violate this principle in itself. 
It's only a violation if, at the end of legalization, a pass that checks legality without being able to see the context would say an instruction might not be legal. That's a fairly subtle distinction so to give a concrete example, saying %2 in: %1 = G_CONSTANT 16 %2 = G_SEXT_INREG %0, %1 is legal is in violation of that principle if the legality of %2 depends on %1 being constant and/or being 16. However, legalizing to either: %2 = G_SEXT_INREG %0, 16 or: %1 = G_CONSTANT 16 %2:_(s32) = G_SHL %0, %1 %3:_(s32) = G_ASHR %2, %1 depending on whether %1 is constant and 16 does not violate that principle since both outputs are genuinely legal. Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61289 llvm-svn: 368487	2019-08-09 21:11:20 +00:00
Matt Arsenault	ff6b007772	AMDGPU/GlobalISel: Alternative mappings for constants Without context we assume SGPR. Allowing VGPR constants theoretically helps avoid a copy. This seems to not actually work now, and the choice isn't based on the use bank. llvm-svn: 367871	2019-08-05 14:40:26 +00:00
Matt Arsenault	d9d30a408e	GlobalISel: Lower scalarizing unmerge of a vector to shifts AMDGPU sometimes has legal s16 and <2 x s16> operations, but all registers are really 32-bit. An unmerge destination really should be widened to a 32-bit register. If widening a scalarizing vector with a target size that matches the vector size, bitcast to integer and extract the relevant bits with shifts. I'm not sure if this is the right place for this. This could arguably be part of widenScalar for the result. I also have a growing feeling that we're missing a bitcast legalize action. llvm-svn: 367604	2019-08-01 19:10:05 +00:00
Matt Arsenault	5faa533e47	GlobalISel: Fix widenScalar for G_MERGE_VALUES to pointer AMDGPU testcase isn't broken now, but will be in a future patch without this. llvm-svn: 367591	2019-08-01 18:13:16 +00:00
Fangrui Song	67a8d6c795	AMDGPU/GlobalISel: fix inst-select-load-local.mir in -DLLVM_ENABLE_ASSERTIONS=off builds after r367498 llvm-svn: 367514	2019-08-01 04:03:06 +00:00
Matt Arsenault	9952f46407	AMDGPU/GlobalISel: Fix flat load/store of pointer types llvm-svn: 367513	2019-08-01 03:57:42 +00:00
Matt Arsenault	57495268ac	AMDGPU/GlobalISel: Remove manual store select code This regresses the weird types that are newly treated as legal load types, but fixes incorrectly using flat instructions on SI. llvm-svn: 367512	2019-08-01 03:52:40 +00:00
Matt Arsenault	ae87b9f2c2	AMDGPU/GlobalISel: Select local atomic cmpxchg llvm-svn: 367511	2019-08-01 03:41:41 +00:00
Matt Arsenault	26cb53b260	AMDGPU/GlobalISel: Handle G_ATOMICRMW_FADD llvm-svn: 367509	2019-08-01 03:33:15 +00:00
Matt Arsenault	da5b9bfa95	AMDGPU/GlobalISel: Allow selection of DS atomicrmw llvm-svn: 367507	2019-08-01 03:29:01 +00:00
Matt Arsenault	3baf4d3418	AMDGPU/GlobalISel: Select simple local stores llvm-svn: 367504	2019-08-01 03:09:15 +00:00
Matt Arsenault	7bedceb5b2	GlobalISel: moreElementsVector for G_LOAD/G_STORE AMDGPU change and test is a placeholder until a future patch with complete handling. llvm-svn: 367503	2019-08-01 01:44:22 +00:00
Matt Arsenault	3594011de0	AMDGPU/GlobalISel: Select local loads llvm-svn: 367498	2019-08-01 00:53:38 +00:00
Matt Arsenault	9cf980d4a7	GlobalISel: Add G_ATOMICRMW_{FADD\|FSUB} llvm-svn: 367369	2019-07-30 23:56:30 +00:00
Austin Kerbow	c99f62e313	[AMDGPU/GlobalISel] Add llvm.amdgcn.fdiv.fast legalization. Reviewers: arsenm Reviewed By: arsenm Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64966 llvm-svn: 367344	2019-07-30 18:49:16 +00:00
Matt Arsenault	a9ea8a9aae	AMDGPU/GlobalISel: Handle most function return types handleAssignments gives up pretty easily on structs, and i8 values for some reason. The other case that doesn't work is when an implicit sret needs to be inserted if the return size exceeds the number of return registers. llvm-svn: 367082	2019-07-26 02:36:05 +00:00
Matt Arsenault	51d795d941	GlobalISel: Fold out unmerge to scalars from concat_vector Removes illegal intermediate vectors if an operation was lowering to concat_vectors, and the next operation is scalarized. llvm-svn: 367081	2019-07-26 02:22:23 +00:00
Matt Arsenault	0e7d8698b5	AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting exts The G_ANYEXT handling can end up reaching selectCOPY, which mutates the instruction in place. llvm-svn: 366915	2019-07-24 16:05:53 +00:00
Matt Arsenault	4668ea4072	AMDGPU/GlobalISel: Fix broken tests llvm-svn: 366688	2019-07-22 13:33:11 +00:00
Matt Arsenault	8d372008b1	AMDGPU/GlobalISel: Fix tests without asserts The legality check is only done under NDEBUG, so the failure cases are different in a release build. llvm-svn: 366680	2019-07-22 12:43:41 +00:00
Matt Arsenault	f3bfb85bce	AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spaces llvm-svn: 366621	2019-07-19 22:28:44 +00:00
Matt Arsenault	7df225dfc2	AMDGPU/GlobalISel: Fix MMO flags for kernel argument loads The DAG lowering sets dereferencable and invariant, not nontemporal. llvm-svn: 366597	2019-07-19 17:52:56 +00:00
Matt Arsenault	08494f6231	AMDGPU/GlobalISel: Selection for fminnum/fmaxnum v2f16 case doesn't work yet because the VOP3P complex patterns haven't been ported yet. llvm-svn: 366585	2019-07-19 14:42:40 +00:00
Matt Arsenault	b60a2ae40e	AMDGPU/GlobalISel: Support arguments with multiple registers Handles structs used directly in argument lists. llvm-svn: 366584	2019-07-19 14:29:30 +00:00
Matt Arsenault	fecf43eba3	AMDGPU/GlobalISel: Rewrite lowerFormalArguments This should now handle everything except structs passed as multiple registers. I think most of the packing logic should be handled by handleAssignments, but I'm unclear on what the contract is for multiple registers. This is copying how x86 handles this. This does change the behavior of the test_sgpr_alignment0 amdgpu_vs test. I don't think shader arguments should try to follow the alignment, and registers need to be repacked. I also don't think it matters, since I think the pointers are packed to the beginning of the argument list anyway. llvm-svn: 366582	2019-07-19 14:15:18 +00:00
Matt Arsenault	1022c0dfde	AMDGPU: Decompose all values to 32-bit pieces for calling conventions This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit. This should also help avoid issues graphics shaders have had with 64-bit values, and simplify argument lowering in globalisel. llvm-svn: 366578	2019-07-19 13:57:44 +00:00
Matt Arsenault	0966dd0d69	GlobalISel: Handle widenScalar of arbitrary G_MERGE_VALUES sources Extract the sources to the GCD of the original size and target size, padding with implicit_def as necessary. Also fix the case where the requested source type is wider than the original result type. This was ignoring the type, and just using the destination. Do the operation in the requested type and truncate back. llvm-svn: 366367	2019-07-17 20:22:44 +00:00
Matt Arsenault	914a59cad8	GlobalISel: Handle more cases for widenScalar of G_MERGE_VALUES Use an anyext to the requested type for the leftover operand to produce a slightly wider type, and then truncate the final merge. I have another implementation almost ready which handles arbitrary widens, but I think it produces worse code in this example (which I think is 90% due to not folding redundant copies or folding out implicit_def users), so I wanted to add this as a baseline first. llvm-svn: 366366	2019-07-17 20:22:38 +00:00
Nicolai Haehnle	8b7041a5c6	AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630 Reviewers: rampitec, mareko Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64807 Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc llvm-svn: 366314	2019-07-17 11:22:57 +00:00
Matt Arsenault	f8c8284455	AMDGPU/GlobalISel: Select G_ASHR llvm-svn: 366257	2019-07-16 20:31:25 +00:00
Matt Arsenault	e5b28b98e9	AMDGPU/GlobalISel: Select G_LSHR llvm-svn: 366256	2019-07-16 20:25:43 +00:00
Matt Arsenault	1b69fd275d	AMDGPU/GlobalISel: Select G_SHL I think this manages to not break the DAG handling with the divergent predicates because the standalone divergent patterns end up with a higher priority than the pattern on the instruction definition. The 16-bit versions don't work yet. llvm-svn: 366254	2019-07-16 20:15:30 +00:00
Matt Arsenault	2d10407719	AMDGPU/GlobalISel: Fix selection of private stores llvm-svn: 366249	2019-07-16 19:27:44 +00:00
Matt Arsenault	7161fb0be5	AMDGPU/GlobalISel: Select private loads llvm-svn: 366248	2019-07-16 19:22:21 +00:00
Matt Arsenault	dad1f89210	AMDGPU/GlobalISel: Select flat stores llvm-svn: 366246	2019-07-16 18:42:53 +00:00
Matt Arsenault	35c96598b1	AMDGPU/GlobalISel: Select flat loads Now that the patterns use the new PatFrag address space support, the only blocker to importing most load patterns is the addressing mode complex patterns. llvm-svn: 366237	2019-07-16 18:05:29 +00:00
Matt Arsenault	22c4a147a9	AMDGPU/GlobalISel: Fix test failures in release build Apparently the check for legal instructions during instruction select does not happen without an asserts build, so these would successfully select in release, and fail in debug. Make s16 and/or/xor legal. These can just be selected directly to the 32-bit operation, as is already done in SelectionDAG, so just make them legal. llvm-svn: 366210	2019-07-16 14:28:30 +00:00
Matt Arsenault	66ee934440	AMDGPU/GlobalISel: Allow scalar s1 and/or/xor If a 1-bit value is in a 32-bit VGPR, the scalar opcodes set SCC to whether the result is 0. If the inputs are SCC, these can be copied to a 32-bit SGPR to produce an SCC result. llvm-svn: 366125	2019-07-15 20:20:18 +00:00
Matt Arsenault	c8291c94f8	AMDGPU/GlobalISel: Select G_AND/G_OR/G_XOR llvm-svn: 366121	2019-07-15 19:50:07 +00:00
Matt Arsenault	ad19b50c00	AMDGPU/GlobalISel: Don't constrain source register of VCC copies This is a hack until I come up with a better way of dealing with the pseudo-register banks used for boolean values. If the use instruction constrains the register, the selector for the def instruction won't see that the bank was VCC. A 1-bit SReg_32 could ambiguously have been SCCRegBank or VCCRegBank in wave32. This is necessary to successfully select branches with an and/or/xor condition. llvm-svn: 366120	2019-07-15 19:48:36 +00:00
Matt Arsenault	e1b52f4180	AMDGPU/GlobalISel: Fix selecting vcc->vcc bank copies The extra test change is correct, although how it arrives there is a bug that needs work. With wave32, the test for isVCC ambiguously reports true for an SCC or VCC source. A new allocatable pseudo register class for SCC may be necessary. llvm-svn: 366119	2019-07-15 19:46:48 +00:00
Matt Arsenault	3bfdb54d88	AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC llvm-svn: 366118	2019-07-15 19:45:49 +00:00
Matt Arsenault	18b7133843	AMDGPU/GlobalISel: Fix handling of sgpr (not scc bank) s1 to VCC This was emitting a copy from a 32-bit register to a 64-bit. llvm-svn: 366117	2019-07-15 19:44:07 +00:00
Matt Arsenault	6ed315f89b	AMDGPU/GlobalISel: Custom legalize G_INSERT_VECTOR_ELT llvm-svn: 366116	2019-07-15 19:43:04 +00:00
Matt Arsenault	b0e04c018c	AMDGPU/GlobalISel: Custom legalize G_EXTRACT_VECTOR_ELT Turn the constant cases into G_EXTRACTs. llvm-svn: 366115	2019-07-15 19:40:59 +00:00
Matt Arsenault	5dfd466032	AMDGPU/GlobalISel: Fix G_ICMP for wave32 llvm-svn: 366114	2019-07-15 19:39:31 +00:00
Matt Arsenault	434d664095	GlobalISel: Implement narrowScalar for vector extract/insert indexes llvm-svn: 366113	2019-07-15 19:37:34 +00:00
Matt Arsenault	90bdfb3daf	AMDGPU/GlobalISel: Widen vector extracts llvm-svn: 366103	2019-07-15 18:31:10 +00:00
Matt Arsenault	53fa759ff5	AMDGPU/GlobalISel: Handle llvm.amdgcn.if.break llvm-svn: 366102	2019-07-15 18:25:24 +00:00
Matt Arsenault	b390121efb	AMDGPU/GlobalISel: Select llvm.amdgcn.end.cf llvm-svn: 366099	2019-07-15 18:18:46 +00:00
Matt Arsenault	a65913e752	AMDGPU/GlobalISel: Select easy cases for G_BUILD_VECTOR llvm-svn: 366087	2019-07-15 17:26:43 +00:00
Matt Arsenault	cc02b17082	AMDGPU/GlobalISel: RegBankSelect for G_CONCAT_VECTORS llvm-svn: 366086	2019-07-15 17:20:40 +00:00
Matt Arsenault	51a05d72ae	AMDGPU: Drop remnants of byval support for shaders Before 2018, mesa used to use byval interchangeably with inreg, which didn't really make sense. Fix tests still using it to avoid breaking in a future commit. llvm-svn: 365953	2019-07-12 20:12:17 +00:00
Matt Arsenault	6ce1b4fec5	GlobalISel: Legalization for G_FMINNUM/G_FMAXNUM llvm-svn: 365658	2019-07-10 16:31:19 +00:00
Tom Stellard	d0ba79fe7b	AMDGPU/GlobalISel: Add support for wide loads >= 256-bits Summary: This adds support for the most commonly used wide load types: <8xi32>, <16xi32>, <4xi64>, and <8xi64> Reviewers: arsenm Reviewed By: arsenm Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57399 llvm-svn: 365586	2019-07-10 00:22:41 +00:00
Matt Arsenault	b1843e130a	GlobalISel: Implement lower for G_FCOPYSIGN In SelectionDAG AMDGPU treated these as legal, but this was mostly because the bitcasts required for FP types were painful. Theoretically the bitpattern should eventually match to bfi, so don't bother trying to get the patterns to import. llvm-svn: 365583	2019-07-09 23:34:29 +00:00
Matt Arsenault	3f1a34546c	AMDGPU/GlobalISel: Fix legality for G_BUILD_VECTOR llvm-svn: 365575	2019-07-09 22:48:04 +00:00
Matt Arsenault	14a4495155	GlobalISel: Combine unmerge of merge with intermediate cast This eliminates some illegal intermediate vectors when operations are scalarized. llvm-svn: 365566	2019-07-09 22:19:13 +00:00
Matt Arsenault	fdd761af15	AMDGPU/GlobalISel: Prepare some tests for store selection Mostly these would fail due to trying to use SI with a flat operation. Implementing global loads with MUBUF is more work than flat, so these won't be handled in the initial load selection. Others fail because store of s64 won't initially work, as the current set of patterns expect everything to be turned into v2i32. llvm-svn: 365493	2019-07-09 14:30:57 +00:00
Matt Arsenault	85ad662dfd	AMDGPU/GlobalISel: Fix test llvm-svn: 365491	2019-07-09 14:30:02 +00:00
Matt Arsenault	4dd5755d01	AMDGPU/GlobalISel: Legalize more concat_vectors llvm-svn: 365488	2019-07-09 14:17:31 +00:00
Matt Arsenault	6bdb92d833	AMDGPU/GlobalISel: Improve regbankselect for icmp s16 Account for 64-bit scalar eq/ne when available. llvm-svn: 365487	2019-07-09 14:13:09 +00:00
Matt Arsenault	8b8eee5904	AMDGPU/GlobalISel: Make s16 G_ICMP legal llvm-svn: 365486	2019-07-09 14:10:43 +00:00
Matt Arsenault	e6d10f97dd	AMDGPU/GlobalISel: Select G_SUB llvm-svn: 365484	2019-07-09 14:05:11 +00:00
Matt Arsenault	872f38be7e	AMDGPU/GlobalISel: Select G_UNMERGE_VALUES llvm-svn: 365483	2019-07-09 14:02:26 +00:00
Matt Arsenault	9b7ffc4e55	AMDGPU/GlobalISel: Select G_MERGE_VALUES llvm-svn: 365482	2019-07-09 14:02:20 +00:00
Matt Arsenault	43cbca50e4	GlobalISel: Fix widenScalar for pointer typed G_MERGE_VALUES llvm-svn: 365093	2019-07-03 23:08:06 +00:00
Amara Emerson	cac1151845	[AArch64][GlobalISel] Overhaul legalization & isel of shifts to select immediate forms. There are two main issues preventing us from generating immediate form shifts: 1) We have partial SelectionDAG imported support for G_ASHR and G_LSHR shift immediate forms, but they currently don't work because the amount type is expected to be an s64 constant, but we only legalize them to have homogeneous types. To deal with this, first we introduce a custom legalizer to only custom legalize s32 shifts which have a constant operand into a s64. There is also an additional artifact combiner to fold zexts(g_constant) to a larger G_CONSTANT if it's legal, a counterpart to the anyext version committed in an earlier patch. 2) For G_SHL the importer can't cope with the pattern. For this I introduced an early selection phase in the arm64 selector to select these forms manually before the tablegen selector pessimizes it to a register-register variant. Differential Revision: https://reviews.llvm.org/D63910 llvm-svn: 364994	2019-07-03 01:49:06 +00:00
Matt Arsenault	50be3481d4	AMDGPU/GlobalISel: Try generated matcher with intrinsics llvm-svn: 364933	2019-07-02 14:52:16 +00:00
Matt Arsenault	a8bff4b963	AMDGPU/GlobalISel: Select mul llvm-svn: 364932	2019-07-02 14:52:14 +00:00
Matt Arsenault	dd7ca4faa5	GlobalISel: Define GINodeEquiv for G_UMULH/G_SMULH llvm-svn: 364931	2019-07-02 14:49:29 +00:00
Matt Arsenault	70a4d3f67c	AMDGPU/GlobalISel: Fix G_GEP with mixed SGPR/VGPR operands The register bank for the destination of the sample argument copy was wrong. We shouldn't be constraining each source to the result register bank. Allow constraining the original register to the right size. llvm-svn: 364928	2019-07-02 14:40:22 +00:00
Matt Arsenault	ed63399244	AMDGPU/GlobalISel: Select G_FENCE Manually select to workaround tablegen emitter emitting checks for G_CONSTANT. llvm-svn: 364927	2019-07-02 14:17:38 +00:00
Matt Arsenault	ce690544a6	GlobalISel: Add G_FENCE The pattern importer is for some reason emitting checks for G_CONSTANT for the immediate operands. llvm-svn: 364926	2019-07-02 14:16:39 +00:00
Matt Arsenault	c9f14f29f5	GlobalISel: Try to widen merges with other merges If the requested source type can be used as a merge source type, create a merge of merges. This avoids creating large, illegal extensions and bit-ops directly to the result type. llvm-svn: 364841	2019-07-01 19:36:10 +00:00
Matt Arsenault	bae3636f96	AMDGPU/GlobalISel: Handle more input argument intrinsics llvm-svn: 364836	2019-07-01 18:50:50 +00:00
Matt Arsenault	9e8e8c60fa	AMDGPU/GlobalISel: Lower kernarg segment ptr intrinsics llvm-svn: 364835	2019-07-01 18:49:01 +00:00
Matt Arsenault	756d81905f	AMDGPU/GlobalISel: Legalize workgroup ID intrinsics llvm-svn: 364834	2019-07-01 18:47:22 +00:00
Matt Arsenault	e2c86cce3a	AMDGPU/GlobalISel: Legalize workitem ID intrinsics Tests don't cover the masked input path since non-kernel arguments aren't lowered yet. Test is copied directly from the existing test, with 2 additions. llvm-svn: 364833	2019-07-01 18:45:36 +00:00
Matt Arsenault	e15770aec4	AMDGPU/GlobalISel: Custom lower control flow intrinsics Replace the brcond for the 2 cases that act as branches. For now follow how the current system works, although I think we can eventually get rid of the pseudos. llvm-svn: 364832	2019-07-01 18:40:23 +00:00
Matt Arsenault	4073b33786	AMDGPU/GlobalISel: Handle 16-bit SALU min/max This needs to be extended to s32, and expanded into cmp+select. This is relying on the fact that widenScalar happens to leave the instruction in place, but this isn't a guaranteed property of LegalizerHelper. llvm-svn: 364831	2019-07-01 18:33:37 +00:00
Matt Arsenault	5a7d5111e5	AMDGPU/GlobalISel: Lower SALU min/max to cmp+select Use a change observer to apply a register bank to the newly created intermediate result register. llvm-svn: 364830	2019-07-01 18:30:45 +00:00
Matt Arsenault	7f8c720939	AMDGPU/GlobalISel: Add tests for add legalization llvm-svn: 364828	2019-07-01 18:26:47 +00:00
Matt Arsenault	ef59cb6982	AMDGPU/GlobalISel: Legalize s16 add/sub/mul If this is scalar, promote to s32. Use a new observer class to assign the register bank of newly created registers. llvm-svn: 364827	2019-07-01 18:18:55 +00:00
Matt Arsenault	9470bb262b	AMDGPU/GlobalISel: Fix allowing non-boolean conditions for G_SELECT The condition register bank must be scc or vcc so that a copy will be inserted, which will be lowered to a compare. Currently greedy unnecessarily forces using a VCC select. llvm-svn: 364825	2019-07-01 18:13:12 +00:00
Matt Arsenault	b2ea20eedd	AMDGPU/GlobalISel: RegBankSelect for sendmsg/sendmsghalt llvm-svn: 364819	2019-07-01 17:40:18 +00:00
Matt Arsenault	40d1faf38f	AMDGPU/GlobalISel: Legalize s16 fcmp llvm-svn: 364817	2019-07-01 17:35:53 +00:00
Matt Arsenault	1094e6a814	AMDGPU/GlobalISel: RegBankSelect for DS ordered add/swap llvm-svn: 364811	2019-07-01 17:04:57 +00:00
Matt Arsenault	265059eaf6	AMDGPU/GlobalISel: RegBankSelect for amdgcn.writelane llvm-svn: 364808	2019-07-01 16:41:36 +00:00
Matt Arsenault	0a52e9d026	AMDGPU/GlobalISel: Complete implementation of G_GEP Also works around tablegen defect in selecting add with unused carry, but if we have to manually select GEP, might as well handle add manually. llvm-svn: 364806	2019-07-01 16:34:48 +00:00
Matt Arsenault	e1006259d8	AMDGPU/GlobalISel: Select G_PHI llvm-svn: 364805	2019-07-01 16:32:47 +00:00
Matt Arsenault	d810ff2588	AMDGPU/GlobalISel: Try to select VOP3 form of add There are several things broken, but at least emit the right thing for gfx9. The import of the pattern with the unused carry out seems to not work. Needs a special class for clamp, because OperandWithDefaultOps doesn't really work. llvm-svn: 364804	2019-07-01 16:27:32 +00:00
Matt Arsenault	62d64b0c30	AMDGPU/GlobalISel: RegBankSelect for readlane/readfirstlane llvm-svn: 364801	2019-07-01 16:19:39 +00:00
Tom Stellard	9e9dd30de3	AMDGPU/GlobalISel: Implement select for 32-bit G_ADD Reviewers: arsenm Reviewed By: arsenm Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58804 llvm-svn: 364797	2019-07-01 16:09:33 +00:00
Matt Arsenault	2ab25f9ceb	AMDGPU/GlobalISel: Select G_BRCOND for vcc llvm-svn: 364795	2019-07-01 16:06:02 +00:00
Matt Arsenault	cda82f0bb6	AMDGPU/GlobalISel: Select G_FRAME_INDEX llvm-svn: 364789	2019-07-01 15:48:18 +00:00
Matt Arsenault	fdf36729c7	AMDGPU/GlobalISel: Make s16 select legal This is easy to handle and avoids legalization artifacts which are likely to obscure combines. llvm-svn: 364787	2019-07-01 15:42:47 +00:00
Matt Arsenault	6464280eb0	AMDGPU/GlobalISel: Select G_BRCOND for scc conditions llvm-svn: 364786	2019-07-01 15:39:27 +00:00
Matt Arsenault	1daad91af6	AMDGPU/GlobalISel: Tolerate copies with no type set isVCC has the same bug, but isn't used in a context where it can cause a problem. llvm-svn: 364784	2019-07-01 15:23:04 +00:00
Matt Arsenault	4f64ade04c	AMDGPU/GlobalISel: Select src modifiers llvm-svn: 364782	2019-07-01 15:18:56 +00:00
Matt Arsenault	5bf850d52e	AMDGPU/GlobalISel: Fix RegBankSelect for G_FCANONICALIZE llvm-svn: 364768	2019-07-01 13:40:18 +00:00
Matt Arsenault	b5fc94f3e7	AMDGPU/GlobalISel: Fix RegBankSelect for G_BUILD_VECTOR llvm-svn: 364767	2019-07-01 13:40:17 +00:00
Matt Arsenault	89fc8bcdd6	AMDGPU/GlobalISel: Fail on store to 32-bit address space llvm-svn: 364766	2019-07-01 13:37:39 +00:00
Matt Arsenault	3b7668ae4b	AMDGPU/GlobalISel: Improve icmp selection coverage. Select s64 eq/ne scalar icmp. llvm-svn: 364765	2019-07-01 13:34:26 +00:00
Matt Arsenault	c23149f612	AMDGPU/GlobalISel: RegBankSelect for WWM/WQM llvm-svn: 364763	2019-07-01 13:30:12 +00:00
Matt Arsenault	facf69e844	AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote llvm-svn: 364762	2019-07-01 13:30:09 +00:00
Matt Arsenault	9f992c238a	AMDGPU/GlobalISel: Fix scc->vcc copy handling This was checking the size of the register with the value of the size, which happens to be exec. Also fix assuming VCC is 64-bit to fix wave32. Also remove some untested handling for physical registers which is skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical copy source. I'm not sure if this should be trying to handle this special case instead of dealing with this in copyPhysReg. llvm-svn: 364761	2019-07-01 13:22:07 +00:00
Matt Arsenault	5dafcb9b11	AMDGPU/GlobalISel: Use and instead of BFE with inline immediate Zext from s1 is the only case where this should do anything with the current legal extensions. llvm-svn: 364760	2019-07-01 13:22:06 +00:00
Matt Arsenault	01bb075c1f	GlobalISel: Add GINodeEquiv for min/max llvm-svn: 364759	2019-07-01 13:22:04 +00:00
Matt Arsenault	fbf67d88de	GlobalISel: Add DAG compat for G_FCANONICALIZE llvm-svn: 364758	2019-07-01 13:22:00 +00:00
Matt Arsenault	7889d4ce66	AMDGPU/GlobalISel: Add some more tests for icmp select llvm-svn: 364703	2019-06-29 00:55:16 +00:00
Matt Arsenault	0d45209757	AMDGPU/GlobalISel: RegBankSelect for update.dpp llvm-svn: 364701	2019-06-29 00:44:36 +00:00
Matt Arsenault	fd82cf4f4d	AMDGPU/GlobalISel: RegBankSelect for atomic.inc/atomic.dec llvm-svn: 364699	2019-06-29 00:39:20 +00:00
Matt Arsenault	adb1f21e52	AMDGPU/GlobalISel: RegBankSelect for some DS intrinsics llvm-svn: 364698	2019-06-29 00:33:13 +00:00
Matt Arsenault	5ea3c9adb2	AMDGPU/GlobalISel: RegBankSelect for icmp/fcmp intrinsics llvm-svn: 364696	2019-06-29 00:28:52 +00:00
Matt Arsenault	6aafb3068f	AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.fmas llvm-svn: 364695	2019-06-29 00:25:53 +00:00
Matt Arsenault	ade5162432	AMDGPU/GlobalISel: RegBankSelect for some simple leaf intrinsics llvm-svn: 364694	2019-06-29 00:22:28 +00:00
Diana Picus	c3dbe23977	[GlobalISel] Accept multiple vregs in lowerFormalArgs Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660. lowerCall will be refactored in the same way in follow-up patches. With this change, we forward the virtual registers generated for aggregates to CallLowering. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into a s64 instead of a p0. Added a test-case which illustrates the problem more clearly (it crashes without this patch) and fixed the existing test-case to expect p0. AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC. Mips doesn't support aggregates yet, so it's also NFC. x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument. Differential Revision: https://reviews.llvm.org/D63549 llvm-svn: 364510	2019-06-27 08:54:17 +00:00
Matt Arsenault	f4e51dd2cd	AMDGPU/GlobalISel: Fix broken test llvm-svn: 364316	2019-06-25 13:57:53 +00:00
Matt Arsenault	d7ffa2a948	AMDGPU: Select G_SEXT/G_ZEXT/G_ANYEXT llvm-svn: 364308	2019-06-25 13:18:11 +00:00
Matt Arsenault	25bc27965a	AMDGPU/GlobalISel: Fix regbankselect for amdgcn.class llvm-svn: 364262	2019-06-25 01:07:22 +00:00
Matt Arsenault	2100caf7f6	AMDGPU/GlobalISel: Add tests for regbankselect of v2s16 and/or/xor llvm-svn: 364244	2019-06-24 22:21:02 +00:00
Matt Arsenault	dbb6c03175	AMDGPU/GlobalISel: Select G_TRUNC llvm-svn: 364215	2019-06-24 18:02:18 +00:00

610 Commits