llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	bc8de8a8da	AMDGPU/GlobalISel: Select SMRD loads for more types llvm-svn: 371954	2019-09-16 00:54:07 +00:00
Matt Arsenault	48b158acae	AMDGPU/GlobalISel: RegBankSelect for kill llvm-svn: 371953	2019-09-16 00:48:37 +00:00
Matt Arsenault	01c7f40de3	AMDGPU/GlobalISel: Legalize s1 source G_[SU]ITOFP llvm-svn: 371952	2019-09-16 00:37:10 +00:00
Matt Arsenault	60169ed613	AMDGPU/GlobalISel: Set type on vgpr live in special arguments Fixes assertion with workitem ID intrinsics used in non-kernel functions. llvm-svn: 371951	2019-09-16 00:33:00 +00:00
Matt Arsenault	9f52c1ea58	AMDGPU/GlobalISel: Select S16->S32 fptoint llvm-svn: 371950	2019-09-16 00:32:56 +00:00
Matt Arsenault	0a6123595f	AMDGPU/GlobalISel: Select s32->s16 G_[US]ITOFP llvm-svn: 371949	2019-09-16 00:29:12 +00:00
Matt Arsenault	f5d5cd205e	AMDGPU/GlobalISel: Fix VALU s16 fneg llvm-svn: 371948	2019-09-16 00:20:54 +00:00
Amara Emerson	02bcc86b08	[GlobalISel] Fix insertion point of new instructions to be after PHIs. For some reason we sometimes insert new instructions one instruction before the first non-PHI when legalizing. This can result in having non-PHI instructions before PHIs, which mean that PHI elimination doesn't catch them. Differential Revision: https://reviews.llvm.org/D67570 llvm-svn: 371901	2019-09-13 21:49:24 +00:00
Alexander Timofeev	9ff70132bf	Revert for: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. llvm-svn: 371873	2019-09-13 17:37:30 +00:00
Matt Arsenault	a4be3eff5c	AMDGPU/GlobalISel: Legalize s32->s16 G_SITOFP/G_UITOFP llvm-svn: 371811	2019-09-13 04:04:55 +00:00
Matt Arsenault	67d9349dad	AMDGPU/GlobalISel: Fix RegBankSelect for amdgcn.else llvm-svn: 371808	2019-09-13 03:55:49 +00:00
Matt Arsenault	638f802381	AMDGPU/GlobalISel: Select 16-bit VALU bit ops llvm-svn: 371807	2019-09-13 03:55:43 +00:00
Matt Arsenault	f457dd2bd4	AMDGPU/GlobalISel: Legalize G_FFLOOR llvm-svn: 371803	2019-09-13 01:48:15 +00:00
Tim Shen	a31c521f5e	Temporarily revert r371640 "LiveIntervals: Split live intervals on multiple dead defs". It reveals a miscompile on Hexagon. See PR43302 for details. llvm-svn: 371802	2019-09-13 01:34:25 +00:00
Matt Arsenault	4d33918034	AMDGPU/GlobalISel: Legalize G_FMAD Unlike SelectionDAG, treat this as a normally legalizable operation. In SelectionDAG this is supposed to only ever formed if it's legal, but I've found that to be restricting. For AMDGPU this is contextually legal depending on whether denormal flushing is allowed in the use function. Technically we currently treat the denormal mode as a subtarget feature, so custom lowering could be avoided. However I consider this to be a defect, and this should be contextually dependent on the controllable rounding mode of the parent function. llvm-svn: 371800	2019-09-13 00:44:35 +00:00
Matt Arsenault	4a73c6eada	AMDGPU/GlobalISel: Select G_CTPOP llvm-svn: 371798	2019-09-13 00:11:20 +00:00
Matt Arsenault	b85c8c4bbd	LiveIntervals: Remove assertion This testcase is invalid, and caught by the verifier. For the verifier to catch it, the live interval computation needs to complete. Remove the assert so the verifier catches this, which is less confusing. In this testcase there is an undefined use of a subregister, and lanes which aren't used or defined. An equivalent testcase with the super-register shrunk to have no untouched lanes already hit this verifier error. llvm-svn: 371792	2019-09-12 23:46:51 +00:00
Matt Arsenault	8382ce5f1b	AMDGPU: Inline constant when materalizing FI with add on gfx9 This was relying on the SGPR usable for the carry out clobber to also be used for the input. There was no carry out on gfx9. With no carry out clobber to worry about, so the literal can just be directly used with a VOP2 add. llvm-svn: 371791	2019-09-12 23:46:46 +00:00
Qiu Chaofan	b7fb5d0f6f	[DAGCombiner] Improve division estimation of floating points. Current implementation of estimating divisions loses precision since it estimates reciprocal first and does multiplication. This patch is to re-order arithmetic operations in the last iteration in DAGCombiner to improve the accuracy. Reviewed By: Sanjay Patel, Jinsong Ji Differential Revision: https://reviews.llvm.org/D66050 llvm-svn: 371713	2019-09-12 07:51:24 +00:00
Austin Kerbow	666af6714c	AMDGPU: Move m0 initializations earlier Summary: After hoisting and merging m0 initializations schedule them as early as possible in the MBB. This helps the scheduler avoid hazards in some cases. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67450 llvm-svn: 371671	2019-09-11 21:28:41 +00:00
Michael Liao	7957d4c015	[AMDGPU] Fix crash in phi-elimination hook. Summary: - Pre-check in case there's just a single PHI insn. Reviewers: alex-t, rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D67451 llvm-svn: 371649	2019-09-11 19:55:20 +00:00
Matt Arsenault	81196a595c	LiveIntervals: Split live intervals on multiple dead defs If there are multiple dead defs of the same virtual register, these are required to be split into multiple virtual registers with separate live intervals to avoid a verifier error. llvm-svn: 371640	2019-09-11 17:59:21 +00:00
Guillaume Chatelet	48904e9452	[Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing Summary: This catches malformed mir files which specify alignment as log2 instead of pow2. See https://reviews.llvm.org/D65945 for reference, This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67433 llvm-svn: 371608	2019-09-11 11:16:48 +00:00
Matt Arsenault	4a23ae5e78	GlobalISel/TableGen: Handle REG_SEQUENCE patterns The scalar f64 patterns don't work yet because they fail on multiple results from the unused implicit def of scc in the result bit operation. llvm-svn: 371542	2019-09-10 17:57:33 +00:00
Matt Arsenault	e1895aba3d	AMDGPU/GlobalISel: Select G_FABS/G_FNEG f64 doesn't work yet because tablegen currently doesn't handlde REG_SEQUENCE. This does regress some multi use VALU fneg cases since now the immediate remains in an SGPR, and more moves are used for legalizing the xor. This is a SIFixSGPRCopies deficiency. llvm-svn: 371540	2019-09-10 17:19:46 +00:00
Matt Arsenault	7df5b3fd26	AMDGPU/GlobalISel: Select cvt pk intrinsics llvm-svn: 371539	2019-09-10 17:17:05 +00:00
Matt Arsenault	37d1bda4f6	AMDGPU/GlobalISel: Select llvm.amdgcn.sffbh llvm-svn: 371538	2019-09-10 17:16:59 +00:00
Matt Arsenault	da027275c6	AMDGPU/GlobalISel: RegBankSelect for G_ZEXTLOAD/G_SEXTLOAD llvm-svn: 371536	2019-09-10 16:42:37 +00:00
Matt Arsenault	ad6a8b83cd	AMDGPU/GlobalISel: Legalize constant 32-bit loads Legalize by casting to a 64-bit constant address. This isn't how the DAG implements it, but it should. llvm-svn: 371535	2019-09-10 16:42:31 +00:00
Matt Arsenault	c0ceca5883	AMDGPU/GlobalISel: First pass at attempting to legalize load/stores There's still a lot more to do, but this handles decomposing due to alignment. I've gotten it to the point where nothing crashes or infinite loops the legalizer. llvm-svn: 371533	2019-09-10 16:20:14 +00:00
Alexander Timofeev	c2d292f839	[AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Reviewers: rampitec, vpykhtin Differential Revision: https://reviews.llvm.org/D67101 llvm-svn: 371508	2019-09-10 10:58:57 +00:00
Matt Arsenault	a91f017ae3	AMDGPU/GlobalISel: Fix insert point when lowering fminnum/fmaxnum llvm-svn: 371471	2019-09-09 23:30:11 +00:00
Matt Arsenault	a0933e6df7	AMDGPU/GlobalISel: Legalize G_BUILD_VECTOR v2s16 Handle it the same way as G_BUILD_VECTOR_TRUNC. Arguably only G_BUILD_VECTOR_TRUNC should be legal for this, but G_BUILD_VECTOR will probably be more convenient in most cases. llvm-svn: 371440	2019-09-09 18:57:51 +00:00
Matt Arsenault	8bc05d7d60	AMDGPU: Make VReg_1 size be 1 This was getting chosen as the preferred 32-bit register class based on how TableGen selects subregister classes. llvm-svn: 371438	2019-09-09 18:43:29 +00:00
Matt Arsenault	77e3e9cafd	AMDGPU/GlobalISel: Select llvm.amdgcn.class Also fixes missing SubtargetPredicate on f16 class instructions. llvm-svn: 371436	2019-09-09 18:29:45 +00:00
Matt Arsenault	d6c1f5bb15	AMDGPU/GlobalISel: Select fmed3 llvm-svn: 371435	2019-09-09 18:29:37 +00:00
Matt Arsenault	6ebf605851	AMDGPU: Use PatFrags to allow selecting custom nodes or intrinsics This enables GlobalISel to handle various intrinsics. The custom node pattern will be ignored, and the intrinsic will work. This will also allow SelectionDAG to directly select the intrinsics, but as they are all custom lowered to the nodes, this ends up leaving dead code in the table. Eventually either GlobalISel should add the equivalent of custom nodes equivalent, or intrinsics should be directly used. These each have different tradeoffs. There are a few more to handle, but these are easy to handle ones. Some others fail for other reasons. llvm-svn: 371432	2019-09-09 18:10:31 +00:00
Matt Arsenault	d2a9516a6d	AMDGPU: Move MnemonicAlias out of instruction def hierarchy Unfortunately MnemonicAlias defines a "Predicates" field just like an instruction or pattern, with a somewhat different interpretation. This ends up overriding the intended Predicates set by PredicateControl on the pseudoinstruction defintions with an empty list. This allowed incorrectly selecting instructions that should have been rejected due to the SubtargetPredicate from patterns on the instruction definition. This does remove the divergent predicate from the 64-bit shift patterns, which were already not used for the 32-bit shift, so I'm not sure what the point was. This also removes a second, redundant copy of the 64-bit divergent patterns. llvm-svn: 371427	2019-09-09 17:25:35 +00:00
Matt Arsenault	64ecca90d4	AMDGPU/GlobalISel: Implement LDS G_GLOBAL_VALUE Handle the simple case that lowers to a constant. llvm-svn: 371424	2019-09-09 17:13:44 +00:00
Matt Arsenault	182f9248e8	AMDGPU/GlobalISel: Legalize G_BUILD_VECTOR_TRUNC Treat this as legal on gfx9 since it can use S_PACK_* instructions for this. This isn't used by anything yet. The same will probably apply to 16-bit G_BUILD_VECTOR without the trunc. llvm-svn: 371423	2019-09-09 17:04:18 +00:00
Matt Arsenault	63e6d8db1c	AMDGPU/GlobalISel: Select atomic loads A new check for an explicitly atomic MMO is needed to avoid incorrectly matching pattern for non-atomic loads llvm-svn: 371418	2019-09-09 16:18:07 +00:00
Matt Arsenault	d8409b178e	AMDGPU/GlobalISel: Fix RegBankSelect for unaligned, uniform constant loads llvm-svn: 371416	2019-09-09 16:06:37 +00:00
Matt Arsenault	02eb308387	AMDGPU/GlobalISel: Fix regbankselect for uniform extloads There are no scalar extloads. llvm-svn: 371414	2019-09-09 16:03:45 +00:00
Matt Arsenault	ebbd6e4976	AMDGPU: Remove code address space predicates Fixes 8-byte, 8-byte aligned LDS loads. 16-byte case still broken due to not be reported as legal. llvm-svn: 371413	2019-09-09 16:02:07 +00:00
Matt Arsenault	c34b4036ff	AMDGPU/GlobalISel: Select G_PTR_MASK llvm-svn: 371412	2019-09-09 15:46:13 +00:00
Matt Arsenault	fdb7030117	AMDGPU/GlobalISel: Fix reg bank for uniform LDS loads The pointer is always a VGPR. Also fix hardcoding the pointer size to 64. llvm-svn: 371411	2019-09-09 15:44:16 +00:00
Matt Arsenault	2dd088ec7d	AMDGPU/GlobalISel: Use known bits for selection llvm-svn: 371409	2019-09-09 15:39:32 +00:00
Matt Arsenault	8e3bc9b572	AMDGPU/GlobalISel: Legalize wavefrontsize intrinsic llvm-svn: 371407	2019-09-09 15:20:49 +00:00
Matt Arsenault	3e45c70288	GlobalISel: Support physical register inputs in patterns llvm-svn: 371253	2019-09-06 20:32:37 +00:00
Valery Pykhtin	e8ade89bb3	[AMDGPU] Enable constant offset promotion to immediate operand for VMEM stores Differential revision: https://reviews.llvm.org/D66958 llvm-svn: 371214	2019-09-06 15:33:53 +00:00
Jay Foad	6c0204c794	[AMDGPU] Mark s_barrier as having side effects but not accessing memory. Summary: This fixes poor scheduling in a function containing a barrier and a few load instructions. Without this fix, ScheduleDAGInstrs::buildSchedGraph adds an artificial edge in the dependency graph from the barrier instruction to the exit node representing live-out latency, with a latency of about 500 cycles. Because of this it thinks the critical path through the graph also has a latency of about 500 cycles. And because of that it does not think that any of the load instructions are on the critical path, so it schedules them with no regard for their (80 cycle) latency, which gives poor results. Reviewers: arsenm, dstuttard, tpr, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67218 llvm-svn: 371192	2019-09-06 10:07:28 +00:00
Matt Arsenault	4d90625271	AMDGPU/GlobalISel: Fix load/store of types in other address spaces There should probably be a size only matcher. llvm-svn: 371155	2019-09-06 00:36:06 +00:00
Matt Arsenault	60c8b8bcf2	AMDGPU: Allow getMemOperandWithOffset to analyze stack accesses Report soffset as a base register if the scratch resource can be ignored. llvm-svn: 371149	2019-09-05 23:54:35 +00:00
Matt Arsenault	59ff77ee38	AMDGPU: Fix emitting multiple stack loads for stack passed workitems The same stack is loaded for each workitem ID, and each use. Nothing prevents you from creating multiple fixed stack objects with the same offsets, so this was creating a load for each unique frame index, despite them being the same offset. Re-use the same frame index so the loads are CSEable. llvm-svn: 371148	2019-09-05 23:40:14 +00:00
Matt Arsenault	f581d575ce	AMDGPU: Add intrinsics for address space identification The library currently uses ptrtoint and directly checks the queue ptr for this, which counts as a pointer capture. llvm-svn: 371009	2019-09-05 02:20:39 +00:00
Matt Arsenault	69b1a2ae65	AMDGPU/GlobalISel: Restore insert point when getting aperture Avoids SSA violations in a future patch. llvm-svn: 371008	2019-09-05 02:20:32 +00:00
Matt Arsenault	25156ae7ea	AMDGPU/GlobalISel: Fix placeholder value used for addrspacecast llvm-svn: 371007	2019-09-05 02:20:29 +00:00
Matt Arsenault	d51a3746d0	AMDGPU/GlobalISel: Fix assert on load from constant address llvm-svn: 371006	2019-09-05 02:20:25 +00:00
Reid Kleckner	29ccc8523a	Use -mtriple to fix AMDGPU test sensitive to object file format GOTPCREL32 doesn't exist on COFF, so it isn't used when this test runs on Windows. llvm-svn: 371000	2019-09-05 00:34:01 +00:00
Matt Arsenault	2df41a8e38	AMDGPU/GlobalISel: Select G_BITREVERSE llvm-svn: 370980	2019-09-04 20:46:31 +00:00
Matt Arsenault	5ff310e298	GlobalISel: Add basic legalization for G_BITREVERSE llvm-svn: 370979	2019-09-04 20:46:15 +00:00
Matt Arsenault	84489b34f6	AMDGPU: Handle frame index expansion with no free SGPRs pre gfx9 Since an add instruction must produce an unused carry out, this requires additional SGPRs. This can be avoided by keeping the entire offset computation in SGPRs. If one SGPR is still available, this only costs one extra mov. If none are available, the entire computation can be done in place and reversed. This does assume the use is a VGPR operand. This was already assumed, and we currently only select frame indexes to VALU instructions. This should probably be fixed at some point to handle more possible MIR. llvm-svn: 370929	2019-09-04 17:12:57 +00:00
Matt Arsenault	d9af712da4	AMDGPU/GlobalISel: Make 16-bit constants legal This is mostly for the benefit of patterns which use 16-bit constants. llvm-svn: 370921	2019-09-04 16:19:45 +00:00
Jay Foad	6e18266aa4	Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0" Summary: D61491 caused us to use relocs when they're not strictly necessary, to refer to symbols in the text section. This is a pessimization and it's a problem for some loaders that don't support relocs yet. Reviewers: nhaehnle, arsenm, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65813 llvm-svn: 370667	2019-09-02 14:40:57 +00:00
Piotr Sobczak	252a584cbd	[AMDGPU] Add test Summary: Add test checking that the redundant immediate MOV instruction (by-product of handling phi nodes) is not found in the generated code. Reviewers: arsenm, anton-afanasyev, craig.topper, rtereshin, bogner Reviewed By: arsenm Subscribers: kzhuravl, yaxunl, dstuttard, tpr, t-tye, wdng, jvesely, nhaehnle, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63860 llvm-svn: 370634	2019-09-02 10:02:54 +00:00
Matt Arsenault	cbd1782c79	AMDGPU/GlobalISel: Legalize sin/cos llvm-svn: 370402	2019-08-29 20:06:48 +00:00
Jordan Rupprecht	f9f81289e6	Revert [MBP] Disable aggressive loop rotate in plain mode This reverts r369664 (git commit `51f48295cb`) It causes many benchmark regressions, internally and in llvm's benchmark suite. llvm-svn: 370398	2019-08-29 19:03:58 +00:00
Matt Arsenault	216d8ff60b	AMDGPU: Don't use frame virtual registers SGPR spills aren't really handled after SILowerSGPRSpills. In order to directly control what happens if the scavenger needs to spill, the scavenger needs to be used directly. There is an alternative to spilling in these contexts anyway since the frame register can be increment and restored. This does present another possible issue if spilling is needed for the unused carry out if an add is needed. I think this can be avoided by using a scalar add (although that clobbers SCC, which happens anyway). llvm-svn: 370281	2019-08-29 01:13:47 +00:00
Matt Arsenault	8ec5c10042	GlobalISel/TableGen: Handle setcc patterns This is a special case because one node maps to two different G_ instructions, and the operand order is changed. This mostly enables G_FCMP for AMDPGPU. G_ICMP is still manually selected for now since it has the SALU and VALU complication to deal with. llvm-svn: 370280	2019-08-29 01:13:41 +00:00
Ryan Taylor	3b1459ed7c	[AMDGPU] Adjust number of SGPRs available in Calling Convention This reduces the number of SGPRs due to some concerns about running out of SGPRs if you make all the SGPRs that aren't reserved available for the calling convention. Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1 llvm-svn: 370215	2019-08-28 15:00:45 +00:00
Matt Arsenault	a8bbcbd006	AMDGPU/GlobalISel: Fix constraining scalar and/or/xor If the result register already had a register class assigned, the sources may not have been properly constrained. llvm-svn: 370150	2019-08-28 02:11:03 +00:00
Matt Arsenault	5c7e96dc26	AMDGPU/GlobalISel: Implement addrspacecast for 32-bit constant addrspace llvm-svn: 370140	2019-08-28 00:58:24 +00:00
Matt Arsenault	2910184936	DAG: computeNumSignBits for MUL Copied directly from the IR version. Most of the testcases I've added for this are somewhat problematic because they really end up testing the yet to be implemented version for MUL_I24/MUL_U24. llvm-svn: 370099	2019-08-27 19:05:33 +00:00
Matt Arsenault	2797474dbb	AMDGPU: Add baseline test for num sign bits of mul llvm-svn: 370098	2019-08-27 19:01:02 +00:00
Matt Arsenault	0c096da02f	AMDGPU: Fix crash from inconsistent register types for v3i16/v3f16 This is something of a workaround since computeRegisterProperties seems to be doing the wrong thing. llvm-svn: 370086	2019-08-27 17:51:56 +00:00
Sanjay Patel	b516f1afdd	[DAGCombiner] cancel fnegs from multiplied operands of FMA (-X) * (-Y) + Z --> X * Y + Z This is a missing optimization that shows up as a potential regression in D66050, so we should solve it first. We appear to be partly missing this fold in IR as well. We do handle the simpler case already: (-X) * (-Y) --> X * Y And it might be beneficial to make the constraint less conservative (eg, if both operands are cheap, but not necessarily cheaper), but that causes infinite looping for the existing fmul transform. Differential Revision: https://reviews.llvm.org/D66755 llvm-svn: 370071	2019-08-27 15:17:46 +00:00
Petar Avramovic	d568ed40e0	[GlobalISel] Fix narrowScalar for shifts to match algorithm from SDAG Fix typos. Use Hi and Lo prefixes for Or instead of LHS and RHS to match names of surrounding variables. Differential Revision: https://reviews.llvm.org/D66587 llvm-svn: 370062	2019-08-27 14:22:32 +00:00
Matt Arsenault	0a6564980b	AMDGPU: Combine directly on mul24 intrinsics The problem these are supposed to work around can occur before the intrinsics are lowered into the nodes. Try to directly simplify them so they are matched before the bit assert operations can be optimized out. llvm-svn: 369994	2019-08-27 00:18:09 +00:00
Matt Arsenault	3b95986a32	AMDGPU: Run AMDGPUCodeGenPrepare after scalar opts The mul24 matching could interfere with SLSR and the other addressing mode related passes. This probably is not the optimal placement, but is an intermediate step. This should probably be moved after all the generic IR passes, particularly LSR. Moving this after LSR seems to help in some cases, and hurts others. As-is in this patch, in idiv-licm, it saves 1-2 instructions inside some of the loop bodies, but increases the number in others. Moving this later helps these loops. In the new lsr tests in mul24-pass-ordering, the intrinsic prevents introducing more instructions in the loop preheader, so moving this later ends up hurting them. This shouldn't be any worse than before the intrinsics were introduced in r366094, and LSR should probably be smarter. I think it's because it doesn't know the and inside the loop will be folded away. llvm-svn: 369991	2019-08-27 00:08:31 +00:00
Matt Arsenault	74115ef791	AMDGPU: Add baseline test for mul24 ordering issues llvm-svn: 369858	2019-08-24 22:22:38 +00:00
Matt Arsenault	b3dd381a73	AMDGPU: Introduce a flag to disable mul24 intrinsic formation llvm-svn: 369856	2019-08-24 22:14:41 +00:00
Matt Arsenault	c4dd1d1873	AMDGPU: Generate check lines Checking all the instructions will help catch LICM changes when passes are reordered. Also switch to using gfx9 since global stores make the relevant instructions more obvious. llvm-svn: 369855	2019-08-24 22:14:37 +00:00
Stanislav Mekhanoshin	8fe1245a0f	[AMDGPU] w/a for gfx908 mfma SrcC literal HW bug gfx908 ignores an mfma if SrcC is a literal. Differential Revision: https://reviews.llvm.org/D66670 llvm-svn: 369816	2019-08-23 22:09:58 +00:00
Volkan Keles	277631e3b8	[GlobalISel] Legalizer: Retry combining illegal artifacts as long as there new artifacts Summary: Currently, Legalizer aborts if it’s unable to legalize artifacts. However, it’s possible to combine them after processing the rest of the instruction because the legalization is likely to generate more artifacts that allow ArtifactCombiner to combine away them. Instead, move illegal artifacts to another list called RetryList and wait until all of the instruction in InstList are legalized. After that, check if there is any new artifacts and try to combine them again if that’s the case. If not, abort. The idea is similar to D59339, but the approach is a bit different. This patch fixes the issue described above, but the legalizer still may be unable to handle some cases depending on when to legalize artifacts. So, in the long run, we probably need a different legalization strategy that handles this dependency in a better way. Reviewers: dsanders, aditya_nandakumar, qcolombet, arsenm, aemerson, paquette Reviewed By: dsanders Subscribers: jvesely, wdng, nhaehnle, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65894 llvm-svn: 369805	2019-08-23 20:30:35 +00:00
Amaury Sechet	05f56a1ddd	[AMDGPU] Automatically generate various tests. NFC llvm-svn: 369787	2019-08-23 17:58:49 +00:00
Jay Foad	eac23862a8	[AMDGPU] gfx10 atomic optimizer changes. Summary: Add support for gfx10, where all DPP operations are confined to work within a single row of 16 lanes, and wave32. Reviewers: arsenm, sheredom, critson, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65644 llvm-svn: 369745	2019-08-23 10:07:43 +00:00
Matt Arsenault	fba82858f2	GlobalISel: Don't create G_UADDE with constant false carry in The x86 tests are now broken (in paticular add-scalar.ll now hits the DAG fallback) due to not handling G_UADDO. The DAG x86 backend has a custom lowering for this, so that will need to be implemented. llvm-svn: 369673	2019-08-22 17:29:17 +00:00
Guozhi Wei	51f48295cb	[MBP] Disable aggressive loop rotate in plain mode Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 369664	2019-08-22 16:21:32 +00:00
Matt Arsenault	954a012b4c	GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sources This is necessary for handling <3 x s16> on AMDGPU, assuming this should be handled as 2 separate legalization actions. The alternative would be for fewerElementsVector to handle 3->2. llvm-svn: 369547	2019-08-21 16:59:10 +00:00
Alexander Timofeev	78347c979e	[AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions Differential Revision: https://reviews.llvm.org/D63731 Reviewers: qcolombet, rampitec llvm-svn: 369532	2019-08-21 15:15:04 +00:00
Martin Storsjo	100957153a	[test] Fix tests when run on windows after SVN r369426. NFC. When running tests on windows, invoking "llc -march=<arch>" will implicitly use windows as the target os, making these tests misbehave after this change. Fix the issue by using more specific -mtriple values instead of plain -march in these tests. This should hopefully fix buildbot failures like http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/9816. llvm-svn: 369443	2019-08-20 20:58:02 +00:00
Matt Arsenault	4b7fc85c0b	Revert "AMDGPU: Fix iterator error when lowering SI_END_CF" This reverts r367500 and r369203. This is causing various test failures. llvm-svn: 369417	2019-08-20 17:45:25 +00:00
Matt Arsenault	479f3bdb2c	AMDGPU: Fix iterator error when lowering SI_END_CF If the instruction is the last in the block, there is no next instruction but the iteration still needs to look at the new block. llvm-svn: 369203	2019-08-18 00:20:44 +00:00
Matt Arsenault	0b864bb043	AMDGPU: Reduce number of registers in test Once the failure this is testing is fixed, this would fail due to using too many registers. llvm-svn: 368901	2019-08-14 19:09:48 +00:00
Aditya Nandakumar	c65ac865c3	[GlobalISel]: Fix lowering of G_Shuffle_vector where we pick up the wrong source index https://reviews.llvm.org/D66182 llvm-svn: 368781	2019-08-14 01:23:33 +00:00
Aditya Nandakumar	615eee6402	[GlobalISel]: Fix lowering of G_SHUFFLE_VECTOR with scalar sources https://reviews.llvm.org/D66171 llvm-svn: 368753	2019-08-13 21:49:11 +00:00
Tim Renouf	10db641aab	[AMDGPU] Fix to 'Fold readlane from copy of SGPR or imm' That change (r363670) could leave a copy from vgpr to sgpr. Fixed. Differential Revision: https://reviews.llvm.org/D66133 Change-Id: I00c3fe6fda2e8e1e36f53195b881b1449c777ea4 llvm-svn: 368736	2019-08-13 18:57:55 +00:00
Matt Arsenault	28215caa60	GlobalISel: Partially implement fewerElementsVector G_UNMERGE_VALUES Odd sized vectors aren't handled yet. llvm-svn: 368713	2019-08-13 16:26:28 +00:00
Matt Arsenault	690645bda0	GlobalISel: Implement lower for G_SHUFFLE_VECTOR llvm-svn: 368709	2019-08-13 16:09:07 +00:00
Stanislav Mekhanoshin	4c9c98f36b	[AMDGPU] Printf runtime binding pass This pass is a port of the according pass from the HSAIL compiler. It parses printf calls and setup runtime printf buffer. After that it copies printf arguments to the buffer and fills in module metadata for runtime. Differential Revision: https://reviews.llvm.org/D24035 llvm-svn: 368592	2019-08-12 17:12:29 +00:00
Hans Wennborg	a45f301f7a	Revert r368339 "[MBP] Disable aggressive loop rotate in plain mode" It caused assertions to fire when building Chromium: lib/CodeGen/LiveDebugValues.cpp:331: bool {anonymous}::LiveDebugValues::OpenRangesSet::empty() const: Assertion `Vars.empty() == VarLocs.empty() && "open ranges are inconsistent"' failed. See https://crbug.com/992871#c3 for how to reproduce. > Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. > > To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. > > Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 368579	2019-08-12 14:23:13 +00:00
Daniel Sanders	e9a57c2b23	[globalisel] Add G_SEXT_INREG Summary: Targets often have instructions that can sign-extend certain cases faster than the equivalent shift-left/arithmetic-shift-right. Such cases can be identified by matching a shift-left/shift-right pair but there are some issues with this in the context of combines. For example, suppose you can sign-extend 8-bit up to 32-bit with a target extend instruction. %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity) %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_ASHR %2:_(s32), i32 1 would reasonably combine to: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 25 which no longer matches the special case. If your shifts and extend are equal cost, this would break even as a pair of shifts but if your shift is more expensive than the extend then it's cheaper as: %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8 %3:_(s32) = G_ASHR %2:_(s32), i32 1 It's possible to match the shift-pair in ISel and emit an extend and ashr. However, this is far from the only way to break this shift pair and make it hard to match the extends. Another example is that with the right known-zeros, this: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_MUL %2:_(s32), i32 2 can become: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 23 All upstream targets have been configured to lower it to the current G_SHL,G_ASHR pair but will likely want to make it legal in some cases to handle their faster cases. To follow-up: Provide a way to legalize based on the constant. At the moment, I'm thinking that the best way to achieve this is to provide the MI in LegalityQuery but that opens the door to breaking core principles of the legalizer (legality is not context sensitive). That said, it's worth noting that looking at other instructions and acting on that information doesn't violate this principle in itself. It's only a violation if, at the end of legalization, a pass that checks legality without being able to see the context would say an instruction might not be legal. That's a fairly subtle distinction so to give a concrete example, saying %2 in: %1 = G_CONSTANT 16 %2 = G_SEXT_INREG %0, %1 is legal is in violation of that principle if the legality of %2 depends on %1 being constant and/or being 16. However, legalizing to either: %2 = G_SEXT_INREG %0, 16 or: %1 = G_CONSTANT 16 %2:_(s32) = G_SHL %0, %1 %3:_(s32) = G_ASHR %2, %1 depending on whether %1 is constant and 16 does not violate that principle since both outputs are genuinely legal. Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61289 llvm-svn: 368487	2019-08-09 21:11:20 +00:00
Guozhi Wei	80347c3acc	[MBP] Disable aggressive loop rotate in plain mode Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 368339	2019-08-08 20:25:23 +00:00
Tim Renouf	5a0794327a	[StructurizeCFG] Enable -structurizecfg-relaxed-uniform-regions by default D62198 introduced an option to relax the checks for hasOnlyUniformBranches. This commit turns the option on by default, for better code generation in some cases in AMDGPU. Differential Revision: https://reviews.llvm.org/D63198 Change-Id: I9cbff002a1e74d3b7eb96b4192dc8129936d537d llvm-svn: 368042	2019-08-06 14:30:19 +00:00
Austin Kerbow	a05c384132	Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10 Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367969	2019-08-06 02:16:11 +00:00
Dmitri Gribenko	37aa8ad663	Revert "[AMDGPU] Use S_DENORM_MODE for gfx10" This reverts commit r367882. It broke the test MC/Disassembler/AMDGPU/gfx10_dasm_all.txt. llvm-svn: 367904	2019-08-05 18:36:43 +00:00
Austin Kerbow	8d229dbb47	[AMDGPU] Use S_DENORM_MODE for gfx10 Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367882	2019-08-05 16:09:49 +00:00
Tom Stellard	e15d95a987	AMDGPU/LoadStoreOptimizer: Set the correct offset whem merging MMOs Summary: This is a follow up to r367237. MachineFunction::getMachineMemOperand() adds the offset parameter to the existing offset instead of resetting it. So we need to reset the offset to the correct value after calling this function. Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65557 llvm-svn: 367881	2019-08-05 16:08:44 +00:00
Matt Arsenault	3922392969	AMDGPU: Correct behavior of f16 buffer loads Don't assume format loads for f16. Also fixes support for targets without i16. llvm-svn: 367879	2019-08-05 15:59:07 +00:00
Matt Arsenault	0e0a1c80fb	AMDGPU: Correct behavior of f16/i16 non-format store intrinsics This was switching to use a format store for a non-format store for f16 types. Also fixes i16/f16 stores on targets without legal f16. The corresponding loads also need to be fixed. llvm-svn: 367872	2019-08-05 14:57:59 +00:00
Matt Arsenault	ff6b007772	AMDGPU/GlobalISel: Alternative mappings for constants Without context we assume SGPR. Allowing VGPR constants theoretically helps avoid a copy. This seems to not actually work now, and the choice isn't based on the use bank. llvm-svn: 367871	2019-08-05 14:40:26 +00:00
Nicolai Haehnle	e204786b6c	AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec} Summary: Wrapping increment/decrement. These aren't exposed by many APIs... Change-Id: I1df25c7889de5a5ba76468ad8e8a2597efa9af6c Reviewers: arsenm, tpr, dstuttard Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65283 llvm-svn: 367821	2019-08-05 09:36:06 +00:00
Tim Northover	a009a60a91	IR: print value numbers for unnamed function arguments For consistency with normal instructions and clarity when reading IR, it's best to print the %0, %1, ... names of function arguments in definitions. Also modifies the parser to accept IR in that form for obvious reasons. llvm-svn: 367755	2019-08-03 14:28:34 +00:00
Simon Pilgrim	f7d9c43a4a	[AMDGPU] Regenerated saddo.ll test file for D47927 llvm-svn: 367698	2019-08-02 17:52:55 +00:00
Matt Arsenault	d9d30a408e	GlobalISel: Lower scalarizing unmerge of a vector to shifts AMDGPU sometimes has legal s16 and <2 x s16> operations, but all registers are really 32-bit. An unmerge destination really should ben widened to a 32-bit register. If widening a scalarizing vector with a target size that matches the vector size, bitcast to integer and extract the relevant bits with shifts. I'm not sure if this is the right place for this. This could arguably be part of widenScalar for the result. I also have a growing feeling that we're missing a bitcast legalize action. llvm-svn: 367604	2019-08-01 19:10:05 +00:00
Matt Arsenault	bb582ebdba	AMDGPU: Remove v0 workaround for DS_GWS_* instructions Any register should work for the src field since r366067, since the used value is not pulled from the expected encoding field. llvm-svn: 367598	2019-08-01 18:41:32 +00:00
Matt Arsenault	5faa533e47	GlobalISel: Fix widenScalar for G_MERGE_VALUES to pointer AMDGPU testcase isn't broken now, but will be in a future patch without this. llvm-svn: 367591	2019-08-01 18:13:16 +00:00
David Zarzycki	4f1d893f9e	[Testing] Fix tests that break with read-only checkouts Found with `mount --bind -o ro ...` on Linux. llvm-svn: 367519	2019-08-01 06:41:40 +00:00
Fangrui Song	67a8d6c795	AMDGPU/GlobalISel: fix inst-select-load-local.mir in -DLLVM_ENABLE_ASSERTIONS=off builds after r367498 llvm-svn: 367514	2019-08-01 04:03:06 +00:00
Matt Arsenault	9952f46407	AMDGPU/GlobalISel: Fix flat load/store of pointer types llvm-svn: 367513	2019-08-01 03:57:42 +00:00
Matt Arsenault	57495268ac	AMDGPU/GlobalISel: Remove manual store select code This regresses the weird types that are newly treated as legal load types, but fixes incorrectly using flat instrucions on SI. llvm-svn: 367512	2019-08-01 03:52:40 +00:00
Matt Arsenault	ae87b9f2c2	AMDGPU/GlobalISel: Select local atomic cmpxchg llvm-svn: 367511	2019-08-01 03:41:41 +00:00
Matt Arsenault	26cb53b260	AMDGPU/GlobalISel: Handle G_ATOMICRMW_FADD llvm-svn: 367509	2019-08-01 03:33:15 +00:00
Matt Arsenault	da5b9bfa95	AMDGPU/GlobalISel: Allow selection of DS atomicrmw llvm-svn: 367507	2019-08-01 03:29:01 +00:00
Matt Arsenault	3baf4d3418	AMDGPU/GlobalISel: Select simple local stores llvm-svn: 367504	2019-08-01 03:09:15 +00:00
Matt Arsenault	7bedceb5b2	GlobalISel: moreElementsVector for G_LOAD/G_STORE AMDGPU change and test is a placeholder until a future patch with complete handling. llvm-svn: 367503	2019-08-01 01:44:22 +00:00
Matt Arsenault	d48324ff6f	Reapply "AMDGPU: Split block for si_end_cf" This reverts commit r359363, reapplying r357634 llvm-svn: 367500	2019-08-01 01:25:27 +00:00
Matt Arsenault	3594011de0	AMDGPU/GlobalISel: Select local loads llvm-svn: 367498	2019-08-01 00:53:38 +00:00
Michael Berg	005d705d43	Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize NoSignedZerosFPMath options control Summary: Honoring no signed zeroes is also available as a user control through clang separately regardless of fastmath or UnsafeFPMath context, DAG guards should reflect this context. Reviewers: spatel, arsenm, hfinkel, wristow, craig.topper Reviewed By: spatel Subscribers: rampitec, foad, nhaehnle, wuzish, nemanjai, jvesely, wdng, javed.absar, MaskRay, jsji Differential Revision: https://reviews.llvm.org/D65170 llvm-svn: 367486	2019-07-31 21:57:28 +00:00
Stanislav Mekhanoshin	2594fa8593	[AMDGPU] Fix high occupancy calculation and print it We had couple places which still return 10 as a maximum occupancy. Fixed. Also print comment about occupancy as compiler see it. Differential Revision: https://reviews.llvm.org/D65423 llvm-svn: 367381	2019-07-31 01:07:10 +00:00
Matt Arsenault	9cf980d4a7	GlobalISel: Add G_ATOMICRMW_{FADD\|FSUB} llvm-svn: 367369	2019-07-30 23:56:30 +00:00
Stanislav Mekhanoshin	450afcea39	[AMDGPU] Reserve all AGPRs on targets which do not have them Differential Revision: https://reviews.llvm.org/D65471 llvm-svn: 367347	2019-07-30 19:29:33 +00:00
Austin Kerbow	c99f62e313	[AMDGPU/GlobalISel] Add llvm.amdgcn.fdiv.fast legalization. Reviewers: arsenm Reviewed By: arsenm Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64966 llvm-svn: 367344	2019-07-30 18:49:16 +00:00
Tom Stellard	cc0bc941d4	AMDGPU/LoadStoreOptimizer: combine MMOs when merging instructions Summary: The LoadStoreOptimizer was creating instructions with 2 MachineMemOperands, which meant they were assumed to alias with all other instructions, because MachineInstr:mayAlias() returns true when an instruction has multiple MachineMemOperands. This was preventing these instructions from being merged again, and was giving the scheduler less freedom to reorder them. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65036 llvm-svn: 367237	2019-07-29 16:40:58 +00:00
Jay Foad	2bd9da8a72	[AMDGPU] Add amdgpu_kernel for consistency with other tests llvm-svn: 367221	2019-07-29 11:48:17 +00:00
Jay Foad	dcb7532479	[DivergenceAnalysis] Add methods for querying divergence at use Summary: The existing isDivergent(Value) methods query whether a value is divergent at its definition. However even if a value is uniform at its definition, a use of it in another basic block can be divergent because of divergent control flow between the def and the use. This patch adds new isDivergent(Use) methods to DivergenceAnalysis, LegacyDivergenceAnalysis and GPUDivergenceAnalysis. This might allow D63953 or other similar workarounds to be removed. Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue Reviewed By: nhaehnle Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65141 llvm-svn: 367218	2019-07-29 10:22:09 +00:00
Simon Pilgrim	251b546f1b	[AMDGPU] Regenerate v2i16 insertelement tests. To help show the diffs from an upcoming SimplifyDemandedBits patch. llvm-svn: 367213	2019-07-29 09:47:07 +00:00
David Stuttard	20235ef3e7	[AMDGPU] Enable v4f16 and above for v_pk_fma instructions Summary: If isel is presented with <2 x half> vectors then it will correctly select v_pk_fma style instructions. If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for other instruction types (such as fadd, fmul etc.) Added extra support to enable this. Updated one of the tests to include a test for this (as well as extending the test to GFX9) Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65325 Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee llvm-svn: 367206	2019-07-29 08:15:10 +00:00
Hideto Ueno	cc0a4cdc89	[FunctionAttrs] Annotate "willreturn" for intrinsics Summary: In D62801, new function attribute `willreturn` was introduced. In short, a function with `willreturn` is guaranteed to come back to the call site(more precise definition is in LangRef). In this patch, willreturn is annotated for LLVM intrinsics. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: jvesely, nhaehnle, sstefan1, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64904 llvm-svn: 367184	2019-07-28 06:09:56 +00:00
Simon Pilgrim	062cd8bb1d	[AMDGPU] Regenerate tests. To help show the diffs from an upcoming SimplifyDemandedBits patch. llvm-svn: 367175	2019-07-27 14:32:23 +00:00
Simon Pilgrim	8a52671782	[SelectionDAG] Check for any recursion depth greater than or equal to limit instead of just equal the limit. If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit. This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it. This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches. llvm-svn: 367171	2019-07-27 12:48:46 +00:00
Carl Ritson	00e89b428b	[AMDGPU] Add llvm.amdgcn.softwqm intrinsic Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm only if there is other WQM computation in the shader. Reviewers: nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64935 llvm-svn: 367097	2019-07-26 09:54:12 +00:00
Matt Arsenault	a9ea8a9aae	AMDGPU/GlobalISel: Handle most function return types handleAssignments gives up pretty easily on structs, and i8 values for some reason. The other case that doesn't work is when an implicit sret needs to be inserted if the return size exceeds the number of return registers. llvm-svn: 367082	2019-07-26 02:36:05 +00:00
Matt Arsenault	51d795d941	GlobalISel: Fold out unmerge to scalars from concat_vector Removes illegal intermediate vectors if an operation was lowering to concat_vectors, and the next operation is scalarized. llvm-svn: 367081	2019-07-26 02:22:23 +00:00
Michael Liao	53f967f2bd	[AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs. Summary: - As LCSSA is turned on just before isel, it may create PHI of the flow, which is consumed by pseudo structurized CFG instructions. When that PHIs are eliminated in O0, COPY may be placed wrongly as the these pseudo structurized CFG instructions are considering prologue of MBB. - Run extra `unreachable-mbb-elimination` at the end of isel to clean up PHIs. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64353 llvm-svn: 367023	2019-07-25 14:50:18 +00:00
Matt Arsenault	a85af76c72	AMDGPU: Don't assert on v4f16 arguments to shader calling conventions llvm-svn: 367018	2019-07-25 13:55:07 +00:00
Roman Lebedev	017e272c3a	[Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as it is seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse. Much like with my recent "hoist add/sub by/from const" patches, we should get almost universal win if we hoist constant, there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by constant, but by something else, the live-range of that something else may reduce. Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit` instruction pattern. And to not get into endless combine loop. Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm Reviewed By: spatel Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62871 llvm-svn: 366955	2019-07-24 22:57:22 +00:00
Stanislav Mekhanoshin	c43784ff26	[AMDGPU] Increase kernel padding To support prefetch mode 3 we need to pad current cacheline and fill 3 cachelines after. Current padding is only sufficient for mode 2. Differential Revision: https://reviews.llvm.org/D65236 llvm-svn: 366938	2019-07-24 19:40:13 +00:00
Matt Arsenault	0e7d8698b5	AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting exts The G_ANYEXT handling can end up reaching selectCOPY, which mutates the instruction in place. llvm-svn: 366915	2019-07-24 16:05:53 +00:00
Simon Pilgrim	743d45ee25	[TargetLowering] Add SimplifyMultipleUseDemandedBits This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured. The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use. We do see a couple of regressions that need to be addressed: AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better). X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG. The code owners have confirmed its ok for these cases to fixed up in future patches. Differential Revision: https://reviews.llvm.org/D63281 llvm-svn: 366799	2019-07-23 12:39:08 +00:00

1 2 3 4 5 ...

2808 Commits