llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	e87ec66762	AMDGPU/GlobalISel: Fix llvm.amdgcn.div.fmas.ll	2020-04-06 11:50:16 -04:00
Jay Foad	ddd2f4b96f	[AMDGPU] Fix inaccurate comments	2020-04-06 16:44:08 +01:00
Matt Arsenault	cbf719b568	AMDGPU: Use DAG patterns for div_fmas	2020-04-06 09:28:30 -04:00
Matt Arsenault	79b29d6df7	AMDGPU: Remove DisableInst feature I'm not sure why these were bothering to check the instruction profile, since those profiles should only be used with these instruction classes.	2020-04-06 09:27:44 -04:00
Apelete Seketeli	8aadb442d1	[scan-build] fix dead store warnings emitted on LLVM AMDGPU code base This fixes dead store warnings of the type "dead assignment" reported by Clang Static Analyzer.	2020-04-05 11:19:03 -04:00
Matt Arsenault	6bfe28e92f	AMDGPU: Fix annotate kernel features through casted calls I thought I was testing this before, but the workitem id x case isn't great since it's mandatory in the parent kernel.	2020-04-04 20:44:44 -04:00
Matt Arsenault	221890d709	AMDGPU: Add feature for fast f32 denormals	2020-04-04 20:01:24 -04:00
Matt Arsenault	30ebafaa56	CodeGen: Convert some TII hooks to use Register	2020-04-03 14:52:54 -04:00
Matt Arsenault	178050c3ba	AMDGPU: Use Register in more places	2020-04-03 14:52:54 -04:00
Matt Arsenault	e8dcb6d05e	AMDGPU: Remove redundant virtual	2020-04-03 14:52:53 -04:00
Stanislav Mekhanoshin	0462795095	[AMDGPU] Propagate AGPR RC from PHI to its PHI operands We can fix register class of PHI based on its all AGPR uses. That leaves behind all PHIs which were already processed earlier. Propagate RC back to PHI operands of a PHI. Differential Revision: https://reviews.llvm.org/D77344	2020-04-03 11:23:02 -07:00
Austin Kerbow	30f18ed387	[AMDGPU] Handle SMRD signed offset immediate Summary: This fixes a few issues related to SMRD offsets. On gfx9 and gfx10 we have a signed byte offset immediate, however we can overflow into a negative since we treat it as unsigned. Also, the SMRD SOFFSET sgpr is an unsigned offset on all subtargets. We sometimes tried to use negative values here. Third, S_BUFFER instructions should never use a signed offset immediate. Differential Revision: https://reviews.llvm.org/D77082	2020-04-02 17:41:52 -07:00
Matt Arsenault	f68cc2a7ed	AMDGPU: Use 128-bit DS operations by default	2020-04-02 17:17:47 -04:00
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Matt Arsenault	75cf30918f	AMDGPU: Assume f32 denormals are enabled by default This will likely introduce catastrophic performance regressions on older subtargets, but should be correct. A follow up change will remove the old fp32-denormals subtarget features, and switch to using the new denormal-fp-math/denormal-fp-math-f32 attributes. Frontends should be making sure to add the denormal-fp-math-f32 attribute when appropriate to avoid performance regressions.	2020-04-02 17:17:12 -04:00
Matt Arsenault	c3d3c22a58	AMDGPU: Hack out noinline on functions using LDS globals This is a workaround for clang adding noinline to all functions at -O0. Previously, we would just add alwaysinline, and the verifier would complain about having both noinline and alwaysinline. We currently can't truly codegen this case as a freestanding function, so override the user forcing noinline.	2020-04-02 14:12:07 -04:00
Stanislav Mekhanoshin	f2334a7ef2	[AMDGPU] Fix crash in SILoadStoreOptimizer SILoadStoreOptimizer::checkAndPrepareMerge() expects base and paired instruction to come in order and scans MBB from base to the paired instruction. The original order can be changed if there was a dependent instruction in between and the base instruction was moved. Fixed by bailing out of the optimization. In theory it might still be possible to perform a merge by swapping instructions, but in practice it bails anyway because it finds a dependency on the same instruction that caused the base move. Differential Revision: https://reviews.llvm.org/D77245	2020-04-02 10:26:47 -07:00
Guillaume Chatelet	189d2e215f	[Alignment][NFC] Use more Align versions of various functions Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: MatzeB, qcolombet, arsenm, sdardis, jvesely, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77291	2020-04-02 09:00:53 +00:00
Matt Arsenault	5e4e8d0388	AMDGPU/GlobalISel: Change intrinsic ID for _L to _LZ opt Still should handle the other case changes the opcode this way.	2020-04-01 13:03:02 -04:00
Guillaume Chatelet	1dffa2550b	[Alignment][NFC] Transition to MachineFrameInfo::getObjectAlign() Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77215	2020-04-01 14:08:28 +00:00
Simon Pilgrim	be7a233e93	Fix operator precedence warning. NFCI.	2020-04-01 14:36:52 +01:00
Simon Pilgrim	552e46ea1e	Fix unused variable warnings. NFCI.	2020-04-01 14:36:51 +01:00
Matt Arsenault	43e576593e	AMDGPU/GlobalISel: Fix insert point when lowering G_FMAD	2020-03-31 19:57:06 -04:00
Eli Friedman	1ee6ec2bf3	Remove "mask" operand from shufflevector. Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. Differential Revision: https://reviews.llvm.org/D72467	2020-03-31 13:08:59 -07:00
Stanislav Mekhanoshin	08682dcc86	[AMDGPU] Define 16 bit VGPR subregs We have loads preserving low and high 16 bits of their destinations. However, we always use a whole 32 bit register for these. The same happens with 16 bit stores, we have to use full 32 bit register so if high bits are clobbered the register needs to be copied. One example of such code is added to the load-hi16.ll. The proper solution to the problem is to define 16 bit subregs and use them in the operations which do not read another half of a VGPR or preserve it if the VGPR is written. This patch simply defines subregisters and register classes. At the moment there should be no difference in code generation. A lot more work is needed to actually use these new register classes. Therefore, there are no new tests at this time. Register weight calculation has changed with new subregs so appropriate changes were made to keep all calculations just as they are now, especially calculations of register pressure. Differential Revision: https://reviews.llvm.org/D74873	2020-03-31 11:49:06 -07:00
Guillaume Chatelet	c9d5c19597	[Alignment][NFC] Transitioning more getMachineMemOperand call sites Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77121	2020-03-31 08:36:18 +00:00
Sebastian Neubauer	5d3a69feca	[AMDGPU] New llvm.amdgcn.ballot intrinsic Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its boolean argument in all active lanes, and zero in all inactive lanes. This is intended to replace the existing llvm.amdgcn.icmp and llvm.amdgcn.fcmp intrinsics after a suitable transition period. Use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D65088	2020-03-31 10:35:39 +02:00
Guillaume Chatelet	0de874adfb	[Alignment][NFC] Transition to inferAlignFromPtrInfo Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77120	2020-03-31 08:06:49 +00:00
Matt Arsenault	d0dd24a381	AMDGPU/GlobalISel: Fix crashing on weird G_INSERT sources No test since these cases shouldn't really be getting through the legalizer.	2020-03-30 18:14:04 -04:00
Matt Arsenault	db9f0d1ce5	AMDGPU: Form v_cvt_ubyte* with f16 results We get 2 conversion instructions anyway. Previously we would get a conversion with SDWA reading from a byte source, which has a larger encoding.	2020-03-30 17:59:49 -04:00
Matt Arsenault	b27d255e1e	AMDGPU/GlobalISel: Form CVT_F32_UBYTE0	2020-03-30 17:45:55 -04:00
Matt Arsenault	bcb643c8af	AMDGPU/GlobalISel: Handle image atomics	2020-03-30 17:41:04 -04:00
Matt Arsenault	48eda37282	AMDGPU/GlobalISel: Start selecting image intrinsics Does not handle atomics yet.	2020-03-30 17:33:04 -04:00
Matt Arsenault	570a578e46	AMDGPU: Account for dmask when computing image mem size Only the number of elements in the dmask will really be accessed.	2020-03-30 17:30:58 -04:00
Jay Foad	cee65d51fe	AMDGPU: Implement getMemcpyLoopLoweringType Summary: Based on a patch by Matt Arsenault. Reviewers: rampitec, kerbowa, nhaehnle, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77057	2020-03-30 22:21:01 +01:00
Matt Arsenault	2641ba52a9	AMDGPU/GlobalISel: Round up image operations with 5, 6 or 7 addresses The instruction definitions are missing for these register types, so round up to 8 like the DAG.	2020-03-30 17:02:47 -04:00
Matt Arsenault	42d5609809	AMDGPU/GlobalISel: Start handling _L to _LZ optimization We currently don't have a way to map to the equivalent intrinsic opcode, so track immediate 0s in place of the address for the selection to know to change the final opcode.	2020-03-30 17:02:30 -04:00
Matt Arsenault	4919f2e1c5	AMDGPU/GlobalISel: Basic legalize rules for G_FSHR Only handles easy 32-bit cases.	2020-03-30 11:53:01 -07:00
Jakub Kuderski	77ce2e21a8	[AMDGPU] Add Relocation Constant Support Summary: This change adds amdgcn.reloc.constant intrinsic to the amdgpu backend, which will compile into a relocation entry in the resulting elf. The intrinsics takes a MetadataNode (String) as its only argument, which specifies the symbol name of the relocation entry. `SelectionDAGBuilder::getValueImpl` is changed to allow metadata operands passed through to ISel. Author: csyonghe <yonghe@google.com> Reviewers: tpr, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76440	2020-03-30 13:49:20 -04:00
Sameer Sahasrabuddhe	3cbbded68c	Introduce unify-loop-exits pass. For each natural loop with multiple exit blocks, this pass creates a new block N such that all exiting blocks now branch to N, and then control flow is redistributed to all the original exit blocks. The bulk of the transformation is a new function introduced in BasicBlockUtils that can redirect control flow from a set of incoming blocks to a set of outgoing blocks via a common "hub". This is a useful workaround for a limitation in the structurizer which incorrectly orders blocks when processing a nest of loops. This pass bypasses that issue by ensuring that each natural loop is recognized as a separate region. Since the structurizer is a region pass, it no longer sees a nest of loops in a single region, and instead processes each "level" in the nesting as a separate region. The AMDGPU backend provides a new option to enable this pass before the structurizer, which may eventually be enabled by default. Reviewers: madhur13490, arsenm, nhaehnle Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D75865	2020-03-30 13:23:56 -04:00
Matt Arsenault	bb009498c2	AMDGPU/GlobalISel: Hack to fix i24 argument lowering I still think the call lowering type legalization logic split between the generic code and target is too confusing, but largely induced by the reliance on the DAG infrastructure.	2020-03-30 11:00:45 -04:00
Matt Arsenault	90a36bbd7c	AMDGPU/GlobalISel: Legalize 64-bit G_UDIV/G_UREM Mostly ported from the DAG version. This results in much worse code than the DAG version, largely due to a much worse expansion for G_UMULH.	2020-03-30 10:57:37 -04:00
Florian Hahn	c3b03f3d0c	[AMDGPU] Drop const for value that is copied (NFC). This fixes warning: loop variable 'Def' of type 'const llvm::Register' creates a copy from type 'const llvm::Register' [-Wrange-loop-analysis] llvm::Register just contains a single unsigned and should be copied. Reviewers: rampitec Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D77011	2020-03-30 10:59:59 +01:00
Matt Arsenault	d15723ef06	AMDGPU/GlobalISel: Remove redundant virtual	2020-03-29 14:03:07 -04:00
Matt Arsenault	ab7a41069e	AMDGPU: Fix using wrong instruction for FP conversion This was never actually hit, but FTRUNC was clearly not the intent here.	2020-03-29 14:03:07 -04:00
Matt Arsenault	97bbe7ad2a	AMDGPU: Fix typo	2020-03-29 14:03:06 -04:00
Matt Arsenault	9564f46766	AMDGPU: Make use of default operands	2020-03-28 17:33:29 -04:00
Benjamin Kramer	2d24d74b85	[AMDGPU] Stabilize sort order Found by the expensive checks in llvm::sort.	2020-03-28 20:20:14 +01:00
Benjamin Kramer	4065e92195	Upgrade some instances of std::sort to llvm::sort. NFC.	2020-03-28 19:23:29 +01:00
Jay Foad	a6dfd827e5	[AMDGPU] Fix getEUsPerCU for gfx10 in CU mode Summary: "Per CU" is a bit simplistic for gfx10, but I couldn't think of a better name. Reviewers: arsenm, rampitec, nhaehnle, dstuttard, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76861	2020-03-27 20:36:49 +00:00
Matt Arsenault	348735b723	AMDGPU: Stop setting attributes based on TargetOptions Having arbitrary passes looking at the TargetOptions is pretty messy. This was also disregarding if a function already had an explicit attribute setting on it. opt/llc now add the attributes to functions that don't specify the attribute. clang and lld do not call the function to do this, which they maybe should. This was also treating unsafe-fp-math as implying the others, and setting the other attributes based on it. This is not done anywhere else, and I'm not sure is correct based on the current description of the option bit. Effectively reverts `1d8cf2be89`	2020-03-27 13:13:43 -07:00
Guillaume Chatelet	74eac9031a	[Alignment][NFC] MachineMemOperand::getAlign/getBaseAlign Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76925	2020-03-27 15:49:13 +00:00
Simon Pilgrim	d6ddabd7ef	Revert rG6ff1ea3244c543ad24fc99c7f4979db2f2078593 "Fix "use of uninitialized variable" static analyzer warning. NFCI." @dblaikie noticed that this may interfere with msan analysis	2020-03-27 11:44:03 +00:00
Stanislav Mekhanoshin	4c4b71843b	[AMDGPU] Propagate amdgpu-waves-per-eu to callees Differential Revision: https://reviews.llvm.org/D76868	2020-03-26 14:43:44 -07:00
Jay Foad	0fe096c4e9	[AMDGPU] Rename overloaded getMaxWavesPerEU to getWavesPerEUForWorkGroup Summary: I think Max in the name was misleading. NFC. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76860	2020-03-26 20:21:04 +00:00
Jay Foad	bb9c4fd7ea	[AMDGPU] Remove getMaxWavesPerCU in favour of getWavesPerWorkGroup. Summary: These methods were identical. I chose to remove getMaxWavesPerCU because I think Max in the name was misleading. NFC. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76859	2020-03-26 20:21:04 +00:00
Scott Linder	bd12ecb88f	[AMDGPU] Fix PC register mapping in wave32 mode Summary: The PC_32 DWARF register is for a 32-bit process address space which we don't implement in AMDGCN; another way of putting this is that the size of the PC register is not a function of the wavefront size. If we ever implement a 32-bit process address space we will need to add two more DwarfFlavours i.e. we will need to represent the product of (wave32, wave64) x (64-bit address space, 32-bit address space). Tags: #llvm Differential Revision: https://reviews.llvm.org/D76732	2020-03-26 14:43:25 -04:00
David Blaikie	9002db05a2	Roll otherwise unused subexpressions into an assertion	2020-03-26 11:32:33 -07:00
Guillaume Chatelet	b727aabcb8	[Alignment][NFC] Use llvmTargetFrameLowering::getStackAlign Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Reviewed By: courbet Subscribers: wuzish, arsenm, jyknight, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, fedor.sergeev, jrtc27, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76613	2020-03-26 18:15:53 +00:00
Jay Foad	0602c20b1b	[AMDGPU] Make use of divideCeil. NFC.	2020-03-26 16:11:35 +00:00
Jay Foad	596bed3fd3	[AMDGPU] Remove unused methods. NFC.	2020-03-26 16:11:35 +00:00
Fangrui Song	5fad05e80d	[MCInstPrinter] Pass `Address` parameter to MCOI::OPERAND_PCREL typed operands. NFC Follow-up of D72172 and D72180 This patch passes `uint64_t Address` to print methods of PC-relative operands so that subsequent target specific patches can change `*InstPrinter::print{Operand,PCRelImm,...}` to customize the output. Add MCInstPrinter::PrintBranchImmAsAddress which is set to true by llvm-objdump. ``` // Current llvm-objdump -d output aarch64: 20000: bl #0 ppc: 20000: bl .+4 x86: 20000: callq 0 // Ideal output aarch64: 20000: bl 0x20000 ppc: 20000: bl 0x20004 x86: 20000: callq 0x20005 // GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled aarch64: 20000: bl 20000 ppc: 20000: bl 0x20004 x86: 20000: callq 20005 ``` In `lib/Target/X86/X86GenAsmWriter1.inc` (generated by `llvm-tblgen -gen-asm-writer`): ``` case 12: // CALL64pcrel32, CALLpcrel16, CALLpcrel32, EH_SjLj_Setup, JCXZ, JECXZ, J... - printPCRelImm(MI, 0, O); + printPCRelImm(MI, Address, 0, O); return; ``` Some targets have 2 `printOperand` overloads, one without `Address` and one with `Address`. They should annotate derived `Operand` properly with `let OperandType = "OPERAND_PCREL"`. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D76574	2020-03-26 08:21:15 -07:00
Stanislav Mekhanoshin	e06d707aa2	[AMDGPU] Fixed function traversal in attribute propagation AMDGPUPropagateAttributes pass was skipping some of the functions when cloning. Functions were added to root set and then skipped on the next iteration because they are already in the root set, while they were meant to be processed with different features. Differential Revision: https://reviews.llvm.org/D76815	2020-03-25 18:47:09 -07:00
Stanislav Mekhanoshin	6e00e3fcb0	[AMDGPU] Preserve original symbol during attribute propagation AMDGPUPropagateAttributes can swap names while cloning a function. Only do it if original symbol was not externally visible. Differential Revision: https://reviews.llvm.org/D76789	2020-03-25 15:26:30 -07:00
cdevadas	ce984129ea	[AMDGPU] Add SIPreEmitPeephole pass. This pass can handle all the optimization opportunities found just before code emission. Presently it includes the handling of vcc branch optimization that was handled earlier in SIInsertSkips. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D76712	2020-03-25 15:35:35 +00:00
Matt Arsenault	26ebc51a34	AMDGPU/GlobalISel: Fix smrd loads of v4i64	2020-03-24 13:44:41 -04:00
Matt Arsenault	66073953a5	AMDGPU: Allow vectorization of round intrinsic There seems to be a small benefit to the legalized sequence for v2f16 round with packed instructions, so allow vectorizing it by reducing the cost. An unintended side effect is vectorization of f32 round also happens. The current FMA logic seems off to me, and isn't checking for packed instructions.	2020-03-23 17:00:41 -04:00
Matt Arsenault	2ad5fc1d91	AMDGPU/GlobalISel: Implement computeNumSignBitsForTargetInstr	2020-03-23 15:02:30 -04:00
Ram Nalamothu	24698e526f	Implement wave32 DWARF register mapping Implement the DWARF register mapping described in llvm/docs/AMDGPUUsage.rst. This enables generating appropriate DWARF register numbers for wave64 and wave32 modes.	2020-03-23 10:24:16 -04:00
Matt Arsenault	a950e3beef	AMDGPU: Move towards deprecating alignbit intrinsic This is equivalent to llvm.fshr, so legalize the intrinsic to the generic node.	2020-03-20 11:03:04 -04:00
alex-t	6e34e71869	[AMDGPU] Enable divergence driven ISel for ADD/SUB i64 Summary: Currently we custom select add/sub with carry out to scalar form relying on later replacing them to vector form if necessary. This change enables custom selection code to take the divergence of adde/addc SDNodes into account and select the appropriate form in one step. Reviewers: arsenm, vpykhtin, rampitec Reviewed By: arsenm, vpykhtin Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa Differential Revision: https://reviews.llvm.org/D76371	2020-03-20 17:06:11 +03:00
Austin Kerbow	2cbb8c946a	[AMDGPU] Reuse register during frame index elimination If there were no free VGPRs we would need two emergency spill slots for register scavenging during PEI/frame index elimination. Reuse 'ResultReg' for scale calculation so that only one spill is needed. Differential Revision: https://reviews.llvm.org/D76387	2020-03-20 00:19:15 -07:00
cdevadas	728b878de6	[AMDGPU] Set the CostPerUse value for vgpr registers. Apart from the argument registers, set the CostPerUse value as per the ratio reg_index/allocation_granularity. It is a pre-commit for introducing the scratch registers in the ABI. This change should help in a balanced register allocation. Differential Revision: https://reviews.llvm.org/D76417	2020-03-20 11:49:35 +05:30
Matt Arsenault	678da7b109	AMDGPU/GlobalISel: Remove leftover #if 0 The subtarget feature used to be missing from subtargets, but that was fixed.	2020-03-19 20:07:05 -04:00
Scott Linder	0e9368cc8c	[AMDGPU] Move frame pointer from s34 to s33 Remove the gap left between the stack pointer (s32) and frame pointer (s34) now that the scratch wave offset is no longer a part of the calling convention ABI. Update llvm/docs/AMDGPUUsage.rst to reflect the change. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75657	2020-03-19 15:35:16 -04:00
Scott Linder	60b1967c39	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to remove the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138	2020-03-19 15:35:16 -04:00
Scott Linder	db099f994b	[AMDGPU][NFC] Refactor some uses of unsigned to Register Tags: #llvm Differential Revision: https://reviews.llvm.org/D76035	2020-03-19 15:35:16 -04:00
Scott Linder	30bb113beb	[AMDGPU][NFC] Refactor emitEntryFunctionPrologue Remove dead code and factor repeated conditions out into a single check. Rename and move code to make it more obvious what is running only for entry functions. Simplify function arguments to make it clearer what the relevant inputs are. Make flat scratch init accept an MBB iterator and move it to where it was logically being emitted within the prologue. These changes will make a future update to the calling convention simpler. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75092	2020-03-19 15:35:16 -04:00
Matt Arsenault	4ea1baf6a0	AMDGPU: Initial, crude support for indirect calls This isn't really usable, and requires using the -amdgpu-fixed-function-abi flag to work. Assumes a uniform call target, and will hit a verifier error if the call target ends up in a VGPR. Also doesn't attempt to do anything sensible for the reported register/stack usage.	2020-03-18 12:03:48 -04:00
Matt Arsenault	ea4597eef1	Reapply "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize" This reverts commit `9bca8fc4cf`. Rearrange handling to avoid changing the instruction in the case where it's going to be erased and replaced with undef.	2020-03-18 12:01:22 -04:00
Piotr Sobczak	d1a7bfca74	[AMDGPU] Fix AMDGPUUnifyDivergentExitNodes Summary: For the case where "done" bits on existing exports are removed by unifyReturnBlockSet(), unify all return blocks - even the uniformly reached ones. We do not want to end up with a non-unified, uniformly reached block containing a normal export with the "done" bit cleared. That case is believed to be rare - possible with infinite loops in pixel shaders. This is a fix for D71192. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76364	2020-03-18 16:49:30 +01:00
Guillaume Chatelet	d000655a8c	[Alignment][NFC] Deprecate getMaxAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76348	2020-03-18 14:48:45 +01:00
Vitaly Buka	9bca8fc4cf	Revert "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize" The patch introduced use-after-poison. This reverts commit `d0fe13ecf9`.	2020-03-17 22:04:14 -07:00
Matt Arsenault	c9b454a1b7	AMDGPU/GlobalISel: Fix verifier errors on image atomics	2020-03-17 20:06:25 -04:00
Scott Linder	68f163df0e	[AMDGPU] Print DWARF register numbers in AMDGPUInstPrinter Summary: Explanation is in a comment in the diff, but essentially printing a physical register name here is ambiguous. Until we can implement printing a DWARF register name here just use the encoding directly. Tags: #llvm Differential Revision: https://reviews.llvm.org/D76253	2020-03-17 19:42:10 -04:00
Sebastian Neubauer	6e29846b29	[AMDGPU] Fix whole wavefront mode We cannot move wwm over exec copies because the exec register needs an exact exec mask. Differential Revision: https://reviews.llvm.org/D76232	2020-03-17 17:23:23 +01:00
Matt Arsenault	039c917b43	AMDGPU/GlobalISel: Fix asserting on gather4 intrinsics	2020-03-17 11:07:30 -04:00
alex-t	48a9cf9043	[AMDGPU] Enable SEXT divergence driven selection. Summary: This change enables the divergence driven selection for the SEXT DAG opcode. Reviewers: vpykhtin, rampitec Reviewed By: vpykhtin Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Differential Revision: https://reviews.llvm.org/D76230	2020-03-17 17:30:11 +03:00
Matt Arsenault	d0fe13ecf9	AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize For normal loads, fully eliminate the load. For the TFE case, adjust the dmask value in the instruction so the selector doesn't need to handle it. For the TFE special case, I guess it would be possible to replace the loaded data register with undef, but as-is this will start treating it as a well defined value.	2020-03-17 10:15:30 -04:00
Matt Arsenault	d9a012ed8a	AMDGPU/GlobalISel: Adjust image load register type based on dmask Trim elements that won't be written. The equivalent still needs to be done for writes. Also start widening 3 elements to 4 elements. Selection will get the count from the dmask.	2020-03-17 10:09:18 -04:00
Matt Arsenault	83ffbf2618	AMDGPU/GlobalISel: Legalize non-a16 non-NSA images	2020-03-17 10:02:09 -04:00
Matt Arsenault	2aba9b6cf8	AMDGPU/GlobalISel: Legalize a16 images Pack the address registers in the legalizer. Avoid introducing a huge family of new intermediate operations by filling dead operands with noreg.	2020-03-17 10:02:09 -04:00
Matt Arsenault	80b627d69d	AMDGPU/GlobalISel: Fix handling of G_ANYEXT with s1 source We were letting G_ANYEXT with a vcc register bank through, which was incorrect and would select to an invalid copy. Fix this up like G_ZEXT and G_SEXT. Also drop old code to fixup the non-boolean case in RegBankSelect. We now have to perform that expansion during selection, so there's no benefit to doing it during RegBankSelect.	2020-03-16 12:59:54 -04:00
Matt Arsenault	c460dc6eeb	AMDGPU/GlobalISel: Fix some illegal scalar argument types Fixes integers that don't evenly divide to i32 pieces. We should probably extract some of the code in the legalizer to start handling argument breakdowns. I'm dissatisfied with the argument lowering's handling of vectors for example, and we should not be producing the weird G_EXTRACTs we do now.	2020-03-16 12:51:23 -04:00
Matt Arsenault	84386b2d8a	AMDGPU: Drop special case f64 fround lowering The result is better if ftrunc is emitted and separately legalized when unavailable.	2020-03-16 12:09:30 -04:00
Matt Arsenault	57d896e838	AMDGPU/GlobalISel: Make some large merges legal We allow up to 1024-bit registers, so we should support merges all the way to the maximum.	2020-03-16 10:49:10 -04:00
Sander de Smalen	8105935d3a	[TypeSize] Allow returning scalable size in implicit conversion to uint64_t This patch removes compiler runtime assertions that ensure the implicit conversion is only guaranteed to work for fixed-width vectors. With the assert it would be impossible to get _anything_ to build until the entire codebase has been upgraded, even when the indiscriminate uses of the size as uint64_t would work fine for both scalable and fixed-width types. This issue will need to be addressed differently, with build-time errors rather than assertion failures, but that effort falls beyond the scope of this patch. Returning the scalable size and avoiding the assert in getFixedSize() is a temporary stop-gap in order to use LLVM for compiling and using the SVE ACLE intrinsics. Reviewers: efriedma, huntergr, rovka, ctetreau, rengolin Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D75297	2020-03-15 13:48:49 +00:00
Stanislav Mekhanoshin	c262b69dcc	[AMDGPU] Fix endcf collapse Only collapse inner endcf if the outer one belongs to SI_IF. If it does belong to SI_ELSE then the mask being restored is in fact a partial inverse of what we need. Differential Revision: https://reviews.llvm.org/D76154	2020-03-13 13:50:21 -07:00
Matt Arsenault	015b640be4	AMDGPU: Add flag to use fixed function ABI Pass all arguments to every function, rather than only passing the minimum set of inputs needed for the call graph.	2020-03-13 13:27:05 -07:00
Matt Arsenault	bb8622094d	AMDGPU: Don't handle kernarg.segment.ptr in functions Just lower this to null. Pass implicitarg.ptr in its place in the argument list.	2020-03-13 12:51:12 -07:00
Stanislav Mekhanoshin	32e90cbcd1	[AMDGPU] Disable endcf collapse There are some functional regressions and I suspect our scopes are not as perfectly enclosed as I expected. Disable it for now. Differential Revision: https://reviews.llvm.org/D76148	2020-03-13 12:33:22 -07:00
Simon Cook	a26bd4ec16	[TableGen] Support combining AssemblerPredicates with ORs For context, the proposed RISC-V bit manipulation extension has a subset of instructions which require one of two SubtargetFeatures to be enabled, 'zbb' or 'zbp', and there is no defined feature which both of these can imply to use as a constraint either (see comments in D65649). AssemblerPredicates allow multiple SubtargetFeatures to be declared in the "AssemblerCondString" field, separated by commas, and this means that the two features must both be enabled. There is no equivalent to say that _either_ feature X or feature Y must be enabled, short of creating a dummy SubtargetFeature for this purpose and having features X and Y imply the new feature. To solve the case where X or Y is needed without adding a new feature, and to better match a typical TableGen style, this replaces the existing "AssemblerCondString" with a dag "AssemblerCondDag" which represents the same information. Two operators are defined for use with AssemblerCondDag, "all_of", which matches the current behaviour, and "any_of", which adds the new proposed ORing features functionality. This was originally proposed in the RFC at http://lists.llvm.org/pipermail/llvm-dev/2020-February/139138.html Changes to all current backends are mechanical to support the replaced functionality, and are NFCI. At this stage, it is illegal to combine features with ands and ors in a single AssemblerCondDag. I suspect this case is sufficiently rare that adding more complex changes to support it are unnecessary. Differential Revision: https://reviews.llvm.org/D74338	2020-03-13 17:13:51 +00:00
Matt Arsenault	ccc6e780c8	AMDGPU: Directly annotate functions if they have calls Currently we infer whether the flat-scratch-init kernel input should be enabled based on calls. Move this handling, so we can decide if the full set of ABI inputs is needed in kernels. Ideally we would have an analysis of some sort, rather than the function attributes.	2020-03-12 19:10:59 -04:00
Stanislav Mekhanoshin	a73528649c	[AMDGPU] Simplify exec copies The patch removes late endcf handling and only leaves the related portion with redundant exec mask copy elimination. Differential Revision: https://reviews.llvm.org/D76095	2020-03-12 14:54:19 -07:00
Simon Pilgrim	e91feeed21	[AMDGPU] Add ISD::FSHR -> ALIGNBIT support This patch allows ISD::FSHR(i32) patterns to lower to ALIGNBIT instructions. This improves test coverage of ISD::FSHR matching - x86 has both FSHL/FSHR instructions and we prefer FSHL by default. Differential Revision: https://reviews.llvm.org/D76070	2020-03-12 20:16:57 +00:00
Stanislav Mekhanoshin	360aff0493	[AMDGPU] Simplify nested SI_END_CF This is to replace the optimization from the SIOptimizeExecMaskingPreRA. We have less opportunities in the control flow lowering because many VGPR copies are still in place and will be removed later, but we know for sure an instruction is SI_END_CF and not just an arbitrary S_OR_B64 with EXEC. The subsequent change needs to convert s_and_saveexec into s_and and address new TODO lines in tests, then code block guarded by the -amdgpu-remove-redundant-endcf option in the pre-RA exec mask optimizer will be removed. Differential Revision: https://reviews.llvm.org/D76033	2020-03-12 11:25:07 -07:00
Sebastian Neubauer	4327a9b46b	[AMDGPU] Use progbits type for .AMDGPU.disasm section The note section type implies a specific format that this section does not have thus tools like readelf fail here. Progbits has no format and another pipeline compiler already sets the type to progbits. Differential Revision: https://reviews.llvm.org/D75913	2020-03-12 09:08:11 +01:00
Matt Arsenault	1e0c540360	AMDGPU: Don't hard error on LDS globals in functions Instead, emit a trap and a warning. We force inlining of this situation, so any function where this happens should be dead as indirect or external calls are not yet supported. This should avoid erroring on dead code.	2020-03-11 15:34:11 -04:00
Stanislav Mekhanoshin	9801e5469b	[AMDGPU] Disable nested endcf collapse The assumption is that conditional regions are perfectly nested and a mask restored at the exit from the inner block will be completely covered by a mask restored in the outer. It turns out with our current structurizer this is not always the case. Disable the optimization for now, but I want to keep it around for a while to either try after further structurizer changes or to move it into control flow lowering where we have more info and reuse the test. Differential Revision: https://reviews.llvm.org/D75958	2020-03-11 11:24:20 -07:00
Jay Foad	a46dba24fa	[AMDGPU] Extend macro fusion for ADDC and SUBB to SUBBREV Summary: There's a lot of test case churn but the overall effect is to increase the number of back-to-back v_sub,v_subbrev pairs, which can execute with no delay even on gfx10. Reviewers: arsenm, rampitec, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75999	2020-03-11 17:59:21 +00:00
Matt Arsenault	a2202f6a3f	AMDGPU/GlobalISel: Manually RegBankSelect copies This was failing on any pre-assigned copy to the VCC bank. This is something of a workaround for the default implementation in getInstrMappingImpl, and how it treats copy-like operations in general. Copy-like operations are considered to only have one result register bank, rather than separate banks for each source like a normal instruction. To avoid potentially mishandling reg_sequence with impossible operand combinations, the generic implementation errors on impossible costs. If the bank was already assigned, it is treated as if it were an unsatisfiable REG_SEQUENCE mapping. We really don't get any value from any of what getInstrMappingImpl tries to do for copies, so just directly emit the simple mapping we really want.	2020-03-11 11:12:12 -04:00
Anna Welker	a6d3bec83f	[TTI][ARM][MVE] Refine gather/scatter cost model Refines the gather/scatter cost model, but also changes the TTI function getIntrinsicInstrCost to accept an additional parameter which is needed for the gather/scatter cost evaluation. This did require trivial changes in some non-ARM backends to adopt the new parameter. Extending gathers and truncating scatters are now priced cheaper. Differential Revision: https://reviews.llvm.org/D75525	2020-03-11 10:23:41 +00:00
Matt Arsenault	edd0dfca0d	AMDGPU/GlobalISel: Refine G_TRUNC legality rules Scalarize most truncates. Avoid touching cases that could end up in unresolvable infinite loops.	2020-03-10 15:32:22 -07:00
Matt Arsenault	ce8a1f7294	GlobalISel: Implement fewerElementsVector for G_TRUNC Extend fewerElementsVectorBasic to handle operands with different element types.	2020-03-10 15:17:20 -07:00
Matt Arsenault	200b20639a	AMDGPU: Use V_MAC_F32 for fmad.ftz This avoids regressions in a future patch. I'm confused by the use of the gfx9 usage legacy_mad. Was this a pointless instruction rename, or uses fmul_legacy handling? Why is regular mac available in that case?	2020-03-10 14:41:06 -07:00
Jay Foad	c8f0d27ef3	[AMDGPU] Fix the gfx10 scheduling model for f32 conversions Summary: As far as I can tell on gfx10 conversions to/from f32 (that are not converting f32 to/from f64) are full rate instructions, but they were marked as quarter rate instructions. I have fixed this for gfx10 only. I assume the scheduling model was correct for older architectures, though I don't have any documentation handy to confirm that. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75392	2020-03-10 19:31:24 +00:00
Matt Arsenault	67cfbec746	AMDGPU/GlobalISel: Insert readfirstlane on SGPR returns In case the source value ends up in a VGPR, insert a readfirstlane to avoid producing an illegal copy later. If it turns out to be unnecessary, it can be folded out.	2020-03-10 11:18:48 -04:00
alex-t	39e1a90784	[AMDGPU] SI_INDIRECT_DST_V* pseudos expansion should place EXEC restore to separate basic block Summary: When SI_INDIRECT_DST_V* pseudos have indexes in VGPR, they get expanded into the self-looped basic block that modifies EXEC in a loop. To keep EXEC consistent it is stored before and then re-stored after the pseudo expansion result. %95:vreg_512 = SI_INDIRECT_DST_V16 %93:vreg_512(tied-def 0), %94:sreg_32, 0, killed %1500:vgpr_32 results in s_mov_b64 s[6:7], exec BB0_16: v_readfirstlane_b32 s8, v28 v_cmp_eq_u32_e32 vcc, s8, v28 s_and_saveexec_b64 vcc, vcc s_set_gpr_idx_on s8, gpr_idx(DST) v_mov_b32_e32 v6, v25 s_set_gpr_idx_off s_xor_b64 exec, exec, vcc s_cbranch_execnz BB0_16 ; %bb.17: s_mov_b64 exec, s[6:7] The bug appeared in case this expansion occurs in the ELSE block of the CF. Originally %110:vreg_512 = SI_INDIRECT_DST_V16 %103:vreg_512(tied-def 0), %85:vgpr_32, 0, %107:vgpr_32, %112:sreg_64 = SI_ELSE %108:sreg_64, %bb.19, 0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec expanded to ****************** <== here exec has "THEN" context s_mov_b64 s[6:7], exec BB0_16: v_readfirstlane_b32 s8, v28 v_cmp_eq_u32_e32 vcc, s8, v28 s_and_saveexec_b64 vcc, vcc s_set_gpr_idx_on s8, gpr_idx(DST) v_mov_b32_e32 v6, v25 s_set_gpr_idx_off s_xor_b64 exec, exec, vcc s_cbranch_execnz BB0_16 ; %bb.17: s_or_saveexec_b64 s[4:5], s[4:5] <-- exec mask is restored for "ELSE" but immediately overwritten. s_mov_b64 exec, s[6:7] The rest of the "ELSE" block is executed not by the workitems which constitute the "else mask" but by those which constitute "then mask" SILowerControlFlow::emitElse always considers the basic block begin() as an insertion point for s_or_saveexec. Proposed fix: The SI_INDIRECT_DST_V* procedure should split the remainder block to create landing pad for the EXEC restoration. Reviewers: rampitec, vpykhtin, nhaehnle Reviewed By: vpykhtin Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75472	2020-03-10 14:04:22 +03:00
Matt Arsenault	627bb31a28	AMDGPU/GlobalISel: Avoid illegal vector exts for add/sub/mul When expanding scalar packed operations, we should not introduce illegal vector casts LegalizerHelper introduces. We're not in a legalizer context, and there's no RegBankSelect apply or legalize worklist.	2020-03-09 23:42:17 -04:00
Matt Arsenault	ed72bcae34	AMDGPU/GlobalISel: Fix mishandling SGPR v2s16 add/sub/mul We weren't considering the packed case correctly, and this was passing through to the selector. The selector only checked the size, so this would incorrectly compile to a single 32-bit scalar add. As usual, the LegalizerHelper is somewhat awkward to use from applyMappingImpl. I think this is the first place we've needed multi-step legalization here though.	2020-03-09 22:51:54 -04:00
Jay Foad	c7b2e7f527	[AMDGPU] Fix scheduling info for terminator SALU instructions Summary: Instruction variants like S_MOV_B32_term should have the same SchedRW class as the base instruction, S_MOV_B32. This probably doesn't make any difference in practice because as terminators, they'll always be scheduled at the end of a basic block, but it's simply more correct than giving them all the default SchedRW class of Write32Bit, which implies a VALU operation. Reviewers: rampitec, arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75860	2020-03-09 21:39:52 +00:00
Matt Arsenault	eb41627799	AMDGPU/GlobalISel: Improve handling of illegal return types Most importantly, this fixes ret i8. Also make sure to handle signext/zeroext for odd types > i32. Some of the corresponding argument passing fixes also need to be handled.	2020-03-09 13:11:30 -07:00
Matt Arsenault	156a1b59df	AMDGPU: Make signext/zeroext behave more sensibly over > i32 Interpret these as extending to the next multiple of 32-bits. This had no effect with i48 for example, which is really split into {i32, i16}, which should extend the high part.	2020-03-09 12:56:10 -07:00
Matt Arsenault	209094eeb6	AMDGPU/GlobalISel: Start matching s_lshlN_add_u32 instructions Use a hack to only enable this for GlobalISel. Technically this also works with SelectionDAG, but the divergence selection isn't reliable enough and a few cases fail, but I have no desire to spend time writing the manual expansion code for it. The DAG actually does a better job since it catches using v_add_lshl_u32 in the mixed SGPR/VGPR cases.	2020-03-09 12:36:51 -07:00
Jay Foad	daf686b7b9	[AMDGPU] Remove unused SchedWrite class	2020-03-09 16:09:43 +00:00
Jay Foad	11d1573bb6	[APFloat] Make use of new overloaded comparison operators. NFC. Reviewers: ekatz, spatel, jfb, tlively, craig.topper, RKSimon, nikic, scanon Subscribers: arsenm, jvesely, nhaehnle, hiraditya, dexonsmith, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75744	2020-03-06 16:42:53 +00:00
David Stuttard	a74b33f612	AMDGPU: Fix SMRD test in trivially disjoint mem access code Summary: This seems like an obvious error - cut and paste issue? The change does make a change to one of the lit tests - it stops s_buffer_load re-ordering past an MUBUF instruction (which is not surprising). Change-Id: I80be99de5b62af4f42e91af2591b76a52ac9efa6 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75686	2020-03-05 17:14:01 +00:00
hsmahesha	3fda1fde8f	AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics Summary: Lower trap and debugtrap intrinsics to AMDGPU machine instruction(s). Reviewers: arsenm, nhaehnle, kerbowa, cdevadas, t-tye, kzhuravl Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, dstuttard, tpr, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74688	2020-03-05 08:16:57 +05:30
Matt Arsenault	15bf916b54	AMDGPU: Remove VOP3OpSelMods0 complex pattern Use default operand of 0 instead.	2020-03-04 17:18:22 -05:00
Matt Arsenault	9e1d2afc13	AMDGPU/GlobalISel: Don't use vector G_EXTRACT in arg lowering Create a wider source vector, and unmerge with dead defs like the legalizer. The legalization handling for G_EXTRACT is incomplete, and it's preferable to keep everything in 32-bit pieces. We should probably start moving these functions into utils, since we have a growing number of places that do almost the same thing.	2020-03-04 16:49:01 -05:00
Matt Arsenault	fb0c35fa34	GlobalISel: Set alignment on function argument stack load/store	2020-03-04 16:38:46 -05:00
Simon Pilgrim	e2f0093800	[AMDGPU] performCvtF32UByteNCombine - revisit node after src operand simplification. If SimplifyDemandedBits succeeds in simplifying the byte src, add the CVT_F32_UBYTE node back to the worklist as we might be able to simplify further. Yet another step towards removing SelectionDAG::GetDemandedBits.	2020-03-04 11:25:50 +00:00
Matt Arsenault	88aced1e45	AMDGPU: Fix computation for getOccupancyWithLocalMemSize The computation here didn't really make sense to me, and reported wildly different results depending on the flat work group size attribute. I think this should really report a range derived from the possible work group size bounds, and only allow an occupancy that is a multiple of the group size.	2020-03-03 17:15:57 -05:00
Fangrui Song	692e0c9648	[MC] Add MCStreamer::emitInt{8,16,32,64} Similar to AsmPrinter::emitInt{8,16,32,64}.	2020-02-29 09:40:21 -08:00
Benjamin Kramer	186dd63182	ArrayRef'ize restoreCalleeSavedRegisters. NFCI. restoreCalleeSavedRegisters can mutate the contents of the CalleeSavedInfos, so use a MutableArrayRef.	2020-02-29 09:50:23 +01:00
Jay Foad	7d973307d5	[AMDGPU] Fix scheduling model for V_MULLIT_F32 This was incorrectly marked as a half rate 64-bit instruction by D45073.	2020-02-28 23:22:58 +00:00
Jay Foad	43830790d7	[AMDGPU] Remove dubious logic in bidirectional list scheduler Summary: pickNodeBidirectional tried to compare the best top candidate and the best bottom candidate by examining TopCand.Reason and BotCand.Reason. This is unsound because, after calling pickNodeFromQueue, Cand.Reason does not reflect the most important reason why Cand was chosen. Rather it reflects the most recent reason why it beat some other potential candidate, which could have been for some low priority tie breaker reason. I have seen this cause problems where TopCand is a good candidate, but because TopCand.Reason is ORDER (which is very low priority) it is repeatedly ignored in favour of a mediocre BotCand. This is not how bidirectional scheduling is supposed to work. To fix this I changed the code to always compare TopCand and BotCand directly, like the generic implementation of pickNodeBidirectional does. This removes some uncommented AMDGPU-specific logic; if this logic turns out to be important then perhaps it could be moved into an override of tryCandidate instead. Graphics shader benchmarking on gfx10 shows a lot more positive than negative effects from this change. Reviewers: arsenm, tstellar, rampitec, kzhuravl, vpykhtin, dstuttard, tpr, atrick, MatzeB Subscribers: jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68338	2020-02-28 21:35:34 +00:00
Teresa Johnson	f9ca75f19b	[Inliner] Inlining should honor nobuiltin attributes Summary: Final patch in series to fix inlining between functions with different nobuiltin attributes/options, which was specifically an issue in LTO. See discussion on D61634 for background. The prior patch in this series (D67923) enabled per-Function TLI construction that identified the nobuiltin attributes. Here I have allowed inlining to proceed if the callee's nobuiltins are a subset of the caller's nobuiltins, but not in the reverse case, which should be conservatively correct. This is controlled by a new option, -inline-caller-superset-nobuiltin, which is enabled by default. Reviewers: hfinkel, gchatelet, chandlerc, davidxl Subscribers: arsenm, jvesely, nhaehnle, mehdi_amini, eraman, hiraditya, haicheng, dexonsmith, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74162	2020-02-28 07:34:14 -08:00
Jay Foad	970558df94	[AMDGPU] Mark the scheduling model as complete	2020-02-28 13:35:55 +00:00
Jay Foad	addcbc401c	[AMDGPU] Update a comment missed in `74e2974ac6`	2020-02-28 13:35:55 +00:00
Stanislav Mekhanoshin	6b813f2762	[AMDGPU] Enable runtime unroll for LDS We want to do unroll for LDS even for runtime trip count to combine LDS operations. Differential Revision: https://reviews.llvm.org/D75293	2020-02-27 12:59:35 -08:00
Reid Kleckner	465dca79b3	Avoid SmallString.h include in MD5.h, NFC Saves 200 includes, which is mostly immaterial.	2020-02-26 09:10:24 -08:00
Nicolai Hähnle	d6b05fccb7	Full fix for "AMDGPU/SIInsertSkips: Fix the determination of whether early-exit-after-kill is possible" (hopefully) Properly preserve the MachineDominatorTree in all cases. Change-Id: I54cf0c0a20934168a356920ba8ed5097a93c4131	2020-02-26 16:21:44 +01:00
Nicolai Hähnle	0aec4b418e	Quick fix for bot failure on "AMDGPU/SIInsertSkips: Fix the determination of whether early-exit-after-kill is possible" Apparently the dominator tree update is incorrect, will investigate. Change-Id: Ie76f8d11b22a552af1f098c893773f3d85e02d4f	2020-02-26 16:02:22 +01:00
Nicolai Hähnle	0f1df48925	AMDGPU/SIInsertSkips: Fix the determination of whether early-exit-after-kill is possible Summary: The old code made some incorrect assumptions about the order in which basic blocks are laid out in a function. This could lead to incorrect early-exits, especially when kills occurred inside of loops. The new approach is to check whether the point where the conditional kill occurs dominates all reachable code. If that is the case, there cannot be any other threads in the wave that are waiting to rejoin at a later point in the CFG, i.e. if exec=0 at that point, then all threads really are dead and we can exit the wave. Make some other minor cleanups to the pass while we're at it. v2: preserve the dominator tree Reviewers: arsenm, cdevadas, foad, critson Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74908 Change-Id: Ia0d2b113ac944ad642d1c622b6da1b20aa1aabcc	2020-02-26 15:30:42 +01:00
Scott Linder	481b1c8380	[AMDGPU] Implement wave64 DWARF register mapping Summary: Implement the DWARF register mapping described in llvm/docs/AMDGPUUsage.rst This is currently limited to wave64 VGPRs/AGPRs. This also includes some minor changes in AMDGPUInstPrinter, AMDGPUMCTargetDesc, and AMDGPUAsmParser to make generating CFI assembly text and ELF sections possible to ease testing, although complete CFI support is not yet implemented. Tags: #llvm Differential Revision: https://reviews.llvm.org/D74915	2020-02-25 14:00:01 -05:00
Matt Arsenault	86e13ec194	AMDGPU/GlobalISel: Use packed for G_ADD/G_SUB/G_MUL v2s16	2020-02-25 11:20:35 -05:00
Jay Foad	33cbd5ee08	AMDGPU/GlobalISel: Legalize s64 min/max by lowering Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75108	2020-02-25 16:00:43 +00:00
Matt Arsenault	fee41517fe	AMDGPU/GlobalISel: Introduce post-legalize combiner The current set of custom combines are only really useful after legalization, so move them there. There is a lot of overlap in the boilerplate here, but I think we do want a pretty different set of combines before and after legalize. I think we will want a lot of overlap between the post-legalize and a post-regbankselect combiner.	2020-02-24 22:12:12 -05:00
Matt Arsenault	0b46b078b6	AMDGPU/GlobalISel: Fix incorrect VOP3P fneg folding We use some s32 values in VOP3P operands, and won't see any intervening casts from a 32-bit fneg. Make sure it's really a packed fneg before folding.	2020-02-24 21:20:35 -05:00

4754 Commits