llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	b2ea20eedd	AMDGPU/GlobalISel: RegBankSelect for sendmsg/sendmsghalt llvm-svn: 364819	2019-07-01 17:40:18 +00:00
Matt Arsenault	40d1faf38f	AMDGPU/GlobalISel: Legalize s16 fcmp llvm-svn: 364817	2019-07-01 17:35:53 +00:00
Nicolai Haehnle	10c911db63	AMDGPU/GFX10: implement ds_ordered_count changes Summary: ds_ordered_count can now simultaneously operate on up to 4 dwords in a single instruction, which are taken from (and returned to) lanes 0..3 of a single VGPR. Change-Id: I19b6e7b0732b617c10a779a7f9c0303eec7dd276 Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63716 llvm-svn: 364815	2019-07-01 17:17:52 +00:00
Nicolai Haehnle	4dc3b2bf95	AMDGPU: Support GDS atomics Summary: Original patch by Marek Olšák Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63452 llvm-svn: 364814	2019-07-01 17:17:45 +00:00
Matt Arsenault	1094e6a814	AMDGPU/GlobalISel: RegBankSelect for DS ordered add/swap llvm-svn: 364811	2019-07-01 17:04:57 +00:00
Matt Arsenault	265059eaf6	AMDGPU/GlobalISel: RegBankSelect for amdgcn.writelane llvm-svn: 364808	2019-07-01 16:41:36 +00:00
Matt Arsenault	a310727830	AMDGPU/GlobalISel: Fail instead of assert when selecting loads llvm-svn: 364807	2019-07-01 16:36:39 +00:00
Matt Arsenault	0a52e9d026	AMDGPU/GlobalISel: Complete implementation of G_GEP Also works around tablegen defect in selecting add with unused carry, but if we have to manually select GEP, might as well handle add manually. llvm-svn: 364806	2019-07-01 16:34:48 +00:00
Matt Arsenault	e1006259d8	AMDGPU/GlobalISel: Select G_PHI llvm-svn: 364805	2019-07-01 16:32:47 +00:00
Matt Arsenault	d810ff2588	AMDGPU/GlobalISel: Try to select VOP3 form of add There are several things broken, but at least emit the right thing for gfx9. The import of the pattern with the unused carry out seems to not work. Needs a special class for clamp, because OperandWithDefaultOps doesn't really work. llvm-svn: 364804	2019-07-01 16:27:32 +00:00
Matt Arsenault	62d64b0c30	AMDGPU/GlobalISel: RegBankSelect for readlane/readfirstlane llvm-svn: 364801	2019-07-01 16:19:39 +00:00
Tom Stellard	9e9dd30de3	AMDGPU/GlobalISel: Implement select for 32-bit G_ADD Reviewers: arsenm Reviewed By: arsenm Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58804 llvm-svn: 364797	2019-07-01 16:09:33 +00:00
Matt Arsenault	2ab25f9ceb	AMDGPU/GlobalISel: Select G_BRCOND for vcc llvm-svn: 364795	2019-07-01 16:06:02 +00:00
Matt Arsenault	cda82f0bb6	AMDGPU/GlobalISel: Select G_FRAME_INDEX llvm-svn: 364789	2019-07-01 15:48:18 +00:00
Nicolai Haehnle	7cfd99ab15	AMDGPU/GFX10: fix scratch resource descriptor Summary: The stride should depend on the wave size, not the hardware generation. Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be relevant. Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6 Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63808 llvm-svn: 364788	2019-07-01 15:43:00 +00:00
Matt Arsenault	fdf36729c7	AMDGPU/GlobalISel: Make s16 select legal This is easy to handle and avoids legalization artifacts which are likely to obscure combines. llvm-svn: 364787	2019-07-01 15:42:47 +00:00
Matt Arsenault	6464280eb0	AMDGPU/GlobalISel: Select G_BRCOND for scc conditions llvm-svn: 364786	2019-07-01 15:39:27 +00:00
Matt Arsenault	1daad91af6	AMDGPU/GlobalISel: Tolerate copies with no type set isVCC has the same bug, but isn't used in a context where it can cause a problem. llvm-svn: 364784	2019-07-01 15:23:04 +00:00
Matt Arsenault	4f64ade04c	AMDGPU/GlobalISel: Select src modifiers llvm-svn: 364782	2019-07-01 15:18:56 +00:00
Matt Arsenault	1b317685e9	AMDGPU: Convert some places to Register llvm-svn: 364769	2019-07-01 13:44:46 +00:00
Matt Arsenault	5bf850d52e	AMDGPU/GlobalISel: Fix RegBankSelect for G_FCANONICALIZE llvm-svn: 364768	2019-07-01 13:40:18 +00:00
Matt Arsenault	b5fc94f3e7	AMDGPU/GlobalISel: Fix RegBankSelect for G_BUILD_VECTOR llvm-svn: 364767	2019-07-01 13:40:17 +00:00
Matt Arsenault	89fc8bcdd6	AMDGPU/GlobalISel: Fail on store to 32-bit address space llvm-svn: 364766	2019-07-01 13:37:39 +00:00
Matt Arsenault	3b7668ae4b	AMDGPU/GlobalISel: Improve icmp selection coverage. Select s64 eq/ne scalar icmp. llvm-svn: 364765	2019-07-01 13:34:26 +00:00
Matt Arsenault	c23149f612	AMDGPU/GlobalISel: RegBankSelect for WWM/WQM llvm-svn: 364763	2019-07-01 13:30:12 +00:00
Matt Arsenault	facf69e844	AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote llvm-svn: 364762	2019-07-01 13:30:09 +00:00
Matt Arsenault	9f992c238a	AMDGPU/GlobalISel: Fix scc->vcc copy handling This was checking the size of the register with the value of the size, which happens to be exec. Also fix assuming VCC is 64-bit to fix wave32. Also remove some untested handling for physical registers which is skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical copy source. I'm not sure if this should be trying to handle this special case instead of dealing with this in copyPhysReg. llvm-svn: 364761	2019-07-01 13:22:07 +00:00
Matt Arsenault	5dafcb9b11	AMDGPU/GlobalISel: Use and instead of BFE with inline immediate Zext from s1 is the only case where this should do anything with the current legal extensions. llvm-svn: 364760	2019-07-01 13:22:06 +00:00
Florian Hahn	33c8c0ea27	[AMDGPU] Call isLoopExiting for blocks in the loop. isLoopExiting should only be called for blocks in the loop. A follow up patch makes this requirement an assertion. I've updated the usage here, to only match for actual exit blocks. Previously, it would also match blocks not in the loop. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D63980 llvm-svn: 364750	2019-07-01 12:36:44 +00:00
Matt Arsenault	0d45209757	AMDGPU/GlobalISel: RegBankSelect for update.dpp llvm-svn: 364701	2019-06-29 00:44:36 +00:00
Matt Arsenault	fd82cf4f4d	AMDGPU/GlobalISel: RegBankSelect for atomic.inc/atomic.dec llvm-svn: 364699	2019-06-29 00:39:20 +00:00
Matt Arsenault	adb1f21e52	AMDGPU/GlobalISel: RegBankSelect for some DS intrinsics llvm-svn: 364698	2019-06-29 00:33:13 +00:00
Matt Arsenault	b416d5fc8b	AMDGPU/GlobalISel: RegBankSelect for some easy intrinsics llvm-svn: 364697	2019-06-29 00:29:56 +00:00
Matt Arsenault	5ea3c9adb2	AMDGPU/GlobalISel: RegBankSelect for icmp/fcmp intrinsics llvm-svn: 364696	2019-06-29 00:28:52 +00:00
Matt Arsenault	6aafb3068f	AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.fmas llvm-svn: 364695	2019-06-29 00:25:53 +00:00
Matt Arsenault	ade5162432	AMDGPU/GlobalISel: RegBankSelect for some simple leaf intrinsics llvm-svn: 364694	2019-06-29 00:22:28 +00:00
Dmitry Preobrazhensky	e1eb25ff3e	[AMDGPU][MC] Fix 2 for sanitizer failure in 364645 llvm-svn: 364656	2019-06-28 16:28:46 +00:00
Dmitry Preobrazhensky	d12966c088	[AMDGPU][MC] Fix for sanitizer failure in 364645 llvm-svn: 364651	2019-06-28 15:22:47 +00:00
Dmitry Preobrazhensky	1d572ce395	[AMDGPU][MC] Enabled constant expressions as operands of sendmsg See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D62735 llvm-svn: 364645	2019-06-28 14:14:02 +00:00
Stanislav Mekhanoshin	07fd88d735	[AMDGPU] Packed thread ids in function call ABI Differential Revision: https://reviews.llvm.org/D63851 llvm-svn: 364619	2019-06-28 01:52:13 +00:00
Matt Arsenault	1178dc3d0b	AMDGPU/GlobalISel: Convert to using Register llvm-svn: 364616	2019-06-28 01:16:46 +00:00
Nicolai Haehnle	32ef9292be	AMDGPU: Make fixing i1 copies robust against re-ordering Summary: The new test case led to incorrect code. Change-Id: Ief48b227e97aa662dd3535c9bafb27d4a184efca Reviewers: arsenm, david-salinas Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63871 llvm-svn: 364566	2019-06-27 16:56:44 +00:00
Diana Picus	c3dbe23977	[GlobalISel] Accept multiple vregs in lowerFormalArgs Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660. lowerCall will be refactored in the same way in follow-up patches. With this change, we forward the virtual registers generated for aggregates to CallLowering. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into a s64 instead of a p0. Added a test-case which illustrates the problem more clearly (it crashes without this patch) and fixed the existing test-case to expect p0. AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC. Mips doesn't support aggregates yet, so it's also NFC. x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument. Differential Revision: https://reviews.llvm.org/D63549 llvm-svn: 364510	2019-06-27 08:54:17 +00:00
Jay Foad	8479240b0a	[AMDGPU] Fix +DumpCode to print an entry label for the first function Summary: The +DumpCode attribute is a horrible hack in AMDGPU to embed the disassembly of the generated code into the elf file. It is used by LLPC to implement an extension that allows the application to read back the disassembly of the code. It tries to print an entry label at the start of every function, but that didn't work for the first function in the module because DumpCodeInstEmitter wasn't initialised until EmitFunctionBodyStart which is too late. Change-Id: I790d73ddf4f51fd02ab32529380c7cb7c607c4ee Reviewers: arsenm, tpr, kzhuravl Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63712 llvm-svn: 364508	2019-06-27 08:19:28 +00:00
Matt Arsenault	c0cad98363	AMDGPU: Assert SPAdj is 0 llvm-svn: 364473	2019-06-26 20:56:18 +00:00
Matt Arsenault	6a87e0fc6a	[AMDGPU] Fix Livereg computation during epilogue insertion The LivePhysRegs calculated in order to find a scratch register in the epilogue code wrongly uses 'LiveIns'. Instead, it should use the 'Liveout' sets. For the liveness, also considering the operands of the terminator (return) instruction which is the insertion point for the scratch-exec-copy instruction. Patch by Christudasan Devadasan llvm-svn: 364470	2019-06-26 20:35:18 +00:00
Ryan Taylor	9ab812d475	[AMDGPU] Fix for branch offset hardware workaround Summary: This fixes a hardware bug that makes a branch offset of 0x3f unsafe. This replaces the 32 bit branch with offset 0x3f to a 64 bit instruction that includes the same 32 bit branch and the encoding for a s_nop 0 to follow. The relaxer than modifies the offsets accordingly. Change-Id: I10b7aed99d651f8159401b01bb421f105fa6288e Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63494 llvm-svn: 364451	2019-06-26 17:34:57 +00:00
Matt Arsenault	5f798f1346	AMDGPU: Fix unused variable llvm-svn: 364426	2019-06-26 13:48:04 +00:00
Matt Arsenault	e0b8443460	AMDGPU: Check MRI for callee saved regs instead of TRI This should the same, but MRI does allow dynamically changing the CSR set, although currently not used. llvm-svn: 364425	2019-06-26 13:39:29 +00:00
Matt Arsenault	8fcc70f141	Don't look for the TargetFrameLowering in the implementation The same oddity was apparently copy-pasted between multiple targets. llvm-svn: 364349	2019-06-25 20:53:35 +00:00
Diego Novillo	688afeb884	Update phis in AMDGPUUnifyDivergentExitNodes Original patch https://reviews.llvm.org/D63659 from Steven Perron <stevenperron@google.com> The pass AMDGPUUnifyDivergentExitNodes does not update the phi nodes in the successors of blocks that is splits. This is fixed by calling BasicBlock::splitBasicBlock to split the block instead of doing it manually. This does extra work because a new conditional branch is created in BB which is immediately replaced, but I think the simplicity is worth it. It also helps make the code more future proof in case other things need to be updated. llvm-svn: 364342	2019-06-25 18:55:16 +00:00
Stanislav Mekhanoshin	4be636ebb3	[AMDGPU] Removed dead SIMachineFunctionInfo::getWorkItemIDVGPR() Differential Revision: https://reviews.llvm.org/D63780 llvm-svn: 364339	2019-06-25 18:33:53 +00:00
Michael Liao	f0a665afca	[AMDGPU] Null checking on TS to avoid crashing in clang tests. - `test/Misc/backend-resource-limit-diagnostics.cl` crashes as null streamer is used. llvm-svn: 364318	2019-06-25 14:06:34 +00:00
Matt Arsenault	d7ffa2a948	AMDGPU: Select G_SEXT/G_ZEXT/G_ANYEXT llvm-svn: 364308	2019-06-25 13:18:11 +00:00
Nicolai Haehnle	2710171a15	AMDGPU: Write LDS objects out as global symbols in code generation Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted. Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader. Some notes: - The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile times, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now. - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check. Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275 Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61494 llvm-svn: 364297	2019-06-25 11:52:30 +00:00
Nicolai Haehnle	08e8cb5760	AMDGPU/MC: Add .amdgpu_lds directive Summary: The directive defines a symbol as an group/local memory (LDS) symbol. LDS symbols behave similar to common symbols for the purposes of ELF, using the processor-specific SHN_AMDGPU_LDS as section index. It is the linker and/or runtime loader's job to "instantiate" LDS symbols and resolve relocations that reference them. It is not possible to initialize LDS memory (not even zero-initialize as for .bss). We want to be able to link together objects -- starting with relocatable objects, but possible expanding to shared objects in the future -- that access LDS memory in a flexible way. LDS memory is in an address space that is entirely separate from the address space that contains the program image (code and normal data), so having program segments for it doesn't really make sense. Furthermore, we want to be able to compile multiple kernels in a compilation unit which have disjoint use of LDS memory. In that case, we may want to place LDS symbols differently for different kernels to save memory (LDS memory is very limited and physically private to each kernel invocation), so we can't simply place LDS symbols in a .lds section. Hence this solution where LDS symbols always stay undefined. Change-Id: I08cbc37a7c0c32f53f7b6123aa0afc91dbc1748f Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61493 llvm-svn: 364296	2019-06-25 11:51:35 +00:00
Matt Arsenault	25bc27965a	AMDGPU/GlobalISel: Fix regbankselect for amdgcn.class llvm-svn: 364262	2019-06-25 01:07:22 +00:00
Matt Arsenault	dbb6c03175	AMDGPU/GlobalISel: Select G_TRUNC llvm-svn: 364215	2019-06-24 18:02:18 +00:00
Matt Arsenault	14d0b646b7	AMDGPU/GlobalISel: RegBankSelect for amdgcn.class llvm-svn: 364214	2019-06-24 18:00:47 +00:00
Matt Arsenault	8fcd5ade3e	AMDGPU/GlobalISel: Split VALU s64 G_ZEXT/G_SEXT in RegBankSelect Scalar extends to s64 can use S_BFE_{I64\|U64}, but vector extends need to extend to the 32-bit half, and then to 64. I'm not sure what the line should be between what RegBankSelect handles, and what instruction select does, but for now I'm erring on the side of RegBankSelect for future post-RBS combines. llvm-svn: 364212	2019-06-24 17:54:12 +00:00
Tim Renouf	d2fdb956e0	[AMDGPU] Allow any value in unused src0 field in v_nop Summary: The LLVM disassembler assumes that the unused src0 operand of v_nop is zero. Other tools can put another value in that field, which is still valid. This commit fixes the LLVM disassembler to recognize such an encoding as v_nop, in the same way as we already do for s_getpc. Differential Revision: https://reviews.llvm.org/D63724 Change-Id: Iaf0363eae26ff92fc4ebc716216476adbff37a6f llvm-svn: 364208	2019-06-24 17:35:20 +00:00
Matt Arsenault	f8a841b88e	AMDGPU/GlobalISel: Fix selecting G_IMPLICIT_DEF for s1 Try to fail for scc, since I don't think that should ever be produced. llvm-svn: 364199	2019-06-24 16:24:03 +00:00
Matt Arsenault	faeaedf8e9	GlobalISel: Remove unsigned variant of SrcOp Force using Register. One downside is the generated register enums require explicit conversion. llvm-svn: 364194	2019-06-24 16:16:12 +00:00
Matt Arsenault	e3a676e9ad	CodeGen: Introduce a class for registers Avoids using a plain unsigned for registers throughoug codegen. Doesn't attempt to change every register use, just something a little more than the set needed to build after changing the return type of MachineOperand::getReg(). llvm-svn: 364191	2019-06-24 15:50:29 +00:00
Bjorn Pettersson	3260ef16bb	[AMDGPU] Remove unused variable AllSGPRSpilledToVGPRs. NFC Summary: Removing the unused variable AllSGPRSpilledToVGPRs in SIFrameLowering::processFunctionBeforeFrameFinalized to avoid error: variable 'AllSGPRSpilledToVGPRs' set but not used [-Werror=unused-but-set-variable] Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63721 llvm-svn: 364190	2019-06-24 15:50:18 +00:00
Matt Arsenault	5dbd9228c4	AMDGPU/GlobalISel: Fix RegBankSelect for s1 sext/zext/anyext This needs different handling if the source is known to be a valid condition or not. Handle turning it into shifts or a select during regbankselect. llvm-svn: 364186	2019-06-24 14:53:58 +00:00
Matt Arsenault	60957cb74c	AMDGPU: Fold frame index into MUBUF This matters for byval uses outside of the entry block, which appear as copies. Previously, the only folding done was during selection, which could not see the underlying frame index. For any uses outside the entry block, the frame index was materialized in the entry block relative to the global scratch wave offset. This may produce worse code in cases where the offset ends up not fitting in the MUBUF offset field. A better heuristic would be helpfu for extreme frames. llvm-svn: 364185	2019-06-24 14:53:56 +00:00
Matt Arsenault	942404d01b	AMDGPU: Cleanup checking when spills need emergency slots Address fixme, which should no longer be a problem since r363757. llvm-svn: 364182	2019-06-24 14:34:40 +00:00
Matt Arsenault	22e3dc60a0	AMDGPU: Fix not using s33 for scratch wave offset in kernels Fixes missing piece from r363990. llvm-svn: 364099	2019-06-21 20:04:02 +00:00
Stanislav Mekhanoshin	bdf7f81b89	[AMDGPU] hazard recognizer for fp atomic to s_denorm_mode This requires 3 wait states unless there is a wait or VALU in between. Differential Revision: https://reviews.llvm.org/D63619 llvm-svn: 364074	2019-06-21 16:30:14 +00:00
Matt Arsenault	d88db6d7fc	AMDGPU: Always use s33 for global scratch wave offset Every called function could possibly need this to calculate the absolute address of stack objectst, and this avoids inserting a copy around every call site in the kernel. It's also somewhat cleaner to keep this in a callee saved SGPR. llvm-svn: 363990	2019-06-20 21:58:24 +00:00
Matt Arsenault	740322f1eb	AMDGPU: Add intrinsics for DS GWS semaphore instructions llvm-svn: 363983	2019-06-20 21:11:42 +00:00
Matt Arsenault	8ad1decf45	AMDGPU: Insert mem_viol check loop around GWS pre-GFX9 It is necessary to emit this loop around GWS operations in case the wave is preempted pre-GFX9. llvm-svn: 363979	2019-06-20 20:54:32 +00:00
Matt Arsenault	5dc457cbe4	AMDGPU: Fix ignoring DisableFramePointerElim in leaf functions The attribute can specify elimination for leaf or non-leaf, so it should always be considered. I copied this bug from AArch64, which probably should also be fixed. llvm-svn: 363949	2019-06-20 17:03:23 +00:00
Matt Arsenault	b7f87c0ecf	AMDGPU: Treat undef as an inline immediate This should only matter in vectors with an undef component, since a full undef vector would have been folded out. llvm-svn: 363941	2019-06-20 16:01:09 +00:00
Stanislav Mekhanoshin	0846c125f9	[AMDGPU] gfx1010 core wave32 changes Differential Revision: https://reviews.llvm.org/D63204 llvm-svn: 363934	2019-06-20 15:08:34 +00:00
Matt Arsenault	c67c484f36	AMDGPU: Don't clobber VCC in MUBUF addr64 emulation Introducing VCC defs during SIFixSGPRCopies is generally problematic. Avoid it by starting with the VOP3 form with the general condition register. This is the easiest to fix instance, but doesn't solve any specific problems I'm looking at. llvm-svn: 363904	2019-06-20 00:51:28 +00:00
Matt Arsenault	e4c2e9b016	AMDGPU: Consolidate some getGeneration checks This is incomplete, and ideally these would all be removed, but it's better to localize them to the subtarget first with comments about what they're for. llvm-svn: 363902	2019-06-19 23:54:58 +00:00
Matt Arsenault	e24b34e9c9	AMDGPU: Undo sub x, c canonicalization for v2i16 Should avoid regression from D62341 llvm-svn: 363899	2019-06-19 23:37:43 +00:00
Matt Arsenault	4d000d2488	AMDGPU: Fix folding immediate into readfirstlane through reg_sequence The def instruction for the vreg may not match, because it may be folding through a reg_sequence. The assert was overly conservative and not necessary. It's not actually important if DefMI really defined the register, because the fold that will be done cares about the def of the value that will be folded. For some reason copies aren't making it through the reg_sequence, although they should. llvm-svn: 363876	2019-06-19 20:44:15 +00:00
Matt Arsenault	4d55d024be	Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics" This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the original node. llvm-svn: 363870	2019-06-19 19:55:27 +00:00
Simon Pilgrim	128ce93c60	Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. ........ Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/ llvm-svn: 363797	2019-06-19 13:00:54 +00:00
Matt Arsenault	9cac4e6d14	Rename ExpandISelPseudo->FinalizeISel, delay register reservation This allows targets to make more decisions about reserved registers after isel. For example, now it should be certain there are calls or stack objects in the frame or not, which could have been introduced by legalization. Patch by Matthias Braun llvm-svn: 363757	2019-06-19 00:25:39 +00:00
Stanislav Mekhanoshin	bb1c8b6f5c	[AMDGPU] gfx10 wave32 patterns Differential Revision: https://reviews.llvm.org/D63511 llvm-svn: 363729	2019-06-18 20:00:24 +00:00
Stanislav Mekhanoshin	ab4f2ea793	[AMDGPU] gfx1010 disassembler changes for wave32 Differential Revision: https://reviews.llvm.org/D63506 llvm-svn: 363721	2019-06-18 19:10:59 +00:00
Matt Arsenault	8d35dcd703	AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. llvm-svn: 363678	2019-06-18 13:19:57 +00:00
Matt Arsenault	f39f3bd056	AMDGPU: Change API for checking for exec modification Invert the name and return value to better reflect the imprecise nature. Force passing in the DefMI, since it's known in the 2 users and could possibly fail for an arbitrary vreg. Allow specifying a specific user instruction. Scan through use instructions, instead of use operands. Add scan thresholds instead of searching infinitely. Stop using a set to track seen uses. I didn't understand this usage, or why it would not check the last use. I don't think the use list has any particular order. llvm-svn: 363675	2019-06-18 12:48:36 +00:00
Matt Arsenault	bcb5ea0042	AMDGPU: Fold readlane from copy of SGPR or imm These may be inserted to assert uniformity somewhere. llvm-svn: 363670	2019-06-18 12:23:46 +00:00
Matt Arsenault	e75e197ad8	AMDGPU: Remove unnecessary check for virtual register The copy was found by searching the uses of a virtual register, so it's already known to be virtual. llvm-svn: 363669	2019-06-18 12:23:45 +00:00
Matt Arsenault	23f03f5059	AMDGPU: Fix iterator crash in AMDGPUPromoteAlloca The lifetime intrinsic was erased, which was the next iterator. llvm-svn: 363668	2019-06-18 12:23:44 +00:00
Matt Arsenault	d5ce8ec778	AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.scale llvm-svn: 363667	2019-06-18 12:23:42 +00:00
Valery Pykhtin	7e854e1cdd	[AMDGPU] Speed up live-in virtual register set computaion in GCNScheduleDAGMILive. Differential revision: https://reviews.llvm.org/D62401 llvm-svn: 363661	2019-06-18 11:43:17 +00:00
Stanislav Mekhanoshin	121956108f	[AMDGPU] Use custom inserter for gfx10 VOP2b This is part of the approved D63204 pending parent revision. This small change is in fact a part of the VOP2b legalization which does not technically belong to wave32 support, so extracted separately. llvm-svn: 363625	2019-06-17 22:37:37 +00:00
Stanislav Mekhanoshin	3138278287	[AMDGPU] Propagate function attributes thru bitcasts AMDGPUPropagateAttributes will not work on function bitcatsts, so move AMDGPUFixFunctionBitcasts before it. Differential Revision: https://reviews.llvm.org/D63455 llvm-svn: 363614	2019-06-17 20:42:48 +00:00
Nicolai Haehnle	ae4fcb97dd	AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printer Summary: The purpose of the padding is to guard against stale code being fetched into the instruction cache by the lowest level prefetching. We're generating relocatable ELF here, and so the padding should arguably be added by the linker. This is in fact what Mesa does. This also fixes multi-part shaders for Mesa. Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82 Reviewers: arsenm, rampitec, t-tye Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63427 llvm-svn: 363602	2019-06-17 19:28:43 +00:00
Stanislav Mekhanoshin	a9191c8492	[AMDGPU] gfx1010 wavefrontsize intrinsic folding Differential Revision: https://reviews.llvm.org/D63206 llvm-svn: 363588	2019-06-17 17:57:50 +00:00
Stanislav Mekhanoshin	ad04e7ad42	[AMDGPU] Pass to propagate ABI attributes from kernels to the functions The pass works in two modes: Mode 1: Just set attributes starting from kernels. This can work at the very beginning of opt and llc pipeline, but cannot clone functions because it must be a function pass. Mode 2: Actually clone functions for new attributes. This can only work after all function passes in the opt pipeline because it has to be a module pass. Differential Revision: https://reviews.llvm.org/D63208 llvm-svn: 363586	2019-06-17 17:47:28 +00:00
Matt Arsenault	a7f09f3c9e	GlobalISel: Verify intrinsics I keep using the wrong instruction when manually writing tests. This really needs to check the number of operands, but I don't see an easy way to do that right now. llvm-svn: 363579	2019-06-17 17:01:32 +00:00
Matt Arsenault	fee1949b35	AMDGPU/GlobalISel: Account for multiple defs when finding intrinsic ID llvm-svn: 363578	2019-06-17 17:01:27 +00:00
Stanislav Mekhanoshin	5d00c3060e	[AMDGPU] gfx1010 wave32 metadata Differential Revision: https://reviews.llvm.org/D63207 llvm-svn: 363577	2019-06-17 16:48:56 +00:00

1 2 3 4 5 ...

3597 Commits