llvm-project

Commit Graph

Author	SHA1	Message	Date
Tim Renouf	ad8b7c1190	[AMDGPU] Fixed incorrect break from loop Summary: Lower control flow did not correctly handle the case that a loop break in if/else was on a condition that was not guaranteed to be masked by exec. The first test kernel shows an example of this going wrong; after exiting the loop, exec is all ones, even if it was not before the loop. The fix is for lowering of if-break and else-break to insert an S_AND_B64 to mask the break condition with exec. This commit also includes the optimization of not inserting that S_AND_B64 if it is obviously not needed because the break condition is the result of a V_CMP in the same basic block. V2: Addressed some review comments. V3: Test fixes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44046 Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c llvm-svn: 333258	2018-05-25 07:55:04 +00:00
Tom Stellard	79fffe3515	AMDGPU: Remove AMDGPUMCInstLower.h Summary: The AMDGPUMCInstLower class is not used outside AMDGPUMCInstLower.cpp, so we don't need a header file. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47264 llvm-svn: 333254	2018-05-25 04:57:02 +00:00
Tom Stellard	c501501055	AMDGPU: Split R600 AsmPrinter code into its own class Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47245 llvm-svn: 333219	2018-05-24 20:02:01 +00:00
Tom Stellard	1b95fed6f7	AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP Summary: We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp is gone. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47181 llvm-svn: 333153	2018-05-24 05:28:34 +00:00
Matt Arsenault	606bc315d6	AMDGPU: Fix v2f16 fneg/fabs pattern The integer operation convertion for some reason only happens if the source is a bitcast from an integer, which happens to always be the situation when the result is loaded. Add an additional pattern for when the source operation is really an FP operation. llvm-svn: 333019	2018-05-22 20:13:34 +00:00
Tom Stellard	b12f4dec08	AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering Summary: This is always false for R600. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47180 llvm-svn: 333016	2018-05-22 19:37:55 +00:00
Matt Arsenault	1349a04ef5	AMDGPU: Make v2i16/v2f16 legal on VI This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. llvm-svn: 332953	2018-05-22 06:32:10 +00:00
Tom Stellard	44b30b4537	AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930	2018-05-22 02:03:23 +00:00
Peter Collingbourne	dcd7d6c331	MC: Separate creating a generic object writer from creating a target object writer. NFCI. With this we gain a little flexibility in how the generic object writer is created. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47045 llvm-svn: 332868	2018-05-21 19:20:29 +00:00
Stanislav Mekhanoshin	9badad2051	[AMDGPU] Add divergence analysis as a dependency for ISel AMDGPUDAGToDAGISel adds DivergenceAnalysis in getAnalysisUsage but does not list it in pass dependencies which may lead to crash. Differential Revision: https://reviews.llvm.org/D47151 llvm-svn: 332862	2018-05-21 18:18:52 +00:00
Peter Collingbourne	571a3301ae	MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI. To make this work I needed to add an endianness field to MCAsmBackend so that writeNopData() implementations know which endianness to use. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47035 llvm-svn: 332857	2018-05-21 17:57:19 +00:00
Tom Stellard	a91ce17b5f	AMDGPU/GlobalISel: Address post-commit review comments for r332379 MCRegisterInfo::getPhysRegSize() will be deprecated. llvm-svn: 332856	2018-05-21 17:49:31 +00:00
Simon Pilgrim	ede0e4073e	Fix MSVC unused variable warning. NFCI. AMDGPURegisterInfo::getSubRegFromChannel is a static method - we don't need to get the AMDGPURegisterInfo instance. llvm-svn: 332807	2018-05-19 12:46:02 +00:00
Matt Arsenault	372d796ab1	AMDGPU: Add pass to optimize reqd_work_group_size Eliminate loads from the dispatch packet when they will have a known value. Also pattern match the code used by the library to handle partial workgroup dispatches, which isn't necessary if reqd_work_group_size is used. llvm-svn: 332771	2018-05-18 21:35:00 +00:00
Peter Collingbourne	e3f652973e	Support: Simplify endian stream interface. NFCI. Provide some free functions to reduce verbosity of endian-writing a single value, and replace the endianness template parameter with a field. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47032 llvm-svn: 332757	2018-05-18 19:46:24 +00:00
Konstantin Zhuravlyov	caa8251971	AMDGPU/NFC: Set symbol's type that is coming from an argument in EmitAMDGPUSymbolType, instead of hard-coding it to STT_AMDGPU_HSA_KERNEL. llvm-svn: 332753	2018-05-18 18:41:37 +00:00
Peter Collingbourne	f7b81db715	MC: Change the streamer ctors to take an object writer instead of a stream. NFCI. The idea is that a client that wants split dwarf would create a specific kind of object writer that creates two files, and use it to create the streamer. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47050 llvm-svn: 332749	2018-05-18 18:26:45 +00:00
Changpeng Fang	860d460063	AMDGPU/SI: Don't promote alloca to vector for atomic load/store Summary: Don't promote alloca to vector for atomic load/store Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D46085 llvm-svn: 332673	2018-05-17 21:49:44 +00:00
Changpeng Fang	391bcf8893	AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops. Summary: The current StructurizeCFG pass only works for CFG with one exit. AMDGPUUnifyDivergentExitNodes combines multiple "return" blocks and/or "unreachable" blocks to one exit block for the Structurizer to work. However, infinite loop is another kind of special "exit", and if we don't handle it, the case of multiple exits will prevent the structurizer from working. In this work, for each infinite loop, we add a dummy edge to the "return" block, and thus the AMDGPUUnifyDivergentExitNodes pass will work with infinite loops. This will make CFG with infinite loops be structurized. Reviewer: nhaehnle Differential Revision: https://reviews.llvm.org/D46340 llvm-svn: 332625	2018-05-17 16:45:01 +00:00
Konstantin Zhuravlyov	c72ece6c2c	AMDGPU : Recalculate SGPRs when trap handler is supported Differential Revision: https://reviews.llvm.org/D29911 llvm-svn: 332523	2018-05-16 20:47:48 +00:00
Tony Tye	43259df44a	[AMDGPU] Change llvm.debugtrap to be a debug breakpoint that can resume execution. No longer require the queue pointer to be passed in in fixed SGPRs. Differential Revision: https://reviews.llvm.org/D46769 llvm-svn: 332485	2018-05-16 16:19:34 +00:00
Matt Arsenault	67a9815a5c	AMDGPU: Custom lower v4i16/v4f16 vector operations Avoids stack access. Also handle extract hi elt pattern from truncate + shift to avoid a couple test regressions. llvm-svn: 332453	2018-05-16 11:47:30 +00:00
Stanislav Mekhanoshin	57d341c27a	[AMDGPU] Fix handling of void types in isLegalAddressingMode It is legal for the type passed to isLegalAddressingMode to be unsized or, more specifically, VoidTy. In this case, we must check the legality of load / stores for all legal types. Directly trying to call getTypeStoreSize is incorrect, and leads to breakage in e.g. Loop Strength Reduction. This change guards against that behaviour. Differential Revision: https://reviews.llvm.org/D40405 llvm-svn: 332409	2018-05-15 22:07:51 +00:00
Konstantin Zhuravlyov	f13c9969fc	AMDGPU: Fix v_dot{4, 8}* instruction encoding Differential Revision: https://reviews.llvm.org/D46848 llvm-svn: 332387	2018-05-15 19:32:47 +00:00
Tom Stellard	e182b28ae4	AMDGPU/GlobalISel: Implement select() for G_FCONSTANT Summary: Also clean up G_CONSTANT selection. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46170 llvm-svn: 332379	2018-05-15 17:57:09 +00:00
Konstantin Zhuravlyov	603a43fcd5	AMDGPU: Add disasm tests for deep learning instructions + fix v_fmac_f32 disasm Differential Revision: https://reviews.llvm.org/D46853 llvm-svn: 332377	2018-05-15 17:39:13 +00:00
Nicola Zaghen	d34e60ca85	Rename DEBUG macro to LLVM_DEBUG. The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' \| xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master \| ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually chage DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240	2018-05-14 12:53:11 +00:00
Matt Arsenault	432aaea63f	AMDGPU: Rename OpenCL lowering pass to be R600 specific. This pass is a) broken. b) r600 specific. Fixing (a) is a bit more non-trivial, but fixing (b) is easy. Move this pass to being R600 only for now. This pass does pass all the unit tests, however clang no longer generates code that looks like the unit test input, so fixing the pass requires fixing the tests and the pass as one, and checking it works with clang still. Patch by Dave Airlie llvm-svn: 332196	2018-05-13 10:04:48 +00:00
Matt Arsenault	dfb88dfe30	AMDGPU: Make undef legal for v2i16/v2f16 This is apparently necessary to stop undef from being turned into a build_vector of 0s. llvm-svn: 332195	2018-05-13 10:04:38 +00:00
Stanislav Mekhanoshin	7012c246c1	[AMDGPU] Fix amdgpu-waves-per-eu accounting in scheduler We cannot query this attribute from a subtarget given a machine function. At this point attribute itself is already unavailable and can only be obtained through MFI. Differential Revision: https://reviews.llvm.org/D46781 llvm-svn: 332166	2018-05-12 01:41:56 +00:00
Tom Stellard	655fdd3f82	AMDGPU/GlobalISel: Implement select() for >32-bit G_STORE Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D46153 llvm-svn: 332154	2018-05-11 23:12:49 +00:00
Changpeng Fang	f094885a9e	AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction. Summary: We have no logic to promote alloca to vector for an AddrSpaceCast instruction. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D45993 llvm-svn: 332147	2018-05-11 22:17:57 +00:00
Yaxun Liu	deba150c27	[AMDGPU] Fix compilation failure when IR contains comdat Remove a useless SwitchSection which also causes compilation failure when IR contains comdat. The SwitchSection is useless because the current section is already correct text section for the function therefore no need to switch. It causes compilation failure for comdat because functions with comdat has specific text section, not the default .text section. Since HIP uses comdat, this bug caused failures for HIP. Differential Revision: https://reviews.llvm.org/D46770 llvm-svn: 332137	2018-05-11 20:40:14 +00:00
Tom Stellard	dcc95e9385	AMDGPU/GlobalISel: Implement select() for 32-bit G_FPTOUI Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45883 llvm-svn: 332082	2018-05-11 05:44:16 +00:00
Tom Stellard	1e0edad4bb	AMDGPU/GlobalISel: Implement select() for G_BITCAST s32 <--> <2 x s16> Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45881 llvm-svn: 332042	2018-05-10 21:20:10 +00:00
Tom Stellard	1dc90204bf	AMDGPU/GlobalISel: Enable TableGen'd instruction selector Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, mgorny, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45994 llvm-svn: 332039	2018-05-10 20:53:06 +00:00
Farhana Aleen	e24f3ff8de	[AMDGPU] Support horizontal vectorization of min/max. Author: FarhanaAleen Reviewed By: rampitec Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D46604 llvm-svn: 331920	2018-05-09 21:18:34 +00:00
Matt Arsenault	eac81b2448	AMDGPU: Ignore any_extend in mul24 combine If a multiply is truncated, SimplifyDemandedBits sometimes turns a zero_extend of the inputs into an any_extend, which makes the known bits computation unhelpful. Ignore these and compute known bits for the underlying value, since we insert the correct extend type after. llvm-svn: 331919	2018-05-09 21:11:35 +00:00
Matt Arsenault	74fd7600d2	AMDGPU: Handle partial shift reduction for variable shifts If the variable shift amount has known bits, we can still reduce the shift. llvm-svn: 331917	2018-05-09 20:52:54 +00:00
Matt Arsenault	b143d9a5ea	AMDGPU: Partially shrink 64-bit shifts if reduced to 16-bit This is an extension of an existing combine to reduce wider shls if the result fits in the final result type. This introduces the same combine, but reduces the shift to a middle sized type to avoid the slow 64-bit shift. llvm-svn: 331916	2018-05-09 20:52:43 +00:00
Matt Arsenault	762d498808	AMDGPU: Add combine for trunc of bitcast from build_vector If the truncate is only accessing the first element of the vector, we can use the original source value. This helps with some combine ordering issues after operations are lowered to integer operations between bitcasts of build_vector. In particular it stops unnecessarily materializing the unused top half of a vector in some cases. llvm-svn: 331909	2018-05-09 18:37:39 +00:00
Matt Arsenault	378f86998c	AMDGPU: Stop special casing constant indexes of extract_vector_elt The same result folds out of the dynamic expansion logic if the index is constant. llvm-svn: 331906	2018-05-09 18:29:26 +00:00
Shiva Chen	801bf7ebbe	[DebugInfo] Examine all uses of isDebugValue() for debug instructions. Because we create a new kind of debug instruction, DBG_LABEL, we need to check all passes which use isDebugValue() to check MachineInstr is debug instruction or not. When expelling debug instructions, we should expel both DBG_VALUE and DBG_LABEL. So, I create a new function, isDebugInstr(), in MachineInstr to check whether the MachineInstr is debug instruction or not. This patch has no new test case. I have run regression test and there is no difference in regression test. Differential Revision: https://reviews.llvm.org/D45342 Patch by Hsiangkai Wang. llvm-svn: 331844	2018-05-09 02:42:00 +00:00
Tim Renouf	64afc2d7f0	[AMDGPU] Provide machine -> name mapping Summary: AMDGPU stores a numerical code for the particular GPU variant in EFlags in the ELF file. This commit provides a mapping from that number into the machine name for use by objdump-type tools. Change-Id: Id37fc0bebad443bd89c0080985ce298c4e7e9319 Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46587 llvm-svn: 331798	2018-05-08 18:53:04 +00:00
Matt Arsenault	869cbedc81	AMDGPU: Fix broken dynamic vector indexing for packed types The intention of this was to multiply by 16, not shift by 16. llvm-svn: 331793	2018-05-08 18:43:25 +00:00
Changpeng Fang	d049da3740	AMDGPU: Use eraseFromParent to delete am instruction when it is no longer needed. Reviewer: Nicolai Differential Revision: https://reviews.llvm.org/D46438 llvm-svn: 331788	2018-05-08 18:32:35 +00:00
Stanislav Mekhanoshin	432936161e	[AMDGPU] Added checks for dpp_ctrl value - Report error for invalid dpp_ctrl values. - Changed the way it is reported, now the error will be emitted into asm and will work with release build as well. - Added dpp_ctrl value verifier for codegen. - Added symbolic constants for dpp_ctrl. Differential Revision: https://reviews.llvm.org/D46565 llvm-svn: 331775	2018-05-08 16:53:02 +00:00
Tom Stellard	37444285f1	AMDGPU/GlobalISel: Don't try to lower hull shaders Summary: The AMDGPU_HS calling convention is not supported yet. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46149 llvm-svn: 331691	2018-05-07 22:17:54 +00:00
Mark Searles	4a0f2c5047	[AMDGPU][Waitcnt] Remove the old waitcnt pass Remove the old waitcnt pass ( si-insert-waits ), which is no longer maintained and getting crufty Differential Revision: https://reviews.llvm.org/D46448 llvm-svn: 331641	2018-05-07 14:43:28 +00:00
Tim Renouf	18a1e9d03a	[AMDGPU] Don't force WQM for DS op Summary: Previously, all DS ops forced WQM in a pixel shader. That was a hack to allow for graphics frontends using ds_swizzle to implement explicit derivatives, on SI/CI at least where DPP is not available. But it forced WQM for _any_ DS op. With this commit, DS ops no longer force WQM. Both graphics frontends (Mesa and LLPC) need to change to issue an explicit llvm.amdgcn.wqm intrinsic call when calculating explicit derivatives. The required Mesa change is: "amd/common: use llvm.amdgcn.wqm for explicit derivatives". Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46051 Change-Id: I9b745b626fa91bbd66456e6cf41ee07eeea42f81 llvm-svn: 331633	2018-05-07 13:21:26 +00:00

1 2 3 4 5 ...

2598 Commits