llvm-project

Author	SHA1	Message	Date
Matt Arsenault	6aafc5e19d	AMDGPU/GlobalISel: Legalize G_FRINT llvm-svn: 361026	2019-05-17 12:19:57 +00:00
Matt Arsenault	1448f5689e	AMDGPU/GlobalISel: Legalize G_FCOPYSIGN llvm-svn: 361025	2019-05-17 12:19:52 +00:00
Matt Arsenault	568f193847	AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.load llvm-svn: 361023	2019-05-17 12:02:34 +00:00
Matt Arsenault	a3b5a386fa	AMDGPU/GlobalISel: Use subreg index instead of extra unmerge This saves instructions and extra steps, but I'm not sure about introducing subregister indexes at this point. llvm-svn: 361022	2019-05-17 12:02:31 +00:00
Matt Arsenault	b3dc73634c	AMDGPU/GlobalISel: Use waterfall loop for buffer_load This adds support for more complex waterfall loops that need to handle operands > 32-bits, and multiple operands. llvm-svn: 361021	2019-05-17 12:02:27 +00:00
Rhys Perry	c4bc61bad7	[AMDGPU] detect WaW hazards when moving/merging load/store instructions Summary: In order to combine memory operations efficiently, the load/store optimizer might move some instructions around. It's usually safe to move instructions down past the merged instruction because the pass checks if memory operations can be re-ordered. Though, the current logic doesn't handle Write-after-Write hazards. This fixes a reflection issue with Monster Hunter World and DXVK. v2: - rebased on top of master - clean up the test case - handle WaW hazards correctly Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130 Original patch by Samuel Pitoiset. Reviewers: tpr, arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D61313 llvm-svn: 361008	2019-05-17 09:32:23 +00:00
Matt Arsenault	99e6f4d11a	AMDGPU: Introduce TokenFactor for ABI register copies in call sequence The call was missing chain dependencies on the pre-call copies. I don't think this was causing any real issues however. llvm-svn: 360906	2019-05-16 15:10:27 +00:00
Matt Arsenault	df24c92c0f	AMDGPU: Assume xnack is enabled by default This is the conservatively correct default. It is always safe to assume xnack is enabled, but not the converse. Introduce a feature to blacklist targets where xnack can never be meaningfully enabled. I'm not sure the targets this is applied to are 100% correct. llvm-svn: 360903	2019-05-16 14:48:34 +00:00
Matt Arsenault	a8f88c388f	AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xor Bool values should use the scc/vcc regbank since r350611. llvm-svn: 360877	2019-05-16 12:06:41 +00:00
Ryan Taylor	29257eb76c	[AMDGPU] Increases available SGPR for Calling Convention Summary: SGPR in CC can be either hw initialized or set by other chained shaders and so this increases the SGPR count available to CC to 105. Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61261 llvm-svn: 360778	2019-05-15 14:43:55 +00:00
Richard Trieu	8ce2ee9d56	[AMDGPU] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360713	2019-05-14 21:54:37 +00:00
Dmitry Preobrazhensky	ee51d851ea	[AMDGPU][GFX8][GFX9] Corrected predicate of v_*_co_u32 aliases Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D61905 llvm-svn: 360702	2019-05-14 19:16:24 +00:00
Stanislav Mekhanoshin	05791d90c9	[AMDGPU] Fixed handling of immediate i1 literals This bug was exposed by rL360395. Differential Revision: https://reviews.llvm.org/D61812 llvm-svn: 360689	2019-05-14 16:18:00 +00:00
Tim Renouf	33cb8f5b54	[AMDGPU] Fixed +DumpCode The +DumpCode attribute is a horrible hack in AMDGPU to embed the disassembly of the generated code into the elf file. It is used by LLPC to implement an extension that allows the application to read back the disassembly of the code. Longer term, we should re-implement that by using the LLVM disassembler from the Vulkan driver. Recent LLVM changes broke +DumpCode. With -filetype=asm it crashed, and with -filetype=obj I think it did not include any instructions, only the labels. Fixed with this commit: now it has no effect with -filetype=asm, and works as intended with -filetype=obj. Differential Revision: https://reviews.llvm.org/D60682 Change-Id: I6436d86fe2ea220d74a643a85e64753747c9366b llvm-svn: 360688	2019-05-14 16:17:14 +00:00
Stanislav Mekhanoshin	79b2828b3f	[AMDGPU] Reorder includes per coding standard. NFC. llvm-svn: 360609	2019-05-13 18:05:10 +00:00
Stanislav Mekhanoshin	21088639ae	[AMDGPU] Remove now unused V2FP16_ONE constant def. NFC. llvm-svn: 360608	2019-05-13 17:52:57 +00:00
Richard Trieu	c0bd7bd481	[AMDGPU] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360487	2019-05-11 00:03:35 +00:00
Stanislav Mekhanoshin	64196850f0	[AMDGPU] Pattern for v_xor3_b32 This also allows three op patterns to use increased constant bus limit of GFX10. Differential Revision: https://reviews.llvm.org/D61763 llvm-svn: 360395	2019-05-10 00:09:01 +00:00
Stanislav Mekhanoshin	a76da34b1d	[AMDGPU] gfx1010 v_interp_* instructions Differential Revision: https://reviews.llvm.org/D61703 llvm-svn: 360364	2019-05-09 18:38:55 +00:00
Stanislav Mekhanoshin	4d4c9e0757	[AMDGPU] gfx1010 changes for PAL metadata Differential Revision: https://reviews.llvm.org/D61704 llvm-svn: 360353	2019-05-09 16:34:13 +00:00
Matt Arsenault	462403a5c8	AMDGPU: Mark scheduler classes as final llvm-svn: 360294	2019-05-08 22:10:04 +00:00
Matt Arsenault	01434f9377	AMDGPU: Select VOP3 form of add The VOP3 form should always be the preferred selection, to be shrunk later. This should only be an optimization issue, but this partially works around a problem from clobbering VCC when SIFixSGPRCopies rewrites an SCC defining operation directly to VCC. 3 of the testcases are regressions from failing to fold the immediate in cases it should. These can be avoided by improving the VCC liveness handling in SIFoldOperands. Simply increasing the threshold to computeRegisterLiveness works, although this is common enough that VCC liveness should probably be tracked throughout the pass. The hack of leaving behind an implicit_def instruction to avoid breaking iterators wastes instruction count, which inhibits finding the VCC def in long chains of adds. Doing this however exposes different, worse looking regressions from poor scheduling behavior. This could probably be avoided by forcing the shrink of the addc here, but the scheduler should probably be fixed. The r600 add test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 360293	2019-05-08 22:09:57 +00:00
Stanislav Mekhanoshin	1dbf721315	[AMDGPU] gfx1010 exp modifications Differential Revision: https://reviews.llvm.org/D61701 llvm-svn: 360287	2019-05-08 21:23:37 +00:00
Changpeng Fang	73b7272e7a	AMDGPU: Fix a mis-placed bracket Differential Revision: https://reviews.llvm.org/D61430 llvm-svn: 360283	2019-05-08 19:46:04 +00:00
Simon Pilgrim	e3eec06dde	[AMDGPU] Reapplied BFE canonicalization from D60462 This was committed in rL358887 but reverted in rL360066 due to an x86 regression; really it should have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch. llvm-svn: 360263	2019-05-08 15:49:10 +00:00
Simon Pilgrim	02937dad69	R600InstrInfo.cpp - Add getTransSwizzle assert for the swizzle op index. NFCI. Fixes static analyzer undefined value warning. llvm-svn: 360239	2019-05-08 10:39:56 +00:00
Simon Pilgrim	be9ade93d1	[SIMode] Fix typo in Status constructor As noted in https://www.viva64.com/en/b/0629/ (Snippet No. 36) and the scan-build CI reports (https://llvm.org/reports/scan-build/report-SIModeRegister.cpp-Status-1-1.html#EndPath), rL348754 introduced a typo in the Status constructor due to argument variable names shadowing the member variable names. Differential Revision: https://reviews.llvm.org/D61595 llvm-svn: 360236	2019-05-08 10:24:22 +00:00
Austin Kerbow	8a3d3a9af6	[AMDGPU] Check MI bundles for hazards Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions in both scheduler and hazard recognizer mode. We ignore “bundledness” for the purpose of detecting hazards and examine the instructions individually. Reviewers: arsenm, msearles, rampitec Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61564 llvm-svn: 360199	2019-05-07 22:12:15 +00:00
Nicolai Haehnle	79ea85c6af	AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand Summary: No test case because I don't know of a way to trigger this, but I accidentally caused this to fail while working on a different change. Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61490 llvm-svn: 360123	2019-05-07 09:19:09 +00:00
Stanislav Mekhanoshin	491746a584	[AMDGPU] gfx1010 verifier changes Differential Revision: https://reviews.llvm.org/D61521 llvm-svn: 360095	2019-05-06 22:49:45 +00:00
Stanislav Mekhanoshin	971cb8b633	[AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32 GFX10 deprecates v_mul_lo_i32 instruction, so choose u32 form for all targets. Differential Revision: https://reviews.llvm.org/D61525 llvm-svn: 360094	2019-05-06 22:27:05 +00:00
Stanislav Mekhanoshin	1bc001dec4	[AMDGPU] gfx1010 memory legalizer Differential Revision: https://reviews.llvm.org/D61535 llvm-svn: 360087	2019-05-06 21:57:02 +00:00
Craig Topper	55a71b575c	Revert r359392 and r358887 Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead" Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list. Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead of moving to the integer domain (in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases. Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that. This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again. llvm-svn: 360066	2019-05-06 19:29:24 +00:00
Alexandre Ganea	799d96ec39	Fix compilation warnings when compiling with GCC 7.3 Differential Revision: https://reviews.llvm.org/D61046 llvm-svn: 360044	2019-05-06 13:41:54 +00:00
Stanislav Mekhanoshin	5ddd564e19	[AMDGPU] Fixed asan error after D61536 llvm-svn: 359963	2019-05-04 06:40:20 +00:00
Stanislav Mekhanoshin	51d1415a16	[AMDGPU] gfx1010 hazard recognizer Differential Revision: https://reviews.llvm.org/D61536 llvm-svn: 359961	2019-05-04 04:30:57 +00:00
Stanislav Mekhanoshin	28a1936f6d	[AMDGPU] gfx1010: use fmac instructions Differential Revision: https://reviews.llvm.org/D61527 llvm-svn: 359959	2019-05-04 04:20:37 +00:00
Stanislav Mekhanoshin	d9dcf392c7	[AMDGPU] gfx1010 wait count insertion Differential Revision: https://reviews.llvm.org/D61534 llvm-svn: 359938	2019-05-03 21:53:53 +00:00
Stanislav Mekhanoshin	41bbe101a2	[AMDGPU] gfx1010 s_code_end generation Also add some missing metadata in the streamer. Differential Revision: https://reviews.llvm.org/D61531 llvm-svn: 359937	2019-05-03 21:26:39 +00:00
Stanislav Mekhanoshin	93f15c922f	[AMDGPU] gfx1010 loop alignment Differential Revision: https://reviews.llvm.org/D61529 llvm-svn: 359935	2019-05-03 21:17:29 +00:00
Matt Arsenault	657ef48a88	AMDGPU: Select VOP3 form of sub The VOP3 form should always be the preferred selection form to be shrunk later. The r600 sub test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 359899	2019-05-03 15:37:07 +00:00
Matt Arsenault	cfd0ca38b0	AMDGPU: Support shrinking add with FI in SIFoldOperands Avoids test regression in a future patch llvm-svn: 359898	2019-05-03 15:21:53 +00:00
Matt Arsenault	344d68d3c9	AMDGPU: Remove redundant patterns for shifts llvm-svn: 359895	2019-05-03 15:08:36 +00:00
Matt Arsenault	ada33314a2	AMDGPU: Remove redundant patterns for sub There were 2 patterns for sub, one selecting to sub and one to subrev. Only one of these will succeed, so remove the reversed one. llvm-svn: 359894	2019-05-03 15:08:35 +00:00
Matt Arsenault	0446fbe45e	AMDGPU: Replace shrunk instruction with dummy implicit_def This was broken if the original operand was killed. The kill flag would appear on both instructions, and fail the verifier. Keep the kill flag, but remove the operands from the old instruction. This has an added benefit of really reducing the use count for future folds. Ideally the pass would be structured more like what PeepholeOptimizer does to avoid this hack to avoid breaking instruction iterators. llvm-svn: 359891	2019-05-03 14:40:10 +00:00
Matt Arsenault	2c8936fd26	AMDGPU: Fix incorrect commute with sub when folding immediates When a fold of an immediate into a sub/subrev required shrinking the instruction, the wrong VOP2 opcode was used. This was using the VOP2 equivalent of the original instruction, not the commuted instruction with the inverted opcode. llvm-svn: 359883	2019-05-03 13:42:56 +00:00
Sanjay Patel	284472be6d	[SelectionDAG] remove constant folding limitations based on FP exceptions We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops), so it does not make sense to have them here in the DAG either. Nothing else in the backend tries to preserve exceptions (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions. There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least partially) to preserve exceptions without even asking if the target supports FP exceptions. Those should be corrected in subsequent patches. Real support for FP exceptions requires several changes to handle the constrained/strict FP ops. Differential Revision: https://reviews.llvm.org/D61331 llvm-svn: 359791	2019-05-02 14:47:59 +00:00
Stanislav Mekhanoshin	64399da8b8	[AMDGPU] gfx1010 lost VOP2 forms of some add/sub Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32. Differential Revision: llvm-svn: 359757	2019-05-02 04:26:35 +00:00
Stanislav Mekhanoshin	5cf8167735	[AMDGPU] gfx1010 allows VOP3 to have a literal Differential Revision: https://reviews.llvm.org/D61413 llvm-svn: 359756	2019-05-02 04:01:39 +00:00
Stanislav Mekhanoshin	f2baae0abb	[AMDGPU] gfx1010 constant bus limit Constant bus limit has increased to 2 with GFX10. Differential Revision: https://reviews.llvm.org/D61404 llvm-svn: 359754	2019-05-02 03:47:23 +00:00
Stanislav Mekhanoshin	3b7925f035	[AMDGPU] gfx1010 GCNRegBankReassign pass Reassign registers to reduce register bank conflicts. Differential Revision: https://reviews.llvm.org/D61344 llvm-svn: 359704	2019-05-01 16:49:31 +00:00
Stanislav Mekhanoshin	c29d491596	[AMDGPU] gfx1010 GCNNSAReassign pass Convert NSA into non-NSA images. Differential Revision: https://reviews.llvm.org/D61341 llvm-svn: 359700	2019-05-01 16:40:49 +00:00
Stanislav Mekhanoshin	692560dc98	[AMDGPU] gfx1010 MIMG implementation Differential Revision: https://reviews.llvm.org/D61339 llvm-svn: 359698	2019-05-01 16:32:58 +00:00
Stanislav Mekhanoshin	a224f68a10	[AMDGPU] gfx1010 DS implementation Differential Revision: https://reviews.llvm.org/D61332 llvm-svn: 359696	2019-05-01 16:11:11 +00:00
Stanislav Mekhanoshin	a6322941ff	[AMDGPU] gfx1010 VMEM and SMEM implementation Differential Revision: https://reviews.llvm.org/D61330 llvm-svn: 359621	2019-04-30 22:08:23 +00:00
Sjoerd Meijer	180f1ae57c	[TargetLowering] Change getOptimalMemOpType to take a function attribute list The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, that allows implementation of TTI::getMemcpyCost. Differential Revision: https://reviews.llvm.org/D59785 llvm-svn: 359537	2019-04-30 08:38:12 +00:00
Simon Pilgrim	19cde62008	Avoid "checking a pointer after dereferencing" warning. NFCI. Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359473	2019-04-29 17:38:18 +00:00
Simon Pilgrim	6f349d8c39	Move if() to newline to stop ambiguity over whether it should be else if. NFCI. Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359472	2019-04-29 17:34:26 +00:00
Mark Searles	76c5b62988	Revert "AMDGPU: Split block for si_end_cf" This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4. We discovered some internal test failures, so reverting for now. Differential Revision: https://reviews.llvm.org/D61213 llvm-svn: 359363	2019-04-27 00:51:18 +00:00
Stanislav Mekhanoshin	4f331cb1f3	[AMDGPU] gfx1010 VOPC implementation Differential Revision: https://reviews.llvm.org/D61208 llvm-svn: 359358	2019-04-26 23:16:16 +00:00
Stanislav Mekhanoshin	61beff020e	[AMDGPU] gfx1010 VOP3 and VOP3P implementation Differential Revision: https://reviews.llvm.org/D61202 llvm-svn: 359328	2019-04-26 17:56:03 +00:00
Stanislav Mekhanoshin	8f3da70eed	[AMDGPU] gfx1010 VOP2 changes Differential Revision: https://reviews.llvm.org/D61156 llvm-svn: 359316	2019-04-26 16:37:51 +00:00
Stanislav Mekhanoshin	917c477a07	[AMDGPU] gfx1010 - fix ubsan failure Revert DecoderNamespace in one place for now. It will need more changes to properly work. llvm-svn: 359239	2019-04-25 20:39:06 +00:00
Stanislav Mekhanoshin	2c97ff07bf	[AMDGPU] gfx1010 VOP1 instructions Differential Revision: https://reviews.llvm.org/D61099 llvm-svn: 359225	2019-04-25 19:01:51 +00:00
Stanislav Mekhanoshin	956b0be72e	[AMDGPU] gfx1010 utility functions Differential Revision: https://reviews.llvm.org/D61094 llvm-svn: 359224	2019-04-25 18:53:41 +00:00
Austin Kerbow	83e52142d1	Fix spelling error. NFC Summary: Test commit. Reviewers: msearles, jkorous Reviewed By: jkorous Subscribers: dexonsmith, arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61093 llvm-svn: 359154	2019-04-24 23:32:21 +00:00
Stanislav Mekhanoshin	9d287358a8	[AMDGPU] gfx1010 SOP instructions Differential Revision: https://reviews.llvm.org/D61080 llvm-svn: 359139	2019-04-24 20:44:34 +00:00
Stanislav Mekhanoshin	33d806a517	[AMDGPU] gfx1010 sgpr register changes Differential Revision: https://reviews.llvm.org/D61045 llvm-svn: 359117	2019-04-24 17:28:30 +00:00
Stanislav Mekhanoshin	cee607e414	[AMDGPU] Add gfx1010 target definitions Differential Revision: https://reviews.llvm.org/D61041 llvm-svn: 359113	2019-04-24 17:03:15 +00:00
Dmitry Preobrazhensky	47621d7c89	[AMDGPU][MC] Parser cleanup and refactoring Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60767 llvm-svn: 359096	2019-04-24 14:06:15 +00:00
Stanislav Mekhanoshin	c464dddccb	[AMDGPU] Fixed addReg() in SIOptimizeExecMaskingPreRA.cpp The second argument is flags, not subreg. Differential Revision: https://reviews.llvm.org/D61031 llvm-svn: 359017	2019-04-23 17:59:26 +00:00
Scott Linder	3eed961973	[AMDGPU] Fix hidden argument metadata duplication for V3 Essentially complete a proper rebase of the V3 metadata change over https://reviews.llvm.org/D49096. Minimize the diff between the V2 and V3 variants of the relevant lit tests, and clean up some trailing whitespace. llvm-svn: 358992	2019-04-23 14:31:17 +00:00
Nicolai Haehnle	7edae4c403	AMDGPU: Fix LCSSA phi lowering in SILowerI1Copies Summary: When an LCSSA phi survives through instruction selection, the pass ends up removing that phi entirely because it is dominated by the logic that does the lanemask merging. This then used to trigger an assertion when processing a dependent phi instruction. Change-Id: Id4949719f8298062fe476a25718acccc109113b6 Reviewers: llvm-commits Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm Tags: #llvm Differential Revision: https://reviews.llvm.org/D60999 llvm-svn: 358983	2019-04-23 13:12:52 +00:00
Fedor Sergeev	652168a99b	[CallSite removal] move InlineCost to CallBase usage Converting InlineCost interface and its internals into CallBase usage. Inliners themselves are still not converted. Reviewed By: reames Tags: #llvm Differential Revision: https://reviews.llvm.org/D60636 llvm-svn: 358982	2019-04-23 12:43:27 +00:00
Michael Liao	389d5a3474	[AMDGPU] Fix an issue in `op_sel_hi` skipping. Summary: - Only apply packed literal `op_sel_hi` skipping on operands requiring packed literals. Even if an instruction is `packed`, it may have an operand requiring a non-packed literal, such as `v_dot2_f32_f16`. Reviewers: rampitec, arsenm, kzhuravl Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60978 llvm-svn: 358922	2019-04-22 22:05:49 +00:00
Matt Arsenault	f84ce75cd1	AMDGPU: Skip debug instructions in assert These are inserted after branch relaxation, and for some reason it's decided to put them in the long branch expansion block. It's probably not great to rely on the source block address, so this should probably be switched to being PC relative instead of relying on the block address llvm-svn: 358909	2019-04-22 19:14:26 +00:00
Matt Arsenault	2b6f76f05f	AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sources llvm-svn: 358894	2019-04-22 15:22:46 +00:00
Matt Arsenault	70346d127b	AMDGPU: Fix not checking for copy when looking at copy src Effectively reverts r356956. The check for isFullCopy was excessive, but there still needs to be a check that this is a copy. llvm-svn: 358890	2019-04-22 14:54:39 +00:00
Dmitry Preobrazhensky	e2707f5aac	[AMDGPU][MC] Corrected parsing of SP3 'neg' modifier See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60624 llvm-svn: 358888	2019-04-22 14:35:47 +00:00
Simon Pilgrim	6276ce0142	[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. Differential Revision: https://reviews.llvm.org/D60462 llvm-svn: 358887	2019-04-22 14:04:35 +00:00
Bjorn Pettersson	238c9d6308	[CodeGen] Add "const" to MachineInstr::mayAlias Summary: The basic idea here is to make it possible to use MachineInstr::mayAlias also when the MachineInstr is const (or the "Other" MachineInstr is const). The addition of const in MachineInstr::mayAlias then rippled down to the need for adding const in several other places, such as TargetTransformInfo::getMemOperandWithOffset. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60856 llvm-svn: 358744	2019-04-19 09:08:38 +00:00
Piotr Sobczak	72e2960e52	[AMDGPU] Ignore non-SUnits edges Summary: Ignore edges to non-SUnits (e.g. ExitSU) when checking for low latency instructions. When calling the function isLowLatencyInstruction(), an ExitSU could be on the list of successors, not necessarily a regular SU. In other places in the code there is a check "Succ->NodeNum >= DAGSize" to prevent further processing of ExitSU as "Succ->getInstr()" is NULL in such a case. Also, 8 out of 9 cases of "SUnit *Succ = SuccDep.getSUnit())" has the guard, so it is clearly an omission here. Change-Id: Ica86f0327c7b2e6bcb56958e804ea6c71084663b Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60864 llvm-svn: 358740	2019-04-19 06:19:14 +00:00
Tim Renouf	7c55c8d8c3	[AMDGPU] Avoid DAG combining assert with fneg(fadd(A,0)) fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but creating the new fadd folds to just fneg(A). When A has multiple uses, this confuses it and you get an assert. Fixed. Differential Revision: https://reviews.llvm.org/D60633 Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c llvm-svn: 358640	2019-04-18 05:27:01 +00:00
Dmitry Preobrazhensky	394d0a1637	[AMDGPU][MC] Corrected handling of "-" before expressions See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60622 llvm-svn: 358596	2019-04-17 16:56:34 +00:00
Rhys Perry	c2814e12e7	AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches. Reviewers: arsen, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60824 llvm-svn: 358592	2019-04-17 16:31:52 +00:00
Dmitry Preobrazhensky	20d52e3aa2	[AMDGPU][MC] Corrected parsing of registers See bug 41280: https://bugs.llvm.org/show_bug.cgi?id=41280 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60621 llvm-svn: 358581	2019-04-17 14:44:01 +00:00
Tim Renouf	59e8bd3093	[AMDGPU] Flag new raw/struct atomic ops as source of divergence Differential Revision: https://reviews.llvm.org/D60731 Change-Id: I821d93dec8b9cdd247b8172d92fb5e15340a9e7d llvm-svn: 358579	2019-04-17 14:04:31 +00:00
Matt Arsenault	101abd219b	AMDGPU: Fix unreachable when counting register usage of SGPR96 llvm-svn: 358447	2019-04-15 20:51:12 +00:00
Matt Arsenault	fbdd2a1887	AMDGPU: Fix printed format of SReg_96 These are artificial, so I think this should only come up with inline asm comments. llvm-svn: 358446	2019-04-15 20:42:18 +00:00
Tim Renouf	842be38162	[AMDGPU] Fixed incorrect test in vcnd/vcmp optimization This fixes a test I introduced in change D59191 (that added src0 and src1 modifiers to the v_cndmask instruction for disassembly purposes). Spotted by David Binderman in bug 41488. Differential Revision: https://reviews.llvm.org/D60652 Change-Id: I6ac95e66cd84e812ed3359ad57bcd0e13198ba0c llvm-svn: 358392	2019-04-15 10:36:24 +00:00
Amara Emerson	946b1246d6	[GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only. Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't be degraded. This change also improves the IRTranslator so that in most places, but not all, it creates constants using the MIRBuilder directly instead of first creating a new destination vreg and then creating a constant. By doing this, the buildConstant() method can just return the vreg of an existing G_CONSTANT instead of having to create a COPY from it. I measured a 0.2% improvement in compile time and a 0.9% improvement in code size at -O0 ARM64. Compile time: Program base cse diff test-suite...ark/tramp3d-v4/tramp3d-v4.test 9.04 9.12 0.8% test-suite...Mark/mafft/pairlocalalign.test 2.68 2.66 -0.7% test-suite...-typeset/consumer-typeset.test 5.53 5.51 -0.4% test-suite :: CTMark/lencod/lencod.test 5.30 5.28 -0.3% test-suite :: CTMark/Bullet/bullet.test 25.82 25.76 -0.2% test-suite...:: CTMark/ClamAV/clamscan.test 6.92 6.90 -0.2% test-suite...TMark/7zip/7zip-benchmark.test 34.24 34.17 -0.2% test-suite :: CTMark/SPASS/SPASS.test 6.25 6.24 -0.1% test-suite...:: CTMark/sqlite3/sqlite3.test 1.66 1.66 -0.1% test-suite :: CTMark/kimwitu++/kc.test 13.61 13.60 -0.0% Geomean difference -0.2% Code size: Program base cse diff test-suite...-typeset/consumer-typeset.test 1315632 1266480 -3.7% test-suite...:: CTMark/ClamAV/clamscan.test 1313892 1297508 -1.2% test-suite :: CTMark/lencod/lencod.test 1439504 1423112 -1.1% test-suite...TMark/7zip/7zip-benchmark.test 2936980 2904172 -1.1% test-suite :: CTMark/Bullet/bullet.test 3478276 3445460 -0.9% test-suite...ark/tramp3d-v4/tramp3d-v4.test 8082868 8033492 -0.6% test-suite :: CTMark/kimwitu++/kc.test 3870380 3853972 -0.4% test-suite :: CTMark/SPASS/SPASS.test 1434904 1434896 -0.0% test-suite...Mark/mafft/pairlocalalign.test 764528 764528 0.0% test-suite...:: CTMark/sqlite3/sqlite3.test 782092 782092 0.0% Geomean difference -0.9% Differential Revision: https://reviews.llvm.org/D60580 llvm-svn: 358369	2019-04-15 05:04:20 +00:00
Amara Emerson	d189680baa	[GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs. Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE configs that can be passed between TargetPassConfig and the targets' custom pass configs. This CSEConfigBase allows targets to create custom CSE configs which is then used by the GISel passes for the CSEMIRBuilder. This support will be used in a follow up commit to allow constant-only CSE for -O0 compiles in D60580. llvm-svn: 358368	2019-04-15 04:53:46 +00:00
Nick Desaulniers	5277b3ff25	[AsmPrinter] refactor to remove AsmVariant. NFC Summary: The InlineAsm::AsmDialect is only required for X86; no other architecture makes use of it and as such it gets passed around between arch-specific and general code while being unused for all architectures but X86. Since the AsmDialect is queried from a MachineInstr, which we also pass around, remove the additional AsmDialect parameter and query for it deep in the X86AsmPrinter only when needed/as late as possible. This refactor should help later planned refactors to AsmPrinter, as this difference in the X86AsmPrinter makes it harder to make AsmPrinter more generic. Reviewers: craig.topper Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60488 llvm-svn: 358101	2019-04-10 16:38:43 +00:00
Tom Stellard	206b9927f8	AMDGPU/GlobalISel: Implement call lowering for shaders returning values Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, llvm-commits Differential Revision: https://reviews.llvm.org/D57166 llvm-svn: 357964	2019-04-09 02:26:03 +00:00
Nikita Popov	3db93ac5d6	Reapply [ValueTracking] Support min/max selects in computeConstantRange() Add support for min/max flavor selects in computeConstantRange(), which allows us to fold comparisons of a min/max against a constant in InstSimplify. This fixes an infinite InstCombine loop, with the test case taken from D59378. Relative to the previous iteration, this contains some adjustments for AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen, which ends up constant folding some existing med3 tests after this change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option is added, which allows disabling scalar IR passes (that use InstSimplify) for testing purposes. Differential Revision: https://reviews.llvm.org/D59506 llvm-svn: 357870	2019-04-07 17:22:16 +00:00
Stanislav Mekhanoshin	5182302a37	[AMDGPU] Sort out and rename multiple CI/VI predicates Differential Revision: https://reviews.llvm.org/D60346 llvm-svn: 357835	2019-04-06 09:20:48 +00:00
Stanislav Mekhanoshin	c8f78f8dd3	[AMDGPU] Add MachineDCE pass after RenameIndependentSubregs Detect dead lanes can create some dead defs. Then RenameIndependentSubregs will break a REG_SEQUENCE which may use these dead defs. At this point a dead instruction can be removed but we do not run a DCE anymore. MachineDCE was only running before live variable analysis. The patch adds a means to preserve LiveIntervals and SlotIndexes in case it works past this. Differential Revision: https://reviews.llvm.org/D59626 llvm-svn: 357805	2019-04-05 20:11:32 +00:00
Stanislav Mekhanoshin	7895c03232	[AMDGPU] predicate and feature refactoring We have done some predicate and feature refactoring lately but did not upstream it. This is to sync. Differential revision: https://reviews.llvm.org/D60292 llvm-svn: 357791	2019-04-05 18:24:34 +00:00
Matt Arsenault	4ed6ccab9b	AMDGPU/GlobalISel: Fix non-power-of-2 select llvm-svn: 357762	2019-04-05 14:03:04 +00:00
Matt Arsenault	396653f8a1	AMDGPU: Split block for si_end_cf Relying on no spill or other code being inserted before this was precarious. It relied on code diligently checking isBasicBlockPrologue which is likely to be forgotten. Ideally this could be done earlier, but this doesn't work because of phis. Any other instruction can't be placed before them, so we have to accept the position being incorrect during SSA. This avoids regressions in the fast register allocator rewrite from inverting the direction. llvm-svn: 357634	2019-04-03 20:53:20 +00:00

3403 Commits