llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	6cc8b2fc65	[AMDGPU] Extend promote alloca vectorization Promote alloca can vectorize a small array by bitcasting it to a vector type. Extend vectorization for the case when alloca is already a vector type. We still want to replace GEPs with an insert/extract element instructions in this case. Differential Revision: https://reviews.llvm.org/D54219 llvm-svn: 346376	2018-11-08 00:16:23 +00:00
Nicolai Haehnle	bc233f5523	Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics" This reverts commit r344696 for now (except for some test additions). See https://bugs.freedesktop.org/show_bug.cgi?id=108611. llvm-svn: 346364	2018-11-07 21:53:43 +00:00
Nicolai Haehnle	61396ff67c	AMDGPU/InsertWaitcnts: Cleanup some old cruft (NFCI) Summary: Remove redundant logic and simplify control flow. Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54086 llvm-svn: 346363	2018-11-07 21:53:36 +00:00
Nicolai Haehnle	0ab31c9c44	AMDGPU/InsertWaitcnts: Remove kill-related logic Summary: This is not needed, because we don't actually insert relevant branches for KILLs that late in the compilation flow. Besides, this was always checking for the wrong kill opcode anyway... Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54085 llvm-svn: 346362	2018-11-07 21:53:29 +00:00
Konstantin Zhuravlyov	15e90e331c	AMDGPU/NFC: Split FLAT_Global_Atomic_Pseudo into RTN/NO_RTN multiclasses llvm-svn: 346361	2018-11-07 21:42:13 +00:00
Konstantin Zhuravlyov	7f1959ebb3	AMDGPU/NFC: Split MUBUF_Pseudo_Atomics into RTN/NO_RTN multiclasses llvm-svn: 346357	2018-11-07 21:21:32 +00:00
Matt Arsenault	8ba740a5a8	Allow subclassing ExternalAA This allows testing AMDGPU alias analysis like any other alias analysis pass. This fixes the existing test pointlessly running opt -O3 when it really just wants to run the one analysis. Before there was no way to test this using -aa-eval with opt, since the default constructed pass is run. The wrapper subclass allows the default constructor to pass the necessary callback. llvm-svn: 346353	2018-11-07 20:26:42 +00:00
Sanjay Patel	de58e93666	fix typos aggressively; NFC llvm-svn: 346316	2018-11-07 14:35:36 +00:00
Yaxun Liu	73bf0af32f	AMDGPU: Add an option -disable-promote-alloca-to-lds Add this option for debugging and providing workaround. By default it is off so no behavior change in backend. Differential Revision: https://reviews.llvm.org/D54158 llvm-svn: 346267	2018-11-06 21:28:17 +00:00
Craig Topper	0b5f8169b0	[TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit. llvm-svn: 346180	2018-11-05 23:26:13 +00:00
Konstantin Zhuravlyov	108927b944	AMDGPU: Add sram-ecc feature Differential Revision: https://reviews.llvm.org/D53222 llvm-svn: 346177	2018-11-05 22:44:19 +00:00
Neil Henning	233a02d0ed	[AMDGPU] Fix the new atomic optimizer in pixel shaders. The new atomic optimizer I previously added in D51969 did not work correctly when a pixel shader was using derivatives, and had helper lanes active. To fix this we add an llvm.amdgcn.ps.live call that guards a branch around the entire atomic operation - ensuring that all helper lanes are inactive within the wavefront when we compute our atomic results. I've added a test case that can cause derivatives, and exposes the problem. Differential Revision: https://reviews.llvm.org/D53930 llvm-svn: 346128	2018-11-05 12:04:48 +00:00
Sylvestre Ledru	df92dabaef	Fixed inclusion of M_PI fow MinGW-w64 Patch by KOLANICH llvm-svn: 346000	2018-11-02 17:25:40 +00:00
Neil Henning	7d1b77df57	[AMDGPU] UBSan bug fix for r345710 UBSan detected an error in our ISelLowering that is exposed only when you have a dmask == 0x1. Fix this by adding in an explicit check to ensure we don't do the UBSan detected shl << 32. llvm-svn: 345962	2018-11-02 10:24:57 +00:00
Matt Arsenault	8e0269ba0b	AMDGPU: Fix assertion with bitcast from i64 constant to v4i16 llvm-svn: 345922	2018-11-02 02:43:55 +00:00
Farhana Aleen	5853762e5a	[AMDGPU] Handle the idot8 pattern generated by FE. Summary: Different variants of idot8 codegen dag patterns are not generated by llvm-tablegen due to a huge increase in the compile time. Support the pattern that clang FE generates after reordering the additions in integer-dot8 source language pattern. Author: FarhanaAleen Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D53937 llvm-svn: 345902	2018-11-01 22:48:19 +00:00
Reid Kleckner	4dc0b1ac60	Fix clang -Wimplicit-fallthrough warnings across llvm, NFC This patch should not introduce any behavior changes. It consists of mostly one of two changes: 1. Replacing fall through comments with the LLVM_FALLTHROUGH macro 2. Inserting 'break' before falling through into a case block consisting of only 'break'. We were already using this warning with GCC, but its warning behaves slightly differently. In this patch, the following differences are relevant: 1. GCC recognizes comments that say "fall through" as annotations, clang doesn't 2. GCC doesn't warn on "case N: foo(); default: break;", clang does 3. GCC doesn't warn when the case contains a switch, but falls through the outer case. I will enable the warning separately in a follow-up patch so that it can be cleanly reverted if necessary. Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu Differential Revision: https://reviews.llvm.org/D53950 llvm-svn: 345882	2018-11-01 19:54:45 +00:00
Stanislav Mekhanoshin	222e9c11f7	Check shouldReduceLoadWidth from SimplifySetCC SimplifySetCC could shrink a load without checking for profitability or legality of such shink with a target. Added checks to prevent shrinking of aligned scalar loads in AMDGPU below dword as scalar engine does not support it. Differential Revision: https://reviews.llvm.org/D53846 llvm-svn: 345778	2018-10-31 21:24:30 +00:00
Scott Linder	c6c627253d	[AMDGPU] Remove FeatureVGPRSpilling This feature is only relevant to shaders, and is no longer used. When disabled, lowering of reserved registers for shaders causes a compiler crash. Remove the feature and add a test for compilation of shaders at OptNone. Differential Revision: https://reviews.llvm.org/D53829 llvm-svn: 345763	2018-10-31 18:54:06 +00:00
Nicolai Haehnle	814abb59df	AMDGPU: Rewrite SILowerI1Copies to always stay on SALU Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we use bitwise masking operations to combine lane masks in a way that is consistent with wave control flow. Move SIFixSGPRCopies to before this pass, since that pass incorrectly attempts to move SGPR phis to VGPRs. This should recover most of the code quality that was lost with the bug fix in "AMDGPU: Remove PHI loop condition optimization". There are still some relevant cases where code quality could be improved, in particular: - We often introduce redundant masks with EXEC. Ideally, we'd have a generic computeKnownBits-like analysis to determine whether masks are already masked by EXEC, so we can avoid this masking both here and when lowering uniform control flow. - The criterion we use to determine whether a def is observed from outside a loop is conservative: it doesn't check whether (loop) branch conditions are uniform. Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D53496 llvm-svn: 345719	2018-10-31 13:27:08 +00:00
Nicolai Haehnle	28212cc689	AMDGPU: Remove PHI loop condition optimization Summary: The optimization to early break out of loops if all threads are dead was never fully implemented. But the PHI node analyzing is actually causing a number of problems, so remove all the extra code for it. (This does actually regress code quality in a few places because it ends up relying more heavily on phi's of i1, which we don't do a great job with. However, since it fixes real bugs in the wild, we should take this change. I have some prototype changes to improve i1 lowering in general -- not just for control flow -- which should help recover the code quality, I just need to make those changes fit for general consumption. -- Nicolai) Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1 Patch-by: Christian König <christian.koenig@amd.com> Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53359 llvm-svn: 345718	2018-10-31 13:26:48 +00:00
Neil Henning	63718b214a	[AMDGPU] support image load/store a16 Our a16 support was only enabled for sample/gather and buffer load/store, but not for image load/store operations (which take an i16 as the pixel index rather than a half). Fix our isel lowering and add test cases to prove it out. Differential Revision: https://reviews.llvm.org/D53750 llvm-svn: 345710	2018-10-31 10:34:48 +00:00
Konstantin Zhuravlyov	2d22d24ac4	Revert r345542: AMDGPU: Enable code object v3 by default It breaks mesa. llvm-svn: 345662	2018-10-30 22:02:40 +00:00
Simon Pilgrim	858303b827	[SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodes Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy. This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle. Differential Revision: https://reviews.llvm.org/D53760 llvm-svn: 345578	2018-10-30 10:32:11 +00:00
Matt Arsenault	abc4f29f9c	AMDGPU: Remove custom BUILD_VECTOR combine This was looping in a testcase and removing it now slightly improves a test. llvm-svn: 345560	2018-10-30 01:37:59 +00:00
Matt Arsenault	b0b741efb8	AMDGPU: Use scavengeRegisterBackwards llvm-svn: 345559	2018-10-30 01:33:14 +00:00
Konstantin Zhuravlyov	5cb950200c	AMDGPU: Enable code object v3 by default Differential Revision: https://reviews.llvm.org/D53525 llvm-svn: 345542	2018-10-29 21:07:27 +00:00
Stanislav Mekhanoshin	6b1c6548bd	[AMDGPU] Fixed return value causing warning and regression llvm-svn: 345518	2018-10-29 17:53:23 +00:00
Stanislav Mekhanoshin	79080ecd82	[AMDGPU] Match v_swap_b32 Differential Revision: https://reviews.llvm.org/D52677 llvm-svn: 345514	2018-10-29 17:26:01 +00:00
Scott Linder	11ef7984b0	[AMDGPU] Add a pass to promote bitcast calls AMDGPU currently only supports direct calls, but at lower optimisation levels it fails to lower statically direct calls which appear indirect due to a bitcast. Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize" calls. Differential Revision: https://reviews.llvm.org/D52741 llvm-svn: 345382	2018-10-26 13:18:36 +00:00
Tim Renouf	2a1b1d94b6	[AMDGPU] Defined gfx909 Raven Ridge 2 Differential Revision: https://reviews.llvm.org/D53418 Change-Id: Ie3d054f2e956c2768988c0f4c0ffd29a47294eef llvm-svn: 345120	2018-10-24 08:14:07 +00:00
Matt Arsenault	687ec75d10	DAG: Change behavior of fminnum/fmaxnum nodes Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914	2018-10-22 16:27:27 +00:00
Changpeng Fang	f95f763ea5	AMDGPU: Add support pattern for SUB of one bit Summary: Add selection patterns to support one bit Sub. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D52946 llvm-svn: 344815	2018-10-19 21:09:21 +00:00
Nicolai Haehnle	4821937d2e	AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI Summary: To workaround a hardware issue in the (base + offset) calculation when base is negative. The impact on code quality should be limited since SILoadStoreOptimizer still runs afterwards and is able to combine loads/stores based on known sign information. This fixes visible corruption in Hitman on SI (easily reproducible by running benchmark mode). Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923 Reviewers: arsenm, mareko Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53160 llvm-svn: 344698	2018-10-17 15:37:48 +00:00
Nicolai Haehnle	c4a2ff0950	AMDGPU: Divergence-driven selection of scalar buffer load intrinsics Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 344696	2018-10-17 15:37:30 +00:00
Nicolai Haehnle	e9b134aa31	AMDGPU: Remove dead TableGen code Summary: Change-Id: Ic1f2c1d0cf9e90a0baa9fc6bacd0d3c386069fb0 Reviewers: tpr Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53318 Change-Id: Ib4d143c898801e5cf6cb9999a495d62c91ae77fb llvm-svn: 344691	2018-10-17 12:14:26 +00:00
Konstantin Zhuravlyov	94dfcc2eb2	AMDGPU: Generate .amdgcn_target for object code v3 Differential Revision: https://reviews.llvm.org/D53221 llvm-svn: 344552	2018-10-15 20:37:47 +00:00
Chandler Carruth	edb12a838a	[TI removal] Make variables declared as `TerminatorInst` and initialized by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502	2018-10-15 10:04:59 +00:00
Tom Stellard	a894043910	Revert "AMDGPU/GlobalISel: Implement select for G_INSERT" This reverts commit r344310. The test case was failing on some bots. llvm-svn: 344317	2018-10-11 23:36:46 +00:00
Tom Stellard	4733be6e7b	AMDGPU/GlobalISel: Implement select for G_INSERT Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53116 llvm-svn: 344310	2018-10-11 22:49:54 +00:00
Scott Linder	823549a6ec	[AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions Emit a waterfall loop in the general case for a potentially-divergent Rsrc operand. When practical, avoid this by using Addr64 instructions. Recommits r341413 with changes to update the MachineDominatorTree when present. Differential Revision: https://reviews.llvm.org/D51742 llvm-svn: 343992	2018-10-08 18:47:01 +00:00
Tom Stellard	14d8807d9a	AMDGPU/GlobalISel: Select amdgcn.cvt.pkrtz to 64-bit instructions Summary: The 32-bit variants do not exist on VI+. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52958 llvm-svn: 343985	2018-10-08 17:49:29 +00:00
Neil Henning	6641657453	[AMDGPU] Add an AMDGPU specific atomic optimizer. This commit adds a new IR level pass to the AMDGPU backend to perform atomic optimizations. It works by: - Running through a function and finding atomicrmw add/sub or uses of the atomic buffer intrinsics for add/sub. - If all arguments except the value to be added/subtracted are uniform, record the value to be optimized. - Run through the atomic operations we can optimize and, depending on whether the value is uniform/divergent use wavefront wide operations (DPP in the divergent case) to calculate the total amount to be atomically added/subtracted. - Then let only a single lane of each wavefront perform the atomic operation, reducing the total number of atomic operations in flight. - Lastly we recombine the result from the single lane to each lane of the wavefront, and calculate our individual lanes offset into the final result. Differential Revision: https://reviews.llvm.org/D51969 llvm-svn: 343973	2018-10-08 15:49:19 +00:00
Neil Henning	57f5d0a885	[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle. The IRBuilder CreateIntrinsic method wouldn't allow you to specify the types that you wanted the intrinsic to be mangled with. To fix this I've: - Added an ArrayRef<Type > member to both CreateIntrinsic overloads. - Used that array to pass into the Intrinsic::getDeclaration call. - Added a CreateUnaryIntrinsic to replace the most common use of CreateIntrinsic where the type was auto-deduced from operand 0. - Added a bunch more unit tests to test CreateIntrinsic calls that weren't being tested (including the FMF flag that wasn't checked). This was suggested as part of the AMDGPU specific atomic optimizer review (https://reviews.llvm.org/D51969). Differential Revision: https://reviews.llvm.org/D52087 llvm-svn: 343962	2018-10-08 10:32:33 +00:00
Tom Stellard	251ee083a3	AMDGPU: Consolidate SMRD TableGen patterns Summary: Merge the SMRD patterns for CI into the same multiclass as the patterns for other sub-targets. This removes some duplicate code and will make it easier for some future GlobalISel changes I would like to do. Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52557 llvm-svn: 343909	2018-10-06 03:32:43 +00:00
Jonas Paulsson	faad1b3056	[TargetRegisterInfo] Remove temporary hook enableMultipleCopyHints() Finally all targets are enabling multiple regalloc hints, so the hook to disable this can now be removed. NFC. Review: Simon Pilgrim https://reviews.llvm.org/D52316 llvm-svn: 343851	2018-10-05 14:23:11 +00:00
Tom Stellard	7c65078f04	AMDGPU/GlobalISel: Add support for G_INTTOPTR Summary: This is a no-op. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52916 llvm-svn: 343839	2018-10-05 04:34:09 +00:00
Konstantin Zhuravlyov	aa067cb9fb	AMDGPU: Rename isAmdCodeObjectV2 -> isAmdHsaOrMesa The isAmdCodeObjectV2 is a misleading name which actually checks whether the os is amdhsa or mesa. Also add a test to make sure we do not generate old kernel header for code object v3. Differential Revision: https://reviews.llvm.org/D52897 llvm-svn: 343813	2018-10-04 21:02:16 +00:00
Farhana Aleen	4bc597bff5	[AMDGPU] Match signed dot4/8 pattern. Summary: This patch matches signed dot4 and dot8 pattern. Author: FarhanaAleen Reviewed By: msearles Differential Revision: https://reviews.llvm.org/D52520 llvm-svn: 343798	2018-10-04 16:57:37 +00:00
Tim Renouf	a37679d67b	[AMDGPU] Fix for negative offsets in buffer/tbuffer intrinsics Summary: The new buffer/tbuffer intrinsics handle an out-of-range immediate offset by moving/adding offset&-4096 to a vgpr, leaving an in-range immediate offset, with a chance of the move/add being CSEd for similar loads/stores. However it turns out that a negative offset in a vgpr is illegal, even if adding the immediate offset makes it legal again. Therefore, this commit disables the offset&-4096 thing if the offset is negative. Differential Revision: https://reviews.llvm.org/D52683 Change-Id: Ie02f0a74f240a138dc2a29d17cfbd9e350e4ed13 llvm-svn: 343672	2018-10-03 10:29:43 +00:00
Fangrui Song	3d76d36059	[AMDGPU] Rename pass "isel" to "amdgpu-isel" Summary: The AMDGPU target specific pass "isel" is a misleading name. Reviewers: tstellar, echristo, javed.absar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D52759 llvm-svn: 343659	2018-10-03 03:38:22 +00:00
Matt Arsenault	635d479322	AMDGPU: Always run AMDGPUAlwaysInline Even if calls are enabled, it still needs to be run for forcing inline of functions that use LDS. llvm-svn: 343657	2018-10-03 02:47:25 +00:00
Stanislav Mekhanoshin	1821513e2f	[AMDGPU] Assert in getOpSize() there are no sub-dword subregs Differential Revision: https://reviews.llvm.org/D52769 llvm-svn: 343648	2018-10-03 00:00:41 +00:00
Matt Arsenault	ab41193312	AMDGPU: Expand atomicrmw nand in IR llvm-svn: 343559	2018-10-02 03:50:56 +00:00
Stanislav Mekhanoshin	ae8bd6d9b5	[AMDGPU] Fixed SIInstrInfo::getOpSize to handle subregs Currently it returns incorrect operand size for a target independet node such as COPY if operand is a register with subreg. Instead of correct subreg size it returns a size of the whole superreg. Differential Revision: https://reviews.llvm.org/D52736 llvm-svn: 343508	2018-10-01 18:00:02 +00:00
Alexander Timofeev	b048fa3344	[AMDGPU] Divergence driven instruction selection. Shift operations. Summary: This change enables VOP3 shifts to be explicitly selected dependent on the divergence. Differential Revision: https://reviews.llvm.org/D52559 Reviewers: rampitec llvm-svn: 343455	2018-10-01 11:06:35 +00:00
Vitaly Buka	0509070811	[cxx2a] Fix warning triggered by r343285 llvm-svn: 343369	2018-09-29 02:17:12 +00:00
Konstantin Zhuravlyov	5f1b8181ad	AMDGPU: Split HasExt into HasExtDPP/SDWA/SDWA9 llvm-svn: 343264	2018-09-27 20:49:00 +00:00
Konstantin Zhuravlyov	9da26b20da	AMDGPU: Split VOP2Inst into VOP2Inst_e32/e64/sdwa llvm-svn: 343259	2018-09-27 19:46:41 +00:00
Konstantin Zhuravlyov	7d424aae13	AMDGPU/NFC: Simplify VOP_MAC_F16/F32 llvm-svn: 343254	2018-09-27 19:24:05 +00:00
Stanislav Mekhanoshin	b080adfc0c	[AMDGPU] Fold copy (copy vgpr) This allows to reduce a number of used VGPRs in some cases. Differential Revision: https://reviews.llvm.org/D52577 llvm-svn: 343249	2018-09-27 18:55:20 +00:00
Fangrui Song	0cac726a00	llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...) Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163	2018-09-27 02:13:45 +00:00
Tom Stellard	344475fce5	AMDGPU/SI: Change predicate to isCIOnly for 32-bit imm s_buffer_load* patterns Summary: This is essentially NFC, because the complex pattern used for these patterns will fail on non-CI, but this makes the pattern consistent with other CI smrd patterns. It is also a performance improvement, because the pattern will now fail earlier on non-CI. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52469 llvm-svn: 343125	2018-09-26 16:53:36 +00:00
Stanislav Mekhanoshin	8dfcd83371	[AMDGPU] Fix ds combine with subregs Differential Revision: https://reviews.llvm.org/D52522 llvm-svn: 343047	2018-09-25 23:33:18 +00:00
Changpeng Fang	6f4922ccc9	AMDGPU: Add Selection patterns to support add of one bit. Summary: We generate s_xor to lower add of i1s in general cases, and s_not to lower add with a one-bit imm of -1 (true). Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D52518 llvm-svn: 343030	2018-09-25 21:21:18 +00:00
Sameer Sahasrabuddhe	b4f2d1cb68	[AMDGPU] restore r342722 which was reverted with r342743 [AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. llvm-svn: 342956	2018-09-25 09:39:21 +00:00
Matt Arsenault	f432011d33	AMDGPU: Fix private handling for allowsMisalignedMemoryAccesses If the alignment is at least 4, this should report true. Something still seems off with how < 4-byte types are handled here though. Fixing this seems to change how some combines get to where they get, but somehow isn't changing the net result. llvm-svn: 342879	2018-09-24 13:18:15 +00:00
Sameer Sahasrabuddhe	0807e94951	revert changes from r342722 "[AMDGPU] lower-switch in preISel as a workaround for legacy DA" This broke regression tests. The first breakage was noticed here: http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/23549 llvm-svn: 342743	2018-09-21 16:31:51 +00:00
Sameer Sahasrabuddhe	2de7653fd5	[AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll Differential Revision: https://reviews.llvm.org/D52221 llvm-svn: 342722	2018-09-21 11:26:55 +00:00
Alexander Timofeev	36617f0160	[AMDGPU] Divergence driven instruction selection. Part 1. Summary: This change is the first part of the AMDGPU target description change. The aim of it is the effective splitting the vector and scalar flows at the selection stage. Selection uses predicate functions based on the framework implemented earlier - https://reviews.llvm.org/D35267 Differential revision: https://reviews.llvm.org/D52019 Reviewers: rampitec llvm-svn: 342719	2018-09-21 10:31:22 +00:00
Carl Ritson	6b8d75425e	[AMDGPU] Add instruction selection for i1 to f16 conversion Summary: This is required for GPUs with 16 bit instructions where f16 is a legal register type and hence int_to_fp i1 to f16 is not lowered by legalizing. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52018 Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851 llvm-svn: 342558	2018-09-19 16:32:12 +00:00
Matthias Braun	726e12cf0c	ScheduleDAG: Cleanup dumping code; NFC - Instead of having both `SUnit::dump(ScheduleDAG)` and `ScheduleDAG::dumpNode(ScheduleDAG)`, just keep the latter around. - Add `ScheduleDAG::dump()` and avoid code duplication in several places. Implement it for different ScheduleDAG variants. - Add `ScheduleDAG::dumpNodeName()` in favor of the `SUnit::print()` functions. They were only ever used for debug dumping and putting the function into ScheduleDAG is consistent with the `dumpNode()` change. llvm-svn: 342520	2018-09-19 00:23:35 +00:00
Farhana Aleen	f5a2848376	[AMDGPU] Match udot8 pattern Summary: D.u32 = S0.u4[0] * S1.u4[0] + S0.u4[1] * S1.u4[1] + S0.u4[2] * S1.u4[2] + S0.u4[3] * S1.u4[3] + S0.u4[4] * S1.u4[4] + S0.u4[5] * S1.u4[5] + S0.u4[6] * S1.u4[6] + S0.u4[7] * S1.u4[7] + S2.u32 Author: FarhanaAleen Reviewed By: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D51947 llvm-svn: 342497	2018-09-18 16:59:48 +00:00
Matt Arsenault	ebf46143ea	AMDGPU: Don't form fmed3 if it will require materialization If there is a single use constant, it can be folded into the min/max, but not into med3. llvm-svn: 342443	2018-09-18 02:34:54 +00:00
Matt Arsenault	9d49c449ec	AMDGPU: Expand vector canonicalizes llvm-svn: 342439	2018-09-18 01:51:33 +00:00
Stanislav Mekhanoshin	06d3b4139e	[AMDGPU] Initialize instruction itinerary from GCNSubtarget I need to use it in the GCN codegen. Differential Revision: https://reviews.llvm.org/D52123 llvm-svn: 342400	2018-09-17 16:04:32 +00:00
Konstantin Zhuravlyov	e721b11c12	AMDGPU: Clear the bits before they are being set in program resource registers Change by Tony Tye llvm-svn: 342270	2018-09-14 20:00:36 +00:00
David Stuttard	20de3e99b5	[AMDGPU] Ensure trig range reduction only used for subtargets that require it Summary: GFX9 and above support sin/cos instructions with a greater range and thus don't require a fract instruction prior to invocation. Added a subtarget feature to reflect this and added code to take advantage of expanded range on GFX9+ Also updated the tests to check correct behaviour Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51933 Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0 llvm-svn: 342222	2018-09-14 10:27:19 +00:00
Tim Renouf	c8af6a46fa	[AMDGPU] Removed unused method Summary: I accidentally left this behind in D50306, and it causes a build warning when I build with gcc7. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52022 Change-Id: I30f7a47047e9d9d841f652da66d2fea19e74842c llvm-svn: 342189	2018-09-13 21:56:25 +00:00
Matt Arsenault	ff987ac6ea	AMDGPU: Fix not preserving alignent in call setups If an argument was passed on the stack, this was using the default alignment. I'm not sure there's an observable change from this. This was observable due to bugs in expansion of unaligned loads and stores, but since that is fixed I don't think this matters much. llvm-svn: 342133	2018-09-13 12:14:31 +00:00
Alexander Timofeev	4d302f6911	[AMDGPU] Load divergence predicate refactoring Differential revision: https://reviews.llvm.org/D51931 Reviewers: rampitec llvm-svn: 342120	2018-09-13 09:06:56 +00:00
Alexander Timofeev	2fb44808b1	[AMDGPU] Preliminary patch for divergence driven instruction selection. Load offset inlining pattern changed. Differential revision: https://reviews.llvm.org/D51975 Reviewers: rampitec llvm-svn: 342115	2018-09-13 06:34:56 +00:00
Konstantin Zhuravlyov	6e551e0e49	AMDGPU: Print all kernel descriptor directives (including the ones with default values) Change by Tony Tye Differential Revision: https://reviews.llvm.org/D51954 llvm-svn: 342077	2018-09-12 20:25:39 +00:00
Konstantin Zhuravlyov	71e43ee47d	AMDGPU: Re-apply r341982 after fixing the layering issue Move isa version determination into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). llvm-svn: 342069	2018-09-12 18:50:47 +00:00
Ilya Biryukov	95066496d0	Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser." This reverts commit r341982. The change introduced a layering violation. Reverting to unbreak our integrate. llvm-svn: 342023	2018-09-12 07:05:30 +00:00
Konstantin Zhuravlyov	941615e4c8	AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). Differential Revision: https://reviews.llvm.org/D51890 llvm-svn: 341982	2018-09-11 18:56:51 +00:00
Alexander Timofeev	db7ee7660a	[AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed Differential revision: https://reviews.llvm.org/D51734 Reviewers: rampitec llvm-svn: 341928	2018-09-11 11:56:50 +00:00
Matt Arsenault	796b0e7a90	AMDGPU: Remove leftovers from configurable address spaces llvm-svn: 341895	2018-09-11 04:00:49 +00:00
Alexander Timofeev	20cbe6f319	[AMDGPU] Preliminary patch for divergence driven instruction selection. Inline immediate move to V_MADAK_F32. Differential revision: https://reviews.llvm.org/D51586 Reviewer: rampitec llvm-svn: 341843	2018-09-10 16:42:49 +00:00
Matt Arsenault	d1f4571a66	AMDGPU: Remove function pointer type hack Now the pointer size should always be correct and we don't need to improperly inspect the pointee type. llvm-svn: 341806	2018-09-10 12:16:11 +00:00
Matt Arsenault	7f6dc597d3	AMDGPU: Stop reporting is-noop addrspacecast for constant 32-bit This will require something to cast. Before this would eliminate the cast, which would result in copies of $noreg. llvm-svn: 341803	2018-09-10 11:59:27 +00:00
Matt Arsenault	57b5966dad	DAG: Handle odd vector sizes in calling conv splitting This already worked if only one register piece was used, but didn't if a type was split into multiple, unequal sized pieces. Fixes not splitting 3i16/v3f16 into two registers for AMDGPU. This will also allow fixing the ABI for 16-bit vectors in a future commit so that it's the same for all subtargets. llvm-svn: 341801	2018-09-10 11:49:23 +00:00
Carl Ritson	f898edd117	[AMDGPU] Prevent sequences of non-instructions disrupting GCNHazardRecognizer wait state counting Summary: This fixes a bug where a large number of implicit def instructions can fill the GCNHazardRecognizer lookahead buffer causing required NOPs to not be inserted. Reviewers: nhaehnle, arsenm Reviewed By: arsenm Subscribers: sheredom, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51726 Change-Id: Ie75338f94de704ee5816b05afd0c922c6748a95b llvm-svn: 341798	2018-09-10 10:14:48 +00:00
Matt Arsenault	d77fcc2a92	AMDGPU: Use GOT PSV since it has an address space now llvm-svn: 341768	2018-09-10 02:23:39 +00:00
Matt Arsenault	b998674610	AMDGPU: Don't abort on unknown addrspace argument llvm-svn: 341767	2018-09-10 02:23:30 +00:00
Alexander Timofeev	a805c96c65	[AMDGPU] Preliminary patch for divergence driven instruction selection. Fold immediate SMRD offset. Differential revision: https://reviews.llvm.org/D51610 Reviewer: rampitec llvm-svn: 341636	2018-09-07 09:05:34 +00:00
Scott Linder	834cbc645c	Revert r341413 Causes a regression in expensive checks. llvm-svn: 341589	2018-09-06 21:38:56 +00:00
Matt Arsenault	df84dc6979	AMDGPU: Remove old hack for function addresses llvm-svn: 341567	2018-09-06 17:23:24 +00:00
Scott Linder	dfe089dfd1	[AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions Emit a waterfall loop in the general case for a potentially-divergent Rsrc operand. When practical, avoid this by using Addr64 instructions. Differential Revision: https://reviews.llvm.org/D50982 llvm-svn: 341413	2018-09-04 21:50:47 +00:00
Matt Arsenault	813613c494	AMDGPU: Fix DAG divergence not reporting flat loads Match behavior in DAG of r340343 llvm-svn: 341393	2018-09-04 18:58:19 +00:00
Simon Pilgrim	2e35c1e399	Remove unnecessary semicolon to silence -Wpedantic warning. NFCI. llvm-svn: 341303	2018-09-03 10:17:25 +00:00
Tom Stellard	ffc6bd6f3d	AMDGPU/GlobalISel: Define instruction mapping for G_SELECT Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D49737 llvm-svn: 341271	2018-09-01 02:41:19 +00:00
Stanislav Mekhanoshin	44451b3344	[AMDGPU] Split v32i32 loads Differential Revision: https://reviews.llvm.org/D51555 llvm-svn: 341266	2018-08-31 22:43:36 +00:00
Matt Arsenault	bf07a50a98	AMDGPU: Restrict extract_vector_elt combine to loads The intention is to enable the extract_vector_elt load combine, and doing this for other operations interferes with more useful optimizations on vectors. Handle any type of load since in principle we should do the same combine for the various load intrinsics. llvm-svn: 341219	2018-08-31 15:39:52 +00:00
Matt Arsenault	988df63525	AMDGPU: Stop forcing internalize at -O0 This doesn't really matter if clang is always emitting the visibility as hidden by default. llvm-svn: 341168	2018-08-31 06:02:36 +00:00
Matt Arsenault	0da6350dc8	AMDGPU: Remove remnants of old address space mapping llvm-svn: 341165	2018-08-31 05:49:54 +00:00
Nicolai Haehnle	35617ed4cb	[NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis Summary: This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433). The purpose of this patch is to free up the name DivergenceAnalysis for the new generic implementation. The generic implementation class will be shared by specialized divergence analysis classes. Patch by: Simon Moll Reviewed By: nhaehnle Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50434 Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb llvm-svn: 341071	2018-08-30 14:21:36 +00:00
Alexander Timofeev	201f892b3b	[AMDGPU] Preliminary patch for divergence driven instruction selection. Operands Folding 1. Reviewers: rampitec Differential revision: https://reviews/llvm/org/D51316 llvm-svn: 341068	2018-08-30 13:55:04 +00:00
Marek Olsak	3fc2079cf4	AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes Summary: This fixes GPU hangs with OpenGL bindless handle arithmetic. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51203 llvm-svn: 340959	2018-08-29 20:03:00 +00:00
Farhana Aleen	9250c92d0e	[AMDGPU] Match udot4 pattern. Summary: D.u32 = S0.u8[0] * S1.u8[0] + S0.u8[1] * S1.u8[1] + S0.u8[2] * S1.u8[2] + S0.u8[3] * S1.u8[3] + S2.u32 Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D50921 llvm-svn: 340936	2018-08-29 16:31:18 +00:00
Nicolai Haehnle	283b995097	AMDGPU: Fix getInstSizeInBytes Summary: Add some optional code to validate getInstSizeInBytes for emitted instructions. This flushed out some issues which are fixed by this patch: - Streamline getInstSizeInBytes - Properly define the VI readlane/writelane instruction as VOP3 - Fix the inline constant determination. Specifically, this change fixes an issue where a 32-bit value of 0xffffffff was recorded as unsigned. This is equal to -1 when restricting to a 32-bit comparison, and an inline constant can be used. Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D50629 Change-Id: Id87c3b7975839da0de8156a124b0ce98c5fb47f2 llvm-svn: 340903	2018-08-29 07:46:09 +00:00
Fangrui Song	9cca227d3e	[AMDGPU] Fix -Wunused-variable when -DLLVM_ENABLE_ASSERTIONS=off llvm-svn: 340868	2018-08-28 19:19:03 +00:00
Matt Arsenault	755f41f3a2	AMDGPU: Don't delete instructions if S_ENDPGM has implicit uses This can leave behind the uses with the defs removed. Since this should only really happen in tests, it's not worth the effort of trying to handle this. llvm-svn: 340866	2018-08-28 18:55:55 +00:00
Matt Arsenault	44a8a756e2	AMDGPU: Force shrinking of add/sub even if the carry is used The original motivating example uses a 64-bit add, so the carry is used. Insert a copy from VCC. This may allow shrinking of the used carry instruction. At worst, we are replacing a mov to materialize the constant with a copy of vcc. llvm-svn: 340862	2018-08-28 18:44:16 +00:00
Matt Arsenault	de6c421cc8	AMDGPU: Shrink insts to fold immediates This needs to be done in the SSA fold operands pass to be effective, so there is a bit of overlap with SIShrinkInstructions but I don't think this is practically avoidable. llvm-svn: 340859	2018-08-28 18:34:24 +00:00
Matt Arsenault	35b1902bce	AMDGPU: Move canShrink into TII llvm-svn: 340855	2018-08-28 18:22:34 +00:00
Ryan Taylor	1f334d0062	[AMDGPU] Add support for a16 modifiear for gfx9 Summary: Adding support for a16 for gfx9. A16 bit replaces r128 bit for gfx9. Change-Id: Ie8b881e4e6d2f023fb5e0150420893513e5f4841 Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50575 llvm-svn: 340831	2018-08-28 15:07:30 +00:00
Tim Renouf	904343f879	[AMDGPU] Add support for multi-dword s.buffer.load intrinsic Summary: Patch by Marek Olsak and David Stuttard, both of AMD. This adds a new amdgcn intrinsic supporting s.buffer.load, in particular multiple dword variants. These are convenient to use from some front-end implementations. Also modified the existing llvm.SI.load.const intrinsic to common up the underlying implementation. This modification also requires that we can lower to non-uniform loads correctly by splitting larger dword variants into sizes supported by the non-uniform versions of the load. V2: Addressed minor review comments. V3: i1 glc is now i32 cachepolicy for consistency with buffer and tbuffer intrinsics, plus fixed formatting issue. V4: Added glc test. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51098 Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b llvm-svn: 340684	2018-08-25 14:53:17 +00:00
Samuel Pitoiset	7bd9dcffcd	AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space 32-bit constant address space is declared as 6, so the maximum number of address spaces is 6, not 5. Fixes "LLVM ERROR: Pointer address space out of range". v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS v4: - fix compilation issues - fix out of bounds access v3: use static_assert() v2: add a very simple test for 32-bit addr space Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630 llvm-svn: 340417	2018-08-22 16:08:48 +00:00
Samuel Pitoiset	d81d6f7d58	AMDGPU: fix existing alias rules for constant and global Constant and global may alias, also one rules table wasn't ordered correctly. Pinpointed by Matt. v2: add a test with swapped parameters llvm-svn: 340416	2018-08-22 16:08:43 +00:00
Matt Arsenault	bb8e64e7f5	AMDGPU: Fix not respecting byval alignment in call frame setup This was hackily adding in the 4-bytes reserved for the callee's emergency stack slot. Treat it like a normal stack allocation so we get the correct alignment padding behavior. This fixes an inconsistency between the caller and callee. llvm-svn: 340396	2018-08-22 11:09:45 +00:00
Alina Sbirlea	ab6f84f763	Update MemorySSA in BasicBlockUtils. Summary: Extend BasicBlocksUtils to update MemorySSA. Subscribers: sanjoy, arsenm, nhaehnle, jlebar, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D45300 llvm-svn: 340365	2018-08-21 23:32:03 +00:00
Scott Linder	72855e36c5	[AMDGPU] Consider loads from flat addrspace to be potentially divergent In general we can't assume flat loads are uniform, and cases where we can prove they are should be handled through infer-address-spaces. Differential Revision: https://reviews.llvm.org/D50991 llvm-svn: 340343	2018-08-21 21:24:31 +00:00
Farhana Aleen	3528c80378	[AMDGPU] Support idot2 pattern. Summary: Transform add (mul ((i32)S0.x, (i32)S1.x), add( mul ((i32)S0.y, (i32)S1.y), (i32)S3) => i/udot2((v2i16)S0, (v2i16)S1, (i32)S3) Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D50024 llvm-svn: 340295	2018-08-21 16:21:15 +00:00
Tim Renouf	bb5ee41ab4	[AMDGPU] Allow int types for MUBUF vdata Summary: Previously the new llvm.amdgcn.raw/struct.buffer.load/store intrinsics only allowed float types for the data to be loaded or stored, which sometimes meant the frontend needed to generate a bitcast. In this, the new intrinsics copied the old buffer intrinsics. This commit extends the new intrinsics to allow int types as well. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D50315 Change-Id: I8202af2d036455553681dcbb3d7d32ae273f8f85 llvm-svn: 340270	2018-08-21 11:08:12 +00:00
Tim Renouf	4f703f5e11	[AMDGPU] New buffer intrinsics Summary: This commit adds new intrinsics llvm.amdgcn.raw.buffer.load llvm.amdgcn.raw.buffer.load.format llvm.amdgcn.raw.buffer.load.format.d16 llvm.amdgcn.struct.buffer.load llvm.amdgcn.struct.buffer.load.format llvm.amdgcn.struct.buffer.load.format.d16 llvm.amdgcn.raw.buffer.store llvm.amdgcn.raw.buffer.store.format llvm.amdgcn.raw.buffer.store.format.d16 llvm.amdgcn.struct.buffer.store llvm.amdgcn.struct.buffer.store.format llvm.amdgcn.struct.buffer.store.format.d16 llvm.amdgcn.raw.buffer.atomic.* llvm.amdgcn.struct.buffer.atomic.* with the following changes from the llvm.amdgcn.buffer.* intrinsics: * there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set; * there is a combined cachepolicy arg (glc+slc) * there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field. The AMDISD::BUFFER_* SD nodes always have an index operand, all three offset operands, combined cachepolicy operand, and an extra idxen operand. The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50306 Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205 llvm-svn: 340269	2018-08-21 11:07:10 +00:00
Tim Renouf	35484c9d50	[AMDGPU] New tbuffer intrinsics Summary: This commit adds new intrinsics llvm.amdgcn.raw.tbuffer.load llvm.amdgcn.struct.tbuffer.load llvm.amdgcn.raw.tbuffer.store llvm.amdgcn.struct.tbuffer.store with the following changes from the llvm.amdgcn.tbuffer.* intrinsics: * there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set; * there is a combined format arg (dfmt+nfmt) * there is a combined cachepolicy arg (glc+slc) * there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field. The AMDISD::TBUFFER_* SD nodes always have an index operand, all three offset operands, combined format operand, combined cachepolicy operand, and an extra idxen operand. The tbuffer pseudo- and real instructions now also have a combined format operand. The obsolescent llvm.amdgcn.tbuffer.* and llvm.SI.tbuffer.store intrinsics continue to work. V2: Separate raw and struct intrinsics. V3: Moved extract_glc and extract_slc defs to a more sensible place. V4: Rebased on D49995. V5: Only two separate offset args instead of three. V6: Pseudo- and real instructions have joint format operand. V7: Restored optionality of dfmt and nfmt in assembler. V8: Addressed minor review comments. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49026 Change-Id: If22ad77e349fac3a5d2f72dda53c010377d470d4 llvm-svn: 340268	2018-08-21 11:06:05 +00:00
Vitaly Buka	30b5ed3eb7	Revert "AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space" As it introduces out of bound access. This reverts commit r340172 and r340171 llvm-svn: 340202	2018-08-20 19:31:03 +00:00
Marcello Maggioni	5ca4128b45	[PSV] Update API to be able to use TargetCustom without UB. getTargetCustom() requires values for "Kind" in the constructor that are not in the PSVKind enum. Passing a value that is not inside an enum as an argument to a constructor of the type of the enum is UB. Changing to the underlying type of the enum would solve the UB Differential Revision: https://reviews.llvm.org/D50909 llvm-svn: 340200	2018-08-20 19:23:45 +00:00
Samuel Pitoiset	216a2da577	AMDGPU: fix compilation errors since r340171 Some buildbot slaves reports compilation errors, but it compiled fine on my side, sorry for the breakage. llvm-svn: 340172	2018-08-20 13:31:41 +00:00
Samuel Pitoiset	c95ef77d37	AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space 32-bit constant address space is declared as 6, so the maximum number of address spaces is 6, not 5. Fixes "LLVM ERROR: Pointer address space out of range". v3: use static_assert() v2: add a very simple test for 32-bit addr space Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 340171	2018-08-20 13:18:59 +00:00
Chandler Carruth	c73c0307fe	[MI] Change the array of `MachineMemOperand` pointers to be a generically extensible collection of extra info attached to a `MachineInstr`. The primary change here is cleaning up the APIs used for setting and manipulating the `MachineMemOperand` pointer arrays so chat we can change how they are allocated. Then we introduce an extra info object that using the trailing object pattern to attach some number of MMOs but also other extra info. The design of this is specifically so that this extra info has a fixed necessary cost (the header tracking what extra info is included) and everything else can be tail allocated. This pattern works especially well with a `BumpPtrAllocator` which we use here. I've also added the basic scaffolding for putting interesting pointers into this, namely pre- and post-instruction symbols. These aren't used anywhere yet, they're just there to ensure I've actually gotten the data structure types correct. I'll flesh out support for these in a subsequent patch (MIR dumping, parsing, the works). Finally, I've included an optimization where we store any single pointer inline in the `MachineInstr` to avoid the allocation overhead. This is expected to be the overwhelmingly most common case and so should avoid any memory usage growth due to slightly less clever / dense allocation when dealing with >1 MMO. This did require several ergonomic improvements to the `PointerSumType` to reasonably support the various usage models. This also has a side effect of freeing up 8 bits within the `MachineInstr` which could be repurposed for something else. The suggested direction here came largely from Hal Finkel. I hope it was worth it. ;] It does hopefully clear a path for subsequent extensions w/o nearly as much leg work. Lots of thanks to Reid and Justin for careful reviews and ideas about how to do all of this. Differential Revision: https://reviews.llvm.org/D50701 llvm-svn: 339940	2018-08-16 21:30:05 +00:00
Matt Arsenault	7121bed210	AMDGPU: Custom lower fexp This will allow the library to just use __builtin_expf directly without expanding this itself. Note f64 still won't work because there is no exp instruction for it. llvm-svn: 339902	2018-08-16 17:07:52 +00:00
Matt Arsenault	f533e6b0ed	AMDGPU: Fold fneg into fmed3 llvm-svn: 339821	2018-08-15 21:46:27 +00:00
Matt Arsenault	a816073764	AMDGPU: Improve extract_vector_elt reduction combine Handle fmul, fsub and preserve flags. Also really test minnum/maxnum reductions. The existing tests were only checking from minnum/maxnum matched from a fast math compare and select which is not the same. llvm-svn: 339820	2018-08-15 21:34:06 +00:00
Matt Arsenault	b3a80e5397	AMDGPU: Implement llvm.amdgcn.icmp/fcmp for i16/f16 Also support these on targets without support for these, since it will allow us to freely create these in instcombine. llvm-svn: 339819	2018-08-15 21:25:20 +00:00
Matt Arsenault	6c7ba82900	AMDGPU: Address todo for handling 1/(2 pi) llvm-svn: 339814	2018-08-15 21:03:55 +00:00
Chandler Carruth	66654b72c9	[SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be surprised at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740	2018-08-14 23:30:32 +00:00
Matt Arsenault	13b0db9285	AMDGPU: Check NSZ MI flag when folding omod I'm not sure the exact nsz flag combination that is OK. I think as long as it's on either, this is OK. For now just check it on the omod multiply. llvm-svn: 339513	2018-08-12 08:44:25 +00:00
Matt Arsenault	b5acec1f79	AMDGPU: Use splat vectors for undefs when folding canonicalize If one of the elements is undef, use the canonicalized constant from the other element instead of 0. Splat vectors are more useful for other optimizations, such as matching vector clamps. This was breaking on clamps of half3 from the undef 4th component. llvm-svn: 339512	2018-08-12 08:42:54 +00:00
Matt Arsenault	3ead7d7389	AMDGPU: Fix packing undef parts of build_vector llvm-svn: 339511	2018-08-12 08:42:46 +00:00
Tom Stellard	8adc86a7dc	AMDGPU/GlobalISel: Define instruction mapping for G_INSERT Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49625 llvm-svn: 339491	2018-08-11 00:51:54 +00:00
Matt Arsenault	940e6075e4	AMDGPU: More canonicalized operations llvm-svn: 339464	2018-08-10 19:20:17 +00:00
Matt Arsenault	3dcf4ce435	AMDGPU: Combine and of seto/setuo and fp_class Clear the nan (or non-nan) test bits from the mask. llvm-svn: 339462	2018-08-10 18:58:56 +00:00
Matt Arsenault	8ad00d30fa	AMDGPU: Match isfinite pattern to class instructions llvm-svn: 339460	2018-08-10 18:58:41 +00:00
Matt Arsenault	5bb9d798b4	AMDGPU: Add LLVM_FALLTHROUGH llvm-svn: 339458	2018-08-10 17:57:12 +00:00
Matt Arsenault	935f3b70fe	AMDGPU: Error more gracefully on libcalls I think this is the only situation where the callsite will have a null instruction. llvm-svn: 339271	2018-08-08 16:58:39 +00:00
Matt Arsenault	e719139b10	AMDGPU: Fix shifts for i128 llvm-svn: 339270	2018-08-08 16:58:33 +00:00
Jan Vesely	7b2c98ab59	AMDGPU: Remove broken i16 ternary patterns Fixup test to check for GCN prefix These patterns always zero extend the result even though it might need sign extension. This has been broken since the addition of i16 support. It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence: v_mad_i16 v2, v10, v9, v11 v_med3_i32 v2, v2, v8, v7 Fixes mad_sat(char) piglit on VI. Differential Revision: https://reviews.llvm.org/D49836 llvm-svn: 339190	2018-08-07 21:54:37 +00:00
Matt Arsenault	96b678427a	AMDGPU: Add feature vi-insts This is necessary to add a VI specific builtin, __builtin_amdgcn_s_dcache_wb. We already have an overly specific feature for one of these builtins, for s_memrealtime. I'm not sure whether it's better to add more of those, or to get rid of that and merge it with vi-insts. Alternatively, maybe this logically goes with scalar-stores? llvm-svn: 339104	2018-08-07 07:28:46 +00:00
Matt Arsenault	08f3fe4fae	AMDGPU: cvt_pk_rtz_f16 canonicalizes llvm-svn: 339078	2018-08-06 23:01:31 +00:00
Matt Arsenault	e94ee833f9	AMDGPU: Handle some vector operations in isCanonicalized llvm-svn: 339077	2018-08-06 22:45:51 +00:00
Matt Arsenault	a29e76244a	AMDGPU: Push fcanonicalize through partially constant build_vector This usually avoids some re-packing code, and may help find canonical sources. llvm-svn: 339072	2018-08-06 22:30:44 +00:00
Matt Arsenault	f2a167fb1d	AMDGPU: Refactor fcanonicalize combine This will make more complex combines easier. llvm-svn: 339070	2018-08-06 22:10:26 +00:00
Matt Arsenault	d49ab0b214	AMDGPU: Treat more custom operations as canonicalizing Everything should quiet, and I think everything should flush. I assume the min3/med3/max3 follow the same rules as regular min/max for flushing, which should at least be conservatively correct. There are still more operations that need to be handled. llvm-svn: 339065	2018-08-06 21:58:11 +00:00
Matt Arsenault	ce6d61fba8	AMDGPU: Conversions always produce canonical results Not sure why this was checking for denormals for f16. My interpretation of the IEEE standard is conversions should produce a canonical result, and the ISA manual says denormals are created when appropriate. llvm-svn: 339064	2018-08-06 21:51:52 +00:00
Matt Arsenault	f8768bfc84	AMDGPU: Fix implementation of isCanonicalized If denormals are enabled, denormals are canonical. Also fix a few other issues. minnum/maxnum are supposed to canonicalize. Temporarily improve workaround for the instruction behavior change in gfx9. Handle selects and fcopysign. The tests were also largely broken, since they were checking for a flush used on some targets after the store of the result. llvm-svn: 339061	2018-08-06 21:38:27 +00:00
Matt Arsenault	0d1b3934e2	AMDGPU: Fold v_lshl_or_b32 with 0 src0 Appears from expansion of some packed cases. llvm-svn: 339025	2018-08-06 15:40:20 +00:00
David Bolvansky	c0aa4b75a4	Enrich inline messages Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338969	2018-08-05 14:53:08 +00:00
Matt Arsenault	c3dc8e65e2	DAG: Enhance isKnownNeverNaN Add a parameter for testing specifically for sNaNs - at least one instruction pattern on AMDGPU needs to check specifically for this. Also handle more cases, and add a target hook for custom nodes, similar to the hooks for known bits. llvm-svn: 338910	2018-08-03 18:27:52 +00:00
Tim Renouf	366a49d986	[AMDGPU] Minor change to d16 buffer load implementation Summary: By not reconstructing the operand list of the SDNode, this change makes it easier to add the forthcoming new tbuffer and buffer intrinsics. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49995 Change-Id: I0cb79ef0801532645d7dd954a6d7355139db7b38 llvm-svn: 338784	2018-08-02 23:33:01 +00:00
Tim Renouf	abd85fb1f5	[AMDGPU] Reworked SIFixWWMLiveness Summary: I encountered some problems with SIFixWWMLiveness when WWM is in a loop: 1. It sometimes gave invalid MIR where there is some control flow path to the new implicit use of a register on EXIT_WWM that does not pass through any def. 2. There were lots of false positives of registers that needed to have an implicit use added to EXIT_WWM. 3. Adding an implicit use to EXIT_WWM (and adding an implicit def just before the WWM code, which I tried in order to fix (1)) caused lots of the values to be spilled and reloaded unnecessarily. This commit is a rework of SIFixWWMLiveness, with the following changes: 1. Instead of considering any register with a def that can reach the WWM code and a def that can be reached from the WWM code, it now considers three specific cases that need to be handled. 2. A register that needs liveness over WWM to be synthesized now has it done by adding itself as an implicit use to defs other than the dominant one. Also added the following fixmes: FIXME: We should detect whether a register in one of the above categories is already live at the WWM code before deciding to add the implicit uses to synthesize its liveness. FIXME: I believe this whole scheme may be flawed due to the possibility of the register allocator doing live interval splitting. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46756 Change-Id: Ie7fba0ede0378849181df3f1a9a7a39ed1a94a94 llvm-svn: 338783	2018-08-02 23:31:32 +00:00
Tim Renouf	f1c7b92a6a	[AMDGPU] Avoid using divergent value in mubuf addr64 descriptor Summary: This fixes a problem where a load from global+idx generated incorrect code on <=gfx7 when the index is divergent. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47383 Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed llvm-svn: 338779	2018-08-02 22:53:57 +00:00
Matt Arsenault	36cdcfadcf	AMDGPU: Fix scalarizing v4f16 fcanonicalize llvm-svn: 338714	2018-08-02 13:43:42 +00:00
Alexander Ivchenko	49168f6778	[GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per Value This is logical continuation of https://reviews.llvm.org/D46018 (r332449) Differential Revision: https://reviews.llvm.org/D49660 llvm-svn: 338685	2018-08-02 08:33:31 +00:00
Matt Arsenault	a7dfd48310	AMDGPU: Use SPseudoInst helper llvm-svn: 338631	2018-08-01 20:49:00 +00:00
Matt Arsenault	709374d186	AMDGPU: Improve hack for packing conversion ops Mutate the node type during selection when it doesn't matter. This avoids an intermediate bitcast node on targets with legal i16/f16. Also fixes missing output modifiers on v_cvt_pkrtz_f32_f16, which I assume are OK. llvm-svn: 338619	2018-08-01 20:13:58 +00:00
Matt Arsenault	55ab9213d3	AMDGPU: Partially fix handling of packed amdgpu_ps arguments Fixes annoying limitations when writing tests. Also remove more leftover code for manually scalarizing arguments and return values. llvm-svn: 338618	2018-08-01 19:57:34 +00:00
Jan Vesely	93b252799b	AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8) llvm-svn: 338610	2018-08-01 18:36:07 +00:00
Jan Vesely	5ba1b4bdab	AMDGPU: Allow fp32-denormals feature for r600 targets This was accidentally removed in r335942. Differential Revision: https://reviews.llvm.org/D49934 llvm-svn: 338569	2018-08-01 15:04:36 +00:00
Ryan Taylor	894c8fd0e2	[AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero Summary: Add _L to _LZ image intrinsic table mapping to table gen. In ISelLowering check if image intrinsic has lod and if it's equal to zero, if so remove lod and change opcode to equivalent mapped _LZ. Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71 Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49483 llvm-svn: 338523	2018-08-01 12:12:01 +00:00
David Bolvansky	fbbb83c782	Revert "Enrich inline messages", tests fail llvm-svn: 338496	2018-08-01 08:02:40 +00:00
David Bolvansky	7f36cd9d96	Enrich inline messages Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338494	2018-08-01 07:37:16 +00:00
Konstantin Zhuravlyov	bb30ef7af4	AMDGPU: Add clamp bit to dot intrinsics Differential Revision: https://reviews.llvm.org/D49874 llvm-svn: 338470	2018-08-01 01:31:30 +00:00
Matt Arsenault	feedabfde7	AMDGPU: Break 64-bit arguments into 32-bit pieces llvm-svn: 338421	2018-07-31 19:29:04 +00:00
Matt Arsenault	0395da7842	AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls This improves code for the same reasons as scalarizing 32-bit element vectors. llvm-svn: 338418	2018-07-31 19:17:47 +00:00
Matt Arsenault	9ced1e0d80	AMDGPU: Scalarize vector argument types to calls When lowering calling conventions, prefer to decompose vectors into the constitute register types. This avoids artifical constraints to satisfy a wide super-register. This improves code quality because now optimizations don't need to deal with the super-register constraint. For example the immediate folding code doesn't deal with 4 component reg_sequences, so by breaking the register down earlier the existing immediate folding code is able to work. This also avoids the need for the shader input processing code to manually split vector types. llvm-svn: 338416	2018-07-31 19:05:14 +00:00
David Bolvansky	ab79414f7b	Revert Enrich inline messages llvm-svn: 338389	2018-07-31 14:47:22 +00:00
David Bolvansky	b562dbabda	Enrich inline messages Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338387	2018-07-31 14:25:24 +00:00
Matt Arsenault	638a202760	AMDGPU: Don't handle FP16_TO_FP in isCanonicalized This needs more special handling to do correctly. Fixes test in subsequent commit. llvm-svn: 338381	2018-07-31 14:15:16 +00:00
Matt Arsenault	4aec86d37a	AMDGPU: Fold undef fcanonicalize to qNaN We could choose a free 0 for this, but this matches the behavior for fmul undef, 1.0. Also, the NaN use is more useful for folding use operations although if it's not eliminated it is more expensive in terms of code size. llvm-svn: 338376	2018-07-31 13:34:31 +00:00
Matt Arsenault	de496c32a4	AMDGPU: Reduce code size with fcanonicalize (fneg x) When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifers. llvm-svn: 338244	2018-07-30 12:16:58 +00:00
Matt Arsenault	f3c9a34def	AMDGPU: Make fneg combine handle fcanonicalize llvm-svn: 338243	2018-07-30 12:16:47 +00:00
Nicolai Haehnle	7f0d05d532	AMDGPU: Force skip over s_sendmsg and exp instructions Summary: These instructions interact with hardware blocks outside the shader core, and they can have "scalar" side effects even when EXEC = 0. We don't want these scalar side effects to occur when all lanes want to skip these instructions, so always add the execz skip branch instruction for basic blocks that contain them. Also ensure that we skip scalar stores / atomics, though we don't code-gen those yet. Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48431 Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44 llvm-svn: 338235	2018-07-30 09:23:59 +00:00
Matt Arsenault	8f9dde94b7	AMDGPU: Stop wasting argument registers with v3i32/v3f32 SelectionDAGBuilder widens v3i32/v3f32 arguments to to v4i32/v4f32 which consume an additional register. In addition to wasting argument space, this produces extra instructions since now it appears the 4th vector component has a meaningful value to most combines. llvm-svn: 338197	2018-07-28 14:11:34 +00:00
Matt Arsenault	81920b0a25	DAG: Add calling convention argument to calling convention funcs This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions. llvm-svn: 338194	2018-07-28 13:25:19 +00:00
Matt Arsenault	72b0e38b26	AMDGPU: Stop trying to extend arguments for clover This was trying to replace i8/i16 arguments with i32, which was broken and no longer necessary. llvm-svn: 338193	2018-07-28 12:34:25 +00:00
Jan Vesely	6ff58ed5ca	AMDGPU/R600: Add MOV instructions to BFE patterns R600 can't handle immediates for BFE, these will be eliminated later. Fixes powr/pow regressions n r600 since r334817 Differential Revision: https://reviews.llvm.org/D49641 llvm-svn: 338127	2018-07-27 15:00:13 +00:00
Matt Arsenault	0183c56c11	AMDGPU: Fix code size for return_to_epilog pseudo llvm-svn: 338113	2018-07-27 09:15:03 +00:00
Tom Stellard	e9bdc5f1d8	AMDGPU/GlobalISel: Fix crash in regbankselect on non-power-of-2 types Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D49624 llvm-svn: 338102	2018-07-27 06:04:40 +00:00
Scott Linder	eb1f75d561	[AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits Scale the offset of VGPR spills by the wave size when it cannot fit in the 12-bit offset immediate field and so is added to the soffset SGPR. This accounts for hardware swizzling of scratch memory. Differential Revision: https://reviews.llvm.org/D49448 llvm-svn: 338060	2018-07-26 19:47:51 +00:00
Stanislav Mekhanoshin	7e7268ac1c	[AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion Differential Revision: https://reviews.llvm.org/D49761 llvm-svn: 337938	2018-07-25 17:02:11 +00:00
Tom Stellard	b7f19e6d1e	AMDGPU/GlobalISel: Legalize G_INSERT Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49601 llvm-svn: 337798	2018-07-24 02:19:20 +00:00
Tom Stellard	2d37929c10	AMDGPU/GlobalISel: Remove unnecessary legality constraint for G_EXTRACT Summary: We were marking G_EXTRACT operations unsupported if the output type was larger than the input type. I don't see how this could ever actually happen, so I dropped the constraint. Doing this makes it possible to reuse the same legality code for G_INSERT. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49600 llvm-svn: 337794	2018-07-24 01:43:49 +00:00
Matt Arsenault	5ae765e68c	AMDGPU: Use existing function to check for VGPRs llvm-svn: 337621	2018-07-20 21:20:36 +00:00
Matt Arsenault	4bec7d4261	Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering" Reverts r337079 with fix for msan error. llvm-svn: 337535	2018-07-20 09:05:08 +00:00
Farhana Aleen	c370d7b33d	[AMDGPU] [AMDGPU] Support a fdot2 pattern. Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z) Author: FarhanaAleen Reviewed By: rampitec, b-sumner Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D49146 llvm-svn: 337198	2018-07-16 18:19:59 +00:00
Mark Searles	f4e7025299	[AMDGPU][Waitcnt] Re-apply fix "comparison of integers of different signs" build error" Re-apply "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error"" ( fe0a456510131f268e388c4a18a92f575c0db183 ), which was inadvertantly reverted via 2b2ee080f0164485562593b1b87291a48cea4a9a . llvm-svn: 337156	2018-07-16 10:21:36 +00:00
Mark Searles	72da47df25	run post-RA hazard recognizer pass late Memory legalizer, waitcnt, and shrink passes can perturb the instructions, which means that the post-RA hazard recognizer pass should run after them. Otherwise, one of those passes may invalidate the work done by the hazard recognizer. Note that this has adverse side-effect that any consecutive S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a single S_NOP <N>. This should be addressed in a follow-on patch. Differential Revision: https://reviews.llvm.org/D49288 llvm-svn: 337154	2018-07-16 10:02:41 +00:00
Mark Searles	c2d5d9adb5	Revert "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error" This reverts commit fe0a456510131f268e388c4a18a92f575c0db183. llvm-svn: 337153	2018-07-16 10:02:40 +00:00

... 2 3 4 5 6 ...

3072 Commits