Commit Graph

4042 Commits

Author SHA1 Message Date
Michael Liao 4531aee2ac [amdgpu] Fix known bits compuation on `MUL_I24`/`MUL_U24`.
Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69735
2019-11-01 17:06:17 -04:00
Matt Arsenault 19e7f8a21d AMDGPU: Add default denormal mode to MachineFunctionInfo
The default FP mode should really be a property of a specific
function, and not a subtarget. Introduce the necessary fields to the
SIMachineFunctionInfo to help move towards this goal.
2019-11-01 00:03:39 -07:00
Matt Arsenault 6221767055 DAG: Add DAG argument to isFPExtFoldable
For AMDGPU this is dependent on the FP mode, which should eventually
not be a property of the subtarget.
2019-10-31 22:32:45 -07:00
Matt Arsenault 1725f28841 DAG: Add new control for ISD::FMAD formation
For AMDGPU this depends on whether denormals are enabled in the
default FP mode for the function. Currently this is treated as a
subtarget feature, so FMAD is selectively legal based on that. I want
to move this out of the subtarget features so this can be controlled
with a denormal mode attribute. Additionally, this will allow folding
based on a future ftz fast math flag.
2019-10-31 07:51:38 -07:00
Matt Arsenault bc56166281 AMDGPU: Simplify getAddressSpace calls
These can be directly taken from the GlobalValue instead of going
through the type.
2019-10-31 07:51:38 -07:00
Matt Arsenault d9e0a2942a AMDGPU: Disallow spill folding with m0 copies
readlane and writelane instructions are not allowed to use m0 as the
data operand, so spilling them is tricky and would require an
intermediate SGPR to spill it. Constrain the virtual register class in
this caes to disallow the inline spiller from folding the m0 operand
directly into the spill instruction.

I copied this hack from AArch64 which has the same problem for $sp.
2019-10-30 14:56:33 -07:00
Matt Arsenault edca9ac0de AMDGPU: Don't fold S_NOPs with implicit operands 2019-10-30 14:40:56 -07:00
Jay Foad e5972f2a04 [AMDGPU] Simplify VCCZ bug handling
Summary:
VCCZBugHandledSet was used to make sure we don't apply the same
workaround more than once to a single cbranch instruction, but it's not
necessary because the workaround involves inserting an s_waitcnt
instruction, which is enough for subsequent iterations to detect that no
further workaround is necessary.

Also beef up the test case to check that the workaround was only applied
once. I have also manually verified that the test still passes even if I
hack the big do-while loop in runOnMachineFunction to run a minimum of
five iterations.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69621
2019-10-30 17:09:07 +00:00
Jay Foad b592253ec6 [AMDGPU] Consolidate one more getGeneration check
This one should have been done in r363902 when hasReadVCCZBug was
introduced.
2019-10-30 11:16:42 +00:00
Austin Kerbow 2b88b344f2 AMDGPU/GlobalISel: Legalize FDIV32
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69581
2019-10-29 17:18:06 -07:00
Matt Arsenault 21bc8e5a13 AMDGPU: Make VReg_1 only include 1 artificial register
When TableGen is inferring register classes from contexts, it uses a
sorting function based on the number of registers in the class. Since
this was being treated as an alias of VGPR_32, they had exactly the
same size. The sort used wasn't a stable sort, and even if it were, I
believe the tie breaker would effectively end up being the
alphabetical ordering of the class name. There appear to be issues
trying to use an empty set of registers, so add only one so this will
always sort to the end.

Also add a comment explaining how VReg_1 is a dirty hack for
SelectionDAG.

This does end up changing the behavior of i1 with inline asm and VGPR
constraints, but the existing behavior was was already nonsensical and
inconsistent. It should probably be disallowed anyway.

Fixes bug 43699
2019-10-28 20:51:51 -07:00
Austin Kerbow d11b93ec6a AMDGPU: Avoid overwriting saved PC
Summary:
An outstanding load with same destination sgpr as call could cause PC to be
updated with junk value on return.

Reviewers: arsenm, rampitec

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69474
2019-10-28 10:02:22 -07:00
Dmitry Preobrazhensky b8042dbe2b [AMDGPU][MC][GFX10] Added v_interp_[p1/p2/mov]_f32_e64
See https://bugs.llvm.org/show_bug.cgi?id=43747

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D69348
2019-10-28 15:03:43 +03:00
cdevadas e921ede540 [AMDGPU] Fix Vreg_1 PHI lowering in SILowerI1Copies.
There is a minor flaw in the implementation of function lowerPhis.
This function replaces values of regclass Vreg_1 (boolean values)
involved in PHIs into an SGPR. Currently it iterates over the MBBs
and performs an inplace lowering of PHIs and fails to lower any
incoming value that itself is another PHI of Vreg_1 regclass.
The failure occurs only when the MBB where the incoming PHI value
belongs is not visited/lowered yet.

To fix this problem, collect all Vreg_1 PHIs upfront and then
perform the lowering.

Differential Revision: https://reviews.llvm.org/D69182
2019-10-26 14:37:45 +05:30
Stanislav Mekhanoshin 4c0251da14 [AMDGPU] Enable SGPR copy folding
That used to fail in the last testcase function because after
%0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it
was further folded into S_STORE_DWORD_IMM. Its legal effective
subreg class is SReg_32 while instruction expects more restricted
SReg_32_XM0_EXEC. However, SIInstrInfo::isLegalRegOperand()
passed the legality check and it was caught in the verifier.

Borrowed code from the verifier to check for RC legality.

Differential Revision: https://reviews.llvm.org/D69445
2019-10-25 15:08:30 -07:00
Stanislav Mekhanoshin c7dcacf16a [AMDGPU] Fixed asan failure in SIFoldOperands
Both tryFoldOMod() and tryFoldClamp() remove original instruction,
so the check MI.modifiesRegister() may use a deleted MI.

Differential Revision: https://reviews.llvm.org/D69448
2019-10-25 13:59:56 -07:00
Matt Arsenault 171cf5302f AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHG
Custom lower this to a target instruction with the merge operands. I
think it might be better to directly select this and emit a
REG_SEQUENCE, but this would be more work since it would require
splitting the tablegen patterns for these cases from the other
atomics.
2019-10-25 13:11:09 -07:00
Changpeng Fang 1ce552f3ef AMDGPU: Fix the broken dominator tree when creating waterfall loop for resource descriptor
Summary:
  In loadSRsrcFromVGPR, if MBB is the same as Succ, Remiander is not the immediate dominator of Succ.

Reviewer:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D69358
2019-10-25 13:08:04 -07:00
Stanislav Mekhanoshin d4303b3861 [AMDGPU] Fold AGPR reg_sequence initializers
Differential Revision: https://reviews.llvm.org/D69413
2019-10-25 11:39:02 -07:00
vpykhtin c9c18e5a31 [AMDGPU] Disallow dpp combining for dpp instructions without Src2 operand (when Src2 is required)
Differential revision: https://reviews.llvm.org/D69430
2019-10-25 21:30:37 +03:00
Austin Kerbow c35b358b74 AMDGPU/GlobalISel: Legalize FDIV16
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69347
2019-10-25 11:07:17 -07:00
Stanislav Mekhanoshin 3c8e055187 [AMDGPU] Fix mfma scheduling crash
An SUnit can be neither intruction not SDNode. It is all
null if represents a nop. Fixed a crash on using SU->getInstr().

Differential Revision: https://reviews.llvm.org/D69395
2019-10-24 11:01:52 -07:00
dfukalov 6d0fc4373e [NFC] Remove redundant lines
Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69375
2019-10-24 19:54:28 +03:00
Michael Liao b2a65f0d70 [AMDGPU] Skip additional folding on the same operand.
Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69355
2019-10-24 11:30:22 -04:00
Stanislav Mekhanoshin 61e7a61bdc [AMDGPU] Allow folding of sgpr to vgpr copy
Potentially sgpr to sgpr copy should also be possible.
That is however trickier because we may end up with a
wrong register class at use because of xm0/xexec permutations.

Differential Revision: https://reviews.llvm.org/D69280
2019-10-23 18:42:48 -07:00
Mirko Brkusanin 4b63ca1379 [Mips] Use appropriate private label prefix based on Mips ABI
MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64
regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo
we can find out Mips ABI and pick appropriate prefix.

Tags: #llvm, #clang, #lldb

Differential Revision: https://reviews.llvm.org/D66795
2019-10-23 12:24:35 +02:00
Stanislav Mekhanoshin 48f57138be [AMDGPU] Allow tied operand subreg folding
Turns out it makes sense, contrarily to what comment said.

Differential Revision: https://reviews.llvm.org/D69287
2019-10-22 11:27:36 -07:00
Austin Kerbow 97263fa2dd AMDGPU/GlobalISel: Legalize fast unsafe FDIV
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69231

llvm-svn: 375460
2019-10-21 22:18:26 +00:00
Matt Arsenault ef9a0278f0 AMDGPU: Select basic interp directly from intrinsics
llvm-svn: 375457
2019-10-21 21:49:44 +00:00
Matt Arsenault 38038f116f AMDGPU: Use CopyToReg for interp intrinsic lowering
This doesn't use the default value, so doesn't benefit from the hack
to help optimize it.

llvm-svn: 375450
2019-10-21 19:53:49 +00:00
Matt Arsenault 8ebbf25cb1 AMDGPU: Erase redundant redefs of m0 in SIFoldOperands
Only handle simple inter-block redefs of m0 to the same value. This
avoids interference from redefs of m0 in SILoadStoreOptimzer. I was
initially teaching that pass to ignore redefs of m0, but having them
not exist beforehand is much simpler.

This is in preparation for deleting the current special m0 handling in
SIFixSGPRCopies to allow the register coalescer to handle the
difficult cases.

llvm-svn: 375449
2019-10-21 19:53:46 +00:00
Matt Arsenault dd6cf159ba AMDGPU: Stop adding m0 implicit def to SGPR spills
r375293 removed the SGPR spilling with scalar stores path, so this is
no longer necessary. This also always had the defect of adding the def
even when this path wasn't in use.

llvm-svn: 375448
2019-10-21 19:42:29 +00:00
Matt Arsenault b5234b64af AMDGPU: Slightly restructure m0 init code
This will allow using another operation to produce the glue in a
future change.

llvm-svn: 375447
2019-10-21 19:42:26 +00:00
Stanislav Mekhanoshin 33092194f2 [AMDGPU] Select AGPR in PHI operand legalization
If a PHI defines AGPR legalize its operands to AGPR.
At the moment we can get an AGPR PHI with VGPR operands.
I am not aware of any problems as it seems to be handled
gracefully in RA, but this is not right anyway.

It also slightly decreases VGPR pressure in some cases
because we do not have to a copy via VGPR.

Differential Revision: https://reviews.llvm.org/D69206

llvm-svn: 375446
2019-10-21 19:25:27 +00:00
Guillaume Chatelet 3cc4835c00 Use Align for TFL::TransientStackAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69216

llvm-svn: 375398
2019-10-21 08:31:25 +00:00
Zinovy Nis 5fa36e42c4 Fix buildbot error in SIRegisterInfo.cpp.
llvm-svn: 375373
2019-10-20 20:01:16 +00:00
Matt Arsenault e5be543a55 AMDGPU: Increase vcc liveness scan threshold
Avoids a test regression in a future patch. Also add debug printing on
this case, so I waste less time debugging folds in the future.

llvm-svn: 375367
2019-10-20 17:44:17 +00:00
Matt Arsenault 7cd57dcd5b AMDGPU: Split flat offsets that don't fit in DAG
We handle it this way for some other address spaces.

Since r349196, SILoadStoreOptimizer has been trying to do this. This
is after SIFoldOperands runs, which can change the addressing
patterns. It's simpler to just split this earlier.

llvm-svn: 375366
2019-10-20 17:34:44 +00:00
Matt Arsenault 1aad3835f8 AMDGPU: Fix missing OPERAND_IMMEDIATE
llvm-svn: 375365
2019-10-20 16:56:10 +00:00
Matt Arsenault fc205f1d11 AMDGPU: Don't re-get the subtarget
It's already available in the class.

llvm-svn: 375363
2019-10-20 16:26:26 +00:00
Matt Arsenault 8a8b317460 AMDGPU: Don't error on calls to null or undef
Calls to constants should probably be generally handled.

llvm-svn: 375356
2019-10-20 07:46:04 +00:00
Reid Kleckner 904cd3e06b Prune a LegacyDivergenceAnalysis and MachineLoopInfo include each
Now X86ISelLowering doesn't depend on many IR analyses.

llvm-svn: 375320
2019-10-19 01:31:09 +00:00
Reid Kleckner 1d7b41361f Prune two MachineInstr.h includes, fix up deps
MachineInstr.h included AliasAnalysis.h, which includes a world of IR
constructs mostly unneeded in CodeGen. Prune it. Same for
DebugInfoMetadata.h.

Noticed with -ftime-trace.

llvm-svn: 375311
2019-10-19 00:22:07 +00:00
Stanislav Mekhanoshin 0fab220eb5 [AMDGPU] move PHI nodes to AGPR class
If all uses of a PHI are in AGPR register class we should
avoid unneeded copies via VGPRs.

Differential Revision: https://reviews.llvm.org/D69200

llvm-svn: 375297
2019-10-18 22:48:45 +00:00
Jay Foad a9aa4ec6a3 [AMDGPU] Remove -amdgpu-spill-sgpr-to-smem.
Summary: The implementation was never completed and never used except in tests.

Reviewers: arsenm, mareko

Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69163

llvm-svn: 375293
2019-10-18 21:48:22 +00:00
Quentin Colombet 9f9151d494 [GISel][CallLowering] Make isIncomingArgumentHandler a pure virtual method
The default implementation of isIncomingArgumentHandler could lead
to generating incorrect code.
Make it a pure virtual method, so that targets know they have to
override it to produce correct code.

NFC

Differential Revision: https://reviews.llvm.org/D69187

llvm-svn: 375277
2019-10-18 20:13:42 +00:00
Matt Arsenault f9a42ed0a7 AMDGPU: Relax 32-bit SGPR register class
Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This
will allow the register coalescer to do a better job eliminating
copies to m0.

For GlobalISel, as a terrible hack, use SGPR_32 for things that should
use SCC until booleans are solved.

llvm-svn: 375267
2019-10-18 18:26:37 +00:00
Austin Kerbow 2f41a023af AMDGPU: Fix SMEM WAR hazard for gfx10 readlane
Summary: Hazard recognizer fails to see hazard with V_READLANE_B32_gfx10.

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69172

llvm-svn: 375265
2019-10-18 18:20:30 +00:00
Dmitry Preobrazhensky 6c7d7eebda [AMDGPU][MC][GFX10] Added sdwa/dpp versions of v_cndmask_b32
See https://bugs.llvm.org/show_bug.cgi?id=43608

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D69096

llvm-svn: 375241
2019-10-18 14:49:53 +00:00
Dmitry Preobrazhensky 7d325fe57b [AMDGPU][MC][GFX9] Corrected parsing of v_cndmask_b32_sdwa
See https://bugs.llvm.org/show_bug.cgi?id=43607

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D69095

llvm-svn: 375231
2019-10-18 13:31:53 +00:00