Commit Graph

4278 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin 555d8f4ef5 [AMDGPU] Bundle loads before post-RA scheduler
We are relying on artificial DAG edges inserted by the
MemOpClusterMutation to keep loads and stores together in the
post-RA scheduler. This does not always work, since it still
allows a completely independent instruction to be scheduled in the
middle of the cluster.

Removed the DAG mutation and added a pass to bundle already
clustered instructions. These bundles are unpacked before the
memory legalizer, both because it does not work with bundles and
because unpacking allows waitcounts to be inserted in the middle of
a store cluster.

Removing the artificial edges also allows more relaxed scheduling.
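A rough sketch of the bundling idea, assuming the cluster mutation already placed related memory ops next to each other; the helper name and the run detection are illustrative, not the actual pass code (the real pass applies additional register-dependency checks):

```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstrBundle.h"
#include <iterator>
using namespace llvm;

// Glue each maximal run of adjacent memory instructions into one bundle so
// the post-RA scheduler cannot place unrelated instructions between them.
static void bundleClusteredMemOps(MachineBasicBlock &MBB) {
  auto I = MBB.instr_begin(), E = MBB.instr_end();
  while (I != E) {
    if (!I->mayLoadOrStore()) {
      ++I;
      continue;
    }
    auto First = I;
    while (I != E && I->mayLoadOrStore())
      ++I;
    if (std::next(First) != I)
      finalizeBundle(MBB, First, I); // creates a BUNDLE covering [First, I)
  }
}
```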

Differential Revision: https://reviews.llvm.org/D72737
2020-01-24 11:33:38 -08:00
Stanislav Mekhanoshin 44b865fa7f [AMDGPU] Allow narrowing multi-dword loads
Currently the backend allows only limited load narrowing because
of the concern that it will produce sub-dword ext loads. However,
we can always allow narrowing when we are shrinking one
multi-dword load to another multi-dword load.

In particular, we were unable to reduce s_load_dwordx8 to
s_load_dwordx4 when an identity shuffle was used to extract the
low 4 dwords.

Differential Revision: https://reviews.llvm.org/D73133
2020-01-24 11:03:41 -08:00
Austin Kerbow c226646337 Resubmit: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI
Summary:
Enable the new divergence analysis by default for AMDGPU.

Resubmit with test updates since GPUDA was causing failures on Windows.

Reviewers: rampitec, nhaehnle, arsenm, thakis

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73315
2020-01-24 10:39:40 -08:00
Guillaume Chatelet 805c157e8a [Alignment][NFC] Deprecate Align::None()
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler's understanding of the `Log2_64` implementation to produce good code. One could use `Align()` as a replacement, but I believe it is less clear in that case that the alignment is one.
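A minimal sketch of the migration (function name and alignment values are hypothetical, not from the patch):

```cpp
#include "llvm/Support/Alignment.h"
using namespace llvm;

// Spell the one-byte alignment explicitly rather than using the deprecated
// Align::None().
Align pickAlignment(bool Packed) {
  // Before: return Packed ? Align::None() : Align(4);
  return Packed ? Align(1) : Align(4);
}
```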

Reviewers: xbolva00, courbet, bollu

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73099
2020-01-24 12:53:58 +01:00
Changpeng Fang 2531535984 AMDGPU: Implement FDIV optimizations in AMDGPUCodeGenPrepare
Summary:
RCP has an accuracy limit. If the FDIV's fpmath requires higher
accuracy, rcp may not meet the requirement. However, in DAG lowering
the fpmath information gets lost, and thus we may generate either
inaccurate rcp-related computation or slow code for fdiv.

This patch implements the fdiv optimizations in AMDGPUCodeGenPrepare,
which still knows the exact !fpmath.

FastUnsafeRcpLegal: We determine whether it is legal to use rcp based
                    on unsafe-fp-math, fast math flags, denormals and
                    the fpmath accuracy request.

RCP optimizations:
  1/x -> rcp(x)   when fast unsafe rcp is legal, or fpmath >= 2.5ULP
                  with denormals flushed.
  a/b -> a*rcp(b) when fast unsafe rcp is legal.

Use fdiv.fast:
  a/b -> fdiv.fast(a, b) when the RCP optimization is not performed
                         and fpmath >= 2.5ULP with denormals flushed.
  1/x -> fdiv.fast(1, x) when the RCP optimization is not performed
                         and fpmath >= 2.5ULP with denormals.
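For reference, a minimal sketch (not from the patch; names are illustrative) of how an fdiv carries the !fpmath accuracy request that the pass can now inspect:

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

// Emit a division annotated with the maximum error (in ULPs) the frontend
// will tolerate; AMDGPUCodeGenPrepare keys its rcp/fdiv.fast decisions off
// this metadata.
Value *emitDivWithAccuracy(IRBuilder<> &B, Value *Num, Value *Den,
                           float AllowedULPs) {
  Value *Div = B.CreateFDiv(Num, Den);
  if (auto *I = dyn_cast<Instruction>(Div)) {
    LLVMContext &Ctx = I->getContext();
    Metadata *Acc = ConstantAsMetadata::get(
        ConstantFP::get(Type::getFloatTy(Ctx), AllowedULPs));
    I->setMetadata(LLVMContext::MD_fpmath, MDNode::get(Ctx, Acc));
  }
  return Div;
}
```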

Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D71293
2020-01-23 16:57:43 -08:00
Matt Arsenault 86e5b56a7c AMDGPU/GlobalISel: Fix RegBankSelect for llvm.amdgcn.exp.compr
This wasn't updated for the immarg handling change. We really need a
verifier for this.
2020-01-23 13:30:46 -08:00
Matt Arsenault fac9941e57 AMDGPU: Fix ubsan error
Since register classes go up to 1024 bits (32 elements), all mask bits
are needed, and a 32-bit shift by 32 is illegal. We didn't have any
instructions theoretically using a 32-element VGPR before
d1dbb5e471.
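The underlying C++ pitfall, as a standalone illustration (names and values are hypothetical):

```cpp
#include <cstdint>

// Building an N-lane mask in a 32-bit integer: shifting a 32-bit value by 32
// is undefined behaviour, so the all-lanes case must be handled separately
// (or the mask built in a wider type).
uint32_t laneMask(unsigned NumLanes) {
  return NumLanes >= 32 ? ~0u : ((1u << NumLanes) - 1u);
}
```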
2020-01-23 15:05:47 -05:00
Matt Arsenault 618fa77ae4 AMDGPU/GlobalISel: Select V_ADD3_U32/V_XOR3_B32
The other 3-op patterns should also be theoretically handled, but
currently there's a bug in the inferred pattern complexity.

I'm not sure what the error handling strategy should be for potential
constant bus violations. I think the correct strategy is to never
produce mixed SGPR and VGPR operands in a typical VOP instruction,
which will trivially avoid them. However, it's possible to still have
hand written MIR (or erroneously transformed code) with these
operands. When these fold, the restriction will be violated. We
currently don't have any verifiers for reg bank legality. For now,
just ignore the restriction.

It might be worth triggering a DAG fallback on verifier error.
2020-01-23 12:04:20 -05:00
Guillaume Chatelet 59f95222d4 [Alignment][NFC] Use Align with CreateAlignedStore
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
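A minimal sketch of the call-site change, assuming an existing builder (value, pointer and alignment are hypothetical):

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Store call sites now pass an explicit Align instead of a raw unsigned.
void storeAligned(IRBuilder<> &Builder, Value *Val, Value *Ptr) {
  // Before: Builder.CreateAlignedStore(Val, Ptr, 4);
  Builder.CreateAlignedStore(Val, Ptr, Align(4));
}
```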

Reviewers: courbet, bollu

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73274
2020-01-23 17:34:32 +01:00
Matt Arsenault dfec702290 AMDGPU: Check for other uses when looking through casted select
Fixes mesa regression on ext_transform_feedback-max-varyings
2020-01-23 11:31:24 -05:00
Guillaume Chatelet 279fa8e006 [Alignment][NFC] Deprecate untyped CreateAlignedLoad
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
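For illustration, a hedged sketch of the typed form that replaces the deprecated untyped call (builder, pointer and alignment are hypothetical):

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Name the loaded type explicitly instead of letting the builder infer it
// from the pointer operand.
Value *loadCounter(IRBuilder<> &Builder, Value *Ptr) {
  // Before (untyped, now deprecated): Builder.CreateAlignedLoad(Ptr, 4);
  return Builder.CreateAlignedLoad(Builder.getInt32Ty(), Ptr, Align(4));
}
```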

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73260
2020-01-23 13:34:32 +01:00
Matt Arsenault 4d14772f5c AMDGPU/GlobalISel: Remove redundant or patterns
These ended up with higher priority than the or3 patterns in a future
patch. This also fixes their use of the VOP2 forms.
2020-01-22 21:45:51 -05:00
Jan Vesely 1b8eab179d AMDGPU/R600: Emit rodata in text segment
R600 relies on this behaviour.
Fixes: 6e18266aa4 ('Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"')
Fixes ~100 piglit regressions since 6e18266

Differential Revision: https://reviews.llvm.org/D72991
2020-01-22 14:31:51 -05:00
Nico Weber cd470717d1 Revert "[DA][TTI][AMDGPU] Add option to select GPUDA with TTI"
This reverts commit a90a6502ab.
Broke tests on Windows: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/13808
2020-01-22 12:56:19 -05:00
Matt Arsenault 1192d7b254 AMDGPU/GlobalISel: Handle 16-bank LDS llvm.amdgcn.interp.p1.f16
The pattern is also mishandled by the generated matcher, so work
around this as in the DAG path.

The existing DAG tests aren't particularly targeted to just this one
intrinsic. These also end up differing in scheduling from SGPR->VGPR
operand constraint copies.
2020-01-22 12:10:59 -05:00
Matt Arsenault c05f23e409 AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp
This is deprecated, but easy to support.
2020-01-22 11:43:53 -05:00
Matt Arsenault dd09ec1208 AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp8 2020-01-22 11:43:40 -05:00
Matt Arsenault 0bf434ccd5 AMDGPU: Fix element size assertion
The GlobalISel usage called this with bits, but the DAG usage was
incorrectly using bytes.
2020-01-22 11:18:45 -05:00
Matt Arsenault bb562d1af0 AMDGPU/GlobalISel: Keep G_BITCAST out of waterfall loop
The waterfall utility function blindly inserts a phi for every def in
the loop. We don't need this one to be preserved for every
iteration. Saves an extra phi and copy inside the loop body.
2020-01-22 11:16:19 -05:00
Matt Arsenault 52ec7379ad AMDGPU/GlobalISel: Fold add of constant into G_INSERT_VECTOR_ELT
Move the subregister base like in the extract case.
2020-01-22 11:09:15 -05:00
Matt Arsenault d1dbb5e471 AMDGPU/GlobalISel: Select G_INSERT_VECTOR_ELT 2020-01-22 11:00:49 -05:00
Matt Arsenault 3524d4412c AMDGPU/GlobalISel: Fix RegBankSelect for G_INSERT_VECTOR_ELT
The result and source vector are going to be tied, so these need to be
the same bank.

The inserted value also needs to be broken down based on the result
bank, not the inserted value itself.
2020-01-22 10:57:50 -05:00
Matt Arsenault e3d352c541 AMDGPU/GlobalISel: Fold constant offset vector extract indexes
Handle dynamic vector extracts whose index is an add of a constant
offset by folding the constant into the base subregister of the
indexing operation.

Force the add into the loop in regbankselect, which will be recognized
when selected.
2020-01-22 10:50:59 -05:00
Matt Arsenault e93e1b621c AMDGPU: Fix typo 2020-01-22 10:17:46 -05:00
Matt Arsenault 2fe500ab5b AMDGPU: Look through casted selects to constant fold bin ops
The promotion of the uniform select to i32 interfered with this fold.
2020-01-22 10:16:39 -05:00
Matt Arsenault bcd91778fe AMDGPU: Do binop of select of constant fold in AMDGPUCodeGenPrepare
DAGCombiner does this, but the divisions expanded here miss the
optimization. Since 67aa18f165, divisions have been expanded in
AMDGPUCodeGenPrepare and so missed out on it. Avoids test
regressions in a future patch.
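A rough sketch of the shape of the fold (hypothetical helper, not the pass's actual code), assuming an integer binop whose select operand has constant arms and whose other operand is constant:

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// binop(select(cond, C1, C2), C3) --> select(cond, C1 op C3, C2 op C3)
static Value *foldBinOpOfSelect(BinaryOperator &BO, IRBuilder<> &B) {
  auto *Sel = dyn_cast<SelectInst>(BO.getOperand(0));
  auto *RHS = dyn_cast<Constant>(BO.getOperand(1));
  if (!Sel || !RHS || !isa<Constant>(Sel->getTrueValue()) ||
      !isa<Constant>(Sel->getFalseValue()))
    return nullptr;
  // With constant operands the builder's folder collapses both ops to
  // constants, so the original binop (e.g. an expanded division) disappears.
  Value *NewT = B.CreateBinOp(BO.getOpcode(), Sel->getTrueValue(), RHS);
  Value *NewF = B.CreateBinOp(BO.getOpcode(), Sel->getFalseValue(), RHS);
  return B.CreateSelect(Sel->getCondition(), NewT, NewF);
}
```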
2020-01-22 10:16:39 -05:00
Matt Arsenault a174f0da62 AMDGPU/GlobalISel: Add pre-legalize combiner pass
Just copy the AArch64 pass as-is for now, except for removing the
memcpy handling.
2020-01-22 10:16:39 -05:00
Jay Foad e0f0d0e55c [MachineScheduler] Allow clustering mem ops with complex addresses
The generic BaseMemOpClusterMutation calls into TargetInstrInfo to
analyze the address of each load/store instruction, and again to decide
whether two instructions should be clustered. Previously this had to
represent each address as a single base operand plus a constant byte
offset. This patch extends it to support any number of base operands.

The old target hook getMemOperandWithOffset is now a convenience
function for callers that are only prepared to handle a single base
operand. It calls the new more general target hook
getMemOperandsWithOffset.
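An illustrative reconstruction of that wrapper (the exact parameter list in TargetInstrInfo may differ slightly; treat this as a sketch, not a drop-in definition):

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
using namespace llvm;

// The old single-base hook succeeds only when the general hook reports
// exactly one base operand.
bool TargetInstrInfo::getMemOperandWithOffset(
    const MachineInstr &MI, const MachineOperand *&BaseOp, int64_t &Offset,
    const TargetRegisterInfo *TRI) const {
  SmallVector<const MachineOperand *, 4> BaseOps;
  if (!getMemOperandsWithOffset(MI, BaseOps, Offset, TRI) ||
      BaseOps.size() != 1)
    return false;
  BaseOp = BaseOps.front();
  return true;
}
```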

The only requirements for the base operands returned by
getMemOperandsWithOffset are:
- they can be sorted by MemOpInfo::Compare, such that clusterable ops
  get sorted next to each other, and
- shouldClusterMemOps knows what they mean.

One simple follow-on is to enable clustering of AMDGPU FLAT instructions
with both vaddr and saddr (base register + offset register). I've left
a FIXME in the code for this case.

Differential Revision: https://reviews.llvm.org/D71655
2020-01-22 14:28:24 +00:00
Matt Arsenault 70096ca111 AMDGPU/GlobalISel: Fix RegbankSelect for llvm.amdgcn.fmul.legacy 2020-01-22 09:26:17 -05:00
Matt Arsenault a722cbf77c AMDGPU/GlobalISel: Handle atomic_inc/atomic_dec
The intermediate instruction drops the extra volatile argument. We are
missing an atomic ordering on these.
2020-01-22 09:26:17 -05:00
Matt Arsenault 9c928649a0 AMDGPU: Fix interaction of tfe and d16
This was using the wrong result register and dropping the result
entirely for v2f16. It would fail to select in the scalar case. I
believe it was also mishandling packed/unpacked subtargets.
2020-01-22 09:26:17 -05:00
Matt Arsenault b94d3b9b77 AMDGPU/GlobalISel: RegBankSelect interp intrinsics
Note this assumes the future use of immediates for immarg, not the
G_CONSTANT that is currently emitted.
2020-01-22 09:01:34 -05:00
Matt Arsenault 64e9528201 AMDGPU: Fix missing immarg on llvm.amdgcn.interp.mov
The first operand maps to an immediate field, so this should be
immarg.
2020-01-22 09:01:34 -05:00
Austin Kerbow a90a6502ab [DA][TTI][AMDGPU] Add option to select GPUDA with TTI
Summary: Enable the new divergence analysis by default for AMDGPU.

Reviewers: rampitec, nhaehnle, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73049
2020-01-21 21:13:20 -08:00
Carl Ritson 6b4b3e2856 [AMDGPU] SIRemoveShortExecBranches should not remove branches exiting loops
Summary:
Check that an s_cbranch_execz is not a loop exit before removing it,
as the pass was generating infinite loops.

Reviewers: cdevadas, arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits, dstuttard, foad

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72997
2020-01-22 13:18:40 +09:00
cdevadas e53a9d96e6 Resubmit: [AMDGPU] Invert the handling of skip insertion.
The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an optional pass. This patch inserts the s_cbranch_execz upfront
during SILowerControlFlow to skip over the sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a side effect and the skip branch is
really necessary.

This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.

Differential revision: https://reviews.llvm.org/D68092
2020-01-22 13:18:32 +09:00
Amara Emerson 67a8775322 [AArch64] Don't generate gpr CSEL instructions in early-ifcvt if regclasses aren't compatible.
In GlobalISel we may in some unfortunate circumstances generate PHIs with
operands that are on separate banks. If-conversion doesn't currently check for
that case and ends up generating a CSEL on AArch64 with incorrect register
operands.

Differential Revision: https://reviews.llvm.org/D72961
2020-01-21 16:51:31 -08:00
Matt Arsenault e47965bf64 AMDGPU/GlobalISel: Merge trivial legalize rules
Also move constant-like rules together
2020-01-21 17:37:19 -05:00
Matt Arsenault 9a5a6e9465 AMDGPU/GlobalISel: Merge G_PTR_ADD/G_PTR_MASK rules 2020-01-21 16:57:01 -05:00
Matt Arsenault fd109308a7 AMDGPU/GlobalISel: Legalize G_PTR_ADD for arbitrary pointers
Pointers of unrecognized address spaces should be treated as
global-like pointers. Even if loads and stores of them aren't handled,
dumb operations that just operate on the bits should work.
2020-01-21 16:35:36 -05:00
Krzysztof Parzyszek 020041d99b Update spelling of {analyze,insert,remove}Branch in strings and comments
These names have been changed from CamelCase to camelCase, but there were
many places (comments mostly) that still used the old names.

This change is NFC.
2020-01-21 10:15:38 -06:00
Nicolai Hähnle a80291ce10 Revert "[AMDGPU] Invert the handling of skip insertion."
This reverts commit 0dc6c249bf.

The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for
Mesa.
2020-01-21 09:17:25 +01:00
Fangrui Song 5721483b64 [AMDGPU] Fix -Wunused-variable after e5823bf806 2020-01-20 22:41:13 -08:00
Matt Arsenault c72aa27f91 AMDGPU/GlobalISel: Fix RegBankSelect for llvm.amdgcn.ps.live 2020-01-20 23:21:53 -05:00
Matt Arsenault e5823bf806 AMDGPU: Don't create weird sized integers
There's no reason to introduce a new, unnaturally sized value
here. This has a chance to produce worse code with
legalization. Avoids regression in a future patch.
2020-01-20 20:02:54 -05:00
Matt Arsenault 9b13b4a0e3 AMDGPU: Prepare to use scalar register indexing
Define pseudos mirroring the VGPR indexing ones, and adjust the
operands in the s_movrel* instructions to avoid the result def.
2020-01-20 17:19:16 -05:00
Matt Arsenault 8615eeb455 AMDGPU: Partially merge indirect register write handling
a785209bc2 switched to using pseudos instead of manually tying
operands on the regular instruction. The VGPR indexing mode path
should have the same problems that change attempted to avoid, so these
should use the same strategy.

Use a single pseudo for the VGPR indexing mode and movreld paths, and
expand it based on the subtarget later. These have essentially the
same constraints, reading the index from m0.

Switch from using an offset to the subregister index directly, instead
of computing an offset and re-adding it back. Also add missing pseudos
for existing register class sizes.
2020-01-20 17:19:16 -05:00
Matt Arsenault f6418d72f5 AMDGPU/GlobalISel: Add documentation for RegisterBankInfo
Document some high level strategies that should be used for register
bank selection. The constant bus restriction section hasn't actually
been implemented yet.
2020-01-20 15:41:25 -05:00
Fangrui Song 8e8a75ad50 [TargetRegisterInfo] Default trackLivenessAfterRegAlloc() to true
Except AMDGPU/R600RegisterInfo (a bunch of MIR tests seem to have
problems), every target overrides it with true. PostMachineScheduler
requires livein information. Not providing it can cause assertion
failures in ScheduleDAGInstrs::addSchedBarrierDeps().
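Roughly, the per-target boilerplate this default removes looks like the following (target name is hypothetical); overrides of this shape can now simply be dropped:

```cpp
#include "llvm/CodeGen/TargetRegisterInfo.h"
using namespace llvm;

// Hypothetical target: with the new base-class default returning true, an
// override like this one is redundant and can be deleted.
class MyTargetRegisterInfo : public TargetRegisterInfo {
  bool trackLivenessAfterRegAlloc(const MachineFunction &) const override {
    return true;
  }
};
```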
2020-01-19 14:20:37 -08:00
Michael Liao 6d0d86a64d [DAG] Add helper for creating constant vector index with correct type. NFC. 2020-01-18 01:23:36 -05:00