Commit Graph

201 Commits

Sebastian Neubauer 31a0b2834f [AMDGPU] Fix iterating in SIFixSGPRCopies
The insertion of waterfall loops splits the current basic block into
three blocks, so the basic block that we iterate over must be updated.

Previously, this hit assert(!NodePtr->isKnownSentinel()) in
ilist_iterator for divergent calls in branches.
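
For illustration, a minimal sketch of the corrected iteration pattern,
assuming a hypothetical helper processInstruction() that may split the
current block and return the block now holding the remaining
instructions (instruction iterators stay valid because instructions are
moved, not recreated; only the block and end iterators go stale):

  #include "llvm/CodeGen/MachineFunction.h"
  using namespace llvm;

  MachineBasicBlock *processInstruction(MachineInstr &MI); // may split

  void visitBlocks(MachineFunction &MF) {
    for (auto BI = MF.begin(), BE = MF.end(); BI != BE; ++BI) {
      MachineBasicBlock *MBB = &*BI;
      for (auto I = MBB->begin(), E = MBB->end(); I != E; ++I) {
        if (MachineBasicBlock *NewBB = processInstruction(*I)) {
          if (NewBB != MBB) {
            MBB = NewBB;               // resume in the block we moved to
            BI = NewBB->getIterator(); // keep the outer loop in sync
            E = MBB->end();            // the old end iterator is stale
          }
        }
      }
    }
  }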

Differential Revision: https://reviews.llvm.org/D90596
2020-11-04 18:43:19 +01:00
Stanislav Mekhanoshin c9d6fe6f7d [AMDGPU] Improve FLAT scratch detection
We were using too broad a check, isFLATScratch(), which also
includes FLAT global.

Differential Revision: https://reviews.llvm.org/D90505
2020-11-02 11:37:33 -08:00
Stanislav Mekhanoshin 038d884a50 [AMDGPU] Use flat scratch instructions where available
The support is disabled by default. So far it covers instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.
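
As a rough illustration of the two SP conventions (a sketch under the
usual assumption that a wave-wide, unswizzled byte offset is the
per-lane, swizzled offset scaled by the wave size):

  // MUBUF-style (unswizzled) SP recovered from a flat-scratch-style
  // (swizzled, per-lane) SP; the scale factor is the wave size.
  unsigned getUnswizzledSP(unsigned SwizzledSP, unsigned WaveSize) {
    return SwizzledSP * WaveSize;
  }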

At the very least missing:

- GlobalISel;
- Some optimizations in frame elimination between the vector
  and scalar ALU;
- It should finally allow always materializing a frame index
  as an SGPR, but that is not implemented and frame elimination
  cannot handle it yet;
- Unaligned and/or multi-dword flat scratch should work, but it
  is currently legalized for MUBUF;
- Operand folding cannot yet optimize FI as it does with MUBUF;
- It will need to scale the value of the SP/FP in the DWARF
  expression to recover the unswizzled scratch address;

Differential Revision: https://reviews.llvm.org/D89170
2020-10-26 14:40:42 -07:00
Austin Kerbow 37d907899f [HazardRec] Allow inserting multiple wait-states simultaneously
If a target can encode multiple wait-states into a noop, allow emitting
such instructions directly.
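
A hedged sketch of what this enables on AMDGPU, assuming s_nop's
immediate can encode up to 8 wait-states per instruction (the hook name
follows the patch; the body is illustrative, not the committed code):

  #include "llvm/CodeGen/MachineInstrBuilder.h"
  #include <algorithm>
  using namespace llvm;

  void insertNoops(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
                   unsigned Quantity, const SIInstrInfo &TII) {
    DebugLoc DL;
    while (Quantity > 0) {
      // Encode as many wait-states as one s_nop allows (assumed max 8).
      unsigned Arg = std::min(Quantity, 8u);
      BuildMI(MBB, MI, DL, TII.get(AMDGPU::S_NOP)).addImm(Arg - 1);
      Quantity -= Arg;
    }
  }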

Reviewed By: rampitec, dmgreen

Differential Revision: https://reviews.llvm.org/D89753
2020-10-20 17:03:47 -07:00
Sebastian Neubauer b4264210f2 [AMDGPU] Remove SIInstrInfo::calculateLDSSpillAddress
This function does not seem to be used anymore.

Differential Revision: https://reviews.llvm.org/D88904
2020-10-06 18:45:22 +02:00
Jay Foad e20f459229 [AMDGPU] Simplify getNumFlatOffsetBits. NFC.
Remove some checks that have already been done in the only caller.
2020-10-01 15:24:09 +01:00
Jay Foad 98de0d22f5 [AMDGPU] Apply llvm-prefer-register-over-unsigned from clang-tidy 2020-08-21 10:14:35 +01:00
Matt Arsenault e1a2f4713c AMDGPU: Match global saddr addressing mode
The previous implementation was incorrect and based on incorrect
instruction definitions. Unfortunately we can't match natural
addressing in a lot of cases due to the shift/scale applied in
getelementptrs. This relies on reducing the 64-bit shift to 32 bits.
2020-08-17 15:28:14 -04:00
Matt Arsenault d188a608bd AMDGPU: Fix code duplication between the selectors
Not sure this is the right place for this helper.
2020-08-06 10:42:15 -04:00
Piotr Sobczak 0045786f14 [AMDGPU] Select s_cselect
Summary:
Add patterns to select s_cselect in the isel.

Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Re-commit D81925 with a bugfix D82370.

Differential Revision: https://reviews.llvm.org/D81925
Differential Revision: https://reviews.llvm.org/D82370
2020-06-25 10:38:23 +02:00
Matt Arsenault 778351df77 Revert "[AMDGPU] Enable compare operations to be selected by divergence"
This reverts commit 521ac0b5ce.

Reported to break thousands of piglit tests.
2020-06-24 11:21:30 -04:00
alex-t 521ac0b5ce [AMDGPU] Enable compare operations to be selected by divergence
Summary: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent.
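
For illustration only (the actual change adds selection patterns, not
this exact code), the rule amounts to choosing a scalar or vector
compare opcode based on divergence:

  #include "llvm/CodeGen/SelectionDAGNodes.h"
  using namespace llvm;

  unsigned selectCmpOpcode(const SDNode *N) {
    // Uniform compares can go to the scalar ALU; divergent ones must
    // use the per-lane vector compare.
    return N->isDivergent() ? AMDGPU::V_CMP_LT_I32_e64
                            : AMDGPU::S_CMP_LT_I32;
  }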

Reviewers: rampitec, arsenm

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82194
2020-06-24 11:50:40 +03:00
Piotr Sobczak 6d9565d6d5 Revert "[AMDGPU] Select s_cselect"
This caused some failures detected by the buildbot with
expensive checks enabled.

This reverts commit 4067de569f.
2020-06-19 16:41:04 +02:00
Piotr Sobczak 4067de569f [AMDGPU] Select s_cselect
Summary:
Add patterns to select s_cselect in the isel.

Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81925
2020-06-19 16:17:46 +02:00
hsmahesha 29c17ed96e [AMDGPU/MemOpsCluster] Code clean-up around accessing of memory operand width
Summary:
Clean up the logic that computes the width of a given memory operand, and
rearrange code to avoid duplication.

Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar

Reviewed By: foad

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80946
2020-06-03 14:03:52 +05:30
Matt Arsenault 452e0d9023 AMDGPU: Don't run mode switches with exec 0
These are scalar instructions that change the behavior of vector
instructions, so they should not be executed when no lanes are active.

The implementation of -amdgpu-skip-threshold also seems to be backwards
from expected, since decreasing it prevents removal.
2020-06-02 13:47:48 -04:00
hsmahesha 0ed2c04636 [AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes
Summary:
While clustering mem ops, the AMDGPU target needs to consider the number
of clustered bytes to decide on the maximum number of mem ops that can be
clustered. This patch adds support for passing the number of clustered
bytes to the target's mem ops clustering logic.
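
A hedged sketch of the SIInstrInfo hook after this change (the signature
follows the patch; the heuristic constants are illustrative):

  bool shouldClusterMemOps(ArrayRef<const MachineOperand *> BaseOps1,
                           ArrayRef<const MachineOperand *> BaseOps2,
                           unsigned NumLoads, unsigned NumBytes) const {
    const unsigned LoadSize = NumBytes / NumLoads; // bytes per mem op
    const unsigned NumDWords = ((LoadSize + 3) / 4) * NumLoads;
    return NumDWords <= 8; // assumed cap on clustered DWORDs
  }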

Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar

Reviewed By: foad

Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80545
2020-06-01 22:52:34 +05:30
Stanislav Mekhanoshin 9ef166e657 [AMDGPU] Fix FoldImmediate for 16 bit operand
Differential Revision: https://reviews.llvm.org/D79362
2020-05-05 10:19:14 -07:00
Matt Arsenault 30ebafaa56 CodeGen: Convert some TII hooks to use Register 2020-04-03 14:52:54 -04:00
Matt Arsenault 178050c3ba AMDGPU: Use Register in more places 2020-04-03 14:52:54 -04:00
Sander de Smalen 8fbc925807 Add OffsetIsScalable to getMemOperandWithOffset
Summary:
Making `Scale` a `TypeSize` in AArch64InstrInfo::getMemOpInfo
has the effect that all places where this information is used
(notably, TargetInstrInfo::getMemOperandWithOffset) will need
to consider Scale (and the derived Offset) possibly being scalable.

This patch adds a new operand `bool &OffsetIsScalable` to
TargetInstrInfo::getMemOperandWithOffset and fixes up all
the places where this function is used, to consider the
offset possibly being scalable.

In most cases, this means bailing out because the algorithm does not
(or cannot) support scalable offsets in places where it does some
form of alias checking, for example.
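
A sketch of how a caller adapts (the signature matches the patch; the
surrounding logic is illustrative):

  // Returns true if MI's address was analyzed and the offset is not
  // scalable; alias-style checks bail out on scalable offsets.
  bool hasFixedSizeAddress(const MachineInstr &MI, const TargetInstrInfo *TII,
                           const TargetRegisterInfo *TRI) {
    const MachineOperand *BaseOp;
    int64_t Offset;
    bool OffsetIsScalable;
    if (!TII->getMemOperandWithOffset(MI, BaseOp, Offset, OffsetIsScalable,
                                      TRI))
      return false;
    return !OffsetIsScalable;
  }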

Reviewers: rovka, efriedma, kristof.beyls

Reviewed By: efriedma

Subscribers: wuzish, kerbowa, MatzeB, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, javed.absar, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72758
2020-02-18 15:53:29 +00:00
Stanislav Mekhanoshin cacc3b7a55 [AMDGPU] Cleanup assumptions about generated subregs
We use countPopulation on a LaneBitmask to determine the number of
registers it covers. This assumption does not necessarily have to hold.
It is left unchanged here, but factored into a single call,
SIRegisterInfo::getNumCoveredRegs().

Some other places are cleaned up with respect to assumptions about
subreg index values and TableGen behavior.
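
A minimal sketch of the factored-out helper (the name is from the
message; the exact body is assumed):

  #include "llvm/MC/LaneBitmask.h"
  #include "llvm/Support/MathExtras.h"
  using namespace llvm;

  // Number of registers covered by a lane mask, under the assumption
  // that each covered register contributes one set bit.
  static unsigned getNumCoveredRegs(LaneBitmask LM) {
    return countPopulation(LM.getAsInteger());
  }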

Differential Revision: https://reviews.llvm.org/D74177
2020-02-06 17:39:24 -08:00
Simon Moll 5c8ba508b2 [NFC] unsigned->Register in storeRegTo/loadRegFromStack
Summary:
This patch makes progress on the 'unsigned -> Register' rewrite for
`TargetInstrInfo::loadRegFromStack` and `TII::storeRegToStack`.

Reviewers: arsenm, craig.topper, uweigand, jpienaar, atanasyan, venkatra, robertlytton, dylanmckay, t.p.northover, kparzysz, tstellar, k-ishizaka

Reviewed By: arsenm

Subscribers: wuzish, merge_guards_bot, jyknight, sdardis, nemanjai, jvesely, wdng, nhaehnle, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73870
2020-02-03 14:22:16 +01:00
Stanislav Mekhanoshin c2ad7ee1a9 [AMDGPU] override isHighLatencyDef
SIMachineScheduler uses isHighLatencyInstruction with the same
semantics, but TargetInstrInfo has a virtual isHighLatencyDef
method, so override that instead.

Added FLAT to the list of high-latency opcodes and a check for
mayLoad, since stores are not technically high latency in terms
of data dependency.

This change did not produce any visible impact on our tests.

Differential Revision: https://reviews.llvm.org/D73582
2020-01-29 08:01:29 -08:00
Matt Arsenault d1dbb5e471 AMDGPU/GlobalISel: Select G_INSERT_VECTOR_ELT 2020-01-22 11:00:49 -05:00
Jay Foad e0f0d0e55c [MachineScheduler] Allow clustering mem ops with complex addresses
The generic BaseMemOpClusterMutation calls into TargetInstrInfo to
analyze the address of each load/store instruction, and again to decide
whether two instructions should be clustered. Previously this had to
represent each address as a single base operand plus a constant byte
offset. This patch extends it to support any number of base operands.

The old target hook getMemOperandWithOffset is now a convenience
function for callers that are only prepared to handle a single base
operand. It calls the new more general target hook
getMemOperandsWithOffset.
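
Roughly, the convenience wrapper behaves like this sketch (simplified
from the description above; exact details assumed):

  bool TargetInstrInfo::getMemOperandWithOffset(
      const MachineInstr &MI, const MachineOperand *&BaseOp, int64_t &Offset,
      const TargetRegisterInfo *TRI) const {
    SmallVector<const MachineOperand *, 4> BaseOps;
    if (!getMemOperandsWithOffset(MI, BaseOps, Offset, TRI) ||
        BaseOps.size() != 1)
      return false; // this caller only handles a single base operand
    BaseOp = BaseOps.front();
    return true;
  }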

The only requirements for the base operands returned by
getMemOperandsWithOffset are:
- they can be sorted by MemOpInfo::Compare, such that clusterable ops
  get sorted next to each other, and
- shouldClusterMemOps knows what they mean.

One simple follow-on is to enable clustering of AMDGPU FLAT instructions
with both vaddr and saddr (base register + offset register). I've left
a FIXME in the code for this case.

Differential Revision: https://reviews.llvm.org/D71655
2020-01-22 14:28:24 +00:00
Amara Emerson 67a8775322 [AArch64] Don't generate gpr CSEL instructions in early-ifcvt if regclasses aren't compatible.
In GlobalISel we may in some unfortunate circumstances generate PHIs with
operands that are on separate banks. If-conversion doesn't currently check for
that case and ends up generating a CSEL on AArch64 with incorrect register
operands.

Differential Revision: https://reviews.llvm.org/D72961
2020-01-21 16:51:31 -08:00
Stanislav Mekhanoshin eca4474587 [AMDGPU] Fix getInstrLatency() always returning 1
We do not have an InstrItinerary, so the generic getInstrLatency() was
always defaulting to 1 cycle. We need to use TargetSchedModel instead
to compute an instruction's latency.
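
A small sketch of computing latency through the scheduling model (usage
illustrative):

  #include "llvm/CodeGen/TargetSchedule.h"
  using namespace llvm;

  unsigned getLatency(const MachineInstr &MI, const TargetSubtargetInfo &ST) {
    TargetSchedModel SchedModel;
    SchedModel.init(&ST); // resolves the per-subtarget machine model
    return SchedModel.computeInstrLatency(&MI);
  }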

Differential Revision: https://reviews.llvm.org/D72655
2020-01-14 01:08:30 -08:00
Stanislav Mekhanoshin 987bf8b6c1 Let targets adjust operand latency of bundles
This reverts the AMDGPU DAG mutation implemented in D72487 and gives
a more general way of adjusting BUNDLE operand latency.

It also replaces FixBundleLatencyMutation with the adjustSchedDependency
callback in the AMDGPU backend, fixing not only successor latencies but
predecessors' as well.

Differential Revision: https://reviews.llvm.org/D72535
2020-01-10 14:56:53 -08:00
Stanislav Mekhanoshin cd69e4c74c [AMDGPU] Fix bundle scheduling
Bundles coming to the scheduler were considered free, i.e. zero latency.
Fixed.

Differential Revision: https://reviews.llvm.org/D72487
2020-01-09 15:56:36 -08:00
Matt Arsenault 14d25052a2 AMDGPU: Use ImmLeaf for inline immediate predicates 2020-01-06 17:21:51 -05:00
Matt Arsenault e29ae3799b TII: Fix using Register for a subregister index argument 2019-12-27 16:53:29 -05:00
Dmitry Preobrazhensky 6778a62eb0 [AMDGPU][GFX10] Disabled v_movrel*[sdwa|dpp] opcodes in codegen
These opcodes use indirect register addressing, so they need special handling by codegen (currently missing).

Reviewers: vpykhtin, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D70400
2019-11-20 17:57:50 +03:00
Matt Arsenault e6c9a9af39 Use MCRegister in copyPhysReg 2019-11-11 14:42:33 +05:30
Matt Arsenault d9e0a2942a AMDGPU: Disallow spill folding with m0 copies
readlane and writelane instructions are not allowed to use m0 as the
data operand, so spilling them is tricky and would require an
intermediate SGPR. Constrain the virtual register class in this case to
prevent the inline spiller from folding the m0 operand directly into the
spill instruction.

I copied this hack from AArch64 which has the same problem for $sp.
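
The constraint itself can be sketched in one call (the "SGPR_32 minus
m0" register class is the plausible choice; the surrounding pass code is
omitted and DataReg is a placeholder name):

  // DataReg: the virtual register feeding readlane/writelane. Constrain
  // it so a COPY from m0 cannot be folded into the spill.
  MRI.constrainRegClass(DataReg, &AMDGPU::SReg_32_XM0RegClass);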
2019-10-30 14:56:33 -07:00
Matt Arsenault 7cd57dcd5b AMDGPU: Split flat offsets that don't fit in DAG
We handle it this way for some other address spaces.

Since r349196, SILoadStoreOptimizer has been trying to do this. This
is after SIFoldOperands runs, which can change the addressing
patterns. It's simpler to just split this earlier.

llvm-svn: 375366
2019-10-20 17:34:44 +00:00
Reid Kleckner 1d7b41361f Prune two MachineInstr.h includes, fix up deps
MachineInstr.h included AliasAnalysis.h, which includes a world of IR
constructs mostly unneeded in CodeGen. Prune it. Same for
DebugInfoMetadata.h.

Noticed with -ftime-trace.

llvm-svn: 375311
2019-10-19 00:22:07 +00:00
Stanislav Mekhanoshin 1184c27fa5 [AMDGPU] Support mov dpp with 64 bit operands
We define mov/update dpp intrinsics as overloaded but do not
support i64, which is a practically useful type. Fix the
selection and lowering.

Differential Revision: https://reviews.llvm.org/D68673

llvm-svn: 374910
2019-10-15 16:41:15 +00:00
Changpeng Fang f5524f0451 Remove the AliasAnalysis argument in function areMemAccessesTriviallyDisjoint
Reviewers:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D58360

llvm-svn: 373024
2019-09-26 22:53:44 +00:00
Simon Pilgrim 5f2d8b2618 [TargetInstrInfo] Let findCommutedOpIndices take const MachineInstr&
Neither the base implementation of findCommutedOpIndices nor any in-tree target modifies the instruction passed in, and there is no reason why they would in the future.

Committed on behalf of @hvdijk (Harald van Dijk)

Differential Revision: https://reviews.llvm.org/D66138

llvm-svn: 372882
2019-09-25 14:55:57 +00:00
Stanislav Mekhanoshin 1fb584f7a2 [AMDGPU] Added MI bit IsDOT
NFC, needed for future commit.

Differential Revision: https://reviews.llvm.org/D67669

llvm-svn: 372151
2019-09-17 17:56:13 +00:00
Alexander Timofeev 6524a7a2b9 [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed
Differential Revision: https://reviews.llvm.org/D67101

Reviewers: rampitec, vpykhtin
llvm-svn: 372086
2019-09-17 09:08:58 +00:00
Alexander Timofeev 9ff70132bf Revert for: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion.
llvm-svn: 371873
2019-09-13 17:37:30 +00:00
Alexander Timofeev c2d292f839 [AMDGPU]: PHI Elimination hooks added for custom COPY insertion.
Reviewers: rampitec, vpykhtin

  Differential Revision: https://reviews.llvm.org/D67101

llvm-svn: 371508
2019-09-10 10:58:57 +00:00
Matt Arsenault 216d8ff60b AMDGPU: Don't use frame virtual registers
SGPR spills aren't really handled after SILowerSGPRSpills. In order to
directly control what happens if the scavenger needs to spill, the
scavenger needs to be used directly. There is an alternative to
spilling in these contexts anyway, since the frame register can be
incremented and restored.

This does present another possible issue if spilling is needed for the
unused carry-out when an add is needed. I think this can be avoided by
using a scalar add (although that clobbers SCC, which happens anyway).
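
A sketch of the direct scavenger use described above (the call shape
follows the RegScavenger API of the era; RS and MI are placeholders and
the exact usage is assumed):

  // Ask the scavenger for a free 32-bit SGPR at MI instead of creating
  // a frame virtual register and deferring the decision.
  unsigned TmpSGPR = RS->scavengeRegister(&AMDGPU::SReg_32RegClass, MI,
                                          /*SPAdj=*/0);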

llvm-svn: 370281
2019-08-29 01:13:47 +00:00
Matt Arsenault 35c96598b1 AMDGPU/GlobalISel: Select flat loads
Now that the patterns use the new PatFrag address space support, the
only blocker to importing most load patterns is the addressing mode
complex patterns.

llvm-svn: 366237
2019-07-16 18:05:29 +00:00
Jay Foad 27ec195f39 [AMDGPU] Fix DPP combiner check for exec modification
Summary:
r363675 changed the exec modification helper function, now called
execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks
all instructions in the basic block, even beyond the last use. That
meant that the DPP combiner no longer worked in any basic block that
ended with a control flow instruction, and in particular it didn't work
on code sequences generated by the atomic optimizer.

Fix it by reinstating the old behaviour but in a new helper function
execMayBeModifiedBeforeAnyUse, and limiting the number of instructions
scanned.
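
A simplified sketch of the bounded helper (the scan budget and details
are assumed, not taken from the patch):

  bool execMayBeModifiedBeforeAnyUse(const MachineInstr &DefMI,
                                     const TargetRegisterInfo *TRI) {
    const int MaxInstScan = 20; // assumed budget
    int NumInst = 0;
    const MachineBasicBlock *MBB = DefMI.getParent();
    for (auto I = std::next(DefMI.getIterator()), E = MBB->instr_end();
         I != E; ++I) {
      if (++NumInst > MaxInstScan)
        return true; // scanned too far: conservatively assume a write
      if (I->modifiesRegister(AMDGPU::EXEC, TRI))
        return true;
    }
    return false;
  }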

Reviewers: arsenm, vpykhtin

Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64393

llvm-svn: 365910
2019-07-12 15:59:40 +00:00
Stanislav Mekhanoshin b83e283e65 [AMDGPU] gfx908 scheduling
Differential Revision: https://reviews.llvm.org/D64590

llvm-svn: 365826
2019-07-11 21:25:00 +00:00
Stanislav Mekhanoshin 50d7f46460 [AMDGPU] gfx908 mAI instructions, MC part
Differential Revision: https://reviews.llvm.org/D64446

llvm-svn: 365563
2019-07-09 21:43:09 +00:00
Matt Arsenault 60957cb74c AMDGPU: Fold frame index into MUBUF
This matters for byval uses outside of the entry block, which appear
as copies.

Previously, the only folding done was during selection, which could
not see the underlying frame index. For any uses outside the entry
block, the frame index was materialized in the entry block relative to
the global scratch wave offset.

This may produce worse code in cases where the offset ends up not
fitting in the MUBUF offset field. A better heuristic would be helpful
for extreme frames.

llvm-svn: 364185
2019-06-24 14:53:56 +00:00