Commit Graph

96 Commits

Author SHA1 Message Date
Piotr Sobczak d869921004 [AMDGPU] Add patterns for i8/i16 local atomic load/store
Add patterns for i8/i16 local atomic load/store.

Added tests for new patterns.

Copied atomic_[store/load]_local.ll to GlobalISel directory.

Differential Revision: https://reviews.llvm.org/D111869
2021-10-18 11:23:10 +02:00
Joe Nash c96839265a [AMDGPU] Enable ds_min/ds_max on more subtargets
Adds patterns for f64 ds_min/ds_max. Shrinks HasLDSFPAtomics
scope to enable f32.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108994

Change-Id: Id890b677841ee588b20d42b1bb3f4cdbf6e9ba1a
2021-08-31 13:22:31 -04:00
Jay Foad 24ffc343f9 [AMDGPU] Set IsAtomicRet and IsAtomicNoRet on Real instructions
This does not affect codegen but might benefit llvm-mca.
2021-06-16 12:23:29 +01:00
Jay Foad 323b3e645d [AMDGPU] Set mayLoad and mayStore on Real instructions
This does not affect codegen but might benefit llvm-mca.
2021-06-16 12:10:23 +01:00
Jay Foad 6f778fed8e [AMDGPU] Set more flags on Real instructions
This does not affect codegen, which only tests these flags on Pseudo
instructions, but might help llvm-mca which has to work with Real
instructions. In particular setting LGKM_CNT on DS instructions helps
with the problem identified in D104149.

Differential Revision: https://reviews.llvm.org/D104293
2021-06-16 09:58:50 +01:00
hsmahesha 840c4e4e90 [AMDGPU] Re-arrange ds_read/ds_write ISel pattern for better readability.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D100773
2021-04-20 16:17:15 +05:30
hsmahesha 099dcb68a6 [AMDGPU] Refactor ds_read/ds_write related select code for better readability.
Part of the code related to ds_read/ds_write ISel is refactored, and the
corresponding comment is re-written for better readability, which would help
while implementing any future ds_read/ds_write ISel related modifications.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100300
2021-04-16 08:24:00 +05:30
Dmitry Preobrazhensky 67b39661c8 [AMDGPU][MC][NFC] Removed extra spaces
Fixed bugs 49646, 49647.

Differential Revision: https://reviews.llvm.org/D100173
2021-04-12 13:33:19 +03:00
hsmahesha ac64995ceb [AMDGPU] Only use ds_read/write_b128 for alignment >= 16
PS: Submitting on behalf of Jay.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100008
2021-04-08 08:12:05 +05:30
Jay Foad fc7e3e7dd9 [AMDGPU] Set SchedRW on real instructions
Coyp SchedRW from pseudos to real instructions so that llvm-mca has
access to it. This is NFC for normal compiler codegen, which schedules
pseudos not real instructions.

Add an llvm-mca test for some high latency double-precision instructions
as a smoke test.

Differential Revision: https://reviews.llvm.org/D99187
2021-03-23 15:38:11 +00:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Stanislav Mekhanoshin 5cf9292ce3 [AMDGPU] Add two TSFlags: IsAtomicNoRtn and IsAtomicRtn
We are using AtomicNoRet map in multiple places to determine
if an instruction atomic, rtn or nortn atomic. This method
does not work always since we have some instructions which
only has rtn or nortn version.

One such instruction is ds_wrxchg_rtn_b32 which does not have
nortn version. This has caused changes in memory legalizer
tests.

Differential Revision: https://reviews.llvm.org/D96639
2021-02-15 11:27:59 -08:00
Mirko Brkusanin 0c7cce54eb [AMDGPU] Resolve issues when picking between ds_read/write and ds_read2/write2
Both ds_read_b128 and ds_read2_b64 are valid for 128bit 16-byte aligned
loads but the one that will be selected is determined either by the order in
tablegen or by the AddedComplexity attribute. Currently ds_read_b128 has
priority.

While ds_read2_b64 has lower alignment requirements, we cannot always
restrict ds_read_b128 to 16-byte alignment because of unaligned-access-mode
option. This was causing ds_read_b128 to be selected for 8-byte aligned
loads regardles of chosen access mode.

To resolve this we use two patterns for selecting ds_read_b128. One
requires alignment of 16-byte and the other requires
unaligned-access-mode option.

Same goes for ds_write2_b64 and ds_write_b128.

Differential Revision: https://reviews.llvm.org/D92767
2020-12-10 12:40:49 +01:00
Jay Foad 9cee87d72a [AMDGPU] Fix double space in disassembly of ds_gws_sema_* with gds
By setting up the AsmStrings correctly we can remove some special cases
from AMDGPUInstPrinter::printOffset.

Differential Revision: https://reviews.llvm.org/D90307
2020-10-29 17:31:59 +00:00
Jay Foad 7a79921edd [AMDGPU] Remove gds operand from ds_gws_* MachineInstrs
The operand value was always 1 (except in some bad MIR tests) so it was
redundant.

Differential Revision: https://reviews.llvm.org/D90378
2020-10-29 15:04:23 +00:00
Jay Foad 9e634bc22f [AMDGPU] Omit needless string concatenations. NFC. 2020-10-28 12:56:52 +00:00
Jay Foad f6a5699c6c [AMDGPU][TableGen] Make more use of !ne !not !and !or. NFC. 2020-10-21 09:56:43 +01:00
Mirko Brkusanin d17ea67b92 [AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores
Fix local ds_read/write_b96/b128 so they can be selected if the alignment
allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break
them down.

Differential Revision: https://reviews.llvm.org/D81638
2020-08-21 12:26:31 +02:00
Matt Arsenault 53f21e0fb7 TableGen/GlobalISel: Hack the operand order for atomic_store
ISD::ATOMIC_STORE arbitrarily has the operands in the opposite order
from regular ISD::STORE, which always introduced an annoying
duplication of patterns to handle both cases. Since in GlobalISel
there's just the one G_STORE, we need to swap the operands to
correctly emit the type check for the pointer operand.

Some work started in 20aafa3156 to
migrate SelectionDAG to use ISD::STORE for atomics, but that work
seems to have stalled. Since this is the pretty much the last
operation which matters which isn't supported for AMDGPU, use this
compatibility hack to unblock declaring it functionally complete.

Not sure what's going on with the pending_phis AArch64 test. It seems
it didn't always use atomics, and I'm not sure what it was originally
testing matters anymore.
2020-08-11 10:22:44 -04:00
Stanislav Mekhanoshin 9ee272f13d [AMDGPU] Add gfx1030 target
Differential Revision: https://reviews.llvm.org/D81886
2020-06-15 16:18:05 -07:00
Matt Arsenault 38fb446fc7 AMDGPU/GlobalISel: Fix test failure in release build
The annoying behavior where the output is different due to the
legality check struck again, plus the subtarget predicate wasn't
really correctly set for DS FP atomics.

Some of the FP min/max instructions seem to be in the gfx6/gfx7
manuals, but IIRC this might have been one of the cases where the
manual got ahead of the actual hardware support, but I've left these
as-is for now since the assembler tests seem to expect them.
2020-06-06 11:01:18 -04:00
Kazuaki Ishizaki 0312b9f550 [llvm] NFC: Fix trivial typo in rst and td files
Differential Revision: https://reviews.llvm.org/D77469
2020-04-23 14:26:32 +09:00
Stanislav Mekhanoshin 03954a12ae [AMDGPU] Fix DS_WRITE_B32 patterns
It uses VGPR_32.RegTypes which includes 16 bit types. As a
result DS_WRITE_B32 may be generated for "store i16" which
is a bug. The only reason we do not hit it now is relative
patterns complexity and sorting. Should DS_WRITE_B16 pattern
complexity become higher and the bug appears.

Differential Revision: https://reviews.llvm.org/D74868
2020-02-19 13:42:16 -08:00
Austin Kerbow 2605adb69c [AMDGPU][GlobalISel] Select 8-byte LDS Ops with 4-byte alignment
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73585
2020-01-29 10:42:12 -08:00
Matt Arsenault b4a647449f TableGen/GlobalISel: Add way for SDNodeXForm to work on timm
The current implementation assumes there is an instruction associated
with the transform, but this is not the case for
timm/TargetConstant/immarg values. These transforms should directly
operate on a specific MachineOperand in the source
instruction. TableGen would assert if you attempted to define an
equivalent GISDNodeXFormEquiv using timm when it failed to find the
instruction matcher.

Specially recognize SDNodeXForms on timm, and pass the operand index
to the render function.

Ideally this would be a separate render function type that looks like
void renderFoo(MachineInstrBuilder, const MachineOperand&), but this
proved to be somewhat mechanically painful. Add an optional operand
index which will only be passed if the transform should only look at
the one source operand.

Theoretically it would also be possible to only ever pass the
MachineOperand, and the existing renderers would check the parent. I
think that would be somewhat ugly for the standard usage which may
want to inspect other operands, and I also think MachineOperand should
eventually not carry a pointer to the parent instruction.

Use it in one sample pattern. This isn't a great example, since the
transform exists to satisfy DAG type constraints. This could also be
avoided by just changing the MachineInstr's arbitrary choice of
operand type from i16 to i32. Other patterns have nontrivial uses, but
this serves as the simplest example.

One flaw this still has is if you try to use an SDNodeXForm defined
for imm, but the source pattern uses timm, you still see the "Failed
to lookup instruction" assert. However, there is now a way to avoid
it.
2020-01-09 17:37:52 -05:00
Matt Arsenault 3766f4bacc AMDGPU: Use new PatFrag system for d16 stores 2020-01-09 10:29:32 -05:00
Matt Arsenault db7c920779 AMDGPU: Add register class to DS_SWIZZLE_B32 pattern
Reduces diff for a future patch.
2020-01-09 10:29:31 -05:00
Stanislav Mekhanoshin 521fc5e620 [AMDGPU] Add missing flags to DS_Real
Differential Revision: https://reviews.llvm.org/D69867
2019-11-05 14:24:48 -08:00
Stanislav Mekhanoshin 4312c4afd4 [AMDGPU] deduplicate tablegen predicates
We are duplicating predicates if several parts of the combined
predicate list contain the same condition. Added code to deduplicate
the list.

We have AssemblerPredicates and AssemblerPredicate in the
PredicateControl, but we never use AssemblerPredicates with an
actual list, so this one is dropped.

This addresses the first part of the llvm bug 43886:
https://bugs.llvm.org/show_bug.cgi?id=43886

Differential Revision: https://reviews.llvm.org/D69815
2019-11-04 12:19:17 -08:00
Dmitry Preobrazhensky c4995076c6 [AMDGPU][MC][GFX9][GFX10] Corrected number of src operands for ds_[read/write]_addtid_b32
See https://bugs.llvm.org/show_bug.cgi?id=37941

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68787

llvm-svn: 374561
2019-10-11 14:53:26 +00:00
Matt Arsenault 3ecab8e455 Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)

This was missing one switch to getTargetConstant in an untested case.

llvm-svn: 372338
2019-09-19 16:26:14 +00:00
Hans Wennborg 13bdae8541 Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This broke the Chromium build, causing it to fail with e.g.

  fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

See llvm-commits thread of r372285 for details.

This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.

> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372314
2019-09-19 12:33:07 +00:00
Matt Arsenault d8399d12cd GlobalISel: Don't materialize immarg arguments to intrinsics
Encode them directly as an imm argument to G_INTRINSIC*.

Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.

This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.

SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.

Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.

The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.

This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372285
2019-09-19 01:33:14 +00:00
Matt Arsenault ebbd6e4976 AMDGPU: Remove code address space predicates
Fixes 8-byte, 8-byte aligned LDS loads. 16-byte case still broken due
to not be reported as legal.

llvm-svn: 371413
2019-09-09 16:02:07 +00:00
Matt Arsenault ee093ba5c9 AMDGPU/GlobalISel: Avoid repeating 32-bit type lists
llvm-svn: 371156
2019-09-06 00:36:10 +00:00
Matt Arsenault 4d90625271 AMDGPU/GlobalISel: Fix load/store of types in other address spaces
There should probably be a size only matcher.

llvm-svn: 371155
2019-09-06 00:36:06 +00:00
Matt Arsenault ae87b9f2c2 AMDGPU/GlobalISel: Select local atomic cmpxchg
llvm-svn: 367511
2019-08-01 03:41:41 +00:00
Matt Arsenault da5b9bfa95 AMDGPU/GlobalISel: Allow selection of DS atomicrmw
llvm-svn: 367507
2019-08-01 03:29:01 +00:00
Matt Arsenault e6ce48422c AMDGPU: Start redefining atomic PatFrags
Start migrating to a form that will be compatible with the global isel
emitter. Also should fix some overly lax checks on the memory type,
which allowed mis-selecting some illegal atomics.

llvm-svn: 367506
2019-08-01 03:25:52 +00:00
Matt Arsenault 3baf4d3418 AMDGPU/GlobalISel: Select simple local stores
llvm-svn: 367504
2019-08-01 03:09:15 +00:00
Matt Arsenault 827427f65b AMDGPU: Don't use SDNodeXForm for DS offset output
The xform has no real valuewhen it's using out of a complex pattern
output. The complex pattern was already creating TargetConstants with
i16, so this was just unnecessary machinery.

This allows global isel to import the simple cases once the complex
pattern is implemented.

llvm-svn: 366743
2019-07-22 21:38:11 +00:00
Matt Arsenault 85f3890126 AMDGPU: Force s_waitcnt after GWS instructions
This is apparently required to be the immediately following
instruction, so force it into a bundle with a waitcnt.

llvm-svn: 366607
2019-07-19 19:47:30 +00:00
Stanislav Mekhanoshin 7938424eb9 [AMDGPU] Copy missing predicate from pseudo to real
NFC at the momemnt, needed for future commit.

Differential Revision: https://reviews.llvm.org/D64761

llvm-svn: 366092
2019-07-15 17:49:25 +00:00
Dmitry Preobrazhensky 8d879c8d95 [AMDGPU][MC] Corrected encoding of src0 for DS_GWS_* instructions
See bug 42599: https://bugs.llvm.org/show_bug.cgi?id=42599

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D64716

llvm-svn: 366067
2019-07-15 14:37:57 +00:00
Matt Arsenault 9e7cbc0e7d AMDGPU: Split extload/zextload local load patterns
This will help removing the custom load predicates, allowing the
global isel emitter to handle them.

llvm-svn: 365398
2019-07-08 22:08:23 +00:00
Nicolai Haehnle 4dc3b2bf95 AMDGPU: Support GDS atomics
Summary:
Original patch by Marek Olšák

Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab

Reviewers: mareko, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63452

llvm-svn: 364814
2019-07-01 17:17:45 +00:00
Matt Arsenault 740322f1eb AMDGPU: Add intrinsics for DS GWS semaphore instructions
llvm-svn: 363983
2019-06-20 21:11:42 +00:00
Matt Arsenault 8ad1decf45 AMDGPU: Insert mem_viol check loop around GWS pre-GFX9
It is necessary to emit this loop around GWS operations in case the
wave is preempted pre-GFX9.

llvm-svn: 363979
2019-06-20 20:54:32 +00:00
Matt Arsenault 4d55d024be Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics"
This reapplies r363678, using the correct chain for the CopyToReg for
v0. glueCopyToM0 counterintuitively changes the operands of the
original node.

llvm-svn: 363870
2019-06-19 19:55:27 +00:00
Simon Pilgrim 128ce93c60 Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics
There may or may not be additional work to handle this correctly on
SI/CI.
........
Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/

llvm-svn: 363797
2019-06-19 13:00:54 +00:00