Commit Graph

82 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin f738aee0bb [AMDGPU] Add default 1 glc operand to rtn atomics
This change adds a real glc operand to the return atomic
instead of just string " glc" in the middle of the asm
string.

Improves asm parser diagnostics.

Differential Revision: https://reviews.llvm.org/D90730
2020-11-05 10:41:59 -08:00
Stanislav Mekhanoshin c9d6fe6f7d [AMDGPU] Improve FLAT scratch detection
We were using too broad a check for isFLATScratch(), which also
includes FLAT global.

Differential Revision: https://reviews.llvm.org/D90505
2020-11-02 11:37:33 -08:00
Stanislav Mekhanoshin 038d884a50 [AMDGPU] Use flat scratch instructions where available
The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.

At the very least missing:

- GlobalISel;
- Some optimizations in frame elimination in between vector
  and scalar ALU;
- It shall finally allow to always materialize frame index
  as an SGPR, but that is not implemented and frame elimination
  cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
  is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
  expression to recover the unswizzled scratch address;

Differential Revision: https://reviews.llvm.org/D89170
2020-10-26 14:40:42 -07:00
Jay Foad f6a5699c6c [AMDGPU][TableGen] Make more use of !ne !not !and !or. NFC. 2020-10-21 09:56:43 +01:00
Stanislav Mekhanoshin 6ddadf9901 [AMDGPU] flat scratch ST addressing mode on gfx10
GFX10 enables a third addressing mode for flat scratch instructions,
an ST mode. In that mode both register operands are omitted and
only the swizzled offset is used in addition to the flat_scratch base.

Differential Revision: https://reviews.llvm.org/D89501
2020-10-19 15:29:52 -07:00
Stanislav Mekhanoshin 45014ce36f [AMDGPU] Add tied operand to d16 scratch loads
This is still no-op because there is no selection for these
opcodes.

Differential Revision: https://reviews.llvm.org/D88927
2020-10-07 11:13:01 -07:00
Stanislav Mekhanoshin 7361ce73ef [AMDGPU] Use default zero flag operands in flat scratch
This is no-op so far because we do not select these yet.

Differential Revision: https://reviews.llvm.org/D88920
2020-10-07 10:56:47 -07:00
Stanislav Mekhanoshin 277de43d88 [AMDGPU] Unify intrinsic ret/nortn interface
We have a single noret intrinsic and a lot of special handling
around it. Declare it just as any other but do not define rtn
instructions itself instead.

Differential Revision: https://reviews.llvm.org/D87719
2020-09-15 15:26:42 -07:00
Matt Arsenault e1a2f4713c AMDGPU: Match global saddr addressing mode
The previous implementation was incorrect, and based off incorrect
instruction definitions. Unfortunately we can't match natural
addressing in a lot of cases due to the shift/scale applied in
getelementptrs. This relies on reducing the 64-bit shift to 32-bits.
2020-08-17 15:28:14 -04:00
Matt Arsenault e0375dbcb3 AMDGPU: Fix using wrong offsets for global atomic fadd intrinsics
Global instructions have the signed offsets.
2020-08-17 09:19:15 -04:00
Matt Arsenault f0af434b79 AMDGPU: Remove register class params from flat memory patterns 2020-08-15 12:12:33 -04:00
Matt Arsenault a7455652c0 AMDGPU: Fix global atomic saddr operand class 2020-08-15 12:12:28 -04:00
Matt Arsenault 625db2fe5b AMDGPU: Remove slc from flat offset complex patterns
This was always set to 0. Use a default value of 0 in this context to
satisfy the instruction definition patterns. We can't unconditionally
use SLC with a default value of 0 due to limitations in TableGen's
handling of defaulted operands when followed by non-default operands.
2020-08-15 12:12:24 -04:00
Matt Arsenault e5077b5c2a AMDGPU: Fix matching wrong offsets for global atomic loads
These used signed offsets with a different size.
2020-08-15 12:12:17 -04:00
Matt Arsenault 8cb022982a AMDGPU: Remove redundant FLAT complex patterns
These were identical to the non-atomic cases. I'm not sure why these
were ever separated.
2020-08-15 12:12:01 -04:00
Matt Arsenault 47af1ac69a AMDGPU: Correct definitions for global saddr instructions
The VGPR component is a 32-bit offset, not 64-bits.

I'm not sure what the correct syntax is for this. This maintains the
vaddr position and leaves saddr in the end "off" position. This is
particularly terrible for stores, since the operand order is now <vgpr
offset>, <data>, <sgpr base>, splitting the pointer operands. I
suppose this is a logical consequence from the mistake of not putting
the data operand first. I'm not sure what sp3 does.
2020-08-15 12:11:57 -04:00
Matt Arsenault e14474a39a AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.fadd
Remove the intermediate transform in the DAG path. I believe this is
the last non-deprecated intrinsic that needs handling.
2020-08-12 10:04:53 -04:00
Matt Arsenault cdd45d5f9c AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.csub
Remove the custom node boilerplate. Not sure why this tried to handle
the LDS atomic stuff.
2020-07-29 08:27:31 -04:00
Stanislav Mekhanoshin 9ee272f13d [AMDGPU] Add gfx1030 target
Differential Revision: https://reviews.llvm.org/D81886
2020-06-15 16:18:05 -07:00
Matt Arsenault 1657f0ebc2 AMDGPU: Fix overriding global FP atomic feature predicates
Global TableGen let override blocks are pretty dangerous and override
any local special cases. In this case, the broader HasFlatGlobalInsts
was overriding the more specific predicate for
FeatureAtomicFaddInsts. Make sure HasFlatGlobalInsts is implied by
FeatureAtomicFaddInsts, and make sure the right predicate is used.

One issue with independently setting the subtarget features on
incompatible targets is all of the encoding families do not define all
opcodes. This will hit an assert on gfx10 for example, since we set
the encoding independently based on the generation and not based on a
feature.
2020-06-04 17:50:38 -04:00
Kazuaki Ishizaki 0312b9f550 [llvm] NFC: Fix trivial typo in rst and td files
Differential Revision: https://reviews.llvm.org/D77469
2020-04-23 14:26:32 +09:00
Matt Arsenault bc3d900fa5 AMDGPU/GlobalISel: Fix not using global atomics on gfx9+
For some reason the flat/global atomics end up in the generated
matcher table in a different order from SelectionDAG. Use
AddedComplexity to prefer checking for global atomics first.
2020-01-27 07:42:42 -08:00
Matt Arsenault c66b2e1c87 AMDGPU: Eliminate more legacy codepred address space PatFrags
These should now be limited to R600 code.
2020-01-09 10:29:32 -05:00
Matt Arsenault ed9a56b0f2 AMDGPU/GlobalISel: Select some 128-bit load/stores 2019-12-27 08:49:43 -05:00
Matt Arsenault e16a71382d AMDGPU: Select global atomicrmw fadd
This only works if there is no use of the return value.
2019-11-06 16:06:38 -08:00
Matt Arsenault 171cf5302f AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHG
Custom lower this to a target instruction with the merge operands. I
think it might be better to directly select this and emit a
REG_SEQUENCE, but this would be more work since it would require
splitting the tablegen patterns for these cases from the other
atomics.
2019-10-25 13:11:09 -07:00
Stanislav Mekhanoshin befab66a2c [AMDGPU] drop getIsFP td helper
We already have isFloatType helper, and they are out of sync.
Drop one and merge the type list.

Differential Revision: https://reviews.llvm.org/D69138

llvm-svn: 375175
2019-10-17 21:46:56 +00:00
Dmitry Preobrazhensky 94d040706d [AMDGPU][MC][GFX10] Corrected definition of FLAT GLOBAL/SCRATCH instructions
See bug 43483: https://bugs.llvm.org/show_bug.cgi?id=43483

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68347

llvm-svn: 373736
2019-10-04 12:10:22 +00:00
Matt Arsenault ee093ba5c9 AMDGPU/GlobalISel: Avoid repeating 32-bit type lists
llvm-svn: 371156
2019-09-06 00:36:10 +00:00
Matt Arsenault 9952f46407 AMDGPU/GlobalISel: Fix flat load/store of pointer types
llvm-svn: 367513
2019-08-01 03:57:42 +00:00
Matt Arsenault 57495268ac AMDGPU/GlobalISel: Remove manual store select code
This regresses the weird types that are newly treated as legal load
types, but fixes incorrectly using flat instructions on SI.

llvm-svn: 367512
2019-08-01 03:52:40 +00:00
Matt Arsenault e6ce48422c AMDGPU: Start redefining atomic PatFrags
Start migrating to a form that will be compatible with the global isel
emitter. Also should fix some overly lax checks on the memory type,
which allowed mis-selecting some illegal atomics.

llvm-svn: 367506
2019-08-01 03:25:52 +00:00
Matt Arsenault 70e20c0f08 AMDGPU: Correct FP atomic patterns
These need to use an fadd, not an add. Also make the noret part clear
in the name.

llvm-svn: 367505
2019-08-01 03:22:40 +00:00
Matt Arsenault 7eb1902cd5 AMDGPU: Add register classes to flat store patterns
For some reason GlobalISelEmitter needs register classes to import
these, although it works for the load patterns.

llvm-svn: 366242
2019-07-16 18:26:42 +00:00
Matt Arsenault 8f8d07e93b AMDGPU: Replace store PatFrags
Convert the easy cases to formats understood for GlobalISel.

llvm-svn: 366240
2019-07-16 18:21:25 +00:00
Matt Arsenault c6fd5abecc AMDGPU: Redefine load PatFrags
Rewrite PatFrags using the new PatFrag address space matching in
tablegen. These will now work with both SelectionDAG and GlobalISel.

llvm-svn: 366234
2019-07-16 17:38:50 +00:00
Matt Arsenault 1739b700b1 AMDGPU: Avoid code predicates for extload PatFrags
Use the MemoryVT field. This will be necessary for tablegen to
automatically handle patterns for GlobalISel.

Doesn't handle the d16 lo/hi patterns. Those are a special case since
it involves the custom node type.

llvm-svn: 366168
2019-07-16 02:46:05 +00:00
Stanislav Mekhanoshin e93279fd1b [AMDGPU] gfx908 atomic fadd and atomic pk_fadd
Differential Revision: https://reviews.llvm.org/D64435

llvm-svn: 365717
2019-07-11 00:10:17 +00:00
Dmitry Preobrazhensky 2eff0318c6 [AMDGPU][MC] Corrected parsing of FLAT offset modifier
Summary of changes:

- simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset;
- provided information about error position for pre-gfx9 targets;
- improved error handling.

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D64244

llvm-svn: 365321
2019-07-08 14:27:37 +00:00
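
The commit above folds two offset operand kinds into a single flat_offset. As a rough illustration only, and reading the names offset_s13 and offset_u12 as "signed 13-bit" and "unsigned 12-bit" (an assumption based on the names, not on the patch), one range check parameterized by width and signedness can cover both cases. The function name `offset_in_range` is hypothetical and is not part of LLVM.

```python
def offset_in_range(value: int, bits: int, signed: bool) -> bool:
    """Return True if `value` fits in an immediate field of `bits` bits.

    Hypothetical helper for illustration; it does not model which
    targets or instruction kinds use which field.
    """
    if signed:
        # Two's complement range: [-2^(bits-1), 2^(bits-1) - 1].
        half = 1 << (bits - 1)
        return -half <= value < half
    # Unsigned range: [0, 2^bits - 1].
    return 0 <= value < (1 << bits)


def fits_u12(value: int) -> bool:
    # Assumed meaning of offset_u12: unsigned 12-bit.
    return offset_in_range(value, bits=12, signed=False)


def fits_s13(value: int) -> bool:
    # Assumed meaning of offset_s13: signed 13-bit.
    return offset_in_range(value, bits=13, signed=True)
```

With a single parameterized check, an out-of-range diagnostic can be issued from one place, which may be part of why one operand kind is simpler to handle than two.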
Stanislav Mekhanoshin bdf7f81b89 [AMDGPU] hazard recognizer for fp atomic to s_denorm_mode
This requires 3 wait states unless there is a wait or VALU in
between.

Differential Revision: https://reviews.llvm.org/D63619

llvm-svn: 364074
2019-06-21 16:30:14 +00:00
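
The rule in the commit above (3 wait states between an FP atomic and s_denorm_mode, unless a wait or a VALU instruction sits in between) can be sketched as a backwards scan. This is a simplified model, not the LLVM hazard recognizer: the instruction kinds are plain strings, and it assumes that every intervening instruction accounts for exactly one wait state. The name `nops_before_denorm_mode` is hypothetical.

```python
REQUIRED_WAIT_STATES = 3  # from the commit message


def nops_before_denorm_mode(preceding):
    """Return how many wait states to insert before s_denorm_mode.

    `preceding` lists the kinds of the instructions issued before
    s_denorm_mode, oldest first. Recognized kinds in this sketch:
    "fp_atomic", "valu", "wait"; anything else is treated as an
    ordinary instruction that consumes one wait state (assumption).
    """
    elapsed = 0
    for kind in reversed(preceding):
        if kind in ("wait", "valu"):
            # A wait or VALU in between clears the hazard.
            return 0
        if kind == "fp_atomic":
            # Pad whatever is still missing from the required distance.
            return max(0, REQUIRED_WAIT_STATES - elapsed)
        elapsed += 1
        if elapsed >= REQUIRED_WAIT_STATES:
            # Far enough back that no hazard can remain.
            return 0
    return 0
```

For example, an FP atomic immediately followed by s_denorm_mode needs three wait states in this model, while one unrelated scalar instruction in between reduces that to two.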
Stanislav Mekhanoshin a6322941ff [AMDGPU] gfx1010 VMEM and SMEM implementation
Differential Revision: https://reviews.llvm.org/D61330

llvm-svn: 359621
2019-04-30 22:08:23 +00:00
Stanislav Mekhanoshin 5182302a37 [AMDGPU] Sort out and rename multiple CI/VI predicates
Differential Revision: https://reviews.llvm.org/D60346

llvm-svn: 357835
2019-04-06 09:20:48 +00:00
Stanislav Mekhanoshin 7895c03232 [AMDGPU] predicate and feature refactoring
We have done some predicate and feature refactoring lately but
did not upstream it. This is to sync.

Differential revision: https://reviews.llvm.org/D60292

llvm-svn: 357791
2019-04-05 18:24:34 +00:00
Tim Renouf 361b5b2193 [AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.

SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58902

Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
llvm-svn: 356659
2019-03-21 12:01:21 +00:00
Matt Arsenault e8c03a2511 AMDGPU: Move d16 load matching to preprocess step
When matching half of the build_vector to a load, there could still be
a hidden dependency on the other half of the build_vector the pattern
wouldn't detect. If there was an additional chain dependency on the
other value, a cycle could be introduced.

I don't think a tablegen pattern is capable of matching the necessary
conditions, so move this into PreprocessISelDAG. Check isPredecessorOf
for the other value to avoid a cycle. This has a warning that it's
expensive, so this should probably be moved into an MI pass eventually
that will have more freedom to reorder instructions to help match
this. That is currently complicated by the lack of a computeKnownBits
type mechanism for the selected function.

llvm-svn: 355731
2019-03-08 20:58:11 +00:00
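
The cycle check described in the commit above can be illustrated on a toy dependency graph. This is a sketch under assumptions, not LLVM code: the graph is a dictionary from node name to the names of its operands, `is_predecessor_of` only mimics the idea behind the isPredecessorOf query, and `can_fold_d16_load` is a hypothetical helper. The premise is that the folded load would take the other half of the build_vector as an input, so folding is unsafe when that other half already depends on the load.

```python
def is_predecessor_of(operands, node, user):
    """Return True if `node` is reachable from `user` via operand edges."""
    stack = list(operands.get(user, ()))
    seen = set()
    while stack:
        cur = stack.pop()
        if cur == node:
            return True
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(operands.get(cur, ()))
    return False


def can_fold_d16_load(operands, load, other_half):
    """Folding makes `load` depend on `other_half`; refuse if the
    reverse dependency already exists, since that would form a cycle."""
    return not is_predecessor_of(operands, load, other_half)


# The other half depends on the load through a chain: folding is unsafe.
chained = {
    "build_vector": ["load", "other"],
    "other": ["store"],
    "store": ["load"],
    "load": ["ptr"],
}

# The other half is independent of the load: folding is safe.
independent = {
    "build_vector": ["load", "other"],
    "other": ["x"],
    "load": ["ptr"],
}
```

The traversal may visit every node reachable from the other half, which is consistent with the commit's remark that the check is expensive.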
Konstantin Zhuravlyov 9a278bf6b5 Revert "AMDGPU/NFC: Cleanup subtarget predicates"
It breaks one of our downstream merges, so revert it
temporarily while investigating failures downstream

llvm-svn: 354700
2019-02-22 23:21:06 +00:00
Konstantin Zhuravlyov c2650178a1 AMDGPU/NFC: Cleanup subtarget predicates
Differential Revision: https://reviews.llvm.org/D58522

llvm-svn: 354620
2019-02-21 20:43:43 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Ron Lieberman cac749ac88 [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST
Add a pass to fixup various vector ISel issues.
Currently we handle converting GLOBAL_{LOAD|STORE}_*
and GLOBAL_Atomic_* instructions into their _SADDR variants.
This involves feeding the sreg into the saddr field of the new instruction.

llvm-svn: 347008
2018-11-16 01:13:34 +00:00
Konstantin Zhuravlyov 15e90e331c AMDGPU/NFC: Split FLAT_Global_Atomic_Pseudo into RTN/NO_RTN multiclasses
llvm-svn: 346361
2018-11-07 21:42:13 +00:00