Commit Graph

367 Commits

Author SHA1 Message Date
Austin Kerbow 0f116fd9d8 [AMDGPU] Fix infinite loop with fma combines
https://reviews.llvm.org/D72312 introduced an infinite loop which involves
DAGCombiner::visitFMA and AMDGPUTargetLowering::performFNegCombine.

fma( a, fneg(b), fneg(c) ) => fneg( fma (a, b, c) ) => fma( a, fneg(b), fneg(c) ) ...

This only breaks with types where 'isFNegFree' returns flase, e.g. v4f32.
Reproducing the issue also needs the attribute 'no-signed-zeros-fp-math',
and no source mods allowed on one of the users of the Op.

This fix makes changes to indicate that it is not free to negate a fma if it
has users with source mods.

Differential Revision: https://reviews.llvm.org/D73939
2020-02-04 13:11:09 -08:00
Matt Arsenault 1024b73ef5 AMDGPU: Split denormal mode tracking bits
Prepare to accurately track the future denormal-fp-math attribute
changes. The way to actually set these separately is not wired in yet.

This is just a mechanical change, and mostly still assumes the input
and output mode match. This should be refined for some cases. For
example, fcanonicalize lowering should use the flushing variant if
either input or output flushing is enabled
2020-02-04 10:44:21 -08:00
Matt Arsenault 68b102b97a AMDGPU: Directly select 16-bank LDS case of llvm.amdgcn.interp.p1.f16
Manually select this is as a tablegen workraound. Both SelectionDAG
and GlobalISel end up misplacing the copy to m0 when both instructions
in the output need it. Neither considers that both output instructions
depend on m0. I don't know of any other pattern we need to handle this
case, so it's less effort to just workaround this for now.
2020-01-29 08:24:31 -08:00
Stanislav Mekhanoshin 44b865fa7f [AMDGPU] Allow narrowing muti-dword loads
Currently BE allows only a little load narrowing because
of the fear it will produce sub-dword ext loads. However,
we can always allow narrowing if we are shrinking one
multi-dword load to another multi-dword load.

In particular we were unable to reduce s_load_dwordx8 into
s_load_dwordx4 if identity shuffle was used to extract
low 4 dwords.

Differential Revision: https://reviews.llvm.org/D73133
2020-01-24 11:03:41 -08:00
Michael Liao 6d0d86a64d [DAG] Add helper for creating constant vector index with correct type. NFC. 2020-01-18 01:23:36 -05:00
Matt Arsenault eef92f25cc AMDGPU: Remove custom node for exports
I'm mildly worried about potentially reordering exp/exp_done with
IntrWriteMem on the intrinsic.

Requires hacking out the illegal type on SI, so manually select that
case during lowering.
2020-01-15 18:33:15 -05:00
Matt Arsenault 68e70fb098 AMDGPU: Fix not using v_cvt_f16_[iu]16
We weren't treating i16->f16 casts as legal on targets with these
instructions, and always using a pair of casts through i32.
2020-01-07 15:10:07 -05:00
Craig Topper 787e078f3e [TargetLowering][AMDGPU] Make scalarizeVectorLoad return a pair of SDValues instead of creating a MERGE_VALUES node. NFCI
This allows us to clean up some places that were peeking through
the MERGE_VALUES node after the call. By returning the SDValues
directly, we can clean that up.

Unfortunately, there are several call sites in AMDGPU that wanted
the MERGE_VALUES and now need to create their own.
2019-12-30 19:36:04 -08:00
Matt Arsenault 9e1a2a668b AMDGPU: Improve llvm.round.f64 lowering for CI+
The path already used for f16/f32 works a lot better when v_trunc_f64
is available.
2019-12-30 09:55:46 -05:00
Jay Foad f8495017f0 Fix whitespace. 2019-12-16 10:42:34 +00:00
Jay Foad 4f17b1784e Fix for AMDGPU MUL_I24 known bits calculation
Summary:
At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number".

In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8.

Reviewers: foad, arsenm

Reviewed By: arsenm

Subscribers: foad, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Patch by Eugene Kuznetsov.

Differential Revision: https://reviews.llvm.org/D70367
2019-12-16 10:25:57 +00:00
Alex Richardson be15dfa88f [NFC] Use EVT instead of bool for getSetCCInverse()
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).

In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.

Committing this change will significantly reduce our merge conflicts
for each upstream merge.

Reviewers: spatel, bogner

Reviewed By: bogner

Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70917
2019-12-13 12:22:03 +00:00
Matt Arsenault db0ed3e429 AMDGPU: Refactor treatment of denormal mode
Start moving towards treating this as a property of the calling
convention, and not the subtarget. The default denormal mode should
not be part of the subtarget, and be moved into a separate function
attribute.

This patch is still NFC. The denormal mode remains as a subtarget
feature for now, but make the necessary changes to switch to using an
attribute.
2019-11-19 19:55:43 +05:30
Matt Arsenault 31479d868e AMDGPU: Change boolean content type to 0 or 1
The usage of target boolean checks is overly inflexible, since sext
and zext of a compare are equally cheap. The choice is arbitrary, but
using 0/1 to some degree is the choice of lower resistance since
that's what most targets use. This enables a few combines that don't
bother to support ZeroOrNegativeOneBooleanContent.
2019-11-15 13:43:47 +05:30
Matt Arsenault e16a71382d AMDGPU: Select global atomicrmw fadd
This only works if there is no use of the return value.
2019-11-06 16:06:38 -08:00
Michael Liao 4531aee2ac [amdgpu] Fix known bits compuation on `MUL_I24`/`MUL_U24`.
Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69735
2019-11-01 17:06:17 -04:00
Matt Arsenault ef9a0278f0 AMDGPU: Select basic interp directly from intrinsics
llvm-svn: 375457
2019-10-21 21:49:44 +00:00
Guillaume Chatelet b65fa48305 [Alignment] Migrate Attribute::getWith(Stack)Alignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jdoerfert

Reviewed By: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68792

llvm-svn: 374884
2019-10-15 12:56:24 +00:00
Matt Arsenault 4227c62bc7 AMDGPU: Move SelectFlatOffset back into AMDGPUISelDAGToDAG
llvm-svn: 374495
2019-10-11 01:28:27 +00:00
Evandro Menezes c57a9dc487 [AMDGPU] Use math constants defined in MathExtras (NFC)
Use the the new math constants in `MathExtras.h`.

Differential revision: https://reviews.llvm.org/D68285

llvm-svn: 374208
2019-10-09 20:00:43 +00:00
Thomas Raoux 3c8c667235 [TargetLowering] Make allowsMemoryAccess methode virtual.
Rename old function to explicitly show that it cares only about alignment.
The new allowsMemoryAccess call the function related to alignment by default
and can be overridden by target to inform whether the memory access is legal or
not.

Differential Revision: https://reviews.llvm.org/D67121

llvm-svn: 372935
2019-09-26 00:16:01 +00:00
Simon Pilgrim 557cee337b [AMDGPU] isSDNodeAlwaysUniform - silence static analyzer dyn_cast<LoadSDNode> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<LoadSDNode> directly and if not assert will fire for us.

llvm-svn: 372528
2019-09-22 21:01:13 +00:00
Graham Hunter 1a9195d817 [SVE][MVT] Fixed-length vector MVT ranges
* Reordered MVT simple types to group scalable vector types
    together.
  * New range functions in MachineValueType.h to only iterate over
    the fixed-length int/fp vector types.
  * Stopped backends which don't support scalable vector types from
    iterating over scalable types.

Reviewers: sdesmalen, greened

Reviewed By: greened

Differential Revision: https://reviews.llvm.org/D66339

llvm-svn: 372099
2019-09-17 10:19:23 +00:00
Matt Arsenault 64ecca90d4 AMDGPU/GlobalISel: Implement LDS G_GLOBAL_VALUE
Handle the simple case that lowers to a constant.

llvm-svn: 371424
2019-09-09 17:13:44 +00:00
Matt Arsenault acc9571406 AMDGPU: Remove pointless wrapper nodes for init.exec intrinsics
llvm-svn: 371364
2019-09-09 05:49:52 +00:00
Matt Arsenault 59ff77ee38 AMDGPU: Fix emitting multiple stack loads for stack passed workitems
The same stack is loaded for each workitem ID, and each use. Nothing
prevents you from creating multiple fixed stack objects with the same
offsets, so this was creating a load for each unique frame index,
despite them being the same offset. Re-use the same frame index so the
loads are CSEable.

llvm-svn: 371148
2019-09-05 23:40:14 +00:00
Matt Arsenault ede9a5293d AMDGPU: Remove unused custom node definition
llvm-svn: 370603
2019-09-01 02:00:08 +00:00
Matt Arsenault 0a6564980b AMDGPU: Combine directly on mul24 intrinsics
The problem these are supposed to work around can occur before the
intrinsics are lowered into the nodes. Try to directly simplify them
so they are matched before the bit assert operations can be optimized
out.

llvm-svn: 369994
2019-08-27 00:18:09 +00:00
Craig Topper 3f59bfd5be [MVT] Add v16f16 and v32f16 vectors.
I might look at improving PR43065 which will require being
able to mark a 256 and 512 bit vector of f16 as Legal.

Differential Revision: https://reviews.llvm.org/D66515

llvm-svn: 369565
2019-08-21 19:14:48 +00:00
Matt Arsenault 1f2b727298 MVT: Add v3i16/v3f16 vectors
AMDGPU has some buffer intrinsics which theoretically could use
this. Some of the generated tables include the 3 and 4 element vector
versions of these rounded to 64-bits, which is ambiguous. Add these to
help the table disambiguate these.

Assertion change is for the path odd sized vectors now take for R600.
v3i16 is widened to v4i16, which then needs to be promoted to v4i32.

llvm-svn: 369038
2019-08-15 18:58:25 +00:00
Austin Kerbow a05c384132 Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10
Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.

Reviewers: arsenm, rampitec

Reviewed By: arsenm, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65620

llvm-svn: 367969
2019-08-06 02:16:11 +00:00
Dmitri Gribenko 37aa8ad663 Revert "[AMDGPU] Use S_DENORM_MODE for gfx10"
This reverts commit r367882. It broke the test
MC/Disassembler/AMDGPU/gfx10_dasm_all.txt.

llvm-svn: 367904
2019-08-05 18:36:43 +00:00
Austin Kerbow 8d229dbb47 [AMDGPU] Use S_DENORM_MODE for gfx10
Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.

Reviewers: arsenm, rampitec

Reviewed By: arsenm, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65620

llvm-svn: 367882
2019-08-05 16:09:49 +00:00
Nicolai Haehnle e204786b6c AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec}
Summary:
Wrapping increment/decrement. These aren't exposed by many APIs...

Change-Id: I1df25c7889de5a5ba76468ad8e8a2597efa9af6c

Reviewers: arsenm, tpr, dstuttard

Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65283

llvm-svn: 367821
2019-08-05 09:36:06 +00:00
Simon Pilgrim c60c12fb10 Fix MSVC warning about extending a uint32_t shift result to uint64_t. NFCI.
llvm-svn: 366808
2019-07-23 14:04:54 +00:00
Matt Arsenault 1022c0dfde AMDGPU: Decompose all values to 32-bit pieces for calling conventions
This is the more natural lowering, and presents more opportunities to
reduce 64-bit ops to 32-bit.

This should also help avoid issues graphics shaders have had with
64-bit values, and simplify argument lowering in globalisel.

llvm-svn: 366578
2019-07-19 13:57:44 +00:00
Matt Arsenault 35c96598b1 AMDGPU/GlobalISel: Select flat loads
Now that the patterns use the new PatFrag address space support, the
only blocker to importing most load patterns is the addressing mode
complex patterns.

llvm-svn: 366237
2019-07-16 18:05:29 +00:00
Stanislav Mekhanoshin 1dfae6fe50 [AMDGPU] use v32f32 for 3 mfma intrinsics
These should really use v32f32, but were defined as v32i32
due to the lack of the v32f32 type.

Differential Revision: https://reviews.llvm.org/D64667

llvm-svn: 365972
2019-07-12 22:42:01 +00:00
Stanislav Mekhanoshin e67cc380a8 [AMDGPU] gfx908 mfma support
Differential Revision: https://reviews.llvm.org/D64584

llvm-svn: 365824
2019-07-11 21:19:33 +00:00
Jay Foad c1b7db9eda Remove some redundant code from r290372 and improve a comment.
llvm-svn: 365741
2019-07-11 08:49:52 +00:00
Stanislav Mekhanoshin e93279fd1b [AMDGPU] gfx908 atomic fadd and atomic pk_fadd
Differential Revision: https://reviews.llvm.org/D64435

llvm-svn: 365717
2019-07-11 00:10:17 +00:00
Craig Topper 84a1f07363 [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it
Basically the problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores. But all vector loads and stores of the same width
are the same cost on X86.

This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.

Differential Revision: https://reviews.llvm.org/D64295

llvm-svn: 365549
2019-07-09 19:55:28 +00:00
Stanislav Mekhanoshin 07fd88d735 [AMDGPU] Packed thread ids in function call ABI
Differential Revision: https://reviews.llvm.org/D63851

llvm-svn: 364619
2019-06-28 01:52:13 +00:00
Nicolai Haehnle 2710171a15 AMDGPU: Write LDS objects out as global symbols in code generation
Summary:
The symbols use the processor-specific SHN_AMDGPU_LDS section index
introduced with a previous change. The linker is then expected to resolve
relocations, which are also emitted.

Initially disabled for HSA and PAL environments until they have caught up
in terms of linker and runtime loader.

Some notes:

- The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered
  to a constant at compile times, which means some tests can no longer
  be applied.

  The current "solution" is a terrible hack, but the intrinsic isn't
  used by Mesa, so we can keep it for now.

- We no longer know the full LDS size per kernel at compile time, which
  means that we can no longer generate a relevant error message at
  compile time. It would be possible to add a check for the size of
  individual variables, but ultimately the linker will have to perform
  the final check.

Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275

Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin

Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61494

llvm-svn: 364297
2019-06-25 11:52:30 +00:00
Simon Pilgrim 4e0648a541 [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123)
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.

This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.

If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.

Differential Revision: https://reviews.llvm.org/D63075

llvm-svn: 363179
2019-06-12 17:14:03 +00:00
Matt Arsenault c5830f5f05 AtomicExpand: Don't crash on non-0 alloca
This now produces garbage on AMDGPU with a call to an nonexistent,
anonymous libcall but won't assert.

llvm-svn: 363022
2019-06-11 01:35:07 +00:00
Matt Arsenault 383e72fcfe AMDGPU: Expand < 32-bit atomics
Also fix AtomicExpand asserting on atomicrmw fadd/fsub.

llvm-svn: 363021
2019-06-11 01:35:00 +00:00
Ryan Taylor 29257eb76c [AMDGPU] Increases available SGPR for Calling Convention
Summary:
SGPR in CC can be either hw initialized or set by other chained shaders
and so this increases the SGPR count availalbe to CC to 105.

Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61261

llvm-svn: 360778
2019-05-15 14:43:55 +00:00
Simon Pilgrim e3eec06dde [AMDGPU] Reapplied BFE canonicalization from D60462
This was committed in rL358887 but reverted in rL360066 due to a x86 regression, really it should be have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch.

llvm-svn: 360263
2019-05-08 15:49:10 +00:00
Craig Topper 55a71b575c Revert r359392 and r358887
Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"

Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.

Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead
moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer
contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with
respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work
there. I'll file a separate PR for that and add test cases.

Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised
if that bug can still be hit independent of that.

This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.

llvm-svn: 360066
2019-05-06 19:29:24 +00:00