Commit Graph

2645 Commits

Author SHA1 Message Date
Jay Foad 6e18266aa4 Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"
Summary:
D61491 caused us to use relocs when they're not strictly necessary, to
refer to symbols in the text section. This is a pessimization and it's a
problem for some loaders that don't support relocs yet.

Reviewers: nhaehnle, arsenm, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65813

llvm-svn: 370667
2019-09-02 14:40:57 +00:00
Piotr Sobczak 252a584cbd [AMDGPU] Add test
Summary:
Add test checking that the redundant immediate MOV instruction
(by-product of handling phi nodes) is not found in the generated code.

Reviewers: arsenm, anton-afanasyev, craig.topper, rtereshin, bogner

Reviewed By: arsenm

Subscribers: kzhuravl, yaxunl, dstuttard, tpr, t-tye, wdng, jvesely, nhaehnle, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63860

llvm-svn: 370634
2019-09-02 10:02:54 +00:00
Matt Arsenault cbd1782c79 AMDGPU/GlobalISel: Legalize sin/cos
llvm-svn: 370402
2019-08-29 20:06:48 +00:00
Jordan Rupprecht f9f81289e6 Revert [MBP] Disable aggressive loop rotate in plain mode
This reverts r369664 (git commit 51f48295cb)

It causes many benchmark regressions, internally and in llvm's benchmark suite.

llvm-svn: 370398
2019-08-29 19:03:58 +00:00
Matt Arsenault 216d8ff60b AMDGPU: Don't use frame virtual registers
SGPR spills aren't really handled after SILowerSGPRSpills. In order to
directly control what happens if the scavenger needs to spill, the
scavenger needs to be used directly. There is an alternative to
spilling in these contexts anyway since the frame register can be
increment and restored.

This does present another possible issue if spilling is needed for the
unused carry out if an add is needed. I think this can be avoided by
using a scalar add (although that clobbers SCC, which happens anyway).

llvm-svn: 370281
2019-08-29 01:13:47 +00:00
Matt Arsenault 8ec5c10042 GlobalISel/TableGen: Handle setcc patterns
This is a special case because one node maps to two different G_
instructions, and the operand order is changed.

This mostly enables G_FCMP for AMDPGPU. G_ICMP is still manually
selected for now since it has the SALU and VALU complication to deal
with.

llvm-svn: 370280
2019-08-29 01:13:41 +00:00
Ryan Taylor 3b1459ed7c [AMDGPU] Adjust number of SGPRs available in Calling Convention
This reduces the number of SGPRs due to some concerns about running
out of SGPRs if you make all the SGPRs that aren't reserved available
for the calling convention.

Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1
llvm-svn: 370215
2019-08-28 15:00:45 +00:00
Matt Arsenault a8bbcbd006 AMDGPU/GlobalISel: Fix constraining scalar and/or/xor
If the result register already had a register class assigned, the
sources may not have been properly constrained.

llvm-svn: 370150
2019-08-28 02:11:03 +00:00
Matt Arsenault 5c7e96dc26 AMDGPU/GlobalISel: Implement addrspacecast for 32-bit constant addrspace
llvm-svn: 370140
2019-08-28 00:58:24 +00:00
Matt Arsenault 2910184936 DAG: computeNumSignBits for MUL
Copied directly from the IR version.

Most of the testcases I've added for this are somewhat problematic
because they really end up testing the yet to be implemented version
for MUL_I24/MUL_U24.

llvm-svn: 370099
2019-08-27 19:05:33 +00:00
Matt Arsenault 2797474dbb AMDGPU: Add baseline test for num sign bits of mul
llvm-svn: 370098
2019-08-27 19:01:02 +00:00
Matt Arsenault 0c096da02f AMDGPU: Fix crash from inconsistent register types for v3i16/v3f16
This is something of a workaround since computeRegisterProperties
seems to be doing the wrong thing.

llvm-svn: 370086
2019-08-27 17:51:56 +00:00
Sanjay Patel b516f1afdd [DAGCombiner] cancel fnegs from multiplied operands of FMA
(-X) * (-Y) + Z --> X * Y + Z

This is a missing optimization that shows up as a potential regression in D66050,
so we should solve it first. We appear to be partly missing this fold in IR as well.

We do handle the simpler case already:
(-X) * (-Y) --> X * Y

And it might be beneficial to make the constraint less conservative (eg, if both
operands are cheap, but not necessarily cheaper), but that causes infinite looping
for the existing fmul transform.

Differential Revision: https://reviews.llvm.org/D66755

llvm-svn: 370071
2019-08-27 15:17:46 +00:00
Petar Avramovic d568ed40e0 [GlobalISel] Fix narrowScalar for shifts to match algorithm from SDAG
Fix typos. Use Hi and Lo prefixes for Or instead of LHS and RHS
to match names of surrounding variables.

Differential Revision: https://reviews.llvm.org/D66587

llvm-svn: 370062
2019-08-27 14:22:32 +00:00
Matt Arsenault 0a6564980b AMDGPU: Combine directly on mul24 intrinsics
The problem these are supposed to work around can occur before the
intrinsics are lowered into the nodes. Try to directly simplify them
so they are matched before the bit assert operations can be optimized
out.

llvm-svn: 369994
2019-08-27 00:18:09 +00:00
Matt Arsenault 3b95986a32 AMDGPU: Run AMDGPUCodeGenPrepare after scalar opts
The mul24 matching could interfere with SLSR and the other addressing
mode related passes. This probably is not the optimal placement, but
is an intermediate step. This should probably be moved after all the
generic IR passes, particularly LSR. Moving this after LSR seems to
help in some cases, and hurts others.

As-is in this patch, in idiv-licm, it saves 1-2 instructions inside
some of the loop bodies, but increases the number in others. Moving
this later helps these loops. In the new lsr tests in
mul24-pass-ordering, the intrinsic prevents introducing more
instructions in the loop preheader, so moving this later ends up
hurting them. This shouldn't be any worse than before the intrinsics
were introduced in r366094, and LSR should probably be smarter. I
think it's because it doesn't know the and inside the loop will be
folded away.

llvm-svn: 369991
2019-08-27 00:08:31 +00:00
Matt Arsenault 74115ef791 AMDGPU: Add baseline test for mul24 ordering issues
llvm-svn: 369858
2019-08-24 22:22:38 +00:00
Matt Arsenault b3dd381a73 AMDGPU: Introduce a flag to disable mul24 intrinsic formation
llvm-svn: 369856
2019-08-24 22:14:41 +00:00
Matt Arsenault c4dd1d1873 AMDGPU: Generate check lines
Checking all the instructions will help catch LICM changes when passes
are reordered. Also switch to using gfx9 since global stores make the
relevant instructions more obvious.

llvm-svn: 369855
2019-08-24 22:14:37 +00:00
Stanislav Mekhanoshin 8fe1245a0f [AMDGPU] w/a for gfx908 mfma SrcC literal HW bug
gfx908 ignores an mfma if SrcC is a literal.

Differential Revision: https://reviews.llvm.org/D66670

llvm-svn: 369816
2019-08-23 22:09:58 +00:00
Volkan Keles 277631e3b8 [GlobalISel] Legalizer: Retry combining illegal artifacts as long as there new artifacts
Summary:
Currently, Legalizer aborts if it’s unable to legalize artifacts. However, it’s
possible to combine them after processing the rest of the instruction because
the legalization is likely to generate more artifacts that allow ArtifactCombiner
to combine away them.

Instead, move illegal artifacts to another list called RetryList and wait until all of the
instruction in InstList are legalized. After that, check if there is any new artifacts and
try to combine them again if that’s the case. If not, abort. The idea is similar to D59339,
but the approach is a bit different.

This patch fixes the issue described above, but the legalizer still may be unable to handle
some cases depending on when to legalize artifacts. So, in the long run, we probably need
a different legalization strategy that handles this dependency in a better way.

Reviewers: dsanders, aditya_nandakumar, qcolombet, arsenm, aemerson, paquette

Reviewed By: dsanders

Subscribers: jvesely, wdng, nhaehnle, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65894

llvm-svn: 369805
2019-08-23 20:30:35 +00:00
Amaury Sechet 05f56a1ddd [AMDGPU] Automatically generate various tests. NFC
llvm-svn: 369787
2019-08-23 17:58:49 +00:00
Jay Foad eac23862a8 [AMDGPU] gfx10 atomic optimizer changes.
Summary:
Add support for gfx10, where all DPP operations are confined to work
within a single row of 16 lanes, and wave32.

Reviewers: arsenm, sheredom, critson, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65644

llvm-svn: 369745
2019-08-23 10:07:43 +00:00
Matt Arsenault fba82858f2 GlobalISel: Don't create G_UADDE with constant false carry in
The x86 tests are now broken (in paticular add-scalar.ll now hits the
DAG fallback) due to not handling G_UADDO. The DAG x86 backend has a
custom lowering for this, so that will need to be implemented.

llvm-svn: 369673
2019-08-22 17:29:17 +00:00
Guozhi Wei 51f48295cb [MBP] Disable aggressive loop rotate in plain mode
Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.

To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.

Differential Revision: https://reviews.llvm.org/D65673

llvm-svn: 369664
2019-08-22 16:21:32 +00:00
Matt Arsenault 954a012b4c GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sources
This is necessary for handling <3 x s16> on AMDGPU, assuming this
should be handled as 2 separate legalization actions. The alternative
would be for fewerElementsVector to handle 3->2.

llvm-svn: 369547
2019-08-21 16:59:10 +00:00
Alexander Timofeev 78347c979e [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions
Differential Revision: https://reviews.llvm.org/D63731
Reviewers: qcolombet, rampitec

llvm-svn: 369532
2019-08-21 15:15:04 +00:00
Martin Storsjo 100957153a [test] Fix tests when run on windows after SVN r369426. NFC.
When running tests on windows, invoking "llc -march=<arch>" will
implicitly use windows as the target os, making these tests misbehave
after this change.

Fix the issue by using more specific -mtriple values instead of plain
-march in these tests.

This should hopefully fix buildbot failures like
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/9816.

llvm-svn: 369443
2019-08-20 20:58:02 +00:00
Matt Arsenault 4b7fc85c0b Revert "AMDGPU: Fix iterator error when lowering SI_END_CF"
This reverts r367500 and r369203. This is causing various test
failures.

llvm-svn: 369417
2019-08-20 17:45:25 +00:00
Matt Arsenault 479f3bdb2c AMDGPU: Fix iterator error when lowering SI_END_CF
If the instruction is the last in the block, there is no next
instruction but the iteration still needs to look at the new block.

llvm-svn: 369203
2019-08-18 00:20:44 +00:00
Matt Arsenault 0b864bb043 AMDGPU: Reduce number of registers in test
Once the failure this is testing is fixed, this would fail due to
using too many registers.

llvm-svn: 368901
2019-08-14 19:09:48 +00:00
Aditya Nandakumar c65ac865c3 [GlobalISel]: Fix lowering of G_Shuffle_vector where we pick up the wrong source index
https://reviews.llvm.org/D66182

llvm-svn: 368781
2019-08-14 01:23:33 +00:00
Aditya Nandakumar 615eee6402 [GlobalISel]: Fix lowering of G_SHUFFLE_VECTOR with scalar sources
https://reviews.llvm.org/D66171

llvm-svn: 368753
2019-08-13 21:49:11 +00:00
Tim Renouf 10db641aab [AMDGPU] Fix to 'Fold readlane from copy of SGPR or imm'
That change (r363670) could leave a copy from vgpr to sgpr. Fixed.

Differential Revision: https://reviews.llvm.org/D66133

Change-Id: I00c3fe6fda2e8e1e36f53195b881b1449c777ea4
llvm-svn: 368736
2019-08-13 18:57:55 +00:00
Matt Arsenault 28215caa60 GlobalISel: Partially implement fewerElementsVector G_UNMERGE_VALUES
Odd sized vectors aren't handled yet.

llvm-svn: 368713
2019-08-13 16:26:28 +00:00
Matt Arsenault 690645bda0 GlobalISel: Implement lower for G_SHUFFLE_VECTOR
llvm-svn: 368709
2019-08-13 16:09:07 +00:00
Stanislav Mekhanoshin 4c9c98f36b [AMDGPU] Printf runtime binding pass
This pass is a port of the according pass from the HSAIL compiler.
It parses printf calls and setup runtime printf buffer.
After that it copies printf arguments to the buffer and fills in
module metadata for runtime.

Differential Revision: https://reviews.llvm.org/D24035

llvm-svn: 368592
2019-08-12 17:12:29 +00:00
Hans Wennborg a45f301f7a Revert r368339 "[MBP] Disable aggressive loop rotate in plain mode"
It caused assertions to fire when building Chromium:

  lib/CodeGen/LiveDebugValues.cpp:331: bool
  {anonymous}::LiveDebugValues::OpenRangesSet::empty() const: Assertion
  `Vars.empty() == VarLocs.empty() && "open ranges are inconsistent"' failed.

See https://crbug.com/992871#c3 for how to reproduce.

> Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.
>
> To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
>
> Differential Revision: https://reviews.llvm.org/D65673

llvm-svn: 368579
2019-08-12 14:23:13 +00:00
Daniel Sanders e9a57c2b23 [globalisel] Add G_SEXT_INREG
Summary:
Targets often have instructions that can sign-extend certain cases faster
than the equivalent shift-left/arithmetic-shift-right. Such cases can be
identified by matching a shift-left/shift-right pair but there are some
issues with this in the context of combines. For example, suppose you can
sign-extend 8-bit up to 32-bit with a target extend instruction.
  %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
  %2:_(s32) = G_ASHR %1:_(s32), i32 24
  %3:_(s32) = G_ASHR %2:_(s32), i32 1
would reasonably combine to:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 25
which no longer matches the special case. If your shifts and extend are
equal cost, this would break even as a pair of shifts but if your shift is
more expensive than the extend then it's cheaper as:
  %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
  %3:_(s32) = G_ASHR %2:_(s32), i32 1
It's possible to match the shift-pair in ISel and emit an extend and ashr.
However, this is far from the only way to break this shift pair and make
it hard to match the extends. Another example is that with the right
known-zeros, this:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 24
  %3:_(s32) = G_MUL %2:_(s32), i32 2
can become:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 23

All upstream targets have been configured to lower it to the current
G_SHL,G_ASHR pair but will likely want to make it legal in some cases to
handle their faster cases.

To follow-up: Provide a way to legalize based on the constant. At the
moment, I'm thinking that the best way to achieve this is to provide the
MI in LegalityQuery but that opens the door to breaking core principles
of the legalizer (legality is not context sensitive). That said, it's
worth noting that looking at other instructions and acting on that
information doesn't violate this principle in itself. It's only a
violation if, at the end of legalization, a pass that checks legality
without being able to see the context would say an instruction might not be
legal. That's a fairly subtle distinction so to give a concrete example,
saying %2 in:
  %1 = G_CONSTANT 16
  %2 = G_SEXT_INREG %0, %1
is legal is in violation of that principle if the legality of %2 depends
on %1 being constant and/or being 16. However, legalizing to either:
  %2 = G_SEXT_INREG %0, 16
or:
  %1 = G_CONSTANT 16
  %2:_(s32) = G_SHL %0, %1
  %3:_(s32) = G_ASHR %2, %1
depending on whether %1 is constant and 16 does not violate that principle
since both outputs are genuinely legal.

Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm

Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61289

llvm-svn: 368487
2019-08-09 21:11:20 +00:00
Guozhi Wei 80347c3acc [MBP] Disable aggressive loop rotate in plain mode
Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.

To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.

Differential Revision: https://reviews.llvm.org/D65673

llvm-svn: 368339
2019-08-08 20:25:23 +00:00
Tim Renouf 5a0794327a [StructurizeCFG] Enable -structurizecfg-relaxed-uniform-regions by default
D62198 introduced an option to relax the checks for
hasOnlyUniformBranches. This commit turns the option on by default, for
better code generation in some cases in AMDGPU.

Differential Revision: https://reviews.llvm.org/D63198

Change-Id: I9cbff002a1e74d3b7eb96b4192dc8129936d537d
llvm-svn: 368042
2019-08-06 14:30:19 +00:00
Austin Kerbow a05c384132 Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10
Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.

Reviewers: arsenm, rampitec

Reviewed By: arsenm, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65620

llvm-svn: 367969
2019-08-06 02:16:11 +00:00
Dmitri Gribenko 37aa8ad663 Revert "[AMDGPU] Use S_DENORM_MODE for gfx10"
This reverts commit r367882. It broke the test
MC/Disassembler/AMDGPU/gfx10_dasm_all.txt.

llvm-svn: 367904
2019-08-05 18:36:43 +00:00
Austin Kerbow 8d229dbb47 [AMDGPU] Use S_DENORM_MODE for gfx10
Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.

Reviewers: arsenm, rampitec

Reviewed By: arsenm, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65620

llvm-svn: 367882
2019-08-05 16:09:49 +00:00
Tom Stellard e15d95a987 AMDGPU/LoadStoreOptimizer: Set the correct offset whem merging MMOs
Summary:
This is a follow up to r367237.  MachineFunction::getMachineMemOperand()
adds the offset parameter to the existing offset instead of resetting it.
So we need to reset the offset to the correct value after calling this
function.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65557

llvm-svn: 367881
2019-08-05 16:08:44 +00:00
Matt Arsenault 3922392969 AMDGPU: Correct behavior of f16 buffer loads
Don't assume format loads for f16. Also fixes support for targets
without i16.

llvm-svn: 367879
2019-08-05 15:59:07 +00:00
Matt Arsenault 0e0a1c80fb AMDGPU: Correct behavior of f16/i16 non-format store intrinsics
This was switching to use a format store for a non-format store for
f16 types. Also fixes i16/f16 stores on targets without legal f16.

The corresponding loads also need to be fixed.

llvm-svn: 367872
2019-08-05 14:57:59 +00:00
Matt Arsenault ff6b007772 AMDGPU/GlobalISel: Alternative mappings for constants
Without context we assume SGPR. Allowing VGPR constants theoretically
helps avoid a copy. This seems to not actually work now, and the
choice isn't based on the use bank.

llvm-svn: 367871
2019-08-05 14:40:26 +00:00
Nicolai Haehnle e204786b6c AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec}
Summary:
Wrapping increment/decrement. These aren't exposed by many APIs...

Change-Id: I1df25c7889de5a5ba76468ad8e8a2597efa9af6c

Reviewers: arsenm, tpr, dstuttard

Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65283

llvm-svn: 367821
2019-08-05 09:36:06 +00:00
Tim Northover a009a60a91 IR: print value numbers for unnamed function arguments
For consistency with normal instructions and clarity when reading IR,
it's best to print the %0, %1, ... names of function arguments in
definitions.

Also modifies the parser to accept IR in that form for obvious reasons.

llvm-svn: 367755
2019-08-03 14:28:34 +00:00