Commit Graph

1849 Commits

Author SHA1 Message Date
Konstantin Zhuravlyov 2d22d24ac4 Revert r345542: AMDGPU: Enable code object v3 by default
It breaks mesa.

llvm-svn: 345662
2018-10-30 22:02:40 +00:00
Jonas Paulsson 611b533f1d [SchedModel] Fix for read advance cycles with implicit pseudo operands.
The SchedModel allows the addition of ReadAdvances to express that certain
operands of the instructions are needed at a later point than the others.

RegAlloc may add pseudo operands that are not part of the instruction
descriptor, and therefore cannot have any read advance entries. This meant
that in some cases the desired read advance was nullified by such a pseudo
operand, which still had the original latency.

This patch fixes this by making sure that such pseudo operands get a zero
latency during DAG construction.

Review: Matthias Braun, Ulrich Weigand.
https://reviews.llvm.org/D49671

llvm-svn: 345606
2018-10-30 15:04:40 +00:00
Simon Pilgrim 858303b827 [SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodes
Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy.

This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle.	
	
Differential Revision: https://reviews.llvm.org/D53760

llvm-svn: 345578
2018-10-30 10:32:11 +00:00
Matt Arsenault abc4f29f9c AMDGPU: Remove custom BUILD_VECTOR combine
This was looping in a testcase and removing it
now slightly improves a test.

llvm-svn: 345560
2018-10-30 01:37:59 +00:00
Matt Arsenault b0b741efb8 AMDGPU: Use scavengeRegisterBackwards
llvm-svn: 345559
2018-10-30 01:33:14 +00:00
Konstantin Zhuravlyov 5cb950200c AMDGPU: Enable code object v3 by default
Differential Revision: https://reviews.llvm.org/D53525

llvm-svn: 345542
2018-10-29 21:07:27 +00:00
Matthias Braun c045c557b0 Relax fast register allocator related test cases; NFC
- Relex hard coded registers and stack frame sizes
- Some test cleanups
- Change phi-dbg.ll to match on mir output after phi elimination instead
  of going through the whole codegen pipeline.

This is in preparation for https://reviews.llvm.org/D52010
I'm committing all the test changes upfront that work before and after
independently.

llvm-svn: 345532
2018-10-29 20:10:42 +00:00
Stanislav Mekhanoshin 79080ecd82 [AMDGPU] Match v_swap_b32
Differential Revision: https://reviews.llvm.org/D52677

llvm-svn: 345514
2018-10-29 17:26:01 +00:00
Scott Linder 11ef7984b0 [AMDGPU] Add a pass to promote bitcast calls
AMDGPU currently only supports direct calls, but at lower optimisation levels it
fails to lower statically direct calls which appear indirect due to a bitcast.

Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize"
calls.

Differential Revision: https://reviews.llvm.org/D52741

llvm-svn: 345382
2018-10-26 13:18:36 +00:00
Tim Renouf 2a1b1d94b6 [AMDGPU] Defined gfx909 Raven Ridge 2
Differential Revision: https://reviews.llvm.org/D53418

Change-Id: Ie3d054f2e956c2768988c0f4c0ffd29a47294eef
llvm-svn: 345120
2018-10-24 08:14:07 +00:00
Matt Arsenault 687ec75d10 DAG: Change behavior of fminnum/fmaxnum nodes
Introduce new versions that follow the IEEE semantics
to help with legalization that may need quieted inputs.

There are some regressions from inserting unnecessary
canonicalizes when these are matched from fast math
fcmp + select which should be fixed in a future commit.

llvm-svn: 344914
2018-10-22 16:27:27 +00:00
Changpeng Fang f95f763ea5 AMDGPU: Add support pattern for SUB of one bit
Summary:
  Add selection patterns to support one bit Sub.

Reviewers:
  rampitec, arsenm

Differential Revision:
  https://reviews.llvm.org/D52946

llvm-svn: 344815
2018-10-19 21:09:21 +00:00
Nicolai Haehnle 4821937d2e AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI
Summary:
To workaround a hardware issue in the (base + offset) calculation
when base is negative. The impact on code quality should be limited
since SILoadStoreOptimizer still runs afterwards and is able to
combine loads/stores based on known sign information.

This fixes visible corruption in Hitman on SI (easily reproducible
by running benchmark mode).

Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923

Reviewers: arsenm, mareko

Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53160

llvm-svn: 344698
2018-10-17 15:37:48 +00:00
Nicolai Haehnle 0823050b9f StructurizeCFG: Simplify inserted PHI nodes
Summary:
This improves subsequent divergence analysis in some cases.

Change-Id: I5e95e7ec7fd3fa80d414d1a53a02fea23e3d67d3

Reviewers: arsenm, rampitec

Subscribers: jvesely, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D53316

llvm-svn: 344697
2018-10-17 15:37:41 +00:00
Nicolai Haehnle c4a2ff0950 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

llvm-svn: 344696
2018-10-17 15:37:30 +00:00
Nicolai Haehnle 6d92ca61f9 StructurizeCFG,AMDGPU: Test case of a redundant phi and codegen consequences
Change-Id: I9681f9e41ca30f82576f3d1f965c3a550a34b171
llvm-svn: 344569
2018-10-15 22:37:46 +00:00
Konstantin Zhuravlyov 94dfcc2eb2 AMDGPU: Generate .amdgcn_target for object code v3
Differential Revision: https://reviews.llvm.org/D53221

llvm-svn: 344552
2018-10-15 20:37:47 +00:00
Nicolai Haehnle 1bb242e201 AMDGPU: Test showing a scalar buffer load deficiency
Change-Id: I5b64a565f22a8482aa0712488d85e45163ac3d12
llvm-svn: 344506
2018-10-15 11:37:04 +00:00
Tom Stellard a894043910 Revert "AMDGPU/GlobalISel: Implement select for G_INSERT"
This reverts commit r344310.

The test case was failing on some bots.

llvm-svn: 344317
2018-10-11 23:36:46 +00:00
Tom Stellard 4733be6e7b AMDGPU/GlobalISel: Implement select for G_INSERT
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53116

llvm-svn: 344310
2018-10-11 22:49:54 +00:00
Scott Linder 823549a6ec [AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions
Emit a waterfall loop in the general case for a potentially-divergent Rsrc
operand. When practical, avoid this by using Addr64 instructions.

Recommits r341413 with changes to update the MachineDominatorTree when present.

Differential Revision: https://reviews.llvm.org/D51742

llvm-svn: 343992
2018-10-08 18:47:01 +00:00
Tom Stellard 14d8807d9a AMDGPU/GlobalISel: Select amdgcn.cvt.pkrtz to 64-bit instructions
Summary: The 32-bit variants do not exist on VI+.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52958

llvm-svn: 343985
2018-10-08 17:49:29 +00:00
Nicolai Haehnle ea36cd595c AMDGPU: Future-proof {raw,struct}.buffer.atomic intrinsics
Summary:
The ISA is really supposed to support 64-bit atomics as well,
so the data type should be an overload.

Mesa doesn't use these atomics yet, in fact I noticed this
issue while trying to use the atomics from Mesa.

Change-Id: I77f58317a085a0d3eb933cc7e99308c48a19f83e

Reviewers: tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D52291

llvm-svn: 343978
2018-10-08 16:53:48 +00:00
Neil Henning 6641657453 [AMDGPU] Add an AMDGPU specific atomic optimizer.
This commit adds a new IR level pass to the AMDGPU backend to perform
atomic optimizations. It works by:

- Running through a function and finding atomicrmw add/sub or uses of
  the atomic buffer intrinsics for add/sub.
- If all arguments except the value to be added/subtracted are uniform,
  record the value to be optimized.
- Run through the atomic operations we can optimize and, depending on
  whether the value is uniform/divergent use wavefront wide operations
  (DPP in the divergent case) to calculate the total amount to be
  atomically added/subtracted.
- Then let only a single lane of each wavefront perform the atomic
  operation, reducing the total number of atomic operations in flight.
- Lastly we recombine the result from the single lane to each lane of
  the wavefront, and calculate our individual lanes offset into the
  final result.

Differential Revision: https://reviews.llvm.org/D51969

llvm-svn: 343973
2018-10-08 15:49:19 +00:00
Tom Stellard 7c65078f04 AMDGPU/GlobalISel: Add support for G_INTTOPTR
Summary: This is a no-op.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52916

llvm-svn: 343839
2018-10-05 04:34:09 +00:00
Konstantin Zhuravlyov aa067cb9fb AMDGPU: Rename isAmdCodeObjectV2 -> isAmdHsaOrMesa
The isAmdCodeObjectV2 is a misleading name which actually checks whether the os
is amdhsa or mesa.

Also add a test to make sure we do not generate old kernel header for code
object v3.

Differential Revision: https://reviews.llvm.org/D52897

llvm-svn: 343813
2018-10-04 21:02:16 +00:00
Farhana Aleen 4bc597bff5 [AMDGPU] Match signed dot4/8 pattern.
Summary: This patch matches signed dot4 and dot8 pattern.

Author: FarhanaAleen

Reviewed By: msearles

Differential Revision: https://reviews.llvm.org/D52520

llvm-svn: 343798
2018-10-04 16:57:37 +00:00
Tim Renouf a37679d67b [AMDGPU] Fix for negative offsets in buffer/tbuffer intrinsics
Summary:
The new buffer/tbuffer intrinsics handle an out-of-range immediate
offset by moving/adding offset&-4096 to a vgpr, leaving an in-range
immediate offset, with a chance of the move/add being CSEd for similar
loads/stores.

However it turns out that a negative offset in a vgpr is illegal, even
if adding the immediate offset makes it legal again.

Therefore, this commit disables the offset&-4096 thing if the offset is
negative.

Differential Revision: https://reviews.llvm.org/D52683

Change-Id: Ie02f0a74f240a138dc2a29d17cfbd9e350e4ed13
llvm-svn: 343672
2018-10-03 10:29:43 +00:00
Fangrui Song 3d76d36059 [AMDGPU] Rename pass "isel" to "amdgpu-isel"
Summary: The AMDGPU target specific pass "isel" is a misleading name.

Reviewers: tstellar, echristo, javed.absar, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D52759

llvm-svn: 343659
2018-10-03 03:38:22 +00:00
Matt Arsenault 635d479322 AMDGPU: Always run AMDGPUAlwaysInline
Even if calls are enabled, it still needs to be run
for forcing inline of functions that use LDS.

llvm-svn: 343657
2018-10-03 02:47:25 +00:00
Matt Arsenault ab41193312 AMDGPU: Expand atomicrmw nand in IR
llvm-svn: 343559
2018-10-02 03:50:56 +00:00
Bjorn Pettersson c2fc53ac90 [PHIElimination] Lower a PHI node with only undef uses as IMPLICIT_DEF
Summary:
The lowering of PHI nodes used to detect if all inputs originated
from IMPLICIT_DEF's. If so the PHI node was replaced by an
IMPLICIT_DEF. Now we also consider undef uses when checking the
inputs. So if all inputs are implicitly defined or undef we
lower the PHI to an IMPLICIT_DEF. This makes
PHIElimination::LowerPHINode more consistent as it checks
both implicit and undef properties at later stages.

Reviewers: MatzeB, tstellar

Reviewed By: MatzeB

Subscribers: jvesely, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D52558

llvm-svn: 343417
2018-09-30 17:26:58 +00:00
Stanislav Mekhanoshin b080adfc0c [AMDGPU] Fold copy (copy vgpr)
This allows to reduce a number of used VGPRs in some cases.

Differential Revision: https://reviews.llvm.org/D52577

llvm-svn: 343249
2018-09-27 18:55:20 +00:00
Stanislav Mekhanoshin 8dfcd83371 [AMDGPU] Fix ds combine with subregs
Differential Revision: https://reviews.llvm.org/D52522

llvm-svn: 343047
2018-09-25 23:33:18 +00:00
Changpeng Fang 6f4922ccc9 AMDGPU: Add Selection patterns to support add of one bit.
Summary:
  We generate s_xor to lower add of i1s in general cases, and s_not to
lower add with a one-bit imm of -1 (true).

Reviewers:
  rampitec

Differential Revision:
  https://reviews.llvm.org/D52518

llvm-svn: 343030
2018-09-25 21:21:18 +00:00
Daniil Fukalov 349b5943b4 [RegAllocGreedy] avoid using physreg candidates that cannot be correctly spilled
For the AMDGPU target if a MBB contains exec mask restore preamble, SplitEditor may get state when it cannot insert a spill instruction.

E.g. for a MIR

bb.100:
    %1 = S_OR_SAVEEXEC_B64 %2, implicit-def $exec, implicit-def $scc, implicit $exec
and if the regalloc will try to allocate a virtreg to the physreg already assigned to virtreg %1, it should insert spill instruction before the S_OR_SAVEEXEC_B64 instruction.
But it is not possible since can generate incorrect code in terms of exec mask.

The change makes regalloc to ignore such physreg candidates.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D52052

llvm-svn: 343004
2018-09-25 18:37:38 +00:00
Sameer Sahasrabuddhe b4f2d1cb68 [AMDGPU] restore r342722 which was reverted with r342743
[AMDGPU] lower-switch in preISel as a workaround for legacy DA

Summary:
The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.

As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate dominator of the now-lowered switch.

llvm-svn: 342956
2018-09-25 09:39:21 +00:00
Stanislav Mekhanoshin 14fefe7f8e [AMDGPU] Remove useless check from test. NFC.
The check for assignment of zero is practically useless
while the assignment moves around with different scheduling.

llvm-svn: 342935
2018-09-25 01:24:54 +00:00
Matt Arsenault f432011d33 AMDGPU: Fix private handling for allowsMisalignedMemoryAccesses
If the alignment is at least 4, this should report true.

Something still seems off with how < 4-byte types are
handled here though.

Fixing this seems to change how some combines get
to where they get, but somehow isn't changing the net
result.

llvm-svn: 342879
2018-09-24 13:18:15 +00:00
Sameer Sahasrabuddhe 0807e94951 revert changes from r342722
"[AMDGPU] lower-switch in preISel as a workaround for legacy DA"

This broke regression tests. The first breakage was noticed here:
http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/23549

llvm-svn: 342743
2018-09-21 16:31:51 +00:00
Sameer Sahasrabuddhe 2de7653fd5 [AMDGPU] lower-switch in preISel as a workaround for legacy DA
Summary:
The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.

As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate dominator of the now-lowered switch.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll

Differential Revision: https://reviews.llvm.org/D52221

llvm-svn: 342722
2018-09-21 11:26:55 +00:00
Alexander Timofeev 36617f0160 [AMDGPU] Divergence driven instruction selection. Part 1.
Summary: This change is the first part of the AMDGPU target description
    change. The aim of it is the effective splitting the vector and scalar
    flows at the selection stage. Selection uses predicate functions based
    on the framework implemented earlier - https://reviews.llvm.org/D35267

    Differential revision: https://reviews.llvm.org/D52019

    Reviewers: rampitec

llvm-svn: 342719
2018-09-21 10:31:22 +00:00
Carl Ritson 6b8d75425e [AMDGPU] Add instruction selection for i1 to f16 conversion
Summary:
This is required for GPUs with 16 bit instructions where f16 is a
legal register type and hence int_to_fp i1 to f16 is not lowered
by legalizing.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52018

Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851
llvm-svn: 342558
2018-09-19 16:32:12 +00:00
Farhana Aleen f5a2848376 [AMDGPU] Match udot8 pattern
Summary: D.u32 = S0.u4[0] * S1.u4[0] +

         S0.u4[1] * S1.u4[1] +
         S0.u4[2] * S1.u4[2] +
         S0.u4[3] * S1.u4[3] +
         S0.u4[4] * S1.u4[4] +
         S0.u4[5] * S1.u4[5] +
         S0.u4[6] * S1.u4[6] +
         S0.u4[7] * S1.u4[7] +
         S2.u32

Author: FarhanaAleen

Reviewed By: arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D51947

llvm-svn: 342497
2018-09-18 16:59:48 +00:00
Matt Arsenault ebf46143ea AMDGPU: Don't form fmed3 if it will require materialization
If there is a single use constant, it can be folded into the
min/max, but not into med3.

llvm-svn: 342443
2018-09-18 02:34:54 +00:00
Matt Arsenault 9d49c449ec AMDGPU: Expand vector canonicalizes
llvm-svn: 342439
2018-09-18 01:51:33 +00:00
David Stuttard 20de3e99b5 [AMDGPU] Ensure trig range reduction only used for subtargets that require it
Summary:
GFX9 and above support sin/cos instructions with a greater range and thus don't
require a fract instruction prior to invocation.

Added a subtarget feature to reflect this and added code to take advantage of
expanded range on GFX9+

Also updated the tests to check correct behaviour

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D51933

Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0
llvm-svn: 342222
2018-09-14 10:27:19 +00:00
Matt Arsenault ff987ac6ea AMDGPU: Fix not preserving alignent in call setups
If an argument was passed on the stack, this
was using the default alignment.

I'm not sure there's an observable change from this. This
was observable due to bugs in expansion of unaligned
loads and stores, but since that is fixed I don't think
this matters much.

llvm-svn: 342133
2018-09-13 12:14:31 +00:00
Matt Arsenault 842cda6312 DAG: Fix expansion of unaligned FP loads and stores
This was trying to scalarizing a scalar FP type,
resulting in an assert.

Fixes unaligned f64 stack stores for AMDGPU.

llvm-svn: 342132
2018-09-13 12:14:23 +00:00
Matt Arsenault 9de2fb58fa AMDGPU: Fix some outdated datalayouts in tests
llvm-svn: 342131
2018-09-13 11:56:28 +00:00