Commit Graph

793 Commits

Author SHA1 Message Date
Matt Arsenault bf67cf7e4b AMDGPU: Remove spurious out branches after a kill
The sequence like this:
  v_cmpx_le_f32_e32 vcc, 0, v0
  s_branch BB0_30
  s_cbranch_execnz BB0_30
  ; BB#29:
  exp null off, off, off, off done vm
  s_endpgm
  BB0_30:
  ; %endif110

is likely wrong. The s_branch instruction will unconditionally jump
to BB0_30 and the skip block (exp done + endpgm) inserted for
performing the kill instruction will never be executed. This results
in a GPU hang with Star Ruler 2.

The s_branch instruction is added during the "Control Flow Optimizer"
pass which seems to re-organize the basic blocks, and we assume
that SI_KILL_TERMINATOR is always the last instruction inside a
basic block. Thus, after inserting a skip block we just go to the
next BB without looking at the subsequent instructions after the
kill, and the s_branch op is never removed.

Instead, we should remove the unconditional out branches and let
skip the two instructions if the exec mask is non-zero.

This patch fixes the GPU hang and doesn't introduce any regressions
with "make check".

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019

Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>

llvm-svn: 292985
2017-01-24 22:18:39 +00:00
Matt Arsenault 7aad8fd8f4 Enable FeatureFlatForGlobal on Volcanic Islands
This switches to the workaround that HSA defaults to
for the mesa path.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 292982
2017-01-24 22:02:15 +00:00
Changpeng Fang c85abbd955 AMDGPU/SI: Give up in promote alloca when a pointer may be captured.
Differential Revision:
  http://reviews.llvm.org/D28970

Reviewer:
  Matt

llvm-svn: 292966
2017-01-24 19:06:28 +00:00
Stanislav Mekhanoshin 22a56f2f5a [AMDGPU] Add VGPR copies post regalloc fix pass
Regalloc creates COPY instructions which do not formally use VALU.
That results in v_mov instructions displaced after exec mask modification.
One pass which do it is SIOptimizeExecMasking, but potentially it can be
done by other passes too.

This patch adds a pass immediately after regalloc to add implicit exec
use operand to all VGPR copy instructions.

Differential Revision: https://reviews.llvm.org/D28874

llvm-svn: 292956
2017-01-24 17:46:17 +00:00
Wei Ding ee21a36f8a AMDGPU : Add trap handler support.
llvm-svn: 292893
2017-01-24 06:41:21 +00:00
Matt Arsenault 3aef809384 AMDGPU: Custom lower more vector operations
This avoids stack usage.

llvm-svn: 292846
2017-01-23 23:09:58 +00:00
Matt Arsenault 4e305c6c1e DAG: Don't fold vector extract into load if target doesn't want to
Fixes turning a 32-bit scalar load into an extending vector load
for AMDGPU when dynamically indexing a vector.

llvm-svn: 292842
2017-01-23 22:48:53 +00:00
Matt Arsenault a6867fd441 AMDGPU: Combine fp16/fp64 subtarget features
The same control register controls both, and are set to
the same defaults. Keep the old names around as aliases.

llvm-svn: 292837
2017-01-23 22:31:03 +00:00
Matt Arsenault 7030661364 DAG: Allow legalization of fcanonicalize vector types
llvm-svn: 292814
2017-01-23 18:52:26 +00:00
Benjamin Kramer db9e0b659d Fix some broken CHECK lines.
The colon is important.

llvm-svn: 292761
2017-01-22 20:28:56 +00:00
Jan Vesely f170504c41 AMDGPU/R600: Serialize vector trunc stores to private AS
Add DUMMY_CHAIN SDNode to denote stores of interest

Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915
Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411

Differential Revision: https://reviews.llvm.org/D27964

llvm-svn: 292651
2017-01-20 21:24:26 +00:00
Greg Parker 2873766788 [test] Remove a unwanted match for `XFAIL:`.
llvm-svn: 292567
2017-01-20 02:01:04 +00:00
Stanislav Mekhanoshin 6ec3e3a728 [AMDGPU] Prevent spills before exec mask is restored
Inline spiller can decide to move a spill as early as possible in the basic block.
It will skip phis and label, but we also need to make sure it skips instructions
in the basic block prologue which restore exec mask.

Added isPositionLike callback in TargetInstrInfo to detect instructions which
shall be skipped in addition to common phis, labels etc.

Differential Revision: https://reviews.llvm.org/D27997

llvm-svn: 292554
2017-01-20 00:44:31 +00:00
Matt Arsenault 3e6f9b5773 AMDGPU: Disable some fneg combines unless nsz
For -(x + y) -> (-x) + (-y), if x == -y, this would
change the result from -0.0 to 0.0. Since the fma/fmad
combine is an extension of this problem it also
applies there.

fmul should be fine, and I don't think any of the unary
operators or conversions should be a problem either.

llvm-svn: 292473
2017-01-19 06:35:27 +00:00
Matt Arsenault 3b99f12a4e AMDGPU: Remove modifiers from v_div_scale_*
They seem to produce nonsense results when used.

This should be applied to the release branch.

llvm-svn: 292472
2017-01-19 06:04:12 +00:00
Stanislav Mekhanoshin a4e63ead4b [AMDGPU] Do not allow register coalescer to create big superregs
Limit register coalescer by not allowing it to artificially increase
size of registers beyond dword. Such super-registers are in fact
register sequences and not distinct HW registers.

With more super-regs we would need to allocate adjacent registers
and constraint regalloc more than needed. Moreover, our super
registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2,
VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates registers
allocation even more, resulting in excessive spilling.

Differential Revision: https://reviews.llvm.org/D28782

llvm-svn: 292413
2017-01-18 17:30:05 +00:00
Matt Arsenault f411071d63 DAG: Consider nnan in isKnownNeverNaN
llvm-svn: 292328
2017-01-18 02:10:08 +00:00
Matt Arsenault 4165efdc58 AMDGPU: Add replacement export intrinsics
llvm-svn: 292205
2017-01-17 07:26:53 +00:00
Jan Vesely 334f51a6fe ADMGPU/EG,CM: Implement _noret global atomics
_RTN versions will be a lot more complicated

Differential Revision: https://reviews.llvm.org/D28067

llvm-svn: 292162
2017-01-16 21:20:13 +00:00
Konstantin Zhuravlyov 7d88275577 [AMDGPU] Implement f16 fcopysign and fcopysign(f32, f64)
Differential Revision: https://reviews.llvm.org/D28496

llvm-svn: 291954
2017-01-13 19:49:25 +00:00
Matt Arsenault 45337df08f AMDGPU: Skip fneg/select combine if it can fold into other
llvm-svn: 291792
2017-01-12 18:58:15 +00:00
Matt Arsenault 31c039ef2e AMDGPU: Fold free fneg into sin
llvm-svn: 291790
2017-01-12 18:48:09 +00:00
Matt Arsenault a8c325e2f5 AMDGPU: Fold fneg into fmul_legacy
llvm-svn: 291784
2017-01-12 18:26:30 +00:00
Matt Arsenault ff7e5aadf5 AMDGPU: Fold fneg into rcp
llvm-svn: 291779
2017-01-12 17:46:35 +00:00
Matt Arsenault 4242d48c36 AMDGPU: Fold fneg into fp_round
llvm-svn: 291778
2017-01-12 17:46:33 +00:00
Matt Arsenault 98d2bf1024 AMDGPU: Fold fneg into fp_extend
llvm-svn: 291777
2017-01-12 17:46:28 +00:00
Matt Arsenault 63f953795e AMDGPU: Fold fneg into fma or fmad
Patch mostly by Fiona Glaser

llvm-svn: 291733
2017-01-12 00:32:16 +00:00
Matt Arsenault 4103a81d6d AMDGPU: Fold fneg into fmul
Patch mostly by Fiona Glaser

llvm-svn: 291732
2017-01-12 00:23:20 +00:00
Matt Arsenault 2529fba989 AMDGPU: Fold fneg into fadd
Patch mostly by Fiona Glaser

llvm-svn: 291731
2017-01-12 00:09:34 +00:00
Matt Arsenault 2a04ff97ad AMDGPU: Pull fneg/fabs out of a select
Allows better source modifier usage.

llvm-svn: 291729
2017-01-11 23:57:38 +00:00
Matt Arsenault 24a1273ae1 AMDGPU: Fix shrinking of addc/subb.
To shrink to VOP2 the input carry must also be VCC.

llvm-svn: 291720
2017-01-11 22:58:12 +00:00
Matt Arsenault 682eb4396a AMDGPU: Fix sext_inreg for i1 in i16
This produces worse code when i16 is legal, mostly
due to combines getting confused by conversions inserted
for uniform 16-bit operations.

llvm-svn: 291717
2017-01-11 22:35:22 +00:00
Matt Arsenault 28bd4cbeaf AMDGPU: Fix breaking VOP3 v_add_i32s
This was shrinking the instruction even though the carry output
register was a virtual register, not known VCC.

llvm-svn: 291716
2017-01-11 22:35:17 +00:00
Matt Arsenault 69e3001b84 AMDGPU: Fix folding immediates into mac src2
Whether it is legal or not needs to check for the instruction
it will be replaced with.

llvm-svn: 291711
2017-01-11 22:00:02 +00:00
Kyle Butt efe56fed12 Revert "CodeGen: Allow small copyable blocks to "break" the CFG."
This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded.

This needs a simple probability check because there are some cases where it is
not profitable.

llvm-svn: 291695
2017-01-11 19:55:19 +00:00
Matt Arsenault e482403e1c DAGCombiner: Add hasOneUse checks to fadd/fma combine
Even with aggressive fusion enabled, this requires duplicating
the fmul, or increases an fadd to another fma which is not an
improvement.

llvm-svn: 291642
2017-01-11 02:02:12 +00:00
Jan Vesely 0d6cb1caaf AMDGPU/EG,CM: Add fp16 conversion instructions
Differential Revision: https://reviews.llvm.org/D28164

llvm-svn: 291622
2017-01-11 00:12:39 +00:00
Matt Arsenault 51818c14b3 AMDGPU: Constant fold when immediate is materialized
In future commits these patterns will appear after moveToVALU changes.

llvm-svn: 291615
2017-01-10 23:32:04 +00:00
Kyle Butt df27aa8c89 CodeGen: Allow small copyable blocks to "break" the CFG.
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well.

Differential revision: https://reviews.llvm.org/D27742

llvm-svn: 291609
2017-01-10 23:04:30 +00:00
Matt Arsenault 0b382a7cb8 DAG: Avoid OOB when legalizing vector indexing
If a vector index is out of bounds, the result is supposed to be
undefined but is not undefined behavior. Change the legalization
for indexing the vector on the stack so that an out of bounds
index does not create an out of bounds memory access.

llvm-svn: 291604
2017-01-10 22:02:30 +00:00
Matt Arsenault 8871683d60 AMDGPU: Add tests for HasMultipleConditionRegisters
This was enabled without many specific tests or the comment.

llvm-svn: 291586
2017-01-10 19:08:15 +00:00
Matt Arsenault 6dca542b4a AMDGPU: Add Assert[SZ]Ext during argument load creation
For i16 zeroext arguments when i16 was a legal type, the
known bits information from the truncate was lost. Insert
a zeroext so the known bits optimizations work with the 32-bit
loads.

Fixes code quality regressions vs. SI in min.ll test.

llvm-svn: 291461
2017-01-09 18:52:39 +00:00
Bjorn Pettersson b14afd452d [SelectionDAG] Fix in legalization of UMAX/SMAX/UMIN/SMIN. Solves PR31486.
Summary:
Originally

 i64 = umax t8, Constant:i64<4>

was expanded into

 i32,i32 = umax Constant:i32<0>, Constant:i32<0>
 i32,i32 = umax t7, Constant:i32<4>

Now instead the two produced umax:es return i32 instead of i32, i32.

Thanks to Jan Vesely for help with the test case.

Patch by mikael.holmen at ericsson.com

Reviewers: bogner, jvesely, tstellarAMD, arsenm

Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D28135

llvm-svn: 291441
2017-01-09 12:03:50 +00:00
Jan Vesely 06200bd7bc AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes
This will make transition to SCRATCH_MEMORY easier

Differential Revision: https://reviews.llvm.org/D24746

llvm-svn: 291279
2017-01-06 21:00:46 +00:00
Konstantin Zhuravlyov 67a6d5401a [AMDGPU] Do not emit .AMDGPU.config section for amdhsa
Differential Revision: https://reviews.llvm.org/D27732

llvm-svn: 291245
2017-01-06 17:02:10 +00:00
Jan Vesely d48445d513 AMDGPU/SI: Implement sendmsghalt intrinsic
v2: expose using amdgcn prefix

Differential Revision: https://reviews.llvm.org/D23511

llvm-svn: 290977
2017-01-04 18:06:55 +00:00
Matt Arsenault 0b26e47345 AMDGPU: Invert cmp + select with constant
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.

This seems to usually be better but causes some code size regressions
in some tests.

llvm-svn: 290372
2016-12-22 21:40:08 +00:00
Matt Arsenault 941632839f AMDGPU: Use i16 for i16 shift amount
llvm-svn: 290351
2016-12-22 16:36:25 +00:00
Matt Arsenault 18f56be3d2 AMDGPU: Use i16 comparison instructions
llvm-svn: 290348
2016-12-22 16:27:11 +00:00
Matt Arsenault e7d8ed32f9 AMDGPU: Swap order of operands in fadd/fsub combine
FMA is canonicalized to constant in the middle operand. Do
the same so fmad matches and avoid an extra combine step.

llvm-svn: 290313
2016-12-22 04:03:40 +00:00