Commit Graph

1478 Commits

Author SHA1 Message Date
Sanjay Patel 56d59c1f0f [AMDGPU] fix test to be independent of FP undef
llvm-svn: 327147
2018-03-09 16:33:34 +00:00
Stanislav Mekhanoshin c8127fc674 [AMDGPU] Fixed V_DIV_FIXUP_F16 selection on GFX9
GFX9 should select opsel version.

Differential Revision: https://reviews.llvm.org/D44279

llvm-svn: 327106
2018-03-09 07:21:43 +00:00
Sanjay Patel 672ad3269b [AMDGPU] fix test to survive more FP undef constant folding
llvm-svn: 327066
2018-03-08 21:30:56 +00:00
Sanjay Patel 7325d12f58 [AMDGPU] fix test to survive the most basic undef constant folding
This will likely need to be changed again for anything more than:
fmul undef, undef -> undef

llvm-svn: 327034
2018-03-08 17:34:25 +00:00
Farhana Aleen 89196642f7 [AMDGPU] Increased vector length for global/constant loads.
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache;
         loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44179

llvm-svn: 326910
2018-03-07 17:09:18 +00:00
Farhana Aleen 347d12b4ce Revert "[AMDGPU] Widened vector length for global/constant address space."
This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03.

llvm-svn: 326907
2018-03-07 16:55:27 +00:00
Farhana Aleen 0d03d0588d [AMDGPU] Widened vector length for global/constant address space.
llvm-svn: 326904
2018-03-07 16:29:05 +00:00
Yaxun Liu 46439e8d4a [AMDGPU] Fix lowering OpenCL enqueue_kernel
One addrspacecast disappeared in clang emitted IR for
block invoke function due to adoption of the new
addr space mapping.

Differential Revision: https://reviews.llvm.org/D43785

llvm-svn: 326806
2018-03-06 16:04:39 +00:00
Matt Arsenault e31ab94e97 AMDGPU/GlobalISel: Add InstrMapping for G_EXTRACT
llvm-svn: 326715
2018-03-05 16:25:18 +00:00
Matt Arsenault 71272e6d4e AMDGPU/GlobalISel: Make some G_EXTRACTs legal
As far as I can tell legalization of weird sizes for the
output type isn't implemented.

llvm-svn: 326714
2018-03-05 16:25:15 +00:00
Alexander Timofeev 2e5eeceeb7 Pass Divergence Analysis data to Selection DAG to drive divergence
dependent instruction selection.

Differential revision: https://reviews.llvm.org/D35267

llvm-svn: 326703
2018-03-05 15:12:21 +00:00
Matt Arsenault b9699c009d AMDGPU/GlobalISel: InstrMapping for G_ZEXT
llvm-svn: 326589
2018-03-02 16:55:37 +00:00
Matt Arsenault 1c1aab99ae AMDGPU/GlobalISel: InstrMapping for G_TRUNC
llvm-svn: 326588
2018-03-02 16:55:33 +00:00
Matt Arsenault ef8db767d7 AMDGPU/GlobalISel: Define InstrMappings for G_FCMP
Patch by Tom Stellard

llvm-svn: 326587
2018-03-02 16:53:15 +00:00
Matt Arsenault 2607dc60de AMDGPU/GlobalISel: Define instruction mapping for @llvm.minnum
Patch by Tom Stellard

llvm-svn: 326586
2018-03-02 16:40:17 +00:00
Matt Arsenault b46c191c49 AMDGPU/GlobalISel: Define instruction mapping for @llvm.maxnum
Patch by Tom Stellard

llvm-svn: 326567
2018-03-02 12:23:00 +00:00
Jan Vesely b283ea0f0f AMDGPU/GCN: Promote i16 ctpop
i16 capable ASICs do not support i16 operands for this instruction.
Add tablegen pattern to merge chained i16 additions.

Differential Revision: https://reviews.llvm.org/D43985

llvm-svn: 326535
2018-03-02 02:50:22 +00:00
Matt Arsenault 41d2e3d98e AMDGPU/GlobalISel: Define instruction mapping for G_FPTOSI
Patch by Tom Stellard

llvm-svn: 326534
2018-03-02 02:19:16 +00:00
Matt Arsenault b23041ad4d AMDGPU/GlobalISel: Define instruction mapping for G_FPTOUI
Patch by Tom Stellard

llvm-svn: 326533
2018-03-02 02:19:11 +00:00
Matt Arsenault 327d5fb2e5 AMDGPU/GlobalISel: Define instruction mapping for G_FMUL
llvm-svn: 326532
2018-03-02 02:17:01 +00:00
Matt Arsenault 5a9e834eac AMDGPU/GlobalISel: Define instruction mapping for G_FADD
Patch by Tom Stellard

llvm-svn: 326526
2018-03-02 01:22:13 +00:00
Matt Arsenault d99317f1b3 AMDGPU/GlobalISel: Define instruction mapping for G_SHL
Patch by Tom Stellard

llvm-svn: 326525
2018-03-02 01:22:10 +00:00
Matt Arsenault 3c7a123ccc AMDGPU/GlobalISel: Define instruction mapping for G_XOR
llvm-svn: 326524
2018-03-02 01:22:06 +00:00
Matt Arsenault c0f34c9e36 AMDGPU/GlobalISel: Define instruction mapping for G_AND
Patch by Tom Stellard

llvm-svn: 326523
2018-03-02 01:22:01 +00:00
Matt Arsenault 364f12e8f9 AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.cvt.pkrtz
Patch by Tom Stellard

llvm-svn: 326490
2018-03-01 21:25:30 +00:00
Matt Arsenault 5320ee4a05 AMDGPU/GlobalISel: Define instruction mapping for G_OR
Patch by Tom Stellard

llvm-svn: 326489
2018-03-01 21:25:25 +00:00
Matt Arsenault 62669ede94 AMDGPU/GlobalISel: Define instruction mapping for G_BITCAST
Patch by Tom Stellard

llvm-svn: 326482
2018-03-01 20:59:44 +00:00
Matt Arsenault 0529a8e2de AMDGPU/GlobalISel: Mark i32->i64 zext as legal
llvm-svn: 326481
2018-03-01 20:56:21 +00:00
Matt Arsenault 36b99e1937 AMDGPU/GlobalISel: InstrMapping for llvm.amdgcn.exp.compr
Patch by Tom Stellard

llvm-svn: 326479
2018-03-01 20:40:55 +00:00
Matt Arsenault 8931bbf8df AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.exp
Patch by Tom Stellard

llvm-svn: 326477
2018-03-01 20:24:37 +00:00
Matt Arsenault 50721ab325 AMDGPU/GlobalISel: Define InstrMappings for G_ICMP
Patch by Tom Stellard

llvm-svn: 326472
2018-03-01 19:27:10 +00:00
Matt Arsenault dc14ec05d4 AMDGPU/GlobalISel: Make i32 mul legal
llvm-svn: 326471
2018-03-01 19:22:05 +00:00
Matt Arsenault 06cbb27a79 AMDGPU/GlobalISel: Define instruction mapping for G_IMPLICIT_DEF
Patch by Tom Stellard

llvm-svn: 326470
2018-03-01 19:16:52 +00:00
Matt Arsenault e3d9ecf2b9 AMDGPU/GlobalISel: Define instruction mapping for G_FCONSTANT
Patch by Tom Stellard

llvm-svn: 326468
2018-03-01 19:13:30 +00:00
Matt Arsenault 3f6a204eaa AMDGPU/GlobalISel: Make i32 xor legal
llvm-svn: 326466
2018-03-01 19:09:21 +00:00
Matt Arsenault 8e80a5fbca AMDGPU/GlobalISel: Mark 32/64-bit G_FCMP as legal
Patch by Tom Stellard

llvm-svn: 326465
2018-03-01 19:09:16 +00:00
Matt Arsenault dd022ce064 AMDGPU/GlobalISel: Mark 32-bit G_FPTOSI as legal
Patch by Tom Stellard

llvm-svn: 326464
2018-03-01 19:04:25 +00:00
Tim Renouf 2a99fa2c08 [AMDGPU] added writelane intrinsic
Summary:
For use by LLPC SPV_AMD_shader_ballot extension.

The v_writelane instruction was already implemented for use by SGPR
spilling, but I had to add an extra dummy operand tied to the
destination, to represent that all lanes except the selected one keep
the old value of the destination register.

.ll test changes were due to schedule changes caused by that new
operand.

Differential Revision: https://reviews.llvm.org/D42838

llvm-svn: 326353
2018-02-28 19:10:32 +00:00
Geoff Berry a2b9011290 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Re-enable commit r323991 now that r325931 has been committed to make
MachineOperand::isRenamable() check more conservative w.r.t. code
changes and opt-in on a per-target basis.

llvm-svn: 326208
2018-02-27 16:59:10 +00:00
Matt Arsenault 2a26a286db AMDGPU/GlobalISel: Make f64 constants legal
llvm-svn: 326101
2018-02-26 17:20:43 +00:00
Tim Renouf 832f90fa0c [AMDGPU] Scratch setup fix on AMDPAL gfx9+ merge shader
Summary:
With OS type AMDPAL, the scratch descriptor is hardwired to be loaded
from offset 0 of the global information table, whose low pointer is
passed in s0. For a merge shader on gfx9+, it needs to be s8 instead, as
the hardware reserves s0-s7.

Reviewers: kzhuravl

Subscribers: arsenm, nhaehnle, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl

Differential Revision: https://reviews.llvm.org/D42203

llvm-svn: 326088
2018-02-26 14:46:43 +00:00
Adam Nemet e4e1de60aa Revert "StructurizeCFG: Test for branch divergence correctly"
This reverts commit r325881.

Breaks many bots

llvm-svn: 326037
2018-02-24 17:29:09 +00:00
Stanislav Mekhanoshin fa48c496e2 [AMDGPU] Shrinking V_SUBBREV_U32
V_SUBBREV_U32 is a commute opcode for V_SUBB_U32. However, when
we try to commute V_SUBB_U32 in order to shrink it we do not then
process V_SUBBREV_U32 and it stay VOP3. This is fixed.

Differential Revision: https://reviews.llvm.org/D43699

llvm-svn: 326011
2018-02-24 01:32:32 +00:00
Stanislav Mekhanoshin b9704c001c [AMDGPU] Fixed madak.ll test on VI, added GFX10. NFC.
llvm-svn: 325995
2018-02-23 23:53:27 +00:00
Geoff Berry f8bf2ec0a8 [MachineOperand][Target] MachineOperand::isRenamable semantics changes
Summary:
Add a target option AllowRegisterRenaming that is used to opt in to
post-register-allocation renaming of registers.  This is set to 0 by
default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq
fields of all opcodes to be set to 1, causing
MachineOperand::isRenamable to always return false.

Set the AllowRegisterRenaming flag to 1 for all in-tree targets that
have lit tests that were effected by enabling COPY forwarding in
MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC,
RISCV, Sparc, SystemZ and X86).

Add some more comments describing the semantics of the
MachineOperand::isRenamable function and how it is set and maintained.

Change isRenamable to check the operand's opcode
hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of
relying on it being consistently reflected in the IsRenamable bit
setting.

Clear the IsRenamable bit when changing an operand's register value.

Remove target code that was clearing the IsRenamable bit when changing
registers/opcodes now that this is done conservatively by default.

Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in
one place covering all opcodes that have constant pipe read limit
restrictions.

Reviewers: qcolombet, MatzeB

Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D43042

llvm-svn: 325931
2018-02-23 18:25:08 +00:00
Amaury Sechet 893a6b89ff [DAGCOmbine] Ensure that (brcond (setcc ...)) is handled in a canonical manner.
Summary:
There are transformation that change setcc into other constructs, and transform that try to reconstruct a setcc from the brcond condition. Depending on what order these transform are done, the end result differs.

Most of the time, it is preferable to get a setcc as a brcond argument (and this is why brcond try to recreate the setcc in the first place) so we ensure this is done every time by also doing it at the setcc level when the only user is a brcond.

Reviewers: spatel, hfinkel, niravd, craig.topper

Subscribers: nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D41235

llvm-svn: 325892
2018-02-23 11:50:42 +00:00
Nicolai Haehnle 6cf306deca AMDGPU: Track physreg uses in SILoadStoreOptimizer
Summary:
This handles def-after-use of physregs, and allows us to merge loads and
stores even across some physreg defs (typically M0 defs).

Change-Id: I076484b2bda27c2cf46013c845a0380c5b89b67b

Reviewers: arsenm, mareko, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D42647

llvm-svn: 325882
2018-02-23 10:45:56 +00:00
Nicolai Haehnle 43c1115cd4 StructurizeCFG: Test for branch divergence correctly
Summary:
This fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform.

This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.

Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4

Reviewers: arsenm, rampitec, jlebar

Subscribers: wdng, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D40546

llvm-svn: 325881
2018-02-23 10:45:46 +00:00
Nicolai Haehnle 770397f4cd AMDGPU: Do not combine loads/store across physreg defs
Summary:
Since this pass operates on machine SSA form, this should only really
affect M0 in practice.

Fixes various piglit variable-indexing/vs-varying-array-mat4-index-*

Change-Id: Ib2a1dc3a8d7b08225a8da49a86f533faa0986aa8
Fixes: r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4")

Reviewers: arsenm, mareko, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D40343

llvm-svn: 325677
2018-02-21 13:31:35 +00:00
Konstantin Zhuravlyov 5c1237a1fd Revert "[AMDGPU] Increased vector length for global/constant loads."
https://reviews.llvm.org/rL325518

It breaks following OpenCL conformance tests:
  - Basic - parameter_types
  - Basic - vload_private

llvm-svn: 325643
2018-02-20 23:30:21 +00:00