1003 Commits

Author SHA1 Message Date
Matt Arsenault 8871683d60 AMDGPU: Add tests for HasMultipleConditionRegisters
This was enabled without many specific tests or the comment.

llvm-svn: 291586
2017-01-10 19:08:15 +00:00
Matt Arsenault 6dca542b4a AMDGPU: Add Assert[SZ]Ext during argument load creation
For i16 zeroext arguments when i16 was a legal type, the
known bits information from the truncate was lost. Insert
a zeroext so the known bits optimizations work with the 32-bit
loads.

Fixes code quality regressions vs. SI in min.ll test.

llvm-svn: 291461
2017-01-09 18:52:39 +00:00
Bjorn Pettersson b14afd452d [SelectionDAG] Fix in legalization of UMAX/SMAX/UMIN/SMIN. Solves PR31486.
Summary:
Originally

 i64 = umax t8, Constant:i64<4>

was expanded into

 i32,i32 = umax Constant:i32<0>, Constant:i32<0>
 i32,i32 = umax t7, Constant:i32<4>

Now the two umax nodes produced return i32 instead of i32, i32.

Thanks to Jan Vesely for help with the test case.

Patch by mikael.holmen at ericsson.com

Reviewers: bogner, jvesely, tstellarAMD, arsenm

Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D28135

llvm-svn: 291441
2017-01-09 12:03:50 +00:00
Jan Vesely 06200bd7bc AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes
This will make transition to SCRATCH_MEMORY easier

Differential Revision: https://reviews.llvm.org/D24746

llvm-svn: 291279
2017-01-06 21:00:46 +00:00
Konstantin Zhuravlyov 67a6d5401a [AMDGPU] Do not emit .AMDGPU.config section for amdhsa
Differential Revision: https://reviews.llvm.org/D27732

llvm-svn: 291245
2017-01-06 17:02:10 +00:00
Jan Vesely d48445d513 AMDGPU/SI: Implement sendmsghalt intrinsic
v2: expose using amdgcn prefix

Differential Revision: https://reviews.llvm.org/D23511

llvm-svn: 290977
2017-01-04 18:06:55 +00:00
Matt Arsenault 0b26e47345 AMDGPU: Invert cmp + select with constant
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.

This seems to usually be better but causes some code size regressions
in some tests.

llvm-svn: 290372
2016-12-22 21:40:08 +00:00
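
The canonicalization described in 0b26e47345 can be illustrated with a small sketch. This is not the backend's code: the tuple representation of a select and the helper names `canonicalize` and `evaluate` are hypothetical, and only integer predicates are modeled (floating-point compares would also need unordered cases).

```python
import operator

# Integer comparison predicates and their logical inverses.
COMPARE = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "ge": operator.ge,
    "gt": operator.gt, "le": operator.le,
}
INVERSE = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt"}


def canonicalize(pred, true_val, false_val):
    """Move a constant from the true side of a select to the false side.

    Values are ints (constants) or strings (hypothetical variable names).
    select(cmp pred, C, x) becomes select(cmp inverse(pred), x, C).
    """
    if isinstance(true_val, int) and not isinstance(false_val, int):
        return INVERSE[pred], false_val, true_val
    return pred, true_val, false_val


def evaluate(select, a, b, env):
    """Evaluate select(cmp pred a b, true_val, false_val) with variables from env."""
    pred, true_val, false_val = select

    def value(v):
        return v if isinstance(v, int) else env[v]

    return value(true_val) if COMPARE[pred](a, b) else value(false_val)


original = ("lt", 7, "x")
canonical = canonicalize(*original)  # ("ge", "x", 7)
```

Both forms select the same value for every input; only the side that holds the constant changes.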
Matt Arsenault 941632839f AMDGPU: Use i16 for i16 shift amount
llvm-svn: 290351
2016-12-22 16:36:25 +00:00
Matt Arsenault 18f56be3d2 AMDGPU: Use i16 comparison instructions
llvm-svn: 290348
2016-12-22 16:27:11 +00:00
Matt Arsenault e7d8ed32f9 AMDGPU: Swap order of operands in fadd/fsub combine
FMA is canonicalized to constant in the middle operand. Do
the same so fmad matches and avoid an extra combine step.

llvm-svn: 290313
2016-12-22 04:03:40 +00:00
Matt Arsenault 46e6b7adef AMDGPU: Check fast math flags in fadd/fsub combines
llvm-svn: 290312
2016-12-22 04:03:35 +00:00
Matt Arsenault 770ec8680a AMDGPU: Form more FMAs if fusion is allowed
Extend the existing fadd/fsub->fmad combines to produce
FMA if allowed.

llvm-svn: 290311
2016-12-22 03:55:35 +00:00
Matt Arsenault ef82ad94ea AMDGPU: Enable some f32 fadd/fsub combines for f16
llvm-svn: 290308
2016-12-22 03:40:39 +00:00
Matt Arsenault 9e22bc2cd3 AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16
llvm-svn: 290307
2016-12-22 03:21:48 +00:00
Matt Arsenault 2920f62423 AMDGPU: setcc test cleanup
llvm-svn: 290306
2016-12-22 03:21:45 +00:00
Matt Arsenault cdff21b14e AMDGPU: Allow rcp and rsq usage with f16
llvm-svn: 290302
2016-12-22 03:05:44 +00:00
Matt Arsenault 4052a576c0 AMDGPU: Custom lower f16 fdiv
llvm-svn: 290301
2016-12-22 03:05:41 +00:00
Matt Arsenault ce84130f85 AMDGPU: Implement f16 fcanonicalize
llvm-svn: 290300
2016-12-22 03:05:37 +00:00
Matt Arsenault 9e91014282 AMDGPU: Allow 16-bit types in inline asm constraints
llvm-svn: 290193
2016-12-20 19:06:12 +00:00
Matt Arsenault d1ceffcd5a AMDGPU: Run fp combine tests on VI
llvm-svn: 290192
2016-12-20 18:55:11 +00:00
Matt Arsenault 4c1e9ec008 AMDGPU: Don't add same instruction multiple times to worklist
When the instruction is processed the first time, it may be
deleted resulting in crashes. While the new test adds the same
user to the worklist twice, this particular case doesn't crash
but I'm not sure why.

llvm-svn: 290191
2016-12-20 18:55:06 +00:00
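
One common way to avoid queueing the same item twice, as 4c1e9ec008 describes, is a worklist that also tracks membership in a set. The sketch below is only an illustration of that idea; the class name `UniqueWorklist` is hypothetical and the commit does not say which container the actual fix uses.

```python
class UniqueWorklist:
    """LIFO worklist that refuses to hold the same item more than once."""

    def __init__(self):
        self._items = []
        self._members = set()

    def push(self, item):
        """Add item unless it is already queued; return True if it was added."""
        if item in self._members:
            return False
        self._members.add(item)
        self._items.append(item)
        return True

    def pop(self):
        """Remove and return the most recently added item."""
        item = self._items.pop()
        self._members.discard(item)
        return item

    def __len__(self):
        return len(self._items)


worklist = UniqueWorklist()
for user in ["inst_a", "inst_b", "inst_a"]:  # "inst_a" is seen twice
    worklist.push(user)
```

With this structure an item that was deleted while being processed cannot be popped a second time from a stale duplicate entry.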
Tom Stellard 6f9ef14b9d AMDGPU/SI: Add a MachineMemOperand when lowering llvm.amdgcn.buffer.load.*
Reviewers: arsenm, nhaehnle, mareko

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D27834

llvm-svn: 290184
2016-12-20 17:19:44 +00:00
Tom Stellard 244891d129 AMDGPU/SI: Add a MachineMemOperand to MIMG instructions
Summary:
Without a MachineMemOperand, the scheduler was assuming MIMG instructions
were ordered memory references, so no loads or stores could be reordered
across them.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27536

llvm-svn: 290179
2016-12-20 15:52:17 +00:00
Konstantin Zhuravlyov 980688cdaf [AMDGPU] When unifying metadata, add operands to named metadata individually
Differential Revision: https://reviews.llvm.org/D27725

llvm-svn: 290114
2016-12-19 16:54:24 +00:00
Matt Arsenault 3aaf11fbd4 AMDGPU: Fix broken check prefix in test
llvm-svn: 290050
2016-12-17 20:03:59 +00:00
Matt Arsenault 327188aa15 AMDGPU: Select branch on undef to uniform scc branch
llvm-svn: 289877
2016-12-15 21:57:11 +00:00
Matt Arsenault 0b386360c5 AMDGPU: Fix asserting on returned tail calls
llvm-svn: 289868
2016-12-15 20:50:12 +00:00
Alexander Timofeev a57511c451 Fix for regression after Global Load Scalarization patch
llvm-svn: 289822
2016-12-15 15:17:19 +00:00
Justin Lebar a091da75b2 [AMDGPU] Fix runtime-metadata.ll test so it doesn't leave an object file in the source tree.
llvm-svn: 289742
2016-12-14 23:24:43 +00:00
Yaxun Liu 07d659bc76 AMDGPU: Emit runtime metadata version 2 as YAML
Differential Revision: https://reviews.llvm.org/D25046

llvm-svn: 289674
2016-12-14 17:16:52 +00:00
Nirav Dave f5bf03c7ef Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
Reverting due to ARM MCJIT and MIPS LLD error.

This reverts commit r289659.

llvm-svn: 289667
2016-12-14 16:43:44 +00:00
Matt Arsenault ebfba7027e AMDGPU: Change vintrp printing
llvm-svn: 289664
2016-12-14 16:36:12 +00:00
Nirav Dave 8527ab0ad2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after removing load-store factoring through
token factors in favor of improved token factor operand pruning.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates
non-interfering loads/stores from the store-merging logic.

When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we don't do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289659
2016-12-14 15:44:26 +00:00
Sanjoy Das 3336f681e3 [Verifier] Add verification for TBAA metadata
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.

Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:

 - That by the time a struct access tuple `(base-type, offset)` is
   "reduced" to a scalar base type, the offset is `0`.  For instance, in
   C++ you can't start from, say `("struct-a", 16)`, and end up with
   `("int", 4)` -- by the time the base type is `"int"`, the offset
   better be zero.  In particular, a variant of this invariant is needed
   for `llvm::getMostGenericTBAA` to be correct.

 - That there are no cycles in a struct path.

 - That struct type nodes have their offsets listed in an ascending
   order.

 - That when generating the struct access path, you eventually reach the
   access type listed in the tbaa tag node.

Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D26438

llvm-svn: 289402
2016-12-11 20:07:15 +00:00
Matt Arsenault 2402b95db0 AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecasts
The users of the addrspacecast were having their types incorrectly
changed, producing invalid bitcasts between address spaces.

llvm-svn: 289307
2016-12-10 00:52:50 +00:00
Matt Arsenault 4bd7236193 AMDGPU: Fix handling of 16-bit immediates
Since 32-bit instructions with 32-bit input immediate behavior
are used to materialize 16-bit constants in 32-bit registers
for 16-bit instructions, determining the legality based
on the size is incorrect. Change operands to have the size
specified in the type.

Also adds a workaround for a disassembler bug that
produces an immediate MCOperand for an operand that
is supposed to be OPERAND_REGISTER.

The assembler appears to accept out of bounds immediates and
truncates them, but this seems to be an issue for 32-bit
already.

llvm-svn: 289306
2016-12-10 00:39:12 +00:00
Matt Arsenault 618b330dd0 AMDGPU: Change vintrp printing to better match sc
Some of the immediates need to be printed differently
eventually.

llvm-svn: 289291
2016-12-10 00:23:12 +00:00
Matt Arsenault 5869b5a447 AMDGPU: Cleanup checks in sext_inreg test
llvm-svn: 289272
2016-12-09 21:10:41 +00:00
Marek Olsak 0f55fbae6c AMDGPU/SI: Don't reserve XNACK when it's disabled
Summary:
This frees 2 additional scalar registers.

These are results from all of my 3 patches combined:

  Polaris:
    Spilled SGPRs: 2231 -> 1517 (-32.00 %)

  Tonga:
    Spilled SGPRs: 3829 -> 2608 (-31.89 %)
    Spilled VGPRs: 100 -> 84 (-16.00 %)

  Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader
  limited to 64 VGPRs.

Reviewers: tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27151

llvm-svn: 289262
2016-12-09 19:49:54 +00:00
Marek Olsak 693e9be918 AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects
Summary: This frees 2 scalar registers.

Reviewers: tstellarAMD

Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27150

llvm-svn: 289261
2016-12-09 19:49:48 +00:00
Marek Olsak 91f22fbf4f AMDGPU/SI: Allow using SGPRs 96-101 on VI
Summary:
There is no point in setting SGPRS=104, because VI allocates SGPRs
in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs
for general purposes.

Reviewers: tstellarAMD

Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27149

llvm-svn: 289260
2016-12-09 19:49:40 +00:00
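
The arithmetic behind 91f22fbf4f can be checked with a short sketch. The granule of 16 comes from the commit message; the function name `allocated_sgprs` is hypothetical.

```python
def allocated_sgprs(requested, granule=16):
    """Round a requested SGPR count up to the next multiple of the granule."""
    return -(-requested // granule) * granule


# Requesting 102 or 104 SGPRs occupies the same 112-register allocation,
# so capping the request at 104 reserves nothing extra for other uses.
capped = allocated_sgprs(104)
full = allocated_sgprs(102)
```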
Matthias Braun 2c7d52a540 Move .mir tests to appropriate directories
test/CodeGen/MIR should contain tests that intend to test the MIR
printing or parsing. Tests that test something else should be in
test/CodeGen/TargetName even when they are written in .mir.

As a rule of thumb, only tests using "llc -run-pass none" should be in
test/CodeGen/MIR.

llvm-svn: 289254
2016-12-09 19:08:15 +00:00
Matt Arsenault 38d8ed2b75 AMDGPU: Fix i128 mul
llvm-svn: 289231
2016-12-09 17:49:14 +00:00
Nirav Dave bedb5d906c Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r289221 which appears to be triggering an assertion

llvm-svn: 289226
2016-12-09 17:18:24 +00:00
Nirav Dave fd51ff4fd8 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing an overly aggressive load-store forwarding optimization.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates
non-interfering loads/stores from the store-merging logic.

When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we don't do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289221
2016-12-09 16:15:12 +00:00
Tom Stellard 2a48433fcf AMDGPU/SI: Don't mark VINTRP instructions as mayLoad
Summary:
These instructions technically do read from memory, but the memory
is considered to be out of bounds for normal load/store instructions.

shader-db stats:

SGPRS: 1416075 -> 1413323 (-0.19 %)
VGPRS: 867413 -> 863935 (-0.40 %)
Spilled SGPRs: 1409 -> 1354 (-3.90 %)
Spilled VGPRs: 63 -> 63 (0.00 %)
Private memory VGPRs: 880 -> 880 (0.00 %)
Scratch size: 2648 -> 2632 (-0.60 %) dwords per thread
Code Size: 37889052 -> 37897340 (0.02 %) bytes
LDS: 2147 -> 2147 (0.00 %) blocks
Max Waves: 279243 -> 280369 (0.40 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: nhaehnle, mareko, arsenm

Subscribers: kzhuravl, wdng, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27593

llvm-svn: 289219
2016-12-09 15:57:15 +00:00
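
The percentages in the shader-db table of 2a48433fcf are relative changes from the first value to the second. A minimal sketch that reproduces the formatting, assuming hypothetical helper names `percent_change` and `format_stat` (this is not the shader-db tool itself):

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    if before == 0:
        # Avoid dividing by zero for rows such as "Wait states: 0 -> 0".
        return 0.0 if after == 0 else float("inf")
    return (after - before) / before * 100.0


def format_stat(name, before, after):
    """Render one line in the 'name: before -> after (change %)' layout."""
    return f"{name}: {before} -> {after} ({percent_change(before, after):.2f} %)"


line = format_stat("Spilled SGPRs", 1409, 1354)
```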
Matt Arsenault 27c062932a AMDGPU: Select i16 instructions to VOP3 forms
These were selecting directly to the VOP2 form instead
of VOP3 like the i32 instructions. Fixes regressions in
future commits where an immediate isn't folded because it was
initially used for the second operand.

Because uniform 16-bit operations are promoted to i32, it's
difficult to get a simple testcase where this matters. Fold
failures in SIFoldOperands here tend to be hidden by commute
and fold in SIShrinkInstructions.

llvm-svn: 289189
2016-12-09 06:19:12 +00:00
Matt Arsenault e96d03745d AMDGPU: Make f16 ConstantFP legal
Not having this legal led to combine failures, resulting
in dumb things like bitcasts of constants not being folded
away.

The only reason I'm leaving the v_mov_b32 hack that f32
already uses is to avoid madak formation test regressions.
PeepholeOptimizer has an ordering issue where the immediate
fold attempt is into the sgpr->vgpr copy instead of the actual
use. Running it twice avoids that problem.

llvm-svn: 289096
2016-12-08 20:14:46 +00:00
Matt Arsenault 6c06a6f48a AMDGPU: Fix commuting v_sub_u16
The correct commutable opcode was set to itself, so this
was simply swapping the operands to commute instead of also
changing the opcode to v_subrev_u16.

llvm-svn: 289093
2016-12-08 19:52:38 +00:00
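
Why the bug fixed in 6c06a6f48a matters: subtraction is not commutative, so swapping the operands is only valid if the opcode changes to the reversed form as well. The sketch below models the two instructions on 16-bit values; the table and helper names are hypothetical and are not the TableGen definitions.

```python
MASK16 = 0xFFFF


def v_sub_u16(src0, src1):
    """dst = src0 - src1, wrapped to 16 bits."""
    return (src0 - src1) & MASK16


def v_subrev_u16(src0, src1):
    """dst = src1 - src0, wrapped to 16 bits."""
    return (src1 - src0) & MASK16


SEMANTICS = {"v_sub_u16": v_sub_u16, "v_subrev_u16": v_subrev_u16}
# Correct mapping: each opcode commutes to its reversed counterpart.
COMMUTED_OPCODE = {"v_sub_u16": "v_subrev_u16", "v_subrev_u16": "v_sub_u16"}


def commute(opcode, src0, src1):
    """Swap the operands and switch to the matching reversed opcode."""
    return COMMUTED_OPCODE[opcode], src1, src0


def run(opcode, src0, src1):
    return SEMANTICS[opcode](src0, src1)


expected = run("v_sub_u16", 5, 3)            # 5 - 3
commuted = run(*commute("v_sub_u16", 5, 3))  # v_subrev_u16 3, 5 -> 5 - 3
buggy = run("v_sub_u16", 3, 5)               # operands swapped, opcode unchanged
```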
Stanislav Mekhanoshin 50ea93a2bd [AMDGPU] Add amdgpu-unify-metadata pass
Multiple metadata values for records such as opencl.ocl.version, llvm.ident
and similar are created after linking several modules. For some of them, notably
opencl.ocl.version, this creates a semantic problem because we cannot tell which
version of OpenCL the composite module conforms to.

Moreover, such repetitions of identical values often create a huge list of
unneeded metadata, which grows bitcode size both in memory and stored on disk.
It can go up to several Mb when linked against our OpenCL library. Lastly, such
long lists obscure reading of dumped IR.

The pass unifies metadata after linking.

Differential Revision: https://reviews.llvm.org/D25381

llvm-svn: 289092
2016-12-08 19:46:04 +00:00
Alexander Timofeev 18009560c5 [AMDGPU] Scalarization of global uniform loads.
Summary:
LC can currently select a scalar load for uniform memory access
based on the readonly memory address space only. This restriction
originated from the fact that in HW prior to VI vector and scalar caches
are not coherent. With MemoryDependenceAnalysis we can check that the
memory location corresponding to the memory operand of the LOAD is not
clobbered along any path from the function entry.

Reviewers: rampitec, tstellarAMD, arsenm

Subscribers: wdng, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D26917

llvm-svn: 289076
2016-12-08 17:28:47 +00:00
Nicolai Haehnle 2857dc3893 AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg
Summary:
Without the fix to isFrameOffsetLegal to consider the instruction's
immediate offset, the new test case hits the corresponding assertion in
resolveFrameIndex, because the LocalStackSlotAllocation pass re-uses a
different base register.

With only the fix to isFrameOffsetLegal, code quality reduces in a bunch of
places because frame base registers are added where they're not needed.
This is addressed by properly implementing needsFrameBaseReg, which also
helps to avoid unnecessary zero frame indices in a bunch of other places.

Fixes piglit glsl-1.50/execution/variable-indexing/gs-output-array-vec4-index-wr.shader_test

Reviewers: arsenm, tstellarAMD

Subscribers: qcolombet, kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D27344

llvm-svn: 289048
2016-12-08 14:08:02 +00:00
Tom Stellard 8485fa096e AMDGPU : Add S_SETREG instructions to fix fdiv precision issues.
Patch By: Wei Ding

Summary: This patch fixes the fdiv precision issues.

Reviewers: b-sumner, cfang, wdng, arsenm

Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D26424

llvm-svn: 288879
2016-12-07 02:42:15 +00:00
Tom Stellard 2187bb8a89 AMDGPU: Add llvm.amdgcn.interp.mov intrinsic
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26725

llvm-svn: 288865
2016-12-06 23:52:13 +00:00
Matt Arsenault 269ffdac4e AMDGPU: Fix crash on i16 constant expression
llvm-svn: 288861
2016-12-06 23:18:06 +00:00
Tom Stellard 175959e350 AMDGPU/SI: Set correct value for amd_kernel_code_t::kernarg_segment_alignment
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D27416

llvm-svn: 288852
2016-12-06 21:53:10 +00:00
Tom Stellard 00cfa74715 AMDGPU/SI: Don't move copies of immediates to the VALU
Summary:
If we write an immediate to a VGPR and then copy the VGPR to an
SGPR, we can replace the copy with a S_MOV_B32 sgpr, imm, rather than
moving the copy to the SALU.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D27272

llvm-svn: 288849
2016-12-06 21:13:30 +00:00
Matt Arsenault ad55ee5869 AMDGPU: Don't required structured CFG
The structured CFG is just an aid to inserting exec
mask modification instructions, once that is done
we don't really need it anymore. We also
do not analyze blocks with terminators that
modify exec, so this should only be impacting
true branches.

llvm-svn: 288744
2016-12-06 01:02:51 +00:00
Matt Arsenault 8a63cb9044 AMDGPU: Change how exp is printed
This is an improvement over a long list of unreadable numbers.
A follow up patch will try to match how sc formats these.

llvm-svn: 288697
2016-12-05 20:31:49 +00:00
Matt Arsenault 7bee6ac798 AMDGPU: Refactor exp instructions
Structure the definitions a bit more like the other classes.

The main change here is to split EXP with the done bit set
to a separate opcode, so we can set mayLoad = 1 so that it won't
be reordered before the other exp stores, since this has the special
constraint that if the done bit is set then this should be the last
exp in the shader.

Previously all exp instructions were inferred to have unmodeled
side effects.

llvm-svn: 288695
2016-12-05 20:23:10 +00:00
Nicolai Haehnle 33ca182c91 [DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default
Summary:
When X = 0 and Y = inf, the original code produces inf, but the transformed
code produces nan. So this transform (and its relatives) should only be
used when the no-infs-fp-math flag is explicitly enabled.

Also disable the transform using fmad (intermediate rounding) when unsafe-math
is not enabled, since it can reduce the precision of the result; consider this
example with binary floating point numbers with two bits of mantissa:

  x = 1.01
  y = 111

  x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step)

  x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero)

The example relies on rounding towards zero at least in the second step.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578

Reviewers: RKSimon, tstellarAMD, spatel, arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26602

llvm-svn: 288506
2016-12-02 16:06:18 +00:00
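
Both claims in 33ca182c91 can be checked numerically. The sketch below is an illustration only: the helper names are hypothetical, "two bits of mantissa" is assumed to mean three significant binary digits (one implicit plus two stored), and rounding toward zero is applied after every operation, as in the commit's worked example.

```python
import math
from fractions import Fraction


def original(x, y):
    """(fmul (fadd X, 1), Y) in ordinary double precision."""
    return (x + 1.0) * y


def transformed(x, y):
    """(fmad X, Y, Y), i.e. X * Y + Y, in ordinary double precision."""
    return x * y + y


def floor_log2(v):
    """Largest integer e with 2**e <= v, for a positive Fraction v."""
    e = 0
    while Fraction(2) ** (e + 1) <= v:
        e += 1
    while Fraction(2) ** e > v:
        e -= 1
    return e


def round_toward_zero(v, sig_bits=3):
    """Truncate a positive value to sig_bits significant binary digits."""
    scale = Fraction(2) ** (floor_log2(v) - (sig_bits - 1))
    return (v // scale) * scale


def fmul_fadd(x, y):
    """x * (y + 1) with rounding after each step."""
    return round_toward_zero(x * round_toward_zero(y + 1))


def fmad(x, y):
    """x * y + x with intermediate rounding of the product."""
    return round_toward_zero(round_toward_zero(x * y) + x)


x = Fraction(5, 4)  # 1.01 in binary
y = Fraction(7)     # 111 in binary
exact_path = fmul_fadd(x, y)  # 1010 in binary
fmad_path = fmad(x, y)        # 1000 in binary
```

With X = 0 and Y = inf, `original` yields inf while `transformed` yields nan, and in the low-precision model the fmad form loses the low bits that the fmul/fadd form keeps.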
Matt Arsenault c47701c0e9 AMDGPU: Use wider scalar spills for SGPR spilling
Since the spill is for the whole wave, these
don't have the swizzling problems that vector stores do
and a single 4-byte allocation is enough to spill a 64 element
register. This should reduce the number of spill instructions and
put all the spills for a register in the same cacheline.

This should save allocated private size, but for now it doesn't.
The extra slots are allocated for each component, but never used
because the frame layout is essentially finalized before frame
indices are replaced. For always using the scalar store path,
this should probably be moved into processFunctionBeforeFrameFinalized.

llvm-svn: 288445
2016-12-02 00:54:45 +00:00
Matthias Braun 709a4cc238 RegisterCoalscer: Only coalesce complete reserved registers.
The coalescer eliminates copies from reserved registers of the form:
   %vregX = COPY %rY
in the case where %rY is a reserved register. However this turns out to
be invalid if only some of the subregisters are reserved (see also
https://reviews.llvm.org/D26648).

Differential Revision: https://reviews.llvm.org/D26687

llvm-svn: 288428
2016-12-01 22:39:51 +00:00
Matt Arsenault 387afc9375 AMDGPU: Move mir tests into mir test directory
llvm-svn: 288262
2016-11-30 18:50:26 +00:00
Matt Arsenault 640c44b893 AMDGPU: Disallow exec as SMEM instruction operand
This is not in the list of valid inputs for the encoding.
When spilling, copies from exec can be folded directly
into the spill instruction which results in broken
stores.

This only fixes the operand constraints, more codegen
work is required to avoid emitting the invalid
spills.

This sort of breaks the dbg.value test. Because the
register class of the s_load_dwordx2 changes, there
is a copy to SReg_64, and the copy is the operand
of dbg_value. The copy is later dead, and removed
from the dbg_value.

llvm-svn: 288191
2016-11-29 19:39:53 +00:00
Matt Arsenault f96eeec005 AMDGPU: Materialize frame index before add
It isn't generally safe to fold the frame index
directly into the operand since it will possibly
not be an inline immediate after it is expanded.

This surprisingly seems to produce better code, since
the FI doesn't prevent folding other immediate operands.

llvm-svn: 288185
2016-11-29 19:20:48 +00:00
Tom Stellard 0bc688116c AMDGPU/SI: Avoid moving PHIs to VALU when phi values are defined in scalar branches
Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23417

llvm-svn: 288095
2016-11-29 00:46:46 +00:00
Stanislav Mekhanoshin 0ee250eee8 [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies
Codegen prepare sinks comparisons close to a user if we have only one register
for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions.
Changed BE to report we have many condition registers. That way IR LICM pass
would hoist an invariant comparison out of a loop and codegen prepare will not
sink it.

With that done a condition is calculated in one block and used in another.
Current behavior is to store workitem's condition in a VGPR using v_cndmask_b32
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue, propagation of the source SGPR pair in place of v_cmp
is implemented. An additional side effect is that we may consume fewer VGPRs
at the cost of more SGPRs when multiple conditions need to be held, and
that is a clear win in most cases.

Differential Revision: https://reviews.llvm.org/D26114

llvm-svn: 288053
2016-11-28 18:58:49 +00:00
Tom Stellard 1473f07ceb AMDGPU/SI: Use float as the operand type for amdgcn.interp intrinsics
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26724

llvm-svn: 287962
2016-11-26 02:26:04 +00:00
Marek Olsak 79c05871a2 AMDGPU/SI: Add back reverted SGPR spilling code, but disable it
suggested as a better solution by Matt

llvm-svn: 287942
2016-11-25 17:37:09 +00:00
Marek Olsak e3895bfb47 Revert "AMDGPU: Implement SGPR spilling with scalar stores"
This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e.

llvm-svn: 287936
2016-11-25 16:03:34 +00:00
Marek Olsak dad553a5cf Revert "AMDGPU: Fix MMO when splitting spill"
This reverts commit 79d4f8b8b1ce430c3d5dac4fc72a9eebaed24fe1.

llvm-svn: 287935
2016-11-25 16:03:27 +00:00
Marek Olsak a45dae458d Revert "AMDGPU: Make m0 unallocatable"
This reverts commit 124ad83dae04514f943902446520c859adee0e96.

llvm-svn: 287932
2016-11-25 16:03:15 +00:00
Marek Olsak 18a95bcb3c Revert "AMDGPU: Preserve m0 value when spilling"
This reverts commit a5a179ffd94fd4136df461ec76fb30f04afa87ce.

llvm-svn: 287930
2016-11-25 16:03:02 +00:00
Matt Arsenault 7b54dd039e AMDGPU: Preserve m0 value when spilling
llvm-svn: 287844
2016-11-24 00:26:50 +00:00
Matt Arsenault 9e5c7b1031 AMDGPU: Make m0 unallocatable
m0 may need to be written for spill code, so
we don't want general code uses relying on the
value stored in it.

This introduces a few code quality regressions where copies
from m0 are not coalesced into copies of a copy of m0.

llvm-svn: 287841
2016-11-24 00:26:40 +00:00
Matt Arsenault 2669a76f01 AMDGPU: Fix MMO when splitting spill
The size and offset were wrong. The size of the object was
being used for the size of the access, when here it is really
being split into 4-byte accesses. The underlying object size
is set in the MachinePointerInfo, which also didn't have the
offset set.

llvm-svn: 287806
2016-11-23 20:52:53 +00:00
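
To make the size and offset point in 2669a76f01 concrete, here is a minimal sketch of what the per-access memory operands look like when one spilled object is split into 4-byte pieces. The function name and the (offset, size) tuple are hypothetical simplifications, and the object size is assumed to be a multiple of 4.

```python
def split_spill_accesses(object_size, access_size=4):
    """Return (offset, size) for each piece of a spill split into fixed-size accesses.

    Every piece reports the size of the access itself, not the size of the
    whole object, and carries its own byte offset into the object.
    """
    return [(offset, access_size) for offset in range(0, object_size, access_size)]


accesses = split_spill_accesses(16)  # a 16-byte register spilled as 4 dwords
```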
Stanislav Mekhanoshin ae0f6620e4 [AMDGPU] Fix multiple vreg definitions in si-lower-control-flow
Differential Revision: https://reviews.llvm.org/D26939

llvm-svn: 287608
2016-11-22 01:42:34 +00:00
Matt Arsenault b30d2aca58 DAG: Ignore call site attributes when emitting target intrinsic
A target intrinsic may be defined as possibly reading memory,
but the call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic
assumption of the intrinsic definition, so the chain should
still be used.

llvm-svn: 287593
2016-11-21 22:56:42 +00:00
Konstantin Zhuravlyov aefee42e0f [AMDGPU] Change frexp.exp intrinsic to return i16 for f16 input
Differential Revision: https://reviews.llvm.org/D26862

llvm-svn: 287389
2016-11-18 22:31:08 +00:00
Tom Stellard 01e65d2cfc AMDGPU/SI: Remove zero_extend patterns for i16 ops selected to 32-bit insts
Summary:
The 32-bit instructions don't zero the high 16 bits like the 16-bit
instructions do.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26828

llvm-svn: 287342
2016-11-18 13:53:34 +00:00
Nicolai Haehnle ce2b589df5 AMDGPU: Fix legalization of MUBUF instructions in shaders
Summary:
The addr64-based legalization is incorrect for MUBUF instructions with idxen
set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions.  This affects
e.g.  shaders that access buffer textures.

Since we never actually need the addr64-legalization in shaders, this patch
takes the easy route and keys off the calling convention.  If this ever
affects (non-OpenGL) compute, the type of legalization needs to be chosen
based on some TSFlag.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26747

llvm-svn: 287339
2016-11-18 11:55:52 +00:00
Matt Arsenault 742deb2495 AMDGPU: Fix crash on illegal type for inlineasm
There are still crashes on non-MVT types in other
places.

llvm-svn: 287310
2016-11-18 04:42:57 +00:00
Konstantin Zhuravlyov 0a1a7b6b23 Revert "AMDGPU: Enable ConstrainCopy DAG mutation"
This reverts commit r287146.

This breaks a few conformance tests.

llvm-svn: 287233
2016-11-17 16:41:49 +00:00
Konstantin Zhuravlyov 20ba24e231 [AMDGPU] Add missing test for rL287203
llvm-svn: 287204
2016-11-17 04:33:20 +00:00
Konstantin Zhuravlyov 3f0cdc7a11 [AMDGPU] Promote f16/i16 conversions to f32/i32
llvm-svn: 287201
2016-11-17 04:00:46 +00:00
Konstantin Zhuravlyov 662e01dfbe [AMDGPU] Expand `br_cc` for f16
Differential Revision: https://reviews.llvm.org/D26732

llvm-svn: 287199
2016-11-17 03:49:01 +00:00
Matt Arsenault 3b36bb1d87 AMDGPU: Enable ConstrainCopy DAG mutation
This fixes a probably unintended divergence from the default
scheduler behavior.

llvm-svn: 287146
2016-11-16 20:35:23 +00:00
Tom Stellard 0d162b1c4f AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass
Summary:
1. Don't try to copy values to and from the same register class.
2. Replace copies of registers with immediate values with v_mov/s_mov
   instructions.

The main purpose of this change is to make MachineSink do a better job of
determining when it is beneficial to split a critical edge, since the pass
assumes that copies will become move instructions.

This prevents a regression in uniform-cfg.ll if we enable critical edge
splitting for AMDGPU.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23408

llvm-svn: 287131
2016-11-16 18:42:17 +00:00
Konstantin Zhuravlyov 2a87a42035 [AMDGPU] Handle f16 select{_cc}
- Select `select` to `v_cndmask_b32`
- Expand `select_cc`
- Refactor patterns

Differential Revision: https://reviews.llvm.org/D26714

llvm-svn: 287074
2016-11-16 03:16:26 +00:00
Jan Vesely e8cc395e4f AMDGPU/GCN: Exit early in hazard recognizer if there is no vreg argument
wbinvl.* are vector instructions that do not use vector registers.

v2: check only M?BUF instructions

Differential Revision: https://reviews.llvm.org/D26633

llvm-svn: 287056
2016-11-15 23:55:15 +00:00
Tom Stellard d23de360db AMDGPU/SI: Fix pattern for i16 = sign_extend i1
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26670

llvm-svn: 287035
2016-11-15 21:25:56 +00:00
Matt Arsenault d4bb5e4831 AMDGPU: Enable store clustering
Also respect the TII hook for these like the generic code does
in case we want a flag later to disable this.

llvm-svn: 287021
2016-11-15 20:22:55 +00:00
Matt Arsenault 3666629837 AMDGPU: Analyze mubuf with immediate soffset
Fixes giving up on clustering common addr64 accesses with
constant 0 soffset.

llvm-svn: 287018
2016-11-15 20:14:27 +00:00
Stanislav Mekhanoshin ea91cca593 [AMDGPU] Add wave barrier builtin
The wave barrier represents a discardable barrier. Its main purpose is to
carry the convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave reach the convergence point simultaneously with SIMT, so no special
instruction is needed in the ISA. The barrier is discarded during code generation.

Differential Revision: https://reviews.llvm.org/D26585

llvm-svn: 287007
2016-11-15 19:00:15 +00:00
Matt Arsenault c79dc70d50 AMDGPU: Fix f16 fabs/fneg
llvm-svn: 286931
2016-11-15 02:25:28 +00:00
Matt Arsenault 972034bda9 AMDGPU: Fix formatting of 1/2pi immediate
llvm-svn: 286912
2016-11-15 00:04:33 +00:00
Changpeng Fang 8236fe103f AMDGPU/SI: Support data types other than V4f32 in image intrinsics
Summary:
  Extend image intrinsics to support data types of V1F32 and V2F32.

  TODO: we should define a mapping table to change the opcode when the data type is V2F32 but only one channel is active,
  even though such cases should be very rare.

Reviewers:
  tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D26472

llvm-svn: 286860
2016-11-14 18:33:18 +00:00
Matt Arsenault dc45274d54 AMDGPU: Implement SGPR spilling with scalar stores
This avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.

This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.

The cache also needs to be flushed before wave termination
if a scalar store is used.

llvm-svn: 286766
2016-11-13 18:20:54 +00:00
Konstantin Zhuravlyov f86e4b7266 [AMDGPU] Add f16 support (VI+)
Differential Revision: https://reviews.llvm.org/D25975

llvm-svn: 286753
2016-11-13 07:01:11 +00:00
Tom Stellard b4c8e8e30b AMDGPU/SI: Promote i16 = fp_[us]int f32 for VI
Summary: This fixes a regression caused by r286464.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26570

llvm-svn: 286687
2016-11-12 00:19:11 +00:00
Tom Stellard 9fdbec870c AMDGPU/SI: Fix visit order assumption in SIFixSGPRCopies
Summary:
This pass was assuming that when a PHI instruction defined a register
used by another PHI instruction that the defining instruction would
be legalized before the using instruction.

This assumption was causing the pass to not legalize some PHI nodes
within divergent flow-control.

This fixes a bug that was uncovered by r285762.

Reviewers: nhaehnle, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26303

llvm-svn: 286676
2016-11-11 23:35:42 +00:00
Matthias Braun 325cd2c98a ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps()
addSchedBarrierDeps() is supposed to add use operands to the ExitSU
node. The current implementation adds uses for calls/barrier instruction
and the MBB live-outs in all other cases. The use
operands of conditional jump instructions were missed.

Also added code to macrofusion to set the latencies between nodes to
zero to avoid problems with the fusing nodes lingering around in the
pending list now.

Differential Revision: https://reviews.llvm.org/D25140

llvm-svn: 286544
2016-11-11 01:34:21 +00:00
Stanislav Mekhanoshin 6fc8a1cdaa Revert "[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies"
This reverts commit r286171, it breaks piglit test fs-discard-exit-2

llvm-svn: 286530
2016-11-11 00:22:34 +00:00
Yaxun Liu d6fbe65040 AMDGPU: Emit runtime metadata as a note element in .note section
Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata.

However, there is a standard way to convey vendor-specific information about how to run an ELF binary, which is called a vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html).

This patch lets the AMDGPU backend emit runtime metadata as a note element in the .note section.

Differential Revision: https://reviews.llvm.org/D25781

llvm-svn: 286502
2016-11-10 21:18:49 +00:00
Tom Stellard 115a61560e AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

llvm-svn: 286464
2016-11-10 16:02:37 +00:00
Stanislav Mekhanoshin 92e01ee90b [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies
Codegen prepare sinks comparisons close to a user if we have only one register
for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions.
Changed the backend to report that we have many condition registers. That way the
IR LICM pass would hoist an invariant comparison out of a loop and codegen prepare
will not sink it.

With that done a condition is calculated in one block and used in another.
Current behavior is to store a workitem's condition in a VGPR using v_cndmask
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue, forward propagation of a v_cmp 64-bit result
to a user is implemented. An additional side effect of this is that we may
consume fewer VGPRs at the cost of more SGPRs when multiple conditions need
to be held, and that is a clear win in most cases.

llvm-svn: 286171
2016-11-07 23:04:50 +00:00
Matt Arsenault f530e8b3f0 AMDGPU: Remove unnecessary and on conditional branch
The comment explaining why this was necessary is incorrect
in its description of v_cmp's behavior for inactive workitems.

llvm-svn: 286134
2016-11-07 19:09:33 +00:00
Tom Stellard 2d2d33f1dc Revert "AMDGPU: Add VI i16 support"
This reverts commit r285939 and r285948.  These broke some conformance tests.

llvm-svn: 285995
2016-11-04 13:06:34 +00:00
Tom Stellard 2b3379cdff AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

llvm-svn: 285939
2016-11-03 17:13:50 +00:00
Alexander Timofeev f867a40bf6 [AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads.
This change exploits the fact that LDS reads may be reordered even if they access
the same location.

Prior to the change, the algorithm stopped as soon as any memory
access was encountered between loads that were expected to be merged
together, although a Read-After-Read conflict cannot affect execution
correctness.

Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%.
Improvement is also expected on any massive sequence of reads from LDS.

Differential Revision: https://reviews.llvm.org/D25944

llvm-svn: 285919
2016-11-03 14:37:13 +00:00
Matt Arsenault bf9ee26aea AMDGPU: Cleanup some xfailed tests
Some of these are already fixed or tested somewhere else.

llvm-svn: 285840
2016-11-02 17:24:54 +00:00
Matt Arsenault 44deb7914e BranchRelaxation: Fix computing indirect branch block size
llvm-svn: 285828
2016-11-02 16:18:29 +00:00
Matt Arsenault 663ab8c119 AMDGPU: Use brev for materializing SGPR constants
This is already done with VGPR immediates and saves 4 bytes.

llvm-svn: 285765
2016-11-01 23:14:20 +00:00
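Illustration for 663ab8c119 above (not part of the commit): a constant that needs a 32-bit literal can sometimes be produced by bit-reversing a value small enough to be encoded inline, which avoids the extra literal dword. A minimal Python sketch of that check follows. The inline-integer range of -16 to 64 is an assumption used for illustration, and all function names are hypothetical rather than LLVM's.

```python
def bitreverse32(value):
    """Reverse the bit order of a 32-bit value."""
    value &= 0xFFFFFFFF
    result = 0
    for _ in range(32):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def to_signed32(value):
    """Interpret a 32-bit pattern as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def is_inline_int(value, low=-16, high=64):
    """Check the assumed inline-integer range (the bounds are an assumption)."""
    return low <= to_signed32(value) <= high


def brev_helps(constant):
    """True if the constant needs a literal but its bit reverse does not."""
    return not is_inline_int(constant) and is_inline_int(bitreverse32(constant))


# 0x80000000 is the bit reverse of 1, so it qualifies under these assumptions.
example = brev_helps(0x80000000)
```

Constants that are already inline, such as 5, gain nothing from the reversal, so the check rejects them.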
Matt Arsenault 3d463193a9 AMDGPU: Default to using scalar mov to materialize immediate
This is the conservatively correct way because it's easy to
move or replace a scalar immediate. This was incorrect in the case
when the register class wasn't known from the static instruction
definition, but still needed to be an SGPR. The main example of this
is when inlineasm has an SGPR constraint.

Also start verifying the register classes of inlineasm operands.

llvm-svn: 285762
2016-11-01 22:55:07 +00:00
Konstantin Zhuravlyov d971a1123f [AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32
This will prevent the following regressions when enabling i16 support (D18049):

test/CodeGen/AMDGPU/ctlz.ll
test/CodeGen/AMDGPU/ctlz_zero_undef.ll

Differential Revision: https://reviews.llvm.org/D25802

llvm-svn: 285716
2016-11-01 17:49:33 +00:00
Tom Stellard 94c21bc088 AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64
I wanted to implement this as a target independent expansion, however when
targets say they want to expand FP_TO_FP16 what they actually want is
the unsafe math expansion when possible and expansion to a libcall in all
other cases.

The only way to make this work as a target-independent expansion would be to add logic
to each target's TargetLowering construction to mark these nodes as Expand when
LegalizeDAG can use the unsafe expansion and mark them as LibCall when it
cannot.  I think this would be possible, but I think it would be too fragile
and complex as it would require targets to keep their expansion logic up
to date with the code in LegalizeDAG.

Reviewers: bogner, ab, t.p.northover, arsenm

Subscribers: wdng, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25999

llvm-svn: 285704
2016-11-01 16:31:48 +00:00
Valery Pykhtin 8a89d3662a [AMDGPU] Expand vector mulhu/mulhs
Differential revision: https://reviews.llvm.org/D26077

llvm-svn: 285684
2016-11-01 10:26:48 +00:00
Matt Arsenault c88ba36eab AMDGPU: Use 1/2pi inline imm on VI
I'm guessing at how it is supposed to be printed

llvm-svn: 285490
2016-10-29 04:05:06 +00:00
Matt Arsenault 7b6475568d AMDGPU: Add definitions for scalar store instructions
Also add glc bit to the scalar loads since they exist on VI
and change the caching behavior.

This currently has an assembler bug where the glc bit is incorrectly
accepted on SI/CI which do not have it.

llvm-svn: 285463
2016-10-28 21:55:15 +00:00
Matt Arsenault b5f2bb1a88 AMDGPU: Change check prefix in test
llvm-svn: 285449
2016-10-28 20:33:01 +00:00
Matt Arsenault 4eae301995 AMDGPU: Diagnose using too many SGPRs
This is possible when using inline asm.

llvm-svn: 285447
2016-10-28 20:31:47 +00:00
Matt Arsenault 08906a3c62 AMDGPU: Fix using incorrect private resource with no allocation
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number of reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.

llvm-svn: 285435
2016-10-28 19:43:31 +00:00
Nicolai Haehnle 7b0e25b7ad AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due register dependencies
Summary:
When finding a match for a merge and collecting the instructions that must
be moved, keep in mind that the instruction we merge might actually use one
of the defs that are being moved.

Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect].

The fact that the ds_read in the test case is not eliminated suggests that
there might be another problem related to alias analysis, but that's a
separate problem: this pass should still work correctly even when earlier
optimization passes missed something or were disabled.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25829

llvm-svn: 285273
2016-10-27 08:15:07 +00:00
Yaxun Liu 94add85adb AMDGPU: Refactor processor definition to use ISA version features
Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend.

Refactor processor definition to use ISA version features.

Fixed ISA version for stoney.

Based on Laurent Morichetti's patch.

Differential Revision: https://reviews.llvm.org/D25919

llvm-svn: 285210
2016-10-26 16:37:56 +00:00
Matt Arsenault 39787bdcbb Reapply "AMDGPU: Don't use offen if it is 0"
This reverts r283003

llvm-svn: 285203
2016-10-26 15:08:16 +00:00
Tom Stellard f8e6eaff6e AMDGPU/SI: Don't emit multi-dword flat memory ops when they might access scratch
Summary:
A single flat memory operation that might access the scratch buffer
can only access MaxPrivateElementSize bytes.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25788

llvm-svn: 285198
2016-10-26 14:38:47 +00:00
Tom Stellard 9daed22b04 AMDGPU/SI: Remove unnecessary run lines from test
Summary:
This test had run lines disabling/enabling the promote alloca pass, but
enabling/disabling promote alloca had no impact on the output.

Reviewers: arsenm

Subscribers: mgrang, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25787

llvm-svn: 285197
2016-10-26 14:21:09 +00:00
Nicolai Haehnle a785209bc2 AMDGPU: Fix Two Address problems with v_movreld
Summary:
The v_movreld machine instruction is used with three operands that are
in a sense tied to each other (the explicit VGPR_32 def and the implicit
VGPR_NN def and use). There is no way to express that using the currently
available operand bits, and indeed there are cases where the Two Address
instructions pass does the wrong thing.

This patch introduces a new set of pseudo instructions that are identical
in intended semantics to v_movreld, but they only have two tied operands.

Having to add a new set of pseudo instructions is admittedly annoying, but
it's a fairly straightforward and solid approach. The only alternative I
see is to try to teach the Two Address instructions pass about Three Address
instructions, and I'm afraid that's trickier and is going to end up more
fragile.

Note that v_movrels does not suffer from this problem, and so this patch
does not touch it.

This fixes several GL45-CTS.shaders.indexing.* tests.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25633

llvm-svn: 284980
2016-10-24 14:56:02 +00:00
Tom Stellard 6c7dd980e4 AMDGPU/SI: Fix crash caused by r284267
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25782

llvm-svn: 284875
2016-10-21 20:25:11 +00:00
Konstantin Zhuravlyov 08326b6256 [AMDGPU] Emit constant address space data in .rodata section and use relocations instead of fixups (amdhsa only)
Differential Revision: https://reviews.llvm.org/D25693

llvm-svn: 284759
2016-10-20 18:12:38 +00:00
Valery Pykhtin e55fd41f73 [AMDGPU] add fcopysign(f64, f32) pattern
Differential revision: https://reviews.llvm.org/D25827

llvm-svn: 284743
2016-10-20 16:17:54 +00:00
Wei Ding 3cb2a1e8d1 AMDGPU : Add a function to enable and disable IEEEBit for SC and shader
respectively.

Differential Revision: http://reviews.llvm.org/D25789

llvm-svn: 284655
2016-10-19 22:34:49 +00:00
Konstantin Zhuravlyov 98a3ac7106 [AMDGPU] Mark .note section SHF_ALLOC so lld creates a segment for it
Differential Revision: https://reviews.llvm.org/D25694

llvm-svn: 284435
2016-10-17 22:40:15 +00:00
Tom Stellard bc6c523cce AMDGPU/SI: Fix LowerParameter() for i16 arguments
Summary:
If we are loading an i16 value from a 32-bit memory location, then
we need to be able to truncate the loaded value to i16.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25198

llvm-svn: 284397
2016-10-17 16:21:45 +00:00
Tom Stellard 09c2bd6bd4 AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)

Reviewers: arsenm, nhaehnle

Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24672

llvm-svn: 284267
2016-10-14 19:14:29 +00:00
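Illustration for 09c2bd6bd4 above (not part of the commit): a 24-bit operation only demands the low 24 bits of its inputs, so an explicit `and x, 0xffffff` feeding it is redundant. A minimal Python sketch, where `mul_u24` is a simplified model of an unsigned 24-bit multiply and not the backend's implementation:

```python
MASK24 = 0xFFFFFF


def mul_u24(a, b):
    """Simplified model of an unsigned 24-bit multiply.

    Only the low 24 bits of each input are read; the low 32 bits of the
    product are returned.
    """
    return ((a & MASK24) * (b & MASK24)) & 0xFFFFFFFF


def mask_is_redundant(x, y):
    """True if masking x with 0xffffff first leaves mul_u24 unchanged."""
    return mul_u24(x & MASK24, y) == mul_u24(x, y)


# The high byte of the first operand is ignored either way.
product = mul_u24(0xFF000003, 2)
```

Because the result is identical with or without the mask, the `and` can be dropped once demanded-bits analysis shows that its only users read the low 24 bits.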
Tom Stellard 64a9d0876c AMDGPU/SI: Don't allow unaligned scratch access
Summary: The hardware doesn't support this.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25523

llvm-svn: 284257
2016-10-14 18:10:39 +00:00
Nicolai Haehnle 67624af0cc AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes
Summary:
This will be used for 64-bit MULHU, which is in turn used for the 64-bit
divide-by-constant optimization (see D24822).

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25289

llvm-svn: 284224
2016-10-14 10:30:00 +00:00
Konstantin Zhuravlyov c96b5d7073 [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variables
Differential Revision: https://reviews.llvm.org/D25562

llvm-svn: 284196
2016-10-14 04:37:34 +00:00
Nirav Dave a81682aad4 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r284151 which appears to be triggering LTO
failures on Hexagon

llvm-svn: 284157
2016-10-13 20:23:25 +00:00
Nirav Dave 4b36957243 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.

   Simplify Consecutive Merge Store Candidate Search

   Now that address aliasing is much less conservative, push through
   simplified store merging search which only checks for parallel stores
   through the chain subgraph. This is cleaner as the separation of
   non-interfering loads/stores from the store-merging logic.

   When merging stores, search up the chain through a single load, and
   find all possible stores by looking down through a load and a
   TokenFactor to all stores visited. This improves the quality of the
   output SelectionDAG and generally the output CodeGen (with some
   exceptions).

   Additional Minor Changes:

       1. Finishes removing unused AliasLoad code
       2. Unifies the chain aggregation in the merged stores across
       code paths
       3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
       4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

   This finishes the change Matt Arsenault started in r246307 and
   jyknight's original patch.

   Many tests required some changes as memory operations are now
   reorderable. Some tests relying on the order were changed to use
   volatile memory operations

   Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 284151
2016-10-13 19:20:16 +00:00
Matt Arsenault 253640e18d AMDGPU: Assume spilling will occur at -O0
Because everything live is spilled at the end of a
block by fast regalloc, assume this will happen and
avoid the copies of the resource descriptor.

llvm-svn: 284119
2016-10-13 13:10:00 +00:00
Matt Arsenault d486d3f8d1 AMDGPU: Initial implementation of VGPR indexing mode
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.

llvm-svn: 284031
2016-10-12 18:49:05 +00:00
Tom Stellard fac248cb5f AMDGPU/SI: Change mimg intrinsic signatures
This makes more fields overridable and removes redundant bits.

Patch by: Changpeng Fang

llvm-svn: 284024
2016-10-12 16:35:29 +00:00
Changpeng Fang 98317d20f4 AMDGPU/SI: Update ISA version numbers for Tonga and Polaris10/11.
Differential Revision:
  http://reviews.llvm.org/D25454

Reviewers:
  tstellarAMD

llvm-svn: 283893
2016-10-11 16:00:47 +00:00
Konstantin Zhuravlyov f74fc60a7d [AMDGPU] Promote uniform (i1, i16] operations to i32
Differential Revision: https://reviews.llvm.org/D25302

llvm-svn: 283555
2016-10-07 14:22:58 +00:00
Nicolai Haehnle 87bc4c218b AMDGPU: Fix use-after-free in SIOptimizeExecMasking
Summary:
There was a bug with sequences like

   s_mov_b64 s[0:1], exec
   s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill>
   ...
   s_mov_b64_term exec, s[2:3]

because s[2:3] was defined and used in the same instruction, ending up with
SaveExecInst inside OtherUseInsts.

Note that the test case also exposes an unrelated bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25306

llvm-svn: 283528
2016-10-07 08:40:14 +00:00
Matt Arsenault 93401f4b5e AMDGPU: Change check prefix in test
llvm-svn: 283521
2016-10-07 03:55:04 +00:00
Matt Arsenault 6bc43d8627 BranchRelaxation: Support expanding unconditional branches
AMDGPU needs to expand unconditional branches in a new
block with an indirect branch.

llvm-svn: 283464
2016-10-06 16:20:41 +00:00
Konstantin Zhuravlyov b4eb5d5049 [AMDGPU] Promote uniform i16 bitreverse intrinsic to i32
Differential Revision: https://reviews.llvm.org/D25121

llvm-svn: 283415
2016-10-06 02:20:46 +00:00
Bjorn Pettersson 12559441bd [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT.
Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple
look-through of EXTRACT_VECTOR_ELT. It will compute the result based
on the known bits (or known sign bits) for the vector that the element
is extracted from.

Reviewers: bogner, tstellarAMD, mkuper

Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25007

llvm-svn: 283347
2016-10-05 17:40:27 +00:00
Matthias Braun d2fc0d40e4 Set some tests to an unknown vendor and OS
This avoids llc using the host's OS/vendor as defaults and triggering
unwanted behaviour in the tests. This should deal with the buildbot
breakages on windows after r283140.

llvm-svn: 283149
2016-10-03 21:58:20 +00:00
Konstantin Zhuravlyov 691e2e020b [AMDGPU] Sign extend AShr when promoting (instead of zero extending)
llvm-svn: 283130
2016-10-03 18:29:01 +00:00
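Illustration for 691e2e020b above (not part of the commit): an arithmetic shift right replicates the sign bit, so promoting an i16 `ashr` to i32 only gives the right answer if the operand is sign extended first. A minimal Python sketch with hypothetical helper names:

```python
def zext16(x):
    """Zero extend a 16-bit value to 32 bits."""
    return x & 0xFFFF


def sext16(x):
    """Sign extend a 16-bit value to 32 bits."""
    x &= 0xFFFF
    return x | 0xFFFF0000 if x & 0x8000 else x


def ashr32(x, n):
    """Arithmetic shift right on a 32-bit pattern."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & 0xFFFFFFFF


def ashr16(x, n):
    """Reference result: arithmetic shift right on a 16-bit pattern."""
    x &= 0xFFFF
    signed = x - (1 << 16) if x & 0x8000 else x
    return (signed >> n) & 0xFFFF


def promoted_ashr(x, n, extend):
    """Extend to 32 bits, shift, then truncate back to 16 bits."""
    return ashr32(extend(x), n) & 0xFFFF


reference = ashr16(0x8000, 1)                  # 0xC000
with_sext = promoted_ashr(0x8000, 1, sext16)   # matches the reference
with_zext = promoted_ashr(0x8000, 1, zext16)   # 0x4000, sign bit lost
```

With zero extension the 32-bit shift pulls in zeros from bit 16 upward, so the truncated result loses the replicated sign bit.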
Matt Arsenault 40bae76620 AMDGPU: Fix missing -verify-machineinstrs in test
llvm-svn: 283107
2016-10-03 12:58:59 +00:00
Mehdi Amini 86eeda8e20 Revert "AMDGPU: Don't use offen if it is 0"
This reverts commit r282999.
Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038

llvm-svn: 283003
2016-10-01 02:35:24 +00:00
Matt Arsenault 3070fdf798 AMDGPU: Don't use offen if it is 0
This removes many re-initializations of a base register to 0.

llvm-svn: 282999
2016-10-01 01:37:15 +00:00
Matt Arsenault 5d8eb25e78 AMDGPU: Use unsigned compare for eq/ne
For some reason there are both of these available, except
for scalar 64-bit compares, which only have u64. I'm not sure
why there are both (I'm guessing it's for the one bit inputs we
don't use), but for consistency always use the
unsigned one.

llvm-svn: 282832
2016-09-30 01:50:20 +00:00
Matt Arsenault e6740754f0 AMDGPU: Partially fix control flow at -O0
Fixes to allow spilling all registers at the end of the block
to work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667
2016-09-29 01:44:16 +00:00
Konstantin Zhuravlyov e14df4b236 [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125

llvm-svn: 282624
2016-09-28 20:05:39 +00:00
Nirav Dave e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failures with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Nirav Dave e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and
  find all possible stores by looking down through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Michael Kuperstein 3e06eafc20 [DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.

Differential Revision: https://reviews.llvm.org/D24625

llvm-svn: 282567
2016-09-28 06:13:58 +00:00
Tom Stellard 1b9748c6a2 AMDGPU/SI: Don't crash on anonymous GlobalValues
Summary:
We need to call AsmPrinter::getNameWithPrefix() in order to handle
anonymous GlobalValues (e.g. @0, @1).

Reviewers: arsenm, b-sumner

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D24865

llvm-svn: 282420
2016-09-26 17:29:25 +00:00
Tom Stellard e88bbc34c6 AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size
Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D24835

llvm-svn: 282223
2016-09-23 01:33:26 +00:00
Nirav Dave 9011da3d44 [DAG] Fix incorrect alignment of ext load.
Correctly use alignment size from loaded size not output value size.

Reviewers: jyknight, tstellarAMD, arsenm

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23356

llvm-svn: 282177
2016-09-22 17:28:43 +00:00
Matt Arsenault ac0fc849cf AMDGPU: Fix broken FrameIndex handling
We were trying to avoid using a FrameIndex operand in non-pointer
operands in a convoluted way, and would break because of
using TargetFrameIndex. The TargetFrameIndex should only be used
in the case where it makes sense to fold it as part of the addressing
mode, otherwise it requires materialization like a normal constant.
This wasn't working reliably and failed in the added testcase, hitting
the assert when processing the frame index.

The TargetFrameIndex was coming from trying to produce an AssertZext
limiting the maximum stack size. I'm not sure this was correct to begin
with, because it is apparently possible to have a single workitem
dispatch that requires all 4G of private memory.

llvm-svn: 281824
2016-09-17 16:09:55 +00:00
Matt Arsenault d99ef1144b AMDGPU: Push bitcasts through build_vector
This reduces the number of copies and reg_sequences
when using fp constant vectors. This significantly
reduces the code size in local-stack-alloc-bug.ll

llvm-svn: 281822
2016-09-17 15:44:16 +00:00
Matt Arsenault 7b1dc2c983 AMDGPU: Use i64 scalar compare instructions
VI added eq/ne for i64, so use them.

llvm-svn: 281800
2016-09-17 02:02:19 +00:00
Tom Stellard 7998db634c AMDGPU/SI: Fix kernel argument ABI for HSA
Summary: i8, i16, and f16 values are not extended to 32-bit in the HSA kernel ABI.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24621

llvm-svn: 281789
2016-09-16 22:20:24 +00:00
Matt Arsenault 6408c9135c AMDGPU: Allow some control flow intrinsics to be CSEd
These clean up some unnecessary or instructions in
cases with complex loops.

In the original testcase where I noticed this, the same
or with exec was repeated 5 or 6 times in a row. With
this only one is emitted or sometimes a copy.

llvm-svn: 281786
2016-09-16 22:11:18 +00:00
Tom Stellard bbeb45aff6 AMDGPU: Refactor kernel argument lowering
Summary:
The main challenge in lowering kernel arguments for AMDGPU is determining the
memory type of the argument.  The generic calling convention code assumes
that only legal register types can be stored in memory, but this is not the
case for AMDGPU.

This consolidates all the logic AMDGPU uses for deducing memory types into a single
function.  This will make it much easier to support different ABIs in the future.

Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24614

llvm-svn: 281781
2016-09-16 21:53:00 +00:00
Matt Arsenault 7ccf6cd104 AMDGPU: Use SOPK compare instructions
llvm-svn: 281780
2016-09-16 21:41:16 +00:00
Tom Stellard 0b76fc4c77 AMDGPU/SI: Add support for triples with the mesa3d operating system
Summary:
mesa3d will use the same kernel calling convention as amdhsa, but it will
handle everything else like the default 'unknown' OS type.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22783

llvm-svn: 281779
2016-09-16 21:34:26 +00:00
Matt Arsenault f40b70fa75 Revert "AMDGPU: Use SOPK compare instructions"
Accidentally committed

llvm-svn: 281514
2016-09-14 18:04:42 +00:00
Matt Arsenault f757c87959 AMDGPU: Use SOPK compare instructions
llvm-svn: 281513
2016-09-14 18:03:53 +00:00
Matt Arsenault 2bc198a333 AMDGPU: Support folding FrameIndex operands
This avoids test regressions in a future commit.

llvm-svn: 281491
2016-09-14 15:51:33 +00:00
Matt Arsenault fa5f767a38 AMDGPU: Improve splitting 64-bit bit ops by constants
This addresses a TODO to handle operations besides and. This
also starts eliminating no-op operations with a constant that
can emerge later.
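
As a rough illustration of the idea (a Python model, not the backend code;
all names here are hypothetical), a 64-bit bitwise operation with a constant
can be treated as two independent 32-bit operations, and a half whose
constant makes the operation a no-op needs no instruction at all:

```python
MASK32 = 0xFFFFFFFF

# For each operation, the 32-bit constant c for which (x op c) == x.
IDENTITY = {"and": MASK32, "or": 0, "xor": 0}

OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


def split_bitop64(op, value, const):
    """Apply a 64-bit bit op to `value` as two 32-bit halves.

    Returns (result, emitted), where `emitted` counts the 32-bit
    operations that were actually needed after dropping no-op halves.
    """
    halves = []
    emitted = 0
    for shift in (0, 32):
        v = (value >> shift) & MASK32
        c = (const >> shift) & MASK32
        if c == IDENTITY[op]:
            halves.append(v)  # no-op half: the value passes through
        else:
            halves.append(OPS[op](v, c))
            emitted += 1
    return halves[0] | (halves[1] << 32), emitted
```

For example, `split_bitop64("or", x, 0xFF)` only touches the low half, so
one 32-bit operation is counted instead of two.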

llvm-svn: 281488
2016-09-14 15:19:03 +00:00
Matt Arsenault 25dba30017 AMDGPU: Support commuting a FrameIndex operand
llvm-svn: 281369
2016-09-13 19:03:12 +00:00
Nicolai Haehnle e58e0e3fe3 AMDGPU: Do not clobber SCC in SIWholeQuadMode
Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D22198

llvm-svn: 281230
2016-09-12 16:25:20 +00:00
NAKAMURA Takumi cf6aaa9e1a llvm/test/CodeGen/AMDGPU/infinite-loop-evergreen.ll REQUIRES +Asserts.
This might not *crash* with -Asserts. I saw it caused infinite loop in the codegen.

llvm-svn: 281190
2016-09-12 04:27:28 +00:00
Matt Arsenault 124384f08d AMDGPU: Fix immediate folding logic when shrinking instructions
If the literal is being folded into src0, it doesn't matter
if it's an SGPR because it's being replaced with the literal.

Also fixes initially selecting 32-bit versions of some instructions
which also confused commuting.

llvm-svn: 281117
2016-09-09 23:32:53 +00:00
Matt Arsenault 0efdd06b22 AMDGPU: Run LoadStoreVectorizer pass by default
llvm-svn: 281112
2016-09-09 22:29:28 +00:00
Wei Ding 06f8d39424 AMDGPU : Fix mqsad_u32_u8 instruction incorrect data type.
Differential Revision: http://reviews.llvm.org/D23700

llvm-svn: 281081
2016-09-09 19:31:51 +00:00
Tom Stellard b2869eb6e9 AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is 8-byte aligned for HSA
Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D24405

llvm-svn: 281080
2016-09-09 19:28:00 +00:00
Tim Northover 0f140c769a GlobalISel: move type information to MachineRegisterInfo.
We want each register to have a canonical type, which means the best place to
store this is in MachineRegisterInfo rather than on every MachineInstr that
happens to use or define that register.

Most changes following from this are pretty simple (you need an MRI anyway if
you're going to be doing any transformations, so just check the type there).
But legalization doesn't really want to check redundant operands (when, for
example, a G_ADD only ever has one type) so I've made use of MCInstrDesc's
operand type field to encode these constraints and limit legalization's work.

As an added bonus, more validation is possible, both in MachineVerifier and
MachineIRBuilder (coming soon).

llvm-svn: 281035
2016-09-09 11:46:34 +00:00
Sam Kolton a2e5c88baf [AMDGPU] Assembler: rename amd_kernel_code_t asm names according to spec
Summary:
Also removed duplicate code from AMDGPUTargetAsmStreamer.
This change only changes how amd_kernel_code_t is parsed and printed. No variable names are changed.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D24296

llvm-svn: 281028
2016-09-09 10:08:02 +00:00
Matt Arsenault be90f70d3a AMDGPU: Try to commute when selecting s_addk_i32/s_mulk_i32
llvm-svn: 280972
2016-09-08 17:35:41 +00:00
Matt Arsenault bbb47da8a1 AMDGPU: Support commuting with immediate in src0
llvm-svn: 280970
2016-09-08 17:19:29 +00:00
Simon Pilgrim cc7b4b511b [SelectionDAG] Add BUILD_VECTOR support to computeKnownBits and SimplifyDemandedBits
Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element.

This is an initial step towards determining the sign bits of a vector (PR29079).
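
A minimal sketch of the "shared by every element" rule, written as a Python
model rather than the SelectionDAG code. It assumes every element is a known
constant (the real analysis would query the known bits of each operand), and
the function name is made up for illustration:

```python
def build_vector_known_bits(elements, bit_width):
    """Return (known_zero, known_one) masks common to all elements.

    A bit is known one only if it is set in every element, and known
    zero only if it is clear in every element.
    """
    mask = (1 << bit_width) - 1
    known_zero = mask
    known_one = mask
    for e in elements:
        e &= mask
        known_one &= e
        known_zero &= ~e & mask
    return known_zero, known_one
```

Bits that differ between elements end up in neither mask, so they stay
unknown.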

Differential Revision: https://reviews.llvm.org/D24253

llvm-svn: 280927
2016-09-08 12:57:51 +00:00
Yaxun Liu 638914009a AMDGPU: Add hidden kernel arguments to runtime metadata
OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden arguments should be included in the runtime metadata. Also updated kernel argument kind metadata.

Differential Revision: https://reviews.llvm.org/D23424

llvm-svn: 280829
2016-09-07 17:44:00 +00:00
Konstantin Zhuravlyov 864718c666 [AMDGPU] Wave and register controls
- Add missing test

llvm-svn: 280749
2016-09-06 20:29:10 +00:00
Konstantin Zhuravlyov 1d65026ca6 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562

llvm-svn: 280747
2016-09-06 20:22:28 +00:00
Wei Ding 5e832e866e AMDGPU : Add XNACK feature to GPUs that support it.
Differential Revision: http://reviews.llvm.org/D24276

llvm-svn: 280742
2016-09-06 19:55:17 +00:00
Nicolai Haehnle 3bba6a8438 AMDGPU: Reduce the duration of whole-quad-mode
Summary:
This contains two changes that reduce the time spent in WQM, with the
intention of reducing bandwidth required by VMEM loads:

1. Sampling instructions by themselves don't need to run in WQM, only their
   coordinate inputs need it (unless of course there is a dependent sampling
   instruction). The initial scanInstructions step is modified accordingly.

2. When switching back from WQM to Exact, switch back as soon as possible.
   This affects the logic in processBlock.

This should always be a win or at worst neutral.

There are also some cleanups (e.g. remove unused ExecExports) and some new
debugging output.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D22092

llvm-svn: 280590
2016-09-03 12:26:38 +00:00
Nicolai Haehnle a246dccc26 AMDGPU: Fix an interaction between WQM and polygon stippling
Summary:
This fixes a rare bug in polygon stippling with non-monolithic pixel shaders.

The underlying problem is as follows: the prolog part contains the polygon
stippling sequence, i.e. a kill. The main part then enables WQM based on the
_reduced_ exec mask, effectively undoing most of the polygon stippling.

Since we cannot know whether polygon stippling will be used, the main part
of a non-monolithic shader must always return to exact mode to fix this
problem.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23131

llvm-svn: 280589
2016-09-03 12:26:32 +00:00
Matt Arsenault 2510a31677 AMDGPU: Fix spilling of m0
readlane/writelane do not support using m0 as the output/input.
Constrain the register class of spill vregs to try to avoid this,
but also handle spilling of the physreg when necessary by inserting
an additional copy to a normal SGPR.

llvm-svn: 280584
2016-09-03 06:57:55 +00:00
Jan Vesely ea45746d5a AMDGPU/R600: EXTRACT_VECT_ELT should only bypass BUILD_VECTOR if the vectors have the same number of elements.
Fixes R600 piglit regressions since r280298

Differential Revision: https://reviews.llvm.org/D24174

llvm-svn: 280535
2016-09-02 20:13:19 +00:00
Krzysztof Parzyszek 3bf4aeccd6 Do not consider subreg defs as reads when computing subrange liveness
Subregister definitions are considered uses for the purpose of tracking
liveness of the whole register. At the same time, when calculating live
interval subranges, subregister defs should not be treated as uses.

Differential Revision: https://reviews.llvm.org/D24190

llvm-svn: 280532
2016-09-02 19:48:55 +00:00
Jan Vesely 00864886f4 AMDGPU/R600: Expand unaligned writes to local and global AS
LOCAL and GLOBAL AS only
PRIVATE needs special treatment
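
The commit message does not spell out the expansion itself. One common way
to picture it, sketched here as a Python model with hypothetical names, is to
split a store into halves until each piece is no wider than the known
alignment (sizes and alignments are assumed to be powers of two, values
little-endian):

```python
def expand_store(addr, value, size, align):
    """Split a `size`-byte store into naturally aligned pieces.

    Returns a list of (address, value, size_in_bytes) tuples in
    ascending address order.
    """
    if size <= align:
        # Narrow enough to be aligned already: emit it as one store.
        return [(addr, value & ((1 << (8 * size)) - 1), size)]
    half = size // 2
    lo = value & ((1 << (8 * half)) - 1)
    hi = value >> (8 * half)
    return (expand_store(addr, lo, half, align)
            + expand_store(addr + half, hi, half, align))
```

A 4-byte store with 2-byte alignment becomes two 2-byte stores; with 1-byte
alignment it becomes four byte stores.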

Differential Revision: https://reviews.llvm.org/D23971

llvm-svn: 280526
2016-09-02 19:07:06 +00:00
Jan Vesely cd6b12b12e AMDGPU: Reorganize store tests
Split by AS.
Merge with some previously failing tests.

Differential Revision: https://reviews.llvm.org/D23969

llvm-svn: 280523
2016-09-02 18:52:28 +00:00
Yaxun Liu add05a8d95 AMDGPU: Add runtime metadata for pointee alignment of argument.
Add runtime metadata for pointee alignment of pointer type kernel argument. The key is KeyArgPointeeAlign and the value is a 32 bit unsigned integer.

Differential Revision: https://reviews.llvm.org/D24145

llvm-svn: 280399
2016-09-01 18:46:49 +00:00
Matt Arsenault b50eb8dc2b AMDGPU: Fix introducing stack access on unaligned v16i8
llvm-svn: 280298
2016-08-31 21:52:27 +00:00
Tom Stellard ba5730884b AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is at least 4-byte aligned
Summary: This fixes some OpenCV tests that were broken by libclc commit r276443.

Reviewers: arsenm, jvesely

Subscribers: arsenm, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24051

llvm-svn: 280274
2016-08-31 18:46:07 +00:00
Nikolay Haustov eba808957e AMDGPU/SI: Handle aliases in AMDGPUAlwaysInlinePass
Summary:
Simply replace usage of aliases to functions with aliasee.
This came up when bitcode linking to builtin library and
calls to aliases not being resolved.

Also made minor improvements to existing test.

Reviewers: tstellarAMD, alex-t, vpykhtin

Subscribers: arsenm, wdng, rampitec

Differential Revision: https://reviews.llvm.org/D24023

llvm-svn: 280221
2016-08-31 11:18:33 +00:00
Matt Arsenault a609e2d5ce AMDGPU: Relax SGPR asm constraint register class
s should be SReg_32 to be as general as possible. This can avoid a copy
from m0.

llvm-svn: 280154
2016-08-30 20:50:08 +00:00
Tom Stellard 0d23ebe888 AMDGPU/SI: Implement a custom MachineSchedStrategy
Summary:
GCNSchedStrategy re-uses most of GenericScheduler; it just uses
a different method to compute the excess and critical register
pressure limits.

It's not enabled by default, to enable it you need to pass -misched=gcn
to llc.

Shader DB stats:

32464 shaders in 17874 tests
Totals:
SGPRS: 1542846 -> 1643125 (6.50 %)
VGPRS: 1005595 -> 904653 (-10.04 %)
Spilled SGPRs: 29929 -> 27745 (-7.30 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 36688188 -> 37034900 (0.95 %) bytes
LDS: 1913 -> 1913 (0.00 %) blocks
Max Waves: 254101 -> 265125 (4.34 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 1338220 -> 1438499 (7.49 %)
VGPRS: 886221 -> 785279 (-11.39 %)
Spilled SGPRs: 29869 -> 27685 (-7.31 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 34315716 -> 34662428 (1.01 %) bytes
LDS: 1551 -> 1551 (0.00 %) blocks
Max Waves: 188127 -> 199151 (5.86 %)
Wait states: 0 -> 0 (0.00 %)
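
The percentages in these tables appear to be plain relative changes, before
to after. A small Python helper (hypothetical, not part of the shader-db
tooling) reproduces them from the raw totals:

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        # Treat 0 -> 0 as no change, matching the "Wait states" rows.
        return 0.0
    return (after - before) / before * 100.0


# Totals quoted above: VGPRS, SGPRS and Max Waves.
vgprs = round(percent_change(1005595, 904653), 2)
sgprs = round(percent_change(1542846, 1643125), 2)
waves = round(percent_change(254101, 265125), 2)
```

Rounded to two decimals, these give the quoted -10.04 %, 6.50 % and 4.34 %.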

Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23688

llvm-svn: 279995
2016-08-29 19:42:52 +00:00
Tom Stellard c2ff0eb697 AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler
Summary:
The SILoadStoreOptimizer can now look ahead more than one instruction when
looking for instructions to merge, which greatly improves the number of
loads/stores that we are able to merge.

Moving the pass before scheduling avoids increasing register pressure after
the scheduler, so that the scheduler's register pressure estimates will be
more accurate.  It also gives more consistent results, since it is no longer
affected by minor scheduling changes.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23814

llvm-svn: 279991
2016-08-29 19:15:22 +00:00
Matt Arsenault b90fc9b3b4 AMDGPU/R600: Fix fixups used for constant arrays
Fixes bug 29289

llvm-svn: 279986
2016-08-29 19:01:48 +00:00
Tom Stellard 5d3f71f721 AMDGPU/SI: Improve register allocation hints for sopk instructions
Summary:
For shrinking SOPK instructions, we were creating a hint to tell the
register allocator to use the register allocated for src0 for the dst
operand as well.  However, this seems to not work sometimes depending
on the order virtual registers are assigned physical registers.

To fix this, I've added a second allocation hint which does the reverse,
asks that the register allocated for dst is used for src0.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23862

llvm-svn: 279968
2016-08-29 13:06:10 +00:00
Jan Vesely 38814fa2fd AMDGPU/R600: Enable Load combine
Fix and improve tests

Differential Revision: https://reviews.llvm.org/D23899

llvm-svn: 279925
2016-08-27 19:09:43 +00:00
Matt Arsenault 2712d4a3d8 AMDGPU: Select mulhi 24-bit instructions
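
For orientation, a Python model of what an unsigned 24-bit "mulhi" is
assumed to compute here: the upper bits (above bit 31) of the 48-bit product
of the low 24 bits of each operand. This reading is an assumption and is not
stated in the commit message; the function name is made up:

```python
MASK24 = 0xFFFFFF


def mulhi_u24(a, b):
    """High part of the product of the low 24 bits of a and b."""
    return ((a & MASK24) * (b & MASK24)) >> 32
```

Bits above bit 23 of either input are ignored, which is what allows the
cheaper 24-bit multiply to be used when those bits are known to be zero.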
llvm-svn: 279902
2016-08-27 01:32:27 +00:00
Matt Arsenault 22e417956d AMDGPU: Move cndmask pseudo to be isel pseudo
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.

llvm-svn: 279901
2016-08-27 01:00:37 +00:00
Tom Stellard e175d8aba5 AMDGPU/SI: Canonicalize offset order for merged DS instructions
Summary:
If the scheduler clusters the loads, then the offsets will be sorted,
but it is possible for the scheduler to schedule loads together
without explicitly clustering them, which would give us non-sorted
offsets.

Also, we will want to do this if we move the load/store optimizer before
the scheduler.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23776

llvm-svn: 279870
2016-08-26 21:36:47 +00:00
Matt Arsenault f403df38eb Replace subregister uses when processing tied operands
This was for some reason skipping operands that are subregisters
instead of keeping the same subregister index.

v_movreld_b32 expects src0 to be the subregister of the tied
super register use/def.

e.g.

v_movreld_b32 v0, v9, <imp-def, tied3> v[0:3], <imp-use, tied2> v[0:3]

was being replaced with

v[4:7] = copy v[0:3]
v_movreld_b32 v0, v9, <imp-def, tied3> v[4:7], <imp-use, tied2> v[4:7],

which really writes to v[0:3]

llvm-svn: 279804
2016-08-26 06:31:32 +00:00
Changpeng Fang 75f0968b39 AMDGCN/SI: Implement readlane/readfirstlane intrinsics
Summary:
  This patch implements readlane/readfirstlane intrinsics.
TODO: need to define a new register class to consider the case
that the source could be a vector register or M0.

Reviewed by:
  arsenm and tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D22489

llvm-svn: 279660
2016-08-24 20:35:23 +00:00
Wei Ding 1041a646a9 AMDGPU : Add V_SAD_U32 instruction pattern.
Differential Revision: http://reviews.llvm.org/D23069

llvm-svn: 279629
2016-08-24 14:59:47 +00:00
Krzysztof Parzyszek a7ed090bba Create subranges for new intervals resulting from live interval splitting
The register allocator can split a live interval of a register into a set
of smaller intervals. After the allocation of registers is complete, the
rewriter will modify the IR to replace virtual registers with the
corresponding physical registers. At this stage, if a register
corresponding to a subregister of a virtual register is used, the rewriter
will check if that subregister is undefined, and if so, it will add the
<undef> flag to the machine operand. The function verifying liveness of the
subregister would assume that it is undefined, unless any of the subranges
of the live interval proves otherwise.
The problem is that the live intervals created during splitting do not
have any subranges, even if the original parent interval did. This could
result in the <undef> flag placed on a register that is actually defined.

Differential Revision: http://reviews.llvm.org/D21189

llvm-svn: 279625
2016-08-24 13:37:55 +00:00
Matthias Braun 79f85b3b8f MIRParser/MIRPrinter: Compute isSSA instead of printing/parsing it.
Specifying isSSA is an extra line at best and results in invalid MI at
worst. Compute the value instead.

Differential Revision: http://reviews.llvm.org/D22722

llvm-svn: 279600
2016-08-24 01:32:41 +00:00
Matt Arsenault 78fc9daf8d AMDGPU: Split SILowerControlFlow into two pieces
Do most of the lowering in a pre-RA pass. Keep the skip jump
insertion late, plus a few other things that require more
work to move out.

One concern I have is now there may be COPY instructions
which do not have the necessary implicit exec uses
if they will be lowered to v_mov_b32.

This has a positive effect on SGPR usage in shader-db.

llvm-svn: 279464
2016-08-22 19:33:16 +00:00
Matthias Braun fdc4c6b426 Revert "RegScavenging: Add scavengeRegisterBackwards()"
The ppc64 multistage bot fails on this.

This reverts commit r279124.

Also Revert "CodeGen: Add/Factor out LiveRegUnits class; NFCI" because it depends on the previous change
This reverts commit r279171.

llvm-svn: 279199
2016-08-19 03:03:24 +00:00
Tom Stellard a1619cd9aa AMDGPU/SI: Fix a test in wqm.ll to always use s_cbranch_vcc*
Summary:
We need to use floating-point compares to ensure that s_cbranch_vcc*
instructions are always generated.  With integer compares, future
optimizations could cause s_cbranch_scc* to be generated instead.

Reviewers: arsenm, nhaehnle

Subscribers: llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23401

llvm-svn: 279148
2016-08-18 21:21:53 +00:00
Wei Ding 52bb661dec AMDGPU : Fix QSAD and MQSAD instructions' incorrect data type.
Differential Revision: http://reviews.llvm.org/D23689

llvm-svn: 279126
2016-08-18 19:51:14 +00:00
Matthias Braun 075d0c23d5 RegScavenging: Add scavengeRegisterBackwards()
Re-apply r276044 with off-by-1 instruction fix for the reload placement.

This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.

This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

llvm-svn: 279124
2016-08-18 19:47:59 +00:00
Valery Pykhtin 609c2f8137 [AMDGPU] add s_incperflevel/s_decperflevel intrinsics.
Differential revision: https://reviews.llvm.org/D23666

llvm-svn: 279106
2016-08-18 18:06:20 +00:00
Jan Vesely 0486f739a4 AMDGPU/R600: Convert buffer id to VTX_READ input
Use patterns instead of multiple instructions
Add buffer id to asm string

https://reviews.llvm.org/D22650

llvm-svn: 278749
2016-08-15 21:38:30 +00:00
Matt Arsenault 3661e90e71 AMDGPU: Don't fold subregister extracts into tied operands
llvm-svn: 278676
2016-08-15 16:18:36 +00:00
James Molloy 196ad0823e [LSR] Don't try and create post-inc expressions on non-rotated loops
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!

Motivating testcase:

    void f(float *a, float *b, float *c, int n) {
      while (n-- > 0)
        *c++ = *a++ + *b++;
    }

It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.

llvm-svn: 278658
2016-08-15 07:53:03 +00:00
Mehdi Amini 8c629ecf3a Revert "Revert "Invariant start/end intrinsics overloaded for address space""
This reverts commit 32fc6488e48eafc0ca1bac1bd9cbf0008224d530.

llvm-svn: 278609
2016-08-13 23:31:24 +00:00
Mehdi Amini 164ac651da Revert "Invariant start/end intrinsics overloaded for address space"
This reverts commit r276447.

llvm-svn: 278608
2016-08-13 23:27:32 +00:00
Matt Arsenault 3cc1e0066d AMDGPU: Fix missing test for addressing mode with odd offsets
Add test if the constant offset looks unaligned.

llvm-svn: 278589
2016-08-13 01:43:51 +00:00
Wei Ding 70cda07526 AMDGPU : Add intrinsic for instruction v_cvt_pk_u8_f32
Differential Revision: http://reviews.llvm.org/D23336

llvm-svn: 278403
2016-08-11 20:34:48 +00:00
Matt Arsenault 56684d4538 AMDGPU: Fix crashes on memory functions
llvm-svn: 278369
2016-08-11 17:31:42 +00:00
Wei Ding d3344378c6 AMDGPU : Fix SAD related instruction LIT tests function attribute issues.
Differential Revision: http://reviews.llvm.org/D23133

llvm-svn: 278360
2016-08-11 17:14:17 +00:00
Wei Ding 34e1753585 AMDGPU : Add LLVM intrinsics for SAD related instructions.
Differential Revision: http://reviews.llvm.org/D23133

llvm-svn: 278354
2016-08-11 16:33:53 +00:00
Changpeng Fang fb9c3818dd AMDGPU/SI: Implement amdgcn image intrinsics with sampler
Summary:
  This patch defines and implements amdgcn image intrinsics with sampler.

    1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty,
       and rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsics name
       to have three suffixes to overload each of these three types;

    2. D128 as well as two other flags are implied in the three types, for example,
       if you use v8i32 as resource type, then r128 is 0!

    3. don't expose TFE flag, and other flags are exposed in the instruction order:
       unrm, glc, slc, lwe and da.

Differential Revision: http://reviews.llvm.org/D22838

Reviewed by:
  arsenm and tstellarAMD

llvm-svn: 278291
2016-08-10 21:15:30 +00:00
Matt Arsenault 57431c9680 AMDGPU: Change insertion point of si_mask_branch
Insert before the skip branch if one is created.
This is a somewhat more natural placement relative
to the skip branches, and makes it possible to implement
analyzeBranch for skip blocks.

The test changes are mostly due to a quirk where
the block label is not emitted if there is a terminator
that is not also a branch.

llvm-svn: 278273
2016-08-10 19:11:42 +00:00
Nicolai Haehnle 02d784172c LiveIntervalAnalysis: fix a crash in repairOldRegInRange
Summary:
See the new test case for one that was (non-deterministically) crashing
on trunk and deterministically hit the assertion that I added in D23302.
Basically, the machine function contains a sequence

     DS_WRITE_B32 %vreg4, %vreg14:sub0, ...
     DS_WRITE_B32 %vreg4, %vreg14:sub0, ...
     %vreg14:sub1<def> = COPY %vreg14:sub0

and SILoadStoreOptimizer::mergeWrite2Pair merges the two DS_WRITE_B32
instructions into one before calling repairIntervalsInRange.

Now repairIntervalsInRange wants to repair %vreg14, in particular, and
ends up trying to repair %vreg14:sub1 as well, but that only becomes
active _after_ the range that is to be repaired, hence the crash due
to LR.find(...) == LR.begin() at the start of repairOldRegInRange.

I believe that just skipping those subrange is fine, but again, not too
familiar with that code.

Reviewers: MatzeB, kparzysz, tstellarAMD

Subscribers: llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D23303

llvm-svn: 278268
2016-08-10 18:51:14 +00:00
Marek Olsak 355a8642b4 AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland
Summary:
This is the setting of the Vulkan closed source driver.

It decreases the max wave count from 10 to 8.

26010 shaders in 14650 tests
Totals:
VGPRS: 829593 -> 808440 (-2.55 %)
Spilled SGPRs: 81878 -> 42226 (-48.43 %)
Spilled VGPRs: 367 -> 358 (-2.45 %)
Scratch VGPRs: 1764 -> 1748 (-0.91 %) dwords per thread
Code Size: 36677864 -> 35923932 (-2.06 %) bytes

There is a massive decrease in SGPR spilling in general and -7.4% spilled
VGPRs for DiRT Showdown (= SGPRs spilled to scratch?)

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23034

llvm-svn: 277867
2016-08-05 21:23:29 +00:00
Yaxun Liu 86c052238a [OpenCL] Add missing tests for getOCLTypeName
Adding missing tests for OCL type names for half, float, double, char, short, long, and unknown.

Patch by Aaron En Ye Shi.

Differential Revision: https://reviews.llvm.org/D22964

llvm-svn: 277759
2016-08-04 19:45:00 +00:00
Matt Arsenault b0e32f1ba1 AMDGPU: Fix a slow test by using basic regalloc
This just tests that the register limit isn't exceeded,
so the register allocation doesn't need to be great.

The critically slow part is all in greedy RA, so
switch to basic.

llvm-svn: 277700
2016-08-04 07:04:54 +00:00
Matthias Braun 1873998b16 RenameIndependentSubregs: Fix liveness query in rewriteOperands()
rewriteOperands() always performed liveness queries at the base index
rather than the RegSlot/Base as appropriate for the machine operand. This
could lead to illegal rewriting in some cases.

llvm-svn: 277661
2016-08-03 22:37:47 +00:00
Matt Arsenault 979902b3ff AMDGPU: fdiv -1, x -> rcp -x
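
The rewrite rests on the identity -1/x = 1/(-x). A quick Python check of that
identity with exact IEEE division is shown below; a hardware rcp instruction
may be approximate, so this models only the algebra, and the function names
are illustrative:

```python
def fdiv_neg_one(x):
    """The original form: -1.0 divided by x."""
    return -1.0 / x


def rcp_neg(x):
    """The rewritten form: reciprocal of the negated input."""
    return 1.0 / (-x)


# Nonzero samples only; Python raises on division by zero.
SAMPLES = [1.0, -3.0, 0.1, 7.5e300, 1e-300]
MATCHES = [fdiv_neg_one(x) == rcp_neg(x) for x in SAMPLES]
```

Negation only flips the sign bit, so both forms round to the same value.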
llvm-svn: 277535
2016-08-02 22:25:04 +00:00
Nicolai Haehnle 8a482b33fe AMDGPU: Stay in WQM for non-intrinsic stores
Summary:
Two types of stores are possible in pixel shaders: stores to memory that are
explicitly requested at the API level, and stores that are an implementation
detail of register spilling or lowering of arrays.

For the first kind of store, we must ensure that helper pixels have no effect
and hence WQM must be disabled. The second kind of store must always be
executed, because the written value may be loaded again in a way that is
relevant for helper pixels as well -- and there are no externally visible
effects anyway.

This is a candidate for the 3.9 release branch.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D22675

llvm-svn: 277504
2016-08-02 19:31:14 +00:00
Nicolai Haehnle bef0e90cf1 AMDGPU: Track physical registers in SIWholeQuadMode
Summary:
There are cases where uniform branch conditions are computed in VGPRs, and
we didn't correctly mark those as WQM.

The stray change in basic-branch.ll is because invoking the LiveIntervals
analysis leads to the detection of a dead register that would otherwise not
be seen at -O0.

This is a candidate for the 3.9 branch, as it fixes a possible hang.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22673

llvm-svn: 277500
2016-08-02 19:17:37 +00:00
Matt Arsenault 749035b7b1 AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior
This should really be true for any immediate, not just
inline ones.

llvm-svn: 277260
2016-07-30 01:40:36 +00:00
Changpeng Fang 26fb9d268b AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB.
Differential Revision: http://reviews.llvm.org/D22021

Reviewed by: arsenm

llvm-svn: 277073
2016-07-28 23:01:45 +00:00
Wei Ding 07e03712d3 AMDGPU : Add intrinsics for compare with the full wavefront result
Differential Revision: http://reviews.llvm.org/D22482

llvm-svn: 276998
2016-07-28 16:42:13 +00:00
Nicolai Haehnle 3b572002a2 AMDGPU: add execfix flag to SI_ELSE
Summary:
SI_ELSE is lowered into two parts:

s_or_saveexec_b64 dst, src (at the start of the basic block)

s_xor_b64 exec, exec, dst (at the end of the basic block)

The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction

s_and_b64 exec, exec, s[...]

which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:

s_or_saveexec_b64 dst, src

s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow

s_xor_b64 exec, exec, dst

Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.
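
The need for the extra AND can be checked with a small bit-mask model. The
following is only a sketch, not the pass itself: the function name
lower_si_else, its parameters, and the 8-lane masks are made up for
illustration, and the instruction semantics are assumed to be the usual ones
(s_or_saveexec_b64 saves exec into dst and then ORs src into exec).

```python
def lower_si_else(exec_mask, src, live_mask=None, exec_fix=False):
    """Model the exec mask after the lowered SI_ELSE sequence.

    exec_mask: lanes that ran the preceding IF block
    src:       lanes that should run the ELSE block
    live_mask: operand of the s_and_b64 added by SIWholeQuadMode, if any
    exec_fix:  whether the extra 's_and_b64 dst, dst, exec' is emitted
    """
    # s_or_saveexec_b64 dst, src
    dst = exec_mask
    exec_mask = exec_mask | src
    if live_mask is not None:
        # s_and_b64 exec, exec, s[...]   (WQM -> Exact switch)
        exec_mask &= live_mask
        if exec_fix:
            # s_and_b64 dst, dst, exec
            dst &= exec_mask
    # s_xor_b64 exec, exec, dst
    return exec_mask ^ dst


if_lanes, else_lanes, live = 0x0F, 0xF0, 0x3C
print(hex(lower_si_else(if_lanes, else_lanes)))                # 0xf0
print(hex(lower_si_else(if_lanes, else_lanes, live, False)))   # 0x33
print(hex(lower_si_else(if_lanes, else_lanes, live, True)))    # 0x30
```

Without the fix, IF lanes 0 and 1 (masked out by the switch to Exact mode) are
turned back on by the final XOR. With the fix, the result is exactly the ELSE
lanes that are also live.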

Finally: It also occurred to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit

s_or_saveexec_b64 dst, src

...

s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst

and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions

s_and_b64 vcc, exec, vcc

before s_cbranch_scc (in moveToALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.

In any case, this current patch could help in the short term with the whole
ExecModified business.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22846

llvm-svn: 276972
2016-07-28 11:39:24 +00:00
Matt Arsenault e3862cdc93 AMDGPU: Use rcp for fdiv 1, x with fpmath metadata
Using rcp should be OK for safe math usually, so this
should not be replacing the original fdiv.

llvm-svn: 276823
2016-07-26 23:25:44 +00:00
Matt Arsenault 918e81c3d4 AMDGPU: Add more tests for LDS size with occupancy
llvm-svn: 276821
2016-07-26 23:15:59 +00:00
Matthias Braun 333e468d15 MIRParser: Use dot instead of colon to mark subregisters
Change the syntax to use `%0.sub8` to denote a subregister.

This seems like a more natural fit to denote subregisters; I also plan
to introduce a new ":classname" syntax in upcoming patches to denote the
register class of a vreg.

Note that this commit disallows plain identifiers to start with a '.'
character.  This shouldn't affect anything as external names/IR
references are all prefixed with '$'/'%', plain identifiers are only
used for instruction names, register mask names and subreg indexes.
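
As a rough illustration of the syntax change, a text-level rewrite from the old
colon form to the new dot form might look like the sketch below. This is not
part of the patch: colon_to_dot and its regular expression are hypothetical,
and it is only meaningful on text written before the change, since a later
":classname" suffix would match the same pattern.

```python
import re

# Hypothetical pattern: a numbered virtual register followed by
# ':' and a subregister index name, e.g. %0:sub8.
_OLD_SUBREG = re.compile(r"(%\d+):([A-Za-z_]\w*)")


def colon_to_dot(line):
    """Rewrite '%N:subidx' to '%N.subidx' in one line of text."""
    return _OLD_SUBREG.sub(r"\1.\2", line)


print(colon_to_dot("%1 = COPY %0:sub8"))  # %1 = COPY %0.sub8
```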

Differential Revision: https://reviews.llvm.org/D22390

llvm-svn: 276815
2016-07-26 21:49:34 +00:00
Tim Northover 26e40bdb9b GlobalISel: omit braces on MachineInstr types when there's only one.
Tidies up the representation a bit in the common case.

llvm-svn: 276772
2016-07-26 17:28:01 +00:00
Matt Arsenault 07f65718bb AMDGPU: Add missing tests for xnack option for HSA
llvm-svn: 276765
2016-07-26 16:45:50 +00:00
Matt Arsenault 32fc527c65 AMDGPU: Add fp legacy instruction intrinsics
This could use some additional optimization work
to use mad/mac legacy.

llvm-svn: 276764
2016-07-26 16:45:45 +00:00
Jan Vesely b64c8925e9 AMDGPU: Remove read_workdim intrinsic
Differential revision: https://reviews.llvm.org/D22732

llvm-svn: 276682
2016-07-25 20:17:02 +00:00
Matt Arsenault e047b2598e AMDGPU: Fix missing verify-machineinstrs in control flow test
llvm-svn: 276679
2016-07-25 19:39:06 +00:00
Tom Stellard b8253c88b6 Revert "[AMDGPU] Emit read-only data to .rodata for hsa"
This reverts commit r276298.

Data stored in .rodata can have a negative offset from .text, but we
don't support negative values in relocations yet.

This caused a regression in one of the amp conformance tests:
5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01

llvm-svn: 276498
2016-07-22 23:46:40 +00:00
Tim Northover 98a56eb7f4 GlobalISel: allow multiple types on MachineInstrs.
llvm-svn: 276481
2016-07-22 22:13:36 +00:00
Anna Thomas 0be4a0e6a4 Invariant start/end intrinsics overloaded for address space
Summary:
The llvm.invariant.start and llvm.invariant.end intrinsics currently
support specifying invariant memory objects only in the default address
space.

With this change, these intrinsics are overloaded for any address space
for memory objects
and we can use these llvm invariant intrinsics in non-default address
spaces.

Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)

This overloaded intrinsic is needed for representing final or invariant
memory in managed languages.

Reviewers: apilipenko, reames

Subscribers: llvm-commits
llvm-svn: 276447
2016-07-22 17:49:40 +00:00
Matt Arsenault e2fe67b951 AMDGPU: Remove redundant test
llvm-svn: 276439
2016-07-22 17:01:36 +00:00
Matt Arsenault 3c07c813c0 AMDGPU: Fix groupstaticsize for large LDS
The size can exceed s_movk_i32's limit, and we don't
want to use it this early since it inhibits optimizations.
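
For context, s_movk_i32 takes a 16-bit immediate that is sign-extended, so only
values in [-32768, 32767] can be encoded. The sketch below shows what goes
wrong with a larger size; the helper names and the 40000-byte example are
assumptions for illustration, not taken from the patch.

```python
def fits_s_movk_i32(value):
    """True if value is representable as a sign-extended 16-bit immediate."""
    return -(1 << 15) <= value <= (1 << 15) - 1


def sign_extend_simm16(bits):
    """Interpret the low 16 bits of 'bits' as a signed 16-bit value."""
    bits &= 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits


lds_size = 40000  # hypothetical static LDS size in bytes
print(fits_s_movk_i32(lds_size))     # False
print(sign_extend_simm16(lds_size))  # -25536
```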

This should probably be merged to the release branch.

llvm-svn: 276438
2016-07-22 17:01:33 +00:00
Matt Arsenault 8d718dcfda AMDGPU: Add HSA dispatch id intrinsic
llvm-svn: 276437
2016-07-22 17:01:30 +00:00
Matt Arsenault 7fb961f3e6 AMDGPU: Fix i1 fp_to_int
R600's i1 fp_to_uint selected but was incorrect according to
what instcombine constant folds to.

llvm-svn: 276435
2016-07-22 17:01:21 +00:00
Anna Thomas c858faa244 Revert "Invariant start/end intrinsics overloaded for address space"
This reverts commit r276316.

llvm-svn: 276320
2016-07-21 19:06:28 +00:00
Anna Thomas 29b24dfe44 Invariant start/end intrinsics overloaded for address space
Summary:
The llvm.invariant.start and llvm.invariant.end intrinsics currently
support specifying invariant memory objects only in the default address space.

With this change, these intrinsics are overloaded for any address space for memory objects
and we can use these llvm invariant intrinsics in non-default address spaces.

Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)

This overloaded intrinsic is needed for representing final or invariant memory in managed languages.

Reviewers: tstellarAMD, reames, apilipenko

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22519

llvm-svn: 276316
2016-07-21 18:41:44 +00:00
Konstantin Zhuravlyov 3c0d8d22fe [AMDGPU] Emit read-only data to .rodata for hsa
Differential Revision: https://reviews.llvm.org/D22538

llvm-svn: 276298
2016-07-21 15:59:23 +00:00
Matt Arsenault f0ba86a4d5 AMDGPU: Fix phis from blocks split due to register indexing
llvm-svn: 276257
2016-07-21 09:40:57 +00:00
Tim Northover 62ae568bbb GlobalISel: implement low-level type with just size & vector lanes.
This should be all the low-level instruction selection needs to determine how
to implement an operation, with the remaining context taken from the opcode
(e.g. G_ADD vs G_FADD) or other flags not based on type (e.g. fast-math).

llvm-svn: 276158
2016-07-20 19:09:30 +00:00
Matt Arsenault f14db7a933 AMDGPU: Add missing test coverage for control flow breaks
None of the current lit tests hit si_break handling.

llvm-svn: 276129
2016-07-20 15:20:35 +00:00
Yaxun Liu 4b1d9f7f18 AMDGPU: Fix bug causing crash due to invalid opencl version metadata.
Differential Revision: https://reviews.llvm.org/D22526

llvm-svn: 276119
2016-07-20 14:38:06 +00:00
Matthias Braun 5b9722d6c7 Revert "RegScavenging: Add scavengeRegisterBackwards()"
Reverting this commit for now as it seems to be causing failures on
test-suite tests on the clang-ppc64le-linux-lnt bot.

This reverts commit r276044.

llvm-svn: 276068
2016-07-20 00:21:32 +00:00
Matt Arsenault a1fe17c9ad AMDGPU: Change fdiv lowering based on !fpmath metadata
If 2.5 ulp is acceptable, denormals are not required, and the
division isn't a reciprocal (which will already be handled), replace
it with a faster fdiv.
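
The three conditions can be read as a simple predicate. The sketch below is an
assumption about how they combine, not the code in the patch: the function
name, its parameters, and the use of None for "no !fpmath metadata" are all
hypothetical.

```python
def use_fast_fdiv(fpmath_ulp, denormals_required, numerator=None):
    """Decide whether the faster fdiv lowering would apply.

    fpmath_ulp:         accuracy from !fpmath metadata, or None if absent
    denormals_required: whether denormal results must be preserved
    numerator:          constant numerator if known, else None
    """
    if fpmath_ulp is None or fpmath_ulp < 2.5:
        return False  # 2.5 ulp of error is not acceptable
    if denormals_required:
        return False
    if numerator == 1.0:
        return False  # a reciprocal, assumed to be handled separately
    return True


print(use_fast_fdiv(2.5, False))       # True
print(use_fast_fdiv(2.5, False, 1.0))  # False
print(use_fast_fdiv(None, False))      # False
```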

Simplify the lowering tests by using per function
subtarget features.

llvm-svn: 276051
2016-07-19 23:16:53 +00:00
Matthias Braun 84fd4bee6c RegScavenging: Add scavengeRegisterBackwards()
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.

This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

llvm-svn: 276044
2016-07-19 22:37:09 +00:00
Matt Arsenault cb540bc03c AMDGPU: Expand register indexing pseudos in custom inserter
This is to help move SILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.
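
For readers unfamiliar with the VGPR -> m0 loop, the general idea can be
modelled as below. This is a simplified sketch under assumptions, not the
expansion emitted by the patch: the function name, the list-of-lists register
model, and the 4-lane example are invented, and the instruction names in the
comments only indicate which step each line stands for.

```python
def indexed_read_loop(vector_regs, lane_indices, exec_mask):
    """Read vector_regs[index][lane] where index differs per lane.

    Each iteration picks the index of the first remaining lane, handles
    every lane that shares it, then removes those lanes.
    Returns (per-lane results, number of loop iterations).
    """
    num_lanes = len(lane_indices)
    result = [None] * num_lanes
    remaining = exec_mask
    iterations = 0
    while remaining:
        # v_readfirstlane_b32: index held by the lowest active lane
        first_lane = (remaining & -remaining).bit_length() - 1
        m0 = lane_indices[first_lane]
        # compare + saveexec: lanes whose index equals m0
        same = 0
        for lane in range(num_lanes):
            if (remaining >> lane) & 1 and lane_indices[lane] == m0:
                same |= 1 << lane
        # v_movrels_b32: indexed read for the matching lanes
        for lane in range(num_lanes):
            if (same >> lane) & 1:
                result[lane] = vector_regs[m0][lane]
        remaining &= ~same
        iterations += 1
    return result, iterations


regs = [[10, 11, 12, 13], [20, 21, 22, 23]]
print(indexed_read_loop(regs, [1, 0, 1, 0], 0b1111))
# ([20, 11, 22, 13], 2)
```

The loop runs once per distinct index value among the active lanes, so a
uniform index costs a single iteration.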

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

llvm-svn: 275934
2016-07-19 00:35:03 +00:00
Matt Arsenault 50b76399ed AMDGPU: Fix test name and broken CHECK-LABEL
llvm-svn: 275928
2016-07-18 23:09:51 +00:00
Matt Arsenault c96e1deffa AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32
llvm-svn: 275871
2016-07-18 18:35:05 +00:00
Matt Arsenault 4c519d3518 AMDGPU/R600: Replace barrier intrinsics
llvm-svn: 275870
2016-07-18 18:34:59 +00:00
Matt Arsenault efb24540b1 AMDGPU: Remove dead check in AMDGPUPromoteAlloca
This is currently only called with GEP users. A direct
alloca would only happen with current typed pointers
for arrays which are a perverse case.

Also fix crashes on 0 x and 1 x arrays.

llvm-svn: 275869
2016-07-18 18:34:53 +00:00
Nicolai Haehnle bef1ceb815 AMDGPU: Disable AMDGPUPromoteAlloca pass for shader calling conventions.
Summary:
The work item intrinsics are not available for the shader
calling conventions. And even if we did hook them up most
shader stages have some extra restrictions on the amount
of available LDS.

Reviewers: tstellarAMD, arsenm

Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D20728

llvm-svn: 275779
2016-07-18 09:02:47 +00:00
Yaxun Liu a711cc7951 Re-commit [AMDGPU] Add metadata for runtime
Attempting to fix lit test failure on ppc.

llvm-svn: 275676
2016-07-16 05:09:21 +00:00
Matt Arsenault 73d2f8954a AMDGPU: Fix verifier error from partially undef copy
In this situation:

%VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
%VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
%VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
%VGPR4<def> = COPY %VGPR2

The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1,
but VGPR4 is defined immediately after this copy.

llvm-svn: 275635
2016-07-15 22:32:02 +00:00
Matt Arsenault a65e6b8335 AMDGPU: Remove brev intrinsic
llvm-svn: 275620
2016-07-15 21:27:13 +00:00
Matt Arsenault 82e5e1e564 AMDGPU: Fix TargetPrefix for remaining r600 intrinsics
llvm-svn: 275619
2016-07-15 21:27:08 +00:00
Matt Arsenault 11d3e21f2b AMDGPU: Remove AMDGPU.ldexp
llvm-svn: 275618
2016-07-15 21:26:56 +00:00
Matt Arsenault 09b2c4aee8 AMDGPU: Remove legacy rsq.clamped intrinsic
Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining.

Also fix mismatch with non-IEEE rsq selecting to IEEE rsq.

llvm-svn: 275617
2016-07-15 21:26:52 +00:00
Vitaly Buka 7f64844481 Revert "[AMDGPU] Add metadata for runtime"
This reverts commit r275566.

llvm-svn: 275599
2016-07-15 19:14:57 +00:00
Yaxun Liu b3d17690eb [AMDGPU] Add metadata for runtime
Added emitting metadata to elf for runtime.

Runtime requires certain information (metadata) about kernels to be able to execute and query them. Such information is emitted to an elf section as a key-value pair stream.

Differential Revision: https://reviews.llvm.org/D21849

llvm-svn: 275566
2016-07-15 14:58:21 +00:00
Matt Arsenault b91805ea2b AMDGPU: Fix not expanding control flow after some kill blocks
Also stop trying to insert skip blocks at end_cf. This
was inserting them at the end of the block which doesn't make
sense. The skip should be inserted at the beginning of the block
right after the end cf. Just remove this for now since no tests
seem to stress this and I think this can be handled more generally
later.

Fixes bug 28550

llvm-svn: 275510
2016-07-15 00:58:15 +00:00
Matt Arsenault fa5a86a403 AMDGPU: Fix trying to skip from a block with no successors
Found while reducing bug 28550

llvm-svn: 275509
2016-07-15 00:58:13 +00:00
Matt Arsenault 83ab049af2 AMDGPU: Fix splitting kill blocks with defs before kill
llvm-svn: 275508
2016-07-15 00:58:09 +00:00
Matt Arsenault ca7f5701f8 AMDGPU/R600: Delete/rename intrinsics no longer used by mesa
Use the replacement pass to update the tests, and delete old names.

llvm-svn: 275375
2016-07-14 05:47:17 +00:00
Matt Arsenault 897eee4187 AMDGPU: Remove unused intrinsics
llvm-svn: 275371
2016-07-14 05:23:19 +00:00
Matt Arsenault aa94c1e7ee AMDGPU: Fix test not actually testing anything
It wasn't actually running the pass, and since it is
missing the llvm prefix, the eh intrinsic was not
really an IntrinsicInst.

Also add missing test for lifetime markers.

llvm-svn: 275370
2016-07-14 05:23:15 +00:00
Quentin Colombet 545e558b82 [MIR] Print on the given output instead of stderr.
Currently the MIR framework prints all its outputs (errors and actual
representation) on stderr.

This patch fixes that by printing the regular output in the output
specified with -o.

Differential Revision: http://reviews.llvm.org/D22251

llvm-svn: 275314
2016-07-13 20:36:03 +00:00
Matt Arsenault f071102647 AMDGPU: Remove last AMDIL intrinsics
llvm-svn: 275309
2016-07-13 19:42:06 +00:00
Tom Stellard 418beb7671 AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL
Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21484

llvm-svn: 275268
2016-07-13 14:23:33 +00:00
Matt Arsenault 0056868c4a AMDGPU: Fold out no-op kill intrinsics
llvm-svn: 275253
2016-07-13 06:04:22 +00:00
Matt Arsenault 786724a22e AMDGPU: Follow up to r275203
I meant to squash this into it.

llvm-svn: 275220
2016-07-12 21:41:32 +00:00
Matt Arsenault 657f871a4e AMDGPU: Fix verifier error with kill intrinsic
Don't create a terminator in the middle of the block.
We should probably get rid of this intrinsic.

llvm-svn: 275203
2016-07-12 19:01:23 +00:00
Wei Ding 5b2636a152 AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8
Differential Revision: http://reviews.llvm.org/D22239

llvm-svn: 275197
2016-07-12 18:02:14 +00:00