Commit Graph

465 Commits

Author SHA1 Message Date
Tom Stellard 46937ca4e7 [AMDGPU] Assembler: Swap operands of flat_store instructions to match AMD assembler
Historically, AMD internal sp3 assembler has flat_store* addr, data
format. To match existing code and to enable reuse, change LLVM
definitions to match.  Also update MC and CodeGen tests.

Differential Revision: http://reviews.llvm.org/D16927

Patch by: Nikolay Haustov

llvm-svn: 260694
2016-02-12 17:57:54 +00:00
Changpeng Fang e07f1aa8fa AMDGPU/SI: Annotate Loops with Constant Condition in SIAnnotateControlFlow pass.
Summary:
  It is possible that the loop condition can be a boolean constant (infinite loop,
for example). So we sould handle constant condition in annotating a loop. This
patch adds this functionality to support annotating constant condition.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D15093

llvm-svn: 260692
2016-02-12 17:11:04 +00:00
Benjamin Kramer ac5e36f52e Fix uninitialized memory read.
Found by msan.

llvm-svn: 260676
2016-02-12 12:37:21 +00:00
Matt Arsenault 296b849163 AMDGPU: Set flat_scratch from flat_scratch_init reg
This was hardcoded to the static private size, but this
would be missing the offset and additional size for someday
when we have dynamic sizing.

Also stops always initializing flat_scratch even when unused.

In the future we should stop emitting this unless flat instructions
are used to access private memory. For example this will initialize
it almost always on VI because flat is used for global access.

llvm-svn: 260658
2016-02-12 06:31:30 +00:00
Matt Arsenault 24ee0785dd AMDGPU: Set element_size in private resource descriptor
Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.

llvm-svn: 260651
2016-02-12 02:40:47 +00:00
Matt Arsenault 16f7bcb661 AMDGPU: Fix mishandling alignment when scalarizing vector loads/stores
I don't think this was causing any real problems, so I'm not sure
how to test for this.

llvm-svn: 260646
2016-02-12 02:22:21 +00:00
Matt Arsenault 55d49cfe2d AMDGPU: Initialize SILowerControlFlow
llvm-svn: 260645
2016-02-12 02:16:10 +00:00
Matt Arsenault 806dd0a532 AMDGPU: Remove trailing whitespace
llvm-svn: 260644
2016-02-12 02:16:07 +00:00
Tom Stellard 1397d49ef5 AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRs
Summary:
It's possible to have resource descriptors and samplers stored in
VGPRs, either by a VMEM instruction or in the case of samplers,
floating-point calculations.  When this happens, we need to use
v_readfirstlane to copy these values back to sgprs.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17102

llvm-svn: 260599
2016-02-11 21:45:07 +00:00
Tom Stellard fa8f204c5b AMDGPU/SI: When splitting SMRD instructions, add its users to VALU worklist
Summary:
When we split SMRD instructions into two MUBUFs we were adding the users
of the newly created MUBUFs to the VALU worklist.  However, the only
users these instructions had was the REG_SEQUENCE that was inserted
by splitSMRD when the original SMRD instruction was split.

We need to make sure to add the users of the original SMRD to the VALU
worklist before it is split.

I have a test case, but it requires one other bug fix, so it will be
added in a later commt.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17101

llvm-svn: 260588
2016-02-11 21:14:34 +00:00
Tom Stellard e993451f5c [AMDGPU] Fix for "v_div_scale_f64 reg, vcc, ..." parsing
Summary:
Added support for "VOP3Only" attribute in VOP3bInst encoding.
Set VOP3Only=1 for V_DIV_SCALE_F64/32 insns.
Added support for multi-dest instructions in AMDGPUAs::cvt*().
Added lit test for "V_DIV_SCALE_F64|F32 vreg,vcc|sreg,vreg,vreg,vreg".

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, SamWot, nhaustov, vpykhtin

Differential Revision: http://reviews.llvm.org/D16995

Patch By: Artem Tamazov

llvm-svn: 260560
2016-02-11 18:25:26 +00:00
Matt Arsenault fcb345f172 AMDGPU: Fix constant bus use check with subregisters
If the two operands to an instruction were both
subregisters of the same super register, it would incorrectly
think this counted as the same constant bus use.

This fixes the verifier error in fmin_legacy.ll which
was missing -verify-machineinstrs.

llvm-svn: 260495
2016-02-11 06:15:39 +00:00
Matt Arsenault 427c548925 AMDGPU: Fix passes depending on dominator tree for no reason
llvm-svn: 260494
2016-02-11 06:15:34 +00:00
Matt Arsenault fe26def35c AMDGPU: Fix not handling new workitem intrinsics in DivergenceAnalysis
llvm-svn: 260491
2016-02-11 05:32:51 +00:00
Matt Arsenault 9524566314 AMDGPU: Split R600 and SI store lowering
These were only sharing some somewhat incorrect
logic for when to scalarize or split vectors.

llvm-svn: 260490
2016-02-11 05:32:46 +00:00
Tom Stellard a90b9526df [AMDGPU] Assembler: Fix VOP3 only instructions
Separate methods to convert parsed instructions to MCInst:

  - VOP3 only instructions (always create modifiers as operands in MCInst)
  - VOP2 instrunctions with modifiers (create modifiers as operands
    in MCInst when e64 encoding is forced or modifiers are parsed)
  - VOP2 instructions without modifiers (do not create modifiers
    as operands in MCInst)
  - Add VOP3Only flag. Pass HasMods flag to VOP3Common.
  - Simplify code that deals with modifiers (-1 is now same as
    0). This is no longer needed.
  - Add few tests (more will be added separately).
    Update error message now correct.

Patch By: Nikolay Haustov

Differential Revision: http://reviews.llvm.org/D16778

llvm-svn: 260483
2016-02-11 03:28:15 +00:00
Nicolai Haehnle d791bd07c7 AMDGPU: Release the scavenged offset register during VGPR spill
Summary:
This fixes a crash where subsequent spills would be unable to scavenge
a register. In particular, it fixes a crash in piglit's
spec@glsl-1.50@execution@geometry@max-input-components (the test still
has a shader that fails to compile because of too many SGPR spills, but
at least it doesn't crash any more).

This is a candidate for the release branch.

Reviewers: arsenm, tstellarAMD

Subscribers: qcolombet, arsenm

Differential Revision: http://reviews.llvm.org/D16558

llvm-svn: 260427
2016-02-10 20:13:58 +00:00
Matt Arsenault a14364115e AMDGPU: Fix indentation and variable names
llvm-svn: 260399
2016-02-10 18:21:45 +00:00
Matt Arsenault 6dfda9625d AMDGPU: Split R600 and SI load lowering
These weren't actually sharing anything in the common
LowerLOAD.

llvm-svn: 260398
2016-02-10 18:21:39 +00:00
Ahmed Bougacha f8dfb47c02 [CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI.
llvm-svn: 260316
2016-02-09 22:54:12 +00:00
Tom Stellard 309617645d AMDGPU/SI: Implement a work-around for smrd corrupting vccz bit
Summary:
We will hit this once we have enabled uniform branches.  The
smrd-vccz-bug.ll test will be added with the uniform branch commit.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16725

llvm-svn: 260137
2016-02-08 19:49:20 +00:00
Matt Arsenault 92edab2df9 AMDGPU: Remove bfi and bfm intrinsics
Nothing is using them.

llvm-svn: 260123
2016-02-08 19:06:01 +00:00
Matt Arsenault 7f83397d72 AMDGPU: Account for LDS alignment
The current situation isn't great, because the amount of padding
requires is determined by the inverse order of the first encountered
use. We should eventually somehow sort these to minimize wasted space.

Another problem is the alignment of kernel arguments isn't
respected. The group_segment_alignment is always emitted as
the default 16, and typed arguments with higher alignments
or an explicitly set alignment are also ignored.

llvm-svn: 259912
2016-02-05 19:47:29 +00:00
Matt Arsenault cf84e26fb6 AMDGPU: Preserve alignments on new created globals
Also switch to internal linkage, and include the name of the function in
the name.

llvm-svn: 259911
2016-02-05 19:47:23 +00:00
Tom Stellard 1242ce9695 AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfo
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16862

llvm-svn: 259900
2016-02-05 18:44:57 +00:00
Tom Stellard 5dde1d2eb3 AMDGPU: Fix ordering of CPU and FS parameters in TargetMachine constructors
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16863

llvm-svn: 259897
2016-02-05 18:29:17 +00:00
Tom Stellard 6e1967ef66 AMDGPU/SI: Correctly initialize SIInsertWaits pass
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16724

llvm-svn: 259894
2016-02-05 17:42:38 +00:00
Matt Arsenault de4208122b AMDGPU: Do not promote allocas with non-inbounds GEPs
If we can't assume the pointer value isn't within the bounds
of the object, it seems risky to try to replace the pointer
calculations.

llvm-svn: 259573
2016-02-02 21:16:12 +00:00
Matt Arsenault 7e747f1a38 AMDGPU: Handle promoting memmove
Also add missing tests for the others.

llvm-svn: 259558
2016-02-02 20:28:10 +00:00
Matt Arsenault 8b175672cb AMDGPU: Skip promote alloca with no optimizations
llvm-svn: 259551
2016-02-02 19:32:42 +00:00
Matt Arsenault fb8cdbae0c AMDGPU: Minor cleanups for AMDGPUPromoteAlloca
Mostly convert to use range loops.

llvm-svn: 259550
2016-02-02 19:32:35 +00:00
Matt Arsenault e5737f7cac AMDGPU: Report AMDGPUPromoteAlloca changed the function
llvm-svn: 259547
2016-02-02 19:18:57 +00:00
Matt Arsenault ad1348459f AMDGPU: Whitelist handled intrinsics
We shouldn't crash on unhandled intrinsics.
Also simplify failure handling in loop.

llvm-svn: 259546
2016-02-02 19:18:53 +00:00
Matt Arsenault 853a1fc6d9 AMDGPU: Use inbounds when calculating workitem offset
When promoting allocas to LDS, we know we are indexing
into a specific area just created, and the calculation
will also never overflow.

Also emit some of the muls as nsw nuw, because instcombine
infers this already from the range metadata. I think
putting this on the other adds and muls might be OK too,
but I'm not 100% sure.

llvm-svn: 259545
2016-02-02 19:18:48 +00:00
Oliver Stannard 7e7d983a87 Refactor backend diagnostics for unsupported features
Re-commit of r258951 after fixing layering violation.

The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.

There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.

llvm-svn: 259498
2016-02-02 13:52:43 +00:00
Matt Arsenault e013246462 AMDGPU: Fix emitting invalid workitem intrinsics for HSA
The AMDGPUPromoteAlloca pass was emitting the read.local.size
calls, which with HSA was incorrectly selected to reading from
the offset mesa uses off of the kernarg pointer.

Error on intrinsics which aren't supported by HSA, and start
emitting the correct IR to read the workgroup size
out of the dispatch pointer.

Also initialize the pass so it can be tested with opt, and
start moving towards not depending on the subtarget as an
argument.

Start emitting errors for the intrinsics not handled with HSA.

llvm-svn: 259297
2016-01-30 05:19:45 +00:00
Matt Arsenault d0799df707 AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr
Only the dispatch.ptr intrinsic is supposed to be used now to get
the workgroup size, and the read.local.size intrinsics do not
work correctly.

llvm-svn: 259296
2016-01-30 05:10:59 +00:00
Matt Arsenault 43976df0da AMDGPU: Add new amdgcn workitem intrinsics
These use the correct prefix and follow the HSA naming convention
rather than the config register option names.

llvm-svn: 259293
2016-01-30 04:25:19 +00:00
Matt Arsenault 295875efda AMDGPU: Remove 24-bit intrinsics
The known bit matching code seems to work reasonably well,
so these shouldn't really be needed.

llvm-svn: 259180
2016-01-29 10:05:16 +00:00
Matt Arsenault 5b39b34ca5 AMDGPU: Match fmed3 patterns with legacy fmin/fmax
llvm-svn: 259090
2016-01-28 20:53:48 +00:00
Matt Arsenault f639c32739 AMDGPU: Match some med3 patterns
llvm-svn: 259089
2016-01-28 20:53:42 +00:00
Matt Arsenault 7293f9895e AMDGPU: Set DX10Clamp bit
llvm-svn: 259088
2016-01-28 20:53:35 +00:00
Tom Stellard 3d2c852958 AMDGPU: waitcnt operand fixes
Summary:
Allow lgkmcnt up to 0xF (hardware allows that).
Fix mask for ExpCnt in AMDGPUInstPrinter.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16314

Patch by: Nikolay Haustov

llvm-svn: 259059
2016-01-28 17:13:44 +00:00
Tom Stellard 2ff726272a AMDGPU: Move subtarget specific code out of AMDGPUInstrInfo.cpp
Summary:
Also delete all the stub functions that are identical to the
implementations in TargetInstrInfo.cpp.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16609

llvm-svn: 259054
2016-01-28 16:04:37 +00:00
Oliver Stannard 02fa1c80c4 Revert r259035, it introduces a cyclic library dependency
llvm-svn: 259045
2016-01-28 13:19:47 +00:00
Oliver Stannard b4b092ea1b Add backend dignostic printer for unsupported features
Re-commit of r258951 after fixing layering violation.

The related LLVM patch adds a backend diagnostic type for reporting
unsupported features, this adds a printer for them to clang.

In the case where debug location information is not available, I've
changed the printer to report the location as the first line of the
function, rather than the closing brace, as the latter does not give the
user any information. This also affects optimisation remarks.

Differential Revision: http://reviews.llvm.org/D16590

llvm-svn: 259035
2016-01-28 10:07:27 +00:00
NAKAMURA Takumi 628a7a0aef Revert r258951 (and r258950), "Refactor backend diagnostics for unsupported features"
It broke layering violation in LLVMIR.

clang r258950 "Add backend dignostic printer for unsupported features"
llvm  r258951 "Refactor backend diagnostics for unsupported features"

llvm-svn: 259016
2016-01-28 04:41:32 +00:00
Oliver Stannard 1e67a9f196 Refactor backend diagnostics for unsupported features
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.

There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.

The implementation of DiagnosticInfoUnsupported::print must be in
lib/Codegen rather than in the existing file in lib/IR/ to avoid
introducing a dependency from IR to CodeGen.

Differential Revision: http://reviews.llvm.org/D16590

llvm-svn: 258951
2016-01-27 17:30:33 +00:00
Tom Stellard 6e3b14de62 AMDGPU/SI: Fix commuting of 32-bit VOPC instructions
Summary:
We didn't have entries in the commuting table for the 32-bit
instructions.  I don't think we hit this problem now, but we
will once uniform branching is enabled.  Tests will come in
a later commit.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16600

llvm-svn: 258936
2016-01-27 15:53:52 +00:00
Marek Olsak e86f252209 AMDGPU/SI: Stoney has only 16 LDS banks
Summary:
This is a candidate for stable, along with all patches that add the "stoney"
processor.

Reviewers: tstellarAMD

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16485

llvm-svn: 258922
2016-01-27 11:19:45 +00:00