Commit Graph

199 Commits

Author SHA1 Message Date
Tom Stellard b4c8e8e30b AMDGPU/SI: Promote i16 = fp_[us]int f32 for VI
Summary: This fixes a regression caused by r286464.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26570

llvm-svn: 286687
2016-11-12 00:19:11 +00:00
Tom Stellard 115a61560e AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

llvm-svn: 286464
2016-11-10 16:02:37 +00:00
Tom Stellard 2d2d33f1dc Revert "AMDGPU: Add VI i16 support"
This reverts commit r285939 and r285948.  These broke some conformance tests.

llvm-svn: 285995
2016-11-04 13:06:34 +00:00
Tom Stellard 2b3379cdff AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

llvm-svn: 285939
2016-11-03 17:13:50 +00:00
Tom Stellard f8e6eaff6e AMDGPU/SI: Don't emit multi-dword flat memory ops when they might access scratch
Summary:
A single flat memory operations that might access the scratch buffer
can only access MaxPrivateElementSize bytes.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25788

llvm-svn: 285198
2016-10-26 14:38:47 +00:00
Nicolai Haehnle a785209bc2 AMDGPU: Fix Two Address problems with v_movreld
Summary:
The v_movreld machine instruction is used with three operands that are
in a sense tied to each other (the explicit VGPR_32 def and the implicit
VGPR_NN def and use). There is no way to express that using the currently
available operand bits, and indeed there are cases where the Two Address
instructions pass does the wrong thing.

This patch introduces a new set of pseudo instructions that are identical
in intended semantics as v_movreld, but they only have two tied operands.

Having to add a new set of pseudo instructions is admittedly annoying, but
it's a fairly straightforward and solid approach. The only alternative I
see is to try to teach the Two Address instructions pass about Three Address
instructions, and I'm afraid that's trickier and is going to end up more
fragile.

Note that v_movrels does not suffer from this problem, and so this patch
does not touch it.

This fixes several GL45-CTS.shaders.indexing.* tests.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25633

llvm-svn: 284980
2016-10-24 14:56:02 +00:00
Konstantin Zhuravlyov fda33eaf0c [AMDGPU] Perform uchar to float combine for ISD::SINT_TO_FP
This will prevent following regression when enabling i16 support (D18049):
  test/CodeGen/AMDGPU/cvt_f32_ubyte.ll

Differential Revision: https://reviews.llvm.org/D25805

llvm-svn: 284891
2016-10-21 22:10:03 +00:00
Konstantin Zhuravlyov 08326b6256 [AMDGPU] Emit constant address space data in .rodata section and use relocations instead of fixups (amdhsa only)
Differential Revision: https://reviews.llvm.org/D25693

llvm-svn: 284759
2016-10-20 18:12:38 +00:00
Tom Stellard 083f1626f5 AMDGPU/SI: LowerParameter() should be computing align based on memory type
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25203

llvm-svn: 284398
2016-10-17 16:56:19 +00:00
Tom Stellard bc6c523cce AMDGPU/SI: Fix LowerParameter() for i16 arguments
Summary:
If we are loading an i16 value from a 32-bit memory location, then
we need to be able to truncate the loaded value to i16.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25198

llvm-svn: 284397
2016-10-17 16:21:45 +00:00
Tom Stellard 64a9d0876c AMDGPU/SI: Don't allow unaligned scratch access
Summary: The hardware doesn't support this.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25523

llvm-svn: 284257
2016-10-14 18:10:39 +00:00
Nicolai Haehnle bd15c3267f AMDGPU: Fix use-after-frees
Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25312

llvm-svn: 284215
2016-10-14 09:03:04 +00:00
Konstantin Zhuravlyov c96b5d7073 [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variables
Differential Revision: https://reviews.llvm.org/D25562

llvm-svn: 284196
2016-10-14 04:37:34 +00:00
Matt Arsenault 253640e18d AMDGPU: Assume spilling will occur at -O0
Because everything live is spilled at the end of a
block by fast regalloc, assume this will happen and
avoid the copies of the resource descriptor.

llvm-svn: 284119
2016-10-13 13:10:00 +00:00
Matt Arsenault dac31db12f AMDGPU: Fix truncate to bool warnings
llvm-svn: 284116
2016-10-13 12:45:16 +00:00
Matt Arsenault d486d3f8d1 AMDGPU: Initial implementation of VGPR indexing mode
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.

llvm-svn: 284031
2016-10-12 18:49:05 +00:00
Matt Arsenault dcf0cfca4c AMDGPU: Refactor indirect vector lowering
Allow inserting multiple instructions in the
expanded loop.

llvm-svn: 283177
2016-10-04 01:41:05 +00:00
Konstantin Zhuravlyov e14df4b236 [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125

llvm-svn: 282624
2016-09-28 20:05:39 +00:00
Matt Arsenault ac0fc849cf AMDGPU: Fix broken FrameIndex handling
We were trying to avoid using a FrameIndex operand in non-pointer
operands in a convoluted way, and would break because of
using TargetFrameIndex. The TargetFrameIndex should only be used
in the case where it makes sense to fold it as part of the addressing
mode, otherwise it requires materialization like a normal constant.
This wasn't working reliably and failed in the added testcase, hitting
the assert when processing the frame index.

The TargetFrameIndex was coming from trying to produce an AssertZext
limiting the maximum stack size. I'm not sure this was correct to begin
with, because it is apparently possible to have a single workitem
dispatch that requires all 4G of private memory.

llvm-svn: 281824
2016-09-17 16:09:55 +00:00
Matt Arsenault 6408c9135c AMDGPU: Allow some control flow intrinsics to be CSEd
These clean up some unnecessary or instructions in
cases with complex loops.

In the original testcase I noticed this, the same
or with exec was repeated 5 or 6 times in a row. With
this only one is emitted or sometimes a copy.

llvm-svn: 281786
2016-09-16 22:11:18 +00:00
Tom Stellard bbeb45aff6 AMDGPU: Refactor kernel argument lowering
Summary:
The main challenge in lowering kernel arguments for AMDGPU is determing the
memory type of the argument.  The generic calling convention code assumes
that only legal register types can be stored in memory, but this is not the
case for AMDGPU.

This consolidates all the logic AMDGPU uses for deducing memory types into a single
function.  This will make it much easier to support different ABIs in the future.

Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24614

llvm-svn: 281781
2016-09-16 21:53:00 +00:00
Tom Stellard 0b76fc4c77 AMDGPU/SI: Add support for triples with the mesa3d operating system
Summary:
mesa3d will use the same kernel calling convention as amdhsa, but it will
handle everything else like the default 'unknown' OS type.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22783

llvm-svn: 281779
2016-09-16 21:34:26 +00:00
Matt Arsenault fa5f767a38 AMDGPU: Improve splitting 64-bit bit ops by constants
This addresses a TODO to handle operations besides and. This
also starts eliminating no-op operations with a constant that
can emerge later.

llvm-svn: 281488
2016-09-14 15:19:03 +00:00
Justin Lebar adbf09e8cf [CodeGen] Split out the notions of MI invariance and MI dereferenceability.
Summary:
An IR load can be invariant, dereferenceable, neither, or both.  But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.

This patch splits up the notions of invariance and dereferenceability at
the MI level.  It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23371

llvm-svn: 281151
2016-09-11 01:38:58 +00:00
Matt Arsenault a609e2d5ce AMDGPU: Relax SGPR asm constraint register class
s should be SReg_32 to be as general as possible. This can avoid a copy
from m0.

llvm-svn: 280154
2016-08-30 20:50:08 +00:00
Matt Arsenault 22e417956d AMDGPU: Move cndmask pseudo to be isel pseudo
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.

llvm-svn: 279901
2016-08-27 01:00:37 +00:00
NAKAMURA Takumi 59a20649c6 Untabify.
llvm-svn: 279408
2016-08-22 00:58:04 +00:00
Matt Arsenault d42d58cf21 AMDGPU: Remove dead option
llvm-svn: 278965
2016-08-17 20:07:16 +00:00
Justin Bogner b03fd12cef Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

llvm-svn: 278902
2016-08-17 05:10:15 +00:00
Matt Arsenault 3cc1e0066d AMDGPU: Fix missing test for addressing mode with odd offsets
Add test if the constant offset looks unaligned.

llvm-svn: 278589
2016-08-13 01:43:51 +00:00
Alina Sbirlea 6f937b1144 LoadStoreVectorizer: Remove TargetBaseAlign. Keep alignment for stack adjustments.
Summary:
TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses.
A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted.

Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures:
- x86 failing tests either did not have the right target, or the right alignment.
- NVPTX failing tests did not have the right alignment.
- AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info
  considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks
  for a maximum size allowed and relies on the next condition checking for %4 for correctness.
  This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list).

Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure),
so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context.

Reviewers: arsenm, jlebar, tstellarAMD

Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23068

llvm-svn: 277735
2016-08-04 16:38:44 +00:00
Matt Arsenault 979902b3ff AMDGPU: fdiv -1, x -> rcp -x
llvm-svn: 277535
2016-08-02 22:25:04 +00:00
Matt Arsenault 749035b7b1 AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior
This should really be true for any immediate, not just
inline ones.

llvm-svn: 277260
2016-07-30 01:40:36 +00:00
Matthias Braun 941a705b7b MachineFunction: Return reference for getFrameInfo(); NFC
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.

llvm-svn: 277017
2016-07-28 18:40:00 +00:00
Wei Ding 07e03712d3 AMDGPU : Add intrinsics for compare with the full wavefront result
Differential Revision: http://reviews.llvm.org/D22482

llvm-svn: 276998
2016-07-28 16:42:13 +00:00
Matt Arsenault fe26775992 AMDGPU: Remove analyzeImmediate
This no longer uses the more complicated classification
of constants.

llvm-svn: 276945
2016-07-28 00:32:02 +00:00
Matt Arsenault 52ef4019fd AMDGPU: Make AMDGPUMachineFunction fields private
ABIArgOffset is a problem because properly fsetting the
KernArgSize requires that the reserved area before the
real kernel arguments be correctly aligned, which requires
fixing clover.

llvm-svn: 276766
2016-07-26 16:45:58 +00:00
Matt Arsenault 32fc527c65 AMDGPU: Add fp legacy instruction intrinsics
This could use some additional optimization work
to use mad/mac legacy.

llvm-svn: 276764
2016-07-26 16:45:45 +00:00
Jan Vesely b64c8925e9 AMDGPU: Remove read_workdim intrinsic
Differential revision: https://reviews.llvm.org/D22732

llvm-svn: 276682
2016-07-25 20:17:02 +00:00
Matt Arsenault 3c07c813c0 AMDGPU: Fix groupstaticsize for large LDS
The size can exceed s_movk_i32's limit, and we don't
want to use it this early since it inhibits optimizations.

This should probably be merged to the release branch.

llvm-svn: 276438
2016-07-22 17:01:33 +00:00
Matt Arsenault 8d718dcfda AMDGPU: Add HSA dispatch id intrinsic
llvm-svn: 276437
2016-07-22 17:01:30 +00:00
Matt Arsenault d40ded6681 AMDGPU: Don't reinvent transferSuccessorsAndUpdatePHIs
llvm-svn: 276434
2016-07-22 17:01:15 +00:00
Matt Arsenault f0ba86a4d5 AMDGPU: Fix phis from blocks split due to register indexing
llvm-svn: 276257
2016-07-21 09:40:57 +00:00
Matt Arsenault a1fe17c9ad AMDGPU: Change fdiv lowering based on !fpmath metadata
If 2.5 ulp is acceptable, denormals are not required, and
isn't a reciprocal which will already be handled, replace
with a faster fdiv.

Simplify the lowering tests by using per function
subtarget features.

llvm-svn: 276051
2016-07-19 23:16:53 +00:00
Matt Arsenault 03006fd3c4 AMDGPU: Only use legal inline immediates with kill pseudo
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.

These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.

llvm-svn: 275988
2016-07-19 16:27:56 +00:00
Matt Arsenault cb540bc03c AMDGPU: Expand register indexing pseudos in custom inserter
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

llvm-svn: 275934
2016-07-19 00:35:03 +00:00
Matt Arsenault c96e1deffa AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32
llvm-svn: 275871
2016-07-18 18:35:05 +00:00
Matt Arsenault 09b2c4aee8 AMDGPU: Remove legacy rsq.clamped intrinsic
Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining.

Also fix mismatch with non-IEEE rsq selecting to IEEE rsq.

llvm-svn: 275617
2016-07-15 21:26:52 +00:00
Justin Lebar 9c375817ac [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends.
Summary:
Instead, we take a single flags arg (a bitset).

Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.

This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted.  It also greatly simplifies the process of adding another flag
to getLoad.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D22249

llvm-svn: 275592
2016-07-15 18:27:10 +00:00
Matt Arsenault 83ab049af2 AMDGPU: Fix splitting kill blocks with defs before kill
llvm-svn: 275508
2016-07-15 00:58:09 +00:00