Commit Graph

1028 Commits

Author SHA1 Message Date
Alina Sbirlea 6f937b1144 LoadStoreVectorizer: Remove TargetBaseAlign. Keep alignment for stack adjustments.
Summary:
TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses.
A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted.

Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures:
- x86 failing tests either did not have the right target, or the right alignment.
- NVPTX failing tests did not have the right alignment.
- AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info
  considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks
  for a maximum size allowed and relies on the next condition checking for %4 for correctness.
  This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list).

Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure),
so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context.

Reviewers: arsenm, jlebar, tstellarAMD

Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23068

llvm-svn: 277735
2016-08-04 16:38:44 +00:00
Nikolai Bozhenov f679530ba1 [X86] Heuristic to selectively build Newton-Raphson SQRT estimation
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.

Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.

Differential Revision: https://reviews.llvm.org/D21379

llvm-svn: 277725
2016-08-04 12:47:28 +00:00
Matt Arsenault 979902b3ff AMDGPU: fdiv -1, x -> rcp -x
llvm-svn: 277535
2016-08-02 22:25:04 +00:00
Nicolai Haehnle 8a482b33fe AMDGPU: Stay in WQM for non-intrinsic stores
Summary:
Two types of stores are possible in pixel shaders: stores to memory that are
explicitly requested at the API level, and stores that are an implementation
detail of register spilling or lowering of arrays.

For the first kind of store, we must ensure that helper pixels have no effect
and hence WQM must be disabled. The second kind of store must always be
executed, because the written value may be loaded again in a way that is
relevant for helper pixels as well -- and there are no externally visible
effects anyway.

This is a candidate for the 3.9 release branch.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D22675

llvm-svn: 277504
2016-08-02 19:31:14 +00:00
Nicolai Haehnle bef0e90cf1 AMDGPU: Track physical registers in SIWholeQuadMode
Summary:
There are cases where uniform branch conditions are computed in VGPRs, and
we didn't correctly mark those as WQM.

The stray change in basic-branch.ll is because invoking the LiveIntervals
analysis leads to the detection of a dead register that would otherwise not
be seen at -O0.

This is a candidate for the 3.9 branch, as it fixes a possible hang.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22673

llvm-svn: 277500
2016-08-02 19:17:37 +00:00
Valery Pykhtin 902db3101b [AMDGPU] refactor DS instruction definitions. NFC.
Differential revision: https://reviews.llvm.org/D22522

llvm-svn: 277344
2016-08-01 14:21:30 +00:00
Benjamin Kramer 22ff865a83 [AMDGPU] Fix lifetime of SmallVector temporaries.
Found by asan -fsanitize-address-use-after-scope.

llvm-svn: 277265
2016-07-30 11:31:16 +00:00
Matt Arsenault 749035b7b1 AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior
This should really be true for any immediate, not just
inline ones.

llvm-svn: 277260
2016-07-30 01:40:36 +00:00
Matt Arsenault d2141b6030 AMDGPU: Set s_setpc_b64 as a terminator
llvm-svn: 277259
2016-07-30 01:40:34 +00:00
Matt Arsenault dc744412ad AMDGPU: Remove unused pattern
llvm-svn: 277258
2016-07-30 01:40:30 +00:00
Sjoerd Meijer 0eb96ed0de TargetInstrInfo: add virtual function getInstSizeInBytes
This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of
subclasses already implement.

Differential Revision: https://reviews.llvm.org/D22885

llvm-svn: 277126
2016-07-29 08:16:16 +00:00
Changpeng Fang 26fb9d268b AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB.
Differential Revision: http://reviews.llvm.org/D22021

Reviewed by: arsenm

llvm-svn: 277073
2016-07-28 23:01:45 +00:00
Matthias Braun 941a705b7b MachineFunction: Return reference for getFrameInfo(); NFC
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.

llvm-svn: 277017
2016-07-28 18:40:00 +00:00
Wei Ding 07e03712d3 AMDGPU : Add intrinsics for compare with the full wavefront result
Differential Revision: http://reviews.llvm.org/D22482

llvm-svn: 276998
2016-07-28 16:42:13 +00:00
Tom Stellard 19f4301099 AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling
Summary:
We were using reserved VGPRs for SGPR spilling and this was causing
some programs with a workgroup size of 1024 to use more than 64
registers, which is illegal.

Reviewers: arsenm, mareko, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22032

llvm-svn: 276980
2016-07-28 14:30:43 +00:00
Nicolai Haehnle 3b572002a2 AMDGPU: add execfix flag to SI_ELSE
Summary:
SI_ELSE is lowered into two parts:

s_or_saveexec_b64 dst, src (at the start of the basic block)

s_xor_b64 exec, exec, dst (at the end of the basic block)

The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction

s_and_b64 exec, exec, s[...]

which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:

s_or_savexec_b64 dst, src

s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow

s_xor_b64 exec, exec, dst

Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.

Finally: It also occured to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit

s_or_saveexec_b64 dst, src

...

s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst

and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions

s_and_b64 vcc, exec, vcc

before s_cbranch_scc (in moveToALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.

In any case, this current patch could help in the short term with the whole
ExecModified business.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22846

llvm-svn: 276972
2016-07-28 11:39:24 +00:00
Matt Arsenault edc7dcb2aa AMDGPU: Turn dead checks into asserts
llvm-svn: 276946
2016-07-28 00:32:05 +00:00
Matt Arsenault fe26775992 AMDGPU: Remove analyzeImmediate
This no longer uses the more complicated classification
of constants.

llvm-svn: 276945
2016-07-28 00:32:02 +00:00
Reid Kleckner 46cb48c74a Remove MCAsmInfo.h include from TargetOptions.h
TargetOptions wants the ExceptionHandling enum. Move that to
MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in
clang. Now you can add a DWARF tag without a full rebuild of clang
semantic analysis.

llvm-svn: 276883
2016-07-27 16:03:57 +00:00
Ahmed Bougacha 6756a2c953 [GlobalISel] Introduce an instruction selector.
And implement it for AArch64, supporting x/w ADD/OR.

Differential Revision: https://reviews.llvm.org/D22373

llvm-svn: 276875
2016-07-27 14:31:55 +00:00
Matt Arsenault e3862cdc93 AMDGPU: Use rcp for fdiv 1, x with fpmath metadata
Using rcp should be OK for safe math usually, so this
should not be replacing the original fdiv.

llvm-svn: 276823
2016-07-26 23:25:44 +00:00
Matt Arsenault c6b69a9114 AMDGPU: Use implicit_def for selecting anyext
llvm-svn: 276819
2016-07-26 23:06:33 +00:00
Matt Arsenault 60a750fa69 AMDGPU/R600: Remove dead custom inserters
The intrinsics for these were removed, so this is dead.

llvm-svn: 276805
2016-07-26 21:03:38 +00:00
Matt Arsenault b06db8f666 AMDGPU: Minor AsmPrinter cleanups
llvm-svn: 276804
2016-07-26 21:03:36 +00:00
Matt Arsenault 52ef4019fd AMDGPU: Make AMDGPUMachineFunction fields private
ABIArgOffset is a problem because properly fsetting the
KernArgSize requires that the reserved area before the
real kernel arguments be correctly aligned, which requires
fixing clover.

llvm-svn: 276766
2016-07-26 16:45:58 +00:00
Matt Arsenault 32fc527c65 AMDGPU: Add fp legacy instruction intrinsics
This could use some additional optimization work
to use mad/mac legacy.

llvm-svn: 276764
2016-07-26 16:45:45 +00:00
Jan Vesely b64c8925e9 AMDGPU: Remove read_workdim intrinsic
Differential revision: https://reviews.llvm.org/D22732

llvm-svn: 276682
2016-07-25 20:17:02 +00:00
Matt Arsenault 2fa171c43a AMDGPU: Make skip threshold an option
llvm-svn: 276680
2016-07-25 19:48:29 +00:00
Matt Arsenault cdae95bef2 AMDGPU: Delete dead code
llvm-svn: 276675
2016-07-25 19:06:25 +00:00
Joel Jones 373d7d30dd MC] Provide an MCTargetOptions to implementors of MCAsmBackendCtorTy, NFC
Some targets, notably AArch64 for ILP32, have different relocation encodings
based upon the ABI. This is an enabling change, so a future patch can use the
ABIName from MCTargetOptions to chose which relocations to use. Tested using
check-llvm.

The corresponding change to clang is in: http://reviews.llvm.org/D16538

Patch by: Joel Jones

Differential Revision: https://reviews.llvm.org/D16213

llvm-svn: 276654
2016-07-25 17:18:28 +00:00
Matt Arsenault b40d8600ca AMDGPU: Delete dead code
This has been dead since r269479

llvm-svn: 276518
2016-07-23 07:07:14 +00:00
Tom Stellard b8253c88b6 Revert "[AMDGPU] Emit read-only data to .rodata for hsa"
This reverts commit r276298.

Data stored in .rodata can have a negative offset from .text, but we
don't support negative values in relocations yet.

This caused a regression in one of the amp conformance tests:
5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01

llvm-svn: 276498
2016-07-22 23:46:40 +00:00
Tim Northover 33b07d6725 GlobalISel: implement legalization pass, with just one transformation.
This adds the actual MachineLegalizeHelper to do the work and a trivial pass
wrapper that legalizes all instructions in a MachineFunction. Currently the
only transformation supported is splitting up a vector G_ADD into one acting on
smaller vectors.

llvm-svn: 276461
2016-07-22 20:03:43 +00:00
Matt Arsenault 3c07c813c0 AMDGPU: Fix groupstaticsize for large LDS
The size can exceed s_movk_i32's limit, and we don't
want to use it this early since it inhibits optimizations.

This should probably be merged to the release branch.

llvm-svn: 276438
2016-07-22 17:01:33 +00:00
Matt Arsenault 8d718dcfda AMDGPU: Add HSA dispatch id intrinsic
llvm-svn: 276437
2016-07-22 17:01:30 +00:00
Matt Arsenault f9245b75c0 AMDGPU: Delete more dead code
Remove dead code from r600 intrinsic removal.
Remove unset members, rename StackSize to be less ambiguous.

llvm-svn: 276436
2016-07-22 17:01:25 +00:00
Matt Arsenault 7fb961f3e6 AMDGPU: Fix i1 fp_to_int
R600's i1 fp_to_uint selected but was incorrect according to
what instcombine constant folds to.

llvm-svn: 276435
2016-07-22 17:01:21 +00:00
Matt Arsenault d40ded6681 AMDGPU: Don't reinvent transferSuccessorsAndUpdatePHIs
llvm-svn: 276434
2016-07-22 17:01:15 +00:00
Konstantin Zhuravlyov 3c0d8d22fe [AMDGPU] Emit read-only data to .rodata for hsa
Differential Revision: https://reviews.llvm.org/D22538

llvm-svn: 276298
2016-07-21 15:59:23 +00:00
Konstantin Zhuravlyov 155626238b AMDGPU/SI: Add support for R_AMDGPU_ABS32
Differential Revision: https://reviews.llvm.org/D21646

llvm-svn: 276294
2016-07-21 15:29:19 +00:00
Sam Kolton 6c4efcadfe [AMDGPU] Some code cleaning in SIRegisterInfo.td
Reviewers: tstellarAMD, vpykhtin

Subscribers: arsenm, kzhuravl

Differential Revision: https://reviews.llvm.org/D22620

llvm-svn: 276274
2016-07-21 13:29:57 +00:00
Matt Arsenault f0ba86a4d5 AMDGPU: Fix phis from blocks split due to register indexing
llvm-svn: 276257
2016-07-21 09:40:57 +00:00
Yaxun Liu 4b1d9f7f18 AMDGPU: Fix bug causing crash due to invalid opencl version metadata.
Differential Revision: https://reviews.llvm.org/D22526

llvm-svn: 276119
2016-07-20 14:38:06 +00:00
Matt Arsenault a1fe17c9ad AMDGPU: Change fdiv lowering based on !fpmath metadata
If 2.5 ulp is acceptable, denormals are not required, and
isn't a reciprocal which will already be handled, replace
with a faster fdiv.

Simplify the lowering tests by using per function
subtarget features.

llvm-svn: 276051
2016-07-19 23:16:53 +00:00
Davide Italiano 63e5968033 [AMDGPU] Remove spurious line (should've been removed in r276029).
llvm-svn: 276030
2016-07-19 21:16:30 +00:00
Davide Italiano 1576e38598 [AMDGPU] Remove dead code.
LGTM'd by Matt Arsenault.

llvm-svn: 276029
2016-07-19 21:10:49 +00:00
Matt Arsenault 03006fd3c4 AMDGPU: Only use legal inline immediates with kill pseudo
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.

These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.

llvm-svn: 275988
2016-07-19 16:27:56 +00:00
Matt Arsenault fe358066ea AMDGPU/SI: Fix SI scheduler refcount issue
Without this fix, releaseSuccessors when InOrOutBlock is
false could release SUs outside the schedule BasicBlock.

Patch by Axel Davy

llvm-svn: 275935
2016-07-19 00:35:22 +00:00
Matt Arsenault cb540bc03c AMDGPU: Expand register indexing pseudos in custom inserter
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

llvm-svn: 275934
2016-07-19 00:35:03 +00:00
Matt Arsenault 210b7cf3e2 AMDGPU: Remove pointless dyn_cast_or_null
This is already casted above so non-null

llvm-svn: 275881
2016-07-18 19:00:07 +00:00
Matt Arsenault b51dcb97bb AMDGPU: Fix missing switch case warning
llvm-svn: 275873
2016-07-18 18:40:51 +00:00
Matt Arsenault c96e1deffa AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32
llvm-svn: 275871
2016-07-18 18:35:05 +00:00
Matt Arsenault 4c519d3518 AMDGPU/R600: Replace barrier intrinsics
llvm-svn: 275870
2016-07-18 18:34:59 +00:00
Matt Arsenault efb24540b1 AMDGPU: Remove dead check in AMDGPUPromoteAlloca
This is currently only called with GEP users. A direct
alloca would only happen with current typed pointers
for arrays which are a perverse case.

Also fix crashes on 0 x and 1 x arrays.

llvm-svn: 275869
2016-07-18 18:34:53 +00:00
Matt Arsenault 2e08e181a7 AMDGPU: Remove dead code and redundant check
Non intrinsic calls aren't really handled, and this
IntrinsicInst dyn_cast checks for the function for us.

llvm-svn: 275868
2016-07-18 18:34:48 +00:00
Nicolai Haehnle bef1ceb815 AMDGPU: Disable AMDGPUPromoteAlloca pass for shader calling conventions.
Summary:
The work item intrinsics are not available for the shader
calling conventions. And even if we did hook them up most
shader stages haves some extra restrictions on the amount
of available LDS.

Reviewers: tstellarAMD, arsenm

Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D20728

llvm-svn: 275779
2016-07-18 09:02:47 +00:00
Yaxun Liu a711cc7951 Re-commit [AMDGPU] Add metadata for runtime
Attempting to fix lit test failure on ppc.

llvm-svn: 275676
2016-07-16 05:09:21 +00:00
Matt Arsenault 73d2f8954a AMDGPU: Fix verifier error from partially undef copy
In this situation:

%VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
%VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
%VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
%VGPR4<def> = COPY %VGPR2

The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1,
but VGPR4 is defined immediately after this copy.

llvm-svn: 275635
2016-07-15 22:32:02 +00:00
Matt Arsenault a65e6b8335 AMDGPU: Remove brev intrinsic
llvm-svn: 275620
2016-07-15 21:27:13 +00:00
Matt Arsenault 82e5e1e564 AMDGPU: Fix TargetPrefix for remaining r600 intrinsics
llvm-svn: 275619
2016-07-15 21:27:08 +00:00
Matt Arsenault 11d3e21f2b AMDGPU: Remove AMDGPU.ldexp
llvm-svn: 275618
2016-07-15 21:26:56 +00:00
Matt Arsenault 09b2c4aee8 AMDGPU: Remove legacy rsq.clamped intrinsic
Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining.

Also fix mismatch with non-IEEE rsq selecting to IEEE rsq.

llvm-svn: 275617
2016-07-15 21:26:52 +00:00
Matt Arsenault d7f4414543 AMDGPU/R600: Delete dead code.
Dead or the same as the base implementation.

llvm-svn: 275616
2016-07-15 21:26:46 +00:00
Vitaly Buka 7f64844481 Revert "[AMDGPU] Add metadata for runtime"
This reverts commit r275566.

llvm-svn: 275599
2016-07-15 19:14:57 +00:00
Justin Lebar 9c375817ac [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends.
Summary:
Instead, we take a single flags arg (a bitset).

Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.

This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted.  It also greatly simplifies the process of adding another flag
to getLoad.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D22249

llvm-svn: 275592
2016-07-15 18:27:10 +00:00
Yaxun Liu b3d17690eb [AMDGPU] Add metadata for runtime
Added emitting metadata to elf for runtime.

Runtime requires certain information (metadata) about kernels to be able to execute and query them. Such information is emitted to an elf section as a key-value pair stream.

Differential Revision: https://reviews.llvm.org/D21849

llvm-svn: 275566
2016-07-15 14:58:21 +00:00
Jacques Pienaar 71c30a14b7 Rename AnalyzeBranch* to analyzeBranch*.
Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect.

Reviewers: tstellarAMD, mcrosier

Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai

Differential Revision: https://reviews.llvm.org/D22409

llvm-svn: 275564
2016-07-15 14:41:04 +00:00
Matt Arsenault b91805ea2b AMDGPU: Fix not expanding control flow after some kill blocks
Also stop trying to insert skip blocks at end_cf. This
was inserting them at the end of the block which doesn't make
sense. The skip should be inserted at the beginning of the block
right after the end cf. Just remove this for now since no tests
seem to stress this and I think this can be handled more generally
later.

Fixes bug 28550

llvm-svn: 275510
2016-07-15 00:58:15 +00:00
Matt Arsenault fa5a86a403 AMDGPU: Fix trying to skip from a block with no successors
Found while reducing bug 28550

llvm-svn: 275509
2016-07-15 00:58:13 +00:00
Matt Arsenault 83ab049af2 AMDGPU: Fix splitting kill blocks with defs before kill
llvm-svn: 275508
2016-07-15 00:58:09 +00:00
Sam Kolton 7a2a323feb [AMDGPU] Assembler: fix row_bcast parsing
Summary: This change fix bug 28538

Reviewers: tstellarAMD, vpykhtin

Subscribers: arsenm, kzhuravl

Differential Revision: https://reviews.llvm.org/D22355

llvm-svn: 275422
2016-07-14 14:50:35 +00:00
Matt Arsenault ca7f5701f8 AMDGPU/R600: Delete/rename intrinsics no longer used by mesa
Use the replacement pass to update the tests, and delete old names.

llvm-svn: 275375
2016-07-14 05:47:17 +00:00
Matt Arsenault 648e422bd9 AMDGPU/R600: Remove intrinsics with no tests and no users
Mesa removed this path, so nothing is using these anymore.

llvm-svn: 275372
2016-07-14 05:23:23 +00:00
Matt Arsenault 897eee4187 AMDGPU: Remove unused intrinsics
llvm-svn: 275371
2016-07-14 05:23:19 +00:00
Matt Arsenault 0bf9984bc8 AMDGPU: Remove dead code
llvm-svn: 275369
2016-07-14 05:23:08 +00:00
Matt Arsenault f071102647 AMDGPU: Remove last AMDIL intrinsics
llvm-svn: 275309
2016-07-13 19:42:06 +00:00
Marek Olsak 0532c190f7 AMDGPU/SI: Emit the number of SGPR and VGPR spills
Summary:
v2: don't count SGPRs spilled to scratch twice

I think this is sufficient. It doesn't count private memory usage, which
happens often and uses scratch but isn't technically a spill. The private
memory usage can be computed by:
  [scratch_per_thread - vgpr_spills - a random multiple of SGPR spills].

The fact SGPR spills add very high numbers to the scratch size make that
computation a guessing game, but I don't have a solution to that.

Reviewers: tstellarAMD

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D22197

llvm-svn: 275288
2016-07-13 17:35:15 +00:00
Tom Stellard 418beb7671 AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL
Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21484

llvm-svn: 275268
2016-07-13 14:23:33 +00:00
Matt Arsenault 0056868c4a AMDGPU: Fold out no-op kill intrinsics
llvm-svn: 275253
2016-07-13 06:04:22 +00:00
Matt Arsenault 8dff86d878 AMDGPU: WQM cleanups
- Add new TTI instruction checks
- Don't use const for blocks that are mutated.
- Checking isBranch and isTerminator should be redundant

llvm-svn: 275252
2016-07-13 05:55:15 +00:00
Matt Arsenault 786724a22e AMDGPU: Follow up to r275203
I meant to squash this into it.

llvm-svn: 275220
2016-07-12 21:41:32 +00:00
Matt Arsenault 657f871a4e AMDGPU: Fix verifier error with kill intrinsic
Don't create a terminator in the middle of the block.
We should probably get rid of this intrinsic.

llvm-svn: 275203
2016-07-12 19:01:23 +00:00
Matt Arsenault 10531d1020 AMDGPU: Set isConvergent on v_cmpx* instructions
No test since these aren't used now, except for one place
in a pre-emit pass.

llvm-svn: 275200
2016-07-12 18:41:03 +00:00
Wei Ding 5b2636a152 AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8
Differential Revision: http://reviews.llvm.org/D22239

llvm-svn: 275197
2016-07-12 18:02:14 +00:00
Nicolai Haehnle 7968c34586 AMDGPU: Unify MOVRELSOffset and MOVRELDOffset
Summary:
Previously, constant index insertelements would be turned into SI_INDIRECT_DST,
which is bound to prevent some optimization opportunities. Worse, it mislead
the heuristic that decides whether immediates should be lowered to S_MOV_B32
or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22217

llvm-svn: 275160
2016-07-12 08:12:16 +00:00
Matt Arsenault fc7e6a0a0e AMDGPU: Cleanup pseudoinstructions
llvm-svn: 275133
2016-07-12 00:23:17 +00:00
Matt Arsenault 840593e19d AMDGPU: Fix missing scc def on control flow pseudos
These are all expanded to instructions that include an scc def.

llvm-svn: 275132
2016-07-12 00:08:14 +00:00
Matt Arsenault e3742466b9 AMDGPU: Enable trackLivenessAfterRegAlloc
This has caught a number of bugs.

llvm-svn: 275131
2016-07-11 23:56:30 +00:00
Nicolai Haehnle c06bfa1daa AMDGPU: Treat texture gather instructions more like other MIMG instructions
Summary:
Setting MIMG to 0 has a bunch of unexpected side effects, including that
isVMEM returns false which leads to incorrect treatment in the hazard
recognizer. The reason I noticed it is that it also leads to incorrect
treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug.

The only reason why MIMG was set to 0 is to signal the special handling of
dmasks, but that can be checked differently.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22210

llvm-svn: 275113
2016-07-11 21:59:43 +00:00
Nicolai Haehnle f52c3cf272 AMDGPU: fix local stack slot allocation bugs
Summary:
The main bug fix here is using the 32-bit encoding of V_ADD_I32 in
materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary
immediates work.

The second part is that we may now require the SegmentWaveByteOffset
even when there are initially no stack objects and VGPR spilling isn't
enabled, for stack slots that are allocated later. This means that some
bits become effectively dead and can be cleaned up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21551

llvm-svn: 275108
2016-07-11 21:44:40 +00:00
Nirav Dave 8603062ee4 Fix branch relaxation in 16-bit mode.
Thread through MCSubtargetInfo to relaxInstruction function allowing relaxation
to generate jumps with 16-bit sized immediates in 16-bit mode.

This fixes PR22097.

Reviewers: dwmw2, tstellarAMD, craig.topper, jyknight

Subscribers: jfb, arsenm, jyknight, llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D20830

llvm-svn: 275068
2016-07-11 14:23:53 +00:00
Artem Tamazov 53c9de08d2 [AMDGPU][llvm-mc] Quickfix for r272748 to enable labels in branch instructions.
Fixes issue mentioned at:
  https://github.com/RadeonOpenCompute/LLVM-AMDGPU-Assembler-Extra/issues/13.
Lit tests added.

Differential Revision: http://reviews.llvm.org/D22133

llvm-svn: 275054
2016-07-11 12:07:18 +00:00
Jan Vesely 2fa28c330c AMDGPU/R600: Add implicitarg.ptr intrinsic
Differential Revision: http://reviews.llvm.org/D21622

llvm-svn: 275024
2016-07-10 21:20:29 +00:00
Matt Arsenault 52a4d9b429 AMDGPU: Move R600 only pieces into R600 classes
llvm-svn: 274979
2016-07-09 18:11:15 +00:00
Matt Arsenault 48d70cb486 Revert "AMDGPU: Remove unused control flow intrinsic"
llvm-svn: 274978
2016-07-09 17:18:39 +00:00
NAKAMURA Takumi f35424b73f AMDGPU: Prune AMDGPUAsmParser in libdeps.
llvm-svn: 274970
2016-07-09 07:54:27 +00:00
Matt Arsenault dfec5ce032 AMDGPU: Fix fdiv lowering when f32 denormals supported
Also fix test not actually using function labels.

llvm-svn: 274969
2016-07-09 07:48:11 +00:00
Matt Arsenault 1322b6f8bb AMDGPU: Improve offset folding for register indexing
llvm-svn: 274954
2016-07-09 01:13:56 +00:00
Matt Arsenault 95c7897555 AMDGPU: Simplify isSchedulingBoundary
llvm-svn: 274953
2016-07-09 01:13:51 +00:00
Matt Arsenault 8f0a92f0ba AMDGPU: Remove unused control flow intrinsic
llvm-svn: 274939
2016-07-08 21:39:44 +00:00