Commit Graph

2217 Commits

Author SHA1 Message Date
Matt Arsenault 8728c5f2db AMDGPU: Cleanup subtarget features
Try to avoid mutually exclusive features. Don't use
a real default GPU, and use a fake "generic". The goal
is to make it easier to see which set of features are
incompatible between feature strings.

Most of the test changes are due to random scheduling changes
from not having a default fullspeed model.

llvm-svn: 310258
2017-08-07 14:58:04 +00:00
Dmitry Preobrazhensky 50805a0b83 [AMDGPU][MC] Corrected VOP3 version of v_interp_* instructions for VI
See bug 32621: https://bugs.llvm.org//show_bug.cgi?id=32621

Reviewers: vpykhtin, SamWot, arsenm

Differential Revision: https://reviews.llvm.org/D35902

llvm-svn: 310251
2017-08-07 13:14:12 +00:00
Matt Arsenault a7eb14afc7 AMDGPU: Fix typo in feature description
llvm-svn: 310217
2017-08-06 18:13:23 +00:00
Quentin Colombet c046208c52 [GlobalISel] Remove the GISelAccessor API.
Its sole purpose was to avoid spreading around ifdefs related to
building global-isel. Since r309990, GlobalISel is not optional anymore,
thus, we can get rid of this mechanism all together.

NFC.

llvm-svn: 310115
2017-08-04 20:15:46 +00:00
Connor Abbott 66b9bd6e50 [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic
Summary:
This intrinsic lets us set inactive lanes to an identity value when
implementing wavefront reductions. In combination with Whole Wavefront
Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan.
Lowering the intrinsic needs to happen post-RA so that RA knows that the
destination isn't completely overwritten due to the EXEC shenanigans, so
we need another pseudo-instruction to represent the un-lowered
intrinsic.

Reviewers: tstellar, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34719

llvm-svn: 310088
2017-08-04 18:36:54 +00:00
Connor Abbott 92638ab625 [AMDGPU] Add support for Whole Wavefront Mode
Summary:
Whole Wavefront Wode (WWM) is similar to WQM, except that all of the
lanes are always enabled, regardless of control flow. This is required
for implementing wavefront reductions in non-uniform control flow, where
we need to use the inactive lanes to propagate intermediate results, so
they need to be enabled. We need to propagate WWM to uses (unless
they're explicitly marked as exact) so that they also propagate
intermediate results correctly. We do the analysis and exec mask munging
during the WQM pass, since there are interactions with WQM for things
that require both WQM and WWM. For simplicity, WWM is entirely
block-local -- blocks are never WWM on entry or exit of a block, and WWM
is not propagated to the block level.  This means that computations
involving WWM cannot involve control flow, but we only ever plan to use
WWM for a few limited purposes (none of which involve control flow)
anyways.

Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There
isn't yet a way to turn WWM off -- that will be added in a future
change.

Finally, it turns out that turning on inactive lanes causes a number of
problems with register allocation. While the best long-term solution
seems like teaching LLVM's register allocator about predication, for now
we need to add some hacks to prevent ourselves from getting into trouble
due to constraints that aren't currently expressed in LLVM. For the gory
details, see the comments at the top of SIFixWWMLiveness.cpp.

Reviewers: arsenm, nhaehnle, tpr

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D35524

llvm-svn: 310087
2017-08-04 18:36:52 +00:00
Connor Abbott de068fe8b4 [AMDGPU] refactor WQM pass in preparation for WWM (NFCI)
Summary:
Right now, the WQM pass conflates two different things when tracking the
Needs of an instruction:

1. Needs can be StateWQM, which is propagated to other instructions, and
means that this instruction (and everything it depends on) must be
calculated in WQM.
2. Needs can be StateExact, which is not propagated to other
instructions, and means that this instruction must not be calculated in
WQM and WQM-ness must not be propagated past this instruction.

This works now because there are only two different states, but in the
future we want to be able to express things like "calculate this in WQM,
but please disable WWM and don't propagate it" (to implement
@llvm.amdgcn.set.inactive). In order to do this, we need to split the
per-instruction Needs field in two: a new Needs field, which can only
contain StateWQM (and in the future, StateWWM) and is propagated to
sources, and a Disables field, which can also contain just StateWQM or
nothing for now.

We keep the per-block tracking the same for now, by translating
Needs/Disables to the old representation with only StateWQM or
StateExact. The other place that needs special handling is when we
emit the state transitions. We could just translate back to the old
representation there as well, which we almost do, but instead of 0 as a
placeholder value for "any state," we explicitly or together all the
states an instruction is allowed to be in. This lets us refactor the
code in preparation for WWM, where we'll need to be able to handle
things like "this instruction must be in Exact or WQM, but not WWM."

Reviewers: arsenm, nhaehnle, tpr

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D35523

llvm-svn: 310086
2017-08-04 18:36:50 +00:00
Connor Abbott 8c217d0a29 [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM
Summary:
Previously, we assumed that certain types of instructions needed WQM in
pixel shaders, particularly DS instructions and image sampling
instructions. This was ok because with OpenGL, the assumption was
correct. But we want to start using DPP instructions for derivatives as
well as other things, so the assumption that we can infer whether to use
WQM based on the instruction won't continue to hold. This intrinsic lets
frontends like Mesa indicate what things need WQM based on their
knowledge of the API, rather than second-guessing them in the backend.
We need to keep around the old method of enabling WQM, but eventually we
should remove it once Mesa catches up. For now, this will let us use DPP
instructions for computing derivatives correctly.

Reviewers: arsenm, tpr, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35167

llvm-svn: 310085
2017-08-04 18:36:49 +00:00
Dmitry Preobrazhensky 4b11a78a6e [AMDGPU][MC] Enabled expressions as operands
See bug 33579: https://bugs.llvm.org//show_bug.cgi?id=33579

Reviewers: vpykhtin, SamWot, arsenm

Differential Revision: https://reviews.llvm.org/D36091

llvm-svn: 310059
2017-08-04 13:55:24 +00:00
Florian Gross 2feb105882 [AMDGPU] Fixed MSVC build break
Error was:

field of type 'llvm::ArgDescriptor' has private default constructor
const AMDGPUFunctionArgInfo AMDGPUArgumentUsageInfo::ExternFunctionInfo{};
                                                                        ^

llvm-svn: 310048
2017-08-04 10:53:07 +00:00
Stanislav Mekhanoshin 6c7a8d0b5f [AMDGPU] Preserve inverted bit in SI_IF in presence of SI_KILL
In case if SI_KILL is in between of the SI_IF and SI_END_CF we need
to preserve the bits actually flipped by if rather then restoring
the original mask.

Differential Revision: https://reviews.llvm.org/D36299

llvm-svn: 310031
2017-08-04 06:58:42 +00:00
Connor Abbott 00755362b9 [AMDGPU] Add missing hazard for DPP-after-EXEC-write
Summary:
Following the docs, we need at least 5 wait states between an EXEC write
and an instruction that uses DPP.

Reviewers: tstellar, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34849

llvm-svn: 310013
2017-08-04 01:09:43 +00:00
Matt Arsenault 5c921a9291 AMDGPU: Remove pointless asserts
llvm-svn: 310007
2017-08-04 00:00:13 +00:00
Matt Arsenault a176cc5b93 AMDGPU: Don't use report_fatal_error for unsupported call types
llvm-svn: 310004
2017-08-03 23:32:41 +00:00
Matt Arsenault a202538bfa AMDGPU: Remove error on calls for amdgcn
Repurpose the -amdgpu-function-calls flag. Rather
than require it to emit a call, only use it to
run the always inline path or not.

llvm-svn: 310003
2017-08-03 23:24:05 +00:00
Matt Arsenault 817c253e60 AMDGPU: Fix implicitarg.ptr handling special inputs
llvm-svn: 310002
2017-08-03 23:12:44 +00:00
Matt Arsenault 8623e8d864 AMDGPU: Pass special input registers to functions
llvm-svn: 309998
2017-08-03 23:00:29 +00:00
Matt Arsenault 7016f13450 AMDGPU: Add analysis pass for function argument info
This will allow only adding necessary inputs to callee functions
that need special inputs forwarded from the kernel.

llvm-svn: 309996
2017-08-03 22:30:46 +00:00
Quentin Colombet 250e050a50 [GlobalISel] Make GlobalISel a non-optional library.
With this change, the GlobalISel library gets always built. In
particular, this is not possible to opt GlobalISel out of the build
using the LLVM_BUILD_GLOBAL_ISEL variable any more.

llvm-svn: 309990
2017-08-03 21:52:25 +00:00
Changpeng Fang ef4dbb46da AMDGPU/SI: Don't fix a PHI under uniform branch in SIFixSGPRCopies only when sources and destination are all sgprs
Summary:
  If a PHI has at lease one VGPR operand, we have to fix the PHI
in SIFixSGPRCopies.

Reviewer:
  Matt

Differential Revision:
  http://reviews.llvm.org/D34727

llvm-svn: 309959
2017-08-03 16:37:02 +00:00
Rafael Espindola 79e238afee Delete Default and JITDefault code models
IMHO it is an antipattern to have a enum value that is Default.

At any given piece of code it is not clear if we have to handle
Default or if has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.

This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.

llvm-svn: 309911
2017-08-03 02:16:21 +00:00
Tom Stellard 3337d74399 AMDGPU/GlobalISel: Mark 32-bit G_FMUL as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D36218

llvm-svn: 309898
2017-08-02 22:56:30 +00:00
Tom Stellard a2f57be260 AMDGPU/R600: Initialize more passes
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D36128

llvm-svn: 309893
2017-08-02 22:19:45 +00:00
Matt Arsenault 2738ede6b2 AMDGPU: Restore using MRI to find highest used regs
If there are no calls, this is a faster path than
searching the entire program for calls.

This was supposed to be left in r309781.
Fixes unused variable warning.

llvm-svn: 309832
2017-08-02 17:15:01 +00:00
Matt Arsenault 8e8f8f43b0 AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to it
llvm-svn: 309783
2017-08-02 01:52:45 +00:00
Matt Arsenault 1d6317c3ad AMDGPU: Fix emitting encoded calls
This was failing on out of bounds access to the extra operands
on the s_swappc_b64 beyond those in the instruction definition.

This was working, but somehow regressed within the past few weeks,
although I don't see any obvious commit.

llvm-svn: 309782
2017-08-02 01:42:04 +00:00
Matt Arsenault 6ed7b9bfc0 AMDGPU: Analyze callee resource usage in AsmPrinter
llvm-svn: 309781
2017-08-02 01:31:28 +00:00
Stanislav Mekhanoshin f23ae4fbe9 [AMDGPU] Fix asan error after last commit
Previous change "Turn s_and_saveexec_b64 into s_and_b64 if
result is unused" introduced asan use-after-poison error.
Instruction was analyzed after eraseFromParent() calls.

Move analysys higher than erase.

llvm-svn: 309779
2017-08-02 01:18:57 +00:00
Matt Arsenault d1867c0345 AMDGPU: Don't place arguments in emergency stack slot
When finding the fixed offsets for function arguments,
this needs to skip over the 4 bytes reserved for the
emergency stack slot.

llvm-svn: 309776
2017-08-02 00:59:51 +00:00
Stanislav Mekhanoshin da0edef1bd [AMDGPU] Turn s_and_saveexec_b64 into s_and_b64 if result is unused
With SI_END_CF elimination for some nested control flow we can now
eliminate saved exec register completely by turning a saveexec version
of instruction into just a logical instruction.

Differential Revision: https://reviews.llvm.org/D36007

llvm-svn: 309766
2017-08-01 23:44:35 +00:00
Stanislav Mekhanoshin 37e7f959c0 [AMDGPU] Collapse adjacent SI_END_CF
Add a pass to remove redundant S_OR_B64 instructions enabling lanes in
the exec. If two SI_END_CF (lowered as S_OR_B64) come together without any
vector instructions between them we can only keep outer SI_END_CF, given
that CFG is structured and exec bits of the outer end statement are always
not less than exec bit of the inner one.

This needs to be done before the RA to eliminate saved exec bits registers
but after register coalescer to have no vector registers copies in between
of different end cf statements.

Differential Revision: https://reviews.llvm.org/D35967

llvm-svn: 309762
2017-08-01 23:14:32 +00:00
Matt Arsenault 206f826348 AMDGPU: Fix handling of div_scale with undef inputs
The src0 register must match src1 or src2, but if these
were undefined they could end up using different implicit_defed
virtual registers. Force these to use one undef vreg or pick the
defined other register.

Also fixes producing invalid nodes without the right number of
inputs when src2 is undef.

llvm-svn: 309743
2017-08-01 20:49:41 +00:00
Matt Arsenault b62a4eb524 AMDGPU: Initial implementation of calls
Includes a hack to fix the type selected for
the GlobalAddress of the function, which will be
fixed by changing the default datalayout to use
generic pointers for 0.

llvm-svn: 309732
2017-08-01 19:54:18 +00:00
Davide Italiano ffb1098e92 [AMDGPU] Put a function used only inside assert() under NDEBUG.
llvm-svn: 309723
2017-08-01 19:07:20 +00:00
Tom Stellard 9d8337d857 AMDGPU/GlobalISel: Add support for amdgpu_vs calling convention
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35916

llvm-svn: 309675
2017-08-01 12:38:33 +00:00
Florian Hahn 6b3216aad8 Guard print() functions only used by dump() functions.
Summary:
Since  r293359, most dump() function are only defined when
`!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` holds. print() functions
only used by dump() functions are now unused in release builds,
generating lots of warnings. This patch only defines some print()
functions if they are used.

Reviewers: MatzeB

Reviewed By: MatzeB

Subscribers: arsenm, mzolotukhin, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D35949

llvm-svn: 309553
2017-07-31 10:07:49 +00:00
Tom Stellard 503fd446ad AMDGPU: Remove deadcode from AMDGPUInstPrinter
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D36034

llvm-svn: 309477
2017-07-29 03:56:53 +00:00
Tom Stellard 5c50cdf0e8 AMDGPU: Move INDIRECT_BASE_ADDR definition out of common files
Summary: This is only used by R600.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35926

llvm-svn: 309476
2017-07-29 03:44:07 +00:00
Matt Arsenault 9608a2891d AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flat
Checking the encoding is insufficient since now there can
be global or scratch instructions.

llvm-svn: 309472
2017-07-29 01:26:21 +00:00
Matt Arsenault dc8f5cc39c AMDGPU: Teach isLegalAddressingMode about global_* instructions
Also refine the flat check to respect flat-for-global feature,
and constant fallback should check global handling, not
specifically MUBUF.

llvm-svn: 309471
2017-07-29 01:12:31 +00:00
Matt Arsenault 4e309b0861 AMDGPU: Start selecting global instructions
llvm-svn: 309470
2017-07-29 01:03:53 +00:00
Matt Arsenault da9ab148f3 AMDGPU: Look through a bitcast user of an out argument
This allows handling of a lot more of the interesting
cases in Blender. Most of the large functions unlikely
to be inlined have this pattern.

This is a special case for what clang emits for OpenCL 3
element vectors. Annoyingly, these are emitted as
<3 x elt>* pointers, but accessed as <4 x elt>* operations.
This also needs to handle cases where a struct containing
a single vector is used.

llvm-svn: 309419
2017-07-28 19:06:16 +00:00
Matt Arsenault c06574ffc0 AMDGPU: Add pass to replace out arguments
It is better to return arguments directly in registers
if we are making a call rather than introducing expensive
stack usage. In one of sample compile from one of
Blender's many kernel variants, this fires on about
~20 different functions. Future improvements may be to
recognize simple cases where the pointer is indexing a small
array. This also fails when the store to the out argument
is in a separate block from the return, which happens in
a few of the Blender functions. This should also probably
be using MemorySSA which might help with that.

I'm not sure this is correct as a FunctionPass, but
MemoryDependenceAnalysis seems to not work with
a ModulePass.

I'm also not sure where it should run.I think it should
run  before DeadArgumentElimination, so maybe either
EP_CGSCCOptimizerLate or EP_ScalarOptimizerLate.

llvm-svn: 309416
2017-07-28 18:40:05 +00:00
Matt Arsenault 9166ce86e8 AMDGPU: Annotate implicitarg.ptr usage
We need to pass something to functions for this to work.
It isn't derivable just from the kernarg segment pointer
because the implicit arguments are placed after the
kernel arguments.

Also fixes missing test for the intrinsic.

llvm-svn: 309398
2017-07-28 15:52:08 +00:00
Stanislav Mekhanoshin 3197eb6981 [AMDGPU] Optimize SI_IF lowering for simple if regions
Currently SI_IF results in a s_and_saveexec_b64 followed by s_xor_b64.
The xor is used to extract only the changed bits. In case of a simple
if region where the only use of that value is in the SI_END_CF to
restore the old exec mask, we can omit the xor and perform an or of
the exec mask with the original exec value saved by the
s_and_saveexec_b64.

Differential Revision: https://reviews.llvm.org/D35861

llvm-svn: 309185
2017-07-26 21:29:15 +00:00
Wei Ding a126a13bb3 AMDGPU : Widen extending scalar loads to 32-bits.
Differential Revision: http://reviews.llvm.org/D35146

llvm-svn: 309178
2017-07-26 21:07:28 +00:00
Matt Arsenault 894e53d6ac AMDGPU: Fix using SMRD instructions for argument loads in functions
These are not actually uniform values except in kernels.

llvm-svn: 309172
2017-07-26 20:39:42 +00:00
Tom Stellard 55038cd1d3 AMDGPU/GlobalISel: Mark 32-bit G_OR as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D35127

llvm-svn: 309165
2017-07-26 20:00:53 +00:00
Zvi Rackover 1b73682243 TargetLowering: Change isShuffleMaskLegal's mask argument type to ArrayRef<int>. NFCI.
Changing mask argument type from const SmallVectorImpl<int>& to
ArrayRef<int>.

This came up in D35700 where a mask is received as an ArrayRef<int> and
we want to pass it to TargetLowering::isShuffleMaskLegal().
Also saves a few lines of code.

llvm-svn: 309085
2017-07-26 08:06:58 +00:00
Marek Olsak 6096f542d1 AMDGPU/SI: Fix Depth and Height computation for SI scheduler
Patch by: Axel Davy

Differential Revision: https://reviews.llvm.org/D34967

llvm-svn: 309028
2017-07-25 20:37:03 +00:00
Marek Olsak e6f74384b1 AMDGPU/SI: Force exports at the end for SI scheduler
Patch by: Axel Davy

Differential Revision: https://reviews.llvm.org/D34965

llvm-svn: 309027
2017-07-25 20:36:58 +00:00
Matt Arsenault 7052a6a505 AMDGPU: Fix allocating pseudo-registers
There's no need for these to be part of a class since
they are immediately replaced. New unreachable hit in
existing tests.'

llvm-svn: 308903
2017-07-24 18:06:15 +00:00
Matt Arsenault 416d755675 AMDGPU: Remove leftover td file
All of the instructions were moved out of this a while ago,
so it's just a useless comment now.

llvm-svn: 308815
2017-07-22 00:40:46 +00:00
Konstantin Zhuravlyov e9a5a77ee3 AMDGPU: Implement memory model
llvm-svn: 308781
2017-07-21 21:19:23 +00:00
Konstantin Zhuravlyov 070d88e335 AMDGPU: Introduce maybeAtomic instruction flag
Testing is in the follow up change

llvm-svn: 308779
2017-07-21 21:05:45 +00:00
Matt Arsenault f014d7cbde AMDGPU: Preserve undef flag in eliminateFrameIndex
Fixes verifier errors in some call tests.
Not sure why we haven't run into this before.

Test split into separate patch for once
call support is committed.

llvm-svn: 308774
2017-07-21 19:31:44 +00:00
Matt Arsenault 0ed39d329d AMDGPU: Partially fix improper reliance on memoperands
There are 2 more places doing this, but I'm not sure
what they are doing and don't make any sense to me

llvm-svn: 308770
2017-07-21 18:54:54 +00:00
Matt Arsenault 6ab9ea9614 AMDGPU: Don't track lgkmcnt for global_/scratch_ instructions
llvm-svn: 308766
2017-07-21 18:34:51 +00:00
Matt Arsenault 37a58e03c7 AMDGPU: Fix getMemOpBaseRegImmOfs for flat with offsets
llvm-svn: 308762
2017-07-21 18:06:36 +00:00
Matt Arsenault ca7b0a1777 AMDGPU: Add instruction definitions for some scratch_* instructions
Omit atomics for now since they probably aren't useful.

llvm-svn: 308747
2017-07-21 15:36:16 +00:00
Dmitry Preobrazhensky abf2839478 [AMDGPU][MC][GFX9] Added support of VOP3 'op_sel' modifier
See bug 33591: https://bugs.llvm.org//show_bug.cgi?id=33591

Reviewers: vpykhtin, artem.tamazov, SamWot, arsenm

Differential Revision: https://reviews.llvm.org/D35424

llvm-svn: 308740
2017-07-21 13:54:11 +00:00
Jonas Paulsson 024e319489 [SystemZ, LoopStrengthReduce]
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.

In order to achieve this, the following common code changes were made:

 * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
 LSR should do instruction-based addressing evaluations by calling
 isLegalAddressingMode() with the Instruction pointers.
 * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
 as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
 not just loads or stores.

SystemZ changes:

 * isLSRCostLess() implemented with Insns first, and without ImmCost.
 * New function supportedAddressingMode() that is a helper for TTI methods
 looking at Instructions passed via pointers.

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262
https://reviews.llvm.org/D35049

llvm-svn: 308729
2017-07-21 11:59:37 +00:00
Matt Arsenault e5456ce5e5 AMDGPU: Rename _RTN atomic instructions
Move the _RTN to the end of the name. It reads
better if the other addressing mode components
line up with the non-RTN version. It is also
more convenient to define saddr variants of
FLAT atomics to have the RTN last, and it is
good to have a consistent naming scheme.

llvm-svn: 308674
2017-07-20 21:06:04 +00:00
Matt Arsenault db78273b6e Add an ID field to StackObjects
On AMDGPU SGPR spills are really spilled to another register.
The spiller creates the spills to new frame index objects,
which is used as a placeholder.

This will eventually be replaced with a reference to a position
in a VGPR to write to and the frame index deleted. It is
most likely not a real stack location that can be shared
with another stack object.

This is a problem when StackSlotColoring decides it should
combine a frame index used for a normal VGPR spill with
a real stack location and a frame index used for an SGPR.

Add an ID field so that StackSlotColoring has a way
of knowing the different frame index types are
incompatible.

llvm-svn: 308673
2017-07-20 21:03:45 +00:00
Krzysztof Parzyszek f3a778d757 Implement LaneBitmask::getNumLanes and LaneBitmask::getHighestLane
This should eliminate most uses of countPopulation and Log2_32 on
the lane mask values.

llvm-svn: 308658
2017-07-20 19:43:19 +00:00
Krzysztof Parzyszek e9f0c1e031 Use LaneBitmask::getLane in a few more places
llvm-svn: 308655
2017-07-20 19:15:56 +00:00
Matt Arsenault c37fe66ec5 AMDGPU: Add encoding for carryless add/sub instructions
llvm-svn: 308639
2017-07-20 17:42:47 +00:00
Matt Arsenault f65c5ac9c9 AMDGPU: Add encodings for global atomics
llvm-svn: 308638
2017-07-20 17:31:56 +00:00
Matt Arsenault 04004716ff AMDGPU: Correct encoding for global instructions
The soffset field needs to be be set to 0x7f to disable it,
not 0. 0 is interpreted as an SGPR offset.

This should be enough to get basic usage of the global instructions
working. Technically it is possible to use an SGPR_32 offset,
but I'm not sure if it's correct with 64-bit pointers, but
that is not handled now. This should also be cleaned up
to be more similar to how different MUBUF modes are handled,
and to have InstrMappings between the different types.

llvm-svn: 308583
2017-07-20 05:17:54 +00:00
Matt Arsenault 254ad3de5c AMDGPU: Annotate necessity of flat-scratch-init
As an approximation of the existing handling to avoid
regressions. Fixes using too many registers with calls
on subtargets with the SGPR allocation bug.

llvm-svn: 308326
2017-07-18 16:44:58 +00:00
Matt Arsenault 1cc47f8413 AMDGPU: Figure out private memory regs after lowering
Introduce pseudo-registers for registers needed for stack
access, which are replaced during finalizeLowering.
Note these pseudo-registers are currently only used for the
used register location, and not for determining their
input argument register.

This is better because it avoids the need to try to predict
whether a call will be emitted from the IR, and also
detects stack objects introduced by legalization.

Test changes are from the HasStackObjects check being more
accurate since stack objects introduced during legalization
are now known.

llvm-svn: 308325
2017-07-18 16:44:56 +00:00
Nicolai Haehnle a253e4c028 AMDGPU: Fix crash when folding immediates into multiple uses
Summary:
When an immediate is folded by constant folding, we re-scan the entire
use list for two reasons:

1. The constant folding may have created a new use of the same reg.
2. The constant folding may have removed an additional use in the list
   we're currently traversing (e.g., constant folding an S_ADD_I32 c, c).

However, this could previously lead to a crash when an unrelated use was
added twice into the FoldList. Since we re-scan the whole list anyway, we
might as well just clear the FoldList again before we do so.

Using a MIR test to show this because real code seems to trigger the issue
only in connection with some really subtle control flow structures.

Fixes GL45-CTS.shading_language_420pack.binding_images on gfx9.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35416

llvm-svn: 308314
2017-07-18 14:54:41 +00:00
Sam Kolton 4685b70a77 [AMDGPU] resubmit r308179: CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions
llvm-svn: 308310
2017-07-18 14:23:26 +00:00
Dmitry Preobrazhensky 30fc523984 [AMDGPU][MC] Corrected disassembler for proper decoding of v_mqsad_u32_u8
See Bug 33639: https://bugs.llvm.org//show_bug.cgi?id=33639

Reviewers: vpykhtin, artem.tamazov

Differential Revision: https://reviews.llvm.org/D34892

llvm-svn: 308303
2017-07-18 13:12:48 +00:00
Dmitry Preobrazhensky 00deef8f00 [AMDGPU][MC] Optimized IsRegIntersect function
Optimized IsRegIntersect by using MCRegAliasIterator

See Bug 33800: https://bugs.llvm.org//show_bug.cgi?id=33800

Reviewers: arsenm, artem.tamazov

Differential Revision: https://reviews.llvm.org/D35452

llvm-svn: 308294
2017-07-18 11:14:02 +00:00
Dmitry Preobrazhensky 095ec3da81 [AMDGPU][MC] Added missing VOP3P opcodes
Added support of the following opcodes:
  v_pk_sub_u16
  v_pk_mad_i16
  v_pk_mad_u16

See Bug 33593: https://bugs.llvm.org//show_bug.cgi?id=33593

Reviewers: vpykhtin, artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D34890

llvm-svn: 308281
2017-07-18 09:24:10 +00:00
Chandler Carruth 9a7442d088 Revert r308179 which causes tablegen to spam stderr on every build.
Original commit log:
[AMDGPU] CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions

llvm-svn: 308270
2017-07-18 07:40:47 +00:00
Matt Arsenault e15855d9e3 AMDGPU: Annotate features from x work item/group IDs.
This wasn't necessary before since they are always enabled
for kernels, but this is necessary if they need to be
forwarded to a callable function.

llvm-svn: 308226
2017-07-17 22:35:50 +00:00
Sam Kolton a2b9e2f755 [AMDGPU] CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions
Summary:
Previously, CodeGen checked first src operand type to determine if omod is supported by instruction. This isn't correct for some instructions: e.g. V_CMP_EQ_F32 has floating-point src operands but desn't support omod.
Changed .td files to check if dst operand instead of src operand.

Reviewers: arsenm, vpykhtin

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D35350

llvm-svn: 308179
2017-07-17 14:23:38 +00:00
Konstantin Zhuravlyov 2ec725c9d8 AMDGPU: Fix amdgpu-flat-work-group-size/amdgpu-waves-per-eu check
Differential Revision: https://reviews.llvm.org/D35433

llvm-svn: 308147
2017-07-16 19:38:47 +00:00
Konstantin Zhuravlyov 163af2ed7a AMDGPU: Remove duplicate print outs from .AMDGPU.csdata
Differential Revision: https://reviews.llvm.org/D35428

llvm-svn: 308145
2017-07-16 19:24:08 +00:00
Hiroshi Inoue 7f46baff2c fix typos in comments; NFC
llvm-svn: 308127
2017-07-16 08:11:56 +00:00
Matt Arsenault b34635550a AMDGPU: Return correct type during argument lowering
The type needs to be casted back to the original argument type.
Fixes an assert that for some reason is only run when
using -debug.

Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.

t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t3: i16 = truncate t2
t5: i16 = AssertZext t3, ValueType:ch:i8
t6: i8 = truncate t5
t7: i32 = zero_extend t6
llvm-svn: 308082
2017-07-15 05:52:59 +00:00
Davide Italiano 6fdfede10d [AMDGPU] Throw away more dead code. NFCI.
llvm-svn: 308055
2017-07-14 21:20:29 +00:00
Davide Italiano 502ac724ac [AMDGPU] Garbage collect dead code. NFCI.
Unbreaks the build with GCC7.

llvm-svn: 308047
2017-07-14 18:47:29 +00:00
Alfred Huang 5b27072f57 [AMDGPU] Do not insert an instruction into worklist twice in movetovalu
In moveToVALU(), move to vector ALU is performed, all instrs in
the use chain will be visited. We do not want the same node to be
pushed to the visit worklist more than once.

Differential Revision: https://reviews.llvm.org/D34726

llvm-svn: 308039
2017-07-14 17:56:55 +00:00
Matt Arsenault 23e4df6a59 AMDGPU: Detect kernarg segment pointer
This is necessary to pass the kernarg segment pointer
to callee functions. Also don't unconditionally enable
for kernels.

llvm-svn: 307978
2017-07-14 00:11:13 +00:00
Stanislav Mekhanoshin dc2890a887 [AMDGPU] fcaninicalize optimization for GFX9+
Since GFX9 supports denorm modes for v_min_f32/v_max_f32 that
is possible to further optimize fcanonicalize and remove it
if applied to min/max given their operands are known not to be
an sNaN or that sNaNs are not supported.

Additionally we can remove fcanonicalize if denorms are supported
for the VT and we know that its argument is never a NaN.

Differential Revision: https://reviews.llvm.org/D35335

llvm-svn: 307976
2017-07-13 23:59:15 +00:00
Matt Arsenault 6b93046f29 AMDGPU: Annotate call graph with used features
Previously this wouldn't detect used features indirectly
used in callee functions.

llvm-svn: 307967
2017-07-13 21:43:42 +00:00
Hiroshi Inoue e9dea6e613 fix typos in comments and error messges; NFC
llvm-svn: 307885
2017-07-13 06:48:39 +00:00
Matt Arsenault ce34ac588e AMDGPU: Fix converting unanalyzable global loads to SMRD
Not all memory dependence queries succeed, so this needs to
be conservative if it fails.

llvm-svn: 307861
2017-07-12 23:06:18 +00:00
Stanislav Mekhanoshin 5680b0ca9f [AMDGPU] fcanonicalize elimination optimization
We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
That is possible to omit this multiplication if source of the
fcanonicalize instruction is known to be flushed/quieted, i.e.
if it comes from another instruction known to do the normalization
and we are using IEEE mode to quiet sNaNs.

Differential Revision: https://reviews.llvm.org/D35218

llvm-svn: 307848
2017-07-12 21:20:28 +00:00
Rafael Espindola 1beb702ba2 Fully fix the movw/movt addend.
The issue is not if the value is pcrel. It is whether we have a
relocation or not.

If we have a relocation, the static linker will select the upper
bits. If we don't have a relocation, we have to do it.

llvm-svn: 307730
2017-07-11 23:18:25 +00:00
Evandro Menezes 0cd23f5642 [CodeGen] Rename DEBUG_TYPE to match passnames
Rename missing DEBUG_TYPE "machine-scheduler" from backend files, which were
absent from https://reviews.llvm.org/rL303921.

Differential revision: https://reviews.llvm.org/D35231

llvm-svn: 307719
2017-07-11 22:08:28 +00:00
Konstantin Zhuravlyov 94b3b47c73 Revert "AMDGPU: Do not test for SI in getIsaVersion"
This reverts commit r307573.

This breaks downstream test.

llvm-svn: 307678
2017-07-11 17:57:41 +00:00
Nirav Dave 4dcad5dc6b Add DAG argument to canMergeStoresTo NFC.
llvm-svn: 307583
2017-07-10 20:25:54 +00:00
Matt Arsenault 9cff06f37b AMDGPU: Allow SIShrinkInstructions to fold FrameIndexes
llvm-svn: 307576
2017-07-10 20:04:35 +00:00
Matt Arsenault 6c29c5acfe AMDGPU: Allow SIShrinkInstructions to work in non-SSA
Immediates can be folded as long as the immediate is a vreg.

Also undo commuting instructions if it didn't fold an immediate.

llvm-svn: 307575
2017-07-10 19:53:57 +00:00
Matt Arsenault fda5318204 AMDGPU: Remove unnecessary check for constant operands
An instruction that has an immediate operand can't reach
this point. This is only called for a freshly shrunk instruction,
which prevously couldn't have had a literal constant operand.
This was also not conservative enough since it woudl also have
had to filter other constant-like inputs like frame indexes.

llvm-svn: 307574
2017-07-10 19:33:38 +00:00
Konstantin Zhuravlyov a46241909a AMDGPU: Do not test for SI in getIsaVersion
SI is being tested by isa version in the first two if statements of the function.

llvm-svn: 307573
2017-07-10 19:24:05 +00:00
Simon Pilgrim d362d27c27 [AMDGPU] Fix -Wimplicit-fallthrough warning. NFCI.
llvm-svn: 307485
2017-07-08 19:50:03 +00:00
Simon Pilgrim cb07d67a5c Fix some more -Wimplicit-fallthrough warnings. NFCI.
llvm-svn: 307411
2017-07-07 16:40:06 +00:00
Sam Kolton 10ac2fd2eb [AMDGPU] Assembler: refactor convert methods (VOP3 and MIMG)
Summary: Simplified converter methods for VOP3 and MIMG.

Reviewers: dp, artem.tamazov

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, vpykhtin, t-tye

Differential Revision: https://reviews.llvm.org/D35047

llvm-svn: 307407
2017-07-07 15:21:52 +00:00
Dmitry Preobrazhensky b2d24e23ce [AMDGPU][mc][gfx9] Added support of op_sel/op_sel_hi for V_MAD_MIX*
See https://bugs.llvm.org//show_bug.cgi?id=33595

Reviewers: vpykhtin, artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D35021

llvm-svn: 307402
2017-07-07 14:29:06 +00:00
Simon Pilgrim 0f5b35059d [AMDGPU] Fix -Wimplicit-fallthrough warnings. NFCI.
llvm-svn: 307381
2017-07-07 10:18:57 +00:00
Sean Fertile 9cd1cdf814 Extend memcpy expansion in Transform/Utils to handle wider operand types.
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and matches the existing behaviour
if they aren't overridden by the target.

Differential revision: https://reviews.llvm.org/D32536

llvm-svn: 307346
2017-07-07 02:00:06 +00:00
Matt Arsenault 9aa45f047f AMDGPU: Add macro fusion schedule DAG mutation
Try to increase opportunities to shrink vcc uses.

llvm-svn: 307313
2017-07-06 20:57:05 +00:00
Matt Arsenault a81198d82d AMDGPU: Minor cleanup of shrinking logic
llvm-svn: 307312
2017-07-06 20:56:59 +00:00
Stanislav Mekhanoshin 9d7b1c9ddb [AMDGPU] Always use rcp + mul with fast math
Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, thus believed to be fast.

An fdiv instruction can also have fast math flag either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both metadata and fast flag:

%div = fdiv fast float %v, %0, !fpmath !12
!12 = !{float 2.500000e+00}

Current implementation ignores fast flag and favors metadata. An
instruction with just fast flag would be lowered to a fastest rcp +
mul, but that never happen on practice because of described mutual
clang and BE behavior.

This change allows an "fdiv fast" to be always lowered as rcp + mul.

Differential Revision: https://reviews.llvm.org/D34844

llvm-svn: 307308
2017-07-06 20:34:21 +00:00
Craig Topper 79ab643da8 [Constants] If we already have a ConstantInt*, prefer to use isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI
Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.

llvm-svn: 307292
2017-07-06 18:39:47 +00:00
Quentin Colombet f3f7d4d64b [AMDGPU] Move GISel accessor initialization from TargetMachine to Subtarget.
NFC

llvm-svn: 307186
2017-07-05 18:40:56 +00:00
Alexander Timofeev 982aee6a38 [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

llvm-svn: 307097
2017-07-04 17:32:00 +00:00
Marek Olsak b83f5c99ba [AMDGPU] Fix latency of MIMG instructions
Patch by cwabbott (Connor Abbott).

llvm-svn: 307081
2017-07-04 14:43:38 +00:00
NAKAMURA Takumi e4a741376b Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"
It broke a testcase.

  Failing Tests (1):
      LLVM :: CodeGen/AMDGPU/alignbit-pat.ll

llvm-svn: 307054
2017-07-04 02:14:18 +00:00
Alexander Timofeev ea7f08bee5 [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

llvm-svn: 307026
2017-07-03 14:54:11 +00:00
Matt Arsenault 3f031e75aa AMDGPU: Add operand target flags serialization
llvm-svn: 306995
2017-07-02 23:21:48 +00:00
Hiroshi Inoue bb703e8960 fix trivial typos; NFC
suport -> support

llvm-svn: 306968
2017-07-02 03:24:54 +00:00
Matt Arsenault 7c525903ef AMDGPU: Remove SITypeRewriter
This was an old workaround for using v16i8 in some old intrinsics
for resource descriptors.

llvm-svn: 306603
2017-06-28 21:38:50 +00:00
Geoff Berry 66d9bdbca8 [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper

Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D34531

llvm-svn: 306554
2017-06-28 15:53:17 +00:00
Stanislav Mekhanoshin d445455643 [AMDGPU] Add pattern for v_alignbit_b32 with immediate
If immediate in shift is less than 32 we can use alignbit too.

Differential Revision: https://reviews.llvm.org/D34729

llvm-svn: 306500
2017-06-28 02:52:39 +00:00
Stanislav Mekhanoshin e8bf6c9629 [AMDGPU] Add 2 new alignbit patterns
Differential Revision: https://reviews.llvm.org/D34655

llvm-svn: 306449
2017-06-27 19:10:47 +00:00
Stanislav Mekhanoshin c9bd53ab59 [AMDGPU] Simplify setcc (sext from i1 b), -1|0, cc
Depending on the compare code that can be either an argument of
sext or negate of it. This helps to avoid v_cndmask_b64 instruction
for sext. A reversed value can be further simplified and folded into
its parent comparison if possible.

Differential Revision: https://reviews.llvm.org/D34545

llvm-svn: 306446
2017-06-27 18:53:03 +00:00
Stanislav Mekhanoshin 6851ddf942 [AMDGPU] Combine and x, (sext cc from i1) => select cc, x, 0
Also factored out function to check if a boolean is an already
deserialized value which does not require v_cndmask_b32 to be
loaded. Added binary logical operators to its check.

Differential Revision: https://reviews.llvm.org/D34500

llvm-svn: 306439
2017-06-27 18:25:26 +00:00
Sam Kolton a179d25b99 [AMDGPU] SDWA: several fixes for V_CVT and VOPC instructions
Summary:
1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it.
2. There were several problems with support of VOPC instructions in SDWA peephole pass.

Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl

Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye

Differential Revision: https://reviews.llvm.org/D34626

llvm-svn: 306413
2017-06-27 15:02:23 +00:00
Hiroshi Inoue 6a391bbf40 fix trivial typos, NFC
succesor -> successor

llvm-svn: 306393
2017-06-27 10:35:37 +00:00
Nicolai Haehnle 43cc6c4e0f AMDGPU: M0 operands to spill/restore opcodes are dead
Summary:
With scalar stores, M0 is clobbered and therefore marked as implicitly
defined. However, it is also dead.

This fixes an assertion when the Greedy Register Allocator decides to
optimize a spill/restore pair away again (via tryHintsRecoloring).

Reviewers: arsenm

Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33319

llvm-svn: 306375
2017-06-27 08:04:13 +00:00
Matt Arsenault f28683cf51 AMDGPU: Setup SP/FP in callee function prolog/epilog
llvm-svn: 306312
2017-06-26 17:53:59 +00:00
Tom Stellard eb8f1e27d9 AMDGPU/GlobalISel: Mark 32-bit G_SHL as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34589

llvm-svn: 306298
2017-06-26 15:56:52 +00:00
Matt Arsenault 8bcf2f20a7 AMDGPU: Whitespace fixes
llvm-svn: 306265
2017-06-26 03:01:36 +00:00
Matt Arsenault 10fc062b2b AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling
This should not be treated as a different version of
private_segment_buffer. These are distinct things with
different uses and register classes, and requires the
function argument info to have more context about the
function's type and environment.

Also add missing test coverage for the intrinsic, and
emit an error for HSA. This also encovers that the intrinsic
is broken unless there happen to be stack objects.

llvm-svn: 306264
2017-06-26 03:01:31 +00:00
Rafael Espindola daaee7151b Remove a processFixupValue hack.
The intention of processFixupValue is not to redefine the semantics of
MCExpr. It is odd enough that a expression lowers to a PCRel MCExpr or
not depending on what it looks like. At least it is a local hack now.

I left a fix for anyone trying to figure out what producers should be
producing a different expression.

llvm-svn: 306200
2017-06-24 05:12:29 +00:00
Rafael Espindola f351292141 Remove redundant argument.
llvm-svn: 306189
2017-06-24 00:26:57 +00:00
Rafael Espindola 86c664f9d7 Move Value adjustment to applyFixup. NFC.
llvm-svn: 306178
2017-06-23 23:05:15 +00:00
Rafael Espindola 801b42de31 ARM: move some logic from processFixupValue to applyFixup.
processFixupValue is called on every relaxation iteration. applyFixup
is only called once at the very end. applyFixup is then the correct
place to do last minute changes and value checks.

While here, do proper range checks again for fixup_arm_thumb_bl. We
used to do it, but dropped because of thumb2. We now do it again, but
use the thumb2 range.

llvm-svn: 306177
2017-06-23 22:52:36 +00:00
Tom Stellard af552dc352 AMDGPU/GlobalISel: Mark 32-bit G_AND as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34349

llvm-svn: 306112
2017-06-23 15:17:17 +00:00
David Stuttard f677966e2e [AMDGPU] Add intrinsics for tbuffer load and store - build error fix
Variable was unused in non-debug build (used in assert) causing compile time
warning and eventual build failure

llvm-svn: 306034
2017-06-22 17:15:49 +00:00
David Stuttard 70e8bc1bf3 [AMDGPU] Add intrinsics for tbuffer load and store
Intrinsic already existed for llvm.SI.tbuffer.store

Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.*

Added CodeGen tests for the 2 new variants added.
Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr

Differential Revision: https://reviews.llvm.org/D30687

llvm-svn: 306031
2017-06-22 16:29:22 +00:00
Sam Kolton ca5a30ed74 [AMDGPU] SDWA: remove support for VOP2 instructions that have only 64-bit encoding
Summary:
Despite that this instructions are listed in VOP2, they are treated as VOP3 in specs. They should not support SDWA.
There are no real instructions for them, but there are pseudo instructions.

Reviewers: arsenm, vpykhtin, cfang

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34403

llvm-svn: 305999
2017-06-22 12:42:14 +00:00
Sam Kolton 3c4933fcc6 [AMDGPU] SDWA: add support for GFX9 in peephole pass
Summary:
Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers.
Added several subtarget features for GFX9 SDWA.
This diff also contains changes from D34026.
Depends D34026

Reviewers: vpykhtin, rampitec, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34241

llvm-svn: 305986
2017-06-22 06:26:41 +00:00
Stanislav Mekhanoshin 3ed38c601a [AMDGPU] Add FP_CLASS to the add/setcc combine
This is one of the nodes which also compile as v_cmp_*.

Differential Revision: https://reviews.llvm.org/D34485

llvm-svn: 305970
2017-06-21 23:46:22 +00:00
Rafael Espindola 88d9e37ec8 Use a MutableArrayRef. NFC.
llvm-svn: 305968
2017-06-21 23:06:53 +00:00
Stanislav Mekhanoshin a8b26936d0 [AMDGPU] Combine add and adde, sub and sube
If one of the arguments of adde/sube is zero we can fold another
add/sub into it.

Differential Revision: https://reviews.llvm.org/D34374

llvm-svn: 305964
2017-06-21 22:30:01 +00:00
Stanislav Mekhanoshin e3eb42cef6 [AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setcc
This simplification allows to avoid generating v_cndmask_b32
to serialize condition code between compare and use.

Differential Revision: https://reviews.llvm.org/D34300

llvm-svn: 305962
2017-06-21 22:05:06 +00:00
Dmitry Preobrazhensky 851a3d9f05 [AMDGPU][MC][GFX9] Corrected VOP3P relevant code to fix disassembler failures
See Bug 33509: https://bugs.llvm.org//show_bug.cgi?id=33509

Reviewers: Sam Kolton, Artem Tamazov, Valery Pykhtin

Differential Revision: https://reviews.llvm.org/D34360

llvm-svn: 305923
2017-06-21 16:00:54 +00:00
Dmitry Preobrazhensky dc4ac823ec [AMDGPU][MC] Corrected V_*QSAD* instructions to check that dest register is different than any of the src
See Bug 33279: https://bugs.llvm.org//show_bug.cgi?id=33279

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D34003

llvm-svn: 305915
2017-06-21 14:41:34 +00:00
Sam Kolton 549c89d2c9 [AMDGPU] SDWA: merge VI and GFX9 pseudo instructions
Summary: Previously there were two separate pseudo instruction for SDWA on VI and on GFX9. Created one pseudo instruction that is union of both of them. Added verifier to check that operands conform either VI or GFX9.

Reviewers: dp, arsenm, vpykhtin

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, artem.tamazov

Differential Revision: https://reviews.llvm.org/D34026

llvm-svn: 305886
2017-06-21 08:53:38 +00:00
Matt Arsenault 67cd347e93 AMDGPU: Allow vectorization of packed types
llvm-svn: 305844
2017-06-20 20:38:06 +00:00
Stanislav Mekhanoshin a9d846c6ef [AMDGPU] Fix illegal shrink of V_SUBB_U32 and V_ADDC_U32
If there is an immediate operand we shall not shrink V_SUBB_U32
and V_ADDC_U32, it does not fit e32 encoding.

Differential Revison: https://reviews.llvm.org/D34291

llvm-svn: 305840
2017-06-20 20:33:44 +00:00
Matt Arsenault 9698f1c862 AMDGPU: Start adding global_* instructions
llvm-svn: 305838
2017-06-20 19:54:14 +00:00
Matt Arsenault ff3f912e74 AMDGPU: Do operand folding in program order
Before it was possible to partially fold use instructions
before the defs. After the xor is folded into a copy, the same
mov can end up in the fold list twice, so on the second attempt
it will fail expecting to see a register to fold.

llvm-svn: 305821
2017-06-20 18:56:32 +00:00
Matt Arsenault 76858f5a1d AMDGPU: Preserve undef when folding register operands
If the source was a copy of an undef register, this would
produce a read of an undefined register which is a verifier
error.

llvm-svn: 305816
2017-06-20 18:41:31 +00:00
Stanislav Mekhanoshin 465a1ff193 [AMDGPU] Eliminate SGPR to VGPR copy when possible
SGPRs are generally cheaper, so try to use them over VGPRs.

Differential Revision: https://reviews.llvm.org/D34130

llvm-svn: 305815
2017-06-20 18:32:42 +00:00
Matt Arsenault 7f67b35901 AMDGPU: Fix crash with undef vreg input operand
llvm-svn: 305814
2017-06-20 18:28:02 +00:00
Matt Arsenault c595185f8f AMDGPU: Fix scratch wave offset relative FI expansion
The offset may not be an inline immediate, so this needs
to be materialized into a register. The post-RA run of
SIShrinkInstructions is able to fold it later if it can.

llvm-svn: 305761
2017-06-19 23:47:21 +00:00
Stanislav Mekhanoshin 50c2f251f5 [AMDGPU] Add infer address spaces pass before SROA
It adds it for the target after inlining but before SROA where
we can get most out of it.

Differential Revision: https://reviews.llvm.org/D34366

llvm-svn: 305759
2017-06-19 23:17:36 +00:00
Matt Arsenault e0e68a757e AMDGPU: Cleanup CreateLiveInRegister
llvm-svn: 305748
2017-06-19 21:52:45 +00:00
Tom Stellard ff63ee0db5 AMDGPU/GlobalISel: Mark G_BITCAST s32 <--> <2 x s16> legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34129

llvm-svn: 305692
2017-06-19 13:15:45 +00:00
Alfred Huang f9b521fdaf [AMDGPU] Testing commit access only, no real change
llvm-svn: 305523
2017-06-15 23:02:55 +00:00
Alexander Timofeev 0f9c84cd93 DivergencyAnalysis patch for review
llvm-svn: 305494
2017-06-15 19:33:10 +00:00
Davide Italiano 36559b2527 [AMDGPU] Remove now dead defaultOffsetS13(). NFCI.
Fixes the GCC7 build with -Werror.

llvm-svn: 305329
2017-06-13 22:24:24 +00:00
Tom Stellard ee6e6452df AMDGPU/GlobalISel: Mark 32-bit G_ADD as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D33992

llvm-svn: 305232
2017-06-12 20:54:56 +00:00
Matt Arsenault 05c26472fa AMDGPU: Don't add same implicit use multiple times
For the last component, the same register use
was added as an implicit use and another implicit kill use.

llvm-svn: 305205
2017-06-12 17:19:20 +00:00
Matt Arsenault d9b77848f2 AMDGPU: Teach isLegalAddressingMode about flat offsets
Also fix reporting r+r as a valid addressing mode without
offsets.

llvm-svn: 305203
2017-06-12 17:06:35 +00:00
Matt Arsenault db7c6a8731 AMDGPU: Start selecting flat instruction offsets
llvm-svn: 305201
2017-06-12 16:53:51 +00:00
Matt Arsenault 89ad17ce4c AMDGPU: Verify that flat offsets aren't used pre-GFX9
For convenience the operand is always present in the instruction,
but it isn't valid to use except on GFX9.

llvm-svn: 305200
2017-06-12 16:37:55 +00:00
Matt Arsenault fd02314113 AMDGPU: Start adding offset fields to flat instructions
llvm-svn: 305194
2017-06-12 15:55:58 +00:00
Daniel Neilson c0112ae8da Const correctness for TTI::getRegisterBitWidth
Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation.

Reviewers: chandlerc, rnk, reames

Reviewed By: reames

Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D33903

llvm-svn: 305189
2017-06-12 14:22:21 +00:00
Wei Ding 7c3e5115a5 AMDGPU : Fix ISA Version Definitions.
Differential Revision: http://reviews.llvm.org/D28531

llvm-svn: 305137
2017-06-10 03:53:19 +00:00
Stanislav Mekhanoshin 1a61ab8172 [AMDGPU] Add intrinsics for alignbit and alignbyte instructions
Differential Revision: https://reviews.llvm.org/D34046

llvm-svn: 305098
2017-06-09 19:03:00 +00:00
David Stuttard 82618baa0f [AMDGPU] Fix for issue in alloca to vector promotion pass
Summary:
Alloca promotion pass not dealing with non-canonical input

Added some additional checks so the pass simply backs-off forms it can't deal with (non-canonical)

Also added some test cases in non-canonical form to check that it no longer crashes

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31710

llvm-svn: 305079
2017-06-09 14:16:22 +00:00
Matt Arsenault f1202e650a AMDGPU: Work around build special casing .inc files
It complains because it assumes these were autogenerated files
in the source directory.

llvm-svn: 305005
2017-06-08 19:25:21 +00:00
Matt Arsenault 3c7581bbeb AMDGPU: Use correct register names in inline assembly
Fixes using physical registers in inline asm from clang.

llvm-svn: 305004
2017-06-08 19:03:20 +00:00
Mark Searles e5c7832311 [AMDGPU] Force qsads instrs to use different dest register than source registers
The V_MQSAD_PK_U16_U8, V_QSAD_PK_U16_U8, and V_MQSAD_U32_U8 take more than 1 pass in hardware. For these three instructions, the destination registers must be different than all sources, so that the first pass does not overwrite sources for the following passes.

Differential Revision: https://reviews.llvm.org/D33783

llvm-svn: 304998
2017-06-08 18:21:19 +00:00
Dmitry Preobrazhensky 5a2f881b39 [AMDGPU][MC] Corrected error message for s_waitcnt helpers
See Bug 32711: https://bugs.llvm.org//show_bug.cgi?id=32711

Reviewers: artem.tamazov

Differential Revision: https://reviews.llvm.org/D33781

llvm-svn: 304922
2017-06-07 16:08:02 +00:00
Tom Stellard 2860a428f7 AMDGPU/GlobalISel: Mark 32-bit G_SELECT as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33949

llvm-svn: 304910
2017-06-07 13:54:51 +00:00
Zachary Turner 264b5d9e88 Move Object format code to lib/BinaryFormat.
This creates a new library called BinaryFormat that has all of
the headers from llvm/Support containing structure and layout
definitions for various types of binary formats like dwarf, coff,
elf, etc as well as the code for identifying a file from its
magic.

Differential Revision: https://reviews.llvm.org/D33843

llvm-svn: 304864
2017-06-07 03:48:56 +00:00
Konstantin Zhuravlyov 1e2b87893b AMDGPU/NFC: Move amdgpu code object metadata to support
Differential Revision: https://reviews.llvm.org/D31437

llvm-svn: 304812
2017-06-06 18:35:50 +00:00
Stanislav Mekhanoshin e4cda7417c [AMDGPU] Return correct value from SDWA pass
Differential Revision: https://reviews.llvm.org/D33927

llvm-svn: 304805
2017-06-06 16:42:30 +00:00
Tom Stellard 8cd60a5067 AMDGPU/GlobalISel: Mark 32-bit G_ICMP as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33890

llvm-svn: 304797
2017-06-06 14:16:50 +00:00
Chandler Carruth 6bda14b313 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Mandeep Singh Grang 5e1697ef28 [llvm] Remove double semicolons
Reviewers: craig.topper, arsenm, mehdi_amini

Reviewed By: mehdi_amini

Subscribers: mehdi_amini, wdng, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33924

llvm-svn: 304767
2017-06-06 05:08:36 +00:00
Konstantin Zhuravlyov 5b0bf2ff0d AMDGPU: Remove deprecated and unused elf definitions
Differential Revision: https://reviews.llvm.org/D33689

llvm-svn: 304737
2017-06-05 21:33:40 +00:00
Mark Searles 602ee930bf [AMDGPU] Fix uninit'ed var (RevisitLoop)
Differential Revision: https://reviews.llvm.org/D33907

llvm-svn: 304729
2017-06-05 19:29:01 +00:00
Stanislav Mekhanoshin 286a4225b9 [AMDGPU] Fix SIFoldOperands crash with clamp
Fixes bug #33302. Pass did not account that Src1 of max instruction
can be an immediate.

Differential Revision: https://reviews.llvm.org/D33884

llvm-svn: 304696
2017-06-05 01:03:04 +00:00
Stanislav Mekhanoshin 0330660403 [AMDGPU] Untangle SDWA pass from SIShrinkInstructions
Remove dependency of SDWA pass on SIShrinkInstructions.
The goal is to move SDWA even higher in the stack to avoid second run
of MachineLICM, MachineCSE and SIFoldOperands.

Also added handling to preserve original src modifiers.

Differential Revision: https://reviews.llvm.org/D33860

llvm-svn: 304665
2017-06-03 17:39:47 +00:00
Tom Stellard e042412ef1 AMDGPU/GlobalISel: Mark 1-bit integer constants as legal
Summary:
These are mostly legal, but will probably need special lowering for some
cases.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D33791

llvm-svn: 304628
2017-06-03 01:13:33 +00:00
Stanislav Mekhanoshin f154b4f52c [AMDGPU] Preserve operand order in SIFoldOperands
SIFoldOperands can commute operands even if no folding was done.
This change is to preserve IR is no folding was done.

Differential Revision: https://reviews.llvm.org/D33802

llvm-svn: 304625
2017-06-03 00:41:52 +00:00
Stanislav Mekhanoshin ca5d2efe5a [AMDGPU] V_DIV_FIXUP_F16 is not a commutable operation
Differential Revision: https://reviews.llvm.org/D33808

llvm-svn: 304619
2017-06-03 00:16:44 +00:00
Matt Arsenault 746e065716 AMDGPU: Register AMDGPUAlwaysInline
llvm-svn: 304574
2017-06-02 18:02:42 +00:00
Konstantin Zhuravlyov be6c0ca5e2 AMDGPU: Make auto waitcnt before barrier a feature
Differential Revision: https://reviews.llvm.org/D33793

llvm-svn: 304571
2017-06-02 17:40:26 +00:00
Alexander Timofeev 3f70b619a9 AMDGPUAnnotateUniformValue should always treat volatile loads as divergent
llvm-svn: 304554
2017-06-02 15:25:52 +00:00
Mark Searles 70359ac60d [AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests.
-enable-si-insert-waitcnts=1 becomes the default
-enable-si-insert-waitcnts=0 to use old pass

Differential Revision: https://reviews.llvm.org/D33730

llvm-svn: 304551
2017-06-02 14:19:25 +00:00
Yaxun Liu a618acf923 [AMDGPU] Fix kernel arg segment size for amdgizcl
Differential Revision: https://reviews.llvm.org/D33307

llvm-svn: 304482
2017-06-01 21:31:53 +00:00
Matt Arsenault 3416b8c874 AMDGPU: Remove error on call in AsmPrinter
Partial revert of r301938 which is making it harder
to split patches up.

llvm-svn: 304418
2017-06-01 15:05:15 +00:00
Matt Arsenault 50f43e4168 AMDGPU: Set high getCSRFirstUseCost
llvm-svn: 304416
2017-06-01 14:38:02 +00:00
Matthias Braun d6a36ae282 TargetMachine: Indicate whether machine verifier passes.
This adds a callback to the LLVMTargetMachine that lets target indicate
that they do not pass the machine verifier checks in all cases yet.

This is intended to be a temporary measure while the targets are fixed
allowing us to enable the machine verifier by default with
EXPENSIVE_CHECKS enabled!

Differential Revision: https://reviews.llvm.org/D33696

llvm-svn: 304320
2017-05-31 18:41:23 +00:00
Mark Searles 11d0a04050 [AMDGPU] Fix bugs in new waitcnt pass. Add test.
- new waitcnt pass remains off by default; -enable-si-insert-waitcnts=1 to enable it
- fix handling of PERMUTE ops
- fix insertion of waitcnt instrs at function begin/end ( port of analogous code that was added to old waitcnt pass )
- add new test

  Differential Revision: https://reviews.llvm.org/D33114

llvm-svn: 304311
2017-05-31 16:44:23 +00:00
Dmitry Preobrazhensky 793c592652 [AMDGPU][MC] New syntax for ds_swizzle_b32 offset
See Bug 28601: https://bugs.llvm.org//show_bug.cgi?id=28601

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D33542

llvm-svn: 304309
2017-05-31 16:26:47 +00:00
Matthias Braun 5e394c3d6f TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFC
TargetPassConfig is not useful for targets that do not use the CodeGen
library, so we may just as well store a pointer to an
LLVMTargetMachine instead of just to a TargetMachine.

While at it, also change the constructor to take a reference instead of a
pointer as the TM must not be nullptr.

llvm-svn: 304247
2017-05-30 21:36:41 +00:00
Stanislav Mekhanoshin 56ea488d8b [AMDGPU] Allow SDWA in instructions with immediates and SGPRs
An encoding does not allow to use SDWA in an instruction with
scalar operands, either literals or SGPRs. That is however possible
to copy these operands into a VGPR first.

Several copies of the value are produced if multiple SDWA conversions
were done. To cleanup MachineLICM (to hoist copies out of loops),
MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace
SGPR to VGPR copy with immediate copy right to the VGPR) runs are added
after the SDWA pass.

Differential Revision: https://reviews.llvm.org/D33583

llvm-svn: 304219
2017-05-30 16:49:24 +00:00
Mark Searles 00ce96f6ee [AMDGPU] Require waitcnt before barrier for all targets; adjust tests.
Differential Revision: https://reviews.llvm.org/D33576

llvm-svn: 304217
2017-05-30 16:22:43 +00:00
Konstantin Zhuravlyov b2ff8dfea0 Resubmit r303859 with test fixed.
[AMDGPU] add intrinsic for s_getpc

Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.

Patch by Tim Corringham

llvm-svn: 304031
2017-05-26 20:38:26 +00:00
Benjamin Kramer debb3c35e0 Make helper functions static. NFC.
llvm-svn: 304029
2017-05-26 20:09:00 +00:00
Dmitry Preobrazhensky 6a2431df0b [AMDGPU][MC][GFX9] Corrected encoding of flat_scratch* for SDWA opcodes
See bug 33171: https://bugs.llvm.org/show_bug.cgi?id=33171

Reviewers: Sam Kolton

Differential Revision: https://reviews.llvm.org/D33553

llvm-svn: 304015
2017-05-26 18:01:29 +00:00
Tom Stellard dde28a8c92 AMDGPU/GlobalISel: Mark 32-bit float constants as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33212

llvm-svn: 304003
2017-05-26 16:40:03 +00:00
Sam Kolton 363f47a2c7 [AMDGPU] SDWA: add disassembler support for GFX9
Summary: Added decoder methods and tests

Reviewers: vpykhtin, artem.tamazov, dp

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D33545

llvm-svn: 303999
2017-05-26 15:52:00 +00:00
Nico Weber b3d83a092a Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots.
llvm-svn: 303902
2017-05-25 19:19:29 +00:00
Tim Corringham 32d0d38679 [AMDGPU] add intrinsic for s_getpc
Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D32862

llvm-svn: 303859
2017-05-25 14:04:14 +00:00
Nirav Dave d20066cbad [AMDGPU] Prevent too large store merges in AMDGPU Subtargets. NFCI.
Various address spaces on the SI and R600 subtargets have stricter
limits on memory access size that other address spaces. Use
canMergeStoresTo predicate to prevent the DAGCombiner from creating
these stores as they will be split up during legalization.

llvm-svn: 303767
2017-05-24 15:59:09 +00:00
Marek Olsak 8973a0a22c Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns"
This reverts commit e065977c4b5f68ab845400b256f6a3822b1325fa.

It doesn't work. S_LOAD_DWORD_IMM_ci and friends aren't selected by any of
the patterns, so it was putting 32-bit literals into the 8-bit field.

llvm-svn: 303754
2017-05-24 14:53:50 +00:00
Simon Pilgrim c910a70b21 [AMDGPU] Add INDIRECT_BASE_ADDR to R600_Reg32 class (PR33045)
This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045

Differential Revision: https://reviews.llvm.org/D33451

llvm-svn: 303691
2017-05-23 21:27:15 +00:00
Changpeng Fang 1dbace195d AMDGPU/SI: Move the local memory usage related checking after calling convention checking in PromoteAlloca
Summary:
  Promoting Alloca to Vector and Promoting Alloca to LDS are two independent handling of Alloca and should not affect each other.
As a result, we should not give up promoting to vector if there is not enough LDS. This patch factors out the local memory usage
related checking out and replace it after the calling convention checking.

Reviewer:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D33139

llvm-svn: 303684
2017-05-23 20:25:41 +00:00
Stanislav Mekhanoshin 53a21292f8 [AMDGPU] Combine and (srl) into shl (bfe)
Perform DAG combine:
and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb
Where nb is a number of trailing zeroes in mask.

It replaces two instructions with two and BFE is generally a more
expensive one. However this is only done if we are selecting a byte
or word at an aligned boundary which results in a proper SDWA
operand pattern. It is only done if SDWA is supported.

TODO: improve SDWA pass to actually convert this pattern. It is not
done now because we have an immediate in the instruction, which has
be moved into a VGPR.

Differential Revision: https://reviews.llvm.org/D33455

llvm-svn: 303681
2017-05-23 19:54:48 +00:00
Marek Olsak 7dadd86a35 AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns
This is just a cleanup. Also, it adds checking that ByteCount is aligned to 4.

Reviewers: arsenm, nhaehnle, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28994

llvm-svn: 303658
2017-05-23 17:14:34 +00:00
Stanislav Mekhanoshin a96ec3f360 [AMDGPU] Convert shl (add) into add (shl)
shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1)
This allows to fold a constant into an address in some cases as
well as to eliminate second shift if the expression is used as
an address and second shift is a result of a GEP.

Differential Revision: https://reviews.llvm.org/D33432

llvm-svn: 303641
2017-05-23 15:59:58 +00:00
Sam Kolton f7659d71eb [AMDGPU] SDWA: Add assembler support for GFX9
Summary:
Added separate pseudo and real instruction for GFX9 SDWA instructions.
Currently supports only in assembler.
Depends D32493

Reviewers: vpykhtin, artem.tamazov

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D33132

llvm-svn: 303620
2017-05-23 10:08:55 +00:00
Stanislav Mekhanoshin 5fa289f0d8 [AMDGPU] Narrow lshl from 64 to 32 bit if possible
Turn expensive 64 bit shift into 32 bit if shift does not overflow int:
shl (ext x) => zext (shl x)

Differential Revision: https://reviews.llvm.org/D33367

llvm-svn: 303569
2017-05-22 16:58:10 +00:00
Valery Pykhtin 74cb9c8831 [AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker
Differential revision: https://reviews.llvm.org/D33289

llvm-svn: 303548
2017-05-22 13:09:40 +00:00
Dmitry Preobrazhensky ce941c9c38 [AMDGPU][MC] Corrected disassembler to decode instructions with 2 literals
See bug 32922: https://bugs.llvm.org//show_bug.cgi?id=32922

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D32912

llvm-svn: 303428
2017-05-19 14:27:52 +00:00
Dmitry Preobrazhensky 9321e8fcec [AMDGPU][MC] Fixed bugs in export instruction
See Bugs 33019, 33056:
  https://bugs.llvm.org//show_bug.cgi?id=33019
  https://bugs.llvm.org//show_bug.cgi?id=33056

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D33288

llvm-svn: 303423
2017-05-19 13:36:09 +00:00
Francis Visoiu Mistrih 8b61764cbb [LegacyPassManager] Remove TargetMachine constructors
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.

The patterns replaced here are:

* Passes handling a null TargetMachine call
  `getAnalysisIfAvailable<TargetPassConfig>`.

* Passes not handling a null TargetMachine
  `addRequired<TargetPassConfig>` and call
  `getAnalysis<TargetPassConfig>`.

* MachineFunctionPasses now use MF.getTarget().

* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.

This fixes a crash when running `llc -start-before prologepilog`.

PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.

Related to PR30324.

Differential Revision: https://reviews.llvm.org/D33222

llvm-svn: 303360
2017-05-18 17:21:13 +00:00
Sam Kolton ebfdaf7394 [AMDGPU] SDWA operands should not intersect with potential MIs
Summary:
There should be no intesection between SDWA operands and potential MIs. E.g.:
```
v_and_b32 v0, 0xff, v1 -> src:v1 sel:BYTE_0
v_and_b32 v2, 0xff, v0 -> src:v0 sel:BYTE_0
v_add_u32 v3, v4, v2
```
In that example it is possible that we would fold 2nd instruction into 3rd (v_add_u32_sdwa) and then try to fold 1st instruction into 2nd (that was already destroyed). So if SDWAOperand is also a potential MI then do not apply it.

Reviewers: vpykhtin, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D32804

llvm-svn: 303347
2017-05-18 12:12:03 +00:00
Matt Arsenault 2b1f9aa577 AMDGPU: Start defining a calling convention
Partially implement callee-side for arguments and return values.
byval doesn't work properly, and most likely sret or other on-stack
return values most as well.

llvm-svn: 303308
2017-05-17 21:56:25 +00:00
Matt Arsenault 2525e4e4c2 AMDGPU: Expand frame indexes to be relative to scratch wave offset
In order for an arbitrary callee to access an object
in a caller's stack frame, the 32-bit offset used as
the private pointer needs to be relative to the kernel's
scratch wave offset register.

Convert to this by finding the difference from the current
stack frame and scaling by the wavefront size.

llvm-svn: 303303
2017-05-17 21:23:14 +00:00
Matt Arsenault 156d3ae0b6 AMDGPU: Change mubuf soffset register when SP relative
Check the MachinePointerInfo for whether the access is
supposed to be relative to the stack pointer.

No tests because this is used in later commits implementing
calls.

llvm-svn: 303301
2017-05-17 21:02:58 +00:00
Matt Arsenault 98f2946ab3 AMDGPU: Make better use of op_sel with high components
Handle more general swizzles.

llvm-svn: 303296
2017-05-17 20:30:58 +00:00
Matt Arsenault 786eeea23e AMDGPU: Try to use op_sel when selecting packed instructions
Avoids instructions to pack a vector when the source is really
a scalar being broadcast.

Also be smarter and look for per-component fneg.

Doesn't yet handle scalar from upper half of register
or other swizzles.

llvm-svn: 303291
2017-05-17 20:00:00 +00:00
Matt Arsenault ea8a4ed588 AMDGPU: Use appropriate soffset for spilling
This needs to be the frame offset register, and not the global
scratch wave offset register. For kernels, these are the same.

llvm-svn: 303287
2017-05-17 19:37:57 +00:00
Matt Arsenault ee324ffc1f AMDGPU: Fix min3/max3 combines for f16/i16
Fix missing instruction definitions for min3/max3.

llvm-svn: 303284
2017-05-17 19:25:06 +00:00
Stanislav Mekhanoshin acca0f5c02 [AMDGPU] Use GCNRPTracker dumper methods in scheduler
Differential Revision: https://reviews.llvm.org/D33244

llvm-svn: 303186
2017-05-16 16:31:45 +00:00
Stanislav Mekhanoshin b10860788f [AMDGPU] Cache live-ins and register pressure in scheduler
Using LIS can be quite expensive, so caching of calculated region
live-ins and pressure is implemented. It does two things:

1. Caches the info for the second stage when we schedule with
   decreased target occupancy.
2. Tracks the basic block from top to bottom thus eliminating the
   need to scan whole register file liveness at every region split
   in the middle of the block.

The scheduling is now done in 3 stages instead of two, with the first
one being really a no-op and only used to collect scheduling regions
as sent by the scheduler driver.

There is no functional change to the current behavior, only compilation
speed is affected. In general computeBlockPressure() could be simplified
if we switch to backward RP tracker, because scheduler sends regions
within a block starting from the last upward. We could use a natural
order of upward tracker to seamlessly change between regions of the same
block, since live reg set of a previous tracked region would become a
live-out of the next region. That however requires fixing upward tracker
to properly account defs and uses of the same instruction as both are
contributing to the current pressure. When we converge on the produced
pressure we should be able to switch between them back and forth. In
addition, backward tracker is less expensive as it uses LIS in recede
less often than forward uses it in advance.

At the moment the worst known case compilation time has improved from 26
minutes to 8.5.

Differential Revision: https://reviews.llvm.org/D33117

llvm-svn: 303184
2017-05-16 16:11:26 +00:00
Stanislav Mekhanoshin 464cecf81e [AMDGPU] Turn register pressure estimation into forward tracker
This factors register pressure estimation mechanism from the
GCNSchedStrategy into the forward tracker to unify interface
with other strategies and expose it to other interested phases.

Differential Revision: https://reviews.llvm.org/D33105

llvm-svn: 303179
2017-05-16 15:43:52 +00:00
NAKAMURA Takumi 994a43d27a AMDGPUCodeGen: Fix warnings in r303111. [-Wunused-variable]
llvm-svn: 303137
2017-05-16 04:01:23 +00:00
Davide Italiano 60d36c7506 [AMDGPU] Kill now unused phiInfoElementGetDebugLoc(). NFCI.
llvm-svn: 303122
2017-05-15 22:10:15 +00:00
Jan Sjodin a06bfe054e Re-submit AMDGPUMachineCFGStructurizer.
Differential Revision: https://reviews.llvm.org/D23209

llvm-svn: 303111
2017-05-15 20:18:37 +00:00
Jan Sjodin 0e289822fa Revert 303091.
llvm-svn: 303098
2017-05-15 18:39:47 +00:00
Jan Sjodin e9d2ddc9dd Add AMDGPUMachineCFGStructurizer.
Differential Revision: https://reviews.llvm.org/D23209

llvm-svn: 303091
2017-05-15 18:13:56 +00:00
Dmitry Preobrazhensky 167f8b69e3 [AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64
See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D33123

llvm-svn: 303070
2017-05-15 14:28:23 +00:00
Dmitry Preobrazhensky 03852a9dca [AMDGPU][MC] Removed V_MQSAD_U16_U8
This instruction does not really exist

See Bug 33018: https://bugs.llvm.org//show_bug.cgi?id=33018

Reviewers: vpykhtin, artem.tamazov

Differential Revision: https://reviews.llvm.org/D33126

llvm-svn: 303055
2017-05-15 12:37:03 +00:00
Changpeng Fang 161e8c39af AMDGPU/SI: Don't promote to vector if the load/store is volatile.
Summary:
  We should not change volatile loads/stores in promoting alloca to vector.

Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D33107

llvm-svn: 302943
2017-05-12 20:31:12 +00:00
Craig Topper 8df66c602a [KnownBits] Add bit counting methods to KnownBits struct and use them where possible
This patch adds min/max population count, leading/trailing zero/one bit counting methods.

The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.

Differential Revision: https://reviews.llvm.org/D32931

llvm-svn: 302925
2017-05-12 17:20:30 +00:00
Tom Stellard a0d67c748a AMDGPU/GlobalISel: Mark 32-bit integer constants as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33115

llvm-svn: 302919
2017-05-12 16:46:46 +00:00
Davide Italiano 0dcc015a81 [AMDGPU] Placate unused variable warning in release builds.
llvm-svn: 302821
2017-05-11 19:58:52 +00:00
Matt Arsenault 47ccafe787 AMDGPU: Remove tfe bit from flat instruction definitions
We don't use it and it was removed in gfx9, and the encoding
bit repurposed.

Additionally actually using it requires changing the output register
class, which wasn't done anyway.

llvm-svn: 302814
2017-05-11 17:38:33 +00:00
Matt Arsenault bf5482e4bb AMDGPU: Pull fneg out of extract_vector_elt
This allows folding source modifiers in more f16 cases.
Makes it easier to select per-component packed neg modifiers.

llvm-svn: 302813
2017-05-11 17:26:25 +00:00
Stanislav Mekhanoshin 33a97ec4ed [AMDGPU] Fix incorrect register pressure calculation
Earlier fix D32572 introduced a bug where live-ins were calculated
for basic block instead of scheduling region. This change fixes it.

Differential Revision: https://reviews.llvm.org/D33086

llvm-svn: 302812
2017-05-11 17:16:55 +00:00
Serge Guelton 1b421c259f Remove now useless trailing nullptr in StructType::get
llvm-svn: 302779
2017-05-11 08:46:02 +00:00
Matt Arsenault 3c5e4237c6 AMDGPU: Make some packed shuffles free
VOP3P instructions can encode access to either
half of the register.

llvm-svn: 302730
2017-05-10 21:29:33 +00:00
Matt Arsenault acdc7659cc AMDGPU: Add new subtarget features for gfx9 flat instructions
Flat instructions gain an immediate offset, and 2 new
sets of segment specific flat instructions are added.

llvm-svn: 302729
2017-05-10 21:19:05 +00:00
Dmitry Preobrazhensky da61a7f9ef [AMDGPU][MC] Corrected v_madak/madmk to avoid printing "_e32" in disassembler output
See bug 32927: https://bugs.llvm.org//show_bug.cgi?id=32927

Reviewers: vpykhtin, artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D32913

llvm-svn: 302648
2017-05-10 13:00:28 +00:00