Commit Graph

2073 Commits

Author SHA1 Message Date
Matt Arsenault bd57cea6e4 AMDGPU: Implement getMinimumNopSize
llvm-svn: 310310
2017-08-07 22:00:58 +00:00
Connor Abbott 79f3ade51a [AMDGPU] Add pseudo "old" source to all DPP instructions
Summary:
All instructions with the DPP modifier may not write to certain lanes of
the output if bound_ctrl=1 is set or any bits in bank_mask or row_mask
aren't set, so the destination register may be both defined and modified.
The right way to handle this is to add a constraint that the destination
register is the same as one of the inputs. We could tie the destination
to the first source, but that would be too restrictive for some use-cases
where we want the destination to be some other value before the
instruction executes. Instead, add a fake "old" source and tie it to the
destination. Effectively, the "old" source defines what value unwritten
lanes will get. We'll expose this functionality to users with a new
intrinsic later.

Also, we want to use DPP instructions for computing derivatives, which
means we need to set WQM for them. We also need to enable the entire
wavefront when using DPP intrinsics to implement nonuniform subgroup
reductions, since otherwise we'll get incorrect results in some cases.
To accomodate this, add a new operand to all DPP instructions which will
be interpreted by the SI WQM pass. This will be exposed with a new
intrinsic later. We'll also add support for Whole Wavefront Mode later.

I also fixed llvm.amdgcn.mov.dpp to overwrite the source and fixed up
the test. However, I could also keep the old behavior (where lanes that
aren't written are undefined) if people want it.

Reviewers: tstellar, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34716

llvm-svn: 310283
2017-08-07 19:10:56 +00:00
Matt Arsenault 36b4b0bed7 AMDGPU: Remove -mcpu=SI
Leftover from before amdgcn/r600 split.

llvm-svn: 310277
2017-08-07 18:30:35 +00:00
Matt Arsenault 9d288e69f5 AMDGPU: Remove redundant opt level check
addOptimizedRegAlloc isn't used for -O0 already.

llvm-svn: 310275
2017-08-07 18:12:48 +00:00
Matt Arsenault 3db456820d AMDGPU: Remove FixControlFlowLiveIntervals pass
This hasn't done anything in a long time. This was
running after the the control flow pseudos were expanded,
so this would never find them. The control flow pseudo
expansion was moved to solve the problem this pass was
supposed to solve in the first place, except handling
it earlier also fixes it for fast regalloc which doesn't
use LiveIntervals.

Noticed by checking LCOV reports.

llvm-svn: 310274
2017-08-07 18:12:47 +00:00
Matt Arsenault aac47c1c00 AMDGPU: Use a custom areInlineCompatible
Fixes not inlining OpenCL library functions on AMDGPU,
which don't have an explicitly set target-cpu.

llvm-svn: 310269
2017-08-07 17:08:44 +00:00
Matt Arsenault 8728c5f2db AMDGPU: Cleanup subtarget features
Try to avoid mutually exclusive features. Don't use
a real default GPU, and use a fake "generic". The goal
is to make it easier to see which set of features are
incompatible between feature strings.

Most of the test changes are due to random scheduling changes
from not having a default fullspeed model.

llvm-svn: 310258
2017-08-07 14:58:04 +00:00
Dmitry Preobrazhensky 50805a0b83 [AMDGPU][MC] Corrected VOP3 version of v_interp_* instructions for VI
See bug 32621: https://bugs.llvm.org//show_bug.cgi?id=32621

Reviewers: vpykhtin, SamWot, arsenm

Differential Revision: https://reviews.llvm.org/D35902

llvm-svn: 310251
2017-08-07 13:14:12 +00:00
Matt Arsenault a7eb14afc7 AMDGPU: Fix typo in feature description
llvm-svn: 310217
2017-08-06 18:13:23 +00:00
Quentin Colombet c046208c52 [GlobalISel] Remove the GISelAccessor API.
Its sole purpose was to avoid spreading around ifdefs related to
building global-isel. Since r309990, GlobalISel is not optional anymore,
thus, we can get rid of this mechanism all together.

NFC.

llvm-svn: 310115
2017-08-04 20:15:46 +00:00
Connor Abbott 66b9bd6e50 [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic
Summary:
This intrinsic lets us set inactive lanes to an identity value when
implementing wavefront reductions. In combination with Whole Wavefront
Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan.
Lowering the intrinsic needs to happen post-RA so that RA knows that the
destination isn't completely overwritten due to the EXEC shenanigans, so
we need another pseudo-instruction to represent the un-lowered
intrinsic.

Reviewers: tstellar, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34719

llvm-svn: 310088
2017-08-04 18:36:54 +00:00
Connor Abbott 92638ab625 [AMDGPU] Add support for Whole Wavefront Mode
Summary:
Whole Wavefront Wode (WWM) is similar to WQM, except that all of the
lanes are always enabled, regardless of control flow. This is required
for implementing wavefront reductions in non-uniform control flow, where
we need to use the inactive lanes to propagate intermediate results, so
they need to be enabled. We need to propagate WWM to uses (unless
they're explicitly marked as exact) so that they also propagate
intermediate results correctly. We do the analysis and exec mask munging
during the WQM pass, since there are interactions with WQM for things
that require both WQM and WWM. For simplicity, WWM is entirely
block-local -- blocks are never WWM on entry or exit of a block, and WWM
is not propagated to the block level.  This means that computations
involving WWM cannot involve control flow, but we only ever plan to use
WWM for a few limited purposes (none of which involve control flow)
anyways.

Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There
isn't yet a way to turn WWM off -- that will be added in a future
change.

Finally, it turns out that turning on inactive lanes causes a number of
problems with register allocation. While the best long-term solution
seems like teaching LLVM's register allocator about predication, for now
we need to add some hacks to prevent ourselves from getting into trouble
due to constraints that aren't currently expressed in LLVM. For the gory
details, see the comments at the top of SIFixWWMLiveness.cpp.

Reviewers: arsenm, nhaehnle, tpr

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D35524

llvm-svn: 310087
2017-08-04 18:36:52 +00:00
Connor Abbott de068fe8b4 [AMDGPU] refactor WQM pass in preparation for WWM (NFCI)
Summary:
Right now, the WQM pass conflates two different things when tracking the
Needs of an instruction:

1. Needs can be StateWQM, which is propagated to other instructions, and
means that this instruction (and everything it depends on) must be
calculated in WQM.
2. Needs can be StateExact, which is not propagated to other
instructions, and means that this instruction must not be calculated in
WQM and WQM-ness must not be propagated past this instruction.

This works now because there are only two different states, but in the
future we want to be able to express things like "calculate this in WQM,
but please disable WWM and don't propagate it" (to implement
@llvm.amdgcn.set.inactive). In order to do this, we need to split the
per-instruction Needs field in two: a new Needs field, which can only
contain StateWQM (and in the future, StateWWM) and is propagated to
sources, and a Disables field, which can also contain just StateWQM or
nothing for now.

We keep the per-block tracking the same for now, by translating
Needs/Disables to the old representation with only StateWQM or
StateExact. The other place that needs special handling is when we
emit the state transitions. We could just translate back to the old
representation there as well, which we almost do, but instead of 0 as a
placeholder value for "any state," we explicitly or together all the
states an instruction is allowed to be in. This lets us refactor the
code in preparation for WWM, where we'll need to be able to handle
things like "this instruction must be in Exact or WQM, but not WWM."

Reviewers: arsenm, nhaehnle, tpr

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D35523

llvm-svn: 310086
2017-08-04 18:36:50 +00:00
Connor Abbott 8c217d0a29 [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM
Summary:
Previously, we assumed that certain types of instructions needed WQM in
pixel shaders, particularly DS instructions and image sampling
instructions. This was ok because with OpenGL, the assumption was
correct. But we want to start using DPP instructions for derivatives as
well as other things, so the assumption that we can infer whether to use
WQM based on the instruction won't continue to hold. This intrinsic lets
frontends like Mesa indicate what things need WQM based on their
knowledge of the API, rather than second-guessing them in the backend.
We need to keep around the old method of enabling WQM, but eventually we
should remove it once Mesa catches up. For now, this will let us use DPP
instructions for computing derivatives correctly.

Reviewers: arsenm, tpr, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35167

llvm-svn: 310085
2017-08-04 18:36:49 +00:00
Dmitry Preobrazhensky 4b11a78a6e [AMDGPU][MC] Enabled expressions as operands
See bug 33579: https://bugs.llvm.org//show_bug.cgi?id=33579

Reviewers: vpykhtin, SamWot, arsenm

Differential Revision: https://reviews.llvm.org/D36091

llvm-svn: 310059
2017-08-04 13:55:24 +00:00
Florian Gross 2feb105882 [AMDGPU] Fixed MSVC build break
Error was:

field of type 'llvm::ArgDescriptor' has private default constructor
const AMDGPUFunctionArgInfo AMDGPUArgumentUsageInfo::ExternFunctionInfo{};
                                                                        ^

llvm-svn: 310048
2017-08-04 10:53:07 +00:00
Stanislav Mekhanoshin 6c7a8d0b5f [AMDGPU] Preserve inverted bit in SI_IF in presence of SI_KILL
In case if SI_KILL is in between of the SI_IF and SI_END_CF we need
to preserve the bits actually flipped by if rather then restoring
the original mask.

Differential Revision: https://reviews.llvm.org/D36299

llvm-svn: 310031
2017-08-04 06:58:42 +00:00
Connor Abbott 00755362b9 [AMDGPU] Add missing hazard for DPP-after-EXEC-write
Summary:
Following the docs, we need at least 5 wait states between an EXEC write
and an instruction that uses DPP.

Reviewers: tstellar, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34849

llvm-svn: 310013
2017-08-04 01:09:43 +00:00
Matt Arsenault 5c921a9291 AMDGPU: Remove pointless asserts
llvm-svn: 310007
2017-08-04 00:00:13 +00:00
Matt Arsenault a176cc5b93 AMDGPU: Don't use report_fatal_error for unsupported call types
llvm-svn: 310004
2017-08-03 23:32:41 +00:00
Matt Arsenault a202538bfa AMDGPU: Remove error on calls for amdgcn
Repurpose the -amdgpu-function-calls flag. Rather
than require it to emit a call, only use it to
run the always inline path or not.

llvm-svn: 310003
2017-08-03 23:24:05 +00:00
Matt Arsenault 817c253e60 AMDGPU: Fix implicitarg.ptr handling special inputs
llvm-svn: 310002
2017-08-03 23:12:44 +00:00
Matt Arsenault 8623e8d864 AMDGPU: Pass special input registers to functions
llvm-svn: 309998
2017-08-03 23:00:29 +00:00
Matt Arsenault 7016f13450 AMDGPU: Add analysis pass for function argument info
This will allow only adding necessary inputs to callee functions
that need special inputs forwarded from the kernel.

llvm-svn: 309996
2017-08-03 22:30:46 +00:00
Quentin Colombet 250e050a50 [GlobalISel] Make GlobalISel a non-optional library.
With this change, the GlobalISel library gets always built. In
particular, this is not possible to opt GlobalISel out of the build
using the LLVM_BUILD_GLOBAL_ISEL variable any more.

llvm-svn: 309990
2017-08-03 21:52:25 +00:00
Changpeng Fang ef4dbb46da AMDGPU/SI: Don't fix a PHI under uniform branch in SIFixSGPRCopies only when sources and destination are all sgprs
Summary:
  If a PHI has at lease one VGPR operand, we have to fix the PHI
in SIFixSGPRCopies.

Reviewer:
  Matt

Differential Revision:
  http://reviews.llvm.org/D34727

llvm-svn: 309959
2017-08-03 16:37:02 +00:00
Rafael Espindola 79e238afee Delete Default and JITDefault code models
IMHO it is an antipattern to have a enum value that is Default.

At any given piece of code it is not clear if we have to handle
Default or if has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.

This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.

llvm-svn: 309911
2017-08-03 02:16:21 +00:00
Tom Stellard 3337d74399 AMDGPU/GlobalISel: Mark 32-bit G_FMUL as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D36218

llvm-svn: 309898
2017-08-02 22:56:30 +00:00
Tom Stellard a2f57be260 AMDGPU/R600: Initialize more passes
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D36128

llvm-svn: 309893
2017-08-02 22:19:45 +00:00
Matt Arsenault 2738ede6b2 AMDGPU: Restore using MRI to find highest used regs
If there are no calls, this is a faster path than
searching the entire program for calls.

This was supposed to be left in r309781.
Fixes unused variable warning.

llvm-svn: 309832
2017-08-02 17:15:01 +00:00
Matt Arsenault 8e8f8f43b0 AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to it
llvm-svn: 309783
2017-08-02 01:52:45 +00:00
Matt Arsenault 1d6317c3ad AMDGPU: Fix emitting encoded calls
This was failing on out of bounds access to the extra operands
on the s_swappc_b64 beyond those in the instruction definition.

This was working, but somehow regressed within the past few weeks,
although I don't see any obvious commit.

llvm-svn: 309782
2017-08-02 01:42:04 +00:00
Matt Arsenault 6ed7b9bfc0 AMDGPU: Analyze callee resource usage in AsmPrinter
llvm-svn: 309781
2017-08-02 01:31:28 +00:00
Stanislav Mekhanoshin f23ae4fbe9 [AMDGPU] Fix asan error after last commit
Previous change "Turn s_and_saveexec_b64 into s_and_b64 if
result is unused" introduced asan use-after-poison error.
Instruction was analyzed after eraseFromParent() calls.

Move analysys higher than erase.

llvm-svn: 309779
2017-08-02 01:18:57 +00:00
Matt Arsenault d1867c0345 AMDGPU: Don't place arguments in emergency stack slot
When finding the fixed offsets for function arguments,
this needs to skip over the 4 bytes reserved for the
emergency stack slot.

llvm-svn: 309776
2017-08-02 00:59:51 +00:00
Stanislav Mekhanoshin da0edef1bd [AMDGPU] Turn s_and_saveexec_b64 into s_and_b64 if result is unused
With SI_END_CF elimination for some nested control flow we can now
eliminate saved exec register completely by turning a saveexec version
of instruction into just a logical instruction.

Differential Revision: https://reviews.llvm.org/D36007

llvm-svn: 309766
2017-08-01 23:44:35 +00:00
Stanislav Mekhanoshin 37e7f959c0 [AMDGPU] Collapse adjacent SI_END_CF
Add a pass to remove redundant S_OR_B64 instructions enabling lanes in
the exec. If two SI_END_CF (lowered as S_OR_B64) come together without any
vector instructions between them we can only keep outer SI_END_CF, given
that CFG is structured and exec bits of the outer end statement are always
not less than exec bit of the inner one.

This needs to be done before the RA to eliminate saved exec bits registers
but after register coalescer to have no vector registers copies in between
of different end cf statements.

Differential Revision: https://reviews.llvm.org/D35967

llvm-svn: 309762
2017-08-01 23:14:32 +00:00
Matt Arsenault 206f826348 AMDGPU: Fix handling of div_scale with undef inputs
The src0 register must match src1 or src2, but if these
were undefined they could end up using different implicit_defed
virtual registers. Force these to use one undef vreg or pick the
defined other register.

Also fixes producing invalid nodes without the right number of
inputs when src2 is undef.

llvm-svn: 309743
2017-08-01 20:49:41 +00:00
Matt Arsenault b62a4eb524 AMDGPU: Initial implementation of calls
Includes a hack to fix the type selected for
the GlobalAddress of the function, which will be
fixed by changing the default datalayout to use
generic pointers for 0.

llvm-svn: 309732
2017-08-01 19:54:18 +00:00
Davide Italiano ffb1098e92 [AMDGPU] Put a function used only inside assert() under NDEBUG.
llvm-svn: 309723
2017-08-01 19:07:20 +00:00
Tom Stellard 9d8337d857 AMDGPU/GlobalISel: Add support for amdgpu_vs calling convention
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35916

llvm-svn: 309675
2017-08-01 12:38:33 +00:00
Florian Hahn 6b3216aad8 Guard print() functions only used by dump() functions.
Summary:
Since  r293359, most dump() function are only defined when
`!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` holds. print() functions
only used by dump() functions are now unused in release builds,
generating lots of warnings. This patch only defines some print()
functions if they are used.

Reviewers: MatzeB

Reviewed By: MatzeB

Subscribers: arsenm, mzolotukhin, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D35949

llvm-svn: 309553
2017-07-31 10:07:49 +00:00
Tom Stellard 503fd446ad AMDGPU: Remove deadcode from AMDGPUInstPrinter
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D36034

llvm-svn: 309477
2017-07-29 03:56:53 +00:00
Tom Stellard 5c50cdf0e8 AMDGPU: Move INDIRECT_BASE_ADDR definition out of common files
Summary: This is only used by R600.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35926

llvm-svn: 309476
2017-07-29 03:44:07 +00:00
Matt Arsenault 9608a2891d AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flat
Checking the encoding is insufficient since now there can
be global or scratch instructions.

llvm-svn: 309472
2017-07-29 01:26:21 +00:00
Matt Arsenault dc8f5cc39c AMDGPU: Teach isLegalAddressingMode about global_* instructions
Also refine the flat check to respect flat-for-global feature,
and constant fallback should check global handling, not
specifically MUBUF.

llvm-svn: 309471
2017-07-29 01:12:31 +00:00
Matt Arsenault 4e309b0861 AMDGPU: Start selecting global instructions
llvm-svn: 309470
2017-07-29 01:03:53 +00:00
Matt Arsenault da9ab148f3 AMDGPU: Look through a bitcast user of an out argument
This allows handling of a lot more of the interesting
cases in Blender. Most of the large functions unlikely
to be inlined have this pattern.

This is a special case for what clang emits for OpenCL 3
element vectors. Annoyingly, these are emitted as
<3 x elt>* pointers, but accessed as <4 x elt>* operations.
This also needs to handle cases where a struct containing
a single vector is used.

llvm-svn: 309419
2017-07-28 19:06:16 +00:00
Matt Arsenault c06574ffc0 AMDGPU: Add pass to replace out arguments
It is better to return arguments directly in registers
if we are making a call rather than introducing expensive
stack usage. In one of sample compile from one of
Blender's many kernel variants, this fires on about
~20 different functions. Future improvements may be to
recognize simple cases where the pointer is indexing a small
array. This also fails when the store to the out argument
is in a separate block from the return, which happens in
a few of the Blender functions. This should also probably
be using MemorySSA which might help with that.

I'm not sure this is correct as a FunctionPass, but
MemoryDependenceAnalysis seems to not work with
a ModulePass.

I'm also not sure where it should run.I think it should
run  before DeadArgumentElimination, so maybe either
EP_CGSCCOptimizerLate or EP_ScalarOptimizerLate.

llvm-svn: 309416
2017-07-28 18:40:05 +00:00
Matt Arsenault 9166ce86e8 AMDGPU: Annotate implicitarg.ptr usage
We need to pass something to functions for this to work.
It isn't derivable just from the kernarg segment pointer
because the implicit arguments are placed after the
kernel arguments.

Also fixes missing test for the intrinsic.

llvm-svn: 309398
2017-07-28 15:52:08 +00:00
Stanislav Mekhanoshin 3197eb6981 [AMDGPU] Optimize SI_IF lowering for simple if regions
Currently SI_IF results in a s_and_saveexec_b64 followed by s_xor_b64.
The xor is used to extract only the changed bits. In case of a simple
if region where the only use of that value is in the SI_END_CF to
restore the old exec mask, we can omit the xor and perform an or of
the exec mask with the original exec value saved by the
s_and_saveexec_b64.

Differential Revision: https://reviews.llvm.org/D35861

llvm-svn: 309185
2017-07-26 21:29:15 +00:00
Wei Ding a126a13bb3 AMDGPU : Widen extending scalar loads to 32-bits.
Differential Revision: http://reviews.llvm.org/D35146

llvm-svn: 309178
2017-07-26 21:07:28 +00:00
Matt Arsenault 894e53d6ac AMDGPU: Fix using SMRD instructions for argument loads in functions
These are not actually uniform values except in kernels.

llvm-svn: 309172
2017-07-26 20:39:42 +00:00
Tom Stellard 55038cd1d3 AMDGPU/GlobalISel: Mark 32-bit G_OR as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D35127

llvm-svn: 309165
2017-07-26 20:00:53 +00:00
Zvi Rackover 1b73682243 TargetLowering: Change isShuffleMaskLegal's mask argument type to ArrayRef<int>. NFCI.
Changing mask argument type from const SmallVectorImpl<int>& to
ArrayRef<int>.

This came up in D35700 where a mask is received as an ArrayRef<int> and
we want to pass it to TargetLowering::isShuffleMaskLegal().
Also saves a few lines of code.

llvm-svn: 309085
2017-07-26 08:06:58 +00:00
Marek Olsak 6096f542d1 AMDGPU/SI: Fix Depth and Height computation for SI scheduler
Patch by: Axel Davy

Differential Revision: https://reviews.llvm.org/D34967

llvm-svn: 309028
2017-07-25 20:37:03 +00:00
Marek Olsak e6f74384b1 AMDGPU/SI: Force exports at the end for SI scheduler
Patch by: Axel Davy

Differential Revision: https://reviews.llvm.org/D34965

llvm-svn: 309027
2017-07-25 20:36:58 +00:00
Matt Arsenault 7052a6a505 AMDGPU: Fix allocating pseudo-registers
There's no need for these to be part of a class since
they are immediately replaced. New unreachable hit in
existing tests.'

llvm-svn: 308903
2017-07-24 18:06:15 +00:00
Matt Arsenault 416d755675 AMDGPU: Remove leftover td file
All of the instructions were moved out of this a while ago,
so it's just a useless comment now.

llvm-svn: 308815
2017-07-22 00:40:46 +00:00
Konstantin Zhuravlyov e9a5a77ee3 AMDGPU: Implement memory model
llvm-svn: 308781
2017-07-21 21:19:23 +00:00
Konstantin Zhuravlyov 070d88e335 AMDGPU: Introduce maybeAtomic instruction flag
Testing is in the follow up change

llvm-svn: 308779
2017-07-21 21:05:45 +00:00
Matt Arsenault f014d7cbde AMDGPU: Preserve undef flag in eliminateFrameIndex
Fixes verifier errors in some call tests.
Not sure why we haven't run into this before.

Test split into separate patch for once
call support is committed.

llvm-svn: 308774
2017-07-21 19:31:44 +00:00
Matt Arsenault 0ed39d329d AMDGPU: Partially fix improper reliance on memoperands
There are 2 more places doing this, but I'm not sure
what they are doing and don't make any sense to me

llvm-svn: 308770
2017-07-21 18:54:54 +00:00
Matt Arsenault 6ab9ea9614 AMDGPU: Don't track lgkmcnt for global_/scratch_ instructions
llvm-svn: 308766
2017-07-21 18:34:51 +00:00
Matt Arsenault 37a58e03c7 AMDGPU: Fix getMemOpBaseRegImmOfs for flat with offsets
llvm-svn: 308762
2017-07-21 18:06:36 +00:00
Matt Arsenault ca7b0a1777 AMDGPU: Add instruction definitions for some scratch_* instructions
Omit atomics for now since they probably aren't useful.

llvm-svn: 308747
2017-07-21 15:36:16 +00:00
Dmitry Preobrazhensky abf2839478 [AMDGPU][MC][GFX9] Added support of VOP3 'op_sel' modifier
See bug 33591: https://bugs.llvm.org//show_bug.cgi?id=33591

Reviewers: vpykhtin, artem.tamazov, SamWot, arsenm

Differential Revision: https://reviews.llvm.org/D35424

llvm-svn: 308740
2017-07-21 13:54:11 +00:00
Jonas Paulsson 024e319489 [SystemZ, LoopStrengthReduce]
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.

In order to achieve this, the following common code changes were made:

 * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
 LSR should do instruction-based addressing evaluations by calling
 isLegalAddressingMode() with the Instruction pointers.
 * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
 as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
 not just loads or stores.

SystemZ changes:

 * isLSRCostLess() implemented with Insns first, and without ImmCost.
 * New function supportedAddressingMode() that is a helper for TTI methods
 looking at Instructions passed via pointers.

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262
https://reviews.llvm.org/D35049

llvm-svn: 308729
2017-07-21 11:59:37 +00:00
Matt Arsenault e5456ce5e5 AMDGPU: Rename _RTN atomic instructions
Move the _RTN to the end of the name. It reads
better if the other addressing mode components
line up with the non-RTN version. It is also
more convenient to define saddr variants of
FLAT atomics to have the RTN last, and it is
good to have a consistent naming scheme.

llvm-svn: 308674
2017-07-20 21:06:04 +00:00
Matt Arsenault db78273b6e Add an ID field to StackObjects
On AMDGPU SGPR spills are really spilled to another register.
The spiller creates the spills to new frame index objects,
which is used as a placeholder.

This will eventually be replaced with a reference to a position
in a VGPR to write to and the frame index deleted. It is
most likely not a real stack location that can be shared
with another stack object.

This is a problem when StackSlotColoring decides it should
combine a frame index used for a normal VGPR spill with
a real stack location and a frame index used for an SGPR.

Add an ID field so that StackSlotColoring has a way
of knowing the different frame index types are
incompatible.

llvm-svn: 308673
2017-07-20 21:03:45 +00:00
Krzysztof Parzyszek f3a778d757 Implement LaneBitmask::getNumLanes and LaneBitmask::getHighestLane
This should eliminate most uses of countPopulation and Log2_32 on
the lane mask values.

llvm-svn: 308658
2017-07-20 19:43:19 +00:00
Krzysztof Parzyszek e9f0c1e031 Use LaneBitmask::getLane in a few more places
llvm-svn: 308655
2017-07-20 19:15:56 +00:00
Matt Arsenault c37fe66ec5 AMDGPU: Add encoding for carryless add/sub instructions
llvm-svn: 308639
2017-07-20 17:42:47 +00:00
Matt Arsenault f65c5ac9c9 AMDGPU: Add encodings for global atomics
llvm-svn: 308638
2017-07-20 17:31:56 +00:00
Matt Arsenault 04004716ff AMDGPU: Correct encoding for global instructions
The soffset field needs to be be set to 0x7f to disable it,
not 0. 0 is interpreted as an SGPR offset.

This should be enough to get basic usage of the global instructions
working. Technically it is possible to use an SGPR_32 offset,
but I'm not sure if it's correct with 64-bit pointers, but
that is not handled now. This should also be cleaned up
to be more similar to how different MUBUF modes are handled,
and to have InstrMappings between the different types.

llvm-svn: 308583
2017-07-20 05:17:54 +00:00
Matt Arsenault 254ad3de5c AMDGPU: Annotate necessity of flat-scratch-init
As an approximation of the existing handling to avoid
regressions. Fixes using too many registers with calls
on subtargets with the SGPR allocation bug.

llvm-svn: 308326
2017-07-18 16:44:58 +00:00
Matt Arsenault 1cc47f8413 AMDGPU: Figure out private memory regs after lowering
Introduce pseudo-registers for registers needed for stack
access, which are replaced during finalizeLowering.
Note these pseudo-registers are currently only used for the
used register location, and not for determining their
input argument register.

This is better because it avoids the need to try to predict
whether a call will be emitted from the IR, and also
detects stack objects introduced by legalization.

Test changes are from the HasStackObjects check being more
accurate since stack objects introduced during legalization
are now known.

llvm-svn: 308325
2017-07-18 16:44:56 +00:00
Nicolai Haehnle a253e4c028 AMDGPU: Fix crash when folding immediates into multiple uses
Summary:
When an immediate is folded by constant folding, we re-scan the entire
use list for two reasons:

1. The constant folding may have created a new use of the same reg.
2. The constant folding may have removed an additional use in the list
   we're currently traversing (e.g., constant folding an S_ADD_I32 c, c).

However, this could previously lead to a crash when an unrelated use was
added twice into the FoldList. Since we re-scan the whole list anyway, we
might as well just clear the FoldList again before we do so.

Using a MIR test to show this because real code seems to trigger the issue
only in connection with some really subtle control flow structures.

Fixes GL45-CTS.shading_language_420pack.binding_images on gfx9.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35416

llvm-svn: 308314
2017-07-18 14:54:41 +00:00
Sam Kolton 4685b70a77 [AMDGPU] resubmit r308179: CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions
llvm-svn: 308310
2017-07-18 14:23:26 +00:00
Dmitry Preobrazhensky 30fc523984 [AMDGPU][MC] Corrected disassembler for proper decoding of v_mqsad_u32_u8
See Bug 33639: https://bugs.llvm.org//show_bug.cgi?id=33639

Reviewers: vpykhtin, artem.tamazov

Differential Revision: https://reviews.llvm.org/D34892

llvm-svn: 308303
2017-07-18 13:12:48 +00:00
Dmitry Preobrazhensky 00deef8f00 [AMDGPU][MC] Optimized IsRegIntersect function
Optimized IsRegIntersect by using MCRegAliasIterator

See Bug 33800: https://bugs.llvm.org//show_bug.cgi?id=33800

Reviewers: arsenm, artem.tamazov

Differential Revision: https://reviews.llvm.org/D35452

llvm-svn: 308294
2017-07-18 11:14:02 +00:00
Dmitry Preobrazhensky 095ec3da81 [AMDGPU][MC] Added missing VOP3P opcodes
Added support of the following opcodes:
  v_pk_sub_u16
  v_pk_mad_i16
  v_pk_mad_u16

See Bug 33593: https://bugs.llvm.org//show_bug.cgi?id=33593

Reviewers: vpykhtin, artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D34890

llvm-svn: 308281
2017-07-18 09:24:10 +00:00
Chandler Carruth 9a7442d088 Revert r308179 which causes tablegen to spam stderr on every build.
Original commit log:
[AMDGPU] CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions

llvm-svn: 308270
2017-07-18 07:40:47 +00:00
Matt Arsenault e15855d9e3 AMDGPU: Annotate features from x work item/group IDs.
This wasn't necessary before since they are always enabled
for kernels, but this is necessary if they need to be
forwarded to a callable function.

llvm-svn: 308226
2017-07-17 22:35:50 +00:00
Sam Kolton a2b9e2f755 [AMDGPU] CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions
Summary:
Previously, CodeGen checked first src operand type to determine if omod is supported by instruction. This isn't correct for some instructions: e.g. V_CMP_EQ_F32 has floating-point src operands but desn't support omod.
Changed .td files to check if dst operand instead of src operand.

Reviewers: arsenm, vpykhtin

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D35350

llvm-svn: 308179
2017-07-17 14:23:38 +00:00
Konstantin Zhuravlyov 2ec725c9d8 AMDGPU: Fix amdgpu-flat-work-group-size/amdgpu-waves-per-eu check
Differential Revision: https://reviews.llvm.org/D35433

llvm-svn: 308147
2017-07-16 19:38:47 +00:00
Konstantin Zhuravlyov 163af2ed7a AMDGPU: Remove duplicate print outs from .AMDGPU.csdata
Differential Revision: https://reviews.llvm.org/D35428

llvm-svn: 308145
2017-07-16 19:24:08 +00:00
Hiroshi Inoue 7f46baff2c fix typos in comments; NFC
llvm-svn: 308127
2017-07-16 08:11:56 +00:00
Matt Arsenault b34635550a AMDGPU: Return correct type during argument lowering
The type needs to be casted back to the original argument type.
Fixes an assert that for some reason is only run when
using -debug.

Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.

t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t3: i16 = truncate t2
t5: i16 = AssertZext t3, ValueType:ch:i8
t6: i8 = truncate t5
t7: i32 = zero_extend t6
llvm-svn: 308082
2017-07-15 05:52:59 +00:00
Davide Italiano 6fdfede10d [AMDGPU] Throw away more dead code. NFCI.
llvm-svn: 308055
2017-07-14 21:20:29 +00:00
Davide Italiano 502ac724ac [AMDGPU] Garbage collect dead code. NFCI.
Unbreaks the build with GCC7.

llvm-svn: 308047
2017-07-14 18:47:29 +00:00
Alfred Huang 5b27072f57 [AMDGPU] Do not insert an instruction into worklist twice in movetovalu
In moveToVALU(), move to vector ALU is performed, all instrs in
the use chain will be visited. We do not want the same node to be
pushed to the visit worklist more than once.

Differential Revision: https://reviews.llvm.org/D34726

llvm-svn: 308039
2017-07-14 17:56:55 +00:00
Matt Arsenault 23e4df6a59 AMDGPU: Detect kernarg segment pointer
This is necessary to pass the kernarg segment pointer
to callee functions. Also don't unconditionally enable
for kernels.

llvm-svn: 307978
2017-07-14 00:11:13 +00:00
Stanislav Mekhanoshin dc2890a887 [AMDGPU] fcaninicalize optimization for GFX9+
Since GFX9 supports denorm modes for v_min_f32/v_max_f32 that
is possible to further optimize fcanonicalize and remove it
if applied to min/max given their operands are known not to be
an sNaN or that sNaNs are not supported.

Additionally we can remove fcanonicalize if denorms are supported
for the VT and we know that its argument is never a NaN.

Differential Revision: https://reviews.llvm.org/D35335

llvm-svn: 307976
2017-07-13 23:59:15 +00:00
Matt Arsenault 6b93046f29 AMDGPU: Annotate call graph with used features
Previously this wouldn't detect used features indirectly
used in callee functions.

llvm-svn: 307967
2017-07-13 21:43:42 +00:00
Hiroshi Inoue e9dea6e613 fix typos in comments and error messges; NFC
llvm-svn: 307885
2017-07-13 06:48:39 +00:00
Matt Arsenault ce34ac588e AMDGPU: Fix converting unanalyzable global loads to SMRD
Not all memory dependence queries succeed, so this needs to
be conservative if it fails.

llvm-svn: 307861
2017-07-12 23:06:18 +00:00
Stanislav Mekhanoshin 5680b0ca9f [AMDGPU] fcanonicalize elimination optimization
We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
That is possible to omit this multiplication if source of the
fcanonicalize instruction is known to be flushed/quieted, i.e.
if it comes from another instruction known to do the normalization
and we are using IEEE mode to quiet sNaNs.

Differential Revision: https://reviews.llvm.org/D35218

llvm-svn: 307848
2017-07-12 21:20:28 +00:00
Rafael Espindola 1beb702ba2 Fully fix the movw/movt addend.
The issue is not if the value is pcrel. It is whether we have a
relocation or not.

If we have a relocation, the static linker will select the upper
bits. If we don't have a relocation, we have to do it.

llvm-svn: 307730
2017-07-11 23:18:25 +00:00
Evandro Menezes 0cd23f5642 [CodeGen] Rename DEBUG_TYPE to match passnames
Rename missing DEBUG_TYPE "machine-scheduler" from backend files, which were
absent from https://reviews.llvm.org/rL303921.

Differential revision: https://reviews.llvm.org/D35231

llvm-svn: 307719
2017-07-11 22:08:28 +00:00