Commit Graph

147 Commits

Author SHA1 Message Date
David Blaikie 1be62f0327 Move TargetFrameLowering.h to CodeGen where it's implemented
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.

This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.

llvm-svn: 317379
2017-11-03 22:32:11 +00:00
Benjamin Kramer f9ab3ddb8f [AMDGPU] Clean up symbols in the global namespace.
llvm-svn: 317051
2017-10-31 23:21:30 +00:00
Matt Arsenault 28f52e51f1 AMDGPU: Add max-mix-insts subtarget feature
llvm-svn: 316553
2017-10-25 07:00:51 +00:00
Konstantin Zhuravlyov 339e74440a AMDGPU: Initialize WavefrontSize from TD files
Differential Revision: https://reviews.llvm.org/D39205

llvm-svn: 316389
2017-10-23 23:02:39 +00:00
Matt Arsenault b791802aef AMDGPU: Fix default range in non-kernel functions
The range should be assumed to be the hardware maximum
if a workitem intrinsic is used in a callable function
which does not know the restricted limit of the calling
kernel.

llvm-svn: 316346
2017-10-23 17:09:35 +00:00
Konstantin Zhuravlyov eda425edd4 AMDGPU: Do not emit deprecated notes for code object v3
Differential Revision: https://reviews.llvm.org/D38749

llvm-svn: 315810
2017-10-14 15:59:07 +00:00
Stanislav Mekhanoshin d4ae470d2e [AMDGPU] Prevent post-RA scheduler from breaking memory clauses
The pre-RA scheduler does load/store clustering, but post-RA
scheduler undoes it. Add mutation to prevent it.

Differential Revision: https://reviews.llvm.org/D38014

llvm-svn: 313670
2017-09-19 20:54:38 +00:00
Dmitry Preobrazhensky ff64aa514b [AMDGPU][MC][GFX9] Added integer clamping support for VOP3 opcodes
See Bug 34152: https://bugs.llvm.org//show_bug.cgi?id=34152

Reviewers: SamWot, artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D36674

llvm-svn: 311006
2017-08-16 13:51:56 +00:00
Quentin Colombet 61d71a138b Reapply "[GlobalISel] Remove the GISelAccessor API."
This reverts commit r310425, thus reapplying r310335 with a fix for link
issue of the AArch64 unittests on Linux bots when BUILD_SHARED_LIBS is ON.

Original commit message:
[GlobalISel] Remove the GISelAccessor API.

Its sole purpose was to avoid spreading around ifdefs related to
building global-isel. Since r309990, GlobalISel is not optional anymore,
thus, we can get rid of this mechanism all together.

NFC.

----
The fix for the link issue consists in adding the GlobalISel library in
the list of dependencies for the AArch64 unittests. This dependency
comes from the use of AArch64Subtarget that needs to know how
to destruct the GISel related APIs when being detroyed.

Thanks to Bill Seurer and Ahmed Bougacha for helping me reproducing and
understand the problem.

llvm-svn: 310969
2017-08-15 22:31:51 +00:00
Quentin Colombet 8dd90fb54b Revert "[GlobalISel] Remove the GISelAccessor API."
This reverts commit r310115.

It causes a linker failure for the one of the unittests of AArch64 on one
of the linux bot:
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/3429

: && /home/fedora/gcc/install/gcc-7.1.0/bin/g++   -fPIC
-fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W
-Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wno-missing-field-initializers -pedantic -Wno-long-long
-Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment
-ffunction-sections -fdata-sections -O2
-L/home/fedora/gcc/install/gcc-7.1.0/lib64 -Wl,-allow-shlib-undefined
-Wl,-O3 -Wl,--gc-sections
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o  -o
unittests/Target/AArch64/AArch64Tests
lib/libLLVMAArch64CodeGen.so.6.0.0svn lib/libLLVMAArch64Desc.so.6.0.0svn
lib/libLLVMAArch64Info.so.6.0.0svn lib/libLLVMCodeGen.so.6.0.0svn
lib/libLLVMCore.so.6.0.0svn lib/libLLVMMC.so.6.0.0svn
lib/libLLVMMIRParser.so.6.0.0svn lib/libLLVMSelectionDAG.so.6.0.0svn
lib/libLLVMTarget.so.6.0.0svn lib/libLLVMSupport.so.6.0.0svn -lpthread
lib/libgtest_main.so.6.0.0svn lib/libgtest.so.6.0.0svn -lpthread
-Wl,-rpath,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib
&& :
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x0):
undefined reference to `vtable for llvm::LegalizerInfo'
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x8):
undefined reference to `vtable for llvm::RegisterBankInfo'

The particularity of this bot is that it is built with
BUILD_SHARED_LIBS=ON

However, I was not able to reproduce the problem so far.
Reverting to unblock the bot.

llvm-svn: 310425
2017-08-08 22:22:30 +00:00
Matt Arsenault 8728c5f2db AMDGPU: Cleanup subtarget features
Try to avoid mutually exclusive features. Don't use
a real default GPU, and use a fake "generic". The goal
is to make it easier to see which set of features are
incompatible between feature strings.

Most of the test changes are due to random scheduling changes
from not having a default fullspeed model.

llvm-svn: 310258
2017-08-07 14:58:04 +00:00
Quentin Colombet c046208c52 [GlobalISel] Remove the GISelAccessor API.
Its sole purpose was to avoid spreading around ifdefs related to
building global-isel. Since r309990, GlobalISel is not optional anymore,
thus, we can get rid of this mechanism all together.

NFC.

llvm-svn: 310115
2017-08-04 20:15:46 +00:00
Quentin Colombet 250e050a50 [GlobalISel] Make GlobalISel a non-optional library.
With this change, the GlobalISel library gets always built. In
particular, this is not possible to opt GlobalISel out of the build
using the LLVM_BUILD_GLOBAL_ISEL variable any more.

llvm-svn: 309990
2017-08-03 21:52:25 +00:00
Matt Arsenault c37fe66ec5 AMDGPU: Add encoding for carryless add/sub instructions
llvm-svn: 308639
2017-07-20 17:42:47 +00:00
Konstantin Zhuravlyov 2ec725c9d8 AMDGPU: Fix amdgpu-flat-work-group-size/amdgpu-waves-per-eu check
Differential Revision: https://reviews.llvm.org/D35433

llvm-svn: 308147
2017-07-16 19:38:47 +00:00
Simon Pilgrim 0f5b35059d [AMDGPU] Fix -Wimplicit-fallthrough warnings. NFCI.
llvm-svn: 307381
2017-07-07 10:18:57 +00:00
Quentin Colombet f3f7d4d64b [AMDGPU] Move GISel accessor initialization from TargetMachine to Subtarget.
NFC

llvm-svn: 307186
2017-07-05 18:40:56 +00:00
Sam Kolton a179d25b99 [AMDGPU] SDWA: several fixes for V_CVT and VOPC instructions
Summary:
1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it.
2. There were several problems with support of VOPC instructions in SDWA peephole pass.

Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl

Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye

Differential Revision: https://reviews.llvm.org/D34626

llvm-svn: 306413
2017-06-27 15:02:23 +00:00
Sam Kolton 3c4933fcc6 [AMDGPU] SDWA: add support for GFX9 in peephole pass
Summary:
Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers.
Added several subtarget features for GFX9 SDWA.
This diff also contains changes from D34026.
Depends D34026

Reviewers: vpykhtin, rampitec, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34241

llvm-svn: 305986
2017-06-22 06:26:41 +00:00
Konstantin Zhuravlyov be6c0ca5e2 AMDGPU: Make auto waitcnt before barrier a feature
Differential Revision: https://reviews.llvm.org/D33793

llvm-svn: 304571
2017-06-02 17:40:26 +00:00
Matt Arsenault acdc7659cc AMDGPU: Add new subtarget features for gfx9 flat instructions
Flat instructions gain an immediate offset, and 2 new
sets of segment specific flat instructions are added.

llvm-svn: 302729
2017-05-10 21:19:05 +00:00
Stanislav Mekhanoshin c90347d760 [AMDGPU] Generate range metadata for workitem id
If workgroup size is known inform llvm about range returned by local
id  and local size queries.

Differential Revision: https://reviews.llvm.org/D31804

llvm-svn: 300102
2017-04-12 20:48:56 +00:00
Matt Arsenault 678e111e11 AMDGPU: Fix crash when disassembling VOP3 mac
The unused dummy src2_modifiers is missing, so it crashes
when trying to print it.

I tried to fully remove src2_modifiers, but there are some
irritations in the places where it is converted to mad since
it starts to require modifying use lists while iterating over
them.

llvm-svn: 299861
2017-04-10 17:58:06 +00:00
Yaxun Liu 1a14bfa022 [AMDGPU] Get address space mapping by target triple environment
As we introduced target triple environment amdgiz and amdgizcl, the address
space values are no longer enums. We have to decide the value by target triple.

The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which are not depend on target triple, use static
const members, so that they don't occupy extra memory space and is equivalent
to a compile time constant.

Since the struct is lightweight and cheap, it can be created on the fly at
the point of usage. Or it can be added as member to a pass and created at
the beginning of the run* function.

Differential Revision: https://reviews.llvm.org/D31284

llvm-svn: 298846
2017-03-27 14:04:01 +00:00
Matt Arsenault 9be7b0d485 AMDGPU: Add VOP3P instruction format
Add a few non-VOP3P but instructions related to packed.

Includes hack with dummy operands for the benefit of the assembler

llvm-svn: 296368
2017-02-27 18:49:11 +00:00
Matt Arsenault 2fdf2a1a18 AMDGPU: Redefine clamp node as clamp 0.0-1.0
Change implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use it whether or not they are enabled.

Also allow using clamp with f16, and use knowledge
of dx10_clamp.

llvm-svn: 295788
2017-02-21 23:35:48 +00:00
Matt Arsenault 2021f08080 AMDGPU: Fix assembler subtarget predicate for gfx9
This was accepting GFX9 instructions on VI.

llvm-svn: 295557
2017-02-18 19:12:26 +00:00
Matt Arsenault e823d92f7f AMDGPU: Merge initial gfx9 support
llvm-svn: 295554
2017-02-18 18:29:53 +00:00
Alexander Timofeev 9f61feac4a Revert "[AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track"
This reverts commit ce06d9cb99298eb844b66e117f5108a06747c907.

llvm-svn: 295054
2017-02-14 14:29:05 +00:00
Wei Ding 205bfdb3e9 AMDGPU : Add trap handler support.
Differential Revision: http://reviews.llvm.org/D26010

llvm-svn: 294692
2017-02-10 02:15:29 +00:00
Konstantin Zhuravlyov fd87137710 [AMDGPU] Calculate number of min/max SGPRs/VGPRs for WavesPerEU instead of using switch statement
Differential Revision: https://reviews.llvm.org/D29741

llvm-svn: 294627
2017-02-09 21:33:23 +00:00
Konstantin Zhuravlyov 9f89ede107 [AMDGPU] Add target information that is required by tools to metadata
Differential Revision: https://reviews.llvm.org/D28760#fb670e28

llvm-svn: 294449
2017-02-08 14:05:23 +00:00
Konstantin Zhuravlyov 27d64c3566 [AMDGPU][NFC] De-tabify
llvm-svn: 294445
2017-02-08 13:29:23 +00:00
Konstantin Zhuravlyov e03b1d7b6a [AMDGPU] Move register related queries to subtarget class
Differential Revision: https://reviews.llvm.org/D29318

llvm-svn: 294440
2017-02-08 13:02:33 +00:00
Alexander Timofeev a3dace3619 [AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track
lane masks.

	 Differential revision: https://reviews.llvm.org/D29442

llvm-svn: 294324
2017-02-07 17:57:48 +00:00
Stanislav Mekhanoshin 2b913b1f49 [AMDGPU] Account workgroup size in LDS occupancy limits
Functions matching LDS use to occupancy return results for a workgroup
of 64 workitems. The numbers has to be adjusted for bigger workgroups.
For example a workgroup of size 256 already occupies 4 waves just by
itself. Given that all numbers of LDS use in the compiler are per
workgroup, occupancy shall be multiplied by 4 in this case. Each 64
workitems still limited by the same number, but 4 subrgoups 64 workitems
each can afford 4 times more LDS to get the same occupancy.

In addition change initializes LDS size in the subtarget to a real value
for SI+ targets. This is required since LDS size is a variable in these
calculations.

Differential Revision: https://reviews.llvm.org/D29423

llvm-svn: 293837
2017-02-01 22:59:50 +00:00
Matt Arsenault d8f7ea381f AMDGPU: Enable FeatureFlatForGlobal on Volcanic Islands
Accomplishes what r292982 was supposed to, which ended up
only really making the necessary test changes.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 293310
2017-01-27 17:42:26 +00:00
Tom Stellard 2f3f9855f0 AMDGPU add support for spilling to a user sgpr pointed buffers
Summary:
This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1].

Patch By: Dave Airlie

Reviewers: nhaehnle, arsenm, tstellarAMD

Reviewed By: arsenm

Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D25428

llvm-svn: 293000
2017-01-25 01:25:13 +00:00
Matt Arsenault 7aad8fd8f4 Enable FeatureFlatForGlobal on Volcanic Islands
This switches to the workaround that HSA defaults to
for the mesa path.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 292982
2017-01-24 22:02:15 +00:00
Matt Arsenault a6867fd441 AMDGPU: Combine fp16/fp64 subtarget features
The same control register controls both, and are set to
the same defaults. Keep the old names around as aliases.

llvm-svn: 292837
2017-01-23 22:31:03 +00:00
Benjamin Kramer 11590b8281 Pacify -Wreorder.
llvm-svn: 292599
2017-01-20 10:37:53 +00:00
Sam Kolton 07dbde214b [AMDGPU] Add subtarget features for SDWA/DPP
Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28900

llvm-svn: 292596
2017-01-20 10:01:25 +00:00
Eugene Zelenko 6a9226d9b8 [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 289475
2016-12-12 22:23:53 +00:00
Alexander Timofeev 18009560c5 [AMDGPU] Scalarization of global uniform loads.
Summary:
LC can currently select scalar load for uniform memory access
basing on readonly memory address space only. This restriction
originated from the fact that in HW prior to VI vector and scalar caches
are not coherent. With MemoryDependenceAnalysis we can check that the
memory location corresponding to the memory operand of the LOAD is not
clobbered along the all paths from the function entry.

Reviewers: rampitec, tstellarAMD, arsenm

Subscribers: wdng, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D26917

llvm-svn: 289076
2016-12-08 17:28:47 +00:00
Konstantin Zhuravlyov f86e4b7266 [AMDGPU] Add f16 support (VI+)
Differential Revision: https://reviews.llvm.org/D25975

llvm-svn: 286753
2016-11-13 07:01:11 +00:00
Matt Arsenault c88ba36eab AMDGPU: Use 1/2pi inline imm on VI
I'm guessing at how it is supposed to be printed

llvm-svn: 285490
2016-10-29 04:05:06 +00:00
Matt Arsenault 4eae301995 AMDGPU: Diagnose using too many SGPRs
This is possible when using inline asm.

llvm-svn: 285447
2016-10-28 20:31:47 +00:00
Tom Stellard 64a9d0876c AMDGPU/SI: Don't allow unaligned scratch access
Summary: The hardware doesn't support this.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25523

llvm-svn: 284257
2016-10-14 18:10:39 +00:00
Matt Arsenault cc88ce36ed AMDGPU: Add instruction definitions for VGPR indexing
VI added a second method of indexing into VGPRs
besides using v_movrel*

llvm-svn: 284027
2016-10-12 18:00:51 +00:00
Tom Stellard e88bbc34c6 AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size
Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D24835

llvm-svn: 282223
2016-09-23 01:33:26 +00:00
Konstantin Zhuravlyov 1d65026ca6 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562

llvm-svn: 280747
2016-09-06 20:22:28 +00:00
Tom Stellard 0d23ebe888 AMDGPU/SI: Implement a custom MachineSchedStrategy
Summary:
GCNSchedStrategy re-uses most of GenericScheduler, it's just uses
a different method to compute the excess and critical register
pressure limits.

It's not enabled by default, to enable it you need to pass -misched=gcn
to llc.

Shader DB stats:

32464 shaders in 17874 tests
Totals:
SGPRS: 1542846 -> 1643125 (6.50 %)
VGPRS: 1005595 -> 904653 (-10.04 %)
Spilled SGPRs: 29929 -> 27745 (-7.30 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 36688188 -> 37034900 (0.95 %) bytes
LDS: 1913 -> 1913 (0.00 %) blocks
Max Waves: 254101 -> 265125 (4.34 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 1338220 -> 1438499 (7.49 %)
VGPRS: 886221 -> 785279 (-11.39 %)
Spilled SGPRs: 29869 -> 27685 (-7.31 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 34315716 -> 34662428 (1.01 %) bytes
LDS: 1551 -> 1551 (0.00 %) blocks
Max Waves: 188127 -> 199151 (5.86 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23688

llvm-svn: 279995
2016-08-29 19:42:52 +00:00
Matt Arsenault 56684d4538 AMDGPU: Fix crashes on memory functions
llvm-svn: 278369
2016-08-11 17:31:42 +00:00
Matt Arsenault cdae95bef2 AMDGPU: Delete dead code
llvm-svn: 276675
2016-07-25 19:06:25 +00:00
Matt Arsenault 7f681ac7a9 AMDGPU: Add feature for unaligned access
llvm-svn: 274398
2016-07-01 23:03:44 +00:00
Duncan P. N. Exon Smith 632987296f Target: Remove unused arguments from overrideSchedPolicy, NFC
TargetSubtargetInfo::overrideSchedPolicy takes two MachineInstr*
arguments (begin and end) that invite implicit conversions from
MachineInstrBundleIterator.  One option would be to change their type to
an iterator, but since they don't seem to have been used since the API
was added in 2010, I'm deleting the dead code.

llvm-svn: 274304
2016-07-01 00:23:27 +00:00
Matt Arsenault eb9025d698 AMDGPU: Fix global isel crashes
llvm-svn: 274039
2016-06-28 17:42:09 +00:00
Matt Arsenault 55dff27122 AMDGPU: Fix global isel build
llvm-svn: 273964
2016-06-28 00:11:26 +00:00
Matt Arsenault 59c0ffa22a AMDGPU: Implement per-function subtargets
llvm-svn: 273940
2016-06-27 20:48:03 +00:00
Matt Arsenault 03d8584590 AMDGPU: Move subtarget feature checks into passes
llvm-svn: 273937
2016-06-27 20:32:13 +00:00
Konstantin Zhuravlyov f2f3d14774 [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.

Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
  - offset 0: work group ID x
  - offset 4: work group ID y
  - offset 8: work group ID z
  - offset 16: work item ID x
  - offset 20: work item ID y
  - offset 24: work item ID z

Set
  - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
  - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
  - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled

Differential Revision: http://reviews.llvm.org/D20335

llvm-svn: 273769
2016-06-25 03:11:28 +00:00
Matt Arsenault c581611e11 AMDGPU: Remove disable-irstructurizer subtarget feature
The only real reason to use it is for testing, so replace
it with a command line option instead of a potentially function
dependent feature.

llvm-svn: 273653
2016-06-24 06:30:22 +00:00
Matt Arsenault 43e92fe306 AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.

llvm-svn: 273652
2016-06-24 06:30:11 +00:00
Matt Arsenault 0e5befe315 AMDGPU: Make FrameLowering stack alignment 16
We don't need it to be that high. The natural alignment
for a single workitem's stack is 16.

llvm-svn: 273448
2016-06-22 17:47:39 +00:00
Matt Arsenault 8e00194be8 AMDGPU: Fix crashes on unknown processor name
If the processor name failed to parse for amdgcn,
the resulting output would have R600 ISA in it.

If the processor name was missing or invalid for R600,
the wavefront size would not be set and there would be
crashes from missing itinerary data.

Fixes crashes in future commit caused by dividing by the unset/0
wavefront size.

llvm-svn: 271561
2016-06-02 18:37:16 +00:00
Changpeng Fang 71369b3a39 AMDGPU/SI: Enable load-store-opt by default.
Summary: Enable load-store-opt by default, and update LIT tests.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D20694

llvm-svn: 270894
2016-05-26 19:35:29 +00:00
Konstantin Zhuravlyov 29ddd2b2f2 [AMDGPU][NFC] Rename ReserveTrapVGPRs -> ReserveRegs
Differential Revision: http://reviews.llvm.org/D20081

llvm-svn: 270594
2016-05-24 18:37:18 +00:00
Matt Arsenault 8a028bf4d7 AMDGPU: Fix promote alloca pass creating huge arrays
This was assuming it could use all memory before, which is
a bad decision because it restricts occupancy.

By default, only try to use enough space that could reduce
occupancy to 7, an arbitrarily chosen limit.

Based on the exist LDS usage, try to round up to the limit
in the current tier instead of further hurting occupancy.
This isn't ideal, because it doesn't accurately know how much
space is going to be used for alignment padding.

llvm-svn: 269708
2016-05-16 21:19:59 +00:00
Matt Arsenault e8ed8e59e5 AMDGPU: Change private_element_size to 4
llvm-svn: 269145
2016-05-11 00:28:54 +00:00
Konstantin Zhuravlyov 1d99c4d03c [AMDGPU] Reserve VGPRs for trap handler usage if instructed
Differential Revision: http://reviews.llvm.org/D19235

llvm-svn: 267563
2016-04-26 15:43:14 +00:00
Konstantin Zhuravlyov 8c273ad719 [AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt
Also,
- Skip pass if machine module does not have debug info
- Minor comment changes
- Added test

Differential Revision: http://reviews.llvm.org/D19079

llvm-svn: 266626
2016-04-18 16:28:23 +00:00
Tom Stellard 000c5af3e6 AMDGPU: Add skeleton GlobalIsel implementation
Summary:
This adds the necessary target code to be able to run the ir translator.
Lowering function arguments and returns is a nop and there is no support
for RegBankSelect.

Reviewers: arsenm, qcolombet

Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits

Differential Revision: http://reviews.llvm.org/D19077

llvm-svn: 266356
2016-04-14 19:09:28 +00:00
Nicolai Haehnle df3a20cd80 AMDGPU: Add a shader calling convention
This makes it possible to distinguish between mesa shaders
and other kernels even in the presence of compute shaders.

Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Differential Revision: http://reviews.llvm.org/D18559

llvm-svn: 265589
2016-04-06 19:40:20 +00:00
Tom Stellard 0bc954e3bc AMDGPU/SI: Enable lanemask tracking in misched
Summary:
This results in higher register usage, but should make it easier for
the compiler to hide latency.

This pass is a prerequisite for some more scheduler improvements, and I
think the increase register usage with this patch is acceptable, because
when combined with the scheduler improvements, the total register usage
will decrease.

shader-db stats:

2382 shaders in 478 tests
Totals:
SGPRS: 48672 -> 49088 (0.85 %)
VGPRS: 34148 -> 34847 (2.05 %)
Code Size: 1285816 -> 1289128 (0.26 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 492544 -> 573440 (16.42 %) bytes per wave
Max Waves: 6856 -> 6846 (-0.15 %)
Wait states: 0 -> 0 (0.00 %)

Depends on D18451

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18452

llvm-svn: 264876
2016-03-30 16:35:09 +00:00
Matt Arsenault 3a61985b2f AMDGPU: More bits of frame index are known to be zero
The maximum private allocation for the whole GPU is 4G,
so the maximum possible index for a single workitem is the
maximum size divided by the smallest granularity for a dispatch.

This increases the number of known zero high bits, which
enables more offset folding. The maximum private size per
workitem with this is 128M but may be smaller still.

llvm-svn: 262153
2016-02-27 20:26:57 +00:00
Matt Arsenault 9d82ee7526 AMDGPU: Split vi-insts subtarget feature
This will be more useful for marking builtins acceptable for which
subtargets.

llvm-svn: 262121
2016-02-27 08:53:55 +00:00
Matt Arsenault 61738cbcb6 AMDGPU: Implement readcyclecounter
This matches the behavior of the HSAIL clock instruction.
s_realmemtime is used if the subtarget supports it, and falls
back to s_memtime if not.

Also introduces new intrinsics for each of s_memtime / s_memrealtime.

llvm-svn: 262119
2016-02-27 08:53:46 +00:00
Matt Arsenault 24ee0785dd AMDGPU: Set element_size in private resource descriptor
Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.

llvm-svn: 260651
2016-02-12 02:40:47 +00:00
Matt Arsenault f639c32739 AMDGPU: Match some med3 patterns
llvm-svn: 259089
2016-01-28 20:53:42 +00:00
Matt Arsenault b22828f2fb AMDGPU: Fix default device handling
When no device name is specified, default to kaveri
for HSA since SI is not supported and it woud fail.

Default to "tahiti" instead of "SI" since these are
effectively the same, and tahiti is an actual device.

Move default device handling to the TargetMachine
rather than the AMDGPUSubtarget. The module ISA version
is computed from the device name provided with the target
machine, so the attributes printed by the AsmPrinter were
inconsistent with those computed in the subtarget.

Also remove DevName field from subtarget since it's redundant
with getCPU() in the superclass.

llvm-svn: 258901
2016-01-27 02:17:49 +00:00
Matt Arsenault 2a93bb6365 AMDGPU: Remove Feature64BitPtr
This is a leftover from AMDIL that doesn't do anything
and doesn't belong here.

llvm-svn: 258606
2016-01-23 05:32:14 +00:00
Tom Stellard de008d338c AMDGPU/SI: Pass whether to use the SI scheduler via Target Attribute
Summary:
Currently the SI scheduler can be selected via command line option,
but it turned out it would be better if it was selectable via a Target Attribute.

This patch adds "si-scheduler" attribute to the backend.

Reviewers: tstellarAMD, echristo

Subscribers: echristo, arsenm

Differential Revision: http://reviews.llvm.org/D16192

llvm-svn: 258386
2016-01-21 04:28:34 +00:00
Matt Arsenault e83690c1cc AMDGPU: Add subtarget feature for instruction rates
llvm-svn: 258085
2016-01-18 21:13:50 +00:00
Nicolai Haehnle 5b50497617 AMDGPU: add +xnack feature
Summary:
Enabling this feature will account for the two SGPRs used by the hardware
to store the XNACK_MASK physically.

The hardware only requires this reservation when the XNACK feature is
explicitly enabled. At some point, HSA will probably want to do that, but
it does increase SGPR register pressure, so leave it disabled by default
for now (but do add a small test).

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15869

llvm-svn: 256794
2016-01-04 23:35:53 +00:00
Changpeng Fang b41574a961 AMDGPU/SI: Use flat for global load/store when targeting HSA
Summary:
  For some reason doing executing an MUBUF instruction with the addr64
  bit set and a zero base pointer in the resource descriptor causes
  the memory operation to be dropped when the shader is executed using
  the HSA runtime.

  This kind of MUBUF instruction is commonly used when the pointer is
  stored in VGPRs.  The base pointer field in the resource descriptor
  is set to zero and and the pointer is stored in the vaddr field.

  This patch resolves the issue by only using flat instructions for
  global memory operations when targeting HSA. This is an overly
  conservative fix as all other configurations of MUBUF instructions
  appear to work.

  NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll

Reviewers: tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15543

llvm-svn: 256282
2015-12-22 20:55:23 +00:00
Rafael Espindola 4b0d24c00a Revert "AMDGPU/SI: Use flat for global load/store when targeting HSA"
This reverts commit r256273.

It broke CodeGen/AMDGPU/llvm.dbg.value.ll

llvm-svn: 256275
2015-12-22 19:46:44 +00:00
Changpeng Fang 9b8a9be058 AMDGPU/SI: Use flat for global load/store when targeting HSA
Summary:
  For some reason doing executing an MUBUF instruction with the addr64
  bit set and a zero base pointer in the resource descriptor causes
  the memory operation to be dropped when the shader is executed using
  the HSA runtime.

  This kind of MUBUF instruction is commonly used when the pointer is
  stored in VGPRs.  The base pointer field in the resource descriptor
  is set to zero and and the pointer is stored in the vaddr field.

  This patch resolves the issue by only using flat instructions for
  global memory operations when targeting HSA. This is an overly
  conservative fix as all other configurations of MUBUF instructions
  appear to work.

Reviewers: tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15543

llvm-svn: 256273
2015-12-22 19:32:28 +00:00
Matt Arsenault f59e538937 AMDGPU: Cleanup includes
llvm-svn: 252328
2015-11-06 18:23:00 +00:00
Matt Arsenault 0c90e9501e AMDGPU: Create emergency stack slots during frame lowering
Test has a bogus verifier error which will be fixed by later commits.

llvm-svn: 252327
2015-11-06 18:17:45 +00:00
Daniel Sanders 50f17235dd Revert r247692: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Eric has replied and has demanded the patch be reverted.

llvm-svn: 247702
2015-09-15 16:17:27 +00:00
Daniel Sanders 153010c52d Re-commit r247683: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change. Thanks go to Pavel Labath for fixing LLDB for me.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247692
2015-09-15 14:08:28 +00:00
Daniel Sanders c40de48041 Revert r247684 - Replace Triple with a new TargetTuple ...
LLDB needs to be updated in the same commit.

llvm-svn: 247686
2015-09-15 13:46:21 +00:00
Daniel Sanders 18d4b0dab7 Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247683
2015-09-15 13:17:40 +00:00
Tom Stellard c98ee20328 AMDPGU/SI: Use AssertZext node to mask high bit for scratch offsets
Summary:
We can safely assume that the high bit of scratch offsets will never
be set, because this would require at least 128 GB of GPU memory.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11225

llvm-svn: 242433
2015-07-16 19:40:07 +00:00
Matt Arsenault 706f930b72 AMDGPU/SI: Add debugging subtarget feature for DS offsets
We don't have a good way to detect most situations where
DS offsets are usable on SI, so add an option to force using
them even if unsafe for debugging performance problems.

llvm-svn: 241462
2015-07-06 16:01:58 +00:00
Tom Stellard 347ac79b15 AMDGPU/SI: Add hsa code object directives
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10757

llvm-svn: 240831
2015-06-26 21:15:07 +00:00
Tom Stellard 45bb48ea19 R600 -> AMDGPU rename
llvm-svn: 239657
2015-06-13 03:28:10 +00:00