Commit Graph

861 Commits

Author SHA1 Message Date
Matt Arsenault 7f681ac7a9 AMDGPU: Add feature for unaligned access
llvm-svn: 274398
2016-07-01 23:03:44 +00:00
Matt Arsenault 8af47a09e5 AMDGPU: Expand unaligned accesses early
Due to visit order problems, in the case of an unaligned copy
the legalized DAG fails to eliminate extra instructions introduced
by the expansion of both unaligned parts.

llvm-svn: 274397
2016-07-01 22:55:55 +00:00
Matt Arsenault 327bb5ad82 AMDGPU: Improve load/store of illegal types.
There was a combine before to handle the simple copy case.
Split this into handling loads and stores separately.

We might want to change how this handles some of the vector
extloads, since this can result in large code size increases.

llvm-svn: 274394
2016-07-01 22:47:50 +00:00
Matt Arsenault 105c2a204c AMDGPU/SI: Enable testing several variants for si scheduler
Enable testing different scheduling variants if sgpr usage
is very high. It was previously disabled because of a bug
in handleMove, but it has been fixed since.

Patch by Axel Davy

llvm-svn: 274372
2016-07-01 18:03:46 +00:00
Nikolay Haustov beb24f5b20 Resubmit r268719 - AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2.
This was reverted in r268740 because of problems with corresponding Clang change.
Clang change was updated and resubmitted in r274220.

Check calling convention in AMDGPUMachineFunction::isKernel

This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF.

Also, in the future unused non-kernels may be optimized.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D19917

llvm-svn: 274341
2016-07-01 10:00:58 +00:00
Sam Kolton 5196b88f07 [AMDGPU] Assembler: support SDWA for VOPC instructions
Summary: dst_sel and dst_unused disabled for VOPC as they have no effect on result

Reviewers: artem.tamazov, tstellarAMD, vpykhtin

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21376

llvm-svn: 274340
2016-07-01 09:59:21 +00:00
NAKAMURA Takumi 566597330a Update libdeps; AMDGPUCodeGen requires LLVMVectorize.
llvm-svn: 274339
2016-07-01 09:55:23 +00:00
Matt Arsenault 908b9e26a6 AMDGPU: Add option to run the load/store vectorizer
llvm-svn: 274329
2016-07-01 03:33:52 +00:00
Matt Arsenault 0994bd57fb AMDGPU: Implement getLoadStoreVecRegBitWidth
llvm-svn: 274312
2016-07-01 00:56:27 +00:00
Duncan P. N. Exon Smith 632987296f Target: Remove unused arguments from overrideSchedPolicy, NFC
TargetSubtargetInfo::overrideSchedPolicy takes two MachineInstr*
arguments (begin and end) that invite implicit conversions from
MachineInstrBundleIterator.  One option would be to change their type to
an iterator, but since they don't seem to have been used since the API
was added in 2010, I'm deleting the dead code.

llvm-svn: 274304
2016-07-01 00:23:27 +00:00
Duncan P. N. Exon Smith e4f5e4f4d1 CodeGen: Use MachineInstr& in TargetLowering, NFC
This is a mechanical change to make TargetLowering API take MachineInstr&
(instead of MachineInstr*), since the argument is expected to be a valid
MachineInstr.  In one case, changed a parameter from MachineInstr* to
MachineBasicBlock::iterator, since it was used as an insertion point.

As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.

llvm-svn: 274287
2016-06-30 22:52:52 +00:00
Matt Arsenault c1142725bd AMDGPU: Add m0 vgpr load loop block as successor
This shows up as a verifier error when I move this
earlier, not sure why it didn't before.

llvm-svn: 274275
2016-06-30 20:49:28 +00:00
Rafael Espindola d86e8bb0ed Delete MCCodeGenInfo.
MC doesn't really care about CodeGen stuff, so this was just
complicating target initialization.

llvm-svn: 274258
2016-06-30 18:25:11 +00:00
Duncan P. N. Exon Smith 9cfc75c214 CodeGen: Use MachineInstr& in TargetInstrInfo, NFC
This is mostly a mechanical change to make TargetInstrInfo API take
MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator)
when the argument is expected to be a valid MachineInstr.  This is a
general API improvement.

Although it would be possible to do this one function at a time, that
would demand a quadratic amount of churn since many of these functions
call each other.  Instead I've done everything as a block and just
updated what was necessary.

This is mostly mechanical fixes: adding and removing `*` and `&`
operators.  The only non-mechanical change is to split
ARMBaseInstrInfo::getOperandLatencyImpl out from
ARMBaseInstrInfo::getOperandLatency.  Previously, the latter took a
`MachineInstr*` which it updated to the instruction bundle leader; now,
the latter calls the former either with the same `MachineInstr&` or the
bundle leader.

As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.

Note: I updated WebAssembly, Lanai, and AVR (despite being
off-by-default) since it turned out to be easy.  I couldn't run tests
for AVR since llc doesn't link with it turned on.

llvm-svn: 274189
2016-06-30 00:01:54 +00:00
Matt Arsenault eb9025d698 AMDGPU: Fix global isel crashes
llvm-svn: 274039
2016-06-28 17:42:09 +00:00
Matt Arsenault 254a6450dd AMDGPU: Fix typo
llvm-svn: 274034
2016-06-28 16:59:53 +00:00
Matt Arsenault 3d846501fb AMDGPU: Remove unused function
llvm-svn: 274033
2016-06-28 16:59:49 +00:00
Matt Arsenault b4d9503171 AMDGPU: Fix out of bounds indirect indexing errors
This was producing acceses to registers beyond the super
register's limits, resulting in verifier failures.

llvm-svn: 273977
2016-06-28 01:09:00 +00:00
Matt Arsenault 55dff27122 AMDGPU: Fix global isel build
llvm-svn: 273964
2016-06-28 00:11:26 +00:00
Matt Arsenault ed1fd7e056 AMDGPU: Set MinInstAlignment
Not sure this actually changes anything

llvm-svn: 273947
2016-06-27 21:42:49 +00:00
Matt Arsenault 59c0ffa22a AMDGPU: Implement per-function subtargets
llvm-svn: 273940
2016-06-27 20:48:03 +00:00
Matt Arsenault 03d8584590 AMDGPU: Move subtarget feature checks into passes
llvm-svn: 273937
2016-06-27 20:32:13 +00:00
Matt Arsenault 21a4625a16 AMDGPU: Fix verifier errors with undef vector indices
Also fix pointlessly adding exec to liveins.

llvm-svn: 273916
2016-06-27 19:57:44 +00:00
Simon Pilgrim 634dde36b4 Fix "not all control paths return a value" warning on MSVC
llvm-svn: 273872
2016-06-27 12:58:10 +00:00
NAKAMURA Takumi 5cbd41e0a8 SIMachineFunctionInfo.cpp: Appease msc18 to use std::array.
llvm-svn: 273860
2016-06-27 10:26:43 +00:00
NAKAMURA Takumi f619b50fab Reformat.
llvm-svn: 273859
2016-06-27 10:26:36 +00:00
NAKAMURA Takumi d377ad806e Reformat blank lines.
llvm-svn: 273858
2016-06-27 10:26:25 +00:00
Jan Vesely 3bc1af2be4 AMDGPU/R600: Fix GlobalValue regressions.
Don't cast GV expression to MCSymbolRefExpr. r272705 changed GV to binary
expressions by including offset even if the offset it 0
(we haven't hit this sooner since tested workloads don't include static offsets)
We don't really care about the type of expression, so set it directly.
Fixes: r272705

Consider section relative relocations. Since all const as data is in one boffer section relative is equivalent to abs32.
Fixes: r273166

Differential Revision: http://reviews.llvm.org/D21633

llvm-svn: 273785
2016-06-25 18:24:16 +00:00
Konstantin Zhuravlyov f2f3d14774 [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.

Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
  - offset 0: work group ID x
  - offset 4: work group ID y
  - offset 8: work group ID z
  - offset 16: work item ID x
  - offset 20: work item ID y
  - offset 24: work item ID z

Set
  - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
  - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
  - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled

Differential Revision: http://reviews.llvm.org/D20335

llvm-svn: 273769
2016-06-25 03:11:28 +00:00
Tom Stellard b164a9843b AMDGPU/SI: Make sure not to fold offsets into local address space globals
Summary:
Offset folding only works if you are emitting relocations, and we don't
emit relocations for local address space globals.

Reviewers: arsenm, nhaustov

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21647

llvm-svn: 273765
2016-06-25 01:59:16 +00:00
Matthias Braun 1e374a7aa6 AMDGPU: Define a schedule class for COPY.
COPY was lacking a scheduling class, define it to avoid regressions in
the upcoming change to the bidirectional MachineScheduler. Approved by
tstellar on IRC.

Differential Revision: http://reviews.llvm.org/D21540

llvm-svn: 273751
2016-06-24 23:52:11 +00:00
Matt Arsenault 86de486d31 AMDGPU: Add stub custom CodeGenPrepare pass
This will do various things including ones
CodeGenPrepare does, but with knowledge of uniform
values.

llvm-svn: 273657
2016-06-24 07:07:55 +00:00
Matt Arsenault c581611e11 AMDGPU: Remove disable-irstructurizer subtarget feature
The only real reason to use it is for testing, so replace
it with a command line option instead of a potentially function
dependent feature.

llvm-svn: 273653
2016-06-24 06:30:22 +00:00
Matt Arsenault 43e92fe306 AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.

llvm-svn: 273652
2016-06-24 06:30:11 +00:00
Tom Stellard 14416ae6cd Support/ELF: Add R_AMDGPU_GOTPCREL relocation
Summary:
We will start generating this in a future patch.

Reviewers: arsenm, kzhuravl, rafael, ruiu, tony-tye

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21482

llvm-svn: 273628
2016-06-23 23:11:29 +00:00
Matt Arsenault 8d4b0eddd6 AMDGPU: Add option to disable spilling SGPRs to VGPRs.
This can help debug spilling problems.

llvm-svn: 273605
2016-06-23 20:00:34 +00:00
Valery Pykhtin a852d695b8 [AMDGPU] Enable absolute expression initializer for amd_kernel_code_t fields.
Differential Revision: http://reviews.llvm.org/D21380

llvm-svn: 273561
2016-06-23 14:13:06 +00:00
Diana Picus e440f99913 [AMDGPU] Remove exit-on-error in test (PR27761)
The exit-on-error flag was necessary in order to avoid an assertion when
handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize.

We can avoid the assertion by creating some dummy nodes. This enables us to
remove the exit-on-error flag on the first 2 run lines (SI), but on the third
run line (R600) we would run into another assertion when trying to reserve
indirect registers. This patch also replaces that assertion with an early exit
from the function.

Fixes PR27761.

Differential Revision: http://reviews.llvm.org/D20852

llvm-svn: 273550
2016-06-23 09:19:16 +00:00
Matt Arsenault 529cf25e60 AMDGPU: readlane/writelane do not read exec
llvm-svn: 273525
2016-06-23 01:26:16 +00:00
Matt Arsenault 3cb4ddeb4e AMDGPU: Fix liveness when expanding m0 loop
llvm-svn: 273514
2016-06-22 23:40:57 +00:00
Changpeng Fang 47efe1f6db AMDGPU/SI: Define an intrinsic to expose ds_swizzle_b32
Reviewers: tstellarAMD, arsenm

Differential Revision: http://reviews.llvm.org/D21533

llvm-svn: 273496
2016-06-22 21:33:49 +00:00
Matt Arsenault 4a07bf6372 AMDGPU: Run verifier after 2nd run of SIShrinkInstructions
llvm-svn: 273469
2016-06-22 20:26:24 +00:00
Matt Arsenault 9babdf4265 AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.

Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.

llvm-svn: 273467
2016-06-22 20:15:28 +00:00
Matt Arsenault 0e5befe315 AMDGPU: Make FrameLowering stack alignment 16
We don't need it to be that high. The natural alignment
for a single workitem's stack is 16.

llvm-svn: 273448
2016-06-22 17:47:39 +00:00
Matt Arsenault 180e0d5cef AMDGPU: Fix gcc warnings
Mostly removing dead code. Apparently gcc's warning
for unused functions is better

llvm-svn: 273363
2016-06-22 01:53:49 +00:00
Rafael Espindola 7b4ef068c6 Delete more dead code.
Found by gcc 6.

llvm-svn: 273322
2016-06-21 21:51:41 +00:00
Jan Vesely fea814d531 AMDGPU: Add implicitarg.ptr intrinsic.
Points to the start of implicit arguments (appended after explicit arguments)

Differential Revision: http://reviews.llvm.org/D20297

llvm-svn: 273317
2016-06-21 20:46:20 +00:00
Rafael Espindola 48975881ab Delete some dead code.
Found by gcc 6.

llvm-svn: 273303
2016-06-21 19:48:12 +00:00
Matt Arsenault 2209625387 AMDGPU: Preserve undef flag on vcc when shrinking v_cndmask_b32
The implicit operand is added by the initial instruction construction,
so this was adding an additional vcc use. The original one
was missing the undef flag the original condition had,
so the verifier would complain.

llvm-svn: 273182
2016-06-20 18:34:00 +00:00
Matt Arsenault b6d8c37e1a AMDGPU: Fold more custom nodes to undef
This will help sneak undefs past GVN into the DAG for
some tests.

Also add missing intrinsic for rsq_legacy, even though the node
was already selected to the instruction. Also start passing
the debug location to intrinsic errors.

llvm-svn: 273181
2016-06-20 18:33:56 +00:00