Commit Graph

1028 Commits

Author SHA1 Message Date
Duncan P. N. Exon Smith 4d29511894 AMDGPU: Remove implicit iterator conversions, NFC
Remove remaining implicit conversions from MachineInstrBundleIterator to
MachineInstr* from the AMDGPU backend.  In most cases, I made them less
attractive by preferring MachineInstr& or using a ranged-based for loop.

Once all the backends are fixed I'll make the operator explicit so that
this doesn't bitrot back.

llvm-svn: 274906
2016-07-08 19:16:05 +00:00
Duncan P. N. Exon Smith 221847ef63 AMDGPU: Make infinite loop clear, NFC
Change a while loop that was checking for nullptr on an
iterator-to-pointer conversion to an infinite for loop.  Now it's clear
that the condition doesn't terminate.

The only change in behaviour is if an invalid iterator (holding nullptr)
was passed into AMDGPUCFGStructurizer::reversePredicateSetter.  There
are only two callers, and they both dereference the iterator before
sending it in, so rather than adding an early return to avoid the loop
I've just asserted (using a static_cast, to avoid an implicit conversion
to pointer).

llvm-svn: 274902
2016-07-08 19:00:17 +00:00
Matt Arsenault b63f18c9c3 AMDGPU: Minor adjustment to r274817
The commit message is inaccurate, modifiesRegister
will check for partial defs of exec.

We currently don't ever emit partial defs of exec,
so it doesn't really matter.

llvm-svn: 274886
2016-07-08 17:06:48 +00:00
Valery Pykhtin 68853ab2c5 [AMDGPU] fix ds_swizzle_b32 opcode for VI (bz 28371)
Differential Revision: http://reviews.llvm.org/D22049

llvm-svn: 274852
2016-07-08 15:12:46 +00:00
Matt Arsenault a74374a86b AMDGPU: Move si_mask_branch register operand to be a use
llvm-svn: 274818
2016-07-08 00:55:44 +00:00
Matt Arsenault d4a84b1ed2 AMDGPU: Cleanup. Use definesRegister instead of manual loop
Also this will be more precise since it will check
exec_lo/exec_hi writes.

llvm-svn: 274817
2016-07-08 00:55:39 +00:00
Valery Pykhtin af8b1bddbd [AMDGPU] fix ds_write_src2 encoding (bz26027)
Differential revision: http://reviews.llvm.org/D22041

llvm-svn: 274756
2016-07-07 14:23:38 +00:00
Michael Kuperstein aa71bdd3af [TTI] The cost model should not assume vector casts get completely scalarized
The cost model should not assume vector casts get completely scalarized, since
on targets that have vector support, the common case is a partial split up to
the legal vector size. So, when a vector cast  gets split, the resulting casts
end up legal and cheap.

Instead of pessimistically assuming scalarization, base TTI can use the costs
the concrete TTI provides for the split vector, plus a fudge factor to account
for the cost of the split itself. This fudge factor is currently 1 by default,
except on AMDGPU where inserts and extracts are considered free.

Differential Revision: http://reviews.llvm.org/D21251

llvm-svn: 274642
2016-07-06 17:30:56 +00:00
Nicolai Haehnle e40530ea7b AMDGPU: Fix return of non-void-returning shaders
Summary:
Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that
ensures that a non-void-returning shader falls off the end of the last
basic block was effectively disabled, since SI_RETURN is now used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96731

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21975

llvm-svn: 274612
2016-07-06 08:35:17 +00:00
Matt Arsenault e8dbf791b1 AMDGPU: Remove unnecessary string usage in AsmPrinter
Registers are printed a lot, so don't create temporary
std::strings. Using char instead of a string to an ostream
saves a function call.

llvm-svn: 274581
2016-07-05 22:06:56 +00:00
Matt Arsenault ffc8275f2b AMDGPU: Fix folding SGPRs into madak/madmk src0
Because of the special immediate operand, the constant
bus is already used so SGPRs are never useful.

r263212 changed the name of the immediate operand, which
broke the verifier check for the restriction.

llvm-svn: 274564
2016-07-05 17:09:01 +00:00
Tom Stellard a4b746d808 AMDGPU/SI: Remove address space query functions from AMDGPUDAGToDAGISel
Summary:
These have been replaced with TableGen code (except for isConstantLoad,
which is still used for R600).  The queries were broken for cases
where MemOperand was a PseudoSourceValue.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21684

llvm-svn: 274561
2016-07-05 16:10:44 +00:00
Valery Pykhtin e65b39ec09 [AMDGPU] rename DS_1A1D_Off8_NORET to DS_1A2D_Off8_NORET as ds_write2xx use 2 source registers. NFC.
llvm-svn: 274556
2016-07-05 15:15:28 +00:00
Sam Kolton a9cd6aa895 [AMDGPU] Assembler: Fix parsing error with floating-point literals passed to integer instructions
Differential Revision: http://reviews.llvm.org/D21972

llvm-svn: 274551
2016-07-05 14:01:11 +00:00
Tom Stellard 4a105d73a9 AMDGPU/R600: Add PatFrags for selecting the correct vtx id for loads
This moves of the r600 logic out of isGlobalLoad() and into the
TableGen files.

Differential Revision: http://reviews.llvm.org/D21710

llvm-svn: 274527
2016-07-05 00:12:51 +00:00
Tom Stellard 17a0ec5400 AMDGPU/SI: Remove hack for selecting < 32-bit loads to MUBUF instructions
Summary:
The isGlobalLoad() query was returning true for constant address space loads
with memory types less than 32-bits, which is wrong.  This logic has been
replaced with PatFrag in the TableGen files, to provide the same functionality.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21696

llvm-svn: 274521
2016-07-04 20:41:48 +00:00
Jan Vesely 991dfd7b07 AMDGPU/R600: Add indentation to VTX and TEX fetch asm strings
These are printed as part of Fetch clauses.

Differential Revision: http://reviews.llvm.org/D21730

llvm-svn: 274517
2016-07-04 19:45:00 +00:00
Matt Arsenault 7f681ac7a9 AMDGPU: Add feature for unaligned access
llvm-svn: 274398
2016-07-01 23:03:44 +00:00
Matt Arsenault 8af47a09e5 AMDGPU: Expand unaligned accesses early
Due to visit order problems, in the case of an unaligned copy
the legalized DAG fails to eliminate extra instructions introduced
by the expansion of both unaligned parts.

llvm-svn: 274397
2016-07-01 22:55:55 +00:00
Matt Arsenault 327bb5ad82 AMDGPU: Improve load/store of illegal types.
There was a combine before to handle the simple copy case.
Split this into handling loads and stores separately.

We might want to change how this handles some of the vector
extloads, since this can result in large code size increases.

llvm-svn: 274394
2016-07-01 22:47:50 +00:00
Matt Arsenault 105c2a204c AMDGPU/SI: Enable testing several variants for si scheduler
Enable testing different scheduling variants if sgpr usage
is very high. It was previously disabled because of a bug
in handleMove, but it has been fixed since.

Patch by Axel Davy

llvm-svn: 274372
2016-07-01 18:03:46 +00:00
Nikolay Haustov beb24f5b20 Resubmit r268719 - AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2.
This was reverted in r268740 because of problems with corresponding Clang change.
Clang change was updated and resubmitted in r274220.

Check calling convention in AMDGPUMachineFunction::isKernel

This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF.

Also, in the future unused non-kernels may be optimized.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D19917

llvm-svn: 274341
2016-07-01 10:00:58 +00:00
Sam Kolton 5196b88f07 [AMDGPU] Assembler: support SDWA for VOPC instructions
Summary: dst_sel and dst_unused disabled for VOPC as they have no effect on result

Reviewers: artem.tamazov, tstellarAMD, vpykhtin

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21376

llvm-svn: 274340
2016-07-01 09:59:21 +00:00
NAKAMURA Takumi 566597330a Update libdeps; AMDGPUCodeGen requires LLVMVectorize.
llvm-svn: 274339
2016-07-01 09:55:23 +00:00
Matt Arsenault 908b9e26a6 AMDGPU: Add option to run the load/store vectorizer
llvm-svn: 274329
2016-07-01 03:33:52 +00:00
Matt Arsenault 0994bd57fb AMDGPU: Implement getLoadStoreVecRegBitWidth
llvm-svn: 274312
2016-07-01 00:56:27 +00:00
Duncan P. N. Exon Smith 632987296f Target: Remove unused arguments from overrideSchedPolicy, NFC
TargetSubtargetInfo::overrideSchedPolicy takes two MachineInstr*
arguments (begin and end) that invite implicit conversions from
MachineInstrBundleIterator.  One option would be to change their type to
an iterator, but since they don't seem to have been used since the API
was added in 2010, I'm deleting the dead code.

llvm-svn: 274304
2016-07-01 00:23:27 +00:00
Duncan P. N. Exon Smith e4f5e4f4d1 CodeGen: Use MachineInstr& in TargetLowering, NFC
This is a mechanical change to make TargetLowering API take MachineInstr&
(instead of MachineInstr*), since the argument is expected to be a valid
MachineInstr.  In one case, changed a parameter from MachineInstr* to
MachineBasicBlock::iterator, since it was used as an insertion point.

As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.

llvm-svn: 274287
2016-06-30 22:52:52 +00:00
Matt Arsenault c1142725bd AMDGPU: Add m0 vgpr load loop block as successor
This shows up as a verifier error when I move this
earlier, not sure why it didn't before.

llvm-svn: 274275
2016-06-30 20:49:28 +00:00
Rafael Espindola d86e8bb0ed Delete MCCodeGenInfo.
MC doesn't really care about CodeGen stuff, so this was just
complicating target initialization.

llvm-svn: 274258
2016-06-30 18:25:11 +00:00
Duncan P. N. Exon Smith 9cfc75c214 CodeGen: Use MachineInstr& in TargetInstrInfo, NFC
This is mostly a mechanical change to make TargetInstrInfo API take
MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator)
when the argument is expected to be a valid MachineInstr.  This is a
general API improvement.

Although it would be possible to do this one function at a time, that
would demand a quadratic amount of churn since many of these functions
call each other.  Instead I've done everything as a block and just
updated what was necessary.

This is mostly mechanical fixes: adding and removing `*` and `&`
operators.  The only non-mechanical change is to split
ARMBaseInstrInfo::getOperandLatencyImpl out from
ARMBaseInstrInfo::getOperandLatency.  Previously, the latter took a
`MachineInstr*` which it updated to the instruction bundle leader; now,
the latter calls the former either with the same `MachineInstr&` or the
bundle leader.

As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.

Note: I updated WebAssembly, Lanai, and AVR (despite being
off-by-default) since it turned out to be easy.  I couldn't run tests
for AVR since llc doesn't link with it turned on.

llvm-svn: 274189
2016-06-30 00:01:54 +00:00
Matt Arsenault eb9025d698 AMDGPU: Fix global isel crashes
llvm-svn: 274039
2016-06-28 17:42:09 +00:00
Matt Arsenault 254a6450dd AMDGPU: Fix typo
llvm-svn: 274034
2016-06-28 16:59:53 +00:00
Matt Arsenault 3d846501fb AMDGPU: Remove unused function
llvm-svn: 274033
2016-06-28 16:59:49 +00:00
Matt Arsenault b4d9503171 AMDGPU: Fix out of bounds indirect indexing errors
This was producing acceses to registers beyond the super
register's limits, resulting in verifier failures.

llvm-svn: 273977
2016-06-28 01:09:00 +00:00
Matt Arsenault 55dff27122 AMDGPU: Fix global isel build
llvm-svn: 273964
2016-06-28 00:11:26 +00:00
Matt Arsenault ed1fd7e056 AMDGPU: Set MinInstAlignment
Not sure this actually changes anything

llvm-svn: 273947
2016-06-27 21:42:49 +00:00
Matt Arsenault 59c0ffa22a AMDGPU: Implement per-function subtargets
llvm-svn: 273940
2016-06-27 20:48:03 +00:00
Matt Arsenault 03d8584590 AMDGPU: Move subtarget feature checks into passes
llvm-svn: 273937
2016-06-27 20:32:13 +00:00
Matt Arsenault 21a4625a16 AMDGPU: Fix verifier errors with undef vector indices
Also fix pointlessly adding exec to liveins.

llvm-svn: 273916
2016-06-27 19:57:44 +00:00
Simon Pilgrim 634dde36b4 Fix "not all control paths return a value" warning on MSVC
llvm-svn: 273872
2016-06-27 12:58:10 +00:00
NAKAMURA Takumi 5cbd41e0a8 SIMachineFunctionInfo.cpp: Appease msc18 to use std::array.
llvm-svn: 273860
2016-06-27 10:26:43 +00:00
NAKAMURA Takumi f619b50fab Reformat.
llvm-svn: 273859
2016-06-27 10:26:36 +00:00
NAKAMURA Takumi d377ad806e Reformat blank lines.
llvm-svn: 273858
2016-06-27 10:26:25 +00:00
Jan Vesely 3bc1af2be4 AMDGPU/R600: Fix GlobalValue regressions.
Don't cast GV expression to MCSymbolRefExpr. r272705 changed GV to binary
expressions by including offset even if the offset it 0
(we haven't hit this sooner since tested workloads don't include static offsets)
We don't really care about the type of expression, so set it directly.
Fixes: r272705

Consider section relative relocations. Since all const as data is in one boffer section relative is equivalent to abs32.
Fixes: r273166

Differential Revision: http://reviews.llvm.org/D21633

llvm-svn: 273785
2016-06-25 18:24:16 +00:00
Konstantin Zhuravlyov f2f3d14774 [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.

Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
  - offset 0: work group ID x
  - offset 4: work group ID y
  - offset 8: work group ID z
  - offset 16: work item ID x
  - offset 20: work item ID y
  - offset 24: work item ID z

Set
  - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
  - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
  - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled

Differential Revision: http://reviews.llvm.org/D20335

llvm-svn: 273769
2016-06-25 03:11:28 +00:00
Tom Stellard b164a9843b AMDGPU/SI: Make sure not to fold offsets into local address space globals
Summary:
Offset folding only works if you are emitting relocations, and we don't
emit relocations for local address space globals.

Reviewers: arsenm, nhaustov

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21647

llvm-svn: 273765
2016-06-25 01:59:16 +00:00
Matthias Braun 1e374a7aa6 AMDGPU: Define a schedule class for COPY.
COPY was lacking a scheduling class, define it to avoid regressions in
the upcoming change to the bidirectional MachineScheduler. Approved by
tstellar on IRC.

Differential Revision: http://reviews.llvm.org/D21540

llvm-svn: 273751
2016-06-24 23:52:11 +00:00
Matt Arsenault 86de486d31 AMDGPU: Add stub custom CodeGenPrepare pass
This will do various things including ones
CodeGenPrepare does, but with knowledge of uniform
values.

llvm-svn: 273657
2016-06-24 07:07:55 +00:00
Matt Arsenault c581611e11 AMDGPU: Remove disable-irstructurizer subtarget feature
The only real reason to use it is for testing, so replace
it with a command line option instead of a potentially function
dependent feature.

llvm-svn: 273653
2016-06-24 06:30:22 +00:00
Matt Arsenault 43e92fe306 AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.

llvm-svn: 273652
2016-06-24 06:30:11 +00:00
Tom Stellard 14416ae6cd Support/ELF: Add R_AMDGPU_GOTPCREL relocation
Summary:
We will start generating this in a future patch.

Reviewers: arsenm, kzhuravl, rafael, ruiu, tony-tye

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21482

llvm-svn: 273628
2016-06-23 23:11:29 +00:00
Matt Arsenault 8d4b0eddd6 AMDGPU: Add option to disable spilling SGPRs to VGPRs.
This can help debug spilling problems.

llvm-svn: 273605
2016-06-23 20:00:34 +00:00
Valery Pykhtin a852d695b8 [AMDGPU] Enable absolute expression initializer for amd_kernel_code_t fields.
Differential Revision: http://reviews.llvm.org/D21380

llvm-svn: 273561
2016-06-23 14:13:06 +00:00
Diana Picus e440f99913 [AMDGPU] Remove exit-on-error in test (PR27761)
The exit-on-error flag was necessary in order to avoid an assertion when
handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize.

We can avoid the assertion by creating some dummy nodes. This enables us to
remove the exit-on-error flag on the first 2 run lines (SI), but on the third
run line (R600) we would run into another assertion when trying to reserve
indirect registers. This patch also replaces that assertion with an early exit
from the function.

Fixes PR27761.

Differential Revision: http://reviews.llvm.org/D20852

llvm-svn: 273550
2016-06-23 09:19:16 +00:00
Matt Arsenault 529cf25e60 AMDGPU: readlane/writelane do not read exec
llvm-svn: 273525
2016-06-23 01:26:16 +00:00
Matt Arsenault 3cb4ddeb4e AMDGPU: Fix liveness when expanding m0 loop
llvm-svn: 273514
2016-06-22 23:40:57 +00:00
Changpeng Fang 47efe1f6db AMDGPU/SI: Define an intrinsic to expose ds_swizzle_b32
Reviewers: tstellarAMD, arsenm

Differential Revision: http://reviews.llvm.org/D21533

llvm-svn: 273496
2016-06-22 21:33:49 +00:00
Matt Arsenault 4a07bf6372 AMDGPU: Run verifier after 2nd run of SIShrinkInstructions
llvm-svn: 273469
2016-06-22 20:26:24 +00:00
Matt Arsenault 9babdf4265 AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.

Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.

llvm-svn: 273467
2016-06-22 20:15:28 +00:00
Matt Arsenault 0e5befe315 AMDGPU: Make FrameLowering stack alignment 16
We don't need it to be that high. The natural alignment
for a single workitem's stack is 16.

llvm-svn: 273448
2016-06-22 17:47:39 +00:00
Matt Arsenault 180e0d5cef AMDGPU: Fix gcc warnings
Mostly removing dead code. Apparently gcc's warning
for unused functions is better

llvm-svn: 273363
2016-06-22 01:53:49 +00:00
Rafael Espindola 7b4ef068c6 Delete more dead code.
Found by gcc 6.

llvm-svn: 273322
2016-06-21 21:51:41 +00:00
Jan Vesely fea814d531 AMDGPU: Add implicitarg.ptr intrinsic.
Points to the start of implicit arguments (appended after explicit arguments)

Differential Revision: http://reviews.llvm.org/D20297

llvm-svn: 273317
2016-06-21 20:46:20 +00:00
Rafael Espindola 48975881ab Delete some dead code.
Found by gcc 6.

llvm-svn: 273303
2016-06-21 19:48:12 +00:00
Matt Arsenault 2209625387 AMDGPU: Preserve undef flag on vcc when shrinking v_cndmask_b32
The implicit operand is added by the initial instruction construction,
so this was adding an additional vcc use. The original one
was missing the undef flag the original condition had,
so the verifier would complain.

llvm-svn: 273182
2016-06-20 18:34:00 +00:00
Matt Arsenault b6d8c37e1a AMDGPU: Fold more custom nodes to undef
This will help sneak undefs past GVN into the DAG for
some tests.

Also add missing intrinsic for rsq_legacy, even though the node
was already selected to the instruction. Also start passing
the debug location to intrinsic errors.

llvm-svn: 273181
2016-06-20 18:33:56 +00:00
Matt Arsenault ff98241f37 Generalize DiagnosticInfoStackSize to support other limits
Backends may want to report errors on resources other than
stack size.

llvm-svn: 273177
2016-06-20 18:13:04 +00:00
Matt Arsenault a9720c67f1 AMDGPU: Use correct method for determining instruction size
llvm-svn: 273172
2016-06-20 17:51:32 +00:00
Tom Stellard 5350894265 AMDGPU: Add support for R_AMDGPU_REL32 relocations
Reviewers: arsenm, kzhuravl, rafael

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21401

llvm-svn: 273168
2016-06-20 17:33:43 +00:00
Tom Stellard 1c89eb7db0 AMDGPU: Emit R_AMDGPU_ABS32_{HI,LO} for scratch buffer relocations
Reviewers: arsenm, rafael, kzhuravl

Subscribers: rafael, arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21400

llvm-svn: 273166
2016-06-20 16:59:44 +00:00
NAKAMURA Takumi fd92154b20 Reformat blank lines.
llvm-svn: 273131
2016-06-20 01:05:15 +00:00
NAKAMURA Takumi fe1202c4cb Untabify.
llvm-svn: 273129
2016-06-20 00:37:41 +00:00
Matt Arsenault e935f05a94 AMDGPU: Fix kernel argument alignment impacting stack size
Don't use AllocateStack because kernel arguments have nothing
to do with the stack. The ensureMaxAlignment call was still
changing the stack alignment.

llvm-svn: 273080
2016-06-18 05:15:53 +00:00
Matt Arsenault 0bb294b224 AMDGPU: Temporarily select trap to s_endpgm
This should select to s_trap, but that requires
additonal work to setup and enable the trap handler.
For now emit s_endpgm so bugpoint stops getting stuck
on the unsupported call to abort.

Emit a warning that this will only terminate the wave and
not really trap.

llvm-svn: 273062
2016-06-17 22:27:03 +00:00
Tom Stellard 0114bb5aa0 AMDGPU/SI: Simplify code in SITargetLowering::LowerGlobalAddress()
This change were suggested in http://reviews.llvm.org/D21154.

llvm-svn: 273059
2016-06-17 22:22:09 +00:00
Matt Arsenault 8885910f8e AMDGPU: Remove llvm.SI.tid intrinsic
Mesa doesn't emit this for llvm >= 3.8 anymore.

llvm-svn: 273050
2016-06-17 21:18:41 +00:00
Changpeng Fang 3e06e1edac AMDGPU/SI: Propagate the Kill flag in storeRegToStackSlot and eliminateFrameIndex
Reviewers: arsenm, tstellarAMD

Differential Revision:  http://reviews.llvm.org/21438

llvm-svn: 272958
2016-06-16 21:20:47 +00:00
Matt Arsenault 01e062f5c6 AMDGPU: Fix maximum instruction size for amdgcn
This was causing the conservative estimate of inline asm
size to be twice as big as expected.

llvm-svn: 272956
2016-06-16 21:14:05 +00:00
Wei Ding ab3d91b8f1 AMDGPU: Add v_mad 16-bit instructions definition.
Differential Revision: http://reviews.llvm.org/D21362

llvm-svn: 272919
2016-06-16 16:50:04 +00:00
Valery Pykhtin 02e2086e41 [AMDGPU] Fix few coding style issues. NFC.
llvm-svn: 272785
2016-06-15 13:55:09 +00:00
Nicolai Haehnle a609259832 AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics
Summary:
This fixes two related bugs. First, the generic optimization passes
unfortunately generate negative constant offsets but the hardware treats
SOffset as an unsigned value.

Second, there is a hardware bug on SI and CI, where address clamping in MUBUF
instructions does not work correctly when SOffset is larger than the buffer
size. This patch works around this bug by never using SOffset.

An alternative workaround would be to do the clamping manually when SOffset
is too large, but generating the required code sequence during instruction
selection would be rather involved, and in any case the resulting code would
probably be worse.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21326

llvm-svn: 272761
2016-06-15 07:13:05 +00:00
Tom Stellard 82785e9fe7 AMDGPU/SI: Correctly encode constant expressions
Summary:
We we have an MCConstantExpr, we can encode it directly into the instruction
instead of emitting fixups.

Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov, arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21236

Change-Id: I88b3edf288d48e65c5d705fc4850d281f8e36948
llvm-svn: 272750
2016-06-15 03:09:39 +00:00
Tom Stellard 89049702ce AMDGPU/AsmParser: Add support for parsing symbol operands
Summary:
We can now reference symbols directly in operands, like this:
s_mov_b32 s0, global

Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21038

llvm-svn: 272748
2016-06-15 02:54:14 +00:00
Matt Arsenault f42c69206d AMDGPU: Run pointer optimization passes
llvm-svn: 272736
2016-06-15 00:11:01 +00:00
Peter Collingbourne 96efdd6107 IR: Introduce local_unnamed_addr attribute.
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.

This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
  the normal rule that the global must have a unique address can be broken without
  being observable by the program by performing comparisons against the global's
  address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
  its own copy of the global if it requires one, and the copy in each linkage unit
  must be the same)
- It is a constant or a function (which means that the program cannot observe that
  the unique-address rule has been broken by writing to the global)

Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.

See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.

Part of the fix for PR27553.

Differential Revision: http://reviews.llvm.org/D20348

llvm-svn: 272709
2016-06-14 21:01:22 +00:00
Tom Stellard bf3e6e5bb4 AMDGPU/SI: Refactor fixup handling for constant addrspace variables
Summary:
We now use a standard fixup type applying the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4 byte offset to the global address rather than
doing it as part of the fixup.

This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.

Re-commit this after fixing a bug where we were trying to use a
reference to a Triple object that had already been destroyed.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21154

llvm-svn: 272705
2016-06-14 20:29:59 +00:00
Tom Stellard b1a523fa68 Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables"
This reverts commit r272675.

llvm-svn: 272677
2016-06-14 15:16:35 +00:00
Tom Stellard 5e6298b0f2 AMDGPU/SI: Refactor fixup handling for constant addrspace variables
Summary:
We now use a standard fixup type applying the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4 byte offset to the global address rather than
doing it as part of the fixup.

This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21154

llvm-svn: 272675
2016-06-14 15:11:01 +00:00
Artem Tamazov 17091364d1 [AMDGPU][llvm-mc] Predefined symbols to access -mcpu from the assembly source (.option.machine_version...)
The feature allows for conditional assembly etc.
TODO: make those symbols read-only.
Test added.

Differential Revision: http://reviews.llvm.org/D21238

llvm-svn: 272673
2016-06-14 15:03:59 +00:00
Marek Olsak e93f6d6923 AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1

This makes one particular compute shader 8x faster.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21136

llvm-svn: 272556
2016-06-13 16:05:57 +00:00
Matt Arsenault 80bc355048 AMDGPU: Fix post-RA verifier errors with trackLivenessAfterRegAlloc
The condition reg of the cndmask_b64 expansion can't be killed by
the first one, and the implicit super register implicit def is needed.

llvm-svn: 272554
2016-06-13 15:53:52 +00:00
Benjamin Kramer d3f4c05aea Move instances of std::function.
Or replace with llvm::function_ref if it's never stored. NFC intended.

llvm-svn: 272513
2016-06-12 16:13:55 +00:00
Benjamin Kramer bdc4956bac Pass DebugLoc and SDLoc by const ref.
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.

llvm-svn: 272512
2016-06-12 15:39:02 +00:00
Tom Stellard f3af841462 AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocations
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21153

llvm-svn: 272417
2016-06-10 19:26:38 +00:00
Sam Kolton 945231ada5 [AMDGPU] AsmParser: Support for sext() modifier in SDWA. Some code cleaning in AMDGPUOperand.
Summary:
sext() modifier is supported in SDWA instructions only for integer operands. Spec is unclear should integer operands support abs and neg modifiers with sext - for now they are not supported.
Renamed InputModsWithNoDefault to FloatInputMods. Added SextInputMods for operands that support sext() modifier.
Added AMDGPUOperand::Modifier struct to handle register and immediate modifiers.
Code cleaning in AMDGPUOperand class: organize method in groups (render-, predicate-methods...).

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D20968

llvm-svn: 272384
2016-06-10 09:57:59 +00:00
Matt Arsenault 37fefd68cc AMDGPU: Fix trailing whitespace
llvm-svn: 272364
2016-06-10 02:18:02 +00:00
Matt Arsenault 58ddad5bd6 AMDGPU: v_cndmask_b32 does not def vcc
Fixes verifier errors after SIShrinkInstructions.

llvm-svn: 272351
2016-06-10 00:18:41 +00:00
Tom Stellard 26a2ab7477 AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permute
Summary:
This fixes a bug with ds_*permute instructions where if it was passed a
constant address, then the offset operand would get assigned a register
operand instead of an immediate.

Reviewers: scchan, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19994

llvm-svn: 272349
2016-06-10 00:01:04 +00:00
Tom Stellard 1d3940e8cc AMDGPU/SI: Use common topological sort algorithm in SIScheduleDAGMI
Reviewers: arsenm, axeldavy

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19823

llvm-svn: 272346
2016-06-09 23:48:02 +00:00
Matt Arsenault 7757c59e48 AMDGPU: Fix flat atomics
The flat atomics could already be selected, but only
when using flat instructions for global memory. Add
patterns for flat addresses.

llvm-svn: 272345
2016-06-09 23:42:54 +00:00
Matt Arsenault 887018179a AMDGPU: Fix i64 global cmpxchg
This was using extract_subreg sub0 to extract the low register
of the result instead of sub0_sub1, producing an invalid copy.

There doesn't seem to be a way to use the compound subreg indices
in tablegen since those are generated, so manually select it.

llvm-svn: 272344
2016-06-09 23:42:48 +00:00
Matt Arsenault e2bd9a32bc AMDGPU: Run verifer after insert waits pass
llvm-svn: 272338
2016-06-09 23:19:14 +00:00
Matt Arsenault cfb61e780a AMDGPU: Remove incorrect assertion
I'm still not sure under what circumstances the offset here is non-0,
but private memory is not limited to 27-bits.

llvm-svn: 272337
2016-06-09 23:19:08 +00:00
Matt Arsenault c3a01ec9db AMDGPU: Properly initialize SIShrinkInstructions
llvm-svn: 272336
2016-06-09 23:18:47 +00:00
Wei Ding ed0f97fad2 AMDGPU/SI: Fix 32-bit fdiv lowering
We were using the fast fdiv lowering for all division, implementation of
IEEE754 fdiv is added.

http://reviews.llvm.org/D20557

llvm-svn: 272292
2016-06-09 19:17:15 +00:00
Sam Kolton c9bdcb75c4 [AMDGPU] Disassembler: Support for sdwa instructions
Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21129

llvm-svn: 272255
2016-06-09 11:04:45 +00:00
Nicolai Haehnle c00e03b8f5 AMDGPU: Add amdgpu-ps-wqm-outputs function attributes
Summary:
The presence of this attribute indicates that VGPR outputs should be computed
in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
that derivatives can be taken of shader inputs computed by the prolog, fixing
a bug.

The generated code could certainly be improved: if a prolog pixel shader is
used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
stipples, and forcing per-sample interpolation), Mesa will use this attribute
unconditionally, because it has to be conservative. So WQM may be used in the
prolog when it isn't really needed, and furthermore a silly back-and-forth
switch is likely to happen at the boundary between prolog and main shader
parts.

Fixing this is a bit involved: we'd first have to add a mechanism by which
LLVM writes the WQM-related input requirements to the main shader part binary,
and then Mesa specializes the prolog part accordingly. At that point, we may
as well just compile a monolithic shader...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D20839

llvm-svn: 272063
2016-06-07 21:37:17 +00:00
Eric Christopher 538d09d0dd Revert "Differential Revision: http://reviews.llvm.org/D20557"
Author: Wei Ding <wei.ding2@amd.com>
Date:   Tue Jun 7 19:04:44 2016 +0000

    Differential Revision: http://reviews.llvm.org/D20557

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044
    91177308-0d34-0410-b5e6-96231b3b80d8

as it was breaking the bots.

This reverts commit r272044.

llvm-svn: 272056
2016-06-07 20:27:12 +00:00
Wei Ding a70216f1b3 Differential Revision: http://reviews.llvm.org/D20557
llvm-svn: 272044
2016-06-07 19:04:44 +00:00
Matt Arsenault 02458c2d27 AMDGPU: Add function for getting instruction size
llvm-svn: 271936
2016-06-06 20:10:33 +00:00
Matt Arsenault 3b2e2a59e8 AMDGPU: Fix constantexpr addrspacecasts
If we had a constant group address space cast the queue pointer
wasn't enabled for the function, resulting in a crash on noreg
later.

llvm-svn: 271935
2016-06-06 20:03:31 +00:00
Artem Tamazov 135487767b [AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when src2 == VCC.
Another step for unification llvm assembler/disassembler with sp3.
Besides, CodeGen output is a bit improved, thus changes in CodeGen tests.
Assembler/Disassembler tests updated/added.

Differential Revision: http://reviews.llvm.org/D20796

llvm-svn: 271900
2016-06-06 15:23:43 +00:00
Artem Tamazov f88397c84c [test/AMDGPU] Square-braced-syntax for registers: add macro test/example.
Test added as per discussion in http://reviews.llvm.org/D20588.
The macro is just a demonstration, useless in practice.
Coding style fixes.

Differential Revision: http://reviews.llvm.org/D20797

llvm-svn: 271675
2016-06-03 14:41:17 +00:00
Sam Kolton a4a99ad1bc [AMDGPU] Assembler: More tests for SDWA instructions. Fix for SDWA float modifiers.
Summary: Depends on D20625

Reviewers: tstellarAMD, vpykhtin, artem.tamazov

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D20674

llvm-svn: 271662
2016-06-03 11:43:09 +00:00
Sam Kolton 05ef1c940f [AMDGPU] Assembler: Custom converters for SDWA instructions. Support for _dpp and _sdwa suffixes in mnemonics.
Summary:
Added custom converters for SDWA instruction to support optional operands and modifiers.
Support for _dpp and _sdwa suffixes that allows to force DPP or SDWA encoding for instructions.

Reviewers: tstellarAMD, vpykhtin, artem.tamazov

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D20625

llvm-svn: 271655
2016-06-03 10:27:37 +00:00
Matt Arsenault 43578ec2a8 AMDGPU: Handle flat in getMemOpBaseRegImmOfs
It can still report the base register, and the uses give up when it
fails.

llvm-svn: 271575
2016-06-02 20:05:20 +00:00
Matt Arsenault d1097a38e2 AMDGPU: Cleanup load tests
There are a lot of different kinds of loads to test for,
and these were scattered around inconsistently with
some redundancy. Try to comprehensively test all loads
in a consistent way.

llvm-svn: 271571
2016-06-02 19:54:26 +00:00
Matt Arsenault 52dec8d36a AMDGPU: Temporary fix for broken store combine
llvm-svn: 271567
2016-06-02 19:00:55 +00:00
Matt Arsenault 8e00194be8 AMDGPU: Fix crashes on unknown processor name
If the processor name failed to parse for amdgcn,
the resulting output would have R600 ISA in it.

If the processor name was missing or invalid for R600,
the wavefront size would not be set and there would be
crashes from missing itinerary data.

Fixes crashes in future commit caused by dividing by the unset/0
wavefront size.

llvm-svn: 271561
2016-06-02 18:37:16 +00:00
Matt Arsenault 598f55387a AMDGPU: Fix incorrectly setting kill flag when copying register tuples
This fixes some verifier errors when trackLivenessAfterRegAlloc is
enabled.

llvm-svn: 271446
2016-06-02 00:04:30 +00:00
Matt Arsenault d3e4c646ea AMDGPU: SIDebuggerInsertNops preserves CFG
This saves an additional run of the DominatorTree and
MachineLoopInfo

llvm-svn: 271444
2016-06-02 00:04:22 +00:00
Matt Arsenault ec30eb507e AMDGPU: Remove unused address space
Also return a single StringRef instead of building a string.

llvm-svn: 271296
2016-05-31 16:57:45 +00:00
Matt Arsenault d8d304d1d6 AMDGPU: Fix trailing whitespace
llvm-svn: 271081
2016-05-28 00:50:51 +00:00
Matt Arsenault 7401516985 AMDGPU: Add fract intrinsic
Remove broken patterns matching it. This was matching the
unsafe math pattern and expanding the fix for the buggy instruction
from the pattern. The problems are also on CI. Remove the workarounds
and only use fract with unsafe math or from the intrinsic.

llvm-svn: 271078
2016-05-28 00:19:52 +00:00
Artem Tamazov 7da9b82e02 [AMDGPU][llvm-mc] Square-braced-syntax for registers - make ":expr2" optional.
Register numbers may be specified as assembly-time expressions.
This feature can be useful in macros and alike. However, expressions
are supported within sqare braces only.

Sqare braces were initially intended to support specifying of multiple
(pairs/quads...) registers. Syntax like v[8:8] which specifies single register
is also supported. That allows expressions but looks a bit unnatural.

This change supports syntax REG[EXPR].
Tests added.

Differential Revision: http://reviews.llvm.org/D20588

llvm-svn: 270990
2016-05-27 12:50:13 +00:00
Benjamin Kramer 4fed928f53 Avoid some copies by using const references.
clang-tidy's performance-unnecessary-copy-initialization with some manual
fixes. No functional changes intended.

llvm-svn: 270988
2016-05-27 12:30:51 +00:00
Benjamin Kramer 3e9a5d3468 Apply clang-tidy's misc-static-assert where it makes sense.
Also fold conditions into assert(0) where it makes sense. No functional
change intended.

llvm-svn: 270982
2016-05-27 11:36:04 +00:00
Changpeng Fang 71369b3a39 AMDGPU/SI: Enable load-store-opt by default.
Summary: Enable load-store-opt by default, and update LIT tests.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D20694

llvm-svn: 270894
2016-05-26 19:35:29 +00:00
Artem Tamazov 6edc135d0f [AMDGPU][llvm-mc] s_getreg/setreg* - hwreg - factor out strings/literals etc.
Hwreg(...) syntax implementation unified with sendmsg(...).
Common strings moved to Utils
MathExtras.h functionality utilized.
Added missing build dependency in Disassembler.

Differential Revision: http://reviews.llvm.org/D20381

llvm-svn: 270871
2016-05-26 17:00:33 +00:00
Artem Tamazov b49c3361e5 Fix build warning introduced in r270552 "[AMDGPU][llvm-mc] Disassembler: support for TTMP/TBA/TMA registers."
llvm-svn: 270859
2016-05-26 15:52:16 +00:00
Diana Picus 81bc3170e8 [AMDGPU] Remove exit-on-error flag from test (PR27762)
Similar to r269948, but for argument lowering.

Fixes PR27762

Differential Revision: http://reviews.llvm.org/D20430

llvm-svn: 270856
2016-05-26 15:24:55 +00:00
Matt Arsenault e57206d81b AMDGPU: Fix v2i64/v2f64 bitcasts
These operations tend to get promoted away to v4i32 so
this doesn't happen often.

llvm-svn: 270740
2016-05-25 18:07:36 +00:00
Matt Arsenault 1cc4991412 AMDGPU: Fix inconsistent lowering of select of vectors
f32 vectors would use a sequence of BFI instructions instead
of unrolled cmp + select. This was better in the case of a VALU
select with SGPR inputs, but we don't have a way of dealing with that
in the DAG.

llvm-svn: 270731
2016-05-25 17:34:58 +00:00
Nirav Dave e003bb7ead Soften assertion in AMDGPU emitPrologue.
[AMDGPU] emitPrologue looks for an unused unallocated SGPR that is not
the scratch descriptor. Continue search if unused register found fails
other requirements.

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D20526

llvm-svn: 270646
2016-05-25 01:45:42 +00:00
Konstantin Zhuravlyov 29ddd2b2f2 [AMDGPU][NFC] Rename ReserveTrapVGPRs -> ReserveRegs
Differential Revision: http://reviews.llvm.org/D20081

llvm-svn: 270594
2016-05-24 18:37:18 +00:00
Sam Kolton 11de370cca [AMDGPU] Assembler: rework parsing of optional operands.
Summary:
Change process of parsing of optional operands. All optional operands use same parsing method - parseOptionalOperand().
No default values are added to OperandsVector.
Get rid of WORKAROUND_USE_DUMMY_OPERANDS_INSTEAD_MUTIPLE_DEFAULT_OPERANDS.

Reviewers: tstellarAMD, vpykhtin, artem.tamazov, nhaustov

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D20527

llvm-svn: 270556
2016-05-24 12:38:33 +00:00
Artem Tamazov 212a251c8d [AMDGPU][llvm-mc] Disassembler: support for TTMP/TBA/TMA registers.
Differential Revision: http://reviews.llvm.org/D20476

llvm-svn: 270552
2016-05-24 12:05:16 +00:00
Sam Kolton 1bdcef7697 [AMDGPU] Assembler: refactor parsing of modifiers and immediates. Allow modifiers for imms.
Reviewers: nhaustov, tstellarAMD

Subscribers: kzhuravl, arsenm

Differential Revision: http://reviews.llvm.org/D20166

llvm-svn: 270415
2016-05-23 09:59:02 +00:00
Matt Arsenault 7f9eabd2c2 AMDGPU: Define priorities for register classes
Allocating larger register classes first should give better allocation
results (and more importantly for myself, make the lit tests more stable
with respect to scheduler changes).

Patch by Matthias Braun

llvm-svn: 270312
2016-05-21 03:55:07 +00:00
Matt Arsenault 71e6676169 AMDGPU: Cleanup lowering actions
These are kind of a mess and hard to follow, particularly
for loads and stores. Fix various redundant, unnecessary
and dead settings.

llvm-svn: 270307
2016-05-21 02:27:49 +00:00
Matt Arsenault 81a709503d AMDGPU: Fix high bits after division optimization
This is essentially doing a 24-bit signed division with FP.
We need to truncate to the N bit result.

llvm-svn: 270305
2016-05-21 01:53:33 +00:00
Matt Arsenault b6e1cc2a92 AMDGPU: Fix verifier error when spilling SGPRs
The current SGPR spilling test does not stress this
because it is using s_buffer_load instructions to
increase SGPR pressure and spill, but their output
operands have the same SReg_32_XM0 constraint. This fixes
an error when the SReg_32 output from most instructions
is spilled.

llvm-svn: 270301
2016-05-21 00:53:42 +00:00
Matt Arsenault 8f5e008534 AMDGPU: Fix relationship between SReg_32 and SReg_32_XM0
llvm-svn: 270300
2016-05-21 00:53:28 +00:00
Matt Arsenault 4945905f5f AMDGPU: Handle cbranch vccz/vccnz
llvm-svn: 270297
2016-05-21 00:29:40 +00:00
Matt Arsenault 72fcd5f597 AMDGPU: Implement ReverseBranchCondition
llvm-svn: 270296
2016-05-21 00:29:34 +00:00
Matt Arsenault 6d09380532 AMDGPU: Implement AnalyzeBranch
Original patch by Tom Stellard

llvm-svn: 270295
2016-05-21 00:29:27 +00:00
Matt Arsenault 4e3d383c46 AMDGPU: Remove pointless conversions
llvm-svn: 270139
2016-05-19 21:09:58 +00:00
Matt Arsenault 4318ea354a AMDGPU: Also look for s_cbranch_vccz
llvm-svn: 270091
2016-05-19 18:20:25 +00:00
Artem Tamazov 8ce1f7177b [AMDGPU][llvm-mc] Fixes to support buffer atomics.
Fixes for MUBUF_Atomic instructions to make operand list valid:
 - For RTN insns, make a copy of $vdata_in operand as $vdata.
 - Do not add operand for GLC, it is hardcoded and comes as a token.
Workaround to avoid adding multiple default optional operands.
Tests added.

Differential Revision: http://reviews.llvm.org/D20257

llvm-svn: 270049
2016-05-19 12:22:39 +00:00
Matt Arsenault c5bebac934 AMDGPU: Fix verifier error when spilling undef subreg
llvm-svn: 270002
2016-05-18 23:35:53 +00:00
Matt Arsenault c438ef574d AMDGPU: Fix promote alloca for pointer loads
If the load has a pointer type, we don't want to change
its type.

llvm-svn: 270000
2016-05-18 23:20:24 +00:00
Rafael Espindola 8c34dd8257 Delete Reloc::Default.
Having an enum member named Default is quite confusing: Is it distinct
from the others?

This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been maped to the
default value, which is now clear has no be one of the remaining 3
options.

llvm-svn: 269988
2016-05-18 22:04:49 +00:00
Jan Vesely ae265c03f7 AMDGPU: Fix incorrect simm check
Use signed division otherwise all back jumps fail the check
Fixes regression introduced in r269951

Differential Revision: http://reviews.llvm.org/D20380

llvm-svn: 269972
2016-05-18 19:07:58 +00:00
Matt Arsenault a519cf593f AMDGPU: Error if branch distance exceeds limit
llvm-svn: 269951
2016-05-18 16:10:24 +00:00
Matt Arsenault 1735da460b AMDGPU: Other sizes of popcnt are fast
We can chain bcnt instructions together, so
any width popcnt is pretty fast.

llvm-svn: 269950
2016-05-18 16:10:19 +00:00
Matt Arsenault 9430b9113a AMDGPU: Fix assert when erroring on a call
For some reason an assert is now hit when a valid chain
is not returned, so return the entry chain.

llvm-svn: 269948
2016-05-18 16:10:11 +00:00
Matt Arsenault 891fccc0c1 AMDGPU: Handle alloca promoting with null operands
If the second pointer in a multi-pointer instruction is
a constant, we can replace the type.

llvm-svn: 269945
2016-05-18 15:57:21 +00:00
Matt Arsenault bde80346c1 AMDGPU: Don't run passes that aren't useful
llvm-svn: 269943
2016-05-18 15:41:07 +00:00
Matt Arsenault ab3429c2b4 AMDGPU: Fix assert on ttmp registers
Use register class that does not include them when looking
for unallocated registers.

This is hit by the udiv v8i64 test in the opencl integer
conformance test, and takes a few seconds to compile in
a debug build so no test included.

llvm-svn: 269938
2016-05-18 15:19:50 +00:00
Jan Vesely 687ca8df18 AMDGPU/R600: Use correct number of vector elements when lowering private loads
Reviewer: tstellardAMD, arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D20032

llvm-svn: 269725
2016-05-16 23:56:32 +00:00
Matt Arsenault 8a028bf4d7 AMDGPU: Fix promote alloca pass creating huge arrays
This was assuming it could use all memory before, which is
a bad decision because it restricts occupancy.

By default, only try to use enough space that could reduce
occupancy to 7, an arbitrarily chosen limit.

Based on the exist LDS usage, try to round up to the limit
in the current tier instead of further hurting occupancy.
This isn't ideal, because it doesn't accurately know how much
space is going to be used for alignment padding.

llvm-svn: 269708
2016-05-16 21:19:59 +00:00
Jan Vesely 91aacad9c3 AMDGPU: Unify LowerGlobalAddress
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19794

llvm-svn: 269481
2016-05-13 20:39:34 +00:00
Jan Vesely 1680039a7a AMDGPU/R600: Fold global address operand
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19793

llvm-svn: 269480
2016-05-13 20:39:31 +00:00
Jan Vesely f97de00745 AMDGPU/R600: Implement memory loads from constant AS
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19792

llvm-svn: 269479
2016-05-13 20:39:29 +00:00
Jan Vesely a1f9fdfcbc AMDGPU/R600: Add support for emitting MCExpr
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19791

llvm-svn: 269478
2016-05-13 20:39:26 +00:00
Jan Vesely 7971464da8 AMDGPU: Add support for MCExpr to instruction printer
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19790

llvm-svn: 269477
2016-05-13 20:39:24 +00:00
Jan Vesely 4368c1cf7e AMDGPU/R600: Use machine operands instead of ints to track literals
This will be used for global addresses

Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19789

llvm-svn: 269476
2016-05-13 20:39:22 +00:00
Jan Vesely fac8d7ecb1 AMDGPU/R600: There are other uses for ALU_LITERAL besides Imm
This will be used for GV

Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19788

llvm-svn: 269475
2016-05-13 20:39:20 +00:00
Jan Vesely fbcb754e66 AMDGPU: Make CONST_DATA_PTR available to R600
Rename to AMDGPUconstdata_ptr

Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19786

llvm-svn: 269474
2016-05-13 20:39:18 +00:00
Jan Vesely 81f1b30035 AMDGPU/EG,CM: Add instruction to read from constant AS (VTX2)
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19785

llvm-svn: 269473
2016-05-13 20:39:16 +00:00
Konstantin Zhuravlyov e3d322af57 [AMDGPU] Update nop insertion for debugger usage
- Insert one nop for each high level statement instead of two
- Do not insert nop before prologue

Differential Revision: http://reviews.llvm.org/D20215

llvm-svn: 269452
2016-05-13 18:21:28 +00:00
Matt Arsenault 999f7dd84c AMDGPU: Remove verifier check for scc live ins
We only really need this to be true for SIFixSGPRCopies.
I'm not sure there's any way this could happen before that point.

Fixes a case where MachineCSE could introduce a cross block
scc use.

llvm-svn: 269391
2016-05-13 04:15:48 +00:00
Justin Bogner 95927c0fd0 SDAG: Implement Select instead of SelectImpl in AMDGPUDAGToDAGISel
- Where we were returning a node before, call ReplaceNode instead.
- Where we would return null to fall back to another selector, rename
  the method to try* and return a bool for success.
- Where we were calling SelectNodeTo, just return afterwards.

Part of llvm.org/pr26808.

llvm-svn: 269349
2016-05-12 21:03:32 +00:00
Matt Arsenault 8300272823 AMDGPU: Fix getIntegerAttribute type and error message
llvm-svn: 269268
2016-05-12 02:45:18 +00:00
Matt Arsenault a61cb48dd2 AMDGPU: Fix breaking IR on instructions with multiple pointer operands
The promote alloca pass would attempt to promote an alloca with
a select, icmp, or phi user, even though the other operand was
from a non-promotable source, producing a select on two different
pointer types.

Only do this if we know that both operands derive from the same
alloca. In the future we should be able to relax this to an alloca
which will also be promoted.

llvm-svn: 269265
2016-05-12 01:58:58 +00:00
Matt Arsenault 4234542503 AMDGPU: Make some instructions convergent
llvm-svn: 269147
2016-05-11 00:32:31 +00:00
Matt Arsenault e8ed8e59e5 AMDGPU: Change private_element_size to 4
llvm-svn: 269145
2016-05-11 00:28:54 +00:00
Peter Collingbourne dba995601b Cloning: Clean up the interface to the CloneFunction function.
Remove the ModuleLevelChanges argument, and the ability to create new
subprograms for cloned functions. The latter was added without review in
r203662, but it has no in-tree clients (all non-test callers pass false
for ModuleLevelChanges [1], so it isn't reachable outside of tests). It
also isn't clear that adding a duplicate subprogram to the compile unit is
always the right thing to do when cloning a function within a module. If
this functionality comes back it should be accompanied with a more concrete
use case.

Furthermore, all in-tree clients add the returned function to the module.
Since that's pretty much the only sensible thing you can do with the function,
just do that in CloneFunction.

[1] http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/CloneFunction.cpp/rCloneFunction

Differential Revision: http://reviews.llvm.org/D18628

llvm-svn: 269110
2016-05-10 20:23:24 +00:00
Konstantin Zhuravlyov a791932145 [AMDGPU][NFC] Rename SIInsertNops -> SIDebuggerInsertNops
Differential Revision: http://reviews.llvm.org/D20117

llvm-svn: 269098
2016-05-10 18:33:41 +00:00
Matthias Braun 31d19d43c7 CodeGen: Move TargetPassConfig from Passes.h to an own header; NFC
Many files include Passes.h but only a fraction needs to know about the
TargetPassConfig class. Move it into an own header. Also rename
Passes.cpp to TargetPassConfig.cpp while we are at it.

llvm-svn: 269011
2016-05-10 03:21:59 +00:00
Simon Pilgrim 0a81921cdb Fixed unused but set variable warning
llvm-svn: 268931
2016-05-09 16:42:23 +00:00
Matt Arsenault a949dc619c AMDGPU: Fold shift into cvt_f32_ubyteN
llvm-svn: 268930
2016-05-09 16:29:50 +00:00
Artem Tamazov f0b6b40fa4 [AMDGPU][llvm-mc] Some refactoring of .td files
Some custom Operands and AsmOperandClasses moved to proper place.
No functional changes.

Differential Revision: http://reviews.llvm.org/D20012

llvm-svn: 268780
2016-05-06 19:32:38 +00:00
Artem Tamazov ebe71ce36a [AMDGPU][llvm-mc] Add support for sendmsg(...) syntax.
Added support for sendmsg(MSG[, OP[, STREAM_ID]]) syntax
in s_sendmsg and s_sendmsghalt instructions.
The syntax matches the SP3 assembler/disassembler rules.
That is why implicit inputs (like M0 and EXEC) are not printed
to disassembly output anymore.

sendmsg(...) allows only known message types and attributes,
even if literals are used instead of symbolic names.
However, raw literal (without "sendmsg") still can be used,
and that allows for any 16-bit value.

Tests updated/added.

Differential Revision: http://reviews.llvm.org/D19596

llvm-svn: 268762
2016-05-06 17:48:48 +00:00
Nikolay Haustov 6eb050ea4e Revert "AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2."
This reverts commit 47486d52454d60cdf6becc0b2efe533c73794380.

It broke calling OpenCL kernel from another kernel.

llvm-svn: 268739
2016-05-06 14:59:04 +00:00
Sam Kolton 5f10a137d0 [TableGen] AsmMatcher: support for default values for optional operands
Summary:
This change allows to specify "DefaultMethod" for optional operand (IsOptional = 1) in AsmOperandClass that return default value for operand. This is used in convertToMCInst to set default values in MCInst.
Previously if you wanted to set default value for operand you had to create custom converter method. With this change it is possible to use standard converters even when optional operands presented.

Reviewers: tstellarAMD, ab, craig.topper

Subscribers: jyknight, dsanders, arsenm, nhaustov, llvm-commits

Differential Revision: http://reviews.llvm.org/D18242

llvm-svn: 268726
2016-05-06 11:31:17 +00:00
Nikolay Haustov dc1bb79b92 AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2.
Summary:
    Check calling convention in AMDGPUMachineFunction::isKernel

    This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF.

    Also, in the future unused non-kernels may be optimized.

    Reviewers: tstellarAMD, arsenm

    Subscribers: arsenm, joker.eph, llvm-commits

    Differential Revision: http://reviews.llvm.org/D19917

llvm-svn: 268719
2016-05-06 09:23:13 +00:00
Justin Bogner b012699741 SDAG: Rename Select->SelectImpl and repurpose Select as returning void
This is a step towards removing the rampant undefined behaviour in
SelectionDAG, which is a part of llvm.org/PR26808.

We rename SelectionDAGISel::Select to SelectImpl and update targets to
match, and then change Select to return void and consolidate the
sketchy behaviour we're trying to get away from there.

Next, we'll update backends to implement `void Select(...)` instead of
SelectImpl and eventually drop the base Select implementation.

llvm-svn: 268693
2016-05-05 23:19:08 +00:00
Matt Arsenault 539ca882c6 AMDGPU: Simplify control flow / conditions
llvm-svn: 268676
2016-05-05 20:27:02 +00:00
Nicolai Haehnle ffbd56a1c9 AMDGPU: Uniform branch conditions can originate with intrinsics
Summary:
Discovered by Dave Airlie, fixes an assertion in Khronos OpenGL CTS
GL43-CTS.shader_storage_buffer_object.advanced-matrix.

In this particular case, the buffer load intrinsic fed into a uniform
conditional branch, and led the brcond lowering down the wrong path.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19931

llvm-svn: 268650
2016-05-05 17:36:36 +00:00
Tom Stellard fcfaea4cff AMDGPU/SI: Add support for AMD code object version 2.
Summary:
Version 2 is now the default.  If you want to emit version 1, use
the amdgcn--amdhsa-amdcov1 triple.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19283

llvm-svn: 268647
2016-05-05 17:03:33 +00:00
Jan Vesely bbc2231983 AMDGPU/R600: Minor cleanup in InstrInfo
Use std::make_pair instead of constructor
Use C++11 loop
Reuse helper var

Reviewers: tstellardAMD

Subsribers: arsenm

Differential Revision: http://reviews.llvm.org/D19787

llvm-svn: 268503
2016-05-04 14:55:45 +00:00
Tom Stellard 4a304b3886 AMDGPU/SI: Use range loops to simplify some code in the SI Scheduler
Reviewers: arsenm, axeldavy

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19822

llvm-svn: 268396
2016-05-03 16:30:56 +00:00
Aaron Ballman 3bd56b3b43 Silence unused variable warning; NFC.
llvm-svn: 268392
2016-05-03 15:17:25 +00:00
Matt Arsenault bcdfee7030 AMDGPU: Custom lower v2i32 loads and stores
This will allow us to split up 64-bit private accesses when
necessary.

llvm-svn: 268296
2016-05-02 20:13:51 +00:00
Tom Stellard 154c9cdd24 AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratch
We were using v_readlane_b32 with the lane set to zero, but this won't
work if thread 0 is not active.

Differential Revision: http://reviews.llvm.org/D19745

llvm-svn: 268295
2016-05-02 20:11:44 +00:00
Matt Arsenault 2b957b5a6f AMDGPU: Make i64 loads/stores promote to v2i32
Now that unaligned access expansion should not attempt
to produce i64 accesses, we can remove the hack in
PreprocessISelDAG where this is done.

This allows splitting i64 private accesses while
allowing the new add nodes indexing the vector components
can be folded with the base pointer arithmetic.

llvm-svn: 268293
2016-05-02 20:07:26 +00:00
Reid Kleckner 0549ab6033 Fix instance of -Winconsistent-missing-override in AMDGPU code
llvm-svn: 268289
2016-05-02 19:45:10 +00:00
Tom Stellard ce5e994887 AMDGPU/SI: Set the kill flag on temp VGPRs used to restore SGPRs from scratch
Summary:
When we restore an SGPR value from scratch, we first load it into a
temporary VGPR and then use v_readlane_b32 to copy the value from the
VGPR back into an SGPR.

We weren't setting the kill flag on the VGPR in the v_readlane_b32
instruction, so the register scavenger wasn't able to re-use this
temp value later.

I wasn't able to create a lit test for this.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19744

llvm-svn: 268287
2016-05-02 19:37:56 +00:00