Commit Graph

870 Commits

Author SHA1 Message Date
Eric Christopher 538d09d0dd Revert "Differential Revision: http://reviews.llvm.org/D20557"
Author: Wei Ding <wei.ding2@amd.com>
Date:   Tue Jun 7 19:04:44 2016 +0000

    Differential Revision: http://reviews.llvm.org/D20557

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044
    91177308-0d34-0410-b5e6-96231b3b80d8

as it was breaking the bots.

This reverts commit r272044.

llvm-svn: 272056
2016-06-07 20:27:12 +00:00
Wei Ding a70216f1b3 Differential Revision: http://reviews.llvm.org/D20557
llvm-svn: 272044
2016-06-07 19:04:44 +00:00
Matt Arsenault 02458c2d27 AMDGPU: Add function for getting instruction size
llvm-svn: 271936
2016-06-06 20:10:33 +00:00
Matt Arsenault 3b2e2a59e8 AMDGPU: Fix constantexpr addrspacecasts
If we had a constant group address space cast the queue pointer
wasn't enabled for the function, resulting in a crash on noreg
later.

llvm-svn: 271935
2016-06-06 20:03:31 +00:00
Artem Tamazov 135487767b [AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when src2 == VCC.
Another step toward unifying the LLVM assembler/disassembler with SP3.
CodeGen output is also slightly improved, hence the changes in the CodeGen tests.
Assembler/Disassembler tests updated/added.

Differential Revision: http://reviews.llvm.org/D20796

llvm-svn: 271900
2016-06-06 15:23:43 +00:00
Artem Tamazov f88397c84c [test/AMDGPU] Square-braced-syntax for registers: add macro test/example.
Test added as per discussion in http://reviews.llvm.org/D20588.
The macro is just a demonstration, useless in practice.
Coding style fixes.

Differential Revision: http://reviews.llvm.org/D20797

llvm-svn: 271675
2016-06-03 14:41:17 +00:00
Sam Kolton a4a99ad1bc [AMDGPU] Assembler: More tests for SDWA instructions. Fix for SDWA float modifiers.
Summary: Depends on D20625

Reviewers: tstellarAMD, vpykhtin, artem.tamazov

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D20674

llvm-svn: 271662
2016-06-03 11:43:09 +00:00
Sam Kolton 05ef1c940f [AMDGPU] Assembler: Custom converters for SDWA instructions. Support for _dpp and _sdwa suffixes in mnemonics.
Summary:
Added custom converters for SDWA instructions to support optional operands and modifiers.
Support for _dpp and _sdwa suffixes, which allow forcing DPP or SDWA encoding for instructions.

Reviewers: tstellarAMD, vpykhtin, artem.tamazov

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D20625

llvm-svn: 271655
2016-06-03 10:27:37 +00:00
Matt Arsenault 43578ec2a8 AMDGPU: Handle flat in getMemOpBaseRegImmOfs
It can still report the base register, and the uses give up when it
fails.

llvm-svn: 271575
2016-06-02 20:05:20 +00:00
Matt Arsenault d1097a38e2 AMDGPU: Cleanup load tests
There are a lot of different kinds of loads to test for,
and these were scattered around inconsistently with
some redundancy. Try to comprehensively test all loads
in a consistent way.

llvm-svn: 271571
2016-06-02 19:54:26 +00:00
Matt Arsenault 52dec8d36a AMDGPU: Temporary fix for broken store combine
llvm-svn: 271567
2016-06-02 19:00:55 +00:00
Matt Arsenault 8e00194be8 AMDGPU: Fix crashes on unknown processor name
If the processor name failed to parse for amdgcn,
the resulting output would have R600 ISA in it.

If the processor name was missing or invalid for R600,
the wavefront size would not be set and there would be
crashes from missing itinerary data.

Fixes crashes in a future commit caused by dividing by the unset/0
wavefront size.

llvm-svn: 271561
2016-06-02 18:37:16 +00:00
Matt Arsenault 598f55387a AMDGPU: Fix incorrectly setting kill flag when copying register tuples
This fixes some verifier errors when trackLivenessAfterRegAlloc is
enabled.

llvm-svn: 271446
2016-06-02 00:04:30 +00:00
Matt Arsenault d3e4c646ea AMDGPU: SIDebuggerInsertNops preserves CFG
This saves an additional run of the DominatorTree and
MachineLoopInfo analyses.

llvm-svn: 271444
2016-06-02 00:04:22 +00:00
Matt Arsenault ec30eb507e AMDGPU: Remove unused address space
Also return a single StringRef instead of building a string.

llvm-svn: 271296
2016-05-31 16:57:45 +00:00
Matt Arsenault d8d304d1d6 AMDGPU: Fix trailing whitespace
llvm-svn: 271081
2016-05-28 00:50:51 +00:00
Matt Arsenault 7401516985 AMDGPU: Add fract intrinsic
Remove broken patterns matching it. This was matching the
unsafe math pattern and expanding the fix for the buggy instruction
from the pattern. The problems are also on CI. Remove the workarounds
and only use fract with unsafe math or from the intrinsic.

llvm-svn: 271078
2016-05-28 00:19:52 +00:00
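For reference, the value a fract operation is expected to produce can be modeled as below. This sketch uses the OpenCL definition of fract, x - floor(x) clamped to the largest float below 1.0; it is not the AMDGPU lowering, and `fractRef` is a hypothetical name. It shows why a clamp is needed at all: for a tiny negative input, x - floor(x) rounds up to exactly 1.0f.

```cpp
#include <cmath>

// Reference model of fract(x), OpenCL style: x - floor(x), clamped so
// the result stays strictly below 1.0. Non-finite inputs (NaN, inf)
// are not handled in this sketch.
float fractRef(float X) {
  float F = X - std::floor(X);
  // For X = -1e-8f, floor(X) is -1.0f and 1.0f - 1e-8f rounds to 1.0f,
  // so clamp to the largest float below 1.0 (0x1.fffffep-1).
  return std::fmin(F, std::nextafter(1.0f, 0.0f));
}
```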
Artem Tamazov 7da9b82e02 [AMDGPU][llvm-mc] Square-braced-syntax for registers - make ":expr2" optional.
Register numbers may be specified as assembly-time expressions.
This feature can be useful in macros and the like. However, expressions
are supported within square braces only.

Square braces were initially intended to support specifying multiple
(pairs/quads...) registers. Syntax like v[8:8], which specifies a single register,
is also supported. That allows expressions but looks a bit unnatural.

This change supports syntax REG[EXPR].
Tests added.

Differential Revision: http://reviews.llvm.org/D20588

llvm-svn: 270990
2016-05-27 12:50:13 +00:00
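The effect of making ":expr2" optional can be shown with a toy parser in which v[N] is read the same way as v[N:N]. This is a minimal sketch and not the llvm-mc parser. It accepts only plain decimal literals, whereas the real assembler also accepts assembly-time expressions inside the braces. `RegRange` and `parseVRegRange` are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical inclusive range of register indices.
struct RegRange {
  unsigned First;
  unsigned Last;
};

// Parses "v[N]" or "v[N:M]" with decimal literals only.
std::optional<RegRange> parseVRegRange(const std::string &S) {
  if (S.size() < 4 || S[0] != 'v' || S[1] != '[' || S.back() != ']')
    return std::nullopt;
  std::string Body = S.substr(2, S.size() - 3); // text between the braces
  std::size_t Colon = Body.find(':');
  auto ParseNum = [](const std::string &T) -> std::optional<unsigned> {
    if (T.empty())
      return std::nullopt;
    unsigned V = 0;
    for (char C : T) {
      if (C < '0' || C > '9')
        return std::nullopt;
      V = V * 10 + unsigned(C - '0');
    }
    return V;
  };
  std::optional<unsigned> First = ParseNum(Body.substr(0, Colon));
  if (!First)
    return std::nullopt;
  // ":expr2" is optional, so v[8] means the same as v[8:8].
  std::optional<unsigned> Last = First;
  if (Colon != std::string::npos)
    Last = ParseNum(Body.substr(Colon + 1));
  if (!Last || *Last < *First)
    return std::nullopt;
  return RegRange{*First, *Last};
}
```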
Benjamin Kramer 4fed928f53 Avoid some copies by using const references.
clang-tidy's performance-unnecessary-copy-initialization with some manual
fixes. No functional changes intended.

llvm-svn: 270988
2016-05-27 12:30:51 +00:00
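The kind of rewrite that performance-unnecessary-copy-initialization suggests is shown below in a small illustration. A local variable initialized from a const-reference getter and only read afterward can bind a const reference instead of copying. `Module` and the two functions are hypothetical stand-ins, not code from the patch.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class that exposes its data through a const-reference getter.
class Module {
  std::vector<std::string> Names;

public:
  explicit Module(std::vector<std::string> N) : Names(std::move(N)) {}
  const std::vector<std::string> &getNames() const { return Names; }
};

// Before: copies the whole vector even though it is only read.
std::size_t countBefore(const Module &M) {
  const std::vector<std::string> Names = M.getNames(); // full copy
  return Names.size();
}

// After: binds a const reference to the existing vector, with no copy.
std::size_t countAfter(const Module &M) {
  const std::vector<std::string> &Names = M.getNames();
  return Names.size();
}
```

The reference version is only safe because `M` outlives `Names`. This is the kind of condition behind the "some manual fixes" noted in the commit.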
Benjamin Kramer 3e9a5d3468 Apply clang-tidy's misc-static-assert where it makes sense.
Also fold conditions into assert(0) where it makes sense. No functional
change intended.

llvm-svn: 270982
2016-05-27 11:36:04 +00:00
Changpeng Fang 71369b3a39 AMDGPU/SI: Enable load-store-opt by default.
Summary: Enable load-store-opt by default, and update LIT tests.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D20694

llvm-svn: 270894
2016-05-26 19:35:29 +00:00
Artem Tamazov 6edc135d0f [AMDGPU][llvm-mc] s_getreg/setreg* - hwreg - factor out strings/literals etc.
Hwreg(...) syntax implementation unified with sendmsg(...).
Common strings moved to Utils
MathExtras.h functionality utilized.
Added missing build dependency in Disassembler.

Differential Revision: http://reviews.llvm.org/D20381

llvm-svn: 270871
2016-05-26 17:00:33 +00:00
Artem Tamazov b49c3361e5 Fix build warning introduced in r270552 "[AMDGPU][llvm-mc] Disassembler: support for TTMP/TBA/TMA registers."
llvm-svn: 270859
2016-05-26 15:52:16 +00:00
Diana Picus 81bc3170e8 [AMDGPU] Remove exit-on-error flag from test (PR27762)
Similar to r269948, but for argument lowering.

Fixes PR27762

Differential Revision: http://reviews.llvm.org/D20430

llvm-svn: 270856
2016-05-26 15:24:55 +00:00
Matt Arsenault e57206d81b AMDGPU: Fix v2i64/v2f64 bitcasts
These operations tend to get promoted away to v4i32 so
this doesn't happen often.

llvm-svn: 270740
2016-05-25 18:07:36 +00:00
Matt Arsenault 1cc4991412 AMDGPU: Fix inconsistent lowering of select of vectors
f32 vectors would use a sequence of BFI instructions instead
of unrolled cmp + select. This was better in the case of a VALU
select with SGPR inputs, but we don't have a way of dealing with that
in the DAG.

llvm-svn: 270731
2016-05-25 17:34:58 +00:00
Nirav Dave e003bb7ead Soften assertion in AMDGPU emitPrologue.
[AMDGPU] emitPrologue looks for an unused, unallocated SGPR that is not
the scratch descriptor. Continue the search if the unused register found
fails other requirements.

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D20526

llvm-svn: 270646
2016-05-25 01:45:42 +00:00
Konstantin Zhuravlyov 29ddd2b2f2 [AMDGPU][NFC] Rename ReserveTrapVGPRs -> ReserveRegs
Differential Revision: http://reviews.llvm.org/D20081

llvm-svn: 270594
2016-05-24 18:37:18 +00:00
Sam Kolton 11de370cca [AMDGPU] Assembler: rework parsing of optional operands.
Summary:
Change the process of parsing optional operands. All optional operands use the same parsing method, parseOptionalOperand().
No default values are added to OperandsVector.
Get rid of WORKAROUND_USE_DUMMY_OPERANDS_INSTEAD_MUTIPLE_DEFAULT_OPERANDS.

Reviewers: tstellarAMD, vpykhtin, artem.tamazov, nhaustov

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D20527

llvm-svn: 270556
2016-05-24 12:38:33 +00:00
Artem Tamazov 212a251c8d [AMDGPU][llvm-mc] Disassembler: support for TTMP/TBA/TMA registers.
Differential Revision: http://reviews.llvm.org/D20476

llvm-svn: 270552
2016-05-24 12:05:16 +00:00
Sam Kolton 1bdcef7697 [AMDGPU] Assembler: refactor parsing of modifiers and immediates. Allow modifiers for imms.
Reviewers: nhaustov, tstellarAMD

Subscribers: kzhuravl, arsenm

Differential Revision: http://reviews.llvm.org/D20166

llvm-svn: 270415
2016-05-23 09:59:02 +00:00
Matt Arsenault 7f9eabd2c2 AMDGPU: Define priorities for register classes
Allocating larger register classes first should give better allocation
results (and more importantly for myself, make the lit tests more stable
with respect to scheduler changes).

Patch by Matthias Braun

llvm-svn: 270312
2016-05-21 03:55:07 +00:00
Matt Arsenault 71e6676169 AMDGPU: Cleanup lowering actions
These are kind of a mess and hard to follow, particularly
for loads and stores. Fix various redundant, unnecessary
and dead settings.

llvm-svn: 270307
2016-05-21 02:27:49 +00:00
Matt Arsenault 81a709503d AMDGPU: Fix high bits after division optimization
This is essentially doing a 24-bit signed division with FP.
We need to truncate to the N-bit result.

llvm-svn: 270305
2016-05-21 01:53:33 +00:00
Matt Arsenault b6e1cc2a92 AMDGPU: Fix verifier error when spilling SGPRs
The current SGPR spilling test does not stress this
because it is using s_buffer_load instructions to
increase SGPR pressure and spill, but their output
operands have the same SReg_32_XM0 constraint. This fixes
an error when the SReg_32 output from most instructions
is spilled.

llvm-svn: 270301
2016-05-21 00:53:42 +00:00
Matt Arsenault 8f5e008534 AMDGPU: Fix relationship between SReg_32 and SReg_32_XM0
llvm-svn: 270300
2016-05-21 00:53:28 +00:00
Matt Arsenault 4945905f5f AMDGPU: Handle cbranch vccz/vccnz
llvm-svn: 270297
2016-05-21 00:29:40 +00:00
Matt Arsenault 72fcd5f597 AMDGPU: Implement ReverseBranchCondition
llvm-svn: 270296
2016-05-21 00:29:34 +00:00
Matt Arsenault 6d09380532 AMDGPU: Implement AnalyzeBranch
Original patch by Tom Stellard

llvm-svn: 270295
2016-05-21 00:29:27 +00:00
Matt Arsenault 4e3d383c46 AMDGPU: Remove pointless conversions
llvm-svn: 270139
2016-05-19 21:09:58 +00:00
Matt Arsenault 4318ea354a AMDGPU: Also look for s_cbranch_vccz
llvm-svn: 270091
2016-05-19 18:20:25 +00:00
Artem Tamazov 8ce1f7177b [AMDGPU][llvm-mc] Fixes to support buffer atomics.
Fixes for MUBUF_Atomic instructions to make operand list valid:
 - For RTN insns, make a copy of $vdata_in operand as $vdata.
 - Do not add operand for GLC, it is hardcoded and comes as a token.
Workaround to avoid adding multiple default optional operands.
Tests added.

Differential Revision: http://reviews.llvm.org/D20257

llvm-svn: 270049
2016-05-19 12:22:39 +00:00
Matt Arsenault c5bebac934 AMDGPU: Fix verifier error when spilling undef subreg
llvm-svn: 270002
2016-05-18 23:35:53 +00:00
Matt Arsenault c438ef574d AMDGPU: Fix promote alloca for pointer loads
If the load has a pointer type, we don't want to change
its type.

llvm-svn: 270000
2016-05-18 23:20:24 +00:00
Rafael Espindola 8c34dd8257 Delete Reloc::Default.
Having an enum member named Default is quite confusing: Is it distinct
from the others?

This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been mapped to the
default value, which it is now clear has to be one of the remaining 3
options.

llvm-svn: 269988
2016-05-18 22:04:49 +00:00
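The pattern this commit describes is sketched below with `std::optional` standing in for LLVM's `Optional`. An unset value means "the user did not choose", and a target then maps it explicitly to one of the three remaining models. `getEffectiveRelocModel` is a hypothetical helper, and `Static` as the fallback is an assumption, because the real default is target-specific.

```cpp
#include <optional>

// The enum after removing Default: only the real models remain.
namespace Reloc {
enum Model { Static, PIC_, DynamicNoPIC };
} // namespace Reloc

// Hypothetical helper: resolve "no user choice" to a concrete model.
Reloc::Model getEffectiveRelocModel(std::optional<Reloc::Model> RM) {
  // Assumed fallback; each target would pick its own default here.
  return RM.value_or(Reloc::Static);
}
```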
Jan Vesely ae265c03f7 AMDGPU: Fix incorrect simm check
Use signed division; otherwise, all backward jumps fail the check.
Fixes regression introduced in r269951

Differential Revision: http://reviews.llvm.org/D20380

llvm-svn: 269972
2016-05-18 19:07:58 +00:00
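The reason unsigned division breaks backward jumps can be seen in an illustrative model of the check. The model assumes the branch field holds a signed 16-bit count of 4-byte words. Neither function is LLVM's code; both names are hypothetical. Once a negative byte offset is reinterpreted as unsigned, the quotient becomes enormous, so every backward branch appears out of range.

```cpp
#include <cstdint>

// Does a byte offset fit a signed 16-bit field counted in 4-byte words?
bool fitsSImm16Dwords(int64_t ByteOffset) {
  int64_t Dwords = ByteOffset / 4; // signed division keeps the sign
  return Dwords >= INT16_MIN && Dwords <= INT16_MAX;
}

// Buggy variant: unsigned division turns any negative (backward)
// offset into a huge positive value, so the range check always fails.
bool fitsSImm16DwordsBuggy(int64_t ByteOffset) {
  uint64_t Dwords = static_cast<uint64_t>(ByteOffset) / 4;
  return Dwords <= static_cast<uint64_t>(INT16_MAX);
}
```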
Matt Arsenault a519cf593f AMDGPU: Error if branch distance exceeds limit
llvm-svn: 269951
2016-05-18 16:10:24 +00:00
Matt Arsenault 1735da460b AMDGPU: Other sizes of popcnt are fast
We can chain bcnt instructions together, so
any width popcnt is pretty fast.

llvm-svn: 269950
2016-05-18 16:10:19 +00:00
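The chaining idea can be modeled as follows. The sketch assumes a bit-count instruction that adds popcount(src) to an accumulator operand, which is how v_bcnt_u32_b32 is understood here. Under that assumption, a 64-bit popcount needs only two chained 32-bit counts, and wider types need one more link per 32 bits. `bcntAcc` and `popcount64` are hypothetical names.

```cpp
#include <cstdint>

// Model of a bit count with an accumulator: popcount(Src) + Acc.
uint32_t bcntAcc(uint32_t Src, uint32_t Acc) {
  uint32_t Count = 0;
  for (uint32_t V = Src; V != 0; V &= V - 1) // clear the lowest set bit
    ++Count;
  return Count + Acc;
}

// A 64-bit popcount as two chained 32-bit counts.
uint32_t popcount64(uint64_t X) {
  uint32_t Lo = static_cast<uint32_t>(X);
  uint32_t Hi = static_cast<uint32_t>(X >> 32);
  return bcntAcc(Hi, bcntAcc(Lo, 0));
}
```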
Matt Arsenault 9430b9113a AMDGPU: Fix assert when erroring on a call
For some reason an assert is now hit when a valid chain
is not returned, so return the entry chain.

llvm-svn: 269948
2016-05-18 16:10:11 +00:00
Matt Arsenault 891fccc0c1 AMDGPU: Handle alloca promoting with null operands
If the second pointer in a multi-pointer instruction is
a constant, we can replace the type.

llvm-svn: 269945
2016-05-18 15:57:21 +00:00
Matt Arsenault bde80346c1 AMDGPU: Don't run passes that aren't useful
llvm-svn: 269943
2016-05-18 15:41:07 +00:00
Matt Arsenault ab3429c2b4 AMDGPU: Fix assert on ttmp registers
Use register class that does not include them when looking
for unallocated registers.

This is hit by the udiv v8i64 test in the OpenCL integer
conformance test, and takes a few seconds to compile in
a debug build so no test included.

llvm-svn: 269938
2016-05-18 15:19:50 +00:00
Jan Vesely 687ca8df18 AMDGPU/R600: Use correct number of vector elements when lowering private loads
Reviewer: tstellardAMD, arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D20032

llvm-svn: 269725
2016-05-16 23:56:32 +00:00
Matt Arsenault 8a028bf4d7 AMDGPU: Fix promote alloca pass creating huge arrays
This was assuming it could use all memory before, which is
a bad decision because it restricts occupancy.

By default, only try to use enough space that could reduce
occupancy to 7, an arbitrarily chosen limit.

Based on the existing LDS usage, try to round up to the limit
in the current tier instead of further hurting occupancy.
This isn't ideal, because it doesn't accurately know how much
space is going to be used for alignment padding.

llvm-svn: 269708
2016-05-16 21:19:59 +00:00
Jan Vesely 91aacad9c3 AMDGPU: Unify LowerGlobalAddress
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19794

llvm-svn: 269481
2016-05-13 20:39:34 +00:00
Jan Vesely 1680039a7a AMDGPU/R600: Fold global address operand
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19793

llvm-svn: 269480
2016-05-13 20:39:31 +00:00
Jan Vesely f97de00745 AMDGPU/R600: Implement memory loads from constant AS
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19792

llvm-svn: 269479
2016-05-13 20:39:29 +00:00
Jan Vesely a1f9fdfcbc AMDGPU/R600: Add support for emitting MCExpr
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19791

llvm-svn: 269478
2016-05-13 20:39:26 +00:00
Jan Vesely 7971464da8 AMDGPU: Add support for MCExpr to instruction printer
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19790

llvm-svn: 269477
2016-05-13 20:39:24 +00:00
Jan Vesely 4368c1cf7e AMDGPU/R600: Use machine operands instead of ints to track literals
This will be used for global addresses

Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19789

llvm-svn: 269476
2016-05-13 20:39:22 +00:00
Jan Vesely fac8d7ecb1 AMDGPU/R600: There are other uses for ALU_LITERAL besides Imm
This will be used for GV

Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19788

llvm-svn: 269475
2016-05-13 20:39:20 +00:00
Jan Vesely fbcb754e66 AMDGPU: Make CONST_DATA_PTR available to R600
Rename to AMDGPUconstdata_ptr

Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19786

llvm-svn: 269474
2016-05-13 20:39:18 +00:00
Jan Vesely 81f1b30035 AMDGPU/EG,CM: Add instruction to read from constant AS (VTX2)
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19785

llvm-svn: 269473
2016-05-13 20:39:16 +00:00
Konstantin Zhuravlyov e3d322af57 [AMDGPU] Update nop insertion for debugger usage
- Insert one nop for each high level statement instead of two
- Do not insert nop before prologue

Differential Revision: http://reviews.llvm.org/D20215

llvm-svn: 269452
2016-05-13 18:21:28 +00:00
Matt Arsenault 999f7dd84c AMDGPU: Remove verifier check for scc live ins
We only really need this to be true for SIFixSGPRCopies.
I'm not sure there's any way this could happen before that point.

Fixes a case where MachineCSE could introduce a cross block
scc use.

llvm-svn: 269391
2016-05-13 04:15:48 +00:00
Justin Bogner 95927c0fd0 SDAG: Implement Select instead of SelectImpl in AMDGPUDAGToDAGISel
- Where we were returning a node before, call ReplaceNode instead.
- Where we would return null to fall back to another selector, rename
  the method to try* and return a bool for success.
- Where we were calling SelectNodeTo, just return afterwards.

Part of llvm.org/pr26808.

llvm-svn: 269349
2016-05-12 21:03:32 +00:00
Matt Arsenault 8300272823 AMDGPU: Fix getIntegerAttribute type and error message
llvm-svn: 269268
2016-05-12 02:45:18 +00:00
Matt Arsenault a61cb48dd2 AMDGPU: Fix breaking IR on instructions with multiple pointer operands
The promote alloca pass would attempt to promote an alloca with
a select, icmp, or phi user, even though the other operand was
from a non-promotable source, producing a select on two different
pointer types.

Only do this if we know that both operands derive from the same
alloca. In the future we should be able to relax this to an alloca
which will also be promoted.

llvm-svn: 269265
2016-05-12 01:58:58 +00:00
Matt Arsenault 4234542503 AMDGPU: Make some instructions convergent
llvm-svn: 269147
2016-05-11 00:32:31 +00:00
Matt Arsenault e8ed8e59e5 AMDGPU: Change private_element_size to 4
llvm-svn: 269145
2016-05-11 00:28:54 +00:00
Peter Collingbourne dba995601b Cloning: Clean up the interface to the CloneFunction function.
Remove the ModuleLevelChanges argument, and the ability to create new
subprograms for cloned functions. The latter was added without review in
r203662, but it has no in-tree clients (all non-test callers pass false
for ModuleLevelChanges [1], so it isn't reachable outside of tests). It
also isn't clear that adding a duplicate subprogram to the compile unit is
always the right thing to do when cloning a function within a module. If
this functionality comes back it should be accompanied with a more concrete
use case.

Furthermore, all in-tree clients add the returned function to the module.
Since that's pretty much the only sensible thing you can do with the function,
just do that in CloneFunction.

[1] http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/CloneFunction.cpp/rCloneFunction

Differential Revision: http://reviews.llvm.org/D18628

llvm-svn: 269110
2016-05-10 20:23:24 +00:00
Konstantin Zhuravlyov a791932145 [AMDGPU][NFC] Rename SIInsertNops -> SIDebuggerInsertNops
Differential Revision: http://reviews.llvm.org/D20117

llvm-svn: 269098
2016-05-10 18:33:41 +00:00
Matthias Braun 31d19d43c7 CodeGen: Move TargetPassConfig from Passes.h to its own header; NFC
Many files include Passes.h but only a fraction needs to know about the
TargetPassConfig class. Move it into its own header. Also rename
Passes.cpp to TargetPassConfig.cpp while we are at it.

llvm-svn: 269011
2016-05-10 03:21:59 +00:00
Simon Pilgrim 0a81921cdb Fixed unused but set variable warning
llvm-svn: 268931
2016-05-09 16:42:23 +00:00
Matt Arsenault a949dc619c AMDGPU: Fold shift into cvt_f32_ubyteN
llvm-svn: 268930
2016-05-09 16:29:50 +00:00
Artem Tamazov f0b6b40fa4 [AMDGPU][llvm-mc] Some refactoring of .td files
Some custom Operands and AsmOperandClasses moved to the proper place.
No functional changes.

Differential Revision: http://reviews.llvm.org/D20012

llvm-svn: 268780
2016-05-06 19:32:38 +00:00
Artem Tamazov ebe71ce36a [AMDGPU][llvm-mc] Add support for sendmsg(...) syntax.
Added support for sendmsg(MSG[, OP[, STREAM_ID]]) syntax
in s_sendmsg and s_sendmsghalt instructions.
The syntax matches the SP3 assembler/disassembler rules.
That is why implicit inputs (like M0 and EXEC) are not printed
to disassembly output anymore.

sendmsg(...) allows only known message types and attributes,
even if literals are used instead of symbolic names.
However, a raw literal (without "sendmsg") can still be used,
and that allows for any 16-bit value.

Tests updated/added.

Differential Revision: http://reviews.llvm.org/D19596

llvm-svn: 268762
2016-05-06 17:48:48 +00:00
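Under the following assumptions, sendmsg(MSG[, OP[, STREAM_ID]]) packs its arguments into the 16-bit immediate as sketched below. The assumed field layout is: message id in bits [3:0], operation in bits [6:4] (GS operations use [5:4]), and stream id in bits [9:8]. These positions and the numeric values in the usage note are assumptions to check against the backend's SIDefines.h. `encodeSendMsg` is a hypothetical helper. Unlike the real parser, it does not reject unknown message types.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: [3:0] message id, [6:4] operation, [9:8] stream id.
uint16_t encodeSendMsg(unsigned MsgId, unsigned Op = 0, unsigned StreamId = 0) {
  assert(MsgId < 16 && Op < 8 && StreamId < 4 && "field out of range");
  return static_cast<uint16_t>(MsgId | (Op << 4) | (StreamId << 8));
}
```

With the assumed values MSG_GS = 2 and GS_OP_EMIT = 2, `sendmsg(MSG_GS, GS_OP_EMIT, 1)` would encode as `encodeSendMsg(2, 2, 1)`, which is 0x122.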
Nikolay Haustov 6eb050ea4e Revert "AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2."
This reverts commit 47486d52454d60cdf6becc0b2efe533c73794380.

It broke calling OpenCL kernel from another kernel.

llvm-svn: 268739
2016-05-06 14:59:04 +00:00
Sam Kolton 5f10a137d0 [TableGen] AsmMatcher: support for default values for optional operands
Summary:
This change allows specifying a "DefaultMethod" for an optional operand (IsOptional = 1) in AsmOperandClass, which returns the default value for the operand. This is used in convertToMCInst to set default values in MCInst.
Previously, if you wanted to set a default value for an operand, you had to create a custom converter method. With this change, it is possible to use standard converters even when optional operands are present.

Reviewers: tstellarAMD, ab, craig.topper

Subscribers: jyknight, dsanders, arsenm, nhaustov, llvm-commits

Differential Revision: http://reviews.llvm.org/D18242

llvm-svn: 268726
2016-05-06 11:31:17 +00:00
Nikolay Haustov dc1bb79b92 AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2.
Summary:
    Check calling convention in AMDGPUMachineFunction::isKernel

    This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF.

    Also, in the future unused non-kernels may be optimized.

    Reviewers: tstellarAMD, arsenm

    Subscribers: arsenm, joker.eph, llvm-commits

    Differential Revision: http://reviews.llvm.org/D19917

llvm-svn: 268719
2016-05-06 09:23:13 +00:00
Justin Bogner b012699741 SDAG: Rename Select->SelectImpl and repurpose Select as returning void
This is a step towards removing the rampant undefined behaviour in
SelectionDAG, which is a part of llvm.org/PR26808.

We rename SelectionDAGISel::Select to SelectImpl and update targets to
match, and then change Select to return void and consolidate the
sketchy behaviour we're trying to get away from there.

Next, we'll update backends to implement `void Select(...)` instead of
SelectImpl and eventually drop the base Select implementation.

llvm-svn: 268693
2016-05-05 23:19:08 +00:00
Matt Arsenault 539ca882c6 AMDGPU: Simplify control flow / conditions
llvm-svn: 268676
2016-05-05 20:27:02 +00:00
Nicolai Haehnle ffbd56a1c9 AMDGPU: Uniform branch conditions can originate with intrinsics
Summary:
Discovered by Dave Airlie, fixes an assertion in Khronos OpenGL CTS
GL43-CTS.shader_storage_buffer_object.advanced-matrix.

In this particular case, the buffer load intrinsic fed into a uniform
conditional branch, and led the brcond lowering down the wrong path.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19931

llvm-svn: 268650
2016-05-05 17:36:36 +00:00
Tom Stellard fcfaea4cff AMDGPU/SI: Add support for AMD code object version 2.
Summary:
Version 2 is now the default.  If you want to emit version 1, use
the amdgcn--amdhsa-amdcov1 triple.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19283

llvm-svn: 268647
2016-05-05 17:03:33 +00:00
Jan Vesely bbc2231983 AMDGPU/R600: Minor cleanup in InstrInfo
Use std::make_pair instead of constructor
Use C++11 loop
Reuse helper var

Reviewers: tstellardAMD

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19787

llvm-svn: 268503
2016-05-04 14:55:45 +00:00
Tom Stellard 4a304b3886 AMDGPU/SI: Use range loops to simplify some code in the SI Scheduler
Reviewers: arsenm, axeldavy

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19822

llvm-svn: 268396
2016-05-03 16:30:56 +00:00
Aaron Ballman 3bd56b3b43 Silence unused variable warning; NFC.
llvm-svn: 268392
2016-05-03 15:17:25 +00:00
Matt Arsenault bcdfee7030 AMDGPU: Custom lower v2i32 loads and stores
This will allow us to split up 64-bit private accesses when
necessary.

llvm-svn: 268296
2016-05-02 20:13:51 +00:00
Tom Stellard 154c9cdd24 AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratch
We were using v_readlane_b32 with the lane set to zero, but this won't
work if thread 0 is not active.

Differential Revision: http://reviews.llvm.org/D19745

llvm-svn: 268295
2016-05-02 20:11:44 +00:00
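The reason lane 0 is unsafe can be shown with a simplified model of a wave. In this model a VGPR holds one value per lane and the exec mask marks which lanes are active. Reading a fixed lane 0 can return a stale value when lane 0 is inactive, whereas reading the lowest active lane always returns a live value. The 64-lane wave and all names are assumptions of this sketch, which asserts on an empty exec mask instead of modeling hardware behavior.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned WaveSize = 64;
using VGPR = std::array<uint32_t, WaveSize>; // one 32-bit value per lane

// readlane-like: reads a fixed lane whether or not it is active.
uint32_t readLane(const VGPR &V, unsigned Lane) { return V[Lane]; }

// readfirstlane-like: reads the lowest-numbered lane set in Exec.
uint32_t readFirstLane(const VGPR &V, uint64_t Exec) {
  assert(Exec != 0 && "no active lanes");
  unsigned Lane = 0;
  while (!((Exec >> Lane) & 1))
    ++Lane;
  return V[Lane];
}
```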
Matt Arsenault 2b957b5a6f AMDGPU: Make i64 loads/stores promote to v2i32
Now that unaligned access expansion should not attempt
to produce i64 accesses, we can remove the hack in
PreprocessISelDAG where this is done.

This allows splitting i64 private accesses while
allowing the new add nodes indexing the vector components
to be folded with the base pointer arithmetic.

llvm-svn: 268293
2016-05-02 20:07:26 +00:00
Reid Kleckner 0549ab6033 Fix instance of -Winconsistent-missing-override in AMDGPU code
llvm-svn: 268289
2016-05-02 19:45:10 +00:00
Tom Stellard ce5e994887 AMDGPU/SI: Set the kill flag on temp VGPRs used to restore SGPRs from scratch
Summary:
When we restore an SGPR value from scratch, we first load it into a
temporary VGPR and then use v_readlane_b32 to copy the value from the
VGPR back into an SGPR.

We weren't setting the kill flag on the VGPR in the v_readlane_b32
instruction, so the register scavenger wasn't able to re-use this
temp value later.

I wasn't able to create a lit test for this.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19744

llvm-svn: 268287
2016-05-02 19:37:56 +00:00
Tom Stellard 27233b727f AMDGPU: Move R600 specific code out of AMDGPUISelLowering.cpp
Reviewers: arsenm

Subscribers: jvesely, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19736

llvm-svn: 268267
2016-05-02 18:05:17 +00:00
Tom Stellard 341e293d67 AMDGPU/SI: Fix bug in SIInstrInfo::insertWaitStates() uncovered by r268260
We can't use MI->getDebugLoc() when MI is an iterator that could be
MBB.end().

llvm-svn: 268265
2016-05-02 18:02:24 +00:00
Tom Stellard 1f520e5c98 AMDGPU/SI: Use the hazard recognizer to break SMEM soft clauses
Summary:
Add support for detecting hazards in SMEM soft clauses, so that we only
break the clauses when necessary, either by adding s_nop or reordering
other ALU instructions.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18870

llvm-svn: 268260
2016-05-02 17:39:06 +00:00
Nicolai Haehnle 119d3d80cb AMDGPU: llvm.SI.fs.constant is a source of divergence
Summary:
This intrinsic is used to get flat-shaded fragment shader inputs. Those are
uniform across a primitive, but a fragment shader wave may process pixels from
multiple primitives (as indicated by the prim_mask), and so that's where
divergence can arise.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19747

llvm-svn: 268259
2016-05-02 17:37:01 +00:00
Tom Stellard a27007eb4f AMDGPU/SI: Use hazard recognizer to detect DPP hazards
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18603

llvm-svn: 268247
2016-05-02 16:23:09 +00:00
Aaron Ballman 5c190d056d Silence unused variable warnings; NFC.
llvm-svn: 268234
2016-05-02 14:48:03 +00:00
Rafael Espindola 92dd7b82be Add missing override.
llvm-svn: 268163
2016-04-30 15:18:21 +00:00
Tom Stellard c51e4468b7 AMDGPU/SI: Remove wait state handling for SMRD in SIInsertWaits
This was supposed to be part of r268143.

llvm-svn: 268154
2016-04-30 04:04:48 +00:00
Tom Stellard cb6ba62d6f AMDGPU/SI: Enable the post-ra scheduler
Summary:
This includes a hazard recognizer implementation to replace some of
the hazard handling we had during frame index elimination.

Reviewers: arsenm

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18602

llvm-svn: 268143
2016-04-30 00:23:06 +00:00
Matt Arsenault 701c21ea10 AMDGPU: Fix crash with unreachable terminators.
If a block has no successors because it ends in unreachable,
this was accessing an invalid iterator.

Also stop counting instructions that don't emit any
real instructions.

llvm-svn: 268119
2016-04-29 21:52:13 +00:00
Matt Arsenault dc4ebad6d4 AMDGPU: Add kernarg.segment.ptr intrinsic
llvm-svn: 268105
2016-04-29 21:16:52 +00:00
Matt Arsenault cf2744f1c8 AMDGPU/SI: Move post regalloc run of SIShrinkInstructions
Move to addPreEmitPass. This is so it runs after post-RA
scheduling so we can merge s_nops emitted by the scheduler
and hazard recognizer.

llvm-svn: 268095
2016-04-29 20:23:42 +00:00
Artem Tamazov 38e496b175 Fixed/Recommitted r267733 "[AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD."
Previously reverted by r267752.

r267733 review:
Differential Revision: http://reviews.llvm.org/D19342

llvm-svn: 268066
2016-04-29 17:04:50 +00:00
Tom Stellard 92b24f324b AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructions
Summary:
These instructions can add an immediate offset to the address, like other
ds instructions.

Reviewers: arsenm

Subscribers: arsenm, scchan

Differential Revision: http://reviews.llvm.org/D19233

llvm-svn: 268043
2016-04-29 14:34:26 +00:00
Nikolay Haustov 4f672a34ed AMDGPU/SI: Assembler: Unify parsing/printing of operands.
Summary:
The goal is for each operand type to have its own parse function and
at the same time share common code for tracking state as different
instruction types share operand types (e.g. glc/glc_flat, etc).

Introduce parseAMDGPUOperand which can parse any optional operand.
DPP and Clamp/OMod have custom handling for now. Sam also suggested
to have class hierarchy for operand types instead of table. This
can be done in separate change.

Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps,
parseMubufOptionalOps, parseDPPOptionalOps.
Reduce number of definitions of AsmOperand's and MatchClasses' by using common base class.
Rename AsmMatcher/InstPrinter methods accordingly.
Print immediate type when printing parsed immediate operand.
Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3).
Update tests.

Reviewers: tstellarAMD, SamWot, artem.tamazov

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19584

llvm-svn: 268015
2016-04-29 09:02:30 +00:00
Matt Arsenault 7d1b6c81af AMDGPU: Stop reporting an addressing mode for unknown addrspace
This was being treated the same as private, which has an immediate
offset. For unknown, it probably means it's for a computation not
actually being used for accessing memory, so it should not have a
nontrivial addressing mode.

llvm-svn: 268002
2016-04-29 06:25:10 +00:00
Matt Arsenault 1c4d0efe56 AMDGPU: Emit error if too much LDS is used
llvm-svn: 267922
2016-04-28 19:37:35 +00:00
Matt Arsenault c5fce69031 AMDGPU: Fix mishandling array allocations when promoting alloca
The canonical form for allocas is a single allocation of the array type.
In case we see a non-canonical array alloca, make sure we aren't
replacing this with an array N times smaller.

llvm-svn: 267916
2016-04-28 18:38:48 +00:00
Craig Topper 33772c5375 [CodeGen] Default CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF to Expand in TargetLoweringBase. This is what the majority of the targets want and removes a bunch of code. Set it to Legal explicitly in the few cases where that's the desired behavior.
llvm-svn: 267853
2016-04-28 03:34:31 +00:00
Matt Arsenault 0547b016b1 AMDGPU: Account for globals in AMDGPUPromoteAlloca pass
Patch by Bas Nieuwenhuizen

llvm-svn: 267791
2016-04-27 21:05:08 +00:00
Chad Rosier 03e1647d19 Revert "[AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD."
This reverts commit r267733 due to a -Werror,-Wunused-function error.

llvm-svn: 267752
2016-04-27 18:29:11 +00:00
Reid Kleckner 7f0ae15e9d Silence a -Wdangling-else
llvm-svn: 267737
2016-04-27 16:46:33 +00:00
Artem Tamazov 3896f8f83d [AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD.
Added support for TTMP quads.
Reworked the M0 exclusion machinery for SMRD and similar instructions
to allow TTMP registers to be used as destinations in those instructions.
Tests added.

Differential Revision: http://reviews.llvm.org/D19342

llvm-svn: 267733
2016-04-27 16:20:23 +00:00
Nicolai Haehnle f66bdb5ea8 AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsic
Summary:
So it appears that to guarantee some of the ordering requirements of a GLSL
memoryBarrier() executed in the shader, we need to emit an s_waitcnt.

(We can't use an s_barrier, because memoryBarrier() may appear anywhere in
the shader, in particular it may appear in non-uniform control flow.)

Reviewers: arsenm, mareko, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19203

llvm-svn: 267729
2016-04-27 15:46:01 +00:00
Artem Tamazov 5cd55b1784 [AMDGPU][llvm-mc] s_getreg/setreg* - Support symbolic names of hardware registers.
Specifying a hardware register by its numeric code is still supported.
The disassembler prints the symbolic name when the name is known.
Tests updated/added.

Differential Revision: http://reviews.llvm.org/D19335

llvm-svn: 267724
2016-04-27 15:17:03 +00:00
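A C++ sketch of the name/code round trip described above. The ID-to-name table is an assumption based on the GCN hardware register IDs (HW_REG_MODE = 1 through HW_REG_IB_STS = 7) and may not match the assembler's table exactly; `hwRegToString` and `parseHwReg` are hypothetical helpers.

```cpp
#include <cassert>
#include <string>

// Assumed table of symbolic hardware register names, indexed by ID.
// Index 0 has no symbolic name.
static const char *const kHwRegNames[] = {
    nullptr,          "HW_REG_MODE",      "HW_REG_STATUS",
    "HW_REG_TRAPSTS", "HW_REG_HW_ID",     "HW_REG_GPR_ALLOC",
    "HW_REG_LDS_ALLOC", "HW_REG_IB_STS"};
static const unsigned kNumHwRegNames =
    sizeof(kHwRegNames) / sizeof(kHwRegNames[0]);

// Disassembly direction: use the symbolic name when one is known,
// otherwise fall back to the numeric code.
std::string hwRegToString(unsigned Id) {
  if (Id < kNumHwRegNames && kHwRegNames[Id] != nullptr)
    return kHwRegNames[Id];
  return std::to_string(Id);
}

// Assembly direction: accept either a known name or a decimal code that
// fits in an assumed 6-bit ID field. Returns -1 on failure.
int parseHwReg(const std::string &Text) {
  for (unsigned Id = 0; Id < kNumHwRegNames; ++Id)
    if (kHwRegNames[Id] != nullptr && Text == kHwRegNames[Id])
      return static_cast<int>(Id);
  if (Text.empty() || Text.size() > 2)
    return -1;
  for (char C : Text)
    if (C < '0' || C > '9')
      return -1;
  int Id = std::stoi(Text);
  return Id < 64 ? Id : -1;
}
```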
Ahmed Bougacha 128f8732a5 [CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI.
Differential Revision: http://reviews.llvm.org/D17176

llvm-svn: 267606
2016-04-26 21:15:30 +00:00
Konstantin Zhuravlyov 71515e57f9 [AMDGPU] Move reserved vgpr count for trap handler usage to SIMachineFunctionInfo + minor commenting changes
Differential Revision: http://reviews.llvm.org/D19537

llvm-svn: 267573
2016-04-26 17:24:40 +00:00
Konstantin Zhuravlyov 1d99c4d03c [AMDGPU] Reserve VGPRs for trap handler usage if instructed
Differential Revision: http://reviews.llvm.org/D19235

llvm-svn: 267563
2016-04-26 15:43:14 +00:00
Sam Kolton 3025e7f25f [AMDGPU] Assembler: basic support for SDWA instructions
Support for SDWA instructions for VOP1 and VOP2 encoding.
Not done yet:
  - converters to support optional operands and modifiers
  - VOPC
  - sext() modifier
  - intrinsics
  - VOP2b (see vop_dpp.s)
  - V_MAC_F32 (see vop_dpp.s)

Differential Revision: http://reviews.llvm.org/D19360

llvm-svn: 267553
2016-04-26 13:33:56 +00:00
Andrew Kaylor 7de74af929 Add optimization bisect opt-in calls for AMDGPU passes
Differential Revision: http://reviews.llvm.org/D19450

llvm-svn: 267485
2016-04-25 22:23:44 +00:00
Matt Arsenault 074ea2851c AMDGPU/SI: Optimize adjacent s_nop instructions
Merge them by using the s_nop operand, which specifies how long to wait.
This is somewhat distasteful, since it would be better to emit s_nop
with the right argument in the first place. That would require
changing TII::insertNoop to emit N operands, which would be easy.
Slightly more problematic is that the post-RA scheduler and hazard recognizer
represent nops as a single null node, so another way of representing
N nops would have to be invented.

llvm-svn: 267456
2016-04-25 19:53:22 +00:00
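For illustration, a C++ sketch of the merging idea, assuming the usual SI/VI s_nop semantics where immediate N means N + 1 wait states and a single s_nop covers at most 8. `mergeAdjacentNops` is a hypothetical helper that works on bare immediates, not the pass's MachineInstr-level code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumption: an s_nop with immediate N inserts N + 1 wait states, and a
// single s_nop can express at most 8 wait states (immediate 0..7).
constexpr unsigned kMaxNopImm = 7;

// Given the immediates of a run of adjacent s_nop instructions, returns the
// immediates of a shorter run with the same total number of wait states,
// filling each s_nop as far as it goes.
std::vector<unsigned> mergeAdjacentNops(const std::vector<unsigned> &Imms) {
  unsigned TotalWaits = 0;
  for (unsigned Imm : Imms)
    TotalWaits += Imm + 1;

  std::vector<unsigned> Merged;
  while (TotalWaits > 0) {
    unsigned Waits = std::min(TotalWaits, kMaxNopImm + 1);
    Merged.push_back(Waits - 1);
    TotalWaits -= Waits;
  }
  return Merged;
}
```

Two `s_nop 0` become one `s_nop 1`; a run that needs more than 8 wait states still has to be split.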
Matt Arsenault 99c14524ec AMDGPU: Implement addrspacecast
llvm-svn: 267452
2016-04-25 19:27:24 +00:00
Matt Arsenault 48ab526f12 AMDGPU: Add queue ptr intrinsic
llvm-svn: 267451
2016-04-25 19:27:18 +00:00
Matt Arsenault dfaf4261ab AMDGPU: Add DAG to debug dump
Also reorder cases to match enum order

llvm-svn: 267449
2016-04-25 19:27:09 +00:00
Etienne Bergeron 06c14ec31e Fix incorrect redundant expression in target AMDGPU.
Summary:
clang-tidy's misc-redundant-expression check flagged this expression as redundant.
It turns out this is probably a bug.

```
/home/etienneb/llvm/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:306:26: warning: both side of operator are equivalent [misc-redundant-expression]
  if (isSMRD(*FirstLdSt) && isSMRD(*FirstLdSt)) {
```

Reviewers: rnk, tstellarAMD

Subscribers: arsenm, cfe-commits

Differential Revision: http://reviews.llvm.org/D19460

llvm-svn: 267415
2016-04-25 15:06:33 +00:00
Artem Tamazov d6468666b5 [AMDGPU][llvm-mc] s_getreg/setreg* - Add hwreg(...) syntax.
Added hwreg(reg[,offset,width]) syntax.
Default offset = 0, default width = 32.
Specifying a raw 16-bit immediate is still supported.
Added out-of-range checks.
The disassembler always prints the hwreg(...) format.
Tests updated/added.

Differential Revision: http://reviews.llvm.org/D19329

llvm-svn: 267410
2016-04-25 14:13:51 +00:00
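A C++ sketch of how hwreg(reg, offset, width) might map to the 16-bit immediate, with the defaults and out-of-range checks listed above. The bit layout (ID in bits 5:0, offset in bits 10:6, width minus one in bits 15:11) is an assumption about the GCN encoding; `encodeHwreg`, `decodeHwreg` and `HwregFields` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumed simm16 layout for s_getreg/s_setreg:
//   bits [5:0]   hardware register ID
//   bits [10:6]  bit offset within the register
//   bits [15:11] field width minus one
std::optional<uint16_t> encodeHwreg(unsigned Id, unsigned Offset = 0,
                                    unsigned Width = 32) {
  // Out-of-range operands are rejected instead of silently truncated.
  if (Id > 63 || Offset > 31 || Width < 1 || Width > 32)
    return std::nullopt;
  return static_cast<uint16_t>(Id | (Offset << 6) | ((Width - 1) << 11));
}

struct HwregFields {
  unsigned Id, Offset, Width;
};

// Splits a raw immediate into the fields a disassembler would print as
// hwreg(Id, Offset, Width).
HwregFields decodeHwreg(uint16_t Imm) {
  return {Imm & 0x3Fu, (Imm >> 6) & 0x1Fu, ((Imm >> 11) & 0x1Fu) + 1};
}
```

Because every 16-bit value decodes to some valid (Id, Offset, Width) triple under this layout, the disassembler can always print the hwreg(...) form.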
Craig Topper 855d182656 Fix a couple assertions that can never fire because they just contained the text string which always evaluates to true. Add a ! so they'll evaluate to false.
llvm-svn: 267312
2016-04-24 02:01:25 +00:00
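A small C++ illustration of why such assertions never fire, and of the `assert(!"...")` idiom the fix relies on. `OperandKind`, `kindName` and the two helper functions are hypothetical.

```cpp
#include <cassert>
#include <string>

enum class OperandKind { Reg, Imm, Unknown };

std::string kindName(OperandKind K) {
  switch (K) {
  case OperandKind::Reg:
    return "reg";
  case OperandKind::Imm:
    return "imm";
  default:
    // assert("unexpected kind") would never fire: the literal decays to a
    // non-null pointer, which converts to true. Negating it makes the
    // condition false, so reaching this point aborts in assert builds.
    assert(!"unexpected kind");
    return "unknown";
  }
}

// The same conversions, observable without aborting.
bool literalCondition() { return static_cast<bool>("unexpected kind"); }
bool negatedLiteralCondition() { return !"unexpected kind"; }
```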
Matt Arsenault 7e8de01f84 AMDGPU: sext_inreg (srl x, K), vt -> bfe x, K, vt.Size
llvm-svn: 267244
2016-04-22 22:59:16 +00:00
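The identity behind this combine can be checked with plain integer arithmetic. The C++ sketch below models sext_inreg and a signed bitfield extract on 32-bit bit patterns. It assumes K + VT.Size <= 32 and uses hypothetical helper names, so treat it as a sanity check of the algebra rather than the DAG combine itself.

```cpp
#include <cassert>
#include <cstdint>

// Bit pattern of sext_inreg(V, Bits): the low Bits bits of V sign-extended
// to 32 bits. Requires 1 <= Bits <= 32.
uint32_t sextInReg(uint32_t V, unsigned Bits) {
  if (Bits == 32)
    return V;
  uint32_t Field = V & ((uint32_t{1} << Bits) - 1);
  uint32_t SignBit = uint32_t{1} << (Bits - 1);
  // (Field ^ SignBit) - SignBit sign-extends Field in modular arithmetic.
  return (Field ^ SignBit) - SignBit;
}

// Signed bitfield extract, built bit by bit: result bit I is bit
// (Offset + I) of X for I < Width, and the field's top bit above that.
// Assumes Width >= 1 and Offset + Width <= 32.
uint32_t bfeSigned(uint32_t X, unsigned Offset, unsigned Width) {
  uint32_t Result = 0;
  for (unsigned I = 0; I < 32; ++I) {
    unsigned Src = Offset + (I < Width ? I : Width - 1);
    Result |= ((X >> Src) & 1u) << I;
  }
  return Result;
}

// The pattern being matched: sext_inreg (srl X, K), VT.
uint32_t sextInRegOfSrl(uint32_t X, unsigned K, unsigned VTBits) {
  return sextInReg(X >> K, VTBits);
}
```

For any X and any K, VTBits with K + VTBits <= 32, `sextInRegOfSrl(X, K, VTBits)` and `bfeSigned(X, K, VTBits)` agree, which is why the two-node pattern can become a single bfe.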
Matt Arsenault efa3fe14d1 AMDGPU: Re-visit nodes in performAndCombine
This fixes test regressions when i64 loads/stores are marked Promote.

llvm-svn: 267240
2016-04-22 22:48:38 +00:00
Konstantin Zhuravlyov a40d8358e7 [AMDGPU] Insert nop pass: take care of outstanding feedback
- Switch a few loops to range-based for loops
- Fix nop insertion at the end of BB
- Fix formatting
- Check for endpgm

Differential Revision: http://reviews.llvm.org/D19380

llvm-svn: 267167
2016-04-22 17:04:51 +00:00
Nicolai Haehnle b0c9748709 AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic
Summary:
This intrinsic returns true if the current thread belongs to a live pixel
and false if it belongs to a pixel that we are executing only for derivative
computation. It will be used by Mesa to implement gl_HelperInvocation.

Note that for pixels that are killed during the shader, this implementation
also returns true, but it doesn't matter because those pixels are always
disabled in the EXEC mask.

This unearthed a corner case in the instruction verifier, which complained
about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but
correct code, so make the verifier accept it as such.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19191

llvm-svn: 267102
2016-04-22 04:04:08 +00:00
Matt Arsenault 98f8394e7c AMDGPU: Fix debug name of pass to better match
I get this wrong every time I try to debug this.

llvm-svn: 267030
2016-04-21 18:21:54 +00:00
Nicolai Haehnle 97788020c5 Split IntrReadArgMem into IntrReadMem and IntrArgMemOnly
Summary:
IntrReadWriteArgMem simply becomes IntrArgMemOnly.

The result is fewer intrinsic properties, which express their orthogonality
better and correspond more closely to the IR attributes.

Suggested by: Philip Reames

Reviewers: joker.eph, reames, tstellarAMD

Subscribers: jholewinski, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19291

llvm-svn: 267021
2016-04-21 17:48:02 +00:00
Sam Kolton 201398e8a3 [AMDGPU] Assembler: prevent parseDPPCtrlOps from eating invalid tokens
Reviewers: nhaustov, tstellarAMD

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19317

llvm-svn: 266984
2016-04-21 13:14:24 +00:00
Nikolay Haustov fb5c307ccd AMDGPU/SI: Assembler: improvements to support trap handlers.
Add ParseAMDGPURegister which can be invoked recursively for parsing lists.
Rename getRegForName to getSpecialRegForName.
Support legacy SP3 register list syntax: [s2,s3,s4,s5] or [flat_scratch_lo,flat_scratch_hi].
Add 64-bit registers TBA, TMA where missing.
Add some tests.

Differential Revision: http://reviews.llvm.org/D19163

llvm-svn: 266865
2016-04-20 09:34:48 +00:00
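A C++ sketch of how a legacy list like [s2,s3,s4,s5] can be validated and turned into a single register range. It only handles numbered s/v registers (not names like flat_scratch_lo), and `RegRange`, `parseSingleReg` and `parseRegList` are hypothetical names, not the assembler's ParseAMDGPURegister.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct RegRange {
  char Kind;       // 's' for SGPR, 'v' for VGPR
  unsigned First;  // index of the first register
  unsigned Count;  // number of consecutive registers
};

// Parses one register such as "s2" or "v10".
static bool parseSingleReg(const std::string &Tok, char &Kind, unsigned &Idx) {
  if (Tok.size() < 2 || (Tok[0] != 's' && Tok[0] != 'v'))
    return false;
  Idx = 0;
  for (std::size_t I = 1; I < Tok.size(); ++I) {
    if (Tok[I] < '0' || Tok[I] > '9')
      return false;
    Idx = Idx * 10 + static_cast<unsigned>(Tok[I] - '0');
  }
  Kind = Tok[0];
  return true;
}

// Accepts "[s2,s3,s4,s5]" and returns the equivalent of s[2:5]. All
// registers must be the same kind and strictly consecutive.
std::optional<RegRange> parseRegList(const std::string &Text) {
  if (Text.size() < 3 || Text.front() != '[' || Text.back() != ']')
    return std::nullopt;
  std::vector<std::string> Toks;
  std::string Cur;
  for (std::size_t I = 1; I + 1 < Text.size(); ++I) {
    if (Text[I] == ',') {
      Toks.push_back(Cur);
      Cur.clear();
    } else if (Text[I] != ' ') {
      Cur += Text[I];
    }
  }
  Toks.push_back(Cur);

  RegRange R{};
  for (std::size_t I = 0; I < Toks.size(); ++I) {
    char Kind = 0;
    unsigned Idx = 0;
    if (!parseSingleReg(Toks[I], Kind, Idx))
      return std::nullopt;
    if (I == 0)
      R = {Kind, Idx, 1};
    else if (Kind != R.Kind || Idx != R.First + R.Count)
      return std::nullopt;  // mixed kinds or a gap in the list
    else
      ++R.Count;
  }
  return R;
}
```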
Nicolai Haehnle b48275f134 Add IntrWrite[Arg]Mem intrinsic property
Summary:
This property is used to mark an intrinsic that only writes to memory, but
neither reads from memory nor has other side effects.

An example where this is useful is the llvm.amdgcn.buffer.store.format.*
intrinsic, which corresponds to a store instruction that goes through a special
buffer descriptor rather than through a plain pointer.

With this property, the intrinsic should still be handled as having side
effects at the LLVM IR level, but machine scheduling can make smarter
decisions.

Reviewers: tstellarAMD, arsenm, joker.eph, reames

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18291

llvm-svn: 266826
2016-04-19 21:58:33 +00:00
Nicolai Haehnle e2dda4f750 AMDGPU: Guard VOPC instructions against incorrect commute
Summary:
The added testcase, which triggered this, was derived from a shader-db case
via bugpoint. A separate question is why scalar branching wasn't used.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19208

llvm-svn: 266825
2016-04-19 21:58:22 +00:00
Nicolai Haehnle 7483937bf0 AMDGPU/SI: SGPR accounting in getSIProgramInfo must ignore exec_lo/hi
Summary:
A shader stored the live mask (initial exec mask) in an SGPR which was then
spilled during register allocation. The allocator quite reasonably
turned the spill into

  v_writelane_b32 %vgpr, exec_lo, N
  v_writelane_b32 %vgpr, exec_hi, N+1

at the beginning of the shader, confusing the SGPR accounting.

No test case, because si-sgpr-spill.ll together with an upcoming patch for
WQM handling exhibits the problem.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19199

llvm-svn: 266824
2016-04-19 21:58:17 +00:00
Konstantin Zhuravlyov 8c273ad719 [AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt
Also,
- Skip pass if machine module does not have debug info
- Minor comment changes
- Added test

Differential Revision: http://reviews.llvm.org/D19079

llvm-svn: 266626
2016-04-18 16:28:23 +00:00
Artem Tamazov e2762423c2 [AMDGPU][llvm-mc] s_setreg* - Fix order of operands
Order should match the sp3 syntax, where the destination (the simm16 denoting the hwreg) comes first.

Differential Revision: http://reviews.llvm.org/D19161

llvm-svn: 266617
2016-04-18 14:54:26 +00:00
Aaron Ballman 2eeefe8ed8 Silence some "initialized but unused" warnings from MSVC -- the function being called is a static function, so there's no need for an instance variable. NFC.
llvm-svn: 266616
2016-04-18 14:47:19 +00:00
Mehdi Amini b550cb1750 [NFC] Header cleanup
Removed some unused headers, replaced some headers with forward class declarations.

Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'

Patch by Eugene Kosov <claprix@yandex.ru>

Differential Revision: http://reviews.llvm.org/D19219

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
2016-04-18 09:17:29 +00:00
Matt Arsenault c10783c42d AMDGPU: Enable LocalStackSlotAllocation pass
This resolves more frame indexes early and folds
the immediate offsets into the scratch mubuf instructions.

This cleans up a lot of the mess that's currently emitted,
such as emitting add 0s and repeatedly initializing the same
register to 0 when spilling.

llvm-svn: 266508
2016-04-16 02:13:37 +00:00
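The effect described above (resolving frame indexes early, folding offsets, and not re-initializing the same register) can be pictured with a small greedy model. This is a hedged sketch, not the LocalStackSlotAllocation algorithm: it assumes a 12-bit unsigned MUBUF immediate offset, and `chooseFrameBases` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: a scratch MUBUF instruction can encode a 12-bit unsigned
// immediate offset (0..4095) relative to its base.
constexpr uint32_t kMaxMubufOffset = 4095;

// Given the frame offsets accessed in a function, returns the base values
// that must be materialized in registers so that every access can fold its
// remaining offset into the instruction. Offsets reachable from the frame
// base (0) need no extra register, and each new base is reused for as many
// following accesses as it can reach.
std::vector<uint32_t> chooseFrameBases(std::vector<uint32_t> Offsets) {
  std::sort(Offsets.begin(), Offsets.end());
  std::vector<uint32_t> Bases;
  uint32_t Current = 0;  // the implicit frame base
  for (uint32_t Off : Offsets) {
    if (Off - Current > kMaxMubufOffset) {
      Current = Off;  // materialize a new base once, then reuse it
      Bases.push_back(Off);
    }
  }
  return Bases;
}
```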
Matt Arsenault b6be202779 AMDGPU: Use s_addk_i32 / s_mulk_i32
llvm-svn: 266506
2016-04-16 01:46:49 +00:00
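A hedged C++ sketch of the selection rule, assuming the SOPK forms compute sdst = sdst op simm16 with a signed 16-bit immediate, so they apply only when the destination is also the source and the constant fits. `canUseKForm` and `selectSalu` are hypothetical and print assembly text rather than build MachineInstrs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// The K form needs dst == src and a constant that fits in simm16.
bool canUseKForm(unsigned DstReg, unsigned SrcReg, int64_t Imm) {
  return DstReg == SrcReg && Imm >= INT16_MIN && Imm <= INT16_MAX;
}

// Op is "add" or "mul"; prefers s_addk_i32 / s_mulk_i32 when legal and
// otherwise falls back to the two-operand s_add_i32 / s_mul_i32 form.
std::string selectSalu(const std::string &Op, unsigned Dst, unsigned Src,
                       int64_t Imm) {
  if (canUseKForm(Dst, Src, Imm))
    return "s_" + Op + "k_i32 s" + std::to_string(Dst) + ", " +
           std::to_string(Imm);
  return "s_" + Op + "_i32 s" + std::to_string(Dst) + ", s" +
         std::to_string(Src) + ", " + std::to_string(Imm);
}
```

The K forms carry the constant inside the 32-bit instruction word, which presumably is why they are preferred over a separate 32-bit literal when the value fits.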
Jun Bum Lim 4c5bd58ebe [MachineScheduler]Add support for store clustering
Perform store clustering just like load clustering. This change adds
StoreClusterMutation to the machine scheduler. To control StoreClusterMutation,
enableClusterStores() is added to TargetInstrInfo.h. This is enabled only on
AArch64 for now.

This change also adds support for unscaled stores, which were not handled in
getMemOpBaseRegImmOfs().

llvm-svn: 266437
2016-04-15 14:58:38 +00:00
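A C++ sketch of the clustering idea, loosely modeled on how load clustering groups memory operations: sort by base register and offset, then group neighbors that share a base. The cluster-size cap and the names `MemOpInfo` and `clusterStores` are assumptions for illustration; the real mutation also consults target hooks to decide whether each pair may be clustered, and is gated by enableClusterStores().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MemOpInfo {
  unsigned Index;    // position of the store in the scheduling region
  unsigned BaseReg;  // base address register
  int64_t Offset;    // immediate offset from the base
};

// Sorts stores by (BaseReg, Offset) and groups adjacent stores that share a
// base register, capping each cluster at MaxClusterSize. Returns the store
// indices of each cluster.
std::vector<std::vector<unsigned>>
clusterStores(std::vector<MemOpInfo> Ops, unsigned MaxClusterSize) {
  std::sort(Ops.begin(), Ops.end(),
            [](const MemOpInfo &A, const MemOpInfo &B) {
              if (A.BaseReg != B.BaseReg)
                return A.BaseReg < B.BaseReg;
              return A.Offset < B.Offset;
            });
  std::vector<std::vector<unsigned>> Clusters;
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    bool Extend = I > 0 && Ops[I - 1].BaseReg == Ops[I].BaseReg &&
                  Clusters.back().size() < MaxClusterSize;
    if (Extend)
      Clusters.back().push_back(Ops[I].Index);
    else
      Clusters.push_back({Ops[I].Index});
  }
  return Clusters;
}
```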
Nicolai Haehnle 750082d1fe AMDGPU/SI: Fix regression with no-return atomics
Summary:
In the added test-case, the atomic instruction feeds into a non-machine
CopyToReg node which hasn't been selected yet, so guard against
non-machine opcodes here.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19043

llvm-svn: 266433
2016-04-15 14:42:36 +00:00
Matt Arsenault 9c499c3a74 AMDGPU: Remove custom load/store scalarization
llvm-svn: 266385
2016-04-14 23:31:26 +00:00
Matt Arsenault fd8ab09c0e AMDGPU: Include LDS size in printed comment
llvm-svn: 266382
2016-04-14 22:11:51 +00:00