Commit Graph

2512 Commits

Author SHA1 Message Date
Daniil Fukalov 6e1dc68117 [AMDGPU] fix LDS f32 intrinsics
- using qualified pointer addrspace in intrinsics class to avoid .f32 mangling
- changed too common atomic mangling to ds
- added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic

Reviewed by: b-sumner

Differential Revision: https://reviews.llvm.org/D42383

llvm-svn: 323516
2018-01-26 11:09:38 +00:00
Geoff Berry c4796d4745 [AMDGPU] Make sure all super regs of reserved regs are marked reserved.
Summary:
Move reserveRegisterTuples into AMDGPURegisterInfo and use it in
R600RegisterInfo::getReservedRegs and
R600InstrInfo::reserveIndirectRegisters to ensure that all super
registers of reserved registers are also marked as reserved.

Before this change, under certain circumstances, the registers %t1_x and
%t1_xyzw would be marked as reserved, but %t1_xy and %t1_xyz would not
be, leading to the register allocator sometimes assigning a register to
%t1_xy, which is invalid since %t1_x is reserved.

Reviewers: arsenm, tstellar, MatzeB, qcolombet

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D42448

llvm-svn: 323356
2018-01-24 18:09:53 +00:00
Hiroshi Inoue 501931b117 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 323302
2018-01-24 05:04:35 +00:00
Mark Searles 7687d42052 [AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64
- Change inserted add ( V_ADD_{I|U}32_e32 ) to _e64 version ( V_ADD_{I|U}32_e64 ) so that the add uses a vreg for the carry; this prevents inserted v_add from killing VCC; the _e64 version doesn't accept a literal in its encoding, so we need to introduce a mov instr as well to get the imm into a register.
- Change pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts.

Differential Revision: https://reviews.llvm.org/D42124

llvm-svn: 323153
2018-01-22 21:46:43 +00:00
Hiroshi Inoue 290adb3184 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 323074
2018-01-22 05:54:46 +00:00
Dmitry Preobrazhensky 0e074e349d [AMDGPU][MC] Corrected parsing of image modifiers and encoding of image atomics
See bugs
    35962: https://bugs.llvm.org/show_bug.cgi?id=35962
    35963: https://bugs.llvm.org/show_bug.cgi?id=35963

Differential Revision: https://reviews.llvm.org/D42184

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322942
2018-01-19 13:49:53 +00:00
Matthias Braun 4a7c8e7aa2 Split MachineLICM into EarlyMachineLICM and MachineLICM; NFC
This avoids playing games with pseudo pass IDs and avoids using an
unreliable MRI::isSSA() check to determine whether register allocation
has happened.

Note that this renames:
- MachineLICMID -> EarlyMachineLICM
- PostRAMachineLICMID -> MachineLICMID
to be consistent with the EarlyTailDuplicate/TailDuplicate naming.

llvm-svn: 322927
2018-01-19 06:46:10 +00:00
Changpeng Fang ba6240cc71 AMDGPU/SI: Fix typos in d16 support patch the buffer intrinsics.
llvm-svn: 322906
2018-01-18 22:57:57 +00:00
Changpeng Fang 4737e892de AMDGPU/SI: Add d16 support for image intrinsics.
Summary:
  This patch implements d16 support for image load, image store and image sample intrinsics.

Reviewers:
  Matt, Brian.

Differential Revision:
  https://reviews.llvm.org/D3991

llvm-svn: 322903
2018-01-18 22:08:53 +00:00
Aditya Nandakumar 18b3f9d384 [GISel] Make constrainSelectedInstRegOperands() available to the legalizer. NFC
https://reviews.llvm.org/D42149

llvm-svn: 322743
2018-01-17 19:31:33 +00:00
Matt Arsenault 1491ca8911 AMDGPU: Error in SIAnnotateControlFlow instead of assert
This assert typically happens if an unstructured CFG is passed
to the pass. This can happen if the pass is run independently
without the structurizer.

llvm-svn: 322685
2018-01-17 16:30:01 +00:00
Daniil Fukalov d5fca554e2 [AMDGPU] add LDS f32 intrinsics
added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics
to allow generate ds_{add|min|max}[_rtn]_f32 instructions
needed for OpenCL float atomics in LDS

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D37985

llvm-svn: 322656
2018-01-17 14:05:05 +00:00
Dmitry Preobrazhensky 6b65f7c380 [AMDGPU][MC][GFX9] Enable inline constants for SDWA operands
See bug 35771: https://bugs.llvm.org/show_bug.cgi?id=35771

Differential Revision: https://reviews.llvm.org/D42058

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322655
2018-01-17 14:00:48 +00:00
Stanislav Mekhanoshin 62875fcd6c [AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32
Differential Revision: https://reviews.llvm.org/D41617

llvm-svn: 322500
2018-01-15 18:49:15 +00:00
Stanislav Mekhanoshin f630047ef6 [AMDGPU] Copy impdefs from pseudo to real instructions
In some cases we do not copy implicit defs from pseudo to real
VOP instructions. It has no visible impact at the moment thus no
tests are affected or added.

Differential Revision: https://reviews.llvm.org/D41783

llvm-svn: 322496
2018-01-15 17:55:35 +00:00
Tim Renouf 75ced9d5b8 [AMDGPU] stop image_store being moved illegally
Summary:
A recent change
321556: AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores
can allow the machine instruction scheduler to move an image store past
an image load using the same descriptor.

V2: Fixed by marking image ops as mayAlias and isAliased. This may be
overly conservative, and we may need to revisit.
V3: Reverted test change done on 321556.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: llvm-commits, t-tye, yaxunl, wdng, kzhuravl

Differential Revision: https://reviews.llvm.org/D41969

llvm-svn: 322419
2018-01-12 22:57:24 +00:00
Changpeng Fang 44dfa1de3b AMDGPU/SI: Add d16 support for buffer intrinsics.
Differential Revision:
  https://reviews.llvm.org/D38906

Reviewers:
  Matt and Brian.

llvm-svn: 322402
2018-01-12 21:12:19 +00:00
Dmitry Preobrazhensky 3afbd825a3 [AMDGPU][MC][GFX8][GFX9] Added XNACK_MASK support
See bug 35764: https://bugs.llvm.org/show_bug.cgi?id=35764

Differential Revision: https://reviews.llvm.org/D41614

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322189
2018-01-10 14:22:19 +00:00
Tim Renouf 6eaad1e539 [AMDGPU] Fixed incorrect uniform branch condition
Summary:
I had a case where multiple nested uniform ifs resulted in code that did
v_cmp comparisons, combining the results with s_and_b64, s_or_b64 and
s_xor_b64 and using the resulting mask in s_cbranch_vccnz, without first
ensuring that bits for inactive lanes were clear.

There was already code for inserting an "s_and_b64 vcc, exec, vcc" to
clear bits for inactive lanes in the case that the branch is instruction
selected as s_cbranch_scc1 and is then changed to s_cbranch_vccnz in
SIFixSGPRCopies. I have added the same code into SILowerControlFlow for
the case that the branch is instruction selected as s_cbranch_vccnz.

This de-optimizes the code in some cases where the s_and is not needed,
because vcc is the result of a v_cmp, or multiple v_cmp instructions
combined by s_and/s_or. We should add a pass to re-optimize those cases.

Reviewers: arsenm, kzhuravl

Subscribers: wdng, yaxunl, t-tye, llvm-commits, dstuttard, timcorringham, nhaehnle

Differential Revision: https://reviews.llvm.org/D41292

llvm-svn: 322119
2018-01-09 21:34:43 +00:00
Matt Arsenault 4ff5e002ea AMDGPU: Remove dead file
llvm-svn: 321752
2018-01-03 18:45:42 +00:00
Alex Bradbury b22f751fa7 Thread MCSubtargetInfo through Target::createMCAsmBackend
Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend. 
D20830 threaded an MCSubtargetInfo reference through 
MCAsmBackend::relaxInstruction, but this isn't the only function that would 
benefit from access. This patch removes the Triple and CPUString arguments 
from createMCAsmBackend and replaces them with MCSubtargetInfo.

This patch just changes the interface without making any intentional 
functional changes. Once in, several cleanups are possible:
* Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend
* Support 16-bit instructions when valid in MipsAsmBackend::writeNopData
* Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl
* Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221)

This change initially exposed PR35686, which has since been resolved in r321026.

Differential Revision: https://reviews.llvm.org/D41349

llvm-svn: 321692
2018-01-03 08:53:05 +00:00
Matt Arsenault e19bc2ee0f AMDGPU: Use unique PSVs for buffer resources
Also fixes using the wrong memory type for some
intrinsics when custom lowering them.

llvm-svn: 321557
2017-12-29 17:18:21 +00:00
Matt Arsenault d94b63d765 AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores
Atomics still have hasSideEffects set on them because
of the mess that is the memory properties.

llvm-svn: 321556
2017-12-29 17:18:18 +00:00
Matt Arsenault 905f3518ba AMDGPU: Implement getTgtMemIntrinsic for images
Currently all images are lowered to have a single
image PseudoSourceValue. Image stores happen to have
overly strict mayLoad/mayStore/hasSideEffects flags
set on them, so this happens to work. When these
are fixed to be correct, the scheduler breaks
this because the identical PSVs are assumed to
be the same address. These need to be unique
to the image resource value.

llvm-svn: 321555
2017-12-29 17:18:14 +00:00
Dmitry Preobrazhensky 414e05383f [AMDGPU][MC] Incorrect parsing of flat/global atomic modifiers
See bug 35730: https://bugs.llvm.org/show_bug.cgi?id=35730

Differential Revision: https://reviews.llvm.org/D41598

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 321552
2017-12-29 13:55:11 +00:00
Sanjoy Das 26d11ca4b0 (Re-landing) Expose a TargetMachine::getTargetTransformInfo function
Re-land r321234.  It had to be reverted because it broke the shared
library build.  The shared library build broke because there was a
missing LLVMBuild dependency from lib/Passes (which calls
TargetMachine::getTargetIRAnalysis) to lib/Target.  As far as I can
tell, this problem was always there but was somehow masked
before (perhaps because TargetMachine::getTargetIRAnalysis was a
virtual function).

Original commit message:

This makes the TargetMachine interface a bit simpler.  We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.

See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html

I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.

Reviewers: echristo, MatzeB, hfinkel

Reviewed By: hfinkel

Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D41464

llvm-svn: 321375
2017-12-22 18:21:59 +00:00
Dmitry Preobrazhensky 471adf7fdc [AMDGPU][MC] Corrected handling of negative expressions
See bug 35716: https://bugs.llvm.org/show_bug.cgi?id=35716

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41488

llvm-svn: 321372
2017-12-22 18:03:35 +00:00
Dmitry Preobrazhensky c5b0c172f6 [AMDGPU][MC] Corrected parsing of optional operands for ds_swizzle_b32
See bug 35645: https://bugs.llvm.org/show_bug.cgi?id=35645

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41186

llvm-svn: 321367
2017-12-22 17:13:28 +00:00
Dmitry Preobrazhensky 2713495318 [AMDGPU][MC] Added support of 256- and 512-bit tuples of ttmp registers
See bug 35561: https://bugs.llvm.org/show_bug.cgi?id=35561

This patch also affects implementation of SGPR and VGPR registers though changes are cosmetic.

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41437

llvm-svn: 321359
2017-12-22 15:18:06 +00:00
Sanjoy Das 747d1114d6 Revert "Expose a TargetMachine::getTargetTransformInfo function"
This reverts commit r321234.  It breaks the -DBUILD_SHARED_LIBS=ON build.

llvm-svn: 321243
2017-12-21 02:34:39 +00:00
Sanjoy Das 0c3de350b4 Expose a TargetMachine::getTargetTransformInfo function
Summary:
This makes the TargetMachine interface a bit simpler.  We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.

See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html

I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.

Reviewers: echristo, MatzeB, hfinkel

Reviewed By: hfinkel

Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D41464

llvm-svn: 321234
2017-12-21 01:06:58 +00:00
Matt Arsenault f7f59b5292 [AMDGPU, AsmParser] Enable the mnemonic spell corrector.
Patch by Dmitry Venikov

llvm-svn: 321202
2017-12-20 18:52:57 +00:00
Mark Searles e4f067ebe2 [AMDGPU] Turn off MergeConsecutiveStores() before Instruction Selection for AMDGPU. Commit dbbb6c5fc3642987430866dffdf710df4f616ac7 turned on MergeConsecutiveStores() before Instruction Selection for all targets. Enough AMDGPU compiles go into an infinite loop ( MergeConsecutiveStores() merges two stores; LegalizeStoreOps() un-merges; MergeConsecutiveStores() re-merges, etc. ) to warrant turning it off until the issues can be addressed.
Differential Revision: https://reviews.llvm.org/D41377

llvm-svn: 321100
2017-12-19 19:26:23 +00:00
Matthias Braun f1caa2833f MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Yaxun Liu c41e2f6e7b Recommit CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.value
The regression on ppc64 was not due to this commit.

llvm-svn: 320788
2017-12-15 03:56:57 +00:00
Matt Arsenault 7d7adf4f2e TLI: Allow using PSV for intrinsic mem operands
llvm-svn: 320756
2017-12-14 22:34:10 +00:00
Matt Arsenault 1117133687 DAG: Expose all MMO flags in getTgtMemIntrinsic
Rather than adding more bits to express every
MMO flag you could want, just directly use the
MMO flags. Also fixes using a bunch of bool arguments to
getMemIntrinsicNode.

On AMDGPU, buffer and image intrinsics should always
have MODereferencable set, but currently there is no
way to do that directly during the initial intrinsic
lowering.

llvm-svn: 320746
2017-12-14 21:39:51 +00:00
Yaxun Liu f902ef0a5d Revert CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.value
This commit might have caused regression on ppc64. Revert it to verify that.

llvm-svn: 320712
2017-12-14 16:12:04 +00:00
Yaxun Liu a5315a040d CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.value
Two issues were found about machine inst scheduler when compiling ProRender
with -g for amdgcn target:

GCNScheduleDAGMILive::schedule tries to update LiveIntervals for DBG_VALUE, which it
should not since DBG_VALUE is not mapped in LiveIntervals.

when DBG_VALUE is the last instruction of MBB, ScheduleDAGInstrs::buildSchedGraph and
ScheduleDAGMILive::scheduleMI does not move RPTracker properly, which causes assertion.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D41132

llvm-svn: 320650
2017-12-13 22:38:09 +00:00
Matt Arsenault cad7fa857c AMDGPU: Partially fix disassembly of MIMG instructions
Stores failed to decode at all since they didn't have a
DecoderNamespace set. Loads worked, but did not change
the register width displayed to match the numbmer of
enabled channels.

The number of printed registers for vaddr is still wrong,
but I don't think that's encoded in the instruction so
there's not much we can do about that.

Image atomics are still broken. MIMG is the same
encoding for SI/VI, but the image atomic classes
are split up into encoding specific versions unlike
every other MIMG instruction. They have isAsmParserOnly
set on them for some reason. dmask is also special for
these, so we probably should not have it as an explicit
operand as it is now.

llvm-svn: 320614
2017-12-13 21:07:51 +00:00
Craig Topper ac59db2efe [Targets] Don't automatically include the scheduler class enum from *GenInstrInfo.inc with GET_INSTRINFO_ENUM. Make targets request is separately.
Most of the targets don't need the scheduler class enum.

I have an X86 scheduler model change that causes some names in the enum to become about 18000 characters long. This is because using instregex in scheduler models causes the scheduler class to get named with every instruction that matches the regex concatenated together. MSVC has a limit of 4096 characters for an identifier name. Rather than trying to come up with way to reduce the name length, I'm just going to sidestep the problem by not including the enum in X86.

llvm-svn: 320552
2017-12-13 07:26:17 +00:00
Matthias Braun f842297d50 Rename LiveIntervalAnalysis.h to LiveIntervals.h
Headers/Implementation files should be named after the class they
declare/define.

Also eliminated an `#include "llvm/CodeGen/LiveIntervalAnalysis.h"` in
favor of `class LiveIntarvals;`

llvm-svn: 320546
2017-12-13 02:51:04 +00:00
Matt Arsenault 3e268cc0dd LSR: Check more intrinsic pointer operands
llvm-svn: 320424
2017-12-11 21:38:43 +00:00
Dmitry Preobrazhensky ac2b02643b [AMDGPU][MC][GFX9] Corrected encoding of ttmp registers, disabled tba/tma
See bugs 35494 and 35559:
https://bugs.llvm.org/show_bug.cgi?id=35494
https://bugs.llvm.org/show_bug.cgi?id=35559

Reviewers: vpykhtin, artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41007

llvm-svn: 320375
2017-12-11 15:23:20 +00:00
Konstantin Zhuravlyov c40d9f2e5d AMDGPU/GCN: Bring processors in sync with AMDGPUUsage
- Add gfx704
    - Change bonaire to gfx704
  - Remove gfx804
  - Remove gfx901
  - Remove gfx903

Differential Revision: https://reviews.llvm.org/D40046

llvm-svn: 320194
2017-12-08 20:52:28 +00:00
Matt Arsenault 73ce93b08b AMDGPU: Set IntrReadMem on memtime intrinsics
llvm-svn: 320188
2017-12-08 20:01:02 +00:00
Matt Arsenault 856777d8c9 AMDGPU: image_getlod and image_getresinfo do not read memory
llvm-svn: 320187
2017-12-08 20:00:57 +00:00
Matt Arsenault ecad0d5364 AMDGPU: Preserve MMO in adjustWritemask
Follow up to r319705. Currently the MMO is
produced after this in the custom inserter,
so this doesn't change anything yet.

llvm-svn: 320186
2017-12-08 20:00:45 +00:00
Konstantin Zhuravlyov e30f88f3a9 AMDGPU: Report Arg's Value name in metadata if kernel_arg_name metadata is not available
Differential Revision: https://reviews.llvm.org/D40924

llvm-svn: 320176
2017-12-08 19:22:12 +00:00
Tim Renouf cead41d42f [AMDGPU] add labels to +DumpCode output
Summary:
+DumpCode is a hack to embed disassembly in the ELF file. This commit
fixes it to include labels, to make it slightly more useful.

Reviewers: arsenm, kzhuravl

Subscribers: nhaehnle, timcorringham, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl

Differential Revision: https://reviews.llvm.org/D40169

llvm-svn: 320146
2017-12-08 14:09:34 +00:00
Mark Searles 9ebdbb433a [AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output."
Patch caused a buildbot failure; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/15733/steps/build_Lld/logs/stdio :
lib/Target/AMDGPU/SIInsertWaitcnts.cpp:396:11: error: private field 'InstCnt' is not used [-Werror,-Wunused-private-field]
  int32_t InstCnt = 0;
          ^
1 error generated.
"
This reverts commit 71627f79010aafe74fdcba901bba28dd7caa0869.

llvm-svn: 320086
2017-12-07 21:14:41 +00:00
Mark Searles a84d23489a [AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output.
-amdgpu-waitcnt-forcezero={1|0}  Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-amdgpu-waitcnt-forceexp=<n>  Force emit a s_waitcnt expcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcevm=<n>   Force emit a s_waitcnt vmcnt(0) before the first <n> instrs

Differential Revision: https://reviews.llvm.org/D40091

llvm-svn: 320084
2017-12-07 20:36:39 +00:00
Mark Searles d29f24acfb [AMDGPU] Add GCNHazardRecognizer::checkInlineAsmHazards() and GCNHazardRecognizer::checkVALUHazardsHelper(). checkInlineAsmHazards() checks INLINEASM for hazards that we particularly care about (so not exhaustive); this patch adds a check for INLINEASM that defs vregs that hold data-to-be stored by immediately preceding store of more than 8 bytes. If the instr were not within an INLINEASM, this scenario would be handled by checkVALUHazard(). Add checkVALUHazardsHelper(), which will be called by both checkVALUHazards() and checkInlineAsmHazards().
Differential Revision: https://reviews.llvm.org/D40098

llvm-svn: 320083
2017-12-07 20:34:25 +00:00
Francis Visoiu Mistrih a8a83d150f [CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register.
Work towards the unification of MIR and debug output by refactoring the
interfaces.

For MachineOperand::print, keep a simple version that can be easily called
from `dump()`, and a more complex one which will be called from both the
MIRPrinter and MachineInstr::print.

Add extra checks inside MachineOperand for detached operands (operands
with getParent() == nullptr).

https://reviews.llvm.org/D40836

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g'

llvm-svn: 320022
2017-12-07 10:40:31 +00:00
Matt Arsenault 8ae38bc0bd AMDGPU: Fix SDWA crash on inline asm
This was only searching for explicit defs,
and asserting for any implicit or variadic
instruction defs, like inline asm.

llvm-svn: 319826
2017-12-05 20:32:01 +00:00
Matt Arsenault 7f0a527300 AMDGPU: Fix infinite loop with dbg_value
Surprisingly SIOptimizeExecMaskingPreRA can infinite loop
in some case with DBG_VALUE. Most tests using dbg_value are
run at -O0, so don't run this pass. This seems to only
happen when the value argument is undef.

llvm-svn: 319808
2017-12-05 18:23:17 +00:00
Matt Arsenault e42b08d96d AMDGPU: Fix missing subtarget feature initializer
llvm-svn: 319733
2017-12-05 03:15:44 +00:00
Matt Arsenault 9a60c3ea36 AMDGPU: Fix crash when scheduling DBG_VALUE
This calls handleMove with a DBG_VALUE instruction,
which isn't tracked by LiveIntervals. I'm not sure
this is the correct place to fix this. The generic
scheduler seems to have more deliberate region
selection that skips dbg_value.

The test is also really hard to reduce. I haven't been able
to figure out what exactly causes this particular case to
try moving the dbg_value.

llvm-svn: 319732
2017-12-05 03:09:23 +00:00
Jan Vesely 39aeab4f30 AMDGPU/EG: Add a new FeatureFMA and use it to selectively enable FMA instruction
Only used by pre-GCN targets
v2: fix predicate setting for FMA_Common

Differential Revision: https://reviews.llvm.org/D40692

llvm-svn: 319712
2017-12-04 23:07:28 +00:00
Jan Vesely d1c9b61e2b AMDGPU: Disable fp64 support on pre GCN asics
It's not implemented.
Passing +fp64-fp16-denormal feature enables fp64 even on asics that don't support it

v2: fix hasFP64 query

Differential Revision: https://reviews.llvm.org/D39931

llvm-svn: 319709
2017-12-04 22:57:29 +00:00
Matt Arsenault 68f0505263 AMDGPU: Fix creating invalid copy when adjusting dmask
Move the entire optimization to one place. Before it was possible
to adjust dmask without changing the register class of the output
instruction, since they were done in separate places. Fix all
lane sizes and move all of the optimization into the DAG folding.

llvm-svn: 319705
2017-12-04 22:18:27 +00:00
Matt Arsenault e6667ded4d AMDGPU: Use return value of MorphNodeTo
llvm-svn: 319704
2017-12-04 22:18:22 +00:00
Francis Visoiu Mistrih 25528d6de7 [CodeGen] Unify MBB reference format in both MIR and debug output
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.

The MIR printer prints the IR name of a MBB only for block definitions.

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix

Differential Revision: https://reviews.llvm.org/D40422

llvm-svn: 319665
2017-12-04 17:18:51 +00:00
Sam Kolton 5f7f32c382 [AMDGPU] SDWA: add support for PRESERVE into SDWA peephole.
Summary:

Reviewers: arsenm, vpykhtin, rampitec

Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D37817

llvm-svn: 319662
2017-12-04 16:22:32 +00:00
Tim Corringham 6c6d5e24cd AMDGPU: fix missing s_waitcnt
Summary:
The pass that inserts s_waitcnt instructions where needed propagated
info used to track dependencies for each block by iterating over the
predecessor blocks. The iteration was terminated when a predecessor
that had not yet been processed was encountered. Any info in blocks
later in the list was therefore not processed, leading to the
possiblility of a required s_waitcnt not being inserted.

The fix is simply to change the "break" to "continue" for the
relevant loops, so that all visited blocks are processed. This
is likely what was intended when the code was written.

There is no test case provided for this fix because:
1) the only example that reproduces this is large and resistant to
being reduced
2) the change is trivial

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D40544

llvm-svn: 319651
2017-12-04 12:30:49 +00:00
Alexander Timofeev c1425c9d6b [AMDGPU] SiFixSGPRCopies should not modify non-divergent PHI
Differential revision: https://reviews.llvm.org/D40556

llvm-svn: 319534
2017-12-01 11:56:34 +00:00
Matt Arsenault 686d5c728f AMDGPU: Use carry-less adds in FI elimination
llvm-svn: 319501
2017-11-30 23:42:30 +00:00
Matt Arsenault 84445dd13c AMDGPU: Use gfx9 carry-less add/sub instructions
llvm-svn: 319491
2017-11-30 22:51:26 +00:00
Francis Visoiu Mistrih 93ef145862 [CodeGen] Print "%vreg0" as "%0" in both MIR and debug output
As part of the unification of the debug format and the MIR format, avoid
printing "vreg" for virtual registers (which is one of the current MIR
possibilities).

Basically:

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g"
* grep -nr '%vreg' . and fix if needed
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g"
* grep -nr 'vreg[0-9]\+' . and fix if needed

Differential Revision: https://reviews.llvm.org/D40420

llvm-svn: 319427
2017-11-30 12:12:19 +00:00
Matt Arsenault caf0ed4d74 AMDGPU: Allow negative MUBUF vaddr for gfx9
GFX9 does not enable bounds checking for the resource descriptors
used for private access, so it should be OK to use vaddr with
a potentially negative value.

llvm-svn: 319393
2017-11-30 00:52:40 +00:00
Dmitry Preobrazhensky 1ac7177abb [AMDGPU][MC][GFX9] Corrected mapping of GFX9 v_add/sub/subrev_u32
When translating pseudo to MC, v_add/sub/subrev_u32 shall be mapped via a separate table as GFX8 has opcodes with the same names.
These instructions shall also be labelled as renamed for pseudoToMCOpcode to handle them correctly.

Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D40550

llvm-svn: 319311
2017-11-29 13:33:40 +00:00
Matt Arsenault b655fa9ce2 DAG: Add nuw when splitting loads and stores
The object can't straddle the address space
wrap around, so I think it's OK to assume any
offsets added to the base object pointer can't
overflow. Similar logic already appears to be
applied in SelectionDAGBuilder when lowering
aggregate returns.

llvm-svn: 319272
2017-11-29 01:25:12 +00:00
Matt Arsenault 3f71c0e3ee AMDGPU: Select DS insts without m0 initialization
GFX9 stopped using m0 for most DS instructions. Select
a different instruction without the use. I think this will
be less error prone than trying to manually maintain m0
uses as needed.

llvm-svn: 319270
2017-11-29 00:55:57 +00:00
Matt Arsenault 607a756651 AMDGPU: Enable IPRA
llvm-svn: 319256
2017-11-28 23:40:12 +00:00
Konstantin Zhuravlyov 06ae4ec78e AMDGPU: Add num spilled s/vgprs to metadata
This was requested by tools.

Differential Revision: https://reviews.llvm.org/D40321

llvm-svn: 319192
2017-11-28 17:51:08 +00:00
Francis Visoiu Mistrih 9d7bb0cb40 [CodeGen] Print register names in lowercase in both MIR and debug output
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.

* Only debug printing is affected. It now follows MIR.

Differential Revision: https://reviews.llvm.org/D40417

llvm-svn: 319187
2017-11-28 17:15:09 +00:00
Francis Visoiu Mistrih 9d419d3b0c [CodeGen] Rename functions PrintReg* to printReg*
LLVM Coding Standards:
  Function names should be verb phrases (as they represent actions), and
  command-like function should be imperative. The name should be camel
  case, and start with a lower case letter (e.g. openFile() or isFoo()).

Differential Revision: https://reviews.llvm.org/D40416

llvm-svn: 319168
2017-11-28 12:42:37 +00:00
Nicolai Haehnle b4f28deda0 AMDGPU: Re-organize the outer loop of SILoadStoreOptimizer
Summary:
The entire algorithm operates per basic-block, so for cache locality
it should be better to re-optimize a basic-block immediately rather than
in a separate loop.

I don't have performance measurements.

Change-Id: I85106570bd623c4ff277faaa50ee43258e1ddcc5

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D40344

llvm-svn: 319156
2017-11-28 08:42:46 +00:00
Nicolai Haehnle 39980dac0b AMDGPU: Consistently check for immediates in SIInstrInfo::FoldImmediate
Summary:
The PeepholeOptimizer pass calls this function solely based on checking
DefMI->isMoveImmediate(), which only checks the MoveImm bit of the
instruction description. So it's up to FoldImmediate itself to properly
check that DefMI *actually* moves from an immediate.

I don't have a separate test case for this, but the next patch introduces
a test case which happens to crash without this change.

This error is caught by the assertion in MachineOperand::getImm().

Change-Id: I88e7cdbcf54d75e1a296822e6fe5f9a5f095bbf8

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D40342

llvm-svn: 319155
2017-11-28 08:41:50 +00:00
Dmitry Preobrazhensky 16608e67d3 [AMDGPU][MC][DISASSEMBLER][GFX9] Corrected decoding of GLOBAL/SCRATCH opcodes
See bug 35433: https://bugs.llvm.org/show_bug.cgi?id=35433

Differential Revision: https://reviews.llvm.org/D40493

Reviewers: artem.tamazov, SamWot, arsenm
llvm-svn: 319050
2017-11-27 17:14:35 +00:00
Vedran Miletic ad21f2687d [AMDGPU] Add custom lowering for llvm.log{,10}.{f16,f32} intrinsics
AMDGPU backend errors with "unsupported call to function" upon
encountering a call to llvm.log{,10}.{f16,f32} intrinsics. This patch
adds custom lowering to avoid that error on both R600 and SI.

Reviewers: arsenm, jvesely

Subscribers: tstellar

Differential Revision: https://reviews.llvm.org/D29942

llvm-svn: 319025
2017-11-27 13:26:38 +00:00
Dmitry Preobrazhensky 0e8924a5c7 [AMDGPU][MC][GFX9] Added v_interp_p2_f16 and v_interp_p2_legacy_f16
See bug 33629: https://bugs.llvm.org//show_bug.cgi?id=33629

Reviewers: artem.tamazov, SamWot, arsenm

Differential Revision: https://reviews.llvm.org/D39488

llvm-svn: 318955
2017-11-24 15:37:14 +00:00
Benjamin Kramer 51ebcaaf25 Make helpers static. NFC.
llvm-svn: 318953
2017-11-24 14:55:41 +00:00
Dmitry Preobrazhensky dd2f1c993e [AMDGPU][MC][GFX9] Added support of 'inst_offset' modifier for compatibility with SP3
See bug 35329: https://bugs.llvm.org//show_bug.cgi?id=35329

Reviewers: arsenm, vpykhtin, artem.tamazov

Differential Revision: https://reviews.llvm.org/D40350

llvm-svn: 318947
2017-11-24 13:22:38 +00:00
Yaxun Liu c596226604 [AMDGPU] Fix SITargetLowering::LowerCall for pointer info of byval argument
SITargetLowering::LowerCall uses dummy pointer info for byval argument, which causes
flat load instead of buffer load.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D40040

llvm-svn: 318844
2017-11-22 16:13:35 +00:00
Nicolai Haehnle dd059c161d AMDGPU: Consider memory dependencies with moved instructions in SILoadStoreOptimizer
Summary:
This bug seems to have gone unnoticed because critical cases with LDS
instructions are eliminated by the peephole optimizer.

However, equivalent situations arise with buffer loads and stores
as well, so this fixes regressions since r317751 ("AMDGPU: Merge
S_BUFFER_LOAD_DWORD_IMM into x2, x4").

Fixes at least:
KHR-GL45.shader_storage_buffer_object.basic-operations-case1-cs
KHR-GL45.cull_distance.functional
piglit tes-input-gl_ClipDistance.shader_test
... and probably more

Change-Id: I0e371536288eb8e6afeaa241a185266fd45d129d

Reviewers: arsenm, mareko, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D40303

llvm-svn: 318829
2017-11-22 12:25:21 +00:00
Dmitry Preobrazhensky a0342dc9eb [AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev}
See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765

Reviewers: tamazov, SamWot, arsenm, vpykhtin

Differential Revision: https://reviews.llvm.org/D40088

llvm-svn: 318675
2017-11-20 18:24:21 +00:00
Valery Pykhtin f2fe9725ea AMDGPU: Partial ILP scheduler port from SelectionDAG to SchedulingDAG (experimental)
Differential revision: https://reviews.llvm.org/D39897

llvm-svn: 318649
2017-11-20 14:35:53 +00:00
Matt Arsenault a41351e37c AMDGPU: Move hazard avoidance out of waitcnt pass.
This is mostly moving VMEM clause breaking into
the hazard recognizer. Also move another hazard
currently handled in the waitcnt pass.

Also stops breaking clauses unless xnack is enabled.

llvm-svn: 318557
2017-11-17 21:35:32 +00:00
Dmitry Preobrazhensky 682a654758 [AMDGPU][MC][GFX9][disassembler] Corrected decoding of op_sel_hi for v_mad_mix*
See bug 35148: https://bugs.llvm.org//show_bug.cgi?id=35148

Reviewers: tamazov, SamWot, arsenm

Differential Revision: https://reviews.llvm.org/D39492

llvm-svn: 318526
2017-11-17 15:15:40 +00:00
Matt Arsenault 4512d0a68b AMDGPU: Replace list of SMEM buffer opcodes
llvm-svn: 318506
2017-11-17 04:18:26 +00:00
Matt Arsenault 03c67d1eb2 AMDGPU: Fix breaking SMEM clauses
This was completely ignoring subregisters,
so was not very useful. Also only break them
if xnack is actually enabled.

llvm-svn: 318505
2017-11-17 04:18:24 +00:00
David Blaikie b3bde2ea50 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Eric Christopher 6348188e87 Fix thinko in last commit.
llvm-svn: 318374
2017-11-16 03:25:02 +00:00
Eric Christopher 3148a1be88 Add NDEBUG checks around LLVM_DUMP_METHOD functions for Wunused-function warnings.
llvm-svn: 318373
2017-11-16 03:18:15 +00:00
Daniel Sanders f76f315436 [globalisel][tablegen] Generate rule coverage and use it to identify untested rules
Summary:
This patch adds a LLVM_ENABLE_GISEL_COV which, like LLVM_ENABLE_DAGISEL_COV,
causes TableGen to instrument the generated table to collect rule coverage
information. However, LLVM_ENABLE_GISEL_COV goes a bit further than
LLVM_ENABLE_DAGISEL_COV. The information is written to files
(${CMAKE_BINARY_DIR}/gisel-coverage-* by default). These files can then be
concatenated into ${LLVM_GISEL_COV_PREFIX}-all after which TableGen will
read this information and use it to emit warnings about untested rules.

This technique could also be used by SelectionDAG and can be further
extended to detect hot rules and give them priority over colder rules.

Usage:
* Enable LLVM_ENABLE_GISEL_COV in CMake
* Build the compiler and run some tests
* cat gisel-coverage-[0-9]* > gisel-coverage-all
* Delete lib/Target/*/*GenGlobalISel.inc*
* Build the compiler

Known issues:
* ${LLVM_GISEL_COV_PREFIX}-all must be generated as a manual
  step due to a lack of a portable 'cat' command. It should be the
  concatenation of all ${LLVM_GISEL_COV_PREFIX}-[0-9]* files.
* There's no mechanism to discard coverage information when the ruleset
  changes

Depends on D39742

Reviewers: ab, qcolombet, t.p.northover, aditya_nandakumar, rovka

Reviewed By: rovka

Subscribers: vsk, arsenm, nhaehnle, mgorny, kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D39747

llvm-svn: 318356
2017-11-16 00:46:35 +00:00
Daniel Sanders 725584e26d Add backend name to Target to enable runtime info to be fed back into TableGen
Summary:
Make it possible to feed runtime information back to tablegen to enable
profile-guided tablegen-eration, detection of untested tablegen definitions, etc.

Being a cross-compiler by nature, LLVM will potentially collect data for multiple
architectures (e.g. when running 'ninja check'). We therefore need a way for
TableGen to figure out what data applies to the backend it is generating at the
time. This patch achieves that by including the name of the 'def X : Target ...'
for the backend in the TargetRegistry.

Reviewers: qcolombet

Reviewed By: qcolombet

Subscribers: jholewinski, arsenm, jyknight, aditya_nandakumar, sdardis, nemanjai, ab, nhaehnle, t.p.northover, javed.absar, qcolombet, llvm-commits, fedor.sergeev

Differential Revision: https://reviews.llvm.org/D39742

llvm-svn: 318352
2017-11-15 23:55:44 +00:00
Matt Arsenault 301162c4fe AMDGPU: Replace i64 add/sub lowering
Use VOP3 add/addc like usual.

This has some tradeoffs. Inline immediates fold
a little better, but other constants are worse off.
SIShrinkInstructions could be made smarter to handle
these cases.

This allows us to avoid selecting scalar adds where we
need to track the carry in scc and replace its users.
This makes it easier to use the carryless VALU adds.

llvm-svn: 318340
2017-11-15 21:51:43 +00:00
Matt Arsenault 10c472dd83 AMDGPU: Add separate definitions for DS insts without m0 use
llvm-svn: 318246
2017-11-15 01:34:06 +00:00
Matt Arsenault 45b98189bd AMDGPU: Don't use MUBUF vaddr if address may overflow
Effectively revert r263964. Before we would not
allow this if vaddr was not known to be positive.

llvm-svn: 318240
2017-11-15 00:45:43 +00:00
Matt Arsenault c8903125cd AMDGPU: Handle or in multi-use shl ptr combine
llvm-svn: 318223
2017-11-14 23:46:42 +00:00
Matt Arsenault 9ba465a972 AMDGPU: Error on stack size overflow
llvm-svn: 318189
2017-11-14 20:33:14 +00:00
Matt Arsenault 57c37b2dcd AMDGPU: Fix producing saveexec when the copy is spilled
If the register from the copy from exec was spilled,
the copy before the spill was deleted leaving a spill
of undefined register verifier error and miscompiling.
Check for other use instructions of the copy register.

llvm-svn: 318132
2017-11-14 02:16:54 +00:00
Matt Arsenault 4b7938c658 AMDGPU: Fix not converting d16 load/stores to offset
Fixes missed optimization with new MUBUF instructions.

llvm-svn: 318106
2017-11-13 23:24:26 +00:00
Matt Arsenault 4eea3f3da3 AMDGPU: Implement computeKnownBitsForTargetNode for mbcnt
llvm-svn: 318100
2017-11-13 22:55:05 +00:00
Jan Vesely b17f32040c AMDGPU: Drop duplicate setOperationAction
These are set with other scalar int ops few lines up

Differential Revision: https://reviews.llvm.org/D39928

llvm-svn: 318051
2017-11-13 16:46:07 +00:00
Matt Arsenault e5e0c742df AMDGPU: Preserve nuw in shl add ptr combine
llvm-svn: 318017
2017-11-13 05:33:35 +00:00
Matt Arsenault fbe9533509 AMDGPU: Fix multi-use shl/add combine
This was using a custom function that didn't handle the
addressing modes properly for private. Use
isLegalAddressingMode to avoid duplicating this.

Additionally, skip the combine if there is only one use
since the standard combine will handle it.

llvm-svn: 318013
2017-11-13 05:11:54 +00:00
Matt Arsenault e1cd482fda AMDGPU: Select d16 loads into low component of register
llvm-svn: 318005
2017-11-13 00:22:09 +00:00
Konstantin Zhuravlyov 27b0a033d8 AMDGPU/NFC: Split Processors.td into GCNProcessors.td and R600Processors.td
Differential Revision: https://reviews.llvm.org/D39880

llvm-svn: 317920
2017-11-10 20:01:58 +00:00
Yaxun Liu 35845f06a4 [AMDGPU] Fix pointer info for lowering load/store for r600 for amdgiz environment
r600 uses dummy pointer info for lowering load/store. Since dummy pointer info
assumes address space 0, this causes isel failure when temporary load/store SDNodes
are generated for amdgiz environment.

Since the offest is not constant, FixedStack pseudo source value cannot be used
to create the pointer info. This patch creates pointer info using llvm undef value.
At least this provides correct address space so that isel can be done correctly.

Differential Revision: https://reviews.llvm.org/D39698

llvm-svn: 317862
2017-11-10 02:03:28 +00:00
Yaxun Liu 920cc2f813 [AMDGPU] Fix pointer info for pseudo source for r600
The pointer info for pseudo source for r600 is not correct when
alloca addr space is not 0, which causes invalid SDNode for r600---amdgiz.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D39670

llvm-svn: 317861
2017-11-10 01:53:24 +00:00
Vitaly Buka bee1964d80 Fix "default label in switch which covers all enumeration values" warning
llvm-svn: 317771
2017-11-09 07:46:13 +00:00
Marek Olsak 58410f37ff AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4
Summary:
Only 56 shaders (out of 48486) are affected.

Totals from affected shaders (changed stats only):
SGPRS: 2420 -> 2460 (1.65 %)
Spilled VGPRs: 94 -> 112 (19.15 %)
Scratch size: 524 -> 528 (0.76 %) dwords per thread
Code Size: 187400 -> 184992 (-1.28 %) bytes

One DiRT Showdown shader spills 6 more VGPRs.
One Grid Autosport shader spills 12 more VGPRs.

The other 54 shaders only have a decrease in code size.
(I'm ignoring the SGPR noise)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39012

llvm-svn: 317755
2017-11-09 01:52:55 +00:00
Marek Olsak 5cec64195c AMDGPU: Lower buffer store and atomic intrinsics manually
Summary:
Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every
buffer store and atomic instruction.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39060

llvm-svn: 317754
2017-11-09 01:52:48 +00:00
Marek Olsak 4c421a2db2 AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4
Summary: Only 3 (out of 48486) shaders are affected.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38951

llvm-svn: 317753
2017-11-09 01:52:36 +00:00
Marek Olsak 6a0548acaa AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4
Summary:
-9.9% code size decrease in affected shaders.

Totals (changed stats only):
SGPRS: 2151462 -> 2170646 (0.89 %)
VGPRS: 1634612 -> 1640288 (0.35 %)
Spilled SGPRs: 8942 -> 8940 (-0.02 %)
Code Size: 52940672 -> 51727288 (-2.29 %) bytes
Max Waves: 373066 -> 371718 (-0.36 %)

Totals from affected shaders:
SGPRS: 283520 -> 302704 (6.77 %)
VGPRS: 227632 -> 233308 (2.49 %)
Spilled SGPRs: 3966 -> 3964 (-0.05 %)
Code Size: 12203080 -> 10989696 (-9.94 %) bytes
Max Waves: 44070 -> 42722 (-3.06 %)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38950

llvm-svn: 317752
2017-11-09 01:52:30 +00:00
Marek Olsak b953cc36e2 AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4
Summary:
Only constant offsets (*_IMM opcodes) are merged.
It reuses code for LDS load/store merging.
It relies on the scheduler to group loads.

The results are mixed, I think they are mostly positive. Most shaders are
affected, so here are total stats only:

 SGPRS: 2072198 -> 2151462 (3.83 %)
 VGPRS: 1628024 -> 1634612 (0.40 %)
 Spilled SGPRs: 7883 -> 8942 (13.43 %)
 Spilled VGPRs: 97 -> 101 (4.12 %)
 Scratch size: 1488 -> 1492 (0.27 %) dwords per thread
 Code Size: 60222620 -> 52940672 (-12.09 %) bytes
 Max Waves: 374337 -> 373066 (-0.34 %)

There is 13.4% increase in SGPR spilling, DiRT Showdown spills a few more
VGPRs (now 37), but 12% decrease in code size.

These are the new stats for SGPR spilling. We already spill a lot SGPRs,
so it's uncertain whether more spilling will make any difference since
SGPRs are always spilled to VGPRs:

 SGPR SPILLING APPS   Shaders SpillSGPR AvgPerSh
 alien_isolation         2938       100      0.0
 batman_arkham_origins    589         6      0.0
 bioshock-infinite       1769         4      0.0
 borderlands2            3968        22      0.0
 counter_strike_glob..   1142        60      0.1
 deus_ex_mankind_div..   1410        79      0.1
 dirt-showdown            533         4      0.0
 dirt_rally               364      1163      3.2
 divinity                1052         2      0.0
 dota2                   1747         7      0.0
 f1-2015                  776      1515      2.0
 grid_autosport          1767      1505      0.9
 hitman                  1413       273      0.2
 left_4_dead_2           1762         4      0.0
 life_is_strange         1296        26      0.0
 mad_max                  358        96      0.3
 metro_2033_redux        2670        60      0.0
 payday2                 1362        22      0.0
 portal                   474         3      0.0
 saints_row_iv           1704         8      0.0
 serious_sam_3_bfe        392      1348      3.4
 shadow_of_mordor        1418        12      0.0
 shadow_warrior          3956       239      0.1
 talos_principle          324      1735      5.4
 thea                     172        17      0.1
 tomb_raider             1449       215      0.1
 total_war_warhammer      242        56      0.2
 ue4_effects_cave         295        55      0.2
 ue4_elemental            572        12      0.0
 unigine_tropics          210        56      0.3
 unigine_valley           278       152      0.5
 victor_vran             1262        84      0.1
 yofrankie                 82         2      0.0

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38949

llvm-svn: 317751
2017-11-09 01:52:23 +00:00
Marek Olsak ffadcb744b AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM
Summary:
-5.3% code size in affected shaders.

Changed stats only:

48486 shaders in 30489 tests
Totals:
SGPRS: 2086406 -> 2072430 (-0.67 %)
VGPRS: 1626872 -> 1627960 (0.07 %)
Spilled SGPRs: 7865 -> 7912 (0.60 %)
Code Size: 60978060 -> 60188764 (-1.29 %) bytes
Max Waves: 374530 -> 374342 (-0.05 %)

Totals from affected shaders:
SGPRS: 299664 -> 285688 (-4.66 %)
VGPRS: 233844 -> 234932 (0.47 %)
Spilled SGPRs: 3959 -> 4006 (1.19 %)
Code Size: 14905272 -> 14115976 (-5.30 %) bytes
Max Waves: 46202 -> 46014 (-0.41 %)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38915

llvm-svn: 317750
2017-11-09 01:52:17 +00:00
David Blaikie 3f833edc7c Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.

llvm-svn: 317647
2017-11-08 01:01:31 +00:00
Matt Arsenault 4709ab9124 AMDGPU: Set correct sched model on v_mad_u64_u32
llvm-svn: 317645
2017-11-08 00:48:25 +00:00
Matt Arsenault 6119f80034 AMDGPU: Remove redundant combine
This combine was already done in two places. The
generic combiner already has done this since
r217610, for adds (with a single use).

This one was added in r303641, and added support for handling
or as well. r313251 later added support to the generic
combine for or. It also turns out the isOrEquivalentToAdd
check is not necessary for this combine.

Additionally, we already reproduce this combine in yet
another place in the backend, although in that version
multiple uses of the add are still folded if it will
allow a fold into the addressing mode. That version needs
to be improved to understand ors though, as well as the
correct legal offsets for private.

llvm-svn: 317526
2017-11-07 00:06:32 +00:00
Matt Arsenault 4f6318fe1b AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32
llvm-svn: 317492
2017-11-06 17:04:37 +00:00
Sanjay Patel 629c411538 [IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more recently:
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html

...this is a step in cleaning up our fast-math-flags implementation in IR to better match
the capabilities of both clang's user-visible flags and the backend's flags for SDNode.

As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 
'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic 
reassociation - 'AllowReassoc'.

We're also adding a bit to allow approximations for library functions called 'ApproxFunc' 
(this was initially proposed as 'libm' or similar).

...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did 
look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), 
but that's apparently already used for other purposes. Also, I don't think we can just 
add a field to FPMathOperator because Operator is not intended to be instantiated. 
We'll defer movement of FMF to another day.

We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
...made me think we want to keep the shortcut synonym.

Finally, this change is binary incompatible with existing IR as seen in the 
compatibility tests. This statement:
"Newer releases can ignore features from older releases, but they cannot miscompile 
them. For example, if nsw is ever replaced with something else, dropping it would be 
a valid way to upgrade the IR." 
( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR 
version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will 
fail to optimize some previously 'fast' code because it's no longer recognized as 
'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.

Note: an inter-dependent clang commit to use the new API name should closely follow 
commit.

Differential Revision: https://reviews.llvm.org/D39304

llvm-svn: 317488
2017-11-06 16:27:15 +00:00
Yaxun Liu cc56a8b108 [AMDGPU] Change alloca addr space of r600 to 5 for amdgiz environment
Differential Revision: https://reviews.llvm.org/D39657

llvm-svn: 317479
2017-11-06 14:32:33 +00:00
Yaxun Liu 1ac16619d2 [AMDGPU] Fix assertion due to assuming pointer in default addr space is 32 bit
The backend assumes pointer in default addr space is 32 bit, which is not
true for the new addr space mapping and causes assertion for unresolved
functions.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D39643

llvm-svn: 317476
2017-11-06 13:01:33 +00:00
Yaxun Liu 0d9673cff2 [AMDGPU] Remove hardcoded address space value from AMDGPULibFunc
AMDGPULibFunc hardcodes address space values of the old address space mapping,
which causes invalid addrspacecast instructions and undefined functions in
APPSDK sample MonteCarloAsianDP.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D39616

llvm-svn: 317409
2017-11-04 17:37:43 +00:00
David Blaikie 1be62f0327 Move TargetFrameLowering.h to CodeGen where it's implemented
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.

This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.

llvm-svn: 317379
2017-11-03 22:32:11 +00:00
Konstantin Zhuravlyov 275a4f76c4 AMDGPU: Fix warning discovered by r317266 [-Wunused-private-field]
llvm-svn: 317280
2017-11-02 22:35:22 +00:00
Konstantin Zhuravlyov b695cd41b3 AMDGPU: Remove outdated fixme (it was already fixed)
llvm-svn: 317266
2017-11-02 20:48:06 +00:00
Konstantin Zhuravlyov 435151ad75 AMDGPU: Fix set but not used warnings related to AMDGPUAS
Differential Revision: https://reviews.llvm.org/D39499

llvm-svn: 317114
2017-11-01 19:12:38 +00:00
NAKAMURA Takumi 1657f2ad99 Fix warnings discovered by rL317076. [-Wunused-private-field]
llvm-svn: 317091
2017-11-01 13:47:55 +00:00
Benjamin Kramer f9ab3ddb8f [AMDGPU] Clean up symbols in the global namespace.
llvm-svn: 317051
2017-10-31 23:21:30 +00:00
Marek Olsak 5914ece6aa AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset
Summary:
Apps that benefit:
- alien isolation
- bioshock infinite
- civilization: beyond earth
- company of heroes 2
- dirt showdown
- dota 2
- F1 2015
- grid autosport
- hitman
- legend of grimrock
- serious sam 3: bfe
- shadow warrior
- talos principle
- total war: warhammer
- UE4 demos: effects cave, elemental, sun temple

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38914

llvm-svn: 317038
2017-10-31 21:06:42 +00:00
Yaxun Liu c928f2a6d4 [AMDGPU] Emit metadata for hidden arguments for kernel enqueue
Identifies kernels which performs device side kernel enqueues and emit
metadata for the associated hidden kernel arguments. Such kernels are
marked with calls-enqueue-kernel function attribute by
AMDGPUOpenCLEnqueueKernelLowering pass and later on
hidden kernel arguments metadata HiddenDefaultQueue and
HiddenCompletionAction are emitted for them.

Differential Revision: https://reviews.llvm.org/D39255

llvm-svn: 316907
2017-10-30 14:30:28 +00:00
Tom Stellard d0c6cf2e8c AMDGPU/GlobalISel: Mark 32-bit G_FADD as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38439

llvm-svn: 316815
2017-10-27 23:57:41 +00:00
Marek Olsak 2232243863 AMDGPU: Handle s_buffer_load_dword hazard on SI
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39171

llvm-svn: 316666
2017-10-26 14:43:02 +00:00
Matt Arsenault 28f52e51f1 AMDGPU: Add max-mix-insts subtarget feature
llvm-svn: 316553
2017-10-25 07:00:51 +00:00
Marek Olsak ce76ea0394 AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.

Also allow kill in all shader stages.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38544

llvm-svn: 316427
2017-10-24 10:27:13 +00:00
Marek Olsak 2114fc3bcb AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D38543

llvm-svn: 316426
2017-10-24 10:26:59 +00:00
Konstantin Zhuravlyov 339e74440a AMDGPU: Initialize WavefrontSize from TD files
Differential Revision: https://reviews.llvm.org/D39205

llvm-svn: 316389
2017-10-23 23:02:39 +00:00
Matt Arsenault a030e2688f AMDGPU: Cleanup local atomic node names
llvm-svn: 316349
2017-10-23 17:16:43 +00:00
Matt Arsenault b791802aef AMDGPU: Fix default range in non-kernel functions
The range should be assumed to be the hardware maximum
if a workitem intrinsic is used in a callable function
which does not know the restricted limit of the calling
kernel.

llvm-svn: 316346
2017-10-23 17:09:35 +00:00
Konstantin Zhuravlyov 8d5e9e110c AMDGPU: Rename MaxFlatWorkgroupSize to MaxFlatWorkGroupSize for consistency
Differential Revision: https://reviews.llvm.org/D38957

llvm-svn: 316097
2017-10-18 17:31:09 +00:00
NAKAMURA Takumi 6f43bd4bde Untabify.
llvm-svn: 316079
2017-10-18 13:31:28 +00:00
Wei Ding 7ab1f7a421 AMDGPU : Fix an error for the llvm.cttz implementation.
Differential Revision: http://reviews.llvm.org/D39014

llvm-svn: 316037
2017-10-17 21:49:52 +00:00
Eugene Zelenko 6cadde7f40 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316034
2017-10-17 21:27:42 +00:00
Konstantin Zhuravlyov 7dabe9ced7 AMDGPU: Start generating metadata for MaxFlatWorkGroupSize
Differential Revision: https://reviews.llvm.org/D38958

llvm-svn: 316024
2017-10-17 20:03:21 +00:00
Mark Searles 4e3d6160db Use the return value of UpdateNodeOperands(); in some cases, UpdateNodeOperands() modifies the node in-place and using the return value isn’t strictly necessary. However, it does not necessarily modify the node, but may return a resultant node if it already exists in the DAG. See comments in UpdateNodeOperands(). In that case, the return value must be used to avoid such scenarios as an infinite loop (node is assumed to have been updated, so added back to the worklist, and re-processed; however, node hasn’t changed so it is once again passed to UpdateNodeOperands(), assumed modified, added back to worklist; cycle infinitely repeats).
Differential Revision: https://reviews.llvm.org/D38466

llvm-svn: 315957
2017-10-16 23:38:53 +00:00
Krzysztof Parzyszek 72518eaa6f Add iterator range MachineRegisterInfo::liveins(), adopt users, NFC
llvm-svn: 315927
2017-10-16 19:08:41 +00:00
Aaron Ballman 615eb47035 Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people.
Error LNK2019 unresolved external symbol "public: void __cdecl `anonymous namespace'::MatchableInfo::dump(void)const " (?dump@MatchableInfo@?A0xf4f1c304@@QEBAXXZ) referenced in function "public: void __cdecl `anonymous namespace'::AsmMatcherEmitter::run(class llvm::raw_ostream &)" (?run@AsmMatcherEmitter@?A0xf4f1c304@@QEAAXAEAVraw_ostream@llvm@@@Z) llvm-tblgen D:\llvm\2017\utils\TableGen\AsmMatcherEmitter.obj 1

llvm-svn: 315854
2017-10-15 14:32:27 +00:00
Vitaly Buka 7450398e01 Remove unused variables
llvm-svn: 315847
2017-10-15 05:35:02 +00:00
Konstantin Zhuravlyov 8c18f5b3d4 AMDGPU: Don't use TargetStreamer if it has not been initialized
Fixes cfe/trunk/test/Misc/backend-resource-limit-diagnostics.cl
test after r315808

We may hit few other similar issues, but I want to discuss good
solution offline.

llvm-svn: 315830
2017-10-14 22:16:26 +00:00
Konstantin Zhuravlyov a01d8b0b63 AMDGPU: Bring HSA metadata on par with the specification
Differential Revision: https://reviews.llvm.org/D38753

llvm-svn: 315821
2017-10-14 19:03:51 +00:00
Konstantin Zhuravlyov 219066bab8 AMDGPU: Improve note directive verification in assembler
- Do not allow amd_amdgpu_isa directives on non-amdgcn architectures
  - Do not allow amd_amdgpu_hsa_metadata on non-amdhsa OSes
  - Do not allow amd_amdgpu_pal_metadata on non-amdpal OSes

Differential Revision: https://reviews.llvm.org/D38750

llvm-svn: 315812
2017-10-14 16:15:28 +00:00
Konstantin Zhuravlyov eda425edd4 AMDGPU: Do not emit deprecated notes for code object v3
Differential Revision: https://reviews.llvm.org/D38749

llvm-svn: 315810
2017-10-14 15:59:07 +00:00
Konstantin Zhuravlyov 9c05b2bc3b AMDGPU: Add support for isa version note
- Emit NT_AMD_AMDGPU_ISA
  - Add assembler parsing for isa version directive
    - If isa version directive does not match command line arguments, then return error

Differential Revision: https://reviews.llvm.org/D38748

llvm-svn: 315808
2017-10-14 15:40:33 +00:00
Matt Arsenault e11d8aca77 AMDGPU: Implement hasBitPreservingFPLogic
llvm-svn: 315754
2017-10-13 21:10:22 +00:00
Matt Arsenault 550c66d10f AMDGPU: Look for src mods before fp_extend
When selecting modifiers for mad_mix instructions,
look at fneg/fabs that occur before the conversion.

llvm-svn: 315748
2017-10-13 20:45:49 +00:00
Matt Arsenault 4d70754e3c AMDGPU: Implement isFPExtFoldable
This helps match v_mad_mix* in some cases.

llvm-svn: 315744
2017-10-13 20:18:59 +00:00
Matthias Braun bb8507e63c Revert "TargetMachine: Merge TargetMachine and LLVMTargetMachine"
Reverting to investigate layering effects of MCJIT not linking
libCodeGen but using TargetMachine::getNameWithPrefix() breaking the
lldb bots.

This reverts commit r315633.

llvm-svn: 315637
2017-10-12 22:57:28 +00:00
Matthias Braun 3a9c114b24 TargetMachine: Merge TargetMachine and LLVMTargetMachine
Merge LLVMTargetMachine into TargetMachine.

- There is no in-tree target anymore that just implements TargetMachine
  but not LLVMTargetMachine.
- It should still be possible to stub out all the various functions in
  case a target does not want to use lib/CodeGen
- This simplifies the code and avoids methods ending up in the wrong
  interface.

Differential Revision: https://reviews.llvm.org/D38489

llvm-svn: 315633
2017-10-12 22:28:54 +00:00
Wei Ding 5676acad9e Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Differential Revision: http://reviews.llvm.org/D37348

llvm-svn: 315610
2017-10-12 19:37:14 +00:00
Konstantin Zhuravlyov 70303c011f AMDGPU/NFC: Move AMDGPU specific note types to ELF.h
Differential Revision: https://reviews.llvm.org/D38747

llvm-svn: 315608
2017-10-12 18:59:54 +00:00
Konstantin Zhuravlyov 63e87f5a02 AMDGPU: Fix warnings introduced in r315526
llvm-svn: 315596
2017-10-12 17:34:05 +00:00
Tim Renouf c8ffffe462 [AMDGPU] For amdpal, widen interpolation mode workaround
Summary:
The interpolation mode workaround ensures that at least one
interpolation mode is enabled in PSInputAddr. It does not also check
PSInputEna on the basis that the user might enable bits in that
depending on run-time state.

However, for amdpal os type, the user does not enable some bits after
compilation based on run-time states; the register values being
generated here are the final ones set in the hardware. Therefore, apply
the workaround to PSInputAddr and PSInputEnable together. (The case
where a bit is set in PSInputAddr but not in PSInputEnable is where the
frontend set up an input arg for a particular interpolation mode, but
nothing uses that input arg. Really we should have an earlier pass that
removes such an arg.)

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37758

llvm-svn: 315591
2017-10-12 16:16:41 +00:00
Don Hinton 3e0199f7eb [dump] Remove NDEBUG from test to enable dump methods [NFC]
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.

Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.

Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.

Differential Revision: https://reviews.llvm.org/D38406

llvm-svn: 315590
2017-10-12 16:16:06 +00:00
Reid Kleckner c18c12e385 Fix AMDGPU build issue
llvm-svn: 315535
2017-10-11 23:53:36 +00:00
Lang Hames 2241ffa43c [MC] Have MCObjectStreamer take its MCAsmBackend argument via unique_ptr.
MCObjectStreamer owns its MCCodeEmitter -- this fixes the types to reflect that,
and allows us to remove the last instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.

llvm-svn: 315531
2017-10-11 23:34:47 +00:00
Konstantin Zhuravlyov 516651b154 AMDGPU/NFC: Minor clean ups in HSA metadata
- Use HSA metadata streamer directly from AMDGPUAsmPrinter
  - Make naming consistent with PAL metadata

Differential Revision: https://reviews.llvm.org/D38746

llvm-svn: 315526
2017-10-11 22:59:35 +00:00
Konstantin Zhuravlyov c3beb6a075 AMDGPU/NFC: Minor clean ups in PAL metadata
- Move PAL metadata definitions to AMDGPUMetadata
  - Make naming consistent with HSA metadata

Differential Revision: https://reviews.llvm.org/D38745

llvm-svn: 315523
2017-10-11 22:41:09 +00:00
Konstantin Zhuravlyov a63b0f9d20 AMDGPU/NFC: Rename code object metadata as HSA metadata
- Rename AMDGPUCodeObjectMetadata to AMDGPUMetadata (PAL metadata will be included in this file in the follow up change)
  - Rename AMDGPUCodeObjectMetadataStreamer to AMDGPUHSAMetadataStreamer
  - Introduce HSAMD namespace
  - Other minor name changes in function and test names

llvm-svn: 315522
2017-10-11 22:18:53 +00:00
Oliver Stannard 4191b9eaea [Asm] Add debug tracing in table-generated assembly matcher
This adds debug tracing to the table-generated assembly instruction matcher,
enabled by the -debug-only=asm-matcher option.

The changes in the target AsmParsers are to add an MCInstrInfo reference under
a consistent name, so that we can use it from table-generated code. This was
already being used this way for targets that use deprecation warnings, but 5
targets did not have it, and Hexagon had it under a different name to the other
backends.

llvm-svn: 315445
2017-10-11 09:17:43 +00:00
Lang Hames 02d330548d [MC] Have MCObjectStreamer take its MCAsmBackend argument via unique_ptr.
MCObjectStreamer owns its MCAsmBackend -- this fixes the types to reflect that,
and allows us to remove another instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.

llvm-svn: 315410
2017-10-11 01:57:21 +00:00
Matt Arsenault f42074b699 AMDGPU: Fix missing skipFunction calls
llvm-svn: 315361
2017-10-10 20:48:36 +00:00
Matt Arsenault d674e0ac0d AMDGPU: Fix failure to select branch with optnone
opt-bisect/optnone disable the AMDGPUUniformAnnotateValues pass.
The heuristic in the custom selector for brcond deferred the
branch uniformity check to the pattern, which would fail.

llvm-svn: 315360
2017-10-10 20:34:49 +00:00
Matt Arsenault cc85223f87 AMDGPU: Fix incorrect selection of pseudo-branches
These should only be used if the machine structurizer is enabled.

llvm-svn: 315357
2017-10-10 20:22:07 +00:00
Yaxun Liu de4b88d9a1 [AMDGPU] Lower enqueued blocks and generate runtime metadata
This patch adds a post-linking pass which replaces the function pointer of enqueued
block kernel with a global variable (runtime handle) and adds
runtime-handle attribute to the enqueued block kernel.

In LLVM CodeGen the runtime-handle metadata will be translated to
RuntimeHandle metadata in code object. Runtime allocates a global buffer
for each kernel with RuntimeHandel metadata and saves the kernel address
required for the AQL packet into the buffer. __enqueue_kernel function
in device library knows that the invoke function pointer in the block
literal is actually runtime handle and loads the kernel address from it
and puts it into AQL packet for dispatching.

This cannot be done in FE since FE cannot create a unique global variable
with external linkage across LLVM modules. The global variable with internal
linkage does not work since optimization passes will try to replace loads
of the global variable with its initialization value.

Differential Revision: https://reviews.llvm.org/D38610

llvm-svn: 315352
2017-10-10 19:39:48 +00:00
Lang Hames 60fbc7cc38 [MC] Thread unique_ptr<MCObjectWriter> through the create.*ObjectWriter
functions.

This makes the ownership of the resulting MCObjectWriter clear, and allows us
to remove one instance of MCObjectStreamer's bizarre "holding ownership via
someone else's reference" trick.

llvm-svn: 315327
2017-10-10 16:28:07 +00:00
Nicolai Haehnle 312b64f4d7 AMDGPU: Split MUBUF offset into aligned components
Summary:
Atomic buffer operations do not work (and trap on gfx9) when the
components are unaligned, even if their sum is aligned.

Previously, we generated an offset of 4156 without an SGPR by
splitting it as 4095 + 61 (immediate + inline constant). The
highest offset for which we can do this correctly is 4156 = 4092 + 64.

Fixes dEQP-GLES31.functional.ssbo.atomic.*

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D37850

llvm-svn: 315302
2017-10-10 12:22:23 +00:00
NAKAMURA Takumi aba2b3d1f3 SILoadStoreOptimizer.cpp: Fix build; Clang doesn't like "using anonymous struct" since rL315256.
llvm-svn: 315283
2017-10-10 08:30:53 +00:00
Lang Hames dcb312bdb9 [MC] Plumb unique_ptr<MCELFObjectTargetWriter> through createELFObjectWriter to
ELFObjectWriter's constructor.

Fixes the same ownership issue for ELF that r315245 did for MachO:
ELFObjectWriter takes ownership of its MCELFObjectTargetWriter, so we want to
pass this through to the constructor via a unique_ptr, rather than a raw ptr.

llvm-svn: 315254
2017-10-09 23:53:15 +00:00
Stanislav Mekhanoshin de42c29a68 [AMDGPU] New 64 bit div/rem expansion
Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions.
This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions.

Passes OpenCL conformance test_integer_ops quick_[u]long_math

Differential Revision: https://reviews.llvm.org/D38607

llvm-svn: 315081
2017-10-06 17:24:45 +00:00
Matt Arsenault 2d3f8f333d AMDGPU: Set v2i32 any_extend to expand
llvm-svn: 314993
2017-10-05 17:38:30 +00:00
Konstantin Zhuravlyov aa0835a7ab AMDGPU: Add and set AMDGPU-specific e_flags
Differential Revision: https://reviews.llvm.org/D38556

llvm-svn: 314987
2017-10-05 16:19:18 +00:00
Matt Arsenault f48e5c9ce5 AMDGPU: Add comment about clamps
llvm-svn: 314952
2017-10-05 00:13:20 +00:00
Matt Arsenault aafff87dda AMDGPU: Do not fold clamp instructions when sources are different
Patch by hakzsam (Samuel Pitoiset)

llvm-svn: 314951
2017-10-05 00:13:17 +00:00
Matt Arsenault 9ab1fa6803 AMDGPU: Fix not accounting for instruction size in bundles
These were counted as 0. Fixes branch limit exceeded errors
in some large programs.

llvm-svn: 314944
2017-10-04 22:59:12 +00:00
Konstantin Zhuravlyov 8684f7b4f9 AMDGPU: Correctly set EI_OSABI based on the os
Differential Revision: https://reviews.llvm.org/D38555

llvm-svn: 314943
2017-10-04 22:44:13 +00:00
Sanjay Patel 4c33d5213b [SimplifyCFG] put the optional assumption cache pointer in the options struct; NFCI
This is a follow-up to https://reviews.llvm.org/D38138. 

I fixed the capitalization of some functions because we're changing those
lines anyway and that helped verify that we weren't accidentally dropping 
any options by using default param values.

llvm-svn: 314930
2017-10-04 20:26:25 +00:00
Konstantin Zhuravlyov 22bc039c89 AMDGPU: Expand setcc for v2f32 and v4f32
llvm-svn: 314853
2017-10-03 21:45:01 +00:00
Konstantin Zhuravlyov 908fa90b51 AMDGPU: Expand setcc for v2i32 and v4i32
llvm-svn: 314852
2017-10-03 21:31:24 +00:00
Tim Renouf 72800f0436 [AMDGPU] implemented pal metadata
Summary:
For the amdpal OS type:

We write an AMDGPU_PAL_METADATA record in the .note section in the ELF
(or as an assembler directive). It contains key=value pairs of 32 bit
ints. It is a merge of metadata from codegen of the shaders, and
metadata provided by the frontend as _amdgpu_pal_metadata IR metadata.
Where both sources have a key=value with the same key, the two values
are ORed together.

This .note record is part of the amdpal ABI and will be documented in
docs/AMDGPUUsage.rst in a future commit.

Eventually the amdpal OS type will stop generating the .AMDGPU.config
section once the frontend has safely moved over to using the .note
records above instead of .AMDGPU.config.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D37753

llvm-svn: 314829
2017-10-03 19:03:52 +00:00
Alexander Timofeev 4651396584 [AMDGPU] Avoid predicated execution of the basic blocks containing scalar
instructions.

Differential revision: https://reviews.llvm.org/D38293

llvm-svn: 314828
2017-10-03 18:55:36 +00:00
Matt Arsenault 90c7593a75 AMDGPU: Remove global isGCN predicates
These are problematic because they apply to everything,
and can easily clobber whatever more specific predicate
you are trying to add to a function.

Currently instructions use SubtargetPredicate/PredicateControl
to apply this to patterns applied to an instruction definition,
but not to free standing Pats. Add a wrapper around Pat
so the special PredicateControls requirements can be appended
to the final predicate list like how Mips does it.

llvm-svn: 314742
2017-10-03 00:06:41 +00:00
Matt Arsenault c6baa85fc6 AMDGPU: Fix typos
llvm-svn: 314715
2017-10-02 20:31:18 +00:00
Stanislav Mekhanoshin 1d8cf2be89 [AMDGPU] Set fast-math flags on functions given the options
We have a single library build without relaxation options.
When inlined library functions remove fast math attributes
from the functions they are integrated into.

This patch sets relaxation attributes on the functions after
linking provided corresponding relaxation options are given.
Math instructions inside the inlined functions remain to have
no fast flags, but inlining does not prevent fast math
transformations of a surrounding caller code anymore.

Differential Revision: https://reviews.llvm.org/D38325

llvm-svn: 314568
2017-09-29 23:40:19 +00:00
Nicolai Haehnle ce4ddd06da AMDGPU: VALU carry-in and v_cndmask condition cannot be EXEC
The hardware will only forward EXEC_LO; the high 32 bits will be zero.

Additionally, inline constants do not work. At least,

   v_addc_u32_e64 v0, vcc, v0, v1, -1

which could conceivably be used to combine (v0 + v1 + 1) into a single
instruction, acts as if all carry-in bits are zero.

The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine

   s_mov_b64 s[0:1], exec
   v_cndmask_b32_e64 v0, v1, v2, s[0:1]

into

   v_mov_b32 v0, v3

but it's not particularly high priority.

Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.*

llvm-svn: 314522
2017-09-29 15:37:31 +00:00
Jonas Paulsson c9e363ac69 [SystemZ] implement shouldCoalesce()
Implement shouldCoalesce() to help regalloc avoid running out of GR128
registers.

If a COPY involving a subreg of a GR128 is coalesced, the live range of the
GR128 virtual register will be extended. If this happens where there are
enough phys-reg clobbers present, regalloc will run out of registers (if
there is not a single GR128 allocatable register available).

This patch tries to allow coalescing only when it can prove that this will be
safe by checking the (local) interval in question.

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D37899
https://bugs.llvm.org/show_bug.cgi?id=34610

llvm-svn: 314516
2017-09-29 14:31:39 +00:00
Tim Renouf ef1ae8ffac [AMDGPU] calling conventions for AMDPAL OS type
Summary:
This commit adds comments on how the AMDPAL OS type overloads the
existing AMDGPU_ calling conventions used by Mesa, and adds a couple of
new ones.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37752

llvm-svn: 314502
2017-09-29 09:51:22 +00:00
Tim Renouf 132291589f [AMDGPU] AMDPAL scratch buffer support
Summary:
Added support for scratch (including spilling) for OS type amdpal:
generates code to set up the scratch descriptor if it is needed.

With amdpal, the scratch resource descriptor is loaded from offset 0 of
the global information table. The low 32 bits of the address of the
global information table is passed in s0.

Added amdgpu-git-ptr-high function attribute to hard-wire the high 32
bits of the address of the global information table. If the function
attribute is not specified, or is 0xffffffff, then the backend generates
code to use the high 32 bits of pc.

The documentation for the AMDPAL ABI will be added in a later commit.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye

Differential Revision: https://reviews.llvm.org/D37483

llvm-svn: 314501
2017-09-29 09:49:35 +00:00
Tim Renouf 9f7ead3334 [Triple] Add AMDPAL operating system type
Summary:
This operating system type represents the AMDGPU PAL runtime, and will
be required by the AMDGPU backend in order to generate correct code for
this runtime.

Currently it generates the same code as not specifying an OS at all.
That will change in future commits.

Patch from Tim Corringham.

Subscribers: arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D37380

llvm-svn: 314500
2017-09-29 09:48:12 +00:00
Sanjay Patel 0f9b4773c1 [SimplifyCFG] add a struct to house optional folds (PR34603)
This was intended to be no-functional-change, but it's not - there's a test diff.

So I thought I should stop here and post it as-is to see if this looks like what was expected 
based on the discussion in PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603

Notes:
 1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried 
    through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'. 
    The parameter isn't passed down, so we pick up the default value from the function signature 
    after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the 
    'SimplifyCFG' calls.

 2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops. 
    This would theoretically allow us to differentiate the transforms controlled by those params 
    independently.

 3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too. 
    I just stopped here to minimize the diffs.

 4. Similarly, I stopped short of messing with the pass manager layer. I have another question that 
    could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG 
    set to true no matter where in the pipeline it's creating SimplifyCFG passes?

    // Create an early function pass manager to cleanup the output of the
    // frontend.
    EarlyFPM.addPass(SimplifyCFGPass());

    -->

    /// \brief Construct a pass with the default thresholds
    /// and switch optimizations.
    SimplifyCFGPass::SimplifyCFGPass()
       : BonusInstThreshold(UserBonusInstThreshold),
         LateSimplifyCFG(true) {}   <-- switches get converted to lookup tables and loops may not be in canonical form

    If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG' 
    setting via recursion was masking this bug.

Differential Revision: https://reviews.llvm.org/D38138

llvm-svn: 314308
2017-09-27 14:54:16 +00:00
Matt Arsenault 1390af2dd2 AMDGPU: Add option to stress calls
This inverts the behavior of the AlwaysInline pass to mark
every function not already marked alwaysinline as noinline.

llvm-svn: 313865
2017-09-21 07:00:48 +00:00
Matt Arsenault fdcdd88d57 AMDGPU: Fix crash on immediate operand
We can have a v_mac with an immediate src0.
We can still fold if it's an inline immediate,
otherwise it already uses the constant bus.

llvm-svn: 313852
2017-09-21 00:45:59 +00:00
Matt Arsenault 8cbb4884a5 AMDGPU: Start selecting v_mad_mixhi_f16
llvm-svn: 313814
2017-09-20 21:01:24 +00:00
Matt Arsenault e135c4c6a6 AMDGPU: Add tied operands to v_mad_mix{lo|hi}_f16
These write to the low and high half of the destination
register and leave the other 16-bits unchanged. This is true
for most 16-bit instructions on gfx9, but we don't use that
now.

llvm-svn: 313812
2017-09-20 20:53:49 +00:00
Matt Arsenault 76935122cc AMDGPU: Start selecting v_mad_mixlo_f16
Also add some tests that should be able to use v_mad_mixhi_f16,
but do not yet. This is trickier because we don't really model
the partial update of the register done by 16-bit instructions.

llvm-svn: 313806
2017-09-20 20:28:39 +00:00
Matt Arsenault 644883ff07 AMDGPU: Fix encoding of op_sel for mad_mix* opcodes
llvm-svn: 313797
2017-09-20 19:09:28 +00:00
Stanislav Mekhanoshin 2e3bf37ec4 [AMDGPU] Fixed memory leak with inliner replaced
Delete inliner before replacing it.

llvm-svn: 313723
2017-09-20 06:34:28 +00:00
Matt Arsenault c8aea66627 AMDGPU: Move r600 only code into r600 only td file
llvm-svn: 313719
2017-09-20 06:11:25 +00:00
Stanislav Mekhanoshin 5641820141 [AMDGPU] Fix regression in test clang/test/CodeGen/backend-unsupported-error.ll
llvm-svn: 313718
2017-09-20 06:10:15 +00:00
Matt Arsenault b81495dccb AMDGPU: Match load d16 hi instructions
Also starts selecting global loads for constant address
in some cases. Some end up selecting to mubuf still, which
requires investigation.

We still get sub-optimal regalloc and extra waitcnts inserted
due to not really tracking the liveness of the separate register
halves.

llvm-svn: 313716
2017-09-20 05:01:53 +00:00
Stanislav Mekhanoshin 5670e6d482 [AMDGPU] Port of HSAIL inliner
Differential Revision: https://reviews.llvm.org/D36849

llvm-svn: 313714
2017-09-20 04:25:58 +00:00
Matt Arsenault bc68383166 AMDGPU: Cleanup load/store PatFrags
Try to use a consistent naming scheme.

llvm-svn: 313713
2017-09-20 03:43:35 +00:00
Matt Arsenault fcc213fab7 AMDGPU: Match store d16_hi instructions
llvm-svn: 313712
2017-09-20 03:20:09 +00:00
Stanislav Mekhanoshin d4ae470d2e [AMDGPU] Prevent post-RA scheduler from breaking memory clauses
The pre-RA scheduler does load/store clustering, but post-RA
scheduler undoes it. Add mutation to prevent it.

Differential Revision: https://reviews.llvm.org/D38014

llvm-svn: 313670
2017-09-19 20:54:38 +00:00
Matt Arsenault e745d9963e AMDGPU: Run internalize symbols at -O0
The relocations used for externally visible functions
aren't supported, so the direct call emitted ends
up hitting a linker error.

llvm-svn: 313616
2017-09-19 07:40:11 +00:00
Konstantin Zhuravlyov ca8946a376 AMDGPU: Start selecting s_xnor_{b32, b64}
Differential Revision: https://reviews.llvm.org/D37981

llvm-svn: 313565
2017-09-18 21:22:45 +00:00
Jan Sjodin 1f2f57a7ea Fix warnings in r313297.
llvm-svn: 313302
2017-09-14 21:49:52 +00:00
Matt Arsenault c317287fde AMDGPU: Fix violating constant bus restriction
You can't use madmk/madmk if it already uses an SGPR input.

llvm-svn: 313298
2017-09-14 20:54:29 +00:00
Jan Sjodin 312ccf761c Add AddresSpace to PseudoSourceValue.
Differential Revision: https://reviews.llvm.org/D35089

llvm-svn: 313297
2017-09-14 20:53:51 +00:00
Matt Arsenault 37ab4cf8b8 AMDGPU: Fix assert on alloca of array of struct
llvm-svn: 313282
2017-09-14 18:02:29 +00:00
Matt Arsenault defe371771 AMDGPU: Stop modifying SP in call sequences
Because the stack growth direction and addressing is done
in the same direction, modifying SP at the beginning of the
call sequence was incorrect. If we had a stack passed argument,
we would end up skipping that number of bytes before pushing
arguments, leaving unused/inconsistent space.

The callee creates fixed stack objects in its frame, so
the space necessary for these is already logically allocated
in the callee, so we just let the callee increment SP if
it really requires it.

llvm-svn: 313279
2017-09-14 17:37:40 +00:00
Matt Arsenault 6efd082c01 AMDGPU: Make frame register caller preserved
Using SplitCSR for the frame register was very broken. Often
the copies in the prolog and epilog were optimized out, in addition
to them being inserted after the true prolog where the FP
was clobbered.

I have a hacky solution which works that continues to use
split CSR, but for now this is simpler and will get to working
programs.

llvm-svn: 313274
2017-09-14 17:14:57 +00:00
Matt Arsenault ecb43ef1bc AMDGPU: Don't spill SP reg like a normal CSR
llvm-svn: 313217
2017-09-13 23:47:01 +00:00
Stanislav Mekhanoshin 7fe9a5d9b4 Allow target to decide when to cluster loads/stores in misched
MachineScheduler when clustering loads or stores checks if base
pointers point to the same memory. This check is done through
comparison of base registers of two memory instructions. This
works fine when instructions have separate offset operand. If
they require a full calculated pointer such instructions can
never be clustered according to such logic.

Changed shouldClusterMemOps to accept base registers as well and
let it decide what to do about it.

Differential Revision: https://reviews.llvm.org/D37698

llvm-svn: 313208
2017-09-13 22:20:47 +00:00
Matt Arsenault fb017ae155 AMDGPU: Handle coldcc in more places
Missed in r312936

llvm-svn: 313205
2017-09-13 21:55:52 +00:00
Matt Arsenault 537bd3b906 AMDGPU: Allow coldcc calls
llvm-svn: 312936
2017-09-11 18:54:20 +00:00
Stanislav Mekhanoshin 710da42b86 [AMDGPU] Produce madak and madmk from the two-address pass
These two instructions are normally selected, but when the
two address pass converts mac into mad we end up with the
mad where we could have one of these.

Differential Revision: https://reviews.llvm.org/D37389

llvm-svn: 312928
2017-09-11 17:13:57 +00:00
Tim Renouf 660ba2b8af [AMDGPU] exp should not be in WQM mode
A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports
the exec mask as the valid mask to determine which pixels to render.

This commit marks any exp as needing to be in exact mode.

Actually, if there are multiple mrt exps, only one needs to have vm=1,
and only that one needs to be in exact mode. But that is an optimization
for another day.

Differential Revision: https://reviews.llvm.org/D36305

llvm-svn: 312915
2017-09-11 13:55:39 +00:00
Tim Renouf 6cb007fc72 AMDGPU: trivial comment change
... to check commit access for new committer.

llvm-svn: 312900
2017-09-11 08:31:32 +00:00
Davide Italiano 0731a4f52a [AMDGPU] Remove unused function. NFCI.
llvm-svn: 312836
2017-09-08 23:54:11 +00:00
Matt Arsenault 461ed08fbd AMDGPU: Start using !con operator
We have a lot of operand definition work essentially producing
every valid permutation of operands to workaround builiding
operand lists based on the instruction features. Apparently tablegen
already has a mostly undocumented operator to concat dags which
simplies this.

Convert one simple place to use this. The BUF instruction definitions
have much more complicated logic that can be totally rewritten now.

llvm-svn: 312822
2017-09-08 19:09:13 +00:00
Matt Arsenault 2f4df7ec41 AMDGPU: Recompute scc liveness
The various scalar bit operations set SCC,
so one is erased or moved it needs to be recomputed.
Not sure why the existing tests don't fail on this.

llvm-svn: 312819
2017-09-08 18:51:26 +00:00
Matt Arsenault d7e2303df2 AMDGPU: Start selecting v_mad_mix_f32
llvm-svn: 312732
2017-09-07 18:05:07 +00:00
Konstantin Zhuravlyov 5f5b586c99 AMDGPU: Handle non-temporal loads and stores
Differential Revision: https://reviews.llvm.org/D36862

llvm-svn: 312729
2017-09-07 17:14:54 +00:00
Konstantin Zhuravlyov c8c9d4a0a6 AMDGPU: Handle more than one memory operand in SIMemoryLegalizer
Differential Revision: https://reviews.llvm.org/D37397

llvm-svn: 312725
2017-09-07 16:14:21 +00:00
Matt Arsenault 65ca292a8d AMDGPU: Don't legalize i16 extloads to i32 with legal i16
Keeping non-i16 extloads makes it easier to match some new
gfx9 load instructions.

llvm-svn: 312699
2017-09-07 05:37:34 +00:00
Stanislav Mekhanoshin 442e28dd42 [AMDGPU] Use v_pk_max_f16 for fcanonicalize
Differential Revision: https://reviews.llvm.org/D37325

llvm-svn: 312676
2017-09-06 22:27:29 +00:00
Stanislav Mekhanoshin ea134bcb13 [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize
Differential Revision: https://reviews.llvm.org/D37522

llvm-svn: 312660
2017-09-06 18:29:51 +00:00
Stanislav Mekhanoshin 949fac9e40 [AMDGPU] Fix shouldClusterMemOps to process flat loads
Flat loads do not have vdata operand but have vdst instead.

Differential Revision: https://reviews.llvm.org/D37502

llvm-svn: 312640
2017-09-06 15:31:30 +00:00
Nicolai Haehnle 523827145b AMDGPU: Make worst-case assumption about the wait states in inline assembly
Summary:
Mesa still uses a hack where empty inline assembly is used as a kind of
optimization barrier. This exposed a problem where not enough wait states
were inserted, because the hazard recognizer implicitly assumed that each
inline assembly "instruction" has at least one wait state.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D37205

llvm-svn: 312635
2017-09-06 13:50:13 +00:00
Yaxun Liu fc5121a722 [AMDGPU] Transform __read_pipe_* and __write_pipe_*
When packet size equals packet align and is power of 2, transform
__read_pipe* and __write_pipe* to specialized library function.

Differential Revision: https://reviews.llvm.org/D36831

llvm-svn: 312598
2017-09-06 00:30:27 +00:00
Konstantin Zhuravlyov 80528702c9 AMDGPU: Cleanup/refactor SIMemoryLegalizer [3]:
- Refactor SIMemOpInfo's constructors
  - Allow construction of NotAtomic SIMemOpInfo

Differential Revision: https://reviews.llvm.org/D37396

llvm-svn: 312563
2017-09-05 19:01:10 +00:00
Matt Arsenault 22cdb61a78 AMDGPU: Fix not accounting for tail call resource usage
If the only call in a function is a tail call, the
function isn't considered to have a call since it's a
type of return.

llvm-svn: 312561
2017-09-05 18:36:36 +00:00
Konstantin Zhuravlyov 1aa667fe64 AMDGPU/NFC: Cleanup/refactor SIMemoryLegalizer [2]:
- Make SIMemOpInfo a class
  - Add accessor methods to SIMemOpInfo
  - Move get*Info methods to SIMemOpInfo

Differential Revision: https://reviews.llvm.org/D37395

llvm-svn: 312541
2017-09-05 16:41:25 +00:00
Konstantin Zhuravlyov 844845ae06 AMDGPU/NFC: Cleanup/refactor SIMemoryLegalizer [1]:
- Rename MemOpInfo -> SIMemOpInfo
  - Move SIMemOpInfo class out of SIMemoryLegalizer class

Differential Revision: https://reviews.llvm.org/D37394

llvm-svn: 312540
2017-09-05 16:18:05 +00:00
Stanislav Mekhanoshin dbfda5b601 [AMDGPU] Prevent infinite recursion in DAG.computeKnownBits()
Differential Revision: https://reviews.llvm.org/D37392

llvm-svn: 312364
2017-09-01 20:43:20 +00:00
Matt Arsenault efa1d655d4 AMDGPU: Add ds_{read|write}_addtid_b32 definitions
llvm-svn: 312349
2017-09-01 18:38:02 +00:00