Commit Graph

2315 Commits

Author SHA1 Message Date
Matt Arsenault ecad0d5364 AMDGPU: Preserve MMO in adjustWritemask
Follow up to r319705. Currently the MMO is
produced after this in the custom inserter,
so this doesn't change anything yet.

llvm-svn: 320186
2017-12-08 20:00:45 +00:00
Konstantin Zhuravlyov e30f88f3a9 AMDGPU: Report Arg's Value name in metadata if kernel_arg_name metadata is not available
Differential Revision: https://reviews.llvm.org/D40924

llvm-svn: 320176
2017-12-08 19:22:12 +00:00
Tim Renouf cead41d42f [AMDGPU] add labels to +DumpCode output
Summary:
+DumpCode is a hack to embed disassembly in the ELF file. This commit
fixes it to include labels, to make it slightly more useful.

Reviewers: arsenm, kzhuravl

Subscribers: nhaehnle, timcorringham, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl

Differential Revision: https://reviews.llvm.org/D40169

llvm-svn: 320146
2017-12-08 14:09:34 +00:00
Mark Searles 9ebdbb433a [AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output."
Patch caused a buildbot failure; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/15733/steps/build_Lld/logs/stdio :
lib/Target/AMDGPU/SIInsertWaitcnts.cpp:396:11: error: private field 'InstCnt' is not used [-Werror,-Wunused-private-field]
  int32_t InstCnt = 0;
          ^
1 error generated.
"
This reverts commit 71627f79010aafe74fdcba901bba28dd7caa0869.

llvm-svn: 320086
2017-12-07 21:14:41 +00:00
Mark Searles a84d23489a [AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output.
-amdgpu-waitcnt-forcezero={1|0}  Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-amdgpu-waitcnt-forceexp=<n>  Force emit a s_waitcnt expcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcevm=<n>   Force emit a s_waitcnt vmcnt(0) before the first <n> instrs

Differential Revision: https://reviews.llvm.org/D40091

llvm-svn: 320084
2017-12-07 20:36:39 +00:00
Mark Searles d29f24acfb [AMDGPU] Add GCNHazardRecognizer::checkInlineAsmHazards() and GCNHazardRecognizer::checkVALUHazardsHelper(). checkInlineAsmHazards() checks INLINEASM for hazards that we particularly care about (so not exhaustive); this patch adds a check for INLINEASM that defs vregs that hold data-to-be stored by immediately preceding store of more than 8 bytes. If the instr were not within an INLINEASM, this scenario would be handled by checkVALUHazard(). Add checkVALUHazardsHelper(), which will be called by both checkVALUHazards() and checkInlineAsmHazards().
Differential Revision: https://reviews.llvm.org/D40098

llvm-svn: 320083
2017-12-07 20:34:25 +00:00
Francis Visoiu Mistrih a8a83d150f [CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register.
Work towards the unification of MIR and debug output by refactoring the
interfaces.

For MachineOperand::print, keep a simple version that can be easily called
from `dump()`, and a more complex one which will be called from both the
MIRPrinter and MachineInstr::print.

Add extra checks inside MachineOperand for detached operands (operands
with getParent() == nullptr).

https://reviews.llvm.org/D40836

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g'

llvm-svn: 320022
2017-12-07 10:40:31 +00:00
Matt Arsenault 8ae38bc0bd AMDGPU: Fix SDWA crash on inline asm
This was only searching for explicit defs,
and asserting for any implicit or variadic
instruction defs, like inline asm.

llvm-svn: 319826
2017-12-05 20:32:01 +00:00
Matt Arsenault 7f0a527300 AMDGPU: Fix infinite loop with dbg_value
Surprisingly SIOptimizeExecMaskingPreRA can infinite loop
in some case with DBG_VALUE. Most tests using dbg_value are
run at -O0, so don't run this pass. This seems to only
happen when the value argument is undef.

llvm-svn: 319808
2017-12-05 18:23:17 +00:00
Matt Arsenault e42b08d96d AMDGPU: Fix missing subtarget feature initializer
llvm-svn: 319733
2017-12-05 03:15:44 +00:00
Matt Arsenault 9a60c3ea36 AMDGPU: Fix crash when scheduling DBG_VALUE
This calls handleMove with a DBG_VALUE instruction,
which isn't tracked by LiveIntervals. I'm not sure
this is the correct place to fix this. The generic
scheduler seems to have more deliberate region
selection that skips dbg_value.

The test is also really hard to reduce. I haven't been able
to figure out what exactly causes this particular case to
try moving the dbg_value.

llvm-svn: 319732
2017-12-05 03:09:23 +00:00
Jan Vesely 39aeab4f30 AMDGPU/EG: Add a new FeatureFMA and use it to selectively enable FMA instruction
Only used by pre-GCN targets
v2: fix predicate setting for FMA_Common

Differential Revision: https://reviews.llvm.org/D40692

llvm-svn: 319712
2017-12-04 23:07:28 +00:00
Jan Vesely d1c9b61e2b AMDGPU: Disable fp64 support on pre GCN asics
It's not implemented.
Passing +fp64-fp16-denormal feature enables fp64 even on asics that don't support it

v2: fix hasFP64 query

Differential Revision: https://reviews.llvm.org/D39931

llvm-svn: 319709
2017-12-04 22:57:29 +00:00
Matt Arsenault 68f0505263 AMDGPU: Fix creating invalid copy when adjusting dmask
Move the entire optimization to one place. Before it was possible
to adjust dmask without changing the register class of the output
instruction, since they were done in separate places. Fix all
lane sizes and move all of the optimization into the DAG folding.

llvm-svn: 319705
2017-12-04 22:18:27 +00:00
Matt Arsenault e6667ded4d AMDGPU: Use return value of MorphNodeTo
llvm-svn: 319704
2017-12-04 22:18:22 +00:00
Francis Visoiu Mistrih 25528d6de7 [CodeGen] Unify MBB reference format in both MIR and debug output
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.

The MIR printer prints the IR name of a MBB only for block definitions.

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix

Differential Revision: https://reviews.llvm.org/D40422

llvm-svn: 319665
2017-12-04 17:18:51 +00:00
Sam Kolton 5f7f32c382 [AMDGPU] SDWA: add support for PRESERVE into SDWA peephole.
Summary:

Reviewers: arsenm, vpykhtin, rampitec

Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D37817

llvm-svn: 319662
2017-12-04 16:22:32 +00:00
Tim Corringham 6c6d5e24cd AMDGPU: fix missing s_waitcnt
Summary:
The pass that inserts s_waitcnt instructions where needed propagated
info used to track dependencies for each block by iterating over the
predecessor blocks. The iteration was terminated when a predecessor
that had not yet been processed was encountered. Any info in blocks
later in the list was therefore not processed, leading to the
possiblility of a required s_waitcnt not being inserted.

The fix is simply to change the "break" to "continue" for the
relevant loops, so that all visited blocks are processed. This
is likely what was intended when the code was written.

There is no test case provided for this fix because:
1) the only example that reproduces this is large and resistant to
being reduced
2) the change is trivial

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D40544

llvm-svn: 319651
2017-12-04 12:30:49 +00:00
Alexander Timofeev c1425c9d6b [AMDGPU] SiFixSGPRCopies should not modify non-divergent PHI
Differential revision: https://reviews.llvm.org/D40556

llvm-svn: 319534
2017-12-01 11:56:34 +00:00
Matt Arsenault 686d5c728f AMDGPU: Use carry-less adds in FI elimination
llvm-svn: 319501
2017-11-30 23:42:30 +00:00
Matt Arsenault 84445dd13c AMDGPU: Use gfx9 carry-less add/sub instructions
llvm-svn: 319491
2017-11-30 22:51:26 +00:00
Francis Visoiu Mistrih 93ef145862 [CodeGen] Print "%vreg0" as "%0" in both MIR and debug output
As part of the unification of the debug format and the MIR format, avoid
printing "vreg" for virtual registers (which is one of the current MIR
possibilities).

Basically:

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g"
* grep -nr '%vreg' . and fix if needed
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g"
* grep -nr 'vreg[0-9]\+' . and fix if needed

Differential Revision: https://reviews.llvm.org/D40420

llvm-svn: 319427
2017-11-30 12:12:19 +00:00
Matt Arsenault caf0ed4d74 AMDGPU: Allow negative MUBUF vaddr for gfx9
GFX9 does not enable bounds checking for the resource descriptors
used for private access, so it should be OK to use vaddr with
a potentially negative value.

llvm-svn: 319393
2017-11-30 00:52:40 +00:00
Dmitry Preobrazhensky 1ac7177abb [AMDGPU][MC][GFX9] Corrected mapping of GFX9 v_add/sub/subrev_u32
When translating pseudo to MC, v_add/sub/subrev_u32 shall be mapped via a separate table as GFX8 has opcodes with the same names.
These instructions shall also be labelled as renamed for pseudoToMCOpcode to handle them correctly.

Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D40550

llvm-svn: 319311
2017-11-29 13:33:40 +00:00
Matt Arsenault b655fa9ce2 DAG: Add nuw when splitting loads and stores
The object can't straddle the address space
wrap around, so I think it's OK to assume any
offsets added to the base object pointer can't
overflow. Similar logic already appears to be
applied in SelectionDAGBuilder when lowering
aggregate returns.

llvm-svn: 319272
2017-11-29 01:25:12 +00:00
Matt Arsenault 3f71c0e3ee AMDGPU: Select DS insts without m0 initialization
GFX9 stopped using m0 for most DS instructions. Select
a different instruction without the use. I think this will
be less error prone than trying to manually maintain m0
uses as needed.

llvm-svn: 319270
2017-11-29 00:55:57 +00:00
Matt Arsenault 607a756651 AMDGPU: Enable IPRA
llvm-svn: 319256
2017-11-28 23:40:12 +00:00
Konstantin Zhuravlyov 06ae4ec78e AMDGPU: Add num spilled s/vgprs to metadata
This was requested by tools.

Differential Revision: https://reviews.llvm.org/D40321

llvm-svn: 319192
2017-11-28 17:51:08 +00:00
Francis Visoiu Mistrih 9d7bb0cb40 [CodeGen] Print register names in lowercase in both MIR and debug output
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.

* Only debug printing is affected. It now follows MIR.

Differential Revision: https://reviews.llvm.org/D40417

llvm-svn: 319187
2017-11-28 17:15:09 +00:00
Francis Visoiu Mistrih 9d419d3b0c [CodeGen] Rename functions PrintReg* to printReg*
LLVM Coding Standards:
  Function names should be verb phrases (as they represent actions), and
  command-like function should be imperative. The name should be camel
  case, and start with a lower case letter (e.g. openFile() or isFoo()).

Differential Revision: https://reviews.llvm.org/D40416

llvm-svn: 319168
2017-11-28 12:42:37 +00:00
Nicolai Haehnle b4f28deda0 AMDGPU: Re-organize the outer loop of SILoadStoreOptimizer
Summary:
The entire algorithm operates per basic-block, so for cache locality
it should be better to re-optimize a basic-block immediately rather than
in a separate loop.

I don't have performance measurements.

Change-Id: I85106570bd623c4ff277faaa50ee43258e1ddcc5

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D40344

llvm-svn: 319156
2017-11-28 08:42:46 +00:00
Nicolai Haehnle 39980dac0b AMDGPU: Consistently check for immediates in SIInstrInfo::FoldImmediate
Summary:
The PeepholeOptimizer pass calls this function solely based on checking
DefMI->isMoveImmediate(), which only checks the MoveImm bit of the
instruction description. So it's up to FoldImmediate itself to properly
check that DefMI *actually* moves from an immediate.

I don't have a separate test case for this, but the next patch introduces
a test case which happens to crash without this change.

This error is caught by the assertion in MachineOperand::getImm().

Change-Id: I88e7cdbcf54d75e1a296822e6fe5f9a5f095bbf8

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D40342

llvm-svn: 319155
2017-11-28 08:41:50 +00:00
Dmitry Preobrazhensky 16608e67d3 [AMDGPU][MC][DISASSEMBLER][GFX9] Corrected decoding of GLOBAL/SCRATCH opcodes
See bug 35433: https://bugs.llvm.org/show_bug.cgi?id=35433

Differential Revision: https://reviews.llvm.org/D40493

Reviewers: artem.tamazov, SamWot, arsenm
llvm-svn: 319050
2017-11-27 17:14:35 +00:00
Vedran Miletic ad21f2687d [AMDGPU] Add custom lowering for llvm.log{,10}.{f16,f32} intrinsics
AMDGPU backend errors with "unsupported call to function" upon
encountering a call to llvm.log{,10}.{f16,f32} intrinsics. This patch
adds custom lowering to avoid that error on both R600 and SI.

Reviewers: arsenm, jvesely

Subscribers: tstellar

Differential Revision: https://reviews.llvm.org/D29942

llvm-svn: 319025
2017-11-27 13:26:38 +00:00
Dmitry Preobrazhensky 0e8924a5c7 [AMDGPU][MC][GFX9] Added v_interp_p2_f16 and v_interp_p2_legacy_f16
See bug 33629: https://bugs.llvm.org//show_bug.cgi?id=33629

Reviewers: artem.tamazov, SamWot, arsenm

Differential Revision: https://reviews.llvm.org/D39488

llvm-svn: 318955
2017-11-24 15:37:14 +00:00
Benjamin Kramer 51ebcaaf25 Make helpers static. NFC.
llvm-svn: 318953
2017-11-24 14:55:41 +00:00
Dmitry Preobrazhensky dd2f1c993e [AMDGPU][MC][GFX9] Added support of 'inst_offset' modifier for compatibility with SP3
See bug 35329: https://bugs.llvm.org//show_bug.cgi?id=35329

Reviewers: arsenm, vpykhtin, artem.tamazov

Differential Revision: https://reviews.llvm.org/D40350

llvm-svn: 318947
2017-11-24 13:22:38 +00:00
Yaxun Liu c596226604 [AMDGPU] Fix SITargetLowering::LowerCall for pointer info of byval argument
SITargetLowering::LowerCall uses dummy pointer info for byval argument, which causes
flat load instead of buffer load.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D40040

llvm-svn: 318844
2017-11-22 16:13:35 +00:00
Nicolai Haehnle dd059c161d AMDGPU: Consider memory dependencies with moved instructions in SILoadStoreOptimizer
Summary:
This bug seems to have gone unnoticed because critical cases with LDS
instructions are eliminated by the peephole optimizer.

However, equivalent situations arise with buffer loads and stores
as well, so this fixes regressions since r317751 ("AMDGPU: Merge
S_BUFFER_LOAD_DWORD_IMM into x2, x4").

Fixes at least:
KHR-GL45.shader_storage_buffer_object.basic-operations-case1-cs
KHR-GL45.cull_distance.functional
piglit tes-input-gl_ClipDistance.shader_test
... and probably more

Change-Id: I0e371536288eb8e6afeaa241a185266fd45d129d

Reviewers: arsenm, mareko, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D40303

llvm-svn: 318829
2017-11-22 12:25:21 +00:00
Dmitry Preobrazhensky a0342dc9eb [AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev}
See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765

Reviewers: tamazov, SamWot, arsenm, vpykhtin

Differential Revision: https://reviews.llvm.org/D40088

llvm-svn: 318675
2017-11-20 18:24:21 +00:00
Valery Pykhtin f2fe9725ea AMDGPU: Partial ILP scheduler port from SelectionDAG to SchedulingDAG (experimental)
Differential revision: https://reviews.llvm.org/D39897

llvm-svn: 318649
2017-11-20 14:35:53 +00:00
Matt Arsenault a41351e37c AMDGPU: Move hazard avoidance out of waitcnt pass.
This is mostly moving VMEM clause breaking into
the hazard recognizer. Also move another hazard
currently handled in the waitcnt pass.

Also stops breaking clauses unless xnack is enabled.

llvm-svn: 318557
2017-11-17 21:35:32 +00:00
Dmitry Preobrazhensky 682a654758 [AMDGPU][MC][GFX9][disassembler] Corrected decoding of op_sel_hi for v_mad_mix*
See bug 35148: https://bugs.llvm.org//show_bug.cgi?id=35148

Reviewers: tamazov, SamWot, arsenm

Differential Revision: https://reviews.llvm.org/D39492

llvm-svn: 318526
2017-11-17 15:15:40 +00:00
Matt Arsenault 4512d0a68b AMDGPU: Replace list of SMEM buffer opcodes
llvm-svn: 318506
2017-11-17 04:18:26 +00:00
Matt Arsenault 03c67d1eb2 AMDGPU: Fix breaking SMEM clauses
This was completely ignoring subregisters,
so was not very useful. Also only break them
if xnack is actually enabled.

llvm-svn: 318505
2017-11-17 04:18:24 +00:00
David Blaikie b3bde2ea50 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Eric Christopher 6348188e87 Fix thinko in last commit.
llvm-svn: 318374
2017-11-16 03:25:02 +00:00
Eric Christopher 3148a1be88 Add NDEBUG checks around LLVM_DUMP_METHOD functions for Wunused-function warnings.
llvm-svn: 318373
2017-11-16 03:18:15 +00:00
Daniel Sanders f76f315436 [globalisel][tablegen] Generate rule coverage and use it to identify untested rules
Summary:
This patch adds a LLVM_ENABLE_GISEL_COV which, like LLVM_ENABLE_DAGISEL_COV,
causes TableGen to instrument the generated table to collect rule coverage
information. However, LLVM_ENABLE_GISEL_COV goes a bit further than
LLVM_ENABLE_DAGISEL_COV. The information is written to files
(${CMAKE_BINARY_DIR}/gisel-coverage-* by default). These files can then be
concatenated into ${LLVM_GISEL_COV_PREFIX}-all after which TableGen will
read this information and use it to emit warnings about untested rules.

This technique could also be used by SelectionDAG and can be further
extended to detect hot rules and give them priority over colder rules.

Usage:
* Enable LLVM_ENABLE_GISEL_COV in CMake
* Build the compiler and run some tests
* cat gisel-coverage-[0-9]* > gisel-coverage-all
* Delete lib/Target/*/*GenGlobalISel.inc*
* Build the compiler

Known issues:
* ${LLVM_GISEL_COV_PREFIX}-all must be generated as a manual
  step due to a lack of a portable 'cat' command. It should be the
  concatenation of all ${LLVM_GISEL_COV_PREFIX}-[0-9]* files.
* There's no mechanism to discard coverage information when the ruleset
  changes

Depends on D39742

Reviewers: ab, qcolombet, t.p.northover, aditya_nandakumar, rovka

Reviewed By: rovka

Subscribers: vsk, arsenm, nhaehnle, mgorny, kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D39747

llvm-svn: 318356
2017-11-16 00:46:35 +00:00
Daniel Sanders 725584e26d Add backend name to Target to enable runtime info to be fed back into TableGen
Summary:
Make it possible to feed runtime information back to tablegen to enable
profile-guided tablegen-eration, detection of untested tablegen definitions, etc.

Being a cross-compiler by nature, LLVM will potentially collect data for multiple
architectures (e.g. when running 'ninja check'). We therefore need a way for
TableGen to figure out what data applies to the backend it is generating at the
time. This patch achieves that by including the name of the 'def X : Target ...'
for the backend in the TargetRegistry.

Reviewers: qcolombet

Reviewed By: qcolombet

Subscribers: jholewinski, arsenm, jyknight, aditya_nandakumar, sdardis, nemanjai, ab, nhaehnle, t.p.northover, javed.absar, qcolombet, llvm-commits, fedor.sergeev

Differential Revision: https://reviews.llvm.org/D39742

llvm-svn: 318352
2017-11-15 23:55:44 +00:00
Matt Arsenault 301162c4fe AMDGPU: Replace i64 add/sub lowering
Use VOP3 add/addc like usual.

This has some tradeoffs. Inline immediates fold
a little better, but other constants are worse off.
SIShrinkInstructions could be made smarter to handle
these cases.

This allows us to avoid selecting scalar adds where we
need to track the carry in scc and replace its users.
This makes it easier to use the carryless VALU adds.

llvm-svn: 318340
2017-11-15 21:51:43 +00:00
Matt Arsenault 10c472dd83 AMDGPU: Add separate definitions for DS insts without m0 use
llvm-svn: 318246
2017-11-15 01:34:06 +00:00
Matt Arsenault 45b98189bd AMDGPU: Don't use MUBUF vaddr if address may overflow
Effectively revert r263964. Before we would not
allow this if vaddr was not known to be positive.

llvm-svn: 318240
2017-11-15 00:45:43 +00:00
Matt Arsenault c8903125cd AMDGPU: Handle or in multi-use shl ptr combine
llvm-svn: 318223
2017-11-14 23:46:42 +00:00
Matt Arsenault 9ba465a972 AMDGPU: Error on stack size overflow
llvm-svn: 318189
2017-11-14 20:33:14 +00:00
Matt Arsenault 57c37b2dcd AMDGPU: Fix producing saveexec when the copy is spilled
If the register from the copy from exec was spilled,
the copy before the spill was deleted leaving a spill
of undefined register verifier error and miscompiling.
Check for other use instructions of the copy register.

llvm-svn: 318132
2017-11-14 02:16:54 +00:00
Matt Arsenault 4b7938c658 AMDGPU: Fix not converting d16 load/stores to offset
Fixes missed optimization with new MUBUF instructions.

llvm-svn: 318106
2017-11-13 23:24:26 +00:00
Matt Arsenault 4eea3f3da3 AMDGPU: Implement computeKnownBitsForTargetNode for mbcnt
llvm-svn: 318100
2017-11-13 22:55:05 +00:00
Jan Vesely b17f32040c AMDGPU: Drop duplicate setOperationAction
These are set with other scalar int ops few lines up

Differential Revision: https://reviews.llvm.org/D39928

llvm-svn: 318051
2017-11-13 16:46:07 +00:00
Matt Arsenault e5e0c742df AMDGPU: Preserve nuw in shl add ptr combine
llvm-svn: 318017
2017-11-13 05:33:35 +00:00
Matt Arsenault fbe9533509 AMDGPU: Fix multi-use shl/add combine
This was using a custom function that didn't handle the
addressing modes properly for private. Use
isLegalAddressingMode to avoid duplicating this.

Additionally, skip the combine if there is only one use
since the standard combine will handle it.

llvm-svn: 318013
2017-11-13 05:11:54 +00:00
Matt Arsenault e1cd482fda AMDGPU: Select d16 loads into low component of register
llvm-svn: 318005
2017-11-13 00:22:09 +00:00
Konstantin Zhuravlyov 27b0a033d8 AMDGPU/NFC: Split Processors.td into GCNProcessors.td and R600Processors.td
Differential Revision: https://reviews.llvm.org/D39880

llvm-svn: 317920
2017-11-10 20:01:58 +00:00
Yaxun Liu 35845f06a4 [AMDGPU] Fix pointer info for lowering load/store for r600 for amdgiz environment
r600 uses dummy pointer info for lowering load/store. Since dummy pointer info
assumes address space 0, this causes isel failure when temporary load/store SDNodes
are generated for amdgiz environment.

Since the offest is not constant, FixedStack pseudo source value cannot be used
to create the pointer info. This patch creates pointer info using llvm undef value.
At least this provides correct address space so that isel can be done correctly.

Differential Revision: https://reviews.llvm.org/D39698

llvm-svn: 317862
2017-11-10 02:03:28 +00:00
Yaxun Liu 920cc2f813 [AMDGPU] Fix pointer info for pseudo source for r600
The pointer info for pseudo source for r600 is not correct when
alloca addr space is not 0, which causes invalid SDNode for r600---amdgiz.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D39670

llvm-svn: 317861
2017-11-10 01:53:24 +00:00
Vitaly Buka bee1964d80 Fix "default label in switch which covers all enumeration values" warning
llvm-svn: 317771
2017-11-09 07:46:13 +00:00
Marek Olsak 58410f37ff AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4
Summary:
Only 56 shaders (out of 48486) are affected.

Totals from affected shaders (changed stats only):
SGPRS: 2420 -> 2460 (1.65 %)
Spilled VGPRs: 94 -> 112 (19.15 %)
Scratch size: 524 -> 528 (0.76 %) dwords per thread
Code Size: 187400 -> 184992 (-1.28 %) bytes

One DiRT Showdown shader spills 6 more VGPRs.
One Grid Autosport shader spills 12 more VGPRs.

The other 54 shaders only have a decrease in code size.
(I'm ignoring the SGPR noise)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39012

llvm-svn: 317755
2017-11-09 01:52:55 +00:00
Marek Olsak 5cec64195c AMDGPU: Lower buffer store and atomic intrinsics manually
Summary:
Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every
buffer store and atomic instruction.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39060

llvm-svn: 317754
2017-11-09 01:52:48 +00:00
Marek Olsak 4c421a2db2 AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4
Summary: Only 3 (out of 48486) shaders are affected.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38951

llvm-svn: 317753
2017-11-09 01:52:36 +00:00
Marek Olsak 6a0548acaa AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4
Summary:
-9.9% code size decrease in affected shaders.

Totals (changed stats only):
SGPRS: 2151462 -> 2170646 (0.89 %)
VGPRS: 1634612 -> 1640288 (0.35 %)
Spilled SGPRs: 8942 -> 8940 (-0.02 %)
Code Size: 52940672 -> 51727288 (-2.29 %) bytes
Max Waves: 373066 -> 371718 (-0.36 %)

Totals from affected shaders:
SGPRS: 283520 -> 302704 (6.77 %)
VGPRS: 227632 -> 233308 (2.49 %)
Spilled SGPRs: 3966 -> 3964 (-0.05 %)
Code Size: 12203080 -> 10989696 (-9.94 %) bytes
Max Waves: 44070 -> 42722 (-3.06 %)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38950

llvm-svn: 317752
2017-11-09 01:52:30 +00:00
Marek Olsak b953cc36e2 AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4
Summary:
Only constant offsets (*_IMM opcodes) are merged.
It reuses code for LDS load/store merging.
It relies on the scheduler to group loads.

The results are mixed, I think they are mostly positive. Most shaders are
affected, so here are total stats only:

 SGPRS: 2072198 -> 2151462 (3.83 %)
 VGPRS: 1628024 -> 1634612 (0.40 %)
 Spilled SGPRs: 7883 -> 8942 (13.43 %)
 Spilled VGPRs: 97 -> 101 (4.12 %)
 Scratch size: 1488 -> 1492 (0.27 %) dwords per thread
 Code Size: 60222620 -> 52940672 (-12.09 %) bytes
 Max Waves: 374337 -> 373066 (-0.34 %)

There is 13.4% increase in SGPR spilling, DiRT Showdown spills a few more
VGPRs (now 37), but 12% decrease in code size.

These are the new stats for SGPR spilling. We already spill a lot SGPRs,
so it's uncertain whether more spilling will make any difference since
SGPRs are always spilled to VGPRs:

 SGPR SPILLING APPS   Shaders SpillSGPR AvgPerSh
 alien_isolation         2938       100      0.0
 batman_arkham_origins    589         6      0.0
 bioshock-infinite       1769         4      0.0
 borderlands2            3968        22      0.0
 counter_strike_glob..   1142        60      0.1
 deus_ex_mankind_div..   1410        79      0.1
 dirt-showdown            533         4      0.0
 dirt_rally               364      1163      3.2
 divinity                1052         2      0.0
 dota2                   1747         7      0.0
 f1-2015                  776      1515      2.0
 grid_autosport          1767      1505      0.9
 hitman                  1413       273      0.2
 left_4_dead_2           1762         4      0.0
 life_is_strange         1296        26      0.0
 mad_max                  358        96      0.3
 metro_2033_redux        2670        60      0.0
 payday2                 1362        22      0.0
 portal                   474         3      0.0
 saints_row_iv           1704         8      0.0
 serious_sam_3_bfe        392      1348      3.4
 shadow_of_mordor        1418        12      0.0
 shadow_warrior          3956       239      0.1
 talos_principle          324      1735      5.4
 thea                     172        17      0.1
 tomb_raider             1449       215      0.1
 total_war_warhammer      242        56      0.2
 ue4_effects_cave         295        55      0.2
 ue4_elemental            572        12      0.0
 unigine_tropics          210        56      0.3
 unigine_valley           278       152      0.5
 victor_vran             1262        84      0.1
 yofrankie                 82         2      0.0

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38949

llvm-svn: 317751
2017-11-09 01:52:23 +00:00
Marek Olsak ffadcb744b AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM
Summary:
-5.3% code size in affected shaders.

Changed stats only:

48486 shaders in 30489 tests
Totals:
SGPRS: 2086406 -> 2072430 (-0.67 %)
VGPRS: 1626872 -> 1627960 (0.07 %)
Spilled SGPRs: 7865 -> 7912 (0.60 %)
Code Size: 60978060 -> 60188764 (-1.29 %) bytes
Max Waves: 374530 -> 374342 (-0.05 %)

Totals from affected shaders:
SGPRS: 299664 -> 285688 (-4.66 %)
VGPRS: 233844 -> 234932 (0.47 %)
Spilled SGPRs: 3959 -> 4006 (1.19 %)
Code Size: 14905272 -> 14115976 (-5.30 %) bytes
Max Waves: 46202 -> 46014 (-0.41 %)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38915

llvm-svn: 317750
2017-11-09 01:52:17 +00:00
David Blaikie 3f833edc7c Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.

llvm-svn: 317647
2017-11-08 01:01:31 +00:00
Matt Arsenault 4709ab9124 AMDGPU: Set correct sched model on v_mad_u64_u32
llvm-svn: 317645
2017-11-08 00:48:25 +00:00
Matt Arsenault 6119f80034 AMDGPU: Remove redundant combine
This combine was already done in two places. The
generic combiner already has done this since
r217610, for adds (with a single use).

This one was added in r303641, and added support for handling
or as well. r313251 later added support to the generic
combine for or. It also turns out the isOrEquivalentToAdd
check is not necessary for this combine.

Additionally, we already reproduce this combine in yet
another place in the backend, although in that version
multiple uses of the add are still folded if it will
allow a fold into the addressing mode. That version needs
to be improved to understand ors though, as well as the
correct legal offsets for private.

llvm-svn: 317526
2017-11-07 00:06:32 +00:00
Matt Arsenault 4f6318fe1b AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32
llvm-svn: 317492
2017-11-06 17:04:37 +00:00
Sanjay Patel 629c411538 [IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more recently:
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html

...this is a step in cleaning up our fast-math-flags implementation in IR to better match
the capabilities of both clang's user-visible flags and the backend's flags for SDNode.

As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 
'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic 
reassociation - 'AllowReassoc'.

We're also adding a bit to allow approximations for library functions called 'ApproxFunc' 
(this was initially proposed as 'libm' or similar).

...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did 
look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), 
but that's apparently already used for other purposes. Also, I don't think we can just 
add a field to FPMathOperator because Operator is not intended to be instantiated. 
We'll defer movement of FMF to another day.

We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
...made me think we want to keep the shortcut synonym.

Finally, this change is binary incompatible with existing IR as seen in the 
compatibility tests. This statement:
"Newer releases can ignore features from older releases, but they cannot miscompile 
them. For example, if nsw is ever replaced with something else, dropping it would be 
a valid way to upgrade the IR." 
( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR 
version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will 
fail to optimize some previously 'fast' code because it's no longer recognized as 
'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.

Note: an inter-dependent clang commit to use the new API name should closely follow 
commit.

Differential Revision: https://reviews.llvm.org/D39304

llvm-svn: 317488
2017-11-06 16:27:15 +00:00
Yaxun Liu cc56a8b108 [AMDGPU] Change alloca addr space of r600 to 5 for amdgiz environment
Differential Revision: https://reviews.llvm.org/D39657

llvm-svn: 317479
2017-11-06 14:32:33 +00:00
Yaxun Liu 1ac16619d2 [AMDGPU] Fix assertion due to assuming pointer in default addr space is 32 bit
The backend assumes pointer in default addr space is 32 bit, which is not
true for the new addr space mapping and causes assertion for unresolved
functions.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D39643

llvm-svn: 317476
2017-11-06 13:01:33 +00:00
Yaxun Liu 0d9673cff2 [AMDGPU] Remove hardcoded address space value from AMDGPULibFunc
AMDGPULibFunc hardcodes address space values of the old address space mapping,
which causes invalid addrspacecast instructions and undefined functions in
APPSDK sample MonteCarloAsianDP.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D39616

llvm-svn: 317409
2017-11-04 17:37:43 +00:00
David Blaikie 1be62f0327 Move TargetFrameLowering.h to CodeGen where it's implemented
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.

This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.

llvm-svn: 317379
2017-11-03 22:32:11 +00:00
Konstantin Zhuravlyov 275a4f76c4 AMDGPU: Fix warning discovered by r317266 [-Wunused-private-field]
llvm-svn: 317280
2017-11-02 22:35:22 +00:00
Konstantin Zhuravlyov b695cd41b3 AMDGPU: Remove outdated fixme (it was already fixed)
llvm-svn: 317266
2017-11-02 20:48:06 +00:00
Konstantin Zhuravlyov 435151ad75 AMDGPU: Fix set but not used warnings related to AMDGPUAS
Differential Revision: https://reviews.llvm.org/D39499

llvm-svn: 317114
2017-11-01 19:12:38 +00:00
NAKAMURA Takumi 1657f2ad99 Fix warnings discovered by rL317076. [-Wunused-private-field]
llvm-svn: 317091
2017-11-01 13:47:55 +00:00
Benjamin Kramer f9ab3ddb8f [AMDGPU] Clean up symbols in the global namespace.
llvm-svn: 317051
2017-10-31 23:21:30 +00:00
Marek Olsak 5914ece6aa AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset
Summary:
Apps that benefit:
- alien isolation
- bioshock infinite
- civilization: beyond earth
- company of heroes 2
- dirt showdown
- dota 2
- F1 2015
- grid autosport
- hitman
- legend of grimrock
- serious sam 3: bfe
- shadow warrior
- talos principle
- total war: warhammer
- UE4 demos: effects cave, elemental, sun temple

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38914

llvm-svn: 317038
2017-10-31 21:06:42 +00:00
Yaxun Liu c928f2a6d4 [AMDGPU] Emit metadata for hidden arguments for kernel enqueue
Identifies kernels which performs device side kernel enqueues and emit
metadata for the associated hidden kernel arguments. Such kernels are
marked with calls-enqueue-kernel function attribute by
AMDGPUOpenCLEnqueueKernelLowering pass and later on
hidden kernel arguments metadata HiddenDefaultQueue and
HiddenCompletionAction are emitted for them.

Differential Revision: https://reviews.llvm.org/D39255

llvm-svn: 316907
2017-10-30 14:30:28 +00:00
Tom Stellard d0c6cf2e8c AMDGPU/GlobalISel: Mark 32-bit G_FADD as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38439

llvm-svn: 316815
2017-10-27 23:57:41 +00:00
Marek Olsak 2232243863 AMDGPU: Handle s_buffer_load_dword hazard on SI
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39171

llvm-svn: 316666
2017-10-26 14:43:02 +00:00
Matt Arsenault 28f52e51f1 AMDGPU: Add max-mix-insts subtarget feature
llvm-svn: 316553
2017-10-25 07:00:51 +00:00
Marek Olsak ce76ea0394 AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.

Also allow kill in all shader stages.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38544

llvm-svn: 316427
2017-10-24 10:27:13 +00:00
Marek Olsak 2114fc3bcb AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D38543

llvm-svn: 316426
2017-10-24 10:26:59 +00:00
Konstantin Zhuravlyov 339e74440a AMDGPU: Initialize WavefrontSize from TD files
Differential Revision: https://reviews.llvm.org/D39205

llvm-svn: 316389
2017-10-23 23:02:39 +00:00
Matt Arsenault a030e2688f AMDGPU: Cleanup local atomic node names
llvm-svn: 316349
2017-10-23 17:16:43 +00:00
Matt Arsenault b791802aef AMDGPU: Fix default range in non-kernel functions
The range should be assumed to be the hardware maximum
if a workitem intrinsic is used in a callable function
which does not know the restricted limit of the calling
kernel.

llvm-svn: 316346
2017-10-23 17:09:35 +00:00
Konstantin Zhuravlyov 8d5e9e110c AMDGPU: Rename MaxFlatWorkgroupSize to MaxFlatWorkGroupSize for consistency
Differential Revision: https://reviews.llvm.org/D38957

llvm-svn: 316097
2017-10-18 17:31:09 +00:00
NAKAMURA Takumi 6f43bd4bde Untabify.
llvm-svn: 316079
2017-10-18 13:31:28 +00:00
Wei Ding 7ab1f7a421 AMDGPU : Fix an error for the llvm.cttz implementation.
Differential Revision: http://reviews.llvm.org/D39014

llvm-svn: 316037
2017-10-17 21:49:52 +00:00
Eugene Zelenko 6cadde7f40 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316034
2017-10-17 21:27:42 +00:00