Commit Graph

1079 Commits

Author SHA1 Message Date
Wei Ding 1041a646a9 AMDGPU : Add V_SAD_U32 instruction pattern.
Differential Revision: http://reviews.llvm.org/D23069

llvm-svn: 279629
2016-08-24 14:59:47 +00:00
Matthias Braun 733fe3676c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this patch, hopefully I will get away without any warnings
in the constructor now.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279602
2016-08-24 01:52:46 +00:00
Richard Smith 8c3fbdc6c4 Revert r279564. It introduces undefined behavior (binding a reference to a
dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes
-Werror builds (including several buildbots) to fail.

llvm-svn: 279580
2016-08-23 22:08:27 +00:00
Matthias Braun 90799ce8b2 MachineFunction: Introduce NoPHIs property
I want to compute the SSA property of .mir files automatically in
upcoming patches. The problem with this is that some inputs will be
reported as static single assignment with some passes claiming not to
support SSA form.  In reality though those passes do not support PHI
instructions => Track the presence of PHI instructions separate from the
SSA property.

Differential Revision: https://reviews.llvm.org/D22719

llvm-svn: 279573
2016-08-23 21:19:49 +00:00
Matthias Braun 4c1f1f120c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this commit with the deletion of a MachineFunction delegated to
a separate pass to avoid use after free when doing this directly in
AsmPrinter.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279564
2016-08-23 20:58:29 +00:00
Matthias Braun 7f66202d38 Revert "(HEAD -> master, origin/master, origin/HEAD) CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses"
Reverting while tracking down a use after free.

This reverts commit r279502.

llvm-svn: 279503
2016-08-23 05:17:11 +00:00
Matthias Braun fd936841eb CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279502
2016-08-23 03:20:09 +00:00
Matt Arsenault 78fc9daf8d AMDGPU: Split SILowerControlFlow into two pieces
Do most of the lowering in a pre-RA pass. Keep the skip jump
insertion late, plus a few other things that require more
work to move out.

One concern I have is now there may be COPY instructions
which do not have the necessary implicit exec uses
if they will be lowered to v_mov_b32.

This has a positive effect on SGPR usage in shader-db.

llvm-svn: 279464
2016-08-22 19:33:16 +00:00
NAKAMURA Takumi 59a20649c6 Untabify.
llvm-svn: 279408
2016-08-22 00:58:04 +00:00
Tim Shen b5e0f5ac95 [GraphTraits] Make nodes_iterator dereference to NodeType*/NodeRef
Currently nodes_iterator may dereference to a NodeType* or a NodeType&. Make them all dereference to NodeType*, which is NodeRef later.

Differential Revision: https://reviews.llvm.org/D23704
Differential Revision: https://reviews.llvm.org/D23705

llvm-svn: 279326
2016-08-19 21:20:13 +00:00
Michael Kuperstein 2bc3d4d46c [SelectionDAG] Rename fextend -> fpextend, fround -> fpround, frnd -> fround
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.

Differential Revision: https://reviews.llvm.org/D23597

llvm-svn: 279129
2016-08-18 20:08:15 +00:00
Wei Ding 52bb661dec AMDGPU : Fix QSAD and MQSAD instructions' incorrect data type.
Differential Revision: http://reviews.llvm.org/D23689

llvm-svn: 279126
2016-08-18 19:51:14 +00:00
Valery Pykhtin 609c2f8137 [AMDGPU] add s_incperflevel/s_decperflevel intrinsics.
Differential revision: https://reviews.llvm.org/D23666

llvm-svn: 279106
2016-08-18 18:06:20 +00:00
Justin Bogner cd1d5aaf2e Replace a few more "fall through" comments with LLVM_FALLTHROUGH
Follow up to r278902. I had missed "fall through", with a space.

llvm-svn: 278970
2016-08-17 20:30:52 +00:00
Matt Arsenault d42d58cf21 AMDGPU: Remove dead option
llvm-svn: 278965
2016-08-17 20:07:16 +00:00
Justin Bogner b03fd12cef Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

llvm-svn: 278902
2016-08-17 05:10:15 +00:00
Chandler Carruth 67fc52f067 [PM] Port the always inliner to the new pass manager in a much more
minimal and boring form than the old pass manager's version.

This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:

- Array alloca merging
  - To support the above, bottom-up inlining with careful history
    tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
  Instead, it focuses on inlining functions with that attribute.

The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.

The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solveable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.

The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.

One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.

Anyways, hopefully a reasonable starting point for this pass.

Differential Revision: https://reviews.llvm.org/D23299

llvm-svn: 278896
2016-08-17 02:56:20 +00:00
Duncan P. N. Exon Smith db53d99d02 AMDGPU: Avoid looking for the DebugLoc in end()
The end() iterator isn't a safe thing to dereference.  Pass the DebugLoc
into EmitFetchClause and EmitALUClause to avoid it.

llvm-svn: 278873
2016-08-17 00:06:43 +00:00
Konstantin Zhuravlyov e0b87181cf [AMDGPU] Remove duplicate initialization of SIDebuggerInsertNops pass
Differential Revision: https://reviews.llvm.org/D23556

llvm-svn: 278863
2016-08-16 22:30:11 +00:00
Matt Arsenault 7f19298bfa AMDGPU: Remove excessive padding from ImmOp and RegOp.
The structs ImmOp and RegOp are in AArch64AsmParser.cpp (inside
anonymous namespace).
This diff changes the order of fields and removes the excessive padding
(8 bytes).

Patch by Alexander Shaposhnikov

llvm-svn: 278844
2016-08-16 20:28:06 +00:00
Reid Kleckner 229d32abfc [AMDGPU] Give enum an explicit 64-bit type to fix MSVC 2013 failures
Recall that MSVC always gives enums the type 'int', nothing else.  MSVC
2015 does not appear to have this problem anymore.

Clang-cl -Wmicrosoft-enum-value flags this, FWIW, so now I have a true
positive for my warning. :)

llvm-svn: 278762
2016-08-15 23:54:44 +00:00
Jan Vesely 0486f739a4 AMDGPU/R600: Convert buffer id to VTX_READ input
Use patterns instead of multiple instructions
Add buffer id to asm string

https://reviews.llvm.org/D22650

llvm-svn: 278749
2016-08-15 21:38:30 +00:00
Yaxun Liu c7cbd72921 AMDGPU: Update AMDGPURuntimeMetadata.h for enums of address space qualifiers
llvm-svn: 278682
2016-08-15 16:54:25 +00:00
Matt Arsenault 3661e90e71 AMDGPU: Don't fold subregister extracts into tied operands
llvm-svn: 278676
2016-08-15 16:18:36 +00:00
Valery Pykhtin c761675ef4 [AMDGPU] fix failure on printing of non-existing instruction operands.
Differential revision: https://reviews.llvm.org/D23323

llvm-svn: 278665
2016-08-15 10:56:48 +00:00
Matt Arsenault c1ebd82ebe AMDGPU: Fix not estimating MBB operand sizes correctly
llvm-svn: 278590
2016-08-13 01:43:54 +00:00
Matt Arsenault 3cc1e0066d AMDGPU: Fix missing test for addressing mode with odd offsets
Add test if the constant offset looks unaligned.

llvm-svn: 278589
2016-08-13 01:43:51 +00:00
Matt Arsenault 44f6d694b3 AMDGPU/R600: Remove macros
llvm-svn: 278588
2016-08-13 01:43:46 +00:00
Hans Wennborg 0dd9ed1d45 Fix more dereferenced end() iterators after r278532
llvm-svn: 278587
2016-08-13 01:12:49 +00:00
Duncan P. N. Exon Smith f197b1f78f ADT: Remove all ilist_iterator => pointer casts, NFC
Remove all ilist_iterator to pointer casts.  There were two reasons for
casts:

  - Checking for an uninitialized (i.e., null) iterator.  I added
    MachineInstrBundleIterator::isValid() to check for that case.

  - Comparing an iterator against the underlying pointer value while
    avoiding converting the pointer value to an iterator.  This is
    occasionally necessary in MachineInstrBundleIterator, since there is
    an assertion in the constructors that the underlying MachineInstr is
    not bundled (but we don't care about that if we're just checking for
    pointer equality).

To support the latter case, I rewrote the == and != operators for
ilist_iterator and MachineInstrBundleIterator.

  - The implicit constructors now use enable_if to exclude
    const-iterator => non-const-iterator conversions from overload
    resolution (previously it was a compiler error on instantiation, now
    it's SFINAE).

  - The == and != operators are now global (friends), and are not
    templated.

  - MachineInstrBundleIterator has overloads to compare against both
    const_pointer and const_reference.  This avoids the implicit
    conversions to MachineInstrBundleIterator that assert, instead just
    checking the address (and I added unit tests to confirm this).

Notably, the only remaining uses of ilist_iterator::getNodePtrUnchecked
are in ilist.h, and no code outside of ilist*.h directly relies on this
UB end-iterator-to-pointer conversion anymore.  It's still needed for
ilist_*sentinel_traits, but I'll clean that up soon.

llvm-svn: 278478
2016-08-12 05:05:36 +00:00
David Majnemer 562e82945e Use the range variant of find_if instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278443
2016-08-12 00:18:03 +00:00
David Majnemer 0d955d0bf5 Use the range variant of find instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.

llvm-svn: 278433
2016-08-11 22:21:41 +00:00
Matt Arsenault 18da70dd2d AMDGPU: Remove unused tablegen utilities
llvm-svn: 278414
2016-08-11 21:08:43 +00:00
Wei Ding 70cda07526 AMDGPU : Add intrinsic for instruction v_cvt_pk_u8_f32
Differential Revision: http://reviews.llvm.org/D23336

llvm-svn: 278403
2016-08-11 20:34:48 +00:00
Matt Arsenault 2ffe8fd2ce AMDGPU: Prune includes
llvm-svn: 278391
2016-08-11 19:18:50 +00:00
Matt Arsenault 56684d4538 AMDGPU: Fix crashes on memory functions
llvm-svn: 278369
2016-08-11 17:31:42 +00:00
Matt Arsenault 4b5fc093d0 AMDGPU: Remove custom getSubReg
This was kind of confusing, the subregister
class shouldn't really be necessary.

llvm-svn: 278362
2016-08-11 17:15:32 +00:00
Matt Arsenault 69fd2c1179 AMDGPU: Remove unused tracking of flat instructions
llvm-svn: 278361
2016-08-11 17:15:28 +00:00
Wei Ding 34e1753585 AMDGPU : Add LLVM intrinsics for SAD related instructions.
Differential Revision: http://reviews.llvm.org/D23133

llvm-svn: 278354
2016-08-11 16:33:53 +00:00
Valery Pykhtin 82c73bee2b Revert "[AMDGPU] fix failure on printing of non-existing instruction operands."
This reverts revision 278333, newly added test failed.

llvm-svn: 278336
2016-08-11 14:22:05 +00:00
Valery Pykhtin 3048ff6ec3 [AMDGPU] fix failure on printing of non-existing instruction operands.
Differential revision: https://reviews.llvm.org/D23323

llvm-svn: 278333
2016-08-11 13:49:46 +00:00
Tim Northover 406024a108 GlobalISel: implement simple function calls on AArch64.
We're still limited in the arguments we support, but this at least handles the
basic cases.

llvm-svn: 278293
2016-08-10 21:44:01 +00:00
Changpeng Fang fb9c3818dd AMDGPU/SI: Implement amdgcn image intrinsics with sampler
Summary:
  This patch define and implement amdgcn image intrinsics with sampler.

    1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty,
       and rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsics name
       to have three suffixes to overload each of these three types;

    2. D128 as well as two other flags are implied in the three types, for example,
       if you use v8i32 as resource type, then r128 is 0!

    3. don't expose TFE flag, and other flags are exposed in the instruction order:
       unrm, glc, slc, lwe and da.

Differential Revision: http://reviews.llvm.org/D22838

Reviewed by:
  arsenm and tstellarAMD

llvm-svn: 278291
2016-08-10 21:15:30 +00:00
Matt Arsenault 61f8ba8b79 AMDGPU: s_setpc_b64 should be an indirect branch
llvm-svn: 278278
2016-08-10 19:20:02 +00:00
Matt Arsenault c6b1350039 AMDGPU: Set sizes on control flow pseudos
llvm-svn: 278276
2016-08-10 19:11:51 +00:00
Matt Arsenault f4af802381 AMDGPU: Remove empty file comment
llvm-svn: 278275
2016-08-10 19:11:48 +00:00
Matt Arsenault 11587d97be AMDGPU: Remove unnecessary cast
llvm-svn: 278274
2016-08-10 19:11:45 +00:00
Matt Arsenault 57431c9680 AMDGPU: Change insertion point of si_mask_branch
Insert before the skip branch if one is created.
This is a somewhat more natural placement relative
to the skip branches, and makes it possible to implement
analyzeBranch for skip blocks.

The test changes are mostly due to a quirk where
the block label is not emitted if there is a terminator
that is not also a branch.

llvm-svn: 278273
2016-08-10 19:11:42 +00:00
Matt Arsenault b920e9987d AMDGPU: Use CreateStackObject instead of CreateSpillStackObject
I'm not sure what the difference is, but no other target
uses this for emergency spill slots.

llvm-svn: 278272
2016-08-10 19:11:36 +00:00
Marek Olsak 355a8642b4 AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland
Summary:
This is the setting of the Vulkan closed source driver.

It decreases the max wave count from 10 to 8.

26010 shaders in 14650 tests
Totals:
VGPRS: 829593 -> 808440 (-2.55 %)
Spilled SGPRs: 81878 -> 42226 (-48.43 %)
Spilled VGPRs: 367 -> 358 (-2.45 %)
Scratch VGPRs: 1764 -> 1748 (-0.91 %) dwords per thread
Code Size: 36677864 -> 35923932 (-2.06 %) bytes

There is a massive decrease in SGPR spilling in general and -7.4% spilled
VGPRs for DiRT Showdown (= SGPRs spilled to scratch?)

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23034

llvm-svn: 277867
2016-08-05 21:23:29 +00:00
Yaxun Liu 86c052238a [OpenCL] Add missing tests for getOCLTypeName
Adding missing tests for OCL type names for half, float, double, char, short, long, and unknown.

Patch by Aaron En Ye Shi.

Differential Revision: https://reviews.llvm.org/D22964

llvm-svn: 277759
2016-08-04 19:45:00 +00:00
Alina Sbirlea 6f937b1144 LoadStoreVectorizer: Remove TargetBaseAlign. Keep alignment for stack adjustments.
Summary:
TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses.
A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted.

Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures:
- x86 failing tests either did not have the right target, or the right alignment.
- NVPTX failing tests did not have the right alignment.
- AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info
  considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks
  for a maximum size allowed and relies on the next condition checking for %4 for correctness.
  This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list).

Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure),
so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context.

Reviewers: arsenm, jlebar, tstellarAMD

Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23068

llvm-svn: 277735
2016-08-04 16:38:44 +00:00
Nikolai Bozhenov f679530ba1 [X86] Heuristic to selectively build Newton-Raphson SQRT estimation
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.

Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.

Differential Revision: https://reviews.llvm.org/D21379

llvm-svn: 277725
2016-08-04 12:47:28 +00:00
Matt Arsenault 979902b3ff AMDGPU: fdiv -1, x -> rcp -x
llvm-svn: 277535
2016-08-02 22:25:04 +00:00
Nicolai Haehnle 8a482b33fe AMDGPU: Stay in WQM for non-intrinsic stores
Summary:
Two types of stores are possible in pixel shaders: stores to memory that are
explicitly requested at the API level, and stores that are an implementation
detail of register spilling or lowering of arrays.

For the first kind of store, we must ensure that helper pixels have no effect
and hence WQM must be disabled. The second kind of store must always be
executed, because the written value may be loaded again in a way that is
relevant for helper pixels as well -- and there are no externally visible
effects anyway.

This is a candidate for the 3.9 release branch.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D22675

llvm-svn: 277504
2016-08-02 19:31:14 +00:00
Nicolai Haehnle bef0e90cf1 AMDGPU: Track physical registers in SIWholeQuadMode
Summary:
There are cases where uniform branch conditions are computed in VGPRs, and
we didn't correctly mark those as WQM.

The stray change in basic-branch.ll is because invoking the LiveIntervals
analysis leads to the detection of a dead register that would otherwise not
be seen at -O0.

This is a candidate for the 3.9 branch, as it fixes a possible hang.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22673

llvm-svn: 277500
2016-08-02 19:17:37 +00:00
Valery Pykhtin 902db3101b [AMDGPU] refactor DS instruction definitions. NFC.
Differential revision: https://reviews.llvm.org/D22522

llvm-svn: 277344
2016-08-01 14:21:30 +00:00
Benjamin Kramer 22ff865a83 [AMDGPU] Fix lifetime of SmallVector temporaries.
Found by asan -fsanitize-address-use-after-scope.

llvm-svn: 277265
2016-07-30 11:31:16 +00:00
Matt Arsenault 749035b7b1 AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior
This should really be true for any immediate, not just
inline ones.

llvm-svn: 277260
2016-07-30 01:40:36 +00:00
Matt Arsenault d2141b6030 AMDGPU: Set s_setpc_b64 as a terminator
llvm-svn: 277259
2016-07-30 01:40:34 +00:00
Matt Arsenault dc744412ad AMDGPU: Remove unused pattern
llvm-svn: 277258
2016-07-30 01:40:30 +00:00
Sjoerd Meijer 0eb96ed0de TargetInstrInfo: add virtual function getInstSizeInBytes
This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of
subclasses already implement.

Differential Revision: https://reviews.llvm.org/D22885

llvm-svn: 277126
2016-07-29 08:16:16 +00:00
Changpeng Fang 26fb9d268b AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB.
Differential Revision: http://reviews.llvm.org/D22021

Reviewed by: arsenm

llvm-svn: 277073
2016-07-28 23:01:45 +00:00
Matthias Braun 941a705b7b MachineFunction: Return reference for getFrameInfo(); NFC
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.

llvm-svn: 277017
2016-07-28 18:40:00 +00:00
Wei Ding 07e03712d3 AMDGPU : Add intrinsics for compare with the full wavefront result
Differential Revision: http://reviews.llvm.org/D22482

llvm-svn: 276998
2016-07-28 16:42:13 +00:00
Tom Stellard 19f4301099 AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling
Summary:
We were using reserved VGPRs for SGPR spilling and this was causing
some programs with a workgroup size of 1024 to use more than 64
registers, which is illegal.

Reviewers: arsenm, mareko, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22032

llvm-svn: 276980
2016-07-28 14:30:43 +00:00
Nicolai Haehnle 3b572002a2 AMDGPU: add execfix flag to SI_ELSE
Summary:
SI_ELSE is lowered into two parts:

s_or_saveexec_b64 dst, src (at the start of the basic block)

s_xor_b64 exec, exec, dst (at the end of the basic block)

The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction

s_and_b64 exec, exec, s[...]

which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:

s_or_savexec_b64 dst, src

s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow

s_xor_b64 exec, exec, dst

Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.

Finally: It also occured to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit

s_or_saveexec_b64 dst, src

...

s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst

and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions

s_and_b64 vcc, exec, vcc

before s_cbranch_scc (in moveToALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.

In any case, this current patch could help in the short term with the whole
ExecModified business.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22846

llvm-svn: 276972
2016-07-28 11:39:24 +00:00
Matt Arsenault edc7dcb2aa AMDGPU: Turn dead checks into asserts
llvm-svn: 276946
2016-07-28 00:32:05 +00:00
Matt Arsenault fe26775992 AMDGPU: Remove analyzeImmediate
This no longer uses the more complicated classification
of constants.

llvm-svn: 276945
2016-07-28 00:32:02 +00:00
Reid Kleckner 46cb48c74a Remove MCAsmInfo.h include from TargetOptions.h
TargetOptions wants the ExceptionHandling enum. Move that to
MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in
clang. Now you can add a DWARF tag without a full rebuild of clang
semantic analysis.

llvm-svn: 276883
2016-07-27 16:03:57 +00:00
Ahmed Bougacha 6756a2c953 [GlobalISel] Introduce an instruction selector.
And implement it for AArch64, supporting x/w ADD/OR.

Differential Revision: https://reviews.llvm.org/D22373

llvm-svn: 276875
2016-07-27 14:31:55 +00:00
Matt Arsenault e3862cdc93 AMDGPU: Use rcp for fdiv 1, x with fpmath metadata
Using rcp should be OK for safe math usually, so this
should not be replacing the original fdiv.

llvm-svn: 276823
2016-07-26 23:25:44 +00:00
Matt Arsenault c6b69a9114 AMDGPU: Use implicit_def for selecting anyext
llvm-svn: 276819
2016-07-26 23:06:33 +00:00
Matt Arsenault 60a750fa69 AMDGPU/R600: Remove dead custom inserters
The intrinsics for these were removed, so this is dead.

llvm-svn: 276805
2016-07-26 21:03:38 +00:00
Matt Arsenault b06db8f666 AMDGPU: Minor AsmPrinter cleanups
llvm-svn: 276804
2016-07-26 21:03:36 +00:00
Matt Arsenault 52ef4019fd AMDGPU: Make AMDGPUMachineFunction fields private
ABIArgOffset is a problem because properly fsetting the
KernArgSize requires that the reserved area before the
real kernel arguments be correctly aligned, which requires
fixing clover.

llvm-svn: 276766
2016-07-26 16:45:58 +00:00
Matt Arsenault 32fc527c65 AMDGPU: Add fp legacy instruction intrinsics
This could use some additional optimization work
to use mad/mac legacy.

llvm-svn: 276764
2016-07-26 16:45:45 +00:00
Jan Vesely b64c8925e9 AMDGPU: Remove read_workdim intrinsic
Differential revision: https://reviews.llvm.org/D22732

llvm-svn: 276682
2016-07-25 20:17:02 +00:00
Matt Arsenault 2fa171c43a AMDGPU: Make skip threshold an option
llvm-svn: 276680
2016-07-25 19:48:29 +00:00
Matt Arsenault cdae95bef2 AMDGPU: Delete dead code
llvm-svn: 276675
2016-07-25 19:06:25 +00:00
Joel Jones 373d7d30dd MC] Provide an MCTargetOptions to implementors of MCAsmBackendCtorTy, NFC
Some targets, notably AArch64 for ILP32, have different relocation encodings
based upon the ABI. This is an enabling change, so a future patch can use the
ABIName from MCTargetOptions to chose which relocations to use. Tested using
check-llvm.

The corresponding change to clang is in: http://reviews.llvm.org/D16538

Patch by: Joel Jones

Differential Revision: https://reviews.llvm.org/D16213

llvm-svn: 276654
2016-07-25 17:18:28 +00:00
Matt Arsenault b40d8600ca AMDGPU: Delete dead code
This has been dead since r269479

llvm-svn: 276518
2016-07-23 07:07:14 +00:00
Tom Stellard b8253c88b6 Revert "[AMDGPU] Emit read-only data to .rodata for hsa"
This reverts commit r276298.

Data stored in .rodata can have a negative offset from .text, but we
don't support negative values in relocations yet.

This caused a regression in one of the amp conformance tests:
5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01

llvm-svn: 276498
2016-07-22 23:46:40 +00:00
Tim Northover 33b07d6725 GlobalISel: implement legalization pass, with just one transformation.
This adds the actual MachineLegalizeHelper to do the work and a trivial pass
wrapper that legalizes all instructions in a MachineFunction. Currently the
only transformation supported is splitting up a vector G_ADD into one acting on
smaller vectors.

llvm-svn: 276461
2016-07-22 20:03:43 +00:00
Matt Arsenault 3c07c813c0 AMDGPU: Fix groupstaticsize for large LDS
The size can exceed s_movk_i32's limit, and we don't
want to use it this early since it inhibits optimizations.

This should probably be merged to the release branch.

llvm-svn: 276438
2016-07-22 17:01:33 +00:00
Matt Arsenault 8d718dcfda AMDGPU: Add HSA dispatch id intrinsic
llvm-svn: 276437
2016-07-22 17:01:30 +00:00
Matt Arsenault f9245b75c0 AMDGPU: Delete more dead code
Remove dead code from r600 intrinsic removal.
Remove unset members, rename StackSize to be less ambiguous.

llvm-svn: 276436
2016-07-22 17:01:25 +00:00
Matt Arsenault 7fb961f3e6 AMDGPU: Fix i1 fp_to_int
R600's i1 fp_to_uint selected but was incorrect according to
what instcombine constant folds to.

llvm-svn: 276435
2016-07-22 17:01:21 +00:00
Matt Arsenault d40ded6681 AMDGPU: Don't reinvent transferSuccessorsAndUpdatePHIs
llvm-svn: 276434
2016-07-22 17:01:15 +00:00
Konstantin Zhuravlyov 3c0d8d22fe [AMDGPU] Emit read-only data to .rodata for hsa
Differential Revision: https://reviews.llvm.org/D22538

llvm-svn: 276298
2016-07-21 15:59:23 +00:00
Konstantin Zhuravlyov 155626238b AMDGPU/SI: Add support for R_AMDGPU_ABS32
Differential Revision: https://reviews.llvm.org/D21646

llvm-svn: 276294
2016-07-21 15:29:19 +00:00
Sam Kolton 6c4efcadfe [AMDGPU] Some code cleaning in SIRegisterInfo.td
Reviewers: tstellarAMD, vpykhtin

Subscribers: arsenm, kzhuravl

Differential Revision: https://reviews.llvm.org/D22620

llvm-svn: 276274
2016-07-21 13:29:57 +00:00
Matt Arsenault f0ba86a4d5 AMDGPU: Fix phis from blocks split due to register indexing
llvm-svn: 276257
2016-07-21 09:40:57 +00:00
Yaxun Liu 4b1d9f7f18 AMDGPU: Fix bug causing crash due to invalid opencl version metadata.
Differential Revision: https://reviews.llvm.org/D22526

llvm-svn: 276119
2016-07-20 14:38:06 +00:00
Matt Arsenault a1fe17c9ad AMDGPU: Change fdiv lowering based on !fpmath metadata
If 2.5 ulp is acceptable, denormals are not required, and
isn't a reciprocal which will already be handled, replace
with a faster fdiv.

Simplify the lowering tests by using per function
subtarget features.

llvm-svn: 276051
2016-07-19 23:16:53 +00:00
Davide Italiano 63e5968033 [AMDGPU] Remove spurious line (should've been removed in r276029).
llvm-svn: 276030
2016-07-19 21:16:30 +00:00
Davide Italiano 1576e38598 [AMDGPU] Remove dead code.
LGTM'd by Matt Arsenault.

llvm-svn: 276029
2016-07-19 21:10:49 +00:00
Matt Arsenault 03006fd3c4 AMDGPU: Only use legal inline immediates with kill pseudo
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.

These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.

llvm-svn: 275988
2016-07-19 16:27:56 +00:00
Matt Arsenault fe358066ea AMDGPU/SI: Fix SI scheduler refcount issue
Without this fix, releaseSuccessors when InOrOutBlock is
false could release SUs outside the schedule BasicBlock.

Patch by Axel Davy

llvm-svn: 275935
2016-07-19 00:35:22 +00:00
Matt Arsenault cb540bc03c AMDGPU: Expand register indexing pseudos in custom inserter
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

llvm-svn: 275934
2016-07-19 00:35:03 +00:00