Commit Graph

3505 Commits

Author SHA1 Message Date
Alexander Timofeev 37bd9bd137 [AMDGPU] Partial revert for the ba447bae74
"Divergence driven ISel. Assign register class for cross block values
       according to the divergence."
       that discovered the design flaw leading to several issues that
       required to be solved before.

       This change reverts AMDGPU specific changes and keeps common part
       unaffected.

llvm-svn: 362749
2019-06-06 21:13:02 +00:00
Matt Arsenault 34c8b835b1 AMDGPU: Don't fix emergency stack slot at offset 0
This forced the caller to be aware of this, which is an ugly ABI
feature.

Partially reverts r295877. The original reasons for doing this are
mostly fixed. Alloca is now in a non-0 address space, so it should be
OK to have 0 as a valid pointer. Since we treat the absolute address
as the pointer value, this part only really needed to apply to
kernels.

Since r357093, we avoid the need to increment/decrement the offset
register in more cases, and since r354816 the scavenger can fail
without spilling, so it's less critical that we try to avoid an offset
that fits in the MUBUF offset.

Restrict to callable functions for now to split this into 2 steps to
limit thte number of test updates and in case anything breaks.

llvm-svn: 362665
2019-06-05 22:37:50 +00:00
Matt Arsenault b812b7a45e AMDGPU: Invert frame index offset interpretation
Since the beginning, the offset of a frame index has been consistently
interpreted backwards. It was treating it as an offset from the
scratch wave offset register as a frame register. The correct
interpretation is the offset from the SP on entry to the function,
before the prolog. Frame index elimination then should select either
SP or another register as an FP.

Treat the scratch wave offset on kernel entry as the pre-incremented
SP. Rely more heavily on the standard hasFP and frame pointer
elimination logic, and clean up the private reservation code. This
saves a copy in most callee functions.

The kernel prolog emission code is still kind of a mess relying on
checking the uses of physical registers, which I would prefer to
eliminate.

Currently selection directly emits MUBUF instructions, which require
using a reference to some register. Use the register chosen for SP,
and then ignore this later. This should probably be cleaned up to use
pseudos that don't refer to any specific base register until frame
index elimination.

Add a workaround for shaders using large numbers of SGPRs. I'm not
sure these cases were ever working correctly, since as far as I can
tell the logic for figuring out which SGPR is the scratch wave offset
doesn't match up with the shader input initialization in the shader
programming guide.

llvm-svn: 362661
2019-06-05 22:20:47 +00:00
Matt Arsenault 4fb580c314 AMDGPU: Remove amdgpu-max-work-group-size attribute
This has been deprecated for a long time, and mesa recently switched
to amdgpu-flat-work-group-size.

llvm-svn: 362641
2019-06-05 20:32:32 +00:00
Matt Arsenault 0f8a764e8f AMDGPU: Fix using 2 different enums for same operand flags
These enums are really for the same namespace of flags set on
arbitrary MachineOperands, so merge them to avoid value collisions.

llvm-svn: 362640
2019-06-05 20:32:25 +00:00
Sanjay Patel 1e63dd0b44 [SelectionDAG][x86] limit post-legalization store merging by type
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.

Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.

llvm-svn: 362507
2019-06-04 15:15:59 +00:00
Shawn Landden 669775f9db [Support] make countLeadingZeros() countTrailingZeros() countLeadingOnes() and countTrailingOnes() return unsigned
This matches APInt's versions of these functions, and there is no need for these to be size_t.

(as well as __builtin_clzll())

Differential Revision: https://reviews.llvm.org/D60823

llvm-svn: 362503
2019-06-04 14:51:15 +00:00
Matt Arsenault 0ceda9fb5c AMDGPU: Disable stack realignment for kernels
This is something of a workaround, and the state of stack realignment
controls is kind of a mess. Ideally, we would be able to specify the
stack is infinitely aligned on entry to a kernel.

TargetFrameLowering provides multiple controls which apply at
different points. The StackRealignable field is used during
SelectionDAG, and for some reason distinct from this
hook. StackAlignment is a single field not dependent on the
function. It would probably be better to make that dependent on the
calling convention, and the maximum value for kernels.

Currently this doesn't really change anything, since the frame
lowering mostly does its own thing. This helps avoid regressions in a
future change which will rely more heavily on hasFP.

llvm-svn: 362447
2019-06-03 21:33:22 +00:00
Matt Arsenault 8dbeb9256c TTI: Improve default costs for addrspacecast
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.

Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.

llvm-svn: 362436
2019-06-03 18:41:34 +00:00
Dmitry Preobrazhensky 9111f35f02 [AMDGPU][MC] Added support of SCC, VCCZ and EXECZ operands
See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292

Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D62660

llvm-svn: 362400
2019-06-03 13:51:24 +00:00
Nicolai Haehnle edfa756f3f AMDGPU/GFX10: V_CMPX_xxx instructions still have an omod operand
Summary: Change-Id: If6ee98e4a723b643bc37254fc6ef8b3812db16da

Reviewers: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62720

Change-Id: Id547ef152b2f92b24dc1c0efbf7e4467c4fb4b6e
llvm-svn: 362390
2019-06-03 12:07:41 +00:00
Matt Arsenault 302eedcbfa AMDGPU: Fix not adding ImplicitBufferPtr as a live-in
Fixes missing test from r293000.

llvm-svn: 362275
2019-05-31 22:47:36 +00:00
Stanislav Mekhanoshin fbbe5230f4 [AMDGPU] Use InliningThresholdMultiplier for inline hint
AMDGPU uses multiplier 9 for the inline cost. It is taken into account
everywhere except for inline hint threshold. As a result we are penalizing
functions with the inline hint making them less probable to be inlined
than those without the hint. Defaults are 225 for a normal function and
325 for a function with an inline hint. Currently we have effective
threshold 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. That is fixed by this patch.

Differential Revision: https://reviews.llvm.org/D62707

llvm-svn: 362239
2019-05-31 16:19:26 +00:00
Matt Arsenault e0a4da8c0a AMDGPU/GlobalISel: Add wave scratch offset argument
Avoids crashing in PEI in a future change.

llvm-svn: 362136
2019-05-30 19:33:18 +00:00
Tim Renouf 7fecdf36cc [AMDGPU] Added target-specific attribute amdgpu-max-memory-clause
With LLPC, previous investigation has suggested that si-scheduler
interacts badly with SiFormMemoryClauses on an XNACK target in some
games.

That needs further investigation in the future. In the meantime, this
commit adds a target-specific attribute to allow us to disable
SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC
to use.

Differential Revision: https://reviews.llvm.org/D62572

Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92
llvm-svn: 362127
2019-05-30 18:46:34 +00:00
Aakanksha Patil d5443f8c21 AMDGPU: Return address lowering
The patch computes the return address for the current function.

Differential revision: https://reviews.llvm.org/D59666

llvm-svn: 362001
2019-05-29 18:20:11 +00:00
Konstantin Zhuravlyov fe23ed2c68 AMDGPU: Temporary drop s_mul_hi_i/u32 patterns
It introduces performance regressions in several applications.

This has already been submitted downstream.

llvm-svn: 361879
2019-05-28 21:18:34 +00:00
Michael Liao 5fc1dfa784 [AMDGPU] Correct the handling of inlineasm output registers.
Summary:
- There's a regression due to the cross-block RC assignment. Use the
  proper way to derive the output register RC in inline asm.

Reviewers: rampitec, alex-t

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62537

llvm-svn: 361868
2019-05-28 19:37:09 +00:00
Matt Arsenault 24e80b8d04 AMDGPU: Don't enable all lanes with non-CSR VGPR spills
If the only VGPRs used for SGPR spilling were not CSRs, this was
enabling all laness and immediately restoring exec. This is the usual
situation in leaf functions.

llvm-svn: 361848
2019-05-28 16:46:02 +00:00
Michael Liao 7166843f1e [AMDGPU] Fix the mis-handling of `vreg_1` copied from scalar register.
Summary:
- Don't treat the use of a scalar register as `vreg_1` an VGPR usage.
  Otherwise, that promotes that scalar register into vector one, which
  breaks the assumption that scalar register holds the lane mask.
- The issue is triggered in a complicated case, where if the uses of
  that (lane mask) scalar register is legalized firstly before its
  definition, e.g., due to the mismatch block placement and its
  topological order or loop. In that cases, the legalization of PHI
  introduces the use of that scalar register as `vreg_1`.

Reviewers: rampitec, nhaehnle, arsenm, alex-t

Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62492

llvm-svn: 361847
2019-05-28 16:29:39 +00:00
Alexander Timofeev f4040a0dd8 [AMDGPU] Fix for the address sanitizer failure. Fixing typo
llvm-svn: 361776
2019-05-27 18:17:21 +00:00
Alexander Timofeev 4a7c4069ae [AMDGPU] Fix for the address sanitizer failure caused by the ifollowing commit:
1a8b2ea611cf4ca7cb09562e0238cfefa27c05b5  Divergence driven ISel. Assign register class for cross block values according to the divergence.

llvm-svn: 361770
2019-05-27 15:03:29 +00:00
Dmitry Preobrazhensky b79af7930c [AMDGPU][MC] Enabled constant expressions as operands of s_waitcnt
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61017

llvm-svn: 361763
2019-05-27 14:08:43 +00:00
Alexander Timofeev ba447bae74 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
             the correct register classes to the cross block values beforehand. For the divergent targets
             same value type requires different register classes dependent on the value divergence.

    Reviewers: rampitec, nhaehnle

    Differential Revision: https://reviews.llvm.org/D59990

    This commit was reverted because of the build failure.
    The reason was mlformed patch.
    Build failure fixed.

llvm-svn: 361741
2019-05-26 20:33:26 +00:00
Shawn Landden 343578759e [SimplifyCFG] back out all SwitchInst commits
They caused the sanitizer builds to fail.

My suspicion is the change the countLeadingZeros().

llvm-svn: 361736
2019-05-26 18:15:51 +00:00
Shawn Landden b7cc093db2 [Support] make countLeadingZeros() and countTrailingZeros() return unsigned
This matches countLeadingOnes() and countTrailingOnes(), and
APInt's countLeadingZeros() and countTrailingZeros().

(as well as __builtin_clzll())

llvm-svn: 361724
2019-05-26 13:49:58 +00:00
Peter Collingbourne 3b93737446 Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio

llvm-svn: 361688
2019-05-25 01:52:38 +00:00
Matt Arsenault 3d59e388ca AMDGPU: Activate all lanes when spilling CSR VGPR for SGPR spills
If some lanes weren't active on entry to the function, this could
clobber their VGPR values.

llvm-svn: 361655
2019-05-24 18:18:51 +00:00
Matt Arsenault 0ff901fba0 AMDGPU: Boost inline threshold with addrspacecasted alloca arguments
This was skipping GetUnderlyingObject for nonprivate addresses, but an
alloca could also be found through an addrspacecast if it's flat.

llvm-svn: 361649
2019-05-24 16:52:35 +00:00
Alexander Timofeev dffedea014 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
         the correct register classes to the cross block values beforehand. For the divergent targets
         same value type requires different register classes dependent on the value divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

llvm-svn: 361644
2019-05-24 15:32:18 +00:00
Matt Arsenault 5c714cbdd8 AMDGPU: Correct maximum possible private allocation size
We were assuming a much larger possible per-wave visible stack
allocation than is possible:

faa3ae5138/src/core/runtime/amd_gpu_agent.cpp (L70)

Based on this, we can assume the high 15 bits of a frame index or sret
are 0. The frame index value is the per-lane offset, so the maximum
frame index value is MAX_WAVE_SCRATCH / wavesize.

Remove the corresponding subtarget feature and option that made
this configurable.

llvm-svn: 361541
2019-05-23 19:38:14 +00:00
Matt Arsenault 0f3ba44b57 AMDGPU/GlobalISel: Legality for integer min/max
llvm-svn: 361519
2019-05-23 17:58:48 +00:00
Galina Kistanova ed49f6d8e6 Reverted r361134 because of a failing test left unattended for a long time.
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/17792/steps/test-check-all/logs/stdio
Failing Tests (1):
    LLVM :: CodeGen/AMDGPU/regbank-reassign.mir

llvm-svn: 361430
2019-05-22 20:42:56 +00:00
Matt Arsenault 418e23e33c AMDGPU: Move disassembler support check to constructor
Don't check for unsupported targets for every instruction.

llvm-svn: 361406
2019-05-22 16:28:48 +00:00
Matt Arsenault ca64ef2043 MC: Allow getMaxInstLength to depend on the subtarget
Keep it optional in cases this is ever needed in some global
context. Currently it's only used for getting an upper bound inline
asm code size.

For AMDGPU, gfx10 increases the maximum instruction size to
20-bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.

llvm-svn: 361405
2019-05-22 16:28:41 +00:00
Dmitry Preobrazhensky 7773fc478d [AMDGPU][MC] Corrected parsing of op_sel* and neg_* modifiers
See bug 41361: https://bugs.llvm.org/show_bug.cgi?id=41361

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61012

llvm-svn: 361386
2019-05-22 13:59:01 +00:00
Matt Arsenault 2cba91b8db AMDGPU: Assume calls read exec
llvm-svn: 361333
2019-05-21 23:23:16 +00:00
Matt Arsenault dd1ffa00a5 AMDGPU: Assume call pseudos are convergent
There should probably be nonconvergent versions, but my guess is it
doesn't matter in practice.

llvm-svn: 361331
2019-05-21 23:23:10 +00:00
Matt Arsenault 60ba03e210 AMDGPU: Fix not marking new gfx10 SGPRs as CSRs
llvm-svn: 361330
2019-05-21 23:23:05 +00:00
Matt Arsenault 6dd08e335f AMDGPU: Force skip branches over calls
Unfortunately the way SIInsertSkips works is backwards, and is
required for correctness. r338235 added handling of some special cases
where skipping is mandatory to avoid side effects if no lanes are
active. It conservatively handled asm correctly, but the same logic
needs to apply to calls.

Usually the call sequence code is larger than the skip threshold,
although the way the count is computed is really broken, so I'm not
sure if anything was likely to really hit this.

llvm-svn: 361202
2019-05-20 22:04:42 +00:00
Bjorn Pettersson eee0f2330d [AMDGPU] Fix std::array initializers to avoid warnings with older tool chains. NFC
A std::array is implemented as a template with an array
inside a struct. Older versions of clang, like 3.6,
require an extra set of curly braces around std::array
initializations to avoid warnings.

The C++ language was changed regarding this by CWG 1270.
So more modern tool chains does not complaing even if
leaving out one level of braces.

llvm-svn: 361171
2019-05-20 16:41:08 +00:00
Matt Arsenault 5239298b0d R600: Fix unconditional return in loop
llvm-svn: 361167
2019-05-20 16:22:11 +00:00
Fangrui Song 68774edcd6 Use llvm::sort. NFC
llvm-svn: 361134
2019-05-20 10:18:35 +00:00
Carl Ritson 34e95ce259 [AMDGPU] gfx1010 Avoid SMEM WAR hazard for some s_waitcnt values
Summary:
Avoid introducing hazard mitigation when lgkmcnt is reduced to 0.
Clarify code comments to explain assumptions made for this hazard
mitigation.  Expand and correct test cases to cover variants of
s_waitcnt.

Reviewers: nhaehnle, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62058

llvm-svn: 361124
2019-05-20 07:20:12 +00:00
Matt Arsenault 2f29220d6d AMDGPU/GlobalISel: Implement s64->s64 [SU]ITOFP
llvm-svn: 361082
2019-05-17 23:05:18 +00:00
Matt Arsenault 02b5ca8cd1 GlobalISel: Implement lower for S64->S32 [SU]ITOFP
This is ported from the custom AMDGPU DAG implementation. I think this
is a better default expansion than what the DAG currently uses, at
least if the target has CTLZ.

This implements the signed version in terms of the unsigned
conversion, which is implemented with bit operations. SelectionDAG has
several other implementations that should eventually be ported
depending on what instructions are legal.

llvm-svn: 361081
2019-05-17 23:05:13 +00:00
Dmitry Preobrazhensky 198611b0ff [AMDGPU][MC] Corrected parsing of NAME:VALUE modifiers
See bug 41298: https://bugs.llvm.org/show_bug.cgi?id=41298

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61009

llvm-svn: 361045
2019-05-17 16:04:17 +00:00
Dmitry Preobrazhensky 5ae3113969 [AMDGPU][MC] Enabled labels with s_call_b64 and s_cbranch_i_fork
See https://bugs.llvm.org/show_bug.cgi?id=41888

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D62016

llvm-svn: 361040
2019-05-17 14:57:04 +00:00
Dmitry Preobrazhensky 43fcc79837 [AMDGPU][MC] Enabled expressions for most operands which accept integer values
See bug 40873: https://bugs.llvm.org/show_bug.cgi?id=40873

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60768

llvm-svn: 361031
2019-05-17 13:17:48 +00:00
Matt Arsenault 1a02d30c87 AMDGPU: Fix unused variable warnings in release builds
llvm-svn: 361030
2019-05-17 12:59:27 +00:00
Matt Arsenault a510b570c2 AMDGPU/GlobalISel: Legalize G_FCEIL
llvm-svn: 361028
2019-05-17 12:20:05 +00:00
Matt Arsenault 6aebcd5499 AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNC
llvm-svn: 361027
2019-05-17 12:20:01 +00:00
Matt Arsenault 6aafc5e19d AMDGPU/GlobalISel: Legalize G_FRINT
llvm-svn: 361026
2019-05-17 12:19:57 +00:00
Matt Arsenault 1448f5689e AMDGPU/GlobalISel: Legalize G_FCOPYSIGN
llvm-svn: 361025
2019-05-17 12:19:52 +00:00
Matt Arsenault 568f193847 AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.load
llvm-svn: 361023
2019-05-17 12:02:34 +00:00
Matt Arsenault a3b5a386fa AMDGPU/GlobalISel: Use subreg index instead of extra unmerge
This saves instructions and extra steps, but I'm not sure about
introducing subregister indexes at this point.

llvm-svn: 361022
2019-05-17 12:02:31 +00:00
Matt Arsenault b3dc73634c AMDGPU/GlobalISel: Use waterfall loop for buffer_load
This adds support for more complex waterfall loops that need to handle
operands > 32-bits, and multiple operands.

llvm-svn: 361021
2019-05-17 12:02:27 +00:00
Rhys Perry c4bc61bad7 [AMDGPU] detect WaW hazards when moving/merging load/store instructions
Summary:
    In order to combine memory operations efficiently, the load/store
    optimizer might move some instructions around. It's usually safe
    to move instructions down past the merged instruction because the
    pass checks if memory operations can be re-ordered.

    Though, the current logic doesn't handle Write-after-Write hazards.

    This fixes a reflection issue with Monster Hunter World and DXVK.

    v2: - rebased on top of master
        - clean up the test case
        - handle WaW hazards correctly

    Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130

Original patch by Samuel Pitoiset.

Reviewers: tpr, arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D61313

llvm-svn: 361008
2019-05-17 09:32:23 +00:00
Matt Arsenault 99e6f4d11a AMDGPU: Introduce TokenFactor for ABI register copies in call sequence
The call was missing chain dependencies on the pre-call copies. I
don't think this was causing any real issues however.

llvm-svn: 360906
2019-05-16 15:10:27 +00:00
Matt Arsenault df24c92c0f AMDGPU: Assume xnack is enabled by default
This is the conservatively correct default. It is always safe to
assume xnack is enabled, but not the converse.

Introduce a feature to blacklist targets where xnack can never be
meaningfully enabled. I'm not sure the targets this is applied to is
100% correct.

llvm-svn: 360903
2019-05-16 14:48:34 +00:00
Matt Arsenault a8f88c388f AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xor
Bool values should use the scc/vcc regbank since r350611.

llvm-svn: 360877
2019-05-16 12:06:41 +00:00
Ryan Taylor 29257eb76c [AMDGPU] Increases available SGPR for Calling Convention
Summary:
SGPR in CC can be either hw initialized or set by other chained shaders
and so this increases the SGPR count availalbe to CC to 105.

Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61261

llvm-svn: 360778
2019-05-15 14:43:55 +00:00
Richard Trieu 8ce2ee9d56 [AMDGPU] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360713
2019-05-14 21:54:37 +00:00
Dmitry Preobrazhensky ee51d851ea [AMDGPU][GFX8][GFX9] Corrected predicate of v_*_co_u32 aliases
Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D61905

llvm-svn: 360702
2019-05-14 19:16:24 +00:00
Stanislav Mekhanoshin 05791d90c9 [AMDGPU] Fixed handling of imemdiate i1 literals
This bug was exposed by the rL360395.

Differential Revision: https://reviews.llvm.org/D61812

llvm-svn: 360689
2019-05-14 16:18:00 +00:00
Tim Renouf 33cb8f5b54 [AMDGPU] Fixed +DumpCode
The +DumpCode attribute is a horrible hack in AMDGPU to embed the
disassembly of the generated code into the elf file. It is used by LLPC
to implement an extension that allows the application to read back the
disassembly of the code. Longer term, we should re-implement that by
using the LLVM disassembler from the Vulkan driver.

Recent LLVM changes broke +DumpCode. With -filetype=asm it crashed, and
with -filetype=obj I think it did not include any instructions, only the
labels. Fixed with this commit: now it has no effect with -filetype=asm,
and works as intended with -filetype=obj.

Differential Revision: https://reviews.llvm.org/D60682

Change-Id: I6436d86fe2ea220d74a643a85e64753747c9366b
llvm-svn: 360688
2019-05-14 16:17:14 +00:00
Stanislav Mekhanoshin 79b2828b3f [AMDGPU] Reorder includes per coding standard. NFC.
llvm-svn: 360609
2019-05-13 18:05:10 +00:00
Stanislav Mekhanoshin 21088639ae [AMDGPU] Remove now unused V2FP16_ONE constant def. NFC.
llvm-svn: 360608
2019-05-13 17:52:57 +00:00
Richard Trieu c0bd7bd481 [AMDGPU] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360487
2019-05-11 00:03:35 +00:00
Stanislav Mekhanoshin 64196850f0 [AMDGPU] Pattern for v_xor3_b32
This also allows three op patterns to use increased constant bus
limit of GFX10.

Differential Revision: https://reviews.llvm.org/D61763

llvm-svn: 360395
2019-05-10 00:09:01 +00:00
Stanislav Mekhanoshin a76da34b1d [AMDGPU] gfx1010 v_interp_* instructions
Differential Revision: https://reviews.llvm.org/D61703

llvm-svn: 360364
2019-05-09 18:38:55 +00:00
Stanislav Mekhanoshin 4d4c9e0757 [AMDGPU] gfx1010 changes for PAL metadata
Differential Revision: https://reviews.llvm.org/D61704

llvm-svn: 360353
2019-05-09 16:34:13 +00:00
Matt Arsenault 462403a5c8 AMDGPU: Mark scheduler classes as final
llvm-svn: 360294
2019-05-08 22:10:04 +00:00
Matt Arsenault 01434f9377 AMDGPU: Select VOP3 form of add
The VOP3 form should always be the preferred selection, to be shrunk
later. This should only be an optimization issue, but this partially
works around a problem from clobbering VCC when SIFixSGPRCopies
rewrites an SCC defining operation directly to VCC.

3 of the testcases are regressions from failing to fold the immediate
in cases it should. These can be avoided by improving the VCC liveness
handling in SIFoldOperands. Simply increasing the threshold to
computeRegisterLiveness works, although this is common enough that VCC
liveness should probably be tracked throughout the pass. The hack of
leaving behind an implicit_def instruction to avoid breaking iterator
wastes instruction count, which inhibits finding the VCC def in long
chains of adds. Doing this however exposes different, worse looking
regressions from poor scheduling behavior. This could probably be
avoided around by forcing the shrink of the addc here, but the
scheduler should probably be fixed.

The r600 add test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

llvm-svn: 360293
2019-05-08 22:09:57 +00:00
Stanislav Mekhanoshin 1dbf721315 [AMDGPU] gfx1010 exp modifications
Differential Revision: https://reviews.llvm.org/D61701

llvm-svn: 360287
2019-05-08 21:23:37 +00:00
Changpeng Fang 73b7272e7a AMDGPU: Fix a mis-placed bracket
Differential Revision:
  https://reviews.llvm.org/D61430

llvm-svn: 360283
2019-05-08 19:46:04 +00:00
Simon Pilgrim e3eec06dde [AMDGPU] Reapplied BFE canonicalization from D60462
This was committed in rL358887 but reverted in rL360066 due to a x86 regression, really it should be have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch.

llvm-svn: 360263
2019-05-08 15:49:10 +00:00
Simon Pilgrim 02937dad69 R600InstrInfo.cpp - Add getTransSwizzle assert for the swizzle op index. NFCI.
Fixes static analyzer undefined value warning.

llvm-svn: 360239
2019-05-08 10:39:56 +00:00
Simon Pilgrim be9ade93d1 [SIMode] Fix typo in Status constructor
As noted in https://www.viva64.com/en/b/0629/ (Snippet No. 36) and the scan-build CI reports (https://llvm.org/reports/scan-build/report-SIModeRegister.cpp-Status-1-1.html#EndPath), rL348754 introduced a typo in the Status constructor due to argument variable names shadowing the member variable names.

Differential Revision: https://reviews.llvm.org/D61595

llvm-svn: 360236
2019-05-08 10:24:22 +00:00
Austin Kerbow 8a3d3a9af6 [AMDGPU] Check MI bundles for hazards
Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions in both scheduler and hazard recognizer mode. We ignore “bundledness” for the purpose of detecting hazards and examine the instructions individually.

Reviewers: arsenm, msearles, rampitec

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61564

llvm-svn: 360199
2019-05-07 22:12:15 +00:00
Nicolai Haehnle 79ea85c6af AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand
Summary:
No test case because I don't know of a way to trigger this, but I
accidentally caused this to fail while working on a different change.

Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61490

llvm-svn: 360123
2019-05-07 09:19:09 +00:00
Stanislav Mekhanoshin 491746a584 [AMDGPU] gfx1010 verifier changes
Differential Revision: https://reviews.llvm.org/D61521

llvm-svn: 360095
2019-05-06 22:49:45 +00:00
Stanislav Mekhanoshin 971cb8b633 [AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32
GFX10 deprecates v_mul_lo_i32 instruction, so choose u32 form for
all targets.

Differential Revision: https://reviews.llvm.org/D61525

llvm-svn: 360094
2019-05-06 22:27:05 +00:00
Stanislav Mekhanoshin 1bc001dec4 [AMDGPU] gfx1010 memory legalizer
Differential Revision: https://reviews.llvm.org/D61535

llvm-svn: 360087
2019-05-06 21:57:02 +00:00
Craig Topper 55a71b575c Revert r359392 and r358887
Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"

Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.

Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead
moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer
contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with
respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work
there. I'll file a separate PR for that and add test cases.

Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised
if that bug can still be hit independent of that.

This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.

llvm-svn: 360066
2019-05-06 19:29:24 +00:00
Alexandre Ganea 799d96ec39 Fix compilation warnings when compiling with GCC 7.3
Differential Revision: https://reviews.llvm.org/D61046

llvm-svn: 360044
2019-05-06 13:41:54 +00:00
Stanislav Mekhanoshin 5ddd564e19 [AMDGPU] Fixed asan error after D61536
llvm-svn: 359963
2019-05-04 06:40:20 +00:00
Stanislav Mekhanoshin 51d1415a16 AMDGPU] gfx1010 hazard recognizer
Differential Revision: https://reviews.llvm.org/D61536

llvm-svn: 359961
2019-05-04 04:30:57 +00:00
Stanislav Mekhanoshin 28a1936f6d [AMDGPU] gfx1010: use fmac instructions
Differential Revision: https://reviews.llvm.org/D61527

llvm-svn: 359959
2019-05-04 04:20:37 +00:00
Stanislav Mekhanoshin d9dcf392c7 [AMDGPU] gfx1010 wait count insertion
Differential Revision: https://reviews.llvm.org/D61534

llvm-svn: 359938
2019-05-03 21:53:53 +00:00
Stanislav Mekhanoshin 41bbe101a2 [AMDGPU] gfx1010 s_code_end generation
Also add some missing metadata in the streamer.

Differential Revision: https://reviews.llvm.org/D61531

llvm-svn: 359937
2019-05-03 21:26:39 +00:00
Stanislav Mekhanoshin 93f15c922f [AMDGPU] gfx1010 loop alignment
Differential Revision: https://reviews.llvm.org/D61529

llvm-svn: 359935
2019-05-03 21:17:29 +00:00
Matt Arsenault 657ef48a88 AMDGPU: Select VOP3 form of sub
The VOP3 form should always be the preferred selection form to be
shrunk later.

The r600 sub test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

llvm-svn: 359899
2019-05-03 15:37:07 +00:00
Matt Arsenault cfd0ca38b0 AMDGPU: Support shrinking add with FI in SIFoldOperands
Avoids test regression in a future patch

llvm-svn: 359898
2019-05-03 15:21:53 +00:00
Matt Arsenault 344d68d3c9 AMDGPU: Remove redundant patterns for shifts
llvm-svn: 359895
2019-05-03 15:08:36 +00:00
Matt Arsenault ada33314a2 AMDGPU: Remove redundant patterns for sub
There were 2 patterns for sub, one selecting to sub and one to
subrev. Only one of these will succeed, so remove the reversed one.

llvm-svn: 359894
2019-05-03 15:08:35 +00:00
Matt Arsenault 0446fbe45e AMDGPU: Replace shrunk instruction with dummy implicit_def
This was broken if the original operand was killed. The kill flag
would appear on both instructions, and fail the verifier. Keep the
kill flag, but remove the operands from the old instruction. This has
an added benefit of really reducing the use count for future folds.

Ideally the pass would be structured more like what PeepholeOptimizer
does to avoid this hack to avoid breaking instruction iterators.

llvm-svn: 359891
2019-05-03 14:40:10 +00:00
Matt Arsenault 2c8936fd26 AMDGPU: Fix incorrect commute with sub when folding immediates
When a fold of an immediate into a sub/subrev required shrinking the
instruction, the wrong VOP2 opcode was used. This was using the VOP2
equivalent of the original instruction, not the commuted instruction
with the inverted opcode.

llvm-svn: 359883
2019-05-03 13:42:56 +00:00
Sanjay Patel 284472be6d [SelectionDAG] remove constant folding limitations based on FP exceptions
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.

There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.

Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.

Differential Revision: https://reviews.llvm.org/D61331

llvm-svn: 359791
2019-05-02 14:47:59 +00:00
Stanislav Mekhanoshin 64399da8b8 [AMDGPU] gfx1010 lost VOP2 forms of some add/sub
Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32.

Differential Revision:

llvm-svn: 359757
2019-05-02 04:26:35 +00:00
Stanislav Mekhanoshin 5cf8167735 [AMDGPU] gfx1010 allows VOP3 to have a literal
Differential Revision: https://reviews.llvm.org/D61413

llvm-svn: 359756
2019-05-02 04:01:39 +00:00
Stanislav Mekhanoshin f2baae0abb [AMDGPU] gfx1010 constant bus limit
Constant bus limit has increased to 2 with GFX10.

Differential Revision: https://reviews.llvm.org/D61404

llvm-svn: 359754
2019-05-02 03:47:23 +00:00
Stanislav Mekhanoshin 3b7925f035 [AMDGPU] gfx1010 GCNRegBankReassign pass
Reassign registers to reduce register bank conflicts.

Differential Revision: https://reviews.llvm.org/D61344

llvm-svn: 359704
2019-05-01 16:49:31 +00:00
Stanislav Mekhanoshin c29d491596 [AMDGPU] gfx1010 GCNNSAReassign pass
Convert NSA into non-NSA images.

Differential Revision: https://reviews.llvm.org/D61341

llvm-svn: 359700
2019-05-01 16:40:49 +00:00
Stanislav Mekhanoshin 692560dc98 [AMDGPU] gfx1010 MIMG implementation
Differential Revision: https://reviews.llvm.org/D61339

llvm-svn: 359698
2019-05-01 16:32:58 +00:00
Stanislav Mekhanoshin a224f68a10 [AMDGPU] gfx1010 DS implementation
Differential Revision: https://reviews.llvm.org/D61332

llvm-svn: 359696
2019-05-01 16:11:11 +00:00
Stanislav Mekhanoshin a6322941ff [AMDGPU] gfx1010 VMEM and SMEM implementation
Differential Revision: https://reviews.llvm.org/D61330

llvm-svn: 359621
2019-04-30 22:08:23 +00:00
Sjoerd Meijer 180f1ae57c [TargetLowering] Change getOptimalMemOpType to take a function attribute list
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.

This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.

Differential Revision: https://reviews.llvm.org/D59785

llvm-svn: 359537
2019-04-30 08:38:12 +00:00
Simon Pilgrim 19cde62008 Avoid "checking a pointer after dereferencing" warning. NFCI.
Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359473
2019-04-29 17:38:18 +00:00
Simon Pilgrim 6f349d8c39 Move if() to newline to stop ambiguity over whether it should be else if. NFCI.
Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359472
2019-04-29 17:34:26 +00:00
Mark Searles 76c5b62988 Revert "AMDGPU: Split block for si_end_cf"
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4.

We discovered some internal test failures, so reverting for now.

Differential Revision: https://reviews.llvm.org/D61213

llvm-svn: 359363
2019-04-27 00:51:18 +00:00
Stanislav Mekhanoshin 4f331cb1f3 [AMDGPU] gfx1010 VOPC implementation
Differential Revision: https://reviews.llvm.org/D61208

llvm-svn: 359358
2019-04-26 23:16:16 +00:00
Stanislav Mekhanoshin 61beff020e [AMDGPU] gfx1010 VOP3 and VOP3P implementation
Differential Revision: https://reviews.llvm.org/D61202

llvm-svn: 359328
2019-04-26 17:56:03 +00:00
Stanislav Mekhanoshin 8f3da70eed [AMDGPU] gfx1010 VOP2 changes
Differential Revision: https://reviews.llvm.org/D61156

llvm-svn: 359316
2019-04-26 16:37:51 +00:00
Stanislav Mekhanoshin 917c477a07 [AMDGPU] gfx1010 - fix ubsan failure
Revert DecoderNamespace in one place for now. It will need more
changes to properly work.

llvm-svn: 359239
2019-04-25 20:39:06 +00:00
Stanislav Mekhanoshin 2c97ff07bf [AMDGPU] gfx1010 VOP1 instructions
Differential Revision: https://reviews.llvm.org/D61099

llvm-svn: 359225
2019-04-25 19:01:51 +00:00
Stanislav Mekhanoshin 956b0be72e [AMDGPU] gfx1010 utility functions
Differential Revision: https://reviews.llvm.org/D61094

llvm-svn: 359224
2019-04-25 18:53:41 +00:00
Austin Kerbow 83e52142d1 Fix spelling error. NFC
Summary: Test commit.

Reviewers: msearles, jkorous

Reviewed By: jkorous

Subscribers: dexonsmith, arsenm, jvesely, nhaehnle, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61093

llvm-svn: 359154
2019-04-24 23:32:21 +00:00
Stanislav Mekhanoshin 9d287358a8 [AMDGPU] gfx1010 SOP instructions
Differential Revision: https://reviews.llvm.org/D61080

llvm-svn: 359139
2019-04-24 20:44:34 +00:00
Stanislav Mekhanoshin 33d806a517 [AMDGPU] gfx1010 sgpr register changes
Differential Revision: https://reviews.llvm.org/D61045

llvm-svn: 359117
2019-04-24 17:28:30 +00:00
Stanislav Mekhanoshin cee607e414 [AMDGPU] Add gfx1010 target definitions
Differential Revision: https://reviews.llvm.org/D61041

llvm-svn: 359113
2019-04-24 17:03:15 +00:00
Dmitry Preobrazhensky 47621d7c89 [AMDGPU][MC] Parser cleanup and refactoring
Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60767

llvm-svn: 359096
2019-04-24 14:06:15 +00:00
Stanislav Mekhanoshin c464dddccb [AMDGPU] Fixed addReg() in SIOptimizeExecMaskingPreRA.cpp
The second argument is flags, not subreg.

Differential Revision: https://reviews.llvm.org/D61031

llvm-svn: 359017
2019-04-23 17:59:26 +00:00
Scott Linder 3eed961973 [AMDGPU] Fix hidden argument metadata duplication for V3
Essentially complete a proper rebase of the V3 metadata change over
https://reviews.llvm.org/D49096.

Minimize the diff between the V2 and V3 variants of the relevant lit
tests, and clean up some trailing whitespace.

llvm-svn: 358992
2019-04-23 14:31:17 +00:00
Nicolai Haehnle 7edae4c403 AMDGPU: Fix LCSSA phi lowering in SILowerI1Copies
Summary:
When an LCSSA phi survives through instruction selection, the pass
ends up removing that phi entirely because it is dominated by the
logic that does the lanemask merging.

This then used to trigger an assertion when processing a dependent
phi instruction.

Change-Id: Id4949719f8298062fe476a25718acccc109113b6

Reviewers: llvm-commits

Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60999

llvm-svn: 358983
2019-04-23 13:12:52 +00:00
Fedor Sergeev 652168a99b [CallSite removal] move InlineCost to CallBase usage
Converting InlineCost interface and its internals into CallBase usage.
Inliners themselves are still not converted.

Reviewed By: reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60636

llvm-svn: 358982
2019-04-23 12:43:27 +00:00
Michael Liao 389d5a3474 [AMDGPU] Fix an issue in `op_sel_hi` skipping.
Summary:
- Only apply packed literal `op_sel_hi` skipping on operands requiring
  packed literals. Even an instruction is `packed`, it may have operand
  requiring non-packed literal, such as `v_dot2_f32_f16`.

Reviewers: rampitec, arsenm, kzhuravl

Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60978

llvm-svn: 358922
2019-04-22 22:05:49 +00:00
Matt Arsenault f84ce75cd1 AMDGPU: Skip debug instructions in assert
These are inserted after branch relaxation, and for some reason it's
decided to put them in the long branch expansion block. It's probably
not great to rely on the source block address, so this should probably
be switched to being PC relative instead of relying on the block
address

llvm-svn: 358909
2019-04-22 19:14:26 +00:00
Matt Arsenault 2b6f76f05f AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sources
llvm-svn: 358894
2019-04-22 15:22:46 +00:00
Matt Arsenault 70346d127b AMDGPU: Fix not checking for copy when looking at copy src
Effectively reverts r356956. The check for isFullCopy was excessive,
but there still needs to be a check that this is a copy.

llvm-svn: 358890
2019-04-22 14:54:39 +00:00
Dmitry Preobrazhensky e2707f5aac [AMDGPU][MC] Corrected parsing of SP3 'neg' modifier
See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60624

llvm-svn: 358888
2019-04-22 14:35:47 +00:00
Simon Pilgrim 6276ce0142 [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.

The AMDGPU backend needed an extra  (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions.

The X86 changes are all definite wins.

Differential Revision: https://reviews.llvm.org/D60462

llvm-svn: 358887
2019-04-22 14:04:35 +00:00
Bjorn Pettersson 238c9d6308 [CodeGen] Add "const" to MachineInstr::mayAlias
Summary:
The basic idea here is to make it possible to use
MachineInstr::mayAlias also when the MachineInstr
is const (or the "Other" MachineInstr is const).

The addition of const in MachineInstr::mayAlias
then rippled down to the need for adding const
in several other places, such as
TargetTransformInfo::getMemOperandWithOffset.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60856

llvm-svn: 358744
2019-04-19 09:08:38 +00:00
Piotr Sobczak 72e2960e52 [AMDGPU] Ignore non-SUnits edges
Summary:
Ignore edges to non-SUnits (e.g. ExitSU) when checking
for low latency instructions.

When calling the function isLowLatencyInstruction(),
an ExitSU could be on the list of successors, not necessarily
a regular SU. In other places in the code there is a check
"Succ->NodeNum >= DAGSize" to prevent further processing of
ExitSU as "Succ->getInstr()" is NULL in such a case.
Also, 8 out of 9 cases of "SUnit *Succ = SuccDep.getSUnit())"
has the guard, so it is clearly an omission here.

Change-Id: Ica86f0327c7b2e6bcb56958e804ea6c71084663b

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: MatzeB, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60864

llvm-svn: 358740
2019-04-19 06:19:14 +00:00
Tim Renouf 7c55c8d8c3 [AMDGPU] Avoid DAG combining assert with fneg(fadd(A,0))
fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but
creating the new fadd folds to just fneg(A). When A has multiple uses,
this confuses it and you get an assert. Fixed.

Differential Revision: https://reviews.llvm.org/D60633

Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c
llvm-svn: 358640
2019-04-18 05:27:01 +00:00
Dmitry Preobrazhensky 394d0a1637 [AMDGPU][MC] Corrected handling of "-" before expressions
See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60622

llvm-svn: 358596
2019-04-17 16:56:34 +00:00
Rhys Perry c2814e12e7 AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions
Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches.

Reviewers: arsen, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60824

llvm-svn: 358592
2019-04-17 16:31:52 +00:00
Dmitry Preobrazhensky 20d52e3aa2 [AMDGPU][MC] Corrected parsing of registers
See bug 41280: https://bugs.llvm.org/show_bug.cgi?id=41280

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60621

llvm-svn: 358581
2019-04-17 14:44:01 +00:00
Tim Renouf 59e8bd3093 [AMDGPU] Flag new raw/struct atomic ops as source of divergence
Differential Revision: https://reviews.llvm.org/D60731

Change-Id: I821d93dec8b9cdd247b8172d92fb5e15340a9e7d
llvm-svn: 358579
2019-04-17 14:04:31 +00:00
Matt Arsenault 101abd219b AMDGPU: Fix unreachable when counting register usage of SGPR96
llvm-svn: 358447
2019-04-15 20:51:12 +00:00
Matt Arsenault fbdd2a1887 AMDGPU: Fix printed format of SReg_96
These are artificial, so I think this should only come up with inline
asm comments.

llvm-svn: 358446
2019-04-15 20:42:18 +00:00
Tim Renouf 842be38162 [AMDGPU] Fixed incorrect test in vcnd/vcmp optimization
This fixes a test I introduced in change D59191 (that added src0 and
src1 modifiers to the v_cndmask instruction for disassembly purposes).

Spotted by David Binderman in bug 41488.

Differential Revision: https://reviews.llvm.org/D60652

Change-Id: I6ac95e66cd84e812ed3359ad57bcd0e13198ba0c
llvm-svn: 358392
2019-04-15 10:36:24 +00:00
Amara Emerson 946b1246d6 [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only.
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.

This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.

I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.

Compile time:
Program                                        base   cse    diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test     9.04   9.12  0.8%
test-suite...Mark/mafft/pairlocalalign.test     2.68   2.66 -0.7%
test-suite...-typeset/consumer-typeset.test     5.53   5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test         5.30   5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test        25.82  25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test     6.92   6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test    34.24  34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test           6.25   6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test     1.66   1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test         13.61  13.60 -0.0%
Geomean difference                                          -0.2%

Code size:
Program                                        base     cse      diff
test-suite...-typeset/consumer-typeset.test    1315632  1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test    1313892  1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test        1439504  1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test    2936980  2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test        3478276  3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test    8082868  8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test         3870380  3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test          1434904  1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test    764528   764528   0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test    782092   782092   0.0%
Geomean difference                                              -0.9%

Differential Revision: https://reviews.llvm.org/D60580

llvm-svn: 358369
2019-04-15 05:04:20 +00:00
Amara Emerson d189680baa [GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs.
Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE
configs that can be passed between TargetPassConfig and the targets' custom
pass configs. This CSEConfigBase allows targets to create custom CSE configs
which is then used by the GISel passes for the CSEMIRBuilder.

This support will be used in a follow up commit to allow constant-only CSE for
-O0 compiles in D60580.

llvm-svn: 358368
2019-04-15 04:53:46 +00:00
Nick Desaulniers 5277b3ff25 [AsmPrinter] refactor to remove remove AsmVariant. NFC
Summary:
The InlineAsm::AsmDialect is only required for X86; no architecture
makes use of it and as such it gets passed around between arch-specific
and general code while being unused for all architectures but X86.

Since the AsmDialect is queried from a MachineInstr, which we also pass
around, remove the additional AsmDialect parameter and query for it deep
in the X86AsmPrinter only when needed/as late as possible.

This refactor should help later planned refactors to AsmPrinter, as this
difference in the X86AsmPrinter makes it harder to make AsmPrinter more
generic.

Reviewers: craig.topper

Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60488

llvm-svn: 358101
2019-04-10 16:38:43 +00:00
Tom Stellard 206b9927f8 AMDGPU/GlobalISel: Implement call lowering for shaders returning values
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, llvm-commits

Differential Revision: https://reviews.llvm.org/D57166

llvm-svn: 357964
2019-04-09 02:26:03 +00:00
Nikita Popov 3db93ac5d6 Reapply [ValueTracking] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.

Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 357870
2019-04-07 17:22:16 +00:00
Stanislav Mekhanoshin 5182302a37 [AMDGPU] Sort out and rename multiple CI/VI predicates
Differential Revision: https://reviews.llvm.org/D60346

llvm-svn: 357835
2019-04-06 09:20:48 +00:00
Stanislav Mekhanoshin c8f78f8dd3 [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs
Detect dead lanes can create some dead defs. Then RenameIndependentSubregs
will break a REG_SEQUENCE which may use these dead defs. At this point
a dead instruction can be removed but we do not run a DCE anymore.

MachineDCE was only running before live variable analysis. The patch
adds a mean to preserve LiveIntervals and SlotIndexes in case it works
past this.

Differential Revision: https://reviews.llvm.org/D59626

llvm-svn: 357805
2019-04-05 20:11:32 +00:00
Stanislav Mekhanoshin 7895c03232 [AMDGPU] predicate and feature refactoring
We have done some predicate and feature refactoring lately but
did not upstream it. This is to sync.

Differential revision: https://reviews.llvm.org/D60292

llvm-svn: 357791
2019-04-05 18:24:34 +00:00