Commit Graph

3403 Commits

Author SHA1 Message Date
Matt Arsenault b812b7a45e AMDGPU: Invert frame index offset interpretation
Since the beginning, the offset of a frame index has been consistently
interpreted backwards. It was treating it as an offset from the
scratch wave offset register as a frame register. The correct
interpretation is the offset from the SP on entry to the function,
before the prolog. Frame index elimination then should select either
SP or another register as an FP.

Treat the scratch wave offset on kernel entry as the pre-incremented
SP. Rely more heavily on the standard hasFP and frame pointer
elimination logic, and clean up the private reservation code. This
saves a copy in most callee functions.

The kernel prolog emission code is still kind of a mess relying on
checking the uses of physical registers, which I would prefer to
eliminate.

Currently selection directly emits MUBUF instructions, which require
using a reference to some register. Use the register chosen for SP,
and then ignore this later. This should probably be cleaned up to use
pseudos that don't refer to any specific base register until frame
index elimination.

Add a workaround for shaders using large numbers of SGPRs. I'm not
sure these cases were ever working correctly, since as far as I can
tell the logic for figuring out which SGPR is the scratch wave offset
doesn't match up with the shader input initialization in the shader
programming guide.

llvm-svn: 362661
2019-06-05 22:20:47 +00:00
Matt Arsenault 4fb580c314 AMDGPU: Remove amdgpu-max-work-group-size attribute
This has been deprecated for a long time, and mesa recently switched
to amdgpu-flat-work-group-size.

llvm-svn: 362641
2019-06-05 20:32:32 +00:00
Matt Arsenault 0f8a764e8f AMDGPU: Fix using 2 different enums for same operand flags
These enums are really for the same namespace of flags set on
arbitrary MachineOperands, so merge them to avoid value collisions.

llvm-svn: 362640
2019-06-05 20:32:25 +00:00
Sanjay Patel 1e63dd0b44 [SelectionDAG][x86] limit post-legalization store merging by type
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.

Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.

llvm-svn: 362507
2019-06-04 15:15:59 +00:00
Shawn Landden 669775f9db [Support] make countLeadingZeros() countTrailingZeros() countLeadingOnes() and countTrailingOnes() return unsigned
This matches APInt's versions of these functions, and there is no need for these to be size_t.

(as well as __builtin_clzll())

Differential Revision: https://reviews.llvm.org/D60823

llvm-svn: 362503
2019-06-04 14:51:15 +00:00
Matt Arsenault 0ceda9fb5c AMDGPU: Disable stack realignment for kernels
This is something of a workaround, and the state of stack realignment
controls is kind of a mess. Ideally, we would be able to specify the
stack is infinitely aligned on entry to a kernel.

TargetFrameLowering provides multiple controls which apply at
different points. The StackRealignable field is used during
SelectionDAG, and for some reason distinct from this
hook. StackAlignment is a single field not dependent on the
function. It would probably be better to make that dependent on the
calling convention, and the maximum value for kernels.

Currently this doesn't really change anything, since the frame
lowering mostly does its own thing. This helps avoid regressions in a
future change which will rely more heavily on hasFP.

llvm-svn: 362447
2019-06-03 21:33:22 +00:00
Matt Arsenault 8dbeb9256c TTI: Improve default costs for addrspacecast
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.

Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.

llvm-svn: 362436
2019-06-03 18:41:34 +00:00
Dmitry Preobrazhensky 9111f35f02 [AMDGPU][MC] Added support of SCC, VCCZ and EXECZ operands
See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292

Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D62660

llvm-svn: 362400
2019-06-03 13:51:24 +00:00
Nicolai Haehnle edfa756f3f AMDGPU/GFX10: V_CMPX_xxx instructions still have an omod operand
Summary: Change-Id: If6ee98e4a723b643bc37254fc6ef8b3812db16da

Reviewers: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62720

Change-Id: Id547ef152b2f92b24dc1c0efbf7e4467c4fb4b6e
llvm-svn: 362390
2019-06-03 12:07:41 +00:00
Matt Arsenault 302eedcbfa AMDGPU: Fix not adding ImplicitBufferPtr as a live-in
Fixes missing test from r293000.

llvm-svn: 362275
2019-05-31 22:47:36 +00:00
Stanislav Mekhanoshin fbbe5230f4 [AMDGPU] Use InliningThresholdMultiplier for inline hint
AMDGPU uses multiplier 9 for the inline cost. It is taken into account
everywhere except for inline hint threshold. As a result we are penalizing
functions with the inline hint making them less probable to be inlined
than those without the hint. Defaults are 225 for a normal function and
325 for a function with an inline hint. Currently we have effective
threshold 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. That is fixed by this patch.

Differential Revision: https://reviews.llvm.org/D62707

llvm-svn: 362239
2019-05-31 16:19:26 +00:00
Matt Arsenault e0a4da8c0a AMDGPU/GlobalISel: Add wave scratch offset argument
Avoids crashing in PEI in a future change.

llvm-svn: 362136
2019-05-30 19:33:18 +00:00
Tim Renouf 7fecdf36cc [AMDGPU] Added target-specific attribute amdgpu-max-memory-clause
With LLPC, previous investigation has suggested that si-scheduler
interacts badly with SiFormMemoryClauses on an XNACK target in some
games.

That needs further investigation in the future. In the meantime, this
commit adds a target-specific attribute to allow us to disable
SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC
to use.

Differential Revision: https://reviews.llvm.org/D62572

Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92
llvm-svn: 362127
2019-05-30 18:46:34 +00:00
Aakanksha Patil d5443f8c21 AMDGPU: Return address lowering
The patch computes the return address for the current function.

Differential revision: https://reviews.llvm.org/D59666

llvm-svn: 362001
2019-05-29 18:20:11 +00:00
Konstantin Zhuravlyov fe23ed2c68 AMDGPU: Temporary drop s_mul_hi_i/u32 patterns
It introduces performance regressions in several applications.

This has already been submitted downstream.

llvm-svn: 361879
2019-05-28 21:18:34 +00:00
Michael Liao 5fc1dfa784 [AMDGPU] Correct the handling of inlineasm output registers.
Summary:
- There's a regression due to the cross-block RC assignment. Use the
  proper way to derive the output register RC in inline asm.

Reviewers: rampitec, alex-t

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62537

llvm-svn: 361868
2019-05-28 19:37:09 +00:00
Matt Arsenault 24e80b8d04 AMDGPU: Don't enable all lanes with non-CSR VGPR spills
If the only VGPRs used for SGPR spilling were not CSRs, this was
enabling all laness and immediately restoring exec. This is the usual
situation in leaf functions.

llvm-svn: 361848
2019-05-28 16:46:02 +00:00
Michael Liao 7166843f1e [AMDGPU] Fix the mis-handling of `vreg_1` copied from scalar register.
Summary:
- Don't treat the use of a scalar register as `vreg_1` an VGPR usage.
  Otherwise, that promotes that scalar register into vector one, which
  breaks the assumption that scalar register holds the lane mask.
- The issue is triggered in a complicated case, where if the uses of
  that (lane mask) scalar register is legalized firstly before its
  definition, e.g., due to the mismatch block placement and its
  topological order or loop. In that cases, the legalization of PHI
  introduces the use of that scalar register as `vreg_1`.

Reviewers: rampitec, nhaehnle, arsenm, alex-t

Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62492

llvm-svn: 361847
2019-05-28 16:29:39 +00:00
Alexander Timofeev f4040a0dd8 [AMDGPU] Fix for the address sanitizer failure. Fixing typo
llvm-svn: 361776
2019-05-27 18:17:21 +00:00
Alexander Timofeev 4a7c4069ae [AMDGPU] Fix for the address sanitizer failure caused by the ifollowing commit:
1a8b2ea611cf4ca7cb09562e0238cfefa27c05b5  Divergence driven ISel. Assign register class for cross block values according to the divergence.

llvm-svn: 361770
2019-05-27 15:03:29 +00:00
Dmitry Preobrazhensky b79af7930c [AMDGPU][MC] Enabled constant expressions as operands of s_waitcnt
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61017

llvm-svn: 361763
2019-05-27 14:08:43 +00:00
Alexander Timofeev ba447bae74 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
             the correct register classes to the cross block values beforehand. For the divergent targets
             same value type requires different register classes dependent on the value divergence.

    Reviewers: rampitec, nhaehnle

    Differential Revision: https://reviews.llvm.org/D59990

    This commit was reverted because of the build failure.
    The reason was mlformed patch.
    Build failure fixed.

llvm-svn: 361741
2019-05-26 20:33:26 +00:00
Shawn Landden 343578759e [SimplifyCFG] back out all SwitchInst commits
They caused the sanitizer builds to fail.

My suspicion is the change the countLeadingZeros().

llvm-svn: 361736
2019-05-26 18:15:51 +00:00
Shawn Landden b7cc093db2 [Support] make countLeadingZeros() and countTrailingZeros() return unsigned
This matches countLeadingOnes() and countTrailingOnes(), and
APInt's countLeadingZeros() and countTrailingZeros().

(as well as __builtin_clzll())

llvm-svn: 361724
2019-05-26 13:49:58 +00:00
Peter Collingbourne 3b93737446 Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio

llvm-svn: 361688
2019-05-25 01:52:38 +00:00
Matt Arsenault 3d59e388ca AMDGPU: Activate all lanes when spilling CSR VGPR for SGPR spills
If some lanes weren't active on entry to the function, this could
clobber their VGPR values.

llvm-svn: 361655
2019-05-24 18:18:51 +00:00
Matt Arsenault 0ff901fba0 AMDGPU: Boost inline threshold with addrspacecasted alloca arguments
This was skipping GetUnderlyingObject for nonprivate addresses, but an
alloca could also be found through an addrspacecast if it's flat.

llvm-svn: 361649
2019-05-24 16:52:35 +00:00
Alexander Timofeev dffedea014 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
         the correct register classes to the cross block values beforehand. For the divergent targets
         same value type requires different register classes dependent on the value divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

llvm-svn: 361644
2019-05-24 15:32:18 +00:00
Matt Arsenault 5c714cbdd8 AMDGPU: Correct maximum possible private allocation size
We were assuming a much larger possible per-wave visible stack
allocation than is possible:

faa3ae5138/src/core/runtime/amd_gpu_agent.cpp (L70)

Based on this, we can assume the high 15 bits of a frame index or sret
are 0. The frame index value is the per-lane offset, so the maximum
frame index value is MAX_WAVE_SCRATCH / wavesize.

Remove the corresponding subtarget feature and option that made
this configurable.

llvm-svn: 361541
2019-05-23 19:38:14 +00:00
Matt Arsenault 0f3ba44b57 AMDGPU/GlobalISel: Legality for integer min/max
llvm-svn: 361519
2019-05-23 17:58:48 +00:00
Galina Kistanova ed49f6d8e6 Reverted r361134 because of a failing test left unattended for a long time.
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/17792/steps/test-check-all/logs/stdio
Failing Tests (1):
    LLVM :: CodeGen/AMDGPU/regbank-reassign.mir

llvm-svn: 361430
2019-05-22 20:42:56 +00:00
Matt Arsenault 418e23e33c AMDGPU: Move disassembler support check to constructor
Don't check for unsupported targets for every instruction.

llvm-svn: 361406
2019-05-22 16:28:48 +00:00
Matt Arsenault ca64ef2043 MC: Allow getMaxInstLength to depend on the subtarget
Keep it optional in cases this is ever needed in some global
context. Currently it's only used for getting an upper bound inline
asm code size.

For AMDGPU, gfx10 increases the maximum instruction size to
20-bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.

llvm-svn: 361405
2019-05-22 16:28:41 +00:00
Dmitry Preobrazhensky 7773fc478d [AMDGPU][MC] Corrected parsing of op_sel* and neg_* modifiers
See bug 41361: https://bugs.llvm.org/show_bug.cgi?id=41361

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61012

llvm-svn: 361386
2019-05-22 13:59:01 +00:00
Matt Arsenault 2cba91b8db AMDGPU: Assume calls read exec
llvm-svn: 361333
2019-05-21 23:23:16 +00:00
Matt Arsenault dd1ffa00a5 AMDGPU: Assume call pseudos are convergent
There should probably be nonconvergent versions, but my guess is it
doesn't matter in practice.

llvm-svn: 361331
2019-05-21 23:23:10 +00:00
Matt Arsenault 60ba03e210 AMDGPU: Fix not marking new gfx10 SGPRs as CSRs
llvm-svn: 361330
2019-05-21 23:23:05 +00:00
Matt Arsenault 6dd08e335f AMDGPU: Force skip branches over calls
Unfortunately the way SIInsertSkips works is backwards, and is
required for correctness. r338235 added handling of some special cases
where skipping is mandatory to avoid side effects if no lanes are
active. It conservatively handled asm correctly, but the same logic
needs to apply to calls.

Usually the call sequence code is larger than the skip threshold,
although the way the count is computed is really broken, so I'm not
sure if anything was likely to really hit this.

llvm-svn: 361202
2019-05-20 22:04:42 +00:00
Bjorn Pettersson eee0f2330d [AMDGPU] Fix std::array initializers to avoid warnings with older tool chains. NFC
A std::array is implemented as a template with an array
inside a struct. Older versions of clang, like 3.6,
require an extra set of curly braces around std::array
initializations to avoid warnings.

The C++ language was changed regarding this by CWG 1270.
So more modern tool chains does not complaing even if
leaving out one level of braces.

llvm-svn: 361171
2019-05-20 16:41:08 +00:00
Matt Arsenault 5239298b0d R600: Fix unconditional return in loop
llvm-svn: 361167
2019-05-20 16:22:11 +00:00
Fangrui Song 68774edcd6 Use llvm::sort. NFC
llvm-svn: 361134
2019-05-20 10:18:35 +00:00
Carl Ritson 34e95ce259 [AMDGPU] gfx1010 Avoid SMEM WAR hazard for some s_waitcnt values
Summary:
Avoid introducing hazard mitigation when lgkmcnt is reduced to 0.
Clarify code comments to explain assumptions made for this hazard
mitigation.  Expand and correct test cases to cover variants of
s_waitcnt.

Reviewers: nhaehnle, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62058

llvm-svn: 361124
2019-05-20 07:20:12 +00:00
Matt Arsenault 2f29220d6d AMDGPU/GlobalISel: Implement s64->s64 [SU]ITOFP
llvm-svn: 361082
2019-05-17 23:05:18 +00:00
Matt Arsenault 02b5ca8cd1 GlobalISel: Implement lower for S64->S32 [SU]ITOFP
This is ported from the custom AMDGPU DAG implementation. I think this
is a better default expansion than what the DAG currently uses, at
least if the target has CTLZ.

This implements the signed version in terms of the unsigned
conversion, which is implemented with bit operations. SelectionDAG has
several other implementations that should eventually be ported
depending on what instructions are legal.

llvm-svn: 361081
2019-05-17 23:05:13 +00:00
Dmitry Preobrazhensky 198611b0ff [AMDGPU][MC] Corrected parsing of NAME:VALUE modifiers
See bug 41298: https://bugs.llvm.org/show_bug.cgi?id=41298

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61009

llvm-svn: 361045
2019-05-17 16:04:17 +00:00
Dmitry Preobrazhensky 5ae3113969 [AMDGPU][MC] Enabled labels with s_call_b64 and s_cbranch_i_fork
See https://bugs.llvm.org/show_bug.cgi?id=41888

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D62016

llvm-svn: 361040
2019-05-17 14:57:04 +00:00
Dmitry Preobrazhensky 43fcc79837 [AMDGPU][MC] Enabled expressions for most operands which accept integer values
See bug 40873: https://bugs.llvm.org/show_bug.cgi?id=40873

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60768

llvm-svn: 361031
2019-05-17 13:17:48 +00:00
Matt Arsenault 1a02d30c87 AMDGPU: Fix unused variable warnings in release builds
llvm-svn: 361030
2019-05-17 12:59:27 +00:00
Matt Arsenault a510b570c2 AMDGPU/GlobalISel: Legalize G_FCEIL
llvm-svn: 361028
2019-05-17 12:20:05 +00:00
Matt Arsenault 6aebcd5499 AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNC
llvm-svn: 361027
2019-05-17 12:20:01 +00:00