Commit Graph

4660 Commits

Author SHA1 Message Date
Eli Friedman 68b03aee1a Remove SequentialType from the type heirarchy.
Now that we have scalable vectors, there's a distinction that isn't
getting captured in the original SequentialType: some vectors don't have
a known element count, so counting the number of elements doesn't make
sense.

In some cases, there's a better way to express the commonality using
other methods. If we're dealing with GEPs, there's GEP methods; if we're
dealing with a ConstantDataSequential, we can query its element type
directly.

In the relatively few remaining cases, I just decided to write out
the type checks. We're talking about relatively few places, and I think
the abstraction doesn't really carry its weight. (See thread "[RFC]
Refactor class hierarchy of VectorType in the IR" on llvmdev.)

Differential Revision: https://reviews.llvm.org/D75661
2020-04-06 17:03:49 -07:00
Konstantin Pyzhov 72e8754916 [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228
2020-04-06 09:05:58 -04:00
Matt Arsenault 869f05c834 AMDGPU: Remove dead paths for requiresUniformRegister
The extracts from control flow intrinsics are already properly handled
by divergence analysis. The inline asm case isn't dead, but has also
never really worked correctly so leave it as-is for now.
2020-04-06 16:15:10 -04:00
Konstantin Pyzhov 51dc028314 Revert e1730cfeb3 2020-04-06 05:56:11 -04:00
Konstantin Pyzhov e1730cfeb3 [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228
2020-04-06 05:10:37 -04:00
Matt Arsenault 8a5f0dafd4 AMDGPU/GlobalISel: Select llvm.amdgcn.div.scale 2020-04-06 11:50:19 -04:00
Matt Arsenault e87ec66762 AMDGPU/GlobalISel: Fix llvm.amdgcn.div.fmas.ll 2020-04-06 11:50:16 -04:00
Jay Foad ddd2f4b96f [AMDGPU] Fix inaccurate comments 2020-04-06 16:44:08 +01:00
Matt Arsenault cbf719b568 AMDGPU: Use DAG patterns for div_fmas 2020-04-06 09:28:30 -04:00
Matt Arsenault 79b29d6df7 AMDGPU: Remove DisableInst feature
I'm not sure why these were bothering to check the instruction
profile, since those profiles should only be used with these
instruction classes.
2020-04-06 09:27:44 -04:00
Apelete Seketeli 8aadb442d1 [scan-build] fix dead store warnings emitted on LLVM AMDGPU code base
This fixes dead store warnings of the type "dead assignment" reported
by Clang Static Analyzer.
2020-04-05 11:19:03 -04:00
Matt Arsenault 6bfe28e92f AMDGPU: Fix annotate kernel features through casted calls
I thought I was testing this before, but the workitem id x
case isn't great since it's mandatory in the parent kernel.
2020-04-04 20:44:44 -04:00
Matt Arsenault 221890d709 AMDGPU: Add feature for fast f32 denormals 2020-04-04 20:01:24 -04:00
Matt Arsenault 30ebafaa56 CodeGen: Convert some TII hooks to use Register 2020-04-03 14:52:54 -04:00
Matt Arsenault 178050c3ba AMDGPU: Use Register in more places 2020-04-03 14:52:54 -04:00
Matt Arsenault e8dcb6d05e AMDGPU: Remove redundant virtual 2020-04-03 14:52:53 -04:00
Stanislav Mekhanoshin 0462795095 [AMDGPU] Propagate AGPR RC from PHI to its PHI operands
We can fix register class of PHI based on its all AGPR uses.
That leaves behind all PHIs which were already processed
earlier. Propagate RC back to PHI operands of a PHI.

Differential Revision: https://reviews.llvm.org/D77344
2020-04-03 11:23:02 -07:00
Austin Kerbow 30f18ed387 [AMDGPU] Handle SMRD signed offset immediate
Summary:
This fixes a few issues related to SMRD offsets. On gfx9 and gfx10 we have a
signed byte offset immediate, however we can overflow into a negative since we
treat it as unsigned.

Also, the SMRD SOFFSET sgpr is an unsigned offset on all subtargets. We
sometimes tried to use negative values here.

Third, S_BUFFER instructions should never use a signed offset immediate.

Differential Revision: https://reviews.llvm.org/D77082
2020-04-02 17:41:52 -07:00
Matt Arsenault f68cc2a7ed AMDGPU: Use 128-bit DS operations by default 2020-04-02 17:17:47 -04:00
Matt Arsenault 5660bb6bc9 AMDGPU: Remove denormal subtarget features
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
2020-04-02 17:17:12 -04:00
Matt Arsenault 75cf30918f AMDGPU: Assume f32 denormals are enabled by default
This will likely introduce catastrophic performance regressions on
older subtargets, but should be correct. A follow up change will
remove the old fp32-denormals subtarget features, and switch to using
the new denormal-fp-math/denormal-fp-math-f32 attributes. Frontends
should be making sure to add the denormal-fp-math-f32 attribute when
appropriate to avoid performance regressions.
2020-04-02 17:17:12 -04:00
Matt Arsenault c3d3c22a58 AMDGPU: Hack out noinline on functions using LDS globals
This is a workaround for clang adding noinline to all functions at
-O0. Previously, we would just add alwaysinline, and the verifier
would complain about having both noinline and alwaysinline. We
currently can't truly codegen this case as a freestanding function, so
override the user forcing noinline.
2020-04-02 14:12:07 -04:00
Stanislav Mekhanoshin f2334a7ef2 [AMDGPU] Fix crash in SILoadStoreOptimizer
SILoadStoreOptimizer::checkAndPrepareMerge() expects base and
paired instruction to come in order and scans MBB from base to
the paired instruction. An original order can be changed if
there were a dependent instruction in between and base instruction
was moved.

Fixed by bailing the optimization. In theory it might be possible
still to perform a merge by swapping instructions, but on practice
it bails anyway because it finds dependency on that same instruction
which has resulted in the base move.

Differential Revision: https://reviews.llvm.org/D77245
2020-04-02 10:26:47 -07:00
Guillaume Chatelet 189d2e215f [Alignment][NFC] Use more Align versions of various functions
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: MatzeB, qcolombet, arsenm, sdardis, jvesely, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77291
2020-04-02 09:00:53 +00:00
Matt Arsenault 5e4e8d0388 AMDGPU/GlobalISel: Change intrinsic ID for _L to _LZ opt
Still should handle the other case changes the opcode this way.
2020-04-01 13:03:02 -04:00
Guillaume Chatelet 1dffa2550b [Alignment][NFC] Transition to MachineFrameInfo::getObjectAlign()
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77215
2020-04-01 14:08:28 +00:00
Simon Pilgrim be7a233e93 Fix operator precedence warning. NFCI. 2020-04-01 14:36:52 +01:00
Simon Pilgrim 552e46ea1e Fix unused variable warnings. NFCI. 2020-04-01 14:36:51 +01:00
Matt Arsenault 43e576593e AMDGPU/GlobalISel: Fix insert point when lowering G_FMAD 2020-03-31 19:57:06 -04:00
Eli Friedman 1ee6ec2bf3 Remove "mask" operand from shufflevector.
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.

This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.

I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors.  Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it.  Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.

Differential Revision: https://reviews.llvm.org/D72467
2020-03-31 13:08:59 -07:00
Stanislav Mekhanoshin 08682dcc86 [AMDGPU] Define 16 bit VGPR subregs
We have loads preserving low and high 16 bits of their
destinations. However, we always use a whole 32 bit register
for these. The same happens with 16 bit stores, we have to
use full 32 bit register so if high bits are clobbered the
register needs to be copied. One example of such code is
added to the load-hi16.ll.

The proper solution to the problem is to define 16 bit subregs
and use them in the operations which do not read another half
of a VGPR or preserve it if the VGPR is written.

This patch simply defines subregisters and register classes.
At the moment there should be no difference in code generation.
A lot more work is needed to actually use these new register
classes. Therefore, there are no new tests at this time.

Register weight calculation has changed with new subregs so
appropriate changes were made to keep all calculations just
as they are now, especially calculations of register pressure.

Differential Revision: https://reviews.llvm.org/D74873
2020-03-31 11:49:06 -07:00
Guillaume Chatelet c9d5c19597 [Alignment][NFC] Transitionning more getMachineMemOperand call sites
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77121
2020-03-31 08:36:18 +00:00
Sebastian Neubauer 5d3a69feca [AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.

This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.

Use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D65088
2020-03-31 10:35:39 +02:00
Guillaume Chatelet 0de874adfb [Alignment][NFC] Transition to inferAlignFromPtrInfo
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77120
2020-03-31 08:06:49 +00:00
Matt Arsenault d0dd24a381 AMDGPU/GlobalISel: Fix crashing on weird G_INSERT sources
No test since these cases shouldn't really be getting through the
legalizer.
2020-03-30 18:14:04 -04:00
Matt Arsenault db9f0d1ce5 AMDGPU: Form v_cvt_ubyte* with f16 results
We get 2 conversion instructions anyway. Previously we would get a
conversion with SDWA reading from a byte source, which has a larger
encoding.
2020-03-30 17:59:49 -04:00
Matt Arsenault b27d255e1e AMDGPU/GlobalISel: Form CVT_F32_UBYTE0 2020-03-30 17:45:55 -04:00
Matt Arsenault bcb643c8af AMDGPU/GlobalISel: Handle image atomics 2020-03-30 17:41:04 -04:00
Matt Arsenault 48eda37282 AMDGPU/GlobalISel: Start selecting image intrinsics
Does not handled atomics yet.
2020-03-30 17:33:04 -04:00
Matt Arsenault 570a578e46 AMDGPU: Account for dmask when computing image mem size
Only the number of elements in the dmask will really be accessed.
2020-03-30 17:30:58 -04:00
Jay Foad cee65d51fe AMDGPU: Implement getMemcpyLoopLoweringType
Summary: Based on a patch by Matt Arsenault.

Reviewers: rampitec, kerbowa, nhaehnle, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77057
2020-03-30 22:21:01 +01:00
Matt Arsenault 2641ba52a9 AMDGPU/GlobalISel: Round up image operations with 5, 6 or 7 addresses
The instruction definitions are missing for these register types, so
round up to 8 like the DAG.
2020-03-30 17:02:47 -04:00
Matt Arsenault 42d5609809 AMDGPU/GlobalISel: Start handling _L to _LZ optimization
We currently don't have a way to map to the equivalent intrinsic
opcode, so track immediate 0s in place of the address for the
selection to know to change the final opcode.
2020-03-30 17:02:30 -04:00
Matt Arsenault 4919f2e1c5 AMDGPU/GlobalISel: Basic legalize rules for G_FSHR
Only handles easy 32-bit cases.
2020-03-30 11:53:01 -07:00
Jakub Kuderski 77ce2e21a8 [AMDGPU] Add Relocation Constant Support
Summary:
This change adds amdgcn.reloc.constant intrinsic to the amdgpu backend, which will compile into a relocation entry in the resulting elf.

The intrinsics takes a MetadataNode (String) as its only argument, which specifies the symbol name of the relocation entry.

`SelectionDAGBuilder::getValueImpl` is changed to allow metadata operands passed through to ISel.

Author: csyonghe <yonghe@google.com>

Reviewers: tpr, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76440
2020-03-30 13:49:20 -04:00
Sameer Sahasrabuddhe 3cbbded68c Introduce unify-loop-exits pass.
For each natural loop with multiple exit blocks, this pass creates a
new block N such that all exiting blocks now branch to N, and then
control flow is redistributed to all the original exit blocks.

The bulk of the tranformation is a new function introduced in
BasicBlockUtils that an redirect control flow from a set of incoming
blocks to a set of outgoing blocks via a common "hub".

This is a useful workaround for a limitation in the structurizer which
incorrectly orders blocks when processing a nest of loops. This pass
bypasses that issue by ensuring that each natural loop is recognized
as a separate region. Since the structurizer is a region pass, it no
longer sees a nest of loops in a single region, and instead processes
each "level" in the nesting as a separate region.

The AMDGPU backend provides a new option to enable this pass before
the structurizer, which may eventually be enabled by default.

Reviewers: madhur13490, arsenm, nhaehnle

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D75865
2020-03-30 13:23:56 -04:00
Matt Arsenault bb009498c2 AMDGPU/GlobalISel: Hack to fix i24 argument lowering
I still think the call lowering type legalization logic split between
the generic code and target is too confusing, but largely induced by
the reliance on the DAG infrastructure.
2020-03-30 11:00:45 -04:00
Matt Arsenault 90a36bbd7c AMDGPU/GlobalISel: Legalize 64-bit G_UDIV/G_UREM
Mostly ported from the DAG version. This results in much worse code
than the DAG version, largely due to a much worse expansion for
G_UMULH.
2020-03-30 10:57:37 -04:00
Florian Hahn c3b03f3d0c [AMDGPU] Drop const for value that is copied (NFC).
This fixes

    warning: loop variable 'Def' of type 'const llvm::Register' creates a copy from type 'const llvm::Register' [-Wrange-loop-analysis]

llvm::Register just contains a single unsigned and should be copied.

Reviewers: rampitec

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D77011
2020-03-30 10:59:59 +01:00
Matt Arsenault d15723ef06 AMDGPU/GlobalISel: Remove redundant virtual 2020-03-29 14:03:07 -04:00