Commit Graph

3161 Commits

Author SHA1 Message Date
Matt Arsenault 752579736e RegBankSelect: Handle slightly more complex value mappings
Try to use concat_vectors. Also remove unnecessary assert on
pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers
for AMDGPU.

llvm-svn: 354828
2019-02-25 22:24:13 +00:00
Matt Arsenault f4bfe4cd17 AMDGPU/GlobalISel: Fix bit ops for non-power-of-2 sizes
llvm-svn: 354825
2019-02-25 21:32:48 +00:00
Matt Arsenault 82b103998b AMDGPU/GlobalISel: Clamp max implicit_def elements
llvm-svn: 354818
2019-02-25 20:46:06 +00:00
Matt Arsenault f97ace5639 AMDGPU: Remove IntrReadMem from memtime/memrealtime intrinsics
EarlyCSE with MemorySSA was able to use this to merge multiple calls
with no intervening store.

llvm-svn: 354814
2019-02-25 20:16:11 +00:00
Matt Arsenault fd6fd00773 AMDGPU: Correct definitions for bitset instructions
These really read and write the result register, so these need a tied
input.

llvm-svn: 354809
2019-02-25 19:24:46 +00:00
Konstantin Zhuravlyov 9a278bf6b5 Revert "AMDGPU/NFC: Cleanup subtarget predicates"
It breaks one of our downstream merges, so revert it
temporarily while investigating failures downstream

llvm-svn: 354700
2019-02-22 23:21:06 +00:00
Matt Arsenault 476e26b5d3 AMDGPU: Use removeAllRegUnitsForPhysReg
llvm-svn: 354686
2019-02-22 19:03:36 +00:00
Matt Arsenault aa6fb4c45e AMDGPU: Remove debugger related subtarget features
As far as I know these aren't needed anymore.

llvm-svn: 354634
2019-02-21 23:27:46 +00:00
Konstantin Zhuravlyov c2650178a1 AMDGPU/NFC: Cleanup subtarget predicates
Differential Revision: https://reviews.llvm.org/D58522

llvm-svn: 354620
2019-02-21 20:43:43 +00:00
Mark Searles 599ce44d3f [AMDGPU] remove unused AssemblerPredicates
An internal build is hitting asserts complaining about too many subtarget
features:
  llvm/utils/TableGen/Types.cpp:42:
    const char* llvm::getMinimalTypeForEnumBitfield(uint64_t):
    Assertion `MaxIndex <= 64 && "Too many bits"' failed.

  llvm/utils/TableGen/AsmMatcherEmitter.cpp:1476:
    void {anonymous}::AsmMatcherInfo::buildInfo():
    Assertion `SubtargetFeatures.size() <= 64 && "Too many subtarget features!"'
    failed.

The short-term solution is to remove a few unused AssemblerPredicates to get
under the limit.

The long-term solution seems to be to revisit these asserts. E.g., rather than
hardcoded '64', use the standard sized std::bitset like the other places that
track subtarget features.

Differential Revision: https://reviews.llvm.org/D58516

llvm-svn: 354604
2019-02-21 18:19:54 +00:00
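
A minimal sketch of the long-term idea mentioned in the commit above, assuming a hypothetical feature count of 70: a single 64-bit mask can only index features 0..63, while a `std::bitset` sized to the feature count has no such ceiling. The names `kNumFeatures`, `fitsInUint64Mask`, and `makeFeatureSet` are illustrative only and do not come from TableGen.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical feature count, chosen only because it exceeds 64.
constexpr std::size_t kNumFeatures = 70;

// A plain uint64_t mask has one bit per feature, so it tops out at 64.
constexpr bool fitsInUint64Mask(std::size_t NumFeatures) {
  return NumFeatures <= 64;
}

// A bitset sized to the feature count can hold any index below kNumFeatures.
using FeatureSet = std::bitset<kNumFeatures>;

// Build a feature set from a list of feature indices.
// std::bitset::set throws std::out_of_range for an index >= kNumFeatures.
inline FeatureSet makeFeatureSet(std::initializer_list<std::size_t> Indices) {
  FeatureSet Set;
  for (std::size_t Index : Indices)
    Set.set(Index);
  return Set;
}
```

With 70 features, `fitsInUint64Mask(kNumFeatures)` is false, yet `makeFeatureSet({0, 69})` still records both bits.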
Matt Arsenault 2e0ee47712 AMDGPU/GlobalISel: Make phis legal
llvm-svn: 354592
2019-02-21 15:48:13 +00:00
Matt Arsenault b10fa8df3f AMDGPU/GlobalISel: Fix bit count ops for non-power-of-2 types
llvm-svn: 354587
2019-02-21 15:22:20 +00:00
Stanislav Mekhanoshin 42e229e130 [AMDGPU] fix commuted case of sub combine
Differential Revision: https://reviews.llvm.org/D58481

llvm-svn: 354543
2019-02-21 02:58:00 +00:00
Tom Stellard 79b5c3842b AMDGPU/GlobalISel: Move SMRD selection logic to TableGen
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52922

llvm-svn: 354516
2019-02-20 21:02:37 +00:00
Matt Arsenault 75e30c4d5d GlobalISel: Fix fewerElementsVector for ctlz with different result type
Also complete the set of related operations.

llvm-svn: 354480
2019-02-20 16:42:52 +00:00
Matt Arsenault c4d07554e4 GlobalISel: Implement moreElementsVector for g_insert results
llvm-svn: 354477
2019-02-20 16:11:22 +00:00
Matt Arsenault b4c95b338b GlobalISel: Implement moreElementsVector for select
llvm-svn: 354354
2019-02-19 17:03:09 +00:00
Matt Arsenault 4d88427a58 GlobalISel: Implement moreElementsVector for G_EXTRACT source
llvm-svn: 354348
2019-02-19 16:44:22 +00:00
Matt Arsenault 26b7e859ef GlobalISel: Implement moreElementsVector for bit ops
llvm-svn: 354345
2019-02-19 16:30:19 +00:00
Changpeng Fang 4cabf6d3b5 AMDGPU: Use MachineInstr::mayAlias to replace areMemAccessesTriviallyDisjoint in LoadStoreOptimizer pass.
Summary:
  This is to fix a memory dependence bug in LoadStoreOptimizer.

Reviewers:
  arsenm, rampitec

Differential Revision:
  https://reviews.llvm.org/D58295

llvm-svn: 354295
2019-02-18 23:00:26 +00:00
Matt Arsenault fbe92a53d0 GlobalISel: Implement widenScalar for g_extract scalar results
llvm-svn: 354293
2019-02-18 22:39:27 +00:00
Konstantin Zhuravlyov 1e126c503b AMDGPU: Set ABI version to 1 for code object v3
Differential Revision: https://reviews.llvm.org/D57811

llvm-svn: 354085
2019-02-14 23:56:04 +00:00
Matt Arsenault 530d05e94a GlobalISel: Add alignment to LegalityQuery MMOs
This allows targets to specify the minimum alignment required for the
load/store.

llvm-svn: 354071
2019-02-14 22:41:09 +00:00
Matt Arsenault 9e5e868d95 AMDGPU/GlobalISel: Fix RegBankSelect for GEP.
This is basically a pointer typed add, so shouldn't be any different.
This was assuming everything was an SGPR, which is not true.

Also clean up legality for GEP. I don't seem to be seeing the problem
that the comment on the hack marking s64 as a legal pointer type mentions.

llvm-svn: 354067
2019-02-14 22:24:28 +00:00
Stanislav Mekhanoshin 871821f786 [AMDGPU] Reassociate 'add (add x, y), z' to use SALU
Reassociate adds to collect scalar operands in a single
instruction when possible. That will result in a scalar
add followed by a vector add instead of two vector adds, thus
better utilizing SALU.

Differential Revision: https://reviews.llvm.org/D58220

llvm-svn: 354066
2019-02-14 22:11:25 +00:00
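
The effect described in the commit above can be illustrated with a toy cost model. This is a sketch under a simplifying assumption, not LLVM code: an add is counted as scalar only when both of its operands are uniform (scalar) values, and as vector otherwise. All names here (`AddCost`, `costOriginal`, `costReassociated`, `addOriginal`, `addReassociated`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of scalar and vector adds needed to evaluate an expression.
struct AddCost {
  int ScalarAdds;
  int VectorAdds;
};

// Cost of (x + y) + z: the inner add is scalar only if x and y are both
// uniform, and the outer add is scalar only if all three are uniform.
constexpr AddCost costOriginal(bool X, bool Y, bool Z) {
  return AddCost{((X && Y) ? 1 : 0) + ((X && Y && Z) ? 1 : 0),
                 ((X && Y) ? 0 : 1) + ((X && Y && Z) ? 0 : 1)};
}

// Cost of x + (y + z): the inner add now depends only on y and z.
constexpr AddCost costReassociated(bool X, bool Y, bool Z) {
  return AddCost{((Y && Z) ? 1 : 0) + ((X && Y && Z) ? 1 : 0),
                 ((Y && Z) ? 0 : 1) + ((X && Y && Z) ? 0 : 1)};
}

// Both orders compute the same value under wrapping 32-bit arithmetic.
constexpr std::uint32_t addOriginal(std::uint32_t X, std::uint32_t Y,
                                    std::uint32_t Z) {
  return (X + Y) + Z;
}
constexpr std::uint32_t addReassociated(std::uint32_t X, std::uint32_t Y,
                                        std::uint32_t Z) {
  return X + (Y + Z);
}
```

With a non-uniform `x` and uniform `y` and `z`, the original order costs two vector adds, while the reassociated order costs one scalar add and one vector add.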
Matt Arsenault d3d496338e AMDGPU/GlobalISel: Handle split for 64-bit VALU select
llvm-svn: 354065
2019-02-14 21:58:12 +00:00
Matt Arsenault 4cd9509e1d AMDGPU: Try to use function specific ST
Subtargets are a function level property, so ideally we would
eliminate everywhere that needs to check the global one. Rename the
function to try avoiding confusion.

llvm-svn: 353900
2019-02-12 23:44:13 +00:00
Matt Arsenault d24296e282 AMDGPU: Ignore CodeObjectV3 when inlining
This was inhibiting inlining of library functions when clang was
invoking the inliner directly. This is covering a bit of a mess with
subtarget feature handling, and this shouldn't be a subtarget
feature. The behavior is different depending on whether you are using
a -mattr flag in clang, or llc, opt.

llvm-svn: 353899
2019-02-12 23:30:11 +00:00
Konstantin Zhuravlyov 6220d62e5c AMDGPU/NFC: Remove SubtargetFeatureISAVersion since it is not used anywhere
llvm-svn: 353892
2019-02-12 22:49:49 +00:00
Konstantin Zhuravlyov acb231c8d8 AMDGPU: Remove duplicate processor (gfx900)
llvm-svn: 353889
2019-02-12 22:29:25 +00:00
Matt Arsenault 00ccd13c73 AMDGPU/GlobalISel: Only make f16 constants legal on f16 targets
We could deal with it, but there's no real point.

llvm-svn: 353845
2019-02-12 14:54:55 +00:00
Matt Arsenault 18ec382698 GlobalISel: Implement moreElementsVector for implicit_def
llvm-svn: 353754
2019-02-11 22:00:39 +00:00
Matt Arsenault 9dba67f431 GlobalISel: Add G_FCANONICALIZE instruction
llvm-svn: 353719
2019-02-11 17:05:20 +00:00
Benjamin Kramer 582c16013d [AMDGPU] Remove unused variable
llvm-svn: 353704
2019-02-11 14:49:54 +00:00
Neil Henning 8c10fa1a90 [AMDGPU] Fix DPP sequence in atomic optimizer.
This commit fixes the DPP sequence in the atomic optimizer (which was
previously missing the row_shr:3 step), and works around a read_register
exec bug by using a ballot instead.

Differential Revision: https://reviews.llvm.org/D57737

llvm-svn: 353703
2019-02-11 14:44:14 +00:00
Valery Pykhtin ded96df01e [AMDGPU] Enable DPP combiner pass by default.
Related revisions: https://reviews.llvm.org/D55444, https://reviews.llvm.org/D55314

llvm-svn: 353691
2019-02-11 11:15:03 +00:00
Stanislav Mekhanoshin 0e858b028d [AMDGPU] Split dot-insts feature
Differential Revision: https://reviews.llvm.org/D57971

llvm-svn: 353587
2019-02-09 00:34:21 +00:00
Craig Topper 784929d045 Implementation of asm-goto support in LLVM
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563
2019-02-08 20:48:56 +00:00
Matt Arsenault 564f0f832c AMDGPU: Eliminate GPU specific SubtargetFeatures
Inline compatibility is determined from the individual feature
bits. These are just sets of the separate features, but will always be
treated as incompatible unless they are specifically ignored.

Defining the ISA version number here in tablegen would be nice, but it
turns out this wasn't actually used.

llvm-svn: 353558
2019-02-08 19:59:32 +00:00
Matt Arsenault d7047276ec AMDGPU: Remove GCN features and predicates
These are no longer necessary since the R600 tablegen files are split
out now.

llvm-svn: 353548
2019-02-08 19:18:01 +00:00
Carl Ritson 494b8ac95a [AMDGPU] Fix CS scratch setup on pre-GCN3 ASICs
Summary:
Prior to GCN3 s_load_dword offsets are in dwords rather than bytes.
Thus the scratch buffer descriptor offset must be adjusted for pre-GCN3 ASICs.

Reviewers: nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: sheredom, arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D56496

llvm-svn: 353530
2019-02-08 15:41:11 +00:00
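
As a rough illustration of the unit difference described in the commit above: if an offset is known in bytes, a target whose load offsets are counted in dwords needs that value divided by 4. The helper name `encodeLoadOffset` and its boolean parameter are hypothetical, and the sketch assumes the byte offset is 4-byte aligned; it is not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Convert a 4-byte-aligned byte offset into the unit the target expects.
// OffsetsInDwords == true models the pre-GCN3 behaviour described above;
// false models targets whose offsets are already in bytes.
constexpr std::uint32_t encodeLoadOffset(std::uint32_t ByteOffset,
                                         bool OffsetsInDwords) {
  return OffsetsInDwords ? ByteOffset / 4 : ByteOffset;
}
```

For a byte offset of 16, the dword-based encoding is 4 and the byte-based encoding stays 16.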
Matt Arsenault b0a227049f AMDGPU/GlobalISel: Fix shift legalization for non-power-of-2
clampScalar doesn't do anything for non-power-of-2 in range.
There should probably be a combination rule to reduce the number
of matching rules.

llvm-svn: 353526
2019-02-08 15:06:24 +00:00
Dmitry Preobrazhensky 942c273d64 [AMDGPU][MC] Added support of lds_direct operand
See bug 39293: https://bugs.llvm.org/show_bug.cgi?id=39293

Reviewers: artem.tamazov, rampitec

Differential Revision: https://reviews.llvm.org/D57889

llvm-svn: 353524
2019-02-08 14:57:37 +00:00
Matt Arsenault 0f2debb1c2 AMDGPU/GlobalISel: Fix non-power-of-2 implicit_def
llvm-svn: 353522
2019-02-08 14:46:27 +00:00
Matt Arsenault dc88a2ce35 AMDGPU/GlobalISel: Don't use a copy in addrspacecast lowering
llvm-svn: 353516
2019-02-08 14:16:11 +00:00
Dmitry Preobrazhensky 62a0318dff [AMDGPU][MC][CODEOBJECT] Added predefined symbols to access GPU minor and stepping numbers
Added the following Code Object v3 symbols:
    .amdgcn.gfx_generation_minor
    .amdgcn.gfx_generation_stepping

Reviewers: artem.tamazov, kzhuravl

Differential Revision: https://reviews.llvm.org/D57826

llvm-svn: 353515
2019-02-08 13:51:31 +00:00
Valery Pykhtin 7fe97f8c7c [AMDGPU] Fix DPP combiner
Differential revision: https://reviews.llvm.org/D55444

dpp move with uses and old reg initializer should be in the same BB.
bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Otherwise the old register value is checked for identity.
Added add, subrev, and, or instructions to the old folding function.
Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.

The pass is still disabled by default.

llvm-svn: 353513
2019-02-08 11:59:48 +00:00
Matt Arsenault a8b4339c2f AMDGPU/GlobalISel: Legalize addrspacecast
Use a placeholder constant for now on targets
that need the load from the queue ptr.

llvm-svn: 353497
2019-02-08 02:40:47 +00:00
Matt Arsenault fbec8fe93b GlobalISel: Implement narrowScalar for shift main type
This is pretty much directly ported from SelectionDAG. Doesn't include
the shift by non-constant but known bits version, since there isn't a
globalisel version of computeKnownBits yet.

This shows a disadvantage of targets not specifying which type
should be used for the shift amount. If type 0 is legalized before
type 1, the operations on the shift amount type use the wider type
(which are also less likely to legalize). This can be avoided by
targets specifying legalization actions on type 1 earlier than for
type 0.

llvm-svn: 353455
2019-02-07 19:37:44 +00:00
Matt Arsenault d914189a2e AMDGPU/GlobalISel: Restrict g_implicit_def legality
llvm-svn: 353452
2019-02-07 19:10:15 +00:00