Commit Graph

2131 Commits

Author SHA1 Message Date
Tim Renouf ed0b9af997 [AMDGPU] Switched HSA metadata to use MsgPackDocument
Summary:
MsgPackDocument is the lighter-weight replacement for MsgPackTypes. This
commit switches AMDGPU HSA metadata processing to use MsgPackDocument
instead of MsgPackTypes.

Differential Revision: https://reviews.llvm.org/D57024

Change-Id: I0751668013abe8c87db01db1170831a76079b3a6
llvm-svn: 356081
2019-03-13 18:55:50 +00:00
Nirav Dave d6351340bb [DAGCombiner] If a TokenFactor would be merged into its user, consider the user later.
Summary:
A number of optimizations are inhibited by single-use TokenFactors not
being merged into the TokenFactor using it. This makes we consider if
we can do the merge immediately.

Most tests changes here are due to the change in visitation causing
minor reorderings and associated reassociation of paired memory
operations.

CodeGen tests with non-reordering changes:

  X86/aligned-variadic.ll -- memory-based add folded into stored leaq
  value.

  X86/constant-combiners.ll -- Optimizes out overlap between stores.

  X86/pr40631_deadstore_elision -- folds constant byte store into
  preceding quad word constant store.

Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet

Reviewed By: courbet

Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59260

llvm-svn: 356068
2019-03-13 17:07:09 +00:00
Matt Arsenault caf1316f71 IR: Add immarg attribute
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.

Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.

This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.

llvm-svn: 355981
2019-03-12 21:02:54 +00:00
Simon Pilgrim a6013c0286 Regenerate sign_extend.ll test.
This will change as part of the fix for the regressions in D58017.

llvm-svn: 355933
2019-03-12 16:00:59 +00:00
Tim Northover 8935aca9c7 CodeGenPrep: preserve inbounds attribute when sinking GEPs.
Targets can potentially emit more efficient code if they know address
computations never overflow. For example ILP32 code on AArch64 (which only has
64-bit address computation) can ignore the possibility of overflow with this
extra information.

llvm-svn: 355926
2019-03-12 15:22:23 +00:00
David Stuttard 20ea21c6ed [AMDGPU] Add support for immediate operand for S_ENDPGM
Summary:
Add support for immediate operand in S_ENDPGM

Change-Id: I0c56a076a10980f719fb2a8f16407e9c301013f6

Reviewers: alexshap

Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, eraman, arphaman, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59213

llvm-svn: 355902
2019-03-12 09:52:58 +00:00
Petar Avramovic 0b17e59b5c [MIPS GlobalISel] NarrowScalar G_MUL
Narrow Scalar G_MUL for MIPS32.
Revisit NarrowScalar implementation in LegalizerHelper.
Introduce new helper function multiplyRegisters.
It performs generic multiplication of values held in multiple registers.
Generated instructions use only types NarrowTy and i1.
Destination can be same or two times size of the source.

Differential Revision: https://reviews.llvm.org/D58824

llvm-svn: 355814
2019-03-11 10:00:17 +00:00
Matt Arsenault e8c03a2511 AMDGPU: Move d16 load matching to preprocess step
When matching half of the build_vector to a load, there could still be
a hidden dependency on the other half of the build_vector the pattern
wouldn't detect. If there was an additional chain dependency on the
other value, a cycle could be introduced.

I don't think a tablegen pattern is capable of matching the necessary
conditions, so move this into PreprocessISelDAG. Check isPredecessorOf
for the other value to avoid a cycle. This has a warning that it's
expensive, so this should probably be moved into an MI pass eventually
that will have more freedom to reorder instructions to help match
this. That is currently complicated by the lack of a computeKnownBits
type mechanism for the selected function.

llvm-svn: 355731
2019-03-08 20:58:11 +00:00
Matt Arsenault 26e76ef0e2 DAG: Don't try to cluster loads with tied inputs
This avoids breaking possible value dependencies when sorting loads by
offset.

AMDGPU has some load instructions that write into the high or low bits
of the destination register, and have a tied input for the other input
bits. These can easily have the same base pointer, but be a swizzle so
the high address load needs to come first. This was inserting glue
forcing the opposite ordering, producing a cycle the InstrEmitter
would assert on. It may be potentially expensive to look for the
dependency between the other loads, so just skip any where this could
happen.

Fixes bug 40936 by reverting r351379, which added a hacky attempt to
fix this by adding chains in this case, which I think was just working
around broken glue before the InstrEmitter. The core of the patch is
re-implementing the fix for that problem.

llvm-svn: 355728
2019-03-08 20:46:15 +00:00
Matt Arsenault 74c9c305e0 AMDGPU: Add more tests for d16 loads
Also fix a few cases that weren't testing what they were supposed to.

llvm-svn: 355724
2019-03-08 20:30:51 +00:00
Matt Arsenault 07f904befb AMDGPU: Correct DS implementation of areLoadsFromSameBasePtr
This was checking the wrong operands for the base register and the
offsets. The indexes are shifted by the number of output registers
from the machine instruction definition, and the chain is moved to the
end.

llvm-svn: 355722
2019-03-08 20:30:50 +00:00
Carl Ritson 1a98dc1840 [AMDGPU] V_CVT_F32_UBYTE{0,1,2,3} are full rate instructions
Summary: Fix a bug in the scheduling model where V_CVT_F32_UBYTE{0,1,2,3} are incorrectly marked as quarter rate instructions.

Reviewers: arsenm, rampitec

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59091

llvm-svn: 355671
2019-03-08 09:03:11 +00:00
Konstantin Zhuravlyov 47f0bf8f1f AMDHSA: Code object v3 updates
- Copy kernel symbol attributes into kernel descriptor attributes
  - Make sure kernel symbol's visibility is not "higher" than protected

Differential Revision: https://reviews.llvm.org/D59057

llvm-svn: 355630
2019-03-07 19:58:29 +00:00
Aakanksha Patil c56d2afc63 AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV)
A previous patch for "uniform-work-group-size" attribute was found to break
some RADV and possibly radeon SI tests and had to be retracted.
This patch fixes that.

Differential Revision: http://reviews.llvm.org/D58993

llvm-svn: 355574
2019-03-07 00:54:04 +00:00
Ryan Taylor 67f36903ae [AMDGPU] Add support for 64 bit buffer atomic artihmetic instructions
Summary:
This adds support for 64 bit buffer atomic arithmetic instructions but does not include
cmpswap as that depends on a fix to the way the register pairs are handled

Change-Id: Ib207ea65fb69487ccad5066ea647ae8ddfe2ce61

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58918

llvm-svn: 355520
2019-03-06 17:02:06 +00:00
Matt Arsenault 870397739e AMDGPU: Preserve undef flag when expanding SI_IF
Fixes undefined value verifier error.

llvm-svn: 355426
2019-03-05 18:38:00 +00:00
Carl Ritson 9e3f7d8ad0 [AMDGPU] Fix DPP operand order in atomic optimizer
Summary:
Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.

Change-Id: I631d050e1c00a3b4bc7c11a90437064403c4cf30

Reviewers: sheredom, tpr

Reviewed By: sheredom

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58900

llvm-svn: 355394
2019-03-05 12:21:44 +00:00
David Stuttard 81eec58a0d [AMDGPU] Omit KILL instructions from hazard recognizer
Summary:
In some cases the KILL was causing a hazard to be introduced as these were
scheduled into hazard slots, but don't result in an instruction.

KILL shouldn't be considered for hazard recognition.

Change-Id: Ib6d2a2160f8c94cd0ce611ab198c7e4f46aeffcf

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58898

llvm-svn: 355384
2019-03-05 10:25:16 +00:00
Xing GUO 33649349c5 [Codegen] fix typos in test case
llvm-svn: 355264
2019-03-02 08:03:59 +00:00
Stanislav Mekhanoshin bb98841399 [AMDGPU] Mark ds instructions as meybeAtomic
These were not recognized as potential atomics by memory legalizer.
The test was working not because legalizer did a right thing, but
because it has skipped all these instructions. When I have fixed
DS desciption test started to fail because region address has
changed from 4 to 2 a while ago.

Differential Revision: https://reviews.llvm.org/D58802

llvm-svn: 355179
2019-03-01 07:59:17 +00:00
Tom Stellard 33634d1b25 AMDGPU/GlobalISel: Implement select for G_INSERT
Re-commit r344310.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53116

llvm-svn: 355159
2019-03-01 00:50:26 +00:00
Tom Stellard 41f32196a0 AMDGPU/GlobalISel: Implement select for G_EXTRACT
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49714

llvm-svn: 355156
2019-02-28 23:37:48 +00:00
Matt Arsenault bf1bf706c8 AMDGPU/GlobalISel: Add regbankselect test for phis
Add baseline for future fixes. These mostly show how this is broken
and producing illegal situations.

llvm-svn: 355057
2019-02-28 00:52:36 +00:00
Matt Arsenault 5d567dc137 AMDGPU: Enable function calls by default
Fixes some crashes on illegal call situations which are unfortunately
still valid IR.

llvm-svn: 355051
2019-02-28 00:40:32 +00:00
Matt Arsenault aa03bcd23c AMDGPU: Fix crashes in invalid call cases
We have to at least tolerate calls to kernels, possibly with a
mismatched calling convention on the callsite.

llvm-svn: 355049
2019-02-28 00:28:44 +00:00
Matt Arsenault d3093c2f1f GlobalISel: Implement fewerElementsVector for phi
llvm-svn: 355048
2019-02-28 00:16:32 +00:00
Matt Arsenault 72bcf15dbf GlobalISel: Implement moreElementsVector for phi
llvm-svn: 355047
2019-02-28 00:01:05 +00:00
Philip Reames 288a95fc8c Seperate volatility and atomicity/ordering in SelectionDAG
At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc..).

This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success.

To make sure this patch itself is NFC, it is build on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll)..

Previously landed

    D57593 Fix a bug in the definition of isUnordered on MachineMemOperand
    D57596 [CodeGen] Be conservative about atomic accesses as for volatile
    D57802 Be conservative about unordered accesses for the moment
    rL353959: [Tests] First batch of cornercase tests for unordered atomics.
    rL353966: [Tests] RMW folding tests w/unordered atomic operations.
    rL353972: [Tests] More unordered atomic lowering tests.
    rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface
    rL354740: [Hexagon, SystemZ] Be super conservative about atomics
    rL354800: [Lanai] Be super conservative about atomics
    rL354845: [ARM] Be super conservative about atomics

Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.)

Differential Revision: https://reviews.llvm.org/D57601

llvm-svn: 355025
2019-02-27 20:20:08 +00:00
Dmitry Preobrazhensky ef92035827 [AMDGPU][MC][GFX8+] Added syntactic sugar for 'vgpr index' operand of instructions s_set_gpr_idx_on and s_set_gpr_idx_mode
See bug 39331: https://bugs.llvm.org/show_bug.cgi?id=39331

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D58288

llvm-svn: 354969
2019-02-27 13:12:12 +00:00
Stanislav Mekhanoshin da1628eb67 [AMDGPU] Fixed hang during DAG combine
SITargetLowering::reassociateScalarOps() does not touch constants
so that DAGCombiner::ReassociateOps() does not revert the combine.
However a global address is not a ConstantSDNode.

Switched to the method used by DAGCombiner::ReassociateOps() itself
to detect constants.

Differential Revision: https://reviews.llvm.org/D58695

llvm-svn: 354926
2019-02-26 20:56:25 +00:00
Simon Pilgrim 2ccc120d19 [AMDGPU] Regenerate bswap/bitreverse tests.
Make codegen changes more obvious in D58017

llvm-svn: 354863
2019-02-26 11:01:08 +00:00
Stanislav Mekhanoshin ab25f1e65b [AMDGPU] Added target to mir test. NFC.
Test was used without -mcpu, although tested instructions
not available on all ASICs.

llvm-svn: 354830
2019-02-25 22:59:55 +00:00
Matt Arsenault 752579736e RegBankSelect: Handle slightly more complex value mappings
Try to use concat_vectors. Also remove unnecessary assert on
pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers
for AMDGPU.

llvm-svn: 354828
2019-02-25 22:24:13 +00:00
Matt Arsenault f4bfe4cd17 AMDGPU/GlobalISel: Fix bit ops for non-power-of-2 sizes
llvm-svn: 354825
2019-02-25 21:32:48 +00:00
Matt Arsenault 82b103998b AMDGPU/GlobalISel: Clamp max implicit_def elements
llvm-svn: 354818
2019-02-25 20:46:06 +00:00
Roman Tereshin 99a6672bba [LowerSwitch][AMDGPU] Do not handle impossible values
This patch adds LazyValueInfo to LowerSwitch to compute the range of the
value being switched over and reduce the size of the tree LowerSwitch
builds to lower a switch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D58096

llvm-svn: 354670
2019-02-22 14:33:46 +00:00
Matt Arsenault aa6fb4c45e AMDGPU: Remove debugger related subtarget features
As far as I know these aren't needed anymore.

llvm-svn: 354634
2019-02-21 23:27:46 +00:00
Mandeep Singh Grang 096fae32b3 [llvm] Fix typo: 's/ ot / to /' [NFC]
llvm-svn: 354614
2019-02-21 20:04:20 +00:00
Matt Arsenault 2e0ee47712 AMDGPU/GlobalISel: Make phis legal
llvm-svn: 354592
2019-02-21 15:48:13 +00:00
Matt Arsenault b10fa8df3f AMDGPU/GlobalISel: Fix bit count ops for non-power-of-2 types
llvm-svn: 354587
2019-02-21 15:22:20 +00:00
Stanislav Mekhanoshin 42e229e130 [AMDGPU] fix commuted case of sub combine
Differential Revision: https://reviews.llvm.org/D58481

llvm-svn: 354543
2019-02-21 02:58:00 +00:00
Tom Stellard 79b5c3842b AMDGPU/GlobalISel: Move SMRD selection logic to TableGen
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52922

llvm-svn: 354516
2019-02-20 21:02:37 +00:00
Matt Arsenault 75e30c4d5d GlobalISel: Fix fewerElementsVector for ctlz with different result type
Also complete the set of related operations.

llvm-svn: 354480
2019-02-20 16:42:52 +00:00
Matt Arsenault c4d07554e4 GlobalISel: Implement moreElementsVector for g_insert results
llvm-svn: 354477
2019-02-20 16:11:22 +00:00
Matt Arsenault b4c95b338b GlobalISel: Implement moreElementsVector for select
llvm-svn: 354354
2019-02-19 17:03:09 +00:00
Matt Arsenault 4d88427a58 GlobalISel: Implement moreElementsVector for G_EXTRACT source
llvm-svn: 354348
2019-02-19 16:44:22 +00:00
Matt Arsenault 26b7e859ef GlobalISel: Implement moreElementsVector for bit ops
llvm-svn: 354345
2019-02-19 16:30:19 +00:00
Changpeng Fang 4cabf6d3b5 AMDGPU: Use MachineInstr::mayAlias to replace areMemAccessesTriviallyDisjoint in LoadStoreOptimizer pass.
Summary:
  This is to fix a memory dependence bug in LoadStoreOptimizer.

Reviewers:
  arsenm, rampitec

Differential Revision:
  https://reviews.llvm.org/D58295

llvm-svn: 354295
2019-02-18 23:00:26 +00:00
Matt Arsenault fbe92a53d0 GlobalISel: Implement widenScalar for g_extract scalar results
llvm-svn: 354293
2019-02-18 22:39:27 +00:00
Matt Arsenault 4673fdc531 Try to organize MachineVerifier tests
The Verifier is separate from the MachineVerifier, so move it to a
different directory. Some other verifier tests were scattered in
target codegen tests as well (although I'm sure I missed some). Work
towards using a more consistent naming scheme to make it clearer where
the gaps still are for generic instructions.

llvm-svn: 354138
2019-02-15 15:24:31 +00:00