3313 Commits

Author SHA1 Message Date
Amara Emerson 946b1246d6 [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only.
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.

This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.
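The constant-only CSE idea above can be sketched as a toy model. This is a minimal illustration in Python, not the actual CSEMIRBuilder: the class name, the string-based instruction format, and the (type, value) cache key are all assumptions made for the example, and the real builder also has to respect insertion points and dominance.

```python
class ConstantCSEBuilder:
    """Toy stand-in for a MIR builder that CSEs G_CONSTANT only (hypothetical)."""

    def __init__(self):
        self.next_vreg = 0
        self.constants = {}      # (type, value) -> vreg holding that constant
        self.instructions = []   # emitted instructions, as plain strings

    def _new_vreg(self):
        vreg = f"%{self.next_vreg}"
        self.next_vreg += 1
        return vreg

    def build_constant(self, ty, value):
        # Reuse the vreg of an existing G_CONSTANT instead of emitting a
        # second constant (or a COPY from the first one).
        key = (ty, value)
        if key in self.constants:
            return self.constants[key]
        vreg = self._new_vreg()
        self.instructions.append(f"{vreg}:_({ty}) = G_CONSTANT {ty} {value}")
        self.constants[key] = vreg
        return vreg

    def build_add(self, ty, lhs, rhs):
        # Non-constant opcodes are deliberately never CSE'd in this model.
        vreg = self._new_vreg()
        self.instructions.append(f"{vreg}:_({ty}) = G_ADD {lhs}, {rhs}")
        return vreg
```

Requesting the same constant twice returns the same vreg and emits one instruction, while two identical adds still produce two separate instructions.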

I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.

Compile time:
Program                                        base   cse    diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test     9.04   9.12  0.8%
test-suite...Mark/mafft/pairlocalalign.test     2.68   2.66 -0.7%
test-suite...-typeset/consumer-typeset.test     5.53   5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test         5.30   5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test        25.82  25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test     6.92   6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test    34.24  34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test           6.25   6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test     1.66   1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test         13.61  13.60 -0.0%
Geomean difference                                          -0.2%

Code size:
Program                                        base     cse      diff
test-suite...-typeset/consumer-typeset.test    1315632  1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test    1313892  1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test        1439504  1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test    2936980  2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test        3478276  3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test    8082868  8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test         3870380  3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test          1434904  1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test    764528   764528   0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test    782092   782092   0.0%
Geomean difference                                              -0.9%
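As a cross-check of the code size table, the summary row can be recomputed from the per-benchmark numbers. This sketch assumes that "Geomean difference" means the geometric mean of the per-benchmark cse/base ratios, minus one; that definition is an assumption about the comparison tooling, not something stated in the commit message.

```python
import math

# (base, cse) code sizes copied from the table above.
CODE_SIZE = [
    (1315632, 1266480),  # consumer-typeset
    (1313892, 1297508),  # ClamAV
    (1439504, 1423112),  # lencod
    (2936980, 2904172),  # 7zip
    (3478276, 3445460),  # Bullet
    (8082868, 8033492),  # tramp3d-v4
    (3870380, 3853972),  # kimwitu++
    (1434904, 1434896),  # SPASS
    (764528, 764528),    # mafft
    (782092, 782092),    # sqlite3
]


def geomean_diff_percent(pairs):
    """Geometric mean of cse/base ratios, expressed as a percent change."""
    log_mean = sum(math.log(cse / base) for base, cse in pairs) / len(pairs)
    return (math.exp(log_mean) - 1.0) * 100.0
```

Under that assumption, `geomean_diff_percent(CODE_SIZE)` rounds to the -0.9% reported in the table.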

Differential Revision: https://reviews.llvm.org/D60580

llvm-svn: 358369
2019-04-15 05:04:20 +00:00
Amara Emerson d189680baa [GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs.
Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE
configs that can be passed between TargetPassConfig and the targets' custom
pass configs. This CSEConfigBase allows targets to create custom CSE configs
which are then used by the GISel passes for the CSEMIRBuilder.

This support will be used in a follow up commit to allow constant-only CSE for
-O0 compiles in D60580.

llvm-svn: 358368
2019-04-15 04:53:46 +00:00
Nick Desaulniers 5277b3ff25 [AsmPrinter] refactor to remove AsmVariant. NFC
Summary:
The InlineAsm::AsmDialect is only required for X86; no other architecture
makes use of it, and as such it gets passed around between arch-specific
and general code while being unused for all architectures but X86.

Since the AsmDialect is queried from a MachineInstr, which we also pass
around, remove the additional AsmDialect parameter and query for it deep
in the X86AsmPrinter only when needed/as late as possible.

This refactor should help later planned refactors to AsmPrinter, as this
difference in the X86AsmPrinter makes it harder to make AsmPrinter more
generic.

Reviewers: craig.topper

Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60488

llvm-svn: 358101
2019-04-10 16:38:43 +00:00
Tom Stellard 206b9927f8 AMDGPU/GlobalISel: Implement call lowering for shaders returning values
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, llvm-commits

Differential Revision: https://reviews.llvm.org/D57166

llvm-svn: 357964
2019-04-09 02:26:03 +00:00
Nikita Popov 3db93ac5d6 Reapply [ValueTracking] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.

Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 357870
2019-04-07 17:22:16 +00:00
Stanislav Mekhanoshin 5182302a37 [AMDGPU] Sort out and rename multiple CI/VI predicates
Differential Revision: https://reviews.llvm.org/D60346

llvm-svn: 357835
2019-04-06 09:20:48 +00:00
Stanislav Mekhanoshin c8f78f8dd3 [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs
Detect dead lanes can create some dead defs. Then RenameIndependentSubregs
will break a REG_SEQUENCE which may use these dead defs. At this point
a dead instruction can be removed but we do not run a DCE anymore.

MachineDCE was only running before live variable analysis. The patch
adds a means to preserve LiveIntervals and SlotIndexes in case it works
past this.

Differential Revision: https://reviews.llvm.org/D59626

llvm-svn: 357805
2019-04-05 20:11:32 +00:00
Stanislav Mekhanoshin 7895c03232 [AMDGPU] predicate and feature refactoring
We have done some predicate and feature refactoring lately but
did not upstream it. This is to sync.

Differential revision: https://reviews.llvm.org/D60292

llvm-svn: 357791
2019-04-05 18:24:34 +00:00
Matt Arsenault 4ed6ccab9b AMDGPU/GlobalISel: Fix non-power-of-2 select
llvm-svn: 357762
2019-04-05 14:03:04 +00:00
Matt Arsenault 396653f8a1 AMDGPU: Split block for si_end_cf
Relying on no spill or other code being inserted before this was
precarious. It relied on code diligently checking isBasicBlockPrologue
which is likely to be forgotten.

Ideally this could be done earlier, but this doesn't work because of
phis. Any other instruction can't be placed before them, so we have to
accept the position being incorrect during SSA.

This avoids regressions in the fast register allocator rewrite from
inverting the direction.

llvm-svn: 357634
2019-04-03 20:53:20 +00:00
Matt Arsenault f426ddbfc7 AMDGPU: Assume ECC is enabled by default if supported
The test should really be checking for the property directly in the
code object headers, but there are problems with this. I don't see
this directly represented in the text form, and for the binary
emission this is depending on a function level subtarget feature to
emit a global flag.

llvm-svn: 357558
2019-04-03 01:58:57 +00:00
Matt Arsenault 807bedab2e AMDGPU: Remove unnecessary subtarget get
llvm-svn: 357542
2019-04-03 00:01:05 +00:00
Matt Arsenault 45c165b917 AMDGPU: Fix names for generation features
We should overall stop using these, but the uppercase name didn't
work. Any feature string is converted to lowercase, so these
could never be found in the table.
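The failure mode described above can be shown with a small lookup sketch. The feature names and the helper below are hypothetical; the only behaviour taken from the commit message is that the feature string is lowercased before the table lookup.

```python
def has_feature(table, feature_string):
    """Look up a feature after lowercasing it, as the commit message describes."""
    return feature_string.lower() in table


# Hypothetical table entries, for illustration only.
OLD_TABLE = {"SOUTHERN_ISLANDS"}   # uppercase entry: unreachable after lowercasing
NEW_TABLE = {"southern-islands"}   # lowercase entry: reachable
```

With the uppercase entry, even an exact-match request misses, because the key is lowercased first and no lowercase entry exists.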

llvm-svn: 357541
2019-04-03 00:01:03 +00:00
Sander de Smalen 7f23e0a62f Enforce StackID definition in PEI
There are various places in LLVM where the definition of StackID is not
properly honoured, for example in PEI where objects with a StackID > 0 are
allocated on the default stack (StackID 0). This patch enforces that PEI
only considers allocating objects to StackID 0.
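A minimal sketch of the rule this patch enforces, assuming a simplified frame layout in which objects are packed back to back. The StackObject type and the helper name are made up for the example and do not mirror the MachineFrameInfo API.

```python
from dataclasses import dataclass


@dataclass
class StackObject:
    name: str
    size: int
    stack_id: int = 0   # 0 is the default stack


def assign_default_stack_offsets(objects):
    """Assign offsets on the default stack, skipping objects with StackID != 0."""
    offsets = {}
    offset = 0
    for obj in objects:
        if obj.stack_id != 0:
            continue   # owned by another stack; not allocated here
        offsets[obj.name] = offset
        offset += obj.size
    return offsets
```

Objects with a non-zero StackID get no offset from this allocator, so whatever owns that stack has to place them.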

Reviewers: arsenm, thegameg, MatzeB

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D60062

llvm-svn: 357460
2019-04-02 09:46:52 +00:00
Neil Henning 0a30f33ce2 [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.
This change incorporates an effort by Connor Abbot to change how we deal
with WWM operations potentially trashing valid values in inactive lanes.

Previously, the SIFixWWMLiveness pass would work out which registers
were being trashed within WWM regions, and ensure that the register
allocator did not have any values it was depending on resident in those
registers if the WWM section would trash them. This worked perfectly
well, but would cause sometimes severe register pressure when the WWM
section resided before divergent control flow (or at least that is where
I mostly observed it).

This fix instead runs through the WWM sections and pre allocates some
registers for WWM. It then reserves these registers so that the register
allocator cannot use them. This results in a significant register
saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just
this change!).

Differential Revision: https://reviews.llvm.org/D59295

llvm-svn: 357400
2019-04-01 15:19:52 +00:00
Liang Zou 9f4a4d3974 fix typo: "\t" => " "
Reviewers: llvm.org, Jim

Reviewed By: Jim

Subscribers: arsenm, jvesely, nhaehnle, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59983

llvm-svn: 357365
2019-03-31 14:49:00 +00:00
Matt Arsenault 055e4dce45 AMDGPU: Remove dx10-clamp from subtarget features
Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and introduce
a new amdgpu-dx10-clamp attribute to override this if desired.

Also introduce a new amdgpu-ieee attribute to match.

The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
defining the attribute in the generic Attributes.td.

Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.

llvm-svn: 357302
2019-03-29 19:14:54 +00:00
Dmitry Preobrazhensky d6827ce3a3 [AMDGPU][MC] Corrected conversion rules for inlinable constants to match rules for literals
See bug 40806: https://bugs.llvm.org/show_bug.cgi?id=40806

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D59786

llvm-svn: 357262
2019-03-29 14:50:20 +00:00
Dmitry Preobrazhensky 7f33574be3 [AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes
See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D59878

llvm-svn: 357249
2019-03-29 12:16:04 +00:00
Konstantin Zhuravlyov 2b766ed774 AMDGPU: Make sram-ecc off by default for Vega20
Differential Revision: https://reviews.llvm.org/D59718

llvm-svn: 357247
2019-03-29 12:04:18 +00:00
Clement Courbet b70355f0b4 [ScheduleDAG] Move `Topo` and `addEdge` to base class.
Some DAG mutations can only be applied to `ScheduleDAGMI`, and have to
internally cast a `ScheduleDAGInstrs` to `ScheduleDAGMI`.

There is nothing actually specific to `ScheduleDAGMI` in `Topo`.

llvm-svn: 357239
2019-03-29 08:33:05 +00:00
Matt Arsenault 5fddf09187 AMDGPU/GlobalISel: Insert waterfall loop for vector indexing
The register index can only really be an SGPR. Lie that a VGPR index
is legal, and then rewrite the instruction in a waterfall loop to
handle the index.
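The general waterfall-loop idea can be modelled in a few lines. This is a sketch of the technique under stated assumptions, not the code this commit generates: lanes are modelled as list positions, and picking the lowest remaining lane stands in for reading the first active lane's index.

```python
def waterfall_extract(vectors, indices):
    """Per-lane vector indexing where the index must be uniform per iteration.

    vectors[lane] is that lane's vector and indices[lane] its (possibly
    divergent) index. Returns the per-lane results and the iteration count.
    """
    results = [None] * len(indices)
    remaining = set(range(len(indices)))
    iterations = 0
    while remaining:
        # Take one lane's index and treat it as the uniform (scalar) index.
        uniform_index = indices[min(remaining)]
        # Handle every remaining lane that happens to share that index.
        matching = {lane for lane in remaining if indices[lane] == uniform_index}
        for lane in matching:
            results[lane] = vectors[lane][uniform_index]
        remaining -= matching
        iterations += 1
    return results, iterations
```

The loop runs once per distinct index value, so a fully uniform index costs one iteration and a fully divergent one costs one iteration per lane.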

llvm-svn: 357235
2019-03-29 03:54:56 +00:00
Reid Kleckner 85e2cdac73 Delay initialization of three static global maps, NFC
This avoids allocating a few KB of heap memory on startup, and instead
allocates these maps lazily. I noticed this while profiling LLD.
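The lazy-allocation pattern can be illustrated as follows. The map contents and names are invented for the example; the actual change concerns C++ static globals, which this Python sketch only approximates.

```python
import functools

BUILD_LOG = []   # records when the map is actually constructed


@functools.lru_cache(maxsize=None)
def opcode_map():
    """Build the map on first use instead of at program startup."""
    BUILD_LOG.append("opcode_map")
    return {"add": 1, "sub": 2}
```

Nothing is allocated until the first `opcode_map()` call, and later calls return the same cached object.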

llvm-svn: 357192
2019-03-28 17:33:41 +00:00
Matt Arsenault a353fd572a AMDGPU: Make exec mask optimizations more resistant to block splits
Also improve the check for SALU instructions to also ignore
implicit_def and other fake instructions.

llvm-svn: 357170
2019-03-28 14:01:39 +00:00
Justin Bogner b1650f0da9 [LegalizeVectorTypes] Allow single loads and stores for more short vectors
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal
or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable.
(See https://reviews.llvm.org/rL236528 for reference.)

This applies that behaviour to vector types. If the vector type is
TypePromoteInteger, the element type is going to be TypePromoteInteger
as well, which will lead to a single promoting load rather than N
individual promoting loads. For instance, if we have a v3i1, we would
now have a load of v4i1 instead of 3 loads of i1.

Patch by Guillaume Marques. Thanks!

Differential Revision: https://reviews.llvm.org/D56201

llvm-svn: 357120
2019-03-27 20:35:56 +00:00
Matt Arsenault 7b14b2425d Reapply "AMDGPU: Scavenge register instead of findUnusedReg"
This reapplies r356149, using the correct overload of findUnusedReg
which passes the current iterator.

This worked most of the time, because the scavenger iterator was moved
at the end of the frame index loop in PEI. This would fail if the
spill was the first instruction. This was further hidden by the fact
that the scavenger wasn't passed in for normal frame index
elimination.

llvm-svn: 357098
2019-03-27 17:31:29 +00:00
Matt Arsenault 17e39100a2 AMDGPU: Enable the scavenger for large frames
Another test is needed for the case where the scavenge fails, but
there's another issue with that which needs an additional fix.

llvm-svn: 357093
2019-03-27 17:14:32 +00:00
Matt Arsenault 4d47ac3b30 AMDGPU: Add additional MIR tests for exec mask optimizations
Also includes one example of how this transform is unsound. This isn't
verifying the copies are used in the control flow intrinsic patterns.

Also add option to disable exec mask opt pass. Since this pass is
unsound, it may be useful to turn it off until it is fixed.

llvm-svn: 357091
2019-03-27 16:58:30 +00:00
Matt Arsenault 4ab28b64b4 AMDGPU: Skip debug_instr when collapsing end_cf
Based on how these are inserted, I doubt this was causing a problem in
practice.

llvm-svn: 357090
2019-03-27 16:58:27 +00:00
Matt Arsenault a42b7247d3 AMDGPU: Fix missing scc implicit def on s_andn2_b64_term
Introduce new helper class to copy properties directly from the base
instruction.

llvm-svn: 357089
2019-03-27 16:58:22 +00:00
Matt Arsenault 28f97f1dbc AMDGPU: Don't hardcode num defs for MUBUF instructions
This shouldn't change anything since the no-ret atomics are selected
later.

llvm-svn: 357084
2019-03-27 16:12:29 +00:00
Matt Arsenault e9ad7e9a71 AMDGPU: wave_barrier is not isBarrier
This is not a control flow instruction, so should not be marked as
isBarrier. This fixes a verifier error if followed by unreachable.

llvm-svn: 357081
2019-03-27 15:54:45 +00:00
Matt Arsenault bbc59d8d0d AMDGPU: Fix areLoadsFromSameBasePtr for DS atomics
The offset operand index is different for atomics.

llvm-svn: 357073
2019-03-27 15:41:00 +00:00
Dmitry Preobrazhensky 40f0162a9a Revert of 357063 [AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes
Reason: the change was mistakenly committed before review
llvm-svn: 357066
2019-03-27 13:49:52 +00:00
Dmitry Preobrazhensky bcc4d53835 [AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes
See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D59305

llvm-svn: 357063
2019-03-27 13:07:41 +00:00
Matt Arsenault 8bbc159786 Revert "AMDGPU: Scavenge register instead of findUnusedReg"
This reverts r356149.

This is crashing on rocBLAS.

llvm-svn: 356958
2019-03-25 21:41:40 +00:00
Matt Arsenault 77bf2e3704 AMDGPU: Remove unnecessary check for isFullCopy
Subregister indexes are not used for physical register operands, so
isFullCopy is implied by the physical register check.

llvm-svn: 356956
2019-03-25 21:28:53 +00:00
Matt Arsenault bc978872de AMDGPU: Set hasSideEffects 0 on _term instructions
These were defaulting to true, but they are just wrappers around bit
operations. This avoids regressions in the exec mask optimization
passes in a future commit.

llvm-svn: 356952
2019-03-25 21:10:12 +00:00
Konstantin Zhuravlyov 51809cbc98 AMDGPU: Add support for cross address space synchronization scopes
Differential Revision: https://reviews.llvm.org/D59517

llvm-svn: 356946
2019-03-25 20:50:21 +00:00
Matt Arsenault fa28455116 AMDGPU: Preserve LiveIntervals in WQM
This seems to already be done, but wasn't marked.

llvm-svn: 356922
2019-03-25 16:47:42 +00:00
Alina Sbirlea bfc779e491 [AliasAnalysis] Second prototype to cache BasicAA / anyAA state.
Summary:
Adding contained caching to AliasAnalysis. BasicAA is currently the only one using it.

AA changes:
- This patch is pulling the caches from BasicAAResults to AAResults, meaning the getModRefInfo call benefits from the IsCapturedCache as well when in "batch mode".
- All AAResultBase implementations add the QueryInfo member to all APIs. AAResults APIs maintain wrapper APIs such that all alias()/getModRefInfo call sites are unchanged.
- AA now provides a BatchAAResults type as a wrapper to AAResults. It keeps the AAResults instance and a QueryInfo instantiated to batch mode. It delegates all work to the AAResults instance with the batched QueryInfo. More API wrappers may be needed in BatchAAResults; only the minimum needed is currently added.

MemorySSA changes:
- All walkers are now templated on the AA used (AliasAnalysis=AAResults or BatchAAResults).
- At build time, we optimize uses; now we create a local walker (lives only as long as OptimizeUses does) using BatchAAResults.
- All Walkers have an internal AA and only use that now, never the AA in MemorySSA. The Walkers receive the AA they will use when built.

- The walker we use for queries after the build is instantiated on AliasAnalysis and is built after building MemorySSA and setting AA.
- All static methods doing walking are now templated on AliasAnalysisType if they are used both during build and after. If used only during build, the method now only takes a BatchAAResults. If used only after build, the method now takes an AliasAnalysis.

Subscribers: sanjoy, arsenm, jvesely, nhaehnle, jlebar, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59315

llvm-svn: 356783
2019-03-22 17:22:19 +00:00
Tim Renouf 6f0191a55a [AMDGPU] Use three- and five-dword result type in image ops
Some image ops return three or five dwords.  Previously, we modeled that
with a 4 or 8 dword register class.  The register allocator could
cleverly spot that some subregs were dead and allocate something else
there, but that caused the de-optimization that waitcnt insertion would
think that the result was used immediately.

This commit allows such an image op to have a three or five dword
result, avoiding the above de-optimization.

Differential Revision: https://reviews.llvm.org/D58905

Change-Id: I3651211bbd7ed22721ee7b9fefd7bcc60a809d8b
llvm-svn: 356757
2019-03-22 15:21:11 +00:00
Tim Renouf 677387d8dc [AMDGPU] Implemented dwordx3 variants of buffer/tbuffer load/store intrinsics
Now we have vec3 MVTs, this commit implements dwordx3 variants of the
buffer intrinsics.

On gfx6, a dwordx3 buffer load intrinsic is implemented as a dwordx4
instruction, and a dwordx3 buffer store intrinsic is not supported.
We need to support the dwordx3 load intrinsic because it is generated by
subtarget-unaware code in InstCombine.

Differential Revision: https://reviews.llvm.org/D58904

Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e
llvm-svn: 356755
2019-03-22 14:58:02 +00:00
Tim Renouf 033f99a2e5 [AMDGPU] Added v5i32 and v5f32 register classes
They are not used by anything yet, but a subsequent commit will start
using them for image ops that return 5 dwords.

Differential Revision: https://reviews.llvm.org/D58903

Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271
llvm-svn: 356735
2019-03-22 10:11:21 +00:00
Tim Renouf 361b5b2193 [AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.

SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58902

Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
llvm-svn: 356659
2019-03-21 12:01:21 +00:00
Tim Renouf 2327c231d6 [AMDGPU] Do not generate spurious PAL metadata
My previous fix rL356591 "[AMDGPU] Added MsgPack format PAL metadata"
accidentally caused a spurious PAL metadata .note record to be emitted
for any AMDGPU output. That caused failures in the lld test
amdgpu-relocs.s. Fixed.

Differential Revision: https://reviews.llvm.org/D59613

Change-Id: Ie04a2aaae890dcd490f22c89edf9913a77ce070e
llvm-svn: 356621
2019-03-20 22:02:09 +00:00
Michael Liao bbcb95a64e [AMDGPU] Fix dependency on `BinaryFormat`
Summary: - The linking is broken when this library is built as a shared one.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59610

llvm-svn: 356617
2019-03-20 21:22:27 +00:00
Matt Arsenault 2065206a9d AMDGPU: Don't look for constant in insert/extract_vector_elt regbankselect
The constantness shouldn't change the register bank choice. We also
don't need to restrict this to only indexing VGPRs, since it's
possible to index SGPRs (but SelectionDAG made using this
difficult). Allow directly indexing SGPRs when appropriate.

llvm-svn: 356611
2019-03-20 20:41:34 +00:00
Michael Liao eea5177d30 [AMDGPU] Fix clamp bit DAG operand
Summary:
- Should use `targetconstant` instead of `constant` operand for clamp
  bit, which is expected as an immediate operand. Under certain
  conditions, such as when a common `i1 false` constant is used elsewhere
  and selected before the instruction with the clamp bit, a register
  operand may be added instead of an immediate one. Use `targetconstant` to
  enforce that.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59608

llvm-svn: 356608
2019-03-20 20:18:56 +00:00
Konstantin Zhuravlyov 88268e3e36 AMDHSA: Fix COMPUTE_PGM_RSRC2.USER_SGPR calculation when parsing ISA assembly
It must match https://llvm.org/docs/AMDGPUUsage.html#initial-kernel-execution-state

Differential Revision: https://reviews.llvm.org/D59570

llvm-svn: 356603
2019-03-20 19:44:47 +00:00
Tim Renouf e7bd52f86e [AMDGPU] Added MsgPack format PAL metadata
Summary:
PAL metadata now supports both the old linear reg=val pairs format and
the new MsgPack format.

The MsgPack format uses YAML as its textual representation. On output to
YAML, a mnemonic name is provided for some hardware registers.

Differential Revision: https://reviews.llvm.org/D57028

Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94
llvm-svn: 356591
2019-03-20 18:47:21 +00:00
Tim Renouf d737b551e9 [AMDGPU] Factored PAL metadata handling out into its own class
Summary:
This commit introduces a new AMDGPUPALMetadata class that:
* is inside the AMDGPU target;
* keeps an in-memory representation of PAL metadata;
* provides a method to read the frontend-supplied metadata from LLVM IR;
* provides methods for the asm printer to set metadata items;
* provides methods to write the metadata as a binary blob to put in a
  .note record or as an asm directive;
* provides a method to read the metadata as a binary blob from a .note
  record.

Because llvm-readobj cannot call directly into a target, I had to remove
llvm-readobj's ability to dump PAL metadata, pending a resolution to
https://reviews.llvm.org/D52821

Differential Revision: https://reviews.llvm.org/D57027

Change-Id: I756dc830894fcb6850324cdcfa87c0120eb2cf64
llvm-svn: 356582
2019-03-20 17:42:00 +00:00
Dmitry Preobrazhensky 04bd1185ad [AMDGPU][MC] Corrected checks for DS offset0 range
See bug 40889: https://bugs.llvm.org/show_bug.cgi?id=40889

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D59313

llvm-svn: 356576
2019-03-20 17:13:58 +00:00
Dmitry Preobrazhensky 137976fae2 [AMDGPU][MC][GFX9] Added support of operands shared_base, shared_limit, private_base, private_limit, pops_exiting_wave_id
See bug 39297: https://bugs.llvm.org/show_bug.cgi?id=39297

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D59290

llvm-svn: 356561
2019-03-20 15:40:52 +00:00
David Stuttard fc2a747345 [AMDGPU] Allow MIMG with no uses in adjustWritemask in isel
Summary:
If an MIMG instruction has managed to get through to adjustWritemask in isel but
has no uses (and doesn't enable TFC) then prevent an assertion by not attempting
to adjust the writemask.

The instruction will be removed anyway.

Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58964

llvm-svn: 356540
2019-03-20 09:29:55 +00:00
Matt Arsenault cf55a657f0 CodeGen: Refactor regallocator command line and target selection
This will allow targets more flexibility to replace the
register allocator core passes. In a future commit,
AMDGPU will run the core register assignment passes
twice, and will also want to disallow using the
standard -regalloc option.

llvm-svn: 356506
2019-03-19 19:33:12 +00:00
Ryan Taylor 00e063ab92 [AMDGPU] Add buffer/load 8/16 bit overloaded intrinsics
Summary:
Add buffer store/load 8/16 overloaded intrinsics for buffer, raw_buffer and struct_buffer

Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59265

llvm-svn: 356465
2019-03-19 16:07:00 +00:00
Neil Henning e85f6bd64f [AMDGPU] Ban i8 min3 promotion.
I found this really weird WWM-related case whereby through the WWM
transformations our isel lowering was trying to promote 2 min's into a
min3 for the i8 type, which our hardware doesn't support.

The new min3_i8.ll test case would previously spew the error:

PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68

Before the simple fix to our isel lowering to not do it for i8 MVT's.

Differential Revision: https://reviews.llvm.org/D59543

llvm-svn: 356464
2019-03-19 15:50:24 +00:00
Michael Liao efb4f9e568 [AMDGPU] Enable code selection using `s_mul_hi_u32`/`s_mul_hi_i32`.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59501

llvm-svn: 356405
2019-03-18 20:40:09 +00:00
Tim Renouf cfdfba996b [AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic
Allow the clamp modifier on vop3 int arithmetic instructions in assembly
and disassembly.

This involved adding a clamp operand to the affected instructions in MIR
and MC, and thus having to fix up several places in codegen and MIR
tests.

Differential Revision: https://reviews.llvm.org/D59267

Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e
llvm-svn: 356399
2019-03-18 19:35:44 +00:00
Tim Renouf 2e94f6e584 [AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers
This commit allows v_cndmask_b32_e64 with abs, neg source
modifiers on src0, src1 to be assembled and disassembled.

This does appear to be allowed, even though they are floating point
modifiers and the operand type is b32.

To do this, I added src0_modifiers and src1_modifiers to the
MachineInstr, which involved fixing up several places in codegen and mir
tests.

Differential Revision: https://reviews.llvm.org/D59191

Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea
llvm-svn: 356398
2019-03-18 19:25:39 +00:00
Tim Renouf 8723a56551 [MsgPack][AMDGPU] Fix unflushed raw_string_ostream bugs on windows expensive checks bot
This fixes a couple of unflushed raw_string_ostream bugs in recent
commits that only show up on a bot building on windows with expensive
checks.

Differential Revision: https://reviews.llvm.org/D59396

Change-Id: I9c6208325503b3ee0786b4b688e13fc24a15babf
llvm-svn: 356394
2019-03-18 19:00:46 +00:00
Adhemerval Zanella 664c1ef528 [TargetLowering] Add code size information on isFPImmLegal. NFC
This allows better code size for aarch64 floating point materialization
in a future patch.

Reviewers: evandro

Differential Revision: https://reviews.llvm.org/D58690

llvm-svn: 356389
2019-03-18 18:40:07 +00:00
Neil Henning 523dab0788 [AMDGPU] Add an experimental buffer fat pointer address space.
Add an experimental buffer fat pointer address space that is currently
unhandled in the backend. This commit reserves address space 7 as a
non-integral pointer representing the 160-bit fat pointer (128-bit buffer
descriptor + 32-bit offset) that is heavily used in graphics workloads
using the AMDGPU backend.
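The 160-bit size follows from 128 + 32. A small sketch makes the arithmetic concrete; the bit layout chosen here (descriptor in the high bits, offset in the low bits) is purely an assumption for illustration, since the commit message does not specify one.

```python
DESCRIPTOR_BITS = 128
OFFSET_BITS = 32
FAT_POINTER_BITS = DESCRIPTOR_BITS + OFFSET_BITS   # 160


def pack_fat_pointer(descriptor, offset):
    """Pack a descriptor and offset into one integer (hypothetical layout)."""
    if not 0 <= descriptor < (1 << DESCRIPTOR_BITS):
        raise ValueError("descriptor does not fit in 128 bits")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit in 32 bits")
    return (descriptor << OFFSET_BITS) | offset


def unpack_fat_pointer(value):
    """Split a packed value back into (descriptor, offset)."""
    return value >> OFFSET_BITS, value & ((1 << OFFSET_BITS) - 1)
```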

Differential Revision: https://reviews.llvm.org/D58957

llvm-svn: 356373
2019-03-18 14:44:28 +00:00
Matt Arsenault e0c1f9e76d AMDGPU: Partially fix default device for HSA
There are a few different issues, mostly stemming from using
generation based checks for anything instead of subtarget
features. Stop adding flat-address-space as a feature for HSA, as it
should only be a device property. This was incorrectly allowing flat
instructions to select for SI.

Increase the default generation for HSA to avoid the encoding error
when emitting objects. This has some other side effects from various
checks which probably should be separate subtarget features (in the
cost model and for dealing with the DS offset folding issue).

Partial fix for bug 41070. It should probably be an error to try using
amdhsa without flat support.

llvm-svn: 356347
2019-03-17 21:31:35 +00:00
Tim Renouf e30aa6a136 [AMDGPU] Prepare for introduction of v3 and v5 MVTs
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:

* Fixed assumptions of power-of-2 vector type in kernel arg handling,
  and added v5 kernel arg tests and v3/v5 shader arg tests.

* Added v5 tests for cost analysis.

* Added vec3/vec5 arg test cases.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58928

Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd
llvm-svn: 356342
2019-03-17 21:04:16 +00:00
Changpeng Fang 989ec59c9f AMDGPU: Fix a SIAnnotateControlFlow issue when there are multiple backedges.
Summary:
At the exit of the loop, the compiler uses a register to remember and accumulate
the number of threads that have already exited. When all active threads exit the
loop, this register is used to restore the exec mask, and the execution continues
for the post loop code.

When there is a "continue" in the loop, the compiler mistakenly resets the
register to 0 when the "continue" backedge is taken. This will result in some
threads not executing the post loop code as they are supposed to.

This patch fixed the issue.
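The bug can be reproduced in a toy simulation. This models lanes as set members and is only a sketch of the behaviour described in the summary, not of SIAnnotateControlFlow itself; the function name and parameters are invented for the example.

```python
def lanes_after_loop(exit_iteration, continue_iterations, reset_on_continue):
    """Return the lanes whose exec bits are restored after the loop.

    exit_iteration[lane] is the iteration at which that lane leaves the loop.
    continue_iterations holds iterations whose backedge is a "continue".
    reset_on_continue=True models the bug: the accumulator is cleared there.
    """
    active = set(range(len(exit_iteration)))
    exited = set()   # accumulates lanes that have already left the loop
    iteration = 0
    while active:
        leaving = {lane for lane in active if exit_iteration[lane] == iteration}
        exited |= leaving
        active -= leaving
        if active and reset_on_continue and iteration in continue_iterations:
            exited = set()   # buggy reset when the "continue" backedge is taken
        iteration += 1
    return exited
```

With the reset enabled, lanes that left before the "continue" iteration are missing from the restored set, which matches the symptom of some threads not executing the post-loop code.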

Reviewers:
  nhaehnle, arsenm

Differential Revision:
  https://reviews.llvm.org/D59312

llvm-svn: 356298
2019-03-15 21:02:48 +00:00
Michael Liao 6883d7e192 [AMDGPU] Fix SGPR fixing through SCC chaining
Summary:
- During the fixing of SGPR copying from VGPR, ensure users of SCC are
  properly propagated, i.e.
  * only propagate through live def of SCC,
  * skip the SCC-def inst itself, and
  * stop the propagation on the other SCC-def inst after checking its
    SCC-use first.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59362

llvm-svn: 356258
2019-03-15 12:42:21 +00:00
Matt Arsenault bc6d07ca46 MIR: Allow targets to serialize MachineFunctionInfo
This has been a very painful missing feature that has made producing
reduced testcases difficult. In particular the various registers
determined for stack access during function lowering were necessary to
avoid undefined register errors in a large percentage of
cases. Implement a subset of the important fields that need to be
preserved for AMDGPU.

Most of the changes are to support targets parsing register fields and
properly reporting errors. The biggest sort-of bug remaining is that
fields that can be initialized from the IR section will be overwritten
by a default-initialized machineFunctionInfo section. Another
remaining bug is that the machineFunctionInfo section is still printed
even if empty.

llvm-svn: 356215
2019-03-14 22:54:43 +00:00
Matt Arsenault 0b31b24c13 AMDGPU: Correct type for waitcnt debug flag
llvm-svn: 356206
2019-03-14 21:23:59 +00:00
Matt Arsenault 72bde9aa7e AMDGPU: Scavenge register instead of findUnusedReg
llvm-svn: 356149
2019-03-14 14:19:01 +00:00
Matt Arsenault 3a31b3f6e8 AMDGPU: Don't add unnecessary convergent attributes
These are redundant with the intrinsic declaration.

llvm-svn: 356143
2019-03-14 13:46:09 +00:00
Stanislav Mekhanoshin da644c025d [AMDGPU] Silence gcc 7 warnings
Differential Revision: https://reviews.llvm.org/D59330

llvm-svn: 356100
2019-03-13 21:15:52 +00:00
Tim Renouf ed0b9af997 [AMDGPU] Switched HSA metadata to use MsgPackDocument
Summary:
MsgPackDocument is the lighter-weight replacement for MsgPackTypes. This
commit switches AMDGPU HSA metadata processing to use MsgPackDocument
instead of MsgPackTypes.

Differential Revision: https://reviews.llvm.org/D57024

Change-Id: I0751668013abe8c87db01db1170831a76079b3a6
llvm-svn: 356081
2019-03-13 18:55:50 +00:00
Matt Arsenault caf1316f71 IR: Add immarg attribute
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.

Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.

This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.

llvm-svn: 355981
2019-03-12 21:02:54 +00:00
David Stuttard 20ea21c6ed [AMDGPU] Add support for immediate operand for S_ENDPGM
Summary:
Add support for immediate operand in S_ENDPGM

Change-Id: I0c56a076a10980f719fb2a8f16407e9c301013f6

Reviewers: alexshap

Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, eraman, arphaman, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59213

llvm-svn: 355902
2019-03-12 09:52:58 +00:00
Stanislav Mekhanoshin e98944ed47 Use bitset for assembler predicates
The AMDGPU target ran out of subtarget feature flags, hitting the limit of 64.
AssemblerPredicates use at most a uint64_t for their representation.
At the same time, CodeGen exhausted this limit a long time ago and switched
to a FeatureBitset with the current limit of 192 bits.

This patch completes the transition to the bitset for feature bits, extending
it to the asm matcher and MC code emitter.
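
The limit can be illustrated with a small model. This is a sketch only:
FeatureBits and uint64_mask are hypothetical names, and the class merely
imitates the idea of a fixed-capacity bitset; it is not the LLVM
FeatureBitset API.

```python
class FeatureBits:
    """Fixed-capacity bitset, loosely imitating the idea of a feature bitset."""

    def __init__(self, capacity=192):
        self.capacity = capacity
        self.bits = 0

    def set(self, index):
        if not 0 <= index < self.capacity:
            raise IndexError(f"feature {index} exceeds capacity {self.capacity}")
        self.bits |= 1 << index
        return self

    def test(self, index):
        return bool((self.bits >> index) & 1)


def uint64_mask(index):
    """Bit `index` as seen through a 64-bit mask.

    Modelled here as truncation to 64 bits; in C++ a shift by 64 or more
    on a uint64_t is undefined rather than well-defined truncation.
    """
    return (1 << index) & 0xFFFFFFFFFFFFFFFF
```

With this model, `uint64_mask(64)` has no bit left to represent the 65th
feature, while `FeatureBits().set(64).test(64)` still succeeds.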

Differential Revision: https://reviews.llvm.org/D59002

llvm-svn: 355839
2019-03-11 17:04:35 +00:00
Stanislav Mekhanoshin 266f1574ce [AMDGPU] Mark enum types in SIDefines.h as unsigned
MSVC issues some warnings about signed/unsigned comparison.

Differential Revision: https://reviews.llvm.org/D59171

llvm-svn: 355836
2019-03-11 16:49:32 +00:00
Matt Arsenault e8c03a2511 AMDGPU: Move d16 load matching to preprocess step
When matching half of the build_vector to a load, there could still be
a hidden dependency on the other half of the build_vector the pattern
wouldn't detect. If there was an additional chain dependency on the
other value, a cycle could be introduced.

I don't think a tablegen pattern is capable of matching the necessary
conditions, so move this into PreprocessISelDAG. Check isPredecessorOf
for the other value to avoid a cycle. This has a warning that it's
expensive, so this should probably be moved into an MI pass eventually
that will have more freedom to reorder instructions to help match
this. That is currently complicated by the lack of a computeKnownBits
type mechanism for the selected function.
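
The cycle check can be sketched as a reachability query over operand edges.
This is an illustration under assumptions: the graph is a plain dictionary,
and is_predecessor_of and can_fold_load are hypothetical helpers that only
mimic the idea behind isPredecessorOf, not its implementation.

```python
def is_predecessor_of(graph, node, target):
    """Return True if `node` is reachable by walking operands from `target`.

    graph maps each node name to the list of nodes it uses as operands.
    The walk may visit the whole graph, which is why such a query is costly.
    """
    stack, seen = [target], set()
    while stack:
        cur = stack.pop()
        for operand in graph.get(cur, []):
            if operand == node:
                return True
            if operand not in seen:
                seen.add(operand)
                stack.append(operand)
    return False


def can_fold_load(graph, load, other_half):
    # Folding makes the load use other_half as an input. If other_half
    # already depends on the load (for example through a chain), the
    # fold would create a cycle, so it must be rejected.
    return not is_predecessor_of(graph, load, other_half)


# The low half depends on a store that is chained after the high load.
cyclic = {"build_vector": ["load_hi", "lo"], "lo": ["store"], "store": ["load_hi"]}
# The low half is independent of the high load.
acyclic = {"build_vector": ["load_hi", "lo"], "lo": ["arg"]}
```

Here `can_fold_load(cyclic, "load_hi", "lo")` is rejected and
`can_fold_load(acyclic, "load_hi", "lo")` is accepted.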

llvm-svn: 355731
2019-03-08 20:58:11 +00:00
Matt Arsenault 26e76ef0e2 DAG: Don't try to cluster loads with tied inputs
This avoids breaking possible value dependencies when sorting loads by
offset.

AMDGPU has some load instructions that write into the high or low bits
of the destination register, and have a tied input for the other input
bits. These can easily have the same base pointer, but be a swizzle so
the high address load needs to come first. This was inserting glue
forcing the opposite ordering, producing a cycle the InstrEmitter
would assert on. It may be potentially expensive to look for the
dependency between the other loads, so just skip any where this could
happen.
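
The "just skip" approach can be sketched as a filter applied before sorting.
This is a simplified illustration: the dictionary fields and the
cluster_candidates helper are hypothetical and do not mirror the scheduler's
data structures.

```python
def cluster_candidates(loads):
    """Return the loads that may be clustered, ordered by offset.

    Each load is a dict with 'name', 'offset' and 'has_tied_input'.
    Loads with a tied input are left out, so sorting by offset can never
    reorder them against the load that produces their tied input.
    """
    eligible = [ld for ld in loads if not ld["has_tied_input"]]
    return sorted(eligible, key=lambda ld: ld["offset"])


loads = [
    {"name": "load_hi", "offset": 2, "has_tied_input": False},
    {"name": "load_lo", "offset": 0, "has_tied_input": True},
    {"name": "load_a", "offset": 8, "has_tied_input": False},
    {"name": "load_b", "offset": 4, "has_tied_input": False},
]
order = [ld["name"] for ld in cluster_candidates(loads)]
```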

Fixes bug 40936 by reverting r351379, which added a hacky attempt to
fix this by adding chains in this case, which I think was just working
around broken glue before the InstrEmitter. The core of the patch is
re-implementing the fix for that problem.

llvm-svn: 355728
2019-03-08 20:46:15 +00:00
Matt Arsenault f587fd9ce1 AMDGPU: Don't bother checking the chain in areLoadsFromSameBasePtr
This is only called in contexts that are verifying the chain itself,
and the query itself is only asking about the address.

llvm-svn: 355723
2019-03-08 20:30:51 +00:00
Matt Arsenault 07f904befb AMDGPU: Correct DS implementation of areLoadsFromSameBasePtr
This was checking the wrong operands for the base register and the
offsets. The indexes are shifted by the number of output registers
from the machine instruction definition, and the chain is moved to the
end.
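
The index shift can be sketched as follows. This is an illustration under
assumptions: the operand lists are simplified stand-ins for a DS read, and
mi_to_node_operand_index is a hypothetical helper, not an LLVM function.

```python
def mi_to_node_operand_index(mi_index, num_defs):
    """Map an operand index from the machine instruction definition to the
    operand index on the selected DAG node, assuming the node's operand
    list omits the leading output registers."""
    if mi_index < num_defs:
        raise ValueError("operand is an output register, not a node operand")
    return mi_index - num_defs


# Simplified operand lists: one output register in the instruction
# definition, and the chain as the last operand of the node.
mi_operands = ["vdst", "addr", "offset", "gds"]
node_operands = ["addr", "offset", "gds", "chain"]

addr_idx = mi_to_node_operand_index(mi_operands.index("addr"), num_defs=1)
offset_idx = mi_to_node_operand_index(mi_operands.index("offset"), num_defs=1)
```

Looking up `node_operands[addr_idx]` and `node_operands[offset_idx]` then
yields the base register and the offset rather than their neighbours.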

llvm-svn: 355722
2019-03-08 20:30:50 +00:00
Carl Ritson 1a98dc1840 [AMDGPU] V_CVT_F32_UBYTE{0,1,2,3} are full rate instructions
Summary: Fix a bug in the scheduling model where V_CVT_F32_UBYTE{0,1,2,3} are incorrectly marked as quarter rate instructions.

Reviewers: arsenm, rampitec

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59091

llvm-svn: 355671
2019-03-08 09:03:11 +00:00
Konstantin Zhuravlyov 47f0bf8f1f AMDHSA: Code object v3 updates
- Copy kernel symbol attributes into kernel descriptor attributes
  - Make sure kernel symbol's visibility is not "higher" than protected

Differential Revision: https://reviews.llvm.org/D59057

llvm-svn: 355630
2019-03-07 19:58:29 +00:00
Aakanksha Patil c56d2afc63 AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV)
A previous patch for "uniform-work-group-size" attribute was found to break
some RADV and possibly radeon SI tests and had to be retracted.
This patch fixes that.

Differential Revision: http://reviews.llvm.org/D58993

llvm-svn: 355574
2019-03-07 00:54:04 +00:00
Ryan Taylor 67f36903ae [AMDGPU] Add support for 64 bit buffer atomic arithmetic instructions
Summary:
This adds support for 64 bit buffer atomic arithmetic instructions but does not include
cmpswap as that depends on a fix to the way the register pairs are handled

Change-Id: Ib207ea65fb69487ccad5066ea647ae8ddfe2ce61

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58918

llvm-svn: 355520
2019-03-06 17:02:06 +00:00
Matt Arsenault 870397739e AMDGPU: Preserve undef flag when expanding SI_IF
Fixes undefined value verifier error.

llvm-svn: 355426
2019-03-05 18:38:00 +00:00
Carl Ritson 9e3f7d8ad0 [AMDGPU] Fix DPP operand order in atomic optimizer
Summary:
Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.

Change-Id: I631d050e1c00a3b4bc7c11a90437064403c4cf30

Reviewers: sheredom, tpr

Reviewed By: sheredom

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58900

llvm-svn: 355394
2019-03-05 12:21:44 +00:00
David Stuttard 81eec58a0d [AMDGPU] Omit KILL instructions from hazard recognizer
Summary:
In some cases a KILL caused a hazard to be introduced, because it was
scheduled into a hazard slot but does not result in an instruction.

KILL shouldn't be considered for hazard recognition.

Change-Id: Ib6d2a2160f8c94cd0ce611ab198c7e4f46aeffcf

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58898

llvm-svn: 355384
2019-03-05 10:25:16 +00:00
Scott Linder efec1396ac [AMDGPU] Implement AMDGPUMCInstrAnalysis
Implement MCInstrAnalysis for AMDGPU, with default implementations save
for `evaluateBranch`.

Differential Revision: https://reviews.llvm.org/D58400

llvm-svn: 355373
2019-03-05 03:02:00 +00:00
Dmitry Preobrazhensky 6023d5990d [AMDGPU][MC] Enable lds_direct operand for v_readfirstlane_b32, v_readlane_b32 and v_writelane_b32
See bug 40662: https://bugs.llvm.org/show_bug.cgi?id=40662

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D58713

llvm-svn: 355312
2019-03-04 12:48:32 +00:00
Stanislav Mekhanoshin bb98841399 [AMDGPU] Mark ds instructions as maybeAtomic
These were not recognized as potential atomics by the memory legalizer.
The test was passing not because the legalizer did the right thing, but
because it skipped all these instructions. When I fixed the DS
description, the test started to fail because the region address
changed from 4 to 2 a while ago.

Differential Revision: https://reviews.llvm.org/D58802

llvm-svn: 355179
2019-03-01 07:59:17 +00:00
Tom Stellard 33634d1b25 AMDGPU/GlobalISel: Implement select for G_INSERT
Re-commit r344310.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53116

llvm-svn: 355159
2019-03-01 00:50:26 +00:00
Tom Stellard 41f32196a0 AMDGPU/GlobalISel: Implement select for G_EXTRACT
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49714

llvm-svn: 355156
2019-02-28 23:37:48 +00:00
Matt Arsenault 09a09ef8b7 AMDGPU: Fix typo
llvm-svn: 355056
2019-02-28 00:52:33 +00:00
Matt Arsenault 5d567dc137 AMDGPU: Enable function calls by default
Fixes some crashes on illegal call situations which are unfortunately
still valid IR.

llvm-svn: 355051
2019-02-28 00:40:32 +00:00
Matt Arsenault aa03bcd23c AMDGPU: Fix crashes in invalid call cases
We have to at least tolerate calls to kernels, possibly with a
mismatched calling convention on the callsite.

llvm-svn: 355049
2019-02-28 00:28:44 +00:00
Matt Arsenault d3093c2f1f GlobalISel: Implement fewerElementsVector for phi
llvm-svn: 355048
2019-02-28 00:16:32 +00:00
Matt Arsenault 72bcf15dbf GlobalISel: Implement moreElementsVector for phi
llvm-svn: 355047
2019-02-28 00:01:05 +00:00
Dmitry Preobrazhensky 7904231edb [AMDGPU][MC] Added register size check for VOP3/SDWA/DPP operands
See bug 37943: https://bugs.llvm.org/show_bug.cgi?id=37943

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D58287

llvm-svn: 354974
2019-02-27 13:58:48 +00:00