Commit Graph

3740 Commits

Author SHA1 Message Date
Matt Arsenault bcf5184a68 AMDGPU/GlobalISel: Make sure <2 x s1> phis are scalarized 2020-07-26 10:04:47 -04:00
Matt Arsenault 6f961a1e7e AMDGPU/GlobalISel: Legalize GDS atomics
I noticed these don't use the _gfx9, non-m0 reading variants but not
sure if that's a bug or not. It's the same in the DAG.
2020-07-26 10:03:34 -04:00
Matt Arsenault 5819159995 AMDGPU/GlobalISel: Pack constant G_BUILD_VECTOR_TRUNCs when selecting 2020-07-26 09:55:34 -04:00
Matt Arsenault 61ced4b87a GlobalISel: Handle 'n' inline asm constraint 2020-07-26 09:30:41 -04:00
Matt Arsenault 4033aa1467 AMDGPU/GlobalISel: Sign extend integer constants
This matches the DAG behavior and fixes immediate folding
2020-07-26 09:30:14 -04:00
Matt Arsenault 4f6502ab33 AMDGPU/GlobalISel: Replace selection tests for G_CONSTANT/G_FCONSTANT
Split into separate tests and make more consistent with the others.
2020-07-26 09:30:09 -04:00
Changpeng Fang 9162b70e51 DADCombiner: Don't simplify the token factor if the node's number of operands already exceeds TokenFactorInlineLimit
Summary:
  In parallelizeChainedStores, a TokenFactor was created with the size greater than 3000.
We found that DAGCombiner::visitTokenFactor will consume a huge amount of time on
such nodes. Since the number of operands already exceeds TokenFactorInlineLimit, we propose
to give up simplification with the consideration of compile time.

Reviewers:
  @spatel, @arsenm

Differential Revision:
  https://reviews.llvm.org/D84204
2020-07-25 21:20:59 -07:00
Matt Arsenault 392b969c32 AMDGPU/GlobalISel: Don't assert on G_INSERT > 128-bits
Just fallback for now. Really tablegen needs to generate all of the
subregister index handling we need.
2020-07-25 10:05:44 -04:00
Matt Arsenault 2bd72abef0 AMDGPU: Skip other terminators before inserting s_cbranch_exec[n]z
PHIElimination/createPHISourceCopy inserts non-branch terminators
after the control flow pseudo if a successor phi reads a register
defined by the control flow pseudo. If this happens, we need to split
the expansion of the control flow pseudo to ensure all the branches
are after all of the other mask management instructions.

GlobalISel hit this in testscases that happened to be tail
duplicated. The original testcase still does not work, since the same
problem appears to be present in a later pass.
2020-07-24 16:51:59 -04:00
Dmitry Preobrazhensky 6b8948922c [AMDGPU][MC] Added support of SP3 syntax for MTBUF format modifier
Currently supported LLVM MTBUF syntax is shown below. It is not compatible with SP3.

    op     dst, addr, rsrc, FORMAT, soffset

This change adds support for SP3 syntax:

    op     dst, addr, rsrc, soffset SP3FORMAT

In addition to being compatible with SP3, this syntax allows using symbolic names for data, numeric and unified formats. Below is a list of added syntax variants.

format:<expression>
format:[<numeric-format-name>,<data-format-name>]
format:[<data-format-name>,<numeric-format-name>]
format:[<data-format-name>]
format:[<numeric-format-name>]
format:[<unified-format-name>]

The last syntax variant is supported for GFX10 only.

See llvm bug 37738

Reviewers: arsenm, rampitec, vpykhtin

Differential Revision: https://reviews.llvm.org/D84026
2020-07-24 16:41:03 +03:00
Petar Avramovic 47bd41d099 AMDGPU/GlobalISel: Select set.inactive intrinsic
Differential Revision: https://reviews.llvm.org/D84407
2020-07-24 10:14:14 +02:00
Matt Arsenault 891759db73 GlobalISel: Add scalarSameSizeAs LegalizeRule
Widen or narrow a type to a type with the same scalar size as
another. This can be used to force G_PTR_ADD/G_PTRMASK's scalar
operand to match the bitwidth of the pointer type. Use this to
disallow narrower types for G_PTRMASK.
2020-07-23 21:17:31 -04:00
Matt Arsenault b9c644ec61 AMDGPU: Fix failures from overflowing uint8_t number of operands
If the operand index exceeded the limit of unsigned char, it wrapped
and would point to the wrong operand. Increase the size of the operand
index field to avoid this, and also don't bother trying to fold into
implicit operands.
2020-07-23 15:39:33 -04:00
Nikita Popov deb4bb2b3a [IR] Add min/max/abs intrinsics
This adds the llvm.abs(), llvm.umin(), llvm.umax(), llvm.smin(),
and llvm.smax() intrinsics specified in D81829. For SelectionDAG,
the ISD opcodes and all the legalization and lowering already exist,
so this just wires them up to the intrinsic in the SDAG builder and
adds rudimentary tests. For GlobalISel only the min/max intrinsics
are wired up, as llvm.abs() will require the addition of a G_ABS op,
and corresponding legalization support.

Differential Revision: https://reviews.llvm.org/D84125
2020-07-23 20:56:19 +02:00
Matt Arsenault b2ee1cd2d9 AMDGPU/GlobalISel: Add some tests for stack passed pointers 2020-07-23 14:38:31 -04:00
Matt Arsenault d2b8fcff34 AMDGPU/GlobalISel: Handle call return values
The only case that I know doesn't work is the implicit sret case when
the return type doesn't fit in the return registers.
2020-07-23 14:29:35 -04:00
Jay Foad b35833b84e [GlobalISel][AMDGPU] Legalize saturating add/subtract
Add support in LegalizerHelper for lowering G_SADDSAT etc. either
using add/subtract-with-overflow or using max/min instructions.

Enable this lowering for AMDGPU so it can be tested. The legalization
rules are still approximate and skips out on using the clamp bit to
treat these as legal, which has never been used before. This also
doesn't yet try to deal with expanding SALU cases.
2020-07-23 09:06:42 -04:00
Konstantin Schwarz 931488779f [GlobalISel][InlineAsm] Add register class ID to the flags of register input operands
Summary: We do this already for output operands, but missed it for (non-tied) input operands.

Reviewers: arsenm, Petar.Avramovic

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, llvm-commits, kerbowa

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83763
2020-07-23 13:35:01 +02:00
Matt Arsenault 1fd1beea18 AMDGPU/GlobalISel: Fix translation of indirect calls 2020-07-22 13:13:21 -04:00
Dmitry Preobrazhensky 0b8fd77ad9 [AMDGPU][MC] Corrected decoding of 16-bit literals
16-bit literals are encoded as 32-bit values. If high 16-bits of the value is 0xFFFF, the decoded instruction cannot be reassembled.

For example, the following code

0xff,0x04,0x04,0x52,0xcd,0xab,0xff,0xff

was decoded as

v_mul_lo_u16_e32 v2, 0xffffabcd, v2

However this literal is actually a 64-bit constant 0x00000000ffffabcd which violates requirements described in the documentation - the truncation is not safe.

This change corrects decoding to make reassembly possible.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D84098
2020-07-22 17:20:43 +03:00
Petar Avramovic 44967fc604 AMDGPU: Simplify f16 to i64 custom lowering
Range that f16 can represent fits into i32.
Lower as f16->i32->i64 instead of f16->f32->i64
since f32->i64 has long expansion.

Differential Revision: https://reviews.llvm.org/D84166
2020-07-22 10:32:14 +02:00
Matt Arsenault 7a669130f7 AMDGPU/GlobalISel: Add some baseline degenerate call argument tests 2020-07-21 18:48:40 -04:00
Matt Arsenault b258920095 AMDGPU/GlobalISel: Fix not erasing inst when lowering G_FRINT 2020-07-21 18:29:41 -04:00
Matt Arsenault 7cd8a0256d GlobalISel: Legalize G_FPOWI 2020-07-21 18:13:04 -04:00
Matt Arsenault 1168119c2f AMDGPU: Start interpreting byref on kernel arguments
These are treated identically to value aggregates placed in the kernel
argument list. A %struct.foo or %struct.foo addrspace(4)*
byref(sizeof(%struct.foo)) align(alignof(%struct.foo)) argument should
produce the same offsets and argument metadata.

This handles all 3 kernel ABI implementations, and the two HSA
metadata emission paths.
2020-07-21 18:11:22 -04:00
Matt Arsenault 2fe0ea8261 DAG: Handle expanding strict_fsub into fneg and strict_fadd
The AMDGPU handling of f16 vectors is terrible still since it gets
scalarized even when the vector operation is legal.

The code is is essentially duplicated between the non-strict and
strict case. Apparently no other expansions are currently trying to do
this. This is mostly because I found the behavior of
getStrictFPOperationAction to be confusing. In the ARM case, it would
expand strict_fsub even though it shouldn't due to the later check. At
that point, the logic required to check for legality was more complex
than just duplicating the 2 instruction expansion.
2020-07-21 16:17:10 -04:00
Matt Arsenault 84704d989b AMDGPU: Fix not accounting for constantexpr uses of LDS globals
This was failing to add the size of LDS globals that weren't directly
used by an instruction. They could be used by constant expressions
which are transitively used by the function. This requires a better
search, but just abort on this for now for correctness.
2020-07-20 11:41:41 -04:00
Matt Arsenault 61f1f2a204 AMDGPU/GlobalISel: Initial Implementation of calls
Return values, and tail calls are not yet handled.
2020-07-20 11:13:22 -04:00
Petar Avramovic 6a1030aa0e AMDGPU/GlobalISel: Legalize s16->s64 G_FPEXT
Legalize using narrowScalar as s16->s32 G_FPEXT
followed by s32->s64 G_FPEXT.

Differential Revision: https://reviews.llvm.org/D84030
2020-07-20 16:12:19 +02:00
Matt Arsenault 5cbd4e415e GlobalISel: Don't handle widenScalar for vector G_INSERT
This handling didn't make any sense for vectors.
2020-07-20 10:06:18 -04:00
Matt Arsenault 93311a9812 AMDGPU/GlobalISel: Fix custom lowering of llvm.trunc.f64 for SI
This was missing an operand from BFE and not erasing the original
instruction.
2020-07-20 10:06:18 -04:00
Elvina Yakubova b36a3e6140 [llvm-readobj] Update tests because of changes in llvm-readobj behavior
This patch updates tests using llvm-readobj and llvm-readelf, because
soon reading from stdin will be achievable only via a '-' as described
here: https://bugs.llvm.org/show_bug.cgi?id=46400. Patch with changes to
llvm-readobj behavior is here: https://reviews.llvm.org/D83704

Differential Revision: https://reviews.llvm.org/D83912

Reviewed by: jhenderson, MaskRay, grimar
2020-07-20 10:39:04 +01:00
Petar Avramovic ba938f6388 AMDGPU/GlobalISel: Legalize s16->s64 G_FPTOSI/G_FPTOUI
Add narrowScalarFor action.
Add narrow scalar for typeIndex == 0 for G_FPTOSI/G_FPTOUI.
Legalize using narrowScalarFor as s16->s32 G_FPTOSI/G_FPTOUI
followed by s32->s64 G_SEXT/G_ZEXT.

Differential Revision: https://reviews.llvm.org/D84010
2020-07-20 11:06:11 +02:00
Matt Arsenault c73df56966 AMDGPU/GlobalISel: Address some test fixmes that don't fail now 2020-07-18 10:54:39 -04:00
Matt Arsenault 918f3fc2c7 AMDGPU/GlobalISel: Fix test copy paste error 2020-07-18 10:09:01 -04:00
Dmitry Preobrazhensky 2e87acac9b [AMDGPU] Removed s_mov_regrd and mov_fed opcodes
These opcodes are not intended for public use.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D81659
2020-07-17 19:52:54 +03:00
Matt Arsenault 994fb86bc2 AMDGPU: Fix promoting f16 fpowi with legal f16 2020-07-17 11:29:05 -04:00
Jay Foad f05bce86af [AMDGPU] Add some missing check prefixes and tweak test
The test needed some extra ALU instructions to prevent it from being
memory bound.
2020-07-17 12:57:47 +01:00
Jay Foad 2dc3d1b313 [AMDGPU] Add some missing check prefixes 2020-07-17 12:56:29 +01:00
Jay Foad 760af7a074 [AMDGPU] Avoid splitting FLAT offsets in unsafe ways
As explained in the comment:

// For a FLAT instruction the hardware decides whether to access
// global/scratch/shared memory based on the high bits of vaddr,
// ignoring the offset field, so we have to ensure that when we add
// remainder to vaddr it still points into the same underlying object.
// The easiest way to do that is to make sure that we split the offset
// into two pieces that are both >= 0 or both <= 0.

In particular FLAT (as opposed to SCRATCH and GLOBAL) instructions have
an unsigned immediate offset field, so we can't use it to help split a
negative offset.

Differential Revision: https://reviews.llvm.org/D83394
2020-07-17 11:44:10 +01:00
Jay Foad 62fd7f767c [MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.

Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.

The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.

All this also applies to the BotHeightReduce heuristic in the bottom
zone.

Differential Revision: https://reviews.llvm.org/D72392
2020-07-17 11:02:13 +01:00
hsmahesha 4905536086 Revert "[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size"
This reverts commit cc9d693856.
2020-07-17 12:20:37 +05:30
Carl Ritson 3a18665748 [AMDGPU] Translate s_and/s_andn2 to s_mov in vcc optimisation
When SCC is dead, but VCC is required then replace s_and / s_andn2
with s_mov into VCC when mask value is 0 or -1.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D83850
2020-07-17 11:48:57 +09:00
Matt Arsenault 6c5b635e95 AMDGPU: Add a few more missing test for AGPR tuple copying 2020-07-16 15:53:11 -04:00
Matt Arsenault 10382285ac AMDGPU: Add missing tests for copyPhysReg AGPR tuples 2020-07-16 15:27:57 -04:00
Matt Arsenault 79f67cae91 AMDGPU: Rename add/sub with carry out instructions
The hardware has created a real mess in the naming for add/sub, which
have been renamed basically every generation. Switch the carry out
pseudos to have the gfx9/gfx10 names. We were using the original SI/CI
v_add_i32/v_sub_i32 names. Later targets reintroduced these names as
carryless instructions with a saturating clamp bit, which we do not
define. Do this rename so we can unambiguously add these missing
instructions.

The carry-in versions should also be renamed, but at least those had a
consistent _u32 name to begin with. The 16-bit instructions were also
renamed, but aren't ambiguous.

This does regress assembler error message quality in some cases. In
mismatched wave32/wave64 situations, this will switch from
"unsupported instruction" to "invalid operand", with the error
pointing at the wrong position. I couldn't quite follow how the
assembler selects these, but the previous behavior seemed accidental
to me. It looked like there was a partial attempt to handle this which
was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it
isn't used for anything).
2020-07-16 13:16:30 -04:00
Petar Avramovic 6850033ca6 AMDGPU/GlobalISel: Legalize s64->s16 G_SITOFP/G_UITOFP
Add widenScalar for TypeIdx == 0 for G_SITOFP/G_UITOFP.
Legailize, using widenScalar, as s64->s32 G_SITOFP/G_UITOFP
followed by s32->s16 G_FPTRUNC.

Differential Revision: https://reviews.llvm.org/D83880
2020-07-16 16:31:57 +02:00
Petar Avramovic 5658002b80 AMDGPU/GlobalISel: Select G_FREEZE
Select G_FREEZE in the same way that COPY is selected.

Differential Revision: https://reviews.llvm.org/D83031
2020-07-16 11:10:48 +02:00
Carl Ritson 5bf2a9dd40 [AMDGPU] Update VMEM scalar write hazard mitigation sequence
Using s_waitcnt_depctr 0xffe3 is potentially faster than v_nop.

Reviewed By: rampitec, foad

Differential Revision: https://reviews.llvm.org/D83872
2020-07-16 11:37:45 +09:00
dfukalov 7520393842 [NFC] Fixed typo in tests parameters
Summary:
llc reports `fp32-denormals` is not recognized. I guess it was intended to be
`-denormal-fp-math-f32={preserve-sign|ieee} -mattr=+mad-mac-f32-insts`

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: jvesely, nhaehnle, llvm-commits, kerbowa

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83883
2020-07-15 22:09:01 +03:00