Commit Graph

3651 Commits

Author SHA1 Message Date
Matt Arsenault 2ca552322c AMDGPU/GlobalISel: Fix 8-byte aligned, 96-bit scalar loads
These are legal since we can do a 96-bit load on some subtargets, but
this is only for vector loads. If we can't widen the load, it needs to
be broken down once known scalar. For 16-byte alignment, widen to a
128-bit load.
2020-06-15 11:33:16 -04:00
Matt Arsenault dae9554b2b AMDGPU/GlobalISel: Workaround some load/store type selection patterns
The logic is written for what loads/stores should be selectable. There
are a set of cases that should be selectable, but due to missing MVTs
and/or selection patterns, will fail to select. I think eventually
load/store select patterns should ignore the type and only look at the
value size, but until that happens, bitcast these to equivalent i32
vectors.
2020-06-15 07:42:20 -04:00
Matt Arsenault 96229606f9 AMDGPU/GlobalISel: Use less artifical example to avoid abort=0
These were failing due to an unlegalizable G_CONCAT_VECTORS due to
registers with types that are naturally illegal.
2020-06-15 07:37:15 -04:00
Matt Arsenault 33e9086501 GlobalISel: Support lowering vector->vector G_BITCAST
Extract subvectors and cast to the result element type before
remerging.
2020-06-15 07:36:30 -04:00
Matt Arsenault df0c4bfc95 AMDGPU: Add some baseline immediate encoding test changes
Add some encoding checks and add a few new cases.
2020-06-14 13:29:35 -04:00
Matt Arsenault 804397dde6 AMDGPU: Do not bundle inline asm
Fixes bug 46285
2020-06-14 13:24:50 -04:00
Matt Arsenault fb51d508ee AMDGPU/GlobalISel: Select general case for G_PTRMASK 2020-06-14 13:12:29 -04:00
Matt Arsenault 46579471fd AMDGPU: Fix spill/restore of 192-bit registers
I tried to use an IR inline asm test, but that doesn't work since the
inline asm handling asserts without an MVT to use.
2020-06-14 13:12:01 -04:00
Qiu Chaofan f8ef7c99a0 [DAGCombiner] Require ninf for division estimation
Current implementation of division estimation isn't correct for some
cases like 1.0/0.0 (result is nan, not expected inf).

And this change exposes a potential infinite loop: we use
isConstOrConstSplatFP in combineRepeatedFPDivisors to look up if the
divisor is some constant. But it doesn't work after legalized on some
platforms. This patch restricts the method to act before LegalDAG.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D80542
2020-06-14 22:58:22 +08:00
Michael Liao ec02635d10 [amdgpu] Skip OR combining on 64-bit integer before legalizing ops.
Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81710
2020-06-12 15:22:38 -04:00
Matt Arsenault 350ee7fb3f GlobalISel: Fix not erasing old instruction in sitofp/uitofp lowering 2020-06-12 10:33:23 -04:00
Sebastian Neubauer 29a6ad94fd [AMDGPU] Add G16 support to image instructions
Add G16 feature for GFX10 and support A16 and G16 in GlobalISel.

Differential Revision: https://reviews.llvm.org/D76836
2020-06-12 11:26:31 +02:00
Matt Arsenault 7d913becfc AMDGPU/GlobalISel: Fix select of private <2 x s16> load 2020-06-11 19:25:25 -04:00
Matt Arsenault 27f8bd94cb AMDGPU/GlobalISel: Fix select of <8 x s64> scalar load 2020-06-11 19:09:43 -04:00
Matt Arsenault 2247072b65 AMDGPU/GlobalISel: Set insert point when emitting control flow pseudos
This was implicitly assuming the branch instruction was the next after
the pseudo. It's possible for another non-terminator instruction to be
inserted between the intrinsic and the branch, so adjust the insertion
point. Fixes a non-terminator after terminator verifier error (which
without the verifier, manifested itself as an infinite loop in
analyzeBranch much later on).
2020-06-11 18:53:26 -04:00
Petar Avramovic bd3d951b8b AMDGPU/GlobalISel: Fix lower for f64->f16 G_FPTRUNC
Put AND before ADD in LegalizerHelper::lowerFPTRUNC_F64_TO_F16
in order to match algorithm from AMDGPUTargetLowering::LowerFP_TO_FP16.

Differential Revision: https://reviews.llvm.org/D81666
2020-06-11 18:19:27 +02:00
Matt Arsenault 19b3b886b7 AMDGPU/GlobalISel: Fix porting error in 32-bit division
The baffling thing is this passed the OpenCL conformance test for
32-bit integer divisions, but only failed in the 32-bit path of
BypassSlowDivisions for the 64-bit tests.
2020-06-10 21:48:58 -04:00
Stanislav Mekhanoshin 09d325b20c AMDGPU/GlobalISel: cmp/select method for insert element
Differential Revision: https://reviews.llvm.org/D80754
2020-06-10 13:12:54 -07:00
Stanislav Mekhanoshin 6e1eee6034 [AMDGPU] Fixed promote alloca with ptr/int casts
There is an invalid cast produced when a pointee is a pointer
and the alloca type is cast to a pointer to int.

Differential Revision: https://reviews.llvm.org/D81606
2020-06-10 11:46:57 -07:00
Matt Arsenault ea1bd95411 AMDGPU/GlobalISel: Make G_IMPLICIT_DEF legality more consistent
Makes <6 x s16> legal, <4 x s8> illegal, and clamps the maximum size
to 1024.
2020-06-10 11:05:59 -04:00
Matt Arsenault 44b355f34b AMDGPU/GlobalISel: Add new baseline tests for bitcast legalization 2020-06-09 15:46:53 -04:00
hsmahesha 7410571ce9 Revert "[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size"
This reverts commit 40a632a335.
2020-06-09 19:27:17 +05:30
hsmahesha 40a632a335 [AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size
Summary:
Make use of both the - (1) clustered bytes and (2) cluster length, to decide on
the max number of mem ops that can be clustered. On an average, when loads
are dword or smaller, consider `5` as max threshold, otherwise `4`. This heuristic
is purely based on different experimentation conducted, and there is no analytical
logic here.

Reviewers: foad, rampitec, arsenm, vpykhtin

Reviewed By: foad, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81085
2020-06-09 14:09:14 +05:30
Sameer Sahasrabuddhe d8f651d3e8 [AMDGPU] Enable structurizer workarounds by default
Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D81211
2020-06-09 13:14:15 +05:30
David Sherwood cc8872400c [CodeGen] Ensure callers of CreateStackTemporary use sensible alignments
In two instances of CreateStackTemporary we are sometimes promoting
alignments beyond the stack alignment. I have introduced a new function
called getReducedAlign that will return the alignment for the broken
down parts of illegal vector types. For example, on NEON a <32 x i8>
type is made up of two <16 x i8> types - in this case the sensible
alignment is 16 bytes, not 32.

In the legalization code wherever we create stack temporaries I have
started using the reduced alignments instead for illegal vector types.

I added a test to

  CodeGen/AArch64/build-one-lane.ll

that tries to insert an element into an illegal fixed vector type
that involves creating a temporary stack object.

Differential Revision: https://reviews.llvm.org/D80370
2020-06-09 08:10:17 +01:00
Stanislav Mekhanoshin 295d1fe733 [AMDGPU] Custom lowering of i64 umulo/smulo
Differential Revision: https://reviews.llvm.org/D81430
2020-06-08 23:14:19 -07:00
dfukalov 6a31a9a543 [AMDGPU][NFC] Skip processing intrinsics that do not become real instructions
Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81260
2020-06-09 03:45:33 +03:00
Matt Arsenault 67b700480b AMDGPU/GlobalISel: Precommit regenerated check lines
The update_*test_checks scripts miss new stuff added at the end of
lines. Regenerate checks so the new mode register operands don't show
up in the diff of a future patch.
2020-06-08 12:47:45 -04:00
Jay Foad 275ecaae16 [AMDGPU] Cluster MIMG instructions
Differential Revision: https://reviews.llvm.org/D74035
2020-06-08 14:01:53 +01:00
Matt Arsenault 38fb446fc7 AMDGPU/GlobalISel: Fix test failure in release build
The annoying behavior where the output is different due to the
legality check struck again, plus the subtarget predicate wasn't
really correctly set for DS FP atomics.

Some of the FP min/max instructions seem to be in the gfx6/gfx7
manuals, but IIRC this might have been one of the cases where the
manual got ahead of the actual hardware support, but I've left these
as-is for now since the assembler tests seem to expect them.
2020-06-06 11:01:18 -04:00
Matt Arsenault bc20bdb9f9 AMDGPU/GlobalISel: Start rewriting load/store legality rules
The current set is an incomprehensible mess riddled with ordering
hacks for various limitations in the legalizer at the time of writing,
many of which have been fixed. This takes a very small step in
correcting this.

The core first change is to start checking for fully legal cases
first, rather than trying to figure out all of the actions that could
need to be performed. It's recommended to check the legal cases first
for faster legality checks in the common case. This still has a table
listing some common cases, but it needs measuring whether this really
helps or not.

More significantly, stop trying to allow any arbitrary type with a
legal bitwidth as a legal memory type, and start using the bitcast
legalize action for them. Allowing loads of these weird vector types
produced new burdens we don't need for handling all of the
legalization artifacts. Unlike the SelectionDAG handling, this is
still not casting 64 or 16-bit element vectors to 32-bit
vectors. These cases should still be handled by increasing/decreasing
the number of 16-bit elements. This is primarily to fix 8-bit element
vectors.

Another change is to stop trying to handle the load-widening based on
a higher alignment. We should still do this, but the way it was
handled wasn't really correct. We really need to modify the MMO's size
at the same time, and not just increase the result type. The
LegalizerHelper does not do this, and I think this would really
require a separate WidenMemory action (or to add a memory action
payload to the LegalizeMutation). These will now fail to legalize.

The structure of the legalizer rules makes writing concise rules here
difficult. It would be easier if the same function could answer the
query the query, and report the action to perform at the same
time. Instead these two are split into distinct predicate and action
functions. This is mostly tolerable for other cases, but the
load/store rules get pretty complicated so it's difficult to keep two
versions of these functions in sync.
2020-06-06 09:59:46 -04:00
Stanislav Mekhanoshin 5d62606f90 AMDGPU/GlobalISel: cmp/select method for extract element
Differential Revision: https://reviews.llvm.org/D80749
2020-06-05 12:57:40 -07:00
Matt Arsenault 45e1a22a92 GlobalISel: Make known bits/alignment API more consistent
Just computing the alignment makes sense without caring about the
general known bits, such as for non-integral pointers. Separate the
two and start calling into the TargetLowering hooks for frame indexes.

Start calling the TargetLowering implementation for FrameIndexes,
which improves the AMDGPU matching for stack addressing modes. Also
introduce a new hook for returning known alignment of target
instructions. For AMDGPU, it would be useful to report the known
alignment implied by certain intrinsic calls.

Also stop using MaybeAlign.
2020-06-05 14:57:22 -04:00
Matt Arsenault 43bb1c239c AMDGPU: Fix incorrect selection of buffer atomic fadd
There were additional standalone patterns for these nodes which were
missing the subtarget predicate.
2020-06-05 14:34:15 -04:00
Matt Arsenault 5ee2a1e476 AMDGPU/GlobalISel: Fix some broken YAML in MIR test
Different tools seem to be more or less strict with the markers
between different functions.
2020-06-05 13:59:46 -04:00
Matt Arsenault 3b5d4aa258 GlobalISel: Infer nofpexcept flag during selection for non-strict ops
Match SelectionDAG's behavior of adding nofpexcept to out instructions
that may raise fp exceptions that are selected from instructions that
do not.
2020-06-05 13:59:46 -04:00
Matt Arsenault 1657f0ebc2 AMDGPU: Fix overriding global FP atomic feature predicates
Global TableGen let override blocks are pretty dangerous and override
any local special cases. In this case, the broader HasFlatGlobalInsts
was overriding the more specific predicate for
FeatureAtomicFaddInsts. Make sure HasFlatGlobalInsts is implied by
FeatureAtomicFaddInsts, and make sure the right predicate is used.

One issue with independently setting the subtarget features on
incompatible targets is all of the encoding families do not define all
opcodes. This will hit an assert on gfx10 for example, since we set
the encoding independently based on the generation and not based on a
feature.
2020-06-04 17:50:38 -04:00
Matt Arsenault 651c36b508 AMDGPU: Select strict_fmul 2020-06-04 17:49:00 -04:00
Matt Arsenault 483d4daa5e AMDGPU: Select strict_fma
Like with strict_fadd, the legalization is scalarizing the v4f16 when
it should split.
2020-06-04 17:49:00 -04:00
Matt Arsenault ae26c064ce AMDGPU: Select strict_fadd 2020-06-04 17:49:00 -04:00
Matt Arsenault b71f574e7f AMDGPU: Add test for fdiv nofpexcept preservation
This logically belongs with 89d48ccabe,
but this order was needed to avoid regressions before adding
mayRaiseFPExceptions to relevant instructions.
2020-06-04 17:35:27 -04:00
Matt Arsenault 54a8a8d509 AMDGPU: Fix using unencodable instructions in tests
There are a number of MIR tests using instructions on subtargets where
they don't really exist. These are some of the easy cases that don't
require splitting up test functions.
2020-06-04 16:50:19 -04:00
Matt Arsenault fe0d5121fa AMDGPU/GlobalISel: Fix making LDS FP atomics legal on SI/CI 2020-06-04 16:50:19 -04:00
Matt Arsenault 16acc12e1d AMDGPU/GlobalISel: Fix trying to use wave32 for gfx9 test 2020-06-04 16:50:19 -04:00
Jay Foad 590964c835 [AMDGPU] More accurate gfx10 latencies
Differential Revision: https://reviews.llvm.org/D81012
2020-06-04 10:29:32 +01:00
Matt Arsenault ed5017e153 GlobalISel: Start defining strict FP instructions
The AMDGPU lowering for unconstrained G_FDIV sometimes needs to
introduce a mode switch in the middle, so it's helpful to have
constrained instructions available to legalize this. Right now nothing
is preventing reordering of the mode switch with the other
instructions in the expansion.
2020-06-03 20:46:37 -04:00
Matt Arsenault a1a93ca48a AMDGPU/GlobalISel: Handle uniform G_DYN_STACKALLOC 2020-06-03 19:56:07 -04:00
Matt Arsenault 66251f7e1d RegAllocFast: Record internal state based on register units
Record internal state based on register units. This is often more
efficient as there are typically fewer register units to update
compared to iterating over all the aliases of a register.

Original patch by Matthias Braun, but I've been rebasing and fixing it
for almost 2 years and fixed a few bugs causing intermediate failures
to make this patch independent of the changes in
https://reviews.llvm.org/D52010.
2020-06-03 16:51:46 -04:00
Simon Pilgrim ea80b40669 [DAG] SimplifyDemandedBits - peek through SHL if we only demand sign bits.
If we're only demanding the (shifted) sign bits of the shift source value, then we can use the value directly.

This handles SimplifyDemandedBits/SimplifyMultipleUseDemandedBits for both ISD::SHL and X86ISD::VSHLI.

Differential Revision: https://reviews.llvm.org/D80869
2020-06-03 16:11:54 +01:00
Matt Arsenault 070362e252 AMDGPU: Switch test to generated checks
This is was a very frustrating test to update manually.
2020-06-03 10:14:54 -04:00
Carl Ritson da33c96d47 [AMDGPU] Make SGPR spills exec mask agnostic
Explicitly set the exec mask for SGPR spills and reloads.
This fixes a bug where SGPR spills to memory could be incorrect
if the exec mask was 0 (or differed between spill and reload).

Additionally pack scalar subregisters (upto 16/32 per VGPR),
so that the majority of scalar types can be spilt or reloaded
with a simple memory access.  This should amortize some of the
additional overhead of manipulating the exec mask.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D80282
2020-06-03 12:34:26 +09:00
Matt Arsenault cdd3054255 AMDGPU: Fix a test to be more stable
The chained unconditional branches can be eliminated and it's not
relevant to the test.
2020-06-02 13:47:48 -04:00
Matt Arsenault 4b1f6cdbf9 AMDGPU: Don't run indexing mode switches with exec = 0
Add mode defs rather than special casing this like some of the other
instructions.
2020-06-02 13:47:48 -04:00
Matt Arsenault 452e0d9023 AMDGPU: Don't run mode switches with exec 0
These are scalar instructions that change vector instructions, so they
should not be executed without any active lanes.

The implementation of -amdgpu-skip-threshold also seem to be backwards
from expected, since decreasing it prevents removal.
2020-06-02 13:47:48 -04:00
Matt Arsenault 85117e286d AMDGPU: Fix not using scalar loads for global reads in shaders
The pass which infers when it's legal to load a global address space
as SMRD was only considering amdgpu_kernel, and ignoring the shader
entry type calling conventions.
2020-06-02 09:49:23 -04:00
Dominik Montada 052c962ced [GlobalISel] Combine scalar unmerge(trunc)
Summary:
Combine unmerge(trunc) to enable other merge combines.
Without this combine, the scalar unmerge(trunc(merge))
pattern cannot be combined and easily lead to
hard-to-legalize merge/unmerge artifacts.

Reviewed By: arsenm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79567
2020-06-02 08:56:18 +02:00
Matt Arsenault 20793b2aef AMDGPU: Fix test in code directory 2020-06-01 13:26:51 -04:00
Matt Arsenault 7ad36491ca AMDGPU: Fix alignment for dynamic allocas
The alignment value also needs to be scaled by the wave size.
2020-06-01 13:06:37 -04:00
Jay Foad 2768edfff1 [AMDGPU] Propagate fast-math flags when lowering FSIN and FCOS
Differential Revision: https://reviews.llvm.org/D80813
2020-05-31 05:21:55 +01:00
Jay Foad d4751f3556 [AMDGPU] Precommit tests for D80813 2020-05-31 05:21:55 +01:00
Changpeng Fang 234eba90f4 AMDGPU: Add setTruncStoreAction for vector i64 types made legal recently
Reviewers:
  rampitec, arsenm

Differential Revision:
  https://reviews.llvm.org/D80853
2020-05-30 20:45:27 -07:00
Carl Ritson d04147789f [AMDGPU] Remove assertion on S1024 SGPR to VGPR spill
Summary:
Replace an assertion that blocks S1024 SGPR to VGPR spill.
The assertion pre-dates S1024 and is not wave size dependent.

Reviewers: arsenm, sameerds, rampitec

Reviewed By: arsenm

Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80783
2020-05-30 11:16:19 +09:00
Matt Arsenault 0892a96a05 AMDGPU: Optimize s_setreg_b32 to s_denorm_mode/s_round_mode
This is a custom inserter because it was less work than teaching
tablegen a way to indicate that it is sometimes OK to have a no side
effect instruction in the output of a side effecting pattern.

The asm is needed to look like a read of the mode register to prevent
it from being deleted. However, there seems to be a bug where the mode
register def instructions are moved across the asm sideeffect by the
post-RA scheduler.

Another oddity is the immediate is formatted differently between
s_denorm_mode and s_round_mode.
2020-05-29 21:11:36 -04:00
Matt Arsenault 4f300d4996 AMDGPU: Add new baseline tests for setreg handling
Most of these should be identical and use a common prefix, but
update_llc_test_checks is failing to generate shared checks for some
reason.
2020-05-29 21:00:30 -04:00
Matt Arsenault f012c58abd AMDGPU: Move MIMG MMO check to verifier 2020-05-29 20:58:23 -04:00
Matt Arsenault 2484109378 AMDGPU/GlobalISel: Add boilerplate for inline asm lowering
Test mostly from minor adjustments to the AArch64 one.
2020-05-29 16:49:23 -04:00
Matt Arsenault 2d2627d47a AMDGPU: Remove fp-exceptions feature
This was never used, and the only thing it changed was removed in
284472be6d. The floating point mode is
also not a property of the subtarget.
2020-05-29 15:19:59 -04:00
Ehud Katz f881c7967d [tests] Fix AMDGPU test
Fix naming issue in test due to change D80399.
2020-05-29 22:15:26 +03:00
Stanislav Mekhanoshin a520294913 [AMDGPU] Regenrated urem/udiv global isel tests. NFC. 2020-05-29 12:08:47 -07:00
Stanislav Mekhanoshin f6a6de288b GlobalISel: fix CombinerHelper::matchEqualDefs()
This matcher was always returning true for the different
results of a same instruction.

Differential Revision:
2020-05-29 09:30:02 -07:00
Jay Foad 9e0b52e2e6 [AMDGPU] Remove duplicate test cases
The two "2sin" test cases were identical to the "sin_2x" test cases just
above.
2020-05-29 16:36:36 +01:00
Stanislav Mekhanoshin 6c824c81a9 AMDGPU/GlobalISel: precommit extractelement test. NFC. 2020-05-28 11:46:06 -07:00
Matt Arsenault 97f3f0bab0 AMDGPU: Add intrinsic for s_setreg
This will be more useful with fenv access implemented.
2020-05-28 14:26:38 -04:00
alex-t b726d071b4 [AMDGPU] Reject moving PHI to VALU if the only VGPR input originated from move immediate
Summary:
PHIs result register class is set to VGPR or SGPR depending on the cross block value divergence.
         In some cases uniform PHI need to be converted to return VGPR to prevent the oddnumber of moves values from VGPR to SGPR and back.
         PHI should certainly return VGPR if it has at least one VGPR input. This change adds the exception.
         We don't want to convert uniform PHI to VGPRs in case the only VGPR input is a VGPR to SGPR COPY and definition od the
         source VGPR in this COPY is move immediate.

  bb.0:

     %0:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
     %2:sreg_32 = .....

  bb.1:
     %3:sreg_32 = PHI %1, %bb.3, %2, %bb.1
     S_BRANCH %bb.3

  bb.3:
     %1:sreg_32 = COPY %0
     S_BRANCH %bb.2

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80434
2020-05-28 19:25:51 +03:00
Matt Arsenault 06019e3125 AMDGPU: Add missing test for s_denorm_mode scheduling
Forgot to add this file to 1a9e0d7092
2020-05-28 11:07:22 -04:00
Stanislav Mekhanoshin 7392bbc301 AMDGPU/GlobalISel: Fixed insert element for non-standard vectors
Differential Revision: https://reviews.llvm.org/D80653
2020-05-27 16:26:22 -07:00
Matt Arsenault 5e007fe998 AMDGPU: Support non-entry block static sized allocas
OpenMP emits these for some reason, so handle them. Assume these use
4096 bytes by default, with a flag to override this. Also change the
related stack assumption for calls to have a flag.
2020-05-27 18:46:10 -04:00
Stanislav Mekhanoshin 8aa81aaebe AMDGPU/GlobalISel: Fixed handling of non-standard vectors
We do not have register classes for all possible vector
sizes, so round it up for extract vector element.

Also fixes selection of G_MERGE_VALUES when vectors are
not a power of two.

This has required to refactor getRegSplitParts() in way
that it can handle not just power of two vectors.

Ideally we would like RegSplitParts to be generated by
tablegen.

Differential Revision: https://reviews.llvm.org/D80457
2020-05-27 15:44:09 -07:00
Michael Liao fa342b5c80 Enable `align <n>` to be used in the intrinsic definition.
- This allow us to specify the (minimal) alignment on an intrinsic's
  arguments and, more importantly, the return value.

Differential Revision: https://reviews.llvm.org/D80422
2020-05-27 16:38:18 -04:00
alex-t eb1092ada3 [AMDGPU] Fix for the lost CarryOut/CarryIn register operands in S_ADD/SUB_CO_PSEUDO.
Summary: This fixes the 5b898bddff bug when the carry-in and carry-out registers became lost in lowering S_ADD/SUB_CO_PSEUDO.

Reviewers: rampitec, arsenm

Reviewed By: arsenm

Subscribers: msearles, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80158
2020-05-27 22:41:04 +03:00
Matt Arsenault 4b4496312e AMDGPU: Start adding MODE register uses to instructions
This is the groundwork required to implement strictfp. For now, this
should be NFC for regular instructoins (many instructions just gain an
extra use of a reserved register). Regalloc won't rematerialize
instructions with reads of physical registers, but we were suffering
from that anyway with the exec reads.

Should add it for all the related FP uses (possibly with some
extras). I did not add it to either the gpr index mode instructions
(or every single VALU instruction) since it's a ridiculous feature
already modeled as an arbitrary side effect.

Also work towards marking instructions with FP exceptions. This
doesn't actually set the bit yet since this would start to change
codegen. It seems nofpexcept is currently not implied from the regular
IR FP operations. Add it to some MIR tests where I think it might
matter.
2020-05-27 14:47:00 -04:00
Matt Arsenault 07cd19efa2 AMDGPU: Fix dropping MI flags when rewriting instructions
All 3 passes that change instruction encodings were dropping MI
flags. This avoids scheduling regressions caused by setting
mayRaiseFPExceptions on FP instructions for non-strictfp functions.
2020-05-27 13:27:06 -04:00
Matt Arsenault 833996cef1 AMDGPU: Fix backwards s_cselect_* operands
The vector equivalent has backwards operands, but the scalar version
does not. The passes that use these hooks aren't enabled by default,
so this doesn't really change anything.
2020-05-27 09:26:09 -04:00
Matt Arsenault ef3e831226 GlobalISel: Basic legalization for G_PTRMASK 2020-05-26 21:20:30 -04:00
Stanislav Mekhanoshin 512e806a33 [AMDGPU] Bail alloca vectorization if GEP not found
Differential Revision: https://reviews.llvm.org/D80587
2020-05-26 13:59:49 -07:00
Matt Arsenault bb10fa3a53 AMDGPU: Fix wrong null value for private address space
I'm guessing this was a holdover from when 0 was an invalid stack
pointer, but surprised nobody has discovered this before.

Also don't allow offset folding for -1 pointers, since it looks weird
to partially fold this.
2020-05-26 16:35:13 -04:00
Matt Arsenault 50d4b22ca0 AMDGPU/GlobalISel: Fix assert on 16-bit G_EXTRACT results
I consider this to be a hack, since we probably should not mark any
16-bit extract as legal, and require all extracts to be done on
multiples of 32. There are quite a few more battles to fight in the
legalizer for sub-dword vectors, so just select this for now so we can
pass OpenCL conformance without crashing.

Also fix the same assert for G_INSERTs. Unlike G_EXTRACT there's not a
trivial way to select this so just fail on it.
2020-05-26 12:14:08 -04:00
Matt Arsenault 8bc03d2168 GlobalISel: Merge G_PTR_MASK with llvm.ptrmask intrinsic
Confusingly, these were unrelated and had different semantics. The
G_PTR_MASK instruction predates the llvm.ptrmask intrinsic, but has a
different format. G_PTR_MASK only allows clearing the low bits of a
pointer, and only a constant number of bits. The ptrmask intrinsic
allows an arbitrary mask. Replace G_PTR_MASK to match the intrinsic.

Only selects the cases that look like the old instruction. More work
is needed to select the general case. Also new legalization code is
still needed to deal with the case where the incoming mask size does
not match the pointer size, which has a specified behavior in the
langref.
2020-05-26 11:48:13 -04:00
Matt Arsenault 2dd7714b8d AMDGPU/GlobalISel: Don't select boolean phi by default
This is currently missing most of the hard parts to lower correctly,
so disable it for now. This fixes at least one OpenCL conformance test
and allows it to pass with fallback. Hide this behind an option for
now.
2020-05-26 11:01:21 -04:00
vpykhtin 92f3828dc5 [AMDGPU] Fix wait counts in the presence of 16bit subregisters
Differential Revision: https://reviews.llvm.org/D80033
2020-05-26 12:19:27 +03:00
Dmitry Preobrazhensky 77aec3b4c0 [AMDGPU][MC][GFX8+] Enabled clamp for v_add_u16, v_sub_u16 and v_subrev_u16
See https://bugs.llvm.org/show_bug.cgi?id=45926

Reviewers: arsenm, rampitec, vpykhtin

Differential Revision: https://reviews.llvm.org/D80430
2020-05-25 19:55:38 +03:00
Dmitry Preobrazhensky b087b91c91 [AMDGPU][CODEGEN] Added 'A' constraint for inline assembler
Summary: 'A' constraint requires an immediate int or fp constant that can be inlined in an instruction encoding.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D78494
2020-05-25 14:23:34 +03:00
Sanjay Patel 57bb4787d7 [Pass Manager] remove EarlyCSE as clean-up for VectorCombine
EarlyCSE was added with D75145, but the motivating test is
not regressed by removing the extra pass now. That might be
because VectorCombine altered the way it processes instructions,
or it might be from (re)moving VectorCombine in the pipeline.

The extra round of EarlyCSE appears to cost approximately
0.26% in compile-time as discussed in D80236, so we need some
evidence to justify its inclusion here, but we do not have
that (yet).

I suspect that between SLP and VectorCombine, we are creating
patterns that InstCombine and/or codegen are not prepared for,
but we will need to reduce those examples and include them as
PhaseOrdering and/or test-suite benchmarks.
2020-05-24 12:36:21 -04:00
Stanislav Mekhanoshin 62fb3fa6d9 [AMDGPU] Define 6 dword subregs
This prevents autogeneration of degenerate names for these.

Differential Revision: https://reviews.llvm.org/D80451
2020-05-22 13:53:29 -07:00
Sanjay Patel 024098ae53 [VectorCombine] set preserve alias analysis
As noted in D80236, moving the pass in the pipeline exposed this
shortcoming. Extra work to recalculate the alias results showed
up as a compile-time slowdown.
2020-05-22 16:25:16 -04:00
Sanjay Patel 6438ea45e0 [VectorCombine] position pass after SLP in the optimization pipeline rather than before
There are 2 known problem patterns shown in the test diffs here:
vector horizontal ops (an x86 specialization) and vector reductions.

SLP has greater ability to match and fold those than vector-combine,
so let SLP have first chance at that.

This is a quick fix while we continue to improve vector-combine and
possibly canonicalize to reduction intrinsics.

In the longer term, we should improve matching of these patterns
because if they were created in the "bad" forms shown here, then we
would miss optimizing them.

I'm not sure what is happening with alias analysis on the addsub test.
The old pass manager now shows an extra line for that, and we see an
improvement that comes from SLP vectorizing a store. I don't know
what's missing with the new pass manager to make that happen.
Strangely, I can't reproduce the behavior if I compile from C++ with
clang and invoke the new PM with "-fexperimental-new-pass-manager".

Differential Revision: https://reviews.llvm.org/D80236
2020-05-22 12:22:44 -04:00
Matt Arsenault 66fe60220c AMDGPU/GlobalISel: Fix masked control flow with fallthrough blocks
Unlike SelectionDAGBuilder, IRTranslator omits the unconditional
branch in fallthrough cases. Confusingly, the control flow pseudos
function in the opposite way the intrinsics are used, and the branch
targets always need to be swapped. We're inverting the target blocks,
so we need to figure out the old fallthrough block and insert a branch
to the original unconditional branch target.
2020-05-22 10:31:44 -04:00
Jon Roelofs 5a8db275f8 Revert "[llvm][test] Add COM: directives before colon-less non-CHECKs in comments. NFC"
This reverts commit 183d6af081.

Revert pending further consensus building: https://reviews.llvm.org/D79963#2050521
2020-05-22 05:36:15 -06:00
Dmitry Preobrazhensky 933ebc4078 [AMDGPU][MC][GFX8+] Enabled clamp for v_mul_i32_i24_e64 and v_mul_u32_u24_e64
See bug 45925: https://bugs.llvm.org/show_bug.cgi?id=45925

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D80287
2020-05-22 14:11:31 +03:00
Tim Renouf d13a508820 [AMDGPU] Fixed incorrect PAL metadata register naming
This only affects assembly and -filetype=asm codegen of PAL metadata.

Differential Revision: https://reviews.llvm.org/D78860

Change-Id: I7b822e1917bf7b403486820d31afc483be207652
2020-05-21 22:13:19 +01:00