Commit Graph

610 Commits

Author SHA1 Message Date
Matt Arsenault b60a2ae40e AMDGPU/GlobalISel: Support arguments with multiple registers
Handles structs used directly in argument lists.

llvm-svn: 366584
2019-07-19 14:29:30 +00:00
Matt Arsenault fecf43eba3 AMDGPU/GlobalISel: Rewrite lowerFormalArguments
This should now handle everything except structs passed as multiple
registers.

I think most of the packing logic should be handled by
handleAssignments, but I'm unclear on what the contract is for
multiple registers. This is copying how x86 handles this.

This does change the behavior of the test_sgpr_alignment0 amdgpu_vs
test. I don't think shader arguments should try to follow the
alignment, and registers need to be repacked. I also don't think it
matters, since I think the pointers are packed to the beginning of the
argument list anyway.

llvm-svn: 366582
2019-07-19 14:15:18 +00:00
Matt Arsenault 1022c0dfde AMDGPU: Decompose all values to 32-bit pieces for calling conventions
This is the more natural lowering, and presents more opportunities to
reduce 64-bit ops to 32-bit.

This should also help avoid issues graphics shaders have had with
64-bit values, and simplify argument lowering in globalisel.

llvm-svn: 366578
2019-07-19 13:57:44 +00:00
Matt Arsenault 0966dd0d69 GlobalISel: Handle widenScalar of arbitrary G_MERGE_VALUES sources
Extract the sources to the GCD of the original size and target size,
padding with implicit_def as necessary.

Also fix the case where the requested source type is wider than the
original result type. This was ignoring the type, and just using the
destination. Do the operation in the requested type and truncate back.

llvm-svn: 366367
2019-07-17 20:22:44 +00:00
Matt Arsenault 914a59cad8 GlobalISel: Handle more cases for widenScalar of G_MERGE_VALUES
Use an anyext to the requested type for the leftover operand to
produce a slightly wider type, and then truncate the final merge.

I have another implementation almost ready which handles arbitrary
widens, but I think it produces worse code in this example (which I
think is 90% due to not folding redundant copies or folding out
implicit_def users), so I wanted to add this as a baseline first.

llvm-svn: 366366
2019-07-17 20:22:38 +00:00
Nicolai Haehnle 8b7041a5c6 AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC
Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630

Reviewers: rampitec, mareko

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64807

Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc
llvm-svn: 366314
2019-07-17 11:22:57 +00:00
Matt Arsenault f8c8284455 AMDGPU/GlobalISel: Select G_ASHR
llvm-svn: 366257
2019-07-16 20:31:25 +00:00
Matt Arsenault e5b28b98e9 AMDGPU/GlobalISel: Select G_LSHR
llvm-svn: 366256
2019-07-16 20:25:43 +00:00
Matt Arsenault 1b69fd275d AMDGPU/GlobalISel: Select G_SHL
I think this manages to not break the DAG handling with the divergent
predicates because the stadalone divergent patterns end up with a
higher priority than the pattern on the instruction definition.

The 16-bit versions don't work yet.

llvm-svn: 366254
2019-07-16 20:15:30 +00:00
Matt Arsenault 2d10407719 AMDGPU/GlobalISel: Fix selection of private stores
llvm-svn: 366249
2019-07-16 19:27:44 +00:00
Matt Arsenault 7161fb0be5 AMDGPU/GlobalISel: Select private loads
llvm-svn: 366248
2019-07-16 19:22:21 +00:00
Matt Arsenault dad1f89210 AMDGPU/GlobalISel: Select flat stores
llvm-svn: 366246
2019-07-16 18:42:53 +00:00
Matt Arsenault 35c96598b1 AMDGPU/GlobalISel: Select flat loads
Now that the patterns use the new PatFrag address space support, the
only blocker to importing most load patterns is the addressing mode
complex patterns.

llvm-svn: 366237
2019-07-16 18:05:29 +00:00
Matt Arsenault 22c4a147a9 AMDGPU/GlobalISel: Fix test failures in release build
Apparently the check for legal instructions during instruction
select does not happen without an asserts build, so these would
successfully select in release, and fail in debug.

Make s16 and/or/xor legal. These can just be selected directly
to the 32-bit operation, as is already done in SelectionDAG, so just
make them legal.

llvm-svn: 366210
2019-07-16 14:28:30 +00:00
Matt Arsenault 66ee934440 AMDGPU/GlobalISel: Allow scalar s1 and/or/xor
If a 1-bit value is in a 32-bit VGPR, the scalar opcodes set SCC to
whether the result is 0. If the inputs are SCC, these can be copied to
a 32-bit SGPR to produce an SCC result.

llvm-svn: 366125
2019-07-15 20:20:18 +00:00
Matt Arsenault c8291c94f8 AMDGPU/GlobalISel: Select G_AND/G_OR/G_XOR
llvm-svn: 366121
2019-07-15 19:50:07 +00:00
Matt Arsenault ad19b50c00 AMDGPU/GlobalISel: Don't constrain source register of VCC copies
This is a hack until I come up with a better way of dealing with the
pseudo-register banks used for boolean values. If the use instruction
constrains the register, the selector for the def instruction won't
see that the bank was VCC. A 1-bit SReg_32 is could ambiguously have
been SCCRegBank or VCCRegBank in wave32.

This is necessary to successfully select branches with and and/or/xor
condition.

llvm-svn: 366120
2019-07-15 19:48:36 +00:00
Matt Arsenault e1b52f4180 AMDGPU/GlobalISel: Fix selecting vcc->vcc bank copies
The extra test change is correct, although how it arrives there is a
bug that needs work. With wave32, the test for isVCC ambiguously
reports true for an SCC or VCC source. A new allocatable pseudo
register class for SCC may be necesssary.

llvm-svn: 366119
2019-07-15 19:46:48 +00:00
Matt Arsenault 3bfdb54d88 AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC
llvm-svn: 366118
2019-07-15 19:45:49 +00:00
Matt Arsenault 18b7133843 AMDGPU/GlobalISel: Fix handling of sgpr (not scc bank) s1 to VCC
This was emitting a copy from a 32-bit register to a 64-bit.

llvm-svn: 366117
2019-07-15 19:44:07 +00:00
Matt Arsenault 6ed315f89b AMDGPU/GlobalISel: Custom legalize G_INSERT_VECTOR_ELT
llvm-svn: 366116
2019-07-15 19:43:04 +00:00
Matt Arsenault b0e04c018c AMDGPU/GlobalISel: Custom legalize G_EXTRACT_VECTOR_ELT
Turn the constant cases into G_EXTRACTs.

llvm-svn: 366115
2019-07-15 19:40:59 +00:00
Matt Arsenault 5dfd466032 AMDGPU/GlobalISel: Fix G_ICMP for wave32
llvm-svn: 366114
2019-07-15 19:39:31 +00:00
Matt Arsenault 434d664095 GlobalISel: Implement narrowScalar for vector extract/insert indexes
llvm-svn: 366113
2019-07-15 19:37:34 +00:00
Matt Arsenault 90bdfb3daf AMDGPU/GlobalISel: Widen vector extracts
llvm-svn: 366103
2019-07-15 18:31:10 +00:00
Matt Arsenault 53fa759ff5 AMDGPU/GlobalISel: Handle llvm.amdgcn.if.break
llvm-svn: 366102
2019-07-15 18:25:24 +00:00
Matt Arsenault b390121efb AMDGPU/GlobalISel: Select llvm.amdgcn.end.cf
llvm-svn: 366099
2019-07-15 18:18:46 +00:00
Matt Arsenault a65913e752 AMDGPU/GlobalISel: Select easy cases for G_BUILD_VECTOR
llvm-svn: 366087
2019-07-15 17:26:43 +00:00
Matt Arsenault cc02b17082 AMDGPU/GlobalISel: RegBankSelect for G_CONCAT_VECTORS
llvm-svn: 366086
2019-07-15 17:20:40 +00:00
Matt Arsenault 51a05d72ae AMDGPU: Drop remnants of byval support for shaders
Before 2018, mesa used to use byval interchangably with inreg, which
didn't really make sense. Fix tests still using it to avoid breaking
in a future commit.

llvm-svn: 365953
2019-07-12 20:12:17 +00:00
Matt Arsenault 6ce1b4fec5 GlobalISel: Legalization for G_FMINNUM/G_FMAXNUM
llvm-svn: 365658
2019-07-10 16:31:19 +00:00
Tom Stellard d0ba79fe7b AMDGPU/GlobalISel: Add support for wide loads >= 256-bits
Summary:
This adds support for the most commonly used wide load types:
<8xi32>, <16xi32>, <4xi64>, and <8xi64>

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57399

llvm-svn: 365586
2019-07-10 00:22:41 +00:00
Matt Arsenault b1843e130a GlobalISel: Implement lower for G_FCOPYSIGN
In SelectionDAG AMDGPU treated these as legal, but this was mostly
because the bitcasts required for FP types were painful. Theoretically
the bitpattern should eventually match to bfi, so don't bother trying
to get the patterns to import.

llvm-svn: 365583
2019-07-09 23:34:29 +00:00
Matt Arsenault 3f1a34546c AMDGPU/GlobalISel: Fix legality for G_BUILD_VECTOR
llvm-svn: 365575
2019-07-09 22:48:04 +00:00
Matt Arsenault 14a4495155 GlobalISel: Combine unmerge of merge with intermediate cast
This eliminates some illegal intermediate vectors when operations are
scalarized.

llvm-svn: 365566
2019-07-09 22:19:13 +00:00
Matt Arsenault fdd761af15 AMDGPU/GlobalISel: Prepare some tests for store selection
Mostsly these would fail due to trying to use SI with a flat
operation. Implementing global loads with MUBUF is more work than
flat, so these won't be handled in the initial load selection.

Others fail because store of s64 won't initially work, as the current
set of patterns expect everything to be turned into v2i32.

llvm-svn: 365493
2019-07-09 14:30:57 +00:00
Matt Arsenault 85ad662dfd AMDGPU/GlobalISel: Fix test
llvm-svn: 365491
2019-07-09 14:30:02 +00:00
Matt Arsenault 4dd5755d01 AMDGPU/GlobalISel: Legalize more concat_vectors
llvm-svn: 365488
2019-07-09 14:17:31 +00:00
Matt Arsenault 6bdb92d833 AMDGPU/GlobalISel: Improve regbankselect for icmp s16
Account for 64-bit scalar eq/ne when available.

llvm-svn: 365487
2019-07-09 14:13:09 +00:00
Matt Arsenault 8b8eee5904 AMDGPU/GlobalISel: Make s16 G_ICMP legal
llvm-svn: 365486
2019-07-09 14:10:43 +00:00
Matt Arsenault e6d10f97dd AMDGPU/GlobalISel: Select G_SUB
llvm-svn: 365484
2019-07-09 14:05:11 +00:00
Matt Arsenault 872f38be7e AMDGPU/GlobalISel: Select G_UNMERGE_VALUES
llvm-svn: 365483
2019-07-09 14:02:26 +00:00
Matt Arsenault 9b7ffc4e55 AMDGPU/GlobalISel: Select G_MERGE_VALUES
llvm-svn: 365482
2019-07-09 14:02:20 +00:00
Matt Arsenault 43cbca50e4 GlobalISel: Fix widenScalar for pointer typed G_MERGE_VALUES
llvm-svn: 365093
2019-07-03 23:08:06 +00:00
Amara Emerson cac1151845 [AArch64][GlobalISel] Overhaul legalization & isel or shifts to select immediate forms.
There are two main issues preventing us from generating immediate form shifts:
1) We have partial SelectionDAG imported support for G_ASHR and G_LSHR shift
immediate forms, but they currently don't work because the amount type is
expected to be an s64 constant, but we only legalize them to have homogenous
types.

To deal with this, first we introduce a custom legalizer to *only* custom legalize
s32 shifts which have a constant operand into a s64.

There is also an additional artifact combiner to fold zexts(g_constant) to a
larger G_CONSTANT if it's legal, a counterpart to the anyext version committed
in an earlier patch.

2) For G_SHL the importer can't cope with the pattern. For this I introduced an
early selection phase in the arm64 selector to select these forms manually
before the tablegen selector pessimizes it to a register-register variant.

Differential Revision: https://reviews.llvm.org/D63910

llvm-svn: 364994
2019-07-03 01:49:06 +00:00
Matt Arsenault 50be3481d4 AMDGPU/GlobalISel: Try generated matcher with intrinsics
llvm-svn: 364933
2019-07-02 14:52:16 +00:00
Matt Arsenault a8bff4b963 AMDGPU/GlobalISel: Select mul
llvm-svn: 364932
2019-07-02 14:52:14 +00:00
Matt Arsenault dd7ca4faa5 GlobalISel: Define GINodeEquiv for G_UMULH/G_SMULH
llvm-svn: 364931
2019-07-02 14:49:29 +00:00
Matt Arsenault 70a4d3f67c AMDGPU/GlobalISel: Fix G_GEP with mixed SGPR/VGPR operands
The register bank for the destination of the sample argument copy was
wrong. We shouldn't be constraining each source to the result register
bank. Allow constraining the original register to the right size.

llvm-svn: 364928
2019-07-02 14:40:22 +00:00
Matt Arsenault ed63399244 AMDGPU/GlobalISel: Select G_FENCE
Manually select to workaround tablegen emitter emitting checks for
G_CONSTANT.

llvm-svn: 364927
2019-07-02 14:17:38 +00:00
Matt Arsenault ce690544a6 GlobalISel: Add G_FENCE
The pattern importer is for some reason emitting checks for G_CONSTANT
for the immediate operands.

llvm-svn: 364926
2019-07-02 14:16:39 +00:00
Matt Arsenault c9f14f29f5 GlobalISel: Try to widen merges with other merges
If the requested source type an be used as a merge source type, create
a merge of merges. This avoids creating large, illegal extensions and
bit-ops directly to the result type.

llvm-svn: 364841
2019-07-01 19:36:10 +00:00
Matt Arsenault bae3636f96 AMDGPU/GlobalISel: Handle more input argument intrinsics
llvm-svn: 364836
2019-07-01 18:50:50 +00:00
Matt Arsenault 9e8e8c60fa AMDGPU/GlobalISel: Lower kernarg segment ptr intrinsics
llvm-svn: 364835
2019-07-01 18:49:01 +00:00
Matt Arsenault 756d81905f AMDGPU/GlobalISel: Legalize workgroup ID intrinsics
llvm-svn: 364834
2019-07-01 18:47:22 +00:00
Matt Arsenault e2c86cce3a AMDGPU/GlobalISel: Legalize workitem ID intrinsics
Tests don't cover the masked input path since non-kernel arguments
aren't lowered yet.

Test is copied directly from the existing test, with 2 additions.

llvm-svn: 364833
2019-07-01 18:45:36 +00:00
Matt Arsenault e15770aec4 AMDGPU/GlobalISel: Custom lower control flow intrinsics
Replace the brcond for the 2 cases that act as branches. For now
follow how the current system works, although I think we can
eventually get rid of the pseudos.

llvm-svn: 364832
2019-07-01 18:40:23 +00:00
Matt Arsenault 4073b33786 AMDGPU/GlobalISel: Handle 16-bit SALU min/max
This needs to be extended to s32, and expanded into cmp+select.  This
is relying on the fact that widenScalar happens to leave the
instruction in place, but this isn't a guaranteed property of
LegalizerHelper.

llvm-svn: 364831
2019-07-01 18:33:37 +00:00
Matt Arsenault 5a7d5111e5 AMDGPU/GlobalISel: Lower SALU min/max to cmp+select
Use a change observer to apply a register bank to the newly created
intermediate result register.

llvm-svn: 364830
2019-07-01 18:30:45 +00:00
Matt Arsenault 7f8c720939 AMDGPU/GlobalISel: Add tests for add legalization
llvm-svn: 364828
2019-07-01 18:26:47 +00:00
Matt Arsenault ef59cb6982 AMDGPU/GlobalISel: Legalize s16 add/sub/mul
If this is scalar, promote to s32. Use a new observer class to assign
the register bank of newly created registers.

llvm-svn: 364827
2019-07-01 18:18:55 +00:00
Matt Arsenault 9470bb262b AMDGPU/GlobalISel: Fix allowing non-boolean conditions for G_SELECT
The condition register bank must be scc or vcc so that a copy will be
inserted, which will be lowered to a compare.

Currently greedy unnecessarily forces using a VCC select.

llvm-svn: 364825
2019-07-01 18:13:12 +00:00
Matt Arsenault b2ea20eedd AMDGPU/GlobalISel: RegBankSelect for sendmsg/sendmsghalt
llvm-svn: 364819
2019-07-01 17:40:18 +00:00
Matt Arsenault 40d1faf38f AMDGPU/GlobalISel: Legalize s16 fcmp
llvm-svn: 364817
2019-07-01 17:35:53 +00:00
Matt Arsenault 1094e6a814 AMDGPU/GlobalISel: RegBankSelect for DS ordered add/swap
llvm-svn: 364811
2019-07-01 17:04:57 +00:00
Matt Arsenault 265059eaf6 AMDGPU/GlobalISel: RegBankSelect for amdgcn.writelane
llvm-svn: 364808
2019-07-01 16:41:36 +00:00
Matt Arsenault 0a52e9d026 AMDGPU/GlobalISel: Complete implementation of G_GEP
Also works around tablegen defect in selecting add with unused carry,
but if we have to manually select GEP, might as well handle add
manually.

llvm-svn: 364806
2019-07-01 16:34:48 +00:00
Matt Arsenault e1006259d8 AMDGPU/GlobalISel: Select G_PHI
llvm-svn: 364805
2019-07-01 16:32:47 +00:00
Matt Arsenault d810ff2588 AMDGPU/GlobalISel: Try to select VOP3 form of add
There are several things broken, but at least emit the right thing for
gfx9.

The import of the pattern with the unused carry out seems to not
work. Needs a special class for clamp, because OperandWithDefaultOps
doesn't really work.

llvm-svn: 364804
2019-07-01 16:27:32 +00:00
Matt Arsenault 62d64b0c30 AMDGPU/GlobalISel: RegBankSelect for readlane/readfirstlane
llvm-svn: 364801
2019-07-01 16:19:39 +00:00
Tom Stellard 9e9dd30de3 AMDGPU/GlobalISel: Implement select for 32-bit G_ADD
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58804

llvm-svn: 364797
2019-07-01 16:09:33 +00:00
Matt Arsenault 2ab25f9ceb AMDGPU/GlobalISel: Select G_BRCOND for vcc
llvm-svn: 364795
2019-07-01 16:06:02 +00:00
Matt Arsenault cda82f0bb6 AMDGPU/GlobalISel: Select G_FRAME_INDEX
llvm-svn: 364789
2019-07-01 15:48:18 +00:00
Matt Arsenault fdf36729c7 AMDGPU/GlobalISel: Make s16 select legal
This is easy to handle and avoids legalization artifacts which are
likely to obscure combines.

llvm-svn: 364787
2019-07-01 15:42:47 +00:00
Matt Arsenault 6464280eb0 AMDGPU/GlobalISel: Select G_BRCOND for scc conditions
llvm-svn: 364786
2019-07-01 15:39:27 +00:00
Matt Arsenault 1daad91af6 AMDGPU/GlobalISel: Tolerate copies with no type set
isVCC has the same bug, but isn't used in a context where it can cause
a problem.

llvm-svn: 364784
2019-07-01 15:23:04 +00:00
Matt Arsenault 4f64ade04c AMDGPU/GlobalISel: Select src modifiers
llvm-svn: 364782
2019-07-01 15:18:56 +00:00
Matt Arsenault 5bf850d52e AMDGPU/GlobalISel: Fix RegBankSelect for G_FCANONICALIZE
llvm-svn: 364768
2019-07-01 13:40:18 +00:00
Matt Arsenault b5fc94f3e7 AMDGPU/GlobalISel: Fix RegBankSelect for G_BUILD_VECTOR
llvm-svn: 364767
2019-07-01 13:40:17 +00:00
Matt Arsenault 89fc8bcdd6 AMDGPU/GlobalISel: Fail on store to 32-bit address space
llvm-svn: 364766
2019-07-01 13:37:39 +00:00
Matt Arsenault 3b7668ae4b AMDGPU/GlobalISel: Improve icmp selection coverage.
Select s64 eq/ne scalar icmp.

llvm-svn: 364765
2019-07-01 13:34:26 +00:00
Matt Arsenault c23149f612 AMDGPU/GlobalISel: RegBankSelect for WWM/WQM
llvm-svn: 364763
2019-07-01 13:30:12 +00:00
Matt Arsenault facf69e844 AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote
llvm-svn: 364762
2019-07-01 13:30:09 +00:00
Matt Arsenault 9f992c238a AMDGPU/GlobalISel: Fix scc->vcc copy handling
This was checking the size of the register with the value of the size,
which happens to be exec. Also fix assuming VCC is 64-bit to fix
wave32.

Also remove some untested handling for physical registers which is
skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical
copy source. I'm not sure if this should be trying to handle this
special case instead of dealing with this in copyPhysReg.

llvm-svn: 364761
2019-07-01 13:22:07 +00:00
Matt Arsenault 5dafcb9b11 AMDGPU/GlobalISel: Use and instead of BFE with inline immediate
Zext from s1 is the only case where this should do anything with the
current legal extensions.

llvm-svn: 364760
2019-07-01 13:22:06 +00:00
Matt Arsenault 01bb075c1f GlobalISel: Add GINodeEquiv for min/max
llvm-svn: 364759
2019-07-01 13:22:04 +00:00
Matt Arsenault fbf67d88de GlobalISel: Add DAG compat for G_FCANONICALIZE
llvm-svn: 364758
2019-07-01 13:22:00 +00:00
Matt Arsenault 7889d4ce66 AMDGPU/GlobalISel: Add some more tests for icmp select
llvm-svn: 364703
2019-06-29 00:55:16 +00:00
Matt Arsenault 0d45209757 AMDGPU/GlobalISel: RegBankSelect for update.dpp
llvm-svn: 364701
2019-06-29 00:44:36 +00:00
Matt Arsenault fd82cf4f4d AMDGPU/GlobalISel: RegBankSelect for atomic.inc/atomic.dec
llvm-svn: 364699
2019-06-29 00:39:20 +00:00
Matt Arsenault adb1f21e52 AMDGPU/GlobalISel: RegBankSelect for some DS intrinsics
llvm-svn: 364698
2019-06-29 00:33:13 +00:00
Matt Arsenault 5ea3c9adb2 AMDGPU/GlobalISel: RegBankSelect for icmp/fcmp intrinsics
llvm-svn: 364696
2019-06-29 00:28:52 +00:00
Matt Arsenault 6aafb3068f AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.fmas
llvm-svn: 364695
2019-06-29 00:25:53 +00:00
Matt Arsenault ade5162432 AMDGPU/GlobalISel: RegBankSelect for some simple leaf intrinsics
llvm-svn: 364694
2019-06-29 00:22:28 +00:00
Diana Picus c3dbe23977 [GlobalISel] Accept multiple vregs in lowerFormalArgs
Change the interface of CallLowering::lowerFormalArguments to accept
several virtual registers for each formal argument, instead of just one.
This is a follow-up to D46018.

CallLowering::lowerReturn was similarly refactored in D49660. lowerCall
will be refactored in the same way in follow-up patches.

With this change, we forward the virtual registers generated for
aggregates to CallLowering. Therefore, the target can decide itself
whether it wants to handle them as separate pieces or use one big
register. We also copy the pack/unpackRegs helpers to CallLowering to
facilitate this.

ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.

AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was
put into a s64 instead of a p0. Added a test-case which illustrates the
problem more clearly (it crashes without this patch) and fixed the
existing test-case to expect p0.

AMDGPU has been updated to unpack into the virtual registers for
kernels. I think the other code paths fall back for aggregates, so this
should be NFC.

Mips doesn't support aggregates yet, so it's also NFC.

x86 seems to have code for dealing with aggregates, but I couldn't find
the tests for it, so I just added a fallback to DAGISel if we get more
than one virtual register for an argument.

Differential Revision: https://reviews.llvm.org/D63549

llvm-svn: 364510
2019-06-27 08:54:17 +00:00
Matt Arsenault f4e51dd2cd AMDGPU/GlobalISel: Fix broken test
llvm-svn: 364316
2019-06-25 13:57:53 +00:00
Matt Arsenault d7ffa2a948 AMDGPU: Select G_SEXT/G_ZEXT/G_ANYEXT
llvm-svn: 364308
2019-06-25 13:18:11 +00:00
Matt Arsenault 25bc27965a AMDGPU/GlobalISel: Fix regbankselect for amdgcn.class
llvm-svn: 364262
2019-06-25 01:07:22 +00:00
Matt Arsenault 2100caf7f6 AMDGPU/GlobalISel: Add tests for regbankselect of v2s16 and/or/xor
llvm-svn: 364244
2019-06-24 22:21:02 +00:00
Matt Arsenault dbb6c03175 AMDGPU/GlobalISel: Select G_TRUNC
llvm-svn: 364215
2019-06-24 18:02:18 +00:00
Matt Arsenault 14d0b646b7 AMDGPU/GlobalISel: RegBankSelect for amdgcn.class
llvm-svn: 364214
2019-06-24 18:00:47 +00:00
Matt Arsenault 8fcd5ade3e AMDGPU/GlobalISel: Split VALU s64 G_ZEXT/G_SEXT in RegBankSelect
Scalar extends to s64 can use S_BFE_{I64|U64}, but vector extends need
to extend to the 32-bit half, and then to 64.

I'm not sure what the line should be between what RegBankSelect
handles, and what instruction select does, but for now I'm erring on
the side of RegBankSelect for future post-RBS combines.

llvm-svn: 364212
2019-06-24 17:54:12 +00:00
Matt Arsenault f8a841b88e AMDGPU/GlobalISel: Fix selecting G_IMPLICIT_DEF for s1
Try to fail for scc, since I don't think that should ever be produced.

llvm-svn: 364199
2019-06-24 16:24:03 +00:00
Matt Arsenault 5dbd9228c4 AMDGPU/GlobalISel: Fix RegBankSelect for s1 sext/zext/anyext
This needs different handling if the source is known to be a valid
condition or not. Handle turning it into shifts or a select during
regbankselect.

llvm-svn: 364186
2019-06-24 14:53:58 +00:00
Amara Emerson 8f25a021dd [AArch64][GlobalISel] Make s8 and s16 G_CONSTANTs legal.
We sometimes get poor code size because constants of types < 32b are legalized
as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the
localizer can no longer sink them (although it's possible to extend it to do so).

On AArch64 however s8 and s16 constants can be selected in the same way as s32
constants, with a mov pseudo into a W register. If we make s8 and s16 constants
legal then we can avoid unnecessary truncates, they can be CSE'd, and the
localizer can sink them as normal.

There is a caveat: if the user of a smaller constant has to widen the sources,
we end up with an anyext of the smaller typed G_CONSTANT. This can cause
regressions because of the additional extend and missed pattern matching. To
remedy this, there's a new artifact combiner to generate the wider G_CONSTANT
if it's legal for the target.

Differential Revision: https://reviews.llvm.org/D63587

llvm-svn: 364075
2019-06-21 16:43:50 +00:00
Matt Arsenault d5ce8ec778 AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.scale
llvm-svn: 363667
2019-06-18 12:23:42 +00:00
Matt Arsenault 5a321b899e GlobalISel: Use the original flags when lowering fneg to fsub
This was ignoring the flag on fneg, and using the source instruction's
flags. Also fixes tests missing from r358702.

Note the expansion itself isn't correct without nnan, but that should
be fixed separately.

llvm-svn: 363637
2019-06-17 23:48:43 +00:00
Matt Arsenault 3e140066bc GlobalISel: Ignore callsite attributes when picking intrinsic type
A target intrinsic may be defined as possibly reading memory, but the
call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic assumption
of the intrinsic definition, so the chain should still be used.

I fixed the same bug in SelectionDAG in r287593.

llvm-svn: 363580
2019-06-17 17:01:35 +00:00
Matt Arsenault a7f09f3c9e GlobalISel: Verify intrinsics
I keep using the wrong instruction when manually writing tests. This
really needs to check the number of operands, but I don't see an easy
way to do that right now.

llvm-svn: 363579
2019-06-17 17:01:32 +00:00
Tom Stellard 8b1c53b528 AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECT
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60640

llvm-svn: 363576
2019-06-17 16:27:43 +00:00
Matt Arsenault 9487278010 Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect"
This reapplies r363410, avoiding null dereference if there is no
AltRegBank.

llvm-svn: 363478
2019-06-15 00:33:26 +00:00
Mitch Phillips 0d44f129bb Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect"
This patch breaks UBSan build bots. See
https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for
a guide as to how to reproduce the error.

This reverts commit c2864c0de0.
This reverts rL363410.

llvm-svn: 363476
2019-06-14 23:45:34 +00:00
Matt Arsenault c2864c0de0 GlobalISel: Avoid producing Illegal copies in RegBankSelect
Avoid producing illegal register bank copies for reg_sequence and
phi. The default implementation assumes it is possible to pick any
operand's bank and use that for the result, introducing a copy for
operands with a different bank. This does not check for illegal
copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR
operand requires the result to be a VGPR.

The changes in getInstrMappingImpl aren't strictly necessary, since
AMDGPU now just bypasses this for reg_sequence/phi. This could be
replaced with an assert in case other targets run into this. It is
currently responsible for producing the error for unsatisfiable
copies, but this will be better served with a verifier check.

For phis, for now assume any undetermined operands must be
VGPRs. Eventually, this needs to be able to defer mapping these
operations. This also does not yet have a way to check for whether the
block is in a divergent region.

llvm-svn: 363410
2019-06-14 15:22:25 +00:00
Stanislav Mekhanoshin 000f9cc62a [AMDGPU] more gfx1010 tests. NFC.
llvm-svn: 363190
2019-06-12 18:44:11 +00:00
Matt Arsenault 61f6395fd0 AMDGPU/GlobalISel: Fix using illegal situations in tests
These were using illegal copies as the side effecting use, so make
them legal.

llvm-svn: 363168
2019-06-12 14:23:28 +00:00
Matt Arsenault e0a4da8c0a AMDGPU/GlobalISel: Add wave scratch offset argument
Avoids crashing in PEI in a future change.

llvm-svn: 362136
2019-05-30 19:33:18 +00:00
Matt Arsenault 9ffd8b5a6f AMDGPU/GlobalISel: Remove unnecesssary REQUIREs
This has been a mandatory part of the build for a while.

llvm-svn: 361956
2019-05-29 13:14:35 +00:00
Matt Arsenault 0f3ba44b57 AMDGPU/GlobalISel: Legality for integer min/max
llvm-svn: 361519
2019-05-23 17:58:48 +00:00
Matt Arsenault 2f29220d6d AMDGPU/GlobalISel: Implement s64->s64 [SU]ITOFP
llvm-svn: 361082
2019-05-17 23:05:18 +00:00
Matt Arsenault 02b5ca8cd1 GlobalISel: Implement lower for S64->S32 [SU]ITOFP
This is ported from the custom AMDGPU DAG implementation. I think this
is a better default expansion than what the DAG currently uses, at
least if the target has CTLZ.

This implements the signed version in terms of the unsigned
conversion, which is implemented with bit operations. SelectionDAG has
several other implementations that should eventually be ported
depending on what instructions are legal.

llvm-svn: 361081
2019-05-17 23:05:13 +00:00
Matt Arsenault a510b570c2 AMDGPU/GlobalISel: Legalize G_FCEIL
llvm-svn: 361028
2019-05-17 12:20:05 +00:00
Matt Arsenault 6aebcd5499 AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNC
llvm-svn: 361027
2019-05-17 12:20:01 +00:00
Matt Arsenault 6aafc5e19d AMDGPU/GlobalISel: Legalize G_FRINT
llvm-svn: 361026
2019-05-17 12:19:57 +00:00
Matt Arsenault 1448f5689e AMDGPU/GlobalISel: Legalize G_FCOPYSIGN
llvm-svn: 361025
2019-05-17 12:19:52 +00:00
Matt Arsenault 568f193847 AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.load
llvm-svn: 361023
2019-05-17 12:02:34 +00:00
Matt Arsenault a3b5a386fa AMDGPU/GlobalISel: Use subreg index instead of extra unmerge
This saves instructions and extra steps, but I'm not sure about
introducing subregister indexes at this point.

llvm-svn: 361022
2019-05-17 12:02:31 +00:00
Matt Arsenault b3dc73634c AMDGPU/GlobalISel: Use waterfall loop for buffer_load
This adds support for more complex waterfall loops that need to handle
operands > 32-bits, and multiple operands.

llvm-svn: 361021
2019-05-17 12:02:27 +00:00
Matt Arsenault a8f88c388f AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xor
Bool values should use the scc/vcc regbank since r350611.

llvm-svn: 360877
2019-05-16 12:06:41 +00:00
Stanislav Mekhanoshin a6322941ff [AMDGPU] gfx1010 VMEM and SMEM implementation
Differential Revision: https://reviews.llvm.org/D61330

llvm-svn: 359621
2019-04-30 22:08:23 +00:00
Matt Arsenault 2b6f76f05f AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sources
llvm-svn: 358894
2019-04-22 15:22:46 +00:00
Matt Arsenault 8f624abc1d GlobalISel: Legalize scalar G_EXTRACT sources
llvm-svn: 358892
2019-04-22 15:10:42 +00:00
Amara Emerson 946b1246d6 [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only.
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.

This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.

I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.

Compile time:
Program                                        base   cse    diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test     9.04   9.12  0.8%
test-suite...Mark/mafft/pairlocalalign.test     2.68   2.66 -0.7%
test-suite...-typeset/consumer-typeset.test     5.53   5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test         5.30   5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test        25.82  25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test     6.92   6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test    34.24  34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test           6.25   6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test     1.66   1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test         13.61  13.60 -0.0%
Geomean difference                                          -0.2%

Code size:
Program                                        base     cse      diff
test-suite...-typeset/consumer-typeset.test    1315632  1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test    1313892  1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test        1439504  1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test    2936980  2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test        3478276  3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test    8082868  8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test         3870380  3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test          1434904  1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test    764528   764528   0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test    782092   782092   0.0%
Geomean difference                                              -0.9%

Differential Revision: https://reviews.llvm.org/D60580

llvm-svn: 358369
2019-04-15 05:04:20 +00:00
Matt Arsenault 7187272b2b GlobalISel: Support legalizing G_CONSTANT with irregular breakdown
llvm-svn: 358109
2019-04-10 17:27:53 +00:00
Matt Arsenault 9e0eeba569 GlobalISel: Handle odd breakdowns for bit ops
llvm-svn: 358105
2019-04-10 17:07:56 +00:00
Tom Stellard 206b9927f8 AMDGPU/GlobalISel: Implement call lowering for shaders returning values
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, llvm-commits

Differential Revision: https://reviews.llvm.org/D57166

llvm-svn: 357964
2019-04-09 02:26:03 +00:00
Matt Arsenault 4ed6ccab9b AMDGPU/GlobalISel: Fix non-power-of-2 select
llvm-svn: 357762
2019-04-05 14:03:04 +00:00
Matt Arsenault 5fddf09187 AMDGPU/GlobalISel: Insert waterfall loop for vector indexing
The register index can only really be an SGPR. Lie that a VGPR index
is legal, and then rewrite the instruction in a waterfall loop to
handle the index.

llvm-svn: 357235
2019-03-29 03:54:56 +00:00
Amara Emerson 381188f1f3 [GlobalISel] Fix legalizer artifact combiner from crashing with invalid dead instructions.
The artifact combiners push instructions which have been marked for deletion
onto an list for the legalizer to deal with on return. However, for trunc(ext)
combines the combiner routine recursively calls itself. When it does this the
dead instructions list may not be empty, and the other combiners don't expect
to be dealing with essentially invalid MIR (multiple vreg defs etc).

This change fixes it by ensuring that the dead instructions are processed on
entry into tryCombineInstruction.

As a result, this fix exposed a few places in tests where G_TRUNC instructions
were not being deleted even though they were dead.

Differential Revision: https://reviews.llvm.org/D59892

llvm-svn: 357101
2019-03-27 17:47:42 +00:00
Matt Arsenault 733b8571b4 MIR: Freeze reserved regs after parsing everything
The AMDGPU implementation of getReservedRegs depends on
MachineFunctionInfo fields that are parsed from the YAML section. This
was reserving the wrong register since it was setting the reserved
regs before parsing the correct one.

Some tests were relying on the default reserved set for the assumed
default calling convention.

llvm-svn: 357083
2019-03-27 16:12:26 +00:00
Matt Arsenault b34afa311d GlobalISel: Fix RegBankSelect for REG_SEQUENCE
The AArch64 test was broken since the result register already had a
set register class, so this test was a no-op. The mapping verify call
would fail because the result size is not the same as the inputs like
in a copy or phi.

The AMDGPU testcases are half broken and introduce illegal VGPR->SGPR
copies which need much more work to handle correctly (same for phis),
but add them as a baseline.

llvm-svn: 356713
2019-03-21 20:45:36 +00:00
Matt Arsenault 2065206a9d AMDGPU: Don't look for constant in insert/extract_vector_elt regbankselect
The constantness shouldn't change the register bank choice. We also
don't need to restrict this to only indexing VGPRs, since it's
possible to index SGPRs (but SelectionDAG made using this
difficult). Allow directly indexing SGPRs when appropriate.

llvm-svn: 356611
2019-03-20 20:41:34 +00:00
Matt Arsenault 133716929c GlobalISel: Use multiple returns for intrinsic structs
This is consistent with what SelectionDAG does and is much easier to
work with than the extract sequence with an artificial wide register.

For the AMDGPU control flow intrinsics, this was producing an s128 for
the i64, i1 tuple return. Any legalization that should apply to a real
s128 value would badly obscure the direct values that need to be seen.

llvm-svn: 356147
2019-03-14 14:18:56 +00:00
David Stuttard 20ea21c6ed [AMDGPU] Add support for immediate operand for S_ENDPGM
Summary:
Add support for immediate operand in S_ENDPGM

Change-Id: I0c56a076a10980f719fb2a8f16407e9c301013f6

Reviewers: alexshap

Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, eraman, arphaman, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59213

llvm-svn: 355902
2019-03-12 09:52:58 +00:00
Petar Avramovic 0b17e59b5c [MIPS GlobalISel] NarrowScalar G_MUL
Narrow Scalar G_MUL for MIPS32.
Revisit NarrowScalar implementation in LegalizerHelper.
Introduce new helper function multiplyRegisters.
It performs generic multiplication of values held in multiple registers.
Generated instructions use only types NarrowTy and i1.
Destination can be same or two times size of the source.

Differential Revision: https://reviews.llvm.org/D58824

llvm-svn: 355814
2019-03-11 10:00:17 +00:00
Tom Stellard 33634d1b25 AMDGPU/GlobalISel: Implement select for G_INSERT
Re-commit r344310.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53116

llvm-svn: 355159
2019-03-01 00:50:26 +00:00
Tom Stellard 41f32196a0 AMDGPU/GlobalISel: Implement select for G_EXTRACT
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49714

llvm-svn: 355156
2019-02-28 23:37:48 +00:00
Matt Arsenault bf1bf706c8 AMDGPU/GlobalISel: Add regbankselect test for phis
Add baseline for future fixes. These mostly show how this is broken
and producing illegal situations.

llvm-svn: 355057
2019-02-28 00:52:36 +00:00
Matt Arsenault d3093c2f1f GlobalISel: Implement fewerElementsVector for phi
llvm-svn: 355048
2019-02-28 00:16:32 +00:00
Matt Arsenault 72bcf15dbf GlobalISel: Implement moreElementsVector for phi
llvm-svn: 355047
2019-02-28 00:01:05 +00:00
Matt Arsenault 752579736e RegBankSelect: Handle slightly more complex value mappings
Try to use concat_vectors. Also remove unnecessary assert on
pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers
for AMDGPU.

llvm-svn: 354828
2019-02-25 22:24:13 +00:00
Matt Arsenault f4bfe4cd17 AMDGPU/GlobalISel: Fix bit ops for non-power-of-2 sizes
llvm-svn: 354825
2019-02-25 21:32:48 +00:00
Matt Arsenault 82b103998b AMDGPU/GlobalISel: Clamp max implicit_def elements
llvm-svn: 354818
2019-02-25 20:46:06 +00:00
Matt Arsenault 2e0ee47712 AMDGPU/GlobalISel: Make phis legal
llvm-svn: 354592
2019-02-21 15:48:13 +00:00
Matt Arsenault b10fa8df3f AMDGPU/GlobalISel: Fix bit count ops for non-power-of-2 types
llvm-svn: 354587
2019-02-21 15:22:20 +00:00
Tom Stellard 79b5c3842b AMDGPU/GlobalISel: Move SMRD selection logic to TableGen
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52922

llvm-svn: 354516
2019-02-20 21:02:37 +00:00
Matt Arsenault 75e30c4d5d GlobalISel: Fix fewerElementsVector for ctlz with different result type
Also complete the set of related operations.

llvm-svn: 354480
2019-02-20 16:42:52 +00:00
Matt Arsenault c4d07554e4 GlobalISel: Implement moreElementsVector for g_insert results
llvm-svn: 354477
2019-02-20 16:11:22 +00:00
Matt Arsenault b4c95b338b GlobalISel: Implement moreElementsVector for select
llvm-svn: 354354
2019-02-19 17:03:09 +00:00
Matt Arsenault 4d88427a58 GlobalISel: Implement moreElementsVector for G_EXTRACT source
llvm-svn: 354348
2019-02-19 16:44:22 +00:00
Matt Arsenault 26b7e859ef GlobalISel: Implement moreElementsVector for bit ops
llvm-svn: 354345
2019-02-19 16:30:19 +00:00
Matt Arsenault fbe92a53d0 GlobalISel: Implement widenScalar for g_extract scalar results
llvm-svn: 354293
2019-02-18 22:39:27 +00:00
Matt Arsenault 9e5e868d95 AMDGPU/GlobalISel: Fix RegBankSelect for GEP.
This is basically a pointer typed add, so shouldn't be any different.
This was assuming everything was an SGPR, which is not true.

Also cleanup legality for GEP. I don't seem to be seeing the problem
the hack marking s64 as a legal pointer type the comment mentions.

llvm-svn: 354067
2019-02-14 22:24:28 +00:00
Matt Arsenault d3d496338e AMDGPU/GlobalISel: Handle split for 64-bit VALU select
llvm-svn: 354065
2019-02-14 21:58:12 +00:00
Matt Arsenault a180554020 AMDGPU/GlobalISel: Add more insert/extract testcases
llvm-svn: 353848
2019-02-12 15:04:03 +00:00
Matt Arsenault 00ccd13c73 AMDGPU/GlobalISel: Only make f16 constants legal on f16 targets
We could deal with it, but there's no real point.

llvm-svn: 353845
2019-02-12 14:54:55 +00:00
Matt Arsenault b2d245771f GlobalISel: Verify G_EXTRACT
llvm-svn: 353759
2019-02-11 22:12:43 +00:00
Matt Arsenault 18ec382698 GlobalISel: Implement moreElementsVector for implicit_def
llvm-svn: 353754
2019-02-11 22:00:39 +00:00
Matt Arsenault 9dba67f431 GlobalISel: Add G_FCANONICALIZE instruction
llvm-svn: 353719
2019-02-11 17:05:20 +00:00
Matt Arsenault ca9583a70a AMDGPU/GlobalISel: Fix broken tests
llvm-svn: 353559
2019-02-08 19:59:39 +00:00
Matt Arsenault b0a227049f AMDGPU/GlobalISel: Fix shift legalization for non-power-of-2
clampScalar doesn't do anything for non-power-of-2 in range.
There should probably be a combination rule to reduce the number
of matching rules.

llvm-svn: 353526
2019-02-08 15:06:24 +00:00
Matt Arsenault 0f2debb1c2 AMDGPU/GlobalISel: Fix non-power-of-2 implicit_def
llvm-svn: 353522
2019-02-08 14:46:27 +00:00
Petar Avramovic c98b26d326 [MIPS GlobalISel] Select any extending load and truncating store
Make behavior of G_LOAD in widenScalar same as for G_ZEXTLOAD and
G_SEXTLOAD. That is perform widenScalarDst to size given by the target
and avoid additional checks in common code. Targets can reorder or add
additional rules in LegalizeRuleSet for the opcode to achieve desired
behavior.

Select extending load that does not have specified type of extension
into zero extending load.

Select truncating store that stores number of bytes indicated by size
in MachineMemoperand.

Differential Revision: https://reviews.llvm.org/D57454

llvm-svn: 353520
2019-02-08 14:27:23 +00:00
Matt Arsenault dc88a2ce35 AMDGPU/GlobalISel: Don't use a copy in addrspacecast lowering
llvm-svn: 353516
2019-02-08 14:16:11 +00:00
Matt Arsenault a8b4339c2f AMDGPU/GlobalISel: Legalize addrspacecast
Use a placeholder constant for now on targets
that need the load from the queue ptr.

llvm-svn: 353497
2019-02-08 02:40:47 +00:00
Matt Arsenault fbec8fe93b GlobalISel: Implement narrowScalar for shift main type
This is pretty much directly ported from SelectionDAG. Doesn't include
the shift by non-constant but known bits version, since there isn't a
globalisel version of computeKnownBits yet.

This shows a disadvantage of targets not specifically which type
should be used for the shift amount. If type 0 is legalized before
type 1, the operations on the shift amount type use the wider type
(which are also less likely to legalize). This can be avoided by
targets specifying legalization actions on type 1 earlier than for
type 0.

llvm-svn: 353455
2019-02-07 19:37:44 +00:00
Matt Arsenault d914189a2e AMDGPU/GlobalISel: Restrict g_implicit_def legality
llvm-svn: 353452
2019-02-07 19:10:15 +00:00
Matt Arsenault d6212f9f1b GlobalISel: Fix artifact combiner constant legality checks for vectors
Since G_CONSTANT is illegal for vectors, this needs to check
what buildConstant will produce for a splat vector.

llvm-svn: 353449
2019-02-07 18:58:28 +00:00
Matt Arsenault 60b33fb6fc AMDGPU/GlobalISel: Don't use g_implicit_def in a few tests
llvm-svn: 353443
2019-02-07 18:33:22 +00:00
Matt Arsenault c0f7569aab AMDGPU/GlobalISel: Legalize fsqrt
llvm-svn: 353438
2019-02-07 18:14:39 +00:00
Matt Arsenault 93fdec739b AMDGPU/GlobalISel: Legalize some f16 operations
llvm-svn: 353436
2019-02-07 18:03:11 +00:00
Matt Arsenault c83b82363c GlobalISel: Implement fewerElementsVector for shifts
Introduce a new function which handles instructions with multiple type
indices, but have the same number of vector elements.

Also legalize v2s16 shifts when applicable.

llvm-svn: 353432
2019-02-07 17:38:00 +00:00
Matt Arsenault 7f09fd6b04 GlobalISel: Consolidate load/store legalization
The fewerElementsVectors implementation for load/stores
handles the scalar reduction case just as well, so drop
the redundant code in narrowScalar. This also introduces
support for narrowing irregular size breakdowns for
scalars.

llvm-svn: 353125
2019-02-05 00:26:12 +00:00
Matt Arsenault 81511e5428 GlobalISel: Implement narrowScalar for select
Don't handle vector conditions.

I think this can be merged in the future with
fewerElementsVectorSelect, although this becomes slightly tricky with
a vector condition.

llvm-svn: 353122
2019-02-05 00:13:44 +00:00
Matt Arsenault 24f14993e8 GlobalISel: Combine g_extract with g_merge_values
Try to use the underlying source registers.

This enables legalization in more cases where some irregular
operations are widened and others narrowed.

This seems to make the test_combines_2 AArch64 test worse, since the
MERGE_VALUES has multiple uses. Since this should be required for
legalization, a hasOneUse check is probably inappropriate (or maybe
should only be used if the merge is legal?).

llvm-svn: 353121
2019-02-04 23:41:59 +00:00
Matt Arsenault 1f795e2c2a GlobalISel: Enforce operand types for constants
A number of of tests were using imm operands, not cimm. Since CSE
relies on the exact ConstantInt* pointer used, and implicit
conversions are generally evil, also enforce the bitsize of the types.

llvm-svn: 353113
2019-02-04 23:29:31 +00:00
Matt Arsenault 3d6a49b0b9 GlobalISel: Fix not calling observer when legalizing bitcount ops
This was hiding bugs from never legalizing the source type.

llvm-svn: 353102
2019-02-04 22:26:33 +00:00
Matt Arsenault 10547230f3 AMDGPU/GlobalISel: Legalize select for v4s16
Also add some more select tests to help show future legalization
changes.

llvm-svn: 353045
2019-02-04 14:04:52 +00:00
Matt Arsenault 888aa5dedd GlobalISel: Implement widenScalar for G_UNMERGE_VALUES
For the scalar case only.

Also move the similar G_MERGE_VALUES handling to a separate function
and cleanup to make them look more similar.

llvm-svn: 352979
2019-02-03 00:07:33 +00:00
Matt Arsenault 0e5d856eb8 GlobalISel: Implement widenScalar for G_EXTRACT vector sources
Handle the basic element extract case.

llvm-svn: 352978
2019-02-02 23:56:00 +00:00
Matt Arsenault 58f9d3df97 AMDGPU/GlobalISel: Legalize icmp for pointer types
llvm-svn: 352976
2019-02-02 23:35:15 +00:00
Matt Arsenault 2065c94dd3 AMDGPU/GlobalISel: Legalize constant for pointer types
llvm-svn: 352975
2019-02-02 23:33:49 +00:00
Matt Arsenault 2491f82679 AMDGPU/GlobalISel: Legalize select for pointer types
llvm-svn: 352974
2019-02-02 23:31:50 +00:00
Matt Arsenault cbaada6bc1 GlobalISel: Legalization for inttoptr/ptrtoint
llvm-svn: 352973
2019-02-02 23:29:55 +00:00
Matt Arsenault c7bce739ad GlobalISel: Handle odd splits in fewerElementsVector for load/store
llvm-svn: 352720
2019-01-31 02:46:05 +00:00
Matt Arsenault d1bfc8d0c3 GlobalISel: Implement narrowScalar for bswap
llvm-svn: 352719
2019-01-31 02:34:03 +00:00
Matt Arsenault d5684f76e0 GlobalISel: Allow bitcount ops to have different result type
For AMDGPU the result is always 32-bit for 64-bit inputs.

llvm-svn: 352717
2019-01-31 02:09:57 +00:00
Matt Arsenault dc6c78596b GlobalISel: Implement fewerElementsVector for select
llvm-svn: 352601
2019-01-30 04:19:31 +00:00
Matt Arsenault f6cab16258 AMDGPU/GlobalISel: Fix clamping shifts with 16-bit insts
llvm-svn: 352599
2019-01-30 03:36:25 +00:00
Matt Arsenault 045bc9a4a6 GlobalISel: Support narrowScalar for uneven loads
llvm-svn: 352594
2019-01-30 02:35:38 +00:00
Matt Arsenault ccefbbd0f0 GlobalISel: Handle some odd splits in fewerElementsVector
Also add some quick hacks to AMDGPU legality for the tests.

llvm-svn: 352591
2019-01-30 02:22:13 +00:00
Matt Arsenault 92c5001136 GlobalISel: Handle more cases for widenScalar for G_STORE
llvm-svn: 352585
2019-01-30 02:04:31 +00:00
Matt Arsenault d45b03bb81 GlobalISel: Verify pointer casts
Not sure if the old AArch64 tests should be just
deleted or not.

llvm-svn: 352562
2019-01-29 23:29:00 +00:00
Matt Arsenault d8d193d5e2 GlobalISel: Partially implement widenScalar for MERGE_VALUES
llvm-svn: 352560
2019-01-29 23:17:35 +00:00
Matt Arsenault 18619afe1d GlobalISel: Fix narrowScalar for load/store with different mem size
This was ignoring the memory size, and producing multiple loads/stores
if the operand size was different from the memory size.

I assume this is the intent of not having an explicit G_ANYEXTLOAD
(although I think that would probably be better).

llvm-svn: 352523
2019-01-29 18:13:02 +00:00
Matt Arsenault cfca2a7adf GlobalISel: Don't reduce elements for atomic load/store
This is invalid for the same reason as in the narrowScalar handling
for load.

llvm-svn: 352334
2019-01-27 22:36:24 +00:00
Matt Arsenault fdfb7d78f1 GlobalISel: Verify load/store has a pointer input
I expected this to be automatically verified, but it seems
nothing uses that the type index was declared as a "ptype"

llvm-svn: 352319
2019-01-27 15:57:23 +00:00
Matt Arsenault 211e89d4dd GlobalISel: Implement narrowScalar for mul
llvm-svn: 352300
2019-01-27 00:52:51 +00:00
Matt Arsenault 2e5f900849 GlobalISel: fewerElementsVector for intrinsic_trunc/intrinsic_round
llvm-svn: 352298
2019-01-27 00:12:21 +00:00
Matt Arsenault 26a6c74fbe AMDGPU/GlobalISel: Legalize more bit ops
llvm-svn: 352295
2019-01-26 23:47:07 +00:00
Matt Arsenault 4d47594fc5 AMDGPU/GlobalISel: Widen small uaddo/usubo
llvm-svn: 352294
2019-01-26 23:44:51 +00:00
Matt Arsenault 3e08b772b3 AMDGPU/GlobalISel: Scalarize add/sub
llvm-svn: 352167
2019-01-25 04:53:57 +00:00
Matt Arsenault e6cebd0d69 GlobalISel: fewerElementsVector for more cast types
llvm-svn: 352166
2019-01-25 04:37:33 +00:00
Matt Arsenault 95fd95cfe0 GlobalISel: fewerElementsVector for a few more trivial ops
llvm-svn: 352165
2019-01-25 04:03:38 +00:00
Matt Arsenault 5d622fbcc1 AMDGPU/GlobalISel: Legalize smulh/umulh and scalarize mul
llvm-svn: 352162
2019-01-25 03:23:04 +00:00
Matt Arsenault 1b1e685f10 GlobalISel: Support fewerElementsVector for icmp/fcmp
Also legalize 64-bit compares for AMDGPU

llvm-svn: 352157
2019-01-25 02:59:34 +00:00
Matt Arsenault ca676343a9 GlobalISel: Implement fewerElementsVector for extensions
llvm-svn: 352155
2019-01-25 02:36:32 +00:00
Aditya Nandakumar 3ba0d94bce [GISel]: Change how CSE is enabled by default for each pass
https://reviews.llvm.org/D57178

Now add a hook in TargetPassConfig to query if CSE needs to be
enabled. By default this hook returns false only for O0 opt level but
this can be overridden by the target.
As a consequence of the default of enabled for non O0, a few tests
needed to be updated to not use CSE (by passing in -O0) to the run
line.

reviewed by: arsenm

llvm-svn: 352126
2019-01-24 23:11:25 +00:00
Matt Arsenault baa5d2e69c RegBankSelect: Support some more complex part mappings
llvm-svn: 352123
2019-01-24 22:47:04 +00:00
Matt Arsenault 4c5e8f51e7 AMDGPU/GlobalISel: Start selectively legalizing 16-bit operations
It might be a bit nicer to use the fancy .legalIf and co. predicates,
but this was requiring more boilerplate and disables the coverage
assertions.

llvm-svn: 351886
2019-01-22 22:00:19 +00:00
Matt Arsenault 736cfa9ffb AMDGPU/GlobalISel: Handle legality/regbanks for 32/64-bit shifts
llvm-svn: 351884
2019-01-22 21:51:38 +00:00
Matt Arsenault 6378629609 GlobalISel: Implement widen for extract_vector_elt elt type
llvm-svn: 351871
2019-01-22 20:38:15 +00:00
Matt Arsenault aebb2ee036 GlobalISel: Implement fewerElementsVector for basic FP ops
llvm-svn: 351866
2019-01-22 20:14:29 +00:00
Matt Arsenault 6614f852b6 GlobalISel: Support narrowing zextload/sextload
llvm-svn: 351856
2019-01-22 19:02:10 +00:00
Matt Arsenault a7cd83bc88 GlobalISel: Disallow vectors for G_CONSTANT/G_FCONSTANT
llvm-svn: 351853
2019-01-22 18:53:41 +00:00
Matt Arsenault fb67164ebc AMDGPU/GlobalISel: Legalize more fp<->int conversions
llvm-svn: 351767
2019-01-22 00:20:17 +00:00
Matt Arsenault 7ac79ed8f0 AMDGPU: Legalize more bitcasts
llvm-svn: 351700
2019-01-20 19:45:18 +00:00
Matt Arsenault 46ffe68d77 AMDGPU/GlobalISel: Really legalize exts from i1
There is a combine that was hiding these tests
not actually testing what they should be, although
they were producing the expected end result.

llvm-svn: 351698
2019-01-20 19:28:20 +00:00
Matt Arsenault 745fd9f547 GlobalISel: Implement widenScalar for basic FP ops
llvm-svn: 351696
2019-01-20 19:10:31 +00:00
Matt Arsenault cfd9e7f594 AMDGPU/GlobalISel: Legalize f32->f16 fptrunc
llvm-svn: 351695
2019-01-20 19:10:26 +00:00
Matt Arsenault ff6a9a275b AMDGPU/GlobalISel: Fix some crashs in g_unmerge_values/g_merge_values
This was crashing in the predicate function assuming the value
is a vector.

Copy more of what AArch64 uses. This probably needs more refinement
later, but I don't exactly understand what it means in some cases,
particularly since any legalization for these seems to be missing.

llvm-svn: 351693
2019-01-20 18:40:36 +00:00
Matt Arsenault 2a2086b830 AMDGPU/GlobalISel: Regbank select for fpext
llvm-svn: 351692
2019-01-20 18:35:41 +00:00
Matt Arsenault 24563ef628 AMDGPU/GlobalISel: Cleanup legality for extensions
llvm-svn: 351691
2019-01-20 18:34:24 +00:00
Matt Arsenault 96e4701401 AMDGPU/GlobalISel: Legalize more types for select
llvm-svn: 351599
2019-01-18 21:42:55 +00:00
Matt Arsenault 4599159ac3 AMDGPU/GlobalISel: Legalize illegal g_constant
llvm-svn: 351596
2019-01-18 21:33:50 +00:00
Matt Arsenault c765240060 AMDGPU/GlobalISel: Introduce vcc reg bank
I'm not entirely sure this is the correct thing
to do with the global isel philosophy, but I think
this is necessary to handle how differently SGPRs
are used normally vs. from a condition.

For example, it makes sense to allow a copy
from a VGPR to an SGPR, but it makes no sense
to allow a copy from VGPRs to SGPRs used as
select mask.

This avoids regbankselecting strange code with
a truncate feeding directly into a condition field.
Now a copy is forced from sgpr(s1) to vcc, which is
more sensible to handle.

Some of these issues could probably avoided with making enough
operations resulting in i1 illegal. I think we can't avoid
this register bank for legality.

For example, an i1 and where one source is from a truncate, and
one source is a compare needs some kind of copy inserted to
make sure both are in condition registers.

llvm-svn: 350611
2019-01-08 06:30:53 +00:00
Matt Arsenault a1515d2d33 AMDGPU/GlobalISel: Legalize concat_vectors
llvm-svn: 350598
2019-01-08 01:30:02 +00:00
Matt Arsenault adc40baa29 RegBankSelect: Fix copy insertion point for terminators
If a copy was needed to handle the condition of brcond, it was being
inserted before the defining instruction. Add tests for iterator edge
cases.

I find the existing code here suspect for the case where it's looking
for terminators that modify the register. It's going to insert a copy
in the middle of the terminators, which isn't allowed (it might be
necessary to have a COPY_terminator if anybody actually needs this).

Also legalize brcond for AMDGPU.

llvm-svn: 350595
2019-01-08 01:22:47 +00:00
Matt Arsenault ae6f1e07fc AMDGPU/GlobalISel: Disallow VGPR->SCC copies
This fixes using scalar adds when only the carry in is a VGPR
using greedy regbankselect.

llvm-svn: 350593
2019-01-08 01:13:20 +00:00
Matt Arsenault 68c668a5f3 AMDGPU/GlobalISel: RegBankSelect for carry-in
I'm not sure we should be allowing the truncate
to s1 for the inputs. It may be necessary to
create a new VCC reg bank.

llvm-svn: 350592
2019-01-08 01:09:09 +00:00
Matt Arsenault 2cc15b67b7 AMDGPU/GlobalISel: RegBankSelect for add/sub with carry out
llvm-svn: 350589
2019-01-08 01:03:58 +00:00
Matt Arsenault 299302fbe7 AMDGPU/GlobalISel: InstrMapping for G_UNMERGE_VALUES
llvm-svn: 350588
2019-01-08 00:46:19 +00:00
Matt Arsenault 369acb8470 AMDGPU: Remove VS/SV mappings from select
These would violate the constant bus restriction

llvm-svn: 350517
2019-01-07 13:21:36 +00:00
Matt Arsenault 3eae3c4590 AMDGPU/GlobalISel: RegBankSelect for amdgcn.wqm.vote
llvm-svn: 349882
2018-12-21 03:20:54 +00:00
Matt Arsenault f4c21c575a AMDGPU/GlobalISel: RegBankSelect for some fp ops
llvm-svn: 349880
2018-12-21 03:14:45 +00:00
Matt Arsenault bee2ad7185 AMDGPU/GlobalISel: Redo legality for build_vector
It seems better to avoid using the callback if possible since
there are coverage assertions which are disabled if this is used.

Also fix missing tests. Only test the legal cases since it seems
legalization for build_vector is quite lacking.

llvm-svn: 349878
2018-12-21 03:03:11 +00:00
Matt Arsenault 4339883710 AMDGPU: Make i1/i64/v2i32 and/or/xor legal
The 64-bit types do depend on the register bank,
but that's another issue to deal with later.

llvm-svn: 349716
2018-12-20 01:35:49 +00:00
Matt Arsenault 8cc98bee8a AMDGPU/GlobalISel: Fix ValueMapping tables for i1
This was incorrectly selecting SGPR for any i1 values,
e.g. G_TRUNC to i1 from a VGPR was still an SGPR.

llvm-svn: 349715
2018-12-20 01:33:43 +00:00
Matt Arsenault dff33c38e1 AMDGPU/GlobalISel: RegBankSelect for fp conversions
llvm-svn: 349709
2018-12-20 00:37:02 +00:00
Matt Arsenault 36d4092173 AMDGPU/GlobalISel: Legality/regbankselect for atomicrmw/atomic_cmpxchg
llvm-svn: 349708
2018-12-20 00:33:49 +00:00
Matt Arsenault b110e2277c AMDGPU/GlobalISel: Regbankselect for fsub
llvm-svn: 349608
2018-12-19 09:07:58 +00:00
Matt Arsenault c94e26c71d AMDGPU: Legalize/regbankselect frame_index
llvm-svn: 349468
2018-12-18 09:46:13 +00:00
Matt Arsenault c0ea221068 AMDGPU: Legalize/regbankselect fma
llvm-svn: 349467
2018-12-18 09:39:56 +00:00
Matt Arsenault e01e7c81f2 AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub
llvm-svn: 349463
2018-12-18 09:19:03 +00:00
Matt Arsenault 934e534c47 AMDGPU/GlobalISel: Legalize/regbankselect block_addr
llvm-svn: 349081
2018-12-13 20:34:15 +00:00
Matt Arsenault 577b9fc543 AMDGPU/GlobalISel: Legalize f64 fadd/fmul
llvm-svn: 349014
2018-12-13 08:27:48 +00:00
Matt Arsenault f38f483bef AMDGPU/GlobalISel: RegBankSelect some simple operations
llvm-svn: 349012
2018-12-13 08:23:51 +00:00
Matt Arsenault 7acf89a21a AMDGPU/GlobalISel: Test cleanups
Remove IR and registers sections

llvm-svn: 349011
2018-12-13 08:11:45 +00:00
Amara Emerson 5ec146046c [GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes.
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.

This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.

Differential Revisions: https://reviews.llvm.org/D53629

llvm-svn: 348788
2018-12-10 18:44:58 +00:00
Tom Stellard a894043910 Revert "AMDGPU/GlobalISel: Implement select for G_INSERT"
This reverts commit r344310.

The test case was failing on some bots.

llvm-svn: 344317
2018-10-11 23:36:46 +00:00
Tom Stellard 4733be6e7b AMDGPU/GlobalISel: Implement select for G_INSERT
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53116

llvm-svn: 344310
2018-10-11 22:49:54 +00:00
Tom Stellard 14d8807d9a AMDGPU/GlobalISel: Select amdgcn.cvt.pkrtz to 64-bit instructions
Summary: The 32-bit variants do not exist on VI+.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52958

llvm-svn: 343985
2018-10-08 17:49:29 +00:00
Tom Stellard 7c65078f04 AMDGPU/GlobalISel: Add support for G_INTTOPTR
Summary: This is a no-op.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52916

llvm-svn: 343839
2018-10-05 04:34:09 +00:00
Tom Stellard ffc6bd6f3d AMDGPU/GlobalISel: Define instruction mapping for G_SELECT
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D49737

llvm-svn: 341271
2018-09-01 02:41:19 +00:00
Tom Stellard 8adc86a7dc AMDGPU/GlobalISel: Define instruction mapping for G_INSERT
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49625

llvm-svn: 339491
2018-08-11 00:51:54 +00:00
Tom Stellard e9bdc5f1d8 AMDGPU/GlobalISel: Fix crash in regbankselect on non-power-of-2 types
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D49624

llvm-svn: 338102
2018-07-27 06:04:40 +00:00
Tom Stellard b7f19e6d1e AMDGPU/GlobalISel: Legalize G_INSERT
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49601

llvm-svn: 337798
2018-07-24 02:19:20 +00:00
Tom Stellard ac68471326 AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnun and @llvm.maxnum
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D46172

llvm-svn: 337056
2018-07-13 22:16:03 +00:00
Tom Stellard 390a5f4774 AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.exp
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45882

llvm-svn: 337046
2018-07-13 21:05:14 +00:00
Matt Arsenault 29f303799b AMDGPU/GlobalISel: Implement custom kernel arg lowering
Avoid using allocateKernArg / AssignFn. We do not want any
of the type splitting properties of normal calling convention
lowering.

For now at least this exists alongside the IR argument lowering
pass. This is necessary to handle struct padding correctly while
some arguments are still skipped by the IR argument lowering
pass.

llvm-svn: 336373
2018-07-05 17:01:20 +00:00
Tom Stellard eebbfc2809 AMDGPU/GlobalISel: Make IMPLICIT_DEF of all sizes < 512 legal.
Summary:
We could split sizes that are not power of two into smaller sized
G_IMPLICIT_DEF instructions, but this ends up generating
G_MERGE_VALUES instructions which we then have to handle in the instruction
selector.  Since G_IMPLICIT_DEF is really a no-op it's easier just to
keep everything that can fit into a register legal.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48777

llvm-svn: 336041
2018-06-30 04:09:44 +00:00
Matt Arsenault 8c4a35237a AMDGPU: Add pass to lower kernel arguments to loads
This replaces most argument uses with loads, but for
now not all.

The code in SelectionDAG for calling convention lowering
is actively harmful for amdgpu_kernel. It attempts to
split the argument types into register legal types, which
results in low quality code for arbitary types. Since
all kernel arguments are passed in memory, we just want the
raw types.

I've tried a couple of methods of mitigating this in SelectionDAG,
but it's easier to just bypass this problem alltogether. It's
possible to hack around the problem in the initial lowering,
but the real problem is the DAG then expects to be able to use
CopyToReg/CopyFromReg for uses of the arguments outside the block.

Exposing the argument loads in the IR also has the advantage
that the LoadStoreVectorizer can merge them.

I'm not sure the best approach to dealing with the IR
argument list is. The patch as-is just leaves the IR arguments
in place, so all the existing code will still compute the same
kernarg size and pointlessly lowers the arguments.

Arguably the frontend should emit kernels with an empty argument
list in the first place. Alternatively a dummy array could be
inserted as a single argument just to reserve space.

This does have some disadvantages. Local pointer kernel arguments can
no longer have AssertZext placed  on them as the equivalent !range
metadata is not valid on pointer  typed loads. This is mostly bad
for SI which needs to know about the known bits in order to use the
DS instruction offset, so in this case this is not done.

More importantly, this skips noalias arguments since this pass
does not yet convert this to the equivalent !alias.scope and !noalias
metadata. Producing this metadata correctly seems to be tricky,
although this logically is the same as inlining into a function which
doesn't exist. Additionally, exposing these loads to the vectorizer
may result in degraded aliasing information if a pointer load is
merged with another argument load.

I'm also not entirely sure this is preserving the current clover
ABI, although I would greatly prefer if it would stop widening
arguments and match the HSA ABI. As-is I think it is extending
< 4-byte arguments to 4-bytes but doesn't align them to 4-bytes.

llvm-svn: 335650
2018-06-26 19:10:00 +00:00
Matt Arsenault b1cc4f52ff AMDGPU/GlobalISel: Add support for llvm.amdgcn.kernarg.segment.ptr
Note a normal select test is not currently possible because this
relies on input registers tracked in SIMachineFunctionInfo which
are not currently serializable in MIR, but this does work end-to-end
from the IR.

llvm-svn: 335490
2018-06-25 16:17:48 +00:00
Matt Arsenault b3feccd7fa AMDGPU/GlobalISel: Fix G_IMPLICIT_DEF for pointers
llvm-svn: 335485
2018-06-25 15:42:12 +00:00
Tom Stellard 26fac0f8e1 AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D48196

llvm-svn: 335318
2018-06-22 02:54:57 +00:00
Tom Stellard 9a6535718e AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48195

llvm-svn: 335316
2018-06-22 02:34:29 +00:00
Tom Stellard 7712ee8891 AMDGPU/GlobalISel: Implement select() for COPY
Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46151

llvm-svn: 335315
2018-06-22 00:44:29 +00:00
Tom Stellard 3f1c6fe156 AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46150

llvm-svn: 335307
2018-06-21 23:38:20 +00:00
Tom Stellard a92847359a AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45907

llvm-svn: 334757
2018-06-14 19:26:37 +00:00
Tom Stellard 46bbbc33c0 AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46171

llvm-svn: 334665
2018-06-13 22:30:47 +00:00
Tom Stellard e182b28ae4 AMDGPU/GlobalISel: Implement select() for G_FCONSTANT
Summary: Also clean up G_CONSTANT selection.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46170

llvm-svn: 332379
2018-05-15 17:57:09 +00:00
Tom Stellard 655fdd3f82 AMDGPU/GlobalISel: Implement select() for >32-bit G_STORE
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D46153

llvm-svn: 332154
2018-05-11 23:12:49 +00:00
Tom Stellard dcc95e9385 AMDGPU/GlobalISel: Implement select() for 32-bit G_FPTOUI
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45883

llvm-svn: 332082
2018-05-11 05:44:16 +00:00
Tom Stellard 1e0edad4bb AMDGPU/GlobalISel: Implement select() for G_BITCAST s32 <--> <2 x s16>
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45881

llvm-svn: 332042
2018-05-10 21:20:10 +00:00
Tom Stellard 1dc90204bf AMDGPU/GlobalISel: Enable TableGen'd instruction selector
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, mgorny, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45994

llvm-svn: 332039
2018-05-10 20:53:06 +00:00
Roman Tereshin d2421f9445 [MachineVerifier][GlobalISel] Verifying generic extends and truncates
Making sure we don't truncate / extend pointers, don't try to change
vector topology or bitcast vectors to scalars or back, and most
importantly, don't extend to a smaller type or truncate to a large
one.

Reviewers: qcolombet t.p.northover aditya_nandakumar

Reviewed By: qcolombet

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46490

llvm-svn: 331718
2018-05-08 02:48:15 +00:00
Daniel Sanders acc008cb0c [globalisel] Remove redundant -global-isel option from tests that use -run-pass. NFC
As Roman Tereshin pointed out in https://reviews.llvm.org/D45541, the
-global-isel option is redundant when -run-pass is given. -global-isel sets up
the GlobalISel passes in the pass manager but -run-pass skips that entirely and
configures it's own pipeline.

llvm-svn: 331603
2018-05-05 21:19:59 +00:00
Tom Stellard 257882ff72 AMDGPU/GlobalISel: Fall-back to SelectionDAG for non-void functions
Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45843

llvm-svn: 330774
2018-04-24 21:29:36 +00:00
Tom Stellard c7709e1c29 AMDGPU/GlobalISel: Add support for amdgpu_ps calling convention
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45837

llvm-svn: 330767
2018-04-24 20:51:28 +00:00
Matt Arsenault fed0a45036 AMDGPU/GlobalISel: RegBankSelect for basic int ops
llvm-svn: 327843
2018-03-19 14:07:23 +00:00
Matt Arsenault abdc4f2dc7 AMDGPU/GlobalISel: Cleanup constant legality
llvm-svn: 327774
2018-03-17 15:17:48 +00:00
Matt Arsenault 685d1e8157 AMDGPU/GlobalISel: Basic G_GEP legality
llvm-svn: 327773
2018-03-17 15:17:45 +00:00
Matt Arsenault 85803366d6 AMDGPU/GlobalISel: Basic legality for load/store
llvm-svn: 327772
2018-03-17 15:17:41 +00:00
Matt Arsenault 7b9ed89dcf AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT|EXTRACT}_VECTOR_ELT
llvm-svn: 327269
2018-03-12 13:35:53 +00:00
Matt Arsenault c0aefd561e AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUES
llvm-svn: 327268
2018-03-12 13:35:49 +00:00
Matt Arsenault 503afda95f AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legal
llvm-svn: 327267
2018-03-12 13:35:43 +00:00
Matt Arsenault e31ab94e97 AMDGPU/GlobalISel: Add InstrMapping for G_EXTRACT
llvm-svn: 326715
2018-03-05 16:25:18 +00:00
Matt Arsenault 71272e6d4e AMDGPU/GlobalISel: Make some G_EXTRACTs legal
As far as I can tell legalization of weird sizes for the
output type isn't implemented.

llvm-svn: 326714
2018-03-05 16:25:15 +00:00
Matt Arsenault b9699c009d AMDGPU/GlobalISel: InstrMapping for G_ZEXT
llvm-svn: 326589
2018-03-02 16:55:37 +00:00
Matt Arsenault 1c1aab99ae AMDGPU/GlobalISel: InstrMapping for G_TRUNC
llvm-svn: 326588
2018-03-02 16:55:33 +00:00
Matt Arsenault ef8db767d7 AMDGPU/GlobalISel: Define InstrMappings for G_FCMP
Patch by Tom Stellard

llvm-svn: 326587
2018-03-02 16:53:15 +00:00