Unmerges have the same fundamental problem as G_TRUNC, and G_TRUNC
could be implemented in terms of G_UNMERGE_VALUES. Reducing the number
of elements in unmerge results ends up producing the original unmerge
type profile, so the artifact combiner needs to eliminate the
intermediate illegal registers. This avoids infinite looping in the
legalizer in a future change.
Assuming an unmerge has each result unmerged the same way, this ends
up producing a new unmerge of the source for every definition. I'm not
sure if the artifact combiner should either insert temporary merges
here and erase the original merge, or if the combiner should look at
uses from defs rather than defs from uses for unmerges.
In a few cases this regresses from using 16-bit shifts for 8-bit
values to using 32-bit shifts, but I think these can be legalized
later (the other legalization rules don't try very hard to use 16-bit
shifts either).
If we were to have an operation with an s16 def that needs to be
executed in a waterfall loop, not having s16 legal would place an
avoidable burden on RegBankSelect to widen it.
There seems to be an unrelated CSEMIRBuilder bug that was causing
expensive checks failures in this case. Hack the test to avoid this
problem for now until that's fixed.
Use pad with undef and unmerge with unused results. This is annoyingly
similar to several other places in LegalizerHelper, but they're all
slightly different.
The current set is an incomprehensible mess riddled with ordering
hacks for various limitations in the legalizer at the time of writing,
many of which have been fixed. This takes a very small step in
correcting this.
The core first change is to start checking for fully legal cases
first, rather than trying to figure out all of the actions that could
need to be performed. It's recommended to check the legal cases first
for faster legality checks in the common case. This still has a table
listing some common cases, but it needs measuring whether this really
helps or not.
More significantly, stop trying to allow any arbitrary type with a
legal bitwidth as a legal memory type, and start using the bitcast
legalize action for them. Allowing loads of these weird vector types
produced new burdens we don't need for handling all of the
legalization artifacts. Unlike the SelectionDAG handling, this is
still not casting 64 or 16-bit element vectors to 32-bit
vectors. These cases should still be handled by increasing/decreasing
the number of 16-bit elements. This is primarily to fix 8-bit element
vectors.
Another change is to stop trying to handle the load-widening based on
a higher alignment. We should still do this, but the way it was
handled wasn't really correct. We really need to modify the MMO's size
at the same time, and not just increase the result type. The
LegalizerHelper does not do this, and I think this would really
require a separate WidenMemory action (or to add a memory action
payload to the LegalizeMutation). These will now fail to legalize.
The structure of the legalizer rules makes writing concise rules here
difficult. It would be easier if the same function could answer the
query the query, and report the action to perform at the same
time. Instead these two are split into distinct predicate and action
functions. This is mostly tolerable for other cases, but the
load/store rules get pretty complicated so it's difficult to keep two
versions of these functions in sync.
The legalizer produces a lot of these, and they make reading legalized
MIR annoying. For some reason, this does seem to sometimes introduce
copies of implicit def, which is dumb.
If we have s_pack_* instructions, legalize this to
G_BUILD_VECTOR_TRUNC from s32 elements. This is closer to how how the
s_pack_* instructions really behave.
If we don't have s_pack_ instructions, expand this by creating a merge
to s32 and bitcasting. This expands to the expected bit operations. I
think this eventually should go in a new bitcast legalize action type
in LegalizerHelper.
We already directly emit the shift operations in RegBankSelect for the
vector case. This could possibly be cleaned up, but I also may want to
defer doing this expansion to selection anyway. I'll see about that
when I try to actually match VOP3P instructions.
This breaks the selection of the build_vector since tablegen doesn't
know how to match G_BUILD_VECTOR_TRUNC yet, so just xfail it for now.
The combine G_UNMERGE_VALUES with G_CONCAT_VECTORS used to only be performed
when the result type of the G_UNMERGE_VALUES was a vector type.
In other words, we were expecting that the G_UNMERGE_VALUES was effectively
the exact opposite of the G_CONCAT_VECTORS.
Lift that constraint by allowing any G_UNMERGE_VALUES to be combined
with any G_CONCAT_VECTORS (as long as the size of the different pieces
that we merge/unmerge match).
Differential Revision: https://reviews.llvm.org/D69288
Summary:
G_GEP is rather poorly named. It's a simple pointer+scalar addition and
doesn't support any of the complexities of getelementptr. I therefore
propose that we rename it. There's a G_PTR_MASK so let's follow that
convention and go with G_PTR_ADD
Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm
Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69734
There are 1024 bit register classes defined for AGPRs. Additionally
OpenCL defines vectors up to 16 x i64, and this helps those tests
legalize.
llvm-svn: 373350
r371901 was overeager and widenScalarDst() and the like in the legalizer
attempt to increment the insert point given in order to add new instructions
after the currently legalizing inst. In cases where the insertion point is not
exactly the current instruction, then callers need to de-compensate for the
behaviour by decrementing the insertion iterator before calling them. It's not
a nice state of affairs, for now just undo the problematic parts of the change.
llvm-svn: 372050
For some reason we sometimes insert new instructions one instruction before
the first non-PHI when legalizing. This can result in having non-PHI
instructions before PHIs, which mean that PHI elimination doesn't catch them.
Differential Revision: https://reviews.llvm.org/D67570
llvm-svn: 371901
Summary:
Currently, Legalizer aborts if it’s unable to legalize artifacts. However, it’s
possible to combine them after processing the rest of the instruction because
the legalization is likely to generate more artifacts that allow ArtifactCombiner
to combine away them.
Instead, move illegal artifacts to another list called RetryList and wait until all of the
instruction in InstList are legalized. After that, check if there is any new artifacts and
try to combine them again if that’s the case. If not, abort. The idea is similar to D59339,
but the approach is a bit different.
This patch fixes the issue described above, but the legalizer still may be unable to handle
some cases depending on when to legalize artifacts. So, in the long run, we probably need
a different legalization strategy that handles this dependency in a better way.
Reviewers: dsanders, aditya_nandakumar, qcolombet, arsenm, aemerson, paquette
Reviewed By: dsanders
Subscribers: jvesely, wdng, nhaehnle, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65894
llvm-svn: 369805
The x86 tests are now broken (in paticular add-scalar.ll now hits the
DAG fallback) due to not handling G_UADDO. The DAG x86 backend has a
custom lowering for this, so that will need to be implemented.
llvm-svn: 369673
This is necessary for handling <3 x s16> on AMDGPU, assuming this
should be handled as 2 separate legalization actions. The alternative
would be for fewerElementsVector to handle 3->2.
llvm-svn: 369547
AMDGPU sometimes has legal s16 and <2 x s16> operations, but all
registers are really 32-bit. An unmerge destination really should ben
widened to a 32-bit register. If widening a scalarizing vector with a
target size that matches the vector size, bitcast to integer and
extract the relevant bits with shifts.
I'm not sure if this is the right place for this. This could arguably
be part of widenScalar for the result. I also have a growing feeling
that we're missing a bitcast legalize action.
llvm-svn: 367604