Instead of using, for example, `dup v0.4s, wzr`, which transfers between
register files, use the more efficient `movi v0.4s, #0`.
Differential Revision: https://reviews.llvm.org/D41515
llvm-svn: 321824
Select G_PHI to PHI and manually constrain the result register. This is
very similar to how COPY is handled, so extract and reuse some of that
code.
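A minimal sketch of the shared shape (the helper name is hypothetical,
and this is not the exact code):

    // Rewrite the generic opcode in place, then constrain the def the
    // same way the COPY path does.
    I.setDesc(TII.get(TargetOpcode::PHI));
    Register DstReg = I.getOperand(0).getReg();
    const TargetRegisterClass *RC =
        pickRegClassForBank(DstReg, MRI, TRI, RBI); // hypothetical helper
    return RBI.constrainGenericRegister(DstReg, *RC, MRI) != nullptr;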
llvm-svn: 321797
We used to handle G_CONSTANT with pointer type by forcing the type of
the result register to s32 and then letting TableGen handle it.
Unfortunately, setting the type only works for generic virtual
registers that haven't yet been constrained to a register class (e.g.
those used only by a COPY later on). If the result register has already
been constrained as a use of a previously selected instruction, then
setting the type will assert.
It would be nice to be able to teach TableGen to select pointer
constants the same way as integer constants, but since it's such an edge
case (at the moment the only pointer constant that we're generally
interested in is 0, and that is mostly used for comparisons and selects,
which are also not supported by TableGen) it's probably not worth the
effort right now. Instead, handle pointer constants with some trivial
handwritten code.
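Roughly, the handwritten path looks like this (a sketch of the idea,
not the exact code):

    // Only pointer-typed constants take the handwritten path; integer
    // constants still go through the TableGen-erated selector.
    if (!MRI.getType(I.getOperand(0).getReg()).isPointer())
      break; // fall through to the TableGen-erated code
    // Rewrite the ConstantInt operand as a plain immediate so an
    // ordinary move-immediate can be selected for it.
    const ConstantInt *CImm = I.getOperand(1).getCImm();
    I.getOperand(1).ChangeToImmediate(CImm->getZExtValue());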
llvm-svn: 321793
Handle this in DAGCombiner::visitEXTRACT_VECTOR_ELT the same way we already do in SelectionDAG::getNode, and use APInt instead of getZExtValue.
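The idea, as a sketch (variable names approximate): compare the index
as an APInt, so an oversized constant never reaches getZExtValue, which
asserts when the value doesn't fit in 64 bits.

    // An out-of-range index makes the extract undefined, and uge() is
    // safe regardless of how wide the constant is.
    if (ConstEltNo->getAPIntValue().uge(VT.getVectorNumElements()))
      return DAG.getUNDEF(VT.getScalarType());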
This should also fix oss-fuzz #4910
llvm-svn: 321767
The work order was changed in r228186 from SCC order
to RPO with an arbitrary sorting function. The sorting
function attempted to move inner loop nodes earlier. This
was apparently relying on an assumption that every block
in a given loop (or at the same loop depth) would be seen before
visiting another loop. In the broken testcase, a block
outside of the loop was encountered before moving on to
another block in the same loop. The testcase would then
structurize such that one block's unconditional successor
could never be reached.
Revert to plain RPO for the analysis phase. This fixes cases where
edges that aren't really backedges were detected as backedges.
The processing phase does use another visited set, and
I'm unclear on whether the order there is as important.
An arbitrary order doesn't work, and triggers some infinite
loops. The reversed RPO list seems to work and is closer
to the order that was used before, minus the arbitrary
custom sorting.
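For reference, plain RPO here just means the standard traversal with no
sorting applied afterwards (a sketch; the per-block visit is
hypothetical):

    #include "llvm/ADT/PostOrderIterator.h"

    // Analysis phase: visit blocks in reverse post-order, unsorted.
    ReversePostOrderTraversal<Function *> RPOT(&F);
    for (BasicBlock *BB : RPOT)
      analyzeBlock(BB); // hypothetical analysis-phase visit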
A few of the changed tests now produce smaller code,
and a few are slightly worse looking.
llvm-svn: 321751
Currently we use SIGN_EXTEND in lowerMasksToReg as part of calling convention setup, but we don't require a specific value for the upper bits.
This patch changes it to ANY_EXTEND, which will be lowered as SIGN_EXTEND if it ends up sticking around.
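In DAG terms the change is just the node kind (a sketch, with Mask and
RegVT standing in for the mask value and destination type):

    // Before: DAG.getNode(ISD::SIGN_EXTEND, DL, RegVT, Mask)
    SDValue Ext = DAG.getNode(ISD::ANY_EXTEND, DL, RegVT, Mask);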
llvm-svn: 321746
Previously the code for handling G_SMULO didn't properly check for signed
multiply overflow, instead treating it the same as the unsigned G_UMULO.
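For a 32-bit multiply the two checks differ as follows (a standalone
illustration of the semantics, not the selector code):

    #include <cstdint>

    // Unsigned: overflow iff the high half of the double-width product
    // is nonzero.
    bool umulo32(uint32_t A, uint32_t B) {
      uint64_t P = (uint64_t)A * B;
      return (uint32_t)(P >> 32) != 0;
    }

    // Signed: overflow iff the high half isn't the sign-extension of
    // the low half.
    bool smulo32(int32_t A, int32_t B) {
      int64_t P = (int64_t)A * B;
      return (int32_t)(P >> 32) != ((int32_t)P >> 31);
    }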
Fixes PR35800.
llvm-svn: 321690
A call may have an intrinsic name but not have a valid intrinsic ID,
for example with llvm.invariant.group.barrier. If so, treat it as a
normal call like FastISel does.
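A sketch of the check (the fallback's name is hypothetical):

    // llvm.invariant.group.barrier has an intrinsic-style name but no
    // usable intrinsic ID here, so take the plain-call path.
    Intrinsic::ID ID = F ? F->getIntrinsicID() : Intrinsic::not_intrinsic;
    if (ID == Intrinsic::not_intrinsic)
      return translateAsNormalCall(CI); // hypothetical plain-call path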
llvm-svn: 321662
This is an extension of D31156, with the goal of allowing memcmp() == 0 expansion
for x86 to use 2 pairs of loads per block.
The memcmp expansion pass (formerly part of CGP) will generate this kind of pattern
with oversized integer compares, so we want to transform these into x86-specific vector
nodes before legalization splits things into scalar chunks.
See PR33325 for more details:
https://bugs.llvm.org/show_bug.cgi?id=33325
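For example, this is the kind of source pattern the expansion targets
(illustrative only):

    #include <cstring>

    // With this patch, an equality-only memcmp like this can expand
    // into blocks that each load and compare two pairs of values,
    // using vector nodes on x86 rather than scalar chunks.
    bool eq32(const char *A, const char *B) {
      return std::memcmp(A, B, 32) == 0;
    }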
Differential Revision: https://reviews.llvm.org/D41618
llvm-svn: 321656
Tests are updated to use fast-isel explicitly at -O0 instead of relying on it implicitly.
This change also allows an explicit -fast-isel option to override an
implicitly enabled global-isel. Otherwise -fast-isel would have no effect at -O0.
Differential Revision: https://reviews.llvm.org/D41362
llvm-svn: 321655
Our internal testing has discovered bugs in PPC builds.
I have forwarded reproduction instructions to the original author (Nirav).
llvm-svn: 321649
We can use a zmm move with zero masking for this. We already had patterns for using a masked move, but we didn't check for the zero-masking case separately.
llvm-svn: 321612
The CONCAT_VECTORS will be lowered to INSERT_SUBVECTOR later. In the modified cases this seems to be enough to trick a later DAG combine into running in a different order that allows the ANDs to be removed.
I'll admit this is a bit of a hack that happens to work, but using CONCAT_VECTORS is more consistent with other legalization code anyway.
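The legalization shape, roughly (a sketch with placeholder types):

    // Widen the narrow vector by concatenating with undef; this is
    // later lowered to an INSERT_SUBVECTOR.
    SDValue Widened = DAG.getNode(ISD::CONCAT_VECTORS, DL, WideVT,
                                  Narrow, DAG.getUNDEF(NarrowVT));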
llvm-svn: 321611