This patch is the first in a series of patches to provide code gen for
doing compares in GPRs when the compare result is required in a GPR.
It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64
extensions. This first patch handles equality comparison on i32 operands with
the result sign or zero extended.
Differential Revision: https://reviews.llvm.org/D31847
llvm-svn: 302810
manages to form a VSELECT with a non-i1 element type condition. Those
are technically allowed in SDAG (at least, the generic type legalization
logic will form them and I wouldn't want to try to audit everything te
preclude forming them) so we need to be able to lower them.
This isn't too hard to implement. We mark VSELECT as custom so we get
a chance in C++, add a fast path for i1 conditions to get directly
handled by the patterns, and a fallback when we need to manually force
the condition to be an i1 that uses the vptestm instruction to turn
a non-mask into a mask.
This, unsurprisingly, generates awful code. But it at least doesn't
crash. This was actually impacting open source packages built with LLVM
for AVX-512 in the wild, so quickly landing a patch that at least stops
the immediate bleeding.
I think I've found where to fix the codegen quality issue, but less
confident of that change so separating it out from the thing that
doesn't change the result of any existing test case but causes mine to
not crash.
llvm-svn: 302785
This is the same as r292827 for AArch64: we widen 8- and 16-bit ADD, SUB
and MUL to 32 bits since we only have TableGen patterns for 32 bits.
See the commit message for r292827 for more details.
At this point we could just remove some of the tests for regbankselect
and instruction-select, since we're not going to see any narrow
operations at those levels anymore. Instead I decided to update them
with G_ANYEXT/G_TRUNC operations, so we can validate the full sequences
generated by the legalizer.
llvm-svn: 302782
G_ANYEXT can be introduced by the legalizer when widening scalars. Add
support for it in the register bank info (same mapping as everything
else) and in the instruction selector.
When selecting it, we treat it as a COPY, just like G_TRUNC. On this
occasion we get rid of some assertions in selectCopy so we can reuse it.
This shouldn't be a problem at the moment since we're not supporting any
complicated cases (e.g. FPR, different register banks). We might want to
separate the paths when we do.
llvm-svn: 302778
This reverts r302712.
The change fails with ASAN enabled:
ERROR: AddressSanitizer: use-after-poison on address ... at ...
READ of size 2 at ... thread T0
#0 ... in llvm::SDNode::getNumValues() const <snip>/include/llvm/CodeGen/SelectionDAGNodes.h:855:42
#1 ... in llvm::SDNode::hasAnyUseOfValue(unsigned int) const <snip>/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7270:3
#2 ... in llvm::SDValue::use_empty() const <snip> include/llvm/CodeGen/SelectionDAGNodes.h:1042:17
#3 ... in (anonymous namespace)::DAGCombiner::MergeConsecutiveStores(llvm::StoreSDNode*) <snip>/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12944:7
Reviewers: niravd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33081
llvm-svn: 302746
Summary:
Allow consecutive stores whose values come from consecutive loads to
merged in the presense of other uses of the loads. Previously this was
disallowed as in general the merged load cannot be shared with the
other uses. Merging N stores into 1 may cause as many as N redundant
loads. However in the context of caching this should have neglible
affect on memory pressure and reduce instruction count making it
almost always a win.
Fixes PR32086.
Reviewers: spatel, jyknight, andreadb, hfinkel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30471
llvm-svn: 302712
For stores, check if the stored value is defined by a floating point
instruction and if yes, we return a default mapping with FPR instead
of GPR.
llvm-svn: 302679
The new experimental reduction intrinsics can now be used, so I'm enabling this
for AArch64. We will need this for SVE anyway, so it makes sense to do this for
NEON reductions as well.
The existing code to match shufflevector patterns are replaced with a direct
lowering of the reductions to AArch64-specific nodes. Tests updated with the
new, simpler, representation.
Differential Revision: https://reviews.llvm.org/D32247
llvm-svn: 302678
Summary:
When trying to figure out if MBB could fallthrough to ToMBB (possibly by
falling through a bunch of other MBBs) we didn't actually check if there
was fallthrough between the last two blocks in the chain.
Reviewers: kparzysz, iteratee, MatzeB
Reviewed By: kparzysz, iteratee
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D32996
llvm-svn: 302650
This method must return a valid register class, or the list-ilp isel
scheduler will crash. For MVT::Untyped nullptr was previously returned, but
now ADDR128BitRegClass is returned instead. This is needed just as long as
list-ilp (and probably also list-hybrid) is still there.
Review: Ulrich Weigand, A Trick
https://reviews.llvm.org/D32802
llvm-svn: 302649
This pass uses a new target hook to decide whether or not to expand a particular
intrinsic to the shuffevector sequence.
Differential Revision: https://reviews.llvm.org/D32245
llvm-svn: 302631
This is a follow-up to r302611, which moved an -O0 computation of DT
from SDAGISel to TwoAddress.
Don't use it here either, and avoid computing it completely. The only
use was forwarding the analysis as an optional argument to utility
functions.
Differential Revision: https://reviews.llvm.org/D32766
llvm-svn: 302612
Before r247167, the pass manager builder controlled which AA
implementations were used, exporting them all in the AliasAnalysis
analysis group.
Now, AAResultsWrapperPass always uses BasicAA, but still uses other AA
implementations if made available in the pass pipeline.
But regardless, SDAGISel is required at O0, and really doesn't need to
be doing fancy optimizations based on useful AA results.
Don't require AA at CodeGenOpt::None, and only use it otherwise.
This does have a functional impact (and one testcase is pessimized
because we can't reuse a load). But I think that's desirable no matter
what.
Note that this alone doesn't result in less DT computations: TwoAddress
was previously able to reuse the DT we computed for SDAG. That will be
fixed separately.
Differential Revision: https://reviews.llvm.org/D32766
llvm-svn: 302611
We currently require SCEV, which requires DT/LI. Those are expensive to
compute, but the pass only runs for functions that have the safestack
attribute.
Compute DT/LI to build SCEV lazily, only when the pass is actually going
to transform the function.
Differential Revision: https://reviews.llvm.org/D31302
llvm-svn: 302610
This should hopefully makes changes to the O0 pipeline obvious; it's
easy to require expensive passes, and this helps make informed
decisions.
Case in point: in the few weeks separating the time when I initially
wrote this patch to the time when I committed, the test regressed as
r302103 added another use of DT!
llvm-svn: 302608
Summary: computeKnownBitsForTargetNode was not defined for Lanai which resulted in additional AND's with 0x1 for the output of SETCC instructions.
Reviewers: eliben, majnemer
Reviewed By: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29605
llvm-svn: 302568
This patch adds more patterns that a reasonable person might write that can be compiled to BZHI.
This adds support for
(~0U >> (32 - b)) & a;
and
a << (32 - b) >> (32 - b);
This was inspired by the code in APInt::clearUnusedBits.
This can pass an index of 32 to the bzhi instruction which a quick test of Haswell hardware shows will not mask any bits. Though the description text in the Intel manual says the "index is saturated to OperandSize-1". The pseudocode in the same manual indicates no bits will be zeroed for this case.
I think this is still missing cases where the subtract portion is an 8-bit operation.
Differential Revision: https://reviews.llvm.org/D32616
llvm-svn: 302549
for scalar masked instructions only the lower bit of the mask is relevant. so for constant masks we should either do an unmasked operation or no operation, depending on the value of the lower bit.
This patch handles cases where the lower bit is '1'.
Differential Revision: https://reviews.llvm.org/D32805
llvm-svn: 302546
The modified tests should test the masked intrinsics.
Currently the mask is constant, which with a future patch (https://reviews.llvm.org/D32805) will cause the intrinsics to be replaced with an unmasked version.
This patch changes the constant mask to be a variable one.
llvm-svn: 302529
Using arguments with attribute inalloca creates problems for verification
of machine representation. This attribute instructs the backend that the
argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However CALLSEQ_END still keeps total frame size, as caller can
be responsible for cleanup of entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame size and the difference is treated by
MachineVerifier as stack error. Currently there is no way to distinguish
this case from actual errors.
This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.
The changes made by the patch are:
- Frame setup instructions get the second mandatory argument. It
affects all targets that use frame pseudo instructions and touched many
files although the changes are uniform.
- Access to frame properties are implemented using special instructions
rather than calls getOperand(N).getImm(). For X86 and ARM such
replacement was made previously.
- Changes that reflect appearance of additional argument of frame setup
instruction. These involve proper instruction initialization and
methods that access instruction arguments.
- MachineVerifier retrieves frame size using method, which reports sum of
frame parts initialized inside frame instruction pair and outside it.
The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests failed with machine verifier enabled and listed
in PR27481.
Differential Revision: https://reviews.llvm.org/D32394
llvm-svn: 302527
Similar to what we do for vXi8 ASHR(X, 7), use SSE42's PCMPGTQ to splat the sign instead of using the PSRAD+PSHUFD.
Avoiding bitcasts this improves combines that utilize computeNumSignBits, permits memory folding and reduces pipe pressure. Although it does require a second register, given that this is a (cheap) zero register the impact is minimal.
Differential Revision: https://reviews.llvm.org/D32973
llvm-svn: 302525
This reverts commit r302461.
It appears to be causing failures compiling gtest with debug info on the
Linux sanitizer bot. I was unable to reproduce the failure locally,
however.
llvm-svn: 302504
Summary:
For inalloca functions, this is a very common code pattern:
%argpack = type <{ i32, i32, i32 }>
define void @f(%argpack* inalloca %args) {
entry:
%a = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 0
%b = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 1
%c = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 2
tail call void @llvm.dbg.declare(metadata i32* %a, ... "a")
tail call void @llvm.dbg.declare(metadata i32* %c, ... "b")
tail call void @llvm.dbg.declare(metadata i32* %b, ... "c")
Even though these GEPs can be simplified to a constant offset from EBP
or RSP, we don't do that at -O0, and each GEP is computed into a
register. Registers used to compute argument addresses are typically
spilled and clobbered very quickly after the initial computation, so
live debug variable tracking loses information very quickly if we use
DBG_VALUE instructions.
This change moves processing of dbg.declare between argument lowering
and basic block isel, so that we can ask if an argument has a frame
index or not. If the argument lives in a register as is the case for
byval arguments on some targets, then we don't put it in the side table
and during ISel we emit DBG_VALUE instructions.
Reviewers: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32980
llvm-svn: 302483
Summary:
An llvm.dbg.declare of a static alloca is always added to the
MachineFunction dbg variable map, so these values are entirely
redundant. They survive all the way through codegen to be ignored by
DWARF emission.
Effectively revert r113967
Two bugpoint-reduced test cases from 2012 broke as a result of this
change. Despite my best efforts, I haven't been able to rewrite the test
case using dbg.value. I'm not too concerned about the lost coverage
because these were reduced from the test-suite, which we still run.
Reviewers: aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32920
llvm-svn: 302461
This fixes PR32550, in a way that does not imply running the greedy
mode at O0.
The fix consists in checking if a load is used by any floating point
instruction and if yes, we return a default mapping with FPR instead
of GPR.
llvm-svn: 302453
Currently combineLogicBlendIntoPBLENDV can only match ASHR to detect sign splatting of a bit mask, this patch generalises this to use computeNumSignBits instead.
This is a first step in several things we can do to improve PBLENDV support:
* Better matching of X86ISD::ANDNP patterns.
* Handle floating point cases.
* Better vector and bitcast support in computeNumSignBits.
* Recognise that PBLENDV only uses the sign bit of the mask, we should be able strip away sign splats (ASHR, PCMPGT isNeg tests etc.).
Differential Revision: https://reviews.llvm.org/D32953
llvm-svn: 302424