Summary:
When computing useful bits for a BFM instruction, we need
to take into consideration the case where both operands
of the BFM are equal and provide data that we need to track.
Not doing this can cause us to miss useful bits.
Fixes PR31138 (https://llvm.org/bugs/show_bug.cgi?id=31138)
Reviewers: t.p.northover, jmolloy
Subscribers: evandro, gberry, srhines, pirama, mcrosier, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D27130
llvm-svn: 288253
Summary:
In AArch64InstrInfo::foldMemoryOperandImpl, catch more cases where the
COPY being spilled is copying from WZR/XZR, but the source register is
not in the COPY destination register's regclass.
For example, when spilling:
%vreg0 = COPY %XZR ; %vreg0:GPR64common
without this change, the code in TargetInstrInfo::foldMemoryOperand()
and canFoldCopy() that normally handles cases like this would fail to
optimize since %XZR is not in GPR64common. So the spill code generated
would be:
%vreg0 = COPY %XZR
STR %vreg
instead of the new code generated:
STR %XZR
Reviewers: qcolombet, MatzeB
Subscribers: mcrosier, aemerson, t.p.northover, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D26976
llvm-svn: 288176
We have the following DAGCombiner transformations:
(mul (shl X, c1), c2) -> (mul X, c2 << c1)
(mul (shl X, C), Y) -> (shl (mul X, Y), C)
(shl (mul x, c1), c2) -> (mul x, c1 << c2)
Usually the constant shift is optimised by SelectionDAG::getNode when it is
constructed, by SelectionDAG::FoldConstantArithmetic, but when we're dealing
with vectors and one of those vector constants contains an undef element
FoldConstantArithmetic does not fold and we enter an infinite loop.
Fix this by making FoldConstantArithmetic use getNode to decide how to fold each
vector element, the same as FoldConstantVectorArithmetic does, and rather than
adding the constant shift to the work list instead only apply the transformation
if it's already been folded into a constant, as if it's not we're going to loop
endlessly. Additionally add missing NoOpaques to one of those transformations,
which I noticed when writing the tests for this.
Differential Revision: https://reviews.llvm.org/D26605
llvm-svn: 287766
The test is currently broken, and this CL should fix it.
Patch by Adrian Kuegel!
Differential Revision: https://reviews.llvm.org/D26910
llvm-svn: 287536
This patch adds a test for the assembly code emitted with XRay
instrumentation. It also fixes a bug where the operand of a jump
instruction must be not the number of bytes to jump over, but rather the
number of 4-byte instructions.
Author: rSerge
Reviewers: dberris, rengolin
Differential Revision: https://reviews.llvm.org/D26805
llvm-svn: 287516
Summary:
Extend replaceZeroVectorStore to handle more vector type stores,
floating point zero vectors and set alignment more accurately on split
stores.
This is a follow-up change to r286875.
This change fixes PR31038.
Reviewers: MatzeB
Subscribers: mcrosier, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D26682
llvm-svn: 287142
Doing this before register allocation reduces register pressure as we do
not even have to allocate a register for those dead definitions.
Differential Revision: https://reviews.llvm.org/D26111
llvm-svn: 287076
Lower a = b * C where C = (2^n + 1) * 2^m to
add w0, w0, w0, lsl n
lsl w0, w0, m
Differential Revision: https://reviews.llvm.org/D229245
llvm-svn: 287019
Implement the Newton series for square root, its reciprocal and reciprocal
natively using the specialized instructions in AArch64 to perform each
series iteration.
Differential revision: https://reviews.llvm.org/D26518
llvm-svn: 286907
Summary:
Replace a splat of zeros to a vector store by scalar stores of WZR/XZR.
The load store optimizer pass will merge them to store pair stores.
This should be better than a movi to create the vector zero followed by
a vector store if the zero constant is not re-used, since one
instructions and one register live range will be removed.
For example, the final generated code should be:
stp xzr, xzr, [x0]
instead of:
movi v0.2d, #0
str q0, [x0]
Reviewers: t.p.northover, mcrosier, MatzeB, jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26561
llvm-svn: 286875
Summary:
Fix off-by-one indexing error in loop checking that inserted value was a
splat vector.
Add code to check that INSERT_VECTOR_ELT nodes constructing the splat
vector have the expected constant index values.
Reviewers: t.p.northover, jmolloy, mcrosier
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D26409
llvm-svn: 286616
This is a partial revert of r244615 (http://reviews.llvm.org/D11942),
which caused a major regression in debug info quality.
Turning the artificial __MergedGlobal symbols into private symbols
(l__MergedGlobal) means that the linker will not include them in the
symbol table of the final executable. Without a symbol table entry
dsymutil is not be able to process the debug info for any of the
merged globals and thus drops the debug info for all of them.
This patch is enabling the old behavior for all MachO targets while
leaving all other targets unaffected.
rdar://problem/29160481
https://reviews.llvm.org/D26531
llvm-svn: 286607
addSchedBarrierDeps() is supposed to add use operands to the ExitSU
node. The current implementation adds uses for calls/barrier instruction
and the MBB live-outs in all other cases. The use
operands of conditional jump instructions were missed.
Also added code to macrofusion to set the latencies between nodes to
zero to avoid problems with the fusing nodes lingering around in the
pending list now.
Differential Revision: https://reviews.llvm.org/D25140
llvm-svn: 286544
There is no need to track dependencies for constant physregs, as they
don't change their value no matter in what order you read/write to them.
Differential Revision: https://reviews.llvm.org/D26221
llvm-svn: 286526
When copying to/from a constant register interferences can be ignored.
Also update the documentation for isConstantPhysReg() to make it more
obvious that this transformation is valid.
Differential Revision: https://reviews.llvm.org/D26106
llvm-svn: 286503
Under -enable-unsafe-fp-math, SELECT_CC lowering in AArch64
transforms floating point comparisons of the form "a == 0.0 ? 0.0 : x" to
"a == 0.0 ? a : x". But it incorrectly assumes that 'x' and 'a' have
the same type which can lead to a wrong CSEL node that crashes later
due to nonsensical copies.
Differential Revision: https://reviews.llvm.org/D26394
llvm-svn: 286231
Self-referencing PHI nodes need their destination operands to be constrained
because nothing else is likely to do so. For now we just pick a register class
naively.
Patch mostly by Ahmed again.
llvm-svn: 286183
Summary:
Some vector loads and stores generated from AArch64 intrinsics alias each other
unnecessarily, preventing better scheduling. We just need to transfer memory
operands during lowering.
Reviewers: mcrosier, t.p.northover, jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26313
llvm-svn: 286168
Add an option to allow easier experimentation by target maintainers with the
minimum number of entries to create jump tables. Also clarify the name of
the other existing option governing the creation of jump tables.
Differential revision: https://reviews.llvm.org/D25883
llvm-svn: 285104
When there's a tie between partitionings of jump tables, consider also cases
that result in no jump tables, but in one or a few cases. The motivation is
that many contemporary processors typically perform case switches fairly
quickly.
Differential revision: https://reviews.llvm.org/D25212
llvm-svn: 285099
Add support for estimating the square root or its reciprocal and division or
reciprocal using the combiner generic Newton series.
Differential revision: https://reviews.llvm.org/D25291
llvm-svn: 284986
Transform `a == 0.0 ? 0.0 : x` to `a == 0.0 ? a : x` and `a != 0.0 ? x : 0.0`
to `a != 0.0 ? x : a` to avoid materializing 0.0 for FCSEL, since it does not
have to be materialized beforehand for FCMP, as it has a form that has 0.0
as an implicit operand.
Differential Revision: https://reviews.llvm.org/D24808
llvm-svn: 284531
AArch64 actually supports many 8-bit operations under the definition used by
GlobalISel: the designated information-carrying bits of a GPR32 get the right
value if you just use the normal 32-bit instruction.
llvm-svn: 284526