E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)
A hook is added to allow the target to control whether it needs to do such combine.
Reviewed in http://reviews.llvm.org/D6334
llvm-svn: 222510
This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in
the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM,
there is a store moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.
llvm-svn: 222504
The alloca's type is irrelevant, only those types which are used in a
load or store of the exact size of the slice should be considered.
This manifested as an assertion failure when we compared the various
types: we had a size mismatch.
This fixes PR21480.
llvm-svn: 222499
These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
r221878. When anyone calls a virtual method of a class, clang-cl
requires all virtual methods to be semantically valid. This includes the
implicit virtual destructor, which triggers instantiation of the
unique_ptr destructor, which fails because the type being deleted is
incomplete.
This is just part of the ongoing saga of PR20337, which is affecting
Blink as well. Because the MSVC ABI doesn't have key functions, we end
up referencing the vtable and implicit destructor on any virtual call
through a class. We don't actually end up emitting the dtor, so it'd be
good if we could avoid this unneeded type completion work.
llvm-svn: 222480
Code seems cleaner and easier to understand this way
This is basically r222416, after fixes for MSVC lack of standard
support, and a few cleaning (got rid of a warning).
Thanks Nakamura Takumi and Nico Weber for the MSVC fixes.
llvm-svn: 222472
Currently LoopUnroll generates a prologue loop before the main loop
body to execute first N%UnrollFactor iterations. Also, this loop is
used if trip-count can overflow - it's determined by a runtime check.
However, we've been mistakenly optimizing this loop to a linear code for
UnrollFactor = 2, not taking into account that it also serves as a safe
version of the loop if its trip-count overflows.
llvm-svn: 222451
Windows itanium targets the MSVCRT, and the stack probe symbol is provided by
MSVCRT. This corrects the emission of stack probes on i686-windows-itanium.
llvm-svn: 222439
As dump() methods should be. To allow that, do not store the DWARFFormValue
objects used for the dump in the header data.
Per Alexey's suggestion!
llvm-svn: 222436
These fields would need to be explicitly deleted before we RAUW the temporary
node anyway (this was done in cfe commit r222373). Instead, do not create
these useless nodes in the first place.
llvm-svn: 222434
"global-init", "global-init-src" and "global-init-type" were originally
used to blacklist entities in ASan init-order checker. However, they
were never documented, and later were replaced by "=init" category.
Old blacklist entries should be converted as follows:
* global-init:foo -> global:foo=init
* global-init-src:bar -> src:bar=init
* global-init-type:baz -> type:baz=init
llvm-svn: 222401
This reverts commit r222142. This is causing/exposing an execution-time regression
in spec2006/gcc and coremark on AArch64/A57/Ofast.
Conflicts:
test/Transforms/Reassociate/optional-flags.ll
llvm-svn: 222398
- Show "Considering..." message after flipping so you actually see the final
destination vreg as destination.
- Add a message on final join, so you can grep for "Success" messages to obtain
a list of which register got merged with which.
llvm-svn: 222382
This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes
that are known to have at least two non-zero elements.
With this patch, a build_vector that performs a blend with zero is
converted into a shuffle. This is done to let the shuffle legalizer expand
the dag node in a optimal way. For example, if we know that a build_vector
performs a blend with zero, we can try to lower it as a movq/blend instead of
always selecting an insertps.
This patch also improves the logic that lowers a build_vector into a insertps
with zero masking. See for example the extra test cases added to test sse41.ll.
Differential Revision: http://reviews.llvm.org/D6311
llvm-svn: 222375
As detailed at http://llvm.org/PR20728, due to an internal overflow in
APFloat::multiplySignificand the APFloat::fusedMultiplyAdd method can return
incorrect results for x87DoubleExtended (x86_fp80) values. This commonly
manifests as incorrect constant folding of libm fmal calls on x86. E.g.
fmal(1.0L, 1.0L, 3.0L) == 0.0L (should be 4.0L)
This patch fixes PR20728 by adding an extra bit to the significand for
intermediate results of APFloat::multiplySignificand, avoiding the overflow.
llvm-svn: 222374
A register operand that has a common sub-class with its instruction's
defined register class is not always legal. For example,
SReg_32 and M0Reg both have a common sub-class, but we can't
use an SReg_32 in instructions that expect a M0Reg.
This prevents the llvm.SI.sendmsg.ll test from failing when the fold
operand pass is added.
llvm-svn: 222368
When the BasicBlock containing the return instrution has a PHI with 2
incoming values, FoldReturnIntoUncondBranch will remove the no longer
used incoming value and remove the no longer needed phi as well. This
leaves us with a BB that no longer has a PHI, but the subsequent call
to FoldReturnIntoUncondBranch from FoldReturnAndProcessPred will not
remove the return instruction (which still uses the result of the call
instruction). This prevents EliminateRecursiveTailCall to remove
the value, as it is still being used in a basicblock which has no
predecessors.
The basicblock can not be erased on the spot, because its iterator is
still being used in runTRE.
This issue was exposed when removing the threshold on size for lifetime
marker insertion for named temporaries in clang. The testcase is a much
reduced version of peelOffOuterExpr(const Expr*, const ExplodedNode *)
from clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp.
llvm-svn: 222354
This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.
I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.
Differential Revision: http://reviews.llvm.org/D5699
llvm-svn: 222340