matchSelectPattern attempts to see through casts which mask min/max
patterns from being more obvious. Under certain circumstances, it would
misidentify a sequence of instructions as a min/max because it assumed
that folding casts would preserve the result. This is not the case for
floating point <-> integer casts.
This fixes PR27575.
llvm-svn: 268086
This was being treated the same as private, which has an immediate
offset. For unknown, it probably means it's for a computation not
actually being used for accessing memory, so it should not have a
nontrivial addressing mode.
llvm-svn: 268002
We need to keep loop hints from the original loop on the new vector loop.
Failure to do this meant that, for example:
void foo(int *b) {
#pragma clang loop unroll(disable)
for (int i = 0; i < 16; ++i)
b[i] = 1;
}
this loop would be unrolled. Why? Because we'd vectorize it, thus dropping the
hints that unrolling should be disabled, and then we'd unroll it.
llvm-svn: 267970
I closely followed the precedents set by the vectorizer:
* With -Rpass-missed, the loop is reported with further details pointing
to -Rpass--analysis.
* -Rpass-analysis reports the details why distribution has failed.
* Regardless of -Rpass*, when distribution fails for a loop where
distribution was forced with the pragma, a warning is produced according
to -Wpass-failed. In this case the analysis info is also printed even
without -Rpass-analysis.
llvm-svn: 267952
When inlining a call site with llvm.mem.parallel_loop_access metadata, this
metadata needs to be propagated to all cloned memory-accessing instructions.
Otherwise, inlining parts of the loop body will invalidate the annotation.
With this functionality, we now vectorize the following as expected:
void Body(int *res, int *c, int *d, int *p, int i) {
res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}
void Test(int *res, int *c, int *d, int *p, int n) {
int i;
#pragma clang loop vectorize(assume_safety)
for (i = 0; i < 1600; i++) {
Body(res, c, d, p, i);
}
}
llvm-svn: 267949
The MOVMSK instructions copies a vector elements' sign bits to the low bits of a scalar register and zeros the high bits.
This patch adds MOVMSK support to SimplifyDemandedUseBits so that its aware that the upper bits are known to be zero. It also removes the call to MOVMSK if none of the lower bits are actually required and just returns zero.
Differential Revision: http://reviews.llvm.org/D19614
llvm-svn: 267873
This patch implements the transformation that promotes indirect calls to
conditional direct calls when the indirect-call value profile meta-data is
available.
Differential Revision: http://reviews.llvm.org/D17864
llvm-svn: 267815
The sink cast machinery is supposed to sink casts as close to their user
as possible. However, an EH pad is the first instruction in it's basic
block. Don't sink if the user is an EH pad.
This fixes PR27536.
llvm-svn: 267767
"inferattrs" will deduce the attribute, but it will be too late for
many optimizations. Set it ourselves when creating the call.
Differential Revision: http://reviews.llvm.org/D17598
llvm-svn: 267762
We previously disallowed interleaved load groups that may cause us to
speculatively access memory out-of-bounds (r261331). We did this by ensuring
each load group had an access corresponding to the first and last member.
Instead of bailing out for these interleaved groups, this patch enables us to
peel off the last vector iteration, ensuring that we execute at least one
iteration of the scalar remainder loop. This solution was proposed in the
review of the previous patch.
Differential Revision: http://reviews.llvm.org/D19487
llvm-svn: 267751
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.
Differential Revision: http://reviews.llvm.org/D18523
llvm-svn: 267725
Summary:
D19403 adds a new pragma for loop distribution. This change adds
support for the corresponding metadata that the pragma is translated to
by the FE.
As part of this I had to rethink the flag -enable-loop-distribute. My
goal was to be backward compatible with the existing behavior:
A1. pass is off by default from the optimization pipeline
unless -enable-loop-distribute is specified
A2. pass is on when invoked directly from opt (e.g. for unit-testing)
The new pragma/metadata overrides these defaults so the new behavior is:
B1. A1 + enable distribution for individual loop with the pragma/metadata
B2. A2 + disable distribution for individual loop with the pragma/metadata
The default value whether the pass is on or off comes from the initiator
of the pass. From the PassManagerBuilder the default is off, from opt
it's on.
I moved -enable-loop-distribute under the pass. If the flag is
specified it overrides the default from above.
Then the pragma/metadata can further modifies this per loop.
As a side-effect, we can now also use -enable-loop-distribute=0 from opt
to emulate the default from the optimization pipeline. So to be precise
this is the new behavior:
C1. pass is off by default from the optimization pipeline
unless -enable-loop-distribute or the pragma/metadata enables it
C2. pass is on when invoked directly from opt
unless -enable-loop-distribute=0 or the pragma/metadata disables it
Reviewers: hfinkel
Subscribers: joker.eph, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D19431
llvm-svn: 267672
Summary:
It is incorrect to compare TripCount (which is BECount + 1)
with extraiters (or Count) to check if we should enter unrolled
loop or not, because TripCount can potentially overflow
(when BECount is max unsigned integer).
While comparing BECount with (Count - 1) is overflow safe and
therefore correct.
Reviewer: hfinkel
Differential Revision: http://reviews.llvm.org/D19256
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 267662
When encountering a non-local pointer, LVI would eagerly scan the block for dereferences of the given object to prove the pointer to be non null. That's all well and good, but *then* we'd go recurse through our input blocks. As a result, we could end up scanning each and every block we traverse, even if the final definition was obviously non null or we found a constant value somewhere up the chain. The previous code papered over this by using the isKnownNonNull routine from value tracking. This made the duplication less painful in the common case.
Instead, we know do the block scan only *after* we've gotten the recursive results back. This lets us stop scanning individual blocks as soon as we've determined it to be non-null in any predecessor block and use our usual merge rules to propagate that information cheaply through successor blocks. For a pointer which can be found non-null, this does strictly less work and sometimes substaintially so.
Note that the case where we *can't* prove something non-null is still the really expensive case. We end up scanning each and every block looking for a dereference and never end up finding one.
llvm-svn: 267642
As pointed out by John Regehr over in http://reviews.llvm.org/D19485, LVI was being incredibly stupid about applying its transfer rules. Rather than gathering local facts from the expression itself, it was simply giving up entirely if one of the inputs was overdefined. This greatly impacts the precision of the overall analysis and makes it far more fragile as well.
This patch builds on 267609 which did the same thing for unary casts.
llvm-svn: 267620
Essentially, I was using the wrong size function. For types which were sized, but not primitive, I wasn't getting a useful size for the operand and failed an assert. I fixed this, and also added a guard that the input is a sized type. Test case is for the original mistake. I'm not sure how to actually exercise the sized type check.
llvm-svn: 267618
We need the default ratio to be sufficiently large that it triggers transforms
based on block frequency info (BFI) and plays well with the recently introduced
BranchProbability used by CGP.
Differential Revision: http://reviews.llvm.org/D19435
llvm-svn: 267615
As pointed out by John Regehr over in http://reviews.llvm.org/D19485, LVI was being incredibly stupid about applying its transfer rules. Rather than gathering local facts from the expression itself, it was simply giving up entirely if one of the inputs was overdefined. This greatly impacts the precision of the overall analysis and makes it far more fragile as well.
This patch implements only the unary operation case. Once this is in, I'll implement the same for the binary operations.
Differential Revision: http://reviews.llvm.org/D19492
llvm-svn: 267609
The destination buffer that sprintf uses is restrict qualified, we do
not need to worry about derived pointers referenced via format
specifiers.
This reverts commit r267580.
llvm-svn: 267605
sprintf doesn't read or copy the terminating null byte from it's string
operands. sprintf will append it's own after processing all of the
format specifiers.
This fixes PR27526.
llvm-svn: 267580
Summary:
Instead of using maximum IR weight as the basic block weight, this patch uses the voting algorithm to find the most likely weight for the basic block. This can effectively avoid the cases when some IRs are annotated incorrectly due to code motion of the profiled binary.
This patch also updates propagate.ll unittest to include discriminator in the input file so that it is testing something meaningful.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19301
llvm-svn: 267519
When SimplifyCFG merges identical instructions from both sides of a diamond, it
can preserve !llvm.mem.parallel_loop_access (as it does with most of the other
metadata). There's no real data or control dependency change in this case.
llvm-svn: 267515
I really thought we were doing this already, but we were not. Given this input:
void Test(int *res, int *c, int *d, int *p) {
for (int i = 0; i < 16; i++)
res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}
we did not vectorize the loop. Even with "assume_safety" the check that we
don't if-convert conditionally-executed loads (to protect against
data-dependent deferenceability) was not elided.
One subtlety: As implemented, it will still prefer to use a masked-load
instrinsic (given target support) over the speculated load. The choice here
seems architecture specific; the best option depends on how expensive the
masked load is compared to a regular load. Ideally, using the masked load still
reduces unnecessary memory traffic, and so should be preferred. If we'd rather
do it the other way, flipping the order of the checks is easy.
The LangRef is updated to make explicit that llvm.mem.parallel_loop_access also
implies that if conversion is okay.
Differential Revision: http://reviews.llvm.org/D19512
llvm-svn: 267514