memory bound checks. Before the fix we were able to vectorize this loop from
the Livermore Loops benchmark:
for ( k=1 ; k<n ; k++ )
x[k] = x[k-1] + y[k];
llvm-svn: 170811
Before if-conversion we could check if a value is loop invariant
if it was declared inside the basic block. Now that loops have
multiple blocks this check is incorrect.
This fixes External/SPEC/CINT95/099_go/099_go
llvm-svn: 170756
Similarly inlining of the function is inhibited, if that would duplicate the call (in particular inlining is still allowed when there is only one callsite and the function has internal linkage).
llvm-svn: 170704
When the least bit of C is greater than V, (x&C) must be greater than V
if it is not zero, so the comparison can be simplified.
Although this was suggested in Target/X86/README.txt, it benefits any
architecture with a directly testable form of AND.
Patch by Kevin Schoedel
llvm-svn: 170576
This changes adds shadow and origin propagation for unknown intrinsics
by examining the arguments and ModRef behaviour. For now, only 3 classes
of intrinsics are handled:
- those that look like simple SIMD store
- those that look like simple SIMD load
- those that don't have memory effects and look like arithmetic/logic/whatever
operation on simple types.
llvm-svn: 170530
MapVector is a bit heavyweight, but I don't see a simpler way. Also the
InductionList is unlikely to be large. This should help 3-stage selfhost
compares (PR14647).
llvm-svn: 170528
This was a silly oversight, we weren't pruning allocas which were used
by variable-length memory intrinsics from the set that could be widened
and promoted as integers. Fix that.
llvm-svn: 170353
This also cleans up a bit of the memcpy call rewriting by sinking some
irrelevant code further down and making the call-emitting code a bit
more concrete.
Previously, memcpy of a subvector would actually miscompile (!!!) the
copy into a single vector element copy. I have no idea how this ever
worked. =/ This is the memcpy half of PR14478 which we probably weren't
noticing previously because it didn't actually assert.
The rewrite relies on the newly refactored insert- and extractVector
functions to do the heavy lifting, and those are the same as used for
loads and stores which makes the test coverage a bit more meaningful
here.
llvm-svn: 170338
Check whether a BB is known as reachable before adding it to the worklist.
This way BB's with multiple predecessors are added to the list no more than
once.
llvm-svn: 170335
The first half of fixing this bug was actually in r170328, but was
entirely coincidental. It did however get me to realize the nature of
the bug, and adapt the test case to test more interesting behavior. In
turn, that uncovered the rest of the bug which I've fixed here.
This should fix two new asserts that showed up in the vectorize nightly
tester.
llvm-svn: 170333
I noticed this while looking at r170328. We only ever do a vector
rewrite when the alloca *is* the vector type, so it's good to not paper
over bugs here by doing a convertValue that isn't needed.
llvm-svn: 170331
This will allow its use inside of memcpy rewriting as well. This routine
is more complex than extractVector, and some of its uses are not 100%
where I want them to be so there is still some work to do here.
While this can technically change the output in some cases, it shouldn't
be a change that matters -- IE, it can leave some dead code lying around
that prior versions did not, etc.
Yet another step in the refactorings leading up to the solution to the
last component of PR14478.
llvm-svn: 170328
The method helpers all implicitly act upon the alloca, and what we
really want is a fully generic helper. Doing memcpy rewrites is more
special than all other rewrites because we are at times rewriting
instructions which touch pointers *other* than the alloca. As
a consequence all of the helpers needed by memcpy rewriting of
sub-vector copies will need to be generalized fully.
Note that all of these helpers ({insert,extract}{Integer,Vector}) are
woefully uncommented. I'm going to go back through and document them
once I get the factoring correct.
No functionality changed.
llvm-svn: 170325
PR14478 highlights a serious problem in SROA that simply wasn't being
exercised due to a lack of vector input code mixed with C-library
function calls. Part of SROA was written carefully to handle subvector
accesses via memset and memcpy, but the rewriter never grew support for
this. Fixing it required refactoring the subvector access code in other
parts of SROA so it could be shared, and then fixing the splat formation
logic and using subvector insertion (this patch).
The PR isn't quite fixed yet, as memcpy is still broken in the same way.
I'm starting on that series of patches now.
Hopefully this will be enough to bring the bullet benchmark back to life
with the bb-vectorizer enabled, but that may require fixing memcpy as
well.
llvm-svn: 170301
No functionality changed. Refactoring leading up to the fix for PR14478
which requires some significant changes to the memset and memcpy
rewriting.
llvm-svn: 170299
This change moves the code for default shadow propagaition (handleShadowOr)
and origin tracking (setOriginForNaryOp) into a new builder-like class. Also
gets rid of handleShadowOrBinary.
llvm-svn: 170192
This assumes (1 << n) is always not zero. Consider n is greater than word size.
Although I know it is undefined, this transforms undefined behavior hidden.
This led clang unexpected behavior with some failures. I will investigate to fix undefined shl in clang.
llvm-svn: 170128
In a previous thread it was pointed out that isPowerOfTwo is not a very precise
name since it can return false for powers of two if it is unable to show that
they are powers of two.
llvm-svn: 170093
Provides m_Argument that allows matching against a CallSite's specified argument. Provides m_Intrinsic pattern that can be templatized over the intrinsic id and bind/match arguments similarly to other pattern matchers. Implementations provided for 0 to 4 arguments, though it's very simple to extend for more. Also provides example template specialization for bswap (m_BSwap) and example of code cleanup for its use.
llvm-svn: 170091
Better controls the inlining of functions when the caller function has MinSize attribute.
Basically, when the caller function has this attribute, we do not "force" the inlining
of callee functions carrying the InlineHint attribute (i.e., functions defined with
inline keyword)
llvm-svn: 170065
been used in the first place. It simply was passed to the function and to the
recursive invocations. Simply drop the parameter and update the callers for the
new signature.
Patch by Saleem Abdulrasool!
llvm-svn: 169988
When ASan replaces <alloca instruction> with
<offset into a common large alloca>, it should also patch
llvm.dbg.declare calls and replace debug info descriptors to mark
that we've replaced alloca with a value that stores an address
of the user variable, not the user variable itself.
See PR11818 for more context.
llvm-svn: 169984
Use explicitely aligned store and load instructions to deal with argument and
retval shadow. This matters when an argument's alignment is higher than
__msan_param_tls alignment (which is the case with __m128i).
llvm-svn: 169859
The `-mno-red-zone' flag wasn't being propagated to the functions that code
coverage generates. This allowed some of them to use the red zone when that
wasn't allowed.
<rdar://problem/12843084>
llvm-svn: 169754
This visitor provides infrastructure for recursively traversing the
use-graph of a pointer-producing instruction like an alloca or a malloc.
It maintains a worklist of uses to visit, so it can handle very deep
recursions. It automatically looks through instructions which simply
translate one pointer to another (bitcasts and GEPs). It tracks the
offset relative to the original pointer as long as that offset remains
constant and exposes it during the visit as an APInt offset. Finally, it
performs conservative escape analysis.
However, currently it has some limitations that should be addressed
going forward:
1) It doesn't handle vectors of pointers.
2) It doesn't provide a cheaper visitor when the constant offset
tracking isn't needed.
3) It doesn't support non-instruction pointer values.
The current functionality is exactly what is required to implement the
SROA pointer-use visitors in terms of this one, rather than in terms of
their own ad-hoc base visitor, which was always very poorly specified.
SROA has been converted to use this, and the code there deleted which
this utility now provides.
Technically speaking, using this new visitor allows SROA to handle a few
more cases than it previously did. It is now more aggressive in ignoring
chains of instructions which look like they would defeat SROA, but in
fact do not because they never result in a read or write of memory.
While this is "neat", it shouldn't be interesting for real programs as
any such chains should have been removed by others passes long before we
get to SROA. As a consequence, I've not added any tests for these
features -- it shouldn't be part of SROA's contract to perform such
heroics.
The goal is to extend the functionality of this visitor going forward,
and re-use it from passes like ASan that can benefit from doing
a detailed walk of the uses of a pointer.
Thanks to Ben Kramer for the code review rounds and lots of help
reviewing and debugging this patch.
llvm-svn: 169728
When SROA was evaluating a mixture of i1 and i8 loads and stores, in
just a particular case, it would tickle a latent bug where we compared
bits to bytes rather than bits to bits. As a consequence of the latent
bug, we would allow integers through which were not byte-size multiples,
a situation the later rewriting code was never intended to handle.
In release builds this could trigger all manner of oddities, but the
reported issue in PR14548 was forming invalid bitcast instructions.
The only downside of this fix is that it makes it more clear that SROA
in its current form is not capable of handling mixed i1 and i8 loads and
stores. Sometimes with the previous code this would work by luck, but
usually it would crash, so I'm not terribly worried. I'll watch the LNT
numbers just to be sure.
llvm-svn: 169719
- added function to VectorTargetTransformInfo to query cost of intrinsics
- vectorize trivially vectorizable intrinsic calls such as sin, cos, log, etc.
Reviewed by: Nadav
llvm-svn: 169711
This will more closely match the behavior of the new PtrUseVisitor that
I am adding. Hopefully this will not change the actual behavior in any
way, but by making the processing order more similar help in debugging.
llvm-svn: 169697
There are still bugs in this pass, as well as other issues that are
being worked on, but the bugs are crashers that occur pretty easily in
the wild. Test cases have been sent to the original commit's review
thread.
This reverts the commits:
r169671: Fix a logic error.
r169604: Move the popcnt tests to an X86 subdirectory.
r168931: Initial commit adding the pass.
llvm-svn: 169683
MSan uses a TLS slot to pass shadow for function arguments and return values.
This makes all instrumented functions not readonly, and at the same time
requires that all callees of an instrumented function that may be
MSan-instrumented do not have readonly attribute (otherwise some of the
instrumentation may be optimized out).
llvm-svn: 169591
Instead of unconditionally storing origin with every application store,
only do this when the shadow of the stored value is != 0.
This change also delays instrumentation of stores until after the walk over
function's instructions, because adding new basic blocks confuses InstVisitor.
We only keep 1 origin value per 4 bytes of application memory. This change
fixes the bug when a store of a single clean byte wiped the origin for the
whole 4-byte area.
Since stores of uninitialized values are relatively uncommon, this change
improves performance of track-origins mode by 5% median and by up to 47% on
specs.
llvm-svn: 169490
This change attempts to simplify (X^Y) -> X or Y in the user's context if we know that
only bits from X or Y are demanded.
A minimized case is provided bellow. This change will simplify "t>>16" into "var1 >>16".
=============================================================
unsigned foo (unsigned val1, unsigned val2) {
unsigned t = val1 ^ 1234;
return (t >> 16) | t; // NOTE: t is used more than once.
}
=============================================================
Note that if the "t" were used only once, the expression would be finally optimized as well.
However, with with this change, the optimization will take place earlier.
Reviewed by Nadav, Thanks a lot!
llvm-svn: 169317
missed in the first pass because the script didn't yet handle include
guards.
Note that the script is now able to handle all of these headers without
manual edits. =]
llvm-svn: 169224
Added the code that actually performs the if-conversion during vectorization.
We can now vectorize this code:
for (int i=0; i<n; ++i) {
unsigned k = 0;
if (a[i] > b[i]) <------ IF inside the loop.
k = k * 5 + 3;
a[i] = k; <---- K is a phi node that becomes vector-select.
}
llvm-svn: 169217
This change tries to simmplify E1 = " X >> C1 << C2" into :
- E2 = "X << (C2 - C1)" if C2 > C1, or
- E2 = "X >> (C1 - C2)" if C1 > C2, or
- E2 = X if C1 == C2.
Reviewed by Nadav. Thanks!
llvm-svn: 169182
which is the legality of the if-conversion transformation. The next step is to
implement the cost-model for the if-converted code as well as the
vectorization itself.
llvm-svn: 169152
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
The partitioning logic attempted to handle uses of an alloca with an
offset starting before the alloca so long as the use had some overlap
with the alloca itself. However, there was a bug where we tested
'(uint64_t)Offset >= AllocSize' without first checking whether 'Offset'
was positive. As a consequence, essentially every negative offset (that
is, starting *before* the alloca does) would be thrown out, even if it
was overlapping. The subsequent code to throw out negative offsets which
were actually non-overlapping was essentially dead. The code to *handle*
overlapping negative offsets was actually dead!
I've just removed all of this, and taught SROA to discard any uses which
start prior to the alloca from the beginning. It has the lovely property
of simplifying the code. =] All the tests still pass, and in fact no new
tests are needed as this is already covered by our testsuite. Fixing the
code so that negative offsets work the way the comments indicate they
were supposed to work causes regressions. That's how I found this.
Anyways, this is all progress in the correct direction -- tightening up
SROA to be maximally aggressive. Some day, I really hope to turn
out-of-bounds accesses to an alloca into 'unreachable'.
llvm-svn: 169120
Also check in a case to repeat the issue, on which 'opt -globalopt' consumes 1.6GB memory.
The big memory footprint cause is that current GlobalOpt one by one hoists and stores the leaf element constant into the global array, in each iteration, it recreates the global array initializer constant and leave the old initializer alone. This may result in many obsolete constants left.
For example: we have global array @rom = global [16 x i32] zeroinitializer
After the first element value is hoisted and installed: @rom = global [16 x i32] [ 1, 0, 0, ... ]
After the second element value is installed: @rom = global [16 x 32] [ 1, 2, 0, 0, ... ] // here the previous initializer is obsolete
...
When the transform is done, we have 15 obsolete initializers left useless.
llvm-svn: 169079
The original patch removed a bunch of code that the SjLjEHPrepare pass placed
into the entry block if all of the landing pads were removed during the
CodeGenPrepare class. The more natural way of doing things is to run the CGP
*before* we run the SjLjEHPrepare pass.
Make it so!
llvm-svn: 169044
We're iterating over a non-deterministically ordered container looking
for two saturating flags. To do this correctly, we have to saturate
both, and only stop looping if both saturate to their final value.
Otherwise, which flag we see first changes the result.
This is also a micro-optimization of the previous version as now we
don't go into the (possibly expensive) test logic once the first
violation of either constraint is detected.
llvm-svn: 168989
functionality changed.
Evan's commit r168970 moved the code that the primary comment in this
function referred to to the other end of the function without moving the
comment, and there has been a steady creep of "boolean" logic in it that
is simpler if handled via early exit. That way each special case can
have its own comments. I've also made the variable name a bit more
explanatory than "AllFit". This is in preparation to fix the
non-deterministic output of this function.
llvm-svn: 168988
The simplify-libcalls pass maintained a statistic to count the number
of library calls that have been simplified. Now that library call
simplification is being carried out in instcombine the statistic should
be moved to there.
llvm-svn: 168975
depends on the IR infrastructure, there is no sense in it being off in
Support land.
This is in preparation to start working to expand InstVisitor into more
special-purpose visitors that are still generic and can be re-used
across different passes. The expansion will go into the Analylis tree
though as nothing in VMCore needs it.
llvm-svn: 168972
This revision attempts to recognize following population-count pattern:
while(a) { c++; ... ; a &= a - 1; ... },
where <c> and <a>could be used multiple times in the loop body.
TODO: On X8664 and ARM, __buildin_ctpop() are not expanded to a efficent
instruction sequence, which need to be improved in the following commits.
Reviewed by Nadav, really appreciate!
llvm-svn: 168931
the last invoke instruction in the function. This also removes the last landing
pad in an function. This is fine, but with SjLj EH code, we've already placed a
bunch of code in the 'entry' block, which expects the landing pad to stick
around.
When we get to the situation where CGP has removed the last landing pad, go
ahead and nuke the SjLj instructions from the 'entry' block.
<rdar://problem/12721258>
llvm-svn: 168930
This patch migrates the puts optimizations from the simplify-libcalls
pass into the instcombine library call simplifier.
All the simplifiers from simplify-libcalls have now been migrated to
instcombine. Yay! Just a few other bits to migrate (prototype attribute
inference and a few statistics) and simplify-libcalls can finally be put
to rest.
llvm-svn: 168925