Commit Graph

14581 Commits

Author SHA1 Message Date
Guozhi Wei 92e9d0e80e [InstCombine] Combine A->B->A BitCast
This patch enhances InstCombine to handle following case:

        A  ->  B    bitcast
        PHI
        B  ->  A    bitcast

llvm-svn: 262670
2016-03-03 23:21:38 +00:00
Sanjay Patel 9bba75084b [InstCombine] transform bitcasted bitwise logic ops with constants (PR26702)
Given that we're not actually reducing the instruction count in the included
regression tests, I think we would call this a canonicalization step.

The motivation comes from the example in PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702

If we hoist the bitwise logic ahead of the bitcast, the previously unoptimizable
example of:

define <4 x i32> @is_negative(<4 x i32> %x) {
  %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
  %not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
  %bc = bitcast <4 x i32> %not to <2 x i64>
  %notnot = xor <2 x i64> %bc, <i64 -1, i64 -1>
  %bc2 = bitcast <2 x i64> %notnot to <4 x i32>
  ret <4 x i32> %bc2
}

Simplifies to the expected:

define <4 x i32> @is_negative(<4 x i32> %x) {
  %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
  ret <4 x i32> %lobit
}

Differential Revision: http://reviews.llvm.org/D17583

llvm-svn: 262645
2016-03-03 19:19:04 +00:00
Easwaran Raman 3035719c86 Infrastructure for PGO enhancements in inliner
This patch provides the following infrastructure for PGO enhancements in inliner:

Enable the use of block level profile information in inliner
Incremental update of block frequency information during inlining
Update the function entry counts of callees when they get inlined into callers.

Differential Revision: http://reviews.llvm.org/D16381

llvm-svn: 262636
2016-03-03 18:26:33 +00:00
Dehao Chen 57d1dda558 Use LineLocation instead of CallsiteLocation to index callsite profile.
Summary: With discriminator, LineLocation can uniquely identify a callsite without the need to specifying callee name. Remove Callee function name from the key, and put it in the value (FunctionSamples).

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17827

llvm-svn: 262634
2016-03-03 18:09:32 +00:00
Matthew Simpson b840a6d6f4 [LoopUtils, LV] Fix PR26734
The vectorization of first-order recurrences (r261346) caused PR26734. When
detecting these recurrences, we need to ensure that the previous value is
actually defined inside the loop. This patch includes the fix and test case.

llvm-svn: 262624
2016-03-03 16:12:01 +00:00
Amaury Sechet 3b8b2ea2e1 Explode store of arrays in instcombine
Summary: This is the last step toward supporting aggregate memory access in instcombine. This explodes stores of arrays into a serie of stores for each element, allowing them to be optimized.

Reviewers: joker.eph, reames, hfinkel, majnemer, mgrang

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17828

llvm-svn: 262530
2016-03-02 22:36:45 +00:00
Amaury Sechet 7cd3fe7db6 Unpack array of all sizes in InstCombine
Summary: This is another step toward improving fca support. This unpack load of array in a series of load to array's elements.

Reviewers: chandlerc, joker.eph, majnemer, reames, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15890

llvm-svn: 262521
2016-03-02 21:28:30 +00:00
Daniel Berlin 6412002d24 Really fix ASAN leak/etc issues with MemorySSA unittests
llvm-svn: 262519
2016-03-02 21:16:28 +00:00
Daniel Berlin 989e601b26 Revert "Fix ASAN detected errors in code and test" (it was not meant to be committed yet)
This reverts commit 890bbccd600ba1eb050353d06a29650ad0f2eb95.

llvm-svn: 262512
2016-03-02 20:36:22 +00:00
Daniel Berlin 27ed1c2eb0 Fix ASAN detected errors in code and test
llvm-svn: 262511
2016-03-02 20:27:29 +00:00
Chandler Carruth 12884f7f80 [AA] Hoist the logic to reformulate various AA queries in terms of other
parts of the AA interface out of the base class of every single AA
result object.

Because this logic reformulates the query in terms of some other aspect
of the API, it would easily cause O(n^2) query patterns in alias
analysis. These could in turn be magnified further based on the number
of call arguments, and then further based on the number of AA queries
made for a particular call. This ended up causing problems for Rust that
were actually noticable enough to get a bug (PR26564) and probably other
places as well.

When originally re-working the AA infrastructure, the desire was to
regularize the pattern of refinement without losing any generality.
While I think it was successful, that is clearly proving to be too
costly. And the cost is needless: we gain no actual improvement for this
generality of making a direct query to tbaa actually be able to
re-use some other alias analysis's refinement logic for one of the other
APIs, or some such. In short, this is entirely wasted work.

To the extent possible, delegation to other API surfaces should be done
at the aggregation layer so that we can avoid re-walking the
aggregation. In fact, this significantly simplifies the logic as we no
longer need to smuggle the aggregation layer into each alias analysis
(or the TargetLibraryInfo into each alias analysis just so we can form
argument memory locations!).

However, we also have some delegation logic inside of BasicAA and some
of it even makes sense. When the delegation logic is baking in specific
knowledge of aliasing properties of the LLVM IR, as opposed to simply
reformulating the query to utilize a different alias analysis interface
entry point, it makes a lot of sense to restrict that logic to
a different layer such as BasicAA. So one aspect of the delegation that
was in every AA base class is that when we don't have operand bundles,
we re-use function AA results as a fallback for callsite alias results.
This relies on the IR properties of calls and functions w.r.t. aliasing,
and so seems a better fit to BasicAA. I've lifted the logic up to that
point where it seems to be a natural fit. This still does a bit of
redundant work (we query function attributes twice, once via the
callsite and once via the function AA query) but it is *exactly* twice
here, no more.

The end result is that all of the delegation logic is hoisted out of the
base class and into either the aggregation layer when it is a pure
retargeting to a different API surface, or into BasicAA when it relies
on the IR's aliasing properties. This should fix the quadratic query
pattern reported in PR26564, although I don't have a stand-alone test
case to reproduce it.

It also seems general goodness. Now the numerous AAs that don't need
target library info don't carry it around and depend on it. I think
I can even rip out the general access to the aggregation layer and only
expose that in BasicAA as it is the only place where we re-query in that
manner.

However, this is a non-trivial change to the AA infrastructure so I want
to get some additional eyes on this before it lands. Sadly, it can't
wait long because we should really cherry pick this into 3.8 if we're
going to go this route.

Differential Revision: http://reviews.llvm.org/D17329

llvm-svn: 262490
2016-03-02 15:56:53 +00:00
George Burgess IV e0e6e48b29 Attempt to fix ASAN failure in a MemorySSA test.
llvm-svn: 262452
2016-03-02 02:35:04 +00:00
Sanjay Patel 5e4c46de6d revert r262424 because there's a *clang test* for AArch64 that checks -O3 asm output
that is broken by this change

llvm-svn: 262440
2016-03-02 01:04:09 +00:00
Sanjay Patel 147e927957 [InstCombine] convert 'isPositive' and 'isNegative' vector comparisons to shifts (PR26701)
As noted in the code comment, I don't think we can do the same transform that we do for
*scalar* integers comparisons to *vector* integers comparisons because it might pessimize
the general case. 

Exhibit A for an incomplete integer comparison ISA remains x86 SSE/AVX: it only has EQ and GT
for integer vectors.

But we should now recognize all the variants of this construct and produce the optimal code
for the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701
 

llvm-svn: 262424
2016-03-01 23:55:18 +00:00
Dehao Chen 1012be120a Perform InstructioinCombiningPass before SampleProfile pass.
Summary: SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17742

llvm-svn: 262419
2016-03-01 22:53:02 +00:00
Owen Anderson 7ea02fc787 Fix an issue where fast math flags were dropped during scalarization.
Most portions of InstCombine properly propagate fast math flags, but
apparently the vector scalarization section was overlooked.

llvm-svn: 262376
2016-03-01 19:35:52 +00:00
Daniel Berlin 83fc77b4c0 Add the beginnings of an update API for preserving MemorySSA
Summary:
This adds the beginning of an update API to preserve MemorySSA.  In particular,
this patch adds a way to remove memory SSA accesses when instructions are
deleted.

It also adds relevant unit testing infrastructure for MemorySSA's API.

(There is an actual user of this API, i will make that diff dependent on this one.  In practice, a ton of opt passes remove memory instructions, so it's hopefully an obviously useful API :P)

Reviewers: hfinkel, reames, george.burgess.iv

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17157

llvm-svn: 262362
2016-03-01 18:46:54 +00:00
Petar Jovanovic 6315f3f9b7 Revert "calculate builtin_object_size if argument is a removable pointer"
Revert r262337 as "check-llvm ubsan" step failed on
sanitizer-x86_64-linux-fast buildbot.

llvm-svn: 262349
2016-03-01 16:50:08 +00:00
Petar Jovanovic 8aef99aa86 calculate builtin_object_size if argument is a removable pointer
This patch fixes calculating correct value for builtin_object_size function
when pointer is used only in builtin_object_size function call and never
after that.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D17337

llvm-svn: 262337
2016-03-01 14:39:55 +00:00
Sanjay Patel 6f2c01f712 [x86, InstCombine] transform more x86 masked loads to LLVM intrinsics
Continuation of:
http://reviews.llvm.org/rL262269

llvm-svn: 262273
2016-02-29 23:59:00 +00:00
Adam Nemet efc091f457 [LLE] Fix a comment
llvm-svn: 262270
2016-02-29 23:21:12 +00:00
Sanjay Patel 98a71505f5 [x86, InstCombine] transform x86 AVX masked loads to LLVM intrinsics
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392
http://reviews.llvm.org/rL260145

is that customers using the AVX intrinsics in C will benefit from combines when
the load mask is constant:

__m128 mload_zeros(float *f) {
  return _mm_maskload_ps(f, _mm_set1_epi32(0));
}

__m128 mload_fakeones(float *f) {
  return _mm_maskload_ps(f, _mm_set1_epi32(1));
}

__m128 mload_ones(float *f) {
  return _mm_maskload_ps(f, _mm_set1_epi32(0x80000000));
}

__m128 mload_oneset(float *f) {
  return _mm_maskload_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0));
}

...so none of the above will actually generate a masked load for optimized code.

This is the masked load counterpart to:
http://reviews.llvm.org/rL262064

llvm-svn: 262269
2016-02-29 23:16:48 +00:00
Adam Nemet 83be06e529 [LLE] Fix SingleSource/Benchmarks/Polybench/stencils/jacobi-2d-imper with Polly
We can actually have dependences between accesses with different
underlying types.  Bail in this case.

A test will follow shortly.

llvm-svn: 262267
2016-02-29 22:53:59 +00:00
Adam Nemet dd9e637aca Enable LoopLoadElimination by default
Summary:
I re-benchmarked this and results are similar to original results in
D13259:

On ARM64:
  SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -59.27%
  SingleSource/Benchmarks/Polybench/stencils/adi                   -19.78%

On x86:
  SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog  -27.14%

And of course the original ~20% gain on SPECint_2006/456.hmmer with Loop
Distribution.

In terms of compile time, there is ~5% increase on both
SingleSource/Benchmarks/Misc/oourafft and
SingleSource/Benchmarks/Linkpack/linkpack-pc.  These are both very tiny
loop-intensive programs where SCEV computations dominates compile time.

The reason that time spent in SCEV increases has to do with the design
of the old pass manager.  If a transform pass does not preserve an
analysis we *invalidate* the analysis even if there was *no*
modification made by the transform pass.

This means that currently we don't take advantage of LLE and LV sharing
the same analysis (LAA) and unfortunately we recompute LAA *and* SCEV
for LLE.

(There should be a way to work around this limitation in the case of
SCEV and LAA since both compute things on demand and internally cache
their result.  Thus we could pretend that transform passes preserve
these analyses and manually invalidate them upon actual modification.
On the other hand the new pass manager is supposed to solve so I am not
sure if this is worthwhile.)

Reviewers: hfinkel, dberlin

Subscribers: dberlin, reames, mssimpso, aemerson, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D16300

llvm-svn: 262250
2016-02-29 20:35:11 +00:00
Rong Xu 9e926e8b92 Minor code cleanup. NFC
llvm-svn: 262242
2016-02-29 19:16:04 +00:00
Dehao Chen 939993ff2f Move discriminator assignment to the right place.
Summary: Now discriminator is assigned per-function instead of per-module.

Reviewers: davidxl, dnovillo

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D17664

llvm-svn: 262240
2016-02-29 18:59:48 +00:00
Xinliang David Li 985ff20a9c [PGO] Remove redundant counter copies for avail_extern functions.
Differential Revision: http://reviews.llvm.org/D17654

llvm-svn: 262157
2016-02-27 23:11:30 +00:00
Renato Golin 9a5419ecf7 Revert "[sancov] do not instrument nodes that are full pre-dominators"
This reverts commit r262103, as it broke all ARM and AArch64 bots.

llvm-svn: 262139
2016-02-27 14:19:19 +00:00
Sean Silva ea399f0242 [instrprof] Use __{start,stop}_SECNAME on PS4 too.
Summary:
The PS4 linker seems to handle this fine.

Hi David, it seems that indeed most ELF linkers support
__{start,stop}_SECNAME, as our proprietary linker does as well.

This follows the pattern of r250679 w.r.t. the testing.

Maggie, Phillip, Paul: I've tested this with the PS4 SDK 3.5 toolchain
prerelease and it seems to work fine.

Reviewers: davidxl

Subscribers: probinson, phillip.power, MaggieYi

Differential Revision: http://reviews.llvm.org/D17672

llvm-svn: 262112
2016-02-27 06:01:26 +00:00
Mike Aizatsky 9056284912 [sancov] properly initializing pass.
llvm-svn: 262111
2016-02-27 05:50:40 +00:00
Kostya Serebryany 3c767db3c5 [libFuzzer] don't emit callbacks to sanitizer run-time in -fsanitize-coverage=trace-pc mode; update libFuzzer doc for previous commit
llvm-svn: 262110
2016-02-27 05:45:12 +00:00
Chandler Carruth ad8cb382fa [LICM] Teach LICM how to handle cases where the alias set tracker was
merged into a loop that was subsequently unrolled (or otherwise nuked).

In this case it can't merge in the ASTs for any remaining nested loops,
it needs to re-add their instructions dircetly.

The fix is very isolated, but I've pulled the code for merging blocks
into the AST into a single place in the process. The only behavior
change is in the case which would have crashed before.

This fixes a crash reported by Mikael Holmen on the list after r261316
restored much of the loop pass pipelining and allowed us to actually do
this kind of nested transformation sequenc. I've taken that test case
and further reduced it into the somewhat twisty maze of loops in the
included test case. This does in fact trigger the bug even in this
reduced form.

llvm-svn: 262108
2016-02-27 04:34:07 +00:00
Mike Aizatsky 9b53ab7121 [sancov] do not instrument nodes that are full pre-dominators
Summary:
Without tree pruning clang has 2,667,552 points.
Wiht only dominators pruning: 1,515,586.
With both dominators & predominators pruning: 1,340,534.

Differential Revision: http://reviews.llvm.org/D17671

llvm-svn: 262103
2016-02-27 02:10:27 +00:00
Reid Kleckner 892ae2e2b6 [InstCombine] Be more conservative about removing stackrestore
We ended up removing a save/restore pair around an inalloca call,
leading to a miscompile in Chromium.

llvm-svn: 262095
2016-02-27 00:53:54 +00:00
Sanjay Patel fc7e7ebf36 [x86, InstCombine] transform x86 AVX2 masked stores to LLVM intrinsics
Replicate everything for integers...because x86.

Continuation of:
http://reviews.llvm.org/rL262064

llvm-svn: 262077
2016-02-26 21:51:44 +00:00
Sanjay Patel 1ace99351f [x86, InstCombine] transform x86 AVX masked stores to LLVM intrinsics
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392
http://reviews.llvm.org/rL260145

is that customers using the AVX intrinsics in C will benefit from combines when
the store mask is constant:

void mstore_zero_mask(float *f, __m128 v) {
  _mm_maskstore_ps(f, _mm_set1_epi32(0), v);
}

void mstore_fake_ones_mask(float *f, __m128 v) {
  _mm_maskstore_ps(f, _mm_set1_epi32(1), v);
}

void mstore_ones_mask(float *f, __m128 v) {
  _mm_maskstore_ps(f, _mm_set1_epi32(0x80000000), v);
}

void mstore_one_set_elt_mask(float *f, __m128 v) {
  _mm_maskstore_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0), v);
}

...so none of the above will actually generate a masked store for optimized code.

Differential Revision: http://reviews.llvm.org/D17485

llvm-svn: 262064
2016-02-26 21:04:14 +00:00
Haicheng Wu 5539f852ae [JumpThreading] Simplify Instructions first in ComputeValueKnownInPredecessors()
This change tries to find more opportunities to thread over basic blocks.

llvm-svn: 261981
2016-02-26 06:06:04 +00:00
Michael Zolotukhin 9f520ebc54 [LoopUnrollAnalyzer] Check that we're using SCEV for the same loop we're simulating.
Summary: Check that we're using SCEV for the same loop we're simulating. Otherwise, we might try to use the iteration number of the current loop in SCEV expressions for inner/outer loops IVs, which is clearly incorrect.

Reviewers: chandlerc, hfinkel

Subscribers: sanjoy, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17632

llvm-svn: 261958
2016-02-26 02:57:05 +00:00
Mike Aizatsky 5971f18133 [sancov] Pruning full dominator blocks from instrumentation.
Summary:
This is the first simple attempt to reduce number of coverage-
instrumented blocks.

If a basic block dominates all its successors, then its coverage
information is useless to us. Ingore such blocks if
santizer-coverage-prune-tree option is set.

Differential Revision: http://reviews.llvm.org/D17626

llvm-svn: 261949
2016-02-26 01:17:22 +00:00
Anna Zaks 40148f1716 [asan] Do not instrument globals in the special "LLVM" sections
llvm-svn: 261794
2016-02-24 22:12:18 +00:00
David Majnemer ee0cbbbe69 [SimplifyCFG] Use a more elegant solution than r261731
The cleanupret instruction has an invariant that it's 'from' operand be
a cleanuppad.  This invariant was violated when we removed a dead block
which removed a cleanuppad leaving behind a cleanupret with an undef
'from' operand.

This was solved in r261731 by staving off the removal of the dead block
to a later pass.

However, it occured to me that we do not need to do this.
Instead, we can simply avoid processing the cleanupret if it has an
undef 'from' operand because we know that it will be removed soon.

llvm-svn: 261754
2016-02-24 17:30:48 +00:00
Sanjay Patel dbbaca0e1b [InstCombine] enable optimization of casted vector xor instructions
This is part of the payoff for the refactoring in:
http://reviews.llvm.org/rL261649
http://reviews.llvm.org/rL261707

In addition to removing a pile of duplicated code, the xor case was
missing the optimization for vector types because it checked
"SrcTy->isIntegerTy()" rather than "SrcTy->isIntOrIntVectorTy()"
like 'and' and 'or' were already doing.

This solves part of:
https://llvm.org/bugs/show_bug.cgi?id=26702

llvm-svn: 261750
2016-02-24 17:00:34 +00:00
Artur Pilipenko 31bcca47d3 NFC. Move isDereferenceable to Loads.h/cpp
This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated.   

Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D16180

llvm-svn: 261736
2016-02-24 12:49:04 +00:00
David Majnemer ec72e37220 [SimplifyCFG] Do not blindly remove unreachable blocks
DeleteDeadBlock was called indiscriminately, leading to cleanuprets with
undef cleanuppad references.

Instead, try to drain the BB of most of it's instructions if it is
unreachable.  We can then remove the BB if it solely consists of a
terminator (and maybe some phis).

llvm-svn: 261731
2016-02-24 10:02:16 +00:00
Sanjay Patel 75b4ae25cb [InstCombine] refactor visitOr() to use foldCastedBitwiseLogic()
Note: The 'and' case in foldCastedBitwiseLogic() is inheriting one extra
check from the nearly identical 'or' case:
  if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src))

But I'm not sure how to expose that difference in a regression test. 
Without that check, the 'or' path will infinite loop on:
test/Transforms/InstCombine/zext-or-icmp.ll
because the zext-or-icmp fold is attempting a reverse transform.

The refactoring should extend to the 'xor' case next to solve part of
PR26702.

llvm-svn: 261707
2016-02-23 23:56:23 +00:00
Sanjay Patel 713f25e0f8 [InstCombine] improve readability ; NFCI
Less indenting, named local variables, more descriptive names.

llvm-svn: 261659
2016-02-23 17:41:34 +00:00
David Majnemer 223538f764 [WinEH] Don't inline an 'unwinds to caller' cleanupret into funclets which locally unwind
It is problematic if the inlinee has a cleanupret which unwinds to
caller and we inline it into a call site which doesn't unwind.

If the funclet unwinds anywhere other than to the caller,
then we will give the funclet two unwind destinations.
This will result in a verifier failure.

Seeing as how the caller wasn't an invoke (which would locally unwind)
and that the funclet cannot unwind to caller, we must conclude that an
'unwind to caller' cleanupret is dynamically unreachable.

This fixes PR26698.

Differential Revision: http://reviews.llvm.org/D17536

llvm-svn: 261656
2016-02-23 17:11:04 +00:00
Sanjay Patel 7d0d810ce5 [InstCombine] less indenting; NFC
llvm-svn: 261652
2016-02-23 16:59:21 +00:00
Sanjay Patel 40e7ba0046 [InstCombine] add helper function to foldCastedBitwiseLogic() ; NFCI
This is a straight cut and paste of the existing code and is intended to
be the first step in solving part of PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702

We should be able to reuse most of this and delete the nearly identical 
existing code in visitOr(). Then, we can enhance visitXor() to use the
same code too.

llvm-svn: 261649
2016-02-23 16:36:07 +00:00
Michael Zolotukhin 792a885537 Follow up for r261597: Add the * to the auto.
llvm-svn: 261600
2016-02-23 00:57:48 +00:00
Michael Zolotukhin 4fdf974e3e Follow-up for r261595: use range loop.
llvm-svn: 261597
2016-02-23 00:48:44 +00:00
Michael Zolotukhin de19ed1eb1 [LoopUnroll] Avoid unnecessary DT recomputation.
Summary:
When we completely unroll a loop, it's pretty easy to update DT in-place and
thus avoid rebuilding it. DT recalculation is one of the most time-consuming
tasks in loop-unroll, so avoiding it at least in case of full unroll should be
beneficial.

On some extreme (but still real-world) tests this patch improves compile time by
~2x.

Reviewers: escha, jmolloy, hfinkel, sanjoy, chandlerc

Subscribers: joker.eph, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D17473

llvm-svn: 261595
2016-02-23 00:30:50 +00:00
Dehao Chen 6c73b49911 Set function entry count as 0 if sample profile is not found for the function.
Summary: This change makes the sample profile's behavior consistent with instr profile.

Reviewers: davidxl, eraman, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17522

llvm-svn: 261587
2016-02-22 22:46:21 +00:00
Adam Nemet fb31d580ea [LoopDataPrefetch] Make it testable with opt
Summary:
Since this is an IR pass it's nice to be able to write tests without
llc.  This is the counterpart of the llc test under
CodeGen/PowerPC/loop-data-prefetch.ll.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17464

llvm-svn: 261578
2016-02-22 21:41:22 +00:00
Michael Zolotukhin d734bea8ba [LoopUnrolling] Fix a bug introduced in r259869 (PR26688).
The issue was that we only required LCSSA rebuilding if the immediate
parent-loop had values used outside of it. The fix is to enaable the
same logic for all outer loops, not only immediate parent.

llvm-svn: 261575
2016-02-22 21:21:45 +00:00
Philip Reames ce38c2ddf6 [RS4GC] "Constant fold" the rs4gc-split-vector-values flag
This flag was part of a migration to a new means of handling vectors-of-points which was described in the llvm-dev thread "FYI: Relocating vector of pointers".  The old code path has been off by default for a while without complaints, so time to cleanup.

llvm-svn: 261569
2016-02-22 21:01:28 +00:00
Philip Reames 79fa9b75c0 [RS4GC] Revert optimization attempt due to memory corruption
This change reverts "246133 [RewriteStatepointsForGC] Reduce the number of new instructions for base pointers" and a follow on bugfix 12575.

As pointed out in pr25846, this code suffers from a memory corruption bug.  Since I'm (empirically) not going to get back to this any time soon, simply reverting the problematic change is the right answer.

llvm-svn: 261565
2016-02-22 20:45:56 +00:00
Justin Lebar ccbd8f5a02 Revert "[attrs] Handle convergent CallSites."
This reverts r261544, which was causing a test failure in
Transforms/FunctionAttrs/readattrs.ll.

llvm-svn: 261549
2016-02-22 18:24:43 +00:00
Justin Lebar 7bf9187abb [attrs] Handle convergent CallSites.
Summary:
Previously we had a notion of convergent functions but not of convergent
calls.  This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.

Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent.  As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.

Reviewers: chandlerc, jingyue

Subscribers: hfinkel, jhen, tra, llvm-commits

Differential Revision: http://reviews.llvm.org/D17317

llvm-svn: 261544
2016-02-22 17:51:35 +00:00
Benjamin Kramer 451f54cf62 Fix some abuse of auto flagged by clang's -Wrange-loop-analysis.
llvm-svn: 261524
2016-02-22 13:11:58 +00:00
Elena Demikhovsky 9914dbd11b Allow setting MaxRerollIterations above 16
By Ayal Zaks.

Differential Revision http://reviews.llvm.org/D17258

llvm-svn: 261517
2016-02-22 09:38:28 +00:00
Duncan P. N. Exon Smith e9bc579c37 ADT: Remove == and != comparisons between ilist iterators and pointers
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.

Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators.  (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)

There should be NFC here.

llvm-svn: 261498
2016-02-21 20:39:50 +00:00
Duncan P. N. Exon Smith ec6f7fed54 TransformUtils: Avoid getNodePtrUnchecked() in integer division, NFC
Stop relying on `getNodePtrUnchecked()` being useful on invalid
iterators.  This function is documented to be for internal use only, and
the pointer type will eventually have to change to remove UB from
ilist_iterator.  Instead, check the iterator before it has been
invalidated.

llvm-svn: 261497
2016-02-21 20:14:29 +00:00
Sanjay Patel 2440130437 fix inaccurate comment; NFC
llvm-svn: 261484
2016-02-21 17:33:31 +00:00
Sanjay Patel 368ac5dbf7 [InstCombine] add getNegativeIsTrueBoolVec() helper function; NFC
Originally part of:
http://reviews.llvm.org/D17485

We need this when simplifying masked memory ops too.

llvm-svn: 261483
2016-02-21 17:29:33 +00:00
Sanjoy Das 979a11d5b2 [LoopDeletion] Add an assert that verifies LCSSA
This is inspired by PR24804 -- had this assert been there before,
isolating the root cause for PR24804 would have been far easier.

llvm-svn: 261481
2016-02-21 17:11:59 +00:00
Simon Pilgrim 471efd244a [InstCombine] SSE/SSE2 (u)comiss/(u)comisd comparison intrinsics only use the lowest vector element
llvm-svn: 261460
2016-02-20 23:17:35 +00:00
Benjamin Kramer 7d537ae747 [SimplifyCFG] Use pointer identity to simplify predicate.
No functional change intended.

llvm-svn: 261427
2016-02-20 10:40:42 +00:00
David Majnemer 1efa23ddab [SimplifyCFG] Merge together cleanuppads
Cleanuppads may be merged together if one is the only predecessor of the
other in which case a simple transform can be performed: replace the
a cleanupret with a branch and remove an unnecessary cleanuppad.

Differential Revision: http://reviews.llvm.org/D17459

llvm-svn: 261390
2016-02-20 01:07:45 +00:00
Hans Wennborg a0f7090563 Revert r255691 "[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions."
It caused PR26509.

llvm-svn: 261368
2016-02-19 21:40:12 +00:00
Matthew Simpson 29c997c1a1 [LV] Vectorize first-order recurrences
This patch enables the vectorization of first-order recurrences. A first-order
recurrence is a non-reduction recurrence relation in which the value of the
recurrence in the current loop iteration equals a value defined in the previous
iteration. The load PRE of the GVN pass often creates these recurrences by
hoisting loads from within loops.

In this patch, we add a new recurrence kind for first-order phi nodes and
attempt to vectorize them if possible. Vectorization is performed by shuffling
the values for the current and previous iterations. The vectorization cost
estimate is updated to account for the added shuffle instruction.

Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D16197

llvm-svn: 261346
2016-02-19 17:56:08 +00:00
Silviu Baranga ad1dafb2c3 [LV] Fix PR26600: avoid out of bounds loads for interleaved access vectorization
Summary:
If we don't have the first and last access of an interleaved load group,
the first and last wide load in the loop can do an out of bounds
access. Even though we discard results from speculative loads,
this can cause problems, since it can technically generate page faults
(or worse).

We now discard interleaved load groups that don't have the first and
load in the group.

Reviewers: hfinkel, rengolin

Subscribers: rengolin, llvm-commits, mzolotukhin, anemet

Differential Revision: http://reviews.llvm.org/D17332

llvm-svn: 261331
2016-02-19 15:46:10 +00:00
Chandler Carruth 31088a9d58 [LPM] Factor all of the loop analysis usage updates into a common helper
routine.

We were getting this wrong in small ways and generally being very
inconsistent about it across loop passes. Instead, let's have a common
place where we do this. One minor downside is that this will require
some analyses like SCEV in more places than they are strictly needed.
However, this seems benign as these analyses are complete no-ops, and
without this consistency we can in many cases end up with the legacy
pass manager scheduling deciding to split up a loop pass pipeline in
order to run the function analysis half-way through. It is very, very
annoying to fix these without just being very pedantic across the board.

The only loop passes I've not updated here are ones that use
AU.setPreservesAll() such as IVUsers (an analysis) and the pass printer.
They seemed less relevant.

With this patch, almost all of the problems in PR24804 around loop pass
pipelines are fixed. The one remaining issue is that we run simplify-cfg
and instcombine in the middle of the loop pass pipeline. We've recently
added some loop variants of these passes that would seem substantially
cleaner to use, but this at least gets us much closer to the previous
state. Notably, the seven loop pass managers is down to three.

I've not updated the loop passes using LoopAccessAnalysis because that
analysis hasn't been fully wired into LoopSimplify/LCSSA, and it isn't
clear that those transforms want to support those forms anyways. They
all run late anyways, so this is harmless. Similarly, LSR is left alone
because it already carefully manages its forms and doesn't need to get
fused into a single loop pass manager with a bunch of other loop passes.

LoopReroll didn't use loop simplified form previously, and I've updated
the test case to match the trivially different output.

Finally, I've also factored all the pass initialization for the passes
that use this technique as well, so that should be done regularly and
reliably.

Thanks to James for the help reviewing and thinking about this stuff,
and Ben for help thinking about it as well!

Differential Revision: http://reviews.llvm.org/D17435

llvm-svn: 261316
2016-02-19 10:45:18 +00:00
Chandler Carruth ac07270828 [AA] Preserve the AA results wrapper pass as well as BasicAA in a few
more places to prevent gratuitous re-"runs" of these passes.

The passes themselves don't do any work when run, but we keep spending
time scheduling and running these needlessly when we really don't need
to do so.

This is the first patch towards fixing the really horrible loop pass
pipeline fragmentation pointed out by Sanjoy in PR24804.

llvm-svn: 261302
2016-02-19 03:12:14 +00:00
Lawrence Hu 84e6f1dd70 Bug fix: use dyn_cast_or_null instead of dyn_cast
Differential Revision: http://reviews.llvm.org/D17154

llvm-svn: 261299
2016-02-19 02:17:07 +00:00
Richard Trieu 7a08381403 Remove uses of builtin comma operator.
Cleanup for upcoming Clang warning -Wcomma.  No functionality change intended.

llvm-svn: 261270
2016-02-18 22:09:30 +00:00
Adam Nemet 9d9cb274ea [PPCLoopDataPrefetch] Move pass to Transforms/Scalar/LoopDataPrefetch. NFC
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).

Obviously the pass still only used from PPC at this point.  Subsequent
patches will start driving this from ARM64 as well.

Due to the previous patch most lines should show up as moved lines.

llvm-svn: 261265
2016-02-18 21:38:19 +00:00
Matthew Simpson 92821cb4a8 Reapply commit r259357 with a fix for PR26629
Commit r259357 was reverted because it caused PR26629. We were assuming all
roots of a vectorizable tree could be truncated to the same width, which is not
the case in general. This commit reapplies the patch along with a fix and a new
test case to ensure we don't regress because of this issue again. This should
fix PR26629.

llvm-svn: 261212
2016-02-18 14:14:40 +00:00
Chandler Carruth 9c4ed175c2 [PM] Port the PostOrderFunctionAttrs pass to the new pass manager and
convert one test to use this.

This is a particularly significant milestone because it required
a working per-function AA framework which can be queried over each
function from within a CGSCC transform pass (and additionally a module
analysis to be accessible). This is essentially *the* point of the
entire pass manager rewrite. A CGSCC transform is able to query for
multiple different function's analysis results. It works. The whole
thing appears to actually work and accomplish the original goal. While
we were able to hack function attrs and basic-aa to "work" in the old
pass manager, this port doesn't use any of that, it directly leverages
the new fundamental functionality.

For this to work, the CGSCC framework also has to support SCC-based
behavior analysis, etc. The only part of the CGSCC pass infrastructure
not sorted out at this point are the updates in the face of inlining and
running function passes that mutate the call graph.

The changes are pretty boring and boiler-plate. Most of the work was
factored into more focused preperatory patches. But this is what wires
it all together.

llvm-svn: 261203
2016-02-18 11:03:11 +00:00
Junmo Park 80440eb804 Minor code cleanup. NFC.
llvm-svn: 261200
2016-02-18 10:09:20 +00:00
Kostya Serebryany d4590c7304 [sanitizer-coverage] implement -fsanitize-coverage=trace-pc. This is similar to trace-bb, but has a different API. We already use the equivalent flag in GCC for Linux kernel fuzzing. We may be able to use this flag with AFL too
llvm-svn: 261159
2016-02-17 21:34:43 +00:00
Amaury Sechet da71cb7b92 NFC: Fix formating
llvm-svn: 261156
2016-02-17 21:21:29 +00:00
Haicheng Wu 57e1a3e6ee [LIR] Avoid turning non-temporal stores into memset
This is to fix PR26645.

llvm-svn: 261149
2016-02-17 21:00:06 +00:00
Adrian Prantl a5b2a64980 Debug Info: Teach LdStHasDebugValue() (Local.cpp) about DIExpressions.
This function is used to check whether a dbg.value intrinsic has already
been inserted, but without comparing the DIExpression, it would erroneously
fire on split aggregates and only the first scalar would survive.

Found via http://reviews.llvm.org/D16867.
<rdar://problem/24456528>

llvm-svn: 261145
2016-02-17 20:02:25 +00:00
Elena Demikhovsky 88e76cad16 Create masked gather and scatter intrinsics in Loop Vectorizer.
Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access.

The feature is enabled on AVX-512 target.
Differential Revision: http://reviews.llvm.org/D15690

llvm-svn: 261140
2016-02-17 19:23:04 +00:00
Amaury Sechet 61a7d629ec Fix load alignement when unpacking aggregates structs
Summary: Store and loads unpacked by instcombine do not always have the right alignement. This explicitely compute the alignement and set it.

Reviewers: dblaikie, majnemer, reames, hfinkel, joker.eph

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17326

llvm-svn: 261139
2016-02-17 19:21:28 +00:00
David Majnemer f48bcb2bd9 Revert "Reapply commit r258404 with fix."
This reverts commit r259357, it caused PR26629.

llvm-svn: 261137
2016-02-17 19:02:36 +00:00
Frederic Riss 009d60650d [ObjCARC] Handle ARCInstKind::ClaimRV in OptimizeIndividualCalls.
When support for objc_unsafeClaimAutoreleasedReturnValue has been added to the
ARC optimizer in r258970, one case was missed which would lead the optimizer
to execute an llvm_unreachable. In this case, just handle ClaimRV in the same
way we handle RetainRV.

llvm-svn: 261134
2016-02-17 18:51:27 +00:00
Mehdi Amini 1db10ac6ce Define the ThinLTO Pipeline (experimental)
Summary:
On the contrary to Full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker: both phases are parallel (even if
it is not totally "free": projects like clang are reusing product
from the "compile phase" for multiple link, think about
libLLVMSupport reused for opt, llc, etc.).

This pipeline is based on the proposal in D13443 for full LTO. We
didn't move forward on this proposal because the LTO link was far too
long after that. We believe that we can afford it with ThinLTO.

The ThinLTO pipeline integrates in the regular O2/O3 flow:

 - The compile phase perform the inliner with a somehow lighter
   function simplification. (TODO: tune the inliner thresholds here)
   This is intendend to simplify the IR and get rid of obvious things
   like linkonce_odr that will be inlined.
 - The link phase will run the pipeline from the start, extended with
   some specific passes that leverage the augmented knowledge we have
   during LTO. Especially after the inliner is done, a sequence of
   globalDCE/globalOpt is performed, followed by another run of the
   "function simplification" passes. It is not clear if this part
   of the pipeline will stay as is, as the split model of ThinLTO
   does not allow the same benefit as FullLTO without added tricks.

The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO and it will evolve, but this should provide a good starting
point.

Reviewers: tejohnson

Differential Revision: http://reviews.llvm.org/D17115

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261029
2016-02-16 23:02:29 +00:00
Mehdi Amini ec8bee1879 Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()" (NFC)
It is intended to contains the passes run over a function after the
inliner is done with a function and before it moves to its callers.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261028
2016-02-16 22:54:27 +00:00
Junmo Park 6ebdc14cf1 [SCEVExpander] Make findExistingExpansion smarter
Summary:
Extending findExistingExpansion can use existing value in ExprValueMap.
This patch gives 0.3~0.5% performance improvements on 
benchmarks(test-suite, spec2000, spec2006, commercial benchmark)
   
Reviewers: mzolotukhin, sanjoy, zzheng

Differential Revision: http://reviews.llvm.org/D15559

llvm-svn: 260938
2016-02-16 06:46:58 +00:00
Silviu Baranga ec7063ac77 [LV] Add support for insertelt/extractelt processing during type truncation
Summary:
While shrinking types according to the required bits, we can
encounter insert/extract element instructions. This will cause us to
reach an llvm_unreachable statement.

This change adds support for truncating insert/extract element
operations, and adds a regression test.

Reviewers: jmolloy

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17078

llvm-svn: 260893
2016-02-15 15:38:17 +00:00
Roman Gareev 036c08874a Tweak the LICM code to reuse the first sub-loop instead of creating a new one
LICM starts with an *empty* AST, and then merges in each sub-loop. While the
add code is appropriate for sub-loop 2 and up, it's utterly unnecessary for
sub-loop 1. If the AST starts off empty, we can just clone/move the contents
of the subloop into the containing AST.

Reviewed-by: Philip Reames <listmail@philipreames.com>

Differential Revision: http://reviews.llvm.org/D16753

llvm-svn: 260892
2016-02-15 14:48:50 +00:00
Benjamin Kramer 8a752e316d Use ArrayRef to hide SmallVector details, kill a useless vector copy along the way.
llvm-svn: 260824
2016-02-13 16:01:12 +00:00
Chandler Carruth 632d208c78 [attrs] Move the norecurse deduction to operate on the node set rather
than the SCC object, and have it scan the instruction stream directly
rather than relying on call records.

This makes the behavior of this routine consistent between libc routines
and LLVM intrinsics for libc routines. We can go and start teaching it
about those being norecurse, but we should behave the same for the
intrinsic and the libc routine rather than differently. I chatted with
James Molloy and the inconsistency doesn't seem intentional and likely
is due to intrinsic calls not being modelled in the call graph analyses.

This also fixes a bug where we would deduce norecurse on optnone
functions, when generally we try to handle optnone functions as-if they
were replaceable and thus unanalyzable.

llvm-svn: 260813
2016-02-13 08:47:51 +00:00
Keno Fischer 7c7c3e3591 [Cloning] Clone every Function's Debug Info
Summary:
Export the CloneDebugInfoMetadata utility, which clones all debug info
associated with a function into the first module. Also use this function
in CloneModule on each function we clone (the CloneFunction entrypoint
already does this).

Without this, cloning a module will lead to DI quality regressions,
especially since r252219 reversed the Function <-> DISubprogram edge
(before we could get lucky and have this edge preserved if the
DISubprogram itself was, e.g. due to location metadata).

This was verified to fix missing debug information in julia and
a unittest to verify the new behavior is included.

Patch by Yichao Yu! Thanks!

Reviewers: loladiro, pcc
Differential Revision: http://reviews.llvm.org/D17165

llvm-svn: 260791
2016-02-13 02:04:29 +00:00
Chad Rosier 81362a8599 [LIR] Allow merging of memsets in negatively strided loops.
Last part of PR25166.

llvm-svn: 260732
2016-02-12 21:03:23 +00:00
Justin Lebar 6086c6a387 Fix typo in comment.
llvm-svn: 260731
2016-02-12 21:01:37 +00:00
Justin Lebar db63949e8d [SimplifyCFG] Don't fold conditional branches that contain calls to convergent functions.
Summary:
Performing this optimization duplicates the call to the convergent
function and adds new control-flow dependencies, which is a no-no.

Reviewers: jingyue

Subscribers: broune, hfinkel, tra, resistor, joker.eph, arsenm, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17128

llvm-svn: 260730
2016-02-12 21:01:36 +00:00
Justin Lebar df04d2a1f1 [LoopRotate] Don't perform loop rotation if the loop header calls a convergent function.
Summary:
Calls to convergent functions can be duplicated, but only if the
duplicates are not control-flow dependent on any additional values.
Loop rotation doesn't meet the bar.

Reviewers: jingyue

Subscribers: mzolotukhin, llvm-commits, arsenm, joker.eph, resistor, tra, hfinkel, broune

Differential Revision: http://reviews.llvm.org/D17127

llvm-svn: 260729
2016-02-12 21:01:33 +00:00