Commit Graph

Philip Reames e0a5454df4 Fix the build
I screwed up rebasing r263072.  This change fixes the build and passes all of make check.

llvm-svn: 263073
2016-03-09 23:07:53 +00:00
Philip Reames b54c8e6eea [LICM] Store promotion when memory is thread local
This patch teaches LICM's implementation of store promotion to exploit the fact that the memory location being accessed might be provably thread-local. The fact that it's thread-local weakens the requirements for where we can insert stores, since no other thread can observe the write. This allows us to perform store promotion even in cases where the store is not guaranteed to execute in the loop.
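
To make this concrete, here is a hand-written sketch (not a test from the patch) of the kind of IR this enables promotion for: the store executes only conditionally, but the memory comes from a fresh, non-captured malloc, so no other thread can observe it.

define void @f(i32 %n) {
entry:
  %mem = call i8* @malloc(i64 4)
  %a = bitcast i8* %mem to i32*
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %latch ]
  %c = icmp slt i32 %i, 10
  br i1 %c, label %store.bb, label %latch
store.bb:
  ; Not guaranteed to execute, yet still promotable: the location is
  ; provably thread-local.
  store i32 %i, i32* %a
  br label %latch
latch:
  %i.next = add i32 %i, 1
  %done = icmp eq i32 %i.next, %n
  br i1 %done, label %exit, label %loop
exit:
  ret void
}
declare i8* @malloc(i64)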

Two key assumptions worth drawing out are a) that no-capture is strong enough to imply no-escape, and b) that standard allocation functions like malloc, calloc, and operator new return values which can be assumed not to have previously escaped.

In future work, it would be nice to generalize this so that it works without directly seeing the allocation site. I believe that the nocapture return attribute should be suitable for this purpose, but haven't investigated carefully. It's also likely that we could support unescaped allocas with similar reasoning, but since SROA and Mem2Reg should destroy those, they're less interesting than they first might seem.

Differential Revision: http://reviews.llvm.org/D16783

llvm-svn: 263072
2016-03-09 22:59:30 +00:00
Philip Reames 8f12eba78d [ValueTracking] Extract isKnownPositive [NFCI]
Extract out a generic interface from a recently landed patch and document a TODO in case compile time becomes a problem.

llvm-svn: 263062
2016-03-09 21:31:47 +00:00
Philip Reames ec8a8b5437 [InstCombine] (icmp sgt smin(PosA, B) 0) -> (icmp sgt B 0)
When checking whether an smin is positive, we can move the comparison to the other input when one input is known positive. If the known-positive input is the min, the other input can't be smaller than it and so can't be negative; if the other input is the min, comparing the min against zero is exactly comparing that input against zero.
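
A minimal hand-written sketch (not from the committed tests), with the smin in its canonical compare+select form:

define i1 @smin_positive(i32 %x, i32 %b) {
  ; %a is provably positive: the lshr clears the sign bit and the or
  ; makes the value nonzero.
  %t = lshr i32 %x, 1
  %a = or i32 %t, 1
  ; smin(%a, %b) in canonical form.
  %cmp = icmp slt i32 %a, %b
  %min = select i1 %cmp, i32 %a, i32 %b
  ; Since %a > 0, this is equivalent to: icmp sgt i32 %b, 0
  %res = icmp sgt i32 %min, 0
  ret i1 %res
}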

Differential Revision: http://reviews.llvm.org/D17873

llvm-svn: 263059
2016-03-09 21:05:07 +00:00
Adam Nemet 660748ca8c [LLE] Add missing check for unit stride
I somehow missed this.  The case in GCC (global_alloc) was similar to
the new testcase except it had an array of structs rather than a two
dimensional array.

Fixes PR26885.

llvm-svn: 263058
2016-03-09 20:47:55 +00:00
Matthias Braun c31032d607 InstCombine: Restrict computeKnownBits() on all Values to OptLevel > 2
As part of r251146, InstCombine was extended to call computeKnownBits on
every value in the function to determine whether it happens to be
constant. This increases typical compile time by 1-3% (5% in irgen+opt
time) in my measurements. On the other hand, this case did not trigger
once in the whole llvm-testsuite.

This patch introduces the notion of ExpensiveCombines, which are only
enabled for OptLevel > 2. I removed the check in InstructionSimplify, as
that is called from various places where the OptLevel is not known, but
given the rarity of the situation I think a check in InstCombine is
enough.
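
For illustration, a hand-written sketch (not taken from the patch) of the kind of constant the full known-bits sweep can prove:

define i32 @known_constant(i32 %x) {
  ; computeKnownBits sees every bit of %r: (%x | 255) has its low 8 bits
  ; set, so masking with 255 always yields the constant 255.
  %a = or i32 %x, 255
  %r = and i32 %a, 255
  ret i32 %r
}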

Differential Revision: http://reviews.llvm.org/D16835

llvm-svn: 263047
2016-03-09 18:47:11 +00:00
Petar Jovanovic 921c2b4eb3 Reland r262337 "calculate builtin_object_size if arg is a removable pointer"
Original commit message:
 calculate builtin_object_size if argument is a removable pointer

 This patch fixes calculating the correct value for builtin_object_size
 when the pointer is used only in the builtin_object_size call and never
 after that.

 Patch by Strahinja Petrovic.

 Differential Revision: http://reviews.llvm.org/D17337

Reland the original change with a small modification (first do a null check
and then do the cast) to satisfy ubsan.
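
A hand-written sketch of the pattern described above (the two-argument 2016-era form of the intrinsic is assumed):

define i64 @object_size_only() {
  ; %p is used only by the objectsize call and never afterwards; once the
  ; call is folded to 32, the allocation itself becomes removable.
  %p = call i8* @malloc(i64 32)
  %size = call i64 @llvm.objectsize.i64.p0i8(i8* %p, i1 false)
  ret i64 %size
}
declare i8* @malloc(i64)
declare i64 @llvm.objectsize.i64.p0i8(i8*, i1)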

llvm-svn: 263011
2016-03-09 14:12:47 +00:00
Adam Nemet 34785ecff1 [LoopDataPrefetch] Add stats and debug output
llvm-svn: 262998
2016-03-09 05:33:21 +00:00
Sanjoy Das 2eac48de9e Return StringRef instead of a naked char*; NFC
llvm-svn: 262989
2016-03-09 02:34:19 +00:00
Sanjoy Das f13900f8ac [IRCE] Reflow comments; NFC
llvm-svn: 262988
2016-03-09 02:34:15 +00:00
Mehdi Amini bd04e8fed6 FunctionIndex is not optional for renameModuleForThinLTO(), make it a reference (NFC)
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 262976
2016-03-09 01:37:14 +00:00
Sanjay Patel f831fdb56a fix variable name; NFC
llvm-svn: 262953
2016-03-08 19:07:42 +00:00
Sanjay Patel 5c96723622 use range-based loop; NFCI
llvm-svn: 262952
2016-03-08 19:06:12 +00:00
Sanjay Patel eaf06851d0 rangify, fix function names; NFCI
llvm-svn: 262940
2016-03-08 17:12:32 +00:00
Sanjay Patel 5b8d741632 don't repeat function names in documentation comments; NFC
llvm-svn: 262937
2016-03-08 16:26:39 +00:00
Junmo Park 974eb0a96d Revert "[InstCombine] Combine A->B->A BitCast"
This reverts commit r262670 due to compile failure.

llvm-svn: 262916
2016-03-08 07:09:46 +00:00
Peter Collingbourne 3866cc5f69 Fix evaluation order. Spotted by Alexander Riccio!
llvm-svn: 262907
2016-03-08 03:50:36 +00:00
Easwaran Raman b1bd398ceb Revert revisions 262636, 262643, 262679, and 262682.
llvm-svn: 262883
2016-03-08 00:36:35 +00:00
Anna Zaks c1efa64c63 [tsan] Add support for pointer typed atomic stores, loads, and cmpxchg
TSan instrumentation functions for atomic stores, loads, and cmpxchg work on
integer value types. This patch adds casts before calling TSan instrumentation
functions in cases where the value is a pointer.
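
A hand-written sketch of the shape of the instrumentation; the runtime entry point and the seq_cst ordering encoding are assumptions, not taken from the patch:

define void @store_ptr(i8** %p, i8* %v) {
  ; Before instrumentation: store atomic i8* %v, i8** %p seq_cst, align 8
  ; The runtime callback is integer-typed, so cast value and address first.
  %v.int = ptrtoint i8* %v to i64
  %p.int = bitcast i8** %p to i64*
  ; 5 = seq_cst in the runtime's ordering encoding (assumed).
  call void @__tsan_atomic64_store(i64* %p.int, i64 %v.int, i32 5)
  ret void
}
declare void @__tsan_atomic64_store(i64*, i64, i32)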

Differential Revision: http://reviews.llvm.org/D17833

llvm-svn: 262876
2016-03-07 23:16:23 +00:00
Adam Nemet bb3680bd85 [LoopDataPrefetch] If prefetch distance is not set, skip pass
This lets select sub-targets enable this pass.  The patch implements the
idea from the recent llvm-dev thread:
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/94925

The goal is to enable the LoopDataPrefetch pass for the Cyclone
sub-target only within AArch64.

Positive and negative tests will be included in an upcoming patch that
enables selective prefetching of large-strided accesses on Cyclone.
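
For context, when the pass does run it inserts @llvm.prefetch calls ahead of strided accesses; a hand-written sketch (%next.gep is a made-up name for the address a few iterations ahead):

define void @prefetch_ahead(double* %next.gep) {
  ; Read prefetch (0), maximal temporal locality (3), data cache (1).
  %addr = bitcast double* %next.gep to i8*
  call void @llvm.prefetch(i8* %addr, i32 0, i32 3, i32 1)
  ret void
}
declare void @llvm.prefetch(i8*, i32, i32, i32)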

llvm-svn: 262844
2016-03-07 18:35:42 +00:00
Adam Nemet 81113ef68c Revert "Enable LoopLoadElimination by default"
This reverts commit r262250.

It causes SPEC2006/gcc to generate wrong result (166.s) in AArch64 when
running with *ref* data set.  The error happens with
"-Ofast -flto -fuse-ld=gold" or "-O3 -fno-strict-aliasing".

llvm-svn: 262839
2016-03-07 17:38:02 +00:00
Chandler Carruth 9ca96384f3 [DFSan] Remove an overly aggressive assert reported in PR26068.
This code has been successfully used to bootstrap libc++ in a no-asserts
mode for a very long time, so the code that follows cannot be completely
incorrect. I've added a test that shows the current behavior for this
kind of code with DFSan. If it is desirable for DFSan to do something
special when processing an invoke of a variadic function, it can be
added, but we shouldn't keep an assert that we've effectively been
ignoring in release builds anyway.

llvm-svn: 262829
2016-03-07 14:05:09 +00:00
Rong Xu ecdc98fdae [PGO] Add a command-line option to control the number of VP annotation metadata entries.
llvm-svn: 262750
2016-03-04 22:08:44 +00:00
Easwaran Raman 3b7a8246c9 Fix a use-after-free bug introduced in r262636
llvm-svn: 262679
2016-03-04 00:44:01 +00:00
Guozhi Wei 92e9d0e80e [InstCombine] Combine A->B->A BitCast
This patch enhances InstCombine to handle the following case:

        A  ->  B    bitcast
        PHI
        B  ->  A    bitcast
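
A concrete hand-written instance of the pattern:

define float @f(float %x, float %y, i1 %c) {
entry:
  br i1 %c, label %a, label %b
a:
  %xi = bitcast float %x to i32        ; A -> B
  br label %merge
b:
  %yi = bitcast float %y to i32        ; A -> B
  br label %merge
merge:
  %phi = phi i32 [ %xi, %a ], [ %yi, %b ]
  %r = bitcast i32 %phi to float       ; B -> A
  ; The combine rewrites %phi to operate on float and deletes all
  ; three bitcasts.
  ret float %r
}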

llvm-svn: 262670
2016-03-03 23:21:38 +00:00
Sanjay Patel 9bba75084b [InstCombine] transform bitcasted bitwise logic ops with constants (PR26702)
Given that we're not actually reducing the instruction count in the included
regression tests, I think we would call this a canonicalization step.

The motivation comes from the example in PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702

If we hoist the bitwise logic ahead of the bitcast, the previously unoptimizable
example of:

define <4 x i32> @is_negative(<4 x i32> %x) {
  %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
  %not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
  %bc = bitcast <4 x i32> %not to <2 x i64>
  %notnot = xor <2 x i64> %bc, <i64 -1, i64 -1>
  %bc2 = bitcast <2 x i64> %notnot to <4 x i32>
  ret <4 x i32> %bc2
}

Simplifies to the expected:

define <4 x i32> @is_negative(<4 x i32> %x) {
  %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
  ret <4 x i32> %lobit
}

Differential Revision: http://reviews.llvm.org/D17583

llvm-svn: 262645
2016-03-03 19:19:04 +00:00
Easwaran Raman 3035719c86 Infrastructure for PGO enhancements in inliner
This patch provides the following infrastructure for PGO enhancements in the inliner:

Enable the use of block-level profile information in the inliner
Incrementally update block frequency information during inlining
Update the function entry counts of callees when they get inlined into callers.

Differential Revision: http://reviews.llvm.org/D16381

llvm-svn: 262636
2016-03-03 18:26:33 +00:00
Dehao Chen 57d1dda558 Use LineLocation instead of CallsiteLocation to index callsite profile.
Summary: With a discriminator, LineLocation can uniquely identify a callsite without the need to specify the callee name. Remove the callee function name from the key, and put it in the value (FunctionSamples).

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17827

llvm-svn: 262634
2016-03-03 18:09:32 +00:00
Matthew Simpson b840a6d6f4 [LoopUtils, LV] Fix PR26734
The vectorization of first-order recurrences (r261346) caused PR26734. When
detecting these recurrences, we need to ensure that the previous value is
actually defined inside the loop. This patch includes the fix and test case.
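
A hand-written sketch of a first-order recurrence of the kind being vectorized; the point of the fix is that %val, the "previous" value feeding the PHI, must itself be defined inside the loop:

define void @recur(i32* %a, i32* %b, i64 %n) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  ; %prev carries %val from the previous iteration.
  %prev = phi i32 [ 0, %entry ], [ %val, %loop ]
  %gep = getelementptr inbounds i32, i32* %a, i64 %i
  %val = load i32, i32* %gep
  %sum = add i32 %val, %prev
  %out = getelementptr inbounds i32, i32* %b, i64 %i
  store i32 %sum, i32* %out
  %i.next = add nuw nsw i64 %i, 1
  %cmp = icmp slt i64 %i.next, %n
  br i1 %cmp, label %loop, label %exit
exit:
  ret void
}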

llvm-svn: 262624
2016-03-03 16:12:01 +00:00
Amaury Sechet 3b8b2ea2e1 Explode store of arrays in instcombine
Summary: This is the last step toward supporting aggregate memory accesses in instcombine. This explodes stores of arrays into a series of stores, one for each element, allowing them to be optimized.
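
A hand-written sketch of the transform on a two-element array (not one of the committed tests):

define void @explode([2 x i32] %agg, [2 x i32]* %p) {
  ; Before: store [2 x i32] %agg, [2 x i32]* %p
  ; After: one scalar store per element.
  %e0 = extractvalue [2 x i32] %agg, 0
  %p0 = getelementptr inbounds [2 x i32], [2 x i32]* %p, i64 0, i64 0
  store i32 %e0, i32* %p0
  %e1 = extractvalue [2 x i32] %agg, 1
  %p1 = getelementptr inbounds [2 x i32], [2 x i32]* %p, i64 0, i64 1
  store i32 %e1, i32* %p1
  ret void
}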

Reviewers: joker.eph, reames, hfinkel, majnemer, mgrang

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17828

llvm-svn: 262530
2016-03-02 22:36:45 +00:00
Amaury Sechet 7cd3fe7db6 Unpack array of all sizes in InstCombine
Summary: This is another step toward improving FCA support. This unpacks loads of arrays into a series of loads of the array's elements.
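
The mirror image of the store case above; a hand-written sketch:

define [2 x float] @unpack([2 x float]* %p) {
  ; Before: %v = load [2 x float], [2 x float]* %p
  ; After: load each element and rebuild the aggregate value.
  %p0 = getelementptr inbounds [2 x float], [2 x float]* %p, i64 0, i64 0
  %e0 = load float, float* %p0
  %p1 = getelementptr inbounds [2 x float], [2 x float]* %p, i64 0, i64 1
  %e1 = load float, float* %p1
  %v0 = insertvalue [2 x float] undef, float %e0, 0
  %v1 = insertvalue [2 x float] %v0, float %e1, 1
  ret [2 x float] %v1
}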

Reviewers: chandlerc, joker.eph, majnemer, reames, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15890

llvm-svn: 262521
2016-03-02 21:28:30 +00:00
Daniel Berlin 6412002d24 Really fix ASAN leak/etc issues with MemorySSA unittests
llvm-svn: 262519
2016-03-02 21:16:28 +00:00
Daniel Berlin 989e601b26 Revert "Fix ASAN detected errors in code and test" (it was not meant to be committed yet)
This reverts commit 890bbccd600ba1eb050353d06a29650ad0f2eb95.

llvm-svn: 262512
2016-03-02 20:36:22 +00:00
Daniel Berlin 27ed1c2eb0 Fix ASAN detected errors in code and test
llvm-svn: 262511
2016-03-02 20:27:29 +00:00
Chandler Carruth 12884f7f80 [AA] Hoist the logic to reformulate various AA queries in terms of other
parts of the AA interface out of the base class of every single AA
result object.

Because this logic reformulates the query in terms of some other aspect
of the API, it would easily cause O(n^2) query patterns in alias
analysis. These could in turn be magnified further based on the number
of call arguments, and then further based on the number of AA queries
made for a particular call. This ended up causing problems for Rust that
were actually noticeable enough to get a bug (PR26564) and probably other
places as well.

When originally re-working the AA infrastructure, the desire was to
regularize the pattern of refinement without losing any generality.
While I think it was successful, that is clearly proving to be too
costly. And the cost is needless: we gain no actual improvement from this
generality of having a direct query to TBAA be able to re-use some other
alias analysis's refinement logic for one of the other APIs, or some such.
In short, this is entirely wasted work.

To the extent possible, delegation to other API surfaces should be done
at the aggregation layer so that we can avoid re-walking the
aggregation. In fact, this significantly simplifies the logic as we no
longer need to smuggle the aggregation layer into each alias analysis
(or the TargetLibraryInfo into each alias analysis just so we can form
argument memory locations!).

However, we also have some delegation logic inside of BasicAA and some
of it even makes sense. When the delegation logic is baking in specific
knowledge of aliasing properties of the LLVM IR, as opposed to simply
reformulating the query to utilize a different alias analysis interface
entry point, it makes a lot of sense to restrict that logic to
a different layer such as BasicAA. So one aspect of the delegation that
was in every AA base class is that when we don't have operand bundles,
we re-use function AA results as a fallback for callsite alias results.
This relies on the IR properties of calls and functions w.r.t. aliasing,
and so seems a better fit to BasicAA. I've lifted the logic up to that
point where it seems to be a natural fit. This still does a bit of
redundant work (we query function attributes twice, once via the
callsite and once via the function AA query) but it is *exactly* twice
here, no more.

The end result is that all of the delegation logic is hoisted out of the
base class and into either the aggregation layer when it is a pure
retargeting to a different API surface, or into BasicAA when it relies
on the IR's aliasing properties. This should fix the quadratic query
pattern reported in PR26564, although I don't have a stand-alone test
case to reproduce it.

It also seems general goodness. Now the numerous AAs that don't need
target library info don't carry it around and depend on it. I think
I can even rip out the general access to the aggregation layer and only
expose that in BasicAA as it is the only place where we re-query in that
manner.

However, this is a non-trivial change to the AA infrastructure so I want
to get some additional eyes on this before it lands. Sadly, it can't
wait long because we should really cherry-pick this into 3.8 if we're
going to go this route.

Differential Revision: http://reviews.llvm.org/D17329

llvm-svn: 262490
2016-03-02 15:56:53 +00:00
George Burgess IV e0e6e48b29 Attempt to fix ASAN failure in a MemorySSA test.
llvm-svn: 262452
2016-03-02 02:35:04 +00:00
Sanjay Patel 5e4c46de6d revert r262424 because there's a *clang test* for AArch64 that checks -O3 asm output
that is broken by this change

llvm-svn: 262440
2016-03-02 01:04:09 +00:00
Sanjay Patel 147e927957 [InstCombine] convert 'isPositive' and 'isNegative' vector comparisons to shifts (PR26701)
As noted in the code comment, I don't think we can do the same transform that we do for
*scalar* integer comparisons on *vector* integer comparisons because it might pessimize
the general case.

Exhibit A for an incomplete integer comparison ISA remains x86 SSE/AVX: it only has EQ and GT
for integer vectors.

But we should now recognize all the variants of this construct and produce the optimal code
for the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701
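
A hand-written sketch of the 'isNegative' variant (the compare form is recognized and emitted as the shift form):

define <4 x i32> @is_negative(<4 x i32> %x) {
  ; The compare+sext form of "broadcast the sign bit"...
  %cmp = icmp slt <4 x i32> %x, zeroinitializer
  %sext = sext <4 x i1> %cmp to <4 x i32>
  ret <4 x i32> %sext
}
; ...is now produced as the shift form:
;   %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>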
 

llvm-svn: 262424
2016-03-01 23:55:18 +00:00
Dehao Chen 1012be120a Perform InstructionCombiningPass before the SampleProfile pass.
Summary: The SampleProfile pass needs to run after InstructionCombiningPass, which helps eliminate un-inlinable function calls.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17742

llvm-svn: 262419
2016-03-01 22:53:02 +00:00
Owen Anderson 7ea02fc787 Fix an issue where fast math flags were dropped during scalarization.
Most portions of InstCombine properly propagate fast math flags, but
apparently the vector scalarization section was overlooked.
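
A hand-written sketch of the hazard: when the fmul is scalarized through the insert/extract chain, the 'fast' flag has to survive.

define float @keep_fmf(float %x) {
  %v = insertelement <2 x float> undef, float %x, i32 0
  %m = fmul fast <2 x float> %v, <float 2.0, float 2.0>
  %e = extractelement <2 x float> %m, i32 0
  ; The scalarized result must remain: fmul fast float %x, 2.0
  ret float %e
}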

llvm-svn: 262376
2016-03-01 19:35:52 +00:00
Daniel Berlin 83fc77b4c0 Add the beginnings of an update API for preserving MemorySSA
Summary:
This adds the beginning of an update API to preserve MemorySSA.  In particular,
this patch adds a way to remove memory SSA accesses when instructions are
deleted.

It also adds relevant unit testing infrastructure for MemorySSA's API.

(There is an actual user of this API; I will make that diff dependent on this one.  In practice, a ton of opt passes remove memory instructions, so it's hopefully an obviously useful API :P)

Reviewers: hfinkel, reames, george.burgess.iv

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17157

llvm-svn: 262362
2016-03-01 18:46:54 +00:00
Petar Jovanovic 6315f3f9b7 Revert "calculate builtin_object_size if argument is a removable pointer"
Revert r262337 as the "check-llvm ubsan" step failed on the
sanitizer-x86_64-linux-fast buildbot.

llvm-svn: 262349
2016-03-01 16:50:08 +00:00
Petar Jovanovic 8aef99aa86 calculate builtin_object_size if argument is a removable pointer
This patch fixes calculating the correct value for builtin_object_size
when the pointer is used only in the builtin_object_size call and never
after that.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D17337

llvm-svn: 262337
2016-03-01 14:39:55 +00:00
Sanjay Patel 6f2c01f712 [x86, InstCombine] transform more x86 masked loads to LLVM intrinsics
Continuation of:
http://reviews.llvm.org/rL262269

llvm-svn: 262273
2016-02-29 23:59:00 +00:00
Adam Nemet efc091f457 [LLE] Fix a comment
llvm-svn: 262270
2016-02-29 23:21:12 +00:00
Sanjay Patel 98a71505f5 [x86, InstCombine] transform x86 AVX masked loads to LLVM intrinsics
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392
http://reviews.llvm.org/rL260145

is that customers using the AVX intrinsics in C will benefit from combines when
the load mask is constant:

__m128 mload_zeros(float *f) {
  return _mm_maskload_ps(f, _mm_set1_epi32(0));
}

__m128 mload_fakeones(float *f) {
  return _mm_maskload_ps(f, _mm_set1_epi32(1));
}

__m128 mload_ones(float *f) {
  return _mm_maskload_ps(f, _mm_set1_epi32(0x80000000));
}

__m128 mload_oneset(float *f) {
  return _mm_maskload_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0));
}

...so none of the above will actually generate a masked load for optimized code.

This is the masked load counterpart to:
http://reviews.llvm.org/rL262064

llvm-svn: 262269
2016-02-29 23:16:48 +00:00
Adam Nemet 83be06e529 [LLE] Fix SingleSource/Benchmarks/Polybench/stencils/jacobi-2d-imper with Polly
We can actually have dependences between accesses with different
underlying types.  Bail in this case.

A test will follow shortly.
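
A hand-written sketch of what "different underlying types" means here:

define float @typed_dep(i32* %p, i32 %v) {
  ; The store and the load overlap even though their types differ, so
  ; type equality cannot be used to rule out a dependence.
  store i32 %v, i32* %p
  %q = bitcast i32* %p to float*
  %f = load float, float* %q
  ret float %f
}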

llvm-svn: 262267
2016-02-29 22:53:59 +00:00
Adam Nemet dd9e637aca Enable LoopLoadElimination by default
Summary:
I re-benchmarked this and the results are similar to the original
results in D13259:

On ARM64:
  SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -59.27%
  SingleSource/Benchmarks/Polybench/stencils/adi                   -19.78%

On x86:
  SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog  -27.14%

And of course the original ~20% gain on SPECint_2006/456.hmmer with Loop
Distribution.

In terms of compile time, there is a ~5% increase on both
SingleSource/Benchmarks/Misc/oourafft and
SingleSource/Benchmarks/Linpack/linpack-pc.  These are both very tiny
loop-intensive programs where SCEV computations dominate compile time.

The reason that time spent in SCEV increases has to do with the design
of the old pass manager.  If a transform pass does not preserve an
analysis, we *invalidate* the analysis even if there was *no*
modification made by the transform pass.

This means that currently we don't take advantage of LLE and LV sharing
the same analysis (LAA) and unfortunately we recompute LAA *and* SCEV
for LLE.

(There should be a way to work around this limitation in the case of
SCEV and LAA, since both compute things on demand and internally cache
their results.  Thus we could pretend that transform passes preserve
these analyses and manually invalidate them upon actual modification.
On the other hand, the new pass manager is supposed to solve this, so
I am not sure if it is worthwhile.)

Reviewers: hfinkel, dberlin

Subscribers: dberlin, reames, mssimpso, aemerson, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D16300

llvm-svn: 262250
2016-02-29 20:35:11 +00:00
Rong Xu 9e926e8b92 Minor code cleanup. NFC
llvm-svn: 262242
2016-02-29 19:16:04 +00:00
Dehao Chen 939993ff2f Move discriminator assignment to the right place.
Summary: Now discriminators are assigned per-function instead of per-module.

Reviewers: davidxl, dnovillo

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D17664

llvm-svn: 262240
2016-02-29 18:59:48 +00:00