Commit Graph

4563 Commits

Author SHA1 Message Date
Hal Finkel f5867a79c5 Canonicalization for @llvm.assume
Adds simple logical canonicalization of assumption intrinsics to instcombine,
currently:
 - invariant(a && b) -> invariant(a); invariant(b)
 - invariant(!(a || b)) -> invariant(!a); invariant(!b)

llvm-svn: 213977
2014-07-25 21:45:17 +00:00
Hal Finkel 930469107d Add @llvm.assume, lowering, and some basic properties
This is the first commit in a series that add an @llvm.assume intrinsic which
can be used to provide the optimizer with a condition it may assume to be true
(when the control flow would hit the intrinsic call). Some basic properties are added here:

 - llvm.invariant(true) is dead.
 - llvm.invariant(false) is unreachable (this directly corresponds to the
   documented behavior of MSVC's __assume(0)), so is llvm.invariant(undef).

The intrinsic is tagged as writing arbitrarily, in order to maintain control
dependencies. BasicAA has been updated, however, to return NoModRef for any
particular location-based query so that we don't unnecessarily block code
motion.

llvm-svn: 213973
2014-07-25 21:13:35 +00:00
Hal Finkel ff0bcb60c9 Convert noalias parameter attributes into noalias metadata during inlining
This functionality is currently turned off by default.

Part of the motivation for introducing scoped-noalias metadata is to enable the
preservation of noalias parameter attribute information after inlining.
Sometimes this can be inferred from the code in the caller after inlining, but
often we simply lose valuable information.

The overall process if fairly simple:
 1. Create a new unqiue scope domain.
 2. For each (used) noalias parameter, create a new alias scope.
 3. For each pointer, collect the underlying objects. Add a noalias scope for
    each noalias parameter from which we're not derived (and has not been
    captured prior to that point).
 4. Add an alias.scope for each noalias parameter from which we might be
    derived (or has been captured before that point).

Note that the capture checks apply only if one of the underlying objects is not
an identified function-local object.

llvm-svn: 213949
2014-07-25 15:50:08 +00:00
Mark Heffernan 8ec1474f7f After unrolling a loop with llvm.loop.unroll.count metadata (unroll factor
hint) the loop unroller replaces the llvm.loop.unroll.count metadata with
llvm.loop.unroll.disable metadata to prevent any subsequent unrolling
passes from unrolling more than the hint indicates.  This patch fixes
an issue where loop unrolling could be disabled for other loops as well which
share the same llvm.loop metadata.

llvm-svn: 213900
2014-07-24 22:36:40 +00:00
Manman Ren 29a2005596 Try to fix the bots again by moving test to X86 directory.
llvm-svn: 213884
2014-07-24 17:57:09 +00:00
Manman Ren a8bc9a4c36 Try to fix the bots. If this does not work, I am going to move it to X86 directory.
llvm-svn: 213880
2014-07-24 17:18:33 +00:00
Hal Finkel 9414665a3b Add scoped-noalias metadata
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
  1. To preserve noalias function attribute information when inlining
  2. To provide the ability to model block-scope C99 restrict pointers

Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.

What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:

!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }

Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:

... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }

When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.

Note that is the first element of the scope metadata is a string, then it can
be combined accross functions and translation units. The string can be replaced
by a self-reference to create globally unqiue scope identifiers.

[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]

Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functons have the metadata imply that a1 does not alias with b2.

llvm-svn: 213864
2014-07-24 14:25:39 +00:00
Manman Ren edc60376ed SimplifyCFG: fix a bug in switch to table conversion
We use gep to access the global array "switch.table", and the table index
should be treated as unsigned. When the highest bit is 1, this commit
zero-extends the index to an integer type with larger size.

For a switch on i2, we used to generate:
%switch.tableidx = sub i2 %0, -2
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx

It is incorrect when %switch.tableidx is 2 or 3. The fix is to generate
%switch.tableidx = sub i2 %0, -2
%switch.tableidx.zext = zext i2 %switch.tableidx to i3
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext

rdar://17735071

llvm-svn: 213815
2014-07-23 23:13:23 +00:00
David Blaikie 8e9cfa5497 ArgPromo+DebugInfo: Handle updating debug info over multiple applications of argument promotion.
While the subprogram map cache used by Dead Argument Elimination works
there, I made a mistake when reusing it for Argument Promotion in
r212128 because ArgPromo may transform functions more than once whereas
DAE transforms each function only once, removing all the dead arguments
in one go.

To address this, ensure that the map is updated after each argument
promotion.

In retrospect it might be a little wasteful to create a map of all
subprograms when only handling a single CGSCC, but the alternative is
walking the debug info for each function in the CGSCC that gets updated.
It's not clear to me what the right tradeoff is there, but since the
current tradeoff seems to be working OK (and the code to keep things
updated is very cheap), let's stick with that for now.

llvm-svn: 213805
2014-07-23 22:09:29 +00:00
David Blaikie f997c6f90b Test debug info in arg promotion with an actual promotion case, rather than a degenerate arg promotion that's actually DAE performed by ArgPromo
Also the debug location I had here was bogus, describing the location of
the call site as in the callee - and unnecessary, so just drop it.

llvm-svn: 213803
2014-07-23 21:30:59 +00:00
Mark Heffernan 9e112443b6 Do not add unroll disable metadata after unrolling pass for loops with #pragma clang loop unroll(full).
llvm-svn: 213789
2014-07-23 20:05:44 +00:00
Mark Heffernan e6b4ba1c41 In unroll pragma syntax and loop hint metadata, change "enable" forms to a new form using the string "full".
llvm-svn: 213772
2014-07-23 17:31:37 +00:00
Nick Lewycky aba900c252 We may visit a call that uses an alloca multiple times in callUsesLocalStack, sometimes with IsNocapture true and sometimes with IsNocapture false. We accidentally skipped work we needed to do in the IsNocapture=false case if we were called with IsNocapture=true the first time. Fixes PR20405!
llvm-svn: 213726
2014-07-23 06:24:49 +00:00
Suyog Sarda 3a8c2c1e6c This patch implements optimization as mentioned in PR19753: Optimize comparisons with "ashr/lshr exact" of a constanst.
It handles the errors which were seen in PR19958 where wrong code was being emitted due to earlier patch.
Added code for lshr as well as non-exact right shifts.

It implements : 
(icmp eq/ne (ashr/lshr const2, A), const1)" ->
(icmp eq/ne A, Log2(const2/const1)) ->
(icmp eq/ne A, Log2(const2) - Log2(const1))

Differential Revision: http://reviews.llvm.org/D4068
 

llvm-svn: 213678
2014-07-22 19:19:36 +00:00
Suyog Sarda b60ec909ca Added InstCombine transform for pattern "(A & B) ^ (A ^ B) -> (A | B)"
Patch idea by Ankit Jain !

Differential Revision: http://reviews.llvm.org/D4618

llvm-svn: 213677
2014-07-22 18:30:54 +00:00
Suyog Sarda d64faf6cae Added InstCombine Transform for patterns:
"((~A & B) | A) -> (A | B)" and "((A & B) | ~A) -> (~A | B)"

Original Patch credit to Ankit Jain !!

Differential Revision: http://reviews.llvm.org/D4591

llvm-svn: 213676
2014-07-22 18:09:41 +00:00
Hal Finkel ccc7090671 Make use of the align parameter attribute for all pointer arguments
We previously supported the align attribute on all (pointer) parameters, but we
only used it for byval parameters. However, it is completely consistent at the
IR level to treat 'align n' on all pointer parameters as an alignment
assumption on the pointer, and now we wll. Specifically, this causes
computeKnownBits to use the align attribute on all pointer parameters, not just
byval parameters. I've also added an explicit parameter attribute test for this
to test/Bitcode/attributes.ll.

And I've updated the LangRef to document the align parameter attribute (as it
turns out, it was not documented at all previously, although the byval
documentation mentioned that it could be used).

There are (at least) two benefits to doing this:
 - It allows enhancing alignment based on the pointer alignment after inlining callees.
 - It allows simplification of pointer arithmetic.

llvm-svn: 213670
2014-07-22 16:58:55 +00:00
Suyog Sarda 521237cad6 This patch implements transform for pattern "(A | B) ^ (~A) -> (A | ~B)".
Patch Credit to Ankit Jain !!

Differential Revision: http://reviews.llvm.org/D4588

llvm-svn: 213662
2014-07-22 15:37:39 +00:00
Mark Heffernan 9d20e42765 Rename metadata llvm.loop.vectorize.unroll to llvm.loop.vectorize.interleave.
llvm-svn: 213588
2014-07-21 23:11:03 +00:00
Hal Finkel 7ae00a1282 [LoopVectorize] Use AA to partition potential dependency checks
Prior to this change, the loop vectorizer did not make use of the alias
analysis infrastructure. Instead, it performed memory dependence analysis using
ScalarEvolution-based linear dependence checks within equivalence classes
derived from the results of ValueTracking's GetUnderlyingObjects.

Unfortunately, this meant that:
  1. The loop vectorizer had logic that essentially duplicated that in BasicAA
     for aliasing based on identified objects.
  2. The loop vectorizer could not partition the space of dependency checks
     based on information only easily available from within AA (TBAA metadata is
     currently the prime example).

This means, for example, regardless of whether -fno-strict-aliasing was
provided, the vectorizer would only vectorize this loop with a runtime
memory-overlap check:

void foo(int *a, float *b) {
  for (int i = 0; i < 1600; ++i)
    a[i] = b[i];
}

This is suboptimal because the TBAA metadata already provides the information
necessary to show that this check unnecessary. Of course, the vectorizer has a
limit on the number of such checks it will insert, so in practice, ignoring
TBAA means not vectorizing more-complicated loops that we should.

This change causes the vectorizer to use an AliasSetTracker to keep track of
the pointers in the loop. The resulting alias sets are then used to partition
the space of dependency checks, and potential runtime checks; this results in
more-efficient vectorizations.

When pointer locations are added to the AliasSetTracker, two things are done:
  1. The location size is set to UnknownSize (otherwise you'd not catch
     inter-iteration dependencies)
  2. For instructions in blocks that would need to be predicated, TBAA is
     removed (because the metadata might have a control dependency on the condition
     being speculated).

For non-predicated blocks, you can leave the TBAA metadata. This is safe
because you can't have an iteration dependency on the TBAA metadata (if you
did, and you unrolled sufficiently, you'd end up with the same pointer value
used by two accesses that TBAA says should not alias, and that would yield
undefined behavior).

llvm-svn: 213486
2014-07-20 23:07:52 +00:00
Hal Finkel 4f7d55aac8 [LoopVectorize] Propagate known metadata to vectorized instructions
There are some kinds of metadata that are safe to propagate from the scalar
instructions to the vector instructions (fpmath and tbaa currently).

Regarding TBAA, one might worry about propagating it on if-converted loads and
stores, because the metadata might have had a control dependency on the
condition, and thus actually aliased with some other non-speculated memory
access when the condition was false. However, this would be caught by the
runtime overlap checks.

llvm-svn: 213452
2014-07-19 13:33:16 +00:00
Hal Finkel 9e440c08a9 Make Value::isDereferenceablePointer handle offsets to pointer types with dereferenceable attributes
When we have a parameter (or call site return) with a dereferenceable
attribute, it can specify the size of an array pointed to by that parameter. If
we have a value for which we can accumulate a constant offset to such a
parameter, then we can use that offset in a direct comparison with the size
specified by the dereferenceable attribute.

This enables us to handle cases like this:

  int foo(int a[static 3]) {
    return a[2]; /* this is always dereferenceable */
  }

llvm-svn: 213447
2014-07-19 03:25:16 +00:00
Mark Heffernan 053a68688a Remove unroll pragma metadata after it is used.
llvm-svn: 213412
2014-07-18 21:04:33 +00:00
Gerolf Hoflehner f27ae6cdcf MergedLoadStoreMotion pass
Merges equivalent loads on both sides of a hammock/diamond
and hoists into into the header.
Merges equivalent stores on both sides of a hammock/diamond
and sinks it to the footer.
Can enable if conversion and tolerate better load misses
and store operand latencies.

llvm-svn: 213396
2014-07-18 19:13:09 +00:00
Hal Finkel b0407ba071 Add a dereferenceable attribute
This attribute indicates that the parameter or return pointer is
dereferenceable. Practically speaking, loads from such a pointer within the
associated byte range are safe to speculatively execute. Such pointer
parameters are common in source languages (C++ references, for example).

llvm-svn: 213385
2014-07-18 15:51:28 +00:00
Matt Arsenault 3dd43fc75d R600: Implement TTI:getPopcntSupport
The test is just copied from X86, and I don't know of a better
way to test it.

llvm-svn: 213351
2014-07-18 06:07:13 +00:00
Suyog Sarda 68862414b5 Move ashr optimization from InstCombineShift to InstSimplify.
Refactor code, no functionality change, test case moved from instcombine to instsimplify.

Differential Revision: http://reviews.llvm.org/D4102
 

llvm-svn: 213231
2014-07-17 06:28:15 +00:00
Hal Finkel 354e23b029 Improve BasicAA CS-CS queries (redux)
This reverts, "r213024 - Revert r212572 "improve BasicAA CS-CS queries", it
causes PR20303." with a fix for the bug in pr20303. As it turned out, the
relevant code was both wrong and over-conservative (because, as with the code
it replaced, it would return the overall ModRef mask even if just Ref had been
implied by the argument aliasing results). Hopefully, this correctly fixes both
problems.

Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've
cleaned up a little and added in DSE's test directory). The BasicAA test has
also been updated to check for this error.

Original commit message:

BasicAA contains knowledge of certain intrinsics, such as memcpy and memset,
and uses that information to form more-accurate answers to CallSite vs. Loc
ModRef queries. Unfortunately, it did not use this information when answering
CallSite vs. CallSite queries.

Generically, when an intrinsic takes one or more pointers and the intrinsic is
marked only to read/write from its arguments, the offset/size is unknown. As a
result, the generic code that answers CallSite vs. CallSite (and CallSite vs.
Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's
arguments. While BasicAA's CallSite vs. Loc override could use more-accurate
size information for some intrinsics, it did not do the same for CallSite vs.
CallSite queries.

This change refactors the intrinsic-specific logic in BasicAA into a generic AA
query function: getArgLocation, which is overridden by BasicAA to supply the
intrinsic-specific knowledge, and used by AA's generic implementation. This
allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and
CallSite vs. CallSite queries, and simplifies the BasicAA implementation.

Currently, only one function, Mac's memset_pattern16, is handled by BasicAA
(all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's
getModRefBehavior override now also returns OnlyAccessesArgumentPointees for
this function (which is an improvement).

llvm-svn: 213219
2014-07-17 01:28:25 +00:00
Jingyue Wu 0bdc027e31 Partially revert r210444 due to performance regression
Summary:
Converting outermost zext(a) to sext(a) causes worse code when the
computation of zext(a) could be reused. For example, after converting

... = array[zext(a)]
... = array[zext(a) + 1]

to

... = array[sext(a)]
... = array[zext(a) + 1],

the program computes sext(a), which is actually unnecessary. I added one
test in split-gep-and-gvn.ll to illustrate this scenario.

Also, with r211281 and r211084, we annotate more "nuw" tags to
computation involving CUDA intrinsics such as threadIdx.x. These
annotations help with splitting GEP a lot, rendering the benefit we get
from this reverted optimization only marginal.

Test Plan: make check-all

Reviewers: eliben, meheff

Reviewed By: meheff

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D4542

llvm-svn: 213209
2014-07-16 23:25:00 +00:00
Justin Holewinski 3e037d98e6 [NVPTX] Rename registers %fl -> %fd and %rl -> %rd
This matches the internal behavior of NVIDIA tools like libnvvm.

llvm-svn: 213168
2014-07-16 16:26:58 +00:00
Tyler Nowicki 641d8a06bd Emit warnings if vectorization is forced and fails.
This patch modifies the existing DiagnosticInfo system to create a generic base
class that is inherited to produce diagnostic-based warnings. This is used by
the loop vectorizer to trigger a warning when vectorization is forced and
fails. Several tests have been added to verify this behavior.

Reviewed by: Arnold Schwaighofer

llvm-svn: 213110
2014-07-16 00:36:00 +00:00
Stepan Dyatkovskiy dee612d4f6 MergeFunc patch from Björn Steinbrink.
Phabricator ticket: D4246, Don't merge functions with different range metadata on call/invoke.
Thanks!

llvm-svn: 213060
2014-07-15 10:46:51 +00:00
Matt Arsenault f1a7e62033 Teach computeKnownBits to look through addrspacecast.
This fixes inferring alignment through an addrspacecast.

llvm-svn: 213030
2014-07-15 01:55:03 +00:00
Matt Arsenault 70f4db88d5 Teach GetUnderlyingObject / BasicAA about addrspacecast
llvm-svn: 213025
2014-07-15 00:56:40 +00:00
Matt Arsenault ee54aaef0c Convert test to FileCheck.
Check the individual test functions for more useful failure errors.

llvm-svn: 213021
2014-07-15 00:07:27 +00:00
Matt Arsenault caa9c71f32 Look through addrspacecast in IsConstantOffsetFromGlobal
llvm-svn: 213000
2014-07-14 22:39:26 +00:00
Matt Arsenault fd78d0c934 Look through addrspacecast in GetPointerBaseWithConstantOffset
llvm-svn: 212999
2014-07-14 22:39:22 +00:00
Matt Arsenault eb9e5f41a6 Convert test to FileCheck
llvm-svn: 212992
2014-07-14 21:59:26 +00:00
David Majnemer b8f435ca70 Fix a test broken in r212981
@icmp_sdiv_neg1 should have referred to %a instead of %call, it was
renamed at the last second.

llvm-svn: 212983
2014-07-14 20:46:04 +00:00
David Majnemer af9180fd04 InstSimplify: Correct sdiv x / -1
Determining the bounds of x/ -1 would start off with us dividing it by
INT_MIN.  Suffice to say, this would not work very well.

Instead, handle it upfront by checking for -1 and mapping it to the
range: [INT_MIN + 1, INT_MAX.  This means that the result of our
division can be any value other than INT_MIN.

llvm-svn: 212981
2014-07-14 20:38:45 +00:00
David Majnemer 5ea4fc0b33 InstSimplify: The upper bound of X / C was missing a rounding step
Summary:
When calculating the upper bound of X / -8589934592, we would perform
the following calculation: Floor[INT_MAX / 8589934592]

However, flooring the result would make us wrongly come to the
conclusion that 1073741824 was not in the set of possible values.
Instead, use the ceiling of the result.

Reviewers: nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4502

llvm-svn: 212976
2014-07-14 19:49:57 +00:00
Matt Arsenault 199b39e063 Look through addrspacecast when checking isDereferenceablePointer
llvm-svn: 212971
2014-07-14 18:54:12 +00:00
Nick Lewycky 703e488ed9 Don't eliminate memcpy's when the address of the pointer may itself be relevant. Fixes PR18304. Patch by David Wiberg!
llvm-svn: 212970
2014-07-14 18:52:02 +00:00
Aditya Nandakumar 0b5a674243 When we sink an instruction, this can open up opportunity for the operands to be sunk - add them to the worklist
llvm-svn: 212847
2014-07-11 21:49:39 +00:00
Marcello Maggioni 83442bb8b2 Added test for commit r212802 that was missing
llvm-svn: 212803
2014-07-11 10:36:00 +00:00
Duncan P. N. Exon Smith 04934b0fec InstCombine: Fix a crash in Descale for multiply-by-zero
Fix a crash in `InstCombiner::Descale()` when a multiply-by-zero gets
created as an argument to a GEP partway through an iteration, causing
-instcombine to optimize the GEP before the multiply.

rdar://problem/17615671

llvm-svn: 212742
2014-07-10 17:13:27 +00:00
Hal Finkel a71fe078c8 A test case for not asserting in isDereferenceablePointer upon unsized types
This is the test case for r212687.

llvm-svn: 212688
2014-07-10 07:04:37 +00:00
Hal Finkel 2e42c34d05 Allow isDereferenceablePointer to look through some bitcasts
isDereferenceablePointer should not give up upon encountering any bitcast. If
we're casting from a pointer to a larger type to a pointer to a small type, we
can continue by examining the bitcast's operand. This missing capability
was noted in a comment in the function.

In order for this to work, isDereferenceablePointer now takes an optional
DataLayout pointer (essentially all callers already had such a pointer
available). Most code uses isDereferenceablePointer though
isSafeToSpeculativelyExecute (which already took an optional DataLayout
pointer), and to enable the LICM test case, LICM needs to actually provide its DL
pointer to isSafeToSpeculativelyExecute (which it was not doing previously).

llvm-svn: 212686
2014-07-10 05:27:53 +00:00
Adam Nemet 2820a5b9e9 [X86] AVX512: Enable it in the Loop Vectorizer
This lets us experiment with 512-bit vectorization without passing
force-vector-width manually.

The code generated for a simple integer memset loop is properly vectorized.
Disassembly is still broken for it though :(.

llvm-svn: 212634
2014-07-09 18:22:33 +00:00
Sanjay Patel 7ae7a831b9 removed duplicate testcase
llvm-svn: 212632
2014-07-09 17:49:58 +00:00