This is the first commit in a series that add an @llvm.assume intrinsic which
can be used to provide the optimizer with a condition it may assume to be true
(when the control flow would hit the intrinsic call). Some basic properties are added here:
- llvm.invariant(true) is dead.
- llvm.invariant(false) is unreachable (this directly corresponds to the
documented behavior of MSVC's __assume(0)), so is llvm.invariant(undef).
The intrinsic is tagged as writing arbitrarily, in order to maintain control
dependencies. BasicAA has been updated, however, to return NoModRef for any
particular location-based query so that we don't unnecessarily block code
motion.
llvm-svn: 213973
This functionality is currently turned off by default.
Part of the motivation for introducing scoped-noalias metadata is to enable the
preservation of noalias parameter attribute information after inlining.
Sometimes this can be inferred from the code in the caller after inlining, but
often we simply lose valuable information.
The overall process if fairly simple:
1. Create a new unqiue scope domain.
2. For each (used) noalias parameter, create a new alias scope.
3. For each pointer, collect the underlying objects. Add a noalias scope for
each noalias parameter from which we're not derived (and has not been
captured prior to that point).
4. Add an alias.scope for each noalias parameter from which we might be
derived (or has been captured before that point).
Note that the capture checks apply only if one of the underlying objects is not
an identified function-local object.
llvm-svn: 213949
hint) the loop unroller replaces the llvm.loop.unroll.count metadata with
llvm.loop.unroll.disable metadata to prevent any subsequent unrolling
passes from unrolling more than the hint indicates. This patch fixes
an issue where loop unrolling could be disabled for other loops as well which
share the same llvm.loop metadata.
llvm-svn: 213900
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
1. To preserve noalias function attribute information when inlining
2. To provide the ability to model block-scope C99 restrict pointers
Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.
What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:
!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }
Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:
... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }
When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.
Note that is the first element of the scope metadata is a string, then it can
be combined accross functions and translation units. The string can be replaced
by a self-reference to create globally unqiue scope identifiers.
[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]
Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functons have the metadata imply that a1 does not alias with b2.
llvm-svn: 213864
We use gep to access the global array "switch.table", and the table index
should be treated as unsigned. When the highest bit is 1, this commit
zero-extends the index to an integer type with larger size.
For a switch on i2, we used to generate:
%switch.tableidx = sub i2 %0, -2
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx
It is incorrect when %switch.tableidx is 2 or 3. The fix is to generate
%switch.tableidx = sub i2 %0, -2
%switch.tableidx.zext = zext i2 %switch.tableidx to i3
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext
rdar://17735071
llvm-svn: 213815
While the subprogram map cache used by Dead Argument Elimination works
there, I made a mistake when reusing it for Argument Promotion in
r212128 because ArgPromo may transform functions more than once whereas
DAE transforms each function only once, removing all the dead arguments
in one go.
To address this, ensure that the map is updated after each argument
promotion.
In retrospect it might be a little wasteful to create a map of all
subprograms when only handling a single CGSCC, but the alternative is
walking the debug info for each function in the CGSCC that gets updated.
It's not clear to me what the right tradeoff is there, but since the
current tradeoff seems to be working OK (and the code to keep things
updated is very cheap), let's stick with that for now.
llvm-svn: 213805
Also the debug location I had here was bogus, describing the location of
the call site as in the callee - and unnecessary, so just drop it.
llvm-svn: 213803
It handles the errors which were seen in PR19958 where wrong code was being emitted due to earlier patch.
Added code for lshr as well as non-exact right shifts.
It implements :
(icmp eq/ne (ashr/lshr const2, A), const1)" ->
(icmp eq/ne A, Log2(const2/const1)) ->
(icmp eq/ne A, Log2(const2) - Log2(const1))
Differential Revision: http://reviews.llvm.org/D4068
llvm-svn: 213678
"((~A & B) | A) -> (A | B)" and "((A & B) | ~A) -> (~A | B)"
Original Patch credit to Ankit Jain !!
Differential Revision: http://reviews.llvm.org/D4591
llvm-svn: 213676
We previously supported the align attribute on all (pointer) parameters, but we
only used it for byval parameters. However, it is completely consistent at the
IR level to treat 'align n' on all pointer parameters as an alignment
assumption on the pointer, and now we wll. Specifically, this causes
computeKnownBits to use the align attribute on all pointer parameters, not just
byval parameters. I've also added an explicit parameter attribute test for this
to test/Bitcode/attributes.ll.
And I've updated the LangRef to document the align parameter attribute (as it
turns out, it was not documented at all previously, although the byval
documentation mentioned that it could be used).
There are (at least) two benefits to doing this:
- It allows enhancing alignment based on the pointer alignment after inlining callees.
- It allows simplification of pointer arithmetic.
llvm-svn: 213670
Prior to this change, the loop vectorizer did not make use of the alias
analysis infrastructure. Instead, it performed memory dependence analysis using
ScalarEvolution-based linear dependence checks within equivalence classes
derived from the results of ValueTracking's GetUnderlyingObjects.
Unfortunately, this meant that:
1. The loop vectorizer had logic that essentially duplicated that in BasicAA
for aliasing based on identified objects.
2. The loop vectorizer could not partition the space of dependency checks
based on information only easily available from within AA (TBAA metadata is
currently the prime example).
This means, for example, regardless of whether -fno-strict-aliasing was
provided, the vectorizer would only vectorize this loop with a runtime
memory-overlap check:
void foo(int *a, float *b) {
for (int i = 0; i < 1600; ++i)
a[i] = b[i];
}
This is suboptimal because the TBAA metadata already provides the information
necessary to show that this check unnecessary. Of course, the vectorizer has a
limit on the number of such checks it will insert, so in practice, ignoring
TBAA means not vectorizing more-complicated loops that we should.
This change causes the vectorizer to use an AliasSetTracker to keep track of
the pointers in the loop. The resulting alias sets are then used to partition
the space of dependency checks, and potential runtime checks; this results in
more-efficient vectorizations.
When pointer locations are added to the AliasSetTracker, two things are done:
1. The location size is set to UnknownSize (otherwise you'd not catch
inter-iteration dependencies)
2. For instructions in blocks that would need to be predicated, TBAA is
removed (because the metadata might have a control dependency on the condition
being speculated).
For non-predicated blocks, you can leave the TBAA metadata. This is safe
because you can't have an iteration dependency on the TBAA metadata (if you
did, and you unrolled sufficiently, you'd end up with the same pointer value
used by two accesses that TBAA says should not alias, and that would yield
undefined behavior).
llvm-svn: 213486
There are some kinds of metadata that are safe to propagate from the scalar
instructions to the vector instructions (fpmath and tbaa currently).
Regarding TBAA, one might worry about propagating it on if-converted loads and
stores, because the metadata might have had a control dependency on the
condition, and thus actually aliased with some other non-speculated memory
access when the condition was false. However, this would be caught by the
runtime overlap checks.
llvm-svn: 213452
When we have a parameter (or call site return) with a dereferenceable
attribute, it can specify the size of an array pointed to by that parameter. If
we have a value for which we can accumulate a constant offset to such a
parameter, then we can use that offset in a direct comparison with the size
specified by the dereferenceable attribute.
This enables us to handle cases like this:
int foo(int a[static 3]) {
return a[2]; /* this is always dereferenceable */
}
llvm-svn: 213447
Merges equivalent loads on both sides of a hammock/diamond
and hoists into into the header.
Merges equivalent stores on both sides of a hammock/diamond
and sinks it to the footer.
Can enable if conversion and tolerate better load misses
and store operand latencies.
llvm-svn: 213396
This attribute indicates that the parameter or return pointer is
dereferenceable. Practically speaking, loads from such a pointer within the
associated byte range are safe to speculatively execute. Such pointer
parameters are common in source languages (C++ references, for example).
llvm-svn: 213385
Refactor code, no functionality change, test case moved from instcombine to instsimplify.
Differential Revision: http://reviews.llvm.org/D4102
llvm-svn: 213231
This reverts, "r213024 - Revert r212572 "improve BasicAA CS-CS queries", it
causes PR20303." with a fix for the bug in pr20303. As it turned out, the
relevant code was both wrong and over-conservative (because, as with the code
it replaced, it would return the overall ModRef mask even if just Ref had been
implied by the argument aliasing results). Hopefully, this correctly fixes both
problems.
Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've
cleaned up a little and added in DSE's test directory). The BasicAA test has
also been updated to check for this error.
Original commit message:
BasicAA contains knowledge of certain intrinsics, such as memcpy and memset,
and uses that information to form more-accurate answers to CallSite vs. Loc
ModRef queries. Unfortunately, it did not use this information when answering
CallSite vs. CallSite queries.
Generically, when an intrinsic takes one or more pointers and the intrinsic is
marked only to read/write from its arguments, the offset/size is unknown. As a
result, the generic code that answers CallSite vs. CallSite (and CallSite vs.
Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's
arguments. While BasicAA's CallSite vs. Loc override could use more-accurate
size information for some intrinsics, it did not do the same for CallSite vs.
CallSite queries.
This change refactors the intrinsic-specific logic in BasicAA into a generic AA
query function: getArgLocation, which is overridden by BasicAA to supply the
intrinsic-specific knowledge, and used by AA's generic implementation. This
allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and
CallSite vs. CallSite queries, and simplifies the BasicAA implementation.
Currently, only one function, Mac's memset_pattern16, is handled by BasicAA
(all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's
getModRefBehavior override now also returns OnlyAccessesArgumentPointees for
this function (which is an improvement).
llvm-svn: 213219
Summary:
Converting outermost zext(a) to sext(a) causes worse code when the
computation of zext(a) could be reused. For example, after converting
... = array[zext(a)]
... = array[zext(a) + 1]
to
... = array[sext(a)]
... = array[zext(a) + 1],
the program computes sext(a), which is actually unnecessary. I added one
test in split-gep-and-gvn.ll to illustrate this scenario.
Also, with r211281 and r211084, we annotate more "nuw" tags to
computation involving CUDA intrinsics such as threadIdx.x. These
annotations help with splitting GEP a lot, rendering the benefit we get
from this reverted optimization only marginal.
Test Plan: make check-all
Reviewers: eliben, meheff
Reviewed By: meheff
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D4542
llvm-svn: 213209
This patch modifies the existing DiagnosticInfo system to create a generic base
class that is inherited to produce diagnostic-based warnings. This is used by
the loop vectorizer to trigger a warning when vectorization is forced and
fails. Several tests have been added to verify this behavior.
Reviewed by: Arnold Schwaighofer
llvm-svn: 213110
Determining the bounds of x/ -1 would start off with us dividing it by
INT_MIN. Suffice to say, this would not work very well.
Instead, handle it upfront by checking for -1 and mapping it to the
range: [INT_MIN + 1, INT_MAX. This means that the result of our
division can be any value other than INT_MIN.
llvm-svn: 212981
Summary:
When calculating the upper bound of X / -8589934592, we would perform
the following calculation: Floor[INT_MAX / 8589934592]
However, flooring the result would make us wrongly come to the
conclusion that 1073741824 was not in the set of possible values.
Instead, use the ceiling of the result.
Reviewers: nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4502
llvm-svn: 212976
Fix a crash in `InstCombiner::Descale()` when a multiply-by-zero gets
created as an argument to a GEP partway through an iteration, causing
-instcombine to optimize the GEP before the multiply.
rdar://problem/17615671
llvm-svn: 212742
isDereferenceablePointer should not give up upon encountering any bitcast. If
we're casting from a pointer to a larger type to a pointer to a small type, we
can continue by examining the bitcast's operand. This missing capability
was noted in a comment in the function.
In order for this to work, isDereferenceablePointer now takes an optional
DataLayout pointer (essentially all callers already had such a pointer
available). Most code uses isDereferenceablePointer though
isSafeToSpeculativelyExecute (which already took an optional DataLayout
pointer), and to enable the LICM test case, LICM needs to actually provide its DL
pointer to isSafeToSpeculativelyExecute (which it was not doing previously).
llvm-svn: 212686
This lets us experiment with 512-bit vectorization without passing
force-vector-width manually.
The code generated for a simple integer memset loop is properly vectorized.
Disassembly is still broken for it though :(.
llvm-svn: 212634