In the "LoopDispositions:" section:
- Instead of printing out a list, print out a "dictionary" to make it
obvious by inspection which disposition is for which loop. This is
just a cosmetic change.
- Print dispositions for parent _and_ sibling loops. I will use this
to write a test case.
llvm-svn: 268405
Summary:
This intrinsic is used to get flat-shaded fragment shader inputs. Those are
uniform across a primitive, but a fragment shader wave may process pixels from
multiple primitives (as indicated by the prim_mask), and so that's where
divergence can arise.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19747
llvm-svn: 268259
There are currently some bugs in tree around SCEV caching an incorrect
loop disposition. Printing out loop dispositions will let us write
whitebox tests as those are fixed.
The dispositions are printed as a list in "inside out" order,
i.e. innermost loop first.
llvm-svn: 268177
Teach Value::getPointerAlignment that allocas with no explicit alignment are aligned to preferred alignment of the allocated type.
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D17569
llvm-svn: 267689
Summary:
This implements a new method of run-time checking the NoWrap
SCEV predicates, which should be easier to optimize and nicer
for targets that don't correctly handle multiplication/addition
of large integer types (like i128).
If the AddRec is {a,+,b} and the backedge taken count is c,
the idea is to check that |b| * c doesn't have unsigned overflow,
and depending on the sign of b, that:
a + |b| * c >= a (b >= 0) or
a - |b| * c <= a (b <= 0)
where the comparisons above are signed or unsigned, depending on
the flag that we're checking.
The advantage of doing this is that we avoid extending to a larger
type and we avoid the multiplication of large types (multiplying
i128 can be expensive).
Reviewers: sanjoy
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D19266
llvm-svn: 267389
Summary:
(... while still not using a PostDomTree)
The way we use isKnownNotFullPoison from SCEV today, the new CFG walking
logic will not trigger for any realistic cases -- it will kick in only
for situations where we could have merged the contiguous basic blocks
anyway[0], since the poison generating instruction dominates all of its
non-PHI uses (which are the only uses we consider right now).
However, having this change in place will allow a later bugfix to break
fewer llvm-lit tests.
[0]: i.e. cases where block A branches to block B and B is A's only
successor and A is B's only predecessor.
Reviewers: broune, bjarke.roune
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19212
llvm-svn: 267175
Summary:
rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS
operations during DAG combine. This generates better code for truncate, so cost
of truncate needs to be changed but looks like it got changed only in SSE2 table
Whereas this change is also applicable for SSE4.1, so the cost of truncate needs
to be changed for that as well. Cost of “TRUNCATE v16i32 to v16i8” & “TRUNCATE
v16i16 to v16i8” should be same in SSE4.1 & SSE2 table. Removing their cost from
SSE4.1, so it will fall back to SSE2.
Reviewers: Simon Pilgrim
llvm-svn: 267123
Summary:
Calls to @llvm.experimental.deoptimize are expected to "never execute",
so optimize them as such.
Reviewers: chandlerc
Subscribers: junbuml, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19095
llvm-svn: 266654
Summary:
If a PHI has an incoming undef, we can pretend that it is equal to one
non-undef, non-self incoming value.
This is particularly relevant in combination with the StructurizeCFG
pass, which introduces PHI nodes with undefs. Previously, this lead to
branch conditions that were uniform before StructurizeCFG to become
non-uniform afterwards, which confused the SIAnnotateControlFlow
pass.
This fixes a crash when Mesa radeonsi compiles a shader from
dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex
Reviewers: arsenm, tstellarAMD, jingyue
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19013
llvm-svn: 266347
Summary:
Add a print method to Predicated Scalar Evolution which prints all interesting
transformations done by PSE.
Loop Access Analysis will now print this as part of the analysis output.
We now use this to check the exact expression transformations that were done
by PSE in LAA.
The additional checking also acts as white-box testing for the getAsAddRec method.
Reviewers: anemet, sanjoy
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18792
llvm-svn: 266334
This is a resubmittion of 263158 change.
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
llvm-svn: 266086
Summary:
This change teaches SCEV to see reduce `(extractvalue
0 (op.with.overflow X Y))` into `op X Y` (with a no-wrap tag if
possible).
Reviewers: atrick, regehr
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18684
llvm-svn: 265912
This re-commits r265535 which was reverted in r265541 because it
broke the windows bots. The problem was that we had a PointerIntPair
which took a pointer to a struct allocated with new. The problem
was that new doesn't provide sufficient alignment guarantees.
This pattern was already present before r265535 and it just happened
to work. To fix this, we now separate the PointerToIntPair from the
ExitNotTakenInfo struct into a pointer and a bool.
Original commit message:
Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.
However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.
In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.
We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.
Reviewers: anemet, mzolotukhin, hfinkel, sanjoy
Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17201
llvm-svn: 265786
Summary:
Fixes PR26774.
If you're aware of the issue, feel free to skip the "Motivation"
section and jump directly to "This patch".
Motivation:
I define "refinement" as discarding behaviors from a program that the
optimizer has license to discard. So transforming:
```
void f(unsigned x) {
unsigned t = 5 / x;
(void)t;
}
```
to
```
void f(unsigned x) { }
```
is refinement, since the behavior went from "if x == 0 then undefined
else nothing" to "nothing" (the optimizer has license to discard
undefined behavior).
Refinement is a fundamental aspect of many mid-level optimizations done
by LLVM. For instance, transforming `x == (x + 1)` to `false` also
involves refinement since the expression's value went from "if x is
`undef` then { `true` or `false` } else { `false` }" to "`false`" (by
definition, the optimizer has license to fold `undef` to any non-`undef`
value).
Unfortunately, refinement implies that the optimizer cannot assume
that the implementation of a function it can see has all of the
behavior an unoptimized or a differently optimized version of the same
function can have. This is a problem for functions with comdat
linkage, where a function can be replaced by an unoptimized or a
differently optimized version of the same source level function.
For instance, FunctionAttrs cannot assume a comdat function is
actually `readnone` even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store. As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal. The optimizer is allowed to refine
this function by first CSE'ing the two loads, and the folding the
comparision to always report that the two values are equal. Such a
refined variant will look like it is `readonly`. However, the
unoptimized version of the function can still write to memory (since
the two loads //can// result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.
Note: this is not just a problem with atomics or with linking
differently optimized object files. See PR26774 for more realistic
examples that involved neither.
This patch:
This change introduces a new set of linkage types, predicated as
`GlobalValue::mayBeDerefined` that returns true if the linkage type
allows a function to be replaced by a differently optimized variant at
link time. It then changes a set of IPO passes to bail out if they see
such a function.
Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18634
llvm-svn: 265762
This makes it possible to distinguish between mesa shaders
and other kernels even in the presence of compute shaders.
Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Differential Revision: http://reviews.llvm.org/D18559
llvm-svn: 265589
Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.
However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.
In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.
We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.
Reviewers: anemet, mzolotukhin, hfinkel, sanjoy
Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17201
llvm-svn: 265535
Prior to this patch, CFLAA wouldn't tag arguments/globals properly if
it didn't find any "interesting" edges on them. This means that, if all
you do is store constants to a global or argument, we would never
actually treat it as a global/argument.
Test case:
define void @foo(i32* %A, i32* %B) #0 {
entry:
store i32 0, i32* %A, align 4
store i32 0, i32* %B, align 4
ret void
}
CFLAA would say that %A can't alias %B, because neither pointer was
used in an interesting way. This patch makes us note whether something
is an argument, global, ... regardless of how interesting CFLAA thinks
its uses are.
(For the record, using a value in an interesting way means loading
from it, using it in a GEP, ...)
llvm-svn: 265474
A seg-fault occurs due to a reference of a null pointer, which is
the value returned by getConstantPart. This function returns
null if the constant part is not found. The code that calls this
function needs to check for the null return value.
Differential Revision: http://reviews.llvm.org/D18718
llvm-svn: 265319
PPC has a vector popcount, this lets the vectorizer use the correct cost
for it. Tweak X86 test to use an intrinsic that's actually scalarized (we
have a somewhat efficient lowering for vector popcount using SSE, the
cost model finds that now).
llvm-svn: 265005
We used to only allow SCEVAddRecExpr for pointer expressions in order to
be able to compute the bounds. However this is also trivially possible
for loop-invariant addresses (scUnknown) since then the bounds are the
address itself.
Interestingly, we used allow this for the special case when the
loop-invariant address happens to also be an SCEVAddRecExpr (in an outer
loop).
There are a couple more loops that are vectorized in SPEC after this.
My guess is that the main reason we don't see more because for example a
loop-invariant load is vectorized into a splat vector with several
vector-inserts. This is likely to make the vectorization unprofitable.
I.e. we don't notice that a later LICM will move all of this out of the
loop so the cost estimate should really be 0.
llvm-svn: 264243
Summary:
These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
by Mesa to implement atomics with buffer semantics. The intrinsic interface
matches that of buffer.load.format and buffer.store.format, except that the
GLC bit is not exposed (it is automatically deduced based on whether the
return value is used).
The change of hasSideEffects is required for TableGen to accept the pattern
that matches the intrinsic.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, rivanvx, llvm-commits
Differential Revision: http://reviews.llvm.org/D18151
llvm-svn: 263791
Summary:
As explained by the comment, threads will typically see different values
returned by atomic instructions even if the arguments are equal.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18156
llvm-svn: 263719
Summary:
When multiple threads perform an atomic op with the same arguments, they
will usually see different return values.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18101
llvm-svn: 263440
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
llvm-svn: 263158
actually finish wiring up the old call graph.
There were bugs in the old call graph that hadn't been caught because it
wasn't being tested. It wasn't being tested because it wasn't in the
pipeline system and we didn't have a printing pass to run in tests. This
fixes all of that.
As for why I'm still keeping the old call graph alive its so that I can
port GlobalsAA to the new pass manager with out forking it to work with
the lazy call graph. That's clearly the right eventual design, but it
seems pragmatic to defer that until its necessary. The old call graph
works just fine for GlobalsAA.
llvm-svn: 263104
As part of r251146 InstCombine was extended to call computeKnownBits on
every value in the function to determine whether it happens to be
constant. This increases typical compiletime by 1-3% (5% in irgen+opt
time) in my measurements. On the other hand this case did not trigger
once in the whole llvm-testsuite.
This patch introduces the notion of ExpensiveCombines which are only
enabled for OptLevel > 2. I removed the check in InstructionSimplify as
that is called from various places where the OptLevel is not known but
given the rarity of the situation I think a check in InstCombine is
enough.
Differential Revision: http://reviews.llvm.org/D16835
llvm-svn: 263047
Building on the previous change, this generalizes
ScalarEvolution::getRangeViaFactoring to work with
{Ext(C?A:B)+k0,+,Ext(C?A:B)+k1} where Ext can be a zero extend, sign
extend or truncate operation, and k0 and k1 are constants.
llvm-svn: 262979
This change generalizes ScalarEvolution::getRangeViaFactoring to work
with {Ext(C?A:B),+,Ext(C?A:B)} where Ext can be a zero extend, sign
extend or truncate operation.
llvm-svn: 262978
Summary:
This testcase had me confused. It made me believe that you can use
alias scopes and alias scopes list interchangeably with alias.scope and
noalias. Both langref and the other testcase use scope lists so I went
looking.
Turns out using scope directly only happens to work by chance. When
ScopedNoAliasAAResult::mayAliasInScopes traverses this as a scope list:
!1 = !{!1, !0, !"some scope"}
, the first entry is in fact a scope but only because the scope is
happened to be defined self-referentially to make it unique globally.
The remaining elements in the tuple (!0, !"some scope") are considered
as scopes but AliasScopeNode::getDomain will just bail on those without
any error.
This change avoids this ambiguity in the test but I've also been
wondering if we should issue some sort of a diagnostics.
Reviewers: dexonsmith, hfinkel
Subscribers: mssimpso, llvm-commits
Differential Revision: http://reviews.llvm.org/D16670
llvm-svn: 262841
This experiment was originally about trying to use facts implied dominating conditions to infer more precise known bits. While the compile time was found to be acceptable on several large code bases, we never found sufficiently profitable examples to justify turning on the code by default. Given this, it's time to abandon the experiment.
Several folks have commented that they've found this useful for experimentation, but nothing has come of those experiments. Given how easy the patch is to apply, there's no reason to leave the code in tree.
For anyone interested in further investigation in this area, I recommend finding the summary email I sent on one of the original review threads. In particular, I now believe the use-list based approach is strictly worse than the dom-tree-walking approach.
llvm-svn: 262646
After r262438 we can have provably positive NSW SCEV expressions whose
zero extensions cannot be simplified (since r262438 makes SCEV better at
computing constant ranges). This means demoting sexts of positive add
recurrences eagerly can result in an unsimplified zero extension where
we could have had a simplified sign extension. This change fixes the
issue by teaching SCEV to demote sext of a positive SCEV expression to a
zext only if the sext could not be simplified.
llvm-svn: 262638
Have ScalarEvolution::getRange re-consider cases like "{C?A:B,+,C?P:Q}"
by factoring out "C" and computing RangeOf{A,+,P} union RangeOf({B,+,Q})
instead.
The latter can be easier to compute precisely in cases like
"{C?0:N,+,C?1:-1}" N is the backedge taken count of the loop; since in
such cases the latter form simplifies to [0,N+1) union [0,N+1).
llvm-svn: 262438
it to actually test the new pass manager AA wiring.
This patch was extracted from the (somewhat too large) D12357 and
rebosed on top of the slightly different design of the new pass manager
AA wiring that I just landed. With this we can start testing the AA in
a thorough way with the new pass manager.
Some minor cleanups to the code in the pass was necessitated here, but
otherwise it is a very minimal change.
Differential Revision: http://reviews.llvm.org/D17372
llvm-svn: 261403
reference-edge SCCs.
This essentially builds a more normal call graph as a subgraph of the
"reference graph" that was the old model. This allows both to exist and
the different use cases to use the aspect which addresses their needs.
Specifically, the pass manager and other *ordering* constrained logic
can use the reference graph to achieve conservative order of visit,
while analyses reasoning about attributes and other properties derived
from reachability can reason about the direct call graph.
Note that this isn't necessarily complete: it doesn't model edges to
declarations or indirect calls. Those can be found by scanning the
instructions of the function if desirable, and in fact every user
currently does this in order to handle things like calls to instrinsics.
If useful, we could consider caching this information in the call graph
to save the instruction scans, but currently that doesn't seem to be
important.
An important realization for why the representation chosen here works is
that the call graph is a formal subset of the reference graph and thus
both can live within the same data structure. All SCCs of the call graph
are necessarily contained within an SCC of the reference graph, etc.
The design is to build 'RefSCC's to model SCCs of the reference graph,
and then within them more literal SCCs for the call graph.
The formation of actual call edge SCCs is not done lazily, unlike
reference edge 'RefSCC's. Instead, once a reference SCC is formed, it
directly builds the call SCCs within it and stores them in a post-order
sequence. This is used to provide a consistent platform for mutation and
update of the graph. The post-order also allows for very efficient
updates in common cases by bounding the number of nodes (and thus edges)
considered.
There is considerable common code that I'm still looking for the best
way to factor out between the various DFS implementations here. So far,
my attempts have made the code harder to read and understand despite
reducing the duplication, which seems a poor tradeoff. I've not given up
on figuring out the right way to do this, but I wanted to wait until
I at least had the system working and tested to continue attempting to
factor it differently.
This also requires introducing several new algorithms in order to handle
all of the incremental update scenarios for the more complex structure
involving two edge colorings. I've tried to comment the algorithms
sufficiently to make it clear how this is expected to work, but they may
still need more extensive documentation.
I know that there are some changes which are not strictly necessarily
coupled here. The process of developing this started out with a very
focused set of changes for the new structure of the graph and
algorithms, but subsequent changes to bring the APIs and code into
consistent and understandable patterns also ended up touching on other
aspects. There was no good way to separate these out without causing
*massive* merge conflicts. Ultimately, to a large degree this is
a rewrite of most of the core algorithms in the LCG class and so I don't
think it really matters much.
Many thanks to the careful review by Sanjoy Das!
Differential Revision: http://reviews.llvm.org/D16802
llvm-svn: 261040
than the SCC object, and have it scan the instruction stream directly
rather than relying on call records.
This makes the behavior of this routine consistent between libc routines
and LLVM intrinsics for libc routines. We can go and start teaching it
about those being norecurse, but we should behave the same for the
intrinsic and the libc routine rather than differently. I chatted with
James Molloy and the inconsistency doesn't seem intentional and likely
is due to intrinsic calls not being modelled in the call graph analyses.
This also fixes a bug where we would deduce norecurse on optnone
functions, when generally we try to handle optnone functions as-if they
were replaceable and thus unanalyzable.
llvm-svn: 260813
Summary:
Passes that call `getAnalysisIfAvailable<T>` also need to call
`addUsedIfAvailable<T>` in `getAnalysisUsage` to indicate to the
legacy pass manager that it uses `T`. This contract was being
violated by passes that used `createLegacyPMAAResults`. This change
fixes this by exposing a helper in AliasAnalysis.h,
`addUsedAAAnalyses`, that is complementary to createLegacyPMAAResults
and does the right thing when called from `getAnalysisUsage`.
Reviewers: chandlerc
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17010
llvm-svn: 260183
IndVarSimplify assumes scAddRecExpr to be expanded in literal form instead of
canonical form by calling disableCanonicalMode after it creates SCEVExpander.
When CanonicalMode is disabled, SCEVExpander::expand should always return PHI
node for scAddRecExpr. r259736 broke the assumption.
The fix is to let SCEVExpander::expand skip the reuse Value logic if
CanonicalMode is false.
In addition, Besides IndVarSimplify, LSR pass also calls disableCanonicalMode
before doing rewrite. We can remove the original check of LSRMode in reuse
Value logic and use CanonicalMode instead.
llvm-svn: 260174
sanitizer issue. The PredicatedScalarEvolution's copy constructor
wasn't copying the Generation value, and was leaving it un-initialized.
Original commit message:
[SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection
Summary:
This change adds no wrap SCEV predicates with:
- support for runtime checking
- support for expression rewriting:
(sext ({x,+,y}) -> {sext(x),+,sext(y)}
(zext ({x,+,y}) -> {zext(x),+,sext(y)}
Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.
We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.
Reviewers: mzolotukhin, sanjoy, anemet
Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel
Differential Revision: http://reviews.llvm.org/D15412
llvm-svn: 260112
We shouldn't assert when there are no memchecks, since we
can have SCEV checks. There is already an assert covering
the case where there are no SCEV checks or memchecks.
This also changes the LAA pointer wrapping versioning test
to use the loop versioning pass (this was how I managed to
trigger the assert in the loop versioning pass).
llvm-svn: 260086
Summary:
This change adds no wrap SCEV predicates with:
- support for runtime checking
- support for expression rewriting:
(sext ({x,+,y}) -> {sext(x),+,sext(y)}
(zext ({x,+,y}) -> {zext(x),+,sext(y)}
Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.
We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.
Reviewers: mzolotukhin, sanjoy, anemet
Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel
Differential Revision: http://reviews.llvm.org/D15412
llvm-svn: 260085
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.
This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.
The original commit triggered regressions in Polly tests. The regressions
exposed two problems which have been fixed in current version.
1. Polly will generate a new function based on the old one. To generate an
instruction for the new function, it builds SCEV for the old instruction,
applies some tranformation on the SCEV generated, then expands the transformed
SCEV and insert the expanded value into new function. Because SCEV expansion
may reuse value cached in ExprValueMap, the value in old function may be
inserted into new function, which is wrong.
In SCEVExpander::expand, there is a logic to check the cached value to
be used should dominate the insertion point. However, for the above
case, the check always passes. That is because the insertion point is
in a new function, which is unreachable from the old function. However
for unreachable node, DominatorTreeBase::dominates thinks it will be
dominated by any other node.
The fix is to simply add a check that the cached value to be used in
expansion should be in the same function as the insertion point instruction.
2. When the SCEV is of scConstant type, expanding it directly is cheaper than
reusing a normal value cached. Although in the cached value set in ExprValueMap,
there is a Constant type value, but it is not easy to find it out -- the cached
Value set is not sorted according to the potential cost. Existing reuse logic
in SCEVExpander::expand simply chooses the first legal element from the cached
value set.
The fix is that when the SCEV is of scConstant type, don't try the reuse
logic. simply expand it.
Differential Revision: http://reviews.llvm.org/D12090
llvm-svn: 259736
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.
This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.
Differential Revision: http://reviews.llvm.org/D12090
llvm-svn: 259662
This regresses a test in LoopVectorize, so I'll need to go away and think about how to solve this in a way that isn't broken.
From the writeup in PR26071:
What's happening is that ComputeKnownZeroes is telling us that all bits except the LSB are zero. We're then deciding that only the LSB needs to be demanded from the icmp's inputs.
This is where we're wrong - we're assuming that after simplification the bits that were known zero will continue to be known zero. But they're not - during trivialization the upper bits get changed (because an XOR isn't shrunk), so the icmp fails.
The fault is in demandedbits - its contract does clearly state that a non-demanded bit may either be zero or one.
llvm-svn: 259649
differentiate between indirect references to functions an direct calls.
This doesn't do a whole lot yet other than change the print out produced
by the analysis, but it lays the groundwork for a very major change I'm
working on next: teaching the call graph to actually be a call graph,
modeling *both* the indirect reference graph and the call graph
simultaneously. More details on that in the next patch though.
The rest of this is essentially a bunch of over-engineering that won't
be interesting until the next patch. But this also isolates essentially
all of the churn necessary to introduce the edge abstraction from the
very important behavior change necessary in order to separately model
the two graphs. So it should make review of the subsequent patch a bit
easier at the cost of making this patch seem poorly motivated. ;]
Differential Revision: http://reviews.llvm.org/D16038
llvm-svn: 259463
The computation of ICmp demanded bits is independent of the individual operand being evaluated. We simply return a mask consisting of the minimum leading zeroes of both operands.
We were incorrectly passing "I" to ComputeKnownBits - this should be "UserI->getOperand(0)". In cases where we were evaluating the 1th operand, we were taking the minimum leading zeroes of it and itself.
This should fix PR26266.
llvm-svn: 258690
Some patterns of select+compare allow us to know exactly the value of the uppermost bits in the select result. For example:
%b = icmp ugt i32 %a, 5
%c = select i1 %b, i32 2, i32 %a
Here we know that %c is bounded by 5, and therefore KnownZero = ~APInt(5).getActiveBits() = ~7.
There are several such patterns, and this patch attempts to understand a reasonable subset of them - namely when the base values are the same (as above), and when they are related by a simple (add nsw), for example (add nsw %a, 4) and %a.
llvm-svn: 257769
Summary:
Since globals may escape as function arguments (even when they have been
found to be non-escaping, because of optimizations such as memcpyoptimizer
that replaces stores with memcpy), all arguments to a function are checked
during query to make sure they are identifiable. At that time, also ensure
we return a conservative result only if the arguments don't alias to our global.
Reviewers: hfinkel, jmolloy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16140
llvm-svn: 257750
The early return seems to be missed. This causes a radical and wrong loop
optimization on powerpc. It isn't reproducible on x86_64, because
"UseInterleaved" is false.
Patch by Tim Shen.
llvm-svn: 257134
See PR25822 for a more full summary, but we were conflating the concepts of "capture" and "escape". We were proving nocapture and using that proof to infer noescape, which is not true. Escaped-ness is a function-local property - as soon as a value is used in a call argument it escapes. Capturedness is a related but distinct property. It implies a *temporally limited* escape. Consider:
static int a;
int b;
int g(int * nocapture arg);
int f() {
a = 2; // Even though a escapes to g, it is not captured so can be treated as non-escaping here.
g(&a); // But here it must be treated as escaping.
g(&b); // Now that g(&a) has returned we know it was not captured so we can treat it as non-escaping again.
}
The original commit did not sufficiently understand this nuance and so caused PR25822 and PR26046.
r248576 included both a performance improvement (which has been backed out) and a related conformance fix (which has been kept along with its testcase).
llvm-svn: 257058
Summary:
This reverts commit 5a9e526f29cf8510ab5c3d566fbdcf47ac24e1e9.
As per discussion in D15665
This also add a test case so that regression introduced by that diff are not reintroduced.
Reviewers: vaivaswatha, jmolloy, hfinkel, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15919
llvm-svn: 256932
Most of the properties of memset_pattern16 can be now covered by the generic attributes and inferred by InferFunctionAttrs. The only exceptions are:
- We don't yet have a writeonly attribute for the first argument.
- We don't have an attribute for modeling the access size facts encoded in MemoryLocation.cpp.
Differential Revision: http://reviews.llvm.org/D15879
llvm-svn: 256911
The cost is calculated for all X86 targets. When gather/scatter instruction
is not supported we calculate the cost of scalar sequence.
Differential revision: http://reviews.llvm.org/D15677
llvm-svn: 256519
Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM to merge different gc.statepoint nodes into PHI nodes and cause further problems with gc relocations. The patch also changes the way on how gc.relocate and gc.result look for their corresponding gc.statepoint on unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint.
Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob
Subscribers: reames, mjacob, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D15662
llvm-svn: 256443
This patch removes all weight-related interfaces from BPI and replace
them by probability versions. With this patch, we won't use edge weight
anymore in either IR or MC passes. Edge probabilitiy is a better
representation in terms of CFG update and validation.
Differential revision: http://reviews.llvm.org/D15519
llvm-svn: 256263
This is recommit of r256028 with minor fixes in unittests:
CodeGen/Mips/eh.ll
CodeGen/Mips/insn-zero-size-bb.ll
Original commit message:
When identifying blocks post-dominated by an unreachable-terminated block
in BranchProbabilityInfo, consider only the edge to the normal destination
block if the terminator is InvokeInst and let calcInvokeHeuristics() decide
edge weights for the InvokeInst.
llvm-svn: 256202
This patch transforms truncation between vectors of integers into
X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in
lowering phase because after type legalization, the original truncation
will be turned into a BUILD_VECTOR with each element that is extracted
from a vector and then truncated, and from them it is difficult to do
this optimization. This greatly improves the performance of truncations
on some specific types.
Cost table is updated accordingly.
Differential revision: http://reviews.llvm.org/D14588
llvm-svn: 256194
Summary:
The analysis of shader inputs was completely wrong. We were passing the
wrong index to AttributeSet::hasAttribute() and the logic for which
inputs where in SGPRs was wrong too.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15608
llvm-svn: 256082
This patch removes all getEdgeWeight() interfaces from CodeGen directory. As
getEdgeProbability() is a little more expensive than getEdgeWeight(), I will
compose a patch soon in which BPI only stores probabilities instead of edge
weights so that getEdgeProbability() will have O(1) time.
Differential revision: http://reviews.llvm.org/D15489
llvm-svn: 256039