The LoopInfo in combination with depth_first is used to enumerate the
loops.
Right now -analyze is not yet complete. It only prints the result of
the analysis, the report and the run-time checks. Printing the unsafe
depedences will require a bit more reshuffling which I'd like to do in a
follow-on to this patchset. Unsafe dependences are currently checked
via -debug-only=loop-accesses in the new test.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229633
The only difference between these two is that VectorizerReport adds a
vectorizer-specific prefix to its messages. When LAA is used in the
vectorizer context the prefix is added when we promote the
LoopAccessReport into a VectorizerReport via one of the constructors.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229632
When I split out LoopAccessReport from this, I need to create some temps
so constness becomes necessary.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229631
This allows the analysis to be attempted with any loop. This feature
will be used with -analysis. (LV only requests the analysis on loops
that have already satisfied these tests.)
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229630
Will be used by the new RuntimePointerCheck::print.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229629
Also add pass name as an argument to VectorizationReport::emitAnalysis.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229628
This is a function pass that runs the analysis on demand. The analysis
can be initiated by querying the loop access info via LAA::getInfo. It
either returns the cached info or runs the analysis.
Symbolic stride information continues to reside outside of this analysis
pass. We may move it inside later but it's not a priority for me right
now. The idea is that Loop Distribution won't support run-time stride
checking at least initially.
This means that when querying the analysis, symbolic stride information
can be provided optionally. Whether stride information is used can
invalidate the cache entry and rerun the analysis. Note that if the
loop does not have any symbolic stride, the entry should be preserved
across Loop Distribution and LV.
Since currently the only user of the pass is LV, I just check that the
symbolic stride information didn't change when using a cached result.
On the LV side, LoopVectorizationLegality requests the info object
corresponding to the loop from the analysis pass. A large chunk of the
diff is due to LAI becoming a pointer from a reference.
A test will be added as part of the -analyze patch.
Also tested that with AVX, we generate identical assembly output for the
testsuite (including the external testsuite) before and after.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229626
blockNeedsPredication is in LoopAccess in order to share it with the
vectorizer. It's a utility needed by LoopAccess not strictly provided
by it but it's a good place to share it. This makes the function static
so that it no longer required to create an LoopAccessInfo instance in
order to access it from LV.
This was actually causing problems because it would have required
creating LAI much earlier that LV::canVectorizeMemory().
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229625
LAA will be an on-demand analysis pass, so we need to cache the result
of the analysis. canVectorizeMemory is renamed to analyzeLoop which
computes the result. canVectorizeMemory becomes the query function for
the cached result.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229624
The transformation passes will query this and then emit them as part of
their own report. The currently only user LV is modified to do just
that.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229623
As LAA is becoming a pass, we can no longer pass the params to its
constructor. This changes the command line flags to have external
storage. These can now be accessed both from LV and LAA.
VectorizerParams is moved out of LoopAccessInfo in order to shorten the
code to access it.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229622
LoopAccessAnalysis will be used as the name of the pass.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229621
extensions.
This change also removes `DEBUG(dbgs() << "SCEV: untested prestart
overflow check\n");` because that case has a unit test now.
Differential Revision: http://reviews.llvm.org/D7645
llvm-svn: 229600
I could not come up with a test case for this one; but I don't think
`getPreStartForSignExtend` can assume `AR` is `nsw` -- there is one
place in scalar evolution that calls `getSignExtendAddRecStart(AR,
...)` without proving that `AR` is `nsw`
(line 1564)
OperandExtendedAdd =
getAddExpr(WideStart,
getMulExpr(WideMaxBECount,
getZeroExtendExpr(Step, WideTy)));
if (SAdd == OperandExtendedAdd) {
// If AR wraps around then
//
// abs(Step) * MaxBECount > unsigned-max(AR->getType())
// => SAdd != OperandExtendedAdd
//
// Thus (AR is not NW => SAdd != OperandExtendedAdd) <=>
// (SAdd == OperandExtendedAdd => AR is NW)
const_cast<SCEVAddRecExpr *>(AR)->setNoWrapFlags(SCEV::FlagNW);
// Return the expression with the addrec on the outside.
return getAddRecExpr(getSignExtendAddRecStart(AR, Ty, this),
getZeroExtendExpr(Step, Ty),
L, AR->getNoWrapFlags());
}
Differential Revision: http://reviews.llvm.org/D7640
llvm-svn: 229594
This change is a logical suspect in 22587 and 22590. Given it's of minimal importanance and I can't get clang to build on my home machine, I'm reverting so that I can deal with this next week.
llvm-svn: 229322
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
=> hasFnAttribute(Kind)
llvm-svn: 229192
Two minor tweaks I noticed when reading through the code:
- No need to recompute begin() on every iteration. We're not modifying the instructions in this loop.
- We can ignore PHINodes and Dbg intrinsics. The current code does this anyways, but it will spend slightly more time doing so and will count towards the limit of instructions in the block. It seems really silly to give up due the presence of PHIs...
Differential Revision: http://reviews.llvm.org/D7624
llvm-svn: 229175
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.
This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.
The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".
llvm-svn: 229094
Summary:
Instances of the AssumptionCache are per function, so we can't re-use
the same AssumptionCache instance when recursing in the CallAnalyzer to
analyze a different function. Instead we have to pass the
AssumptionCacheTracker to the CallAnalyzer so it can get the right
AssumptionCache on demand.
Reviewers: hfinkel
Subscribers: llvm-commits, hans
Differential Revision: http://reviews.llvm.org/D7533
llvm-svn: 228957
We would crash if we couldn't locate a Function that either Location's
Value belonged to. Now we just print out a debug message and return
conservatively.
llvm-svn: 228901
This allows IDEs to recognize the entire set of header files for
each of the core LLVM projects.
Differential Revision: http://reviews.llvm.org/D7526
Reviewed By: Chris Bieneman
llvm-svn: 228798
If the landingpad of the invoke is using a personality function that
catches asynch exceptions, then it can catch a trap.
Also add some landingpads to invalid LLVM IR test cases that lack them.
Over-the-shoulder reviewed by David Majnemer.
llvm-svn: 228782
Without a valid data layout, deferenceable(N) doesn't get parsed or
propagated. Since this is the key item we are testing, add a dependency
on the pass.
Differential Revision: http://reviews.llvm.org/D7508
llvm-svn: 228611
Make use of the newly introduced inst_range to clean up two loops. Clean
up a third one while at it.
Differential Revision: http://reviews.llvm.org/D7455
llvm-svn: 228596
When creating a scev for sext({X,+,Y}), scev checks if the expression
is equivalent to {sext X,+,zext Y}. If it can prove that, it also
tags the original {X,+,Y} as <nsw>, which is not correct.
In the test case I run `-scalar-evolution` twice because the bug
manifests only once SCEV has run through and seen the `sext`
expressions (and then does a in-place mutation on {X,+,Y}).
Differential Revision: http://reviews.llvm.org/D7495
llvm-svn: 228586
For the attached test case different types are used in the ICmpInst
and SelectInst that represent the min/max expressions. However, if the
ICmpInst type is smaller a comparison with the sign/zero extended
operands would have yielded the same result. This situation might
arise after the instruction combination pass was applied.
Differential Revision: http://reviews.llvm.org/D7338
llvm-svn: 228572
add recurrences don't overflow.
This change makes the optimization more restrictive. It still assumes
that an overflowing `add nsw` is undefined behavior; and this change
will need revisiting once we have a consistent semantics for poison
values.
Differential Revision: http://reviews.llvm.org/D7331
llvm-svn: 228552
Summary:
The alias.scope metadata represents sets of things an instruction might
alias with. When generically combining the metadata from two
instructions the result must be the union of the original sets, because
the new instruction might alias with anything any of the original
instructions aliased with.
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7490
llvm-svn: 228525
different fields.
We can show that two GEPs off of the same (possibly multidimensional)
array of structs, into different fields, can't alias. Quoting:
For two GEPOperators GEP1 and GEP2, if we find that:
- both GEPs begin indexing from the exact same pointer;
- the last indices in both GEPs are constants, indexing into a struct;
- said indices are different, hence,the pointed-to fields are different;
- and both GEPs only index through arrays prior to that;
this lets us determine that the struct that GEP1 indexes into and the
struct that GEP2 indexes into must either precisely overlap or be
completely disjoint. Because they cannot partially overlap, indexing
into different non-overlapping fields of the struct will never alias.
The other BasicAA::aliasGEP rules worked in some cases, but not all
(for example, the i32x3 struct in the testcase).
We can add this simple ad-hoc rule to complement them.
rdar://19717375
Differential Revision: http://reviews.llvm.org/D7453
llvm-svn: 228498
It is a variation of SimplifyBinOp, but it takes into account
FastMathFlags.
It is needed in inliner and loop-unroller to accurately predict the
transformation's outcome (previously we dropped the flags and were too
conservative in some cases).
Example:
float foo(float *a, float b) {
float r;
if (a[1] * b)
r = /* a lot of expensive computations */;
else
r = 1;
return r;
}
float boo(float *a) {
return foo(a, 0.0);
}
Without this patch, we don't inline 'foo' into 'boo'.
llvm-svn: 228432
This will allow it to be shared with the new Loop Distribution pass.
getFirstInst is currently duplicated across LoopVectorize.cpp and
LoopAccessAnalysis.cpp. This is a short-term work-around until we figure out
a better solution.
NFC. (The code moved is adjusted a bit for the name of the Loop member and
that PtrRtCheck is now a reference rather than a pointer.)
llvm-svn: 228418
Since testing the function indirectly is tricky, introduce a direct
print-memderefs pass, in the same spirit as print-memdeps, which prints
dereferenceability information matched by FileCheck.
Differential Revision: http://reviews.llvm.org/D7075
llvm-svn: 228369
Summary: When evaluating floating point instructions in the inliner, ask the TTI whether it is an expensive operation. By default, it's not an expensive operation. This keeps the default behavior the same as before. The ARM TTI has been updated to return back TCC_Expensive for targets which don't have hardware floating point.
Reviewers: chandlerc, echristo
Reviewed By: echristo
Subscribers: t.p.northover, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D6936
llvm-svn: 228263
Other than moving code and adding the boilerplate for the new files, the code
being moved is unchanged.
There are a few global functions that are shared with the rest of the
LoopVectorizer. I moved these to the new module as well (emitLoopAnalysis,
stripIntegerCast, replaceSymbolicStrideSCEV) along with the Report class used
by emitLoopAnalysis. There is probably room for further improvement in this
area.
I kept DEBUG_TYPE "loop-vectorize" because it's used as the PassName with
emitOptimizationRemarkAnalysis. This will obviously have to change.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227756
terms of the new pass manager's TargetIRAnalysis.
Yep, this is one of the nicer bits of the new pass manager's design.
Passes can in many cases operate in a vacuum and so we can just nest
things when convenient. This is particularly convenient here as I can
now consolidate all of the TargetMachine logic on this analysis.
The most important change here is that this pushes the function we need
TTI for all the way into the TargetMachine, and re-creates the TTI
object for each function rather than re-using it for each function.
We're now prepared to teach the targets to produce function-specific TTI
objects with specific subtargets cached, etc.
One piece of feedback I'd love here is whether its worth renaming any of
this stuff. None of the names really seem that awesome to me at this
point, but TargetTransformInfoWrapperPass is particularly ... odd.
TargetIRAnalysisWrapper might make more sense. I would want to do that
rename separately anyways, but let me know what you think.
llvm-svn: 227731
getTTI method used to get an actual TTI object.
No functionality changed. This just threads the argument and ensures
code like the inliner can correctly look up the callee's TTI rather than
using a fixed one.
The next change will use this to implement per-function subtarget usage
by TTI. The changes after that should eliminate the need for FTTI as that
will have become the default.
llvm-svn: 227730
produce it.
This adds a function to the TargetMachine that produces this analysis
via a callback for each function. This in turn faves the way to produce
a *different* TTI per-function with the correct subtarget cached.
I've also done the necessary wiring in the opt tool to thread the target
machine down and make it available to the pass registry so that we can
construct this analysis from a target machine when available.
llvm-svn: 227721
base which it adds a single analysis pass to, to instead return the type
erased TargetTransformInfo object constructed for that TargetMachine.
This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more that code motion to make
types available in a header file for use in a different source file
within each target.
I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.
With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.
llvm-svn: 227685
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
This patch folds fcmp in some cases of interest in Julia. The patch adds a function CannotBeOrderedLessThanZero that returns true if a value is provably not less than zero. I.e. the function returns true if the value is provably -0, +0, positive, or a NaN. The patch extends InstructionSimplify.cpp to fold instances of fcmp where:
- the predicate is olt or uge
- the first operand is provably not less than zero
- the second operand is zero
The motivation for handling these cases optimizing away domain checks for sqrt in Julia for common idioms such as sqrt(x*x+y*y)..
http://reviews.llvm.org/D6972
llvm-svn: 227298
Summary:
Also add enum types for __C_specific_handler and _CxxFrameHandler3 for
which we know a few things.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7214
llvm-svn: 227284
According to my reading of the LangRef, volatiles are only ordered with respect to other volatiles. It is entirely legal and profitable to forward unrelated loads over the volatile load. This patch implements this for GVN by refining the transition rules MemoryDependenceAnalysis uses when encountering a volatile.
The added test cases show where the extra flexibility is profitable for local dependence optimizations. I have a related change (227110) which will extend this to non-local dependence (i.e. PRE), but that's essentially orthogonal to the semantic change in this patch. I have tested the two together and can confirm that PRE works over a volatile load with both changes. I will be submitting a PRE w/volatiles test case seperately in the near future.
Differential Revision: http://reviews.llvm.org/D6901
llvm-svn: 227112
This change is mostly motivated by exposing information about the original query instruction to the actual scanning work in getPointerDependencyFrom when used by GVN PRE. In a follow up change, I will use this to be more precise with regards to the semantics of volatile instructions encountered in the scan of a basic block.
Worth noting, is that this change (despite appearing quite simple) is not semantically preserving. By providing more information to the helper routine, we allow some optimizations to kick in that weren't previously able to (when called from this code path.) In particular, we see that treatment of !invariant.load becomes more precise. In theory, we might see a difference with an ordered/atomic instruction as well, but I'm having a hard time actually finding a test case which shows that.
Test wise, I've included new tests for !invariant.load which illustrate this difference. I've also included some updated TBAA tests which highlight that this change isn't needed for that optimization to kick in - it's handled inside alias analysis itself.
Eventually, it would be nice to factor the !invariant.load handling inside alias analysis as well.
Differential Revision: http://reviews.llvm.org/D6895
llvm-svn: 227110
manager to support the actual uses of it. =]
When I ported instcombine to the new pass manager I discover that it
didn't work because TLI wasn't available in the right places. This is
a somewhat surprising and/or subtle aspect of the new pass manager
design that came up before but I think is useful to be reminded of:
While the new pass manager *allows* a function pass to query a module
analysis, it requires that the module analysis is already run and cached
prior to the function pass manager starting up, possibly with
a 'require<foo>' style utility in the pass pipeline. This is an
intentional hurdle because using a module analysis from a function pass
*requires* that the module analysis is run prior to entering the
function pass manager. Otherwise the other functions in the module could
be in who-knows-what state, etc.
A somewhat surprising consequence of this design decision (at least to
me) is that you have to design a function pass that leverages
a module analysis to do so as an optional feature. Even if that means
your function pass does no work in the absence of the module analysis,
you have to handle that possibility and remain conservatively correct.
This is a natural consequence of things being able to invalidate the
module analysis and us being unable to re-run it. And it's a generally
good thing because it lets us reorder passes arbitrarily without
breaking correctness, etc.
This ends up causing problems in one case. What if we have a module
analysis that is *definitionally* impossible to invalidate. In the
places this might come up, the analysis is usually also definitionally
trivial to run even while other transformation passes run on the module,
regardless of the state of anything. And so, it follows that it is
natural to have a hard requirement on such analyses from a function
pass.
It turns out, that TargetLibraryInfo is just such an analysis, and
InstCombine has a hard requirement on it.
The approach I've taken here is to produce an analysis that models this
flexibility by making it both a module and a function analysis. This
exposes the fact that it is in fact safe to compute at any point. We can
even make it a valid CGSCC analysis at some point if that is useful.
However, we don't want to have a copy of the actual target library info
state for each function! This state is specific to the triple. The
somewhat direct and blunt approach here is to turn TLI into a pimpl,
with the state and mutators in the implementation class and the query
routines primarily in the wrapper. Then the analysis can lazily
construct and cache the implementations, keyed on the triple, and
on-demand produce wrappers of them for each function.
One minor annoyance is that we will end up with a wrapper for each
function in the module. While this is a bit wasteful (one pointer per
function) it seems tolerable. And it has the advantage of ensuring that
we pay the absolute minimum synchronization cost to access this
information should we end up with a nice parallel function pass manager
in the future. We could look into trying to mark when analysis results
are especially cheap to recompute and more eagerly GC-ing the cached
results, or we could look at supporting a variant of analyses whose
results are specifically *not* cached and expected to just be used and
discarded by the consumer. Either way, these seem like incremental
enhancements that should happen when we start profiling the memory and
CPU usage of the new pass manager and not before.
The other minor annoyance is that if we end up using the TLI in both
a module pass and a function pass, those will be produced by two
separate analyses, and thus will point to separate copies of the
implementation state. While a minor issue, I dislike this and would like
to find a way to cleanly allow a single analysis instance to be used
across multiple IR unit managers. But I don't have a good solution to
this today, and I don't want to hold up all of the work waiting to come
up with one. This too seems like a reasonable thing to incrementally
improve later.
llvm-svn: 226981
I had already factored this analysis specifically to enable doing this,
but hadn't actually committed the necessary wiring to get at this from
the new pass manager. This also nicely shows how the separate cache
object can be directly managed by the new pass manager.
This analysis didn't have any direct tests and so I've added a printer
pass and a boring test case. I chose to print the i1 value which is
being assumed rather than the call to llvm.assume as that seems much
more useful for testing... but suggestions on an even better printing
strategy welcome. My main goal was to make sure things actually work. =]
llvm-svn: 226868
Specifically, gc.result benefits from this greatly. Instead of:
gc.result.int.*
gc.result.float.*
gc.result.ptr.*
...
We now have a gc.result.* that can specialize to literally any type.
Differential Revision: http://reviews.llvm.org/D7020
llvm-svn: 226857
ScalarEvolution currently lowers a subtraction recurrence to an add
recurrence with the same no-wrap flags as the subtraction. This is
incorrect because `sub nsw X, Y` is not the same as `add nsw X, -Y`
and `sub nuw X, Y` is not the same as `add nuw X, -Y`. This patch
fixes the issue, and adds two test cases demonstrating the bug.
Differential Revision: http://reviews.llvm.org/D7081
llvm-svn: 226755
pass and a LoopPrinterPass with the expected associated wiring.
I've added a RUN line to the only test case (!!!) we have that actually
prints loops. Everything seems to be working.
This is somewhat exciting as this is the first analysis using another
analysis to go in for the new pass manager. =D I also believe it is the
last analysis necessary for porting instcombine, but of course I may yet
discover more.
llvm-svn: 226560
cleaner to derive from the generic base.
Thise removes a ton of boiler plate code and somewhat strange and
pointless indirections. It also remove a bunch of the previously needed
friend declarations. To fully remove these, I also lifted the verify
logic into the generic LoopInfoBase, which seems good anyways -- it is
generic and useful logic even for the machine side.
llvm-svn: 226385
unused variables in a no-asserts build.
I've fixed this by putting the entire loop behind an #ifndef as it
contains nothing other than asserts.
llvm-svn: 226377
a LoopInfoWrapperPass to wire the object up to the legacy pass manager.
This switches all the clients of LoopInfo over and paves the way to port
LoopInfo to the new pass manager. No functionality change is intended
with this iteration.
llvm-svn: 226373
TargetLibraryAnalysis pass.
There are actually no direct tests of this already in the tree. I've
added the most basic test that the pass manager bits themselves work,
and the TLI object produced will be tested by an upcoming patches as
they port passes which rely on TLI.
This is starting to point out the awkwardness of the invalidate API --
it seems poorly fitting on the *result* object. I suspect I will change
it to live on the analysis instead, but that's not for this change, and
I'd rather have a few more passes ported in order to have more
experience with how this plays out.
I believe there is only one more analysis required in order to start
porting instcombine. =]
llvm-svn: 226160
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.
Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass manager -- a wrapper pass in the old pass manager
emulates the separation intrinsic to the new pass manager between the
result and pass for analyses.
llvm-svn: 226157
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.
This is in preparation for porting this analysis to the new pass
manager.
No functionality changed, and updates inbound for Clang and Polly.
llvm-svn: 226078
it's defined in the current module. Clang generates this situation for the
C++14 sized deallocation functions, because it generates a weak definition in
case one isn't provided by the C++ runtime library.
llvm-svn: 226069
utils/sort_includes.py.
I clearly haven't done this in a while, so more changed than usual. This
even uncovered a missing include from the InstrProf library that I've
added. No functionality changed here, just mechanical cleanup of the
include order.
llvm-svn: 225974
a print method.
This was formulated on a bad idea, but sadly I didn't uncover how bad
this was until I got further down the path. I had hoped that we could
provide a low boilerplate way of printing analyses, but it just doesn't
seem like this really fits the needs of the analyses. Not all analyses
really want to do printing, and those that do don't all use the same
interface. Instead, with the new pass manager let's just take advantage
of the fact that creating an explicit printer pass like the LCG has is
pretty low boilerplate already and rely on that for testing.
llvm-svn: 225861
I'm adding generic analysis printing utility pass support which will
require such a method (or a specialization) so this will let the
existing printing logic satisfy that.
llvm-svn: 225854
Even before I sunk the debug flag into the opt tool this had been made
obsolete by factoring the pass and analysis managers into a single set
of templates that all used the core flag. No functionality changed here.
llvm-svn: 225842
the generic functionality of the pass managers themselves.
In the new infrastructure, the pass "manager" isn't actually interesting
at all. It just pipelines a single chunk of IR through N passes. We
don't need to know anything about the IR or the passes to do this really
and we can replace the 3 implementations of the exact same functionality
with a single generic PassManager template, complementing the single
generic AnalysisManager template.
I've left typedefs in place to give convenient names to the various
obvious instantiations of the template.
With this, I think I've nuked almost all of the redundant logic in the
managers, and I think the overall design is actually simpler for having
single templates that clearly indicate there is no special logic here.
The logging is made somewhat more annoying by this change, but I don't
think the difference is worth having heavy-weight traits to help log
things.
llvm-svn: 225783
The functions {pred,succ,use,user}_{begin,end} exist, but many users
have to check *_begin() with *_end() by hand to determine if the
BasicBlock or User is empty. Fix this with a standard *_empty(),
demonstrating a few usecases.
llvm-svn: 225760
template.
This consolidates three copies of nearly the same core logic. It adds
"complexity" to the ModuleAnalysisManager in that it makes it possible
to share a ModuleAnalysisManager across multiple modules... But it does
so by deleting *all of the code*, so I'm OK with that. This will
naturally make fixing bugs in this code much simpler, etc.
The only down side here is that we have to use 'typename' and 'this->'
in various places, and the implementation is lifted into the header.
I'll take that for the code size reduction.
The convenient names are still typedef-ed and used throughout so that
users can largely ignore this aspect of the implementation.
The follow-up change to this will do the exact same refactoring for the
PassManagers. =D
It turns out that the interesting different code is almost entirely in
the adaptors. At the end, that should be essentially all that is left.
llvm-svn: 225757
so has clang-format. Notably, this fixes a bunch of formatting in the
CGSCC pass manager side of things that has been improved in clang-format
recently.
llvm-svn: 225743
Previously, MemDepPrinter handled volatile and unordered accesses without involving MemoryDependencyAnalysis. By making a slight tweak to the documented interface - which is respected by both callers - we can move this responsibility to MDA for the benefit of any future callers. This is basically just cleanup.
In the future, we may decide to extend MDA's non local dependency analysis to return useful results for ordered or volatile loads. I believe (but have not really checked in detail) that local dependency analyis does get useful results for ordered, but not volatile, loads.
llvm-svn: 225483
Previously, MemoryDependenceAnalysis::getNonLocalPointerDependency was taking a list of properties about the instruction being queried. Since I'm about to need one more property to be passed down through the infrastructure - I need to know a query instruction is non-volatile in an inner helper - fix the interface once and for all.
I also added some assertions and behaviour clarifications around volatile and ordered field accesses. At the moment, this is mostly to document expected behaviour. The only non-standard instructions which can currently reach this are atomic, but unordered, loads and stores. Neither ordered or volatile accesses can reach here.
The call in GVN is protected by an isSimple check when it first considers the load. The calls in MemDepPrinter are protected by isUnordered checks. Both utilities also check isVolatile for loads and stores.
llvm-svn: 225481
passes too many time.
I think this is actually the issue that someone raised with me at the
developer's meeting and in an email, but that we never really got to the
bottom of. Having all the testing utilities made it much easier to dig
down and uncover the core issue.
When a pass manager is running many passes over a single function, we
need it to invalidate the analyses between each run so that they can be
re-computed as needed. We also need to track the intersection of
preserved higher-level analyses across all the passes that we run (for
example, if there is one module analysis which all the function analyses
preserve, we want to track that and propagate it). Unfortunately, this
interacted poorly with any enclosing pass adaptor between two IR units.
It would see the intersection of preserved analyses, and need to
invalidate any other analyses, but some of the un-preserved analyses
might have already been invalidated *and recomputed*! We would fail to
propagate the fact that the analysis had already been invalidated.
The solution to this struck me as really strange at first, but the more
I thought about it, the more natural it seemed. After a nice discussion
with Duncan about it on IRC, it seemed even nicer. The idea is that
invalidating an analysis *causes* it to be preserved! Preserving the
lack of result is trivial. If it is recomputed, great. Until something
*else* invalidates it again, we're good.
The consequence of this is that the invalidate methods on the analysis
manager which operate over many passes now consume their
PreservedAnalyses object, update it to "preserve" every analysis pass to
which it delivers an invalidation (regardless of whether the pass
chooses to be removed, or handles the invalidation itself by updating
itself). Then we return this augmented set from the invalidate routine,
letting the pass manager take the result and use the intersection of
*that* across each pass run to compute the final preserved set. This
accounts for all the places where the early invalidation of an analysis
has already "preserved" it for a future run.
I've beefed up the testing and adjusted the assertions to show that we
no longer repeatedly invalidate or compute the analyses across nested
pass managers.
llvm-svn: 225333
WillNotOverflowUnsignedAdd's smarts will live in ValueTracking as
computeOverflowForUnsignedAdd. It now returns a tri-state result:
never overflows, always overflows and sometimes overflows.
llvm-svn: 225329
a specific analysis result.
This is quite handy to test things, and will also likely be very useful
for debugging issues. You could narrow down pass validation failures by
walking these invalidate pass runs up and down the pass pipeline, etc.
I've added support to the pass pipeline parsing to be able to create one
of these for any analysis pass desired.
Just adding this class uncovered one latent bug where the
AnalysisManager CRTP base class had a hard-coded Module type rather than
using IRUnitT.
I've also added tests for invalidation and caching of analyses in
a basic way across all the pass managers. These in turn uncovered two
more bugs where we failed to correctly invalidate an analysis -- its
results were invalidated but the key for re-running the pass was never
cleared and so it was never re-run. Quite nasty. I'm very glad to debug
this here rather than with a full system.
Also, yes, the naming here is horrid. I'm going to update some of the
names to be slightly less awful shortly. But really, I've no "good"
ideas for naming. I'll be satisfied if I can get it to "not bad".
llvm-svn: 225246
manager tests to use them and be significantly more comprehensive.
This, naturally, uncovered a bug where the CGSCC pass manager wasn't
printing analyses when they were run.
The only remaining core manipulator is I think an invalidate pass
similar to the require pass. That'll be next. =]
llvm-svn: 225240
when all are being preserved.
We want to short-circuit this for a couple of reasons. One, I don't
really want passes to grow a dependency on actually receiving their
invalidate call when they've been preserved. I'm thinking about removing
this entirely. But more importantly, preserving everything is likely to
be the common case in a lot of scenarios, and it would be really good to
bypass all of the invalidation and preservation machinery there.
Avoiding calling N opaque functions to try to invalidate things that are
by definition still valid seems important. =]
This wasn't really inpsired by much other than seeing the spam in the
logging for analyses, but it seems better ot get it checked in rather
than forgetting about it.
llvm-svn: 225163
manager.
This starts to allow us to test analyses more easily, but it's really
only the beginning. Some of the code here is still untestable without
manual changes to create analysis passes, but I wanted to factor it into
a small of chunks as possible.
Next up in order to be able to test things are, in no particular order:
- No-op analyses passes so we don't have to use real ones to exercise
the pass maneger itself.
- Automatic way of generating dummy passes that require an analysis be
run, including a variant that calls a 'print' method on a pass to make
it even easier to print out the results of an analysis.
- Dummy passes that invalidate all analyses for their IR unit so we can
test invalidation and re-runs.
- Automatic way to print each analysis pass as it is re-run.
- Automatic but optional verification of analysis passes everywhere
possible.
I'm not claiming I'll get to all of these immediately, but that's what
is in the pipeline at some stage. I'm fleshing out exactly what I need
and what to prioritize by working on converting analyses and then trying
to test the conversion. =]
llvm-svn: 225162
units.
This was debated back and forth a bunch, but using references is now
clearly cleaner. Of all the code written using pointers thus far, in
only one place did it really make more sense to have a pointer. In most
cases, this just removes immediate dereferencing from the code. I think
it is much better to get errors on null IR units earlier, potentially
at compile time, than to delay it.
Most notably, the legacy pass manager uses references for its routines
and so as more and more code works with both, the use of pointers was
likely to become really annoying. I noticed this when I ported the
domtree analysis over and wrote the entire thing with references only to
have it fail to compile. =/ It seemed better to switch now than to
delay. We can, of course, revisit this is we learn that references are
really problematic in the API.
llvm-svn: 225145
from before I removed thet non-const use of the function.
The unused variable that held the const_cast was already kindly removed
by Michael.
llvm-svn: 225143
a cache of assumptions for a single function, and an immutable pass that
manages those caches.
The motivation for this change is two fold. Immutable analyses are
really hacks around the current pass manager design and don't exist in
the new design. This is usually OK, but it requires that the core logic
of an immutable pass be reasonably partitioned off from the pass logic.
This change does precisely that. As a consequence it also paves the way
for the *many* utility functions that deal in the assumptions to live in
both pass manager worlds by creating an separate non-pass object with
its own independent API that they all rely on. Now, the only bits of the
system that deal with the actual pass mechanics are those that actually
need to deal with the pass mechanics.
Once this separation is made, several simplifications become pretty
obvious in the assumption cache itself. Rather than using a set and
callback value handles, it can just be a vector of weak value handles.
The callers can easily skip the handles that are null, and eventually we
can wrap all of this up behind a filter iterator.
For now, this adds boiler plate to the various passes, but this kind of
boiler plate will end up making it possible to port these passes to the
new pass manager, and so it will end up factored away pretty reasonably.
llvm-svn: 225131
PHI nodes can have zero operands in the middle of a transform. It is
expected that utilities in Analysis don't freak out when this happens.
Note that it is considered invalid to allow these misshapen phi nodes to
make it to another pass.
This fixes PR22086.
llvm-svn: 225126
We would sometimes leave the out-param APInts untouched while going
through computeKnownBits. While I don't know of a way to trigger a bug
involving this in practice, it goes against the overall design of
computeKnownBits.
Found via code inspection.
llvm-svn: 225109
WillNotOverflowUnsignedMul's smarts will live in ValueTracking as
computeOverflowForUnsignedMul. It now returns a tri-state result:
never overflows, always overflows and sometimes overflows.
llvm-svn: 225076
GlobalAlias handling used to be after GlobalValue handling, which meant it was, in practice, dead code. r220165 moved GlobalAlias handling to be before GlobalValue handling, but also moved it to be before the max depth check, causing an assert due to a recursion depth limit violation.
This moves GlobalAlias handling forward to where it's safe, and changes the GlobalValue handling to only look at GlobalObjects.
Differential Revision: http://reviews.llvm.org/D6758
llvm-svn: 224765
(X & INT_MIN) ? X & INT_MAX : X into X & INT_MAX
(X & INT_MIN) ? X : X & INT_MAX into X
(X & INT_MIN) ? X | INT_MIN : X into X
(X & INT_MIN) ? X : X | INT_MIN into X | INT_MIN
llvm-svn: 224669
We can always choose an value for undef which might cause %V to shift
out an important bit except for one case, when %V is zero.
However, shl behaves like an identity function when the right hand side
is zero.
llvm-svn: 224405
isKnownPredicate.
The motivation for this change is to optimize away checks in loops
like this:
limit = min(t, len)
for (i = 0 to limit)
if (i >= len || i < 0) throw_array_of_of_bounds();
a[i] = ...
Differential Revision: http://reviews.llvm.org/D6635
llvm-svn: 224285
- by Ella Bolshinsky
The alias analysis is used define whether the given instruction
is a barrier for store sinking. For 2 identical stores, following
instructions are checked in the both basic blocks, to determine
whether they are sinking barriers.
http://reviews.llvm.org/D6420
llvm-svn: 224247
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532. Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.
I have a follow-up patch prepared for `clang`. If this breaks other
sub-projects, I apologize in advance :(. Help me compile it on Darwin
I'll try to fix it. FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.
This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.
Here's a quick guide for updating your code:
- `Metadata` is the root of a class hierarchy with three main classes:
`MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from
the `Value` class hierarchy. It is typeless -- i.e., instances do
*not* have a `Type`.
- `MDNode`'s operands are all `Metadata *` (instead of `Value *`).
- `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.
If you're referring solely to resolved `MDNode`s -- post graph
construction -- just use `MDNode*`.
- `MDNode` (and the rest of `Metadata`) have only limited support for
`replaceAllUsesWith()`.
As long as an `MDNode` is pointing at a forward declaration -- the
result of `MDNode::getTemporary()` -- it maintains a side map of its
uses and can RAUW itself. Once the forward declarations are fully
resolved RAUW support is dropped on the ground. This means that
uniquing collisions on changing operands cause nodes to become
"distinct". (This already happened fairly commonly, whenever an
operand went to null.)
If you're constructing complex (non self-reference) `MDNode` cycles,
you need to call `MDNode::resolveCycles()` on each node (or on a
top-level node that somehow references all of the nodes). Also,
don't do that. Metadata cycles (and the RAUW machinery needed to
construct them) are expensive.
- An `MDNode` can only refer to a `Constant` through a bridge called
`ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).
As a side effect, accessing an operand of an `MDNode` that is known
to be, e.g., `ConstantInt`, takes three steps: first, cast from
`Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
third, cast down to `ConstantInt`.
The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
metadata schema owners transition away from using `Constant`s when
the type isn't important (and they don't care about referring to
`GlobalValue`s).
In the meantime, I've added transitional API to the `mdconst`
namespace that matches semantics with the old code, in order to
avoid adding the error-prone three-step equivalent to every call
site. If your old code was:
MDNode *N = foo();
bar(isa <ConstantInt>(N->getOperand(0)));
baz(cast <ConstantInt>(N->getOperand(1)));
bak(cast_or_null <ConstantInt>(N->getOperand(2)));
bat(dyn_cast <ConstantInt>(N->getOperand(3)));
bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));
you can trivially match its semantics with:
MDNode *N = foo();
bar(mdconst::hasa <ConstantInt>(N->getOperand(0)));
baz(mdconst::extract <ConstantInt>(N->getOperand(1)));
bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2)));
bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3)));
bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));
and when you transition your metadata schema to `MDInt`:
MDNode *N = foo();
bar(isa <MDInt>(N->getOperand(0)));
baz(cast <MDInt>(N->getOperand(1)));
bak(cast_or_null <MDInt>(N->getOperand(2)));
bat(dyn_cast <MDInt>(N->getOperand(3)));
bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));
- A `CallInst` -- specifically, intrinsic instructions -- can refer to
metadata through a bridge called `MetadataAsValue`. This is a
subclass of `Value` where `getType()->isMetadataTy()`.
`MetadataAsValue` is the *only* class that can legally refer to a
`LocalAsMetadata`, which is a bridged form of non-`Constant` values
like `Argument` and `Instruction`. It can also refer to any other
`Metadata` subclass.
(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)
llvm-svn: 223802
When a loop gets bundled up, its outgoing edges are quite large, and can
just barely overflow 64-bits. If one successor has multiple incoming
edges -- and that successor is getting all the incoming mass --
combining just its edges can overflow. Handle that by saturating rather
than asserting.
This fixes PR21622.
llvm-svn: 223500
Reapply r223347, with a fix to not crash on uninserted instructions (or more
precisely, instructions in uninserted blocks). bugpoint was able to reduce the
test case somewhat, but it is still somewhat large (and relies on setting
things up to be simplified during inlining), so I've not included it here.
Nevertheless, it is clear what is going on and why.
Original commit message:
Restrict somewhat the memory-allocation pointer cmp opt from r223093
Based on review comments from Richard Smith, restrict this optimization from
applying to globals that might resolve lazily to other dynamically-loaded
modules, and also from dynamic allocas (which might be transformed into malloc
calls). In short, take extra care that the compared-to pointer is really
simultaneously live with the memory allocation.
llvm-svn: 223371
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.
Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
llvm-svn: 223348
Based on review comments from Richard Smith, restrict this optimization from
applying to globals that might resolve lazily to other dynamically-loaded
modules, and also from dynamic allocas (which might be transformed into malloc
calls). In short, take extra care that the compared-to pointer is really
simultaneously live with the memory allocation.
llvm-svn: 223347
System memory allocation functions, which are identified at the IR level by the
noalias attribute on the return value, must return a pointer into a memory region
disjoint from any other memory accessible to the caller. We can use this
property to simplify pointer comparisons between allocated memory and local
stack addresses and the addresses of global variables. Neither the stack nor
global variables can overlap with the region used by the memory allocator.
Fixes PR21556.
llvm-svn: 223093
The statepoint intrinsics are intended to enable precise root tracking through the compiler as to support garbage collectors of all types. The addition of the statepoint intrinsics to LLVM should have no impact on the compilation of any program which does not contain them. There are no side tables created, no extra metadata, and no inhibited optimizations.
A statepoint works by transforming a call site (or safepoint poll site) into an explicit relocation operation. It is the frontend's responsibility (or eventually the safepoint insertion pass we've developed, but that's not part of this patch series) to ensure that any live pointer to a GC object is correctly added to the statepoint and explicitly relocated. The relocated value is just a normal SSA value (as seen by the optimizer), so merges of relocated and unrelocated values are just normal phis. The explicit relocation operation, the fact the statepoint is assumed to clobber all memory, and the optimizers standard semantics ensure that the relocations flow through IR optimizations correctly.
This is the first patch in a small series. This patch contains only the IR parts; the documentation and backend support will be following separately. The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.
Reviewed by: atrick, ributzka
llvm-svn: 223078
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot. I'll respond to the commit on the
list with a reproduction of one of the failures.
Conflicts:
lib/Target/X86/X86TargetTransformInfo.cpp
llvm-svn: 222936
This restores our ability to optimize:
(X & C) ? X & ~C : X into X & ~C
(X & C) ? X : X & ~C into X
(X & C) ? X | C : X into X
(X & C) ? X : X | C into X | C
llvm-svn: 222868
If solveBlockValue() needs results from predecessors that are not already
computed, it returns false with the intention of resuming when the dependencies
have been resolved. However, the computation would never be resumed since an
'overdefined' result had been placed in the cache, preventing any further
computation.
The point of placing the 'overdefined' result in the cache seems to have been
to break cycles, but we can check for that when inserting work items in the
BlockValue stack instead. This makes the "stop and resume" mechanism of
solveBlockValue() work as intended, unlocking more analysis.
Using this patch shaves 120 KB off a 64-bit Chromium build on Linux.
I benchmarked compiling bzip2.c at -O2 but couldn't measure any difference in
compile time.
Tests by Jiangning Liu from r215343 / PR21238, Pete Cooper, and me.
Differential Revision: http://reviews.llvm.org/D6397
llvm-svn: 222768
clearly only exactly equal width ptrtoint and inttoptr casts are no-op
casts, it says so right there in the langref. Make the code agree.
Original log from r220277:
Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.
To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.
These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.
I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.
llvm-svn: 222739
This handles cases where we are comparing a masked value against itself.
The analysis could be further improved by making it recursive but such
expense is not currently justified.
llvm-svn: 222716
We were matching against the assume intrinsic in every check. Since we know that it must be an assume, this is just wasted work. Somewhat surprisingly, matching an intrinsic id is actually relatively expensive. It devolves to a string construction and comparison in Function::isIntrinsic.
I originally spotted this because it showed up in a performance profile of my compiler. I've since discovered a separate issue which seems to be the actual root cause, but this is minor perf goodness regardless.
I'm likely to follow up with another change to factor out the comparison matching. There's no need to match the compare instruction in every single one of the tests.
Differential Revision: http://reviews.llvm.org/D6312
llvm-svn: 222709
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
llvm-svn: 222632
AliasSetTracker::addUnknown may create an AliasSet devoid of pointers
just to contain an instruction if no suitable AliasSet already exists.
It will then AliasSet::addUnknownInst and we will be done.
However, it's possible for addUnknown to choose an existing AliasSet to
addUnknownInst.
If this were to occur, we are in a bit of a pickle: removing pointers
from the AliasSet can cause the entire AliasSet to become destroyed,
taking our unknown instructions out with them.
Instead, keep track whether or not our AliasSet has any unknown
instructions.
This fixes PR21582.
llvm-svn: 222338
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.
This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
llvm-svn: 222334
SCEVDivision::divide constructed an object of SCEVDivision<Derived>
instead of Derived. divide would call visit which would cast the
SCEVDivision<Derived> to type Derived. As it happens,
SCEVDivision<Derived> and Derived currently have the same layout but
this is fragile and grounds for UB.
Instead, just construct Derived. No functional change intended.
llvm-svn: 222126
It turns out that not all users of SCEVDivision want the same
signedness. Let the users determine which operation they'd like by
explicitly choosing SCEVUDivision or SCEVSDivision.
findArrayDimensions and computeAccessFunctions will use SCEVSDivision
while HowFarToZero will use SCEVUDivision.
llvm-svn: 222104
Summary:
Several places in DependenceAnalysis assumes both SCEVs in a subscript pair
share the same integer type. For instance, isKnownPredicate calls
SE->getMinusSCEV(X, Y) which asserts X and Y share the same type. However,
DependenceAnalysis fails to ensure this assumption when producing a subscript
pair, causing tests such as NonCanonicalizedSubscript to crash. With this
patch, DependenceAnalysis runs unifySubscriptType before producing any
subscript pair, ensuring the assumption.
Test Plan:
Added NonCanonicalizedSubscript.ll on which DependenceAnalysis before the fix
crashed because subscripts have different types.
Reviewers: spop, sebpop, jingyue
Reviewed By: jingyue
Subscribers: eliben, meheff, llvm-commits
Differential Revision: http://reviews.llvm.org/D6289
llvm-svn: 222100
HowFarToZero was supposed to use unsigned division in order to calculate
the backedge taken count. However, SCEVDivision::divide performs signed
division. Unless I am mistaken, no users of SCEVDivision actually want
signed arithmetic: switch to udiv and urem.
This fixes PR21578.
llvm-svn: 222093
A few things:
- computeKnownBits is relatively expensive, let's delay its use as long
as we can.
- Don't create two APInt values just to run computeKnownBits on a
ConstantInt, we already know the exact value!
- Avoid creating a temporary APInt value in order to calculate unary
negation.
llvm-svn: 222092
Private variables are can be renamed, so it is not reliable to make
decisions on the name.
The name is also dropped by the assembler before getting to the
linker, so using the name causes a disconnect between how llvm makes a
decision (var name) and how the linker makes a decision (section it is
in).
This patch changes one case where we were looking at the variable name to use
the section instead.
Test tuning by Michael Gottesman.
llvm-svn: 222061
Let's try this again...
This reverts r219432, plus a bug fix.
Description of the bug in r219432 (by Nick):
The bug was using AllPositive to break out of the loop; if the loop break
condition i != e is changed to i != e && AllPositive then the
test_modulo_analysis_with_global test I've added will fail as the Modulo will
be calculated incorrectly (as the last loop iteration is skipped, so Modulo
isn't updated with its Scale).
Nick also adds this comment:
ComputeSignBit is safe to use in loops as it takes into account phi nodes, and
the == EK_ZeroEx check is safe in loops as, no matter how the variable changes
between iterations, zero-extensions will always guarantee a zero sign bit. The
isValueEqualInPotentialCycles check is therefore definitely not needed as all
the variable analysis holds no matter how the variables change between loop
iterations.
And this patch also adds another enhancement to GetLinearExpression - basically
to convert ConstantInts to Offsets (see test_const_eval and
test_const_eval_scaled for the situations this improves).
Original commit message:
This reverts r218944, which reverted r218714, plus a bug fix.
Description of the bug in r218714 (by Nick):
The original patch forgot to check if the Scale in VariableGEPIndex flipped the
sign of the variable. The BasicAA pass iterates over the instructions in the
order they appear in the function, and so BasicAliasAnalysis::aliasGEP is
called with the variable it first comes across as parameter GEP1. Adding a
%reorder label puts the definition of %a after %b so aliasGEP is called with %b
as the first parameter and %a as the second. aliasGEP later calculates that %a
== %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first
parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) -
ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly
conclude that %a > %b.
Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug.
Slightly modified by me to add an early exit from the loop and avoid
unnecessary, but expensive, function calls.
Original commit message:
Two related things:
1. Fixes a bug when calculating the offset in GetLinearExpression. The code
previously used zext to extend the offset, so negative offsets were converted
to large positive ones.
2. Enhance aliasGEP to deduce that, if the difference between two GEP
allocations is positive and all the variables that govern the offset are also
positive (i.e. the offset is strictly after the higher base pointer), then
locations that fit in the gap between the two base pointers are NoAlias.
Patch by Nick White!
llvm-svn: 221876
If x is known to have the range [a, b), in a loop predicated by (icmp
ne x, a) its range can be sharpened to [a + 1, b). Get
ScalarEvolution and hence IndVars to exploit this fact.
This change triggers an optimization to widen-loop-comp.ll, so it had
to be edited to get it to pass.
This change was originally landed in r219834 but had a bug and broke
ASan. It was reverted in r219878, and is now being re-landed after
fixing the original bug.
phabricator: http://reviews.llvm.org/D5639
reviewed by: atrick
llvm-svn: 221839
Make the handling of calls to intrinsics in CGSCC consistent:
they are not treated like regular function calls because they
are never lowered to function calls.
Without this patch, we can get dangling pointer asserts from
the subsequent loop that processes callsites because it already
ignores intrinsics.
See http://llvm.org/bugs/show_bug.cgi?id=21403 for more details / discussion.
Differential Revision: http://reviews.llvm.org/D6124
llvm-svn: 221802
Instead, we're going to separate metadata from the Value hierarchy. See
PR21532.
This reverts commit r221375.
This reverts commit r221373.
This reverts commit r221359.
This reverts commit r221167.
This reverts commit r221027.
This reverts commit r221024.
This reverts commit r221023.
This reverts commit r220995.
This reverts commit r220994.
llvm-svn: 221711
This commit adds a new pass that can inject checks before indirect calls to
make sure that these calls target known locations. It supports three types of
checks and, at compile time, it can take the name of a custom function to call
when an indirect call check fails. The default failure function ignores the
error and continues.
This pass incidentally moves the function JumpInstrTables::transformType from
private to public and makes it static (with a new argument that specifies the
table type to use); this is so that the CFI code can transform function types
at call sites to determine which jump-instruction table to use for the check at
that site.
Also, this removes support for jumptables in ARM, pending further performance
analysis and discussion.
Review: http://reviews.llvm.org/D4167
llvm-svn: 221708
Exact shifts may not shift out any non-zero bits. Use computeKnownBits
to determine when this occurs and just return the left hand side.
This fixes PR21477.
llvm-svn: 221325
Divides and remainder operations do not behave like other operations
when they are given poison: they turn into undefined behavior.
It's really hard to know if the operands going into a div are or are not
poison. Because of this, we should only choose to speculate if there
are constant operands which we can easily reason about.
This fixes PR21412.
llvm-svn: 221318
LoadCombine can be smarter about aborting when a writing instruction is
encountered, instead of aborting upon encountering any writing instruction, use
an AliasSetTracker, and only abort when encountering some write that might
alias with the loads that could potentially be combined.
This was originally motivated by comments made (and a test case provided) by
David Majnemer in response to PR21448. It turned out that LoadCombine was not
responsible for that PR, but LoadCombine should also be improved so that
unrelated stores (and @llvm.assume) don't interrupt load combining.
llvm-svn: 221203
Change `Instruction::getMetadata()` to return `Value` as part of
PR21433.
Update most callers to use `Instruction::getMDNode()`, which wraps the
result in a `cast_or_null<MDNode>`.
llvm-svn: 221024
In a case where we have a no {un,}signed wrap flag on the increment, if
RHS - Start is constant then we can avoid inserting a max operation bewteen
the two, since we can statically determine which is greater.
This allows us to unroll loops such as:
void testcase3(int v) {
for (int i=v; i<=v+1; ++i)
f(i);
}
llvm-svn: 220960
If we load from a location with range metadata, we can use information about the ranges of the loaded value for optimization purposes. This helps to remove redundant checks and canonicalize checks for other optimization passes. This particular patch checks whether a value is known to be non-zero from the range metadata.
Currently, these tests are against InstCombine. In theory, all of these should be InstSimplify since we're not inserting any new instructions. Moving the code may follow in a separate change.
Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D5947
llvm-svn: 220925
ConstantFolding crashes when trying to InstSimplify the following load:
@a = private unnamed_addr constant %mst {
i8* inttoptr (i64 -1 to i8*),
i8* inttoptr (i64 -1 to i8*)
}, align 8
%x = load <2 x i8*>* bitcast (%mst* @a to <2 x i8*>*), align 8
This patch fix this by adding support to this type of folding:
%x = load <2 x i8*>* bitcast (%mst* @a to <2 x i8*>*), align 8
==> gets folded to:
%x = <2 x i8*> <i8* inttoptr (i64 -1 to i8*), i8* inttoptr (i64 -1 to i8*)>
llvm-svn: 220380
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which return NaN if either operand is
to propagate errors. These implement the IEEE-754 semantics
of returning the other operand if either is a NaN representing
missing data.
llvm-svn: 220341
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.
To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.
These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.
I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.
llvm-svn: 220277
Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well. Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison. This change adds enum value for three existing metadata types:
+ MD_nontemporal = 9, // "nontemporal"
+ MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access"
+ MD_nonnull = 11 // "nonnull"
I went through an updated various uses as well. I made no attempt to get all uses; I focused on the ones which were easily grepable and easily to translate. For example, there were several items in LoopInfo.cpp I chose not to update.
llvm-svn: 220248
The newly introduced 'nonnull' metadata is analogous to existing 'nonnull' attributes, but applies to load instructions rather than call arguments or returns. Long term, it would be nice to combine these into a single construct. The value of the load is allowed to vary between successive loads, but null is not a valid value to be loaded by any load marked nonnull.
Reviewed by: Hal Finkel
Differential Revision: http://reviews.llvm.org/D5220
llvm-svn: 220240
The original code had an implicit assumption that if the test for
allocas or globals was reached, the two pointers were not equal. With my
changes to make the pointer analysis more powerful here, I also had to
guard against circumstances where the results weren't useful. That in
turn violated the assumption and gave rise to a circumstance in which we
could have a store with both the queried pointer and stored pointer
rooted at *the same* alloca. Clearly, we cannot ignore such a store.
There are other things we might do in this code to better handle the
case of both pointers ending up at the same alloca or global, but it
seems best to at least make the test explicit in what it intends to
check.
I've added tests for both the alloca and global case here.
llvm-svn: 220190
logic to look through pointer casts, making them trivially stronger in
the face of loads and stores with intervening pointer casts.
I've included a few test cases that demonstrate the kind of folding
instcombine can do without pointer casts and then variations which
obfuscate the logic through bitcasts. Without this patch, the variations
all fail to optimize fully.
This is more important now than it has been in the past as I've started
moving the load canonicialization to more closely follow the value type
requirements rather than the pointer type requirements and thus this
needs to be prepared for more pointer casts. When I made the same change
to stores several test cases regressed without logic along these lines
so I wanted to systematically improve matters first.
llvm-svn: 220178
up to where it actually works as intended. The problem is that
a GlobalAlias isa GlobalValue and so the prior block handled all of the
cases.
This allows us to constant fold based on the actual constant expression
in the global alias. As an example, see the last function in the newly
added test case which explicitly aligns an unaligned pointer using
constant expression math. Without this change, we fail to see that and
fold an alignment test to zero.
llvm-svn: 220164
by my refactoring of this code.
The method isSafeToLoadUnconditionally assumes that the load will
proceed with the preferred type alignment. Given that, it has to ensure
that the alloca or global is at least that aligned. It has always done
this historically when a datalayout is present, but has never checked it
when the datalayout is absent. When I refactored the code in r220156,
I exposed this path when datalayout was present and that turned the
latent bug into a patent bug.
This fixes the issue by just removing the special case which allows
folding things without datalayout. This isn't worth the complexity of
trying to tease apart when it is or isn't safe without actually knowing
the preferred alignment.
llvm-svn: 220161
make much more sense and in theory be more correct.
If you trace the code alllll the way back to when it was first
introduced, the comments make it slightly more clear what was going on
here. At that time, the only way Base != V was if DL (then TD) was
non-null. As a consequence, if DL *was* null, that meant we were loading
directly from the alloca or global found above the test. After
refactoring, this has become at least terribly subtle and potentially
incorrect. There are many forms of pointer manipulation that can be
traversed without DataLayout, and some of them would in fact change the
size of object being loaded vs. allocated.
Rather than this subtlety, I've hoisted the actual 'return true' bits
into the code which actually found an alloca or global and based them on
the loaded pointer being that alloca or global. This is both more clear
and safer. I've also added comments about exactly why this set of
predicates is used.
I've also corrected a misleading comment about globals -- if overridden
they may not just have a different size, they may be null and completely
unsafe to load from!
Hopefully this confuses the next reader a bit less. I don't have any
test cases or anything, the patch is motivated strictly to improve the
readability of the code.
llvm-svn: 220156
Philip Reames and I had a long conversation about this, mostly because it is
not obvious why the current logic is correct. Hopefully, these comments will
prevent such confusion in the future.
llvm-svn: 219882
If x is known to have the range [a, b) in a loop predicated by (icmp
ne x, a), its range can be sharpened to [a + 1, b). Get
ScalarEvolution and hence IndVars to exploit this fact.
This change triggers an optimization to widen-loop-comp.ll, so it had
to be edited to get it to pass.
phabricator: http://reviews.llvm.org/D5639
llvm-svn: 219834
We need to make sure that we visit all operands of an instruction before moving
deeper in the operand graph. We had been pushing operands onto the back of the work
set, and popping them off the back as well, meaning that we might visit an
instruction before visiting all of its uses that sit in between it and the call
to @llvm.assume.
To provide an explicit example, given the following:
%q0 = extractelement <4 x float> %rd, i32 0
%q1 = extractelement <4 x float> %rd, i32 1
%q2 = extractelement <4 x float> %rd, i32 2
%q3 = extractelement <4 x float> %rd, i32 3
%q4 = fadd float %q0, %q1
%q5 = fadd float %q2, %q3
%q6 = fadd float %q4, %q5
%qi = fcmp olt float %q6, %q5
call void @llvm.assume(i1 %qi)
%q5 is used by both %qi and %q6. When we visit %qi, it will be marked as
ephemeral, and we'll queue %q6 and %q5. %q6 will be marked as ephemeral and
we'll queue %q4 and %q5. Under the old system, we'd then visit %q4, which
would become ephemeral, %q1 and then %q0, which would become ephemeral as
well, and now we have a problem. We'd visit %rd, but it would not be marked as
ephemeral because we've not yet visited %q2 and %q3 (because we've not yet
visited %q5).
This will be covered by a test case in a follow-up commit that enables
ephemeral-value awareness in the SLP vectorizer.
llvm-svn: 219815
The CFL-AA implementation was missing a visit* routine for va_arg instructions,
causing it to assert when run on a function that had one. For now, handle these
in a conservative way.
Fixes PR20954.
llvm-svn: 219718
When LazyValueInfo uses @llvm.assume intrinsics to provide edge-value
constraints, we should check for intrinsics that dominate the edge's branch,
not just any potential context instructions. An assumption that dominates the
edge's branch represents a truth on that edge. This is specifically useful, for
example, if multiple predecessors assume a pointer to be nonnull, allowing us
to simplify a later null comparison.
The test case, and an initial patch, were provided by Philip Reames. Thanks!
llvm-svn: 219688