This change was done as an audit and is by inspection. The new EH
system is still very much a work in progress. NFC for the landingpad
case.
llvm-svn: 243965
Bonus change to remove emacs major mode marker from SystemZMachineFunctionInfo.cpp because emacs already knows it's C++ from the extension. Also fix typo "appeary" in AMDGPUMCAsmInfo.h.
llvm-svn: 243585
through APIs that are no longer necessary now that the update API has
been removed.
This will make changes to the AA interfaces significantly less
disruptive (I hope). Either way, it seems like a really nice cleanup.
llvm-svn: 242882
directly model in the new PM.
This also was an incredibly brittle and expensive update API that was
never fully utilized by all the passes that claimed to preserve AA, nor
could it reasonably have been extended to all of them. Any number of
places add uses of values. If we ever wanted to reliably instrument
this, we would want a callback hook much like we have with ValueHandles,
but doing this for every use addition seems *extremely* expensive in
terms of compile time.
The only user of this update mechanism is GlobalsModRef. The idea of
using this to keep it up to date doesn't really work anyways as its
analysis requires a symmetric analysis of two different memory
locations. It would be very hard to make updates be sufficiently
rigorous to *guarantee* symmetric analysis in this way, and it pretty
certainly isn't true today.
However, folks have been using GMR with this update for a long time and
seem to not be hitting the issues. The reported issue that the update
hook fixes isn't even a problem any more as other changes to
GetUnderlyingObject worked around it, and that issue stemmed from *many*
years ago. As a consequence, a prior patch provided a flag to control
the unsafe behavior of GMR, and this patch removes the update mechanism
that has questionable compile-time tradeoffs and is causing problems
with moving to the new pass manager. Note the lack of test updates --
not one test in tree actually requires this update, even for a contrived
case.
All of this was extensively discussed on the dev list, this patch will
just enact what that discussion decides on. I'm sending it for review in
part to show what I'm planning, and in part to show the *amazing* amount
of work this avoids. Every call to the AA here is something like three
to six indirect function calls, which in the non-LTO pipeline never do
any work! =[
Differential Revision: http://reviews.llvm.org/D11214
llvm-svn: 242605
Sometimes an incidentally created instruction can duplicate a Value used
elsewhere. It then often doesn't end up in the leader table. If it's later
removed, we attempt to remove it from the leader table and segfault.
Instead we should just ignore the removal request, which won't cause any
problems. The reverse situation, where the original instruction is replaced by
the new one (which you might think could leave the leader table empty) cannot
occur, because the incidental instruction will never be found in the first
place.
llvm-svn: 242199
No in-tree alias analysis used this facility, and it was not called in
any particularly rigorous way, so it seems unlikely to be correct.
Note that one of the only stateful AA implementations in-tree,
GlobalsModRef is completely broken currently (and any AA passes like it
are equally broken) because Module AA passes are not effectively
invalidated when a function pass that fails to update the AA stack runs.
Ultimately, it doesn't seem like we know how we want to build stateful
AA, and until then trying to support and maintain correctness for an
untested API is essentially impossible. To that end, I'm planning to rip
out all of the update API. It can return if and when we need it and know
how to build it on top of the new pass manager and as part of *tested*
stateful AA implementations in the tree.
Differential Revision: http://reviews.llvm.org/D10889
llvm-svn: 241975
This previously caused miscompilations as a result of phi nodes receiving
undef incoming values from blocks dominated by such successors.
Differential Revision: http://reviews.llvm.org/D10726
llvm-svn: 240670
We performed a simple, but incomplete, intersection when it came time to
CSE instructions. It didn't handle, for example, the 'exact' flag.
This fixes PR23922.
llvm-svn: 240595
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
Summary:
A side effect of this change is that it IRBuilder now automatically
created debug info locations for new instructions, which is the
same as debug location of insertion point. This is fine for the
functions in questions (GetStoreValueForLoad and
GetMemInstValueForLoad), as they are used in two situations:
* GVN::processLoad, which tries to eliminate a load. In this case
new instructions would have the same debug location as the load they
eventually replace;
* MaterializeAdjustedValue, which adds new instructions to the end
of the basic blocks, which could later be used to replace the load
definition. In this case we don't yet know the way the load would
be eventually replaced (either by assembling the precomputed values
via PHI, or by using them directly), so just using the basic block
strategy seems to be reasonable. There is also a special case
in the code that *would* adjust the location of the last
instruction replacing the load definition to the location of the
load.
Test Plan: regression test suite
Reviewers: echristo, dberlin, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10405
llvm-svn: 239585
Determining proper debug locations for instructions created in
PHITransAddr is tricky. We use a simple approach here and simply copy
debug locations from instructions computing load address to
"corresponding" instructions re-creating the address computation
in predecessor basic blocks.
This may not always be correct, given all the rearrangement and
simplification going on, and debug locations may jump around a lot,
as the basic blocks we copy locations between may be very far from
each other.
Still, this would work good in most simple cases (e.g. when chain
of address computing instruction is short, or our mapping turns out
to be 1-to-1), and we desire to have *some* reasonable debug locations
associated with newly inserted instructions.
See http://reviews.llvm.org/D10351 review thread for more details.
Test Plan: regression test suite
Reviewers: spatel, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10351
llvm-svn: 239479
This patch extends EarlyCSE to take advantage of the information that a controlling branch gives us about the value of a Value within this and dominated basic blocks. If the current block has a single predecessor with a controlling branch, we can infer what the branch condition must have been to execute this block. The actual change to support this is downright simple because EarlyCSE's existing scoped hash table logic deals with most of the complexity around merging.
The patch actually implements two optimizations.
1) The first is analogous to JumpThreading in that it enables EarlyCSE's CSE handling to fold branches which are exactly redundant due to a previous branch to branches on constants. (It doesn't actually replace the branch or change the CFG.) This is pretty clearly a win since it enables substantial CFG simplification before we start trying to inline.
2) The second is analogous to CVP in that it exploits the knowledge gained to replace dominated *uses* of the original value. EarlyCSE does not otherwise reason about specific uses, so this is the more arguable one. It does enable further simplication and constant folding within the rest of the visit by EarlyCSE.
In both cases, the added code only handles the easy dominance based case of each optimization. The general case is deferred to the existing passes.
Differential Revision: http://reviews.llvm.org/D9763
llvm-svn: 238071
Require the pointee type to be passed explicitly and assert that it is
correct. For now it's possible to pass nullptr here (and I've done so in
a few places in this patch) but eventually that will be disallowed once
all clients have been updated or removed. It'll be a long road to get
all the way there... but if you have the cahnce to update your callers
to pass the type explicitly without depending on a pointer's element
type, that would be a good thing to do soon and a necessary thing to do
eventually.
llvm-svn: 233938
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.
This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.
I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.
I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.
Test Plan:
Reviewers: echristo
Subscribers: llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
Summary:
DataLayout keeps the string used for its creation.
As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().
Get rid of DataLayoutPass: the DataLayout is in the Module
The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.
Make DataLayout Non-Optional in the Module
Module->getDataLayout() will never returns nullptr anymore.
Reviewers: echristo
Subscribers: resistor, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D7992
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
This is a follow-on to r227491 which tightens the check for propagating FP
values. If a non-constant value happens to be a zero, we would hit the same
bug as before.
Bug noted and patch suggested by Eli Friedman.
llvm-svn: 230564
In http://reviews.llvm.org/D6911, we allowed GVN to propagate FP equalities
to allow some simple value range optimizations. But that introduced a bug
when comparing to -0.0 or 0.0: these compare equal even though they are not
bitwise identical.
This patch disallows propagating zero constants in equality comparisons.
Fixes: http://llvm.org/bugs/show_bug.cgi?id=22376
Differential Revision: http://reviews.llvm.org/D7257
llvm-svn: 227491
APIs and replace it and numerous booleans with an option struct.
The critical edge splitting API has a really large surface of flags and
so it seems worth burning a small option struct / builder. This struct
can be constructed with the various preserved analyses and then flags
can be flipped in a builder style.
The various users are now responsible for directly passing along their
analysis information. This should be enough for the critical edge
splitting to work cleanly with the new pass manager as well.
This API is still pretty crufty and could be cleaned up a lot, but I've
focused on this change just threading an option struct rather than
a pass through the API.
llvm-svn: 226456
optionally updated by MergeBlockIntoPredecessors.
No functionality changed, just refactoring to clear the way for the new
pass manager.
llvm-svn: 226392
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.
Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass manager -- a wrapper pass in the old pass manager
emulates the separation intrinsic to the new pass manager between the
result and pass for analyses.
llvm-svn: 226157
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.
This is in preparation for porting this analysis to the new pass
manager.
No functionality changed, and updates inbound for Clang and Polly.
llvm-svn: 226078
doing Load PRE"
It's not really expected to stick around, last time it provoked a weird LTO
build failure that I can't reproduce now, and the bot logs are long gone. I'll
re-revert it if the failures recur.
Original description: Perform Scalar PRE on gep indices that feed loads before
doing Load PRE.
llvm-svn: 225536
Previously, MemoryDependenceAnalysis::getNonLocalPointerDependency was taking a list of properties about the instruction being queried. Since I'm about to need one more property to be passed down through the infrastructure - I need to know a query instruction is non-volatile in an inner helper - fix the interface once and for all.
I also added some assertions and behaviour clarifications around volatile and ordered field accesses. At the moment, this is mostly to document expected behaviour. The only non-standard instructions which can currently reach this are atomic, but unordered, loads and stores. Neither ordered or volatile accesses can reach here.
The call in GVN is protected by an isSimple check when it first considers the load. The calls in MemDepPrinter are protected by isUnordered checks. Both utilities also check isVolatile for loads and stores.
llvm-svn: 225481
a cache of assumptions for a single function, and an immutable pass that
manages those caches.
The motivation for this change is two fold. Immutable analyses are
really hacks around the current pass manager design and don't exist in
the new design. This is usually OK, but it requires that the core logic
of an immutable pass be reasonably partitioned off from the pass logic.
This change does precisely that. As a consequence it also paves the way
for the *many* utility functions that deal in the assumptions to live in
both pass manager worlds by creating an separate non-pass object with
its own independent API that they all rely on. Now, the only bits of the
system that deal with the actual pass mechanics are those that actually
need to deal with the pass mechanics.
Once this separation is made, several simplifications become pretty
obvious in the assumption cache itself. Rather than using a set and
callback value handles, it can just be a vector of weak value handles.
The callers can easily skip the handles that are null, and eventually we
can wrap all of this up behind a filter iterator.
For now, this adds boiler plate to the various passes, but this kind of
boiler plate will end up making it possible to port these passes to the
new pass manager, and so it will end up factored away pretty reasonably.
llvm-svn: 225131
doing Load PRE"
This commit updates the failing test in
Analysis/TypeBasedAliasAnalysis/gvn-nonlocal-type-mismatch.ll
The failing test is sensitive to the order in which we process loads. This
version turns on the RPO traversal instead of the while DT traversal in GVN.
The new test code is functionally same just the order of loads that are
eliminated is swapped.
This new version also fixes an issue where GVN splits a critical edge and
potentially invalidate the RPO/DT iterator.
llvm-svn: 222039
This change, which allows @llvm.assume to be used from within computeKnownBits
(and other associated functions in ValueTracking), adds some (optional)
parameters to computeKnownBits and friends. These functions now (optionally)
take a "context" instruction pointer, an AssumptionTracker pointer, and also a
DomTree pointer, and most of the changes are just to pass this new information
when it is easily available from InstSimplify, InstCombine, etc.
As explained below, the significant conceptual change is that known properties
of a value might depend on the control-flow location of the use (because we
care that the @llvm.assume dominates the use because assumptions have
control-flow dependencies). This means that, when we ask if bits are known in a
value, we might get different answers for different uses.
The significant changes are all in ValueTracking. Two main changes: First, as
with the rest of the code, new parameters need to be passed around. To make
this easier, I grouped them into a structure, and I made internal static
versions of the relevant functions that take this structure as a parameter. The
new code does as you might expect, it looks for @llvm.assume calls that make
use of the value we're trying to learn something about (often indirectly),
attempts to pattern match that expression, and uses the result if successful.
By making use of the AssumptionTracker, the process of finding @llvm.assume
calls is not expensive.
Part of the structure being passed around inside ValueTracking is a set of
already-considered @llvm.assume calls. This is to prevent a query using, for
example, the assume(a == b), to recurse on itself. The context and DT params
are used to find applicable assumptions. An assumption needs to dominate the
context instruction, or come after it deterministically. In this latter case we
only handle the specific case where both the assumption and the context
instruction are in the same block, and we need to exclude assumptions from
being used to simplify their own ephemeral values (those which contribute only
to the assumption) because otherwise the assumption would prove its feeding
comparison trivial and would be removed.
This commit adds the plumbing and the logic for a simple masked-bit propagation
(just enough to write a regression test). Future commits add more patterns
(and, correspondingly, more regression tests).
llvm-svn: 217342
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
1. To preserve noalias function attribute information when inlining
2. To provide the ability to model block-scope C99 restrict pointers
Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.
What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:
!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }
Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:
... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }
When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.
Note that is the first element of the scope metadata is a string, then it can
be combined accross functions and translation units. The string can be replaced
by a self-reference to create globally unqiue scope identifiers.
[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]
Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functons have the metadata imply that a1 does not alias with b2.
llvm-svn: 213864
In order to enable the preservation of noalias function parameter information
after inlining, and the representation of block-level __restrict__ pointer
information (etc.), additional kinds of aliasing metadata will be introduced.
This metadata needs to be carried around in AliasAnalysis::Location objects
(and MMOs at the SDAG level), and so we need to generalize the current scheme
(which is hard-coded to just one TBAA MDNode*).
This commit introduces only the necessary refactoring to allow for the
introduction of other aliasing metadata types, but does not actually introduce
any (that will come in a follow-up commit). What it does introduce is a new
AAMDNodes structure to hold all of the aliasing metadata nodes associated with
a particular memory-accessing instruction, and uses that structure instead of
the raw MDNode* in AliasAnalysis::Location, etc.
No functionality change intended.
llvm-svn: 213859
Summary: This patch introduces two new iterator ranges and updates existing code to use it. No functional change intended.
Test Plan: All tests (make check-all) still pass.
Reviewers: dblaikie
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4481
llvm-svn: 213474
If both instructions to be replaced are marked invariant the resulting
instruction is invariant.
rdar://13358910
Fix by Erik Eckstein!
llvm-svn: 211801
Enable value forwarding for loads from `calloc()` without an intervening
store.
This change extends GVN to handle the following case:
%1 = tail call noalias i8* @calloc(i64 1, i64 4)
%2 = bitcast i8* %1 to i32*
; This load is trivially constant zero
%3 = load i32* %2, align 4
This is analogous to the handling for `malloc()` in the same places.
`malloc()` returns `undef`; `calloc()` returns a zero value. Note that
it is correct to return zero even for out of bounds GEPs since the
result of such a GEP would be undefined.
Patch by Philip Reames!
llvm-svn: 210828
address to AnalyzeLoadFromClobberingLoad. This fixes a bug in load-PRE where
PRE is applied to a load that is not partially redundant.
<rdar://problem/16638765>.
llvm-svn: 207853
definition below all of the header #include lines, lib/Transforms/...
edition.
This one is tricky for two reasons. We again have a couple of passes
that define something else before the includes as well. I've sunk their
name macros with the DEBUG_TYPE.
Also, InstCombine contains headers that need DEBUG_TYPE, so now those
headers #define and #undef DEBUG_TYPE around their code, leaving them
well formed modular headers. Fixing these headers was a large motivation
for all of these changes, as "leaky" macros of this form are hard on the
modules implementation.
llvm-svn: 206844
Also updated as many loops as I could find using df_begin/idf_begin -
strangely I found no uses of idf_begin. Is that just used out of tree?
Also a few places couldn't use df_begin because either they used the
member functions of the depth first iterators or had specific ordering
constraints (I added a comment in the latter case).
Based on a patch by Jim Grosbach. (Jim - you just had iterator_range<T>
where you needed iterator_range<idf_iterator<T>>)
llvm-svn: 206016
This reverts commit r203553, and follow-up commits r203558 and r203574.
I will follow this up on the mailinglist to do it in a way that won't
cause subtle PRE bugs.
llvm-svn: 205009
After r203553 overflow intrinsics and their non-intrinsic (normal)
instruction get hashed to the same value. This patch prevents PRE from
moving an instruction into a predecessor block, and trying to add a phi
node that gets two different types (the intrinsic result and the
non-intrinsic result), resulting in a failing assert.
llvm-svn: 203574
When an overflow intrinsic is followed by a non-overflow instruction,
replace the latter with an extract. For example:
%sadd = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
%sadd3 = add i32 %a, %b
Here the add statement will be replaced by an extract.
When an overflow intrinsic follows a non-overflow instruction, a clone
of the intrinsic is inserted before the normal instruction, which makes
it the same as the previous case. Subsequent runs of GVN can then clean
up the duplicate instructions and insert the extract.
This fixes PR8817.
llvm-svn: 203553
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
detail
2) Change it to actually be a *Use* iterator rather than a *User*
iterator.
3) Add an adaptor which is a User iterator that always looks through the
Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
they wanted a use_iterator (and to explicitly dig out the User when
needed), or a user_iterator which makes the Use itself totally
opaque.
Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lies of code.
The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.
However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]
llvm-svn: 203364
I am really sorry for the noise, but the current state where some parts of the
code use TD (from the old name: TargetData) and other parts use DL makes it
hard to write a patch that changes where those variables come from and how
they are passed along.
llvm-svn: 201827
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.
llvm-svn: 200892
can be used by both the new pass manager and the old.
This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.
The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.
Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.
llvm-svn: 199104
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.
Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.
But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.
llvm-svn: 199082
operand into the Value interface just like the core print method is.
That gives a more conistent organization to the IR printing interfaces
-- they are all attached to the IR objects themselves. Also, update all
the users.
This removes the 'Writer.h' header which contained only a single function
declaration.
llvm-svn: 198836
are part of the core IR library in order to support dumping and other
basic functionality.
Rename the 'Assembly' include directory to 'AsmParser' to match the
library name and the only functionality left their -- printing has been
in the core IR library for quite some time.
Update all of the #includes to match.
All of this started because I wanted to have the layering in good shape
before I started adding support for printing LLVM IR using the new pass
infrastructure, and commandline support for the new pass infrastructure.
llvm-svn: 198688
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.
Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.
llvm-svn: 198685
The symptom is that an assertion is triggered. The assertion was added by
me to detect the situation when value is propagated from dead blocks.
(We can certainly get rid of assertion; it is safe to do so, because propagating
value from dead block to alive join node is certainly ok.)
The root cause of this bug is : edge-splitting is conducted on the fly,
the edge being split could be a dead edge, therefore the block that
split the critial edge needs to be flagged "dead" as well.
There are 3 ways to fix this bug:
1) Get rid of the assertion as I mentioned eariler
2) When an dead edge is split, flag the inserted block "dead".
3) proactively split the critical edges connecting dead and live blocks when
new dead blocks are revealed.
This fix go for 3) with additional 2 LOC.
Testing case was added by Rafael the other day.
llvm-svn: 194424
The problem of r191017 is that when GVN fabricate a val-number for a dead instruction (in order
to make following expr-PRE happy), it forget to fabricate a leader-table entry for it as well.
llvm-svn: 191118
This is how it ignores the dead code:
1) When a dead branch target, say block B, is identified, all the
blocks dominated by B is dead as well.
2) The PHIs of those blocks in dominance-frontier(B) is updated such
that the operands corresponding to dead predecessors are replaced
by "UndefVal".
Using lattice's jargon, the "UndefVal" is the "Top" in essence.
Phi node like this "phi(v1 bb1, undef xx)" will be optimized into
"v1" if v1 is constant, or v1 is an instruction which dominate this
PHI node.
3) When analyzing the availability of a load L, all dead mem-ops which
L depends on disguise as a load which evaluate exactly same value as L.
4) The dead mem-ops will be materialized as "UndefVal" during code motion.
llvm-svn: 191017
Adds unit tests for it too.
Split BasicBlockUtils into an analysis-half and a transforms-half, and put the
analysis bits into a new Analysis/CFG.{h,cpp}. Promote isPotentiallyReachable
into llvm::isPotentiallyReachable and move it into Analysis/CFG.
llvm-svn: 187283
iteration.
This on step toward non-iterative GVN. My local hack suggests that getting rid
of iteration will speedup GVN by 30%+ on a medium sized input (2k LOC, C++).
I cannot explain why not 2x or more at this moment.
llvm-svn: 181532
This function consists of following steps:
1. Collect dependent memory accesses.
2. Analyze availability.
3. Perform fully redundancy elimination, or
4. Perform PRE, depending on the availability
Step 2, 3 and 4 are now moved to three helper routines.
llvm-svn: 181047
Actually it took me couple of hours trying to make sense of them and
only to find they are dead code. I guess the original author used
"allSingleSucc" to indicate if there are any critial edge emanating
from some blocks, and tried to perform code motion (actually speculation)
in the presence of these critical edges; but later on he/she changed mind
and decided to perform edge-splitting first.
llvm-svn: 180951
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
wrapper returns a vector of integers when passed a vector of pointers) by having
getIntPtrType itself return a vector of integers in this case. Outside of this
wrapper, I didn't find anywhere in the codebase that was relying on the old
behaviour for vectors of pointers, so give this a whirl through the buildbots.
llvm-svn: 166939
This disables malloc-specific optimization when -fno-builtin (or -ffreestanding)
is specified. This has been a problem for a long time but became more severe
with the recent memory builtin improvements.
Since the memory builtin functions are used everywhere, this required passing
TLI in many places. This means that functions that now have an optional TLI
argument, like RecursivelyDeleteTriviallyDeadFunctions, won't remove dead
mallocs anymore if the TLI argument is missing. I've updated most passes to do
the right thing.
Fixes PR13694 and probably others.
llvm-svn: 162841
No intended behavior change. This was introduced in r162023. With the fixed
algorithm a Release build of ARMInstPrinter.cpp goes from 16s to 10s on a
2011 MBP.
llvm-svn: 162559
where some fact lake a=b dominates a use in a phi, but doesn't dominate the
basic block itself.
This feature could also be implemented by splitting critical edges, but at least
with the current algorithm reasoning about the dominance directly is faster.
The time for running "opt -O2" in the testcase in pr10584 is 1.003 times slower
and on gcc as a single file it is 1.0007 times faster.
llvm-svn: 162023
This was always part of the VMCore library out of necessity -- it deals
entirely in the IR. The .cpp file in fact was already part of the VMCore
library. This is just a mechanical move.
I've tried to go through and re-apply the coding standard's preferred
header sort, but at 40-ish files, I may have gotten some wrong. Please
let me know if so.
I'll be committing the corresponding updates to Clang and Polly, and
Duncan has DragonEgg.
Thanks to Bill and Eric for giving the green light for this bit of cleanup.
llvm-svn: 159421
- provide more extensive set of functions to detect library allocation functions (e.g., malloc, calloc, strdup, etc)
- provide an API to compute the size and offset of an object pointed by
Move a few clients (GVN, AA, instcombine, ...) to the new API.
This implementation is a lot more aggressive than each of the custom implementations being replaced.
Patch reviewed by Nick Lewycky and Chandler Carruth, thanks.
llvm-svn: 158919
replacement to make it at least as generic as the instruction being replaced.
This includes:
* dropping nsw/nuw flags
* getting the least restrictive tbaa and fpmath metadata
* merging ranges
Fixes PR12979.
llvm-svn: 157958
leader table. That's because it wasn't expecting instructions to turn up as
leader for a value number that is not its own, but equality propagation could
create this situation. One solution is to have the leader table use a WeakVH
but this slows down GVN by about 5%. Instead just have equality propagation not
add instructions to the leader table, only constants and arguments. In theory
this might cause GVN to run more (each time it changes something it runs again)
but it doesn't seem to occur enough to cause a slow down.
llvm-svn: 157251
CodeGenPrepare sinks compare instructions down to their uses to prevent
live flags and predicate registers across basic blocks.
PRE of a compare instruction prevents that, forcing the i1 compare
result into a general purpose register. That is usually more expensive
than the redundant compare PRE was trying to eliminate in the first
place.
llvm-svn: 153657
dominated by Root, check that B is available throughout the scope. This
is obviously true (famous last words?) given the current logic, but the
check may be helpful if more complicated reasoning is added one day.
llvm-svn: 153323
Renamed methods caseBegin, caseEnd and caseDefault with case_begin, case_end, and case_default.
Added some notes relative to case iterators.
llvm-svn: 152532
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120130/136146.html
Implemented CaseIterator and it solves almost all described issues: we don't need to mix operand/case/successor indexing anymore. Base iterator class is implemented as a template since it may be initialized either from "const SwitchInst*" or from "SwitchInst*".
ConstCaseIt is just a read-only iterator.
CaseIt is read-write iterator; it allows to change case successor and case value.
Usage of iterator allows totally remove resolveXXXX methods. All indexing convertions done automatically inside the iterator's getters.
Main way of iterator usage looks like this:
SwitchInst *SI = ... // intialize it somehow
for (SwitchInst::CaseIt i = SI->caseBegin(), e = SI->caseEnd(); i != e; ++i) {
BasicBlock *BB = i.getCaseSuccessor();
ConstantInt *V = i.getCaseValue();
// Do something.
}
If you want to convert case number to TerminatorInst successor index, just use getSuccessorIndex iterator's method.
If you want initialize iterator from TerminatorInst successor index, use CaseIt::fromSuccessorIndex(...) method.
There are also related changes in llvm-clients: klee and clang.
llvm-svn: 152297
This implicitly fixes a nasty bug in the GVN hashing (that thankfully
could only manifest as a performance bug): actually include the opcode
in the hash. The old code started the hash off with the opcode, but then
overwrote it with the type pointer.
Since this is likely to be pretty hot (GVN being already pretty
expensive) I've included a micro-optimization to just not bother with
the varargs hashing if they aren't present. I can't measure any change
in GVN performance due to this, even with a big test case like Duncan's
sqlite one. Everything I see is in the noise floor. That said, this
closes a loop hole for a potential scaling problem due to collisions if
the opcode were the differentiating aspect of the expression.
llvm-svn: 152025
equalities into phi node operands for which the equality is known to
hold in the incoming basic block. That's because replaceAllDominatedUsesWith
wasn't handling phi nodes correctly in general (that this didn't give wrong
results was just luck: the specific way GVN uses replaceAllDominatedUsesWith
precluded wrong changes to phi nodes).
llvm-svn: 152006
value numbers to be assigned when calculating any particular value number.
Enhance the logic that detects new value numbers to take this into account,
for a tiny compile time speedup. Fix a comment typo while there.
llvm-svn: 151522
%cmp (eg: A==B) we already replace %cmp with "true" under the true edge, and
with "false" under the false edge. This change enhances this to replace the
negated compare (A!=B) with "false" under the true edge and "true" under the
false edge. Reported to improve perlbench results by 1%.
llvm-svn: 151517
logic by half: isOnlyReachableViaThisEdge was trying to be clever and
handle the case of a branch to a basic block which is contained in a
loop. This costs a domtree lookup and is completely useless due to
GVN's position in the pass pipeline: all loops have preheaders at this
point, which means it is enough for isOnlyReachableViaThisEdge to check
that Dst has only one predecessor. (I checked this theoretical argument
by running over the entire nightly testsuite, and indeed it is so!).
llvm-svn: 149838
The purpose of refactoring is to hide operand roles from SwitchInst user (programmer). If you want to play with operands directly, probably you will need lower level methods than SwitchInst ones (TerminatorInst or may be User). After this patch we can reorganize SwitchInst operands and successors as we want.
What was done:
1. Changed semantics of index inside the getCaseValue method:
getCaseValue(0) means "get first case", not a condition. Use getCondition() if you want to resolve the condition. I propose don't mix SwitchInst case indexing with low level indexing (TI successors indexing, User's operands indexing), since it may be dangerous.
2. By the same reason findCaseValue(ConstantInt*) returns actual number of case value. 0 means first case, not default. If there is no case with given value, ErrorIndex will returned.
3. Added getCaseSuccessor method. I propose to avoid usage of TerminatorInst::getSuccessor if you want to resolve case successor BB. Use getCaseSuccessor instead, since internal SwitchInst organization of operands/successors is hidden and may be changed in any moment.
4. Added resolveSuccessorIndex and resolveCaseIndex. The main purpose of these methods is to see how case successors are really mapped in TerminatorInst.
4.1 "resolveSuccessorIndex" was created if you need to level down from SwitchInst to TerminatorInst. It returns TerminatorInst's successor index for given case successor.
4.2 "resolveCaseIndex" converts low level successors index to case index that curresponds to the given successor.
Note: There are also related compatability fix patches for dragonegg, klee, llvm-gcc-4.0, llvm-gcc-4.2, safecode, clang.
llvm-svn: 149481
switch (n) {
case 27:
do_something(x);
...
}
the call do_something(x) will be replaced with do_something(27). In
gcc-as-one-big-file this results in the removal of about 500 lines of
bitcode (about 0.02%), so has about 1/10 of the effect of propagating
branch conditions.
llvm-svn: 141360
branch "br i1 %x, label %if_true, label %if_false" then it replaces
"%x" with "true" in places only reachable via the %if_true arm, and
with "false" in places only reachable via the %if_false arm. Except
that actually it doesn't: if value numbering shows that %y is equal
to %x then, yes, %y will be turned into true/false in this way, but
any occurrences of %x itself are not transformed. Fix this. What's
more, it's often the case that %x is an equality comparison such as
"%x = icmp eq %A, 0", in which case every occurrence of %A that is
only reachable via the %if_true arm can be replaced with 0. Implement
this and a few other variations on this theme. This reduces the number
of lines of LLVM IR in "GCC as one big file" by 0.2%. It has a bigger
impact on Ada code, typically reducing the number of lines of bitcode
by around 0.4% by removing repeated compiler generated checks. Passes
the LLVM nightly testsuite and the Ada ACATS testsuite.
llvm-svn: 141177
it's OK for the false/true destination to have multiple
predecessors as long as the extra ones are dominated by
the branch destination.
llvm-svn: 141176
PRE needs the landing pads to have their critical edges split. Doing this for a
landing pad is non-trivial. Abandon the attempt to perform PRE when we come
across a landing pad. (Reviewed by Owen!)
llvm-svn: 137876
Change various bits of code to make better use of the existing PHINode
API, to insulate them from forthcoming changes in how PHINodes store
their operands.
llvm-svn: 133434
a nice and tidy:
%x1 = load i32* %0, align 4
%1 = icmp eq i32 %x1, 1179403647
br i1 %1, label %if.then, label %if.end
instead of doing lots of loads and branches. May the FreeBSD bootloader
long fit in its allocated space.
llvm-svn: 130416
wider load would allow elimination of subsequent loads, and when the wider
load is still a native integer type. This eliminates a ton of loads on
various benchmarks involving struct fields, though it is somewhat hobbled
by clang not being very aggressive about field alignment.
This is yet another step along the way towards resolving PR6627.
llvm-svn: 130390
return it as a clobber. This allows GVN to do smart things.
Enhance GVN to be smart about the case when a small load is clobbered
by a larger overlapping load. In this case, forward the value. This
allows us to compile stuff like this:
int test(void *P) {
int tmp = *(unsigned int*)P;
return tmp+*((unsigned char*)P+1);
}
into:
_test: ## @test
movl (%rdi), %ecx
movzbl %ch, %eax
addl %ecx, %eax
ret
which has one load. We already handled the case where the smaller
load was from a must-aliased base pointer.
llvm-svn: 130180
with BasicAA's DecomposeGEPExpression, which recently began
using a TargetData. This fixes PR8968, though the testcase
is awkward to reduce.
Also, update several off GetUnderlyingObject's users
which happen to have a TargetData handy to pass it in.
llvm-svn: 124134
phi nodes. It is called from MergeBlockIntoPredecessor which is
called from GVN, which claims to preserve these.
I'm skeptical that this is the actual problem behind PR8954, but
this is a stab in the right direction.
llvm-svn: 123222
I still think that LVI should be handling this, but that capability is some ways off in the future,
and this matters for some significant benchmarks.
llvm-svn: 122378
this was a tree of hashtables, and a query recursed into the table for the immediate dominator ad infinitum
if the initial lookup failed. This led to really bad performance on tall, narrow CFGs.
We can instead replace it with what is conceptually a multimap of value numbers to leaders (actually
represented by a hashtable with a list of Value*'s as the value type), and then
determine which leader from that set to use very cheaply thanks to the DFS numberings maintained by
DominatorTree. Because there are typically few duplicates of a given value, this scan tends to be
quite fast. Additionally, we use a custom linked list and BumpPtr allocation to avoid any unnecessary
allocation in representing the value-side of the multimap.
This change brings with it a 15% (!) improvement in the total running time of GVN on 403.gcc, which I
think is pretty good considering that includes all the "real work" being done by MemDep as well.
The one downside to this approach is that we can no longer use GVN to perform simple conditional progation,
but that seems like an acceptable loss since we now have LVI and CorrelatedValuePropagation to pick up
the slack. If you see conditional propagation that's not happening, please file bugs against LVI or CVP.
llvm-svn: 119714
systematically, CollapsePhi will always return null here. Note
that CollapsePhi did an extra check, isSafeReplacement, which
the SimplifyInstruction logic does not do. I think that check
was bogus - I guess we will soon find out! (It was originally
added in commit 41998 without a testcase).
llvm-svn: 119456
"%z = %x and %y". If GVN can prove that %y equals %x, then it turns
this into "%z = %x and %x". With the new code, %z will be replaced
with %x everywhere (and then deleted). Previously %z would be value
numbered too, which is a waste of time. Also, while a clever value
numbering algorithm would give %z the same value number as %x, our
current one doesn't do so (at least I don't think it does). The new
logic has an essentially equivalent effect to what you would get if
%z was given the same value number as %x, i.e. it should make value
numbering smarter. While there, get hold of target data once at the
start rather than a gazillion times all over the place.
llvm-svn: 118923
references. For example, this allows gvn to eliminate the load in
this example:
void foo(int n, int* p, int *q) {
p[0] = 0;
p[1] = 1;
if (n) {
*q = p[0];
}
}
llvm-svn: 118714
must be called in the pass's constructor. This function uses static dependency declarations to recursively initialize
the pass's dependencies.
Clients that only create passes through the createFooPass() APIs will require no changes. Clients that want to use the
CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h
before parsing commandline arguments.
I have tested this with all standard configurations of clang and llvm-gcc on Darwin. It is possible that there are problems
with the static dependencies that will only be visible with non-standard options. If you encounter any crash in pass
registration/creation, please send the testcase to me directly.
llvm-svn: 116820
perform initialization without static constructors AND without explicit initialization
by the client. For the moment, passes are required to initialize both their
(potential) dependencies and any passes they preserve. I hope to be able to relax
the latter requirement in the future.
llvm-svn: 116334