pounding on Loop::contains (which is O(n) in the size of the loop), use a
sorted vector, which is O(log(N)) for each query. This speeds up Duraid's
horrible testcase from ~72s to ~31s in a debug build.
llvm-svn: 29645
target CG node. This allows the inliner to properly update the callgraph
when using the pruning inliner. The pruning inliner may not copy over all
call sites from a callee to a caller, so the edges corresponding to those
call sites should not be copied over either.
This fixes PR827 and Transforms/Inline/2006-07-12-InlinePruneCGUpdate.ll
llvm-svn: 29120
causes the pointer to be removed from the underlying alias analysis
implementation as well. This impl of remove is also significantly faster than
the old one. This fixes:
Regression/Transforms/DeadStoreElimination/2006-06-27-AST-Remove.ll
llvm-svn: 28950
Refactor the Graph writing code to use a common implementation which is
now in lib/Support/GraphWriter.cpp. This completes the PR.
Patch by Anton Korobeynikov. Thanks, Anton!
llvm-svn: 28925
1. Fix the macros in IncludeFile.h to put everything in the llvm namespace
2. Replace the previous explicit mechanism in all the .h and .cpp files
with the macros in IncludeFile.h
This gets us a consistent mechanism throughout LLVM for ensuring linkage.
Next step is to make sure its used in enough places.
llvm-svn: 28715
Break the "IncludeFile" mechanism into its own header file and adjust other
files accordingly. Use this facility for the IntrinsicInst problem which
was the subject of PR800.
More to follow on this.
llvm-svn: 28709
IncludeFile hack to ensure linkage of analysis passes. This works around
some -pedantic warnings about assigning an object to a function.
llvm-svn: 28621
uses DSA to make find targets of calls. It provides a very convinient
interface to DSA results to do things with indirect calls, such as
write a devirtualizer (which I have and may commit one of these days).
llvm-svn: 28545
and the offset lands at a field boundary in the old type, construct a new type,
copying the fields masked by the offset from the old type, and unify with that.
llvm-svn: 26775
set construction, rather than intersecting various std::sets. This reduces
the memory usage for the testcase in PR681 from 496 to 26MB of ram on my
darwin system, and reduces the runtime from 32.8 to 0.8 seconds on a
2.5GHz G5. This also enables future code sharing between Dom and PostDom
now that they share near-identical implementations.
llvm-svn: 26707
don't assume that A[1][0] and A[0][i] can't alias. "i" might be out of
range, or even negative. This fixes a miscompilation of 188.ammp (which
does bad pointer tricks) with the new CFE.
Testcase here: Analysis/BasicAA/2006-03-03-BadArraySubscript.ll
llvm-svn: 26515
This patch is an incremental step towards supporting a flat symbol table.
It de-overloads the intrinsic functions by providing type-specific intrinsics
and arranging for automatically upgrading from the old overloaded name to
the new non-overloaded name. Specifically:
llvm.isunordered -> llvm.isunordered.f32, llvm.isunordered.f64
llvm.sqrt -> llvm.sqrt.f32, llvm.sqrt.f64
llvm.ctpop -> llvm.ctpop.i8, llvm.ctpop.i16, llvm.ctpop.i32, llvm.ctpop.i64
llvm.ctlz -> llvm.ctlz.i8, llvm.ctlz.i16, llvm.ctlz.i32, llvm.ctlz.i64
llvm.cttz -> llvm.cttz.i8, llvm.cttz.i16, llvm.cttz.i32, llvm.cttz.i64
New code should not use the overloaded intrinsic names. Warnings will be
emitted if they are used.
llvm-svn: 25366
the rough idea sketched out in http://nondot.org/sabre/LLVMNotes/CallGraphClass.txt,
allowing new spiffy implementations of the callgraph interface to be built.
Many thanks to Saem Ghani for contributing this!
llvm-svn: 24944
When inserting code for an addrec expression with a non-unit stride, be
more careful where we insert the multiply. In particular, insert the multiply
in the outermost loop we can, instead of the requested insertion point.
This allows LSR to notice the mul in the right loop, reducing it when it gets
to it. This allows it to reduce the multiply, where before it missed it.
This happens quite a bit in the test suite, for example, eliminating 2
multiplies in art, 3 in ammp, 4 in apsi, reducing from 1050 multiplies to
910 muls in galgel (!), from 877 to 859 in applu, and 36 to 30 in bzip2.
This speeds up galgel from 16.45s to 16.01s, applu from 14.21 to 13.94s and
fourinarow from 66.67s to 63.48s.
This implements Transforms/LoopStrengthReduce/nested-reduce.ll
llvm-svn: 24102
into the LLVMAnalysis library.
This allows LLVMTranform and LLVMTransformUtils to be archives and linked
with LLVMAnalysis.a, which provides any missing definitions.
llvm-svn: 24036
to std::set<std::pair<Inst,Func>> to avoid duplicate entries.
This speeds up the CompleteBU pass from 1.99s to .15s on povray and the
eqgraph passes from 1.5s to .16s on the same.
llvm-svn: 21031
the function: print more precision XX.X% instead of XX%, and cast to ULL
before scaling by 100/1000 to avoid wrap around for large numbers of queries
(such as occur for 253.perlbmk and 176.gcc)
llvm-svn: 20872
query. If the next mod/ref query happens to be for the same call site
(which is extremely likely), use the cache instead of recomputing the
callee/caller mapping. This makes -aa-eval ***MUCH*** faster with
ds-aa
llvm-svn: 20871
1. If memory never escapes the program, it cannot be mod/ref'd by external
functions.
2. If memory is global never mod/ref'd in the program, it cannot be mod/ref'd
by any call.
llvm-svn: 20867
1. Instead of copying Local graphs to the BU graphs to start with, use
spliceFrom to do the job (which is constant time in this case). On
176.gcc, this chops off .17s from the bu pass.
2. When building SCC graphs, simplify the logic and use spliceFrom to
do the heavy lifting, instead of cloneInto/delete. This slices
another .14s off 176.gcc.
llvm-svn: 20826
based approach to find globals and call sites that need to be copied. This
speeds up the BU pass on 176.gcc from 22s back up to 2.3s. Not as good
as 1.5s, but at least it's correct :)
llvm-svn: 20820
something correct. Unfortunately this takes 176.gcc's BU phase back
up to 29s from 1.5. This fixes DSGraph/2005-03-24-Global-Arg-Alias.ll
llvm-svn: 20817
global roots in from callees to callers. The BU graphs do not have accurate
globals information and all of the clients know it. Instead, just make sure
the GG is up-to-date, and they will be perfectly satiated.
This speeds up the BU pass on 176.gcc from 5.5s to 1.5s, and Loc+BU+TD
from 7s to 2.7s.
llvm-svn: 20786
1. Increase max node size from 64->256 to avoid collapsing an important
structure in 181.mcf
2. If we have multiple calls to an indirect call node with an indirect
callee, fold these call nodes together, to avoid DSA turning apoc into
a flaming fireball of death when analyzing 176.gcc.
With this change, 176.gcc now takes ~7s to analyze for loc+bu+td, with
5.7s of that in the BU pass.
llvm-svn: 20775
this clone is supposed to be used for *ALL* of the functions in the SCC.
This fixes the memory explosion problem the TD pass was having, reducing the
memory growth from 24MB -> 3.5MB on povray and 270MB ->8.3MB on perlbmk!
This obviously also speeds up the TD pass *a lot*.
llvm-svn: 20763
up the TD pass about 30% for povray and perlbmk. It's still not clear why
copying a 5MB set of graphs turns into a 25MB set of graphs though :(
llvm-svn: 20762
sites that target multiple callees. If we have a function table, for
example, with N callees, and M callers call through it, we used to have
to perform O(M*N) graph inlinings. Now we perform O(M+N) inlinings.
This speeds up the td pass on perlbmk from 36.26s to 25.75s.
llvm-svn: 20743
graph into all of the functions it calls when we visit a graph, change it so
that the graph visitor inlines all of the callers of a graph into the current
graph when it visits it.
While we're at it, inline global information from the GG instead of from each
of the callers. The GG contains a superset of the info that the callers do
anyway, and this way we only need to do it one time (not one for each caller).
This speeds up the TD pass substantially on several programs, and there is
still room for improvement. For example, the TD pass used to take 147s
on perlbmk, it now takes 36s. On povray, we went from about 5s to 1.97s.
134.perl is down from ~1s for Loc+BU+TD to .6s.
The TD pass needs a lot of improvement though, which will occur with later
patches.
llvm-svn: 20723
Globals Graph for the local pass, the second is after all of the locals
graphs have been constructed. This allows for many additional global EC's
to be recognized that weren't before. This speeds up analysis of programs
like 177.mesa, where it changes DSA from taking 0.712s to 0.4018s.
llvm-svn: 20711
to tell apart anyway, and only track the leader for of these equivalence
classes in our graphs.
This dramatically reduces the number of GlobalValue*'s that appear in scalar
maps, which A) reduces memory usage, by eliminating many many scalarmap entries
and B) reduces time for operations that need to execute an operation for each
global in the scalar map.
As an example, this reduces the memory used to analyze 176.gcc from 1GB to
511MB, which (while it's still way too much) is better because it doesn't hit
swap anymore. On eon, this shrinks the local graphs from 14MB to 6.8MB,
shrinks the bu+td graphs of povray from 50M to 40M, shrinks the TD graphs of
130.li from 8.8M to 3.6M, etc.
This change also speeds up DSA on large programs where this makes a big
difference. For example, 130.li goes from 1.17s -> 0.56s, 134.perl goes
from 2.14 -> 0.93s, povray goes from 15.63s->7.99s (!!!).
This also apparently either fixes the problem that caused DSA to crash on
perlbmk and gcc, or it hides it, because DSA now works on these. These
both take entirely too much time in the TD pass (147s for perl, 538s for
gcc, vs 7.67/5.9s in the bu pass for either one), but this is a known
problem that I'll deal with later.
llvm-svn: 20696
effect these calls can have is due to global variables, and these passes
all use the globals graph to capture their effect anyway. This speeds up
the BU pass very slightly on perlbmk, reducing the number of dsnodes
allocated from 98913 to 96423.
llvm-svn: 20676
to determine mod/ref behavior, instead of creating a *copy* of the caller
graph and inlining the callee graph into the copy.
This speeds up aa-eval on Ptrdist/yacr2 from 109.13s to 3.98s, and gives
identical results. The speedup is similar on other programs.
llvm-svn: 20669
1. Chain to the parent implementation of M/R analysis if we can't find
any information. It has some heuristics that often do well.
2. Do not clear all flags, this can make invalid nodes by turning nodes
that used to be collapsed into non-collapsed nodes (fixing crashes)
llvm-svn: 20659
{ short, short }
to
short
where the second short maps onto the second field of the first struct. In
this case, the struct index is not aligned, so we should avoid calling
getLink(2), which asserts out.
llvm-svn: 20626
If we fold three constants together (c1+c2+c3), make sure to keep
LHSC updated, instead of reusing (in this case), the 1 instead of the
partial sum.
llvm-svn: 20337
Actually teach dsa about select instructions. This doesn't affect the
graph in any way other than not setting a spurious U marker on pointer
nodes that are selected.
llvm-svn: 20324
infinite loops (using the new replaceSymbolicValuesWithConcrete method).
This patch reverts this patch:
http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20050131/023830.html
... which was an attempted fix for this problem. Unfortunately, that patch
caused test/Regression/Transforms/IndVarsSimplify/exit_value_tests.llx to fail
and slightly castrated the entire analysis. This patch fixes it right.
This patch is dedicated to jeffc, for making me deal with this. :)
llvm-svn: 20146
into a temporary graph, remember it for later, then inline the tmp graph into
the call site.
In the case where there are other call sites to the same set of functions, this
permits us to just inline the temporary graph instead of all of the callees.
This turns N*M inlining situations into an N+M inlining situation.
llvm-svn: 20036
* Change the FunctionCalls and AuxFunctionCalls vectors into std::lists.
This makes many operations on these lists much more natural, and avoids
*exteremely* expensive copying of DSCallSites (e.g. moving nodes around
between lists, erasing a node from not the end of the vector, etc).
With a profile build of analyze, this speeds up BU DS from 25.14s to
12.59s on 176.gcc. I expect that it would help TD even more, but I don't
have data for it.
This effectively eliminates removeIdenticalCalls and children from the
profile, going from 6.53 to 0.27s.
llvm-svn: 19939
not invalidated on entry and on exit of the block. This fixes some N^2
behavior in common cases, and speeds up gcc another 5% to 22.35s.
llvm-svn: 19910
will cause us to miss cases where the input pointer to a load could be value
numbered to another load. Something like this:
%X = load int* %P1
%Y = load int* %P2
Those are obviously the same if P1/P2 are the same. The code this patch
removes attempts to handle that. However, since GCSE iterates, this doesn't
actually buy us anything: GCSE will first replace P1 or P2 with the other
one, then the load can be value numbered as equal.
Removing this code speeds up gcse a lot. On 176.gcc in debug mode, this
speeds up gcse from 29.08s -> 25.73s, a 13% savings.
llvm-svn: 19906
If needed, this can be resurrected from CVS.
Note that several of the interfaces (e.g. the IPModRef ones) are supersumed
by generic AliasAnalysis interfaces that have been written since this code
was developed (and they are not DSA specific).
llvm-svn: 19864
a function call at the core of the loop inline and removing unused
stack variables from an often called function. This doesn't improve things
much, the real saving will be by reducing the number of calls to this
function (100K+ when linking kimwitu++).
llvm-svn: 19119
1. Calls to external global VARIABLES should not be treated as a call to an
external function
2. Efficiently deleting an element from a vector by using std::swap with
the back, then pop_back is NOT a good way to keep the vector sorted.
3. Our hope of having stuff get deleted by making them redundant just won't
work. In particular, if we have three calls in sequence that should be
merged: A, B, C first we unify B into A. To be sure that they appeared
identical (so B would be erased) we set B = A. On the next step, we
unified C into A and set C = A. Unfortunately, this is no guarantee that
C = B, so we would fail to delete the dead call. Switch to a more
explicit scheme.
llvm-svn: 17357
* change some uses of NH.getNode() in a bool context to use !NH.isNull()
* Fix a bunch of places where we depended on the (undefined) order of
evaluation of arguments to function calls to ensure that getNode() was
called before getOffset(). In practice, this was NOT happening.
llvm-svn: 17354
to go in. This patch allows us to compute the trip count of loops controlled
by values loaded from constant arrays. The cannonnical example of this is
strlen when passed a constant argument:
for (int i = 0; "constantstring"[i]; ++i) ;
return i;
In this case, it will compute that the loop executes 14 times, which means
that the exit value of i is 14. Because of this, the loop gets DCE'd and
we are happy. This also applies to anything that does similar things, e.g.
loops like this:
const float Array[] = { 0.1, 2.1, 3.2, 23.21 };
for (int i = 0; Array[i] < 20; ++i)
and is actually fairly general.
The problem with this is that it almost never triggers. The reason is that
we run indvars and the loop optimizer only at compile time, which is before
things like strlen and strcpy have been inlined into the program from libc.
Because of this, it almost never is used (it triggers twice in specint2k).
I'm committing it because it DOES work, may be useful in the future, and
doesn't slow us down at all. If/when we start running the loop optimizer
at link-time (-O4?) this will be very nice indeed :)
llvm-svn: 16926