the function: print more precision XX.X% instead of XX%, and cast to ULL
before scaling by 100/1000 to avoid wrap around for large numbers of queries
(such as occur for 253.perlbmk and 176.gcc)
llvm-svn: 20872
query. If the next mod/ref query happens to be for the same call site
(which is extremely likely), use the cache instead of recomputing the
callee/caller mapping. This makes -aa-eval ***MUCH*** faster with
ds-aa
llvm-svn: 20871
1. If memory never escapes the program, it cannot be mod/ref'd by external
functions.
2. If memory is global never mod/ref'd in the program, it cannot be mod/ref'd
by any call.
llvm-svn: 20867
1. Instead of copying Local graphs to the BU graphs to start with, use
spliceFrom to do the job (which is constant time in this case). On
176.gcc, this chops off .17s from the bu pass.
2. When building SCC graphs, simplify the logic and use spliceFrom to
do the heavy lifting, instead of cloneInto/delete. This slices
another .14s off 176.gcc.
llvm-svn: 20826
based approach to find globals and call sites that need to be copied. This
speeds up the BU pass on 176.gcc from 22s back up to 2.3s. Not as good
as 1.5s, but at least it's correct :)
llvm-svn: 20820
something correct. Unfortunately this takes 176.gcc's BU phase back
up to 29s from 1.5. This fixes DSGraph/2005-03-24-Global-Arg-Alias.ll
llvm-svn: 20817
global roots in from callees to callers. The BU graphs do not have accurate
globals information and all of the clients know it. Instead, just make sure
the GG is up-to-date, and they will be perfectly satiated.
This speeds up the BU pass on 176.gcc from 5.5s to 1.5s, and Loc+BU+TD
from 7s to 2.7s.
llvm-svn: 20786
1. Increase max node size from 64->256 to avoid collapsing an important
structure in 181.mcf
2. If we have multiple calls to an indirect call node with an indirect
callee, fold these call nodes together, to avoid DSA turning apoc into
a flaming fireball of death when analyzing 176.gcc.
With this change, 176.gcc now takes ~7s to analyze for loc+bu+td, with
5.7s of that in the BU pass.
llvm-svn: 20775
this clone is supposed to be used for *ALL* of the functions in the SCC.
This fixes the memory explosion problem the TD pass was having, reducing the
memory growth from 24MB -> 3.5MB on povray and 270MB ->8.3MB on perlbmk!
This obviously also speeds up the TD pass *a lot*.
llvm-svn: 20763
up the TD pass about 30% for povray and perlbmk. It's still not clear why
copying a 5MB set of graphs turns into a 25MB set of graphs though :(
llvm-svn: 20762
sites that target multiple callees. If we have a function table, for
example, with N callees, and M callers call through it, we used to have
to perform O(M*N) graph inlinings. Now we perform O(M+N) inlinings.
This speeds up the td pass on perlbmk from 36.26s to 25.75s.
llvm-svn: 20743
graph into all of the functions it calls when we visit a graph, change it so
that the graph visitor inlines all of the callers of a graph into the current
graph when it visits it.
While we're at it, inline global information from the GG instead of from each
of the callers. The GG contains a superset of the info that the callers do
anyway, and this way we only need to do it one time (not one for each caller).
This speeds up the TD pass substantially on several programs, and there is
still room for improvement. For example, the TD pass used to take 147s
on perlbmk, it now takes 36s. On povray, we went from about 5s to 1.97s.
134.perl is down from ~1s for Loc+BU+TD to .6s.
The TD pass needs a lot of improvement though, which will occur with later
patches.
llvm-svn: 20723
Globals Graph for the local pass, the second is after all of the locals
graphs have been constructed. This allows for many additional global EC's
to be recognized that weren't before. This speeds up analysis of programs
like 177.mesa, where it changes DSA from taking 0.712s to 0.4018s.
llvm-svn: 20711
to tell apart anyway, and only track the leader for of these equivalence
classes in our graphs.
This dramatically reduces the number of GlobalValue*'s that appear in scalar
maps, which A) reduces memory usage, by eliminating many many scalarmap entries
and B) reduces time for operations that need to execute an operation for each
global in the scalar map.
As an example, this reduces the memory used to analyze 176.gcc from 1GB to
511MB, which (while it's still way too much) is better because it doesn't hit
swap anymore. On eon, this shrinks the local graphs from 14MB to 6.8MB,
shrinks the bu+td graphs of povray from 50M to 40M, shrinks the TD graphs of
130.li from 8.8M to 3.6M, etc.
This change also speeds up DSA on large programs where this makes a big
difference. For example, 130.li goes from 1.17s -> 0.56s, 134.perl goes
from 2.14 -> 0.93s, povray goes from 15.63s->7.99s (!!!).
This also apparently either fixes the problem that caused DSA to crash on
perlbmk and gcc, or it hides it, because DSA now works on these. These
both take entirely too much time in the TD pass (147s for perl, 538s for
gcc, vs 7.67/5.9s in the bu pass for either one), but this is a known
problem that I'll deal with later.
llvm-svn: 20696
effect these calls can have is due to global variables, and these passes
all use the globals graph to capture their effect anyway. This speeds up
the BU pass very slightly on perlbmk, reducing the number of dsnodes
allocated from 98913 to 96423.
llvm-svn: 20676
to determine mod/ref behavior, instead of creating a *copy* of the caller
graph and inlining the callee graph into the copy.
This speeds up aa-eval on Ptrdist/yacr2 from 109.13s to 3.98s, and gives
identical results. The speedup is similar on other programs.
llvm-svn: 20669
1. Chain to the parent implementation of M/R analysis if we can't find
any information. It has some heuristics that often do well.
2. Do not clear all flags, this can make invalid nodes by turning nodes
that used to be collapsed into non-collapsed nodes (fixing crashes)
llvm-svn: 20659
{ short, short }
to
short
where the second short maps onto the second field of the first struct. In
this case, the struct index is not aligned, so we should avoid calling
getLink(2), which asserts out.
llvm-svn: 20626
If we fold three constants together (c1+c2+c3), make sure to keep
LHSC updated, instead of reusing (in this case), the 1 instead of the
partial sum.
llvm-svn: 20337
Actually teach dsa about select instructions. This doesn't affect the
graph in any way other than not setting a spurious U marker on pointer
nodes that are selected.
llvm-svn: 20324
infinite loops (using the new replaceSymbolicValuesWithConcrete method).
This patch reverts this patch:
http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20050131/023830.html
... which was an attempted fix for this problem. Unfortunately, that patch
caused test/Regression/Transforms/IndVarsSimplify/exit_value_tests.llx to fail
and slightly castrated the entire analysis. This patch fixes it right.
This patch is dedicated to jeffc, for making me deal with this. :)
llvm-svn: 20146
into a temporary graph, remember it for later, then inline the tmp graph into
the call site.
In the case where there are other call sites to the same set of functions, this
permits us to just inline the temporary graph instead of all of the callees.
This turns N*M inlining situations into an N+M inlining situation.
llvm-svn: 20036
* Change the FunctionCalls and AuxFunctionCalls vectors into std::lists.
This makes many operations on these lists much more natural, and avoids
*exteremely* expensive copying of DSCallSites (e.g. moving nodes around
between lists, erasing a node from not the end of the vector, etc).
With a profile build of analyze, this speeds up BU DS from 25.14s to
12.59s on 176.gcc. I expect that it would help TD even more, but I don't
have data for it.
This effectively eliminates removeIdenticalCalls and children from the
profile, going from 6.53 to 0.27s.
llvm-svn: 19939
not invalidated on entry and on exit of the block. This fixes some N^2
behavior in common cases, and speeds up gcc another 5% to 22.35s.
llvm-svn: 19910