First, it allows SRA of globals that have embedded arrays, implementing
GlobalOpt/globalsra-partial.llx. This comes up infrequently, but does allow,
for example, deleting several stores to dead parts of globals in dhrystone.
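A rough C sketch of the kind of global this covers (invented names, not the
actual testcase):
static struct { int count; int pad[4]; } state;  /* global with an embedded array */
int  get_count(void) { return state.count; }     /* only .count is ever read      */
void init(void)      { state.count = 0; state.pad[2] = 7; }
Once SRA splits 'state' into separate globals, the store to pad[2] has no
readers and can be deleted, which is the kind of dead store mentioned above
for dhrystone.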
Second, this implements GlobalOpt/malloc-promote-*.llx, which is the
following nifty transformation:
Basically if a global pointer is initialized with malloc, and we can tell
that the program won't notice, we transform this:
struct foo *FooPtr;
...
FooPtr = malloc(sizeof(struct foo));
...
FooPtr->A FooPtr->B
Into:
struct foo FooPtrBody;
...
FooPtrBody.A FooPtrBody.B
This comes up occasionally, for example, the 'disp' global in 183.equake (where
the xform speeds the CBE version of the program up from 56.16s to 52.40s (7%)
on apoc), and the 'desired_accept', 'fixLRBT', 'macroArray', & 'key_queue'
globals in 300.twolf (speeding it up from 22.29s to 21.55s (3.4%)).
The nice thing about this xform is that it exposes the resulting global to
global variable optimization and makes alias analysis easier in addition to
eliminating a few loads.
llvm-svn: 16916
still optimize away all of the indirect calls and loads, etc from it.
This turns code like this:
if (G != 0)
G();
into
if (G != 0)
ActualCallee();
This triggers a couple of times in gcc and libstdc++.
llvm-svn: 16901
stored to, but are stored at variable indexes. This occurs at least in
176.gcc, but probably in others too, and we should handle it for completeness.
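The pattern in question looks something like this in C (a hypothetical sketch;
the 176.gcc code itself is not reproduced here):
static int scratch[16];                        /* internal global, never read      */
void record(int i, int v) { scratch[i] = v; }  /* stored only, at a variable index */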
llvm-svn: 16876
has a large number of users. Instead, just keep track of whether we're
making changes as we do so.
This patch has no functionality changes.
llvm-svn: 16874
we know that all uses of the global will trap if the pointer contained is
null. In this case, we forward substitute the stored value to any uses.
This has the effect of devirtualizing trivial globals in trivial cases. For
example, 164.gzip contains this:
gzip.h:extern int (*read_buf) OF((char *buf, unsigned size));
bits.c: read_buf = file_read;
deflate.c: lookahead = read_buf((char*)window,
deflate.c: n = read_buf((char*)window+strstart+lookahead, more);
Since read_buf has to point to file_read at every use, we just replace
the calls through read_buf with a direct call to file_read.
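After the transform, those calls would read roughly like this (a sketch, not
the actual output):
deflate.c: lookahead = file_read((char*)window,
deflate.c: n = file_read((char*)window+strstart+lookahead, more);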
This occurs in several benchmarks, including 176.gcc and 164.gzip. Direct
calls are good and stuff.
llvm-svn: 16871
* Do not let dangling dead constants prevent optimization
* Iterate global optimization while we're making progress.
These changes allow us to be more aggressive, handling cases like
GlobalOpt/iterate.llx without a problem (turning it into 'ret int 0').
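A hypothetical C analogue of a case that needs iteration (not the actual
iterate.llx contents):
static int A = 0;                  /* never stored to: constified on the first pass       */
static int B = 0;                  /* only ever stores A: constifiable once A is constant */
void update(void) { B = A; }
int  result(void) { return B; }    /* with iteration this eventually folds to 'return 0'  */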
llvm-svn: 16857
optimizations to trigger much more often. This allows the elimination of
several dozen more global variables in Programs/External. Note that we only
do this for non-constant globals: constant globals will already be optimized
out if the accesses to them permit it.
This implements Transforms/GlobalOpt/globalsra.llx
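In C terms, the kind of global that now gets split looks like this (a sketch,
invented names):
static struct { int x; double y; } pair;   /* non-constant internal global */
void   set_x(int v) { pair.x = v; }
double get_y(void)  { return pair.y; }
After SRA, pair.x and pair.y become two independent internal globals that
later passes can reason about (and possibly eliminate) separately.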
llvm-svn: 16842
* Instead of handling dead functions specially, just nuke them.
* Be more aggressive about cleaning up after constification, in
particular, handle getelementptr instructions and constantexprs.
* Be a little bit more structured about how we process globals.
*** Delete globals that are only stored to, and never read. These are
clearly not useful, so they should go. This implements deadglobal.llx
This last one triggers quite a few times: 2208 times in the external tests,
1865 of which are in 252.eon. This shrinks eon from
1995094 to 1732341 bytes of bytecode.
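A minimal C sketch of the kind of write-only global this deletes (invented
names):
static int last_code;                           /* internal, never read anywhere           */
void note_code(int code) { last_code = code; }  /* only stores: global and stores go away  */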
llvm-svn: 16802
simplifications of the resultant program to avoid making later passes
do it all.
This allows us to constify globals that only ever have the same constant they
are initialized with stored into them.
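In C the pattern is roughly this (a sketch; not the actual bzip2 source):
static int verbosity = 0;                    /* initializer is 0                       */
void quiet(void)  { verbosity = 0; }         /* every store writes the same 0          */
int  chatty(void) { return verbosity > 2; }  /* folds to 0 once verbosity is constant  */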
Surprisingly this comes up ALL of the freaking time, dozens of times in
SPEC, 30 times in vortex alone.
For example, on 256.bzip2, it allows us to constify these two globals:
%smallMode = internal global ubyte 0 ; <ubyte*> [#uses=8]
%verbosity = internal global int 0 ; <int*> [#uses=49]
Which (with later optimizations) results in the bytecode file shrinking
from 82286 to 69686 bytes! Let's hear it for IPO :)
For the record, it's nuking lots of "if (verbosity > 2) { do lots of stuff }"
code.
llvm-svn: 16793
a function being deleted. Due to optimizations done while inlining, there
can be edges from the external call node to a function node that are no
longer apparent.
This fixes the compiler crash while compiling 175.vpr
llvm-svn: 16399
Move include/Config and include/Support into include/llvm/Config,
include/llvm/ADT and include/llvm/Support. From here on out, all LLVM
public header files must be under include/llvm/.
llvm-svn: 16137
instructions in the body of the function (not the entry block). This fixes
test/Programs/SingleSource/Regression/C/2004-08-12-InlinerAndAllocas.c
and test/Programs/External/SPEC/CINT2000/176.gcc on zion.
This should obviously be pulled into 1.3.
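The regression test itself is not reproduced here; the kind of code involved
is, roughly, a callee with allocas outside its entry block, e.g. (hypothetical):
#include <alloca.h>
void use(char *);
void callee(int n) {
  int i;
  for (i = 0; i < n; i++) {
    char *buf = alloca(64);   /* alloca in the loop body, not in the entry block */
    use(buf);
  }
}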
llvm-svn: 15684
dangling constant users were removed from a function, causing it to be dead,
we never removed the call graph edge from the external node to the function.
In most cases, this didn't cause a problem (by luck). This should definitely
go into 1.3.
llvm-svn: 15570
Eventually it would be nice if CallGraph maintained an ilist of CallGraphNodes instead
of a vector of pointers to them, but today is not that day.
llvm-svn: 13100
Now we collect all of the call sites we are interested in inlining, then inline
them. This entirely avoids issues with trying to inline a call site we got by
inlining another call site. This also eliminates iterator invalidation issues.
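Not the actual pass code, just the shape of the change, in a generic C sketch:
#include <stddef.h>
typedef struct Call Call;
struct Call { Call *next; int wanted; };
void inline_call(Call *c);                 /* may create new calls on the list */
void run(Call *head) {
  Call *work[1024]; size_t n = 0;
  for (Call *c = head; c && n < 1024; c = c->next)   /* phase 1: snapshot the call sites */
    if (c->wanted) work[n++] = c;
  for (size_t i = 0; i < n; i++)                     /* phase 2: inline only the snapshot */
    inline_call(work[i]);
}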
llvm-svn: 12770
extracted, and a function that contained a single top-level loop never had
the loop extracted, regardless of how much non-loop code there was.
llvm-svn: 12403
* Be a lot more accurate about what the effects of inlining a call will be
when one of the arguments is an alloca.
* Dramatically reduce the penalty for inlining a call in a large function.
This heuristic made it almost impossible to inline a function into a large
function, no matter how small the callee was.
llvm-svn: 12363
This allows pointers to aggregate objects, whose elements are only read, to
be promoted and passed in by element instead of by reference. This can
enable a LOT of subsequent optimizations in the caller function.
It's worth pointing out that this stuff happens in a LOT of C++ programs, because
objects in templates are generally passed around by reference. When these
templates are instantiated on small aggregate or scalar types, however, it is
more efficient to pass them in by value than by reference.
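The same idea is easiest to see in a plain C sketch (invented names); the
callee only ever reads through its pointer argument:
struct vec { double x, y, z; };
static double len2(const struct vec *v) {      /* elements are only read */
  return v->x*v->x + v->y*v->y + v->z*v->z;
}
double call(const struct vec *p) { return len2(p); }
After promotion, len2 conceptually takes (double x, double y, double z); the
caller loads the three fields and passes them by value, exposing them to
scalar optimizations.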
This transformation triggers most on C++ codes (e.g. 334 times on eon), but
does happen on C codes as well. For example, on mesa it triggers 72 times,
and on gcc it triggers 35 times. This is amazingly good considering that
we are using 'basicaa' so far.
llvm-svn: 12202
assume that if they don't intend to write to a global variable, they
would mark it as constant. However, there are people that don't understand
that the compiler can do nice things for them if they give it the information
it needs.
This pass looks for blatantly obvious globals that are only ever read from.
Though it uses a trivially simple "alias analysis" of sorts, it is still able
to do amazing things to important benchmarks. 253.perlbmk, for example,
contains several ***GIANT*** function pointer tables that are not marked
constant and should be. Marking them constant allows the optimizer to turn
a whole bunch of indirect calls into direct calls. Note that only a link-time
optimizer can do this transformation, but perlbmk does have several strings
and other minor globals that can be marked constant by this pass when run
from GCCAS.
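A tiny sketch of the function-pointer-table pattern (invented names; the
perlbmk tables are of course much larger):
int open_file(const char *s);
int open_pipe(const char *s);
static int (*handlers[2])(const char *) = { open_file, open_pipe };  /* never written */
int dispatch(int k, const char *s) { return handlers[k](s); }
Once the pass proves handlers is never stored to and marks it constant, a
later pass can turn handlers[0](s) into a direct call to open_file.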
176.gcc has a ton of strings and large tables that are marked constant, both
at compile time (38 of them) and at link time (48 more). Other benchmarks
give similar results, though it seems like big ones have disproportionately
more than small ones.
This pass is extremely quick and does good things. I'm going to enable it
in gccas & gccld. Not bad for 50 SLOC.
llvm-svn: 11836
* Make the cost metric for passing constants in as arguments to functions MUCH
more accurate, by actually estimating the amount of code that will be constant
propagated away.
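For example (a sketch, not from the heuristic's test suite), passing a
constant flag lets a whole branch constant-propagate away, and the metric now
credits that:
static int work(int verbose, int n) {
  if (verbose) {
    /* ... lots of logging code the callee only runs when verbose ... */
  }
  return n * 2;
}
int caller(int n) { return work(0, n); }   /* constant argument: the 'if' body costs nothing */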
llvm-svn: 10136
* Implement FuncResolve/2003-11-20-BogusResolveWarning.ll
... which eliminates a large number of annoying warnings. I know misha
will miss them though!
llvm-svn: 10123
pool allocator no end of trouble, and doesn't make a lot of sense anyway. This
does not solve the problem with mutually recursive functions, but they are much less common.
llvm-svn: 9828
as well as arguments. Now it can delete arguments and return values that are only
passed into other arguments or only returned, when those uses are themselves dead. This causes
it to delete several hundred extra args/retvals from the C++ hello world program,
shrinking it by about 2K.
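A minimal C sketch of an argument that is dead only in this extended sense
(invented names):
static int inner(int x, int unused) { return x; }                 /* 'unused' is dead           */
static int outer(int x, int unused) { return inner(x, unused); }  /* only feeds a dead argument */
int entry(int x) { return outer(x, 42); }
After the pass, both 'unused' parameters disappear and the 42 is never
materialized.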
llvm-svn: 9398
Only transform call sites in a setjmp'ing function which are reachable from
the setjmp. If the call dominates the setjmp (for example), the called
function cannot longjmp to the setjmp.
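For example, with hypothetical names:
#include <setjmp.h>
static jmp_buf env;
void setup(void);      /* runs before the setjmp: it cannot longjmp back to it */
void risky(void);      /* runs after the setjmp: it might longjmp              */
int driver(void) {
  setup();                        /* dominates the setjmp: left as a plain call   */
  if (setjmp(env) == 0)
    risky();                      /* reachable from the setjmp: still transformed */
  return 0;
}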
This dramatically reduces the number of invoke instructions created in some
large testcases.
llvm-svn: 9066
of callees between executions.
On eon, in release mode, this changes the inliner from taking 11.5712s
to taking 2.2066s. In debug mode, it went from taking 14.4148s to
taking 7.0745s. In release mode, this is a 24.7% speedup of gccas; in
debug mode, it's a total speedup of 11.7%.
This also makes it slightly more aggressive. This could be because we
are not judging the size of the functions quite as accurately as before.
When we start looking at the performance of the generated code, this can
be investigated further.
llvm-svn: 8893