I am really sorry for the noise, but the current state where some parts of the
code use TD (from the old name: TargetData) and other parts use DL makes it
hard to write a patch that changes where those variables come from and how
they are passed along.
llvm-svn: 201827
As defined in LangRef, aliases do not have sections. However, LLVM's
GlobalAlias class inherits from GlobalValue, which means we can read and
set its section. We should probably ban that as a separate change,
since it doesn't make much sense for an alias to have a section that
differs from its aliasee.
Fixes PR18757, where the section was being lost on the global in code
from Clang like:
extern "C" {
__attribute__((used, section("CUSTOM"))) static int in_custom_section;
}
Reviewers: rafael.espindola
Differential Revision: http://llvm-reviews.chandlerc.com/D2758
llvm-svn: 201286
225 is the default value of inline-threshold. This change will make sure
we have the same inlining behavior as prior to r200886.
As Chandler points out, even though we don't have code in our testing
suite that uses cold attribute, there are larger applications that do
use cold attribute.
r200886 + this commit intend to keep the same behavior as prior to r200886.
We can later on tune the inlinecold-threshold.
The main purpose of r200886 is to help performance of instrumentation based
PGO before we actually hook up inliner with analysis passes such as BPI and BFI.
For instrumentation based PGO, we try to increase inlining of hot functions and
reduce inlining of cold functions by setting inlinecold-threshold.
Another option suggested by Chandler is to use a boolean flag that controls
if we should use OptSizeThreshold for cold functions. The default value
of the boolean flag should not change the current behavior. But it gives us
less freedom in controlling inlining of cold functions.
llvm-svn: 200898
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.
llvm-svn: 200892
Added command line option inlinecold-threshold to set threshold for inlining
functions with cold attribute. Listen to the cold attribute when it would
decrease the inline threshold.
llvm-svn: 200886
No functional change. Updated loops from:
for (I = scc_begin(), E = scc_end(); I != E; ++I)
to:
for (I = scc_begin(); !I.isAtEnd(); ++I)
for teh win.
llvm-svn: 200789
It disturbs the layout of the parameters in memory and registers,
leading to problems in the backend.
The plan for optimizing internal inalloca functions going forward is to
essentially SROA the argument memory and demote any captured arguments
(things that aren't trivially written by a load or store) to an indirect
pointer to a static alloca.
llvm-svn: 200717
Summary:
I searched Transforms/ and Analysis/ for 'ByVal' and updated those call
sites to check for inalloca if appropriate.
I added tests for any change that would allow an optimization to fire on
inalloca.
Reviewers: nlewycky
Differential Revision: http://llvm-reviews.chandlerc.com/D2449
llvm-svn: 200281
Argument promotion can replace an argument of a call with an alloca. This
requires clearing the tail marker as it is very likely that the callee is now
using an alloca in the caller.
This fixes pr14710.
llvm-svn: 199909
Reapply r199191, reverted in r199197 because it carelessly broke
Other/link-opts.ll. The problem was that calling
createInternalizePass("main") would select
createInternalizePass(bool("main")) instead of
createInternalizePass(ArrayRef<const char *>("main")). This commit
fixes the bug.
The original commit message follows.
Add API to LTOCodeGenerator to specify a strategy for the -internalize
pass.
This is a new attempt at Bill's change in r185882, which he reverted in
r188029 due to problems with the gold linker. This puts the onus on the
linker to decide whether (and what) to internalize.
In particular, running internalize before outputting an object file may
change a 'weak' symbol into an internal one, even though that symbol
could be needed by an external object file --- e.g., with arclite.
This patch enables three strategies:
- LTO_INTERNALIZE_FULL: the default (and the old behaviour).
- LTO_INTERNALIZE_NONE: skip -internalize.
- LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
visibility.
LTO_INTERNALIZE_FULL should be used when linking an executable.
Outputting an object file (e.g., via ld -r) is more complicated, and
depends on whether hidden symbols should be internalized. E.g., for
ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
LTO_INTERNALIZE_HIDDEN can be used otherwise. However,
LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
eventually need to link with others.
lto_codegen_set_internalize_strategy() sets the strategy for subsequent
calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().
<rdar://problem/14334895>
llvm-svn: 199244
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.
Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:
define available_externally dllimport void @f() {}
@Var = dllexport global i32 1, align 4
Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.
llvm-svn: 199218
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.
Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:
define available_externally dllimport void @f() {}
@Var = dllexport global i32 1, align 4
Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.
llvm-svn: 199204
Add API to LTOCodeGenerator to specify a strategy for the -internalize
pass.
This is a new attempt at Bill's change in r185882, which he reverted in
r188029 due to problems with the gold linker. This puts the onus on the
linker to decide whether (and what) to internalize.
In particular, running internalize before outputting an object file may
change a 'weak' symbol into an internal one, even though that symbol
could be needed by an external object file --- e.g., with arclite.
This patch enables three strategies:
- LTO_INTERNALIZE_FULL: the default (and the old behaviour).
- LTO_INTERNALIZE_NONE: skip -internalize.
- LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
visibility.
LTO_INTERNALIZE_FULL should be used when linking an executable.
Outputting an object file (e.g., via ld -r) is more complicated, and
depends on whether hidden symbols should be internalized. E.g., for
ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
LTO_INTERNALIZE_HIDDEN can be used otherwise. However,
LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
eventually need to link with others.
lto_codegen_set_internalize_strategy() sets the strategy for subsequent
calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().
<rdar://problem/14334895>
llvm-svn: 199191
can be used by both the new pass manager and the old.
This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.
The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.
Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.
llvm-svn: 199104
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.
Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.
But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.
llvm-svn: 199082
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.
Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.
llvm-svn: 198685
GlobalOpt's CleanupConstantGlobalUsers function uses a worklist array to manage
constant users to be visited. The pointers in this array need to be weak
handles because when we delete a constant array, we may also be holding a
pointer to one of its elements (or an element of one of its elements if we're
dealing with an array of arrays) in the worklist.
Fixes PR17347.
llvm-svn: 197178
The barrier pass is a temporary hack, and should go away soon. Nevertheless, if
we don't initialize it, then opt will not understand -barrier, and this will
break bugpoint (because when it dumps the passes from the default pass manager
-barrier will be there).
llvm-svn: 197177
The intended behaviour is to force vectorization on the presence
of the flag (either turn on or off), and to continue the behaviour
as expected in its absence. Tests were added to make sure the all
cases are covered in opt. No tests were added in other tools with
the assumption that they should use the PassManagerBuilder in the
same way.
This patch also removes the outdated -late-vectorize flag, which was
on by default and not helping much.
The pragma metadata is being attached to the same place as other loop
metadata, but nothing forbids one from attaching it to a function
(to enable #pragma optimize) or basic blocks (to hint the basic-block
vectorizers), etc. The logic should be the same all around.
Patches to Clang to produce the metadata will be produced after the
initial implementation is agreed upon and committed. Patches to other
vectorizers (such as SLP and BB) will be added once we're happy with
the pass manager changes.
llvm-svn: 196537
Short description.
This issue is about case of treating pointers as integers.
We treat pointers as different if they references different address space.
At the same time, we treat pointers equal to integers (with machine address
width). It was a point of false-positive. Consider next case on 32bit machine:
void foo0(i32 addrespace(1)* %p)
void foo1(i32 addrespace(2)* %p)
void foo2(i32 %p)
foo0 != foo1, while
foo1 == foo2 and foo0 == foo2.
As you can see it breaks transitivity. That means that result depends on order
of how functions are presented in module. Next order causes merging of foo0
and foo1: foo2, foo0, foo1
First foo0 will be merged with foo2, foo0 will be erased. Second foo1 will be
merged with foo2.
Depending on order, things could be merged we don't expect to.
The fix:
Forbid to treat any pointer as integer, except for those, who belong to address space 0.
llvm-svn: 195769
CallGraph.
This makes the CallGraph a totally generic analysis object that is the
container for the graph data structure and the primary interface for
querying and manipulating it. The pass logic is separated into its own
class. For compatibility reasons, the pass provides wrapper methods for
most of the methods on CallGraph -- they all just forward.
This will allow the new pass manager infrastructure to provide its own
analysis pass that constructs the same CallGraph object and makes it
available. The idea is that in the new pass manager, the analysis pass's
'run' method returns a concrete analysis 'result'. Here, that result is
a 'CallGraph'. The 'run' method will typically do only minimal work,
deferring much of the work into the implementation of the result object
in order to be lazy about computing things, but when (like DomTree)
there is *some* up-front computation, the analysis does it prior to
handing the result back to the querying pass.
I know some of this is fairly ugly. I'm happy to change it around if
folks can suggest a cleaner interim state, but there is going to be some
amount of unavoidable ugliness during the transition period. The good
thing is that this is very limited and will naturally go away when the
old pass infrastructure goes away. It won't hang around to bother us
later.
Next up is the initial new-PM-style call graph analysis. =]
llvm-svn: 195722
We can share the implementation between StripSymbols and dropping debug info
for metadata versions that do not match.
Also update the comments to match the implementation. A follow-on patch will
drop the "Debug Info Version" module flag in StripDebugInfo.
llvm-svn: 195505
The fix is simply to use CurI instead of I when handling aliases to
avoid accessing a invalid iterator.
original message:
Convert linkonce* to weak* instead of strong.
Also refactor the logic into a helper function. This is an important improve
on mingw where the linker complains about mixed weak and strong symbols.
Converting to weak ensures that the symbol is not dropped, but keeps in a
comdat, making the linker happy.
llvm-svn: 195477
Also refactor the logic into a helper function. This is an important improvement
on mingw where the linker complains about mixed weak and strong symbols.
Converting to weak ensures that the symbol is not dropped, but keeps in a
comdat, making the linker happy.
llvm-svn: 195470
This adds a boolean member variable to the PassManagerBuilder to control loop
rerolling (just like we have for unrolling and the various vectorization
options). This is necessary for control by the frontend. Loop rerolling remains
disabled by default at all optimization levels.
llvm-svn: 194966
This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The
transformation aims to take loops like this:
for (int i = 0; i < 3200; i += 5) {
a[i] += alpha * b[i];
a[i + 1] += alpha * b[i + 1];
a[i + 2] += alpha * b[i + 2];
a[i + 3] += alpha * b[i + 3];
a[i + 4] += alpha * b[i + 4];
}
and turn them into this:
for (int i = 0; i < 3200; ++i) {
a[i] += alpha * b[i];
}
and loops like this:
for (int i = 0; i < 500; ++i) {
x[3*i] = foo(0);
x[3*i+1] = foo(0);
x[3*i+2] = foo(0);
}
and turn them into this:
for (int i = 0; i < 1500; ++i) {
x[i] = foo(0);
}
There are two motivations for this transformation:
1. Code-size reduction (especially relevant, obviously, when compiling for
code size).
2. Providing greater choice to the loop vectorizer (and generic unroller) to
choose the unrolling factor (and a better ability to vectorize). The loop
vectorizer can take vector lengths and register pressure into account when
choosing an unrolling factor, for example, and a pre-unrolled loop limits that
choice. This is especially problematic if the manual unrolling was optimized
for a machine different from the current target.
The current implementation is limited to single basic-block loops only. The
rerolling recognition should work regardless of how the loop iterations are
intermixed within the loop body (subject to dependency and side-effect
constraints), but the significant restriction is that the order of the
instructions in each iteration must be identical. This seems sufficient to
capture all current use cases.
This pass is not currently enabled by default at any optimization level.
llvm-svn: 194939
We used to use std::map<IndicesVector, LoadInst*> for OriginalLoads, and when we
try to promote two arguments, they will both write to OriginalLoads causing
created loads for the two arguments to have the same original load. And the same
tbaa tag and alignment will be put to the created loads for the two arguments.
The fix is to use std::map<std::pair<Argument*, IndicesVector>, LoadInst*>
for OriginalLoads, so each Argument will write to different parts of the map.
PR17906
llvm-svn: 194846
Constant merge can merge a constant with implicit alignment with one that has
explicit alignment. Before this change it was assuming that the explicit
alignment was higher than the implicit one, causing the result to be under
aligned in some cases.
Fixes pr17815.
Patch by Chris Smowton!
llvm-svn: 194506
There are two ways one could implement hiding of linkonce_odr symbols in LTO:
* LLVM tells the linker which symbols can be hidden if not used from native
files.
* The linker tells LLVM which symbols are not used from other object files,
but will be put in the dso symbol table if present.
GOLD's API is the second option. It was implemented almost 1:1 in llvm by
passing the list down to internalize.
LLVM already had partial support for the first option. It is also very similar
to how ld64 handles hiding these symbols when *not* doing LTO.
This patch then
* removes the APIs for the DSO list.
* marks LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN all linkonce_odr unnamed_addr
global values and other linkonce_odr whose address is not used.
* makes the gold plugin responsible for handling the API mismatch.
llvm-svn: 193800
Major steps include:
1). introduces a not-addr-taken bit-field in GlobalVariable
2). GlobalOpt pass sets "not-address-taken" if it proves a global varirable
dosen't have its address taken.
3). AA use this info for disambiguation.
llvm-svn: 193251
When a linkonce_odr value that is on the dso list is not unnamed_addr
we can still look to see if anything is actually using its address. If
not, it is safe to hide it.
This patch implements that by moving GlobalStatus to Transforms/Utils
and using it in Internalize.
llvm-svn: 193090
If a function seen at compile time is not necessarily the one linked to
the binary being built, it is illegal to change the actual arguments
passing to it.
e.g.
--------------------------
void foo(int lol) {
// foo() has linkage satisifying isWeakForLinker()
// "lol" is not used at all.
}
void bar(int lo2) {
// xform to foo(undef) is illegal, as compiler dose not know which
// instance of foo() will be linked to the the binary being built.
foo(lol2);
}
-----------------------------
Such functions can be captured by isWeakForLinker(). NOTE that
mayBeOverridden() is insufficient for this purpose as it dosen't include
linkage types like AvailableExternallyLinkage and LinkOnceODRLinkage.
Take link_odr* as an example, it indicates a set of *EQUIVALENT* globals
that can be merged at link-time. However, the semantic of
*EQUIVALENT*-functions includes parameters. Changing parameters breaks
the assumption.
Thank John McCall for help, especially for the explanation of subtle
difference between linkage types.
rdar://11546243
llvm-svn: 192302
Generalize the API so we can distinguish symbols that are needed just for a DSO
symbol table from those that are used from some native .o.
The symbols that are only wanted for the dso symbol table can be dropped if
llvm can prove every other dso has a copy (linkonce_odr) and the address is not
important (unnamed_addr).
llvm-svn: 191922
This makes using array_pod_sort significantly safer. The implementation relies
on function pointer casting but that should be safe as we're dealing with void*
here.
llvm-svn: 191175
LLVM IR doesn't currently allow atomic bool load/store operations, and the
transformation is dubious anyway because it isn't profitable on all platforms.
PR17163.
llvm-svn: 190357
This reverts commit r189886.
I found a corner case where this optimization is not valid:
Say we have a "linkonce_odr unnamed_addr" in two translation units:
* In TU 1 this optimization kicks in and makes it hidden.
* In TU 2 it gets const merged with a constant that is *not* unnamed_addr,
resulting in a non unnamed_addr constant with default visibility.
* The static linker rules for combining visibility them produce a hidden
symbol, which is incorrect from the point of view of the non unnamed_addr
constant.
The one place we can do this is when we know that the symbol is not used from
another TU in the same shared object, i.e., during LTO. I will move it there.
llvm-svn: 189954
Original message:
If a constant or a function has linkonce_odr linkage and unnamed_addr, mark
hidden. Being linkonce_odr guarantees that it is available in every dso that
needs it. Being a constant/function with unnamed_addr guarantees that the
copies don't have to be merged.
llvm-svn: 189886
This patch changes the default setting for the LateVectorization flag that controls where the loop-vectorizer is ran.
Perf gains:
SingleSource/Benchmarks/Shootout/matrix -37.33%
MultiSource/Benchmarks/PAQ8p/paq8p -22.83%
SingleSource/Benchmarks/Linpack/linpack-pc -16.22%
SingleSource/Benchmarks/Shootout-C++/ary3 -15.16%
MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt -10.34%
MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl -7.12%
Regressions:
SingleSource/Benchmarks/Misc/lowercase 15.10%
MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt 13.18%
SingleSource/Benchmarks/Shootout-C++/matrix 8.27%
SingleSource/Benchmarks/CoyoteBench/lpbench 7.30%
llvm-svn: 189858
1. They are a kind of cannonicalization.
2. The performance measurements show that it is better to keep them in.
There should be no functional change if you are not enabling the LateVectorization mode.
llvm-svn: 189539
When unrolling is disabled in the pass manager, the loop vectorizer should also
not unroll loops. This will allow the -fno-unroll-loops option in Clang to
behave as expected (even for vectorizable loops). The loop vectorizer's
-force-vector-unroll option will (continue to) override the pass-manager
setting (including -force-vector-unroll=0 to force use of the internal
auto-selection logic).
In order to test this, I added a flag to opt (-disable-loop-unrolling) to force
disable unrolling through opt (the analog of -fno-unroll-loops in Clang). Also,
this fixes a small bug in opt where the loop vectorizer was enabled only after
the pass manager populated the queue of passes (the global_alias.ll test needed
a slight update to the RUN line as a result of this fix).
llvm-svn: 189499
The current version of StripDeadDebugInfo became stale and no longer actually
worked since it was expecting an older version of debug info.
This patch updates it to use DebugInfoFinder and the modern DebugInfo classes as
much as possible to make it more redundent to such changes. Additionally, the
only place where that was avoided (the code where we replace the old sets with
the new), I call verify on the DIContextUnit implying that if the format changes
and my live set changes no longer make sense an assert will be hit. In order to
ensure that that occurs I have included a test case.
The actual stripping of the dead debug info follows the same strategy as was
used before in this class: find the live set and replace the old set in the
given compile unit (which may contain dead global variables/functions) with the
new live one.
llvm-svn: 189078
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce number of branches. The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.
Patch by: Mei Ye
llvm-svn: 187278
The language reference says that:
"If a symbol appears in the @llvm.used list, then the compiler,
assembler, and linker are required to treat the symbol as if there is
a reference to the symbol that it cannot see"
Since even the linker cannot see the reference, we must assume that
the reference can be using the symbol table. For example, a user can add
__attribute__((used)) to a debug helper function like dump and use it from
a debugger.
llvm-svn: 187103
GlobalOpt simplifies llvm.compiler.used by removing any members that are also
in the more strict llvm.used. Handle the special case where llvm.compiler.used
becomes empty.
llvm-svn: 186778
We were incorrectly using compiler_used instead of compiler.used. Unfortunately
the passes using the broken name had tests also using the broken name.
llvm-svn: 186705
Duncan pointed out a mistake in my fix in r186425 when only one of the allocas
being compared had the target-default alignment. This is essentially his
suggested solution. Thanks!
llvm-svn: 186510
For safety, the inliner cannot decrease the allignment on an alloca when
merging it with another.
I've included two variants of the test case for this: one with DataLayout
available, and one without. When DataLayout is not available, if only one of
the allocas uses the default alignment (getAlignment() == 0), then they cannot
be safely merged.
llvm-svn: 186425