Nearly all the changes to this pass have been done while maintaining and
updating other parts of LLVM. LLVM has had another pass, SROA, which
has superseded ScalarReplAggregates for quite some time.
Differential Revision: http://reviews.llvm.org/D21316
llvm-svn: 272737
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.
This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
the normal rule that the global must have a unique address can be broken without
being observable by the program by performing comparisons against the global's
address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
its own copy of the global if it requires one, and the copy in each linkage unit
must be the same)
- It is a constant or a function (which means that the program cannot observe that
the unique-address rule has been broken by writing to the global)
Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.
See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.
Part of the fix for PR27553.
Differential Revision: http://reviews.llvm.org/D20348
llvm-svn: 272709
Below are my super rough notes when porting. They can probably serve as
a basic guide for porting other passes to the new PM. As I port more
passes I'll expand and generalize this and make a proper
docs/HowToPortToNewPassManager.rst document. There is also missing
documentation for general concepts and API's in the new PM which will
require some documentation.
Once there is proper documentation in place we can put up a list of
passes that have to be ported and game-ify/crowdsource the rest of the
porting (at least of the middle end; the backend is still unclear).
I will however be taking personal responsibility for ensuring that the
LLD/ELF LTO pipeline is ported in a timely fashion. The remaining passes
to be ported are (do something like
`git grep "<the string in the bullet point below>"` to find the pass):
General Scalar:
[ ] Simplify the CFG
[ ] Jump Threading
[ ] MemCpy Optimization
[ ] Promote Memory to Register
[ ] MergedLoadStoreMotion
[ ] Lazy Value Information Analysis
General IPO:
[ ] Dead Argument Elimination
[ ] Deduce function attributes in RPO
Loop stuff / vectorization stuff:
[ ] Alignment from assumptions
[ ] Canonicalize natural loops
[ ] Delete dead loops
[ ] Loop Access Analysis
[ ] Loop Invariant Code Motion
[ ] Loop Vectorization
[ ] SLP Vectorizer
[ ] Unroll loops
Devirtualization / CFI:
[ ] Cross-DSO CFI
[ ] Whole program devirtualization
[ ] Lower bitset metadata
CGSCC passes:
[ ] Function Integration/Inlining
[ ] Remove unused exception handling info
[ ] Promote 'by reference' arguments to scalars
Please let me know if you are interested in working on any of the passes
in the above list (e.g. reply to the post-commit thread for this patch).
I'll probably be tackling "General Scalar" and "General IPO" first FWIW.
Steps as I port "Deduce function attributes in RPO"
---------------------------------------------------
(note: if you are doing any work based on these notes, please leave a
note in the post-commit review thread for this commit with any
improvements / suggestions / incompleteness you ran into!)
Note: "Deduce function attributes in RPO" is a module pass.
1. Do preparatory refactoring.
Do preparatory factoring. In this case all I had to do was to pull out a static helper (r272503).
(TODO: give more advice here e.g. if pass holds state or something)
2. Rename the old pass class.
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename class ReversePostOrderFunctionAttrs -> ReversePostOrderFunctionAttrsLegacyPass
in preparation for adding a class ReversePostOrderFunctionAttrs as the pass in the new PM.
(edit: actually wait what? The new class name will be
ReversePostOrderFunctionAttrsPass, so it doesn't conflict. So this step is
sort of useless churn).
llvm/include/llvm/InitializePasses.h
llvm/lib/LTO/LTOCodeGenerator.cpp
llvm/lib/Transforms/IPO/IPO.cpp
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename initializeReversePostOrderFunctionAttrsPass -> initializeReversePostOrderFunctionAttrsLegacyPassPass
(note that the "PassPass" thing falls out of `s/ReversePostOrderFunctionAttrs/ReversePostOrderFunctionAttrsLegacyPass/`)
Note that the INITIALIZE_PASS macro is what creates this identifier name, so renaming the class requires this renaming too.
Note that createReversePostOrderFunctionAttrsPass does not need to be
renamed since its name is not generated from the class name.
3. Add the new PM pass class.
In the new PM all passes need to have their
declaration in a header somewhere, so you will often need to add a header.
In this case
llvm/include/llvm/Transforms/IPO/FunctionAttrs.h is already there because
PostOrderFunctionAttrsPass was already ported.
The file-level comment from the .cpp file can be used as the file-level
comment for the new header. You may want to tweak the wording slightly
from "this file implements" to "this file provides" or similar.
Add declaration for the new PM pass in this header:
class ReversePostOrderFunctionAttrsPass
: public PassInfoMixin<ReversePostOrderFunctionAttrsPass> {
public:
PreservedAnalyses run(Module &M, AnalysisManager<Module> &AM);
};
Its name should end with `Pass` for consistency (note that this doesn't
collide with the names of most old PM passes). E.g. call it
`<name of the old PM pass>Pass`.
Also, move the doxygen comment from the old PM pass to the declaration of
this class in the header.
Also, include the declaration for the new PM class
`llvm/Transforms/IPO/FunctionAttrs.h` at the top of the file (in this case,
it was already done when the other pass in this file was ported).
Now define the `run` method for the new class.
The main things here are:
a) Use AM.getResult<...>(M) to get results instead of `getAnalysis<...>()`
b) If the old PM pass would have returned "false" (i.e. `Changed ==
false`), then you should return PreservedAnalyses::all();
c) In the old PM getAnalysisUsage method, observe the calls
`AU.addPreserved<...>();`.
In the case `Changed == true`, for each preserved analysis you should do
call `PA.preserve<...>()` on a PreservedAnalyses object and return it.
E.g.:
PreservedAnalyses PA;
PA.preserve<CallGraphAnalysis>();
return PA;
Note that calls to skipModule/skipFunction are not supported in the new PM
currently, so optnone and optimization bisect support do not work. You can
just drop those calls for now.
4. Add the pass to the new PM pass registry to make it available in opt.
In llvm/lib/Passes/PassBuilder.cpp add a #include for your header.
`#include "llvm/Transforms/IPO/FunctionAttrs.h"`
In this case there is already an include (from when
PostOrderFunctionAttrsPass was ported).
Add your pass to llvm/lib/Passes/PassRegistry.def
In this case, I added
`MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())`
The string is from the `INITIALIZE_PASS*` macros used in the old pass
manager.
Then choose a test that uses the pass and use the new PM `-passes=...` to
run it.
E.g. in this case there is a test that does:
; RUN: opt < %s -basicaa -functionattrs -rpo-functionattrs -S | FileCheck %s
I have added the line:
; RUN: opt < %s -aa-pipeline=basic-aa -passes='require<targetlibinfo>,cgscc(function-attrs),rpo-functionattrs' -S | FileCheck %s
The `-aa-pipeline=basic-aa` and
`require<targetlibinfo>,cgscc(function-attrs)` are what is needed to run
functionattrs in the new PM (note that in the new PM "functionattrs"
becomes "function-attrs" for some reason). This is just pulled from
`readattrs.ll` which contains the change from when functionattrs was ported
to the new PM.
Adding rpo-functionattrs causes the pass that was just ported to run.
llvm-svn: 272505
Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo.
Differential revision: http://reviews.llvm.org/D21045
llvm-svn: 272321
Summary:
The module pass pipeline includes a late LICM run after loop
unrolling. LCSSA is implicitly run as a pass dependency of LICM. However no
cleanup pass was run after this, so the LCSSA nodes ended in the optimized output.
Reviewers: hfinkel, mehdi_amini
Subscribers: majnemer, bruno, mzolotukhin, mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D20606
llvm-svn: 271602
The assumption, made in insert() that weak functions are always inserted after strong functions,
is only true in the first round of adding functions.
In subsequent rounds this is no longer guaranteed , because we might remove a strong function from the tree (because it's modified) and add it later,
where an equivalent weak function already exists in the tree.
This change removes the assert in insert() and explicitly enforces a weak->strong order.
This also removes the need of two separate loops in runOnModule().
llvm-svn: 271299
As a result of D18634 we no longer infer certain attributes on linkonce_odr
functions at compile time, and may only infer them at LTO time. The readnone
attribute in particular is required for virtual constant propagation (part
of whole-program virtual call optimization) to work correctly.
This change moves the whole-program virtual call optimization pass after
the function attribute inference passes, and enables the attribute inference
passes at opt level 1, so that virtual constant propagation has a chance to
work correctly for linkonce_odr functions.
Differential Revision: http://reviews.llvm.org/D20643
llvm-svn: 270765
Move the now index-based ODR resolution and internalization routines out
of ThinLTOCodeGenerator.cpp and into either LTO.cpp (index-based
analysis) or FunctionImport.cpp (index-driven optimizations).
This is to enable usage by other linkers.
llvm-svn: 270698
A volatile load has side effects beyond what callers expect readonly to
signify. For example, it is not safe to reorder two function calls
which each perform a volatile load to the same memory location.
llvm-svn: 270671
Check that the incoming blocks of phi nodes are identical, and block
function merging if they are not.
rdar://problem/26255167
Differential Revision: http://reviews.llvm.org/D20462
llvm-svn: 270250
This should just be a compile-time change. Correct the check for whether
we have already analyzed the callee when making summary based decisions.
There is no need to reprocess one at the same threshold as when it was
last processed.
llvm-svn: 269251
Remove the ModuleLevelChanges argument, and the ability to create new
subprograms for cloned functions. The latter was added without review in
r203662, but it has no in-tree clients (all non-test callers pass false
for ModuleLevelChanges [1], so it isn't reachable outside of tests). It
also isn't clear that adding a duplicate subprogram to the compile unit is
always the right thing to do when cloning a function within a module. If
this functionality comes back it should be accompanied with a more concrete
use case.
Furthermore, all in-tree clients add the returned function to the module.
Since that's pretty much the only sensible thing you can do with the function,
just do that in CloneFunction.
[1] http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/CloneFunction.cpp/rCloneFunction
Differential Revision: http://reviews.llvm.org/D18628
llvm-svn: 269110
The plan is to eventually make this logic simpler, however I expect it to
be a little tricky for the foreseeable future (at least until we're rid of
pointee types), so move it here so that it can be reused to build a summary
index for devirtualization.
Differential Revision: http://reviews.llvm.org/D20005
llvm-svn: 269081
Summary:
Add support for emission of plaintext lists of the imported files for
each distributed backend compilation. Used for distributed build file
staging.
Invoked with new gold-plugin thinlto-emit-imports-files option, which is
only valid with thinlto-index-only (i.e. for distributed builds), or
from llvm-lto with new -thinlto-action=emitimports value.
Depends on D19556.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D19636
llvm-svn: 269067
This restores commit r268627:
Summary:
When launching ThinLTO backends in a distributed build (currently
supported in gold via the thinlto-index-only plugin option), emit
an individual index file for each backend process as described here:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098272.html
...
Differential Revision: http://reviews.llvm.org/D19556
Address msan failures by avoiding std::prev on map.end(), the
theory is that this is causing issues due to some known UB problems
in __tree.
llvm-svn: 269059
Summary:
The original ThinLTO pipeline was derived from some
work I did tuning FullLTO on the test suite and SPEC. This
patch reduces the amount of work done in the "linker phase" of
the build, and extend the function simplifications passes
performed during the "compile phase". This helps the build time
by reducing the IR as much as possible during the compile phase
and limiting the work to be performed during the "link phase",
while keeping the performance "on par" with the existing pipeline.
Reviewers: tejohnson
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D19773
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268769
Summary:
When launching ThinLTO backends in a distributed build (currently
supported in gold via the thinlto-index-only plugin option), emit
an individual index file for each backend process as described here:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098272.html
The individual index file encodes the summary and module information
required for implementing the importing/exporting decisions made
for a given module in the thin link step.
This is in place of the current mechanism that uses the combined index
to make importing decisions in each back end independently. It is an
enabler for doing global summary based optimizations in the thin link
step (which will be recorded in the individual index files), and reduces
the size of the index that must be sent to each backend process, and
the amount of work to scan it in the backends.
Rather than create entirely new ModuleSummaryIndex structures (and all
the included unique_ptrs) for each backend index file, a map is created
to record all of the GUID and summary pointers needed for a particular
index file. The IndexBitcodeWriter walks this map instead of the full
index (hiding the details of managing the appropriate summary iteration
in a new iterator subclass). This is more efficient than walking the
entire combined index and filtering out just the needed summaries during
each backend bitcode index write.
Depends on D19481.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D19556
llvm-svn: 268627
We forgot to consider the target of ifuncs when considering if a
function was alive or dead.
N.B. Also update a few auxiliary tools like bugpoint and
verify-uselistorder.
This fixes PR27593.
llvm-svn: 268468
This pass is supposed to reduce the size of the IR for compile time
purpose. We should run it ASAP, except when we prepare for LTO or
ThinLTO, and we want to keep them available for link-time inline.
Differential Revision: http://reviews.llvm.org/D19813
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268394
There is not point in importing a "weak" or a "linkonce" function
since we won't be able to inline it anyway.
We already had a targeted check for WeakAny, this is using the
same check on GlobalValue as the inline, i.e.
isMayBeOverriddenLinkage()
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268341
There is not point in importing a "weak" or a "linkonce" function
since we won't be able to inline it anyway.
We already had a targeted check for WeakAny, this is using the
same check on GlobalValue as the inline, i.e.
isMayBeOverriddenLinkage()
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268315
When running cc1 with -flto=thin, it is followed by GlobalOpt, which
requires the callgraph. This saves rebuilding one.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268266
This is where it was originally, until LoopVersioningLICM was
inserted before in r259986, I don't believe it was on purpose.
Differential Revision: http://reviews.llvm.org/D19809
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268252
The implemented heuristic has a large body of code which better sits
in its own function for better readability. It also allows adding more
heuristics easier in the future.
llvm-svn: 268107
Summary: Callee name is not used to identify a callsite now, so do not read it during annotation.
Reviewers: davidxl, dnovillo
Subscribers: dnovillo, danielcdh, llvm-commits
Differential Revision: http://reviews.llvm.org/D19704
llvm-svn: 268069
We neglected to transfer operand bundles for some transforms. These
were found via inspection, I'll try to come up with some test cases.
llvm-svn: 268011
This patch implements the transformation that promotes indirect calls to
conditional direct calls when the indirect-call value profile meta-data is
available.
Differential Revision: http://reviews.llvm.org/D17864
llvm-svn: 267815
Now the pass is just a tiny wrapper around the util. This lets us reuse
the logic elsewhere (done here for BuildLibCalls) instead of duplicating
it.
The next step is to have something like getOrInsertLibFunc that also
sets the attributes.
Differential Revision: http://reviews.llvm.org/D19470
llvm-svn: 267759
I tried to be as close as possible to the strongest check that
existed before; cleaning these up properly is left for future work.
Differential Revision: http://reviews.llvm.org/D19469
llvm-svn: 267758
Summary:
D19403 adds a new pragma for loop distribution. This change adds
support for the corresponding metadata that the pragma is translated to
by the FE.
As part of this I had to rethink the flag -enable-loop-distribute. My
goal was to be backward compatible with the existing behavior:
A1. pass is off by default from the optimization pipeline
unless -enable-loop-distribute is specified
A2. pass is on when invoked directly from opt (e.g. for unit-testing)
The new pragma/metadata overrides these defaults so the new behavior is:
B1. A1 + enable distribution for individual loop with the pragma/metadata
B2. A2 + disable distribution for individual loop with the pragma/metadata
The default value whether the pass is on or off comes from the initiator
of the pass. From the PassManagerBuilder the default is off, from opt
it's on.
I moved -enable-loop-distribute under the pass. If the flag is
specified it overrides the default from above.
Then the pragma/metadata can further modifies this per loop.
As a side-effect, we can now also use -enable-loop-distribute=0 from opt
to emulate the default from the optimization pipeline. So to be precise
this is the new behavior:
C1. pass is off by default from the optimization pipeline
unless -enable-loop-distribute or the pragma/metadata enables it
C2. pass is on when invoked directly from opt
unless -enable-loop-distribute=0 or the pragma/metadata disables it
Reviewers: hfinkel
Subscribers: joker.eph, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D19431
llvm-svn: 267672
Summary:
Instead of using maximum IR weight as the basic block weight, this patch uses the voting algorithm to find the most likely weight for the basic block. This can effectively avoid the cases when some IRs are annotated incorrectly due to code motion of the profiled binary.
This patch also updates propagate.ll unittest to include discriminator in the input file so that it is testing something meaningful.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19301
llvm-svn: 267519
Pass all of the state we need around as arguments, so that these
functions are easier to reuse. There is one part of this that is
unusual: we pass around a functor to look up a DomTree for a function.
This will be a necessary abstraction when we try to use this code in
both the legacy and the new pass manager.
llvm-svn: 267498
Add a typedef for the std::map<GlobalValue::GUID, GlobalValueSummary *>
map that is passed around to identify summaries for values defined in a
particular module. This shortens up declarations in a variety of places.
llvm-svn: 267471
The current logic assumes that any constant global will never be SRA'd. I presume this is because normally constant globals can be pushed into their uses and deleted. However, that sometimes can't happen (which is where you really want SRA, so the elements that can be eliminated, are!).
There seems to be no reason why we can't SRA constants too, so let's do it.
llvm-svn: 267393
Summary:
Remove the GlobalValueInfo and change the ModuleSummaryIndex to directly
reference summary objects. The info structure was there to support lazy
parsing of the combined index summary objects, which is no longer
needed and not supported.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19462
llvm-svn: 267344
Summary:
We are always importing the initializer for a GlobalVariable.
So if a GlobalVariable is in the export-list, we pull in any
refs as well.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19102
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267303
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267231
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.
The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267022
Summary:
The function importer already decided what symbols need to be pulled
in. Also these magically added ones will not be in the export list
for the source module, which can confuse the internalizer for
instance.
Reviewers: tejohnson, rafael
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19096
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266948
Summary:
This patch refined the instruction weight anootation algorithm:
1. Do not use dbg_value intrinsics for annotation.
2. Annotate cold calls if the call is inlined in profile, but not inlined before preparation. This indicates that the annotation preparation step found no sample for the inlined callsite, thus the call should be very cold.
Reviewers: dnovillo, davidxl
Subscribers: mgrang, llvm-commits
Differential Revision: http://reviews.llvm.org/D19286
llvm-svn: 266936
Summary:
This patch prevents importing from (and therefore exporting from) any
module with a "llvm.used" local value. Local values need to be promoted
and renamed when importing, and their presense on the llvm.used variable
indicates that there are opaque uses that won't see the rename. One such
example is a use in inline assembly.
See also the discussion at:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098047.html
As part of this, move collectUsedGlobalVariables out of Transforms/Utils
and into IR/Module so that it can be used more widely. There are several
other places in LLVM that used copies of this code that can be cleaned
up as a follow on NFC patch.
Reviewers: joker.eph
Subscribers: pcc, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18986
llvm-svn: 266877
Removed some unused headers, replaced some headers with forward class declarations.
Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
Patch by Eugene Kosov <claprix@yandex.ru>
Differential Revision: http://reviews.llvm.org/D19219
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
This is a requirement for the cache handling in D18494
Differential Revision: http://reviews.llvm.org/D18908
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266519
To be able to work accurately on the reference graph when taking
decision about internalizing, promoting, renaming, etc. We need
to have the alias information explicit.
Differential Revision: http://reviews.llvm.org/D18836
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266517
Allow explicit section for indirectly called functions in cfi-icall.
Jumptables for functions in the same type class must be contiguous, so they
always go to the default text section.
Fixes PR25079.
llvm-svn: 266486
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.
Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.
Motivation
----------
Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.
We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.
Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.
http://reviews.llvm.org/D19034
<rdar://problem/25256815>
llvm-svn: 266446
Summary:
This IR pass is helpful for GPUs, and other targets with divergent
branches. It's a nop on targets without divergent branches.
Reviewers: chandlerc
Subscribers: llvm-commits, jingyue, rnk, joker.eph, tra
Differential Revision: http://reviews.llvm.org/D18626
llvm-svn: 266399
Summary:
To be able to work accurately on the reference graph when taking decision
about internalizing, promoting, renaming, etc. We need to have the alias
information explicit.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18836
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266214
This will save a bunch of copies / initialization of intermediate
datastructure, and (hopefully) simplify the code.
This also abstract the symbol preservation mechanism outside of the
Internalization pass into the client code, which is not forced
to keep a map of strings for instance (ThinLTO will prefere hashes).
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266163
Summary:
For correct handling of alias to nameless
function, we need to be able to refer them through a GUID in the summary.
Here we name them using a hash of the non-private global names in the module.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18883
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266132
Summary:
The function import pass was computing all the imports for all the
modules in the index, and only using the imports for the current module.
Change this to instead compute only for the given module. This means
that the exports list can't be populated, but they weren't being used
anyway.
Longer term, the linker can collect all the imports and export lists
and serialize them out for consumption by the distributed backend
processes which use this pass.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18945
llvm-svn: 266125
r237193 fix handling of alloca size / align in MergeFunctions, but only tested one and didn't follow FunctionComparator::cmpOperations's usual comparison pattern. It also didn't update Instruction.cpp:haveSameSpecialState which I'll do separately.
llvm-svn: 266022
Summary:
This is the first step in also serializing the index out to LLVM
assembly.
The per-module summary written to bitcode is moved out of the bitcode
writer and to a new analysis pass (ModuleSummaryIndexWrapperPass).
The pass itself uses a new builder class to compute index, and the
builder class is used directly in places where we don't have a pass
manager (e.g. llvm-as).
Because we are computing summaries outside of the bitcode writer, we no
longer can use value ids created by the bitcode writer's
ValueEnumerator. This required changing the reference graph edge type
to use a new ValueInfo class holding a union between a GUID (combined
index) and Value* (permodule index). The Value* are converted to the
appropriate value ID during bitcode writing.
Also, this enables removal of the BitWriter library's dependence on the
Analysis library that was previously required for the summary computation.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18763
llvm-svn: 265941
Summary:
Fixes PR26774.
If you're aware of the issue, feel free to skip the "Motivation"
section and jump directly to "This patch".
Motivation:
I define "refinement" as discarding behaviors from a program that the
optimizer has license to discard. So transforming:
```
void f(unsigned x) {
unsigned t = 5 / x;
(void)t;
}
```
to
```
void f(unsigned x) { }
```
is refinement, since the behavior went from "if x == 0 then undefined
else nothing" to "nothing" (the optimizer has license to discard
undefined behavior).
Refinement is a fundamental aspect of many mid-level optimizations done
by LLVM. For instance, transforming `x == (x + 1)` to `false` also
involves refinement since the expression's value went from "if x is
`undef` then { `true` or `false` } else { `false` }" to "`false`" (by
definition, the optimizer has license to fold `undef` to any non-`undef`
value).
Unfortunately, refinement implies that the optimizer cannot assume
that the implementation of a function it can see has all of the
behavior an unoptimized or a differently optimized version of the same
function can have. This is a problem for functions with comdat
linkage, where a function can be replaced by an unoptimized or a
differently optimized version of the same source level function.
For instance, FunctionAttrs cannot assume a comdat function is
actually `readnone` even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store. As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal. The optimizer is allowed to refine
this function by first CSE'ing the two loads, and the folding the
comparision to always report that the two values are equal. Such a
refined variant will look like it is `readonly`. However, the
unoptimized version of the function can still write to memory (since
the two loads //can// result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.
Note: this is not just a problem with atomics or with linking
differently optimized object files. See PR26774 for more realistic
examples that involved neither.
This patch:
This change introduces a new set of linkage types, predicated as
`GlobalValue::mayBeDerefined` that returns true if the linkage type
allows a function to be replaced by a differently optimized variant at
link time. It then changes a set of IPO passes to bail out if they see
such a function.
Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18634
llvm-svn: 265762
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.
The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).
This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.
As a follow-up I'll add:
bool operator<(AtomicOrdering, AtomicOrdering) = delete;
bool operator>(AtomicOrdering, AtomicOrdering) = delete;
bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.
Reviewers: jyknight, reames
Subscribers: jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D18775
llvm-svn: 265602
Summary:
To aid in debugging, dump out the correlation between value names and
GUID for each source module when it is materialized. This will make it
easier to comprehend the earlier summary-based function importing debug
trace which only has access to and prints the GUIDs.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18556
llvm-svn: 265326
Summary: This should make the code more readable, especially all the map declarations.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18721
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265215
This is intended to be used for ThinLTO incremental build.
Differential Revision: http://reviews.llvm.org/D18213
This is a recommit of r265095 after fixing the Windows issues.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265111
This reverts commit r265096, r265095, and r265094.
Windows build is broken, and the validation does not pass.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265102
This is intended to be used for ThinLTO incremental build.
Differential Revision: http://reviews.llvm.org/D18213
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265095
This patch simply mirrors the attributes we give to @llvm.nvvm.reflect
to the __nvvm_reflect libdevice call. This shaves about 30% of the code
in libdevice away because of CSE opportunities. It's also helps us
figure out that libdevice implementations of transcendental functions
don't have side-effects.
llvm-svn: 265060
Summary:
This gives callers flexibility to pass lambdas with captures, which lets
callers avoid the C-style void*-ptr closure style. (Currently, callers
in clang store state in the PassManagerBuilderBase arg.)
No functional change, and the new API is backwards-compatible.
Reviewers: chandlerc
Subscribers: joker.eph, cfe-commits
Differential Revision: http://reviews.llvm.org/D18613
llvm-svn: 264918
Summary:
Add a statistic to count the number of imported functions. Also, add a
new -print-imports option to emit a trace of imported functions, that
works even for an NDEBUG build.
Note that emitOptimizationRemark does not work for the above printing as
it expects a Function object and DebugLoc, neither of which we have
with summary-based importing.
This is part 2 of D18487, the first part was committed separately as
r264536.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18487
llvm-svn: 264537
With r264503, aliases are now being added to the GlobalsToImport set
even when their aliasees can't be imported due to their linkage type.
While the importing worked correctly (the aliases imported as
declarations) due to the logic in doImportAsDefinition, there is no
point to adding them to the GlobalsToImport set.
Additionally, with D18487 it was resulting in incorrectly printing a
message indicating that the alias was imported.
To avoid this, delay adding aliases to the GlobalsToImport set until
after the linkage type of the aliasee is checked.
This patch is part of D18487.
llvm-svn: 264536
Summary:
Now that the summary contains the full reference/call graph, we can
replace the existing function importer that loads and inspect the IR
to iteratively walk the call graph by a traversal based purely on the
summary information. Decouple the actual importing decision from any
IR manipulation.
Reviewers: tejohnson
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18343
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 264503
Summary:
ThinLTO is relying on linkInModule to import selected function.
However a lot of "magic" was hidden in linkInModule and the IRMover,
who would rename and promote global variables on the fly.
This is moving to an approach where the steps are decoupled and the
client is reponsible to specify the list of globals to import.
As a consequence some test are changed because they were relying on
the previous behavior which was importing the definition of *every*
single global without control on the client side.
Now the burden is on the client to decide if a global has to be imported
or not.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18122
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263863
The latent bug that LLE exposed in the LoopVectorizer was resolved
(PR26952).
The pass can be disabled with -mllvm -enable-loop-load-elim=0
llvm-svn: 263595
Since the static getGlobalIdentifier and getGUID methods are now called
for global values other than functions, reflect that by moving these
methods to the GlobalValue class.
llvm-svn: 263524
(Resubmitting after fixing missing file issue)
With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.
A companion clang patch will immediately succeed this patch to reflect
this renaming.
llvm-svn: 263513
With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.
A companion clang patch will immediately succeed this patch to reflect
this renaming.
llvm-svn: 263490
Summary:
Previously we had a notion of convergent functions but not of convergent
calls. This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.
Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent. As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.
Originally landed as r261544, then reverted in r261544 for (incidental)
build breakage. Re-landed here with no changes.
Reviewers: chandlerc, jingyue
Subscribers: llvm-commits, tra, jhen, hfinkel
Differential Revision: http://reviews.llvm.org/D17739
llvm-svn: 263481
commit ae14bf6488e8441f0f6d74f00455555f6f3943ac
Author: Mehdi Amini <mehdi.amini@apple.com>
Date: Fri Mar 11 17:15:50 2016 +0000
Remove PreserveNames template parameter from IRBuilder
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18023
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258
91177308-0d34-0410-b5e6-96231b3b80d8
until we can figure out what to do about clang and Release build testing.
This reverts commit 263258.
llvm-svn: 263321
Summary:
This patch adds support for including a full reference graph including
call graph edges and other GV references in the summary.
The reference graph edges can be used to make importing decisions
without materializing any source modules, can be used in the plugin
to make file staging decisions for distributed build systems, and is
expected to have other uses.
The call graph edges are recorded in each function summary in the
bitcode via a list of <CalleeValueIds, StaticCount> tuples when no PGO
data exists, or <CalleeValueId, StaticCount, ProfileCount> pairs when
there is PGO, where the ValueId can be mapped to the function GUID via
the ValueSymbolTable. In the function index in memory, the call graph
edges reference the target via the CalleeGUID instead of the
CalleeValueId.
The reference graph edges are recorded in each summary record with a
list of referenced value IDs, which can be mapped to value GUID via the
ValueSymbolTable.
Addtionally, a new summary record type is added to record references
from global variable initializers. A number of bitcode records and data
structures have been renamed to reflect the newly expanded scope of the
summary beyond functions. More cleanup will follow.
Reviewers: joker.eph, davidxl
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17212
llvm-svn: 263275
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18023
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263258
This was originally a pointer to support pass managers which didn't use
AnalysisManagers. However, that doesn't realistically come up much and
the complexity of supporting it doesn't really make sense.
In fact, *many* parts of the pass manager were just assuming the pointer
was never null already. This at least makes it much more explicit and
clear.
llvm-svn: 263219
tests to run GVN in both modes.
This is mostly the boring refactoring just like SROA and other complex
transformation passes. There is some trickiness in that GVN's
ValueNumber class requires hand holding to get to compile cleanly. I'm
open to suggestions about a better pattern there, but I tried several
before settling on this. I was trying to balance my desire to sink as
much implementation detail into the source file as possible without
introducing overly many layers of abstraction.
Much like with SROA, the design of this system is made somewhat more
cumbersome by the need to support both pass managers without duplicating
the significant state and logic of the pass. The same compromise is
struck here.
I've also left a FIXME in a doxygen comment as the GVN pass seems to
have pretty woeful documentation within it. I'd like to submit this with
the FIXME and let those more deeply familiar backfill the information
here now that we have a nice place in an interface to put that kind of
documentaiton.
Differential Revision: http://reviews.llvm.org/D18019
llvm-svn: 263208
llvm::getDISubprogram walks the instructions in a function, looking for one in the scope of the current function, so that it can find the !dbg entry for the subprogram itself.
Now that !dbg is attached to functions, this should not be necessary. This patch changes all uses to just query the subprogram directly on the function.
Ideally this should be NFC, but in reality its possible that a function:
has no !dbg (in which case there's likely a bug somewhere in an opt pass), or
that none of the instructions had a scope referencing the function, so we used to not find the !dbg on the function but now we will
Reviewed by Duncan Exon Smith.
Differential Revision: http://reviews.llvm.org/D18074
llvm-svn: 263184
As part of r251146 InstCombine was extended to call computeKnownBits on
every value in the function to determine whether it happens to be
constant. This increases typical compiletime by 1-3% (5% in irgen+opt
time) in my measurements. On the other hand this case did not trigger
once in the whole llvm-testsuite.
This patch introduces the notion of ExpensiveCombines which are only
enabled for OptLevel > 2. I removed the check in InstructionSimplify as
that is called from various places where the OptLevel is not known but
given the rarity of the situation I think a check in InstCombine is
enough.
Differential Revision: http://reviews.llvm.org/D16835
llvm-svn: 263047
This reverts commit r262250.
It causes SPEC2006/gcc to generate wrong result (166.s) in AArch64 when
running with *ref* data set. The error happens with
"-Ofast -flto -fuse-ld=gold" or "-O3 -fno-strict-aliasing".
llvm-svn: 262839
This patch provides the following infrastructure for PGO enhancements in inliner:
Enable the use of block level profile information in inliner
Incremental update of block frequency information during inlining
Update the function entry counts of callees when they get inlined into callers.
Differential Revision: http://reviews.llvm.org/D16381
llvm-svn: 262636
Summary: With discriminator, LineLocation can uniquely identify a callsite without the need to specifying callee name. Remove Callee function name from the key, and put it in the value (FunctionSamples).
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17827
llvm-svn: 262634
parts of the AA interface out of the base class of every single AA
result object.
Because this logic reformulates the query in terms of some other aspect
of the API, it would easily cause O(n^2) query patterns in alias
analysis. These could in turn be magnified further based on the number
of call arguments, and then further based on the number of AA queries
made for a particular call. This ended up causing problems for Rust that
were actually noticable enough to get a bug (PR26564) and probably other
places as well.
When originally re-working the AA infrastructure, the desire was to
regularize the pattern of refinement without losing any generality.
While I think it was successful, that is clearly proving to be too
costly. And the cost is needless: we gain no actual improvement for this
generality of making a direct query to tbaa actually be able to
re-use some other alias analysis's refinement logic for one of the other
APIs, or some such. In short, this is entirely wasted work.
To the extent possible, delegation to other API surfaces should be done
at the aggregation layer so that we can avoid re-walking the
aggregation. In fact, this significantly simplifies the logic as we no
longer need to smuggle the aggregation layer into each alias analysis
(or the TargetLibraryInfo into each alias analysis just so we can form
argument memory locations!).
However, we also have some delegation logic inside of BasicAA and some
of it even makes sense. When the delegation logic is baking in specific
knowledge of aliasing properties of the LLVM IR, as opposed to simply
reformulating the query to utilize a different alias analysis interface
entry point, it makes a lot of sense to restrict that logic to
a different layer such as BasicAA. So one aspect of the delegation that
was in every AA base class is that when we don't have operand bundles,
we re-use function AA results as a fallback for callsite alias results.
This relies on the IR properties of calls and functions w.r.t. aliasing,
and so seems a better fit to BasicAA. I've lifted the logic up to that
point where it seems to be a natural fit. This still does a bit of
redundant work (we query function attributes twice, once via the
callsite and once via the function AA query) but it is *exactly* twice
here, no more.
The end result is that all of the delegation logic is hoisted out of the
base class and into either the aggregation layer when it is a pure
retargeting to a different API surface, or into BasicAA when it relies
on the IR's aliasing properties. This should fix the quadratic query
pattern reported in PR26564, although I don't have a stand-alone test
case to reproduce it.
It also seems general goodness. Now the numerous AAs that don't need
target library info don't carry it around and depend on it. I think
I can even rip out the general access to the aggregation layer and only
expose that in BasicAA as it is the only place where we re-query in that
manner.
However, this is a non-trivial change to the AA infrastructure so I want
to get some additional eyes on this before it lands. Sadly, it can't
wait long because we should really cherry pick this into 3.8 if we're
going to go this route.
Differential Revision: http://reviews.llvm.org/D17329
llvm-svn: 262490
Summary: SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17742
llvm-svn: 262419
Summary:
I re-benchmarked this and results are similar to original results in
D13259:
On ARM64:
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -59.27%
SingleSource/Benchmarks/Polybench/stencils/adi -19.78%
On x86:
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -27.14%
And of course the original ~20% gain on SPECint_2006/456.hmmer with Loop
Distribution.
In terms of compile time, there is ~5% increase on both
SingleSource/Benchmarks/Misc/oourafft and
SingleSource/Benchmarks/Linkpack/linkpack-pc. These are both very tiny
loop-intensive programs where SCEV computations dominates compile time.
The reason that time spent in SCEV increases has to do with the design
of the old pass manager. If a transform pass does not preserve an
analysis we *invalidate* the analysis even if there was *no*
modification made by the transform pass.
This means that currently we don't take advantage of LLE and LV sharing
the same analysis (LAA) and unfortunately we recompute LAA *and* SCEV
for LLE.
(There should be a way to work around this limitation in the case of
SCEV and LAA since both compute things on demand and internally cache
their result. Thus we could pretend that transform passes preserve
these analyses and manually invalidate them upon actual modification.
On the other hand the new pass manager is supposed to solve so I am not
sure if this is worthwhile.)
Reviewers: hfinkel, dberlin
Subscribers: dberlin, reames, mssimpso, aemerson, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16300
llvm-svn: 262250
This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated.
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D16180
llvm-svn: 261736
Summary:
Previously we had a notion of convergent functions but not of convergent
calls. This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.
Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent. As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.
Reviewers: chandlerc, jingyue
Subscribers: hfinkel, jhen, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17317
llvm-svn: 261544
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.
Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators. (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)
There should be NFC here.
llvm-svn: 261498
convert one test to use this.
This is a particularly significant milestone because it required
a working per-function AA framework which can be queried over each
function from within a CGSCC transform pass (and additionally a module
analysis to be accessible). This is essentially *the* point of the
entire pass manager rewrite. A CGSCC transform is able to query for
multiple different function's analysis results. It works. The whole
thing appears to actually work and accomplish the original goal. While
we were able to hack function attrs and basic-aa to "work" in the old
pass manager, this port doesn't use any of that, it directly leverages
the new fundamental functionality.
For this to work, the CGSCC framework also has to support SCC-based
behavior analysis, etc. The only part of the CGSCC pass infrastructure
not sorted out at this point are the updates in the face of inlining and
running function passes that mutate the call graph.
The changes are pretty boring and boiler-plate. Most of the work was
factored into more focused preperatory patches. But this is what wires
it all together.
llvm-svn: 261203
Summary:
On the contrary to Full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker: both phases are parallel (even if
it is not totally "free": projects like clang are reusing product
from the "compile phase" for multiple link, think about
libLLVMSupport reused for opt, llc, etc.).
This pipeline is based on the proposal in D13443 for full LTO. We
didn't move forward on this proposal because the LTO link was far too
long after that. We believe that we can afford it with ThinLTO.
The ThinLTO pipeline integrates in the regular O2/O3 flow:
- The compile phase perform the inliner with a somehow lighter
function simplification. (TODO: tune the inliner thresholds here)
This is intendend to simplify the IR and get rid of obvious things
like linkonce_odr that will be inlined.
- The link phase will run the pipeline from the start, extended with
some specific passes that leverage the augmented knowledge we have
during LTO. Especially after the inliner is done, a sequence of
globalDCE/globalOpt is performed, followed by another run of the
"function simplification" passes. It is not clear if this part
of the pipeline will stay as is, as the split model of ThinLTO
does not allow the same benefit as FullLTO without added tricks.
The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO and it will evolve, but this should provide a good starting
point.
Reviewers: tejohnson
Differential Revision: http://reviews.llvm.org/D17115
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261029
It is intended to contains the passes run over a function after the
inliner is done with a function and before it moves to its callers.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261028
than the SCC object, and have it scan the instruction stream directly
rather than relying on call records.
This makes the behavior of this routine consistent between libc routines
and LLVM intrinsics for libc routines. We can go and start teaching it
about those being norecurse, but we should behave the same for the
intrinsic and the libc routine rather than differently. I chatted with
James Molloy and the inconsistency doesn't seem intentional and likely
is due to intrinsic calls not being modelled in the call graph analyses.
This also fixes a bug where we would deduce norecurse on optnone
functions, when generally we try to handle optnone functions as-if they
were replaceable and thus unanalyzable.
llvm-svn: 260813
node set rather than walking the SCC directly.
This directly exposes the functions and has already had null entries
filtered out. We also don't need need to handle optnone as it has
already been handled in the caller -- we never try to remove convergent
when there are optnone functions in the SCC.
With this change, the code for removing convergent should work with the
new pass manager and a different SCC analysis.
llvm-svn: 260668
with the test for a non-convergent intrinsic call.
While it is possible to use the call records to search for function
calls, we're going to do an instruction scan anyways to find the
intrinsics, we can handle both cases while scanning instructions. This
will also make the logic more amenable to the new pass manager which
doesn't use the same call graph structure.
My next patch will remove use of CallGraphNode entirely and allow this
code to work with both the old and new pass manager. Fortunately, it
should also get strictly simpler without changing functionality.
llvm-svn: 260666
Summary:
On the contrary to Full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker: both phases are parallel.
This pipeline is based on the proposal in D13443 for full LTO. We ]
didn't move forward on this proposal because the link was far too long
after that.
This patch refactor the "function simplification" passes that are part
of the inliner loop in a helper function (this part is NFC and can be
commited separately to simplify the diff). The ThinLTO pipeline
integrates in the regular O2/O3 flow:
- The compile phase perform the inliner with a somehow lighter
function simplification. (TODO: tune the inliner thresholds here)
This is intendend to simplify the IR and get rid of obvious things
like linkonce_odr that will be inlined.
- The link phase will run the pipeline from the start, extended with
some specific passes that leverage the augmented knowledge we have
during LTO. Especially after the inliner is done, a sequence of
globalDCE/globalOpt is performed, followed by another run of the
"function simplification" passes.
The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO but this should provide a good starting point.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits, dexonsmith
Differential Revision: http://reviews.llvm.org/D17115
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260604
It is intended to contains the passes run over a function after the
inliner is done with a function and before it moves to its callers.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260603
Make sure we split ":" from the end of the global function id (which
is <path>:<function> for local functions) instead of the beginning to
avoid splitting at the wrong place for Windows file paths that contain
a ":".
llvm-svn: 260469
The current function importer will walk the callgraph, importing
transitively any callee that is below the threshold. This can
lead to import very deep which is costly in compile time and not
necessarily beneficial as most of the inline would happen in
imported function and not necessarilly in user code.
The actual factor has been carefully chosen by flipping a coin ;)
Some tuning need to be done (just at the existing limiting threshold).
Reviewers: tejohnson
Differential Revision: http://reviews.llvm.org/D17082
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260466
There is not reason to pass an array of "char *" to rebuild a set if
the client already has one.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260462
This restores commit r260408, along with a fix for a bot failure.
The bot failure was caused by dereferencing a unique_ptr in the same
call instruction parameter list where it was passed via std::move.
Apparently due to luck this was not exposed when I built the compiler
with clang, only with gcc.
llvm-svn: 260442
Summary:
This patch uses the lower 64-bits of the MD5 hash of a function name as
a GUID in the function index, instead of storing function names. Any
local functions are first given a global name by prepending the original
source file name. This is the same naming scheme and GUID used by PGO in
the indexed profile format.
This change has a couple of benefits. The primary benefit is size
reduction in the combined index file, for example 483.xalancbmk's
combined index file was reduced by around 70%. It should also result in
memory savings for the index file in memory, as the in-memory map is
also indexed by the hash instead of the string.
Second, this enables integration with indirect call promotion, since the
indirect call profile targets are recorded using the same global naming
convention and hash. This will enable the function importer to easily
locate function summaries for indirect call profile targets to enable
their import and subsequent promotion.
The original source file name is recorded in the bitcode in a new
module-level record for use in the ThinLTO backend pipeline.
Reviewers: davidxl, joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D17028
llvm-svn: 260408
Summary:
As discussed on IRC, move the ThinLTOGlobalProcessing code out of
the linker, and into TransformUtils. The name of the class is changed
to FunctionImportGlobalProcessing.
Reviewers: joker.eph, rafael
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17081
llvm-svn: 260395
Summary:
Remove the convergent attribute on any functions which provably do not
contain or invoke any convergent functions.
After this change, we'll be able to modify clang to conservatively add
'convergent' to all functions when compiling CUDA.
Reviewers: jingyue, joker.eph
Subscribers: llvm-commits, tra, jhen, hfinkel, resistor, chandlerc, arsenm
Differential Revision: http://reviews.llvm.org/D17013
llvm-svn: 260319
This pass implements whole program optimization of virtual calls in cases
where we know (via bitset information) that the list of callees is fixed. This
includes the following:
- Single implementation devirtualization: if a virtual call has a single
possible callee, replace all calls with a direct call to that callee.
- Virtual constant propagation: if the virtual function's return type is an
integer <=64 bits and all possible callees are readnone, for each class and
each list of constant arguments: evaluate the function, store the return
value alongside the virtual table, and rewrite each virtual call as a load
from the virtual table.
- Uniform return value optimization: if the conditions for virtual constant
propagation hold and each function returns the same constant value, replace
each virtual call with that constant.
- Unique return value optimization for i1 return values: if the conditions
for virtual constant propagation hold and a single vtable's function
returns 0, or a single vtable's function returns 1, replace each virtual
call with a comparison of the vptr against that vtable's address.
Differential Revision: http://reviews.llvm.org/D16795
llvm-svn: 260312
FunctionAttrs does an "optimistic" analysis of SCCs as a unit, which
means normally it is able to disregard calls from an SCC into itself.
However, calls and invokes with operand bundles are allowed to have
memory effects not fully described by the memory effects on the call
target, so we can't be optimistic around operand-bundled calls from an
SCC into itself.
llvm-svn: 260244
Summary:
Passes that call `getAnalysisIfAvailable<T>` also need to call
`addUsedIfAvailable<T>` in `getAnalysisUsage` to indicate to the
legacy pass manager that it uses `T`. This contract was being
violated by passes that used `createLegacyPMAAResults`. This change
fixes this by exposing a helper in AliasAnalysis.h,
`addUsedAAAnalyses`, that is complementary to createLegacyPMAAResults
and does the right thing when called from `getAnalysisUsage`.
Reviewers: chandlerc
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17010
llvm-svn: 260183
Summary:
When alias analysis is uncertain about the aliasing between any two accesses,
it will return MayAlias. This uncertainty from alias analysis restricts LICM
from proceeding further. In cases where alias analysis is uncertain we might
use loop versioning as an alternative.
Loop Versioning will create a version of the loop with aggressive aliasing
assumptions in addition to the original with conservative (default) aliasing
assumptions. The version of the loop making aggressive aliasing assumptions
will have all the memory accesses marked as no-alias. These two versions of
loop will be preceded by a memory runtime check. This runtime check consists
of bound checks for all unique memory accessed in loop, and it ensures the
lack of memory aliasing. The result of the runtime check determines which of
the loop versions is executed: If the runtime check detects any memory
aliasing, then the original loop is executed. Otherwise, the version with
aggressive aliasing assumptions is used.
The pass is off by default and can be enabled with command line option
-enable-loop-versioning-licm.
Reviewers: hfinkel, anemet, chatur01, reames
Subscribers: MatzeB, grosser, joker.eph, sanjoy, javed.absar, sbaranga,
llvm-commits
Differential Revision: http://reviews.llvm.org/D9151
llvm-svn: 259986
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html
"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi
Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark
Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16471
llvm-svn: 258861
* __cfi_check gets a 3rd argument: ubsan handler data
* Instead of trapping on failure, call __cfi_check_fail which must be
present in the module (generated in the frontend).
llvm-svn: 258746
Summary:
Make sure that any new and optimized objects created during GlobalOPT copy all the attributes from the base object.
A good example of improper behavior in the current implementation is section information associated with the GlobalObject. If a section was set for it, and GlobalOpt is creating/modifying a new object based on this one (often copying the original name), without this change new object will be placed in a default section, resulting in inappropriate properties of the new variable.
The argument here is that if customer specified a section for a variable, any changes to it that compiler does should not cause it to change that section allocation.
Moreover, any other properties worth representation in copyAttributesFrom() should also be propagated.
Reviewers: jmolloy, joker-eph, joker.eph
Subscribers: slarin, joker.eph, rafael, tobiasvk, llvm-commits
Differential Revision: http://reviews.llvm.org/D16074
llvm-svn: 258556
Summary:
Since we are currently not doing incremental importing there is
no need to link metadata as a postpass. The module linker will
only link in the imported subroutines due to the functionality
added by r256003.
(Note that the metadata postpass linking functionalitiy is still
used by llvm-link, and may be needed here in the future if a more
incremental strategy is adopted.)
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16424
llvm-svn: 258458
This patch includes the passmanagerbuilder change that enables IR level PGO instrumentation. It adds two passmanagerbuilder options: -profile-generate=<profile_filename> and -profile-use=<profile_filename>. The new options are primarily for debug purpose.
Reviewers: davidxl, silvas
Differential Revision: http://reviews.llvm.org/D15828
llvm-svn: 258420
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.
GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Reviewers: mjacob, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16275
llvm-svn: 258145
Loop trip counts can often be resolved during LTO. We should obviously be unrolling small loops once those trip counts have been resolved, but we weren't.
llvm-svn: 257767
The findExternalCalls routine ignores calls to functions already
defined in the dest module. This was not handling the case where
the definition in the current module is actually an alias to a
function call.
llvm-svn: 257493
It's strange that LoopInfo mostly owns the Loop objects, but that it
defers deleting them to the loop pass manager. Instead, change the
oddly named "updateUnloop" to "markAsRemoved" and have it queue the
Loop object for deletion. We can't delete the Loop immediately when we
remove it, since we need its pointer identity still, so we'll mark the
object as "invalid" so that clients can see what's going on.
llvm-svn: 257191
Due to the new in-place ThinLTO symbol handling support added in
r257174, we now invoke renameModuleForThinLTO on the current
module from within the FunctionImport pass.
Additionally, renameModuleForThinLTO no longer needs to return the
Module as it is performing the renaming in place on the one provided.
This commit will be immediately preceeded by a companion clang patch to
remove its invocation of renameModuleForThinLTO.
llvm-svn: 257181
The function importer was still materializing metadata when modules were
loaded for function importing. We only want to materialize it when we
are going to invoke the metadata linking postpass. Materializing it
before function importing is not only unnecessary, but also causes
metadata referenced by imported functions to be mapped in early, and
then not connected to the rest of the module level metadata when it is
ultimately linked in.
Augmented the test case to specifically check for the metadata being
properly connected, which it wasn't before this fix.
llvm-svn: 257171
a top-down manner into a true top-down or RPO pass over the call graph.
There are specific patterns of function attributes, notably the
norecurse attribute, which are most effectively propagated top-down
because all they us caller information.
Walk in RPO over the call graph SCCs takes the form of a module pass run
immediately after the CGSCC pass managers postorder walk of the SCCs,
trying again to deduce norerucrse for each singular SCC in the call
graph.
This removes a very legacy pass manager specific trick of using a lazy
revisit list traversed during finalization of the CGSCC pass. There is
no analogous finalization step in the new pass manager, and a lazy
revisit list is just trying to produce an RPO iteration of the call
graph. We can do that more directly if more expensively. It seems
unlikely that this will be the expensive part of any compilation though
as we never examine the function bodies here. Even in an LTO run over
a very large module, this should be a reasonable fast set of operations
over a reasonably small working set -- the function call graph itself.
In the future, if this really is a compile time performance issue, we
can look at building support for both post order and RPO traversals
directly into a pass manager that builds and maintains the PO list of
SCCs.
Differential Revision: http://reviews.llvm.org/D15785
llvm-svn: 257163
Summary: The example in desc should match with actual option name
Reviewers: jmolloy
Differential Revision: http://reviews.llvm.org/D15800
llvm-svn: 256951
Most of the properties of memset_pattern16 can be now covered by the generic attributes and inferred by InferFunctionAttrs. The only exceptions are:
- We don't yet have a writeonly attribute for the first argument.
- We don't have an attribute for modeling the access size facts encoded in MemoryLocation.cpp.
Differential Revision: http://reviews.llvm.org/D15879
llvm-svn: 256911
This patch removes the isOperatorNewLike predicate since it was only being used to establish a non-null return value and we have attributes specifically for that purpose with generic handling. To keep approximate the same behaviour for existing frontends, I added the various operator new like (i.e. instances of operator new) to InferFunctionAttrs. It's not really clear to me why this isn't handled in Clang, but I didn't want to break existing code and any subtle assumptions it might have.
Once this patch is in, I'm going to start separating the isAllocLike family of predicates. These appear to be being used for a mixture of things which should be more clearly separated and documented. Today, they're being used to indicate (at least) aliasing facts, CSE-ability, and default values from an allocation site.
Differential Revision: http://reviews.llvm.org/D15820
llvm-svn: 256787
InlineCostAnalysis is an analysis pass without any need for it to be one.
Once it stops being an analysis pass, it doesn't maintain any useful state
and the member functions inside can be made free functions. NFC.
Differential Revision: http://reviews.llvm.org/D15701
llvm-svn: 256521
a standalone pass.
There is no call graph or even interesting analysis for this part of
function attributes -- it is literally inferring attributes based on the
target library identification. As such, we can do it using a much
simpler module pass that just walks the declarations. This can also
happen much earlier in the pass pipeline which has benefits for any
number of other passes.
In the process, I've cleaned up one particular aspect of the logic which
was necessary in order to separate the two passes cleanly. It now counts
inferred attributes independently rather than just counting all the
inferred attributes as one, and the counts are more clearly explained.
The two test cases we had for this code path are both ... woefully
inadequate and copies of each other. I've kept the superset test and
updated it. We need more testing here, but I had to pick somewhere to
stop fixing everything broken I saw here.
Differential Revision: http://reviews.llvm.org/D15676
llvm-svn: 256466
is (by default) run much earlier than FuncitonAttrs proper.
This allows forcing optnone or other widely impactful attributes. It is
also a bit simpler as the force attribute behavior needs no specific
iteration order.
I've added the pass into the default module pass pipeline and LTO pass
pipeline which mirrors where function attrs itself was being run.
Differential Revision: http://reviews.llvm.org/D15668
llvm-svn: 256465
This reapplies r256277 with two changes:
- In emitFnAttrCompatCheck, change FuncName's type to std::string to fix
a use-after-free bug.
- Remove an unnecessary install-local target in lib/IR/Makefile.
Original commit message for r252949:
Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
rdar://problem/19836465
llvm-svn: 256304
This reapplies r252990 and r252949. I've added member function getKind
to the Attr classes which returns the enum or string of the attribute.
Original commit message for r252949:
Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
rdar://problem/19836465
llvm-svn: 256277
The code for deleting dead global variables and functions was
duplicated.
This is in preparation for also deleting dead global aliases.
llvm-svn: 256274
This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold
callees and uses values of inlinehint-threshold and inlinecold-threshold
respectively as the thresholds for such callees.
Differential Revision: http://reviews.llvm.org/D15245
llvm-svn: 256222
Make personality functions, prefix data, and prologue data hungoff
operands of Function.
This is based on the email thread "[RFC] Clean up the way we store
optional Function data" on llvm-dev.
Thanks to sanjoyd, majnemer, rnk, loladiro, and dexonsmith for feedback!
Includes a fix to scrub value subclass data in dropAllReferences. Does not
use binary literals.
Differential Revision: http://reviews.llvm.org/D13829
llvm-svn: 256095
Make personality functions, prefix data, and prologue data hungoff
operands of Function.
This is based on the email thread "[RFC] Clean up the way we store
optional Function data" on llvm-dev.
Thanks to sanjoyd, majnemer, rnk, loladiro, and dexonsmith for feedback!
Includes a fix to scrub value subclass data in dropAllReferences.
Differential Revision: http://reviews.llvm.org/D13829
llvm-svn: 256093
Make personality functions, prefix data, and prologue data hungoff
operands of Function.
This is based on the email thread "[RFC] Clean up the way we store
optional Function data" on llvm-dev.
Thanks to sanjoyd, majnemer, rnk, loladiro, and dexonsmith for feedback!
Differential Revision: http://reviews.llvm.org/D13829
llvm-svn: 256090
Summary:
Second patch split out from http://reviews.llvm.org/D14752.
Maps metadata as a post-pass from each module when importing complete,
suturing up final metadata to the temporary metadata left on the
imported instructions.
This entails saving the mapping from bitcode value id to temporary
metadata in the importing pass, and from bitcode value id to final
metadata during the metadata linking postpass.
Depends on D14825.
Reviewers: dexonsmith, joker.eph
Subscribers: davidxl, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D14838
llvm-svn: 255909
As of r255720, the loop pass manager will DTRT when passes update the
loop info for removed loops, so they no longer need to reach into
LPPassManager APIs to do this kind of transformation. This change very
nearly removes the need for the LPPassManager to even be passed into
loop passes - the only remaining pass that uses the LPM argument is
LoopUnswitch.
llvm-svn: 255797
An LTO pass that generates a __cfi_check() function that validates a
call based on a hash of the call-site-known type and the target
pointer.
llvm-svn: 255693
This patch does two things:
1. mem2reg is now run immediately after globalopt. Now that globalopt
can localize variables more aggressively, it makes sense to lower
them to SSA form earlier rather than later so they can benefit from
the full set of optimization passes.
2. More scalar optimizations are run after the loop optimizations in
LTO mode. The loop optimizations (especially indvars) can clean up
scalar code sufficiently to make it worthwhile running more scalar
passes. I've particularly added SCCP here as it isn't run anywhere
else in the LTO pass pipeline.
Mem2reg is super cheap and shouldn't affect compilation time at all. The
rest of the added passes are in the LTO pipeline only so doesn't affect
the vast majority of compilations, just the link step.
llvm-svn: 255634
This patch converts code that has access to a LLVMContext to not take a
diagnostic handler.
This has a few advantages
* It is easier to use a consistent diagnostic handler in a single program.
* Less clutter since we are not passing a handler around.
It does make it a bit awkward to implement some C APIs that return a
diagnostic string. I will propose new versions of these APIs and
deprecate the current ones.
llvm-svn: 255571
DenseMap is the wrong data structure to use for sample records and call
sites. The keys are too large, causing massive core memory growth when
reading profiles.
Before this patch, a 21Mb input profile was causing the compiler to grow
to 3Gb in memory. By switching to std::map, the compiler now grows to
300Mb in memory.
There still are some opportunities for memory footprint reduction. I'll
be looking at those next.
llvm-svn: 255389
Added some missing spaces between the module identifier and the start of
the debug message. Also added a ":" after the module identifier to make
this look a little nicer.
llvm-svn: 255259
- This simplifies the CallSite class, arg_begin / arg_end are now
simple wrapper getters.
- In several places, we were creating CallSite instances solely to call
arg_begin and arg_end. With this change, that's no longer required.
llvm-svn: 255226
loading the source Module, linking the function in the destination
module, and destroying the source Module before repeating with the
next function to import (potentially from the same Module).
Ideally we would keep the source Module alive and import the next
Function needed from this Module. Unfortunately this is not possible
because the linker does not leave it in a usable state.
However we can do better by first computing the list of all candidates
per Module, and only then load the source Module and import all the
function we need for it.
The trick to process callees is to materialize function in the source
module when building the list of function to import, and inspect them
in their source module, collecting the list of callees for each
callee.
When we move the the actual import, we will import from each source
module exactly once. Each source module is loaded exactly once.
The only drawback it that it requires to have all the lazy-loaded
source Module in memory at the same time.
Currently this patch already improves considerably the link time,
a multithreaded link of llvm-dis on my laptop was:
real 1m12.175s user 6m32.430s sys 0m10.529s
and is now:
real 0m40.697s user 2m10.237s sys 0m4.375s
Note: this is the full link time (linker+Import+Optimizer+CodeGen)
Differential Revision: http://reviews.llvm.org/D15178
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255100
For an invoke with operand bundles, the [op_begin(), op_end()-3] range
can contain things other than invoke arguments. This change teaches
PruneEH to use arg_begin() and arg_end() explicitly.
llvm-svn: 255073
Summary:
Add a field on the PassManagerBuilder that clang or gold can use to pass
down a pointer to the function index in memory to use for importing when
the ThinLTO backend is triggered. Add support to supply this to the
function import pass.
Reviewers: joker.eph, dexonsmith
Subscribers: davidxl, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D15024
llvm-svn: 254926