This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.
Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin under the form of reviews.
Differential Revision: http://reviews.llvm.org/D19338
llvm-svn: 275401
Summary:
It's useful to have some visibility about which call sites are devirtualized,
especially for debug purposes. Another use case is a regression test on the
application side (like, Chromium).
Reviewers: pcc
Differential Revision: http://reviews.llvm.org/D22252
llvm-svn: 275145
There's a little bit of churn in this patch because the initialization
mechanism is now shared between the old and the new PM. Other than
that, it's just a pretty mechanical translation.
llvm-svn: 275082
In preparation for porting this pass to the new PM (which has no
doInitialization()).
Differential Revision: http://reviews.llvm.org/D22223
llvm-svn: 275074
Summary:
For sample-based PGO, using BFI to calculate callsite count is sometime not accurate. This is because with sampling based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to lack of samples in the cold branch.
E.g.
if (A1 && A2 && A3 && ..... && A10) {
for (i=0; i < 100000000; i++) {
callsite();
}
}
Assume that A1 to A100 are all 100% taken, and callsite has 1000 samples and thus is considerred hot. Because the loop's trip count is huge, it's normal that all branches outside the loop has no sample at all. As a result, we can only use static branch probability to derive the the frequency of the loop header. Assuming that static heuristic thinks each branch is 50% taken, then the count calculated from BFI will be 1/(2^10) of the actual value.
In order to get more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness.
Note that this mechanism can also be shared by instrumentation based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI as call count is embedded in the IR.
Reviewers: davidxl, eraman, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22118
llvm-svn: 275073
Summary: Handle the case when there is only one incoming/outgoing edge for a visited basic block: use the block weight to adjust edge weight even when the edge has been visited before. This can help reduce inaccuracies introduced by incorrect basic block profile, as shown in the updated unittest.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22180
llvm-svn: 275072
A function can have one argument with the 'returned' attribute, indicating that
the associated argument is always the return value of the function. Add
FuncAttrs inference logic.
Differential Revision: http://reviews.llvm.org/D22202
llvm-svn: 275027
Summary:
This way the metadata will be only generated when asserts enabled,
or when -enable-import-metadata specified
FIXED missing colon on requires.
Reviewers: tejohnson, eraman, mehdi_amini
Subscribers: mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D22167
llvm-svn: 274947
Summary:
This way the metadata will be only generated when asserts enabled,
or when -enable-import-metadata specified
Reviewers: tejohnson, eraman, mehdi_amini
Subscribers: mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D22167
llvm-svn: 274938
Summary: As we will move to use uniformed hotness check in inliner, we do not need inline hints in SampleProfile pass any more.
Reviewers: dnovillo, davidxl
Subscribers: eraman, llvm-commits
Differential Revision: http://reviews.llvm.org/D19287
llvm-svn: 274918
Added metadata to be able to make statistics on how many functions
that have been imported have been removed. Also module name might
be helpfull when debugging.
Reviewers: tejohnson, eraman
Subscribers: mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D21943
llvm-svn: 274668
StratifiedSets (as implemented) is very fast, but its accuracy is also
limited. If we take a more aggressive andersens-like approach, we can be
way more accurate, but we'll also end up being slower.
So, we've decided to split CFLAA into CFLSteensAA and CFLAndersAA.
Long-term, we want to end up in a place where CFLSteens is queried
first; if it can provide an answer, great (since queries are basically
map lookups). Otherwise, we'll fall back to CFLAnders, BasicAA, etc.
This patch splits everything out so we can try to do something like
that when we get a reasonable CFLAnders implementation.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21910
llvm-svn: 274589
This actually uncovered a surprisingly large chain of ultimately unused
TLI args.
From what I can gather, this argument is a remnant of when
isKnownNonNull would look at the TLI directly.
The current approach seems to be that InferFunctionAttrs runs early in
the pipeline and uses TLI to annotate the TLI-dependent non-null
information as return attributes.
This also removes the dependence of functionattrs on TLI altogether.
llvm-svn: 274455
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.
Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin under the form of reviews.
Differential Revision: http://reviews.llvm.org/D19338
llvm-svn: 274305
SimplifyCFG had logic to insert calls to llvm.trap for two very
particular IR patterns: stores and invokes of undef/null.
While InstCombine canonicalizes certain undefined behavior IR patterns
to stores of undef, phase ordering means that this cannot be relied upon
in general.
There are much better tools than llvm.trap: UBSan and ASan.
N.B. I could be argued into reverting this change if a clear argument as
to why it is important that we synthesize llvm.trap for stores, I'd be
hard pressed to see why it'd be useful for invokes...
llvm-svn: 273778
This intrinsic safely loads a function pointer from a virtual table pointer
using type metadata. This intrinsic is used to implement control flow integrity
in conjunction with virtual call optimization. The virtual call optimization
pass will optimize away llvm.type.checked.load intrinsics associated with
devirtualized calls, thereby removing the type check in cases where it is
not needed to enforce the control flow integrity constraint.
This patch also introduces the capability to copy type metadata between
global variables, and teaches the virtual call optimization pass to do so.
Differential Revision: http://reviews.llvm.org/D21121
llvm-svn: 273756
Summary: Set ProfileSummary in SampleProfilerLoader.
Reviewers: davidxl, eraman
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21702
llvm-svn: 273745
The bitset metadata currently used in LLVM has a few problems:
1. It has the wrong name. The name "bitset" refers to an implementation
detail of one use of the metadata (i.e. its original use case, CFI).
This makes it harder to understand, as the name makes no sense in the
context of virtual call optimization.
2. It is represented using a global named metadata node, rather than
being directly associated with a global. This makes it harder to
manipulate the metadata when rebuilding global variables, summarise it
as part of ThinLTO and drop unused metadata when associated globals are
dropped. For this reason, CFI does not currently work correctly when
both CFI and vcall opt are enabled, as vcall opt needs to rebuild vtable
globals, and fails to associate metadata with the rebuilt globals. As I
understand it, the same problem could also affect ASan, which rebuilds
globals with a red zone.
This patch solves both of those problems in the following way:
1. Rename the metadata to "type metadata". This new name reflects how
the metadata is currently being used (i.e. to represent type information
for CFI and vtable opt). The new name is reflected in the name for the
associated intrinsic (llvm.type.test) and pass (LowerTypeTests).
2. Attach metadata directly to the globals that it pertains to, rather
than using the "llvm.bitsets" global metadata node as we are doing now.
This is done using the newly introduced capability to attach
metadata to global variables (r271348 and r271358).
See also: http://lists.llvm.org/pipermail/llvm-dev/2016-June/100462.html
Differential Revision: http://reviews.llvm.org/D21053
llvm-svn: 273729
This is a convenience iterator that allows clients to enumerate the
GlobalObjects within a Module.
Also start using it in a few places where it is obviously the right thing
to use.
Differential Revision: http://reviews.llvm.org/D21580
llvm-svn: 273470
Summary: Inliner needs ACT when calling InlineFunction. Instead of nullptr, we need to pass it in from SampleProfileLoader
Reviewers: davidxl
Subscribers: eraman, vsk, danielcdh, llvm-commits
Differential Revision: http://reviews.llvm.org/D21205
llvm-svn: 273199
pass manager passes' `run` methods.
This removes a bunch of SFINAE goop from the pass manager and just
requires pass authors to accept `AnalysisManager<IRUnitT> &` as a dead
argument. This is a small price to pay for the simplicity of the system
as a whole, despite the noise that changing it causes at this stage.
This will also helpfull allow us to make the signature of the run
methods much more flexible for different kinds af passes to support
things like intelligently updating the pass's progression over IR units.
While this touches many, many, files, the changes are really boring.
Mostly made with the help of my trusty perl one liners.
Thanks to Sean and Hal for bouncing ideas for this with me in IRC.
llvm-svn: 272978
Summary: This reverts the changes to Globals.cpp and IRMover.cpp in
"[IR] Copy comdats in GlobalObject::copyAttributesFrom" (D20631,
rL270743).
The DeadArgElim test is left unchanged, and we change DAE to explicitly
copy comdats.
The reverted change breaks copyAttributesFrom when the destination lives
in a different module from the source. The decision in D21255 was to
revert this patch and handle comdat copying separately from
copyAttributesFrom.
Reviewers: majnemer, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21403
llvm-svn: 272855
Nearly all the changes to this pass have been done while maintaining and
updating other parts of LLVM. LLVM has had another pass, SROA, which
has superseded ScalarReplAggregates for quite some time.
Differential Revision: http://reviews.llvm.org/D21316
llvm-svn: 272737
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.
This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
the normal rule that the global must have a unique address can be broken without
being observable by the program by performing comparisons against the global's
address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
its own copy of the global if it requires one, and the copy in each linkage unit
must be the same)
- It is a constant or a function (which means that the program cannot observe that
the unique-address rule has been broken by writing to the global)
Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.
See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.
Part of the fix for PR27553.
Differential Revision: http://reviews.llvm.org/D20348
llvm-svn: 272709
Below are my super rough notes when porting. They can probably serve as
a basic guide for porting other passes to the new PM. As I port more
passes I'll expand and generalize this and make a proper
docs/HowToPortToNewPassManager.rst document. There is also missing
documentation for general concepts and API's in the new PM which will
require some documentation.
Once there is proper documentation in place we can put up a list of
passes that have to be ported and game-ify/crowdsource the rest of the
porting (at least of the middle end; the backend is still unclear).
I will however be taking personal responsibility for ensuring that the
LLD/ELF LTO pipeline is ported in a timely fashion. The remaining passes
to be ported are (do something like
`git grep "<the string in the bullet point below>"` to find the pass):
General Scalar:
[ ] Simplify the CFG
[ ] Jump Threading
[ ] MemCpy Optimization
[ ] Promote Memory to Register
[ ] MergedLoadStoreMotion
[ ] Lazy Value Information Analysis
General IPO:
[ ] Dead Argument Elimination
[ ] Deduce function attributes in RPO
Loop stuff / vectorization stuff:
[ ] Alignment from assumptions
[ ] Canonicalize natural loops
[ ] Delete dead loops
[ ] Loop Access Analysis
[ ] Loop Invariant Code Motion
[ ] Loop Vectorization
[ ] SLP Vectorizer
[ ] Unroll loops
Devirtualization / CFI:
[ ] Cross-DSO CFI
[ ] Whole program devirtualization
[ ] Lower bitset metadata
CGSCC passes:
[ ] Function Integration/Inlining
[ ] Remove unused exception handling info
[ ] Promote 'by reference' arguments to scalars
Please let me know if you are interested in working on any of the passes
in the above list (e.g. reply to the post-commit thread for this patch).
I'll probably be tackling "General Scalar" and "General IPO" first FWIW.
Steps as I port "Deduce function attributes in RPO"
---------------------------------------------------
(note: if you are doing any work based on these notes, please leave a
note in the post-commit review thread for this commit with any
improvements / suggestions / incompleteness you ran into!)
Note: "Deduce function attributes in RPO" is a module pass.
1. Do preparatory refactoring.
Do preparatory factoring. In this case all I had to do was to pull out a static helper (r272503).
(TODO: give more advice here e.g. if pass holds state or something)
2. Rename the old pass class.
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename class ReversePostOrderFunctionAttrs -> ReversePostOrderFunctionAttrsLegacyPass
in preparation for adding a class ReversePostOrderFunctionAttrs as the pass in the new PM.
(edit: actually wait what? The new class name will be
ReversePostOrderFunctionAttrsPass, so it doesn't conflict. So this step is
sort of useless churn).
llvm/include/llvm/InitializePasses.h
llvm/lib/LTO/LTOCodeGenerator.cpp
llvm/lib/Transforms/IPO/IPO.cpp
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename initializeReversePostOrderFunctionAttrsPass -> initializeReversePostOrderFunctionAttrsLegacyPassPass
(note that the "PassPass" thing falls out of `s/ReversePostOrderFunctionAttrs/ReversePostOrderFunctionAttrsLegacyPass/`)
Note that the INITIALIZE_PASS macro is what creates this identifier name, so renaming the class requires this renaming too.
Note that createReversePostOrderFunctionAttrsPass does not need to be
renamed since its name is not generated from the class name.
3. Add the new PM pass class.
In the new PM all passes need to have their
declaration in a header somewhere, so you will often need to add a header.
In this case
llvm/include/llvm/Transforms/IPO/FunctionAttrs.h is already there because
PostOrderFunctionAttrsPass was already ported.
The file-level comment from the .cpp file can be used as the file-level
comment for the new header. You may want to tweak the wording slightly
from "this file implements" to "this file provides" or similar.
Add declaration for the new PM pass in this header:
class ReversePostOrderFunctionAttrsPass
: public PassInfoMixin<ReversePostOrderFunctionAttrsPass> {
public:
PreservedAnalyses run(Module &M, AnalysisManager<Module> &AM);
};
Its name should end with `Pass` for consistency (note that this doesn't
collide with the names of most old PM passes). E.g. call it
`<name of the old PM pass>Pass`.
Also, move the doxygen comment from the old PM pass to the declaration of
this class in the header.
Also, include the declaration for the new PM class
`llvm/Transforms/IPO/FunctionAttrs.h` at the top of the file (in this case,
it was already done when the other pass in this file was ported).
Now define the `run` method for the new class.
The main things here are:
a) Use AM.getResult<...>(M) to get results instead of `getAnalysis<...>()`
b) If the old PM pass would have returned "false" (i.e. `Changed ==
false`), then you should return PreservedAnalyses::all();
c) In the old PM getAnalysisUsage method, observe the calls
`AU.addPreserved<...>();`.
In the case `Changed == true`, for each preserved analysis you should do
call `PA.preserve<...>()` on a PreservedAnalyses object and return it.
E.g.:
PreservedAnalyses PA;
PA.preserve<CallGraphAnalysis>();
return PA;
Note that calls to skipModule/skipFunction are not supported in the new PM
currently, so optnone and optimization bisect support do not work. You can
just drop those calls for now.
4. Add the pass to the new PM pass registry to make it available in opt.
In llvm/lib/Passes/PassBuilder.cpp add a #include for your header.
`#include "llvm/Transforms/IPO/FunctionAttrs.h"`
In this case there is already an include (from when
PostOrderFunctionAttrsPass was ported).
Add your pass to llvm/lib/Passes/PassRegistry.def
In this case, I added
`MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())`
The string is from the `INITIALIZE_PASS*` macros used in the old pass
manager.
Then choose a test that uses the pass and use the new PM `-passes=...` to
run it.
E.g. in this case there is a test that does:
; RUN: opt < %s -basicaa -functionattrs -rpo-functionattrs -S | FileCheck %s
I have added the line:
; RUN: opt < %s -aa-pipeline=basic-aa -passes='require<targetlibinfo>,cgscc(function-attrs),rpo-functionattrs' -S | FileCheck %s
The `-aa-pipeline=basic-aa` and
`require<targetlibinfo>,cgscc(function-attrs)` are what is needed to run
functionattrs in the new PM (note that in the new PM "functionattrs"
becomes "function-attrs" for some reason). This is just pulled from
`readattrs.ll` which contains the change from when functionattrs was ported
to the new PM.
Adding rpo-functionattrs causes the pass that was just ported to run.
llvm-svn: 272505
Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo.
Differential revision: http://reviews.llvm.org/D21045
llvm-svn: 272321
Summary:
The module pass pipeline includes a late LICM run after loop
unrolling. LCSSA is implicitly run as a pass dependency of LICM. However no
cleanup pass was run after this, so the LCSSA nodes ended in the optimized output.
Reviewers: hfinkel, mehdi_amini
Subscribers: majnemer, bruno, mzolotukhin, mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D20606
llvm-svn: 271602
The assumption, made in insert() that weak functions are always inserted after strong functions,
is only true in the first round of adding functions.
In subsequent rounds this is no longer guaranteed , because we might remove a strong function from the tree (because it's modified) and add it later,
where an equivalent weak function already exists in the tree.
This change removes the assert in insert() and explicitly enforces a weak->strong order.
This also removes the need of two separate loops in runOnModule().
llvm-svn: 271299