Clang always emit a hash for ThinLTO, but as other frontend are
starting to use ThinLTO, this could be a serious bug.
Differential Revision: https://reviews.llvm.org/D25379
llvm-svn: 283655
We need to add an entry in the combined-index for modules that have
a hash but otherwise empty summary, this is needed so that we can
get the hash for the module.
Also, if no entry is present in the combined index for a module, we
need to skip it when trying to compute a cache entry.
Differential Revision: https://reviews.llvm.org/D25300
llvm-svn: 283654
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
The NativeObjectOutput class has a design problem: it mixes up the caching
policy with the interface for output streams, which makes the client-side
code hard to follow and would for example make it harder to replace the
cache implementation in an arbitrary client.
This change separates the two aspects by moving the caching policy
to a separate field in Config, replacing NativeObjectOutput with a
NativeObjectStream class which only deals with streams and does not need to
be overridden by most clients and introducing an AddFile callback for adding
files (e.g. from the cache) to the link.
Differential Revision: https://reviews.llvm.org/D24622
llvm-svn: 282299
With the new LTO API in r278338, we stopped emitting the individual
index files and imports files for some modules in the distributed backend
case (thinlto-index-only plugin option).
Specifically, this is when the linker decides not to include a module in the
link, because it was in an archive library and did not have a strong
reference to it. Not creating the expected output files makes the
distributed build system implementation more difficult, in terms of
checking for the expected outputs of the thin link, and scheduling the
backend jobs. To address this, the gold-plugin will write dummy empty
.thinlto.bc and .imports files for modules not included in the link
(which LTO never sees).
Augmented a gold v1.12+ test, since that version of gold has the handling
for notifying on modules not being included in the link.
llvm-svn: 282100
Summary:
Emit an empty summary section, instead of no summary section, when
there are no global variables in the index. This ensures that LTO
will treat these files as ThinLTO inputs, instead of as regular
LTO inputs.
In addition to not being what the user likely intended when
compiling with -flto=thin, the current behavior is problematic for
distributed build systems that expect to get ThinLTO index and imports
files back for each input compiled with -flto=thin. Combining into
a single regular LTO module also reduces the backend parallelism.
And in the case where the index was suppressed due to uses in
inline assembly, combining into a single LTO module could provoke
renaming of duplicates that we were trying to prevent by suppressing
the index.
This change required a couple of fixes to handle the empty summary
section.
Reviewers: mehdi_amini
Subscribers: mehdi_amini, llvm-commits, pcc
Differential Revision: https://reviews.llvm.org/D24779
llvm-svn: 282037
Summary:
In runThinLTO we start the task numbering for ThinLTO backend
tasks depending on whether there was also a regular LTO object
(CombinedModule). However, the CombinedModule is moved at
the end of runRegularLTO, so we need to save this information and
pass it into runThinLTO. Otherwise the AddOutput callback to the client
will use the same task number for both the regular LTO object
and the first ThinLTO object, which in gold-plugin caused only
one to be end up in the output filename array and therefore passed
back to gold for the final native link.
Reviewers: pcc, mehdi_amini
Subscribers: mehdi_amini, kromanova
Differential Revision: https://reviews.llvm.org/D24643
llvm-svn: 281725
Previously the prevailing information was not honored, and commons
symbols could override a strong definition. This patch fixes it and
propose the following semantic for commons: the client should mark
as prevailing the commons that it expects the LTO implementation to
merge (i.e. take the maximum size and alignment).
It implies that commons are allowed to have multiple prevailing
definitions.
Differential Revision: https://reviews.llvm.org/D24545
llvm-svn: 281538
Summary:
This addresses a regression in common handling from the new LTO
API in r278338. Only create a new common if the size is different.
The type comparison against an array type fails when the size is
different but not an array. GlobalMerge does not handle the
array types as well and we lose some global merging opportunities.
Reviewers: mehdi_amini
Subscribers: junbuml, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23955
llvm-svn: 279911
Summary:
Have the cache pass back the path to the cache entry when it
is ready to be loaded, instead of a buffer.
For gold-plugin we can simply pass this file back to gold directly,
which avoids expensive writing of a separate tmp file. Ensure
the cache entry is not deleted on cleanup by adjusting the setting
of the IsTemporary flags.
Moved the loading of the buffer into llvm-lto2 to maintain current
behavior.
Reviewers: mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23946
llvm-svn: 279883
Add the ability to plug a cache on the LTO API.
I tried to write such that a linker implementation can
control the cache backend. This is intrusive and I'm
not totally happy with it, but I can't figure out a
better design right now.
Differential Revision: https://reviews.llvm.org/D23599
llvm-svn: 279576
Summary:
I assume there was a use case, so maybe this strawman patch will help
clarifying if it is legit.
In any case the current situation is not legit: a ThinLTO compilation
should not trigger an unexpected full LTO compilation.
Right now, adding a --save-temps option triggers this and makes the
number of output differs.
Reviewers: tejohnson
Subscribers: pcc, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23600
llvm-svn: 279550
An important performance setting on the LLVMContext for LTO is
enableDebugTypeODRUniquing(), this adds an automatic merging of
debug information in the context based on type ids.
Also, the lto::Config includes a diagnostic handler that needs to
be set on the Context, as well as the setDiscardValueNames() setting.
llvm-svn: 279532
It use to be non-const for the sole purpose of custom handling of
commons symbol. This is moved now in the regular LTO handling now
and such we can constify the callback.
llvm-svn: 279438
The gold-plugin was doing this internally, now the API is handling
commons correctly based on the given resolution.
Differential Revision: https://reviews.llvm.org/D23739
llvm-svn: 279417
directly produce the index as the value type result.
This requires making the index movable which is straightforward. It
greatly simplifies things by allowing us to completely avoid the builder
API and the layers of abstraction inherent there. Instead both pass
managers can directly construct these when run by value. They still
won't be constructed truly eagerly thanks to the optional in the legacy
PM. The code that directly builds the index can also just share a direct
function.
A notable change here is that the result type of the analysis for the
new PM is no longer a reference type. This was really problematic when
making changes to how we handle result types to make our interface
requirements *much* more strict and precise. But I think this is an
overall improvement.
Differential Revision: https://reviews.llvm.org/D23701
llvm-svn: 279216
Summary:
This was reversed compared to ThinLTOCodeGenerator for some reason,
and lead to an increased code-size on my tests. I figured that the
weak resolution may internalize a linkonce function, which will be
promoted immediately (and renamed), before being internalized again.
Reviewers: tejohnson
Subscribers: pcc, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23632
llvm-svn: 279021
Summary:
It does not play well with directories (end up with a bunch of hidden
files).
Also, do not strip the 0 suffix for the first task, especially since
0 can be used by ThinLTO as well now.
Reviewers: tejohnson
Subscribers: mehdi_amini, pcc, llvm-commits
Differential Revision: https://reviews.llvm.org/D23612
llvm-svn: 279014
Summary:
While NFC for now, this will allow more flexibility on the client side
to hold state necessary to back up the stream.
Also when adding caching, this class will grow in complexity.
Note I blindly modified the gold-plugin as I can't compile it.
Reviewers: tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23542
llvm-svn: 278907
Summary:
Multiple APIs were taking a StringMap for the ImportLists containing
the entries for for all the modules while operating on a single entry
for the current module. Instead we can pass the desired ModuleImport
directly. Also some of the APIs were not const, I believe just to be
able to use operator[] on the StringMap.
Reviewers: tejohnson
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23537
llvm-svn: 278776
Summary:
Fixed a bug in ThinLTOCodeGenerator's temp file dumping. The Twine
needs to be passed directly as an argument, or a copy saved into a
std::string.
It doesn't seem there are any consumers of this, so I added a new option
to llvm-lto to enable saving of temp files during ThinLTO, and augmented
a test to use it to check post-import but pre-opt bitcode.
Reviewers: mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23525
llvm-svn: 278761
This restores commit r278330, with fixes for a few bot failures:
- Fix a late change I had made to the save temps output file that I
missed due to existing files sitting on my disk
- Fix a bunch of Windows bot failures with "ambiguous call to overloaded
function" due to confusion between llvm::make_unique vs
std::make_unique (preface the new make_unique calls with "llvm::")
- Attempt to fix a modules bot failure by adding a missing include
to LTO/Config.h.
Original change:
Resolution-based LTO API.
Summary:
This introduces a resolution-based LTO API. The main advantage of this API over
existing APIs is that it allows the linker to supply a resolution for each
symbol in each object, rather than the combined object as a whole. This will
become increasingly important for use cases such as ThinLTO which require us
to process symbol resolutions in a more complicated way than just adjusting
linkage.
Patch by Peter Collingbourne.
Reviewers: rafael, tejohnson, mehdi_amini
Subscribers: lhames, tejohnson, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D20268
llvm-svn: 278338
This reverts commit r278330.
I made a change to the save temps output that is causing issues with the
bots. Didn't realize this because I had older output files sitting on
disk in my test output directory.
llvm-svn: 278331
Summary:
This introduces a resolution-based LTO API. The main advantage of this API over
existing APIs is that it allows the linker to supply a resolution for each
symbol in each object, rather than the combined object as a whole. This will
become increasingly important for use cases such as ThinLTO which require us
to process symbol resolutions in a more complicated way than just adjusting
linkage.
Patch by Peter Collingbourne.
Reviewers: rafael, tejohnson, mehdi_amini
Subscribers: lhames, tejohnson, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D20268
Address review comments
llvm-svn: 278330
We currently do not touch a symbol's linkage in the case where a definition
has a single copy. However, this code is effectively unnecessary: either
the definition is not exported, in which case the internalize phase sets
its linkage to internal, or it is exported, in which case we need to promote
linkage to weak. Those two cases are already handled by existing code.
I believe that the only real functional change here is in the case where we
have a single definition which does not prevail (e.g. because the definition
in a native object file prevails). In that case we now lower linkage to
available_externally following the existing code path for that case.
As a result we can remove the isExported function parameter from the
thinLTOResolveWeakForLinkerInIndex function.
Differential Revision: http://reviews.llvm.org/D21883
llvm-svn: 274784
This check is not only unnecessary, it can produce the wrong result. If we
are linking a single module and it has an exported linkonce symbol, we need
to promote to weak in order to avoid PR19901-style problems.
Differential Revision: http://reviews.llvm.org/D21917
llvm-svn: 274722
* UpdateCompilerUsed() -> updateCompilerUsed()
* ThinLTO doesn't use the API so we can remove the include
* Clean up unused #include <functional> from the header
* Rename #ifdef guard comment to be correct.
llvm-svn: 273461
This is indeed a much cleaner approach (thanks to Daniel Berlin
for pointing out), and also David/Sean for review.
Differential Revision: http://reviews.llvm.org/D21454
llvm-svn: 273032
Daniel Berlin expressed some real concerns about the port and proposed
and alternative approach. I'll revert this for now while working on a
new patch, which I hope to put up for review shortly. Sorry for the churn.
llvm-svn: 272925
Nearly all the changes to this pass have been done while maintaining and
updating other parts of LLVM. LLVM has had another pass, SROA, which
has superseded ScalarReplAggregates for quite some time.
Differential Revision: http://reviews.llvm.org/D21316
llvm-svn: 272737
The need for all these Lookup* functions is just because of calls to
getAnalysis inside methods (i.e. not at the top level) of the
runOnFunction method. They should be straightforward to clean up when
the old PM is gone.
llvm-svn: 272615
Below are my super rough notes when porting. They can probably serve as
a basic guide for porting other passes to the new PM. As I port more
passes I'll expand and generalize this and make a proper
docs/HowToPortToNewPassManager.rst document. There is also missing
documentation for general concepts and API's in the new PM which will
require some documentation.
Once there is proper documentation in place we can put up a list of
passes that have to be ported and game-ify/crowdsource the rest of the
porting (at least of the middle end; the backend is still unclear).
I will however be taking personal responsibility for ensuring that the
LLD/ELF LTO pipeline is ported in a timely fashion. The remaining passes
to be ported are (do something like
`git grep "<the string in the bullet point below>"` to find the pass):
General Scalar:
[ ] Simplify the CFG
[ ] Jump Threading
[ ] MemCpy Optimization
[ ] Promote Memory to Register
[ ] MergedLoadStoreMotion
[ ] Lazy Value Information Analysis
General IPO:
[ ] Dead Argument Elimination
[ ] Deduce function attributes in RPO
Loop stuff / vectorization stuff:
[ ] Alignment from assumptions
[ ] Canonicalize natural loops
[ ] Delete dead loops
[ ] Loop Access Analysis
[ ] Loop Invariant Code Motion
[ ] Loop Vectorization
[ ] SLP Vectorizer
[ ] Unroll loops
Devirtualization / CFI:
[ ] Cross-DSO CFI
[ ] Whole program devirtualization
[ ] Lower bitset metadata
CGSCC passes:
[ ] Function Integration/Inlining
[ ] Remove unused exception handling info
[ ] Promote 'by reference' arguments to scalars
Please let me know if you are interested in working on any of the passes
in the above list (e.g. reply to the post-commit thread for this patch).
I'll probably be tackling "General Scalar" and "General IPO" first FWIW.
Steps as I port "Deduce function attributes in RPO"
---------------------------------------------------
(note: if you are doing any work based on these notes, please leave a
note in the post-commit review thread for this commit with any
improvements / suggestions / incompleteness you ran into!)
Note: "Deduce function attributes in RPO" is a module pass.
1. Do preparatory refactoring.
Do preparatory factoring. In this case all I had to do was to pull out a static helper (r272503).
(TODO: give more advice here e.g. if pass holds state or something)
2. Rename the old pass class.
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename class ReversePostOrderFunctionAttrs -> ReversePostOrderFunctionAttrsLegacyPass
in preparation for adding a class ReversePostOrderFunctionAttrs as the pass in the new PM.
(edit: actually wait what? The new class name will be
ReversePostOrderFunctionAttrsPass, so it doesn't conflict. So this step is
sort of useless churn).
llvm/include/llvm/InitializePasses.h
llvm/lib/LTO/LTOCodeGenerator.cpp
llvm/lib/Transforms/IPO/IPO.cpp
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename initializeReversePostOrderFunctionAttrsPass -> initializeReversePostOrderFunctionAttrsLegacyPassPass
(note that the "PassPass" thing falls out of `s/ReversePostOrderFunctionAttrs/ReversePostOrderFunctionAttrsLegacyPass/`)
Note that the INITIALIZE_PASS macro is what creates this identifier name, so renaming the class requires this renaming too.
Note that createReversePostOrderFunctionAttrsPass does not need to be
renamed since its name is not generated from the class name.
3. Add the new PM pass class.
In the new PM all passes need to have their
declaration in a header somewhere, so you will often need to add a header.
In this case
llvm/include/llvm/Transforms/IPO/FunctionAttrs.h is already there because
PostOrderFunctionAttrsPass was already ported.
The file-level comment from the .cpp file can be used as the file-level
comment for the new header. You may want to tweak the wording slightly
from "this file implements" to "this file provides" or similar.
Add declaration for the new PM pass in this header:
class ReversePostOrderFunctionAttrsPass
: public PassInfoMixin<ReversePostOrderFunctionAttrsPass> {
public:
PreservedAnalyses run(Module &M, AnalysisManager<Module> &AM);
};
Its name should end with `Pass` for consistency (note that this doesn't
collide with the names of most old PM passes). E.g. call it
`<name of the old PM pass>Pass`.
Also, move the doxygen comment from the old PM pass to the declaration of
this class in the header.
Also, include the declaration for the new PM class
`llvm/Transforms/IPO/FunctionAttrs.h` at the top of the file (in this case,
it was already done when the other pass in this file was ported).
Now define the `run` method for the new class.
The main things here are:
a) Use AM.getResult<...>(M) to get results instead of `getAnalysis<...>()`
b) If the old PM pass would have returned "false" (i.e. `Changed ==
false`), then you should return PreservedAnalyses::all();
c) In the old PM getAnalysisUsage method, observe the calls
`AU.addPreserved<...>();`.
In the case `Changed == true`, for each preserved analysis you should do
call `PA.preserve<...>()` on a PreservedAnalyses object and return it.
E.g.:
PreservedAnalyses PA;
PA.preserve<CallGraphAnalysis>();
return PA;
Note that calls to skipModule/skipFunction are not supported in the new PM
currently, so optnone and optimization bisect support do not work. You can
just drop those calls for now.
4. Add the pass to the new PM pass registry to make it available in opt.
In llvm/lib/Passes/PassBuilder.cpp add a #include for your header.
`#include "llvm/Transforms/IPO/FunctionAttrs.h"`
In this case there is already an include (from when
PostOrderFunctionAttrsPass was ported).
Add your pass to llvm/lib/Passes/PassRegistry.def
In this case, I added
`MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())`
The string is from the `INITIALIZE_PASS*` macros used in the old pass
manager.
Then choose a test that uses the pass and use the new PM `-passes=...` to
run it.
E.g. in this case there is a test that does:
; RUN: opt < %s -basicaa -functionattrs -rpo-functionattrs -S | FileCheck %s
I have added the line:
; RUN: opt < %s -aa-pipeline=basic-aa -passes='require<targetlibinfo>,cgscc(function-attrs),rpo-functionattrs' -S | FileCheck %s
The `-aa-pipeline=basic-aa` and
`require<targetlibinfo>,cgscc(function-attrs)` are what is needed to run
functionattrs in the new PM (note that in the new PM "functionattrs"
becomes "function-attrs" for some reason). This is just pulled from
`readattrs.ll` which contains the change from when functionattrs was ported
to the new PM.
Adding rpo-functionattrs causes the pass that was just ported to run.
llvm-svn: 272505
Summary:
Ensure we keep prevailing copy of LinkOnceAny by converting it to
WeakAny.
Rename odr_resolution test to the now more appropriate weak_resolution
(weak in the linker sense includes linkonce).
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D20634
llvm-svn: 270850
Move the now index-based ODR resolution and internalization routines out
of ThinLTOCodeGenerator.cpp and into either LTO.cpp (index-based
analysis) or FunctionImport.cpp (index-driven optimizations).
This is to enable usage by other linkers.
llvm-svn: 270698
Summary:
This patch changes the ODR resolution and internalization to be based on
updates to the Index, which are consumed by the backend portion of the
transformations.
It will be followed by an NFC change to move these out of libLTO's
ThinLTOCodeGenerator so that it can be used by other linkers
(gold and lld) and by ThinLTO distributed backends.
The global summary-based portions use callbacks so that the client can
determine the prevailing copy and other information in a client-specific
way. Eventually, with the API being developed in D20268, these may be
modified to use information such as symbol resolutions, supplied by the
clients to the API.
Reviewers: joker-eph
Subscribers: joker.eph, pcc, llvm-commits
Differential Revision: http://reviews.llvm.org/D20290
llvm-svn: 270584
Moved the ModuleLoader and supporting helper loadModuleFromBuffer out of
ThinLTOCodeGenerator and into new LTO.h/LTO.cpp files. This is in
preparation for a patch that will utilize these in the gold-plugin.
Note that there are some other pending patches (D20268 and D20290) that
also plan to refactor common interfaces and functionality into this same
pair of new files.
llvm-svn: 270509
Having an enum member named Default is quite confusing: Is it distinct
from the others?
This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been maped to the
default value, which is now clear has no be one of the remaining 3
options.
llvm-svn: 269988
This is a compile time optimization: keeping a large file to process
at the end hurts parallelism.
The heurisitic used right now is the input buffer size, however we
may want to consider the number of functions to import or the
different number of files to load for importing as well.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 269684
This is reducing pressure on the OS memory system, and is NFC
when not using a cache.
I measure a 10x memory consumption reduction when linking opt
with full debug info.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 269682
Summary:
Add support for emission of plaintext lists of the imported files for
each distributed backend compilation. Used for distributed build file
staging.
Invoked with new gold-plugin thinlto-emit-imports-files option, which is
only valid with thinlto-index-only (i.e. for distributed builds), or
from llvm-lto with new -thinlto-action=emitimports value.
Depends on D19556.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D19636
llvm-svn: 269067
This restores commit r268627:
Summary:
When launching ThinLTO backends in a distributed build (currently
supported in gold via the thinlto-index-only plugin option), emit
an individual index file for each backend process as described here:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098272.html
...
Differential Revision: http://reviews.llvm.org/D19556
Address msan failures by avoiding std::prev on map.end(), the
theory is that this is causing issues due to some known UB problems
in __tree.
llvm-svn: 269059
This patch introduces a new option -lto-strip-invalid-debug-info, which
drops malformed debug info from the input.
The problem I'm trying to solve with this sequence of patches is that
historically we've done a really bad job at verifying debug info. We want
to be able to make the verifier stricter without having to worry about
breaking bitcode compatibility with existing producers. For example, we
don't necessarily want IR produced by an older version of clang to be
rejected by an LTO link just because of malformed debug info, and rather
provide an option to strip it. Note that merely outdated (but well-formed)
debug info would continue to be auto-upgraded in this scenario.
rdar://problem/25818489
http://reviews.llvm.org/D19987
This reapplies 268936 with a test case fix for Linux (-exported-symbol foo)
llvm-svn: 268965
This patch introduces a new option -lto-strip-invalid-debug-info, which
drops malformed debug info from the input.
The problem I'm trying to solve with this sequence of patches is that
historically we've done a really bad job at verifying debug info. We want
to be able to make the verifier stricter without having to worry about
breaking bitcode compatibility with existing producers. For example, we
don't necessarily want IR produced by an older version of clang to be
rejected by an LTO link just because of malformed debug info, and rather
provide an option to strip it. Note that merely outdated (but well-formed)
debug info would continue to be auto-upgraded in this scenario.
rdar://problem/25818489
http://reviews.llvm.org/D19987
llvm-svn: 268936
This reverts commit r268658.
I incorrectly diagnose this as the source of an assertion during an
LTO bootstrap of clang.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268680
The assertions were assuming that the linker will not ask to preserve
a global that is internal or available_externally, as it does not
really make sense. In practice this break the bootstrap of clang,
I degrade to a warning for now.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268671
Summary:
When launching ThinLTO backends in a distributed build (currently
supported in gold via the thinlto-index-only plugin option), emit
an individual index file for each backend process as described here:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098272.html
The individual index file encodes the summary and module information
required for implementing the importing/exporting decisions made
for a given module in the thin link step.
This is in place of the current mechanism that uses the combined index
to make importing decisions in each back end independently. It is an
enabler for doing global summary based optimizations in the thin link
step (which will be recorded in the individual index files), and reduces
the size of the index that must be sent to each backend process, and
the amount of work to scan it in the backends.
Rather than create entirely new ModuleSummaryIndex structures (and all
the included unique_ptrs) for each backend index file, a map is created
to record all of the GUID and summary pointers needed for a particular
index file. The IndexBitcodeWriter walks this map instead of the full
index (hiding the details of managing the appropriate summary iteration
in a new iterator subclass). This is more efficient than walking the
entire combined index and filtering out just the needed summaries during
each backend bitcode index write.
Depends on D19481.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D19556
llvm-svn: 268627
If the linker requested to preserve a linkonce function, we should
honor this even if we drop all uses.
We explicitely avoid turning them into weak_odr (unlike the first
version of this patch in r267644), because the codegen can be
different on Darwin: because of `llvm::canBeOmittedFromSymbolTable()`
we may emit the symbol as weak_def_can_be_hidden instead of
weak_definition.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268607
This reverts commit r267644. Turning linkonce_odr into weak_odr is
a sementic change on Darwin: because of
`llvm::canBeOmittedFromSymbolTable()` we may emit the symbol as
weak_def_can_be_hidden instead of weak_definition.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268606
This was a remaining of a previous scheme where some IPOs were taking
place before we enter this code. This is not relevant anymore.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268605
This reverts commit r267657, r267656, and r267655.
The test does not pass on multiple bots, I'm unsure why yet but let's unbreak them.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267664
Summary:
If the linker requested to preserve a linkonce function, we should
honor this even if we drop all uses.
Reviewers: dexonsmith
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D19527
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267644
If the linker specifically requested for a linkonce to be preserved,
we need to make sure we won't drop it even if all the uses in the
current module disappear.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267543
Add a typedef for the std::map<GlobalValue::GUID, GlobalValueSummary *>
map that is passed around to identify summaries for values defined in a
particular module. This shortens up declarations in a variety of places.
llvm-svn: 267471
Summary:
Remove the GlobalValueInfo and change the ModuleSummaryIndex to directly
reference summary objects. The info structure was there to support lazy
parsing of the combined index summary objects, which is no longer
needed and not supported.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19462
llvm-svn: 267344
Keeping as much as possible internal/private is
known to help the optimizer. Let's try to benefit from
this in ThinLTO.
Note: this is early work, but is enough to build clang (and
all the LLVM tools). I still need to write some lit-tests...
Differential Revision: http://reviews.llvm.org/D19103
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267317
This removes the interfaces added (and not yet complete) to support
lazy reading of summaries. This support is not expected to be needed
since we are moving to a model where the full index is only being
traversed in the thin link step, instead of the back ends.
(The second part of this that I plan to do next is remove the
GlobalValueInfo from the ModuleSummaryIndex - it was mostly needed to
support lazy parsing of summaries. The index can instead reference the
summary structures directly.)
llvm-svn: 267097
This help to streamline the process of handling importing since
we don't need to special case alias everywhere: just like
linkonce_odr function, make sure at least one alias is emitted
by turning it weak.
Differential Revision: http://reviews.llvm.org/D19308
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266958
Clients may call writeMergedModules before calling optimize, or call
compileOptimized without calling optimize. Make sure they don't sneak
past the verifier. This adds LTOCodeGenerator::verifyMergedModuleOnce,
and calls it from writeMergedModule, optimize, and codegenOptimized.
I couldn't find a good way to test this. I tried writing broken IR to
send into llvm-lto, but LTOCodeGenerator doesn't understand textual IR,
and assembler runs the verifier itself anyway. Checking in
valid-but-doesn't-verify bitcode here doesn't seem valuable.
llvm-svn: 266894
Summary:
This is a follow-on to apply Duncan's new DIType ODR uniquing from
r266549 and r266713 in more places.
Enable enableDebugTypeODRUniquing() for ThinLTO backends invoked via
libLTO, similar to the way r266549 enabled this for ThinLTO backend
threads launched from gold-plugin.
Also enable enableDebugTypeODRUniquing in opt, similar to the way
r266549 enabled this for llvm-link (on by default, can be disabled with
new -disable-debug-info-type-map option), since we may perform ThinLTO
importing from opt.
Reviewers: dexonsmith, joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19263
llvm-svn: 266746
As per David's review, rename everything in the new API for ODR type
uniquing of debug info.
ensureDITypeMap => enableDebugTypeODRUniquing
destroyDITypeMap => disableDebugTypeODRUniquing
hasDITypeMap => isODRUniquingDebugTypes
llvm-svn: 266713
Removed some unused headers, replaced some headers with forward class declarations.
Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
Patch by Eugene Kosov <claprix@yandex.ru>
Differential Revision: http://reviews.llvm.org/D19219
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
Rather than relying on the structural equivalence of DICompositeType to
merge type definitions, use an explicit map on the LLVMContext that
LLParser and BitcodeReader consult when constructing new nodes.
Each non-forward-declaration DICompositeType with a non-empty
'identifier:' field is stored/loaded from the type map, and the first
definiton will "win".
This map is opt-in: clients that expect ODR types from different modules
to be merged must call LLVMContext::ensureDITypeMap.
- Clients that just happen to load more than one Module in the same
LLVMContext won't magically merge types.
- Clients (like LTO) that want to continue to merge types based on ODR
identifiers should opt-in immediately.
I have updated LTOCodeGenerator.cpp, the two "linking" spots in
gold-plugin.cpp, and llvm-link (unless -disable-debug-info-type-map) to
set this.
With this in place, it will be straightforward to remove the DITypeRef
concept (i.e., referencing types by their 'identifier:' string rather
than pointing at them directly).
llvm-svn: 266549
Summary: For Incremental LTO, we need to make sure that an old
cache entry is not used when incrementally re-linking with a new
libLTO.
Adding a global LLVM_REVISION in llvm-config.h would for to
rebuild/relink the world for every "git pull"/"svn update".
So instead only libLTO is made dependent on the VCS and will
be rebuilt (and the dependent binaries relinked, i.e. as of
today: libLTO.dylib and llvm-lto).
Reviewers: beanz
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18987
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266523
This is a requirement for the cache handling in D18494
Differential Revision: http://reviews.llvm.org/D18908
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266519
It is now only doing the update to the llvm.compiler_used global.
The client has to call separately the internalization stage.
Hopefully the code is simpler to understand this way.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266174
This will save a bunch of copies / initialization of intermediate
datastructure, and (hopefully) simplify the code.
This also abstract the symbol preservation mechanism outside of the
Internalization pass into the client code, which is not forced
to keep a map of strings for instance (ThinLTO will prefere hashes).
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266163
This is intended to be shared by the ThinLTOCodeGenerator.
Note that there is a change in the way the verifier is run, previously
it was ran as a Pass on the merged module during internalization.
While now the verifier is called explicitely on the merged module
outside of the internalize "pass pipeline".
What remains strange in the API is the fact that `DisableVerify` in
the API does not disable this initial verifier.
Differential Revision: http://reviews.llvm.org/D19000
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266047
Summary:
This is the first step in also serializing the index out to LLVM
assembly.
The per-module summary written to bitcode is moved out of the bitcode
writer and to a new analysis pass (ModuleSummaryIndexWrapperPass).
The pass itself uses a new builder class to compute index, and the
builder class is used directly in places where we don't have a pass
manager (e.g. llvm-as).
Because we are computing summaries outside of the bitcode writer, we no
longer can use value ids created by the bitcode writer's
ValueEnumerator. This required changing the reference graph edge type
to use a new ValueInfo class holding a union between a GUID (combined
index) and Value* (permodule index). The Value* are converted to the
appropriate value ID during bitcode writing.
Also, this enables removal of the BitWriter library's dependence on the
Analysis library that was previously required for the summary computation.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18763
llvm-svn: 265941
Remove a default parameter value being passed unnecessarily, which
also reduces the changes required when this parameter is changed in
D18763.
Document the remaining non-default bool value passed for another
parameter.
llvm-svn: 265348
These function can be dropped by the compiler if they are no longer
referenced in the current module. However there is a change that
another module is still referencing them because of the import.
Multiple solutions can be used:
- Always import LinkOnce when a caller is imported. This ensure that
every module with a call to a LinkOnce has the definition and will
be able to emit it if it emits the call.
- Turn the LinkOnce into Weak, so that it is always emitted.
- Turn all LinkOnce into available_externally and come back after all
modules are codegen'ed to emit only one copy of the linkonce, when
there is still a reference to it.
This patch implement the second option, with am optimization that
only *one* module will turn the LinkOnce into Weak, while the others
will turn it into available_externally, so that there is exactly one
copy emitted for the whole compilation.
http://reviews.llvm.org/D18346
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265190
This allows the linker to instruct ThinLTO to perform only the
optimization part or only the codegen part of the process.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265113
Summary:
Now that the summary contains the full reference/call graph, we can
replace the existing function importer that loads and inspect the IR
to iteratively walk the call graph by a traversal based purely on the
summary information. Decouple the actual importing decision from any
IR manipulation.
Reviewers: tejohnson
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18343
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 264503
(Resubmitting after fixing missing file issue)
With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.
A companion clang patch will immediately succeed this patch to reflect
this renaming.
llvm-svn: 263513
With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.
A companion clang patch will immediately succeed this patch to reflect
this renaming.
llvm-svn: 263490
Summary:
This patch adds support for including a full reference graph including
call graph edges and other GV references in the summary.
The reference graph edges can be used to make importing decisions
without materializing any source modules, can be used in the plugin
to make file staging decisions for distributed build systems, and is
expected to have other uses.
The call graph edges are recorded in each function summary in the
bitcode via a list of <CalleeValueIds, StaticCount> tuples when no PGO
data exists, or <CalleeValueId, StaticCount, ProfileCount> pairs when
there is PGO, where the ValueId can be mapped to the function GUID via
the ValueSymbolTable. In the function index in memory, the call graph
edges reference the target via the CalleeGUID instead of the
CalleeValueId.
The reference graph edges are recorded in each summary record with a
list of referenced value IDs, which can be mapped to value GUID via the
ValueSymbolTable.
Addtionally, a new summary record type is added to record references
from global variable initializers. A number of bitcode records and data
structures have been renamed to reflect the newly expanded scope of the
summary beyond functions. More cleanup will follow.
Reviewers: joker.eph, davidxl
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17212
llvm-svn: 263275
tests to run GVN in both modes.
This is mostly the boring refactoring just like SROA and other complex
transformation passes. There is some trickiness in that GVN's
ValueNumber class requires hand holding to get to compile cleanly. I'm
open to suggestions about a better pattern there, but I tried several
before settling on this. I was trying to balance my desire to sink as
much implementation detail into the source file as possible without
introducing overly many layers of abstraction.
Much like with SROA, the design of this system is made somewhat more
cumbersome by the need to support both pass managers without duplicating
the significant state and logic of the pass. The same compromise is
struck here.
I've also left a FIXME in a doxygen comment as the GVN pass seems to
have pretty woeful documentation within it. I'd like to submit this with
the FIXME and let those more deeply familiar backfill the information
here now that we have a nice place in an interface to put that kind of
documentaiton.
Differential Revision: http://reviews.llvm.org/D18019
llvm-svn: 263208
This is avoiding a naming conflict with opt and llc.
While opt and llc don't link to LTO usually, users that are building a
monolithic libLLVM.dylib and linking the tools to it would have a
runtime error because of the duplicate cl::opt registration.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263127
Summary:
This is intended to be a performance flag, on the same level as clang
cc1 option "--disable-free". LLVM will never initialize it by default,
it will be up to the client creating the LLVMContext to request this
behavior. Clang will do it by default in Release build (just like
--disable-free).
"opt" and "llc" can opt-in using -disable-named-value command line
option.
When performing LTO on llvm-tblgen, the initial merging of IR peaks
at 92MB without this patch, and 86MB after this patch,setNameImpl()
drops from 6.5MB to 0.5MB.
The total link time goes from ~29.5s to ~27.8s.
Compared to a compile-time flag (like the IRBuilder one), it performs
very close. I profiled on SROA and obtain these results:
420ms with IRBuilder that preserve name
372ms with IRBuilder that strip name
375ms with IRBuilder that preserve name, and a runtime flag to strip
Reviewers: chandlerc, dexonsmith, bogner
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17946
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263086
This is intended to provide a parallel (threaded) ThinLTO scheme
for linker plugin use through the libLTO C API.
The intent of this patch is to provide a first implementation as a
proof-of-concept and allows linker to start supporting ThinLTO by
definiing the libLTO C API. Some part of the libLTO API are left
unimplemented yet. Following patches will add support for these.
The current implementation can link all clang/llvm binaries.
Differential Revision: http://reviews.llvm.org/D17066
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 262977
Function lto_module_create_in_local_context() would previously
rely on the default LLVMContext being created for it by
LTOModule::makeLTOModule(). This context exits the program on
error and is not arranged to update sLastStringError in
tools/lto/lto.cpp.
Function lto_module_create_in_local_context() now creates an
LLVMContext by itself, sets it up correctly to its needs and then
passes it to LTOModule::createInLocalContext() which takes
ownership of the context and keeps it present for the lifetime of
the returned LTOModule.
Function LTOModule::makeLTOModule() is modified to take a
reference to LLVMContext (instead of a pointer) and no longer
creates a default context when nullptr is passed to it. Method
LTOModule::createInContext() that takes a pointer to LLVMContext
is removed because it allows to pass a nullptr to it. Instead
LTOModule::createFromBuffer() (that takes a reference to
LLVMContext) should be used.
Differential Revision: http://reviews.llvm.org/D17715
llvm-svn: 262330
convert one test to use this.
This is a particularly significant milestone because it required
a working per-function AA framework which can be queried over each
function from within a CGSCC transform pass (and additionally a module
analysis to be accessible). This is essentially *the* point of the
entire pass manager rewrite. A CGSCC transform is able to query for
multiple different function's analysis results. It works. The whole
thing appears to actually work and accomplish the original goal. While
we were able to hack function attrs and basic-aa to "work" in the old
pass manager, this port doesn't use any of that, it directly leverages
the new fundamental functionality.
For this to work, the CGSCC framework also has to support SCC-based
behavior analysis, etc. The only part of the CGSCC pass infrastructure
not sorted out at this point are the updates in the face of inlining and
running function passes that mutate the call graph.
The changes are pretty boring and boiler-plate. Most of the work was
factored into more focused preperatory patches. But this is what wires
it all together.
llvm-svn: 261203
Summary:
I thought -Xlinker -mllvm -Xlinker -stats worked at some point but maybe
it never did.
For clang, I believe that stats are printed from cc1_main. This patch
also prints them for LTO, specifically right after codegen happens.
I only looked at the C API for LTO briefly to see if this is a good
place. Probably there are still cases where this wouldn't be printed
but it seems to be working for the common case. I also experimented
putting this in the LTOCodeGenerator destructor but that didn't trigger
for me because ld64 does not destroy the LTOCodeGenerator.
Reviewers: dexonsmith, joker.eph
Subscribers: rafael, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17302
llvm-svn: 261013
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html
"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi
Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark
Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16471
llvm-svn: 258861
This addresses PR26060 where function lto_module_create() could return nullptr
but lto_get_error_message() returned an empty string.
The error() call after LTOModule::createFromFile() in llvm-lto is then removed
because any error from this function should go through the diagnostic handler in
llvm-lto which will exit the program. The error() call was added because this
previously did not happen when the file was non-existent. This is fixed by the
patch. (The situation that llvm-lto reports an error when the input file does
not exist is tested by llvm/tools/llvm-lto/error.ll).
Differential Revision: http://reviews.llvm.org/D16106
llvm-svn: 258298
Summary:
This is a companion patch for http://reviews.llvm.org/D16124.
Internalized symbols increase the size of strongly-connected components in
SCC-based module splitting and thus reduce the amount of parallelism. This
patch records the original linkage of non-local symbols prior to
internalization and then restores it just before splitting/CodeGen. This is
also useful for cases where the linker requires symbols to remain external, for
instance, so they can be placed according to linker script rules.
It's currently under its own flag (-restore-globals) but should eventually
share a common flag with D16124.
Reviewers: joker.eph, pcc
Subscribers: slarin, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D16229
llvm-svn: 258100
a top-down manner into a true top-down or RPO pass over the call graph.
There are specific patterns of function attributes, notably the
norecurse attribute, which are most effectively propagated top-down
because all they us caller information.
Walk in RPO over the call graph SCCs takes the form of a module pass run
immediately after the CGSCC pass managers postorder walk of the SCCs,
trying again to deduce norerucrse for each singular SCC in the call
graph.
This removes a very legacy pass manager specific trick of using a lazy
revisit list traversed during finalization of the CGSCC pass. There is
no analogous finalization step in the new pass manager, and a lazy
revisit list is just trying to produce an RPO iteration of the call
graph. We can do that more directly if more expensively. It seems
unlikely that this will be the expensive part of any compilation though
as we never examine the function bodies here. Even in an LTO run over
a very large module, this should be a reasonable fast set of operations
over a reasonably small working set -- the function call graph itself.
In the future, if this really is a compile time performance issue, we
can look at building support for both post order and RPO traversals
directly into a pass manager that builds and maintains the PO list of
SCCs.
Differential Revision: http://reviews.llvm.org/D15785
llvm-svn: 257163
Renamed variables to be more reflective of whether they are
an instance of Linker, IRLinker or ModuleLinker. Also fix a stale
comment.
llvm-svn: 256011
This patch converts code that has access to a LLVMContext to not take a
diagnostic handler.
This has a few advantages
* It is easier to use a consistent diagnostic handler in a single program.
* Less clutter since we are not passing a handler around.
It does make it a bit awkward to implement some C APIs that return a
diagnostic string. I will propose new versions of these APIs and
deprecate the current ones.
llvm-svn: 255571
Before this patch the diagnostic handler was optional. If it was not
passed, the one in the LLVMContext was used.
That is probably not a pattern we want to follow. If each area has an
optional callback, there is a sea of callbacks and it is hard to follow
which one is called.
Doing this also found cases where the callback is a nice addition, like
testing that no errors or warnings are reported.
The other option is to always use the diagnostic handler in the
LLVMContext. That has a few problems
* To implement the C API we would have to set the diag handler and then
set it back to the original value.
* Code that creates the context might be far away from code that wants
the diagnostics.
I do have a patch that implements the second option and will send that as
an RFC.
llvm-svn: 254777
This is a continuation of r253367.
These functions return is owned by the caller, so they return
std::unique_ptr now.
The call can fail, so the return is wrapped in ErrorOr.
They have a context where to report diagnostics, so they don't need to
take a string out parameter.
With this there are no call to getGlobalContext in lib/LTO.
llvm-svn: 254721
It was only used from LTO for a debug feature, and LTO can just create
another linker.
It is pretty odd to have a method to reset the module in the middle of a
link. It would make IdentifiedStructTypes inconsistent with the Module
for example.
llvm-svn: 254434
This adds a new API, LTOCodeGenerator::setFileType, to choose the output file
format for LTO CodeGen. A corresponding change to use this new API from
llvm-lto and a test case is coming in a separate commit.
Differential Revision: http://reviews.llvm.org/D14554
llvm-svn: 253622
This patch removes the std::string& argument from a number of C++ LTO API calls
and instead makes them use the installed diagnostic handler. This would also
improve consistency of diagnostic handling infrastructure: if an LTO client used
lto_codegen_set_diagnostic_handler() to install a custom error handler, we do
not want some error messages to go through the custom error handler, and some
other error messages to go into sLastErrorString.
llvm-svn: 253367
This is a follow-up from the previous discussion on the thread:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151019/307763.html
The LibLTO lto_get_error_message() API reads error messages from a std::string
sLastErrorString. Instead of passing this string around as an argument, this
patch creates a diagnostic handler and then sends this handler to the
constructor of LTOCodeGenerator.
Differential Revision: http://reviews.llvm.org/D14313
llvm-svn: 252791
Summary: Mimic parseTriple(); and exposes it to LTOModule.cpp
Reviewers: dexonsmith, rafael
Subscribers: llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 252442
The verifier currently runs three times in LTO: (1) after parsing, (2)
at the beginning of the optimization pipeline, and (3) at the end of it.
The first run is important, since we're not sure where the bitcode comes
from and it's nice to validate it, but in release builds the extra runs
aren't appropriate.
This commit:
- Allows these runs to be disabled in LTOCodeGenerator.
- Adds command-line options to llvm-lto.
- Adds command-line options to libLTO.dylib, and disables the verifier
by default in release builds (based on NDEBUG).
This shaves about 3.5% off the runtime of ld64 when linking
verify-uselistorder with -flto -g.
rdar://22509081
llvm-svn: 247729
In some ways this is a very boring port to the new pass manager as there
are no interesting analyses or dependencies or other oddities.
However, this does introduce the first good example of a transformation
pass with non-trivial state porting to the new pass manager. I've tried
to carve out patterns here to replicate elsewhere, and would appreciate
comments on whether folks like these patterns:
- A common need in the new pass manager is to effectively lift the pass
class and some of its state into a public header file. Prior to this,
LLVM used anonymous namespaces to provide "module private" types and
utilities, but that doesn't scale to cases where a public header file
is needed and the new pass manager will exacerbate that. The pattern
I've adopted here is to use the namespace-cased-name of the core pass
(what would be a module if we had them) as a module-private namespace.
Then utility and other code can be declared and defined in this
namespace. At some point in the future, we could even have
(conditionally compiled) code that used modules features when
available to do the same basic thing.
- I've split the actual pass run method in two in order to expose
a private method usable by the old pass manager to wrap the new class
with a minimum of duplicated code. I actually looked at a bunch of
ways to automate or generate these, but they are all quite terrible
IMO. The fundamental need is to extract the set of analyses which need
to cross this interface boundary, and that will end up being too
unpredictable to effectively encapsulate IMO. This is also
a relatively small amount of boiler plate that will live a relatively
short time, so I'm not too worried about the fact that it is boiler
plate.
The rest of the patch is totally boring but results in a massive diff
(sorry). It just moves code around and removes or adds qualifiers to
reflect the new name and nesting structure.
Differential Revision: http://reviews.llvm.org/D12773
llvm-svn: 247501
with the new pass manager, and no longer relying on analysis groups.
This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:
- FunctionAAResults is a type-erasing alias analysis results aggregation
interface to walk a single query across a range of results from
different alias analyses. Currently this is function-specific as we
always assume that aliasing queries are *within* a function.
- AAResultBase is a CRTP utility providing stub implementations of
various parts of the alias analysis result concept, notably in several
cases in terms of other more general parts of the interface. This can
be used to implement only a narrow part of the interface rather than
the entire interface. This isn't really ideal, this logic should be
hoisted into FunctionAAResults as currently it will cause
a significant amount of redundant work, but it faithfully models the
behavior of the prior infrastructure.
- All the alias analysis passes are ported to be wrapper passes for the
legacy PM and new-style analysis passes for the new PM with a shared
result object. In some cases (most notably CFL), this is an extremely
naive approach that we should revisit when we can specialize for the
new pass manager.
- BasicAA has been restructured to reflect that it is much more
fundamentally a function analysis because it uses dominator trees and
loop info that need to be constructed for each function.
All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.
The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.
This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.
Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.
One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.
Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.
Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.
Differential Revision: http://reviews.llvm.org/D12080
llvm-svn: 247167
Follow LLVM style for the parameter names (`CamelCase` not `camelCase`),
and surface the header docs in doxygen. No functionality change
intended.
llvm-svn: 246509
llvm::splitCodeGen is a function that implements the core of parallel LTO
code generation. It uses llvm::SplitModule to split the module into linkable
partitions and spawning one code generation thread per partition. The function
produces multiple object files which can be linked in the usual way.
This has been threaded through to LTOCodeGenerator (and llvm-lto for testing
purposes). Separate patches will add parallel LTO support to the gold plugin
and lld.
Differential Revision: http://reviews.llvm.org/D12260
llvm-svn: 246236
This change moves LTOCodeGenerator's ownership of the merged module to a
field of type std::unique_ptr<Module>. This helps simplify parts of the code
and clears the way for the module to be consumed by LLVM CodeGen (see D12132
review comments).
Differential Revision: http://reviews.llvm.org/D12205
llvm-svn: 245891
This allows us to remove a bunch of code in LTOCodeGenerator and llvm-lto
and has the side effect of improving error handling in the libLTO C API.
llvm-svn: 245756
folding the code into the main Analysis library.
There already wasn't much of a distinction between Analysis and IPA.
A number of the passes in Analysis are actually IPA passes, and there
doesn't seem to be any advantage to separating them.
Moreover, it makes it hard to have interactions between analyses that
are both local and interprocedural. In trying to make the Alias Analysis
infrastructure work with the new pass manager, it becomes particularly
awkward to navigate this split.
I've tried to find all the places where we referenced this, but I may
have missed some. I have also adjusted the C API to continue to be
equivalently functional after this change.
Differential Revision: http://reviews.llvm.org/D12075
llvm-svn: 245318
Summary:
Replace getDataLayout() with a createDataLayout() method to make
explicit that it is intended to create a DataLayout only and not
accessing it for other purpose.
This change is the last of a series of commits dedicated to have a
single DataLayout during compilation by using always the one owned
by the module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11103
(cherry picked from commit 5609fc56bca971e5a7efeaa6ca4676638eaec5ea)
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243114
This reverts commit 0f720d984f419c747709462f7476dff962c0bc41.
It breaks clang too badly, I need to prepare a proper patch for clang
first.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243089
Summary:
Replace getDataLayout() with a createDataLayout() method to make
explicit that it is intended to create a DataLayout only and not
accessing it for other purpose.
This change is the last of a series of commits dedicated to have a
single DataLayout during compilation by using always the one owned
by the module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11103
(cherry picked from commit 5609fc56bca971e5a7efeaa6ca4676638eaec5ea)
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243083
This is needed for COFF linkers to distinguish between weak external aliases
and regular symbols with LLVM weak linkage, which are represented as strong
symbols in COFF.
llvm-svn: 241389
This change unifies how LTOModule and the backend obtain linker flags
for globals: via a new TargetLoweringObjectFile member function named
emitLinkerFlagsForGlobal. A new function LTOModule::getLinkerOpts() returns
the list of linker flags as a single concatenated string.
This change affects the C libLTO API: the function lto_module_get_*deplibs now
exposes an empty list, and lto_module_get_*linkeropts exposes a single element
which combines the contents of all observed flags. libLTO should never have
tried to parse the linker flags; it is the linker's job to do so. Because
linkers will need to be able to parse flags in regular object files, it
makes little sense for libLTO to have a redundant mechanism for doing so.
The new API is compatible with the old one. It is valid for a user to specify
multiple linker flags in a single pragma directive like this:
#pragma comment(linker, "/defaultlib:foo /defaultlib:bar")
The previous implementation would not have exposed
either flag via lto_module_get_*deplibs (as the test in
TargetLoweringObjectFileCOFF::getDepLibFromLinkerOpt was case sensitive)
and would have exposed "/defaultlib:foo /defaultlib:bar" as a single flag via
lto_module_get_*linkeropts. This may have been a bug in the implementation,
but it does give us a chance to fix the interface.
Differential Revision: http://reviews.llvm.org/D10548
llvm-svn: 241010
Start using C++ types such as StringRef and MemoryBuffer in the C++ LTO
API. In doing so, clarify the ownership of the native object file: the caller
now owns it, not the LTOCodeGenerator. The C libLTO library has been modified
to use a derived class of LTOCodeGenerator that owns the object file.
Differential Revision: http://reviews.llvm.org/D10114
llvm-svn: 238776
Reverse libLTO's default behaviour for preserving use-list order in
bitcode, and add API for controlling it. The default setting is now
`false` (don't preserve them), which is consistent with `clang`'s
default behaviour.
Users of libLTO should call `lto_codegen_should_embed_uselists(CG,true)`
prior to calling `lto_codegen_write_merged_modules()` whenever the
output file isn't part of the production workflow in order to reproduce
results with subsequent calls to `llc`.
(I haven't added tests since `llvm-lto` (the test tool for LTO) doesn't
support bitcode output, and even if it did: there isn't actually a good
way to test whether a tool has passed the flag. If the order is already
"natural" (if the order will already round-trip) then no use-list
directives are emitted at all. At some point I'll circle back to add
tests to `llvm-as` (etc.) that they actually respect the flag, at which
point I can somehow add a test here as well.)
llvm-svn: 235943
When debugging LTO issues with ld64, we use -save-temps to save the merged
optimized bitcode file, then invoke ld64 again on the single bitcode file.
The saved bitcode file is already internalized, so we can call
lto_codegen_set_should_internalize and skip running internalization again.
rdar://20227235
llvm-svn: 235211
Remove all the global bits to do with preserving use-list order by
moving the `cl::opt`s to the individual tools that want them. There's a
minor functionality change to `libLTO`, in that you can't send in
`-preserve-bc-uselistorder=false`, but making that bit settable (if it's
worth doing) should be through explicit LTO API.
As a drive-by fix, I removed some includes of `UseListOrder.h` that were
made unnecessary by recent commits.
llvm-svn: 234973
Change the callers of `WriteToBitcodeFile()` to pass `true` or
`shouldPreserveBitcodeUseListOrder()` explicitly. I left the callers
that want to send `false` alone.
I'll keep pushing the bit higher until hopefully I can delete the global
`cl::opt` entirely.
llvm-svn: 234957
formatted_raw_ostream is a wrapper over another stream to add column and line
number tracking.
It is used only for asm printing.
This patch moves the its creation down to where we know we are printing
assembly. This has the following advantages:
* Simpler lifetime management: std::unique_ptr
* We don't compute column and line number of object files :-)
llvm-svn: 234535
Revert "Add classof implementations to the raw_ostream classes."
Revert "Use the cast machinery to remove dummy uses of formatted_raw_ostream."
The underlying issue can be fixed without classof.
llvm-svn: 234495
The input to compileOptimized is already optimized and internalized, so remove
internalize pass from compileOptimized.
rdar://20227235
llvm-svn: 234446
Remove `DebugInfoVerifierLegacyPass` and the `-verify-di` pass.
Instead, call into the `DebugInfoVerifier` from inside
`VerifierLegacyPass::finalizeModule()`. This better matches the logic
in `verifyModule()` (used by the new PassManager), avoids requiring two
separate passes to verify the IR, and makes the API for "add a pass to
verify the IR" simple.
Note: the `-verify-debug-info` flag still works (for now, at least;
eventually it might make sense to just remove it).
llvm-svn: 232772
This change also introduces a link-time optimization level of 1. This
optimization level runs only the globaldce pass as well as cleanup passes for
passes that run at -O0, specifically simplifycfg which cleans up lowerbitsets.
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150316/266951.html
llvm-svn: 232769
We only defer loading metadata inside ParseModule when ShouldLazyLoadMetadata
is true and we have not loaded any Metadata block yet.
This commit implements all-or-nothing loading of Metadata. If there is a
request to load any metadata block, we will load all deferred metadata blocks.
We make sure the deferred metadata blocks are loaded before we materialize any
function or a module.
The default value of the added parameter ShouldLazyLoadMetadata for
getLazyBitcodeModule is false, so the default behavior stays the same.
We only set the parameter to true when creating LTOModule in local contexts.
These can only really be used for parsing symbols, so it's unnecessary to ever
load the metadata blocks.
If we are going to enable lazy-loading of Metadata for other usages of
getLazyBitcodeModule, where deferred metadata blocks need to be loaded, we can
expose BitcodeReader::materializeMetadata to Module, similar to
Module::materialize.
rdar://19804575
llvm-svn: 232198
Summary:
DataLayout keeps the string used for its creation.
As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().
Get rid of DataLayoutPass: the DataLayout is in the Module
The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.
Make DataLayout Non-Optional in the Module
Module->getDataLayout() will never returns nullptr anymore.
Reviewers: echristo
Subscribers: resistor, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D7992
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
When debugging LTO issues with ld64, we use -save-temps to save the merged
optimized bitcode file, then invoke ld64 again on the single bitcode file to
speed up debugging code generation passes and ld64 stuff after code generation.
llvm linking a single bitcode file via lto_codegen_add_module will generate a
different bitcode file from the single input. With the newly-added
lto_codegen_set_module, we can make sure the destination module is the same as
the input.
lto_codegen_set_module will transfer the ownship of the module to code
generator.
rdar://19024554
llvm-svn: 230290
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.
This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.
The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".
llvm-svn: 229094
This allows IDEs to recognize the entire set of header files for
each of the core LLVM projects.
Differential Revision: http://reviews.llvm.org/D7526
Reviewed By: Chris Bieneman
llvm-svn: 228798
lto_codegen_compile_optimized. Also add lto_api_version.
Before this commit, we can only dump the optimized bitcode after running
lto_codegen_compile, but it includes some impacts of running codegen passes,
one example is StackProtector pass. We will get assertion failure when running
llc on the optimized bitcode, because StackProtector is effectively run twice.
After splitting lto_codegen_compile, the linker can choose to dump the bitcode
before running lto_codegen_compile_optimized.
lto_api_version is added so ld64 can check for runtime-availability of the new
API.
rdar://19565500
llvm-svn: 228000
terms of the new pass manager's TargetIRAnalysis.
Yep, this is one of the nicer bits of the new pass manager's design.
Passes can in many cases operate in a vacuum and so we can just nest
things when convenient. This is particularly convenient here as I can
now consolidate all of the TargetMachine logic on this analysis.
The most important change here is that this pushes the function we need
TTI for all the way into the TargetMachine, and re-creates the TTI
object for each function rather than re-using it for each function.
We're now prepared to teach the targets to produce function-specific TTI
objects with specific subtargets cached, etc.
One piece of feedback I'd love here is whether its worth renaming any of
this stuff. None of the names really seem that awesome to me at this
point, but TargetTransformInfoWrapperPass is particularly ... odd.
TargetIRAnalysisWrapper might make more sense. I would want to do that
rename separately anyways, but let me know what you think.
llvm-svn: 227731
base which it adds a single analysis pass to, to instead return the type
erased TargetTransformInfo object constructed for that TargetMachine.
This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more that code motion to make
types available in a header file for use in a different source file
within each target.
I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.
With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.
llvm-svn: 227685
analyses back into the LTO code generator.
The pass manager builder (and the transforms library in general)
shouldn't be referencing the target machine at all.
This makes the LTO population work like the others -- the data layout
and target transform info need to be pre-populated.
llvm-svn: 227576
accumulateAndSortLibcalls in LTOCodeGenerator.cpp collects names of runtime
library functions which are used to identify user-defined functions that should
be protected. Previously, this function would only scan the TargetLowering
object belonging to the "main" subtarget for the library function names. This
commit changes it to scan all per-function subtargets.
Differential Revision: http://reviews.llvm.org/D7275
llvm-svn: 227533
derived classes.
Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be layed out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.
*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.
llvm-svn: 227113
manager to support the actual uses of it. =]
When I ported instcombine to the new pass manager I discover that it
didn't work because TLI wasn't available in the right places. This is
a somewhat surprising and/or subtle aspect of the new pass manager
design that came up before but I think is useful to be reminded of:
While the new pass manager *allows* a function pass to query a module
analysis, it requires that the module analysis is already run and cached
prior to the function pass manager starting up, possibly with
a 'require<foo>' style utility in the pass pipeline. This is an
intentional hurdle because using a module analysis from a function pass
*requires* that the module analysis is run prior to entering the
function pass manager. Otherwise the other functions in the module could
be in who-knows-what state, etc.
A somewhat surprising consequence of this design decision (at least to
me) is that you have to design a function pass that leverages
a module analysis to do so as an optional feature. Even if that means
your function pass does no work in the absence of the module analysis,
you have to handle that possibility and remain conservatively correct.
This is a natural consequence of things being able to invalidate the
module analysis and us being unable to re-run it. And it's a generally
good thing because it lets us reorder passes arbitrarily without
breaking correctness, etc.
This ends up causing problems in one case. What if we have a module
analysis that is *definitionally* impossible to invalidate. In the
places this might come up, the analysis is usually also definitionally
trivial to run even while other transformation passes run on the module,
regardless of the state of anything. And so, it follows that it is
natural to have a hard requirement on such analyses from a function
pass.
It turns out, that TargetLibraryInfo is just such an analysis, and
InstCombine has a hard requirement on it.
The approach I've taken here is to produce an analysis that models this
flexibility by making it both a module and a function analysis. This
exposes the fact that it is in fact safe to compute at any point. We can
even make it a valid CGSCC analysis at some point if that is useful.
However, we don't want to have a copy of the actual target library info
state for each function! This state is specific to the triple. The
somewhat direct and blunt approach here is to turn TLI into a pimpl,
with the state and mutators in the implementation class and the query
routines primarily in the wrapper. Then the analysis can lazily
construct and cache the implementations, keyed on the triple, and
on-demand produce wrappers of them for each function.
One minor annoyance is that we will end up with a wrapper for each
function in the module. While this is a bit wasteful (one pointer per
function) it seems tolerable. And it has the advantage of ensuring that
we pay the absolute minimum synchronization cost to access this
information should we end up with a nice parallel function pass manager
in the future. We could look into trying to mark when analysis results
are especially cheap to recompute and more eagerly GC-ing the cached
results, or we could look at supporting a variant of analyses whose
results are specifically *not* cached and expected to just be used and
discarded by the consumer. Either way, these seem like incremental
enhancements that should happen when we start profiling the memory and
CPU usage of the new pass manager and not before.
The other minor annoyance is that if we end up using the TLI in both
a module pass and a function pass, those will be produced by two
separate analyses, and thus will point to separate copies of the
implementation state. While a minor issue, I dislike this and would like
to find a way to cleanly allow a single analysis instance to be used
across multiple IR unit managers. But I don't have a good solution to
this today, and I don't want to hold up all of the work waiting to come
up with one. This too seems like a reasonable thing to incrementally
improve later.
llvm-svn: 226981
This creates a small internal pass which runs the InstCombiner over
a function. This is the hard part of porting InstCombine to the new pass
manager, as at this point none of the code in InstCombine has access to
a Pass object any longer.
The resulting interface for the InstCombiner is pretty terrible. I'm not
planning on leaving it that way. The key thing missing is that we need
to separate the worklist from the combiner a touch more. Once that's
done, it should be possible for *any* part of LLVM to just create
a worklist with instructions, populate it, and then combine it until
empty. The pass will just be the (obvious and important) special case of
doing that for an entire function body.
For now, this is the first increment of factoring to make all of this
work.
llvm-svn: 226618
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.
This is in preparation for porting this analysis to the new pass
manager.
No functionality changed, and updates inbound for Clang and Polly.
llvm-svn: 226078
The bitcode reading interface used std::error_code to report an error to the
callers and it is the callers job to print diagnostics.
This is not ideal for error handling or diagnostic reporting:
* For error handling, all that the callers care about is 3 possibilities:
* It worked
* The bitcode file is corrupted/invalid.
* The file is not bitcode at all.
* For diagnostic, it is user friendly to include far more information
about the invalid case so the user can find out what is wrong with the
bitcode file. This comes up, for example, when a developer introduces a
bug while extending the format.
The compromise we had was to have a lot of error codes.
With this patch we use the DiagnosticHandler to communicate with the
human and std::error_code to communicate with the caller.
This allows us to have far fewer error codes and adds the infrastructure to
print better diagnostics. This is so because the diagnostics are printed when
he issue is found. The code that detected the problem in alive in the stack and
can pass down as much context as needed. As an example the patch updates
test/Bitcode/invalid.ll.
Using a DiagnosticHandler also moves the fatal/non-fatal error decision to the
caller. A simple one like llvm-dis can just use fatal errors. The gold plugin
needs a bit more complex treatment because of being passed non-bitcode files. An
hypothetical interactive tool would make all bitcode errors non-fatal.
llvm-svn: 225562
Start lazy-loading `LTOModule`s that own their contexts. These can only
really be used for parsing symbols, so its unnecessary to ever
materialize their functions.
I looked into using `IRObjectFile::create()` and optionally calling
`materializAllPermanently()` afterwards, but this turned out to be
awkward.
- The default target triple and data layout logic needs to happen
*before* the call to `IRObjectFile::IRObjectFile()`, but after
`Module` was created.
- I tried passing a lambda in to do the module initialization, but
this seemed to require threading the error message from
`TargetRegistry::lookupTarget()` through `std::error_code`.
- I also looked at setting `errMsg` directly from within the lambda,
but this didn't look any better.
(I guess there's a reason we weren't already using that function.)
llvm-svn: 224466
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532. Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.
I have a follow-up patch prepared for `clang`. If this breaks other
sub-projects, I apologize in advance :(. Help me compile it on Darwin
I'll try to fix it. FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.
This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.
Here's a quick guide for updating your code:
- `Metadata` is the root of a class hierarchy with three main classes:
`MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from
the `Value` class hierarchy. It is typeless -- i.e., instances do
*not* have a `Type`.
- `MDNode`'s operands are all `Metadata *` (instead of `Value *`).
- `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.
If you're referring solely to resolved `MDNode`s -- post graph
construction -- just use `MDNode*`.
- `MDNode` (and the rest of `Metadata`) have only limited support for
`replaceAllUsesWith()`.
As long as an `MDNode` is pointing at a forward declaration -- the
result of `MDNode::getTemporary()` -- it maintains a side map of its
uses and can RAUW itself. Once the forward declarations are fully
resolved RAUW support is dropped on the ground. This means that
uniquing collisions on changing operands cause nodes to become
"distinct". (This already happened fairly commonly, whenever an
operand went to null.)
If you're constructing complex (non self-reference) `MDNode` cycles,
you need to call `MDNode::resolveCycles()` on each node (or on a
top-level node that somehow references all of the nodes). Also,
don't do that. Metadata cycles (and the RAUW machinery needed to
construct them) are expensive.
- An `MDNode` can only refer to a `Constant` through a bridge called
`ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).
As a side effect, accessing an operand of an `MDNode` that is known
to be, e.g., `ConstantInt`, takes three steps: first, cast from
`Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
third, cast down to `ConstantInt`.
The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
metadata schema owners transition away from using `Constant`s when
the type isn't important (and they don't care about referring to
`GlobalValue`s).
In the meantime, I've added transitional API to the `mdconst`
namespace that matches semantics with the old code, in order to
avoid adding the error-prone three-step equivalent to every call
site. If your old code was:
MDNode *N = foo();
bar(isa <ConstantInt>(N->getOperand(0)));
baz(cast <ConstantInt>(N->getOperand(1)));
bak(cast_or_null <ConstantInt>(N->getOperand(2)));
bat(dyn_cast <ConstantInt>(N->getOperand(3)));
bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));
you can trivially match its semantics with:
MDNode *N = foo();
bar(mdconst::hasa <ConstantInt>(N->getOperand(0)));
baz(mdconst::extract <ConstantInt>(N->getOperand(1)));
bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2)));
bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3)));
bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));
and when you transition your metadata schema to `MDInt`:
MDNode *N = foo();
bar(isa <MDInt>(N->getOperand(0)));
baz(cast <MDInt>(N->getOperand(1)));
bak(cast_or_null <MDInt>(N->getOperand(2)));
bat(dyn_cast <MDInt>(N->getOperand(3)));
bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));
- A `CallInst` -- specifically, intrinsic instructions -- can refer to
metadata through a bridge called `MetadataAsValue`. This is a
subclass of `Value` where `getType()->isMetadataTy()`.
`MetadataAsValue` is the *only* class that can legally refer to a
`LocalAsMetadata`, which is a bridged form of non-`Constant` values
like `Argument` and `Instruction`. It can also refer to any other
`Metadata` subclass.
(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)
llvm-svn: 223802
Having two ways to do this doesn't seem terribly helpful and
consistently using the insert version (which we already has) seems like
it'll make the code easier to understand to anyone working with standard
data structures. (I also updated many references to the Entry's
key and value to use first() and second instead of getKey{Data,Length,}
and get/setValue - for similar consistency)
Also removes the GetOrCreateValue functions so there's less surface area
to StringMap to fix/improve/change/accommodate move semantics, etc.
llvm-svn: 222319
We used to always vectorize (slp and loop vectorize) in the LTO pass pipeline.
r220345 changed it so that we used the PassManager's fields 'LoopVectorize' and
'SLPVectorize' out of the desire to be able to disable vectorization using the
cl::opt flags 'vectorize-loops'/'slp-vectorize' which the before mentioned
fields default to.
Unfortunately, this turns off vectorization because those fields
default to false.
This commit adds flags to the LTO library to disable lto vectorization which
reconciles the desire to optionally disable vectorization during LTO and
the desired behavior of defaulting to enabled vectorization.
We really want tools to set PassManager flags directly to enable/disable
vectorization and not go the route via cl::opt flags *in*
PassManagerBuilder.cpp.
llvm-svn: 220652
r206400 and r209442 added remarks that are disabled by default.
However, if a diagnostic handler is registered, the remarks are sent
unfiltered to the handler. This is the right behaviour for clang, since
it has its own filters.
However, the diagnostic handler exposed in the LTO API receives only the
severity and message. It doesn't have the information to filter by pass
name. For LTO, disabled remarks should be filtered by the producer.
I've changed `LLVMContext::setDiagnosticHandler()` to take a `bool`
argument indicating whether to respect the built-in filters. This
defaults to `false`, so other consumers don't have a behaviour change,
but `LTOCodeGenerator::setDiagnosticHandler()` sets it to `true`.
To make this behaviour testable, I added a `-use-diagnostic-handler`
command-line option to `llvm-lto`.
This fixes PR21108.
llvm-svn: 218784
This format is simply a regular object file with the bitcode stored in a
section named ".llvmbc", plus any number of other (non-allocated) sections.
One immediate use case for this is to accommodate compilation processes
which expect the object file to contain metadata in non-allocated sections,
such as the ".go_export" section used by some Go compilers [1], although I
imagine that in the future we could consider compiling parts of the module
(such as large non-inlinable functions) directly into the object file to
improve LTO efficiency.
[1] http://golang.org/doc/install/gccgo#Imports
Differential Revision: http://reviews.llvm.org/D4371
llvm-svn: 218078
With this a DataLayoutPass can be reused for multiple modules.
Once we have doInitialization/doFinalization, it doesn't seem necessary to pass
a Module to the constructor.
Overall this change seems in line with the idea of making DataLayout a required
part of Module. With it the only way of having a DataLayout used is to add it
to the Module.
llvm-svn: 217548