2004-04-02 13:06:57 +08:00
//===- opt.cpp - The LLVM Modular Optimizer -------------------------------===//
2005-04-22 08:00:37 +08:00
//
2019-01-19 16:50:56 +08:00
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
2005-04-22 08:00:37 +08:00
//
2003-10-21 01:47:21 +08:00
//===----------------------------------------------------------------------===//
2001-06-07 04:29:01 +08:00
//
// Optimizations may be specified an arbitrary number of times on the command
2006-08-18 14:34:30 +08:00
// line. They are run in the order specified.
2001-06-07 04:29:01 +08:00
//
2001-10-18 14:05:15 +08:00
//===----------------------------------------------------------------------===//
2001-06-07 04:29:01 +08:00

2014-02-13 00:48:02 +08:00
#include "BreakpointPrinter.h"
2018-07-24 08:41:28 +08:00
#include "Debugify.h"
[PM] Add (very skeletal) support to opt for running the new pass
manager. I cannot emphasize enough that this is a WIP. =] I expect it
to change a great deal as things stabilize, but I think it's really
important to get *some* functionality here so that the infrastructure
can be tested more traditionally from the commandline.
The current design is looking something like this:
./bin/opt -passes='module(pass_a,pass_b,function(pass_c,pass_d))'
So rather than custom-parsed flags, there is a single flag with a string
argument that is parsed into the pass pipeline structure. This makes it
really easy to have nice structural properties that are very explicit.
There is one obvious and important shortcut. You can start off the
pipeline with a pass, and the minimal context of pass managers will be
built around the entire specified pipeline. This makes the common case
for tests super easy:
./bin/opt -passes=instcombine,sroa,gvn
But this won't introduce any of the complexity of the fully inferred old
system -- we only ever do this for the *entire* argument, and we only
look at the first pass. If the other passes don't fit in the pass
manager selected it is a hard error.
The other interesting aspect here is that I'm not relying on any
registration facilities. Such facilities may be unavoidable for
supporting plugins, but I have alternative ideas for plugins that I'd
like to try first. My plan is essentially to build everything without
registration until we hit an absolute requirement.
Instead of registration of pass names, there will be a library dedicated
to parsing pass names and the pass pipeline strings described above.
Currently, this is directly embedded into opt for simplicity as it is
very early, but I plan to eventually pull this into a library that opt,
bugpoint, and even Clang can depend on. It should end up as a good home
for things like the existing PassManagerBuilder as well.
There are a bunch of FIXMEs in the code for the parts of this that are
just stubbed out to make the patch more incremental. A quick list of
what's coming up directly after this:
- Support for function passes and building the structured nesting.
- Support for printing the pass structure, and FileCheck tests of all of
this code.
- The .def-file based pass name parsing.
- IR printing passes and the corresponding tests.
Some obvious things that I'm not going to do right now, but am
definitely planning on as the pass manager work gets a bit further:
- Pull the parsing into a library, including the builders.
- Thread the rest of the target stuff into the new pass manager.
- Wire support for the new pass manager up to llc.
- Plugin support.
Some things that I'd like to have, but are significantly lower on my
priority list. I'll get to these eventually, but they may also be places
where others want to contribute:
- Adding nice error reporting for broken pass pipeline descriptions.
- Typo-correction for pass names.
llvm-svn: 198998
2014-01-11 16:16:35 +08:00
#include "NewPMDriver.h"
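The pipeline string described in the message above is a small nested grammar: comma-separated names, any of which may be followed by a parenthesized sub-pipeline. Below is a minimal sketch of parsing it into a tree. It assumes plain names with no whitespace. PipelineElement, parseList and parsePipeline are hypothetical names used only for illustration. This is not the parser that landed in opt or in the later PassBuilder, and it does not model the "start with a bare pass" shortcut.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One node of a parsed pipeline (hypothetical type): a pass or pass-manager
// name plus an optional nested list, e.g. "function(pass_c,pass_d)".
// A std::vector of an incomplete element type is allowed since C++17.
struct PipelineElement {
  std::string Name;
  std::vector<PipelineElement> Inner;
};

// Parses "elem,elem,..." starting at Pos and stops before a ')' or at the
// end of the text. Returns false on an empty name or a missing ')'.
static bool parseList(const std::string &Text, std::size_t &Pos,
                      std::vector<PipelineElement> &Out) {
  while (true) {
    std::size_t Start = Pos;
    while (Pos < Text.size() && Text[Pos] != ',' && Text[Pos] != '(' &&
           Text[Pos] != ')')
      ++Pos;
    PipelineElement E;
    E.Name = Text.substr(Start, Pos - Start);
    if (E.Name.empty())
      return false;
    if (Pos < Text.size() && Text[Pos] == '(') {
      ++Pos; // consume '('
      if (!parseList(Text, Pos, E.Inner))
        return false;
      if (Pos >= Text.size() || Text[Pos] != ')')
        return false; // unbalanced parentheses
      ++Pos;          // consume ')'
    }
    Out.push_back(std::move(E));
    if (Pos < Text.size() && Text[Pos] == ',') {
      ++Pos; // another sibling follows
      continue;
    }
    return true;
  }
}

// Parses a whole pipeline string; unconsumed text such as a stray ')' is an
// error rather than something to guess around.
bool parsePipeline(const std::string &Text, std::vector<PipelineElement> &Out) {
  std::size_t Pos = 0;
  if (!parseList(Text, Pos, Out))
    return false;
  assert(Pos <= Text.size() && "parser never reads past the end");
  return Pos == Text.size();
}
```

For example, module(pass_a,pass_b,function(pass_c,pass_d)) parses to a single module element with three children. The last child is a function element with two children. A flat list such as instcombine,sroa,gvn parses to three top-level elements.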
2014-02-11 07:34:23 +08:00
#include "PassPrinters.h"
2012-12-04 18:44:52 +08:00
#include "llvm/ADT/Triple.h"
#include "llvm/Analysis/CallGraph.h"
2013-01-07 23:26:48 +08:00
#include "llvm/Analysis/CallGraphSCCPass.h"
2012-12-04 18:44:52 +08:00
#include "llvm/Analysis/LoopPass.h"
#include "llvm/Analysis/RegionPass.h"
2015-01-15 10:16:27 +08:00
#include "llvm/Analysis/TargetLibraryInfo.h"
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting it into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because the chaining-based delegation meant nothing checked for this.
A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is a *much* more natural fit for
the new pass manager. This will lay the groundwork for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
#include "llvm/Analysis/TargetTransformInfo.h"
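The type-erased, value-semantic interface described in the TTI message above follows the classic "concept/model" pattern. Below is a minimal sketch of that pattern, reduced to a single cost query. CostModel, getInstructionCost, FlatCosts and DivIsExpensive are hypothetical names. The real TargetTransformInfo has a much larger surface and a different internal layout.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a TTI-like query interface, reduced to one method.
class CostModel {
  // The abstract interface every wrapped implementation is adapted to.
  struct Concept {
    virtual ~Concept() = default;
    virtual unsigned getInstructionCost(const std::string &Opcode) const = 0;
    virtual std::unique_ptr<Concept> clone() const = 0;
  };

  // Adapts any type with a matching getInstructionCost() to Concept.
  template <typename ImplT> struct Model final : Concept {
    ImplT Impl;
    explicit Model(ImplT I) : Impl(std::move(I)) {}
    unsigned getInstructionCost(const std::string &Opcode) const override {
      return Impl.getInstructionCost(Opcode);
    }
    std::unique_ptr<Concept> clone() const override {
      return std::make_unique<Model>(Impl);
    }
  };

  std::unique_ptr<Concept> Self;

public:
  // Wrap any implementation type; no common base class is required.
  // The enable_if keeps this constructor from hijacking copies of CostModel.
  template <typename ImplT,
            typename = typename std::enable_if<!std::is_same<
                typename std::decay<ImplT>::type, CostModel>::value>::type>
  CostModel(ImplT Impl)
      : Self(std::make_unique<Model<ImplT>>(std::move(Impl))) {}

  // Value semantics: copying deep-copies the wrapped implementation.
  CostModel(const CostModel &Other)
      : Self(Other.Self ? Other.Self->clone() : nullptr) {}
  CostModel &operator=(const CostModel &Other) {
    if (this != &Other)
      Self = Other.Self ? Other.Self->clone() : nullptr;
    return *this;
  }
  CostModel(CostModel &&) = default;
  CostModel &operator=(CostModel &&) = default;

  unsigned getInstructionCost(const std::string &Opcode) const {
    assert(Self && "query on a moved-from CostModel");
    return Self->getInstructionCost(Opcode);
  }
};

// Two hypothetical implementations with unrelated types.
struct FlatCosts {
  unsigned getInstructionCost(const std::string &) const { return 1; }
};
struct DivIsExpensive {
  unsigned getInstructionCost(const std::string &Opcode) const {
    return Opcode == "udiv" ? 20 : 1;
  }
};
```

This sketch also shows the verbosity trade-off the message mentions: every forwarded query appears in Concept, in Model, and in the wrapper. In exchange, callers get a plain value they can copy and store without knowing the concrete implementation type.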
2014-01-13 15:38:24 +08:00
#include "llvm/Bitcode/BitcodeWriterPass.h"
2018-04-12 02:49:37 +08:00
#include "llvm/CodeGen/CommandFlags.inc"
2017-05-19 01:21:13 +08:00
#include "llvm/CodeGen/TargetPassConfig.h"
2018-04-30 22:59:11 +08:00
#include "llvm/Config/llvm-config.h"
2013-01-02 19:36:10 +08:00
#include "llvm/IR/DataLayout.h"
2015-03-28 06:04:28 +08:00
#include "llvm/IR/DebugInfo.h"
2014-01-12 19:10:32 +08:00
#include "llvm/IR/IRPrintingPasses.h"
2014-01-11 16:16:35 +08:00
#include "llvm/IR/LLVMContext.h"
2015-12-05 05:56:46 +08:00
#include "llvm/IR/LegacyPassManager.h"
2014-03-04 20:32:42 +08:00
#include "llvm/IR/LegacyPassNameParser.h"
2013-01-02 19:36:10 +08:00
#include "llvm/IR/Module.h"
2019-03-06 23:20:13 +08:00
#include "llvm/IR/RemarkStreamer.h"
2014-01-13 17:26:24 +08:00
#include "llvm/IR/Verifier.h"
2013-03-26 10:25:37 +08:00
#include "llvm/IRReader/IRReader.h"
2014-03-04 18:07:28 +08:00
#include "llvm/InitializePasses.h"
2013-01-11 05:56:40 +08:00
#include "llvm/LinkAllIR.h"
2013-01-19 16:03:47 +08:00
#include "llvm/LinkAllPasses.h"
2012-12-04 18:44:52 +08:00
#include "llvm/MC/SubtargetFeature.h"
2010-01-05 09:30:32 +08:00
#include "llvm/Support/Debug.h"
2014-04-30 07:26:49 +08:00
#include "llvm/Support/FileSystem.h"
2015-04-01 13:32:04 +08:00
#include "llvm/Support/Host.h"
2018-04-14 02:26:06 +08:00
#include "llvm/Support/InitLLVM.h"
2004-09-02 06:55:40 +08:00
#include "llvm/Support/PluginLoader.h"
2013-03-26 10:25:37 +08:00
#include "llvm/Support/SourceMgr.h"
2004-09-02 06:55:40 +08:00
#include "llvm/Support/SystemUtils.h"
2012-10-19 07:22:48 +08:00
#include "llvm/Support/TargetRegistry.h"
2012-10-25 01:23:50 +08:00
#include "llvm/Support/TargetSelect.h"
2012-12-04 18:44:52 +08:00
#include "llvm/Support/ToolOutputFile.h"
Output optimization remarks in YAML
(Re-committed after moving the template specialization under the yaml
namespace. GCC was complaining about this.)
This allows the data to be presented in various ways by an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did verify experimentally that the class hierarchy
starting at DiagnosticInfoOptimizationBase can be mapped back from the YAML
generated here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before, hotness is computed in the analysis pass instead of
DiagnosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282539
2016-09-28 04:55:07 +08:00
#include "llvm/Support/YAMLTraits.h"
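To make the YAML shape shown in the remarks message concrete, here is a minimal sketch that writes one remark in that layout with plain iostreams. Remark, writeRemarkYAML and remarkToYAML are hypothetical names. The patch itself maps DiagnosticInfoOptimizationBase through LLVM's YAML I/O. Unlike real YAML output, this sketch does no quoting or escaping of scalars.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, flattened remark record mirroring the fields shown above.
struct Remark {
  std::string Kind; // "Missed", "Passed", ...
  std::string Pass; // e.g. "inline"
  std::string Name; // e.g. "NotInlined"
  std::string File;
  unsigned Line = 0;
  unsigned Column = 0;
  std::string Function;
  unsigned Hotness = 0;
  // Ordered (name, value) pairs: named values such as "Callee"/"Caller"
  // interleaved with "String" fragments of the human-readable message.
  std::vector<std::pair<std::string, std::string>> Args;
};

// Writes one remark as a YAML document. Write-only and unescaped: values
// containing sequences such as ": " or " #" would need quoting.
void writeRemarkYAML(std::ostream &OS, const Remark &R) {
  assert(!R.Kind.empty() && "every remark needs a kind tag");
  OS << "--- !" << R.Kind << "\n"
     << "Pass: " << R.Pass << "\n"
     << "Name: " << R.Name << "\n"
     << "DebugLoc: { File: " << R.File << ", Line: " << R.Line
     << ", Column: " << R.Column << " }\n"
     << "Function: " << R.Function << "\n"
     << "Hotness: " << R.Hotness << "\n"
     << "Args:\n";
  for (const auto &Arg : R.Args)
    OS << "- " << Arg.first << ": " << Arg.second << "\n";
  OS << "...\n";
}

// Convenience wrapper that renders a remark to a string.
std::string remarkToYAML(const Remark &R) {
  std::ostringstream OS;
  writeRemarkYAML(OS, R);
  return OS.str();
}
```

Keeping Callee and Caller as separate named pairs lets a consumer read them directly, without parsing the "will not be inlined into" text. The message gives this as the point of the named values.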
2012-12-04 18:44:52 +08:00
#include "llvm/Target/TargetMachine.h"
2016-07-29 05:04:31 +08:00
#include "llvm/Transforms/Coroutines.h"
[PM] Port the always inliner to the new pass manager in a much more
minimal and boring form than the old pass manager's version.
This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is also ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:
- Array alloca merging
  - To support the above, bottom-up inlining with careful history
    tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
Instead, it focuses on inlining functions with that attribute.
The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.
The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solvable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.
The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense to have this complexity at O0 or not.
One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.
Anyways, hopefully a reasonable starting point for this pass.
Differential Revision: https://reviews.llvm.org/D23299
llvm-svn: 278896
2016-08-17 10:56:20 +08:00
#include "llvm/Transforms/IPO/AlwaysInliner.h"
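The always-inliner message argues that checking viability "exactly once per function instead of doing it for each call site" is cheaper. The toy model below illustrates that argument. Everything in it (ToyFunction, InlineStats, runToyAlwaysInliner) is hypothetical and performs no real IR transformation. It only shows the control flow: iterate over callees rather than call sites, and leave dead callees for a later global-dce run.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for IR (hypothetical): just enough state to show the control flow.
struct ToyFunction {
  std::string Name;
  bool AlwaysInline = false;          // carries the always_inline attribute
  bool Viable = true;                 // result of a (costly) viability analysis
  std::vector<std::string> CallSites; // one entry per call, naming the caller
};

struct InlineStats {
  unsigned ViabilityChecks = 0;
  unsigned CallsInlined = 0;
};

// Walks functions, not call sites: the viability question is asked once per
// always-inline callee, and a viable callee then has every call site inlined.
// Callees left without callers are not deleted; that is left to global-dce.
InlineStats runToyAlwaysInliner(std::vector<ToyFunction> &Functions) {
  InlineStats Stats;
  for (ToyFunction &F : Functions) {
    if (!F.AlwaysInline)
      continue;
    ++Stats.ViabilityChecks; // once per function, however many calls it has
    if (!F.Viable)
      continue;
    Stats.CallsInlined += static_cast<unsigned>(F.CallSites.size());
    F.CallSites.clear(); // each call is now replaced by the callee's body
  }
  return Stats;
}
```

Take a callee with three call sites and another callee that is not viable. The model performs two viability checks in total, not five.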
2011-08-03 05:50:24 +08:00
#include "llvm/Transforms/IPO/PassManagerBuilder.h"
2015-12-05 05:56:46 +08:00
#include "llvm/Transforms/Utils/Cloning.h"
2002-07-24 02:12:22 +08:00
#include <algorithm>
2012-12-04 18:44:52 +08:00
#include <memory>
2003-11-12 06:41:34 +08:00
using namespace llvm;
2014-01-13 11:08:40 +08:00
using namespace opt_tool;
2002-04-14 02:32:47 +08:00

2002-07-24 02:12:22 +08:00
// The OptimizationList is automatically populated with registered Passes by the
// PassNameParser.
2002-01-31 08:47:12 +08:00
//
2006-08-28 06:07:01 +08:00
static cl::list<const PassInfo*, bool, PassNameParser>
PassList(cl::desc("Optimizations available:"));
2001-06-07 04:29:01 +08:00
2014-01-11 16:16:35 +08:00
// This flag specifies a textual description of the optimization pass pipeline
// to run over the module. This flag switches opt to use the new pass manager
// infrastructure, completely disabling all of the flags specific to the old
// pass management.
static cl::opt<std::string> PassPipeline(
"passes",
cl::desc("A textual description of the pass pipeline for optimizing"),
cl::Hidden);

2002-07-24 02:12:22 +08:00
// Other command line options...
2002-01-31 08:47:12 +08:00
//
2003-05-23 04:13:16 +08:00
static cl::opt<std::string>
2009-08-22 07:29:40 +08:00
InputFilename(cl::Positional, cl::desc("<input bitcode file>"),
2006-08-18 14:34:30 +08:00
cl::init("-"), cl::value_desc("filename"));
2002-07-22 10:10:13 +08:00

2003-05-23 04:13:16 +08:00
static cl::opt<std::string>
2002-07-22 10:10:13 +08:00
OutputFilename("o", cl::desc("Override output filename"),
2010-08-19 01:40:10 +08:00
cl::value_desc("filename"));
2002-07-22 10:10:13 +08:00

static cl::opt<bool>
2009-08-25 23:34:52 +08:00
Force("f", cl::desc("Enable binary output on terminals"));
2002-07-22 10:10:13 +08:00

static cl::opt<bool>
PrintEachXForm("p", cl::desc("Print module after each transformation"));

2003-02-13 02:43:33 +08:00
static cl::opt<bool>
2003-02-27 04:00:41 +08:00
NoOutput("disable-output",
2007-07-06 01:07:56 +08:00
cl::desc("Do not write result bitcode file"), cl::Hidden);
2003-02-13 02:43:33 +08:00

2009-09-05 19:34:53 +08:00
static cl::opt<bool>
2009-10-15 04:01:39 +08:00
OutputAssembly("S", cl::desc("Write output as LLVM assembly"));
2009-09-05 19:34:53 +08:00

2016-12-16 08:26:30 +08:00
static cl::opt<bool>
OutputThinLTOBC("thinlto-bc",
cl::desc("Write output as ThinLTO-ready bitcode"));
[LTO] Record whether LTOUnit splitting is enabled in index
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).
The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.
This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.
Note that I have also changed the ThinLTOBitcodeWriter to gate the
module splitting on the value of this flag.
Reviewers: pcc
Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D53890
llvm-svn: 350948
2019-01-12 02:31:57 +08:00
static cl::opt<bool>
SplitLTOUnit("thinlto-split-lto-unit",
cl::desc("Enable splitting of a ThinLTO LTOUnit"));
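The link-time check in the EnableSplitLTOUnit message reduces to a small predicate over the linked summaries. Below is a minimal sketch of it. It assumes each summary exposes the propagated split bit and records whether its module uses llvm.type.test or llvm.type.checked.load. ToySummary and checkSplitLTOUnitConsistency are hypothetical names, not the ModuleSummaryIndex API, and the error text is illustrative rather than LLVM's actual diagnostic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-module summary bits; not the real ModuleSummaryIndex API.
struct ToySummary {
  std::string ModulePath;
  bool EnableSplitLTOUnit = false;         // propagated from the module flag
  bool HasTypeTestsOrCheckedLoads = false; // llvm.type.test / type.checked.load
};

// Returns an error string when the link needs whole-program visibility of
// the class hierarchy but the summaries disagree on EnableSplitLTOUnit;
// returns an empty string otherwise.
std::string
checkSplitLTOUnitConsistency(const std::vector<ToySummary> &Summaries,
                             bool RunningLowerTypeTestsOrWPD) {
  bool SawSplit = false, SawUnsplit = false, SawTypeTests = false;
  for (const ToySummary &S : Summaries) {
    if (S.EnableSplitLTOUnit)
      SawSplit = true;
    else
      SawUnsplit = true;
    SawTypeTests = SawTypeTests || S.HasTypeTestsOrCheckedLoads;
  }
  if (!(SawSplit && SawUnsplit))
    return ""; // every module agrees, nothing to report
  // A mismatch only matters when both conditions from the message hold.
  if (RunningLowerTypeTestsOrWPD && SawTypeTests)
    return "inconsistent EnableSplitLTOUnit values across linked modules";
  return "";
}
```

The check runs in two steps. It first asks whether the flag values are mixed at all. Only then does it ask whether this link actually needs whole-program visibility of the class hierarchy. A mixed link without type tests, or without LowerTypeTests or WPD, therefore still succeeds.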
[ThinLTO] Add support for emitting minimized bitcode for thin link
Summary:
The cumulative size of the bitcode files for a very large application
can be huge, particularly with -g. In a distributed build environment,
all of these files must be sent to the remote build node that performs
the thin link step, and this can exceed size limits.
The thin link actually only needs the summary along with a bitcode
symbol table. Until we have a proper bitcode symbol table, simply
stripping the debug metadata results in significant size reduction.
Add support for an option to additionally emit minimized bitcode
modules, just for use in the thin link step, which for now just strips
all debug metadata. I plan to add a cc1 option so this can be invoked
easily during the compile step.
However, care must be taken to ensure that these minimized thin link
bitcode files produce the same index as with the original bitcode files,
as these original bitcode files will be used in the backends.
Specifically:
1) The module hash used for caching is typically produced by hashing the
written bitcode, and we want to include the hash that would correspond
to the original bitcode file. This is because we want to ensure that
changes in the stripped portions affect caching. Added plumbing to emit
the same module hash in the minimized thin link bitcode file.
2) The module paths in the index are constructed from the module ID of
each thin-linked bitcode file, which is typically generated automatically from
the input file path. This is the path used for finding the modules to
import from, and obviously we need this to point to the original bitcode
files. Added gold-plugin support to take a suffix replacement during the
thin link that is used to override the identifier on the MemoryBufferRef
constructed from the loaded thin link bitcode file. The assumption is
that the build system can specify that the minimized bitcode file has a
name that is similar but uses a different suffix (e.g. out.thinlink.bc
instead of out.o).
Added various tests to ensure that we get identical index files out of
the thin link step.
Reviewers: mehdi_amini, pcc
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31027
llvm-svn: 298638
2017-03-24 03:47:39 +08:00
static cl::opt<std::string> ThinLinkBitcodeFile(
"thin-link-bitcode-file", cl::value_desc("filename"),
cl::desc(
"A file in which to write minimized bitcode for the thin link only"));
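The thin-link message describes overriding a module's identifier by swapping a file suffix, for example out.thinlink.bc becoming out.o, so that the index points at the original bitcode. That mapping is a plain string operation, sketched below. replaceModuleSuffix is a hypothetical helper. The sketch does not show how the old and new suffixes are actually passed to the gold plugin.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: if Path ends with OldSuffix, replace that suffix with
// NewSuffix (e.g. "dir/out.thinlink.bc" -> "dir/out.o"); otherwise return
// Path unchanged so non-matching inputs keep their original identifier.
std::string replaceModuleSuffix(const std::string &Path,
                                const std::string &OldSuffix,
                                const std::string &NewSuffix) {
  if (Path.size() < OldSuffix.size() ||
      Path.compare(Path.size() - OldSuffix.size(), OldSuffix.size(),
                   OldSuffix) != 0)
    return Path;
  return Path.substr(0, Path.size() - OldSuffix.size()) + NewSuffix;
}
```

Leaving non-matching paths untouched keeps this substitution safe for inputs that were never minimized. This relies on the assumption stated in the message: the build system names the minimized file after the original, changing only the suffix.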
2003-02-13 02:45:08 +08:00
static cl::opt<bool>
2016-03-10 14:58:53 +08:00
NoVerify("disable-verify", cl::desc("Do not run the verifier"), cl::Hidden);
2003-02-13 02:45:08 +08:00

2007-02-02 22:46:29 +08:00
static cl::opt<bool>
VerifyEach("verify-each", cl::desc("Verify after each transform"));

2016-04-19 23:48:30 +08:00
static cl::opt<bool>
DisableDITypeMap("disable-debug-info-type-map",
cl::desc("Don't use a uniquing type map for debug info"));

2007-02-02 22:46:29 +08:00
static cl::opt<bool>
StripDebug("strip-debug",
cl::desc("Strip debugger symbol info from translation unit"));

static cl::opt<bool>
2018-06-05 08:56:08 +08:00
StripNamedMetadata("strip-named-metadata",
cl::desc("Strip module-level named metadata"));

static cl::opt<bool> DisableInline("disable-inlining",
cl::desc("Do not run the inliner pass"));
2007-02-02 22:46:29 +08:00

2009-08-22 07:29:40 +08:00
static cl::opt<bool>
DisableOptimizations("disable-opt",
2007-02-02 22:46:29 +08:00
cl::desc("Do not run any optimization passes"));

2009-07-18 02:09:39 +08:00
static cl::opt<bool>
2009-08-22 07:29:40 +08:00
StandardLinkOpts("std-link-opts",
2009-07-18 02:09:39 +08:00
cl::desc("Include the standard link time optimizations"));

2016-08-06 00:27:33 +08:00
static cl::opt<bool>
OptLevelO0("O0",
cl::desc("Optimization level 0. Similar to clang -O0"));

2008-09-17 06:25:14 +08:00
static cl::opt<bool>
OptLevelO1("O1",
2012-05-16 16:32:49 +08:00
cl::desc("Optimization level 1. Similar to clang -O1"));
2008-09-17 06:25:14 +08:00

static cl::opt<bool>
OptLevelO2("O2",
2012-05-16 16:32:49 +08:00
cl::desc("Optimization level 2. Similar to clang -O2"));

static cl::opt<bool>
OptLevelOs("Os",
cl::desc("Like -O2 with extra optimizations for size. Similar to clang -Os"));

static cl::opt<bool>
OptLevelOz("Oz",
cl::desc("Like -Os but reduces code size further. Similar to clang -Oz"));
2008-09-17 06:25:14 +08:00

static cl::opt<bool>
OptLevelO3("O3",
2012-05-16 16:32:49 +08:00
cl::desc("Optimization level 3. Similar to clang -O3"));
2008-09-17 06:25:14 +08:00
|
|
|
|
2016-04-19 05:48:55 +08:00
|
|
|
static cl::opt<unsigned>
|
|
|
|
CodeGenOptLevel("codegen-opt-level",
|
|
|
|
cl::desc("Override optimization level for codegen hooks"));
|
|
|
|
|
2012-04-18 07:05:48 +08:00
static cl::opt<std::string>
TargetTriple("mtriple", cl::desc("Override target triple for module"));

2013-08-29 02:33:10 +08:00
static cl::opt<bool>
DisableLoopUnrolling("disable-loop-unrolling",
                     cl::desc("Disable loop unrolling in all relevant passes"),
                     cl::init(false));
2013-12-04 00:33:06 +08:00

static cl::opt<bool>
DisableSLPVectorization("disable-slp-vectorization",
                        cl::desc("Disable the slp vectorization pass"),
                        cl::init(false));

2016-04-13 05:35:18 +08:00
static cl::opt<bool> EmitSummaryIndex("module-summary",
                                      cl::desc("Emit module summary index"),
                                      cl::init(false));

static cl::opt<bool> EmitModuleHash("module-hash", cl::desc("Emit module hash"),
                                    cl::init(false));
2013-08-29 02:33:10 +08:00

2008-09-17 06:25:14 +08:00
static cl::opt<bool>
DisableSimplifyLibCalls("disable-simplify-libcalls",
2008-09-18 00:01:39 +08:00
                        cl::desc("Disable simplify-libcalls"));
2008-09-17 06:25:14 +08:00

2002-07-22 10:10:13 +08:00
static cl::opt<bool>
2004-05-28 04:32:10 +08:00
Quiet("q", cl::desc("Obsolete option"), cl::Hidden);
2002-07-22 10:10:13 +08:00

2004-05-28 00:28:54 +08:00
static cl::alias
QuietA("quiet", cl::desc("Alias for -q"), cl::aliasopt(Quiet));

2006-08-18 14:34:30 +08:00
static cl::opt<bool>
AnalyzeOnly("analyze", cl::desc("Only perform analysis, no optimization"));

2018-01-24 04:43:50 +08:00
static cl::opt<bool> EnableDebugify(
    "enable-debugify",
    cl::desc(
        "Start the pipeline with debugify and end it with check-debugify"));

2018-05-15 08:29:27 +08:00
static cl::opt<bool> DebugifyEach(
    "debugify-each",
    cl::desc(
        "Start each pass with debugify and end it with check-debugify"));

2018-07-24 08:41:29 +08:00
static cl::opt<std::string>
    DebugifyExport("debugify-export",
                   cl::desc("Export per-pass debugify statistics to this file"),
                   cl::value_desc("filename"), cl::init(""));

2010-12-07 08:33:43 +08:00
static cl::opt<bool>
2011-04-06 02:41:31 +08:00
PrintBreakpoints("print-breakpoints-for-testing",
2010-12-07 08:33:43 +08:00
                 cl::desc("Print select breakpoints location for testing"));

2017-02-18 01:36:52 +08:00
static cl::opt<std::string> ClDataLayout("data-layout",
                                         cl::desc("data layout string to use"),
                                         cl::value_desc("layout-string"),
                                         cl::init(""));
2009-10-22 08:44:10 +08:00

2015-04-15 11:14:06 +08:00
static cl::opt<bool> PreserveBitcodeUseListOrder(
    "preserve-bc-uselistorder",
    cl::desc("Preserve use-list order when writing LLVM bitcode."),
    cl::init(true), cl::Hidden);
2006-08-18 14:34:30 +08:00

2015-04-15 11:14:06 +08:00
static cl::opt<bool> PreserveAssemblyUseListOrder(
    "preserve-ll-uselistorder",
    cl::desc("Preserve use-list order when writing LLVM assembly."),
    cl::init(false), cl::Hidden);
2010-12-07 08:33:43 +08:00

2015-12-05 05:56:46 +08:00
static cl::opt<bool>
    RunTwice("run-twice",
             cl::desc("Run all passes twice, re-using the same pass manager."),
             cl::init(false), cl::Hidden);

Add a flag to the LLVMContext to disable name for Value other than GlobalValue
Summary:
This is intended to be a performance flag, on the same level as clang
cc1 option "--disable-free". LLVM will never initialize it by default,
it will be up to the client creating the LLVMContext to request this
behavior. Clang will do it by default in Release build (just like
--disable-free).
"opt" and "llc" can opt-in using -disable-named-value command line
option.
When performing LTO on llvm-tblgen, the initial merging of IR peaks
at 92MB without this patch and 86MB after this patch; setNameImpl()
drops from 6.5MB to 0.5MB.
The total link time goes from ~29.5s to ~27.8s.
Compared to a compile-time flag (like the IRBuilder one), it performs
very closely. I profiled SROA and obtained these results:
420ms with IRBuilder that preserve name
372ms with IRBuilder that strip name
375ms with IRBuilder that preserve name, and a runtime flag to strip
Reviewers: chandlerc, dexonsmith, bogner
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17946
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263086
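The runtime switch described above can be pictured with a small standalone model. This is a sketch only. `ToyContext` and `ToyValue` are hypothetical stand-ins, not LLVM's `LLVMContext`/`Value` classes. The sketch assumes the check lives in the name setter, so a discarded name never gets stored, while global values keep their names regardless of the flag.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the context-wide switch (not LLVM's API).
struct ToyContext {
  bool DiscardValueNames = false; // off by default; clients opt in
};

// Hypothetical value type: only globals are exempt from name discarding.
class ToyValue {
public:
  ToyValue(ToyContext &Ctx, bool IsGlobal) : Ctx(Ctx), IsGlobal(IsGlobal) {}

  void setName(const std::string &N) {
    // Return before copying the string when names are discarded; this is
    // where the memory and time savings in the measurements would come from.
    if (Ctx.DiscardValueNames && !IsGlobal)
      return;
    Name = N;
  }

  const std::string &getName() const { return Name; }

private:
  ToyContext &Ctx;
  bool IsGlobal;
  std::string Name;
};
```

With the flag set on the context, a local value's `setName("tmp")` leaves it unnamed, while a global still records its name.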
2016-03-10 09:28:54 +08:00
static cl::opt<bool> DiscardValueNames(
    "discard-value-names",
    cl::desc("Discard names from Value (other than GlobalValue)."),
    cl::init(false), cl::Hidden);

2016-07-29 05:04:31 +08:00
static cl::opt<bool> Coroutines(
    "enable-coroutines",
    cl::desc("Enable coroutine passes."),
    cl::init(false), cl::Hidden);

2016-07-16 01:23:20 +08:00
static cl::opt<bool> PassRemarksWithHotness(
    "pass-remarks-with-hotness",
    cl::desc("With PGO, include profile count in optimization remarks"),
    cl::Hidden);

2017-07-01 07:14:53 +08:00
static cl::opt<unsigned> PassRemarksHotnessThreshold(
    "pass-remarks-hotness-threshold",
    cl::desc("Minimum profile count required for an optimization remark to be output"),
    cl::Hidden);

Output optimization remarks in YAML
(Re-committed after moving the template specialization under the yaml
namespace. GCC was complaining about this.)
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did verify experimentally that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from the YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before, hotness is computed in the analysis pass instead of
DiagnosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282539
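The named-value (NV) idea can be shown without LLVM. The sketch below uses a hypothetical `ToyRemark` class and `NV` struct; neither is the real `OptimizationRemarkEmitter` API. Each streamed argument keeps its key, so the same remark can be rendered either as a plain message or as a YAML-like `Args` list that a tool can consume without parsing the text.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical named value: a key plus its printed value.
struct NV {
  std::string Key, Val;
  NV(std::string K, std::string V) : Key(std::move(K)), Val(std::move(V)) {}
};

// Hypothetical remark that records streamed arguments as (key, value) pairs.
// Plain strings get the key "String", matching the YAML example above.
class ToyRemark {
public:
  ToyRemark(std::string Pass, std::string Name)
      : Pass(std::move(Pass)), Name(std::move(Name)) {}

  ToyRemark &operator<<(const char *S) {
    Args.emplace_back("String", S);
    return *this;
  }
  ToyRemark &operator<<(const NV &V) {
    Args.emplace_back(V.Key, V.Val);
    return *this;
  }

  // Human-readable message: the values concatenated, keys ignored.
  std::string message() const {
    std::string M;
    for (const auto &A : Args)
      M += A.second;
    return M;
  }

  // YAML-like rendering in which every argument keeps its key.
  std::string yaml() const {
    std::string Y = "Pass: " + Pass + "\nName: " + Name + "\nArgs:\n";
    for (const auto &A : Args)
      Y += "  - " + A.first + ": " + A.second + "\n";
    return Y;
  }

private:
  std::string Pass, Name;
  std::vector<std::pair<std::string, std::string>> Args;
};
```

Streaming `NV("Callee", "foo") << " will not be inlined into " << NV("Caller", "baz")` yields the message `foo will not be inlined into baz`. The YAML-like form still lists `Callee` and `Caller` as separate keys.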
2016-09-28 04:55:07 +08:00
static cl::opt<std::string>
    RemarksFilename("pass-remarks-output",
                    cl::desc("YAML output filename for pass remarks"),
                    cl::value_desc("filename"));

2019-03-13 05:22:27 +08:00
static cl::opt<std::string>
    RemarksPasses("pass-remarks-filter",
                  cl::desc("Only record optimization remarks from passes whose "
                           "names match the given regular expression"),
                  cl::value_desc("regex"));

2019-01-17 07:19:02 +08:00
cl::opt<PGOKind>
    PGOKindFlag("pgo-kind", cl::init(NoPGO), cl::Hidden,
                cl::desc("The kind of profile guided optimization"),
                cl::values(clEnumValN(NoPGO, "nopgo", "Do not use PGO."),
                           clEnumValN(InstrGen, "pgo-instr-gen-pipeline",
                                      "Instrument the IR to generate profile."),
                           clEnumValN(InstrUse, "pgo-instr-use-pipeline",
                                      "Use instrumented profile to guide PGO."),
                           clEnumValN(SampleUse, "pgo-sample-use-pipeline",
                                      "Use sampled profile to guide PGO.")));
cl::opt<std::string> ProfileFile("profile-file",
                                 cl::desc("Path to the profile."), cl::Hidden);

2019-03-05 04:21:27 +08:00
cl::opt<CSPGOKind> CSPGOKindFlag(
    "cspgo-kind", cl::init(NoCSPGO), cl::Hidden,
    cl::desc("The kind of context sensitive profile guided optimization"),
    cl::values(
        clEnumValN(NoCSPGO, "nocspgo", "Do not use CSPGO."),
        clEnumValN(
            CSInstrGen, "cspgo-instr-gen-pipeline",
            "Instrument (context sensitive) the IR to generate profile."),
        clEnumValN(
            CSInstrUse, "cspgo-instr-use-pipeline",
            "Use instrumented (context sensitive) profile to guide PGO.")));
cl::opt<std::string> CSProfileGenFile(
    "cs-profilegen-file",
    cl::desc("Path to the instrumented context sensitive profile."),
    cl::Hidden);

2018-05-15 08:29:27 +08:00
class OptCustomPassManager : public legacy::PassManager {
2018-07-24 08:41:29 +08:00
  DebugifyStatsMap DIStatsMap;

2018-05-15 08:29:27 +08:00
public:
  using super = legacy::PassManager;

  void add(Pass *P) override {
2018-07-24 08:41:29 +08:00
    // Wrap each pass with (-check)-debugify passes if requested, making
    // exceptions for passes which shouldn't see -debugify instrumentation.
2018-06-04 08:11:49 +08:00
    bool WrapWithDebugify = DebugifyEach && !P->getAsImmutablePass() &&
                            !isIRPrintingPass(P) && !isBitcodeWriterPass(P);
2018-05-15 08:29:27 +08:00
    if (!WrapWithDebugify) {
      super::add(P);
      return;
    }
2018-07-24 08:41:29 +08:00

    // Apply -debugify/-check-debugify before/after each pass and collect
    // debug info loss statistics.
2018-05-15 08:29:27 +08:00
    PassKind Kind = P->getPassKind();
2018-07-24 08:41:29 +08:00
    StringRef Name = P->getPassName();

2018-05-15 08:29:27 +08:00
    // TODO: Implement Debugify for BasicBlockPass, LoopPass.
    switch (Kind) {
    case PT_Function:
      super::add(createDebugifyFunctionPass());
      super::add(P);
2018-07-24 08:41:29 +08:00
      super::add(createCheckDebugifyFunctionPass(true, Name, &DIStatsMap));
2018-05-15 08:29:27 +08:00
      break;
    case PT_Module:
      super::add(createDebugifyModulePass());
      super::add(P);
2018-07-24 08:41:29 +08:00
      super::add(createCheckDebugifyModulePass(true, Name, &DIStatsMap));
2018-05-15 08:29:27 +08:00
      break;
    default:
      super::add(P);
      break;
    }
  }
2018-07-24 08:41:29 +08:00

  const DebugifyStatsMap &getDebugifyStatsMap() const { return DIStatsMap; }
2018-05-15 08:29:27 +08:00
};
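The wrapping logic in `OptCustomPassManager::add` can be modeled outside LLVM as a pass schedule. In this sketch `ToyPass`, `ToyKind`, and `ToyPassManager` are hypothetical names. The model only records pass names in order. It omits the IR-printing and bitcode-writer exemptions and the statistics map, and it uses `Immutable` as a stand-in for `getAsImmutablePass()`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical pass kinds, loosely modeled on PT_Function / PT_Module.
enum class ToyKind { Function, Module, Loop };

struct ToyPass {
  std::string Name;
  ToyKind Kind;
  bool Immutable = false; // passes that should not be instrumented
};

// Records the order in which passes would run, wrapping eligible passes
// with debugify / check-debugify entries.
class ToyPassManager {
public:
  explicit ToyPassManager(bool DebugifyEach) : DebugifyEach(DebugifyEach) {}

  void add(const ToyPass &P) {
    if (!DebugifyEach || P.Immutable) {
      Schedule.push_back(P.Name);
      return;
    }
    switch (P.Kind) {
    case ToyKind::Function:
      Schedule.push_back("debugify-function");
      Schedule.push_back(P.Name);
      Schedule.push_back("check-debugify-function(" + P.Name + ")");
      break;
    case ToyKind::Module:
      Schedule.push_back("debugify-module");
      Schedule.push_back(P.Name);
      Schedule.push_back("check-debugify-module(" + P.Name + ")");
      break;
    default:
      // Other kinds are added unwrapped, as in the TODO above.
      Schedule.push_back(P.Name);
      break;
    }
  }

  std::vector<std::string> Schedule;

private:
  bool DebugifyEach;
};
```

Adding a function pass, an immutable pass, and a loop pass with `DebugifyEach` enabled produces five schedule entries. Only the function pass gains a debugify/check pair around it.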

2015-02-13 18:01:29 +08:00
static inline void addPass(legacy::PassManagerBase &PM, Pass *P) {
2007-02-02 22:46:29 +08:00
  // Add the pass to the pass manager...
  PM.add(P);

  // If we are verifying all of the intermediate steps, add the verifier...
2015-03-20 06:24:17 +08:00
  if (VerifyEach)
2014-04-16 00:27:38 +08:00
    PM.add(createVerifierPass());
2007-02-02 22:46:29 +08:00
}

2014-08-22 03:22:24 +08:00
/// This routine adds optimization passes based on selected optimization level,
/// OptLevel.
2008-09-17 06:25:14 +08:00
///
/// OptLevel - Optimization Level
2015-02-13 18:01:29 +08:00
static void AddOptimizationPasses(legacy::PassManagerBase &MPM,
                                  legacy::FunctionPassManager &FPM,
2016-04-28 03:08:24 +08:00
                                  TargetMachine *TM, unsigned OptLevel,
                                  unsigned SizeLevel) {
2016-03-10 11:40:14 +08:00
  if (!NoVerify || VerifyEach)
    FPM.add(createVerifierPass()); // Verify that input is correct
2011-12-08 01:14:20 +08:00

2011-05-22 08:21:33 +08:00
  PassManagerBuilder Builder;
  Builder.OptLevel = OptLevel;
2012-05-16 16:32:49 +08:00
  Builder.SizeLevel = SizeLevel;
2009-06-04 02:22:15 +08:00

2010-01-19 06:38:31 +08:00
  if (DisableInline) {
    // No inlining pass
2011-06-07 06:13:27 +08:00
  } else if (OptLevel > 1) {
2017-03-22 03:55:36 +08:00
    Builder.Inliner = createFunctionInliningPass(OptLevel, SizeLevel, false);
2010-01-19 06:38:31 +08:00
  } else {
[PM] Port the always inliner to the new pass manager in a much more
minimal and boring form than the old pass manager's version.
This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is also ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:
- Array alloca merging
- To support the above, bottom-up inlining with careful history
tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
Instead, it focuses on inlining functions with that attribute.
The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.
The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solvable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.
The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense to have this complexity at O0 or not.
One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.
Anyways, hopefully a reasonable starting point for this pass.
Differential Revision: https://reviews.llvm.org/D23299
llvm-svn: 278896
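The per-function versus per-call-site point can be made concrete by counting viability checks. `ToyFunction`, `isViable`, and the two strategy functions are hypothetical. Real inlining also has to update call graphs and handle dead functions, and none of that is modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function record: only the attribute the pass cares about.
struct ToyFunction {
  bool AlwaysInline;
};

// Hypothetical viability check that counts how often it is invoked.
static bool isViable(const ToyFunction &F, std::size_t &Checks) {
  ++Checks;
  return F.AlwaysInline;
}

// Per-call-site strategy: re-check the callee at every call.
std::size_t inlinePerCallSite(const std::vector<ToyFunction> &Fns,
                              const std::vector<std::size_t> &CallTargets,
                              std::size_t &Checks) {
  std::size_t Inlined = 0;
  for (std::size_t Callee : CallTargets)
    if (isViable(Fns[Callee], Checks))
      ++Inlined;
  return Inlined;
}

// Per-function strategy: decide once per function, then reuse the answer
// for every call site targeting it.
std::size_t inlinePerFunction(const std::vector<ToyFunction> &Fns,
                              const std::vector<std::size_t> &CallTargets,
                              std::size_t &Checks) {
  std::vector<bool> Viable;
  Viable.reserve(Fns.size());
  for (const ToyFunction &F : Fns)
    Viable.push_back(isViable(F, Checks));
  std::size_t Inlined = 0;
  for (std::size_t Callee : CallTargets)
    if (Viable[Callee])
      ++Inlined;
  return Inlined;
}
```

Both strategies inline the same calls. The per-function one performs one check per function instead of one per call site.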
2016-08-17 10:56:20 +08:00
    Builder.Inliner = createAlwaysInlinerLegacyPass();
2010-01-19 06:38:31 +08:00
  }
2013-08-29 02:33:10 +08:00
  Builder.DisableUnrollLoops = (DisableLoopUnrolling.getNumOccurrences() > 0) ?
                               DisableLoopUnrolling : OptLevel == 0;
2013-10-09 16:55:27 +08:00

2019-04-25 12:49:48 +08:00
  // Check if vectorization is explicitly disabled via -vectorize-loops=false.
  // The flag enables vectorization in the LoopVectorize pass, it is on by
  // default, and if it was disabled, leave it disabled here.
  // Another flag that exists: -loop-vectorize, controls adding the pass to the
  // pass manager. If set, the pass is added, and there is no additional check
  // here for it.
  if (Builder.LoopVectorize)
2013-12-06 05:20:02 +08:00
    Builder.LoopVectorize = OptLevel > 1 && SizeLevel < 2;

  // When #pragma vectorize is on for SLP, do the same as above
2013-12-04 00:33:06 +08:00
  Builder.SLPVectorize =
      DisableSLPVectorization ? false : OptLevel > 1 && SizeLevel < 2;
2013-08-29 02:33:10 +08:00

2016-04-28 03:08:24 +08:00
  if (TM)
2017-01-27 00:49:08 +08:00
    TM->adjustPassManager(Builder);
2016-04-28 03:08:24 +08:00

2016-07-29 05:04:31 +08:00
  if (Coroutines)
    addCoroutinePassesToExtensionPoints(Builder);

2019-01-17 07:19:02 +08:00
  switch (PGOKindFlag) {
  case InstrGen:
    Builder.EnablePGOInstrGen = true;
    Builder.PGOInstrGen = ProfileFile;
    break;
  case InstrUse:
    Builder.PGOInstrUse = ProfileFile;
    break;
  case SampleUse:
    Builder.PGOSampleUse = ProfileFile;
    break;
  default:
    break;
  }

2019-03-05 04:21:27 +08:00
  switch (CSPGOKindFlag) {
  case CSInstrGen:
    Builder.EnablePGOCSInstrGen = true;
    break;
2019-03-05 05:00:28 +08:00
  case CSInstrUse:
2019-03-05 04:21:27 +08:00
    Builder.EnablePGOCSInstrUse = true;
    break;
  default:
    break;
  }

2011-05-22 08:21:33 +08:00
  Builder.populateFunctionPassManager(FPM);
  Builder.populateModulePassManager(MPM);
2008-09-17 06:25:14 +08:00
}
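The unrolling and vectorization decisions in `AddOptimizationPasses` reduce to a few boolean expressions. The sketch below restates them in standalone C++. `ToyFlags` and `ToyBuilderSettings` are hypothetical, and `LoopVectorizeDefault` stands in for whatever value `Builder.LoopVectorize` held on entry. The sketch also assumes that `SizeLevel` 2 corresponds to `-Oz`.

```cpp
#include <cassert>

// Hypothetical inputs mirroring the flags consulted above.
struct ToyFlags {
  unsigned OptLevel = 0;
  unsigned SizeLevel = 0;               // assumed: 1 for -Os, 2 for -Oz
  bool DisableLoopUnrollingSet = false; // was -disable-loop-unrolling given?
  bool DisableLoopUnrolling = false;
  bool DisableSLPVectorization = false;
  bool LoopVectorizeDefault = true;     // Builder.LoopVectorize on entry
};

struct ToyBuilderSettings {
  bool DisableUnrollLoops;
  bool LoopVectorize;
  bool SLPVectorize;
};

ToyBuilderSettings computeSettings(const ToyFlags &F) {
  ToyBuilderSettings S;
  // An explicit -disable-loop-unrolling wins; otherwise unroll above -O0.
  S.DisableUnrollLoops =
      F.DisableLoopUnrollingSet ? F.DisableLoopUnrolling : F.OptLevel == 0;
  // Loop vectorization stays off if it was already off; otherwise it is
  // enabled only for OptLevel > 1 with SizeLevel < 2.
  S.LoopVectorize =
      F.LoopVectorizeDefault && F.OptLevel > 1 && F.SizeLevel < 2;
  // SLP vectorization follows the same level rule unless disabled outright.
  S.SLPVectorize =
      !F.DisableSLPVectorization && F.OptLevel > 1 && F.SizeLevel < 2;
  return S;
}
```

Under these assumptions, `-O0` disables unrolling and both vectorizers, `-O2` enables all three, and a `SizeLevel` of 2 turns both vectorizers off.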

2015-02-13 18:01:29 +08:00
static void AddStandardLinkPasses(legacy::PassManagerBase &PM) {
2011-05-22 08:21:33 +08:00
  PassManagerBuilder Builder;
2014-08-22 04:03:44 +08:00
  Builder.VerifyInput = true;
  if (DisableOptimizations)
    Builder.OptLevel = 0;

2014-08-21 21:35:30 +08:00
  if (!DisableInline)
    Builder.Inliner = createFunctionInliningPass();
  Builder.populateLTOPassManager(PM);
2009-07-18 02:09:39 +08:00
}

2012-10-19 07:22:48 +08:00
//===----------------------------------------------------------------------===//
// CodeGen-related helper functions.
//

2015-03-10 00:23:46 +08:00
static CodeGenOpt::Level GetCodeGenOptLevel() {
2016-04-19 05:48:55 +08:00
  if (CodeGenOptLevel.getNumOccurrences())
    return static_cast<CodeGenOpt::Level>(unsigned(CodeGenOptLevel));
2012-10-19 07:22:48 +08:00
  if (OptLevelO1)
    return CodeGenOpt::Less;
  if (OptLevelO2)
    return CodeGenOpt::Default;
  if (OptLevelO3)
    return CodeGenOpt::Aggressive;
  return CodeGenOpt::None;
}
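`GetCodeGenOptLevel` gives an explicit `-codegen-opt-level` priority and then checks `-O1`, `-O2`, and `-O3` in that order. A standalone sketch of that precedence follows. `ToyCGLevel` and `ToyOptFlags` are hypothetical mirrors of `CodeGenOpt::Level` and the `cl::opt` globals, and the sketch assumes the same 0 to 3 numbering. As written above, `-Os` and `-Oz` are not consulted, so they fall through to `None`.

```cpp
#include <cassert>

// Hypothetical mirror of the four codegen levels (assumed numbering 0..3).
enum class ToyCGLevel : unsigned { None = 0, Less = 1, Default = 2, Aggressive = 3 };

struct ToyOptFlags {
  bool CodeGenOptLevelGiven = false; // stands in for getNumOccurrences() > 0
  unsigned CodeGenOptLevel = 0;
  bool O1 = false, O2 = false, O3 = false;
};

ToyCGLevel getCodeGenLevel(const ToyOptFlags &F) {
  // An explicit override takes precedence over every -O flag
  // (no range check, as in the original cast).
  if (F.CodeGenOptLevelGiven)
    return static_cast<ToyCGLevel>(F.CodeGenOptLevel);
  if (F.O1)
    return ToyCGLevel::Less;
  if (F.O2)
    return ToyCGLevel::Default;
  if (F.O3)
    return ToyCGLevel::Aggressive;
  return ToyCGLevel::None;
}
```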

// Returns the TargetMachine instance, or nullptr if target lookup fails
// (for example, when the module does not specify a triple).
2015-05-07 07:49:24 +08:00
static TargetMachine* GetTargetMachine(Triple TheTriple, StringRef CPUStr,
2015-05-23 09:14:08 +08:00
                                       StringRef FeaturesStr,
                                       const TargetOptions &Options) {
2012-10-19 07:22:48 +08:00
  std::string Error;
  const Target *TheTarget = TargetRegistry::lookupTarget(MArch, TheTriple,
                                                         Error);
2013-01-01 16:00:32 +08:00
  // Some modules don't specify a triple, and this is okay.
2013-04-15 15:31:37 +08:00
  if (!TheTarget) {
2014-04-25 12:24:47 +08:00
    return nullptr;
2013-04-15 15:31:37 +08:00
  }
2012-10-19 07:22:48 +08:00

2016-05-19 06:04:49 +08:00
  return TheTarget->createTargetMachine(TheTriple.getTriple(), CPUStr,
                                        FeaturesStr, Options, getRelocModel(),
2017-08-03 10:16:21 +08:00
                                        getCodeModel(), GetCodeGenOptLevel());
2012-10-19 07:22:48 +08:00
}
2001-06-07 04:29:01 +08:00

2014-03-14 12:04:14 +08:00
#ifdef LINK_POLLY_INTO_TOOLS
namespace polly {
void initializePollyPasses(llvm::PassRegistry &Registry);
}
#endif

2002-07-24 02:12:22 +08:00
//===----------------------------------------------------------------------===//
// main for opt
//
2001-07-23 10:35:57 +08:00
int main(int argc, char **argv) {
2018-04-14 02:26:06 +08:00
  InitLLVM X(argc, argv);
2010-09-01 22:20:41 +08:00

2010-01-05 09:30:32 +08:00
  // Enable debug stream buffering.
  EnableDebugBuffering = true;

2016-04-15 05:59:01 +08:00
  LLVMContext Context;
2011-04-06 02:41:31 +08:00

2012-10-25 01:23:50 +08:00
  InitializeAllTargets();
  InitializeAllTargetMCs();
2014-06-14 00:12:08 +08:00
  InitializeAllAsmPrinters();
2016-11-15 01:12:32 +08:00
  InitializeAllAsmParsers();
2012-10-25 01:23:50 +08:00

2010-10-20 01:21:58 +08:00
  // Initialize passes
  PassRegistry &Registry = *PassRegistry::getPassRegistry();
  initializeCore(Registry);
2016-07-29 05:04:31 +08:00
  initializeCoroutines(Registry);
2010-10-20 01:21:58 +08:00
  initializeScalarOpts(Registry);
2013-01-28 09:35:51 +08:00
  initializeObjCARCOpts(Registry);
2012-02-01 11:51:43 +08:00
  initializeVectorization(Registry);
2010-10-20 01:21:58 +08:00
  initializeIPO(Registry);
  initializeAnalysis(Registry);
  initializeTransformUtils(Registry);
  initializeInstCombine(Registry);
2018-04-24 08:05:21 +08:00
  initializeAggressiveInstCombine(Registry);
2010-10-20 01:21:58 +08:00
  initializeInstrumentation(Registry);
  initializeTarget(Registry);
2014-02-22 08:07:45 +08:00
  // For codegen passes, only passes that do IR to IR transformation are
2014-04-18 02:22:47 +08:00
  // supported.
2017-11-03 20:12:27 +08:00
  initializeExpandMemCmpPassPass(Registry);
2017-05-15 19:30:54 +08:00
  initializeScalarizeMaskedMemIntrinPass(Registry);
2014-02-22 08:07:45 +08:00
  initializeCodeGenPreparePass(Registry);
2014-08-22 05:50:01 +08:00
  initializeAtomicExpandPass(Registry);
2016-07-26 04:52:00 +08:00
  initializeRewriteSymbolsLegacyPassPass(Registry);
2015-01-29 08:41:44 +08:00
  initializeWinEHPreparePass(Registry);
2015-02-19 07:17:41 +08:00
  initializeDwarfEHPreparePass(Registry);
2017-05-10 08:39:22 +08:00
  initializeSafeStackLegacyPassPass(Registry);
2015-07-10 05:48:40 +08:00
  initializeSjLjEHPreparePass(Registry);
2016-06-25 04:13:42 +08:00
  initializePreISelIntrinsicLoweringLegacyPassPass(Registry);
2016-05-19 12:38:56 +08:00
  initializeGlobalMergePass(Registry);
Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", which is one of the two halves of Spectre.
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processor's
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.
The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.
However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.
On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.
This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886
We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 the thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.
There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.
The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.
For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.
When manually applying transformations similar to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.
When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.
However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.
We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.
This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.
Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41723
llvm-svn: 323155
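The "directed conditional branches and a small search tree" alternative to jump tables can be illustrated at source level. This is only an analogy. The real change happens in the compiler's switch lowering, and `ToyCase` and `dispatchBySearchTree` are hypothetical names. Each step is a compare followed by a conditional branch, so no target address is loaded from a table for an indirect jump.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One switch case: a constant key and the id of the block it branches to.
struct ToyCase {
  int Key;
  int Target;
};

// Resolves a switch value by binary search over cases sorted by Key,
// mimicking a search-tree lowering: every decision is a comparison and a
// conditional branch rather than an indirect jump through a table.
// Returns DefaultTarget when no case matches.
int dispatchBySearchTree(const std::vector<ToyCase> &SortedCases, int Value,
                         int DefaultTarget) {
  std::size_t Lo = 0, Hi = SortedCases.size();
  while (Lo < Hi) {
    std::size_t Mid = Lo + (Hi - Lo) / 2;
    if (Value == SortedCases[Mid].Key)
      return SortedCases[Mid].Target;
    if (Value < SortedCases[Mid].Key)
      Hi = Mid;
    else
      Lo = Mid + 1;
  }
  return DefaultTarget;
}
```

With four cases, any value is resolved in at most three comparisons. The trade-off mentioned above (extra overhead on switch-heavy code) comes from replacing one table load and jump with this chain of branches.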
2018-01-23 06:05:25 +08:00
  initializeIndirectBrExpandPassPass(Registry);
2018-11-19 22:26:10 +08:00
  initializeInterleavedLoadCombinePass(Registry);
2016-05-20 04:08:32 +08:00
  initializeInterleavedAccessPass(Registry);
2017-11-15 05:09:45 +08:00
  initializeEntryExitInstrumenterPass(Registry);
  initializePostInlineEntryExitInstrumenterPass(Registry);
2016-07-08 11:32:49 +08:00
  initializeUnreachableBlockElimLegacyPassPass(Registry);
2017-05-10 17:42:49 +08:00
  initializeExpandReductionsPass(Registry);
2018-06-01 06:02:34 +08:00
  initializeWasmEHPreparePass(Registry);
2017-10-25 01:17:27 +08:00
  initializeWriteBitcodePassPass(Registry);
2011-04-06 02:41:31 +08:00

2014-03-14 12:04:14 +08:00
#ifdef LINK_POLLY_INTO_TOOLS
  polly::initializePollyPasses(Registry);
#endif

2009-10-22 08:46:41 +08:00
  cl::ParseCommandLineOptions(argc, argv,
    "llvm .bc -> .bc modular optimizer and analysis printer\n");
2004-12-30 13:36:08 +08:00

2010-12-03 04:35:16 +08:00
  if (AnalyzeOnly && NoOutput) {
    errs() << argv[0] << ": analyze mode conflicts with no-output mode.\n";
    return 1;
  }

2009-10-22 08:46:41 +08:00
  SMDiagnostic Err;
2004-12-30 13:36:08 +08:00

2016-03-10 09:28:54 +08:00
  Context.setDiscardValueNames(DiscardValueNames);
2016-04-19 23:48:30 +08:00
  if (!DisableDITypeMap)
    Context.enableDebugTypeODRUniquing();
2016-03-10 09:28:54 +08:00

2016-07-16 01:23:20 +08:00
  if (PassRemarksWithHotness)
[ORE] Unify spelling as "diagnostics hotness"
Summary:
To enable profile hotness information in diagnostics output, Clang takes
the option `-fdiagnostics-show-hotness` -- that's "diagnostics", with an
"s" at the end. Clang also defines `CodeGenOptions::DiagnosticsWithHotness`.
LLVM, on the other hand, defines
`LLVMContext::getDiagnosticHotnessRequested` -- that's "diagnostic", not
"diagnostics". It's a small difference, but it's confusing, typo-inducing, and
frustrating.
Add a new method with the spelling "diagnostics", and "deprecate" the
old spelling.
Reviewers: anemet, davidxl
Reviewed By: anemet
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D34864
llvm-svn: 306848
2017-07-01 02:13:59 +08:00
    Context.setDiagnosticsHotnessRequested(true);
2016-07-16 01:23:20 +08:00

2017-07-01 07:14:53 +08:00
  if (PassRemarksHotnessThreshold)
    Context.setDiagnosticsHotnessThreshold(PassRemarksHotnessThreshold);

2017-09-23 09:03:17 +08:00
  std::unique_ptr<ToolOutputFile> OptRemarkFile;
Output optimization remarks in YAML
(Re-committed after moving the template specialization under the yaml
namespace. GCC was complaining about this.)
This allows the data to be presented in various ways by an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once, but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did verify experimentally that the class hierarchy
starting at DiagnosticInfoOptimizationBase can be mapped back from the
YAML generated here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before, hotness is computed in the analysis pass instead of
DiagnosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282539
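The record layout in the example above can be reproduced with a small serializer driven by an NV-style streaming interface. This is a minimal sketch, not LLVM's YAML I/O-based implementation. The `NV` and `Remark` types and `toYAML` are hypothetical, and the field names are copied from the example output. No YAML quoting or escaping is done, so it only suits simple values like those shown.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical named value, modeled on the NV(...) usage in the message:
// a key plus its already-printed value.
struct NV {
  std::string Key, Val;
  NV(std::string K, std::string V) : Key(std::move(K)), Val(std::move(V)) {}
};

// Hypothetical remark record; fields mirror the YAML example.
struct Remark {
  std::string Kind, Pass, Name, File, Function;
  unsigned Line = 0, Column = 0, Hotness = 0;
  std::vector<NV> Args; // free text is stored under the key "String"

  Remark &operator<<(const NV &A) {
    Args.push_back(A);
    return *this;
  }
  Remark &operator<<(const std::string &S) {
    Args.emplace_back("String", S);
    return *this;
  }
};

// Serializes one remark as a YAML document in the layout shown above.
std::string toYAML(const Remark &R) {
  std::ostringstream OS;
  OS << "--- !" << R.Kind << "\n"
     << "Pass: " << R.Pass << "\n"
     << "Name: " << R.Name << "\n"
     << "DebugLoc: { File: " << R.File << ", Line: " << R.Line
     << ", Column: " << R.Column << " }\n"
     << "Function: " << R.Function << "\n"
     << "Hotness: " << R.Hotness << "\n"
     << "Args:\n";
  for (const NV &A : R.Args)
    OS << "  - " << A.Key << ": " << A.Val << "\n";
  OS << "...\n";
  return OS.str();
}
```

The named arguments are kept separate from the free text. A consumer can therefore read `Callee` and `Caller` directly without parsing the message, which is the property the commit message highlights.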
2016-09-28 04:55:07 +08:00
  if (RemarksFilename != "") {
    std::error_code EC;
2017-09-23 09:03:17 +08:00
    OptRemarkFile =
        llvm::make_unique<ToolOutputFile>(RemarksFilename, EC, sys::fs::F_None);
2016-09-28 04:55:07 +08:00
    if (EC) {
      errs() << EC.message() << '\n';
      return 1;
    }
2019-03-06 23:20:13 +08:00
    Context.setRemarkStreamer(llvm::make_unique<RemarkStreamer>(
        RemarksFilename, OptRemarkFile->os()));
2019-03-13 05:22:27 +08:00

    if (!RemarksPasses.empty())
      if (Error E = Context.getRemarkStreamer()->setFilter(RemarksPasses)) {
        errs() << E << '\n';
        return 1;
      }
2016-09-28 04:55:07 +08:00
  }

2009-10-22 08:46:41 +08:00
  // Load the input module...
2017-10-03 02:31:29 +08:00
  std::unique_ptr<Module> M =
2018-01-31 06:32:39 +08:00
      parseIRFile(InputFilename, Err, Context, !NoVerify, ClDataLayout);
2009-08-22 07:29:40 +08:00

2014-12-12 15:52:09 +08:00
  if (!M) {
2011-10-16 12:47:35 +08:00
    Err.print(argv[0], errs());
2009-10-22 08:46:41 +08:00
    return 1;
  }
2002-01-21 06:54:45 +08:00

2015-03-28 06:04:28 +08:00
  // Strip debug info before running the verifier.
  if (StripDebug)
    StripDebugInfo(*M);

2018-06-05 08:56:08 +08:00
  // Erase module-level named metadata, if requested.
  if (StripNamedMetadata) {
    while (!M->named_metadata_empty()) {
      NamedMDNode *NMD = &*M->named_metadata_begin();
      M->eraseNamedMetadata(NMD);
    }
  }
2018-06-04 08:11:48 +08:00

2018-01-31 06:32:39 +08:00
  // If we are supposed to override the target triple or data layout, do so now.
  if (!TargetTriple.empty())
    M->setTargetTriple(Triple::normalize(TargetTriple));

2015-03-28 06:04:28 +08:00
  // Immediately run the verifier to catch any problems before starting up the
  // pass pipelines. Otherwise we can crash on broken code during
  // doInitialization().
  if (!NoVerify && verifyModule(*M, &errs())) {
2015-03-31 11:07:23 +08:00
    errs() << argv[0] << ": " << InputFilename
           << ": error: input module is broken!\n";
2015-03-28 06:04:28 +08:00
    return 1;
  }

2009-10-22 08:46:41 +08:00
  // Figure out what stream we are supposed to write to...
2017-09-23 09:03:17 +08:00
  std::unique_ptr<ToolOutputFile> Out;
  std::unique_ptr<ToolOutputFile> ThinLinkOut;
2010-08-19 01:42:59 +08:00
  if (NoOutput) {
2010-08-19 01:40:10 +08:00
    if (!OutputFilename.empty())
      errs() << "WARNING: The -o (output filename) option is ignored when\n"
2010-08-19 01:42:59 +08:00
                "the --disable-output option is used.\n";
2010-08-19 01:40:10 +08:00
  } else {
    // Default to standard output.
    if (OutputFilename.empty())
      OutputFilename = "-";

2014-08-26 02:16:47 +08:00
    std::error_code EC;
2017-09-23 09:03:17 +08:00
    Out.reset(new ToolOutputFile(OutputFilename, EC, sys::fs::F_None));
2014-08-26 02:16:47 +08:00
    if (EC) {
      errs() << EC.message() << '\n';
2010-08-19 01:40:10 +08:00
      return 1;
2001-06-07 04:29:01 +08:00
    }
[ThinLTO] Add support for emitting minimized bitcode for thin link
Summary:
The cumulative size of the bitcode files for a very large application
can be huge, particularly with -g. In a distributed build environment,
all of these files must be sent to the remote build node that performs
the thin link step, and this can exceed size limits.
The thin link actually only needs the summary along with a bitcode
symbol table. Until we have a proper bitcode symbol table, simply
stripping the debug metadata results in significant size reduction.
Add support for an option to additionally emit minimized bitcode
modules, just for use in the thin link step, which for now just strips
all debug metadata. I plan to add a cc1 option so this can be invoked
easily during the compile step.
However, care must be taken to ensure that these minimized thin link
bitcode files produce the same index as with the original bitcode files,
as these original bitcode files will be used in the backends.
Specifically:
1) The module hash used for caching is typically produced by hashing the
written bitcode, and we want to include the hash that would correspond
to the original bitcode file. This is because we want to ensure that
changes in the stripped portions affect caching. Added plumbing to emit
the same module hash in the minimized thin link bitcode file.
2) The module paths in the index are constructed from the module ID of
each thin-linked bitcode file, which is typically generated automatically from
the input file path. This is the path used for finding the modules to
import from, and obviously we need this to point to the original bitcode
files. Added gold-plugin support to take a suffix replacement during the
thin link that is used to override the identifier on the MemoryBufferRef
constructed from the loaded thin link bitcode file. The assumption is
that the build system can specify that the minimized bitcode file has a
name that is similar but uses a different suffix (e.g. out.thinlink.bc
instead of out.o).
Added various tests to ensure that we get identical index files out of
the thin link step.
Reviewers: mehdi_amini, pcc
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31027
llvm-svn: 298638
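Point 2 above depends on the minimized file and the original object differing only in their suffix. Under that assumption, the module identifier can be rewritten before it is recorded in the index. The sketch below shows such a rewrite. The helper name `replaceSuffix` is hypothetical, and the actual gold-plugin option handling from this commit is not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: if Path ends with OldSuffix, swap it for NewSuffix so
// the module ID recorded in the index points at the original bitcode file.
// Paths that do not end with OldSuffix are returned unchanged.
std::string replaceSuffix(const std::string &Path, const std::string &OldSuffix,
                          const std::string &NewSuffix) {
  if (Path.size() < OldSuffix.size() ||
      Path.compare(Path.size() - OldSuffix.size(), OldSuffix.size(),
                   OldSuffix) != 0)
    return Path;
  return Path.substr(0, Path.size() - OldSuffix.size()) + NewSuffix;
}
```

With the names from the message, `replaceSuffix("out.thinlink.bc", ".thinlink.bc", ".o")` yields `out.o`. The index therefore refers to the file the backends will actually import from.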
2017-03-24 03:47:39 +08:00

    if (!ThinLinkBitcodeFile.empty()) {
      ThinLinkOut.reset(
2017-09-23 09:03:17 +08:00
          new ToolOutputFile(ThinLinkBitcodeFile, EC, sys::fs::F_None));
2017-03-24 03:47:39 +08:00
      if (EC) {
        errs() << EC.message() << '\n';
        return 1;
      }
    }
2009-10-22 08:46:41 +08:00
  }
2002-04-19 03:55:25 +08:00

2015-02-01 18:11:22 +08:00
  Triple ModuleTriple(M->getTargetTriple());
2015-05-07 07:49:24 +08:00
  std::string CPUStr, FeaturesStr;
2015-02-01 18:11:22 +08:00
  TargetMachine *Machine = nullptr;
2015-05-23 09:14:08 +08:00
  const TargetOptions Options = InitTargetOptionsFromCodeGenFlags();
2015-05-23 09:12:26 +08:00

2015-05-07 07:49:24 +08:00
  if (ModuleTriple.getArch()) {
    CPUStr = getCPUStr();
    FeaturesStr = getFeaturesStr();
2015-05-23 09:14:08 +08:00
    Machine = GetTargetMachine(ModuleTriple, CPUStr, FeaturesStr, Options);
2019-03-06 07:10:28 +08:00
  } else if (ModuleTriple.getArchName() != "unknown" &&
             ModuleTriple.getArchName() != "") {
    errs() << argv[0] << ": unrecognized architecture '"
           << ModuleTriple.getArchName() << "' provided.\n";
    return 1;
2015-05-07 07:49:24 +08:00
  }

2015-02-01 18:11:22 +08:00
  std::unique_ptr<TargetMachine> TM(Machine);

2015-05-27 04:17:20 +08:00
  // Override function attributes based on CPUStr, FeaturesStr, and command line
  // flags.
  setFunctionAttributes(CPUStr, FeaturesStr, *M);
2015-05-07 07:54:14 +08:00

2009-10-22 08:46:41 +08:00
  // If the output is set to be emitted to standard out, and standard out is a
  // console, print out a warning message and refuse to do it. We don't
  // impress anyone by spewing tons of binary goo to a terminal.
2010-01-18 01:47:24 +08:00
  if (!Force && !NoOutput && !AnalyzeOnly && !OutputAssembly)
2010-09-01 22:20:41 +08:00
    if (CheckBitcodeOutputToConsole(Out->os(), !Quiet))
2009-10-22 08:46:41 +08:00
      NoOutput = true;

[LTO] Record whether LTOUnit splitting is enabled in index
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).
The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.
This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.
Note that I have also changed the ThinLTOBitcodeWriter to gate the
module splitting on the value of this flag.
Reviewers: pcc
Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D53890
llvm-svn: 350948
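The link-time rule in the message has three parts. The flag must be mixed across the linked summaries, condition 1 must hold, and condition 2 must hold. The sketch below encodes that rule as a single predicate. It is a toy model under stated assumptions: `ToySummary` and `splitLTOUnitMismatchIsError` are hypothetical, only the per-module flag is modeled, and it does not reflect how the real LTO code stores the flag on the summary index.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-module summary; only the value recorded from the
// "EnableSplitLTOUnit" module flag is modeled.
struct ToySummary {
  bool EnableSplitLTOUnit;
};

// Returns true when the link should fail. This requires three things: the
// linked summaries disagree on EnableSplitLTOUnit, LowerTypeTests or WPD is
// running (condition 1), and the code contains type tests or type checked
// loads (condition 2).
bool splitLTOUnitMismatchIsError(const std::vector<ToySummary> &Summaries,
                                 bool RunningLTTOrWPD, bool HasTypeTests) {
  bool SawEnabled = false, SawDisabled = false;
  for (const ToySummary &S : Summaries) {
    if (S.EnableSplitLTOUnit)
      SawEnabled = true;
    else
      SawDisabled = true;
  }
  return SawEnabled && SawDisabled && RunningLTTOrWPD && HasTypeTests;
}
```

A mixed link that needs no whole-program class-hierarchy visibility is accepted. This matches the message's point that the error fires only when both conditions hold.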
2019-01-12 02:31:57 +08:00
  if (OutputThinLTOBC)
    M->addModuleFlag(Module::Error, "EnableSplitLTOUnit", SplitLTOUnit);

2014-01-13 11:08:40 +08:00
  if (PassPipeline.getNumOccurrences() > 0) {
    OutputKind OK = OK_NoOutput;
    if (!NoOutput)
2017-06-01 09:02:12 +08:00
      OK = OutputAssembly
               ? OK_OutputAssembly
               : (OutputThinLTOBC ? OK_OutputThinLTOBitcode : OK_OutputBitcode);
2014-01-13 11:08:40 +08:00

2014-01-20 19:34:08 +08:00
    VerifierKind VK = VK_VerifyInAndOut;
    if (NoVerify)
      VK = VK_NoVerifier;
    else if (VerifyEach)
      VK = VK_VerifyEachPass;

[PM] Add (very skeletal) support to opt for running the new pass
manager. I cannot emphasize enough that this is a WIP. =] I expect it
to change a great deal as things stabilize, but I think it's really
important to get *some* functionality here so that the infrastructure
can be tested more traditionally from the commandline.
The current design is looking something like this:
./bin/opt -passes='module(pass_a,pass_b,function(pass_c,pass_d))'
So rather than custom-parsed flags, there is a single flag with a string
argument that is parsed into the pass pipeline structure. This makes it
really easy to have nice structural properties that are very explicit.
There is one obvious and important shortcut. You can start off the
pipeline with a pass, and the minimal context of pass managers will be
built around the entire specified pipeline. This makes the common case
for tests super easy:
./bin/opt -passes=instcombine,sroa,gvn
But this won't introduce any of the complexity of the fully inferred old
system -- we only ever do this for the *entire* argument, and we only
look at the first pass. If the other passes don't fit in the selected pass
manager, it is a hard error.
The other interesting aspect here is that I'm not relying on any
registration facilities. Such facilities may be unavoidable for
supporting plugins, but I have alternative ideas for plugins that I'd
like to try first. My plan is essentially to build everything without
registration until we hit an absolute requirement.
Instead of registration of pass names, there will be a library dedicated
to parsing pass names and the pass pipeline strings described above.
Currently, this is directly embedded into opt for simplicity as it is
very early, but I plan to eventually pull this into a library that opt,
bugpoint, and even Clang can depend on. It should end up as a good home
for things like the existing PassManagerBuilder as well.
There are a bunch of FIXMEs in the code for the parts of this that are
just stubbed out to make the patch more incremental. A quick list of
what's coming up directly after this:
- Support for function passes and building the structured nesting.
- Support for printing the pass structure, and FileCheck tests of all of
this code.
- The .def-file based pass name parsing.
- IR printing passes and the corresponding tests.
Some obvious things that I'm not going to do right now, but am
definitely planning on as the pass manager work gets a bit further:
- Pull the parsing into library, including the builders.
- Thread the rest of the target stuff into the new pass manager.
- Wire support for the new pass manager up to llc.
- Plugin support.
Some things that I'd like to have, but are significantly lower on my
priority list. I'll get to these eventually, but they may also be places
where others want to contribute:
- Adding nice error reporting for broken pass pipeline descriptions.
- Typo-correction for pass names.
llvm-svn: 198998
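The message describes a single `-passes=` string that is parsed into nested pass managers. When the string starts with a bare pass, the outermost manager is inferred from the first pass only. The sketch below is a minimal recursive-descent parser for that grammar, and it is not the PassBuilder implementation. All type and function names are hypothetical. A bare pass's kind is decided here by lookup in a caller-supplied set of function-pass names, whereas the message leaves .def-file based name parsing as future work.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A pass name, or a manager name ("module", "function") with nested elements.
struct PipelineElement {
  std::string Name;
  std::vector<PipelineElement> Inner;
};

// Grammar: list := element (',' element)* ; element := name [ '(' list ')' ]
class PipelineParser {
  std::string Text;
  size_t Pos = 0;

  bool parseList(std::vector<PipelineElement> &Out) {
    while (true) {
      PipelineElement E;
      while (Pos < Text.size() &&
             (std::isalnum(static_cast<unsigned char>(Text[Pos])) ||
              Text[Pos] == '_' || Text[Pos] == '-'))
        E.Name += Text[Pos++];
      if (E.Name.empty())
        return false; // empty element, e.g. "a,,b" or "()"
      if (Pos < Text.size() && Text[Pos] == '(') {
        ++Pos;
        if (!parseList(E.Inner) || Pos >= Text.size() || Text[Pos] != ')')
          return false; // malformed or unbalanced nesting
        ++Pos;
      }
      Out.push_back(std::move(E));
      if (Pos < Text.size() && Text[Pos] == ',') {
        ++Pos;
        continue;
      }
      return true;
    }
  }

public:
  explicit PipelineParser(std::string T) : Text(std::move(T)) {}

  // Returns std::nullopt on any syntax error or trailing garbage.
  std::optional<std::vector<PipelineElement>> parse() {
    std::vector<PipelineElement> Elements;
    if (!parseList(Elements) || Pos != Text.size())
      return std::nullopt;
    return Elements;
  }
};

// Builds the implicit outermost manager by looking at the FIRST element only.
PipelineElement wrapPipeline(std::vector<PipelineElement> Elements,
                             const std::set<std::string> &FunctionPasses) {
  if (Elements.size() == 1 && Elements.front().Name == "module")
    return Elements.front(); // already fully explicit
  std::string Manager =
      FunctionPasses.count(Elements.front().Name) ? "function" : "module";
  return PipelineElement{Manager, std::move(Elements)};
}

// Prints the tree back in its explicit, fully nested form.
std::string render(const PipelineElement &E) {
  std::string S = E.Name;
  if (!E.Inner.empty()) {
    S += '(';
    for (size_t I = 0; I < E.Inner.size(); ++I) {
      if (I)
        S += ',';
      S += render(E.Inner[I]);
    }
    S += ')';
  }
  return S;
}
```

The shortcut never reorders or regroups anything. It only adds one enclosing manager around the whole argument. A later pass that does not fit that manager would then need to be reported as an error, a check this sketch leaves out.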
2014-01-11 16:16:35 +08:00
    // The user has asked to use the new pass manager and provided a pipeline
    // string. Hand off the rest of the functionality to the new code for that
    // layer.
2017-06-01 09:02:12 +08:00
    return runPassPipeline(argv[0], *M, TM.get(), Out.get(), ThinLinkOut.get(),
2017-08-20 09:30:45 +08:00
                           OptRemarkFile.get(), PassPipeline, OK, VK,
                           PreserveAssemblyUseListOrder,
2016-08-12 21:53:02 +08:00
                           PreserveBitcodeUseListOrder, EmitSummaryIndex,
2018-02-16 05:14:36 +08:00
                           EmitModuleHash, EnableDebugify)
2014-01-11 16:16:35 +08:00
               ? 0
               : 1;
2014-01-13 11:08:40 +08:00
  }
2014-01-11 16:16:35 +08:00

2009-10-22 08:46:41 +08:00
  // Create a PassManager to hold and optimize the collection of passes we are
2011-02-19 06:13:01 +08:00
  // about to build.
2018-05-15 08:29:27 +08:00
  OptCustomPassManager Passes;
  bool AddOneTimeDebugifyPasses = EnableDebugify && !DebugifyEach;
2009-10-22 08:46:41 +08:00

2011-02-19 06:13:01 +08:00
  // Add an appropriate TargetLibraryInfo pass for the module's triple.
2015-02-01 18:11:22 +08:00
  TargetLibraryInfoImpl TLII(ModuleTriple);
2011-04-06 02:41:31 +08:00

2011-02-19 06:34:03 +08:00
  // The -disable-simplify-libcalls flag actually disables all builtin optzns.
  if (DisableSimplifyLibCalls)
[PM] Rework how the TargetLibraryInfo pass integrates with the new pass
manager to support the actual uses of it. =]
When I ported instcombine to the new pass manager, I discovered that it
didn't work because TLI wasn't available in the right places. This is
a somewhat surprising and/or subtle aspect of the new pass manager
design that came up before but I think is useful to be reminded of:
While the new pass manager *allows* a function pass to query a module
analysis, it requires that the module analysis is already run and cached
prior to the function pass manager starting up, possibly with
a 'require<foo>' style utility in the pass pipeline. This is an
intentional hurdle because using a module analysis from a function pass
*requires* that the module analysis is run prior to entering the
function pass manager. Otherwise the other functions in the module could
be in who-knows-what state, etc.
A somewhat surprising consequence of this design decision (at least to
me) is that you have to design a function pass that leverages
a module analysis to do so as an optional feature. Even if that means
your function pass does no work in the absence of the module analysis,
you have to handle that possibility and remain conservatively correct.
This is a natural consequence of things being able to invalidate the
module analysis and us being unable to re-run it. And it's a generally
good thing because it lets us reorder passes arbitrarily without
breaking correctness, etc.
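To make the "optional module analysis" rule concrete, here is a minimal sketch of a function pass that only looks up a cached module result and does nothing when that result is absent. Every name in it (`ModuleInfo`, `ModuleAnalysisCache`, `runFunctionPass`) is hypothetical and does not model the new pass manager's actual proxy API; it only shows the control flow the paragraph above describes.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical module-level analysis result (not an LLVM type).
struct ModuleInfo {
  int NumGlobals = 0;
};

// Hypothetical stand-in for a proxy that can only look up an already-cached
// module result; it never computes one on demand from inside a function pass.
struct ModuleAnalysisCache {
  std::optional<ModuleInfo> Cached;
  const ModuleInfo *getCachedResult() const {
    return Cached ? &*Cached : nullptr;
  }
};

// A function "pass" that treats the module analysis as optional.
// Returns true if it (notionally) transformed the function.
bool runFunctionPass(const ModuleAnalysisCache &Cache, std::string &FuncBody) {
  const ModuleInfo *MI = Cache.getCachedResult();
  if (!MI)
    return false; // No cached result: leave the function untouched.
  if (MI->NumGlobals == 0)
    return false;
  FuncBody += " ; optimized";
  return true;
}
```

The design point is that the missing-analysis path is a normal, correct outcome rather than an error, which is what lets passes be reordered freely.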
This ends up causing problems in one case. What if we have a module
analysis that is *definitionally* impossible to invalidate? In the
places this might come up, the analysis is usually also definitionally
trivial to run even while other transformation passes run on the module,
regardless of the state of anything. And so, it follows that it is
natural to have a hard requirement on such analyses from a function
pass.
It turns out that TargetLibraryInfo is just such an analysis, and
InstCombine has a hard requirement on it.
The approach I've taken here is to produce an analysis that models this
flexibility by making it both a module and a function analysis. This
exposes the fact that it is in fact safe to compute at any point. We can
even make it a valid CGSCC analysis at some point if that is useful.
However, we don't want to have a copy of the actual target library info
state for each function! This state is specific to the triple. The
somewhat direct and blunt approach here is to turn TLI into a pimpl,
with the state and mutators in the implementation class and the query
routines primarily in the wrapper. Then the analysis can lazily
construct and cache the implementations, keyed on the triple, and
on-demand produce wrappers of them for each function.
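The pimpl-plus-cache arrangement described above can be sketched as follows, assuming a simple `std::map` keyed on the triple string. The class names are hypothetical and this is not the actual `TargetLibraryInfoImpl`/`TargetLibraryInfo` code; it only shows that one heavyweight implementation exists per triple while each function receives a pointer-sized wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical heavyweight state: one instance per target triple.
struct LibInfoImpl {
  std::string Triple;
  bool AllFunctionsDisabled = false; // state controlled by mutators
  explicit LibInfoImpl(std::string T) : Triple(std::move(T)) {}
  void disableAllFunctions() { AllFunctionsDisabled = true; }
};

// Hypothetical lightweight wrapper handed to each function: one pointer.
class LibInfo {
  const LibInfoImpl *Impl;

public:
  explicit LibInfo(const LibInfoImpl &I) : Impl(&I) {}
  bool has(const std::string &) const { return !Impl->AllFunctionsDisabled; }
  const LibInfoImpl *impl() const { return Impl; }
};

// Hypothetical analysis: lazily builds and caches one impl per triple and
// produces a fresh wrapper on demand for each function that asks.
class LibInfoAnalysis {
  std::map<std::string, std::unique_ptr<LibInfoImpl>> Impls;

public:
  LibInfo run(const std::string &Triple) {
    auto &Slot = Impls[Triple];
    if (!Slot)
      Slot = std::make_unique<LibInfoImpl>(Triple);
    return LibInfo(*Slot);
  }
  std::size_t numImpls() const { return Impls.size(); }
};
```

Two functions compiled for the same triple share one `LibInfoImpl`; the per-function cost is the single pointer inside `LibInfo`, matching the "one pointer per function" overhead discussed below.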
One minor annoyance is that we will end up with a wrapper for each
function in the module. While this is a bit wasteful (one pointer per
function) it seems tolerable. And it has the advantage of ensuring that
we pay the absolute minimum synchronization cost to access this
information should we end up with a nice parallel function pass manager
in the future. We could look into trying to mark when analysis results
are especially cheap to recompute and more eagerly GC-ing the cached
results, or we could look at supporting a variant of analyses whose
results are specifically *not* cached and expected to just be used and
discarded by the consumer. Either way, these seem like incremental
enhancements that should happen when we start profiling the memory and
CPU usage of the new pass manager and not before.
The other minor annoyance is that if we end up using the TLI in both
a module pass and a function pass, those will be produced by two
separate analyses, and thus will point to separate copies of the
implementation state. While a minor issue, I dislike this and would like
to find a way to cleanly allow a single analysis instance to be used
across multiple IR unit managers. But I don't have a good solution to
this today, and I don't want to hold up all of the work waiting to come
up with one. This too seems like a reasonable thing to incrementally
improve later.
llvm-svn: 226981
2015-01-24 10:06:09 +08:00
    TLII.disableAllFunctions();
  Passes.add(new TargetLibraryInfoWrapperPass(TLII));
2011-04-06 02:41:31 +08:00

Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTransformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least one
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but the
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
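The delegation-chaining idea (each layer forwards unanswered queries to the layer below, while every layer also remembers the top-most layer) can be sketched with plain virtual calls. This is a hypothetical model, not LLVM's analysis-group machinery: `CostInfo`, `NoCostInfo`, `TargetCostInfo`, and the two cost queries are invented for illustration.

```cpp
#include <cassert>

// Hypothetical analysis-group interface. Each layer delegates to the layer
// below it (Prev) for queries it does not answer, and keeps a pointer to the
// top-most layer (Top) so it can phrase one query in terms of another.
class CostInfo {
protected:
  CostInfo *Prev = nullptr; // next layer down; null for the base layer
  CostInfo *Top = this;     // top-most layer; updated as layers are pushed

public:
  virtual ~CostInfo() = default;
  // Push this layer on top of an existing stack and repoint every Top.
  void pushOnto(CostInfo &Below) {
    Prev = &Below;
    for (CostInfo *L = this; L; L = L->Prev)
      L->Top = this;
  }
  virtual int addCost() const { return Prev->addCost(); }
  virtual int vectorAddCost(int Width) const {
    return Prev->vectorAddCost(Width);
  }
};

// Base ("NoFoo"-style) layer: answers every query conservatively.
class NoCostInfo : public CostInfo {
public:
  int addCost() const override { return 1; }
  // Vector cost is phrased via the *top-most* layer's scalar answer.
  int vectorAddCost(int Width) const override {
    return Width * Top->addCost();
  }
};

// Hypothetical target layer: only knows a more precise scalar cost.
class TargetCostInfo : public CostInfo {
public:
  int addCost() const override { return 3; }
};
```

After `Tgt.pushOnto(Base)`, a vector query on `Tgt` falls through to `Base`, which multiplies by `Tgt`'s more precise scalar cost (4 lanes give 12 instead of 4). That is the "benefit when some other pass above them in the stack has more precise results" effect.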
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits; this one is clearly big enough.
Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
2013-01-07 09:37:14 +08:00
  // Add internal analysis passes from the target machine.
2015-02-01 20:26:09 +08:00
  Passes.add(createTargetTransformInfoWrapperPass(TM ? TM->getTargetIRAnalysis()
                                                     : TargetIRAnalysis()));
2012-10-19 07:22:48 +08:00

2018-05-15 08:29:27 +08:00
  if (AddOneTimeDebugifyPasses)
    Passes.add(createDebugifyModulePass());
2018-01-24 04:43:50 +08:00

2015-02-13 18:01:29 +08:00
  std::unique_ptr<legacy::FunctionPassManager> FPasses;
2016-08-06 00:27:33 +08:00
  if (OptLevelO0 || OptLevelO1 || OptLevelO2 || OptLevelOs || OptLevelOz ||
      OptLevelO3) {
2015-02-13 18:01:29 +08:00
    FPasses.reset(new legacy::FunctionPassManager(M.get()));
2015-01-31 19:17:59 +08:00
    FPasses->add(createTargetTransformInfoWrapperPass(
2015-02-01 20:26:09 +08:00
        TM ? TM->getTargetIRAnalysis() : TargetIRAnalysis()));
2009-10-22 08:46:41 +08:00
  }
2009-08-22 07:29:40 +08:00

2010-12-07 08:33:43 +08:00
  if (PrintBreakpoints) {
    // Default to standard output.
    if (!Out) {
      if (OutputFilename.empty())
        OutputFilename = "-";
2011-04-06 02:41:31 +08:00

2014-08-26 02:16:47 +08:00
      std::error_code EC;
2017-09-23 09:03:17 +08:00
      Out = llvm::make_unique<ToolOutputFile>(OutputFilename, EC,
                                              sys::fs::F_None);
2014-08-26 02:16:47 +08:00
      if (EC) {
        errs() << EC.message() << '\n';
2010-12-07 08:33:43 +08:00
        return 1;
      }
    }
2014-02-13 00:48:02 +08:00
    Passes.add(createBreakpointPrinter(Out->os()));
2010-12-07 08:33:43 +08:00
    NoOutput = true;
  }

2017-05-19 01:21:13 +08:00
  if (TM) {
2017-10-13 06:57:28 +08:00
    // FIXME: We should dyn_cast this when supported.
    auto &LTM = static_cast<LLVMTargetMachine &>(*TM);
    Pass *TPC = LTM.createPassConfig(Passes);
    Passes.add(TPC);
2017-05-19 01:21:13 +08:00
  }

2009-10-22 08:46:41 +08:00
  // Create a new optimization pass for each one specified on the command line
  for (unsigned i = 0; i < PassList.size(); ++i) {
    if (StandardLinkOpts &&
        StandardLinkOpts.getPosition() < PassList.getPosition(i)) {
2009-07-18 02:09:39 +08:00
      AddStandardLinkPasses(Passes);
      StandardLinkOpts = false;
2009-08-22 07:29:40 +08:00
    }
2009-07-18 02:09:39 +08:00

2016-08-06 00:27:33 +08:00
    if (OptLevelO0 && OptLevelO0.getPosition() < PassList.getPosition(i)) {
      AddOptimizationPasses(Passes, *FPasses, TM.get(), 0, 0);
      OptLevelO0 = false;
    }

2009-10-22 08:46:41 +08:00
    if (OptLevelO1 && OptLevelO1.getPosition() < PassList.getPosition(i)) {
2016-04-28 03:08:24 +08:00
      AddOptimizationPasses(Passes, *FPasses, TM.get(), 1, 0);
2009-10-22 08:46:41 +08:00
      OptLevelO1 = false;
2009-07-18 02:09:39 +08:00
    }
2008-09-17 06:25:14 +08:00

2009-10-22 08:46:41 +08:00
    if (OptLevelO2 && OptLevelO2.getPosition() < PassList.getPosition(i)) {
2016-04-28 03:08:24 +08:00
      AddOptimizationPasses(Passes, *FPasses, TM.get(), 2, 0);
2009-10-22 08:46:41 +08:00
      OptLevelO2 = false;
2009-07-18 02:09:39 +08:00
    }
2008-09-17 06:25:14 +08:00

2012-05-16 16:32:49 +08:00
    if (OptLevelOs && OptLevelOs.getPosition() < PassList.getPosition(i)) {
2016-04-28 03:08:24 +08:00
      AddOptimizationPasses(Passes, *FPasses, TM.get(), 2, 1);
2012-05-16 16:32:49 +08:00
      OptLevelOs = false;
    }

    if (OptLevelOz && OptLevelOz.getPosition() < PassList.getPosition(i)) {
2016-04-28 03:08:24 +08:00
      AddOptimizationPasses(Passes, *FPasses, TM.get(), 2, 2);
2012-05-16 16:32:49 +08:00
      OptLevelOz = false;
    }

2009-10-22 08:46:41 +08:00
    if (OptLevelO3 && OptLevelO3.getPosition() < PassList.getPosition(i)) {
2016-04-28 03:08:24 +08:00
      AddOptimizationPasses(Passes, *FPasses, TM.get(), 3, 0);
2009-10-22 08:46:41 +08:00
      OptLevelO3 = false;
2009-07-18 02:09:39 +08:00
    }
2008-09-17 06:25:14 +08:00

2009-10-22 08:46:41 +08:00
    const PassInfo *PassInf = PassList[i];
2014-04-25 12:24:47 +08:00
    Pass *P = nullptr;
2017-05-19 01:21:13 +08:00
    if (PassInf->getNormalCtor())
2009-10-22 08:46:41 +08:00
      P = PassInf->getNormalCtor()();
    else
      errs() << argv[0] << ": cannot create pass: "
             << PassInf->getPassName() << "\n";
    if (P) {
2010-02-18 20:57:05 +08:00
      PassKind Kind = P->getPassKind();
2009-10-22 08:46:41 +08:00
      addPass(Passes, P);

      if (AnalyzeOnly) {
2010-02-18 20:57:05 +08:00
        switch (Kind) {
2010-01-22 14:03:06 +08:00
        case PT_BasicBlock:
2014-02-11 07:34:23 +08:00
          Passes.add(createBasicBlockPassPrinter(PassInf, Out->os(), Quiet));
2010-01-22 14:03:06 +08:00
          break;
2010-10-20 09:54:44 +08:00
        case PT_Region:
2014-02-11 07:34:23 +08:00
          Passes.add(createRegionPassPrinter(PassInf, Out->os(), Quiet));
2010-10-20 09:54:44 +08:00
          break;
2010-01-22 14:03:06 +08:00
        case PT_Loop:
2014-02-11 07:34:23 +08:00
          Passes.add(createLoopPassPrinter(PassInf, Out->os(), Quiet));
2010-01-22 14:03:06 +08:00
          break;
        case PT_Function:
2014-02-11 07:34:23 +08:00
          Passes.add(createFunctionPassPrinter(PassInf, Out->os(), Quiet));
2010-01-22 14:03:06 +08:00
          break;
        case PT_CallGraphSCC:
2014-02-11 07:34:23 +08:00
          Passes.add(createCallGraphPassPrinter(PassInf, Out->os(), Quiet));
2010-01-22 14:03:06 +08:00
          break;
        default:
2014-02-11 07:34:23 +08:00
          Passes.add(createModulePassPrinter(PassInf, Out->os(), Quiet));
2010-01-22 14:03:06 +08:00
          break;
        }
2009-10-22 08:46:41 +08:00
      }
2008-09-17 06:25:14 +08:00
    }

2009-10-22 08:46:41 +08:00
    if (PrintEachXForm)
2015-04-15 11:14:06 +08:00
      Passes.add(
          createPrintModulePass(errs(), "", PreserveAssemblyUseListOrder));
2009-10-22 08:46:41 +08:00
  }
2002-02-21 01:56:53 +08:00

2009-10-22 08:46:41 +08:00
  if (StandardLinkOpts) {
    AddStandardLinkPasses(Passes);
    StandardLinkOpts = false;
  }
2001-06-07 04:29:01 +08:00

2016-08-06 00:27:33 +08:00
  if (OptLevelO0)
    AddOptimizationPasses(Passes, *FPasses, TM.get(), 0, 0);

2009-10-22 08:46:41 +08:00
  if (OptLevelO1)
2016-04-28 03:08:24 +08:00
    AddOptimizationPasses(Passes, *FPasses, TM.get(), 1, 0);
2009-10-22 08:46:41 +08:00

  if (OptLevelO2)
2016-04-28 03:08:24 +08:00
    AddOptimizationPasses(Passes, *FPasses, TM.get(), 2, 0);
2012-05-16 16:32:49 +08:00

  if (OptLevelOs)
2016-04-28 03:08:24 +08:00
    AddOptimizationPasses(Passes, *FPasses, TM.get(), 2, 1);
2012-05-16 16:32:49 +08:00

  if (OptLevelOz)
2016-04-28 03:08:24 +08:00
    AddOptimizationPasses(Passes, *FPasses, TM.get(), 2, 2);
2006-08-18 14:34:30 +08:00

2009-10-22 08:46:41 +08:00
  if (OptLevelO3)
2016-04-28 03:08:24 +08:00
    AddOptimizationPasses(Passes, *FPasses, TM.get(), 3, 0);
2009-10-22 08:46:41 +08:00

2016-06-15 18:32:00 +08:00
  if (FPasses) {
2011-05-22 14:44:19 +08:00
    FPasses->doInitialization();
2014-12-12 15:52:11 +08:00
    for (Function &F : *M)
      FPasses->run(F);
2011-05-22 14:44:19 +08:00
    FPasses->doFinalization();
  }
2009-10-22 08:46:41 +08:00

  // Check that the module is well formed on completion of optimization
2015-03-20 06:24:17 +08:00
  if (!NoVerify && !VerifyEach)
2009-10-22 08:46:41 +08:00
    Passes.add(createVerifierPass());

2018-05-15 08:29:27 +08:00
  if (AddOneTimeDebugifyPasses)
    Passes.add(createCheckDebugifyModulePass(false));
2018-01-24 04:43:50 +08:00

2015-12-05 05:56:46 +08:00
  // In run-twice mode, we want to make sure the output is bit-by-bit
  // equivalent if we run the pass manager again, so set up two buffers and
  // a stream to write to them. Note that llc does something similar and it
  // may be worth abstracting this out in the future.
  SmallVector<char, 0> Buffer;
2018-04-14 05:23:11 +08:00
  SmallVector<char, 0> FirstRunBuffer;
2015-12-05 05:56:46 +08:00
  std::unique_ptr<raw_svector_ostream> BOS;
2015-12-05 08:06:37 +08:00
  raw_ostream *OS = nullptr;
2015-12-05 05:56:46 +08:00

2010-08-19 01:42:59 +08:00
  // Write bitcode or assembly to the output as the last step...
2009-10-22 08:46:41 +08:00
  if (!NoOutput && !AnalyzeOnly) {
2015-12-05 08:06:37 +08:00
    assert(Out);
    OS = &Out->os();
    if (RunTwice) {
      BOS = make_unique<raw_svector_ostream>(Buffer);
      OS = BOS.get();
    }
2016-04-14 01:20:10 +08:00
    if (OutputAssembly) {
      if (EmitSummaryIndex)
        report_fatal_error("Text output is incompatible with -module-summary");
      if (EmitModuleHash)
        report_fatal_error("Text output is incompatible with -module-hash");
2015-12-05 05:56:46 +08:00
      Passes.add(createPrintModulePass(*OS, "", PreserveAssemblyUseListOrder));
2016-12-16 08:26:30 +08:00
    } else if (OutputThinLTOBC)
[ThinLTO] Add support for emitting minimized bitcode for thin link
Summary:
The cumulative size of the bitcode files for a very large application
can be huge, particularly with -g. In a distributed build environment,
all of these files must be sent to the remote build node that performs
the thin link step, and this can exceed size limits.
The thin link actually only needs the summary along with a bitcode
symbol table. Until we have a proper bitcode symbol table, simply
stripping the debug metadata results in significant size reduction.
Add support for an option to additionally emit minimized bitcode
modules, just for use in the thin link step, which for now just strips
all debug metadata. I plan to add a cc1 option so this can be invoked
easily during the compile step.
However, care must be taken to ensure that these minimized thin link
bitcode files produce the same index as with the original bitcode files,
as these original bitcode files will be used in the backends.
Specifically:
1) The module hash used for caching is typically produced by hashing the
written bitcode, and we want to include the hash that would correspond
to the original bitcode file. This is because we want to ensure that
changes in the stripped portions affect caching. Added plumbing to emit
the same module hash in the minimized thin link bitcode file.
2) The module paths in the index are constructed from the module ID of
each thin-linked bitcode file, and are typically generated automatically from
the input file path. This is the path used for finding the modules to
import from, and obviously we need this to point to the original bitcode
files. Added gold-plugin support to take a suffix replacement during the
thin link that is used to override the identifier on the MemoryBufferRef
constructed from the loaded thin link bitcode file. The assumption is
that the build system can specify that the minimized bitcode file has a
name that is similar but uses a different suffix (e.g. out.thinlink.bc
instead of out.o).
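The suffix replacement in point 2 amounts to rewriting a module identifier such as `out.thinlink.bc` back to `out.o`. The commit text does not give the gold-plugin option's syntax, so the helper below is a hypothetical sketch of the string operation only, not the plugin's code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: if Path ends with OldSuffix, replace that suffix with
// NewSuffix; otherwise return Path unchanged.
std::string replaceSuffix(const std::string &Path, const std::string &OldSuffix,
                          const std::string &NewSuffix) {
  if (Path.size() < OldSuffix.size() ||
      Path.compare(Path.size() - OldSuffix.size(), OldSuffix.size(),
                   OldSuffix) != 0)
    return Path;
  return Path.substr(0, Path.size() - OldSuffix.size()) + NewSuffix;
}
```

Leaving non-matching paths untouched is a deliberate choice in this sketch: an input that was not produced as a minimized thin-link file keeps its own identifier.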
Added various tests to ensure that we get identical index files out of
the thin link step.
Reviewers: mehdi_amini, pcc
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31027
llvm-svn: 298638
2017-03-24 03:47:39 +08:00
      Passes.add(createWriteThinLTOBitcodePass(
          *OS, ThinLinkOut ? &ThinLinkOut->os() : nullptr));
2016-12-16 08:26:30 +08:00
    else
2016-04-13 05:35:18 +08:00
      Passes.add(createBitcodeWriterPass(*OS, PreserveBitcodeUseListOrder,
                                         EmitSummaryIndex, EmitModuleHash));
2009-10-22 08:46:41 +08:00
  }

2011-04-06 02:54:36 +08:00
  // Before executing passes, print the final values of the LLVM options.
  cl::PrintOptionValues();

[DebugInfo][OPT] Fixing a couple of DI duplication bugs of CloneModule
As demonstrated by the regression tests added in this patch, the
following cases are valid cases:
1. A Function with no DISubprogram attached, but various debug info
related to its instructions, coming, for instance, from an inlined
function, also defined somewhere else in the same module;
2. ... or coming exclusively from the functions inlined and eliminated
from the module entirely.
The ValueMap shared between CloneFunctionInto calls within CloneModule
needs to contain identity mappings for all of the DISubprogram's to
prevent them from being duplicated by MapMetadata / RemapInstruction
calls; this is achieved via DebugInfoFinder collecting all the
DISubprogram's. However, CloneFunctionInto was missing calls into
DebugInfoFinder for functions w/o DISubprogram's attached, but still
referring to DISubprogram's from within (case 1). This patch fixes that.
The fix above, however, exposes another issue: if a module contains a
DISubprogram referenced only indirectly from other debug info
metadata, but not attached to any Function defined within the module
(case 2), cloning such a module causes a DICompileUnit duplication: it
will be moved in indirectly via a DISubprogram by DebugInfoFinder first
(because of the first bug fix described above), without being
self-mapped within the shared ValueMap, and then will be copied during
named metadata cloning. So this patch makes sure DebugInfoFinder
visits DICompileUnit's referenced from DISubprogram's as it goes w/o
re-processing llvm.dbg.cu list over and over again for every function
cloned, and makes sure that CloneFunctionInto self-maps
DICompileUnit's referenced from the entire function, not just its own
DISubprogram attached that may also be missing.
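The role of the identity mappings can be illustrated with a toy cloner: nodes pre-seeded in the map as mapping to themselves are shared, everything else is copied once and memoized. This is a hypothetical sketch (`Node`, `Cloner`, and `keepShared` are invented), not `CloneModule` or `ValueToValueMapTy`; it only shows why a missing identity entry turns a shared node into a duplicate.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical metadata-like node: a name plus references to other nodes.
struct Node {
  std::string Name;
  std::vector<const Node *> Refs;
};

// Hypothetical cloner. A node pre-seeded with an identity mapping
// (Map[N] == N) is shared rather than duplicated; anything else is copied
// once and memoized, so repeated references resolve to the same copy.
class Cloner {
  std::map<const Node *, const Node *> Map;
  std::vector<std::unique_ptr<Node>> Owned;

public:
  void keepShared(const Node *N) { Map[N] = N; }

  const Node *clone(const Node *N) {
    auto It = Map.find(N);
    if (It != Map.end())
      return It->second;
    auto Copy = std::make_unique<Node>();
    Copy->Name = N->Name;
    Node *Raw = Copy.get();
    Owned.push_back(std::move(Copy));
    Map[N] = Raw; // memoize before recursing so cycles terminate
    for (const Node *R : N->Refs)
      Raw->Refs.push_back(clone(R));
    return Raw;
  }
};
```

With `keepShared` called on both the "subprogram" and the "compile unit" nodes, cloned locations keep pointing at the originals; skip either call and the cloner silently produces a second copy, which is the kind of duplication described above.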
The most convenient way of testing CloneModule I found is to rely on
CloneModule call from `opt -run-twice`, instead of writing tedious
unit tests. That feature has a couple of properties that make it hard
to use for this purpose though:
1. CloneModule doesn't copy source filename, making `opt -run-twice`
report it as a difference.
2. `opt -run-twice` does the second run on the original module, not
its clone, making the result of cloning completely invisible in opt's
actual output with and without `-run-twice` both, which directly
contradicts `opt -run-twice`'s own error message.
This patch fixes this as well.
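The fixed `-run-twice` flow (clone first, run on the original, then run on the clone, and compare the two serialized outputs byte for byte) can be modelled in a few lines. This is a hypothetical sketch: `Module` here is just a vector of strings and `serialize` stands in for the bitcode or assembly writer; neither is LLVM's API.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical "module": a list of instruction strings.
using Module = std::vector<std::string>;

// Hypothetical serializer standing in for the bitcode/assembly writer.
std::string serialize(const Module &M) {
  std::string Out;
  for (const std::string &I : M)
    Out += I + "\n";
  return Out;
}

// Clone before the first run, run the same pass object on the original and
// then on the clone, and compare the serialized results byte for byte.
template <typename PassT>
bool outputsMatchWhenRunTwice(const Module &Original, PassT &Pass) {
  Module M = Original;  // the module for the first run
  Module M2 = Original; // the "clone", taken before any pass runs
  Pass(M);
  std::string First = serialize(M);
  Pass(M2);
  std::string Second = serialize(M2);
  return First.size() == Second.size() &&
         std::memcmp(First.data(), Second.data(), First.size()) == 0;
}
```

A pass that carries state across runs (for example, a counter captured by a lambda) makes the two outputs differ, which is exactly the class of bug the mode is meant to expose.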
Reviewed By: aprantl
Reviewers: loladiro, GorNishanov, espindola, echristo, dexonsmith
Subscribers: vsk, debug-info, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45593
llvm-svn: 330069
2018-04-14 05:22:24 +08:00
  if (!RunTwice) {
    // Now that we have all of the passes ready, run them.
    Passes.run(*M);
  } else {
    // If requested, run all passes twice with the same pass manager to catch
    // bugs caused by persistent state in the passes.
2018-02-15 03:50:40 +08:00
    std::unique_ptr<Module> M2(CloneModule(*M));
2018-04-14 05:22:24 +08:00
    // Run all passes on the original module first, so the second run processes
    // the clone to catch CloneModule bugs.
    Passes.run(*M);
2018-04-14 05:23:11 +08:00
    FirstRunBuffer = Buffer;
2018-02-15 03:50:40 +08:00
    Buffer.clear();
2015-12-05 09:38:12 +08:00

2018-04-14 05:22:24 +08:00
    Passes.run(*M2);
2009-10-22 08:46:41 +08:00

2018-04-14 05:22:24 +08:00
    // Compare the two outputs and make sure they're the same
2015-12-05 08:06:37 +08:00
    assert(Out);
2018-04-14 05:23:11 +08:00
    if (Buffer.size() != FirstRunBuffer.size() ||
        (memcmp(Buffer.data(), FirstRunBuffer.data(), Buffer.size()) != 0)) {
      errs()
          << "Running the pass manager twice changed the output.\n"
             "Writing the result of the second run to the specified output.\n"
             "To generate the one-run comparison binary, just run without\n"
             "the compile-twice option\n";
2015-12-05 05:56:46 +08:00
      Out->os() << BOS->str();
      Out->keep();
2017-08-20 09:30:45 +08:00
      if (OptRemarkFile)
        OptRemarkFile->keep();
2015-12-05 05:56:46 +08:00
      return 1;
    }
    Out->os() << BOS->str();
  }
2018-07-24 08:41:29 +08:00
  if (DebugifyEach && !DebugifyExport.empty())
    exportDebugifyStats(DebugifyExport, Passes.getDebugifyStatsMap());
2010-08-20 09:07:01 +08:00
  // Declare success.
2010-12-07 08:33:43 +08:00
  if (!NoOutput || PrintBreakpoints)
2010-08-20 09:07:01 +08:00
    Out->keep();
2017-08-20 09:30:45 +08:00
  if (OptRemarkFile)
    OptRemarkFile->keep();
Output optimization remarks in YAML
(Re-committed after moving the template specialization under the yaml
namespace. GCC was complaining about this.)
This allows various presentations of this data using external tools.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
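To make the record layout concrete, the following C++ sketch hand-writes one remark document in the shape of the YAML above. It does not use LLVM's YAML I/O. `RemarkSketch` and `writeRemarkYAML` are hypothetical names, and quoting or escaping of values is deliberately left out.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flat description of one optimization remark.
struct RemarkSketch {
  std::string Kind;     // e.g. "Missed"
  std::string Pass;     // e.g. "inline"
  std::string Name;     // e.g. "NotInlined"
  std::string File;
  unsigned Line = 0, Column = 0;
  std::string Function;
  unsigned long Hotness = 0;
  // Ordered (key, value) arguments: named values and plain strings alike.
  std::vector<std::pair<std::string, std::string>> Args;
};

// Writes one YAML document, tagged with the remark kind and terminated by "...".
std::string writeRemarkYAML(const RemarkSketch &R) {
  std::ostringstream OS;
  OS << "--- !" << R.Kind << "\n"
     << "Pass: " << R.Pass << "\n"
     << "Name: " << R.Name << "\n"
     << "DebugLoc: { File: " << R.File << ", Line: " << R.Line
     << ", Column: " << R.Column << " }\n"
     << "Function: " << R.Function << "\n"
     << "Hotness: " << R.Hotness << "\n"
     << "Args:\n";
  for (const auto &[Key, Value] : R.Args)
    OS << "- " << Key << ": " << Value << "\n";
  OS << "...\n";
  return OS.str();
}
```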
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
  ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
               DEBUG_TYPE, "NotInlined", &I)
           << NV("Callee", Callee) << " will not be inlined into "
           << NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once, but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did verify experimentally that the class hierarchy
starting at DiagnosticInfoOptimizationBase can be mapped back from the
YAML generated here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before, hotness is computed in the analysis pass instead of in
DiagnosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
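The named-value idea can be illustrated without LLVM. Streaming a plain string only extends the message text. Streaming a named value extends the text and also records a (key, value) argument that a client can read without parsing the message. `NamedValue` and `RemarkBuilder` below are hypothetical. They only mirror the shape of the `NV(...)` calls in the inliner example and are not LLVM's implementation.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical named value: an argument key plus its printed value.
struct NamedValue {
  std::string Key;
  std::string Val;
};

class RemarkBuilder {
public:
  // Plain text: appended to the message and kept as an unnamed "String" argument.
  RemarkBuilder &operator<<(const std::string &S) {
    Message += S;
    Args.emplace_back("String", S);
    return *this;
  }
  // Named value: appended to the message and kept under its own key.
  RemarkBuilder &operator<<(const NamedValue &NV) {
    Message += NV.Val;
    Args.emplace_back(NV.Key, NV.Val);
    return *this;
  }
  const std::string &message() const { return Message; }
  const std::vector<std::pair<std::string, std::string>> &args() const {
    return Args;
  }

private:
  std::string Message;
  std::vector<std::pair<std::string, std::string>> Args;
};
```

A YAML writer can then emit `args()` as the `Args:` list, while `message()` still gives the human-readable remark.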
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282539
2016-09-28 04:55:07 +08:00
[ThinLTO] Add support for emitting minimized bitcode for thin link
Summary:
The cumulative size of the bitcode files for a very large application
can be huge, particularly with -g. In a distributed build environment,
all of these files must be sent to the remote build node that performs
the thin link step, and this can exceed size limits.
The thin link actually only needs the summary along with a bitcode
symbol table. Until we have a proper bitcode symbol table, simply
stripping the debug metadata results in significant size reduction.
Add support for an option to additionally emit minimized bitcode
modules, just for use in the thin link step, which for now just strips
all debug metadata. I plan to add a cc1 option so this can be invoked
easily during the compile step.
However, care must be taken to ensure that these minimized thin link
bitcode files produce the same index as with the original bitcode files,
as these original bitcode files will be used in the backends.
Specifically:
1) The module hash used for caching is typically produced by hashing the
written bitcode, and we want to include the hash that would correspond
to the original bitcode file. This is because we want to ensure that
changes in the stripped portions affect caching. Added plumbing to emit
the same module hash in the minimized thin link bitcode file.
2) The module paths in the index are constructed from the module ID of
each thin-linked bitcode file, which is typically generated automatically from
the input file path. This is the path used for finding the modules to
import from, and obviously we need this to point to the original bitcode
files. Added gold-plugin support to take a suffix replacement during the
thin link that is used to override the identifier on the MemoryBufferRef
constructed from the loaded thin link bitcode file. The assumption is
that the build system can specify that the minimized bitcode file has a
name that is similar but uses a different suffix (e.g. out.thinlink.bc
instead of out.o).
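The two invariants above can be sketched in C++. First, the module hash is computed once from the full bitcode and attached to both the full and the minimized outputs. Second, the identifier of a loaded thin-link file is mapped back to the original object's name by swapping suffixes. Everything here is hypothetical. FNV-1a stands in for whatever hash the real bitcode writer uses, and `BitcodeOutputSketch`, `emitBoth` and `replaceSuffix` are illustrative names, not LLVM or gold-plugin APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// 64-bit FNV-1a: a stand-in hash used only for illustration.
uint64_t fnv1a(const std::string &Bytes) {
  uint64_t H = 1469598103934665603ULL;
  for (unsigned char C : Bytes) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

struct BitcodeOutputSketch {
  std::string Bytes;
  uint64_t ModuleHash;
};

// Emits the full output and a minimized one. Both carry the hash of the FULL
// bitcode, so changes in the stripped parts still affect caching.
std::pair<BitcodeOutputSketch, BitcodeOutputSketch>
emitBoth(const std::string &FullBitcode, const std::string &MinimizedBitcode) {
  uint64_t H = fnv1a(FullBitcode);
  return {{FullBitcode, H}, {MinimizedBitcode, H}};
}

// Maps e.g. "out.thinlink.bc" to "out.o"; returns nullopt if Path does not
// end with OldSuffix.
std::optional<std::string> replaceSuffix(const std::string &Path,
                                         const std::string &OldSuffix,
                                         const std::string &NewSuffix) {
  if (Path.size() < OldSuffix.size() ||
      Path.compare(Path.size() - OldSuffix.size(), OldSuffix.size(),
                   OldSuffix) != 0)
    return std::nullopt;
  return Path.substr(0, Path.size() - OldSuffix.size()) + NewSuffix;
}
```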
Added various tests to ensure that we get identical index files out of
the thin link step.
Reviewers: mehdi_amini, pcc
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31027
llvm-svn: 298638
2017-03-24 03:47:39 +08:00
  if (ThinLinkOut)
    ThinLinkOut->keep();
2009-10-22 08:46:41 +08:00
  return 0;
2001-06-07 04:29:01 +08:00
}
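The `keep()` calls on `Out`, `OptRemarkFile` and `ThinLinkOut` rely on LLVM's `ToolOutputFile` behavior, assuming (as in opt) that these are `ToolOutputFile` objects. The output file is deleted when the object is destroyed unless `keep()` was called, so only the paths that explicitly call `keep()` leave files on disk. A minimal RAII sketch of that pattern with `std::filesystem` follows. `KeepableOutputSketch` is hypothetical and is not LLVM's implementation.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <utility>

// Hypothetical output file that removes itself on destruction unless kept.
class KeepableOutputSketch {
public:
  explicit KeepableOutputSketch(std::filesystem::path P)
      : Path(std::move(P)), OS(Path, std::ios::binary) {}

  ~KeepableOutputSketch() {
    OS.close();
    if (!Keep) {
      std::error_code EC; // non-throwing overload: destructors must not throw
      std::filesystem::remove(Path, EC);
    }
  }

  std::ofstream &os() { return OS; }
  void keep() { Keep = true; } // opt calls this only on paths that want the file

private:
  std::filesystem::path Path; // declared before OS so it is initialized first
  std::ofstream OS;
  bool Keep = false;
};
```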