forked from OSchip/llvm-project
277 lines
12 KiB
ReStructuredText
277 lines
12 KiB
ReStructuredText
=====================================================================
|
|
Building a JIT: Adding Optimizations -- An introduction to ORC Layers
|
|
=====================================================================
|
|
|
|
.. contents::
|
|
:local:
|
|
|
|
**This tutorial is under active development. It is incomplete and details may
|
|
change frequently.** Nonetheless we invite you to try it out as it stands, and
|
|
we welcome any feedback.
|
|
|
|
Chapter 2 Introduction
|
|
======================
|
|
|
|
**Warning: This tutorial is currently being updated to account for ORC API
|
|
changes. Only Chapters 1 and 2 are up-to-date.**
|
|
|
|
**Example code from Chapters 3 to 5 will compile and run, but has not been
|
|
updated**
|
|
|
|
Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In
|
|
`Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT
|
|
class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce
|
|
executable code in memory. KaleidoscopeJIT was able to do this with relatively
|
|
little code by composing two off-the-shelf *ORC layers*: IRCompileLayer and
|
|
ObjectLinkingLayer, to do much of the heavy lifting.
|
|
|
|
In this layer we'll learn more about the ORC layer concept by using a new layer,
|
|
IRTransformLayer, to add IR optimization support to KaleidoscopeJIT.
|
|
|
|
Optimizing Modules using the IRTransformLayer
|
|
=============================================
|
|
|
|
In `Chapter 4 <LangImpl04.html>`_ of the "Implementing a language with LLVM"
|
|
tutorial series the llvm *FunctionPassManager* is introduced as a means for
|
|
optimizing LLVM IR. Interested readers may read that chapter for details, but
|
|
in short: to optimize a Module we create an llvm::FunctionPassManager
|
|
instance, configure it with a set of optimizations, then run the PassManager on
|
|
a Module to mutate it into a (hopefully) more optimized but semantically
|
|
equivalent form. In the original tutorial series the FunctionPassManager was
|
|
created outside the KaleidoscopeJIT and modules were optimized before being
|
|
added to it. In this Chapter we will make optimization a phase of our JIT
|
|
instead. For now this will provide us a motivation to learn more about ORC
|
|
layers, but in the long term making optimization part of our JIT will yield an
|
|
important benefit: When we begin lazily compiling code (i.e. deferring
|
|
compilation of each function until the first time it's run) having
|
|
optimization managed by our JIT will allow us to optimize lazily too, rather
|
|
than having to do all our optimization up-front.
|
|
|
|
To add optimization support to our JIT we will take the KaleidoscopeJIT from
|
|
Chapter 1 and compose an ORC *IRTransformLayer* on top. We will look at how the
|
|
IRTransformLayer works in more detail below, but the interface is simple: the
|
|
constructor for this layer takes a reference to the execution session and the
|
|
layer below (as all layers do) plus an *IR optimization function* that it will
|
|
apply to each Module that is added via addModule:
|
|
|
|
.. code-block:: c++
|
|
|
|
class KaleidoscopeJIT {
|
|
private:
|
|
ExecutionSession ES;
|
|
RTDyldObjectLinkingLayer ObjectLayer;
|
|
IRCompileLayer CompileLayer;
|
|
IRTransformLayer TransformLayer;
|
|
|
|
DataLayout DL;
|
|
MangleAndInterner Mangle;
|
|
ThreadSafeContext Ctx;
|
|
|
|
public:
|
|
|
|
KaleidoscopeJIT(JITTargetMachineBuilder JTMB, DataLayout DL)
|
|
: ObjectLayer(ES,
|
|
[]() { return std::make_unique<SectionMemoryManager>(); }),
|
|
CompileLayer(ES, ObjectLayer, ConcurrentIRCompiler(std::move(JTMB))),
|
|
TransformLayer(ES, CompileLayer, optimizeModule),
|
|
DL(std::move(DL)), Mangle(ES, this->DL),
|
|
Ctx(std::make_unique<LLVMContext>()) {
|
|
ES.getMainJITDylib().addGenerator(
|
|
cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(DL.getGlobalPrefix())));
|
|
}
|
|
|
|
Our extended KaleidoscopeJIT class starts out the same as it did in Chapter 1,
|
|
but after the CompileLayer we introduce a new member, TransformLayer, which sits
|
|
on top of our CompileLayer. We initialize our OptimizeLayer with a reference to
|
|
the ExecutionSession and output layer (standard practice for layers), along with
|
|
a *transform function*. For our transform function we supply our classes
|
|
optimizeModule static method.
|
|
|
|
.. code-block:: c++
|
|
|
|
// ...
|
|
return cantFail(OptimizeLayer.addModule(std::move(M),
|
|
std::move(Resolver)));
|
|
// ...
|
|
|
|
Next we need to update our addModule method to replace the call to
|
|
``CompileLayer::add`` with a call to ``OptimizeLayer::add`` instead.
|
|
|
|
.. code-block:: c++
|
|
|
|
static Expected<ThreadSafeModule>
|
|
optimizeModule(ThreadSafeModule M, const MaterializationResponsibility &R) {
|
|
// Create a function pass manager.
|
|
auto FPM = std::make_unique<legacy::FunctionPassManager>(M.get());
|
|
|
|
// Add some optimizations.
|
|
FPM->add(createInstructionCombiningPass());
|
|
FPM->add(createReassociatePass());
|
|
FPM->add(createGVNPass());
|
|
FPM->add(createCFGSimplificationPass());
|
|
FPM->doInitialization();
|
|
|
|
// Run the optimizations over all functions in the module being added to
|
|
// the JIT.
|
|
for (auto &F : *M)
|
|
FPM->run(F);
|
|
|
|
return M;
|
|
}
|
|
|
|
At the bottom of our JIT we add a private method to do the actual optimization:
|
|
*optimizeModule*. This function takes the module to be transformed as input (as
|
|
a ThreadSafeModule) along with a reference to a reference to a new class:
|
|
``MaterializationResponsibility``. The MaterializationResponsibility argument
|
|
can be used to query JIT state for the module being transformed, such as the set
|
|
of definitions in the module that JIT'd code is actively trying to call/access.
|
|
For now we will ignore this argument and use a standard optimization
|
|
pipeline. To do this we set up a FunctionPassManager, add some passes to it, run
|
|
it over every function in the module, and then return the mutated module. The
|
|
specific optimizations are the same ones used in `Chapter 4 <LangImpl04.html>`_
|
|
of the "Implementing a language with LLVM" tutorial series. Readers may visit
|
|
that chapter for a more in-depth discussion of these, and of IR optimization in
|
|
general.
|
|
|
|
And that's it in terms of changes to KaleidoscopeJIT: When a module is added via
|
|
addModule the OptimizeLayer will call our optimizeModule function before passing
|
|
the transformed module on to the CompileLayer below. Of course, we could have
|
|
called optimizeModule directly in our addModule function and not gone to the
|
|
bother of using the IRTransformLayer, but doing so gives us another opportunity
|
|
to see how layers compose. It also provides a neat entry point to the *layer*
|
|
concept itself, because IRTransformLayer is one of the simplest layers that
|
|
can be implemented.
|
|
|
|
.. code-block:: c++
|
|
|
|
// From IRTransformLayer.h:
|
|
class IRTransformLayer : public IRLayer {
|
|
public:
|
|
using TransformFunction = std::function<Expected<ThreadSafeModule>(
|
|
ThreadSafeModule, const MaterializationResponsibility &R)>;
|
|
|
|
IRTransformLayer(ExecutionSession &ES, IRLayer &BaseLayer,
|
|
TransformFunction Transform = identityTransform);
|
|
|
|
void setTransform(TransformFunction Transform) {
|
|
this->Transform = std::move(Transform);
|
|
}
|
|
|
|
static ThreadSafeModule
|
|
identityTransform(ThreadSafeModule TSM,
|
|
const MaterializationResponsibility &R) {
|
|
return TSM;
|
|
}
|
|
|
|
void emit(MaterializationResponsibility R, ThreadSafeModule TSM) override;
|
|
|
|
private:
|
|
IRLayer &BaseLayer;
|
|
TransformFunction Transform;
|
|
};
|
|
|
|
// From IRTransformLayer.cpp:
|
|
|
|
IRTransformLayer::IRTransformLayer(ExecutionSession &ES,
|
|
IRLayer &BaseLayer,
|
|
TransformFunction Transform)
|
|
: IRLayer(ES), BaseLayer(BaseLayer), Transform(std::move(Transform)) {}
|
|
|
|
void IRTransformLayer::emit(MaterializationResponsibility R,
|
|
ThreadSafeModule TSM) {
|
|
assert(TSM.getModule() && "Module must not be null");
|
|
|
|
if (auto TransformedTSM = Transform(std::move(TSM), R))
|
|
BaseLayer.emit(std::move(R), std::move(*TransformedTSM));
|
|
else {
|
|
R.failMaterialization();
|
|
getExecutionSession().reportError(TransformedTSM.takeError());
|
|
}
|
|
}
|
|
|
|
This is the whole definition of IRTransformLayer, from
|
|
``llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h`` and
|
|
``llvm/lib/ExecutionEngine/Orc/IRTransformLayer.cpp``. This class is concerned
|
|
with two very simple jobs: (1) Running every IR Module that is emitted via this
|
|
layer through the transform function object, and (2) implementing the ORC
|
|
``IRLayer`` interface (which itself conforms to the general ORC Layer concept,
|
|
more on that below). Most of the class is straightforward: a typedef for the
|
|
transform function, a constructor to initialize the members, a setter for the
|
|
transform function value, and a default no-op transform. The most important
|
|
method is ``emit`` as this is half of our IRLayer interface. The emit method
|
|
applies our transform to each module that it is called on and, if the transform
|
|
succeeds, passes the transformed module to the base layer. If the transform
|
|
fails, our emit function calls
|
|
``MaterializationResponsibility::failMaterialization`` (this JIT clients who
|
|
may be waiting on other threads know that the code they were waiting for has
|
|
failed to compile) and logs the error with the execution session before bailing
|
|
out.
|
|
|
|
The other half of the IRLayer interface we inherit unmodified from the IRLayer
|
|
class:
|
|
|
|
.. code-block:: c++
|
|
|
|
Error IRLayer::add(JITDylib &JD, ThreadSafeModule TSM, VModuleKey K) {
|
|
return JD.define(std::make_unique<BasicIRLayerMaterializationUnit>(
|
|
*this, std::move(K), std::move(TSM)));
|
|
}
|
|
|
|
This code, from ``llvm/lib/ExecutionEngine/Orc/Layer.cpp``, adds a
|
|
ThreadSafeModule to a given JITDylib by wrapping it up in a
|
|
``MaterializationUnit`` (in this case a ``BasicIRLayerMaterializationUnit``).
|
|
Most layers that derived from IRLayer can rely on this default implementation
|
|
of the ``add`` method.
|
|
|
|
These two operations, ``add`` and ``emit``, together constitute the layer
|
|
concept: A layer is a way to wrap a part of a compiler pipeline (in this case
|
|
the "opt" phase of an LLVM compiler) whose API is opaque to ORC with an
|
|
interface that ORC can call as needed. The add method takes an
|
|
module in some input program representation (in this case an LLVM IR module)
|
|
and stores it in the target ``JITDylib``, arranging for it to be passed back
|
|
to the layer's emit method when any symbol defined by that module is requested.
|
|
Each layer can complete its own work by calling the ``emit`` method of its base
|
|
layer. For example, in this tutorial our IRTransformLayer calls through to
|
|
our IRCompileLayer to compile the transformed IR, and our IRCompileLayer in
|
|
turn calls our ObjectLayer to link the object file produced by our compiler.
|
|
|
|
So far we have learned how to optimize and compile our LLVM IR, but we have
|
|
not focused on when compilation happens. Our current REPL optimizes and
|
|
compiles each function as soon as it is referenced by any other code,
|
|
regardless of whether it is ever called at runtime. In the next chapter we
|
|
will introduce a fully lazy compilation, in which functions are not compiled
|
|
until they are first called at run-time. At this point the trade-offs get much
|
|
more interesting: the lazier we are, the quicker we can start executing the
|
|
first function, but the more often we will have to pause to compile newly
|
|
encountered functions. If we only code-gen lazily, but optimize eagerly, we
|
|
will have a longer startup time (as everything is optimized at that time) but
|
|
relatively short pauses as each function just passes through code-gen. If we
|
|
both optimize and code-gen lazily we can start executing the first function
|
|
more quickly, but we will have longer pauses as each function has to be both
|
|
optimized and code-gen'd when it is first executed. Things become even more
|
|
interesting if we consider interprocedural optimizations like inlining, which
|
|
must be performed eagerly. These are complex trade-offs, and there is no
|
|
one-size-fits all solution to them, but by providing composable layers we leave
|
|
the decisions to the person implementing the JIT, and make it easy for them to
|
|
experiment with different configurations.
|
|
|
|
`Next: Adding Per-function Lazy Compilation <BuildingAJIT3.html>`_
|
|
|
|
Full Code Listing
|
|
=================
|
|
|
|
Here is the complete code listing for our running example with an
|
|
IRTransformLayer added to enable optimization. To build this example, use:
|
|
|
|
.. code-block:: bash
|
|
|
|
# Compile
|
|
clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
|
|
# Run
|
|
./toy
|
|
|
|
Here is the code:
|
|
|
|
.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h
|
|
:language: c++
|