===============================
ORC Design and Implementation
===============================

Introduction
============

This document aims to provide a high-level overview of the design and
implementation of the ORC JIT APIs. Except where otherwise stated, all
discussion applies to the design of the APIs as of LLVM version 9 (ORCv2).

.. contents::
   :local:

Use-cases
=========

ORC provides a modular API for building JIT compilers. There is a range
of use cases for such an API. For example:

1. The LLVM tutorials use a simple ORC-based JIT class to execute expressions
   compiled from a toy language: Kaleidoscope.

2. The LLVM debugger, LLDB, uses a cross-compiling JIT for expression
   evaluation. In this use case, cross compilation allows expressions compiled
   in the debugger process to be executed on the debug target process, which may
   be on a different device/architecture.

3. High-performance JITs (e.g. JVMs, Julia) can use ORC to make use of LLVM's
   optimizations within an existing JIT infrastructure.

4. Interpreters and REPLs (e.g. Cling for C++ and the Swift interpreter) can
   use ORC to compile and run code on demand.

By adopting a modular, library-based design we aim to make ORC useful in as many
of these contexts as possible.

Features
========

ORC provides the following features:

- *JIT-linking* links relocatable object files (COFF, ELF, MachO) [1]_ into a
  target process at runtime. The target process may be the same process that
  contains the JIT session object and jit-linker, or may be another process
  (even one running on a different machine or architecture) that communicates
  with the JIT via RPC.

- *LLVM IR compilation*, which is provided by off-the-shelf components
  (IRCompileLayer, SimpleCompiler, ConcurrentIRCompiler) that make it easy to
  add LLVM IR to a JIT'd process.

- *Eager and lazy compilation*. By default, ORC will compile symbols as soon as
  they are looked up in the JIT session object (``ExecutionSession``). Compiling
  eagerly by default makes it easy to use ORC as a simple in-memory compiler for
  an existing JIT. ORC also provides a simple mechanism, lazy-reexports, for
  deferring compilation until first call.

- *Support for custom compilers and program representations*. Clients can supply
  custom compilers for each symbol that they define in their JIT session. ORC
  will run the user-supplied compiler when a definition of a symbol is
  needed. ORC is actually fully language agnostic: LLVM IR is not treated
  specially, and is supported via the same wrapper mechanism (the
  ``MaterializationUnit`` class) that is used for custom compilers.

- *Concurrent JIT'd code* and *concurrent compilation*. JIT'd code may spawn
  multiple threads, and may re-enter the JIT (e.g. for lazy compilation)
  concurrently from multiple threads. The ORC APIs also support running multiple
  compilers concurrently, and provide off-the-shelf infrastructure to track
  dependencies on running compiles (e.g. to ensure that we never call into code
  until it is safe to do so, even if that involves waiting on multiple
  compiles).

- *Orthogonality* and *composability*: Each of the features above can be used (or
  not) independently. It is possible to put ORC components together to make a
  non-lazy, in-process, single-threaded JIT or a lazy, out-of-process,
  concurrent JIT, or anything in between.

LLJIT and LLLazyJIT
===================

ORC provides two basic JIT classes off-the-shelf. These are useful both as
examples of how to assemble ORC components to make a JIT, and as replacements
for earlier LLVM JIT APIs (e.g. MCJIT).

The LLJIT class uses an IRCompileLayer and RTDyldObjectLinkingLayer to support
compilation of LLVM IR and linking of relocatable object files. All operations
are performed eagerly on symbol lookup (i.e. a symbol's definition is compiled
as soon as you attempt to look up its address). LLJIT is a suitable replacement
for MCJIT in most cases (note: some more advanced features, e.g.
JITEventListeners, are not supported yet).

The LLLazyJIT class extends LLJIT and adds a CompileOnDemandLayer to enable lazy
compilation of LLVM IR. When an LLVM IR module is added via the addLazyIRModule
method, function bodies in that module will not be compiled until they are first
called. LLLazyJIT aims to provide a replacement for LLVM's original (pre-MCJIT)
JIT API.

LLJIT and LLLazyJIT instances can be created using their respective builder
classes: LLJITBuilder and LLLazyJITBuilder. For example, assuming you have a
module ``M`` loaded on a ThreadSafeContext ``Ctx``:

.. code-block:: c++

  // Try to detect the host arch and construct an LLJIT instance.
  auto JIT = LLJITBuilder().create();

  // If we could not construct an instance, return an error.
  if (!JIT)
    return JIT.takeError();

  // Add the module. JIT is an Expected wrapping a pointer to the LLJIT
  // instance, so it is dereferenced before calling through to LLJIT.
  if (auto Err = (*JIT)->addIRModule(ThreadSafeModule(std::move(M), Ctx)))
    return Err;

  // Look up the JIT'd code entry point.
  auto EntrySym = (*JIT)->lookup("entry");
  if (!EntrySym)
    return EntrySym.takeError();

  // Cast the entry point address to a function pointer, then call it.
  auto *Entry = (void(*)())EntrySym->getAddress();

  Entry();

The builder classes provide a number of configuration options that can be
specified before the JIT instance is constructed. For example:

.. code-block:: c++

  // Build an LLLazyJIT instance that uses four worker threads for compilation,
  // and jumps to a specific error handler (rather than null) on lazy compile
  // failures.

  void handleLazyCompileFailure() {
    // JIT'd code will jump here if lazy compilation fails, giving us an
    // opportunity to exit or throw an exception into JIT'd code.
    throw JITFailed();
  }

  auto JIT = LLLazyJITBuilder()
               .setNumCompileThreads(4)
               .setLazyCompileFailureAddr(
                   toJITTargetAddress(&handleLazyCompileFailure))
               .create();

  // ...

For users wanting to get started with LLJIT, a minimal example program can be
found at ``llvm/examples/HowToUseLLJIT``.

Design Overview
===============

ORC's JIT'd program model aims to emulate the linking and symbol resolution
rules used by the static and dynamic linkers. This allows ORC to JIT
arbitrary LLVM IR, including IR produced by an ordinary static compiler (e.g.
clang) that uses constructs like symbol linkage and visibility, and weak and
common symbol definitions.

To see how this works, imagine a program ``myapp`` which links against a pair
of dynamic libraries: ``libA`` and ``libB``. On the command line, building this
system might look like:

.. code-block:: bash

  $ clang++ -shared -o libA.dylib a1.cpp a2.cpp
  $ clang++ -shared -o libB.dylib b1.cpp b2.cpp
  $ clang++ -o myapp main.cpp -L. -lA -lB
  $ ./myapp

In ORC, this would translate into API calls on a hypothetical "CXXCompileLayer"
(with error checking omitted for brevity) as:

.. code-block:: c++

  ExecutionSession ES;
  RTDyldObjectLinkingLayer ObjLinkingLayer(
      ES, []() { return llvm::make_unique<SectionMemoryManager>(); });
  CXXCompileLayer CXXLayer(ES, ObjLinkingLayer);

  // Create JITDylib "A" and add code to it using the CXX layer.
  auto &LibA = ES.createJITDylib("A");
  CXXLayer.add(LibA, MemoryBuffer::getFile("a1.cpp"));
  CXXLayer.add(LibA, MemoryBuffer::getFile("a2.cpp"));

  // Create JITDylib "B" and add code to it using the CXX layer.
  auto &LibB = ES.createJITDylib("B");
  CXXLayer.add(LibB, MemoryBuffer::getFile("b1.cpp"));
  CXXLayer.add(LibB, MemoryBuffer::getFile("b2.cpp"));

  // Specify the search order for the main JITDylib. This is equivalent to a
  // "links against" relationship in a command-line link.
  ES.getMainJITDylib().setSearchOrder({{&LibA, false}, {&LibB, false}});
  CXXLayer.add(ES.getMainJITDylib(), MemoryBuffer::getFile("main.cpp"));

  // Look up the JIT'd main, cast it to a function pointer, then call it.
  auto MainSym = ExitOnErr(ES.lookup({&ES.getMainJITDylib()}, "main"));
  auto *Main = (int(*)(int, char*[]))MainSym.getAddress();

  int Result = Main(...);

This example tells us nothing about *how* or *when* compilation will happen.
That will depend on the implementation of the hypothetical CXXCompileLayer,
but the linking rules will be the same regardless. For example, if a1.cpp and
a2.cpp both define a function "foo", the API should generate a duplicate
definition error. On the other hand, if a1.cpp and b1.cpp both define "foo",
there is no error (different dynamic libraries may define the same symbol). If
main.cpp refers to "foo", it should bind to the definition in LibA rather than
the one in LibB, since main.cpp is part of the "main" dylib, and the main dylib
links against LibA before LibB.

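The linking rules above can be modelled in a few lines of standalone C++. The
sketch below is a toy, not the ORC API: ``ToyDylib``, ``define``, ``resolve``
and the integer "addresses" are hypothetical names invented for illustration.
It assumes only what the paragraph above states: a duplicate definition inside
one dylib is an error, the same name may be defined in two different dylibs,
and references bind to the first match in the search order.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  // Toy stand-in for a JITDylib: a symbol table plus a search order.
  struct ToyDylib {
    std::map<std::string, int> Symbols;         // symbol name -> fake address
    std::vector<const ToyDylib *> SearchOrder;  // dylibs this one links against

    // Returns false (and keeps the first definition) on a duplicate.
    bool define(const std::string &Sym, int Addr) {
      return Symbols.emplace(Sym, Addr).second;
    }
  };

  // Resolve Sym as seen from D: D itself first, then its search order.
  // Returns nullptr if no definition is found.
  const int *resolve(const ToyDylib &D, const std::string &Sym) {
    auto I = D.Symbols.find(Sym);
    if (I != D.Symbols.end())
      return &I->second;
    for (const ToyDylib *Dep : D.SearchOrder) {
      auto J = Dep->Symbols.find(Sym);
      if (J != Dep->Symbols.end())
        return &J->second;
    }
    return nullptr;
  }

  int demoLinkingRules() {
    ToyDylib LibA, LibB, Main;
    bool FirstOK = LibA.define("foo", 1);  // a1.cpp defines foo
    bool DupOK = LibA.define("foo", 2);    // a2.cpp defines foo again: rejected
    bool OtherOK = LibB.define("foo", 3);  // b1.cpp, different dylib: accepted
    assert(FirstOK && !DupOK && OtherOK);
    (void)FirstOK; (void)DupOK; (void)OtherOK;

    Main.SearchOrder = {&LibA, &LibB};     // main links against A before B
    const int *Foo = resolve(Main, "foo");
    return Foo ? *Foo : -1;
  }

Calling ``demoLinkingRules()`` returns ``1``, the fake address of the
definition in ``LibA``, because ``LibA`` precedes ``LibB`` in the search order.
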
Many JIT clients will have no need for this strict adherence to the usual
ahead-of-time linking rules and should be able to get by just fine by putting
all of their code in a single JITDylib. However, clients who want to JIT code
for languages/projects that traditionally rely on ahead-of-time linking (e.g.
C++) will find that this feature makes life much easier.

Symbol lookup in ORC serves two other important functions, beyond basic lookup:
(1) It triggers compilation of the symbol(s) searched for, and (2) it provides
the synchronization mechanism for concurrent compilation. The pseudo-code for
the lookup process is:

.. code-block:: none

  construct a query object from a query set and query handler
  lock the session
  lodge query against requested symbols, collect required materializers (if any)
  unlock the session
  dispatch materializers (if any)

In this context a materializer is something that provides a working definition
of a symbol upon request. Generally materializers wrap compilers, but they may
also wrap a linker directly (if the program representation backing the
definitions is an object file), or even just a class that writes bits directly
into memory (if the definitions are stubs). Materialization is the blanket term
for any actions (compiling, linking, splatting bits, registering with runtimes,
etc.) that are required to generate a symbol definition that is safe to call or
access.

As each materializer completes its work it notifies the JITDylib, which in turn
notifies any query objects that are waiting on the newly materialized
definitions. Each query object maintains a count of the number of symbols that
it is still waiting on, and once this count reaches zero the query object calls
the query handler with a *SymbolMap* (a map of symbol names to addresses)
describing the result. If any symbol fails to materialize the query immediately
calls the query handler with an error.

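The countdown behaviour of a query object can be sketched as follows. This is a
single-threaded toy under stated assumptions, not ORC's implementation:
``ToyQuery``, ``ToySymbolMap``, ``notifyResolved`` and ``notifyFailed`` are
hypothetical names, and the session locking described above is left out.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <functional>
  #include <map>
  #include <set>
  #include <string>
  #include <utility>

  using ToySymbolMap = std::map<std::string, std::uint64_t>;

  class ToyQuery {
  public:
    using Handler = std::function<void(bool Ok, const ToySymbolMap &)>;

    ToyQuery(std::set<std::string> Names, Handler H)
        : Pending(std::move(Names)), OnComplete(std::move(H)) {}

    // Called as a materializer finishes one symbol.
    void notifyResolved(const std::string &Name, std::uint64_t Addr) {
      if (Done || Pending.erase(Name) == 0)
        return;                    // not (or no longer) waiting on this symbol
      Result[Name] = Addr;
      if (Pending.empty()) {       // outstanding count reached zero
        Done = true;
        OnComplete(true, Result);
      }
    }

    // Called if any symbol fails to materialize: report the error at once.
    void notifyFailed() {
      if (Done)
        return;
      Done = true;
      OnComplete(false, ToySymbolMap());
    }

  private:
    std::set<std::string> Pending; // symbols still being waited on
    ToySymbolMap Result;
    Handler OnComplete;
    bool Done = false;
  };

  int demoQuery() {
    int Calls = 0;
    std::uint64_t MainAddr = 0;
    std::set<std::string> Names = {"main", "helper"};
    ToyQuery Q(std::move(Names), [&](bool Ok, const ToySymbolMap &R) {
      ++Calls;
      if (Ok)
        MainAddr = R.at("main");
    });

    Q.notifyResolved("helper", 0x2000);
    assert(Calls == 0);            // still waiting on "main"
    Q.notifyResolved("main", 0x1000);
    assert(Calls == 1 && MainAddr == 0x1000);
    (void)MainAddr;
    return Calls;
  }

The handler runs exactly once, and only after the last outstanding symbol has
been resolved, so ``demoQuery()`` returns ``1``.
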
The collected materialization units are sent to the ExecutionSession to be
dispatched, and the dispatch behavior can be set by the client. By default each
materializer is run on the calling thread. Clients are free to create new
threads to run materializers, or to send the work to a work queue for a thread
pool (this is what LLJIT/LLLazyJIT do).

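The idea of a client-selectable dispatch policy can be illustrated with a small
standalone sketch. The names ``ToyMaterializer``, ``ToyDispatcher``,
``runInline`` and ``ToyWorkQueue`` are hypothetical and are not ORC types; the
queue here is drained on the calling thread to keep the example deterministic,
whereas a real client would typically drain it from worker threads.

.. code-block:: c++

  #include <cassert>
  #include <deque>
  #include <functional>
  #include <utility>
  #include <vector>

  using ToyMaterializer = std::function<void()>;
  using ToyDispatcher = std::function<void(ToyMaterializer)>;

  // Default-style policy: run the materializer on the calling thread.
  void runInline(ToyMaterializer M) { M(); }

  // Alternative policy: park the work in a queue to be run later.
  struct ToyWorkQueue {
    std::deque<ToyMaterializer> Pending;

    void enqueue(ToyMaterializer M) { Pending.push_back(std::move(M)); }

    void drain() {
      while (!Pending.empty()) {
        ToyMaterializer M = std::move(Pending.front());
        Pending.pop_front();
        M();
      }
    }
  };

  int demoDispatch() {
    std::vector<int> Log;

    ToyDispatcher Dispatch = runInline;
    Dispatch([&Log] { Log.push_back(1); });
    assert(Log.size() == 1);       // ran before dispatch returned

    ToyWorkQueue Q;
    Dispatch = [&Q](ToyMaterializer M) { Q.enqueue(std::move(M)); };
    Dispatch([&Log] { Log.push_back(2); });
    assert(Log.size() == 1);       // deferred: nothing has run yet
    Q.drain();
    assert(Log.size() == 2);       // ran once the queue was drained

    return static_cast<int>(Log.size());
  }

Swapping the dispatcher changes *when* the work runs without changing the code
that produces the materializers, which is the property the paragraph above
relies on.
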
Top Level APIs
==============

Many of ORC's top-level APIs are visible in the example above:

- *ExecutionSession* represents the JIT'd program and provides context for the
  JIT: It contains the JITDylibs, error reporting mechanisms, and dispatches the
  materializers.

- *JITDylibs* provide the symbol tables.

- *Layers* (ObjLinkingLayer and CXXLayer) are wrappers around compilers and
  allow clients to add uncompiled program representations supported by those
  compilers to JITDylibs.

Several other important APIs are used implicitly in the example. JIT clients
need not be aware of them, but Layer authors will use them:

- *MaterializationUnit* - When XXXLayer::add is invoked it wraps the given
  program representation (in this example, C++ source) in a MaterializationUnit,
  which is then stored in the JITDylib. MaterializationUnits are responsible for
  describing the definitions they provide, and for unwrapping the program
  representation and passing it back to the layer when compilation is required
  (this ownership shuffle makes writing thread-safe layers easier, since the
  ownership of the program representation will be passed back on the stack,
  rather than having to be fished out of a Layer member, which would require
  synchronization).

- *MaterializationResponsibility* - When a MaterializationUnit hands a program
  representation back to the layer it comes with an associated
  MaterializationResponsibility object. This object tracks the definitions
  that must be materialized and provides a way to notify the JITDylib once they
  are either successfully materialized or a failure occurs.

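The ownership shuffle described for MaterializationUnit can be sketched with
plain ``std::unique_ptr``. This is a toy under stated assumptions, not the ORC
class hierarchy: ``ToyLayer``, ``ToyMaterializationUnit``, ``emit`` and the use
of a ``std::string`` as the "program representation" are all hypothetical. The
MaterializationResponsibility object is omitted for brevity.

.. code-block:: c++

  #include <cassert>
  #include <memory>
  #include <string>
  #include <utility>
  #include <vector>

  // Toy layer: holds no program representations itself, so emit() touches
  // no shared layer state and needs no lock.
  struct ToyLayer {
    std::string emit(std::unique_ptr<std::string> Source) {
      return std::string("compiled:") + *Source;
    }
  };

  // Toy unit: owns the source until compilation is required, then hands
  // ownership back to the layer on the stack.
  class ToyMaterializationUnit {
  public:
    ToyMaterializationUnit(ToyLayer &L, std::vector<std::string> Syms,
                           std::unique_ptr<std::string> Src)
        : Layer(L), Symbols(std::move(Syms)), Source(std::move(Src)) {}

    // Describes the definitions this unit provides.
    const std::vector<std::string> &getSymbols() const { return Symbols; }

    // Unwraps the program representation and passes it back to the layer.
    std::string materialize() {
      assert(Source && "already materialized");
      return Layer.emit(std::move(Source));
    }

  private:
    ToyLayer &Layer;
    std::vector<std::string> Symbols;
    std::unique_ptr<std::string> Source;
  };

  std::string demoOwnershipShuffle() {
    ToyLayer Layer;
    std::vector<std::string> Syms;
    Syms.push_back("foo");
    ToyMaterializationUnit MU(
        Layer, std::move(Syms),
        std::unique_ptr<std::string>(new std::string("int foo();")));
    assert(MU.getSymbols().size() == 1 && MU.getSymbols()[0] == "foo");
    return MU.materialize();
  }

Because the source travels as a by-value ``std::unique_ptr`` argument, exactly
one party owns it at any time, which is what removes the need for a
synchronized member in the layer.
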
Handy utilities
===============

TBD: absolute symbols, aliases, off-the-shelf layers.

Laziness
========

Laziness in ORC is provided by a utility called "lazy-reexports". The aim of
this utility is to re-use the synchronization provided by the symbol lookup
mechanism to make it safe to lazily compile functions, even if calls to the
stub occur simultaneously on multiple threads of JIT'd code. It does this by
reducing lazy compilation to symbol lookup: The lazy stub performs a lookup of
its underlying definition on first call, updating the function body pointer
once the definition is available. If additional calls arrive on other threads
while compilation is ongoing they will be safely blocked by the normal lookup
synchronization guarantee (no result until the result is safe) and can also
proceed as soon as compilation completes.

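The "stub performs a lookup on first call" idea can be modelled with a
standalone toy. Nothing below is the lazy-reexports API: ``ToyLazyStub``,
``ToyFn`` and ``doubled`` are hypothetical names, the injected ``Lookup``
callback stands in for ORC's symbol lookup, and a plain mutex stands in for the
lookup synchronization. For simplicity this sketch takes the lock on every
call, which a real stub would avoid.

.. code-block:: c++

  #include <cassert>
  #include <functional>
  #include <mutex>
  #include <utility>

  using ToyFn = int (*)(int);

  int doubled(int X) { return X * 2; }

  class ToyLazyStub {
  public:
    explicit ToyLazyStub(std::function<ToyFn()> L) : Lookup(std::move(L)) {}

    int call(int X) {
      ToyFn Target;
      {
        // Callers arriving while the first lookup runs wait here, and
        // proceed once the body pointer has been filled in.
        std::lock_guard<std::mutex> Lock(M);
        if (!Body)
          Body = Lookup();  // first call: look up (and so compile) the body
        Target = Body;
      }
      return Target(X);
    }

  private:
    std::function<ToyFn()> Lookup;
    std::mutex M;
    ToyFn Body = nullptr;
  };

  int demoLazyStub() {
    int Lookups = 0;
    ToyLazyStub Stub([&Lookups]() -> ToyFn {
      ++Lookups;            // counts how often the body is "compiled"
      return &doubled;
    });

    assert(Lookups == 0);   // nothing is looked up until the first call
    int First = Stub.call(21);
    int Second = Stub.call(4);
    assert(First == 42 && Second == 8);
    (void)First; (void)Second;
    return Lookups;
  }

The lookup runs once, on the first call, so ``demoLazyStub()`` returns ``1``;
the second call goes straight to the cached body pointer.
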
TBD: Usage example.

Supporting Custom Compilers
===========================

TBD.

Transitioning from ORCv1 to ORCv2
=================================

Since LLVM 7.0, new ORC development has focused on adding support for concurrent
compilation. In order to enable concurrency, new APIs were introduced
(ExecutionSession, JITDylib, etc.) and new implementations of existing layers
were written. In LLVM 8.0 the old layer implementations, which do not support
concurrency, were renamed (with a "Legacy" prefix), but remained in tree. In
LLVM 9.0 we have added a deprecation warning for the old layers and utilities,
and in LLVM 10.0 the old layers and utilities will be removed.

Clients currently using the legacy (ORCv1) layers and utilities will usually
find it easy to transition to the newer (ORCv2) variants. Most of the ORCv1
layers and utilities have ORCv2 counterparts [2]_ that can be
substituted. However, there are some differences between ORCv1 and ORCv2 to be
aware of:

1. All JIT stacks now need an ExecutionSession instance which manages the
   string pool, error reporting, synchronization, and symbol lookup.

2. ORCv2 uses uniqued strings (``SymbolStringPtr`` instances) to reduce memory
   overhead and improve lookup performance. To get a uniqued string, call
   ``intern`` on your ExecutionSession instance:

   .. code-block:: c++

     ExecutionSession ES;

     /// ...

     auto MainSymbolName = ES.intern("main");

3. Program representations (Modules, Object Files, etc.) are no longer added
   *to* layers. Instead they are added *to* JITDylibs *by* layers. The layer
   determines how the program representation will be compiled if it is needed.
   The JITDylib provides the symbol table, enforces linkage rules (e.g.
   rejecting duplicate definitions), and synchronizes concurrent compiles.

   Most ORCv1 clients (or MCJIT clients wanting to try out ORCv2) should
   simply add code to the default *main* JITDylib provided by the
   ExecutionSession:

   .. code-block:: c++

     ExecutionSession ES;
     RTDyldObjectLinkingLayer ObjLinkingLayer(
         ES, []() { return llvm::make_unique<SectionMemoryManager>(); });
     IRCompileLayer CompileLayer(ES, ObjLinkingLayer, SimpleCompiler(TM));

     // loadModule is a placeholder that yields a ThreadSafeModule (see item 4).
     auto M = loadModule(...);

     // The module is move-only, so ownership is transferred to the layer.
     if (auto Err = CompileLayer.add(ES.getMainJITDylib(), std::move(M)))
       return Err;

4. IR layers require ThreadSafeModule instances, rather than
   std::unique_ptr<Module>s. A ThreadSafeModule instance is a pair of a
   std::unique_ptr<Module> and a ThreadSafeContext, which is in turn a
   pair of a std::unique_ptr<LLVMContext> and a lock. This allows the JIT
   to ensure that the LLVMContext for a module is locked before the module
   is accessed. Multiple ThreadSafeModules may share a ThreadSafeContext
   value, but in that case the modules will not be able to be compiled
   concurrently [3]_.

   ThreadSafeContexts may be constructed explicitly:

   .. code-block:: c++

     // ThreadSafeContext shared between two modules.
     ThreadSafeContext TSCtx(llvm::make_unique<LLVMContext>());
     ThreadSafeModule TSM1(
         llvm::make_unique<Module>("M1", *TSCtx.getContext()), TSCtx);
     ThreadSafeModule TSM2(
         llvm::make_unique<Module>("M2", *TSCtx.getContext()), TSCtx);

   Alternatively, they can be created implicitly by passing a new LLVMContext
   to the ThreadSafeModule constructor:

   .. code-block:: c++

     // Constructing a ThreadSafeModule (and implicitly a ThreadSafeContext)
     // from a pair of a Module and a Context.
     auto Ctx = llvm::make_unique<LLVMContext>();
     auto M = llvm::make_unique<Module>("M", *Ctx);
     return ThreadSafeModule(std::move(M), std::move(Ctx));

5. The symbol resolution and lookup schemes have been fundamentally changed.
   Symbol lookup has been removed from the layer interface. Instead,
   symbols are looked up via the ``ExecutionSession::lookup`` method by
   scanning a list of JITDylibs.

   SymbolResolvers have been removed entirely. Resolution rules now follow the
   linkage relationship between JITDylibs. For example, to resolve a reference
   to a symbol *F* from a module *M* that has been added to JITDylib *J1* we
   would first search for a definition of *F* in *J1* then (if no definition
   was found) search each of the JITDylibs that *J1* links against.

   While the new resolution scheme is, strictly speaking, less flexible than
   the old scheme of customizable resolvers, this has not yet led to problems
   in practice. Instead, using standard linker rules has removed a lot of
   boilerplate while providing correct [4]_ behavior for common and weak symbols.

   One notable difference is in exposing in-process symbols to the JIT. To
   support this (without requiring the set of symbols to be enumerated up
   front), JITDylibs allow for a *GeneratorFunction* to be attached to
   generate new definitions upon lookup. Reflecting the process's symbols into
   the JIT can be done by writing:

   .. code-block:: c++

     ExecutionSession ES;
     const DataLayout &DL = ...;

     {
       auto ProcessSymbolsGenerator =
         DynamicLibrarySearchGenerator::GetForCurrentProcess(DL.getGlobalPrefix());
       if (!ProcessSymbolsGenerator)
         return ProcessSymbolsGenerator.takeError();
       ES.getMainJITDylib().setGenerator(std::move(*ProcessSymbolsGenerator));
     }

6. Module removal is not yet supported. There is no equivalent of the
   layer concept removeModule/removeObject methods. Work on resource tracking
   and removal in ORCv2 is ongoing.

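The generator mechanism mentioned in item 5 (definitions produced on demand
when a lookup misses) can be pictured with a standalone toy. This is only a
sketch under stated assumptions: ``ToyGeneratingDylib``, its ``Generator``
callback type and the fake "process symbol table" are hypothetical and do not
mirror the real ``GeneratorFunction`` signature.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <functional>
  #include <map>
  #include <string>
  #include <utility>

  class ToyGeneratingDylib {
  public:
    // Asked about names missing from the table; returns 0 if it cannot help.
    using Generator = std::function<std::uint64_t(const std::string &)>;

    void setGenerator(Generator G) { Gen = std::move(G); }

    // Returns the address for Name, or 0 if no definition exists.
    std::uint64_t lookup(const std::string &Name) {
      auto I = Table.find(Name);
      if (I != Table.end())
        return I->second;
      if (Gen) {
        if (std::uint64_t Addr = Gen(Name)) {
          Table[Name] = Addr;  // the generated definition joins the table
          return Addr;
        }
      }
      return 0;
    }

    std::size_t definedCount() const { return Table.size(); }

  private:
    std::map<std::string, std::uint64_t> Table;
    Generator Gen;
  };

  int demoGenerator() {
    // Stand-in for the symbols exported by the current process.
    std::map<std::string, std::uint64_t> ProcessSymbols;
    ProcessSymbols["printf"] = 0x4000;

    ToyGeneratingDylib JD;
    JD.setGenerator([&ProcessSymbols](const std::string &Name) {
      auto I = ProcessSymbols.find(Name);
      return I == ProcessSymbols.end() ? std::uint64_t(0) : I->second;
    });

    assert(JD.definedCount() == 0);         // nothing enumerated up front
    assert(JD.lookup("printf") == 0x4000);  // generated on first lookup
    assert(JD.lookup("missing") == 0);      // generator could not help
    return static_cast<int>(JD.definedCount());
  }

Only the symbol that was actually looked up ends up in the table, so
``demoGenerator()`` returns ``1``. This is the sense in which the set of
process symbols does not have to be enumerated in advance.
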
Future Features
===============

TBD: Speculative compilation. Object Caches.

.. [1] Formats/architectures vary in terms of supported features. MachO and
       ELF tend to have better support than COFF. Patches very welcome!

.. [2] The ``LazyEmittingLayer``, ``RemoteObjectClientLayer`` and
       ``RemoteObjectServerLayer`` do not have counterparts in the new
       system. In the case of ``LazyEmittingLayer`` it was simply no longer
       needed: in ORCv2, deferring compilation until symbols are looked up is
       the default. The removal of ``RemoteObjectClientLayer`` and
       ``RemoteObjectServerLayer`` means that JIT stacks can no longer be split
       across processes; however, this functionality appears not to have been
       used.

.. [3] Sharing a ThreadSafeContext between modules in a concurrent compilation
       can be dangerous: if interdependent modules are loaded on the same
       context, but compiled on different threads, a deadlock may occur (with
       each compile waiting for the other(s) to complete, and the other(s)
       unable to proceed because the context is locked).

.. [4] Mostly. Weak definitions are handled correctly within dylibs, but if
       multiple dylibs provide a weak definition of a symbol each will end up
       with its own definition (similar to how weak symbols in Windows DLLs
       behave). This will be fixed in the future.