forked from OSchip/llvm-project
First draft of the practical guide to atomics.
This is mostly descriptive of the intended state once atomic load and store have landed. llvm-svn: 137145
This commit is contained in:
parent
e0152a73c2
commit
84cd7927b9
|
@ -0,0 +1,295 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
|
||||
"http://www.w3.org/TR/html4/strict.dtd">
|
||||
<html>
|
||||
<head>
|
||||
<title>LLVM Atomic Instructions and Concurrency Guide</title>
|
||||
<link rel="stylesheet" href="llvm.css" type="text/css">
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<h1>
|
||||
LLVM Atomic Instructions and Concurrency Guide
|
||||
</h1>
|
||||
|
||||
<ol>
|
||||
<li><a href="#introduction">Introduction</a></li>
|
||||
<li><a href="#loadstore">Load and store</a></li>
|
||||
<li><a href="#ordering">Atomic orderings</a></li>
|
||||
<li><a href="#otherinst">Other atomic instructions</a></li>
|
||||
<li><a href="#iropt">Atomics and IR optimization</a></li>
|
||||
<li><a href="#codegen">Atomics and Codegen</a></li>
|
||||
</ol>
|
||||
|
||||
<div class="doc_author">
|
||||
<p>Written by Eli Friedman</p>
|
||||
</div>
|
||||
|
||||
<!-- *********************************************************************** -->
|
||||
<h2>
|
||||
<a name="introduction">Introduction</a>
|
||||
</h2>
|
||||
<!-- *********************************************************************** -->
|
||||
|
||||
<div>
|
||||
|
||||
<p>Historically, LLVM has not had very strong support for concurrency; some
|
||||
minimal intrinsics were provided, and <code>volatile</code> was used in some
|
||||
cases to achieve rough semantics in the presence of concurrency. However, this
|
||||
is changing; there are now new instructions which are well-defined in the
|
||||
presence of threads and asynchronous signals, and the model for existing
|
||||
instructions has been clarified in the IR.</p>
|
||||
|
||||
<p>The atomic instructions are designed specifically to provide readable IR and
|
||||
optimized code generation for the following:</p>
|
||||
<ul>
|
||||
<li>The new C++0x <code><atomic></code> header.</li>
|
||||
<li>Proper semantics for Java-style memory, for both <code>volatile</code> and
|
||||
regular shared variables.</li>
|
||||
<li>gcc-compatible <code>__sync_*</code> builtins.</li>
|
||||
<li>Other scenarios with atomic semantics, including <code>static</code>
|
||||
variables with non-trivial constructors in C++.</li>
|
||||
</ul>
|
||||
|
||||
<p>This document is intended to provide a guide to anyone either writing a
|
||||
frontend for LLVM or working on optimization passes for LLVM with a guide
|
||||
for how to deal with instructions with special semantics in the presence of
|
||||
concurrency. This is not intended to be a precise guide to the semantics;
|
||||
the details can get extremely complicated and unreadable, and are not
|
||||
usually necessary.</p>
|
||||
|
||||
</div>
|
||||
|
||||
<!-- *********************************************************************** -->
|
||||
<h2>
|
||||
<a name="loadstore">Load and store</a>
|
||||
</h2>
|
||||
<!-- *********************************************************************** -->
|
||||
|
||||
<div>
|
||||
|
||||
<p>The basic <code>'load'</code> and <code>'store'</code> allow a variety of
|
||||
optimizations, but can have unintuitive results in a concurrent environment.
|
||||
For a frontend writer, the rule is essentially that all memory accessed
|
||||
with basic loads and stores by multiple threads should be protected by a
|
||||
lock or other synchronization; otherwise, you are likely to run into
|
||||
undefined behavior. (Do not use volatile as a substitute for atomics; it
|
||||
might work on some platforms, but does not provide the necessary guarantees
|
||||
in general.)</p>
|
||||
|
||||
<p>From the optimizer's point of view, the rule is that if there
|
||||
are not any instructions with atomic ordering involved, concurrency does not
|
||||
matter, with one exception: if a variable might be visible to another
|
||||
thread or signal handler, a store cannot be inserted along a path where it
|
||||
might not execute otherwise. Note that speculative loads are allowed;
|
||||
a load which is part of a race returns <code>undef</code>, but is not
|
||||
undefined behavior.</p>
|
||||
|
||||
<p>For cases where simple loads and stores are not sufficient, LLVM provides
|
||||
atomic loads and stores with varying levels of guarantees.</p>
|
||||
|
||||
</div>
|
||||
|
||||
<!-- *********************************************************************** -->
|
||||
<h2>
|
||||
<a name="ordering">Atomic orderings</a>
|
||||
</h2>
|
||||
<!-- *********************************************************************** -->
|
||||
|
||||
<div>
|
||||
|
||||
<p>In order to achieve a balance between performance and necessary guarantees,
|
||||
there are six levels of atomicity. They are listed in order of strength;
|
||||
each level includes all the guarantees of the previous level except for
|
||||
Acquire/Release.</p>
|
||||
|
||||
<p>Unordered is the lowest level of atomicity. It essentially guarantees that
|
||||
races produce somewhat sane results instead of having undefined behavior.
|
||||
This is intended to match the Java memory model for shared variables. It
|
||||
cannot be used for synchronization, but is useful for Java and other
|
||||
"safe" languages which need to guarantee that the generated code never
|
||||
exhibits undefined behavior. Note that this guarantee is cheap on common
|
||||
platforms for loads of a native width, but can be expensive or unavailable
|
||||
for wider loads, like a 64-bit load on ARM. (A frontend for a "safe"
|
||||
language would normally split a 64-bit load on ARM into two 32-bit
|
||||
unordered loads.) In terms of the optimizer, this prohibits any
|
||||
transformation that transforms a single load into multiple loads,
|
||||
transforms a store into multiple stores, narrows a store, or stores a
|
||||
value which would not be stored otherwise. Some examples of unsafe
|
||||
optimizations are narrowing an assignment into a bitfield, rematerializing
|
||||
a load, and turning loads and stores into a memcpy call. Reordering
|
||||
unordered operations is safe, though, and optimizers should take
|
||||
advantage of that because unordered operations are common in
|
||||
languages that need them.</p>
|
||||
|
||||
<p>Monotonic is the weakest level of atomicity that can be used in
|
||||
synchronization primitives, although it does not provide any general
|
||||
synchronization. It essentially guarantees that if you take all the
|
||||
operations affecting a specific address, a consistent ordering exists.
|
||||
This corresponds to the C++0x/C1x <code>memory_order_relaxed</code>; see
|
||||
those standards for the exact definition. If you are writing a frontend, do
|
||||
not use the low-level synchronization primitives unless you are compiling
|
||||
a language which requires it or are sure a given pattern is correct. In
|
||||
terms of the optimizer, this can be treated as a read+write on the relevant
|
||||
memory location (and alias analysis will take advantage of that). In
|
||||
addition, it is legal to reorder non-atomic and Unordered loads around
|
||||
Monotonic loads. CSE/DSE and a few other optimizations are allowed, but
|
||||
Monotonic operations are unlikely to be used in ways which would make
|
||||
those optimizations useful.</p>
|
||||
|
||||
<p>Acquire provides a barrier of the sort necessary to acquire a lock to access
|
||||
other memory with normal loads and stores. This corresponds to the
|
||||
C++0x/C1x <code>memory_order_acquire</code>. This is a low-level
|
||||
synchronization primitive. In general, optimizers should treat this like
|
||||
a nothrow call.</p>
|
||||
|
||||
<p>Release is similar to Acquire, but with a barrier of the sort necessary to
|
||||
release a lock.This corresponds to the C++0x/C1x
|
||||
<code>memory_order_release</code>.</p>
|
||||
|
||||
<p>AcquireRelease (<code>acq_rel</code> in IR) provides both an Acquire and a Release barrier.
|
||||
This corresponds to the C++0x/C1x <code>memory_order_acq_rel</code>. In general,
|
||||
optimizers should treat this like a nothrow call.</p>
|
||||
|
||||
<p>SequentiallyConsistent (<code>seq_cst</code> in IR) provides Acquire and/or
|
||||
Release semantics, and in addition guarantees a total ordering exists with
|
||||
all other SequentiallyConsistent operations. This corresponds to the
|
||||
C++0x/C1x <code>memory_order_seq_cst</code>, and Java volatile. The intent
|
||||
of this ordering level is to provide a programming model which is relatively
|
||||
easy to understand. In general, optimizers should treat this like a
|
||||
nothrow call.</p>
|
||||
|
||||
</div>
|
||||
|
||||
<!-- *********************************************************************** -->
|
||||
<h2>
|
||||
<a name="otherinst">Other atomic instructions</a>
|
||||
</h2>
|
||||
<!-- *********************************************************************** -->
|
||||
|
||||
<div>
|
||||
|
||||
<p><code>cmpxchg</code> and <code>atomicrmw</code> are essentially like an
|
||||
atomic load followed by an atomic store (where the store is conditional for
|
||||
<code>cmpxchg</code>), but no other memory operation operation can happen
|
||||
between the load and store.</p>
|
||||
|
||||
<p>A <code>fence</code> provides Acquire and/or Release ordering which is not
|
||||
part of another operation; it is normally used along with Monotonic memory
|
||||
operations. A Monotonic load followed by an Acquire fence is roughly
|
||||
equivalent to an Acquire load.</p>
|
||||
|
||||
<p>Frontends generating atomic instructions generally need to be aware of the
|
||||
target to some degree; atomic instructions are guaranteed to be lock-free,
|
||||
and therefore an instruction which is wider than the target natively supports
|
||||
can be impossible to generate.</p>
|
||||
|
||||
</div>
|
||||
|
||||
<!-- *********************************************************************** -->
|
||||
<h2>
|
||||
<a name="iropt">Atomics and IR optimization</a>
|
||||
</h2>
|
||||
<!-- *********************************************************************** -->
|
||||
|
||||
<div>
|
||||
|
||||
<p>Predicates for optimizer writers to query:
|
||||
<ul>
|
||||
<li>isSimple(): A load or store which is not volatile or atomic. This is
|
||||
what, for example, memcpyopt would check for operations it might
|
||||
transform.
|
||||
<li>isUnordered(): A load or store which is not volatile and at most
|
||||
Unordered. This would be checked, for example, by LICM before hoisting
|
||||
an operation.
|
||||
<li>mayReadFromMemory()/mayWriteToMemory(): Existing predicate, but note
|
||||
that they returns true for any operation which is volatile or at least
|
||||
Monotonic.
|
||||
<li>Alias analysis: Note that AA will return ModRef for anything Acquire or
|
||||
Release, and for the address accessed by any Monotonic operation.
|
||||
</ul>
|
||||
|
||||
<p>There are essentially two components to supporting atomic operations. The
|
||||
first is making sure to query isSimple() or isUnordered() instead
|
||||
of isVolatile() before transforming an operation. The other piece is
|
||||
making sure that a transform does not end up replacing, for example, an
|
||||
Unordered operation with a non-atomic operation. Most of the other
|
||||
necessary checks automatically fall out from existing predicates and
|
||||
alias analysis queries.</p>
|
||||
|
||||
<p>Some examples of how optimizations interact with various kinds of atomic
|
||||
operations:
|
||||
<ul>
|
||||
<li>memcpyopt: An atomic operation cannot be optimized into part of a
|
||||
memcpy/memset, including unordered loads/stores. It can pull operations
|
||||
across some atomic operations.
|
||||
<li>LICM: Unordered loads/stores can be moved out of a loop. It just treats
|
||||
monotonic operations like a read+write to a memory location, and anything
|
||||
stricter than that like a nothrow call.
|
||||
<li>DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores
|
||||
can be DSE'ed in some cases, but it's tricky to reason about, and not
|
||||
especially important.
|
||||
<li>Folding a load: Any atomic load from a constant global can be
|
||||
constant-folded, because it cannot be observed. Similar reasoning allows
|
||||
scalarrepl with atomic loads and stores.
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
|
||||
<!-- *********************************************************************** -->
|
||||
<h2>
|
||||
<a name="codegen">Atomics and Codegen</a>
|
||||
</h2>
|
||||
<!-- *********************************************************************** -->
|
||||
|
||||
<div>
|
||||
|
||||
<p>Atomic operations are represented in the SelectionDAG with
|
||||
<code>ATOMIC_*</code> opcodes. On architectures which use barrier
|
||||
instructions for all atomic ordering (like ARM), appropriate fences are
|
||||
split out as the DAG is built.</p>
|
||||
|
||||
<p>The MachineMemOperand for all atomic operations is currently marked as
|
||||
volatile; this is not correct in the IR sense of volatile, but CodeGen
|
||||
handles anything marked volatile very conservatively. This should get
|
||||
fixed at some point.</p>
|
||||
|
||||
<p>The implementation of atomics on LL/SC architectures (like ARM) is currently
|
||||
a bit of a mess; there is a lot of copy-pasted code across targets, and
|
||||
the representation is relatively unsuited to optimization (it would be nice
|
||||
to be able to optimize loops involving cmpxchg etc.).</p>
|
||||
|
||||
<p>On x86, all atomic loads generate a <code>MOV</code>.
|
||||
SequentiallyConsistent stores generate an <code>XCHG</code>, other stores
|
||||
generate a <code>MOV</code>. SequentiallyConsistent fences generate an
|
||||
<code>MFENCE</code>, other fences do not cause any code to be generated.
|
||||
cmpxchg uses the <code>LOCK CMPXCHG</code> instruction.
|
||||
<code>atomicrmw xchg</code> uses <code>XCHG</code>,
|
||||
<code>atomicrmw add</code> and <code>atomicrmw sub</code> use
|
||||
<code>XADD</code>, and all other <code>atomicrmw</code> operations generate
|
||||
a loop with <code>LOCK CMPXCHG</code>. Depending on the users of the
|
||||
result, some <code>atomicrmw</code> operations can be translated into
|
||||
operations like <code>LOCK AND</code>, but that does not work in
|
||||
general.</p>
|
||||
|
||||
<p>On ARM, MIPS, and many other RISC architectures, Acquire, Release, and
|
||||
SequentiallyConsistent semantics require barrier instructions
|
||||
for every such operation. Loads and stores generate normal instructions.
|
||||
<code>atomicrmw</code> and <code>cmpxchg</code> generate LL/SC loops.</p>
|
||||
|
||||
</div>
|
||||
|
||||
<!-- *********************************************************************** -->
|
||||
|
||||
<hr>
|
||||
<address>
|
||||
<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
|
||||
src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
|
||||
<a href="http://validator.w3.org/check/referer"><img
|
||||
src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
|
||||
|
||||
<a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br>
|
||||
Last modified: $Date: 2011-08-09 02:07:00 -0700 (Tue, 09 Aug 2011) $
|
||||
</address>
|
||||
|
||||
</body>
|
||||
</html>
|
Loading…
Reference in New Issue