Some reorganization of atomic docs. Added explicit section for NonAtomic. Added example for illegal non-atomic operation.

llvm-svn: 137520
Eli Friedman 2011-08-12 21:50:54 +00:00
parent f15dfe5818
commit c13f05c978
1 changed files with 111 additions and 32 deletions


@@ -14,8 +14,8 @@
<ol>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#loadstore">Load and store</a></li>
<li><a href="#otherinst">Other atomic instructions</a></li>
<li><a href="#outsideatomic">Optimization outside atomic</a></li>
<li><a href="#atomicinst">Atomic instructions</a></li>
<li><a href="#ordering">Atomic orderings</a></li>
<li><a href="#iropt">Atomics and IR optimization</a></li>
<li><a href="#codegen">Atomics and Codegen</a></li>
@@ -75,51 +75,84 @@ instructions has been clarified in the IR.</p>
<!-- *********************************************************************** -->
<h2>
<a name="loadstore">Load and store</a>
<a name="outsideatomic">Optimization outside atomic</a>
</h2>
<!-- *********************************************************************** -->
<div>
<p>The basic <code>'load'</code> and <code>'store'</code> allow a variety of
optimizations, but can have unintuitive results in a concurrent environment.
For a frontend writer, the rule is essentially that all memory accessed
with basic loads and stores by multiple threads should be protected by a
lock or other synchronization; otherwise, you are likely to run into
undefined behavior. (Do not use volatile as a substitute for atomics; it
might work on some platforms, but does not provide the necessary guarantees
in general.)</p>
optimizations, but can lead to undefined results in a concurrent environment;
see <a href="#o_notatomic">NotAtomic</a>. This section covers the one
optimizer restriction that applies in concurrent environments; it gets an
extended description because any optimization dealing with stores needs to
be aware of it.</p>
<p>From the optimizer's point of view, the rule is that if no instructions
with atomic ordering are involved, concurrency does
not matter, with one exception: if a variable might be visible to another
thread or signal handler, a store cannot be inserted along a path where it
might not execute otherwise. For example, suppose LICM wants to take all the
loads and stores in a loop to and from a particular address and promote them
to registers. LICM is not allowed to insert an unconditional store after
the loop with the computed value unless a store unconditionally executes
within the loop. Note that speculative loads are allowed; a load which
might not execute otherwise. Take the following example:</p>
<pre>
/* C code, for readability; run through clang -O2 -S -emit-llvm to get
equivalent IR */
int x;
void f(int* a) {
for (int i = 0; i &lt; 100; i++) {
if (a[i])
x += 1;
}
}
</pre>
<p>The following is equivalent in non-concurrent situations:</p>
<pre>
int x;
void f(int* a) {
int xtemp = x;
for (int i = 0; i &lt; 100; i++) {
if (a[i])
xtemp += 1;
}
x = xtemp;
}
</pre>
<p>However, LLVM is not allowed to transform the former into the latter: if
every <code>a[i]</code> is zero, the original function never writes
<code>x</code>, but the transformed one always does, which could introduce
undefined behavior if another thread can access <code>x</code> at the same
time. (This example is particularly of interest because before the
concurrency model was implemented, LLVM would perform this transformation.)</p>
<p>Note that speculative loads are allowed; a load which
is part of a race returns <code>undef</code>, but does not have undefined
behavior.</p>
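<p>One way to keep <code>x</code> in a register without breaking this rule is
to store it after the loop only under a flag recording whether the loop stored
at all. The sketch below is hand-written C (the name <code>f_promoted</code> is
hypothetical, and this is not claimed to be what LLVM's LICM emits). As with
the example above, read it as a description of the IR transformation: at the
C source level the early read of <code>x</code> is itself a race, whereas in
the IR a racing load merely yields <code>undef</code>.</p>

```c
/* Global as in the example above. */
int x;

/* Hypothetical hand-promoted variant of f(). x is loaded once before the
   loop (a speculative load, which the IR rules permit), but it is stored
   only if the original loop would have stored to x at least once, so no
   store is added to a path that had none. */
void f_promoted(int *a) {
  int xtemp = x;  /* speculative load */
  int stored = 0; /* set once the original code would have written x */
  for (int i = 0; i < 100; i++) {
    if (a[i]) {
      xtemp += 1;
      stored = 1;
    }
  }
  if (stored)
    x = xtemp;    /* only on paths where the original code stored */
}
```

<p>When every <code>a[i]</code> is zero, <code>f_promoted</code> performs no
store to <code>x</code>, exactly like the original loop.</p>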
<p>For cases where simple loads and stores are not sufficient, LLVM provides
atomic loads and stores with varying levels of guarantees.</p>
</div>
<!-- *********************************************************************** -->
<h2>
<a name="otherinst">Other atomic instructions</a>
<a name="atomicinst">Atomic instructions</a>
</h2>
<!-- *********************************************************************** -->
<div>
<p>For cases where simple loads and stores are not sufficient, LLVM provides
various atomic instructions. The exact guarantees provided depend on the
ordering; see <a href="#ordering">Atomic orderings</a>.</p>
<p><code>load atomic</code> and <code>store atomic</code> provide the same
basic functionality as non-atomic loads and stores, but provide additional
guarantees in situations where threads and signals are involved.</p>
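<p>At the source level, C11 <code>&lt;stdatomic.h&gt;</code> is one way to
produce these instructions: clang typically lowers
<code>atomic_load_explicit</code> and <code>atomic_store_explicit</code> to
<code>load atomic</code> and <code>store atomic</code> with the corresponding
ordering. The sketch below (the names <code>publish</code>,
<code>try_consume</code>, <code>payload</code> and <code>ready</code> are
hypothetical) uses a Release store and an Acquire load to hand ordinary data
from one thread to another.</p>

```c
#include <stdatomic.h>

static int payload;          /* ordinary (NotAtomic) data */
static atomic_int ready = 0; /* 0 until the payload is published */

/* Writer: fill in the payload, then publish it with a release store. */
void publish(int value) {
  payload = value;
  atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: if the acquire load sees 1, the payload written before the
   release store is guaranteed to be visible. Returns -1 (a hypothetical
   sentinel) if nothing has been published yet. */
int try_consume(void) {
  if (atomic_load_explicit(&ready, memory_order_acquire))
    return payload;
  return -1;
}
```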
<p><code>cmpxchg</code> and <code>atomicrmw</code> are essentially like an
atomic load followed by an atomic store (where the store is conditional for
<code>cmpxchg</code>), but no other memory operation can happen between
the load and store. Note that our cmpxchg does not have quite as many
options for making cmpxchg weaker as the C++0x version.</p>
<code>cmpxchg</code>), but no other memory operation can happen on any thread
between the load and store. Note that LLVM's cmpxchg does not provide quite
as many options as the C++0x version.</p>
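<p>C11 exposes both kinds of operation directly; clang typically lowers
<code>atomic_fetch_add_explicit</code> to an <code>atomicrmw add</code> and
the compare-exchange functions to <code>cmpxchg</code>. The helpers below are a
hypothetical sketch, not LLVM API: <code>bump</code> is a single
read-modify-write, and <code>store_max</code> shows the usual retry loop built
on compare-exchange.</p>

```c
#include <stdatomic.h>

/* Increment a shared counter in one indivisible step. Returns the
   previous value. */
int bump(atomic_int *counter) {
  return atomic_fetch_add_explicit(counter, 1, memory_order_seq_cst);
}

/* Atomically raise *target to at least value. Returns the value observed
   before the update (or the current value if no update was needed). */
int store_max(atomic_int *target, int value) {
  int old = atomic_load_explicit(target, memory_order_relaxed);
  /* On failure, the compare-exchange writes the current contents of
     *target into old, so the loop re-checks against the fresh value. */
  while (old < value &&
         !atomic_compare_exchange_strong_explicit(target, &old, value,
                                                  memory_order_seq_cst,
                                                  memory_order_relaxed)) {
    /* another thread changed *target between the load and the exchange */
  }
  return old;
}
```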
<p>A <code>fence</code> provides Acquire and/or Release ordering which is not
part of another operation; it is normally used along with Monotonic memory
@@ -146,6 +179,54 @@ instructions has been clarified in the IR.</p>
each level includes all the guarantees of the previous level except for
Acquire/Release.</p>
<!-- ======================================================================= -->
<h3>
<a name="o_notatomic">NotAtomic</a>
</h3>
<div>
<p>NotAtomic is the obvious one: a load or store which is not atomic. (This
isn't really a level of atomicity, but is listed here for comparison.) This
is essentially a regular load or store. If a memory location is accessed
from multiple threads at the same time and at least one of the accesses is
a store, loads from that location return <code>undef</code>.</p>
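<p>For example, a counter that is only ever accessed with NotAtomic loads and
stores stays race-free if every access happens while holding the same lock. A
minimal sketch using POSIX threads (the names <code>hits</code>,
<code>hits_lock</code>, <code>record_hit</code> and <code>read_hits</code> are
hypothetical):</p>

```c
#include <pthread.h>

/* hits is accessed only with plain (NotAtomic) loads and stores, so every
   access goes through hits_lock. */
static pthread_mutex_t hits_lock = PTHREAD_MUTEX_INITIALIZER;
static int hits;

void record_hit(void) {
  pthread_mutex_lock(&hits_lock);
  hits += 1; /* plain load + store, ordered by the lock */
  pthread_mutex_unlock(&hits_lock);
}

int read_hits(void) {
  pthread_mutex_lock(&hits_lock);
  int n = hits; /* plain load; cannot race with record_hit() */
  pthread_mutex_unlock(&hits_lock);
  return n;
}
```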
<dl>
<dt>Relevant standard</dt>
<dd>This is intended to match shared variables in C/C++, and to be used
in any other context where memory access is necessary and
a race is impossible.</dd>
<dt>Notes for frontends</dt>
<dd>The rule is essentially that all memory accessed with basic loads and
stores by multiple threads should be protected by a lock or other
synchronization; otherwise, you are likely to run into undefined
behavior. If your frontend is for a "safe" language like Java,
use Unordered to load and store any shared variable. Note that NotAtomic
volatile loads and stores are not properly atomic; do not try to use
them as a substitute. (Per the C/C++ standards, volatile does provide
some limited guarantees around asynchronous signals, but atomics are
generally a better solution.)</dd>
<dt>Notes for optimizers</dt>
<dd>Introducing loads to shared variables along a codepath where they would
not otherwise exist is allowed; introducing stores to shared variables
is not. See <a href="#outsideatomic">Optimization outside
atomic</a>.</dd>
<dt>Notes for code generation</dt>
<dd>The one interesting restriction here is that the generated code must
not write to any bytes outside those covered by the store. This is mostly
relevant to unaligned stores: it is not allowed in general to convert
an unaligned store into two aligned stores of the same width as the
unaligned store. Backends are also expected to generate an i8 store
as an i8 store, and not an instruction which writes to surrounding
bytes. (If you are writing a backend for an architecture which cannot
satisfy these restrictions and cares about concurrency, please send an
email to llvmdev.)</dd>
</dl>
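<p>To see why a byte store must stay a byte store, consider two adjacent
one-byte fields written by different threads. The sketch below is
hypothetical: <code>set_a_widened</code> models a store that was widened into
a two-byte read-modify-write, and its <code>between</code> callback stands in
for another thread's store to <code>b</code> landing between the load and the
store. This single-threaded simulation makes the lost update deterministic.</p>

```c
#include <string.h>

/* Two adjacent one-byte fields. In C11 these are separate memory
   locations, so one thread may write a while another writes b. */
struct flags {
  unsigned char a;
  unsigned char b;
};

/* What the backend must emit: a single byte (i8) store touching only a. */
void set_a_byte(struct flags *f, unsigned char v) {
  f->a = v;
}

/* What it must NOT emit: load both bytes, patch a, write both bytes back. */
void set_a_widened(struct flags *f, unsigned char v,
                   void (*between)(struct flags *)) {
  unsigned char both[2];
  memcpy(both, f, sizeof both); /* read a and b together */
  if (between)
    between(f);                 /* simulated concurrent write to b */
  both[0] = v;                  /* patch only the a byte */
  memcpy(f, both, sizeof both); /* also writes back a stale b */
}

/* Simulated store from the other thread. */
void other_thread_sets_b(struct flags *f) {
  f->b = 1;
}
```

<p>With the byte store, both updates survive; with the widened version, the
other thread's write to <code>b</code> is silently undone.</p>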
</div>
<!-- ======================================================================= -->
<h3>
<a name="o_unordered">Unordered</a>
@@ -379,24 +460,22 @@ instructions has been clarified in the IR.</p>
<ul>
<li>isSimple(): A load or store which is not volatile or atomic. This is
what, for example, memcpyopt would check for operations it might
transform.
transform.</li>
<li>isUnordered(): A load or store which is not volatile and at most
Unordered. This would be checked, for example, by LICM before hoisting
an operation.
an operation.</li>
<li>mayReadFromMemory()/mayWriteToMemory(): Existing predicate, but note
that they return true for any operation which is volatile or at least
Monotonic.
Monotonic.</li>
<li>Alias analysis: Note that AA will return ModRef for anything Acquire or
Release, and for the address accessed by any Monotonic operation.
Release, and for the address accessed by any Monotonic operation.</li>
</ul>
<p>There are essentially two components to supporting atomic operations. The
first is making sure to query isSimple() or isUnordered() instead
of isVolatile() before transforming an operation. The other piece is
making sure that a transform does not end up replacing, for example, an
Unordered operation with a non-atomic operation. Most of the other
necessary checks automatically fall out from existing predicates and
alias analysis queries.</p>
<p>To support optimizing around atomic operations, make sure you are using
the right predicates; everything should work if that is done. If your
pass should optimize some atomic operations (Unordered operations in
particular), make sure it doesn't replace an atomic load or store with
a non-atomic operation.</p>
<p>Some examples of how optimizations interact with various kinds of atomic
operations: