Some reorganization of atomic docs. Added explicit section for NonAtomic. Added example for illegal non-atomic operation.
llvm-svn: 137520
commit c13f05c978 (parent f15dfe5818)
@ -14,8 +14,8 @@
|
|||
|
||||
<ol>
|
||||
<li><a href="#introduction">Introduction</a></li>
|
||||
<li><a href="#loadstore">Load and store</a></li>
|
||||
<li><a href="#otherinst">Other atomic instructions</a></li>
|
||||
<li><a href="#outsideatomic">Optimization outside atomic</a></li>
|
||||
<li><a href="#atomicinst">Atomic instructions</a></li>
|
||||
<li><a href="#ordering">Atomic orderings</a></li>
|
||||
<li><a href="#iropt">Atomics and IR optimization</a></li>
|
||||
<li><a href="#codegen">Atomics and Codegen</a></li>
|
||||
|
@ -75,51 +75,84 @@ instructions has been clarified in the IR.</p>
|
|||
|
||||
<!-- *********************************************************************** -->
|
||||
<h2>
|
||||
<a name="loadstore">Load and store</a>
|
||||
<a name="outsideatomic">Optimization outside atomic</a>
|
||||
</h2>
|
||||
<!-- *********************************************************************** -->
|
||||
|
||||
<div>
|
||||
|
||||
<p>The basic <code>'load'</code> and <code>'store'</code> allow a variety of
|
||||
optimizations, but can have unintuitive results in a concurrent environment.
|
||||
For a frontend writer, the rule is essentially that all memory accessed
|
||||
with basic loads and stores by multiple threads should be protected by a
|
||||
lock or other synchronization; otherwise, you are likely to run into
|
||||
undefined behavior. (Do not use volatile as a substitute for atomics; it
|
||||
might work on some platforms, but does not provide the necessary guarantees
|
||||
in general.)</p>
|
||||
optimizations, but can lead to undefined results in a concurrent environment;
|
||||
see <a href="#o_notatomic">NotAtomic</a>. This section specifically goes
|
||||
into the one optimizer restriction which applies in concurrent environments,
|
||||
which gets a bit more of an extended description because any optimization
|
||||
dealing with stores needs to be aware of it.</p>
|
||||
|
||||
<p>From the optimizer's point of view, the rule is that if there
|
||||
are not any instructions with atomic ordering involved, concurrency does
|
||||
not matter, with one exception: if a variable might be visible to another
|
||||
thread or signal handler, a store cannot be inserted along a path where it
|
||||
might not execute otherwise. For example, suppose LICM wants to take all the
|
||||
loads and stores in a loop to and from a particular address and promote them
|
||||
to registers. LICM is not allowed to insert an unconditional store after
|
||||
the loop with the computed value unless a store unconditionally executes
|
||||
within the loop. Note that speculative loads are allowed; a load which
|
||||
might not execute otherwise. Take the following example:</p>
|
||||
|
||||
<pre>
/* C code, for readability; run through clang -O2 -S -emit-llvm to get
   equivalent IR */
int x;
void f(int* a) {
  for (int i = 0; i < 100; i++) {
    if (a[i])
      x += 1;
  }
}
</pre>
|
||||
|
||||
<p>The following is equivalent in non-concurrent situations:</p>
|
||||
|
||||
<pre>
int x;
void f(int* a) {
  int xtemp = x;
  for (int i = 0; i < 100; i++) {
    if (a[i])
      xtemp += 1;
  }
  x = xtemp;
}
</pre>
|
||||
|
||||
<p>However, LLVM is not allowed to transform the former to the latter: it could
|
||||
introduce undefined behavior if another thread can access x at the same time.
|
||||
(This example is particularly of interest because before the concurrency model
|
||||
was implemented, LLVM would perform this transformation.)</p>
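<p>If <code>x</code> really is shared between threads, the source itself has to
say so. The following is a minimal sketch, assuming a C11 compiler with
<code>&lt;stdatomic.h&gt;</code>; the use of <code>atomic_int</code> and a
relaxed read-modify-write is an illustrative choice, not something this
document prescribes. A relaxed C11 operation corresponds to LLVM's Monotonic
ordering, so each increment is expected to become an atomic read-modify-write
rather than a plain load and store.</p>

```c
#include <stdatomic.h>

/* Shared counter. Declared atomic (an assumption of this sketch) so that
   concurrent updates from several threads are well-defined. */
static atomic_int x;

void f(const int *a) {
  for (int i = 0; i < 100; i++) {
    if (a[i])
      /* Atomic increment; relaxed ordering only guarantees atomicity of
         the update itself, not ordering against other memory accesses. */
      atomic_fetch_add_explicit(&x, 1, memory_order_relaxed);
  }
}
```

<p>With this form, the increment still only happens on iterations where
<code>a[i]</code> is nonzero, so no store is introduced on a path that did not
already have one.</p>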
|
||||
|
||||
<p>Note that speculative loads are allowed; a load which
|
||||
is part of a race returns <code>undef</code>, but does not have undefined
|
||||
behavior.</p>
|
||||
|
||||
<p>For cases where simple loads and stores are not sufficient, LLVM provides
|
||||
atomic loads and stores with varying levels of guarantees.</p>
|
||||
|
||||
</div>
|
||||
|
||||
<!-- *********************************************************************** -->
|
||||
<h2>
|
||||
<a name="otherinst">Other atomic instructions</a>
|
||||
<a name="atomicinst">Atomic instructions</a>
|
||||
</h2>
|
||||
<!-- *********************************************************************** -->
|
||||
|
||||
<div>
|
||||
|
||||
<p>For cases where simple loads and stores are not sufficient, LLVM provides
|
||||
various atomic instructions. The exact guarantees provided depend on the
|
||||
ordering; see <a href="#ordering">Atomic orderings</a>.</p>
|
||||
|
||||
<p><code>load atomic</code> and <code>store atomic</code> provide the same
|
||||
basic functionality as non-atomic loads and stores, but provide additional
|
||||
guarantees in situations where threads and signals are involved.</p>
|
||||
|
||||
<p><code>cmpxchg</code> and <code>atomicrmw</code> are essentially like an
|
||||
atomic load followed by an atomic store (where the store is conditional for
|
||||
<code>cmpxchg</code>), but no other memory operation can happen between
|
||||
the load and store. Note that our cmpxchg does not have quite as many
|
||||
options for making cmpxchg weaker as the C++0x version.</p>
|
||||
<code>cmpxchg</code>), but no other memory operation can happen on any thread
|
||||
between the load and store. Note that LLVM's cmpxchg does not provide quite
|
||||
as many options as the C++0x version.</p>
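<p>The "load followed by a conditional store" behaviour can be seen from the
source level. The following is a minimal sketch, assuming C11
<code>&lt;stdatomic.h&gt;</code>; the helper name
<code>atomic_store_max</code> is hypothetical and chosen only for this
illustration. It uses the strong, sequentially consistent compare-exchange,
which is the form closest to a plain <code>cmpxchg</code>.</p>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: atomically raise *target to at least `value`.
   Returns true if this call performed the store. */
bool atomic_store_max(atomic_int *target, int value) {
  int observed = atomic_load(target);
  while (observed < value) {
    /* Succeeds only if *target still equals `observed`; no other thread's
       memory operation can slip in between the compare and the store.
       On failure, `observed` is refreshed with the current contents. */
    if (atomic_compare_exchange_strong(target, &observed, value))
      return true;
  }
  return false;
}
```

<p>The retry loop is what makes the conditional store useful: when another
thread changes the value first, the exchange fails and the comparison is
repeated against the new contents.</p>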
|
||||
|
||||
<p>A <code>fence</code> provides Acquire and/or Release ordering which is not
|
||||
part of another operation; it is normally used along with Monotonic memory
|
||||
|
@ -146,6 +179,54 @@ instructions has been clarified in the IR.</p>
|
|||
each level includes all the guarantees of the previous level except for
|
||||
Acquire/Release.</p>
|
||||
|
||||
<!-- ======================================================================= -->
|
||||
<h3>
|
||||
<a name="o_notatomic">NotAtomic</a>
|
||||
</h3>
|
||||
|
||||
<div>
|
||||
|
||||
<p>NotAtomic is the obvious: a load or store which is not atomic. (This isn't
|
||||
really a level of atomicity, but is listed here for comparison.) This is
|
||||
essentially a regular load or store. If code accesses a memory location
|
||||
from multiple threads at the same time, the resulting loads return
|
||||
'undef'.</p>
|
||||
|
||||
<dl>
|
||||
<dt>Relevant standard</dt>
|
||||
<dd>This is intended to match shared variables in C/C++, and to be used
|
||||
in any other context where memory access is necessary, and
|
||||
a race is impossible.</dd>
|
||||
<dt>Notes for frontends</dt>
|
||||
<dd>The rule is essentially that all memory accessed with basic loads and
|
||||
stores by multiple threads should be protected by a lock or other
|
||||
synchronization; otherwise, you are likely to run into undefined
|
||||
behavior. If your frontend is for a "safe" language like Java,
|
||||
use Unordered to load and store any shared variable. Note that NotAtomic
|
||||
volatile loads and stores are not properly atomic; do not try to use
|
||||
them as a substitute. (Per the C/C++ standards, volatile does provide
|
||||
some limited guarantees around asynchronous signals, but atomics are
|
||||
generally a better solution.)</dd>
|
||||
<dt>Notes for optimizers</dt>
|
||||
<dd>Introducing loads to shared variables along a codepath where they would
|
||||
not otherwise exist is allowed; introducing stores to shared variables
|
||||
is not. See <a href="#outsideatomic">Optimization outside
|
||||
atomic</a>.</dd>
|
||||
<dt>Notes for code generation</dt>
|
||||
<dd>The one interesting restriction here is that it is not allowed to write
|
||||
to bytes outside of the bytes relevant to a store. This is mostly
|
||||
relevant to unaligned stores: it is not allowed in general to convert
|
||||
an unaligned store into two aligned stores of the same width as the
|
||||
unaligned store. Backends are also expected to generate an i8 store
|
||||
as an i8 store, and not an instruction which writes to surrounding
|
||||
bytes. (If you are writing a backend for an architecture which cannot
|
||||
satisfy these restrictions and cares about concurrency, please send an
|
||||
email to llvmdev.)</dd>
|
||||
</dl>
|
||||
|
||||
</div>
|
||||
|
||||
|
||||
<!-- ======================================================================= -->
|
||||
<h3>
|
||||
<a name="o_unordered">Unordered</a>
|
||||
|
@ -379,24 +460,22 @@ instructions has been clarified in the IR.</p>
|
|||
<ul>
|
||||
<li>isSimple(): A load or store which is not volatile or atomic. This is
|
||||
what, for example, memcpyopt would check for operations it might
|
||||
transform.
|
||||
transform.</li>
|
||||
<li>isUnordered(): A load or store which is not volatile and at most
|
||||
Unordered. This would be checked, for example, by LICM before hoisting
|
||||
an operation.
|
||||
an operation.</li>
|
||||
<li>mayReadFromMemory()/mayWriteToMemory(): Existing predicate, but note
|
||||
that they return true for any operation which is volatile or at least
|
||||
Monotonic.
|
||||
Monotonic.</li>
|
||||
<li>Alias analysis: Note that AA will return ModRef for anything Acquire or
|
||||
Release, and for the address accessed by any Monotonic operation.
|
||||
Release, and for the address accessed by any Monotonic operation.</li>
|
||||
</ul>
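<p>The relationship between the first two predicates can be modelled in a few
lines. This is a toy model in C, not LLVM's implementation: the
<code>mem_op</code> struct, the <code>ordering</code> enum and both function
names are hypothetical stand-ins that only mirror the descriptions in the list
above.</p>

```c
#include <stdbool.h>

/* Toy model of the ordering levels, weakest first (names are assumptions). */
enum ordering {
  NOT_ATOMIC, UNORDERED, MONOTONIC, ACQUIRE, RELEASE, ACQ_REL, SEQ_CST
};

/* Hypothetical stand-in for a load or store instruction. */
struct mem_op {
  enum ordering ordering;
  bool is_volatile;
};

/* Mirrors the isSimple() description: neither volatile nor atomic. */
bool is_simple(const struct mem_op *op) {
  return !op->is_volatile && op->ordering == NOT_ATOMIC;
}

/* Mirrors the isUnordered() description: not volatile, at most Unordered. */
bool is_unordered(const struct mem_op *op) {
  return !op->is_volatile && op->ordering <= UNORDERED;
}
```

<p>In this model every simple operation is also unordered, but an Unordered
atomic operation is not simple, which is why a pass that only needs the weaker
guarantee can check the second predicate and handle more operations.</p>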
|
||||
|
||||
<p>There are essentially two components to supporting atomic operations. The
|
||||
first is making sure to query isSimple() or isUnordered() instead
|
||||
of isVolatile() before transforming an operation. The other piece is
|
||||
making sure that a transform does not end up replacing, for example, an
|
||||
Unordered operation with a non-atomic operation. Most of the other
|
||||
necessary checks automatically fall out from existing predicates and
|
||||
alias analysis queries.</p>
|
||||
<p>To support optimizing around atomic operations, make sure you are using
|
||||
the right predicates; everything should work if that is done. If your
|
||||
pass should optimize some atomic operations (Unordered operations in
|
||||
particular), make sure it doesn't replace an atomic load or store with
|
||||
a non-atomic operation.</p>
|
||||
|
||||
<p>Some examples of how optimizations interact with various kinds of atomic
|
||||
operations:
|
||||
|