From a35a35fc90c939a241982c63e4cc27f0cab6ad68 Mon Sep 17 00:00:00 2001 From: Howard Hinnant Date: Wed, 6 Oct 2010 16:15:10 +0000 Subject: [PATCH] Updated docs with three design options llvm-svn: 115791 --- libcxx/www/atomic_design.html | 416 +---------------------------- libcxx/www/atomic_design_a.html | 126 +++++++++ libcxx/www/atomic_design_b.html | 247 +++++++++++++++++ libcxx/www/atomic_design_c.html | 458 ++++++++++++++++++++++++++++++++ 4 files changed, 839 insertions(+), 408 deletions(-) create mode 100644 libcxx/www/atomic_design_a.html create mode 100644 libcxx/www/atomic_design_b.html create mode 100644 libcxx/www/atomic_design_c.html diff --git a/libcxx/www/atomic_design.html b/libcxx/www/atomic_design.html index 0750b733c1f5..36e73244d911 100644 --- a/libcxx/www/atomic_design.html +++ b/libcxx/www/atomic_design.html @@ -36,422 +36,22 @@

-The <atomic> header is one of the most closely coupled headers to -the compiler. Ideally when you invoke any function from -<atomic>, it should result in highly optimized assembly being -inserted directly into your application ... assembly that is not otherwise -representable by higher level C or C++ expressions. The design of the libc++ -<atomic> header started with this goal in mind. A secondary, but -still very important goal is that the compiler should have to do minimal work to -faciliate the implementaiton of <atomic>. Without this second -goal, then practically speaking, the libc++ <atomic> header would -be doomed to be a barely supported, second class citizen on almost every -platform. +There are currently three designs under consideration. They differ in where most +of the implementation work is done. The functionality exposed to the customer +should be identical (and conforming) for all three designs.

-

Goals:

- -
    -
  • Optimal code generation for atomic operations
  • -
  • Minimal effort for the compiler to achieve goal 1 on any given platform
  • -
  • Conformance to the C++0X draft standard
  • -
- -

-The purpose of this document is to inform compiler writers what they need to do -to enable a high performance libc++ <atomic> with minimal effort. -

- -

The minimal work that must be done for a conforming <atomic>

- -

-The only "atomic" operations that must actually be lock free in -<atomic> are represented by the following compiler intrinsics: -

- -
-__atomic_flag__
-__atomic_exchange_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
-{
-    unique_lock<mutex> _(some_mutex);
-    __atomic_flag__ result = *obj;
-    *obj = desr;
-    return result;
-}
-
-void
-__atomic_store_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
-{
-    unique_lock<mutex> _(some_mutex);
-    *obj = desr;
-}
-
- -

-Where: -

- -
    +
    1. -If __has_feature(__atomic_flag) evaluates to 1 in the preprocessor then -the compiler must define __atomic_flag__ (e.g. as a typedef to -int). +Minimal work for the library
    2. -If __has_feature(__atomic_flag) evaluates to 0 in the preprocessor then -the library defines __atomic_flag__ as a typedef to bool. +Something in between
    3. -

      -To communicate that the above intrinsics are available, the compiler must -arrange for __has_feature to return 1 when fed the intrinsic name -appended with an '_' and the mangled type name of __atomic_flag__. -

      -

      -For example if __atomic_flag__ is unsigned int: -

      -
      -__has_feature(__atomic_flag) == 1
      -__has_feature(__atomic_exchange_seq_cst_j) == 1
      -__has_feature(__atomic_store_seq_cst_j) == 1
      -
      -typedef unsigned int __atomic_flag__; 
      -
      -unsigned int __atomic_exchange_seq_cst(unsigned int volatile*, unsigned int)
      -{
      -   // ...
      -}
      -
      -void __atomic_store_seq_cst(unsigned int volatile*, unsigned int)
      -{
      -   // ...
      -}
      -
      +Minimal work for the front end
    4. -
- -

-That's it! Compiler writers do the above and you've got a fully conforming -(though sub-par performance) <atomic> header! -

- -

Recommended work for a higher performance <atomic>

- -

-It would be good if the above intrinsics worked with all integral types plus -void*. Because this may not be possible to do in a lock-free manner for -all integral types on all platforms, a compiler must communicate each type that -an intrinsic works with. For example if __atomic_exchange_seq_cst works -for all types except for long long and unsigned long long -then: -

- -
-__has_feature(__atomic_exchange_seq_cst_b) == 1  // bool
-__has_feature(__atomic_exchange_seq_cst_c) == 1  // char
-__has_feature(__atomic_exchange_seq_cst_a) == 1  // signed char
-__has_feature(__atomic_exchange_seq_cst_h) == 1  // unsigned char
-__has_feature(__atomic_exchange_seq_cst_Ds) == 1 // char16_t
-__has_feature(__atomic_exchange_seq_cst_Di) == 1 // char32_t
-__has_feature(__atomic_exchange_seq_cst_w) == 1  // wchar_t
-__has_feature(__atomic_exchange_seq_cst_s) == 1  // short
-__has_feature(__atomic_exchange_seq_cst_t) == 1  // unsigned short
-__has_feature(__atomic_exchange_seq_cst_i) == 1  // int
-__has_feature(__atomic_exchange_seq_cst_j) == 1  // unsigned int
-__has_feature(__atomic_exchange_seq_cst_l) == 1  // long
-__has_feature(__atomic_exchange_seq_cst_m) == 1  // unsigned long
-__has_feature(__atomic_exchange_seq_cst_Pv) == 1 // void*
-
- -

-Note that only the __has_feature flag is decorated with the argument -type. The name of the compiler intrinsic is not decorated, but instead works -like a C++ overloaded function. -

- -

-Additionally there are other intrinsics besides -__atomic_exchange_seq_cst and __atomic_store_seq_cst. They -are optional. But if the compiler can generate faster code than provided by the -library, then clients will benefit from the compiler writer's expertise and -knowledge of the targeted platform. -

- -

-Below is the complete list of sequentially consistent intrinsics, and -their library implementations. Template syntax is used to indicate the desired -overloading for integral and void* types. The template does not represent a -requirement that the intrinsic operate on any type! -

- -
-T is one of:  bool, char, signed char, unsigned char, short, unsigned short,
-              int, unsigned int, long, unsigned long,
-              long long, unsigned long long, char16_t, char32_t, wchar_t, void*
-
-template <class T>
-T
-__atomic_load_seq_cst(T const volatile* obj)
-{
-    unique_lock<mutex> _(some_mutex);
-    return *obj;
-}
-
-template <class T>
-void
-__atomic_store_seq_cst(T volatile* obj, T desr)
-{
-    unique_lock<mutex> _(some_mutex);
-    *obj = desr;
-}
-
-template <class T>
-T
-__atomic_exchange_seq_cst(T volatile* obj, T desr)
-{
-    unique_lock<mutex> _(some_mutex);
-    T r = *obj;
-    *obj = desr;
-    return r;
-}
-
-template <class T>
-bool
-__atomic_compare_exchange_strong_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
-{
-    unique_lock<mutex> _(some_mutex);
-    if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0)
-    {
-        std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));
-        return true;
-    }
-    std::memcpy(exp, const_cast<T*>(obj), sizeof(T));
-    return false;
-}
-
-template <class T>
-bool
-__atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
-{
-    unique_lock<mutex> _(some_mutex);
-    if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0)
-    {
-        std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));
-        return true;
-    }
-    std::memcpy(exp, const_cast<T*>(obj), sizeof(T));
-    return false;
-}
-
-T is one of:  char, signed char, unsigned char, short, unsigned short,
-              int, unsigned int, long, unsigned long,
-              long long, unsigned long long, char16_t, char32_t, wchar_t
-
-template <class T>
-T
-__atomic_fetch_add_seq_cst(T volatile* obj, T operand)
-{
-    unique_lock<mutex> _(some_mutex);
-    T r = *obj;
-    *obj += operand;
-    return r;
-}
-
-template <class T>
-T
-__atomic_fetch_sub_seq_cst(T volatile* obj, T operand)
-{
-    unique_lock<mutex> _(some_mutex);
-    T r = *obj;
-    *obj -= operand;
-    return r;
-}
-
-template <class T>
-T
-__atomic_fetch_and_seq_cst(T volatile* obj, T operand)
-{
-    unique_lock<mutex> _(some_mutex);
-    T r = *obj;
-    *obj &= operand;
-    return r;
-}
-
-template <class T>
-T
-__atomic_fetch_or_seq_cst(T volatile* obj, T operand)
-{
-    unique_lock<mutex> _(some_mutex);
-    T r = *obj;
-    *obj |= operand;
-    return r;
-}
-
-template <class T>
-T
-__atomic_fetch_xor_seq_cst(T volatile* obj, T operand)
-{
-    unique_lock<mutex> _(some_mutex);
-    T r = *obj;
-    *obj ^= operand;
-    return r;
-}
-
-void*
-__atomic_fetch_add_seq_cst(void* volatile* obj, ptrdiff_t operand)
-{
-    unique_lock<mutex> _(some_mutex);
-    void* r = *obj;
-    (char*&)(*obj) += operand;
-    return r;
-}
-
-void*
-__atomic_fetch_sub_seq_cst(void* volatile* obj, ptrdiff_t operand)
-{
-    unique_lock<mutex> _(some_mutex);
-    void* r = *obj;
-    (char*&)(*obj) -= operand;
-    return r;
-}
-
-void __atomic_thread_fence_seq_cst()
-{
-    unique_lock<mutex> _(some_mutex);
-}
-
-void __atomic_signal_fence_seq_cst()
-{
-    unique_lock<mutex> _(some_mutex);
-}
-
- -

-One should consult the (currently draft) -C++ standard -for the details of the definitions for these operations. For example -__atomic_compare_exchange_weak_seq_cst_seq_cst is allowed to fail -spuriously while __atomic_compare_exchange_strong_seq_cst_seq_cst is -not. -

- -

-If on your platform the lock-free definition of -__atomic_compare_exchange_weak_seq_cst_seq_cst would be the same as -__atomic_compare_exchange_strong_seq_cst_seq_cst, you may omit the -__atomic_compare_exchange_weak_seq_cst_seq_cst intrinsic without a -performance cost. The library will prefer your implementation of -__atomic_compare_exchange_strong_seq_cst_seq_cst over its own -definition for implementing -__atomic_compare_exchange_weak_seq_cst_seq_cst. That is, the library -will arrange for __atomic_compare_exchange_weak_seq_cst_seq_cst to call -__atomic_compare_exchange_strong_seq_cst_seq_cst if you supply an -intrinsic for the strong version but not the weak. -

- -

Taking advantage of weaker memory synchronization

- -

-So far all of the intrinsics presented require a sequentially -consistent memory ordering. That is, no loads or stores can move across -the operation (just as if the library had locked that internal mutex). But -<atomic> supports weaker memory ordering operations. In all, -there are six memory orderings (listed here from strongest to weakest): -

- -
-memory_order_seq_cst
-memory_order_acq_rel
-memory_order_release
-memory_order_acquire
-memory_order_consume
-memory_order_relaxed
-
- -

-(See the -C++ standard -for the detailed definitions of each of these orderings). -

- -

-On some platforms, the compiler vendor can offer some or even all of the above -intrinsics at one or more weaker levels of memory synchronization. This might -lead for example to not issuing an mfense instruction on the x86. -

- -

-If the compiler does not offer any given operation, at any given memory ordering -level, the library will automatically attempt to call the next highest memory -ordering operation. This continues up to seq_cst, and if that doesn't -exist, then the library takes over and does the job with a mutex. This -is a compile-time search & selection operation. At run time, the -application will only see the few inlined assembly instructions for the selected -intrinsic. -

- -

-Each intrinsic is appended with the 7-letter name of the memory ordering it -addresses. For example a load with relaxed ordering is -defined by: -

- -
-T __atomic_load_relaxed(const volatile T* obj);
-
- -

-And announced with: -

- -
-__has_feature(__atomic_load_relaxed_b) == 1  // bool
-__has_feature(__atomic_load_relaxed_c) == 1  // char
-__has_feature(__atomic_load_relaxed_a) == 1  // signed char
-...
-
- -

-The __atomic_compare_exchange_strong(weak) intrinsics are parameterized -on two memory orderings. The first ordering applies when the operation returns -true and the second ordering applies when the operation returns -false. -

- -

-Not every memory ordering is appropriate for every operation. exchange -and the fetch_op operations support all 6. But load -only supports relaxed, consume, acquire and seq_cst. -store -only supports relaxed, release, and seq_cst. The -compare_exchange operations support the following 16 combinations out -of the possible 36: -

- -
-relaxed_relaxed
-consume_relaxed
-consume_consume
-acquire_relaxed
-acquire_consume
-acquire_acquire
-release_relaxed
-release_consume
-release_acquire
-acq_rel_relaxed
-acq_rel_consume
-acq_rel_acquire
-seq_cst_relaxed
-seq_cst_consume
-seq_cst_acquire
-seq_cst_seq_cst
-
- -

-Again, the compiler supplies intrinsics only for the strongest orderings where -it can make a difference. The library takes care of calling the weakest -supplied intrinsic that is as strong or stronger than the customer asked for. -

+ diff --git a/libcxx/www/atomic_design_a.html b/libcxx/www/atomic_design_a.html new file mode 100644 index 000000000000..dc16568dc122 --- /dev/null +++ b/libcxx/www/atomic_design_a.html @@ -0,0 +1,126 @@ + + + + + + <atomic> design + + + + + + + +
+ +

<atomic> design

+ + +

+This is more of a synopsis than a true description. The compiler supplies all +of the intrinsics as described below. This list of intrinsics roughly parallels +the requirements of the C and C++ atomics proposals. The C and C++ library +implementations simply drop through to these intrinsics. For anything the platform +does not support in hardware, the compiler arranges for a (compiler-rt) library +call which does the job with a mutex, ignoring the memory ordering parameter in +that case. +

+ +
+// type can be any pod
+// Behavior is defined for mem_ord = 0, 1, 2, 5
+type __atomic_load(const volatile type* atomic_obj, int mem_ord);
+
+// type can be any pod
+// Behavior is defined for mem_ord = 0, 3, 5
+type __atomic_store(volatile type* atomic_obj, type desired, int mem_ord);
+
+// type can be any pod
+// Behavior is defined for mem_ord = [0 ... 5]
+type __atomic_exchange(volatile type* atomic_obj, type desired, int mem_ord);
+
+// type can be any pod
+// Behavior is defined for mem_success = [0 ... 5],
+//   mem_failure <= mem_success && mem_failure != [3, 4]
+bool __atomic_compare_exchange_strong(volatile type* atomic_obj,
+                                      type* expected, type desired,
+                                      int mem_success, int mem_failure);
+
+// type can be any pod
+// Behavior is defined for mem_success = [0 ... 5],
+//   mem_failure <= mem_success && mem_failure != [3, 4]
+bool __atomic_compare_exchange_weak(volatile type* atomic_obj,
+                                    type* expected, type desired,
+                                    int mem_success, int mem_failure);
+
+// type is one of: char, signed char, unsigned char, short, unsigned short, int,
+//      unsigned int, long, unsigned long, long long, unsigned long long,
+//      char16_t, char32_t, wchar_t
+// Behavior is defined for mem_ord = [0 ... 5]
+type __atomic_fetch_add(volatile type* atomic_obj, type operand, int mem_ord);
+
+// type is one of: char, signed char, unsigned char, short, unsigned short, int,
+//      unsigned int, long, unsigned long, long long, unsigned long long,
+//      char16_t, char32_t, wchar_t
+// Behavior is defined for mem_ord = [0 ... 5]
+type __atomic_fetch_sub(volatile type* atomic_obj, type operand, int mem_ord);
+
+// type is one of: char, signed char, unsigned char, short, unsigned short, int,
+//      unsigned int, long, unsigned long, long long, unsigned long long,
+//      char16_t, char32_t, wchar_t
+// Behavior is defined for mem_ord = [0 ... 5]
+type __atomic_fetch_and(volatile type* atomic_obj, type operand, int mem_ord);
+
+// type is one of: char, signed char, unsigned char, short, unsigned short, int,
+//      unsigned int, long, unsigned long, long long, unsigned long long,
+//      char16_t, char32_t, wchar_t
+// Behavior is defined for mem_ord = [0 ... 5]
+type __atomic_fetch_or(volatile type* atomic_obj, type operand, int mem_ord);
+
+// type is one of: char, signed char, unsigned char, short, unsigned short, int,
+//      unsigned int, long, unsigned long, long long, unsigned long long,
+//      char16_t, char32_t, wchar_t
+// Behavior is defined for mem_ord = [0 ... 5]
+type __atomic_fetch_xor(volatile type* atomic_obj, type operand, int mem_ord);
+
+// Behavior is defined for mem_ord = [0 ... 5]
+void* __atomic_fetch_add(void* volatile* atomic_obj, ptrdiff_t operand, int mem_ord);
+void* __atomic_fetch_sub(void* volatile* atomic_obj, ptrdiff_t operand, int mem_ord);
+
+// Behavior is defined for mem_ord = [0 ... 5]
+void __atomic_thread_fence(int mem_ord);
+void __atomic_signal_fence(int mem_ord);
+
+ +

+If desired, the intrinsics taking a single mem_ord parameter can default +this argument to 5 (seq_cst). +

+ +

+If desired, the intrinsics taking two ordering parameters can default +mem_success to 5, and mem_failure to mem_success. +
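
+For illustration, here is a minimal sketch (not actual libc++ source) of how the library side could drop through to these intrinsics. It assumes the mem_ord integers follow the memory_order enumerators in declaration order (relaxed = 0 ... seq_cst = 5), which is consistent with the defined-behavior comments above. +

+
+// Hypothetical library-side wrapper; the names below are illustrative only.
+template <class T>
+struct __atomic_base
+{
+    T __v_;
+
+    T load(memory_order __o = memory_order_seq_cst) const volatile
+        {return __atomic_load(&__v_, static_cast<int>(__o));}
+    void store(T __d, memory_order __o = memory_order_seq_cst) volatile
+        {__atomic_store(&__v_, __d, static_cast<int>(__o));}
+    T exchange(T __d, memory_order __o = memory_order_seq_cst) volatile
+        {return __atomic_exchange(&__v_, __d, static_cast<int>(__o));}
+};
+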

+ +
+ + diff --git a/libcxx/www/atomic_design_b.html b/libcxx/www/atomic_design_b.html new file mode 100644 index 000000000000..7891050c132d --- /dev/null +++ b/libcxx/www/atomic_design_b.html @@ -0,0 +1,247 @@ + + + + + + <atomic> design + + + + + + + +
+ +

<atomic> design

+ + +

+This is a variation of design A that puts the burden on the library to arrange +for the correct manipulation of the run-time memory ordering arguments, calling +into the compiler only with well-defined memory orderings. I think of this design +as the worst of A and C, instead of the best of A and C. But I offer it as an +option in the spirit of completeness. +

+ +
+// type can be any pod
+type __atomic_load_relaxed(const volatile type* atomic_obj);
+type __atomic_load_consume(const volatile type* atomic_obj);
+type __atomic_load_acquire(const volatile type* atomic_obj);
+type __atomic_load_seq_cst(const volatile type* atomic_obj);
+
+// type can be any pod
+type __atomic_store_relaxed(volatile type* atomic_obj, type desired);
+type __atomic_store_release(volatile type* atomic_obj, type desired);
+type __atomic_store_seq_cst(volatile type* atomic_obj, type desired);
+
+// type can be any pod
+type __atomic_exchange_relaxed(volatile type* atomic_obj, type desired);
+type __atomic_exchange_consume(volatile type* atomic_obj, type desired);
+type __atomic_exchange_acquire(volatile type* atomic_obj, type desired);
+type __atomic_exchange_release(volatile type* atomic_obj, type desired);
+type __atomic_exchange_acq_rel(volatile type* atomic_obj, type desired);
+type __atomic_exchange_seq_cst(volatile type* atomic_obj, type desired);
+
+// type can be any pod
+bool __atomic_compare_exchange_strong_relaxed_relaxed(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_consume_relaxed(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_consume_consume(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_acquire_relaxed(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_acquire_consume(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_acquire_acquire(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_release_relaxed(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_release_consume(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_release_acquire(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_acq_rel_relaxed(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_acq_rel_consume(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_acq_rel_acquire(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_seq_cst_relaxed(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_seq_cst_consume(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_seq_cst_acquire(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+bool __atomic_compare_exchange_strong_seq_cst_seq_cst(volatile type* atomic_obj,
+                                                      type* expected,
+                                                      type desired);
+
+// type can be any pod
+bool __atomic_compare_exchange_weak_relaxed_relaxed(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_consume_relaxed(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_consume_consume(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_acquire_relaxed(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_acquire_consume(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_acquire_acquire(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_release_relaxed(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_release_consume(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_release_acquire(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_acq_rel_relaxed(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_acq_rel_consume(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_acq_rel_acquire(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_seq_cst_relaxed(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_seq_cst_consume(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_seq_cst_acquire(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+bool __atomic_compare_exchange_weak_seq_cst_seq_cst(volatile type* atomic_obj,
+                                                    type* expected,
+                                                    type desired);
+
+// type is one of: char, signed char, unsigned char, short, unsigned short, int,
+//      unsigned int, long, unsigned long, long long, unsigned long long,
+//      char16_t, char32_t, wchar_t
+type __atomic_fetch_add_relaxed(volatile type* atomic_obj, type operand);
+type __atomic_fetch_add_consume(volatile type* atomic_obj, type operand);
+type __atomic_fetch_add_acquire(volatile type* atomic_obj, type operand);
+type __atomic_fetch_add_release(volatile type* atomic_obj, type operand);
+type __atomic_fetch_add_acq_rel(volatile type* atomic_obj, type operand);
+type __atomic_fetch_add_seq_cst(volatile type* atomic_obj, type operand);
+
+// type is one of: char, signed char, unsigned char, short, unsigned short, int,
+//      unsigned int, long, unsigned long, long long, unsigned long long,
+//      char16_t, char32_t, wchar_t
+type __atomic_fetch_sub_relaxed(volatile type* atomic_obj, type operand);
+type __atomic_fetch_sub_consume(volatile type* atomic_obj, type operand);
+type __atomic_fetch_sub_acquire(volatile type* atomic_obj, type operand);
+type __atomic_fetch_sub_release(volatile type* atomic_obj, type operand);
+type __atomic_fetch_sub_acq_rel(volatile type* atomic_obj, type operand);
+type __atomic_fetch_sub_seq_cst(volatile type* atomic_obj, type operand);
+
+// type is one of: char, signed char, unsigned char, short, unsigned short, int,
+//      unsigned int, long, unsigned long, long long, unsigned long long,
+//      char16_t, char32_t, wchar_t
+type __atomic_fetch_and_relaxed(volatile type* atomic_obj, type operand);
+type __atomic_fetch_and_consume(volatile type* atomic_obj, type operand);
+type __atomic_fetch_and_acquire(volatile type* atomic_obj, type operand);
+type __atomic_fetch_and_release(volatile type* atomic_obj, type operand);
+type __atomic_fetch_and_acq_rel(volatile type* atomic_obj, type operand);
+type __atomic_fetch_and_seq_cst(volatile type* atomic_obj, type operand);
+
+// type is one of: char, signed char, unsigned char, short, unsigned short, int,
+//      unsigned int, long, unsigned long, long long, unsigned long long,
+//      char16_t, char32_t, wchar_t
+type __atomic_fetch_or_relaxed(volatile type* atomic_obj, type operand);
+type __atomic_fetch_or_consume(volatile type* atomic_obj, type operand);
+type __atomic_fetch_or_acquire(volatile type* atomic_obj, type operand);
+type __atomic_fetch_or_release(volatile type* atomic_obj, type operand);
+type __atomic_fetch_or_acq_rel(volatile type* atomic_obj, type operand);
+type __atomic_fetch_or_seq_cst(volatile type* atomic_obj, type operand);
+
+// type is one of: char, signed char, unsigned char, short, unsigned short, int,
+//      unsigned int, long, unsigned long, long long, unsigned long long,
+//      char16_t, char32_t, wchar_t
+type __atomic_fetch_xor_relaxed(volatile type* atomic_obj, type operand);
+type __atomic_fetch_xor_consume(volatile type* atomic_obj, type operand);
+type __atomic_fetch_xor_acquire(volatile type* atomic_obj, type operand);
+type __atomic_fetch_xor_release(volatile type* atomic_obj, type operand);
+type __atomic_fetch_xor_acq_rel(volatile type* atomic_obj, type operand);
+type __atomic_fetch_xor_seq_cst(volatile type* atomic_obj, type operand);
+
+void* __atomic_fetch_add_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_add_consume(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_add_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_add_release(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_add_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_add_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);
+
+void* __atomic_fetch_sub_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_sub_consume(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_sub_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_sub_release(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_sub_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
+void* __atomic_fetch_sub_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);
+
+void __atomic_thread_fence_relaxed();
+void __atomic_thread_fence_consume();
+void __atomic_thread_fence_acquire();
+void __atomic_thread_fence_release();
+void __atomic_thread_fence_acq_rel();
+void __atomic_thread_fence_seq_cst();
+
+void __atomic_signal_fence_relaxed();
+void __atomic_signal_fence_consume();
+void __atomic_signal_fence_acquire();
+void __atomic_signal_fence_release();
+void __atomic_signal_fence_acq_rel();
+void __atomic_signal_fence_seq_cst();
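+
+Under this design the library owns the mapping from the run-time ordering argument to a compile-time intrinsic. A minimal sketch (hypothetical helper, not actual libc++ source) of what that dispatch could look like for load: +

+
+// Dispatch the run time ordering to the matching intrinsic.  release and
+// acq_rel are not valid orderings for load, so they are not dispatched here.
+template <class T>
+T
+__choose_load(const volatile T* __obj, memory_order __o)
+{
+    switch (__o)
+    {
+    case memory_order_relaxed: return __atomic_load_relaxed(__obj);
+    case memory_order_consume: return __atomic_load_consume(__obj);
+    case memory_order_acquire: return __atomic_load_acquire(__obj);
+    default:                   return __atomic_load_seq_cst(__obj);
+    }
+}
+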
+
+ +
+ + diff --git a/libcxx/www/atomic_design_c.html b/libcxx/www/atomic_design_c.html new file mode 100644 index 000000000000..ae2f5ffbc64b --- /dev/null +++ b/libcxx/www/atomic_design_c.html @@ -0,0 +1,458 @@ + + + + + + <atomic> design + + + + + + + +
+ +

<atomic> design

+ + +

+The <atomic> header is one of the most closely coupled headers to +the compiler. Ideally, when you invoke any function from +<atomic>, it should result in highly optimized assembly being +inserted directly into your application ... assembly that is not otherwise +representable by higher level C or C++ expressions. The design of the libc++ +<atomic> header started with this goal in mind. A secondary, but +still very important goal is that the compiler should have to do minimal work to +facilitate the implementation of <atomic>. Without this second +goal, practically speaking, the libc++ <atomic> header would +be doomed to be a barely supported, second-class citizen on almost every +platform. +

+ +

Goals:

+ +
    +
  • Optimal code generation for atomic operations
  • +
  • Minimal effort for the compiler to achieve goal 1 on any given platform
  • +
  • Conformance to the C++0X draft standard
  • +
+ +

+The purpose of this document is to inform compiler writers what they need to do +to enable a high performance libc++ <atomic> with minimal effort. +

+ +

The minimal work that must be done for a conforming <atomic>

+ +

+The only "atomic" operations that must actually be lock free in +<atomic> are represented by the following compiler intrinsics: +

+ +
+__atomic_flag__
+__atomic_exchange_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
+{
+    unique_lock<mutex> _(some_mutex);
+    __atomic_flag__ result = *obj;
+    *obj = desr;
+    return result;
+}
+
+void
+__atomic_store_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
+{
+    unique_lock<mutex> _(some_mutex);
+    *obj = desr;
+}
+
+ +

+Where: +

+ +
    +
  • +If __has_feature(__atomic_flag) evaluates to 1 in the preprocessor then +the compiler must define __atomic_flag__ (e.g. as a typedef to +int). +
  • +
  • +If __has_feature(__atomic_flag) evaluates to 0 in the preprocessor then +the library defines __atomic_flag__ as a typedef to bool. +
  • +
  • +

    +To communicate that the above intrinsics are available, the compiler must +arrange for __has_feature to return 1 when fed the intrinsic name +appended with an '_' and the mangled type name of __atomic_flag__. +

    +

    +For example if __atomic_flag__ is unsigned int: +

    +
    +__has_feature(__atomic_flag) == 1
    +__has_feature(__atomic_exchange_seq_cst_j) == 1
    +__has_feature(__atomic_store_seq_cst_j) == 1
    +
    +typedef unsigned int __atomic_flag__; 
    +
    +unsigned int __atomic_exchange_seq_cst(unsigned int volatile*, unsigned int)
    +{
    +   // ...
    +}
    +
    +void __atomic_store_seq_cst(unsigned int volatile*, unsigned int)
    +{
    +   // ...
    +}
    +
    +
  • +
+ +

+That's it! Compiler writers do the above and you've got a fully conforming +(though sub-par performance) <atomic> header! +
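
+To see why this minimal set suffices: a lock-free exchange and store on __atomic_flag__ are enough to implement a conforming atomic_flag, and the library can implement every other atomic operation under a mutex (or a spin lock built from atomic_flag). A sketch, with an assumed member layout (illustrative only; the actual libc++ representation may differ): +

+
+struct atomic_flag
+{
+    __atomic_flag__ __flg_;
+
+    bool test_and_set() volatile
+        {return __atomic_exchange_seq_cst(&__flg_, __atomic_flag__(true));}
+    void clear() volatile
+        {__atomic_store_seq_cst(&__flg_, __atomic_flag__(false));}
+};
+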

+ +

Recommended work for a higher performance <atomic>

+ +

+It would be good if the above intrinsics worked with all integral types plus +void*. Because this may not be possible to do in a lock-free manner for +all integral types on all platforms, a compiler must communicate each type that +an intrinsic works with. For example if __atomic_exchange_seq_cst works +for all types except for long long and unsigned long long +then: +

+ +
+__has_feature(__atomic_exchange_seq_cst_b) == 1  // bool
+__has_feature(__atomic_exchange_seq_cst_c) == 1  // char
+__has_feature(__atomic_exchange_seq_cst_a) == 1  // signed char
+__has_feature(__atomic_exchange_seq_cst_h) == 1  // unsigned char
+__has_feature(__atomic_exchange_seq_cst_Ds) == 1 // char16_t
+__has_feature(__atomic_exchange_seq_cst_Di) == 1 // char32_t
+__has_feature(__atomic_exchange_seq_cst_w) == 1  // wchar_t
+__has_feature(__atomic_exchange_seq_cst_s) == 1  // short
+__has_feature(__atomic_exchange_seq_cst_t) == 1  // unsigned short
+__has_feature(__atomic_exchange_seq_cst_i) == 1  // int
+__has_feature(__atomic_exchange_seq_cst_j) == 1  // unsigned int
+__has_feature(__atomic_exchange_seq_cst_l) == 1  // long
+__has_feature(__atomic_exchange_seq_cst_m) == 1  // unsigned long
+__has_feature(__atomic_exchange_seq_cst_Pv) == 1 // void*
+
+ +

+Note that only the __has_feature flag is decorated with the argument +type. The name of the compiler intrinsic is not decorated, but instead works +like a C++ overloaded function. +

+ +

+Additionally there are other intrinsics besides +__atomic_exchange_seq_cst and __atomic_store_seq_cst. They +are optional. But if the compiler can generate faster code than provided by the +library, then clients will benefit from the compiler writer's expertise and +knowledge of the targeted platform. +

+ +

+Below is the complete list of sequentially consistent intrinsics, and +their library implementations. Template syntax is used to indicate the desired +overloading for integral and void* types. The template does not represent a +requirement that the intrinsic operate on any type! +

+ +
+T is one of:  bool, char, signed char, unsigned char, short, unsigned short,
+              int, unsigned int, long, unsigned long,
+              long long, unsigned long long, char16_t, char32_t, wchar_t, void*
+
+template <class T>
+T
+__atomic_load_seq_cst(T const volatile* obj)
+{
+    unique_lock<mutex> _(some_mutex);
+    return *obj;
+}
+
+template <class T>
+void
+__atomic_store_seq_cst(T volatile* obj, T desr)
+{
+    unique_lock<mutex> _(some_mutex);
+    *obj = desr;
+}
+
+template <class T>
+T
+__atomic_exchange_seq_cst(T volatile* obj, T desr)
+{
+    unique_lock<mutex> _(some_mutex);
+    T r = *obj;
+    *obj = desr;
+    return r;
+}
+
+template <class T>
+bool
+__atomic_compare_exchange_strong_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
+{
+    unique_lock<mutex> _(some_mutex);
+    if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0)
+    {
+        std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));
+        return true;
+    }
+    std::memcpy(exp, const_cast<T*>(obj), sizeof(T));
+    return false;
+}
+
+template <class T>
+bool
+__atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
+{
+    unique_lock<mutex> _(some_mutex);
+    if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0)
+    {
+        std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));
+        return true;
+    }
+    std::memcpy(exp, const_cast<T*>(obj), sizeof(T));
+    return false;
+}
+
+T is one of:  char, signed char, unsigned char, short, unsigned short,
+              int, unsigned int, long, unsigned long,
+              long long, unsigned long long, char16_t, char32_t, wchar_t
+
+template <class T>
+T
+__atomic_fetch_add_seq_cst(T volatile* obj, T operand)
+{
+    unique_lock<mutex> _(some_mutex);
+    T r = *obj;
+    *obj += operand;
+    return r;
+}
+
+template <class T>
+T
+__atomic_fetch_sub_seq_cst(T volatile* obj, T operand)
+{
+    unique_lock<mutex> _(some_mutex);
+    T r = *obj;
+    *obj -= operand;
+    return r;
+}
+
+template <class T>
+T
+__atomic_fetch_and_seq_cst(T volatile* obj, T operand)
+{
+    unique_lock<mutex> _(some_mutex);
+    T r = *obj;
+    *obj &= operand;
+    return r;
+}
+
+template <class T>
+T
+__atomic_fetch_or_seq_cst(T volatile* obj, T operand)
+{
+    unique_lock<mutex> _(some_mutex);
+    T r = *obj;
+    *obj |= operand;
+    return r;
+}
+
+template <class T>
+T
+__atomic_fetch_xor_seq_cst(T volatile* obj, T operand)
+{
+    unique_lock<mutex> _(some_mutex);
+    T r = *obj;
+    *obj ^= operand;
+    return r;
+}
+
+void*
+__atomic_fetch_add_seq_cst(void* volatile* obj, ptrdiff_t operand)
+{
+    unique_lock<mutex> _(some_mutex);
+    void* r = *obj;
+    (char*&)(*obj) += operand;
+    return r;
+}
+
+void*
+__atomic_fetch_sub_seq_cst(void* volatile* obj, ptrdiff_t operand)
+{
+    unique_lock<mutex> _(some_mutex);
+    void* r = *obj;
+    (char*&)(*obj) -= operand;
+    return r;
+}
+
+void __atomic_thread_fence_seq_cst()
+{
+    unique_lock<mutex> _(some_mutex);
+}
+
+void __atomic_signal_fence_seq_cst()
+{
+    unique_lock<mutex> _(some_mutex);
+}
+
+ +

+One should consult the (currently draft) +C++ standard +for the precise definitions of these operations. For example, +__atomic_compare_exchange_weak_seq_cst_seq_cst is allowed to fail +spuriously while +__atomic_compare_exchange_strong_seq_cst_seq_cst is not. +

+ +

+If on your platform the lock-free definition of +__atomic_compare_exchange_weak_seq_cst_seq_cst would be the same as +__atomic_compare_exchange_strong_seq_cst_seq_cst, you may omit the +__atomic_compare_exchange_weak_seq_cst_seq_cst intrinsic without a +performance cost. The library will prefer your implementation of +__atomic_compare_exchange_strong_seq_cst_seq_cst over its own +definition for implementing +__atomic_compare_exchange_weak_seq_cst_seq_cst. That is, the library +will arrange for __atomic_compare_exchange_weak_seq_cst_seq_cst to call +__atomic_compare_exchange_strong_seq_cst_seq_cst if you supply an +intrinsic for the strong version but not the weak. +
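
+That arrangement amounts to the library supplying, in effect, the following (a sketch): +

+
+// Present only when the compiler announces the strong intrinsic but not the
+// weak one.
+template <class T>
+bool
+__atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
+{
+    return __atomic_compare_exchange_strong_seq_cst_seq_cst(obj, exp, desr);
+}
+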

+ +

Taking advantage of weaker memory synchronization

+ +

+So far all of the intrinsics presented require a sequentially +consistent memory ordering. That is, no loads or stores can move across +the operation (just as if the library had locked that internal mutex). But +<atomic> supports weaker memory ordering operations. In all, +there are six memory orderings (listed here from strongest to weakest): +

+ +
+memory_order_seq_cst
+memory_order_acq_rel
+memory_order_release
+memory_order_acquire
+memory_order_consume
+memory_order_relaxed
+
+ +

+(See the +C++ standard +for the detailed definitions of each of these orderings). +

+ +

+On some platforms, the compiler vendor can offer some or even all of the above +intrinsics at one or more weaker levels of memory synchronization. This might +lead, for example, to not issuing an mfence instruction on x86. +

+ +

+If the compiler does not offer any given operation, at any given memory ordering +level, the library will automatically attempt to call the next highest memory +ordering operation. This continues up to seq_cst, and if that doesn't +exist, then the library takes over and does the job with a mutex. This +is a compile-time search & selection operation. At run time, the +application will only see the few inlined assembly instructions for the selected +intrinsic. +

+ +

+Each intrinsic is appended with the 7-letter name of the memory ordering it +addresses. For example a load with relaxed ordering is +defined by: +

+ +
+T __atomic_load_relaxed(const volatile T* obj);
+
+ +

+And announced with: +

+ +
+__has_feature(__atomic_load_relaxed_b) == 1  // bool
+__has_feature(__atomic_load_relaxed_c) == 1  // char
+__has_feature(__atomic_load_relaxed_a) == 1  // signed char
+...
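+

+Combining this naming scheme with the compile-time selection described above, the search for a relaxed load of int could be written out by hand roughly as follows (the guard spellings are illustrative; the real header generates this selection differently): +

+
+inline int __choose_load_relaxed(const volatile int* obj)
+{
+#if __has_feature(__atomic_load_relaxed_i)
+    return __atomic_load_relaxed(obj);   // exact match
+#elif __has_feature(__atomic_load_consume_i)
+    return __atomic_load_consume(obj);   // next strongest ordering
+#elif __has_feature(__atomic_load_acquire_i)
+    return __atomic_load_acquire(obj);
+#elif __has_feature(__atomic_load_seq_cst_i)
+    return __atomic_load_seq_cst(obj);
+#else
+    unique_lock<mutex> _(some_mutex);    // fall back to the library mutex
+    return *obj;
+#endif
+}
+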
+
+ +

+The __atomic_compare_exchange_strong(weak) intrinsics are parameterized +on two memory orderings. The first ordering applies when the operation returns +true and the second ordering applies when the operation returns +false. +
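
+For example, with this scheme a compare-exchange that needs acquire ordering on success but only relaxed ordering on failure would be announced and declared as (illustrative, shown for int): +

+
+__has_feature(__atomic_compare_exchange_strong_acquire_relaxed_i) == 1
+
+bool __atomic_compare_exchange_strong_acquire_relaxed(int volatile* obj,
+                                                      int* exp, int desr);
+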

+ +

+Not every memory ordering is appropriate for every operation. exchange +and the fetch_op operations support all six. But load +only supports relaxed, consume, acquire, and seq_cst. +store +only supports relaxed, release, and seq_cst. The +compare_exchange operations support the following 16 combinations out +of the possible 36: +

+ +
+relaxed_relaxed
+consume_relaxed
+consume_consume
+acquire_relaxed
+acquire_consume
+acquire_acquire
+release_relaxed
+release_consume
+release_acquire
+acq_rel_relaxed
+acq_rel_consume
+acq_rel_acquire
+seq_cst_relaxed
+seq_cst_consume
+seq_cst_acquire
+seq_cst_seq_cst
+
+ +

+Again, the compiler supplies intrinsics only for the strongest orderings where +it can make a difference. The library takes care of calling the weakest +supplied intrinsic that is as strong or stronger than the customer asked for. +

+ +
+ +