Commit Graph

20 Commits

Author SHA1 Message Date
Mark Rutland 6d2779ecae locking/atomic: scripts: fix fallback ifdeffery
Since commit:

  9257959a6e ("locking/atomic: scripts: restructure fallback ifdeffery")

The ordering fallbacks for atomic*_read_acquire() and
atomic*_set_release() erroneously fall back to the implictly relaxed
atomic*_read() and atomic*_set() variants respectively, without any
additional barriers. This loses the ACQUIRE and RELEASE ordering
semantics, which can result in a wide variety of problems, even on
strongly-ordered architectures where the implementation of
atomic*_read() and/or atomic*_set() allows the compiler to reorder those
relative to other accesses.

In practice this has been observed to break bit spinlocks on arm64,
resulting in dentry cache corruption.

The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to
be defined in terms of FULL ops, but where an op had RELAXED ordering by
default, this unintentionally permitted the ACQUIRE/RELEASE ops to be
defined in terms of the implicitly RELAXED default.

This patch corrects the logic to avoid falling back to implicitly
RELAXED ops, resulting in the same behaviour as prior to commit
9257959a6e.

I've verified the resulting assembly on arm64 by generating outlined
wrappers of the atomics. Prior to this patch the compiler generates
sequences using relaxed load (LDR) and store (STR) instructions, e.g.

| <outlined_atomic64_read_acquire>:
|         ldr     x0, [x0]
|         ret
|
| <outlined_atomic64_set_release>:
|         str     x1, [x0]
|         ret

With this patch applied the compiler generates sequences using the
intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.

| <outlined_atomic64_read_acquire>:
|         ldar    x0, [x0]
|         ret
|
| <outlined_atomic64_set_release>:
|         stlr    x1, [x0]
|         ret

To make sure that there were no other victims of the ifdeffery rewrite,
I generated outlined copies of all of the {atomic,atomic64,atomic_long}
atomic operations before and after commit 9257959a6e. A diff of
the generated assembly on arm64 shows that only the read_acquire() and
set_release() operations were changed, and only lost their intended
ordering:

| [mark@lakrids:~/src/linux]% diff -u \
| 	<(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o)
| 	<(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o)
| --- /proc/self/fd/11    2023-09-19 16:51:51.114779415 +0100
| +++ /proc/self/fd/16    2023-09-19 16:51:51.114779415 +0100
| @@ -1,5 +1,5 @@
|
| -before-9257959a6e5b4fca.o:     file format elf64-littleaarch64
| +after-9257959a6e5b4fca.o:     file format elf64-littleaarch64
|
|
|  Disassembly of section .text:
| @@ -9,7 +9,7 @@
|         4:      d65f03c0        ret
|
|  0000000000000008 <outlined_atomic_read_acquire>:
| -       8:      88dffc00        ldar    w0, [x0]
| +       8:      b9400000        ldr     w0, [x0]
|         c:      d65f03c0        ret
|
|  0000000000000010 <outlined_atomic_set>:
| @@ -17,7 +17,7 @@
|        14:      d65f03c0        ret
|
|  0000000000000018 <outlined_atomic_set_release>:
| -      18:      889ffc01        stlr    w1, [x0]
| +      18:      b9000001        str     w1, [x0]
|        1c:      d65f03c0        ret
|
|  0000000000000020 <outlined_atomic_add>:
| @@ -1230,7 +1230,7 @@
|      1070:      d65f03c0        ret
|
|  0000000000001074 <outlined_atomic64_read_acquire>:
| -    1074:      c8dffc00        ldar    x0, [x0]
| +    1074:      f9400000        ldr     x0, [x0]
|      1078:      d65f03c0        ret
|
|  000000000000107c <outlined_atomic64_set>:
| @@ -1238,7 +1238,7 @@
|      1080:      d65f03c0        ret
|
|  0000000000001084 <outlined_atomic64_set_release>:
| -    1084:      c89ffc01        stlr    x1, [x0]
| +    1084:      f9000001        str     x1, [x0]
|      1088:      d65f03c0        ret
|
|  000000000000108c <outlined_atomic64_add>:
| @@ -2427,7 +2427,7 @@
|      207c:      d65f03c0        ret
|
|  0000000000002080 <outlined_atomic_long_read_acquire>:
| -    2080:      c8dffc00        ldar    x0, [x0]
| +    2080:      f9400000        ldr     x0, [x0]
|      2084:      d65f03c0        ret
|
|  0000000000002088 <outlined_atomic_long_set>:
| @@ -2435,7 +2435,7 @@
|      208c:      d65f03c0        ret
|
|  0000000000002090 <outlined_atomic_long_set_release>:
| -    2090:      c89ffc01        stlr    x1, [x0]
| +    2090:      f9000001        str     x1, [x0]
|      2094:      d65f03c0        ret
|
|  0000000000002098 <outlined_atomic_long_add>:

I've build tested this with a variety of configs for alpha, arm, arm64,
csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv,
s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I
was unable to build test for ia64 and parisc due to existing build
breakage in v6.6-rc2.

Fixes: 9257959a6e ("locking/atomic: scripts: restructure fallback ifdeffery")
Reported-by: Ming Lei <ming.lei@redhat.com>
Reported-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Baokun Li <libaokun1@huawei.com>
Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com
2023-09-20 09:39:03 +02:00
Mark Rutland b33eb50a92 locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc
The ${atomic}_dec_if_positive() ops are unlike all the other conditional
atomic ops. Rather than returning a boolean success value, these return
the value that the atomic variable would be updated to, even when no
update is performed.

We missed this when adding kerneldoc comments, and the documentation for
${atomic}_dec_if_positive() erroneously states:

| Return: @true if @v was updated, @false otherwise.

Ideally we'd clean this up by aligning ${atomic}_dec_if_positive() with
the usual atomic op conventions: with ${atomic}_fetch_dec_if_positive()
for those who care about the value of the varaible, and
${atomic}_dec_if_positive() returning a boolean success value.

In the mean time, align the documentation with the current reality.

Fixes: ad8110706f ("locking/atomic: scripts: generate kerneldoc comments")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20230615132734.1119765-1-mark.rutland@arm.com
2023-06-16 16:46:30 +02:00
Mark Rutland ad8110706f locking/atomic: scripts: generate kerneldoc comments
Currently the atomics are documented in Documentation/atomic_t.txt, and
have no kerneldoc comments. There are a sufficient number of gotchas
(e.g. semantics, noinstr-safety) that it would be nice to have comments
to call these out, and it would be nice to have kerneldoc comments such
that these can be collated.

While it's possible to derive the semantics from the code, this can be
painful given the amount of indirection we currently have (e.g. fallback
paths), and it's easy to be mislead by naming, e.g.

* The unconditional void-returning ops *only* have relaxed variants
  without a _relaxed suffix, and can easily be mistaken for being fully
  ordered.

  It would be nice to give these a _relaxed() suffix, but this would
  result in significant churn throughout the kernel.

* Our naming of conditional and unconditional+test ops is rather
  inconsistent, and it can be difficult to derive the name of an
  operation, or to identify where an op is conditional or
  unconditional+test.

  Some ops are clearly conditional:
  - dec_if_positive
  - add_unless
  - dec_unless_positive
  - inc_unless_negative

  Some ops are clearly unconditional+test:
  - sub_and_test
  - dec_and_test
  - inc_and_test

  However, what exactly those test is not obvious. A _test_zero suffix
  might be clearer.

  Others could be read ambiguously:
  - inc_not_zero	// conditional
  - add_negative	// unconditional+test

  It would probably be worth renaming these, e.g. to inc_unless_zero and
  add_test_negative.

As a step towards making this more consistent and easier to understand,
this patch adds kerneldoc comments for all generated *atomic*_*()
functions. These are generated from templates, with some common text
shared, making it easy to extend these in future if necessary.

I've tried to make these as consistent and clear as possible, and I've
deliberately ensured:

* All ops have their ordering explicitly mentioned in the short and long
  description.

* All test ops have "test" in their short description.

* All ops are described as an expression using their usual C operator.
  For example:

  andnot: "Atomically updates @v to (@v & ~@i)"
  inc:    "Atomically updates @v to (@v + 1)"

  Which may be clearer to non-naative English speakers, and allows all
  the operations to be described in the same style.

* All conditional ops have their condition described as an expression
  using the usual C operators. For example:

  add_unless: "If (@v != @u), atomically updates @v to (@v + @i)"
  cmpxchg:    "If (@v == @old), atomically updates @v to @new"

  Which may be clearer to non-naative English speakers, and allows all
  the operations to be described in the same style.

* All bitwise ops (and,andnot,or,xor) explicitly mention that they are
  bitwise in their short description, so that they are not mistaken for
  performing their logical equivalents.

* The noinstr safety of each op is explicitly described, with a
  description of whether or not to use the raw_ form of the op.

There should be no functional change as a result of this patch.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-26-mark.rutland@arm.com
2023-06-05 09:57:23 +02:00
Mark Rutland 1d78814d41 locking/atomic: scripts: simplify raw_atomic*() definitions
Currently each ordering variant has several potential definitions,
with a mixture of preprocessor and C definitions, including several
copies of its C prototype, e.g.

| #if defined(arch_atomic_fetch_andnot_acquire)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire
| #elif defined(arch_atomic_fetch_andnot_relaxed)
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
|       int ret = arch_atomic_fetch_andnot_relaxed(i, v);
|       __atomic_acquire_fence();
|       return ret;
| }
| #elif defined(arch_atomic_fetch_andnot)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot
| #else
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
|       return raw_atomic_fetch_and_acquire(~i, v);
| }
| #endif

Make this a bit simpler by defining the C prototype once, and writing
the various potential definitions as plain C code guarded by ifdeffery.
For example, the above becomes:

| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| #if defined(arch_atomic_fetch_andnot_acquire)
|         return arch_atomic_fetch_andnot_acquire(i, v);
| #elif defined(arch_atomic_fetch_andnot_relaxed)
|         int ret = arch_atomic_fetch_andnot_relaxed(i, v);
|         __atomic_acquire_fence();
|         return ret;
| #elif defined(arch_atomic_fetch_andnot)
|         return arch_atomic_fetch_andnot(i, v);
| #else
|         return raw_atomic_fetch_and_acquire(~i, v);
| #endif
| }

Which is far easier to read. As we now always have a single copy of the
C prototype wrapping all the potential definitions, we now have an
obvious single location for kerneldoc comments.

At the same time, the fallbacks for raw_atomic*_xhcg() are made to use
'new' rather than 'i' as the name of the new value. This is what the
existing fallback template used, and is more consistent with the
raw_atomic{_try,}cmpxchg() fallbacks.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-24-mark.rutland@arm.com
2023-06-05 09:57:22 +02:00
Mark Rutland 630399469f locking/atomic: scripts: simplify raw_atomic_long*() definitions
Currently, atomic-long is split into two sections, one defining the
raw_atomic_long_*() ops for CONFIG_64BIT, and one defining the raw
atomic_long_*() ops for !CONFIG_64BIT.

With many lines elided, this looks like:

| #ifdef CONFIG_64BIT
| ...
| static __always_inline bool
| raw_atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
| {
|         return raw_atomic64_try_cmpxchg(v, (s64 *)old, new);
| }
| ...
| #else /* CONFIG_64BIT */
| ...
| static __always_inline bool
| raw_atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
| {
|         return raw_atomic_try_cmpxchg(v, (int *)old, new);
| }
| ...
| #endif

The two definitions are spread far apart in the file, and duplicate the
prototype, making it hard to have a legible set of kerneldoc comments.

Make this simpler by defining the C prototype once, and writing the two
definitions inline. For example, the above becomes:

| static __always_inline bool
| raw_atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
| {
| #ifdef CONFIG_64BIT
|         return raw_atomic64_try_cmpxchg(v, (s64 *)old, new);
| #else
|         return raw_atomic_try_cmpxchg(v, (int *)old, new);
| #endif
| }

As we now always have a single copy of the C prototype wrapping all the
potential definitions, we now have an obvious single location for kerneldoc
comments. As a bonus, both the script and the generated file are
somewhat shorter.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-23-mark.rutland@arm.com
2023-06-05 09:57:22 +02:00
Mark Rutland 9257959a6e locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are
defined in groups of full/acquire/release/relaxed ordering variants with
some shared ifdeffery and several potential definitions of each ordering
variant in different branches of the shared ifdeffery.

As an ordering variant can have several potential definitions down
different branches of the shared ifdeffery, it can be painful for a
human to find a relevant definition, and we don't have a good location
to place anything common to all definitions of an ordering variant (e.g.
kerneldoc).

Historically the grouping of full/acquire/release/relaxed ordering
variants was necessary as we filled in the missing atomics in the same
namespace as the architecture used. It would be easy to accidentally
define one ordering fallback in terms of another ordering fallback with
redundant barriers, and avoiding that would otherwise require a lot of
baroque ifdeffery.

With recent changes we no longer need to fill in the missing atomics in
the arch_atomic*_<op>() namespace, and only need to fill in the
raw_atomic*_<op>() namespace. Due to this, there's no risk of a
namespace collision, and we can define each raw_atomic*_<op> ordering
variant with its own ifdeffery checking for the arch_atomic*_<op>
ordering variants.

Restructure the fallbacks in this way, with each ordering variant having
its own ifdeffery of the form:

| #if defined(arch_atomic_fetch_andnot_acquire)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire
| #elif defined(arch_atomic_fetch_andnot_relaxed)
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| 	int ret = arch_atomic_fetch_andnot_relaxed(i, v);
| 	__atomic_acquire_fence();
| 	return ret;
| }
| #elif defined(arch_atomic_fetch_andnot)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot
| #else
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| 	return raw_atomic_fetch_and_acquire(~i, v);
| }
| #endif

Note that where there's no relevant arch_atomic*_<op>() ordering
variant, we'll define the operation in terms of a distinct
raw_atomic*_<otherop>(), as this itself might have been filled in with a
fallback.

As we now generate the raw_atomic*_<op>() implementations directly, we
no longer need the trivial wrappers, so they are removed.

This makes the ifdeffery easier to follow, and will allow for further
improvements in subsequent patches.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
2023-06-05 09:57:21 +02:00
Mark Rutland 1815da1718 locking/atomic: scripts: build raw_atomic_long*() directly
Now that arch_atomic*() usage is limited to the atomic headers, we no
longer have any users of arch_atomic_long_*(), and can generate
raw_atomic_long_*() directly.

Generate the raw_atomic_long_*() ops directly.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-20-mark.rutland@arm.com
2023-06-05 09:57:20 +02:00
Mark Rutland c9268ac615 locking/atomic: scripts: add trivial raw_atomic*_<op>()
Currently a number of arch_atomic*_<op>() functions are optional, and
where an arch does not provide a given arch_atomic*_<op>() we will
define an implementation of arch_atomic*_<op>() in
atomic-arch-fallback.h.

Filling in the missing ops requires special care as we want to select
the optimal definition of each op (e.g. preferentially defining ops in
terms of their relaxed form rather than their fully-ordered form). The
ifdeffery necessary for this requires us to group ordering variants
together, which can be a bit painful to read, and is painful for
kerneldoc generation.

It would be easier to handle this if we generated ops into a separate
namespace, as this would remove the need to take special care with the
ifdeffery, and allow each ordering variant to be generated separately.

This patch adds a new set of raw_atomic_<op>() definitions, which are
currently trivial wrappers of their arch_atomic_<op>() equivalent. This
will allow us to move treewide users of arch_atomic_<op>() over to raw
atomic op before we rework the fallback generation to generate
raw_atomic_<op> directly.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-18-mark.rutland@arm.com
2023-06-05 09:57:19 +02:00
Mark Rutland d12157efc8 locking/atomic: make atomic*_{cmp,}xchg optional
Most architectures define the atomic/atomic64 xchg and cmpxchg
operations in terms of arch_xchg and arch_cmpxchg respectfully.

Add fallbacks for these cases and remove the trivial cases from arch
code. On some architectures the existing definitions are kept as these
are used to build other arch_atomic*() operations.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-5-mark.rutland@arm.com
2023-06-05 09:57:14 +02:00
Mark Rutland 14d72d4b6f locking/atomic: remove fallback comments
Currently a subset of the fallback templates have kerneldoc comments,
resulting in a haphazard set of generated kerneldoc comments as only
some operations have fallback templates to begin with.

We'd like to generate more consistent kerneldoc comments, and to do so
we'll need to restructure the way the fallback code is generated.

To minimize churn and to make it easier to restructure the fallback
code, this patch removes the existing kerneldoc comments from the
fallback templates. We can add new kerneldoc comments in subsequent
patches.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-3-mark.rutland@arm.com
2023-06-05 09:57:13 +02:00
Peter Zijlstra febe950dbf arch: Remove cmpxchg_double
No moar users, remove the monster.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.991907085@infradead.org
2023-06-05 09:36:39 +02:00
Peter Zijlstra 8c8b096a23 instrumentation: Wire up cmpxchg128()
Wire up the cmpxchg128 family in the atomic wrapper scripts.

These provide the generic cmpxchg128 family of functions from the
arch_ prefixed version, adding explicit instrumentation where needed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.519237070@infradead.org
2023-06-05 09:36:36 +02:00
Mark Rutland ec570320b0 locking/atomic: Correct (cmp)xchg() instrumentation
All xchg() and cmpxchg() ops are atomic RMWs, but currently we
instrument these with instrument_atomic_write() rather than
instrument_atomic_read_write(), missing the read aspect.

Similarly, all try_cmpxchg() ops are non-atomic RMWs on *oldp, but we
instrument these accesses with instrument_atomic_write() rather than
instrument_read_write(), missing the read aspect and erroneously marking
these as atomic.

Fix the instrumentation for both points.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20230413160644.490976-1-mark.rutland@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-29 09:09:31 +02:00
Uros Bizjak e6ce9d7411 locking/atomic: Add generic try_cmpxchg{,64}_local() support
Add generic support for try_cmpxchg{,64}_local() and their falbacks.

These provides the generic try_cmpxchg_local family of functions
from the arch_ prefixed version, also adding explicit instrumentation.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230405141710.3551-2-ubizjak@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-29 09:09:02 +02:00
Thomas Gleixner e5ab9eff46 atomics: Provide atomic_add_negative() variants
atomic_add_negative() does not provide the relaxed/acquire/release
variants.

Provide them in preparation for a new scalable reference count algorithm.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230323102800.101763813@linutronix.de
2023-03-28 10:39:29 +02:00
Uros Bizjak 0aa7be05d8 locking/atomic: Add generic try_cmpxchg64 support
Add generic support for try_cmpxchg64{,_acquire,_release,_relaxed}
and their falbacks involving cmpxchg64.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220515184205.103089-2-ubizjak@gmail.com
2022-05-18 00:08:27 +02:00
Mark Rutland dc1b4df09a atomics: Fix atomic64_{read_acquire,set_release} fallbacks
Arnd reports that on 32-bit architectures, the fallbacks for
atomic64_read_acquire() and atomic64_set_release() are broken as they
use smp_load_acquire() and smp_store_release() respectively, which do
not work on types larger than the native word size.

Since those contain compiletime_assert_atomic_type(), any attempt to use
those fallbacks will result in a build-time error. e.g. with the
following added to arch/arm/kernel/setup.c:

| void test_atomic64(atomic64_t *v)
| {
|        atomic64_set_release(v, 5);
|        atomic64_read_acquire(v);
| }

The compiler will complain as follows:

| In file included from <command-line>:
| In function 'arch_atomic64_set_release',
|     inlined from 'test_atomic64' at ./include/linux/atomic/atomic-instrumented.h:669:2:
| ././include/linux/compiler_types.h:346:38: error: call to '__compiletime_assert_9' declared with attribute error: Need native word sized stores/loads for atomicity.
|   346 |  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
|       |                                      ^
| ././include/linux/compiler_types.h:327:4: note: in definition of macro '__compiletime_assert'
|   327 |    prefix ## suffix();    \
|       |    ^~~~~~
| ././include/linux/compiler_types.h:346:2: note: in expansion of macro '_compiletime_assert'
|   346 |  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
|       |  ^~~~~~~~~~~~~~~~~~~
| ././include/linux/compiler_types.h:349:2: note: in expansion of macro 'compiletime_assert'
|   349 |  compiletime_assert(__native_word(t),    \
|       |  ^~~~~~~~~~~~~~~~~~
| ./include/asm-generic/barrier.h:133:2: note: in expansion of macro 'compiletime_assert_atomic_type'
|   133 |  compiletime_assert_atomic_type(*p);    \
|       |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| ./include/asm-generic/barrier.h:164:55: note: in expansion of macro '__smp_store_release'
|   164 | #define smp_store_release(p, v) do { kcsan_release(); __smp_store_release(p, v); } while (0)
|       |                                                       ^~~~~~~~~~~~~~~~~~~
| ./include/linux/atomic/atomic-arch-fallback.h:1270:2: note: in expansion of macro 'smp_store_release'
|  1270 |  smp_store_release(&(v)->counter, i);
|       |  ^~~~~~~~~~~~~~~~~
| make[2]: *** [scripts/Makefile.build:288: arch/arm/kernel/setup.o] Error 1
| make[1]: *** [scripts/Makefile.build:550: arch/arm/kernel] Error 2
| make: *** [Makefile:1831: arch/arm] Error 2

Fix this by only using smp_load_acquire() and smp_store_release() for
native atomic types, and otherwise falling back to the regular barriers
necessary for acquire/release semantics, as we do in the more generic
acquire and release fallbacks.

Since the fallback templates are used to generate the atomic64_*() and
atomic_*() operations, the __native_word() check is added to both. For
the atomic_*() operations, which are always 32-bit, the __native_word()
check is redundant but not harmful, as it is always true.

For the example above this works as expected on 32-bit, e.g. for arm
multi_v7_defconfig:

| <test_atomic64>:
|         push    {r4, r5}
|         dmb     ish
|         pldw    [r0]
|         mov     r2, #5
|         mov     r3, #0
|         ldrexd  r4, [r0]
|         strexd  r4, r2, [r0]
|         teq     r4, #0
|         bne     484 <test_atomic64+0x14>
|         ldrexd  r2, [r0]
|         dmb     ish
|         pop     {r4, r5}
|         bx      lr

... and also on 64-bit, e.g. for arm64 defconfig:

| <test_atomic64>:
|         bti     c
|         paciasp
|         mov     x1, #0x5
|         stlr    x1, [x0]
|         ldar    x0, [x0]
|         autiasp
|         ret

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20220207101943.439825-1-mark.rutland@arm.com
2022-02-11 12:13:56 +01:00
Marco Elver e87c4f6642 locking/atomics, kcsan: Add instrumentation for barriers
Adds the required KCSAN instrumentation for barriers of atomics.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-09 16:42:27 -08:00
Mark Rutland 67d1b0de25 locking/atomic: add arch_atomic_long*()
Now that all architectures provide arch_{atomic,atomic64}_*(), we can
build arch_atomic_long_*() atop these, which can be safely used in
noinstr code. The regular atomic_long_*() wrappers are built atop these,
as we do for {atomic,atomic64}_*() atop arch_{atomic,atomic64}_*().

We don't provide arch_* versions of the cond_read*() variants, as we
don't have arch_* versions of the underlying atomic/atomic64 functions
(nor the smp_cond_load*() helpers these are typically based on).

Note that the headers in this patch under include/linux/atomic/ are
generated by the scripts in scripts/atomic/.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210713105253.7615-5-mark.rutland@arm.com
2021-07-16 18:46:45 +02:00
Mark Rutland e3d18cee25 locking/atomic: centralize generated headers
The generated atomic headers are only intended to be included directly
by <linux/atomic.h>, but are spread across include/linux/ and
include/asm-generic/, where people mnay be encouraged to include them.

This patch centralizes them under include/linux/atomic/.

Other than the header guards and hashes, there is no change to any of
the generated headers as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210713105253.7615-4-mark.rutland@arm.com
2021-07-16 18:46:45 +02:00