removal of kernel/rcu/srcu.c. Also correct a stray pointer in
memory-barriers.txt.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZZqwLAAoJEI3ONVYwIuV6KJoP/0Ci87obaOe/zBCgPN+bNPD3
QCe4HjihMKHmXNJbqX2kQFHwrruKKa6mS/OfKGy+UOmGVcBdNlOvMm2NjV4npw2P
Av9HUmSiheaVIgWkbkv4Ov705DxAd3G8SqKwau34jNSajg1LmtStcEAwgM7yXQNb
/+r2i4SWG+S5MnuJQ319qtYHyBBoXjY0PcMAaPL+n6xJ/Lt7pFjHqXuHe+rjH44U
a7NODob3zquHELmqu3gIDNvnnaRR4yHmk66bYa4jFqRf9CIvuY3DrnM61gWq69vb
ROv7rLKX8VEE8RShx4Gn+F6akEPYTUtTCRO8dKeT0RK4nPlhVeObF1Fq+9Ttj1xx
SvmDhp3Kq/xI6aXXH76THOMxwoi6bykZFfZoi3zD7d6PQBmdwGPKDxPRMVOzsUqS
E2DIo/8LSReWGoqat51DjmHAg5npoxSdcuw0BU++OVaoDTYCQ2iaPn5htIiQ/8SM
jG4fjgMJB9I6ub/RrxVSTPPSgYC/bdRTSH0q4D7lC5/uh4jAhSM2GMtwprejQ//k
DgUaY1UHutVnpGoceRE0jw3nj5mZZ9ILr/5iRYOztemjCJOSa/MSfyVkgRObmOKc
P6ihr0QZq34nwOL0K/JTQml+NuszcDy53+hBIOdXhikl4wsy0BjbYysFC5gzH4V3
DhIRdwTeTKleq0tnKNOl
=JEoA
-----END PGP SIGNATURE-----
Merge tag '4.13-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A set of fixes for various warnings, including the one caused by the
removal of kernel/rcu/srcu.c. Also correct a stray pointer in
memory-barriers.txt"
* tag '4.13-fixes' of git://git.lwn.net/linux:
kokr/memory-barriers.txt: Fix obsolete link to atomic_ops.txt
memory-barriers.txt: Fix broken link to atomic_ops.txt
docs: Turn off section numbering for the input docs
docs: Include uaccess docs from the right file
docs: Do not include from kernel/rcu/srcu.c
Few obsolete links to atomic_ops.txt exist in memory-barriers.txt though
the file has moved to core-api/atomic_ops.rst. This commit fixes the
obsolete links.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
around. Highlights include:
- Conversion of a bunch of security documentation into RST
- The conversion of the remaining DocBook templates by The Amazing
Mauro Machine. We can now drop the entire DocBook build chain.
- The usual collection of fixes and minor updates.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZWkGAAAoJEI3ONVYwIuV6rf0P/0B3JTiVPKS/WUx53+jzbAi4
1BN7dmmuMxE1bWpgdEq+ac4aKxm07iAojuntuMj0qz/ZB1WARcmvEqqzI5i4wfq9
5MrLduLkyuWfr4MOPseKJ2VK83p8nkMOiO7jmnBsilu7fE4nF+5YY9j4cVaArfMy
cCQvAGjQzvej2eiWMGUSLHn4QFKh00aD7cwKyBVsJ08b27C9xL0J2LQyCDZ4yDgf
37/MH3puEd3HX/4qAwLonIxT3xrIrrbDturqLU7OSKcWTtGZNrYyTFbwR3RQtqWd
H8YZVg2Uyhzg9MYhkbQ2E5dEjUP4mkegcp6/JTINH++OOPpTbdTJgirTx7VTkSf1
+kL8t7+Ayxd0FH3+77GJ5RMj8LUK6rj5cZfU5nClFQKWXP9UL3IelQ3Nl+SpdM8v
ZAbR2KjKgH9KS6+cbIhgFYlvY+JgPkOVruwbIAc7wXVM3ibk1sWoBOFEujcbueWh
yDpQv3l1UX0CKr3jnevJoW26LtEbGFtC7gSKZ+3btyeSBpWFGlii42KNycEGwUW0
ezlwryDVHzyTUiKllNmkdK4v73mvPsZHEjgmme4afKAIiUilmcUF4XcqD86hISFT
t+UJLA/zEU+0sJe26o2nK6GNJzmo4oCtVyxfhRe26Ojs1n80xlYgnZRfuIYdd31Z
nwLBnwDCHAOyX91WXp9G
=cVjZ
-----END PGP SIGNATURE-----
Merge tag 'docs-4.13' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"There has been a fair amount of activity in the docs tree this time
around. Highlights include:
- Conversion of a bunch of security documentation into RST
- The conversion of the remaining DocBook templates by The Amazing
Mauro Machine. We can now drop the entire DocBook build chain.
- The usual collection of fixes and minor updates"
* tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits)
scripts/kernel-doc: handle DECLARE_HASHTABLE
Documentation: atomic_ops.txt is core-api/atomic_ops.rst
Docs: clean up some DocBook loose ends
Make the main documentation title less Geocities
Docs: Use kernel-figure in vidioc-g-selection.rst
Docs: fix table problems in ras.rst
Docs: Fix breakage with Sphinx 1.5 and upper
Docs: Include the Latex "ifthen" package
doc/kokr/howto: Only send regression fixes after -rc1
docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters
doc: Document suitability of IBM Verse for kernel development
Doc: fix a markup error in coding-style.rst
docs: driver-api: i2c: remove some outdated information
Documentation: DMA API: fix a typo in a function name
Docs: Insert missing space to separate link from text
doc/ko_KR/memory-barriers: Update control-dependencies example
Documentation, kbuild: fix typo "minimun" -> "minimum"
docs: Fix some formatting issues in request-key.rst
doc: ReSTify keys-trusted-encrypted.txt
doc: ReSTify keys-request-key.txt
...
I was reading the memory barries documentation in order to make sure the
RISC-V barries were correct, and I found a broken link to the atomic
operations documentation.
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This commit changes "architecure" to the correct spelling,
"architecture".
Signed-off-by: Stan Drozd <drozdziak1@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
until the input pull was done. There's also a few small fixes that
wandered in.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZFKWWAAoJEI3ONVYwIuV6e1QP/iuwl4dBo9wl8KZAarErCWSf
uAXx33ca4dnPOoxbDwRtg41ioSrF69vVjJp35oyBBSOVyDhQiLvec0Fq6EGObRo4
Xsoe0JwvadqY+aTETxXm3Id4aZHk3OMyCtZRinomeU9tN5dRNQaffvLG6Rtl7JK2
/tlqeNTmD1hLoD7azuvhwKfPYWDaYvmqd5I6F/2ANhYyJSg7pivpuN3Xfpf8GaiG
wmaCqF0OLKBXfTPQwC69YX7PPye3AOGUbIYt6RwfVhutKkXjazsH7n3vSJD63UFn
8/1eh+UjOGWfriz5oih3DTd+Hf1A7KhRCyxoOSReo78Z3k3YXqmOFgCCgfE0oX71
2WIdEBHiNnHNH8bdaTMIfqjG0JvNzYmXq+uXvHWiW8juhE/2ZhNRG+uyrRiADctm
TWjC4v219yY5h2IMcfyWOyLi1q/4zM3nmc8J1Ia+J9jH6ICKM+8sMrHaMyoSlCIQ
FcYe6gQdD1NepV675QiZUObCxUWu840jNSqYtF+0Ck/DREJKDEAdS7cmCxfDSAgf
oh6LsSHZy8tMyeOa9Wf9d5EPzK/eqd7x/vMKPI3jaETrwdXZzjO6Bn3codWx/gl6
7E/xqE8qCremcOWHIyx9jhd6T38swD+NScLU9CfRCKh+mddRDy3IiXD6NBo+m6Rq
echBXgxl/P0CIck2W4Ql
=Zk24
-----END PGP SIGNATURE-----
Merge tag 'docs-4.12-2' of git://git.lwn.net/linux
Pull more documentation updates from Jonathan Corbet:
"Connect the newly RST-formatted documentation to the rest; this had to
wait until the input pull was done. There's also a few small fixes
that wandered in"
* tag 'docs-4.12-2' of git://git.lwn.net/linux:
doc: replace FTP URL to kernel.org with HTTPS one
docs: update references to the device io book
Documentation: earlycon: fix Marvell Armada 3700 UART name
docs-rst: add input docs at main index and use kernel-figure
While converting the deviceiobook from DocBook to RST, dangling
references were left behind. This commit updates all remaining
references to the new location. SeongJae Park improved the ko_KR
translation.
Fixes: 8a8a602fdb ("docs: Convert the deviceio template to RST")
Signed-off-by: Helmut Grohne <h.grohne@intenta.de>
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
In the following example, if MAX is defined to be 1, then the compiler
knows (Q % MAX) is equal to zero. The compiler can therefore throw
away the "then" branch (and the "if"), retaining only the "else" branch.
q = READ_ONCE(a);
if (q % MAX) {
WRITE_ONCE(b, 1);
do_something();
} else {
WRITE_ONCE(b, 2);
do_something_else();
}
It is therefore necessary to modify the example like this:
q = READ_ONCE(a);
- WRITE_ONCE(b, 1);
+ WRITE_ONCE(b, 2);
do_something_else();
Signed-off-by: pierre Kuo <vichy.kuo@gmail.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds consistency to examples, formatting, and a couple of
additional warnings.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Nothing in the control-dependencies section of memory-barriers.txt
says that control dependencies don't extend beyond the end of the
if-statement containing the control dependency. Worse yet, in many
situations, they do extend beyond that if-statement. In particular,
the compiler cannot destroy the control dependency given proper use of
READ_ONCE() and WRITE_ONCE(). However, a weakly ordered system having
a conditional-move instruction provides the control-dependency guarantee
only to code within the scope of the if-statement itself.
This commit therefore adds words and an example demonstrating this
limitation of control dependencies.
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20160615230817.GA18039@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For compound atomics performing both a load and a store operation, make
it clear that _acquire and _release variants refer only to the load and
store portions of compound atomic. For example, xchg_acquire is an xchg
operation where the load takes on ACQUIRE semantics.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: dave@stgolabs.net
Cc: dhowells@redhat.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1461691328-5429-3-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It appears people are reading this document as a requirements list for
building hardware. This is not the intent of this document. Nor is it
particularly suited for this purpose.
The primary purpose of this document is our collective attempt to define
a set of primitives that (hopefully) allow us to write correct code on
the myriad of SMP platforms Linux supports.
Its a definite work in progress as our understanding of these platforms,
and memory ordering in general, progresses.
Nor does being mentioned in this document mean we think its a
particularly good idea; the data dependency barrier required by Alpha
being a prime example. Yes we have it, no you're insane to require it
when building new hardware.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: dave@stgolabs.net
Cc: dhowells@redhat.com
Cc: linux-doc@vger.kernel.org
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1461691328-5429-1-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The compiler store-fusion example in memory-barriers.txt uses a C
comment to represent arbitrary code that does not update a given
variable. Unfortunately, someone could reasonably interpret the
comment as instead referring to the following line of code. This
commit therefore replaces the comment with a string that more
clearly represents the arbitrary code.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The "transitivity" section mentions cumulativity in a potentially
confusing way. Contrary to the current wording, cumulativity is
not transitivity, but rather a hardware discipline that can be used
to implement transitivity on ARM and PowerPC CPUs. This commit
therefore deletes the mention of cumulativity.
Reported-by: Luc Maranget <luc.maranget@inria.fr>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The memory-barriers.txt discussion of local transitivity and
release-acquire chains leaves out discussion of the outcome of
the read from "u". This commit therefore adds an outcome showing
that you can get a "1" from this read even if the release-acquire
pairs don't line up.
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The introduction of smp_load_acquire() and smp_store_release() had
the side effect of introducing a weaker notion of transitivity:
The transitivity of full smp_mb() barriers is global, but that
of smp_store_release()/smp_load_acquire() chains is local. This
commit therefore introduces the notion of local transitivity and
gives an example.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The current memory-barriers.txt does not address the possibility of
a write to a dereferenced pointer. This should be rare, but when it
happens, we need that write -not- to be clobbered by the initialization.
This commit therefore adds an example showing a data dependency ordering
a later data-dependent write.
Reported-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Commit #1ebee8017d84 (rcu: Eliminate array-index-based RCU primitives)
eliminated the primitives supporting RCU-protected array indexes, but
failed to update Documentation/memory-barriers.txt accordingly. This
commit therefore removes the discussion of RCU-protected array indexes.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit fixes a couple of "Compiler Barrier" section references to
be "COMPILER BARRIER". This makes it easier to find the section in
the usual text editors.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The summary of the "CONTROL DEPENDENCIES" section incorrectly states that
barrier() may be used to prevent compiler reordering when more than one
leg of the control-dependent "if" statement start with identical stores.
This is incorrect at high optimization levels. This commit therefore
updates the summary to match the detailed description.
Reported by: Jianyu Zhan <nasa4836@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This adds a new kind of barrier, and reworks virtio and xen
to use it.
Plus some fixes here and there.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWlU2kAAoJECgfDbjSjVRpZ6IH/Ra19ecG8sCQo9zskr4zo22Z
DZXC3u0sJDBYjjBAiw3IY1FKh7wx2Fr1RhUOj1bteBgcFCMCV1zInP5ITiCyzd1H
YYh1w9C2tZaj2T4t9L4hIrAdtIF8fGS+oI2IojXPjOuDLEt6pfFBEjHp/sfl3UJq
ZmZvw4OXviSNej7jBw8Xni3Uv18yfmLGXvMdkvMSPC1/XL29voGDqTVwhqJwxLVz
k/ZLcKFOzIs9N7Nja0Jl1EiZtC2Y9cpItqweicNAzszlpkSL44vQxmCSefB+WyQ4
gt0O3+AxYkLfrxzCBhUA4IpRex3/XPW1b+1e/V1XjfR2n/FlyLe+AIa8uPJElFc=
=ukaV
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio barrier rework+fixes from Michael Tsirkin:
"This adds a new kind of barrier, and reworks virtio and xen to use it.
Plus some fixes here and there"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (44 commits)
checkpatch: add virt barriers
checkpatch: check for __smp outside barrier.h
checkpatch.pl: add missing memory barriers
virtio: make find_vqs() checkpatch.pl-friendly
virtio_balloon: fix race between migration and ballooning
virtio_balloon: fix race by fill and leak
s390: more efficient smp barriers
s390: use generic memory barriers
xen/events: use virt_xxx barriers
xen/io: use virt_xxx barriers
xenbus: use virt_xxx barriers
virtio_ring: use virt_store_mb
sh: move xchg_cmpxchg to a header by itself
sh: support 1 and 2 byte xchg
virtio_ring: update weak barriers to use virt_xxx
Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb"
asm-generic: implement virt_xxx memory barriers
x86: define __smp_xxx
xtensa: define __smp_xxx
tile: define __smp_xxx
...
Guests running within virtual machines might be affected by SMP effects even if
the guest itself is compiled without SMP support. This is an artifact of
interfacing with an SMP host while running an UP kernel. Using mandatory
barriers for this use-case would be possible but is often suboptimal.
In particular, virtio uses a bunch of confusing ifdefs to work around
this, while xen just uses the mandatory barriers.
To better handle this case, low-level virt_mb() etc macros are made available.
These are implemented trivially using the low-level __smp_xxx macros,
the purpose of these wrappers is to annotate those specific cases.
These have the same effect as smp_mb() etc when SMP is enabled, but generate
identical code for SMP and non-SMP systems. For example, virtual machine guests
should use virt_mb() rather than smp_mb() when synchronizing against a
(possibly SMP) host.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Pull locking updates from Ingo Molnar:
"So we have a laundry list of locking subsystem changes:
- continuing barrier API and code improvements
- futex enhancements
- atomics API improvements
- pvqspinlock enhancements: in particular lock stealing and adaptive
spinning
- qspinlock micro-enhancements"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
futex: Cleanup the goto confusion in requeue_pi()
futex: Remove pointless put_pi_state calls in requeue()
futex: Document pi_state refcounting in requeue code
futex: Rename free_pi_state() to put_pi_state()
futex: Drop refcount if requeue_pi() acquired the rtmutex
locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation
lcoking/barriers, arch: Use smp barriers in smp_store_release()
locking/cmpxchg, arch: Remove tas() definitions
locking/pvqspinlock: Queue node adaptive spinning
locking/pvqspinlock: Allow limited lock stealing
locking/pvqspinlock: Collect slowpath lock statistics
sched/core, locking: Document Program-Order guarantees
locking, sched: Introduce smp_cond_acquire() and use it
locking/pvqspinlock, x86: Optimize the PV unlock code path
locking/qspinlock: Avoid redundant read of next pointer
locking/qspinlock: Prefetch the next node cacheline
locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg()
atomics: Add test for atomic operations with _relaxed variants
In commit 2ecf810121 ("Documentation/memory-barriers.txt: Add
needed ACCESS_ONCE() calls to memory-barriers.txt") the statement
"Q = P" was converted to "ACCESS_ONCE(Q) = P". This should have
been "Q = ACCESS_ONCE(P)". It later became "WRITE_ONCE(Q, P)".
This doesn't match the following text, which is "Q = LOAD P".
Change the statement to be "Q = READ_ONCE(P)".
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
It serves no purpose but to confuse readers, and is most
likely a left over from constant memory-barriers.txt
updates. I.e.:
http://lists.openwall.net/linux-kernel/2006/07/15/27
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445975631-17047-5-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This seems to be a mis-reading of how alpha memory ordering works, and
is not backed up by the alpha architecture manual. The helper functions
don't do anything special on any other architectures, and the arguments
that support them being safe on other architectures also argue that they
are safe on alpha.
Basically, the "control dependency" is between a previous read and a
subsequent write that is dependent on the value read. Even if the
subsequent write is actually done speculatively, there is no way that
such a speculative write could be made visible to other cpu's until it
has been committed, which requires validating the speculation.
Note that most weakely ordered architectures (very much including alpha)
do not guarantee any ordering relationship between two loads that depend
on each other on a control dependency:
read A
if (val == 1)
read B
because the conditional may be predicted, and the "read B" may be
speculatively moved up to before reading the value A. So we require the
user to insert a smp_rmb() between the two accesses to be correct:
read A;
if (A == 1)
smp_rmb()
read B
Alpha is further special in that it can break that ordering even if the
*address* of B depends on the read of A, because the cacheline that is
read later may be stale unless you have a memory barrier in between the
pointer read and the read of the value behind a pointer:
read ptr
read offset(ptr)
whereas all other weakly ordered architectures guarantee that the data
dependency (as opposed to just a control dependency) will order the two
accesses. As a result, alpha needs a "smp_read_barrier_depends()" in
between those two reads for them to be ordered.
The coontrol dependency that "READ_ONCE_CTRL()" and "atomic_read_ctrl()"
had was a control dependency to a subsequent *write*, however, and
nobody can finalize such a subsequent write without having actually done
the read. And were you to write such a value to a "stale" cacheline
(the way the unordered reads came to be), that would seem to lose the
write entirely.
So the things that make alpha able to re-order reads even more
aggressively than other weak architectures do not seem to be relevant
for a subsequent write. Alpha memory ordering may be strange, but
there's no real indication that it is *that* strange.
Also, the alpha architecture reference manual very explicitly talks
about the definition of "Dependence Constraints" in section 5.6.1.7,
where a preceding read dominates a subsequent write.
Such a dependence constraint admittedly does not impose a BEFORE (alpha
architecture term for globally visible ordering), but it does guarantee
that there can be no "causal loop". I don't see how you could avoid
such a loop if another cpu could see the stored value and then impact
the value of the first read. Put another way: the read and the write
could not be seen as being out of order wrt other cpus.
So I do not see how these "x_ctrl()" functions can currently be necessary.
I may have to eat my words at some point, but in the absense of clear
proof that alpha actually needs this, or indeed even an explanation of
how alpha could _possibly_ need it, I do not believe these functions are
called for.
And if it turns out that alpha really _does_ need a barrier for this
case, that barrier still should not be "smp_read_barrier_depends()".
We'd have to make up some new speciality barrier just for alpha, along
with the documentation for why it really is necessary.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul E McKenney <paulmck@us.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull locking changes from Ingo Molnar:
"The main changes in this cycle were:
- More gradual enhancements to atomic ops: new atomic*_read_ctrl()
ops, synchronize atomic_{read,set}() ordering requirements between
architectures, add atomic_long_t bitops. (Peter Zijlstra)
- Add _{relaxed|acquire|release}() variants for inc/dec atomics and
use them in various locking primitives: mutex, rtmutex, mcs, rwsem.
This enables weakly ordered architectures (such as arm64) to make
use of more locking related optimizations. (Davidlohr Bueso)
- Implement atomic[64]_{inc,dec}_relaxed() on ARM. (Will Deacon)
- Futex kernel data cache footprint micro-optimization. (Rasmus
Villemoes)
- pvqspinlock runtime overhead micro-optimization. (Waiman Long)
- misc smaller fixlets"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ARM, locking/atomics: Implement _relaxed variants of atomic[64]_{inc,dec}
locking/rwsem: Use acquire/release semantics
locking/mcs: Use acquire/release semantics
locking/rtmutex: Use acquire/release semantics
locking/mutex: Use acquire/release semantics
locking/asm-generic: Add _{relaxed|acquire|release}() variants for inc/dec atomics
atomic: Implement atomic_read_ctrl()
atomic, arch: Audit atomic_{read,set}()
atomic: Add atomic_long_t bitops
futex: Force hot variables into a single cache line
locking/pvqspinlock: Kick the PV CPU unconditionally when _Q_SLOW_VAL
locking/osq: Relax atomic semantics
locking/qrwlock: Rename ->lock to ->wait_lock
locking/Documentation/lockstat: Fix typo - lokcing -> locking
locking/atomics, cmpxchg: Privatize the inclusion of asm/cmpxchg.h
The recently added lockless_dereference() macro is not present in the
Documentation/ directory, so this commit fixes that.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Documentation/memory-barriers.txt calls out RCU as one of the sets
of primitives associated with ACQUIRE and RELEASE. There really
is an association in that rcu_assign_pointer() includes a RELEASE
operation, but a quick read can convince people that rcu_read_lock() and
rcu_read_unlock() have ACQUIRE and RELEASE semantics, which they do not.
This commit therefore removes RCU from this list in order to avoid
this confusion.
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Provide atomic_read_ctrl() to mirror READ_ONCE_CTRL(), such that we can
more conveniently use atomics in control dependencies.
Since we can assume atomic_read() implies a READ_ONCE(), we must only
emit an extra smp_read_barrier_depends() in order to upgrade to
READ_ONCE_CTRL() semantics.
Requested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/20150918115637.GM3604@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull locking and atomic updates from Ingo Molnar:
"Main changes in this cycle are:
- Extend atomic primitives with coherent logic op primitives
(atomic_{or,and,xor}()) and deprecate the old partial APIs
(atomic_{set,clear}_mask())
The old ops were incoherent with incompatible signatures across
architectures and with incomplete support. Now every architecture
supports the primitives consistently (by Peter Zijlstra)
- Generic support for 'relaxed atomics':
- _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return()
- atomic_read_acquire()
- atomic_set_release()
This came out of porting qwrlock code to arm64 (by Will Deacon)
- Clean up the fragile static_key APIs that were causing repeat bugs,
by introducing a new one:
DEFINE_STATIC_KEY_TRUE(name);
DEFINE_STATIC_KEY_FALSE(name);
which define a key of different types with an initial true/false
value.
Then allow:
static_branch_likely()
static_branch_unlikely()
to take a key of either type and emit the right instruction for the
case. To be able to know the 'type' of the static key we encode it
in the jump entry (by Peter Zijlstra)
- Static key self-tests (by Jason Baron)
- qrwlock optimizations (by Waiman Long)
- small futex enhancements (by Davidlohr Bueso)
- ... and misc other changes"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
jump_label/x86: Work around asm build bug on older/backported GCCs
locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations
locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h
locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
locking/qrwlock: Implement queue_write_unlock() using smp_store_release()
locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition
locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'
locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication
locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations
locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic
locking/static_keys: Make verify_keys() static
jump label, locking/static_keys: Update docs
locking/static_keys: Provide a selftest
jump_label: Provide a self-test
s390/uaccess, locking/static_keys: employ static_branch_likely()
x86, tsc, locking/static_keys: Employ static_branch_likely()
locking/static_keys: Add selftest
locking/static_keys: Add a new static_key interface
locking/static_keys: Rework update logic
locking/static_keys: Add static_key_{en,dis}able() helpers
...
RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
likely the only thing that ever will use it, so this commit makes this
macro private to RCU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
A failed cmpxchg does not provide any memory ordering guarantees, a
property that is used to optimise the cmpxchg implementations on Alpha,
PowerPC and arm64.
This patch updates atomic_ops.txt and memory-barriers.txt to reflect
this.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <waiman.long@hp.com>
Link: http://lkml.kernel.org/r/20150716151006.GH26390@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Although "full barrier" should be interpreted as providing transitivity,
it is worth eliminating any possible confusion. This commit therefore
adds "(including transitivity)" to eliminate any possible confusion.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Pull locking updates from Ingo Molnar:
"The main changes are:
- 'qspinlock' support, enabled on x86: queued spinlocks - these are
now the spinlock variant used by x86 as they outperform ticket
spinlocks in every category. (Waiman Long)
- 'pvqspinlock' support on x86: paravirtualized variant of queued
spinlocks. (Waiman Long, Peter Zijlstra)
- 'qrwlock' support, enabled on x86: queued rwlocks. Similar to
queued spinlocks, they are now the variant used by x86:
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_QUEUED_RWLOCKS=y
- various lockdep fixlets
- various locking primitives cleanups, further WRITE_ONCE()
propagation"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
locking/lockdep: Remove hard coded array size dependency
locking/qrwlock: Don't contend with readers when setting _QW_WAITING
lockdep: Do not break user-visible string
locking/arch: Rename set_mb() to smp_store_mb()
locking/arch: Add WRITE_ONCE() to set_mb()
rtmutex: Warn if trylock is called from hard/softirq context
arch: Remove __ARCH_HAVE_CMPXCHG
locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG
locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS
locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS
locking/pvqspinlock: Replace xchg() by the more descriptive set_mb()
locking/pvqspinlock, x86: Enable PV qspinlock for Xen
locking/pvqspinlock, x86: Enable PV qspinlock for KVM
locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching
locking/pvqspinlock: Implement simple paravirt support for the qspinlock
locking/qspinlock: Revert to test-and-set on hypervisors
locking/qspinlock: Use a simple write to grab the lock
locking/qspinlock: Optimize for smaller NR_CPUS
locking/qspinlock: Extract out code snippets for the next patch
locking/qspinlock: Add pending bit
...
The current formulation of control dependencies fails on DEC Alpha,
which does not respect dependencies of any kind unless an explicit
memory barrier is provided. This means that the current fomulation of
control dependencies fails on Alpha. This commit therefore creates a
READ_ONCE_CTRL() that has the same overhead on non-Alpha systems, but
causes Alpha to produce the needed ordering. This commit also applies
READ_ONCE_CTRL() to the one known use of control dependencies.
Use of READ_ONCE_CTRL() also has the beneficial effect of adding a bit
of self-documentation to control dependencies.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Our current documentation claims that, when followed by an ACQUIRE,
smp_mb__before_spinlock() orders prior loads against subsequent loads
and stores, which isn't the intent. This commit therefore fixes the
documentation to state that this sequence orders only prior stores
against subsequent loads and stores.
In addition, the original intent of smp_mb__before_spinlock() was to only
order prior loads against subsequent stores, however, people have started
using it as if it ordered prior loads against subsequent loads and stores.
This commit therefore also updates smp_mb__before_spinlock()'s header
comment to reflect this new reality.
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>