Go to file
Linus Torvalds d7852fbd0f access: avoid the RCU grace period for the temporary subjective credentials
It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
work because it installs a temporary credential that gets allocated and
freed for each system call.

The allocation and freeing overhead is mostly benign, but because
credentials can be accessed under the RCU read lock, the freeing
involves a RCU grace period.

Which is not a huge deal normally, but if you have a lot of access()
calls, this causes a fair amount of seconday damage: instead of having a
nice alloc/free patterns that hits in hot per-CPU slab caches, you have
all those delayed free's, and on big machines with hundreds of cores,
the RCU overhead can end up being enormous.

But it turns out that all of this is entirely unnecessary.  Exactly
because access() only installs the credential as the thread-local
subjective credential, the temporary cred pointer doesn't actually need
to be RCU free'd at all.  Once we're done using it, we can just free it
synchronously and avoid all the RCU overhead.

So add a 'non_rcu' flag to 'struct cred', which can be set by users that
know they only use it in non-RCU context (there are other potential
users for this).  We can make it a union with the rcu freeing list head
that we need for the RCU case, so this doesn't need any extra storage.

Note that this also makes 'get_current_cred()' clear the new non_rcu
flag, in case we have filesystems that take a long-term reference to the
cred and then expect the RCU delayed freeing afterwards.  It's not
entirely clear that this is required, but it makes for clear semantics:
the subjective cred remains non-RCU as long as you only access it
synchronously using the thread-local accessors, but you _can_ use it as
a generic cred if you want to.

It is possible that we should just remove the whole RCU markings for
->cred entirely.  Only ->real_cred is really supposed to be accessed
through RCU, and the long-term cred copies that nfs uses might want to
explicitly re-enable RCU freeing if required, rather than have
get_current_cred() do it implicitly.

But this is a "minimal semantic changes" change for the immediate
problem.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Glauber <jglauber@marvell.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-24 10:12:09 -07:00
Documentation dt-bindings: riscv: resolve 'make dt_binding_check' warnings 2019-06-26 10:28:53 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
arch A few more MIPS fixes: 2019-07-06 10:32:12 -07:00
block blk-mq: fix up placement of debugfs directory of queue files 2019-07-06 10:07:38 -06:00
certs treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36 2019-05-24 17:27:11 +02:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2019-07-05 13:31:19 +09:00
drivers dmaengine fixes for v5.2 2019-07-06 10:06:37 -07:00
fs access: avoid the RCU grace period for the temporary subjective credentials 2019-07-24 10:12:09 -07:00
include access: avoid the RCU grace period for the temporary subjective credentials 2019-07-24 10:12:09 -07:00
init initramfs: fix populate_initrd_image() section mismatch 2019-06-29 16:43:45 +08:00
ipc treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
kernel access: avoid the RCU grace period for the temporary subjective credentials 2019-07-24 10:12:09 -07:00
lib Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2019-07-05 13:31:19 +09:00
mm Revert "mm: page cache: store only head pages in i_pages" 2019-07-05 19:55:18 -07:00
net Two more quick bugfixes for nfsd, fixing a regression causing mount 2019-07-05 19:00:37 -07:00
samples samples: make pidfd-metadata fail gracefully on older kernels 2019-06-24 15:55:50 +02:00
scripts treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505 2019-06-19 17:11:22 +02:00
security SPDX update for 5.2-rc6 2019-06-21 09:58:42 -07:00
sound sound fixes for 5.2 2019-07-05 02:03:50 +09:00
tools KVM: nVMX: Change KVM_STATE_NESTED_EVMCS to signal vmcs12 is copied from eVMCS 2019-07-02 19:02:45 +02:00
usr user/Makefile: Fix typo and capitalization in comment section 2018-12-11 00:18:03 +09:00
virt SPDX update for 5.2-rc6 2019-06-21 09:58:42 -07:00
.clang-format Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-04-17 11:26:25 -07:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore .gitignore: exclude .get_maintainer.ignore and .gitattributes 2019-05-18 11:49:54 +09:00
.mailmap MAINTAINERS: Update my email address 2019-06-18 14:37:27 +01:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS MAINTAINERS: Farewell Martin Schwidefsky 2019-05-31 10:14:11 +02:00
Kbuild Kbuild updates for v5.1 2019-03-10 17:48:21 -07:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS A few more MIPS fixes: 2019-07-06 10:32:12 -07:00
Makefile Linux 5.2 2019-07-07 15:41:56 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.