Documentation: path-lookup: include new LOOKUP flags

Now that we have new LOOKUP flags, we should document them in the relevant path-walking documentation. And now that we've settled on a common name for nd_jump_link() style symlinks ("magic links"), use that term where magic-link semantics are described.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

parent b28a10aedc
commit b55eef872a

@@ -13,6 +13,7 @@ It has subsequently been updated to reflect changes in the kernel
 including:
 
 - per-directory parallel name lookup.
+- ``openat2()`` resolution restriction flags.
 
 Introduction to pathname lookup
 ===============================
@@ -235,6 +236,13 @@ renamed. If ``d_lookup`` finds that a rename happened while it
 unsuccessfully scanned a chain in the hash table, it simply tries
 again.
 
+``rename_lock`` is also used to detect and defend against potential attacks
+against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
+the parent directory is moved outside the root, bypassing the ``path_equal()``
+check). If ``rename_lock`` is updated during the lookup and the path encounters
+a "..", a potential attack occurred and ``handle_dots()`` will bail out with
+``-EAGAIN``.
+
 inode->i_rwsem
 ~~~~~~~~~~~~~~
 
@@ -348,6 +356,13 @@ any changes to any mount points while stepping up. This locking is
 needed to stabilize the link to the mounted-on dentry, which the
 refcount on the mount itself doesn't ensure.
 
+``mount_lock`` is also used to detect and defend against potential attacks
+against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
+the parent directory is moved outside the root, bypassing the ``path_equal()``
+check). If ``mount_lock`` is updated during the lookup and the path encounters
+a "..", a potential attack occurred and ``handle_dots()`` will bail out with
+``-EAGAIN``.
+
 RCU
 ~~~
 
@@ -405,6 +420,10 @@ is requested. Keeping a reference in the ``nameidata`` ensures that
 only one root is in effect for the entire path walk, even if it races
 with a ``chroot()`` system call.
 
+It should be noted that in the case of ``LOOKUP_IN_ROOT`` or
+``LOOKUP_BENEATH``, the effective root becomes the directory file descriptor
+passed to ``openat2()`` (which exposes these ``LOOKUP_`` flags).
+
 The root is needed when either of two conditions holds: (1) either the
 pathname or a symbolic link starts with a "'/'", or (2) a "``..``"
 component is being handled, since "``..``" from the root must always stay
@@ -1149,7 +1168,7 @@ so ``NULL`` is returned to indicate that the symlink can be released and
 the stack frame discarded.
 
 The other case involves things in ``/proc`` that look like symlinks but
-aren't really::
+aren't really (and are therefore commonly referred to as "magic-links")::
 
 $ ls -l /proc/self/fd/1
 lrwx------ 1 neilb neilb 64 Jun 13 10:19 /proc/self/fd/1 -> /dev/pts/4
@@ -1286,7 +1305,9 @@ A few flags
 A suitable way to wrap up this tour of pathname walking is to list
 the various flags that can be stored in the ``nameidata`` to guide the
 lookup process. Many of these are only meaningful on the final
-component, others reflect the current state of the pathname lookup.
+component, others reflect the current state of the pathname lookup, and some
+apply restrictions to all path components encountered in the path lookup.
+
 And then there is ``LOOKUP_EMPTY``, which doesn't fit conceptually with
 the others. If this is not set, an empty pathname causes an error
 very early on. If it is set, empty pathnames are not considered to be
@@ -1310,13 +1331,48 @@ longer needed.
 ``LOOKUP_JUMPED`` means that the current dentry was chosen not because
 it had the right name but for some other reason. This happens when
 following "``..``", following a symlink to ``/``, crossing a mount point
-or accessing a "``/proc/$PID/fd/$FD``" symlink. In this case the
-filesystem has not been asked to revalidate the name (with
-``d_revalidate()``). In such cases the inode may still need to be
-revalidated, so ``d_op->d_weak_revalidate()`` is called if
+or accessing a "``/proc/$PID/fd/$FD``" symlink (also known as a "magic
+link"). In this case the filesystem has not been asked to revalidate the
+name (with ``d_revalidate()``). In such cases the inode may still need
+to be revalidated, so ``d_op->d_weak_revalidate()`` is called if
 ``LOOKUP_JUMPED`` is set when the look completes - which may be at the
 final component or, when creating, unlinking, or renaming, at the penultimate component.
 
+Resolution-restriction flags
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In order to allow userspace to protect itself against certain race conditions
+and attack scenarios involving changing path components, a series of flags are
+available which apply restrictions to all path components encountered during
+path lookup. These flags are exposed through ``openat2()``'s ``resolve`` field.
+
+``LOOKUP_NO_SYMLINKS`` blocks all symlink traversals (including magic-links).
+This is distinctly different from ``LOOKUP_FOLLOW``, because the latter only
+relates to restricting the following of trailing symlinks.
+
+``LOOKUP_NO_MAGICLINKS`` blocks all magic-link traversals. Filesystems must
+ensure that they return errors from ``nd_jump_link()``, because that is how
+``LOOKUP_NO_MAGICLINKS`` and other magic-link restrictions are implemented.
+
+``LOOKUP_NO_XDEV`` blocks all ``vfsmount`` traversals (this includes both
+bind-mounts and ordinary mounts). Note that the ``vfsmount`` which contains the
+lookup is determined by the first mountpoint the path lookup reaches --
+absolute paths start with the ``vfsmount`` of ``/``, and relative paths start
+with the ``dfd``'s ``vfsmount``. Magic-links are only permitted if the
+``vfsmount`` of the path is unchanged.
+
+``LOOKUP_BENEATH`` blocks any path components which resolve outside the
+starting point of the resolution. This is done by blocking ``nd_jump_root()``
+as well as blocking ".." if it would jump outside the starting point.
+``rename_lock`` and ``mount_lock`` are used to detect attacks against the
+resolution of "..". Magic-links are also blocked.
+
+``LOOKUP_IN_ROOT`` resolves all path components as though the starting point
+were the filesystem root. ``nd_jump_root()`` brings the resolution back to
+the starting point, and ".." at the starting point will act as a no-op. As with
+``LOOKUP_BENEATH``, ``rename_lock`` and ``mount_lock`` are used to detect
+attacks against ".." resolution. Magic-links are also blocked.
+
 Final-component flags
 ~~~~~~~~~~~~~~~~~~~~~
 
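
Note on the ``LOOKUP_BENEATH`` / ``LOOKUP_IN_ROOT`` paragraphs: the difference in how the two flags treat ".." and a leading "/" may be easier to see in a toy userspace model. The sketch below is not kernel code. ``toy_walk()``, ``TOY_BENEATH`` and ``TOY_IN_ROOT`` are hypothetical names made up for illustration, the walk is purely lexical (no symlinks, mounts, or concurrent renames, so nothing here models the ``rename_lock``/``mount_lock`` checks), and the use of ``-EXDEV`` for a blocked escape is an assumption taken from the ``openat2(2)`` manual page rather than from this patch.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical names for this toy model; these are not kernel identifiers. */
enum toy_mode { TOY_BENEATH, TOY_IN_ROOT };

/*
 * Walk `path` lexically, tracking only the depth below the starting
 * point (0 means "at the starting point").  Returns the final depth,
 * or -EXDEV when TOY_BENEATH refuses the path.
 */
int toy_walk(const char *path, enum toy_mode mode)
{
	int depth = 0;
	const char *p = path;

	if (*p == '/') {
		/* Stand-in for nd_jump_root(): blocked under BENEATH. */
		if (mode == TOY_BENEATH)
			return -EXDEV;
		/* TOY_IN_ROOT: jump back to the starting point. */
		depth = 0;
	}

	while (*p) {
		size_t len;

		/* Skip any run of slashes before the next component. */
		while (*p == '/')
			p++;
		len = strcspn(p, "/");
		if (len == 0)
			break;

		if (len == 2 && p[0] == '.' && p[1] == '.') {
			if (depth > 0)
				depth--;
			else if (mode == TOY_BENEATH)
				return -EXDEV;	/* would escape the start */
			/* TOY_IN_ROOT: ".." at the starting point is a no-op. */
		} else if (!(len == 1 && p[0] == '.')) {
			depth++;	/* ordinary component */
		}
		p += len;
	}
	return depth;
}
```

In this model ``toy_walk("a/../..", TOY_BENEATH)`` is refused, while the same path under ``TOY_IN_ROOT`` simply stays at the starting point. The real implementation additionally has to cope with directories being moved during the walk, which is what the ``-EAGAIN`` bail-out in ``handle_dots()`` described in the patch is for.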