When xino feature is enabled and a real directory inode number overflows
the lower xino bits, we cannot map this directory inode number to a unique
and persistent inode number and we fall back to the real inode st_ino and
overlay st_dev.
The real inode st_ino with high bits may collide with a lower inode number
on overlay st_dev that was mapped using xino.
To avoid possible collision with legitimate xino values, map a non
persistent inode number to a dedicated range in the xino address space.
The dedicated range is created by adding one more bit to the number of
reserved high xino bits. We could have added just one more fsid, but that
would have had the undesired effect of changing persistent overlay inode
numbers on kernel or require more complex xino mapping code.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
There is no reason to deplete the system's global get_next_ino() pool for
overlay non-persistent inode numbers and there is no reason at all to
allocate non-persistent inode numbers for non-directories.
For non-directories, it is much better to leave i_ino the same as real
i_ino, to be consistent with st_ino/d_ino.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Overlayfs works sub-optimally with upper fs that has no xattr/d_type/
RENAME_WHITEOUT support. We should basically deprecate support for those
filesystems, but so far, we only issue a warning and don't fail the mount
for the sake of backward compat. Some features are already being disabled
with no xattr support.
For newly supported remote upper fs, we do not need to worry about backward
compatibility, so we can fail the mount if upper fs is a sub-optimal
filesystem.
This reduces the in-tree remote filesystems supported as upper to just
FUSE, for which the remote upper fs support was added.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
As with other required upper fs features, we only warn if support is
missing to avoid breaking existing sub-optimal setups.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
No reason to prevent upper layer being a remote filesystem. Do the
revalidation in that case, just as we already do for lower layers.
This lets virtiofs be used as upper layer, which appears to be a real use
case.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Allow completely skipping ->revalidate() on a per-dentry basis, in case the
underlying layers used for a dentry do not themselves have ->revalidate().
E.g. negative overlay dentry has no underlying layers, hence revalidate is
unnecessary. Or if lower layer is remote but overlay dentry is pure-upper,
then can skip revalidate.
The following places need to update whether the dentry needs revalidate or
not:
- fill-super (root dentry)
- lookup
- create
- fh_to_dentry
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Following patch will allow remote as upper layer, but not overlay stacked
on upper layer. Separate the two concepts.
This patch is doesn't change behavior.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Use a common loop for plain and weak revalidation. This will aid doing
revalidation on upper layer.
This patch doesn't change behavior.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Move i_ino initialization to ovl_inode_init() to avoid the dance of setting
i_ino in ovl_fill_inode() sometimes on the first call and sometimes on the
seconds call.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Allocates and initializes the root dentry and inode.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fix up two bugs in the coversion to xino_mode:
1. xino=off does not always end up in disabled mode
2. xino=auto on 32bit arch should end up in disabled mode
Take a proactive approach to disabling xino on 32bit kernel:
1. Disable XINO_AUTO config during build time
2. Disable xino with a warning on mount time
As a by product, xino=on on 32bit arch also ends up in disabled mode.
We never intended to enable xino on 32bit arch and this will make the
rest of the logic simpler.
Fixes: 0f831ec85e ("ovl: simplify ovl_same_sb() helper")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
A performance regression was observed since linux v4.19 with aio test using
fio with iodepth 128 on overlayfs. The queue depth of the device was
always 1 which is unexpected.
After investigation, it was found that commit 16914e6fc7 ("ovl: add
ovl_read_iter()") and commit 2a92e07edc ("ovl: add ovl_write_iter()")
resulted in vfs_iter_{read,write} being called on underlying filesystem,
which always results in syncronous IO.
Implement async IO for stacked reading and writing. This resolves the
performance regresion.
This is implemented by allocating a new kiocb for submitting the AIO
request on the underlying filesystem. When the request is completed, the
new kiocb is freed and the completion callback is called on the original
iocb.
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
On non-samefs overlay without xino, non pure upper inodes should use a
pseudo_dev assigned to each unique lower fs, but if lower layer is on the
same fs and upper layer, it has no pseudo_dev assigned.
In this overlay layers setup:
- two filesystems, A and B
- upper layer is on A
- lower layer 1 is also on A
- lower layer 2 is on B
Non pure upper overlay inode, whose origin is in layer 1 will have the
st_dev;st_ino values of the real lower inode before copy up and the
st_dev;st_ino values of the real upper inode after copy up.
Fix this inconsitency by assigning a unique pseudo_dev also for upper fs,
that will be used as st_dev value along with the lower inode st_dev for
overlay inodes in the case above.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This fixes ovl_lower_uuid_ok() to correctly detect the corner case:
- two filesystems, A and B, both have null uuid
- upper layer is on A
- lower layer 1 is also on A
- lower layer 2 is on B
In this case, bad_uuid would not have been set for B, because the check
only involved the list of lower fs. Hence we'll try to decode a layer 2
origin on layer 1 and fail.
We check for conflicting (and null) uuid among all lower layers, including
those layers that are on the same fs as the upper layer.
Reported-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Rename lower_fs[] array to fs[], extend its size by one and use index fsid
(instead of fsid-1) to access the fs[] array.
Initialize fs[0] with upper fs values. fsid 0 is reserved even with lower
only overlay, so fs[0] remains null in this case.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
No code uses the sb returned from this helper, so make it retrun a boolean
and rename it to ovl_same_fs().
The xino mode is irrelevant when all layers are on same fs, so instead of
describing samefs with mode OVL_XINO_OFF, use a new xino_mode state, which
is 0 in the case of samefs, -1 in the case of xino=off and > 0 with xino
enabled.
Create a new helper ovl_same_dev(), to use instead of the common check for
(ovl_same_fs() || xinobits).
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Rename lower_layers[] array to layers[], extend its size by one and
initialize layers[0] with upper layer values. Lower layers are now
addressed with index 1..numlower. layers[0] is reserved even with lower
only overlay.
[SzM: replace ofs->numlower with ofs->numlayer, the latter's value is
incremented by one]
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
In the past, overlayfs required that lower fs have non null uuid in
order to support nfs export and decode copy up origin file handles.
Commit 9df085f3c9 ("ovl: relax requirement for non null uuid of
lower fs") relaxed this requirement for nfs export support, as long
as uuid (even if null) is unique among all lower fs.
However, said commit unintentionally also relaxed the non null uuid
requirement for decoding copy up origin file handles, regardless of
the unique uuid requirement.
Amend this mistake by disabling decoding of copy up origin file handle
from lower fs with a conflicting uuid.
We still encode copy up origin file handles from those fs, because
file handles like those already exist in the wild and because they
might provide useful information in the future.
There is an unhandled corner case described by Miklos this way:
- two filesystems, A and B, both have null uuid
- upper layer is on A
- lower layer 1 is also on A
- lower layer 2 is on B
In this case bad_uuid won't be set for B, because the check only
involves the list of lower fs. Hence we'll try to decode a layer 2
origin on layer 1 and fail.
We will deal with this corner case later.
Reported-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/lkml/20191106234301.283006-1-colin.king@canonical.com/
Fixes: 9df085f3c9 ("ovl: relax requirement for non null uuid ...")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Once upon a time, commit 2cac0c00a6 ("ovl: get exclusive ownership on
upper/work dirs") in v4.13 added some sanity checks on overlayfs layers.
This change caused a docker regression. The root cause was mount leaks
by docker, which as far as I know, still exist.
To mitigate the regression, commit 85fdee1eef ("ovl: fix regression
caused by exclusive upper/work dir protection") in v4.14 turned the
mount errors into warnings for the default index=off configuration.
Recently, commit 146d62e5a5 ("ovl: detect overlapping layers") in
v5.2, re-introduced exclusive upper/work dir checks regardless of
index=off configuration.
This changes the status quo and mount leak related bug reports have
started to re-surface. Restore the status quo to fix the regressions.
To clarify, index=off does NOT relax overlapping layers check for this
ovelayfs mount. index=off only relaxes exclusive upper/work dir checks
with another overlayfs mount.
To cover the part of overlapping layers detection that used the
exclusive upper/work dir checks to detect overlap with self upper/work
dir, add a trap also on the work base dir.
Link: https://github.com/moby/moby/issues/34672
Link: https://lore.kernel.org/linux-fsdevel/20171006121405.GA32700@veci.piliscsaba.szeredi.hu/
Link: https://github.com/containers/libpod/issues/3540
Fixes: 146d62e5a5 ("ovl: detect overlapping layers")
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Tested-by: Colin Walters <walters@verbum.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Another round of SPDX updates for 5.2-rc6
Here is what I am guessing is going to be the last "big" SPDX update for
5.2. It contains all of the remaining GPLv2 and GPLv2+ updates that
were "easy" to determine by pattern matching. The ones after this are
going to be a bit more difficult and the people on the spdx list will be
discussing them on a case-by-case basis now.
Another 5000+ files are fixed up, so our overall totals are:
Files checked: 64545
Files with SPDX: 45529
Compared to the 5.1 kernel which was:
Files checked: 63848
Files with SPDX: 22576
This is a huge improvement.
Also, we deleted another 20000 lines of boilerplate license crud, always
nice to see in a diffstat.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXQyQYA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymnGQCghETUBotn1p3hTjY56VEs6dGzpHMAnRT0m+lv
kbsjBGEJpLbMRB2krnaU
=RMcT
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull still more SPDX updates from Greg KH:
"Another round of SPDX updates for 5.2-rc6
Here is what I am guessing is going to be the last "big" SPDX update
for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates
that were "easy" to determine by pattern matching. The ones after this
are going to be a bit more difficult and the people on the spdx list
will be discussing them on a case-by-case basis now.
Another 5000+ files are fixed up, so our overall totals are:
Files checked: 64545
Files with SPDX: 45529
Compared to the 5.1 kernel which was:
Files checked: 63848
Files with SPDX: 22576
This is a huge improvement.
Also, we deleted another 20000 lines of boilerplate license crud,
always nice to see in a diffstat"
* tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits)
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485
...
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change first argument to MODULE_PARM_DESC() calls, that each of them
matched the actual module parameter name. The matching results in
changing (the 'parm' section from) the output of `modinfo overlay` from:
parm: ovl_check_copy_up:Obsolete; does nothing
parm: redirect_max:ushort
parm: ovl_redirect_max:Maximum length of absolute redirect xattr value
parm: redirect_dir:bool
parm: ovl_redirect_dir_def:Default to on or off for the redirect_dir feature
parm: redirect_always_follow:bool
parm: ovl_redirect_always_follow:Follow redirects even if redirect_dir feature is turned off
parm: index:bool
parm: ovl_index_def:Default to on or off for the inodes index feature
parm: nfs_export:bool
parm: ovl_nfs_export_def:Default to on or off for the NFS export feature
parm: xino_auto:bool
parm: ovl_xino_auto_def:Auto enable xino feature
parm: metacopy:bool
parm: ovl_metacopy_def:Default to on or off for the metadata only copy up feature
into:
parm: check_copy_up:Obsolete; does nothing
parm: redirect_max:Maximum length of absolute redirect xattr value (ushort)
parm: redirect_dir:Default to on or off for the redirect_dir feature (bool)
parm: redirect_always_follow:Follow redirects even if redirect_dir feature is turned off (bool)
parm: index:Default to on or off for the inodes index feature (bool)
parm: nfs_export:Default to on or off for the NFS export feature (bool)
parm: xino_auto:Auto enable xino feature (bool)
parm: metacopy:Default to on or off for the metadata only copy up feature (bool)
Signed-off-by: Nicolas Schier <n.schier@avm.de>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
gcc gets a bit confused by the logic in ovl_setup_trap() and
can't figure out whether the local 'trap' variable in the caller
was initialized or not:
fs/overlayfs/super.c: In function 'ovl_fill_super':
fs/overlayfs/super.c:1333:4: error: 'trap' may be used uninitialized in this function [-Werror=maybe-uninitialized]
iput(trap);
^~~~~~~~~~
fs/overlayfs/super.c:1312:17: note: 'trap' was declared here
Reword slightly to make it easier for the compiler to understand.
Fixes: 146d62e5a5 ("ovl: detect overlapping layers")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
NFS mounts can be disconnected from fs root. Don't fail the overlapping
layer check because of this.
The check is not authoritative anyway, since topology can change during or
after the check.
Reported-by: Antti Antinoja <antti@fennosys.fi>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 146d62e5a5 ("ovl: detect overlapping layers")
Overlapping overlay layers are not supported and can cause unexpected
behavior, but overlayfs does not currently check or warn about these
configurations.
User is not supposed to specify the same directory for upper and
lower dirs or for different lower layers and user is not supposed to
specify directories that are descendants of each other for overlay
layers, but that is exactly what this zysbot repro did:
https://syzkaller.appspot.com/x/repro.syz?x=12c7a94f400000
Moving layer root directories into other layers while overlayfs
is mounted could also result in unexpected behavior.
This commit places "traps" in the overlay inode hash table.
Those traps are dummy overlay inodes that are hashed by the layers
root inodes.
On mount, the hash table trap entries are used to verify that overlay
layers are not overlapping. While at it, we also verify that overlay
layers are not overlapping with directories "in-use" by other overlay
instances as upperdir/workdir.
On lookup, the trap entries are used to verify that overlay layers
root inodes have not been moved into other layers after mount.
Some examples:
$ ./run --ov --samefs -s
...
( mkdir -p base/upper/0/u base/upper/0/w base/lower lower upper mnt
mount -o bind base/lower lower
mount -o bind base/upper upper
mount -t overlay none mnt ...
-o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w)
$ umount mnt
$ mount -t overlay none mnt ...
-o lowerdir=base,upperdir=upper/0/u,workdir=upper/0/w
[ 94.434900] overlayfs: overlapping upperdir path
mount: mount overlay on mnt failed: Too many levels of symbolic links
$ mount -t overlay none mnt ...
-o lowerdir=upper/0/u,upperdir=upper/0/u,workdir=upper/0/w
[ 151.350132] overlayfs: conflicting lowerdir path
mount: none is already mounted or mnt busy
$ mount -t overlay none mnt ...
-o lowerdir=lower:lower/a,upperdir=upper/0/u,workdir=upper/0/w
[ 201.205045] overlayfs: overlapping lowerdir path
mount: mount overlay on mnt failed: Too many levels of symbolic links
$ mount -t overlay none mnt ...
-o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w
$ mv base/upper/0/ base/lower/
$ find mnt/0
mnt/0
mnt/0/w
find: 'mnt/0/w/work': Too many levels of symbolic links
find: 'mnt/0/u': Too many levels of symbolic links
Reported-by: syzbot+9c69c282adc4edd2b540@syzkaller.appspotmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Current behavior is to automatically disable metacopy if redirect_dir is
not enabled and proceed with the mount.
If "metacopy=on" mount option was given, then this behavior can confuse the
user: no mount failure, yet metacopy is disabled.
This patch makes metacopy=on imply redirect_dir=on.
The converse is also true: turning off full redirect with redirect_dir=
{off|follow|nofollow} will disable metacopy.
If both metacopy=on and redirect_dir={off|follow|nofollow} is specified,
then mount will fail, since there's no way to correctly resolve the
conflict.
Reported-by: Daniel Walsh <dwalsh@redhat.com>
Fixes: d5791044d2 ("ovl: Provide a mount option metacopy=on/off...")
Cc: <stable@vger.kernel.org> # v4.19
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
We use uuid to associate an overlay lower file handle with a lower layer,
so we can accept lower fs with null uuid as long as all lower layers with
null uuid are on the same fs.
This change allows enabling index and nfs_export features for the setup of
single lower fs of type squashfs - squashfs supports file handles, but has
a null uuid. This change also allows enabling index and nfs_export features
for nested overlayfs, where the lower overlay has nfs_export enabled.
Enabling the index feature with single lower squashfs fixes the
unionmount-testsuite test:
./run --ov --squashfs --verify
As a by-product, if, like the lower squashfs, upper fs also uses the
generic export_encode_fh() implementation to export 32bit inode file
handles (e.g. ext4), then the xino_auto config/module/mount option will
enable unique overlay inode numbers.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
ovl_free_fs() dereferences ofs->workbasedir and ofs->upper_mnt in cases when
those might not have been initialized yet.
Fix the initialization order for these fields.
Reported-by: syzbot+c75f181dc8429d2eb887@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # v4.15
Fixes: 95e6d4177c ("ovl: grab reference to workbasedir early")
Fixes: a9075cdb46 ("ovl: factor out ovl_free_fs() helper")
Metacopy dentry/inode is internal to overlay and is never exposed outside
of it. Exception is metacopy upper file used for fsync(). Modify d_real()
to look for dentries/inode which have data, but also allow matching upper
inode without data for the fsync case.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Right now ovl_inode stores inode pointer for lower inode. This helps with
quickly getting lower inode given overlay inode (ovl_inode_lower()).
Now with metadata only copy-up, we can have metacopy inode in middle layer
as well and inode containing data can be different from ->lower. I need to
be able to open the real file in ovl_open_realfile() and for that I need to
quickly find the lower data inode.
Hence store lower data inode also in ovl_inode. Also provide an helper
ovl_inode_lowerdata() to access this field.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Now we will have the capability to have upper inodes which might be only
metadata copy up and data is still on lower inode. So add a new xattr
OVL_XATTR_METACOPY to distinguish between two cases.
Presence of OVL_XATTR_METACOPY reflects that file has been copied up
metadata only and and data will be copied up later from lower origin. So
this xattr is set when a metadata copy takes place and cleared when data
copy takes place.
We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
whether ovl inode has data or not (as opposed to metadata only copy up).
If a file is copied up metadata only and later when same file is opened for
WRITE, then data copy up takes place. We copy up data, remove METACOPY
xattr and then set the UPPERDATA flag in ovl_inode->flags. While all these
operations happen with oi->lock held, read side of oi->flags can be
lockless. That is another thread on another cpu can check if UPPERDATA
flag is set or not.
So this gives us an ordering requirement w.r.t UPPERDATA flag. That is, if
another cpu sees UPPERDATA flag set, then it should be guaranteed that
effects of data copy up and remove xattr operations are also visible.
For example.
CPU1 CPU2
ovl_open() acquire(oi->lock)
ovl_open_maybe_copy_up() ovl_copy_up_data()
open_open_need_copy_up() vfs_removexattr()
ovl_already_copied_up()
ovl_dentry_needs_data_copy_up() ovl_set_flag(OVL_UPPERDATA)
ovl_test_flag(OVL_UPPERDATA) release(oi->lock)
Say CPU2 is copying up data and in the end sets UPPERDATA flag. But if
CPU1 perceives the effects of setting UPPERDATA flag but not the effects of
preceding operations (ex. upper that is not fully copied up), it will be a
problem.
Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation
and smp_rmb() on UPPERDATA flag test operation.
May be some other lock or barrier is already covering it. But I am not sure
what that is and is it obvious enough that we will not break it in future.
So hence trying to be safe here and introducing barriers explicitly for
UPPERDATA flag/bit.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
By default metadata only copy up is disabled. Provide a mount option so
that users can choose one way or other.
Also provide a kernel config and module option to enable/disable metacopy
feature.
metacopy feature requires redirect_dir=on when upper is present.
Otherwise, it requires redirect_dir=follow atleast.
As of now, metacopy does not work with nfs_export=on. So if both
metacopy=on and nfs_export=on then nfs_export is disabled.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Opening regular files on overlayfs is now handled via ovl_open(). Remove
the now unused "open_flags" argument from d_op->d_real() and the d_real()
helper.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This partially reverts commit c568d68341.
Overlayfs files will now automatically get the correct locks, no need to
hack overlay support in VFS.
It is a partial revert, because it leaves the locks_inode() calls in place
and defines locks_inode() to file_inode(). We could revert those as well,
but it would be unnecessary code churn and it makes sense to document that
we are getting the inode for locking purposes.
Don't revert MS_NOREMOTELOCK yet since that has been part of the userspace
API for some time (though not in a useful way). Will try to remove
internal flags later when the dust around the new mount API settles.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Al Viro suggested to simplify callers of ovl_create_real() by
returning the created dentry (or ERR_PTR) from ovl_create_real().
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
With mount option "xino=on", mounter declares that there are enough
free high bits in underlying fs to hold the layer fsid.
If overlayfs does encounter underlying inodes using the high xino
bits reserved for layer fsid, a warning will be emitted and the original
inode number will be used.
The mount option name "xino" goes after a similar meaning mount option
of aufs, but in overlayfs case, the mapping is stateless.
An example for a use case of "xino=on" is when upper/lower is on an xfs
filesystem. xfs uses 64bit inode numbers, but it currently never uses the
upper 8bit for inode numbers exposed via stat(2) and that is not likely to
change in the future without user opting-in for a new xfs feature. The
actual number of unused upper bit is much larger and determined by the xfs
filesystem geometry (64 - agno_log - agblklog - inopblog). That means
that for all practical purpose, there are enough unused bits in xfs
inode numbers for more than OVL_MAX_STACK unique fsid's.
Another use case of "xino=on" is when upper/lower is on tmpfs. tmpfs inode
numbers are allocated sequentially since boot, so they will practially
never use the high inode number bits.
For compatibility with applications that expect 32bit inodes, the feature
can be disabled with "xino=off". The option "xino=auto" automatically
detects underlying filesystem that use 32bit inodes and enables the
feature. The Kconfig option OVERLAY_FS_XINO_AUTO and module parameter of
the same name, determine if the default mode for overlayfs mount is
"xino=auto" or "xino=off".
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
On 64bit systems, when overlay layers are not all on the same fs, but
all inode numbers of underlying fs are not using the high bits, use the
high bits to partition the overlay st_ino address space. The high bits
hold the fsid (upper fsid is 0). This way overlay inode numbers are unique
and all inodes use overlay st_dev. Inode numbers are also persistent
for a given layer configuration.
Currently, our only indication for available high ino bits is from a
filesystem that supports file handles and uses the default encode_fh()
operation, which encodes a 32bit inode number.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Instead of allocating an anonymous bdev per lower layer, allocate
one anonymous bdev per every unique lower fs that is different than
upper fs.
Every unique lower fs is assigned an fsid > 0 and the number of
unique lower fs are stored in ofs->numlowerfs.
The assigned fsid is stored in the lower layer struct and will be
used also for inode number multiplexing.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This change relaxes copy up on encode of merge dir with lower layer > 1
and handles the case of encoding a merge dir with lower layer 1, where an
ancestor is a non-indexed merge dir. In that case, decode of the lower
file handle will not have been possible if the non-indexed ancestor is
redirected before or after encode.
Before encoding a non-upper directory file handle from real layer N, we
need to check if it will be possible to reconnect an overlay dentry from
the real lower decoded dentry. This is done by following the overlay
ancestry up to a "layer N connected" ancestor and verifying that all
parents along the way are "layer N connectable". If an ancestor that is
NOT "layer N connectable" is found, we need to copy up an ancestor, which
is "layer N connectable", thus making that ancestor "layer N connected".
For example:
layer 1: /a
layer 2: /a/b/c
The overlay dentry /a is NOT "layer 2 connectable", because if dir /a is
copied up and renamed, upper dir /a will be indexed by lower dir /a from
layer 1. The dir /a from layer 2 will never be indexed, so the algorithm
in ovl_lookup_real_ancestor() (*) will not be able to lookup a connected
overlay dentry from the connected lower dentry /a/b/c.
To avoid this problem on decode time, we need to copy up an ancestor of
/a/b/c, which is "layer 2 connectable", on encode time. That ancestor is
/a/b. After copy up (and index) of /a/b, it will become "layer 2 connected"
and when the time comes to decode the file handle from lower dentry /a/b/c,
ovl_lookup_real_ancestor() will find the indexed ancestor /a/b and decoding
a connected overlay dentry will be accomplished.
(*) the algorithm in ovl_lookup_real_ancestor() can be improved to lookup
an entry /a in the lower layers above layer N and find the indexed dir /a
from layer 1. If that improvement is made, then the check for "layer N
connected" will need to verify there are no redirects in lower layers above
layer N. In the example above, /a will be "layer 2 connectable". However,
if layer 2 dir /a is a target of a layer 1 redirect, then /a will NOT be
"layer 2 connectable":
layer 1: /A (redirect = /a)
layer 2: /a/b/c
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Now that NFS export operations are implemented, enable overlayfs NFS
export support if the "nfs_export" feature is enabled.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
We need to make some room in struct ovl_entry to store information
about redirected ancestors for NFS export, so cram two booleans as
bit flags.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
A directory index is a directory type entry in index dir with a
"trusted.overlay.upper" xattr containing an encoded ovl_fh of the merge
directory upper dir inode.
On lookup of non-dir files, lower file is followed by origin file handle.
On lookup of dir entries, lower dir is found by name and then compared
to origin file handle. We only trust dir index if we verified that lower
dir matches origin file handle, otherwise index may be inconsistent and
we ignore it.
If we find an indexed non-upper dir or an indexed merged dir, whose
index 'upper' xattr points to a different upper dir, that means that the
lower directory may be also referenced by another upper dir via redirect,
so we fail the lookup on inconsistency error.
To be consistent with directory index entries format, the association of
index dir to upper root dir, that was stored by older kernels in
"trusted.overlay.origin" xattr is now stored in "trusted.overlay.upper"
xattr. This also serves as an indication that overlay was mounted with a
kernel that support index directory entries. For backward compatibility,
if an 'origin' xattr exists on the index dir we also verify it on mount.
Directory index entries are going to be used for NFS export.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Introduce the "nfs_export" config, module and mount options.
The NFS export feature depends on the "index" feature and enables two
implicit overlayfs features: "index_all" and "verify_lower".
The "index_all" feature creates an index on copy up of every file and
directory. The "verify_lower" feature uses the full index to detect
overlay filesystems inconsistencies on lookup, like redirect from
multiple upper dirs to the same lower dir.
NFS export can be enabled for non-upper mount with no index. However,
because lower layer redirects cannot be verified with the index, enabling
NFS export support on an overlay with no upper layer requires turning off
redirect follow (e.g. "redirect_dir=nofollow").
The full index may incur some overhead on mount time, especially when
verifying that lower directory file handles are not stale.
NFS export support, full index and consistency verification will be
implemented by following patches.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Remove the "origin" language from the functions that handle set, get
and verify of "origin" xattr and pass the xattr name as an argument.
The same helpers are going to be used for NFS export to get, get and
verify the "upper" xattr for directory index entries.
ovl_verify_origin() is now a helper used only to verify non upper
file handle stored in "origin" xattr of upper inode.
The upper root dir file handle is still stored in "origin" xattr on
the index dir for backward compatibility. This is going to be changed
by the patch that adds directory index entries support.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Pass the fs instance with lower_layers array instead of the dentry
lowerstack array to ovl_check_origin_fh(), because the dentry members
of lowerstack play no role in this helper.
This change simplifies the argument list of ovl_check_origin(),
ovl_cleanup_index() and ovl_verify_index().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Store the fs root layer index inside ovl_layer struct, so we can
get the root fs layer index from merge dir lower layer instead of
find it with ovl_find_layer() helper.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
When work dir creation fails, a warning is emitted and overlay is
mounted r/o. Trying to remount r/w will fail with no work dir.
When index dir creation fails, the same warning is emitted and overlay
is mounted r/o, but trying to remount r/w will succeed. This may cause
unintentional corruption of filesystem consistency.
Adjust the behavior of index dir creation failure to that of work dir
creation failure and do not allow to remount r/w. User needs to state
an explicitly intention to work without an index by mounting with
option 'index=off' to allow r/w mount with no index dir.
When mounting with option 'index=on' and no 'upperdir', index is
implicitly disabled, so do not warn about no file handle support.
The issue was introduced with inodes index feature in v4.13, but this
patch will not apply cleanly before ovl_fill_super() re-factoring in
v4.15.
Fixes: 02bcd15774 ("ovl: introduce the inodes index dir feature")
Cc: <stable@vger.kernel.org> #v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Overlayfs falls back to index=off if lower/upper fs does not support
file handles. Do the same if upper fs does not support xattr.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
There are several write operations on upper fs not covered by
mnt_want_write():
- test set/remove OPAQUE xattr
- test create O_TMPFILE
- set ORIGIN xattr in ovl_verify_origin()
- cleanup of index entries in ovl_indexdir_cleanup()
Some of these go way back, but this patch only applies over the
v4.14 re-factoring of ovl_fill_super().
Cc: <stable@vger.kernel.org> #v4.14
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
fsnotify pins a watched directory inode in cache, but if directory dentry
is released, new lookup will allocate a new dentry and a new inode.
Directory events will be notified on the new inode, while fsnotify listener
is watching the old pinned inode.
Hash all directory inodes to reuse the pinned inode on lookup. Pure upper
dirs are hashes by real upper inode, merge and lower dirs are hashed by
real lower inode.
The reference to lower inode was being held by the lower dentry object
in the overlay dentry (oe->lowerstack[0]). Releasing the overlay dentry
may drop lower inode refcount to zero. Add a refcount on behalf of the
overlay inode to prevent that.
As a by-product, hashing directory inodes also detects multiple
redirected dirs to the same lower dir and uncovered redirected dir
target on and returns -ESTALE on lookup.
The reported issue dates back to initial version of overlayfs, but this
patch depends on ovl_inode code that was introduced in kernel v4.13.
Cc: <stable@vger.kernel.org> #v4.13
Reported-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Niklas Cassel <niklas.cassel@axis.com>
When executing filesystem sync or umount on overlayfs,
dirty data does not get synced as expected on upper filesystem.
This patch fixes sync filesystem method to keep data consistency
for overlayfs.
Signed-off-by: Chengguang Xu <cgxu@mykernel.net>
Fixes: e593b2bf51 ("ovl: properly implement sync_filesystem()")
Cc: <stable@vger.kernel.org> #4.11
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Overlayfs is following redirects even when redirects are disabled. If this
is unintentional (probably the majority of cases) then this can be a
problem. E.g. upper layer comes from untrusted USB drive, and attacker
crafts a redirect to enable read access to otherwise unreadable
directories.
If "redirect_dir=off", then turn off following as well as creation of
redirects. If "redirect_dir=follow", then turn on following, but turn off
creation of redirects (which is what "redirect_dir=off" does now).
This is a backward incompatible change, so make it dependent on a config
option.
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This is a pure automated search-and-replace of the internal kernel
superblock flags.
The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.
Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.
The script to do this was:
# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"
SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
for f in $L; do sed -i $f $SED_PROG; done
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename all "struct ovl_fs" pointers to "ofs". The "ufs" name is historical
and can only be found in overlayfs/super.c.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Move calling ovl_get_lower_layers() into ovl_get_lowerstack().
ovl_get_lowerstack() now returns the root dentry's filled in ovl_entry.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Move calling ovl_get_workdir() into ovl_get_workpath().
Rename ovl_get_workdir() to ovl_make_workdir() and ovl_get_workpath() to
ovl_get_workdir().
Workpath is now not needed outside ovl_get_workdir().
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Merge ovl_get_upper() and ovl_get_upperpath().
The resulting function is named ovl_get_upper(), though it still returns
upperpath as well.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Remove "sb" and "dentry" arguments of ovl_workdir_create() and related
functions. Move setting MS_RDONLY flag to callers.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Move ovl_get_upper() immediately after ovl_get_upperpath(),
ovl_get_workdir() immediately after ovl_get_workdir() and
ovl_get_lower_layers() immediately after ovl_get_lowerstack().
Also move prepare_creds() up to where other allocations are happening.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Generate unique values of st_dev per lower layer for non-samefs
overlay mount. The unique values are obtained by allocating anonymous
bdevs for each of the lowerdirs in the overlayfs instance.
The anonymous bdev is going to be returned by stat(2) for lowerdir
non-dir entries in non-samefs case.
[amir: split from ovl_getattr() and re-structure patches]
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Define new structures to represent overlay instance lower layers and
overlay merge dir lower layers to make room for storing more per layer
information in-memory.
Instead of keeping the fs instance lower layers in an array of struct
vfsmount, keep them in an array of new struct ovl_layer, that has a
pointer to struct vfsmount.
Instead of keeping the dentry lower layers in an array of struct path,
keep them in an array of new struct ovl_path, that has a pointer to
struct dentry and to struct ovl_layer.
Add a small helper to find the fs layer id that correspopnds to a lower
struct ovl_path and use it in ovl_lookup().
[amir: split re-structure from anonymous bdev patch]
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Most overlayfs c files already explicitly include ovl_entry.h
to use overlay entry struct definitions and upcoming changes
are going to require even more c files to include this header.
All overlayfs c files include overlayfs.h and overlayfs.h itself
refers to some structs defined in ovl_entry.h, so it seems more
logic to include ovl_entry.h from overlayfs.h than from c files.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
If a non-merge dir in an overlay mount has an overlay.origin xattr, it
means it was once an upper merge dir, which may contain whiteouts and
then the lower dir was removed under it.
Do not iterate real dir directly in this case to avoid exposing whiteouts.
[SzM] Set OVL_WHITEOUT for all merge directories as well.
[amir] A directory that was just copied up does not have the OVL_WHITEOUTS
flag. We need to set it to fix merge dir iteration.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This was detected by fault injection test
Signed-off-by: Hirofumi Nakagawa <nklabs@gmail.com>
Fixes: 13cf199d00 ("ovl: allocate an ovl_inode struct")
Cc: <stable@vger.kernel.org> # v4.13
Enforcing exclusive ownership on upper/work dirs caused a docker
regression: https://github.com/moby/moby/issues/34672.
Euan spotted the regression and pointed to the offending commit.
Vivek has brought the regression to my attention and provided this
reproducer:
Terminal 1:
mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none
merged/
Terminal 2:
unshare -m
Terminal 1:
umount merged
mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none
merged/
mount: /root/overlay-testing/merged: none already mounted or mount point
busy
To fix the regression, I replaced the error with an alarming warning.
With index feature enabled, mount does fail, but logs a suggestion to
override exclusive dir protection by disabling index.
Note that index=off mount does take the inuse locks, so a concurrent
index=off will issue the warning and a concurrent index=on mount will fail.
Documentation was updated to reflect this change.
Fixes: 2cac0c00a6 ("ovl: get exclusive ownership on upper/work dirs")
Cc: <stable@vger.kernel.org> # v4.13
Reported-by: Euan Kemp <euank@euank.com>
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Pull mount flag updates from Al Viro:
"Another chunk of fmount preparations from dhowells; only trivial
conflicts for that part. It separates MS_... bits (very grotty
mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
only a small subset of MS_... stuff).
This does *not* convert the filesystems to new constants; only the
infrastructure is done here. The next step in that series is where the
conflicts would be; that's the conversion of filesystems. It's purely
mechanical and it's better done after the merge, so if you could run
something like
list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')
sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
-e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
-e 's/\<MS_NODEV\>/SB_NODEV/g' \
-e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
-e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
-e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
-e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
-e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
-e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
-e 's/\<MS_SILENT\>/SB_SILENT/g' \
-e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
-e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
-e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
-e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
$list
and commit it with something along the lines of 'convert filesystems
away from use of MS_... constants' as commit message, it would save a
quite a bit of headache next cycle"
* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Differentiate mount flags (MS_*) from internal superblock flags
VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
Need to treat non-regular overlayfs files the same as regular files when
checking for an atime update.
Add a d_real() flag to make it return the upper dentry for all file types.
Reported-by: "zhangyi (F)" <yi.zhang@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
d_real() is never called with a negative dentry. So remove the
d_is_negative() check (which would never trigger anyway, since d_is_reg()
returns false for a negative dentry).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Impure directories are ones which contain objects with origins (i.e. those
that have been copied up). These are relevant to readdir operation only
because of the d_ino field, no other transformation is necessary. Also a
directory can become impure between two getdents(2) calls.
This patch creates a cache for impure directories. Unlike the cache for
merged directories, this one only contains entries with origin and is not
refcounted but has a its lifetime tied to that of the dentry.
Similarly to the merged cache, the impure cache is invalidated based on a
version number. This version number is incremented when an entry with
origin is added or removed from the directory.
If the cache is empty, then the impure xattr is removed from the directory.
This patch also fixes up handling of d_ino for the ".." entry if the parent
directory is merged.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
inode_doinit_with_dentry() in SELinux wants to read the upper inode's xattr
to get security label, and ovl_xattr_get() calls ovl_dentry_real(), which
depends on dentry->d_inode, but d_inode is null and not initialized yet at
this point resulting in an Oops.
Fix by getting the upperdentry info from the inode directly in this case.
Reported-by: Eryu Guan <eguan@redhat.com>
Fixes: 09d8b58673 ("ovl: move __upperdentry to ovl_inode")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Firstly by applying the following with coccinelle's spatch:
@@ expression SB; @@
-SB->s_flags & MS_RDONLY
+sb_rdonly(SB)
to effect the conversion to sb_rdonly(sb), then by applying:
@@ expression A, SB; @@
(
-(!sb_rdonly(SB)) && A
+!sb_rdonly(SB) && A
|
-A != (sb_rdonly(SB))
+A != sb_rdonly(SB)
|
-A == (sb_rdonly(SB))
+A == sb_rdonly(SB)
|
-!(sb_rdonly(SB))
+!sb_rdonly(SB)
|
-A && (sb_rdonly(SB))
+A && sb_rdonly(SB)
|
-A || (sb_rdonly(SB))
+A || sb_rdonly(SB)
|
-(sb_rdonly(SB)) != A
+sb_rdonly(SB) != A
|
-(sb_rdonly(SB)) == A
+sb_rdonly(SB) == A
|
-(sb_rdonly(SB)) && A
+sb_rdonly(SB) && A
|
-(sb_rdonly(SB)) || A
+sb_rdonly(SB) || A
)
@@ expression A, B, SB; @@
(
-(sb_rdonly(SB)) ? 1 : 0
+sb_rdonly(SB)
|
-(sb_rdonly(SB)) ? A : B
+sb_rdonly(SB) ? A : B
)
to remove left over excess bracketage and finally by applying:
@@ expression A, SB; @@
(
-(A & MS_RDONLY) != sb_rdonly(SB)
+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
|
-(A & MS_RDONLY) == sb_rdonly(SB)
+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
)
to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.
Signed-off-by: David Howells <dhowells@redhat.com>
ovl_workdir_create() returns a valid index dentry or NULL.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
On failure to prepare_creds(), mount fails with a random
return value, as err was last set to an integer cast of
a valid lower mnt pointer or set to 0 if inodes index feature
is enabled.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 3fe6e52f06 ("ovl: override creds with the ones from ...")
Cc: <stable@vger.kernel.org> # v4.7
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
index entry should live only as long as there are upper or lower
hardlinks.
Cleanup orphan index entries on mount and when dropping the last
overlay inode nlink.
When about to cleanup or link up to orphan index and the index inode
nlink > 1, admit that something went wrong and adjust overlay nlink
to index inode nlink - 1 to prevent it from dropping below zero.
This could happen when adding lower hardlinks underneath a mounted
overlay and then trying to unlink them.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
For rename, we need to ensure that an upper alias exists for hard links
before attempting the operation. Introduce a flag in ovl_entry to track
the state of the upper alias.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Bad index entries are entries whose name does not match the
origin file handle stored in trusted.overlay.origin xattr.
Bad index entries could be a result of a system power off in
the middle of copy up.
Stale index entries are entries whose origin file handle is
stale. Stale index entries could be a result of copying layers
or removing lower entries while the overlay is not mounted.
The case of copying layers should be detected earlier by the
verification of upper root dir origin and index dir origin.
Both bad and stale index entries are detected and removed
on mount.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
An index dir contains persistent hardlinks to files in upper dir.
Therefore, we must never mount an existing index dir with a differnt
upper dir.
Store the upper root dir file handle in index dir inode when index
dir is created and verify the file handle before using an existing
index dir on mount.
Add an 'is_upper' flag to the overlay file handle encoding and set it
when encoding the upper root file handle. This is not critical for index
dir verification, but it is good practice towards a standard overlayfs
file handle format for NFS export.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
When inodes index feature is enabled, verify that the file handle stored
in upper root dir matches the lower root dir or fail to mount.
If upper root dir has no stored file handle, encode and store the lower
root dir file handle in overlay.origin xattr.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Create the index dir on mount. The index dir will contain hardlinks to
upper inodes, named after the hex representation of their origin lower
inodes.
The index dir is going to be used to prevent breaking lower hardlinks
on copy up and to implement overlayfs NFS export.
Because the feature is not fully backward compat, enabling the feature
is opt-in by config/module/mount option.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>