This reverts commit 7c6893e3c9.
Overlayfs no longer relies on the vfs for checking writability of files.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This reverts commit 954c736f86.
Overlayfs no longer relies on the vfs for checking writability of files.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Let overlayfs do its thing when opening a file.
This enables stacking and fixes the corner case when a file is opened for
read, modified through a writable open, and data is read from the read-only
file. After this patch the read-only open will not return stale data even
in this case.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Implement stacked fiemap().
Need to split inode operations for regular file (which has fiemap) and
special file (which doesn't have fiemap).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
In the common case we can just use the real file cached in
file->private_data. There are two exceptions:
1) File has been copied up since open: in this unlikely corner case just
use a throwaway real file for the operation. If ever this becomes a
perfomance problem (very unlikely, since overlayfs has been doing most fine
without correctly handling this case at all), then we can deal with that by
updating the cached real file.
2) File's f_flags have changed since open: no need to reopen the cached
real file, we can just change the flags there as well.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Implement file operations on a regular overlay file. The underlying file
is opened separately and cached in ->private_data.
It might be worth making an exception for such files when accounting in
nr_file to confirm to userspace expectations. We are only adding a small
overhead (248bytes for the struct file) since the real inode and dentry are
pinned by overlayfs anyway.
This patch doesn't have any effect, since the vfs will use d_real() to find
the real underlying file to open. The patch at the end of the series will
actually enable this functionality.
AV: make it use open_with_fake_path(), don't mess with override_creds
SzM: still need to mess with override_creds() until no fs uses
current_cred() in their open method.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Copy i_size of the underlying inode to the overlay inode in ovl_copyattr().
This is in preparation for stacking I/O operations on overlay files.
This patch shouldn't have any observable effect.
Remove stale comment from ovl_setattr() [spotted by Vivek Goyal].
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This reverts commit 31c3a70695.
Re-add functionality dealing with i_writecount on truncate to overlayfs.
This patch shouldn't have any observable effects, since we just re-assert
the writecout that vfs_truncate() already got for us.
This is in preparation for moving overlay functionality out of the VFS.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
On inode creation copy certain inode flags from the underlying real inode
to the overlay inode.
This is in preparation for moving overlay functionality out of the VFS.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Copy up mtime and ctime to overlay inode after times in real object are
modified. Be careful not to dirty cachelines when not necessary.
This is in preparation for moving overlay functionality out of the VFS.
This patch shouldn't have any observable effect.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Stacking file operations in overlay will store an extra open file for each
overlay file opened.
The overhead is just that of "struct file" which is about 256bytes, because
overlay already pins an extra dentry and inode when the file is open, which
add up to a much larger overhead.
For fear of breaking working setups, don't start accounting the extra file.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Only upper dir can be impure, but if we are in the middle of
iterating a lower real dir, dir could be copied up and marked
impure. We only want the impure cache if we started iterating
a real upper dir to begin with.
Aditya Kali reported that the following reproducer hits the
WARN_ON(!cache->refcount) in ovl_get_cache():
docker run --rm drupal:8.5.4-fpm-alpine \
sh -c 'cd /var/www/html/vendor/symfony && \
chown -R www-data:www-data . && ls -l .'
Reported-by: Aditya Kali <adityakali@google.com>
Tested-by: Aditya Kali <adityakali@google.com>
Fixes: 4edb83bb10 ('ovl: constant d_ino for non-merge dirs')
Cc: <stable@vger.kernel.org> # v4.14
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
open a file by given inode, faking ->f_path. Use with shitloads
of caution - at the very least you'd damn better make sure that
some dentry alias of that inode is pinned down by the path in
question. Again, this is no general-purpose interface and I hope
it will eventually go away. Right now overlayfs wants something
like that, but nothing else should.
Any out-of-tree code with bright idea of using this one *will*
eventually get hurt, with zero notice and great delight on my part.
I refuse to use EXPORT_SYMBOL_GPL(), especially in situations when
it's really EXPORT_SYMBOL_DONT_USE_IT(), but don't take that export
as "you are welcome to use it".
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
These checks are better off in do_dentry_open(); the reason we couldn't
put them there used to be that callers couldn't tell what kind of cleanup
would do_dentry_open() failure call for. Now that we have FMODE_OPENED,
cleanup is the same in all cases - it's simply fput(). So let's fold
that into do_dentry_open(), as Christoph's patch tried to.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Just check FMODE_OPENED in __fput() and be done with that...
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
basically, "is that instance set up enough for regular fput(), or
do we want put_filp() for that one".
NOTE: the only alloc_file() caller that could be followed by put_filp()
is in arch/ia64/kernel/perfmon.c, which is (Kconfig-level) broken.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... and rename get_empty_filp() to alloc_empty_file().
dentry_open() gets creds as argument, but the only thing that sees those is
security_file_open() - file->f_cred still ends up with current_cred(). For
almost all callers it's the same thing, but there are several broken cases.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... so that it could set both ->f_flags and ->f_mode, without callers
having to set ->f_flags manually.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
An ->open() instances really, really should not be doing that. There's
a lot of places e.g. around atomic_open() that could be confused by that,
so let's catch that early.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... just use put_pipe_info() to get the pipe->files down to 1 and let
fput()-called pipe_release() do freeing.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
it's exactly the same thing as
dentry_open(&file->f_path, file->f_flags, file->f_cred)
... and rename it to file_clone_open(), while we are at it.
'filp' naming convention is bogus; sure, it's "file pointer",
but we generally don't do that kind of Hungarian notation.
Some of the instances have too many callers to touch, but this
one has only two, so let's sanitize it while we can...
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
.. and the call of file_free() in case of security_file_alloc() failure
in get_empty_filp() should be simply file_free_rcu() - no point in
rcu-delays there, anyway.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Failure of ->open() should *not* be followed by fput(). Fixed by
using filp_clone_open(), which gets the cleanups right.
Cc: stable@vger.kernel.org
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Extract vfs_dedupe_file_range_one() helper to deal with a single dedup
request.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Clean up f_op->dedupe_file_range() interface.
1) Use loff_t for offsets and length instead of u64
2) Order the arguments the same way as {copy|clone}_file_range().
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
kmemleak reported some memory leak on reading proc files. After adding
some debug lines, find that proc_seq_fops is using seq_release as
release handler, which won't handle the free of 'private' field of
seq_file, while in fact the open handler proc_seq_open could create
the private data with __seq_open_private when state_size is greater
than zero. So after reading files created with proc_create_seq_private,
such as /proc/timer_list and /proc/vmallocinfo, the private mem of a
seq_file is not freed. Fix it by adding the paired proc_seq_release
as the default release handler of proc_seq_ops instead of seq_release.
Fixes: 44414d82cf ("proc: introduce proc_create_seq_private")
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlslJkoQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpi3READZgtKEaixC4EMLNH/7WqVLC5XxsIpuV3N7
kxaV2KoJ/zJbBPqE+WIW/UasuTRkIFQpmXp3aYw879FZfCq1sN7K9yyzY2i7qvuG
l8WRPWFc2SMfMrXb3pctGiNyYhTIZrOOzPmKAwlNinyyr7tlan2Ha4z5XHOJon9O
uVk252kTlBqvTEX4nZKN2x6UiI9RaVegbOV6lthm3lQZF+25C6kN/8emdkhDIkjG
mOI8MaHp6iDbpIBWBziEpkvGxvxO4PAaESldyJH6Fg+uzmNEDEwp6jvFBmlCiQIE
wdt5Swsl2zKk8g/tBHeDSDK0inHnP5xho4tC68rDtLrgxyLeBuVjyLMItkYJdTqh
s8B5gdCs3SS41aXMk+w3BnLBIRRMrByGiY2Nx23Si6KZU8pT5sM0dGGOTGCma1dy
NNitIRbePyrajV2p/xqnMeiGvxqnCckRRYr0uV2KPJzqwwMoZvNsnS+XL2KqdgCY
ndK5Nda5oRb+yDDlPKlRvY8cKh1N8GWdoFIGbfrMIYJcGiUXBoX7UoA/DKKbnY2o
p+TE3t9RGz9UPwxHqSMTXRV+xs4X3lgWf4sA/ciGGPeuAgwdOoGJnfn4kY7AIPq8
LYXj8K4rx3cRA+l7Y4DCppPQEl8bxKXhFFccKsXToFITW+PNdCuM7FDugOHdzwQE
QlesAkLV+w==
=gyaY
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180616' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes that should go into -rc1. This contains:
- bsg_open vs bsg_unregister race fix (Anatoliy)
- NVMe pull request from Christoph, with fixes for regressions in
this window, FC connect/reconnect path code unification, and a
trace point addition.
- timeout fix (Christoph)
- remove a few unused functions (Christoph)
- blk-mq tag_set reinit fix (Roman)"
* tag 'for-linus-20180616' of git://git.kernel.dk/linux-block:
bsg: fix race of bsg_open and bsg_unregister
block: remov blk_queue_invalidate_tags
nvme-fabrics: fix and refine state checks in __nvmf_check_ready
nvme-fabrics: handle the admin-only case properly in nvmf_check_ready
nvme-fabrics: refactor queue ready check
blk-mq: remove blk_mq_tagset_iter
nvme: remove nvme_reinit_tagset
nvme-fc: fix nulling of queue data on reconnect
nvme-fc: remove reinit_request routine
blk-mq: don't time out requests again that are in the timeout handler
nvme-fc: change controllers first connect to use reconnect path
nvme: don't rely on the changed namespace list log
nvmet: free smart-log buffer after use
nvme-rdma: fix error flow during mapping request data
nvme: add bio remapping tracepoint
nvme: fix NULL pointer dereference in nvme_init_subsystem
blk-mq: reinit q->tag_set_list entry only after grace period