For now in ext4_mb_new_blocks_simple, if we found a block which
should be excluded then will switch to next group, this may
probably cause 'group' run out of range.
Change to check next block in the same group when get a block should
be excluded. Also change the search range to EXT4_CLUSTERS_PER_GROUP
and add error checking.
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20220110035141.1980-3-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
During fast commit replay procedure, we clear inode blocks bitmap in
ext4_ext_clear_bb(), this may cause ext4_mb_new_blocks_simple() allocate
blocks still in use.
Make ext4_fc_record_regions() also record physical disk regions used by
inodes during replay procedure. Then ext4_mb_new_blocks_simple() can
excludes these blocks in use.
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Link: https://lore.kernel.org/r/20220110035141.1980-2-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Set workstation_name from the master_tcon for multiuser mounts.
Just in case, protect size_of_ntlmssp_blob against a NULL workstation_name.
Fixes: 49bd49f983 ("cifs: send workstation name during ntlmssp session setup")
Cc: stable@vger.kernel.org # 5.16
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Ryan Bair <ryandbair@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
For example if mtime or size has changed.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
- Ensure SM_NOTIFY doesn't crash the NFS server host
- Ensure NLM locks are cleaned up after client reboot
- Fix a leak of internal NFSv4 lease information
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmH1em4ACgkQM2qzM29m
f5drhRAAq8uU+tgABqZNj4aLivUOAionkSiV6Blk1V44DO00yhY2y3dAsOu8bO0k
Kh1Yu0QSZeaYDSi2Ak9qCKAl8eNg8lvlxWJ5pQ+GERVJiZj3JJRPSUJI+5r/aQMi
k774Y+DzLwPn6/r5iTyymm3vx1wcas+Y/v2nvmHob/G74UKngbhOhP05XS/1MDlM
fdTtXVKqLx92grDljTXWCtT5q5mpOc+OFufo2a5+b1aJjUWiU/rraT1mArNlEC7F
IMw/eZn6ZnZv+ywbVJFGeRib/Xa7jNeKA+4CQMH+quk/s8rHEaUJqeM5439HLBYk
E0KrFAdn+VDV5A6I9TIB1vtykl0KzC/r2u8G4vbA++rfpuxW36lGS95JFnDctGG+
uwk/f4p2+D7oSGt7gLXt8LTOAx0/NeT+OTtUqZRPcoKO7uXvkkCCu2irD9VpGSpD
A83Qq0ewT9ntNy0Feik3FgmRSmPTgvywE78MeRFoundd3QhtghUunfY1N2soDt7t
0hyqBhcH8ypWjFoKmv+wAHLPcGcdeg+8T0w3hFPcyTrrdYo/OJl4MNgrIczA2z8O
nWCZ+lOZq3QtAkd0eGSFPhnTVebCP5n6yvIfDN4rZc+ASNAqXCR5e1yCDE1gfO+E
I1uCcxzewWPe3DsuYWQznEx5u4Rpiml5JF1q5uKFwTNj4UTBFKQ=
=IC/r
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Notable bug fixes:
- Ensure SM_NOTIFY doesn't crash the NFS server host
- Ensure NLM locks are cleaned up after client reboot
- Fix a leak of internal NFSv4 lease information"
* tag 'nfsd-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client.
lockd: fix failure to cleanup client locks
lockd: fix server crash on reboot of client holding lock
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmH6gtsACgkQnJ2qBz9k
QNkHTggAvFfCBZ11AM3MIQifVrN06q0Aq/xTCR56lbLoqjVbLx+oUAPYk45AgzBJ
9NhYFwPbevLs+c1JrigmTW/i2juZghdfV3CBu6uWvnIDgZbjDwt1RFZ/TMuzG488
Orr12n34J+kaM89BxBPxecFXqGW1bqtIeIUkM6M4OefagVvueRP591GEHRPGk60S
nz90LIZN2fsXrDq6K0EC4LVnMF8VWe7lpW8vHORc/O83KasHFGv1xXJ/Z1ovq9ln
N17pbjFvECyKwIsvQCuKoxa/iutKqiUSQiVyFyN9IryBE5bMKSuv87EP+dKPcP/O
Vmkg2AZRAbL9+M/rhgu4mF0bfhnuJQ==
=uGov
-----END PGP SIGNATURE-----
Merge tag 'fsnotify_for_v5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fanotify fix from Jan Kara:
"Fix stale file descriptor in copy_event_to_user"
* tag 'fsnotify_for_v5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: Fix stale file descriptor in copy_event_to_user()
If we've reached the end of the directory, then cache that information
in the context so that we don't need to do an uncached readdir in order
to rediscover that fact.
Fixes: 794092c57f ("NFS: Do uncached readdir when we're seeking a cookie in an empty page cache")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If we're doing an uncached read of the directory, then we ideally want
to read only the exact set of entries that will fit in the buffer
supplied by the getdents() system call. So unlike the case where we're
reading into the page cache, let's send only one READDIR call, before
trying to fill up the buffer.
Fixes: 35df59d3ef ("NFS: Reduce number of RPC calls when doing uncached readdir")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Since we've started treating fallocate more like a file write, we
should flush the log to disk if the user has asked for synchronous
writes either by setting it via fcntl flags, or inode flags, or with
the sync mount option. We've already got a helper for this, so use
it.
[The original patch by Darrick was massaged by Dave to fit this patchset]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
The operations that xfs_update_prealloc_flags() perform are now
unique to xfs_fs_map_blocks(), so move xfs_update_prealloc_flags()
to be a static function in xfs_pnfs.c and cut out all the
other functionality that is doesn't use anymore.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Now that we only call xfs_update_prealloc_flags() from
xfs_file_fallocate() in the case where we need to set the
preallocation flag, do this in xfs_alloc_file_space() where we
already have the inode joined into a transaction and get
rid of the call to xfs_update_prealloc_flags() from the fallocate
code.
This also means that we now correctly avoid setting the
XFS_DIFLAG_PREALLOC flag when xfs_is_always_cow_inode() is true, as
these inodes will never have preallocated extents.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
In XFS, we always update the inode change and modification time when
any fallocate() operation succeeds. Furthermore, as various
fallocate modes can change the file contents (extending EOF,
punching holes, zeroing things, shifting extents), we should drop
file privileges like suid just like we do for a regular write().
There's already a VFS helper that figures all this out for us, so
use that.
The net effect of this is that we no longer drop suid/sgid if the
caller is root, but we also now drop file capabilities.
We also move the xfs_update_prealloc_flags() function so that it now
is only called by the scope that needs to set the the prealloc flag.
Based on a patch from Darrick Wong.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Callers can acheive the same thing by calling xfs_log_force_inode()
after making their modifications. There is no need for
xfs_update_prealloc_flags() to do this.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYfkylwAKCRDh3BK/laaZ
PLY6AP9rui/UD3RE/koFfRVTTPuZkv7I14mHmCzDloYcmPDJCgEAwebzCiOo22kQ
Jn3V/B8mmG2vBv+qu+iM3/WbkPqCtgg=
=ajWT
-----END PGP SIGNATURE-----
Merge tag 'ovl-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"Fix a regression introduced in v5.15, affecting copy up of files with
'noatime' or 'sync' attributes to a tmpfs upper layer"
* tag 'ovl-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: don't fail copy up if no fileattr support on upper
ovl: fix NULL pointer dereference in copy up warning
previous CONFIG_UNICODE. It is -rc material since we don't want to
expose the former symbol on 5.17.
This has been living on linux-next for the past week.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8jAUPq50yNjPBCi4QEuZqsMcppQFAmH4lC0ACgkQQEuZqsMc
ppRl1Q/+Lyba+DORs26C4p1GDS5ezHOCdbBUE8RFwWjIl+h5ckQ/8kndaXPRLorZ
1S9E6h5RfqhekGKOhMTXyfzqcW8qMzUy4i3J2lmJpDwATqLt+4Wu/M2BBH2CaIIL
EhhW8D+WduAEM/TFYihH9LJ0RopvIsqcy8qdu+oSBGfPAdxJ0f2+Yx0pNTRfqVmi
8+Dry0nRhP12o9wXElpZ0/BYEZTlY+Zo6L/heT6/GKDLpz/YmZp18GAc/0TWb3LL
ASujr+anU2LxSFskkyuMu+rbFE8eDshvHEuBZLxlD2o+tG6lAi4mNWZYc0/+jPMw
8TdJ5MEX3IlljXLRKuYctoCdsFQKLxH5IN5wLkiLvM5fBpeb/sWqNolx8f2s/f9R
TaUdjwiqFnML4VnlEH3hd3/hUUVbnE+xJo6g1iRGgJY3eecimvwl8P5H7k9Sn3OS
4zh0bHT9pfg+vUR0BVnfdWi4OpPxSrdqCgFhHsmKaGMvTApm0qMKK1Cg4OPNtYwr
d1RMqsqEBSJTHzr0nHoiWLhkIo8npRPy+LMK51D8j6wg0kOj4GGYerWm1MD9ZlbI
rhPy7nDgdcH48Gk1m6o7dROZKCvkZK+/QDPelBgZHGcGB94lUugYVJQrlBjI+2+7
Wx5oQLgQgeabeMtDZ/YNy5Dsre20vas2oLj5cs6uuoWNOcBO6Ew=
=YVNN
-----END PGP SIGNATURE-----
Merge tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode
Pull unicode cleanup from Gabriel Krisman Bertazi:
"A fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into
the previous CONFIG_UNICODE. It is -rc material since we don't want to
expose the former symbol on 5.17.
This has been living on linux-next for the past week"
* tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
unicode: clean up the Kconfig symbol confusion
Fix the readahead conversion to correctly manage the last batch skipping
when reading from cache. This involves a readahead batch of one page or
one folio, so set the batch size according to the number of constituent
pages (should be 1 for a filesystem that doesn't do multipage folios yet).
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
Reviewed-by: Rohith Surabattula <rohiths.msft@gmail.com>
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Move cifs to using fscache DIO API instead of the old upstream I/O API as
that has been removed. This is a stopgap solution as the intention is that
at sometime in the future, the cache will move to using larger blocks and
won't be able to store individual pages in order to deal with the potential
for data corruption due to the backing filesystem being able insert/remove
bridging blocks of zeros into its extent list[1].
cifs then reads and writes cache pages synchronously and one page at a time.
The preferred change would be to use the netfs lib, but the new I/O API can
be used directly. It's just that as the cache now needs to track data for
itself, caching blocks may exceed page size...
This code is somewhat borrowed from my "fallback I/O" patchset[2].
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/YO17ZNOcq+9PajfQ@mit.edu [1]
Link: https://lore.kernel.org/r/202112100957.2oEDT20W-lkp@intel.com/ [2]
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add a netfs_cache_ops method by which a network filesystem can ask the
cache about what data it has available and where so that it can make a
multipage read more efficient.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Rohith Surabattula <rohiths@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Transition the cifs filesystem from using the old ->readpages() method to
using the new ->readahead() method.
For the moment, this removes any invocation of fscache to read data from
the local cache, leaving that to another patch.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Reviewed-by: Rohith Surabattula <rohiths@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
This code calls fd_install() which gives the userspace access to the fd.
Then if copy_info_records_to_user() fails it calls put_unused_fd(fd) but
that will not release it and leads to a stale entry in the file
descriptor table.
Generally you can't trust the fd after a call to fd_install(). The fix
is to delay the fd_install() until everything else has succeeded.
Fortunately it requires CAP_SYS_ADMIN to reach this code so the security
impact is less.
Fixes: f644bc449b ("fanotify: fix copy_event_to_user() fid error clean up")
Link: https://lore.kernel.org/r/20220128195656.GA26981@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Syzbot tripped over the following complaint from the kernel:
WARNING: CPU: 2 PID: 15402 at mm/util.c:597 kvmalloc_node+0x11e/0x125 mm/util.c:597
While trying to run XFS_IOC_GETBMAP against the following structure:
struct getbmap fubar = {
.bmv_count = 0x22dae649,
};
Obviously, this is a crazy huge value since the next thing that the
ioctl would do is allocate 37GB of memory. This is enough to make
kvmalloc mad, but isn't large enough to trip the validation functions.
In other words, I'm fussing with checks that were **already sufficient**
because that's easier than dealing with 644 internal bug reports. Yes,
that's right, six hundred and forty-four.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
After the recent changes made by commit c2e3930529 ("btrfs: clear
extent buffer uptodate when we fail to write it") and its followup fix,
commit 651740a502 ("btrfs: check WRITE_ERR when trying to read an
extent buffer"), we can now end up not cleaning up space reservations of
log tree extent buffers after a transaction abort happens, as well as not
cleaning up still dirty extent buffers.
This happens because if writeback for a log tree extent buffer failed,
then we have cleared the bit EXTENT_BUFFER_UPTODATE from the extent buffer
and we have also set the bit EXTENT_BUFFER_WRITE_ERR on it. Later on,
when trying to free the log tree with free_log_tree(), which iterates
over the tree, we can end up getting an -EIO error when trying to read
a node or a leaf, since read_extent_buffer_pages() returns -EIO if an
extent buffer does not have EXTENT_BUFFER_UPTODATE set and has the
EXTENT_BUFFER_WRITE_ERR bit set. Getting that -EIO means that we return
immediately as we can not iterate over the entire tree.
In that case we never update the reserved space for an extent buffer in
the respective block group and space_info object.
When this happens we get the following traces when unmounting the fs:
[174957.284509] BTRFS: error (device dm-0) in cleanup_transaction:1913: errno=-5 IO failure
[174957.286497] BTRFS: error (device dm-0) in free_log_tree:3420: errno=-5 IO failure
[174957.399379] ------------[ cut here ]------------
[174957.402497] WARNING: CPU: 2 PID: 3206883 at fs/btrfs/block-group.c:127 btrfs_put_block_group+0x77/0xb0 [btrfs]
[174957.407523] Modules linked in: btrfs overlay dm_zero (...)
[174957.424917] CPU: 2 PID: 3206883 Comm: umount Tainted: G W 5.16.0-rc5-btrfs-next-109 #1
[174957.426689] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[174957.428716] RIP: 0010:btrfs_put_block_group+0x77/0xb0 [btrfs]
[174957.429717] Code: 21 48 8b bd (...)
[174957.432867] RSP: 0018:ffffb70d41cffdd0 EFLAGS: 00010206
[174957.433632] RAX: 0000000000000001 RBX: ffff8b09c3848000 RCX: ffff8b0758edd1c8
[174957.434689] RDX: 0000000000000001 RSI: ffffffffc0b467e7 RDI: ffff8b0758edd000
[174957.436068] RBP: ffff8b0758edd000 R08: 0000000000000000 R09: 0000000000000000
[174957.437114] R10: 0000000000000246 R11: 0000000000000000 R12: ffff8b09c3848148
[174957.438140] R13: ffff8b09c3848198 R14: ffff8b0758edd188 R15: dead000000000100
[174957.439317] FS: 00007f328fb82800(0000) GS:ffff8b0a2d200000(0000) knlGS:0000000000000000
[174957.440402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[174957.441164] CR2: 00007fff13563e98 CR3: 0000000404f4e005 CR4: 0000000000370ee0
[174957.442117] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[174957.443076] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[174957.443948] Call Trace:
[174957.444264] <TASK>
[174957.444538] btrfs_free_block_groups+0x255/0x3c0 [btrfs]
[174957.445238] close_ctree+0x301/0x357 [btrfs]
[174957.445803] ? call_rcu+0x16c/0x290
[174957.446250] generic_shutdown_super+0x74/0x120
[174957.446832] kill_anon_super+0x14/0x30
[174957.447305] btrfs_kill_super+0x12/0x20 [btrfs]
[174957.447890] deactivate_locked_super+0x31/0xa0
[174957.448440] cleanup_mnt+0x147/0x1c0
[174957.448888] task_work_run+0x5c/0xa0
[174957.449336] exit_to_user_mode_prepare+0x1e5/0x1f0
[174957.449934] syscall_exit_to_user_mode+0x16/0x40
[174957.450512] do_syscall_64+0x48/0xc0
[174957.450980] entry_SYSCALL_64_after_hwframe+0x44/0xae
[174957.451605] RIP: 0033:0x7f328fdc4a97
[174957.452059] Code: 03 0c 00 f7 (...)
[174957.454320] RSP: 002b:00007fff13564ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[174957.455262] RAX: 0000000000000000 RBX: 00007f328feea264 RCX: 00007f328fdc4a97
[174957.456131] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b8ae51dd0
[174957.457118] RBP: 0000560b8ae51ba0 R08: 0000000000000000 R09: 00007fff13563c40
[174957.458005] R10: 00007f328fe49fc0 R11: 0000000000000246 R12: 0000000000000000
[174957.459113] R13: 0000560b8ae51dd0 R14: 0000560b8ae51cb0 R15: 0000000000000000
[174957.460193] </TASK>
[174957.460534] irq event stamp: 0
[174957.461003] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[174957.461947] hardirqs last disabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.463147] softirqs last enabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.465116] softirqs last disabled at (0): [<0000000000000000>] 0x0
[174957.466323] ---[ end trace bc7ee0c490bce3af ]---
[174957.467282] ------------[ cut here ]------------
[174957.468184] WARNING: CPU: 2 PID: 3206883 at fs/btrfs/block-group.c:3976 btrfs_free_block_groups+0x330/0x3c0 [btrfs]
[174957.470066] Modules linked in: btrfs overlay dm_zero (...)
[174957.483137] CPU: 2 PID: 3206883 Comm: umount Tainted: G W 5.16.0-rc5-btrfs-next-109 #1
[174957.484691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[174957.486853] RIP: 0010:btrfs_free_block_groups+0x330/0x3c0 [btrfs]
[174957.488050] Code: 00 00 00 ad de (...)
[174957.491479] RSP: 0018:ffffb70d41cffde0 EFLAGS: 00010206
[174957.492520] RAX: ffff8b08d79310b0 RBX: ffff8b09c3848000 RCX: 0000000000000000
[174957.493868] RDX: 0000000000000001 RSI: fffff443055ee600 RDI: ffffffffb1131846
[174957.495183] RBP: ffff8b08d79310b0 R08: 0000000000000000 R09: 0000000000000000
[174957.496580] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8b08d7931000
[174957.498027] R13: ffff8b09c38492b0 R14: dead000000000122 R15: dead000000000100
[174957.499438] FS: 00007f328fb82800(0000) GS:ffff8b0a2d200000(0000) knlGS:0000000000000000
[174957.500990] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[174957.502117] CR2: 00007fff13563e98 CR3: 0000000404f4e005 CR4: 0000000000370ee0
[174957.503513] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[174957.504864] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[174957.506167] Call Trace:
[174957.506654] <TASK>
[174957.507047] close_ctree+0x301/0x357 [btrfs]
[174957.507867] ? call_rcu+0x16c/0x290
[174957.508567] generic_shutdown_super+0x74/0x120
[174957.509447] kill_anon_super+0x14/0x30
[174957.510194] btrfs_kill_super+0x12/0x20 [btrfs]
[174957.511123] deactivate_locked_super+0x31/0xa0
[174957.511976] cleanup_mnt+0x147/0x1c0
[174957.512610] task_work_run+0x5c/0xa0
[174957.513309] exit_to_user_mode_prepare+0x1e5/0x1f0
[174957.514231] syscall_exit_to_user_mode+0x16/0x40
[174957.515069] do_syscall_64+0x48/0xc0
[174957.515718] entry_SYSCALL_64_after_hwframe+0x44/0xae
[174957.516688] RIP: 0033:0x7f328fdc4a97
[174957.517413] Code: 03 0c 00 f7 d8 (...)
[174957.521052] RSP: 002b:00007fff13564ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[174957.522514] RAX: 0000000000000000 RBX: 00007f328feea264 RCX: 00007f328fdc4a97
[174957.523950] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b8ae51dd0
[174957.525375] RBP: 0000560b8ae51ba0 R08: 0000000000000000 R09: 00007fff13563c40
[174957.526763] R10: 00007f328fe49fc0 R11: 0000000000000246 R12: 0000000000000000
[174957.528058] R13: 0000560b8ae51dd0 R14: 0000560b8ae51cb0 R15: 0000000000000000
[174957.529404] </TASK>
[174957.529843] irq event stamp: 0
[174957.530256] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[174957.531061] hardirqs last disabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.532075] softirqs last enabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.533083] softirqs last disabled at (0): [<0000000000000000>] 0x0
[174957.533865] ---[ end trace bc7ee0c490bce3b0 ]---
[174957.534452] BTRFS info (device dm-0): space_info 4 has 1070841856 free, is not full
[174957.535404] BTRFS info (device dm-0): space_info total=1073741824, used=2785280, pinned=0, reserved=49152, may_use=0, readonly=65536 zone_unusable=0
[174957.537029] BTRFS info (device dm-0): global_block_rsv: size 0 reserved 0
[174957.537859] BTRFS info (device dm-0): trans_block_rsv: size 0 reserved 0
[174957.538697] BTRFS info (device dm-0): chunk_block_rsv: size 0 reserved 0
[174957.539552] BTRFS info (device dm-0): delayed_block_rsv: size 0 reserved 0
[174957.540403] BTRFS info (device dm-0): delayed_refs_rsv: size 0 reserved 0
This also means that in case we have log tree extent buffers that are
still dirty, we can end up not cleaning them up in case we find an
extent buffer with EXTENT_BUFFER_WRITE_ERR set on it, as in that case
we have no way for iterating over the rest of the tree.
This issue is very often triggered with test cases generic/475 and
generic/648 from fstests.
The issue could almost be fixed by iterating over the io tree attached to
each log root which keeps tracks of the range of allocated extent buffers,
log_root->dirty_log_pages, however that does not work and has some
inconveniences:
1) After we sync the log, we clear the range of the extent buffers from
the io tree, so we can't find them after writeback. We could keep the
ranges in the io tree, with a separate bit to signal they represent
extent buffers already written, but that means we need to hold into
more memory until the transaction commits.
How much more memory is used depends a lot on whether we are able to
allocate contiguous extent buffers on disk (and how often) for a log
tree - if we are able to, then a single extent state record can
represent multiple extent buffers, otherwise we need multiple extent
state record structures to track each extent buffer.
In fact, my earlier approach did that:
https://lore.kernel.org/linux-btrfs/3aae7c6728257c7ce2279d6660ee2797e5e34bbd.1641300250.git.fdmanana@suse.com/
However that can cause a very significant negative impact on
performance, not only due to the extra memory usage but also because
we get a larger and deeper dirty_log_pages io tree.
We got a report that, on beefy machines at least, we can get such
performance drop with fsmark for example:
https://lore.kernel.org/linux-btrfs/20220117082426.GE32491@xsang-OptiPlex-9020/
2) We would be doing it only to deal with an unexpected and exceptional
case, which is basically failure to read an extent buffer from disk
due to IO failures. On a healthy system we don't expect transaction
aborts to happen after all;
3) Instead of relying on iterating the log tree or tracking the ranges
of extent buffers in the dirty_log_pages io tree, using the radix
tree that tracks extent buffers (fs_info->buffer_radix) to find all
log tree extent buffers is not reliable either, because after writeback
of an extent buffer it can be evicted from memory by the release page
callback of the btree inode (btree_releasepage()).
Since there's no way to be able to properly cleanup a log tree without
being able to read its extent buffers from disk and without using more
memory to track the logical ranges of the allocated extent buffers do
the following:
1) When we fail to cleanup a log tree, setup a flag that indicates that
failure;
2) Trigger writeback of all log tree extent buffers that are still dirty,
and wait for the writeback to complete. This is just to cleanup their
state, page states, page leaks, etc;
3) When unmounting the fs, ignore if the number of bytes reserved in a
block group and in a space_info is not 0 if, and only if, we failed to
cleanup a log tree. Also ignore only for metadata block groups and the
metadata space_info object.
This is far from a perfect solution, but it serves to silence test
failures such as those from generic/475 and generic/648. However having
a non-zero value for the reserved bytes counters on unmount after a
transaction abort, is not such a terrible thing and it's completely
harmless, it does not affect the filesystem integrity in any way.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Clang static analysis reports this problem
ioctl.c:3333:8: warning: 3rd function call argument is an
uninitialized value
ret = exclop_start_or_cancel_reloc(fs_info,
cancel is only set in one branch of an if-check and is always used. So
initialize to false.
Fixes: 1a15eb724a ("btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At ioctl.c:create_snapshot(), we allocate a pending snapshot structure and
then attach it to the transaction's list of pending snapshots. After that
we call btrfs_commit_transaction(), and if that returns an error we jump
to 'fail' label, where we kfree() the pending snapshot structure. This can
result in a later use-after-free of the pending snapshot:
1) We allocated the pending snapshot and added it to the transaction's
list of pending snapshots;
2) We call btrfs_commit_transaction(), and it fails either at the first
call to btrfs_run_delayed_refs() or btrfs_start_dirty_block_groups().
In both cases, we don't abort the transaction and we release our
transaction handle. We jump to the 'fail' label and free the pending
snapshot structure. We return with the pending snapshot still in the
transaction's list;
3) Another task commits the transaction. This time there's no error at
all, and then during the transaction commit it accesses a pointer
to the pending snapshot structure that the snapshot creation task
has already freed, resulting in a user-after-free.
This issue could actually be detected by smatch, which produced the
following warning:
fs/btrfs/ioctl.c:843 create_snapshot() warn: '&pending_snapshot->list' not removed from list
So fix this by not having the snapshot creation ioctl directly add the
pending snapshot to the transaction's list. Instead add the pending
snapshot to the transaction handle, and then at btrfs_commit_transaction()
we add the snapshot to the list only when we can guarantee that any error
returned after that point will result in a transaction abort, in which
case the ioctl code can safely free the pending snapshot and no one can
access it anymore.
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Check item size before accessing the device item to avoid out of bound
access, similar to inode_item check.
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
The following super simple script would crash btrfs at unmount time, if
CONFIG_BTRFS_ASSERT() is set.
mkfs.btrfs -f $dev
mount $dev $mnt
xfs_io -f -c "pwrite 0 4k" $mnt/file
umount $mnt
mount -r ro $dev $mnt
btrfs scrub start -Br $mnt
umount $mnt
This will trigger the following ASSERT() introduced by commit
0a31daa4b6 ("btrfs: add assertion for empty list of transactions at
late stage of umount").
That patch is definitely not the cause, it just makes enough noise for
developers.
[CAUSE]
We will start transaction for the following call chain during scrub:
scrub_enumerate_chunks()
|- btrfs_inc_block_group_ro()
|- btrfs_join_transaction()
However for RO mount, there is no running transaction at all, thus
btrfs_join_transaction() will start a new transaction.
Furthermore, since it's read-only mount, btrfs_sync_fs() will not call
btrfs_commit_super() to commit the new but empty transaction.
And leads to the ASSERT().
The bug has been there for a long time. Only the new ASSERT() makes it
noisy enough to be noticed.
[FIX]
For read-only scrub on read-only mount, there is no need to start a
transaction nor to allocate new chunks in btrfs_inc_block_group_ro().
Just do extra read-only mount check in btrfs_inc_block_group_ro(), and
if it's read-only, skip all chunk allocation and go inc_block_group_ro()
directly.
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that the VFS will do something with the return values from
->sync_fs, make ours pass on error codes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Strangely, dquot_quota_sync ignores the return code from the ->sync_fs
call, which means that quotacalls like Q_SYNC never see the error. This
doesn't seem right, so fix that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Strangely, sync_filesystem ignores the return code from the ->sync_fs
call, which means that syscalls like syncfs(2) never see the error.
This doesn't seem right, so fix that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
If we fail to synchronize the filesystem while preparing to freeze the
fs, abort the freeze.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
This reverts commit 478ba09edc.
That commit was meant as a fix for setattrs with by fd (e.g. ftruncate)
to use an open fid instead of the first fid it found on lookup.
The proper fix for that is to use the fid associated with the open file
struct, available in iattr->ia_file for such operations, and was
actually done just before in 6624664160 ("9p: retrieve fid from file
when file instance exist.")
As such, this commit is no longer required.
Furthermore, changing lookup to return open fids first had unwanted side
effects, as it turns out the protocol forbids the use of open fids for
further walks (e.g. clone_fid) and we broke mounts for some servers
enforcing this rule.
Note this only reverts to the old working behaviour, but it's still
possible for lookup to return open fids if dentry->d_fsdata is not set,
so more work is needed to make sure we respect this rule in the future,
for example by adding a flag to the lookup functions to only match
certain fid open modes depending on caller requirements.
Link: https://lkml.kernel.org/r/20220130130651.712293-1-asmadeus@codewreck.org
Fixes: 478ba09edc ("fs/9p: search open fids first")
Cc: stable@vger.kernel.org # v5.11+
Reported-by: ron minnich <rminnich@gmail.com>
Reported-by: ng@0x80.stream
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
commit 6f1b228529 introduces a regression which can deadlock as
follows:
Task1: Task2:
jbd2_journal_commit_transaction ocfs2_test_bg_bit_allocatable
spin_lock(&jh->b_state_lock) jbd_lock_bh_journal_head
__jbd2_journal_remove_checkpoint spin_lock(&jh->b_state_lock)
jbd2_journal_put_journal_head
jbd_lock_bh_journal_head
Task1 and Task2 lock bh->b_state and jh->b_state_lock in different
order, which finally result in a deadlock.
So use jbd2_journal_[grab|put]_journal_head instead in
ocfs2_test_bg_bit_allocatable() to fix it.
Link: https://lkml.kernel.org/r/20220121071205.100648-3-joseph.qi@linux.alibaba.com
Fixes: 6f1b228529 ("ocfs2: fix race between searching chunks and release journal_head from buffer_head")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Tested-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Reported-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "ocfs2: fix a deadlock case".
This fixes a deadlock case in ocfs2. We firstly export jbd2 symbols
jbd2_journal_[grab|put]_journal_head as preparation and later use them
in ocfs2 insread of jbd_[lock|unlock]_bh_journal_head to fix the
deadlock.
This patch (of 2):
This exports symbols jbd2_journal_[grab|put]_journal_head, which will be
used outside modules, e.g. ocfs2.
Link: https://lkml.kernel.org/r/20220121071205.100648-2-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Cc: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should unregister the table upon module unload otherwise something
horrible will happen when we load binfmt_misc module again. Also note
that we should keep value returned by register_sysctl_mount_point() and
release it later, otherwise it will leak.
Also, per Christian's comment, to fully restore the old behavior that
won't break userspace the check(binfmt_misc_header) should be
eliminated.
To reproduce:
modprobe binfmt_misc
modprobe -r binfmt_misc
modprobe binfmt_misc
modprobe -r binfmt_misc
modprobe binfmt_misc
resulting in
modprobe: can't load module binfmt_misc (kernel/fs/binfmt_misc.ko): Cannot allocate memory
and an unhappy kernel:
binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
BUG: unable to handle page fault for address: fffffbfff8004802
Call Trace:
init_misc_binfmt+0x2d/0x1000 [binfmt_misc]
Link: https://lkml.kernel.org/r/20220124181812.1869535-2-ztong0001@gmail.com
Fixes: 3ba442d533 ("fs: move binfmt_misc sysctl to its own file")
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Co-developed-by: Christian Brauner<brauner@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While removing an smb session, we need to free up the
tcp session for each channel for that session. We were
doing this with chan_lock held. This results in a
cyclic dependency with cifs_tcp_ses_lock.
For now, unlock the chan_lock temporarily before calling
cifs_put_tcp_session. This should not cause any problem
for now, since we do not remove channels anywhere else.
And this code segment will not be called by two threads.
When we do implement the code for removing channels, we
will need to execute proper ref counting here.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmH0Z+0QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpgisD/99DrERqFZpgyWybHamp+ehzZLiAjJUKskQ
ihRuQ5lULQvtKTHpJm6uDw2SbwA/mJNxT2UIdQtMlqbtLOdkLnE5CWr587rbI88i
HqW2FYiGJOHbU5abEbCno47OsaHiwtgcANFQn/RzQgpWQWvYhQac0I9CnEvkkpzI
wBlgVdSZ5rc5u309fulK3SPPtiaZ6ThwBxmgAqirW/1c4n6dLFp6Eu/ORAdRoFRE
fzokylWi9sGSPH5ansvoN/PtiDdn4RW43hc/Jwf9F09MSFpsEm/NCE2HYp96T1U6
dn5O9xxN7dGe9+IsB1VreQSaO+ELbZRzlFSSvmZLPf03LX3O6ZiyIphABCThVSp2
u3FdVHfZFRfS208krqFmjPEtxFfRNLJTwdUN4o7gsd7hBXyj1aR2/kySyg7dAV5D
cBxJrRZp3+He4BJN4R/HDKt1tMWFroLT30IOHvVb/DHMo6Ow7yvtS65Iu43QD7G2
G9WeeVv1pXU+jwSTfhK45oiAyvIO1ud9DqDbWiorOKQwQz8P5JUUmy8392FDIcuJ
f6+PgJlcJzlzoC7kwxw0QEarkySYxQ5ezAcppa99p/ylqosr6M61/QrAp5xdyziH
RzlmmIQ5VCgnDvdnmCtCtnyaptbAiJjozcXHe7SaAI2CK091AhZqUnN8eUkGOJaG
8eYcyVT2Dg==
=G140
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.17-2022-01-28' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Just two small fixes this time:
- Fix a bug that can lead to node registration taking 1 second, when
it should finish much quicker (Dylan)
- Remove an unused argument from a function (Usama)"
* tag 'io_uring-5.17-2022-01-28' of git://git.kernel.dk/linux-block:
io_uring: remove unused argument from io_rsrc_node_alloc
io_uring: fix bug in slow unregistering of nodes
creates interacting with pool namespace-constrained OSD permissions
from Jeff (marked for stable).
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmH0FAkTHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzixpkB/4ssxwtq8aP82Kh/WuS9qdtofWtZvOB
6BPJdxseWvVMk9Bw/bO6BrxHpHJJKbhPPxKkhJGliSi19kWJKC7FYbuDRp7ffj4Y
3UDiRaD5aQyjFi1rIrBjDb3dgA1dxdXH4EVAPf44dlRD7HjqaglVldFWSHxqH1RQ
v4eB9RYHzNZKuAFsXF4+bzwZXMWgzaMXDNA5AzgRb8Zb3I2K2xSsIxDHD8MHlw8v
BFX7QNobsxi9vjmENOmH8WcjWx8abtdNl0ndNmHcc3u2QyNEfRcdx8DIytQCVRSw
gCyQVLgZIiedeZ84D6jmQu+RR668ztH+X6/QWWj5eHh6YlvtyqTPnuHe
=VuFn
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-5.17-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A ZERO_SIZE_PTR dereference fix from Xiubo and two fixes for async
creates interacting with pool namespace-constrained OSD permissions
from Jeff (marked for stable)"
* tag 'ceph-for-5.17-rc2' of git://github.com/ceph/ceph-client:
ceph: set pool_ns in new inode layout for async creates
ceph: properly put ceph_string reference after async create attempt
ceph: put the requests/sessions when it fails to alloc memory
Fix by removing the extra asterisk.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Rohith Surabattula <rohiths@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The kernel test robot reports that commit c42ff46f97 ("ocfs2: simplify
subdirectory registration with register_sysctl()") is broken, and
results in kernel warning messages like
sysctl table check failed: fs/ocfs2/nm Not a file
sysctl table check failed: fs/ocfs2/nm No proc_handler
sysctl table check failed: fs/ocfs2/nm bogus .mode 0555
and in fact this was already reported back in linux-next, but nobody
seems to have reacted to that report. Possibly that original report
only ever made it to the lkp list.
The problem seems to be that the simplification didn't actually go far
enough, and should have converted the whole directory path to the final
sysctl file, rather than just the two first components.
So take that last step.
Fixes: c42ff46f97 ("ocfs2: simplify subdirectory registration with register_sysctl()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/all/20220128065310.GF8421@xsang-OptiPlex-9020/
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/KQ2F6TPJWMDVEXJM4WTUC4DU3EH3YJVT/
Tested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmHz0QsACgkQnJ2qBz9k
QNkN+AgA6XqWHKYyElfgJFt1UqaoNMz/Faz9H/+PKiBNSTf6/+67D+V7DFz6jJrv
dDwHNzfDg9kR+pbAAPwhl2jfnQoUlsr019Hrqa5HpWlj5geVpbdunYUzS2WOkwqD
/m+brcLgPdKb2uIysj6wOh9B7wa8V9ksl3EjQvvwaHaU0p1YLUqidVXucYvs8DUo
bgXNaj9GmeysxnmU+aILotWuuXH2vOP4Q2Uk4qz3rN6xW9eEXtpQ4y7gWBp/GA8y
Ia8FtFdQdvlSDOJYMdPOTBu5RB7gY9ElrapvVaWNYtCWI/jRv666nZsLaERYNhLN
uUsG4MWjYbOqW5XqFDbSOwbDqvMh5Q==
=mEFA
-----END PGP SIGNATURE-----
Merge tag 'fsnotify_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify fixes from Jan Kara:
"Fixes for userspace breakage caused by fsnotify changes ~3 years ago
and one fanotify cleanup"
* tag 'fsnotify_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: fix fsnotify hooks in pseudo filesystems
fsnotify: invalidate dcache before IN_DELETE event
fanotify: remove variable set but not used
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmHz0I4ACgkQnJ2qBz9k
QNkvkAf/fD0swq13rFWiMDo+Azj9BwdjTeK7kEhGBCw9rL3BchH+OTMgPmcdplTJ
uzXVyWZ98hsCBY4Aclu3Grcy1Cj9A3+JW9FqrWbwWy8t583DuaV31UL99AXb/4sH
xYAmMm9V9gO5ckIAffHzHcEyN+dHMj89zl8vAtxGnlwR6gZLE1wjB3AmWBlctJpE
bWQWCpuJsW+7o7ky5hIQ2hQWLxb5OejdYtLu6Su4tM86xfMwi3pB0tzilJ6jorAH
eH5LBt5pVDI+gYQY1WYoZTy4eQKM0lhVW3WplhbNJwkJUupNS5xCXFfwB8bUppwc
+hyGn5G3GJZ2Ya+xALO3GfS8fX9ESQ==
=dQRa
-----END PGP SIGNATURE-----
Merge tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull udf and quota fixes from Jan Kara:
"Fixes for crashes in UDF when inode expansion fails and one quota
cleanup"
* tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: cleanup double word in comment
udf: Restore i_lenAlloc when inode expansion fails
udf: Fix NULL ptr deref when converting from inline format
From RFC 7530 Section 16.34.5:
o The server has not recorded an unconfirmed { v, x, c, *, * } and
has recorded a confirmed { v, x, c, *, s }. If the principals of
the record and of SETCLIENTID_CONFIRM do not match, the server
returns NFS4ERR_CLID_INUSE without removing any relevant leased
client state, and without changing recorded callback and
callback_ident values for client { x }.
The current code intends to do what the spec describes above but
it forgot to set 'old' to NULL resulting to the confirmed client
to be expired.
Fixes: 2b63482185 ("nfsd: fix clid_inuse on mount with security change")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Bruce Fields <bfields@fieldses.org>
In my testing, we're sometimes hitting the request->fl_flags & FL_EXISTS
case in posix_lock_inode, presumably just by random luck since we're not
actually initializing fl_flags here.
This probably didn't matter before commit 7f024fcd5c ("Keep read and
write fds with each nlm_file") since we wouldn't previously unlock
unless we knew there were locks.
But now it causes lockd to give up on removing more locks.
We could just initialize fl_flags, but really it seems dubious to be
calling vfs_lock_file with random values in some of the fields.
Fixes: 7f024fcd5c ("Keep read and write fds with each nlm_file")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[ cel: fixed checkpatch.pl nit ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Dan reported that he was unable to write to files that had been
asynchronously created when the client's OSD caps are restricted to a
particular namespace.
The issue is that the layout for the new inode is only partially being
filled. Ensure that we populate the pool_ns_data and pool_ns_len in the
iinfo before calling ceph_fill_inode.
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/54013
Fixes: 9a8d03ca2e ("ceph: attempt to do async create when possible")
Reported-by: Dan van der Ster <dan@vanderster.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The reference acquired by try_prep_async_create is currently leaked.
Ensure we put it.
Cc: stable@vger.kernel.org
Fixes: 9a8d03ca2e ("ceph: attempt to do async create when possible")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When failing to allocate the sessions memory we should make sure
the req1 and req2 and the sessions get put. And also in case the
max_sessions decreased so when kreallocate the new memory some
sessions maybe missed being put.
And if the max_sessions is 0 krealloc will return ZERO_SIZE_PTR,
which will lead to a distinct access fault.
URL: https://tracker.ceph.com/issues/53819
Fixes: e1a4541ec0 ("ceph: flush the mdlog before waiting on unsafe reqs")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Trond Myklebust reported soft lockups in XFS IO completion such as
this:
watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [kworker/12:1:3106]
CPU: 12 PID: 3106 Comm: kworker/12:1 Not tainted 4.18.0-305.10.2.el8_4.x86_64 #1
Workqueue: xfs-conv/md127 xfs_end_io [xfs]
RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
Call Trace:
wake_up_page_bit+0x8a/0x110
iomap_finish_ioend+0xd7/0x1c0
iomap_finish_ioends+0x7f/0xb0
xfs_end_ioend+0x6b/0x100 [xfs]
xfs_end_io+0xb9/0xe0 [xfs]
process_one_work+0x1a7/0x360
worker_thread+0x1fa/0x390
kthread+0x116/0x130
ret_from_fork+0x35/0x40
Ioends are processed as an atomic completion unit when all the
chained bios in the ioend have completed their IO. Logically
contiguous ioends can also be merged and completed as a single,
larger unit. Both of these things can be problematic as both the
bio chains per ioend and the size of the merged ioends processed as
a single completion are both unbound.
If we have a large sequential dirty region in the page cache,
write_cache_pages() will keep feeding us sequential pages and we
will keep mapping them into ioends and bios until we get a dirty
page at a non-sequential file offset. These large sequential runs
can will result in bio and ioend chaining to optimise the io
patterns. The pages iunder writeback are pinned within these chains
until the submission chaining is broken, allowing the entire chain
to be completed. This can result in huge chains being processed
in IO completion context.
We get deep bio chaining if we have large contiguous physical
extents. We will keep adding pages to the current bio until it is
full, then we'll chain a new bio to keep adding pages for writeback.
Hence we can build bio chains that map millions of pages and tens of
gigabytes of RAM if the page cache contains big enough contiguous
dirty file regions. This long bio chain pins those pages until the
final bio in the chain completes and the ioend can iterate all the
chained bios and complete them.
OTOH, if we have a physically fragmented file, we end up submitting
one ioend per physical fragment that each have a small bio or bio
chain attached to them. We do not chain these at IO submission time,
but instead we chain them at completion time based on file
offset via iomap_ioend_try_merge(). Hence we can end up with unbound
ioend chains being built via completion merging.
XFS can then do COW remapping or unwritten extent conversion on that
merged chain, which involves walking an extent fragment at a time
and running a transaction to modify the physical extent information.
IOWs, we merge all the discontiguous ioends together into a
contiguous file range, only to then process them individually as
discontiguous extents.
This extent manipulation is computationally expensive and can run in
a tight loop, so merging logically contiguous but physically
discontigous ioends gains us nothing except for hiding the fact the
fact we broke the ioends up into individual physical extents at
submission and then need to loop over those individual physical
extents at completion.
Hence we need to have mechanisms to limit ioend sizes and
to break up completion processing of large merged ioend chains:
1. bio chains per ioend need to be bound in length. Pure overwrites
go straight to iomap_finish_ioend() in softirq context with the
exact bio chain attached to the ioend by submission. Hence the only
way to prevent long holdoffs here is to bound ioend submission
sizes because we can't reschedule in softirq context.
2. iomap_finish_ioends() has to handle unbound merged ioend chains
correctly. This relies on any one call to iomap_finish_ioend() being
bound in runtime so that cond_resched() can be issued regularly as
the long ioend chain is processed. i.e. this relies on mechanism #1
to limit individual ioend sizes to work correctly.
3. filesystems have to loop over the merged ioends to process
physical extent manipulations. This means they can loop internally,
and so we break merging at physical extent boundaries so the
filesystem can easily insert reschedule points between individual
extent manipulations.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-and-tested-by: Trond Myklebust <trondmy@hammerspace.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
- New Features:
- Basic handling for case insensitive filesystems
- Initial support for fs_locations and server trunking
- Bugfixes and Cleanups:
- Cleanups to how the "struct cred *" is handled for the nfs_access_entry
- Ensure the server has an up to date ctimes before hardlinking or renaming
- Update 'blocks used' after writeback, fallocate, and clone
- nfs_atomic_open() fixes
- Improvements to sunrpc tracing
- Various null check & indenting related cleanups
- Some improvements to the sunrpc sysfs code
- Use default_groups in kobj_type
- Fix some potential races and reference leaks
- A few tracepoint cleanups in xprtrdma
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmHodsIACgkQ18tUv7Cl
QOt3xQ//c9JPmMJZZoZtaD5UrHg28iyxaJpOUUpwC/jxQhLOETCf+nU1cELYgLq5
4W06NBYEmjDJ/tihUvcGMKLvbCtQR9Zl9HepFKDTLTQpGmRFD4enwSmMNvW/AV+h
I7PoN6J1DX/TZ5InOHH9asyoC2MjwrNHMn3bbQVT0qy+i3T76zJiBF79eWTnPR48
kKPnF1I0p4LKGJy+y+y/z2mdCsz7tzFkhssxVhot0nafxXzbUOp1H9aiwxroRiUC
ljbBA0TX8FWkGpGFt3y2QK2fMD7ovDpRhLFYiJClmeERXJVH5mXL9O5XfN5AL0xe
W/QqT5lbWfeHLkpm2j87yTyaHASC7hGKsAyPD0zWLDcNZws61l1Sy4BHymSE5ZVh
zt7sJjBnOWAtntyUGBg78G2vhBsd63GzrtcqAOlrngwA5ohJ8f32qvBQGyw4MQu9
75CjRcO8K8mnf9BJ6I1vYPycjkUh9RSFfNdnUEAI9ZwiTEC/hfEvH/omvEtZsNol
jBgv2SItTkdMZlEppEL4gxuaYT2wiZf2C6Gco215iPAqLC6dudoroN6yoLk/LRd0
OWZLl5XTr3j6m5QDm22k5CG080vl6XiAxmAFaFSLza6Q34Jmuluc0gLAZZxvqXk9
Ay7dQt9PQQk6mXD5Hreb0E5N9zcm2LkfvWpyGJ7mTV7sSHjA2DU=
=wcVT
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-5.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Basic handling for case insensitive filesystems
- Initial support for fs_locations and server trunking
Bugfixes and Cleanups:
- Cleanups to how the "struct cred *" is handled for the
nfs_access_entry
- Ensure the server has an up to date ctimes before hardlinking or
renaming
- Update 'blocks used' after writeback, fallocate, and clone
- nfs_atomic_open() fixes
- Improvements to sunrpc tracing
- Various null check & indenting related cleanups
- Some improvements to the sunrpc sysfs code:
- Use default_groups in kobj_type
- Fix some potential races and reference leaks
- A few tracepoint cleanups in xprtrdma"
[ This should have gone in during the merge window, but didn't. The
original pull request - sent during the merge window - had gotten
marked as spam and discarded due missing DKIM headers in the email
from Anna. - Linus ]
* tag 'nfs-for-5.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (35 commits)
SUNRPC: Don't dereference xprt->snd_task if it's a cookie
xprtrdma: Remove definitions of RPCDBG_FACILITY
xprtrdma: Remove final dprintk call sites from xprtrdma
sunrpc: Fix potential race conditions in rpc_sysfs_xprt_state_change()
net/sunrpc: fix reference count leaks in rpc_sysfs_xprt_state_change
NFSv4.1 test and add 4.1 trunking transport
SUNRPC allow for unspecified transport time in rpc_clnt_add_xprt
NFSv4 handle port presence in fs_location server string
NFSv4 expose nfs_parse_server_name function
NFSv4.1 query for fs_location attr on a new file system
NFSv4 store server support for fs_location attribute
NFSv4 remove zero number of fs_locations entries error check
NFSv4: nfs_atomic_open() can race when looking up a non-regular file
NFSv4: Handle case where the lookup of a directory fails
NFSv42: Fallocate and clone should also request 'blocks used'
NFSv4: Allow writebacks to request 'blocks used'
SUNRPC: use default_groups in kobj_type
NFS: use default_groups in kobj_type
NFS: Fix the verifier for case sensitive filesystem in nfs_atomic_open()
NFS: Add a helper to remove case-insensitive aliases
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmHwDeMACgkQxWXV+ddt
WDtdMQ//QFqkIB34zW5N3uX1xBFht/G/bCPNdGiK5YerjJZj1f6Rmsytbb6qlWHg
NlB/XEPeQaQVrSfF37svnvATgySPaqePsufrT2XYu3x2w8muPTl460wmzdMt5h47
rGB+ct4JdLBH4KJgqe2Bilrqg+FJmL3XT5k0aU3driy4Gb+bcDGeEyVmTWcnNRIg
DzfUlNwTKhAhZDl8D3B9X2vV8TZDBtrRLquI94eYvooF3LYDL+kExLUW8WDmmAfy
mjnANs8c+EtcVAzN7tW+O1UqdYYJ8Yo4ngk1nVVRdRvA2BDp9ixgWi/m/3jZ3JmJ
jySV1zsZJB3ZGp/hIuEvtGY7jheDtbTnfgtI+vwjVdr208acs+XhfDckuOZBZIUY
7Zk+Qif/narxFAoAvkgkH5QDNSSReKqaHgzohfnzSQqrfO0bh6fw1FnBOm4iXT7C
cXvReD4m36g46SdTsxnvttpXizXIFe4JPOkpRkBzxIQFaMTA4Is43W0lYC24Ppxj
A0UVevh3HPhOYzABynuy0EnknZeylb6P+WpGG6Ge+sVrVquQiwR01n4HeoaJO3qe
re46uUGwO8Q30blYY50HBSJp0bpcciPZRVMJaspcAT9KD0fJ1s/csd2lQyP4ewn6
A0zg6eabc0PD3LwdlHqp//jTNft/BL4RVZ2c3uM+mgXnGeekcoQ=
=EysX
-----END PGP SIGNATURE-----
Merge tag 'for-5.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Several fixes for defragmentation that got broken in 5.16 after
refactoring and added subpage support. The observed bugs are excessive
IO or uninterruptible ioctl.
All stable material"
* tag 'for-5.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: update writeback index when starting defrag
btrfs: add back missing dirty page rate limiting to defrag
btrfs: fix deadlock when reserving space during defrag
btrfs: defrag: properly update range->start for autodefrag
btrfs: defrag: fix wrong number of defragged sectors
btrfs: allow defrag to be interruptible
btrfs: fix too long loop when defragging a 1 byte file
When starting a defrag, we should update the writeback index of the
inode's mapping in case it currently has a value beyond the start of the
range we are defragging. This can help performance and often result in
getting less extents after writeback - for e.g., if the current value
of the writeback index sits somewhere in the middle of a range that
gets dirty by the defrag, then after writeback we can get two smaller
extents instead of a single, larger extent.
We used to have this before the refactoring in 5.16, but it was removed
without any reason to do so. Originally it was added in kernel 3.1, by
commit 2a0f7f5769 ("Btrfs: fix recursive auto-defrag"), in order to
fix a loop with autodefrag resulting in dirtying and writing pages over
and over, but some testing on current code did not show that happening,
at least with the test described in that commit.
So add back the behaviour, as at the very least it is a nice to have
optimization.
Fixes: 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
CC: stable@vger.kernel.org # 5.16
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A defrag operation can dirty a lot of pages, specially if operating on
the entire file or a large file range. Any task dirtying pages should
periodically call balance_dirty_pages_ratelimited(), as stated in that
function's comments, otherwise they can leave too many dirty pages in
the system. This is what we did before the refactoring in 5.16, and
it should have remained, just like in the buffered write path and
relocation. So restore that behaviour.
Fixes: 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When defragging we can end up collecting a range for defrag that has
already pages under delalloc (dirty), as long as the respective extent
map for their range is not mapped to a hole, a prealloc extent or
the extent map is from an old generation.
Most of the time that is harmless from a functional perspective at
least, however it can result in a deadlock:
1) At defrag_collect_targets() we find an extent map that meets all
requirements but there's delalloc for the range it covers, and we add
its range to list of ranges to defrag;
2) The defrag_collect_targets() function is called at defrag_one_range(),
after it locked a range that overlaps the range of the extent map;
3) At defrag_one_range(), while the range is still locked, we call
defrag_one_locked_target() for the range associated to the extent
map we collected at step 1);
4) Then finally at defrag_one_locked_target() we do a call to
btrfs_delalloc_reserve_space(), which will reserve data and metadata
space. If the space reservations can not be satisfied right away, the
flusher might be kicked in and start flushing delalloc and wait for
the respective ordered extents to complete. If this happens we will
deadlock, because both flushing delalloc and finishing an ordered
extent, requires locking the range in the inode's io tree, which was
already locked at defrag_collect_targets().
So fix this by skipping extent maps for which there's already delalloc.
Fixes: eb793cf857 ("btrfs: defrag: introduce helper to collect target file extents")
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
After seeking time on testing today upstream fsdax, I found it
actually doesn't work well as below:
[ 186.492983] ------------[ cut here ]------------
[ 186.493629] WARNING: CPU: 1 PID: 205 at fs/iomap/iter.c:33 iomap_iter+0x2f6/0x310
The problem is that m_dax_part_off should be applied to physical
addresses and very sorry about that I didn't catch this eariler.
Anyway, let's fix it up now. Also, I need to find a way to set up
a standalone testcase to look after this later.
Link: https://lore.kernel.org/r/20220113051845.244461-1-hsiangkao@linux.alibaba.com
Fixes: de20511477 ("fsdax: shift partition offset handling into the file systems")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
When we fail to expand inode from inline format to a normal format, we
restore inode to contain the original inline formatting but we forgot to
set i_lenAlloc back. The mismatch between i_lenAlloc and i_size was then
causing further problems such as warnings and lost data down the line.
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
CC: stable@vger.kernel.org
Fixes: 7e49b6f248 ("udf: Convert UDF to new truncate calling sequence")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
udf_expand_file_adinicb() calls directly ->writepage to write data
expanded into a page. This however misses to setup inode for writeback
properly and so we can crash on inode->i_wb dereference when submitting
page for IO like:
BUG: kernel NULL pointer dereference, address: 0000000000000158
#PF: supervisor read access in kernel mode
...
<TASK>
__folio_start_writeback+0x2ac/0x350
__block_write_full_page+0x37d/0x490
udf_expand_file_adinicb+0x255/0x400 [udf]
udf_file_write_iter+0xbe/0x1b0 [udf]
new_sync_write+0x125/0x1c0
vfs_write+0x28e/0x400
Fix the problem by marking the page dirty and going through the standard
writeback path to write the page. Strictly speaking we would not even
have to write the page but we want to catch e.g. ENOSPC errors early.
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
CC: stable@vger.kernel.org
Fixes: 52ebea749a ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Commit 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of
d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify
will have access to a positive dentry.
This allowed a race where opening the deleted file via cached dentry
is now possible after receiving the IN_DELETE event.
To fix the regression in pseudo filesystems, convert d_delete() calls
to d_drop() (see commit 46c46f8df9 ("devpts_pty_kill(): don't bother
with d_delete()") and move the fsnotify hook after d_drop().
Add a missing fsnotify_unlink() hook in nfsdfs that was found during
the audit of fsnotify hooks in pseudo filesystems.
Note that the fsnotify hooks in simple_recursive_removal() follow
d_invalidate(), so they require no change.
Link: https://lore.kernel.org/r/20220120215305.282577-2-amir73il@gmail.com
Reported-by: Ivan Delalande <colona@arista.com>
Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/
Fixes: 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Apparently, there are some applications that use IN_DELETE event as an
invalidation mechanism and expect that if they try to open a file with
the name reported with the delete event, that it should not contain the
content of the deleted file.
Commit 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of
d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify
will have access to a positive dentry.
This allowed a race where opening the deleted file via cached dentry
is now possible after receiving the IN_DELETE event.
To fix the regression, create a new hook fsnotify_delete() that takes
the unlinked inode as an argument and use a helper d_delete_notify() to
pin the inode, so we can pass it to fsnotify_delete() after d_delete().
Backporting hint: this regression is from v5.3. Although patch will
apply with only trivial conflicts to v5.4 and v5.10, it won't build,
because fsnotify_delete() implementation is different in each of those
versions (see fsnotify_link()).
A follow up patch will fix the fsnotify_unlink/rmdir() calls in pseudo
filesystem that do not need to call d_delete().
Link: https://lore.kernel.org/r/20220120215305.282577-1-amir73il@gmail.com
Reported-by: Ivan Delalande <colona@arista.com>
Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/
Fixes: 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
cifs client set 4 to DataLength of create_posix context, which mean
Mode variable of create_posix context is only available. So buffer
validation of ksmbd should check only the size of Mode except for
the size of Reserved variable.
Fixes: 8f77150c15 ("ksmbd: add buffer validation for SMB2_CREATE_CONTEXT")
Cc: stable@vger.kernel.org # v5.15+
Reported-by: Steve French <smfrench@gmail.com>
Tested-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
In some cases io_rsrc_ref_quiesce will call io_rsrc_node_switch_start,
and then immediately flush the delayed work queue &ctx->rsrc_put_work.
However the percpu_ref_put does not immediately destroy the node, it
will be called asynchronously via RCU. That ends up with
io_rsrc_node_ref_zero only being called after rsrc_put_work has been
flushed, and so the process ends up sleeping for 1 second unnecessarily.
This patch executes the put code immediately if we are busy
quiescing.
Fixes: 4a38aed2a0 ("io_uring: batch reap of dead file registrations")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220121123856.3557884-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQHJBAABCgAzFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmHi+xgVHHl1cnkubm9y
b3ZAZ21haWwuY29tAAoJELFEgP06H77IxdoMAMf3E+L51Ys/4iAiyJQNVoT3aIBC
A8ZVOB9he1OA3o3wBNIRKmICHk+ovnfCWcXTr9fG/Ade2wJz88NAsGPQ1Phywb+s
iGlpySllFN72RT9ZqtJhLEzgoHHOL0CzTW07TN9GJy4gQA2h2G9CTP+OmsQdnVqE
m9Fn3PSlJ5lhzePlKfnln8rGZFgrriJakfEFPC79n/7an4+2Hvkb5rWigo7KQc4Z
9YNqYUcHWZFUgq80adxEb9LlbMXdD+Z/8fCjOrAatuwVkD4RDt6iKD0mFGjHXGL7
MZ9KRS8AfZXawmetk3jjtsV+/QkeS+Deuu7k0FoO0Th2QV7BGSDhsLXAS5By/MOC
nfSyHhnXHzCsBMyVNrJHmNhEZoN29+tRwI84JX9lWcf/OLANcCofnP6f2UIX7tZY
CAZAgVELp+0YQXdybrfzTQ8BT3TinjS/aZtCrYijRendI1GwUXcyl69vdOKqAHuk
5jy8k/xHyp+ZWu6v+PyAAAEGowY++qhL0fmszA==
=RKW4
-----END PGP SIGNATURE-----
Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- introduce for_each_set_bitrange()
- use find_first_*_bit() instead of find_next_*_bit() where possible
- unify for_each_bit() macros
* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
vsprintf: rework bitmap_list_string
lib: bitmap: add performance test for bitmap_print_to_pagebuf
bitmap: unify find_bit operations
mm/percpu: micro-optimize pcpu_is_populated()
Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
find: micro-optimize for_each_{set,clear}_bit()
include/linux: move for_each_bit() macros from bitops.h to find.h
cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
tools: sync tools/bitmap with mother linux
all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
cpumask: use find_first_and_bit()
lib: add find_first_and_bit()
arch: remove GENERIC_FIND_FIRST_BIT entirely
include: move find.h from asm_generic to linux
bitops: move find_bit_*_le functions from le.h to find.h
bitops: protect find_first_{,zero}_bit properly
Merge yet more updates from Andrew Morton:
"This is the post-linux-next queue. Material which was based on or
dependent upon material which was in -next.
69 patches.
Subsystems affected by this patch series: mm (migration and zsmalloc),
sysctl, proc, and lib"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (69 commits)
mm: hide the FRONTSWAP Kconfig symbol
frontswap: remove support for multiple ops
mm: mark swap_lock and swap_active_head static
frontswap: simplify frontswap_register_ops
frontswap: remove frontswap_test
mm: simplify try_to_unuse
frontswap: remove the frontswap exports
frontswap: simplify frontswap_init
frontswap: remove frontswap_curr_pages
frontswap: remove frontswap_shrink
frontswap: remove frontswap_tmem_exclusive_gets
frontswap: remove frontswap_writethrough
mm: remove cleancache
lib/stackdepot: always do filter_irq_stacks() in stack_depot_save()
lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()
proc: remove PDE_DATA() completely
fs: proc: store PDE()->data into inode->i_private
zsmalloc: replace get_cpu_var with local_lock
zsmalloc: replace per zpage lock with pool->migrate_lock
locking/rwlocks: introduce write_lock_nested
...
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmHrkScACgkQiiy9cAdy
T1EY6Av8C2by0N3vBK/hhjq+ocKhoLWzlziyOlVX6Hzrqa2BH5I9UBUrGmI6rXRI
SP9QiyjO8WlVGajOYMAgQpjysKJjRnJTkGGvCb8iV4VfQPq8yYfuoLRaihO20WI4
VaRAwKAZ6RjNrOtbYQ6KPx2OXDE0OUZ6Y8QuDfgn6jXzsMfjTCmM4BGntUNweiUU
Yn7sBBIh0XWy1D3Fl3Dm47t6d1iW1/I9WIym/2vHWqxI8QkWv05z9Dk5Sijrl8v/
pa9eQ05Ym8bRTI7dU9be8uySVTO21sdPitNZ4L4vNqH4KY0L3pKqUL4E9QzVcv4O
wnO8Q7wtBIOEgREuRF041d14vJsRiMUKjxUVG5PubhCE7pSesHHhgh0DN4ymDmKi
cBYkpYIcPLopfOvhE4CEPjULE3DI91KNM8x8ePB+fm/ZbnPHrXyhOcUCz3gzKYGf
jsYHVVAofWEU1HRPQjTPtwxuof/3u1ZgkzIj+gqzbVOWIQO1JAsdBE2TQQKTadL9
Y7aKEfci
=/mdT
-----END PGP SIGNATURE-----
Merge tag '5.17-rc-part2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
- multichannel fixes, addressing additional reconnect and DFS scenarios
- reenabling fscache support (indexing rewrite, metadata caching e.g.)
- send additional version information during NTLMSSP negotiate to
improve debugging
- fix for a mount race
- DFS fixes
- fix for a memory leak for stable
* tag '5.17-rc-part2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module number
smb3: send NTLMSSP version information
cifs: Support fscache indexing rewrite
cifs: cifs_ses_mark_for_reconnect should also update reconnect bits
cifs: update tcpStatus during negotiate and sess setup
cifs: make status checks in version independent callers
cifs: remove repeated state change in dfs tree connect
cifs: fix the cifs_reconnect path for DFS
cifs: remove unused variable ses_selected
cifs: protect all accesses to chan_* with chan_lock
cifs: fix the connection state transitions with multichannel
cifs: check reconnects for channels of active tcons too
smb3: add new defines from protocol specification
cifs: serialize all mount attempts
cifs: quirk for STATUS_OBJECT_NAME_INVALID returned for non-ASCII dfs refs
cifs: alloc_path_with_tree_prefix: do not append sep. if the path is empty
cifs: clean up an inconsistent indenting
cifs: free ntlmsspblob allocated in negotiate
- Minor cleanup of ioctl32 cruft
- Clean up open coded inodegc workqueue function calls
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHpnxUACgkQ+H93GTRK
tOtpig/9GzzlPSpbjGMU3BW+dfqvYMLV9YcB3bkZLXPJIb/D2UPXzOQd2SvM9ScJ
XwM857EjrmV7XifEezbBvla33N9ToNv8/IVdmctcc2as4c6VzDXmkiyze7wD+2D/
i9UJEOjEtaTvqioOMQJpfaATI7y/9N45UFfcjkbmc4lTP0xOvd1Owx9+ujrCYZ9e
UccsC7ovUzHxxuK1uFhRMI0up1f5urdFA0wFKvEGsfhIzBthU6BDtYUjuzhC/vLV
DxqPlQgkRqQfzB9loFQOiNNSC2kVF+SKPXZS7k/HiQIALDOKtFNnSanXG5oPB5+d
DnuSCvJ+RfejmEoDtMydQu0KXN5fq9g7FodH3tAMPPNn8N1l+qJuC//+6VHZjvIB
GhfHyWkpg/mzSAAqU0sqeGSvzG1byawYNUzueIIcEWLP1qg4O3QkXW7VJ+yuGb70
gsO7CQjnW20CmJqeftNsBaBor114UjUJLBQsXR8uowKHq/L1MhNh7Ir6lTfQdti+
S099EVjrrFV0ZVRou1ZyQ5ZWsdGNtz0p0hEEVEPG2ERHOKfbVWLwbL38s8JY7C56
hBVwQRwZScdIb/SYTWxPXbBJNR/BXUyxrCikk66isQT0kwFCHXE67YaXbFklxybs
uzKzVYEBPtYhvDeFTa53GEotGw/LDtZD07E58yowlJUqmJtMLM0=
=EIqW
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.17-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"One of the patches removes some dead code from xfs_ioctl32.h and the
other fixes broken workqueue flushing in the inode garbage collector.
- Minor cleanup of ioctl32 cruft
- Clean up open coded inodegc workqueue function calls"
* tag 'xfs-5.17-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: flush inodegc workqueue tasks before cancel
xfs: remove unused xfs_ioctl32.h declarations
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmHrJ4UACgkQ+7dXa6fL
C2taFg/+K0nKY3TDKc7JVxe9BEUa/uUvNba5j16qKWJKnjCrjqU0RWJSUnVOxcwH
eBwxjaoZTmtGqS3snvKMYNIGLf7FAEeRY3a1JO81sdoCA2X7P/4HtfDO+DdaCFpB
tYr9+Hfjy+2baMLEimTWp2SYsfVKD/J4gHeqdRF12QMQJ01m+X9fcV7TqpkyPo5C
L35Gi1RVDXzwn81qOV9yrmkOBhPK4hiREmCmD0BGEGT/+FYlYVBOv92n31CQMl1Q
ef8eMNiDYgXQ0iWoUYIJqMBs88M7DwAmeaA/qi85gzfGM0Z5pBE3SNy02F/81b8b
PHgmzDfj42AuAQ9ZOuwREwtvQrpGsCsw5+60klzZ6PiFtO+q/PtS/ODTdi4foDDd
ura6RM3wqlE9P4Vi7gj9NYhdUlp5p0YYPlJOGr6CepYS63O3DgwXyoVMPUE5s5fF
Boc3Ef7fKy25Wc1s/ZVSbvvgh9KYDJOKqxs4ELeNSW8GD7uA2d0XWkhjzZIsVdKZ
kd7VEYr2/j6gpPR63yw2MVOfymJN1WYuu6ittuL40hSYW8hbBg/m8hVE1E7b7NlP
+u10xvCwN9LBK+757I733wV3hB4rYa6pg9KBjHDXyCLQ1uCKVE0QaILTSjqC/mCM
OTPFmi50y1oa/9BxooEg5+ShXOH95EE4jTBCaL1lv7JYfYNUbYA=
=CgMV
-----END PGP SIGNATURE-----
Merge tag 'fscache-fixes-20220121' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull more fscache updates from David Howells:
"A set of fixes and minor updates for the fscache rewrite:
- Fix mishandling of volume collisions (the wait condition is
inverted and so it was only waiting if the volume collision was
already resolved).
- Fix miscalculation of whether there's space available in
cachefiles.
- Make sure a default cache name is set on a cache if the user hasn't
set one by the time they bind the cache.
- Adjust the way the backing inode is presented in tracepoints, add a
tracepoint for mkdir and trace directory lookup.
- Add a tracepoint for failure to set the active file mark.
- Add an explanation of the checks made on the backing filesystem.
- Check that the backing filesystem supports tmpfile.
- Document how the page-release cancellation of the read-skip
optimisation works.
And I've included a change for netfslib:
- Make ops->init_rreq() optional"
* tag 'fscache-fixes-20220121' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
netfs: Make ops->init_rreq() optional
fscache: Add a comment explaining how page-release optimisation works
cachefiles: Check that the backing filesystem supports tmpfiles
cachefiles: Explain checks in a comment
cachefiles: Trace active-mark failure
cachefiles: Make some tracepoint adjustments
cachefiles: set default tag name if it's unspecified
cachefiles: Calculate the blockshift in terms of bytes, not pages
fscache: Fix the volume collision wait condition
Patch series "remove Xen tmem leftovers".
Since the removal of the Xen tmem driver in 2019, the cleancache hooks
are entirely unused, as are large parts of frontswap. This series
against linux-next (with the folio changes included) removes
cleancaches, and cuts down frontswap to the bits actually used by zswap.
This patch (of 13):
The cleancache subsystem is unused since the removal of Xen tmem driver
in commit 814bbf49dc ("xen: remove tmem driver").
[akpm@linux-foundation.org: remove now-unreachable code]
Link: https://lkml.kernel.org/r/20211224062246.1258487-1-hch@lst.de
Link: https://lkml.kernel.org/r/20211224062246.1258487-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove PDE_DATA() completely and replace it with pde_data().
[akpm@linux-foundation.org: fix naming clash in drivers/nubus/proc.c]
[akpm@linux-foundation.org: now fix it properly]
Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PDE_DATA(inode) is introduced to get user private data and hide the
layout of struct proc_dir_entry. The inode->i_private is used to do the
same thing as well. Save a copy of user private data to inode->
i_private when proc inode is allocated. This means the user also can
get their private data by inode->i_private.
Introduce pde_data() to wrap inode->i_private so that we can remove
PDE_DATA() from fs/proc/generic.c and make PTE_DATE() as a wrapper of
pde_data(). It will be easier if we decide to remove PDE_DATE() in the
future.
Link: https://lkml.kernel.org/r/20211124081956.87711-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This moves the fs/coredump.c respective sysctls to its own file.
Link: https://lkml.kernel.org/r/20211129211943.640266-6-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Stephen Kitt <steve@sk2.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename sysctl_init() to sysctl_init_bases() so to reflect exactly what
this is doing.
Link: https://lkml.kernel.org/r/20211129211943.640266-4-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This moves the namespace sysctls to its own file as part of the
kernel/sysctl.c spring cleaning
Since we have now removed all sysctls for "fs", we now have to declare
it on the filesystem code, we do that using the new helper, which
reduces boiler plate code.
We rename init_fs_shared_sysctls() to init_fs_sysctls() to reflect that
now fs/sysctls.c is taking on the burden of being the first to register
the base directory as well.
Lastly, since init code will load in the order in which we link it we
have to move the sysctl code to be linked in early, so that its early
init routine runs prior to other fs code. This way, other filesystem
code can register their own sysctls using the helpers after this:
* register_sysctl_init()
* register_sysctl()
Link: https://lkml.kernel.org/r/20211129211943.640266-3-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "sysctl: add and use base directory declarer and
registration helper".
In this patch series we start addressing base directories, and so we
start with the "fs" sysctls. The end goal is we end up completely
moving all "fs" sysctl knobs out from kernel/sysctl.
This patch (of 6):
Add a set of helpers which can be used to declare and register base
directory sysctls on their own. We do this so we can later move each of
the base sysctl directories like "fs", "kernel", etc, to their own
respective files instead of shoving the declarations and registrations
all on kernel/sysctl.c. The lazy approach has caught up and with this,
we just end up extending the list of base directories / sysctls on one
file and this makes maintenance difficult due to merge conflicts from
many developers.
The declarations are used first by kernel/sysctl.c for registration its
own base which over time we'll try to clean up. It will be used in the
next patch to demonstrate how to cleanly deal with base sysctl
directories.
[mcgrof@kernel.org: null-terminate the ctl_table arrays]
Link: https://lkml.kernel.org/r/YafJY3rXDYnjK/gs@bombadil.infradead.org
Link: https://lkml.kernel.org/r/20211129211943.640266-1-mcgrof@kernel.org
Link: https://lkml.kernel.org/r/20211129211943.640266-2-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
So move the pipe sysctls to its own file.
Link: https://lkml.kernel.org/r/20211129205548.605569-10-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
So move the fs/exec.c respective sysctls to its own file.
Since checkpatch complains about style issues with the old code, this
move also fixes a few of those minor style issues:
* Use pr_warn() instead of prink(WARNING
* New empty lines are wanted at the beginning of routines
Link: https://lkml.kernel.org/r/20211129205548.605569-9-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
So move namei's own sysctl knobs to its own file.
Other than the move we also avoid initializing two static variables to 0
as this is not needed:
* sysctl_protected_symlinks
* sysctl_protected_hardlinks
Link: https://lkml.kernel.org/r/20211129205548.605569-8-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
The locking fs sysctls are only used on fs/locks.c, so move them there.
Link: https://lkml.kernel.org/r/20211129205548.605569-7-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
So move sysctls which are shared between filesystems into a common file
outside of kernel/sysctl.c.
Link: https://lkml.kernel.org/r/20211129205548.605569-6-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The maxolduid value is only shared for sysctl purposes for use on a max
range. Just stuff this into our shared const array.
[akpm@linux-foundation.org: fix sysctl_vals[], per Mickaël]
Link: https://lkml.kernel.org/r/20211129205548.605569-5-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
So move the dcache sysctl clutter out of kernel/sysctl.c. This is a
small one-off entry, perhaps later we can simplify this representation,
but for now we use the helpers we have. We won't know how we can
simplify this further untl we're fully done with the cleanup.
[arnd@arndb.de: avoid unused-function warning]
Link: https://lkml.kernel.org/r/20211203190123.874239-2-arnd@kernel.org
Link: https://lkml.kernel.org/r/20211129205548.605569-4-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
We can create the sysctl dynamically on early init for fs stat to help
with this clutter. This dusts off the fs stat syctls knobs and puts
them into where they are declared.
Link: https://lkml.kernel.org/r/20211129205548.605569-3-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "sysctl: 4th set of kernel/sysctl cleanups".
This is slimming down the fs uses of kernel/sysctl.c to the point that
the next step is to just get rid of the fs base directory for it and
move that elsehwere, so that next patch series starts dealing with that
to demo how we can end up cleaning up a full base directory from
kernel/sysctl.c, one at a time.
This patch (of 9):
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
So move the inode sysctls to its own file. Since we are no longer using
this outside of fs/ remove the extern declaration of its respective proc
helper.
We use early_initcall() as it is the earliest we can use.
[arnd@arndb.de: avoid unused-variable warning]
Link: https://lkml.kernel.org/r/20211203190123.874239-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20211129205548.605569-1-mcgrof@kernel.org
Link: https://lkml.kernel.org/r/20211129205548.605569-2-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide a way to share unsigned long values. This will allow others to
not have to re-invent these values.
Link: https://lkml.kernel.org/r/20211124231435.1445213-9-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
This moves the binfmt_misc sysctl to its own file to help remove clutter
from kernel/sysctl.c.
Link: https://lkml.kernel.org/r/20211124231435.1445213-5-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The way to create a subdirectory on top of sysctl_mount_point is a bit
obscure, and *why* we do that even so more. Provide a helper which
makes it clear why we do this.
[akpm@linux-foundation.org: export register_sysctl_mount_point() to
modules]
Link: https://lkml.kernel.org/r/20211124231435.1445213-4-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
So move the epoll_table sysctl to fs/eventpoll.c and use
register_sysctl().
Link: https://lkml.kernel.org/r/20211123202422.819032-9-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.
Move inotify_user sysctl to inotify_user.c while at it to remove clutter
from kernel/sysctl.c.
[mcgrof@kernel.org: remember to register fanotify_table]
Link: https://lkml.kernel.org/r/YZ5A6iWLb0h3N3RC@bombadil.infradead.org
[mcgrof@kernel.org: update commit log to reflect new path we decided to take]
Link: https://lkml.kernel.org/r/20211123202422.819032-7-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
So move dnotify sysctls to dnotify.c and use the new
register_sysctl_init() to register the sysctl interface.
[mcgrof@kernel.org: adjust the commit log to justify the move]
Link: https://lkml.kernel.org/r/20211123202347.818157-10-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
Move aio sysctl to aio.c and use the new register_sysctl_init() to
register the sysctl interface for aio.
[mcgrof@kernel.org: adjust commit log to justify the move]
Link: https://lkml.kernel.org/r/20211123202347.818157-9-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysctl has helpers which let us specify boundary values for a min or max
int value. Since these are used for a boundary check only they don't
change, so move these variables to sysctl_vals to avoid adding duplicate
variables. This will help with our cleanup of kernel/sysctl.c.
[akpm@linux-foundation.org: update it for "mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30%"]
[mcgrof@kernel.org: major rebase]
Link: https://lkml.kernel.org/r/20211123202347.818157-3-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "sysctl: first set of kernel/sysctl cleanups", v2.
Finally had time to respin the series of the work we had started last
year on cleaning up the kernel/sysct.c kitchen sink. People keeps
stuffing their sysctls in that file and this creates a maintenance
burden. So this effort is aimed at placing sysctls where they actually
belong.
I'm going to split patches up into series as there is quite a bit of
work.
This first set adds register_sysctl_init() for uses of registerting a
sysctl on the init path, adds const where missing to a few places,
generalizes common values so to be more easy to share, and starts the
move of a few kernel/sysctl.c out where they belong.
The majority of rework on v2 in this first patch set is 0-day fixes.
Eric Biederman's feedback is later addressed in subsequent patch sets.
I'll only post the first two patch sets for now. We can address the
rest once the first two patch sets get completely reviewed / Acked.
This patch (of 9):
The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
Today though folks heavily rely on tables on kernel/sysctl.c so they can
easily just extend this table with their needed sysctls. In order to
help users move their sysctls out we need to provide a helper which can
be used during code initialization.
We special-case the initialization use of register_sysctl() since it
*is* safe to fail, given all that sysctls do is provide a dynamic
interface to query or modify at runtime an existing variable. So the
use case of register_sysctl() on init should *not* stop if the sysctls
don't end up getting registered. It would be counter productive to stop
boot if a simple sysctl registration failed.
Provide a helper for init then, and document the recommended init levels
to use for callers of this routine. We will later use this in
subsequent patches to start slimming down kernel/sysctl.c tables and
moving sysctl registration to the code which actually needs these
sysctls.
[mcgrof@kernel.org: major commit log and documentation rephrasing also moved to fs/proc/proc_sysctl.c ]
Link: https://lkml.kernel.org/r/20211123202347.818157-1-mcgrof@kernel.org
Link: https://lkml.kernel.org/r/20211123202347.818157-2-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make some adjustments to tracepoints to make the tracing a bit more
followable:
(1) Standardise on displaying the backing inode number as "B=<hex>" with
no leading zeros.
(2) Make the cachefiles_lookup tracepoint log the directory inode number
as well as the looked-up inode number.
(3) Add a cachefiles_lookup tracepoint into cachefiles_get_directory() to
log directory lookup.
(4) Add a new cachefiles_mkdir tracepoint and use that to log a successful
mkdir from cachefiles_get_directory().
(5) Make the cachefiles_unlink and cachefiles_rename tracepoints log the
inode number of the affected file/dir rather than dentry struct
pointers.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164251403694.3435901.9797725381831316715.stgit@warthog.procyon.org.uk/ # v1
fscache_acquire_cache() requires a non-empty name, while 'tag <name>'
command is optional for cachefilesd.
Thus set default tag name if it's unspecified to avoid the regression of
cachefilesd. The logic is the same with that before rewritten.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164251399914.3435901.4761991152407411408.stgit@warthog.procyon.org.uk/ # v1
Cachefiles keeps track of how much space is available on the backing
filesystem and refuses new writes permission to start if there isn't enough
(we especially don't want ENOSPC happening). It also tracks the amount of
data pending in DIO writes (cache->b_writing) and reduces the amount of
free space available by this amount before deciding if it can set up a new
write.
However, the old fscache I/O API was very much page-granularity dependent
and, as such, cachefiles's cache->bshift was meant to be a multiplier to
get from PAGE_SIZE to block size (ie. a blocksize of 512 would give a shift
of 3 for a 4KiB page) - and this was incorrectly being used to turn the
number of bytes in a DIO write into a number of blocks, leading to a
massive over estimation of the amount of data in flight.
Fix this by changing cache->bshift to be a multiplier from bytes to
blocksize and deal with quantities of blocks, not quantities of pages.
Fix also the rounding in the calculation in cachefiles_write() which needs
a "- 1" inserting.
Fixes: 047487c947 ("cachefiles: Implement the I/O routines")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164251398954.3435901.7138806620218474123.stgit@warthog.procyon.org.uk/ # v1
The condition that the waits in fscache_wait_on_volume_collision() are
waiting until are inverted. This suddenly started happening on the
upstream kernel with something like the following appearing in dmesg when
running xfstests:
CacheFiles: cachefiles: Inode already in use: Iafs,example.com,100055
Fix them by inverting the conditions.
Fixes: 62ab633523 ("fscache: Implement volume registration")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164251398010.3435901.943876048104930939.stgit@warthog.procyon.org.uk/ # v1
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHqtf8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpoIbD/9pfiURokRzb/UncQoS+zX0mMQxToYWluMN
xyyAHLRqAyylzdZykJFnxfGKMN4z/cfBKJugISEd+O+88wjNLUALUBzD56C6vbVq
h+gPL+zBRNu/Bx5bfEYEI+zhUr7AE1gtlg+x/hzZy1x/Q2b2Fyuely+FaHY/ChUy
+HUnLVP74Gj2H7XY6a1yEuEQRdRRBn0NjWVU0v8g8zNLjO+XUKJFE5se1EAjIcBy
IzcMvbDbnfGnHQRp7saq3TaY8+OaADSch0QYS1pDyCRUcqMTBnONTgyOkwxNQA+z
58KHM7n9QDp3QzRPE4TbvjHuX1J9fbw9wekIQIm6eAQfGRa6IvoMbUaI1btAsFLU
zsIRUWNPv2w7tzG4OMY438bISnUhnGKnNG4ZnWCx7oa+GF8SgOJ7D8EOGzqimh8n
YBFfoLTkKfcVcrb+G/cfa5qWK35qv1nGTVk1A9V37eOi4YRBOFnbfmzMAb0CC5bw
8zpFyRS45VJLjnS+FUdvpqZeSuP0sZlKx6KH0hUa2Nd/7xSmsVkNuexCrttTWuLX
sfHcXj3rMG7YBXRtxHnMH7rDsdph4h5CPPqqMXHD8ho4YYxvFNktxu8IJgJRV6+J
cDH937ADsYNM8MBAgPZ3i9Kg8mVttnpHLMRhCeVpPKzZtr2tLoUh0mXLHb87docu
L12jvdCS0A==
=bFY0
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.17-2022-01-21' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- Fix the io_uring POLLFREE handling, similarly to how it was done for
aio (Pavel)
- Remove (now) unused function (Jiapeng)
- Small series fixing an issue with work cancelations. A window exists
where work isn't locatable in the pending list, and isn't active in a
worker yet either. (me)
* tag 'io_uring-5.17-2022-01-21' of git://git.kernel.dk/linux-block:
io-wq: delete dead lock shuffling code
io_uring: perform poll removal even if async work removal is successful
io-wq: add intermediate work step between pending list and active work
io-wq: perform both unstarted and started work cancelations in one go
io-wq: invoke work cancelation with wqe->lock held
io-wq: make io_worker lock a raw spinlock
io-wq: remove useless 'work' argument to __io_worker_busy()
io_uring: fix UAF due to missing POLLFREE handling
io_uring: Remove unused function req_ref_put
Remove the header definitions for these ioctls. The just-removed
implementation has allowed callers to read stale disk contents for more
than **21 years** and nobody noticed or complained, which implies a lack
of users aside from exploit programs.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHlp28ACgkQ+H93GTRK
tOt77Q/+KP7XtwaloGbkBfMzk9JhVh4wISagQ8BxZ5Jb8cHg9ekWJz585he9siNY
mfcuaLh5onWiBRsfqo4RXg03X4E3c/U+Q3oqe405O3TZkK2LOZMjkPE1ijDdTej6
V9WhCTCRGuJA01sKgFuuwFJxJRjIROE29FPgRQP9EvLtEITIxLbHMFZRYKpYh143
EhNwzQQwwPg/L4/m1qmWfC3L+Z7qTm6EQhOnpyzxlKxyX7qXIjBHi6WvErcDeJ3F
pS7v1fwcZxctrh7PrKwhXSrkbMmd5J3p5qI/MrCtGEKWNXk+rv6AC0n4gcXvpJ/v
wL0OTyik9pwA8V0XPQcuWvQXmrm8vR2XvMok6gXkHB1jCfzYAwJsrHQbop4pyCFe
U3HU46x0g7UFXY7jUjztD8YNIT1+B+ducetCCGAhI97HiQrSsqSvgvPFZNle7Cef
Oheab4iIs1zUblNrVzyGCQmK42ankypxPbfrrtvLi7SFrLRAGXeWeqDf1RXJnt5b
xrOqCe1hgXR4RJrkTPWiiQindLlhDuywfa+Q1Y5fYZatsTtgceE/HIOg80x4pPgR
4Ip7hW9lsjoDckpu0bC0bvYiqhrYM1eztpUToYdy7FeOkQKkPHO9xm/m1tHbqzmi
bF3hkBo6bLByXiY/ZXzrQGrErJ6OTdNVpsR1vYjoaycrQt6wznI=
=hq3i
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.17-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull more xfs irix ioctl housecleaning from Darrick Wong:
"Withdraw the XFS_IOC_ALLOCSP* and XFS_IOC_FREESP* ioctl definitions.
This is the third and final of a series of small pull requests that
perform some long overdue housecleaning of XFS ioctls. This time,
we're withdrawing all variants of the ALLOCSP and FREESP ioctls from
XFS' userspace API. This might be a little premature since we've only
just removed the functionality, but as I pointed out in the last pull
request, nobody (including fstests) noticed that it was broken for 20
years.
In response to the patch, we received a single comment from someone
who stated that they 'augment' the ioctl for their own purposes, but
otherwise acquiesced to the withdrawal. I still want to try to clobber
these old ioctl definitions in 5.17.
So remove the header definitions for these ioctls. The just-removed
implementation has allowed callers to read stale disk contents for
more than **21 years** and nobody noticed or complained, which implies
a lack of users aside from exploit programs"
* tag 'xfs-5.17-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove the XFS_IOC_{ALLOC,FREE}SP* definitions
Linux has always used fallocate as the space management system call,
whereas these Irix legacy ioctls only ever worked on XFS, and have been
the cause of recent stale data disclosure vulnerabilities. As
equivalent functionality is available elsewhere, remove the code.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHlplQACgkQ+H93GTRK
tOtAcRAAg11WggF9ycNLwnczUs4NmTV1cwhz8+eTuwr2yul3gl/mrO3MyjMmkrnm
1rXjwg28GKtps04Ugh+8TTL+QkDn6Uteco27OZbmUf00a0MoC7JG4VkEQVtXjcaK
zvevfutTH7Vnl49m+YBrLtonrTqmND46quKoPKsv0a5nlbXHSNMouUkayWXDSyOl
8tRcNWLy76L+XCxEU21cD1NBw3Vr0mCiId4xTcbNFw3TUVAGoZgghzC2d/gHFiwN
1PM7G51TKUNm3dybH0mt/jLF/fLsVxFnznnlW4bb/XzMuU4geqd0r1AQuIdbwZa9
uB+PkFWwN5frTEFELYTamAa4LlAe2oQ0hmSGLfC/zEtPcOv4h6qHNgRsN9wfG+H9
oYUeRY+2zHcD7jYJsaZZt5WCIDVncOlJMclRdpbpujkJzJX9ZjAi++PTgDxdMjFa
egwDAvOdgijgtz8erN0gglJrqJzQQp6ByNtht5rZjHz7LkrWYtt57TOoS986pW7X
/MwBLjT/4Xig/XaFVrmMohF3VPrG/eH/DpTnHotzQzZRYQWbKZwCgin6+kKC8cV8
Y+eE1jKZunL4Ms/GmrxencNzsDSJtkKyR5LkHCqgH8YUPJM3vYDcleZY+UgEKq0a
z0fw3MZvxM2jsUIk7+J8uQ8esSqUb5hNXkUJsUraUtG3Z6ZeaOg=
=2QZ3
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.17-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs irix ioctl housecleaning from Darrick Wong:
"Remove the XFS_IOC_ALLOCSP* and XFS_IOC_FREESP* ioctl families.
This is the second of a series of small pull requests that perform
some long overdue housecleaning of XFS ioctls. This time, we're
vacating the implementation of all variants of the ALLOCSP and FREESP
ioctls, which are holdovers from EFS in Irix, circa 1993. Roughly
equivalent functionality have been available for both ioctls since
2.6.25 (April 2008):
- XFS_IOC_FREESP ftruncates a file.
- XFS_IOC_ALLOCSP is the equivalent of fallocate.
As noted in the fix patch for CVE 2021-4155, the ALLOCSP ioctl has
been serving up stale disk blocks since 2000, and in 21 years
**nobody** noticed. On those grounds I think it's safe to vacate the
implementation.
Note that we lose the ability to preallocate and truncate relative to
the current file position, but as nobody's ever implemented that for
the VFS, I conclude that it's not in high demand.
Linux has always used fallocate as the space management system call,
whereas these Irix legacy ioctls only ever worked on XFS, and have
been the cause of recent stale data disclosure vulnerabilities. As
equivalent functionality is available elsewhere, remove the code"
* tag 'xfs-5.17-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: kill the XFS_IOC_{ALLOC,FREE}SP* ioctls
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHlpbEACgkQ+H93GTRK
tOs8MBAAjOyUswNqVMYgMB6eww7oT8j54jGdTmo+d+N2hIoX4OAV6GdCI6te/7DE
yz6TtzbQ9fs9lDYsdRsMjfwr18R8N1m0D23cpFSJ6qq6e2NJi0mqVwmC8mPRNAJH
68yL0p6O4Tk5jj7kae+zsv50nsSANXclfoHjsZQ0DGOuGagmPkNVqT82KIC4n3dn
aw66xhuaiLLJE+4boLe4NexRVBbyOuHQ2uB7xUnKwc9tHvjAf8EFCIhUV1wp0qZf
CfA2wg8+Jzwrqz/gVRKUZOjz7LeIY6E2qCBrA+DATv2dcv7QhmvGHaQ9OkrvIE72
CbvI92IhvOcKFzpfMrRYGhOh7KE6SkxLGqsAXgnjPoFQkCDudgCaExBO96RMMd6u
cX43mXWZbUl++Sh2GhPD/xkiskLRZFjiHJbKBX/5nwjU2BzTHQY/7Jy07fIkR4c4
IrkKgiXfSJT4j/KeAMkBpZ7THMjRMSUgwliSWHL0QWUz5Bou8WRnHUl8CMsu9vDJ
fYeekXDQYuAX+UrcsDlbA0UukigOLSIZiQTAEgSbIkd/+Zb6U6e0IF7pTTZJ9uFs
bndLFYqZtEAySDrMCBM+W8VYmR48EDxfN8xsdS1kbZIqEdNhmkEMj9tMf0rs+FRi
lo1vMi08O7VcuyiyNrKs0e1d1Gkd2jwmwIskSQweslP5BbfPzRE=
=d+fT
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.17-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs ioctl housecleaning from Darrick Wong:
"This is the first of a series of small pull requests that perform some
long overdue housecleaning of XFS ioctls. This first pull request
removes the FSSETDM ioctl, which was used to set DMAPI event
attributes on XFS files. The DMAPI support has never been merged
upstream and the implementation of FSSETDM itself was removed two
years ago, so let's withdraw it completely.
- Withdraw the ioctl definition for the FSSETDM ioctl"
* tag 'xfs-5.17-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove the XFS_IOC_FSSETDM definitions
Turn the CONFIG_UNICODE symbol into a tristate that generates some always
built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol.
Note that a lot of the IS_ENABLED() checks could be turned from cpp
statements into normal ifs, but this change is intended to be fairly
mechanic, so that should be cleaned up later.
Fixes: 2b3d047870 ("unicode: Add utf8-data module")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
The code that uses the pointer info has been removed in 7326e382c2
("fanotify: report old and/or new parent+name in FAN_RENAME event").
and fanotify_event_info() doesn't change 'event', so the declaration and
assignment of info can be removed.
Eliminate the following clang warning:
fs/notify/fanotify/fanotify_user.c:161:24: warning: variable ‘info’ set
but not used
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Jan Kara <jack@suse.cz>
by Venky Shankar. It solves some long-standing issues with using
different auth entities and/or mounting different CephFS filesystems
from the same cluster, remounting and also misleading /proc/mounts
contents. The existing syntax of course remains to be maintained.
On top of that, there is a couple of fixes for edge cases in quota
and a new mount option for turning on unbuffered I/O mode globally
instead of on a per-file basis with ioctl(CEPH_IOC_SYNCIO).
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmHpP5ATHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi0TgB/480i2lPHgA3ujJNqo5Q6z+W0vtTA2+
Wx+4rAUgIESJVunbFxvecPbzyUXTe7wWFI11TCVHPpf6GyIIDTD+uHd3kKWtLsfL
Zkk1/2PN9Q5Dh29R+N8rP9NaP8tIaTQjyiO3iqmRZlo+k0Z/lYtWUb+fUP05XlVY
ML/ktW543tkKeYwl3SWdW5MqAAOVGDbTt+L51CraDhVoiUac5ptkP+cmDmIqsnGa
ZHVqpwugxgndEIyuBHDLBps+5/LrEaL10xDhGcMtP9hwGYhyNr6Yj+azfGtHWwOi
jdVsdHDiecUBVtGyZ351Y4pCMOmP0uJif6MOUZFXYYSSeUBUhH8UjgEi
=jcte
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-5.17-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"The highlight is the new mount "device" string syntax implemented by
Venky Shankar. It solves some long-standing issues with using
different auth entities and/or mounting different CephFS filesystems
from the same cluster, remounting and also misleading /proc/mounts
contents. The existing syntax of course remains to be maintained.
On top of that, there is a couple of fixes for edge cases in quota and
a new mount option for turning on unbuffered I/O mode globally instead
of on a per-file basis with ioctl(CEPH_IOC_SYNCIO)"
* tag 'ceph-for-5.17-rc1' of git://github.com/ceph/ceph-client:
ceph: move CEPH_SUPER_MAGIC definition to magic.h
ceph: remove redundant Lsx caps check
ceph: add new "nopagecache" option
ceph: don't check for quotas on MDS stray dirs
ceph: drop send metrics debug message
rbd: make const pointer spaces a static const array
ceph: Fix incorrect statfs report for small quota
ceph: mount syntax module parameter
doc: document new CephFS mount device syntax
ceph: record updated mon_addr on remount
ceph: new device mount syntax
libceph: rename parse_fsid() to ceph_parse_fsid() and export
libceph: generalize addr/ip parsing based on delimiter
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmHo9UgACgkQiiy9cAdy
T1FG4Av9Gx2v8bmcjZgnbOLcDmhPZPIR2DaBOZ+79RfSy0FgeFo76Mp9YuiAmqH3
8DqinvWax0xzGiOSMQxODWK1c71TT7oRm5XqloxtLp6Upy61q7IB6f6gXKy566v1
/KB2XkUZQgfXr9inCOflzQDBVhl9UqPWmWV++VDpZkv4HAGOppcu7hloXyraG4cc
C4DzljiWGwWs875qcC8M533z3gEwpUB8o7M9hseLbEH8earUHzlLIp++HCuKfHJP
W7U13nPgZXQ4I1SxDdzt3ZXcwZKRKvbG7MdK4KnftkeKRfobz92IxWDxj7VrM/iN
5mxLExYjaZGSuYkquKiQy2RtaXudv3U51stxo2jWLvoRc9c8ogT5luwIQ31apRm+
PRiJew2XdeWnV9IYMJOpRTcLE7eVscU3oP2kW8kL8uoe1CwbabOnz5jpQ6ioA08t
8GgVvDWY7o2/bquDmiIsnaWnVnOWw4ms4vQTjecJgltpAHDQf9RwrSVZYAu+J+JY
sWW5IGCT
=WqhR
-----END PGP SIGNATURE-----
Merge tag '5.17-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull ksmbd server fixes from Steve French:
- authentication fix
- RDMA (smbdirect) fixes (including fix for a memory corruption, and
some performance improvements)
- multiple improvements for multichannel
- misc fixes, including crediting (flow control) improvements
- cleanup fixes, including some kernel doc fixes
* tag '5.17-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd: (23 commits)
ksmbd: fix guest connection failure with nautilus
ksmbd: uninitialized variable in create_socket()
ksmbd: smbd: fix missing client's memory region invalidation
ksmbd: add smb-direct shutdown
ksmbd: smbd: change the default maximum read/write, receive size
ksmbd: smbd: create MR pool
ksmbd: add reserved room in ipc request/response
ksmbd: smbd: call rdma_accept() under CM handler
ksmbd: limits exceeding the maximum allowable outstanding requests
ksmbd: move credit charge deduction under processing request
ksmbd: add support for smb2 max credit parameter
ksmbd: set 445 port to smbdirect port by default
ksmbd: register ksmbd ib client with ib_register_client()
ksmbd: Fix smb2_get_name() kernel-doc comment
ksmbd: Delete an invalid argument description in smb2_populate_readdir_entry()
ksmbd: Fix smb2_set_info_file() kernel-doc comment
ksmbd: Fix buffer_check_err() kernel-doc comment
ksmbd: fix multi session connection failure
ksmbd: set both ipv4 and ipv6 in FSCTL_QUERY_NETWORK_INTERFACE_INFO
ksmbd: set RSS capable in FSCTL_QUERY_NETWORK_INTERFACE_INFO
...
Merge more updates from Andrew Morton:
"55 patches.
Subsystems affected by this patch series: percpu, procfs, sysctl,
misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
lib: remove redundant assignment to variable ret
ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
btrfs: use generic Kconfig option for 256kB page size limit
arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
configs: introduce debug.config for CI-like setup
delayacct: track delays from memory compact
Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
delayacct: cleanup flags in struct task_delay_info and functions use it
delayacct: fix incomplete disable operation when switch enable to disable
delayacct: support swapin delay accounting for swapping without blkio
panic: remove oops_id
panic: use error_report_end tracepoint on warnings
fs/adfs: remove unneeded variable make code cleaner
FAT: use io_schedule_timeout() instead of congestion_wait()
hfsplus: use struct_group_attr() for memcpy() region
nilfs2: remove redundant pointer sbufs
fs/binfmt_elf: use PT_LOAD p_align values for static PIE
const_structs.checkpatch: add frequently used ops structs
...
Use the newly introduced CONFIG_PAGE_SIZE_LESS_THAN_256KB to describe
the dependency introduced by commit b05fbcc36b ("btrfs: disable build
on platforms having page size 256K").
Link: https://lkml.kernel.org/r/20211129230141.228085-3-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Return value directly instead of taking this in a variable.
Link: https://lkml.kernel.org/r/20211210023211.424609-1-chi.minghao@zte.com.cn
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cm>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
congestion_wait() in this context is just a sleep - block devices do not
support congestion signalling any more.
The goal for this wait, which was introduced in commit ae78bf9c4f
("[PATCH] add -o flush for fat") is to wait for any recently written
data to get to storage. We currently have no direct mechanism to do
this, so a simple wait that behaves identically to the current
congestion_wait() is the best we can do.
This is a step towards removing congestion_wait()
Link: https://lkml.kernel.org/r/163936544519.22433.13400436295732112065@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.
Add struct_group() to mark the "info" region (containing struct DInfo
and struct DXInfo structs) in struct hfsplus_cat_folder and struct
hfsplus_cat_file that are written into directly, so the compiler can
correctly reason about the expected size of the writes.
"pahole" shows no size nor member offset changes to struct
hfsplus_cat_folder nor struct hfsplus_cat_file. "objdump -d" shows no
object code changes.
Link: https://lkml.kernel.org/r/20211119192851.1046717-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pointer sbufs is being assigned a value but it's not being used later
on. The pointer is redundant and can be removed. Cleans up scan-build
static analysis warning:
fs/nilfs2/page.c:203:8: warning: Although the value stored to 'sbufs'
is used in the enclosing expression, the value is never actually read
from 'sbufs' [deadcode.DeadStores]
sbh = sbufs = page_buffers(src);
Link: https://lkml.kernel.org/r/20211211180955.550380-1-colin.i.king@gmail.com
Link: https://lkml.kernel.org/r/1640712476-15136-1-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Extend commit ce81bb256a ("fs/binfmt_elf: use PT_LOAD p_align values
for suitable start address") which fixed PIE binaries built with
-Wl,-z,max-page-size=0x200000, to cover static PIE binaries. This
fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=215275
Tested by verifying static PIE binaries with -Wl,-z,max-page-size=0x200000 loading.
Link: https://lkml.kernel.org/r/20211209174052.370537-1-hjl.tools@gmail.com
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When I was implementing a new per-cpu kthread cfs_migration, I found the
comm of it "cfs_migration/%u" is truncated due to the limitation of
TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19
all have the same name "cfs_migration/1", which will confuse the user.
This issue is not critical, because we can get the corresponding CPU
from the task's Cpus_allowed. But for kthreads corresponding to other
hardware devices, it is not easy to get the detailed device info from
task comm, for example,
jbd2/nvme0n1p2-
xfs-reclaim/sdf
Currently there are so many truncated kthreads:
rcu_tasks_kthre
rcu_tasks_rude_
rcu_tasks_trace
poll_mpt3sas0_s
ext4-rsv-conver
xfs-reclaim/sd{a, b, c, ...}
xfs-blockgc/sd{a, b, c, ...}
xfs-inodegc/sd{a, b, c, ...}
audit_send_repl
ecryptfs-kthrea
vfio-irqfd-clea
jbd2/nvme0n1p2-
...
We can shorten these names to work around this problem, but it may be
not applied to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-'
for example, it is a nice name, and it is not a good idea to shorten it.
One possible way to fix this issue is extending the task comm size, but
as task->comm is used in lots of places, that may cause some potential
buffer overflows. Another more conservative approach is introducing a
new pointer to store kthread's full name if it is truncated, which won't
introduce too much overhead as it is in the non-critical path. Finally
we make a dicision to use the second approach. See also the discussions
in this thread:
https://lore.kernel.org/lkml/20211101060419.4682-1-laoar.shao@gmail.com/
After this change, the full name of these truncated kthreads will be
displayed via /proc/[pid]/comm:
rcu_tasks_kthread
rcu_tasks_rude_kthread
rcu_tasks_trace_kthread
poll_mpt3sas0_statu
ext4-rsv-conversion
xfs-reclaim/sdf1
xfs-blockgc/sdf1
xfs-inodegc/sdf1
audit_send_reply
ecryptfs-kthread
vfio-irqfd-cleanup
jbd2/nvme0n1p2-8
Link: https://lkml.kernel.org/r/20211120112850.46047-1-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the dest buffer size is smaller than sizeof(tsk->comm), the buffer
will be without null ternimator, that may cause problem. Using
strscpy_pad() instead of strncpy() in __get_task_comm() can make the
string always nul ternimated and zero padded.
Link: https://lkml.kernel.org/r/20211120112738.45980-3-laoar.shao@gmail.com
Suggested-by: Kees Cook <keescook@chromium.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "task comm cleanups", v2.
This patchset is part of the patchset "extend task comm from 16 to
24"[1]. Now we have different opinion that dynamically allocates memory
to store kthread's long name into a separate pointer, so I decide to
take the useful cleanups apart from the original patchset and send it
separately[2].
These useful cleanups can make the usage around task comm less
error-prone. Furthermore, it will be useful if we want to extend task
comm in the future.
[1]. https://lore.kernel.org/lkml/20211101060419.4682-1-laoar.shao@gmail.com/
[2]. https://lore.kernel.org/lkml/CALOAHbAx55AUo3bm8ZepZSZnw7A08cvKPdPyNTf=E_tPqmw5hw@mail.gmail.com/
This patch (of 7):
strlcpy() can trigger out-of-bound reads on the source string[1], we'd
better use strscpy() instead. To make it be robust against full tsk->comm
copies that got noticed in other places, we should make sure it's zero
padded.
[1] https://github.com/KSPP/linux/issues/89
Link: https://lkml.kernel.org/r/20211120112738.45980-1-laoar.shao@gmail.com
Link: https://lkml.kernel.org/r/20211120112738.45980-2-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Subsequent if judgments will assign new values to ret, so the statement
here should be deleted
The clang_analyzer complains as follows:
fs/proc/proc_sysctl.c:
Value stored to 'ret' is never read
Link: https://lkml.kernel.org/r/20211230063622.586360-1-luo.penghao@zte.com.cn
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysctl_print_dir() always terminates the printed path name with a slash,
so printing a slash before the file part causes a duplicate like in
sysctl duplicate entry: /kernel//perf_user_access
Fix this by dropping the extra slash.
Link: https://lkml.kernel.org/r/e3054d605dc56f83971e4b6d2f5fa63a978720ad.1641551872.git.geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert return type of proc_fd_access_allowed() and the 'allowed' in it
to be boolean since the return type of ptrace_may_access() is boolean.
Link: https://lkml.kernel.org/r/20211219024404.29779-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit cc5f2704c9 ("proc/vmcore: convert oldmem_pfn_is_ram callback
to more generic vmcore callbacks"), we added detection of surprise
vmcore_cb unregistration after the vmcore was already opened. Once
detected, we warn the user and simulate reading zeroes from that point
on when accessing the vmcore.
The basic reason was that unexpected unregistration, for example, by
manually unbinding a driver from a device after opening the vmcore, is
not supported and could result in reading oldmem the vmcore_cb would
have actually prohibited while registered. However, something like that
can similarly be trigger by a user that's really looking for trouble
simply by unbinding the relevant driver before opening the vmcore -- or
by disallowing loading the driver in the first place. So it's actually
of limited help.
Currently, unregistration can only be triggered via virtio-mem when
manually unbinding the driver from the device inside the VM; there is no
way to trigger it from the hypervisor, as hypervisors don't allow for
unplugging virtio-mem devices -- ripping out system RAM from a VM
without coordination with the guest is usually not a good idea.
The important part is that unbinding the driver and unregistering the
vmcore_cb while concurrently reading the vmcore won't crash the system,
and that is handled by the rwsem.
To make the mechanism more future proof, let's remove the "read zero"
part, but leave the warning in place. For example, we could have a
future driver (like virtio-balloon) that will contact the hypervisor to
figure out if we already populated a page for a given PFN.
Hotunplugging such a device and consequently unregistering the vmcore_cb
could be triggered from the hypervisor without harming the system even
while kdump is running. In that case, we don't want to silently end up
with a vmcore that contains wrong data, because the user inside the VM
might be unaware of the hypervisor action and might easily miss the
warning in the log.
Link: https://lkml.kernel.org/r/20211111192243.22002-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For improved debugging it can be helpful to send version information
as other clients do during NTLMSSP negotiation. See protocol document
MS-NLMP section 2.2.1.1
Set the major and minor versions based on the kernel version, and the
BuildNumber based on the internal cifs.ko module version number,
and following the recommendation in the protocol documentation
(MS-NLMP section 2.2.10) we set the NTLMRevisionCurrent field to 15.
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The xfs_inodegc_stop() helper performs a high level flush of pending
work on the percpu queues and then runs a cancel_work_sync() on each
of the percpu work tasks to ensure all work has completed before
returning. While cancel_work_sync() waits for wq tasks to complete,
it does not guarantee work tasks have started. This means that the
_stop() helper can queue and instantly cancel a wq task without
having completed the associated work. This can be observed by
tracepoint inspection of a simple "rm -f <file>; fsfreeze -f <mnt>"
test:
xfs_destroy_inode: ... ino 0x83 ...
xfs_inode_set_need_inactive: ... ino 0x83 ...
xfs_inodegc_stop: ...
...
xfs_inodegc_start: ...
xfs_inodegc_worker: ...
xfs_inode_inactivating: ... ino 0x83 ...
The first few lines show that the inode is removed and need inactive
state set, but the inactivation work has not completed before the
inodegc mechanism stops. The inactivation doesn't actually occur
until the fs is unfrozen and the gc mechanism starts back up. Note
that this test requires fsfreeze to reproduce because xfs_freeze
indirectly invokes xfs_fs_statfs(), which calls xfs_inodegc_flush().
When this occurs, the workqueue try_to_grab_pending() logic first
tries to steal the pending bit, which does not succeed because the
bit has been set by queue_work_on(). Subsequently, it checks for
association of a pool workqueue from the work item under the pool
lock. This association is set at the point a work item is queued and
cleared when dequeued for processing. If the association exists, the
work item is removed from the queue and cancel_work_sync() returns
true. If the pwq association is cleared, the remove attempt assumes
the task is busy and retries (eventually returning false to the
caller after waiting for the work task to complete).
To avoid this race, we can flush each work item explicitly before
cancel. However, since the _queue_all() already schedules each
underlying work item, the workqueue level helpers are sufficient to
achieve the same ordering effect. E.g., the inodegc enabled flag
prevents scheduling any further work in the _stop() case. Use the
drain_workqueue() helper in this particular case to make the intent
a bit more self explanatory.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
We used to have more code around the work loop, but now the goto and
lock juggling just makes it less readable than it should. Get rid of it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[BUG]
After commit 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to
implement btrfs_defrag_file()") autodefrag no longer properly re-defrag
the file from previously finished location.
[CAUSE]
The recent refactoring of defrag only focuses on defrag ioctl subpage
support, doesn't take autodefrag into consideration.
There are two problems involved which prevents autodefrag to restart its
scan:
- No range.start update
Previously when one defrag target is found, range->start will be
updated to indicate where next search should start from.
But now btrfs_defrag_file() doesn't update it anymore, making all
autodefrag to rescan from file offset 0.
This would also make autodefrag to mark the same range dirty again and
again, causing extra IO.
- No proper quick exit for defrag_one_cluster()
Currently if we reached or exceed @max_sectors limit, we just exit
defrag_one_cluster(), and let next defrag_one_cluster() call to do a
quick exit.
This makes @cur increase, thus no way to properly know which range is
defragged and which range is skipped.
[FIX]
The fix involves two modifications:
- Update range->start to next cluster start
This is a little different from the old behavior.
Previously range->start is updated to the next defrag target.
But in the end, the behavior should still be pretty much the same,
as now we skip to next defrag target inside btrfs_defrag_file().
Thus if auto-defrag determines to re-scan, then we still do the skip,
just at a different timing.
- Make defrag_one_cluster() to return >0 to indicate a quick exit
So that btrfs_defrag_file() can also do a quick exit, without
increasing @cur to the range end, and re-use @cur to update
@range->start.
- Add comment for btrfs_defrag_file() to mention the range->start update
Currently only autodefrag utilize this behavior, as defrag ioctl won't
set @max_to_defrag parameter, thus unless interrupted it will always
try to defrag the whole range.
Reported-by: Filipe Manana <fdmanana@suse.com>
Fixes: 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
Link: https://lore.kernel.org/linux-btrfs/0a269612-e43f-da22-c5bc-b34b1b56ebe8@mailbox.org/
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There are users using autodefrag mount option reporting obvious increase
in IO:
> If I compare the write average (in total, I don't have it per process)
> when taking idle periods on the same machine:
> Linux 5.16:
> without autodefrag: ~ 10KiB/s
> with autodefrag: between 1 and 2MiB/s.
>
> Linux 5.15:
> with autodefrag:~ 10KiB/s (around the same as without
> autodefrag on 5.16)
[CAUSE]
When autodefrag mount option is enabled, btrfs_defrag_file() will be
called with @max_sectors = BTRFS_DEFRAG_BATCH (1024) to limit how many
sectors we can defrag in one try.
And then use the number of sectors defragged to determine if we need to
re-defrag.
But commit b18c3ab234 ("btrfs: defrag: introduce helper to defrag one
cluster") uses wrong unit to increase @sectors_defragged, which should
be in unit of sector, not byte.
This means, if we have defragged any sector, then @sectors_defragged
will be >= sectorsize (normally 4096), which is larger than
BTRFS_DEFRAG_BATCH.
This makes the @max_sectors check in defrag_one_cluster() to underflow,
rendering the whole @max_sectors check useless.
Thus causing way more IO for autodefrag mount options, as now there is
no limit on how many sectors can really be defragged.
[FIX]
Fix the problems by:
- Use sector as unit when increasing @sectors_defragged
- Include @sectors_defragged > @max_sectors case to break the loop
- Add extra comment on the return value of btrfs_defrag_file()
Reported-by: Anthony Ruhier <aruhier@mailbox.org>
Fixes: b18c3ab234 ("btrfs: defrag: introduce helper to defrag one cluster")
Link: https://lore.kernel.org/linux-btrfs/0a269612-e43f-da22-c5bc-b34b1b56ebe8@mailbox.org/
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Change the cifs filesystem to take account of the changes to fscache's
indexing rewrite and reenable caching in cifs.
The following changes have been made:
(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole.
(2) The session cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). That takes three parameters: a string
representing the "volume" in the index, a string naming the cache to
use (or NULL) and a u64 that conveys coherency metadata for the
volume.
For cifs, I've made it render the volume name string as:
"cifs,<ipaddress>,<sharename>"
where the sharename has '/' characters replaced with ';'.
This probably needs rethinking a bit as the total name could exceed
the maximum filename component length.
Further, the coherency data is currently just set to 0. It needs
something else doing with it - I wonder if it would suffice simply to
sum the resource_id, vol_create_time and vol_serial_number or maybe
hash them.
(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.
fscache_acquire_cookie() is passed the same keying and coherency
information as before.
(4) The functions to set/reset cookies are removed and
fscache_use_cookie() and fscache_unuse_cookie() are used instead.
fscache_use_cookie() is passed a flag to indicate if the cookie is
opened for writing. fscache_unuse_cookie() is passed updates for the
metadata if we changed it (ie. if the file was opened for writing).
These are called when the file is opened or closed.
(5) cifs_setattr_*() are made to call fscache_resize() to change the size
of the cache object.
(6) The functions to read and write data are stubbed out pending a
conversion to use netfslib.
Changes
=======
ver #8:
- Abstract cache invalidation into a helper function.
- Fix some checkpatch warnings[3].
ver #7:
- Removed the accidentally added-back call to get the super cookie in
cifs_root_iget().
- Fixed the right call to cifs_fscache_get_super_cookie() to take account
of the "-o fsc" mount flag.
ver #6:
- Moved the change of gfpflags_allow_blocking() to current_is_kswapd() for
cifs here.
- Fixed one of the error paths in cifs_atomic_open() to jump around the
call to use the cookie.
- Fixed an additional successful return in the middle of cifs_open() to
use the cookie on the way out.
- Only get a volume cookie (and thus inode cookies) when "-o fsc" is
supplied to mount.
ver #5:
- Fixed a couple of bits of cookie handling[2]:
- The cookie should be released in cifs_evict_inode(), not
cifsFileInfo_put_final(). The cookie needs to persist beyond file
closure so that writepages will be able to write to it.
- fscache_use_cookie() needs to be called in cifs_atomic_open() as it is
for cifs_open().
ver #4:
- Fixed the use of sizeof with memset.
- tcon->vol_create_time is __le64 so doesn't need cpu_to_le64().
ver #3:
- Canonicalise the cifs coherency data to make the cache portable.
- Set volume coherency data.
ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- Upgraded to -rc4 to allow for upstream changes[1].
- fscache_acquire_volume() now returns errors.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
cc: Steve French <smfrench@gmail.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23b55d673d7527b093cd97b7c217c82e70cd1af0 [1]
Link: https://lore.kernel.org/r/3419813.1641592362@warthog.procyon.org.uk/ [2]
Link: https://lore.kernel.org/r/CAH2r5muTanw9pJqzAHd01d9A8keeChkzGsCEH6=0rHutVLAF-A@mail.gmail.com/ [3]
Link: https://lore.kernel.org/r/163819671009.215744.11230627184193298714.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906982979.143852.10672081929614953210.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967187187.1823006.247415138444991444.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021579335.640689.2681324337038770579.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/3462849.1641593783@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/1318953.1642024578@warthog.procyon.org.uk/ # v6
Signed-off-by: Steve French <stfrench@microsoft.com>
During defrag, at btrfs_defrag_file(), we have this loop that iterates
over a file range in steps no larger than 256K subranges. If the range
is too long, there's no way to interrupt it. So make the loop check in
each iteration if there's signal pending, and if there is, break and
return -AGAIN to userspace.
Before kernel 5.16, we used to allow defrag to be cancelled through a
signal, but that was lost with commit 7b508037d4 ("btrfs: defrag:
use defrag_one_cluster() to implement btrfs_defrag_file()").
This change adds back the possibility to cancel a defrag with a signal
and keeps the same semantics, returning -EAGAIN to user space (and not
the usually more expected -EINTR).
This is also motivated by a recent bug on 5.16 where defragging a 1 byte
file resulted in iterating from file range 0 to (u64)-1, as hitting the
bug triggered a too long loop, basically requiring one to reboot the
machine, as it was not possible to cancel defrag.
Fixes: 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When attempting to defrag a file with a single byte, we can end up in a
too long loop, which is nearly infinite because at btrfs_defrag_file()
we end up with the variable last_byte assigned with a value of
18446744073709551615 (which is (u64)-1). The problem comes from the fact
we end up doing:
last_byte = round_up(last_byte, fs_info->sectorsize) - 1;
So if last_byte was assigned 0, which is i_size - 1, we underflow and
end up with the value 18446744073709551615.
This is trivial to reproduce and the following script triggers it:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdj
MNT=/mnt/sdj
mkfs.btrfs -f $DEV
mount $DEV $MNT
echo -n "X" > $MNT/foobar
btrfs filesystem defragment $MNT/foobar
umount $MNT
So fix this by not decrementing last_byte by 1 before doing the sector
size round up. Also, to make it easier to follow, make the round up right
after computing last_byte.
Reported-by: Anthony Ruhier <aruhier@mailbox.org>
Fixes: 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
Link: https://lore.kernel.org/linux-btrfs/0a269612-e43f-da22-c5bc-b34b1b56ebe8@mailbox.org/
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Recent restructuring of cifs_reconnect introduced a helper func
named cifs_ses_mark_for_reconnect, which updates the state of tcp
session for all the channels of a session for reconnect.
However, this does not update the session state and chans_need_reconnect
bitmask. This change fixes that.
Also, cifs_mark_tcp_sess_for_reconnect should mark set the bitmask
for all channels when the whole session is marked for reconnect.
Fixed that here too.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Till the end of SMB session setup, update tcpStatus and
avoid updating session status field. There was a typo in
cifs_setup_session, which caused ses->status to be updated
instead. This was causing issues during reconnect.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The status of tcp session, smb session and tcon have the
same flow, irrespective of the SMB version used. Hence
these status checks and updates should happen in the
version independent callers of these commands.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
cifs_tree_connect checks and sets the tidStatus for the tcon.
cifs_tree_connect also calls a dfs specific tree connect
function, which also does similar checks. This should
not happen. Removing it with this change.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Recently, the cifs_reconnect code was refactored into
two branches for regular vs dfs codepath. Some of my
recent changes were missing in the dfs path, namely the
code to enable periodic DNS query, and a missing lock.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
ses_selected is being declared and set at several places. It is not
being used. Remove it.
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
A spin lock called chan_lock was introduced recently.
But not all accesses were protected. Doing that with
this change.
To make sure that a channel is not freed when in use,
we need to introduce a ref count. But today, we don't
ever free channels.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Recent changes to multichannel required some adjustments in
the way connection states transitioned during/after reconnect.
Also some minor fixes:
1. A pending switch of GlobalMid_Lock to cifs_tcp_ses_lock
2. Relocations of the code that logs reconnect
3. Changed some code in allocate_mid to suit the new scheme
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
With the new multichannel logic, when a channel needs reconnection,
the tree connect and other channels can still be active.
This fix will handle cases of checking for channel reconnect,
when the tcon does not need reconnect.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
In this round, we've tried to address some performance issues in f2fs_checkpoint
and direct IO flows. Also, there was a work to enhance the page cache management
used for compression. Other than them, we've done typical work including sysfs,
code clean-ups, tracepoint, sanity check, in addition to bug fixes on corner
cases.
Enhancement:
- use iomap for direct IO
- try to avoid lock contention to improve f2fs_ckpt speed
- avoid unnecessary memory allocation in compression flow
- POSIX_FADV_DONTNEED drops the page cache containing compression pages
- add some sysfs entries (gc_urgent_high_remaining, pending_discard)
Bug fix:
- try not to expose unwritten blocks to user by DIO
: this was added to avoid merge conflict; another patch is coming to address
other missing case.
- relax minor error condition for file pinning feature used in Android OTA
- fix potential deadlock case in compression flow
- should not truncate any block on pinned file
In addition, we've done some code clean-ups and tracepoint/sanity check
improvement.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmHnY0sACgkQQBSofoJI
UNIOkg//UmjCSSG63/YZM/lQQQe4kK/tT6QTT8W/VQtzWL9vXcL7bcaxzwX3LQbR
Gb47Zmsw9bzVJt6GQ2VRbODE1py/KPNMl5SDXJXHo6fOZ/dOnHve32gLwcLEzhPd
casB0TbwQJ6bpEsJiZ5ho741mURxUrSCHAAX6QIQVXh8ofm9qAqlWu74OLI6UHiV
MM84XmXcHtGUZG5SCTWfSCJhJM6Az/3A83ws9KVeu86dlE7IrigphU2nI2vdCKiO
trR3CiLC/364fiM+9ssLS3X2wKFPD/unEU7ljBv5UaG36jsVfW+tisjTKldzpiKK
44cNgDv1FEDxC0g3FKUhEGezAhxT8AJZB0in0zn8+5scarKGJtFCy9XhCGMVaeP+
usxvHVy8Ga1I7sMV6oHEBcGiPJWkmurzq1XXobtj6oL/JxN4gqUJeHTcod89hQHA
lx9kZs7MLKm2au+T3gZf5xyx35YCie8sY/N1qoPy8tU9Q7FJ54NdqqAc9JEZ6mSk
k9ybMaa/srHG/EI/XYPw0DrobHg6P5+bYtmsRvw2vP/nsNsD3ZI/EwBBEll2ITxC
V5Dn7MljYWI/5kB41Hl5xz6X65WeIN7koRyTXw5mp9tkNrLugqII5hzhwhSlcqJ1
3k9TAN3RbVpWHBcyryDyLbm/+dcbwIJ4v/eJEMIDk8F2SrBGOZs=
=LCJH
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've tried to address some performance issues in
f2fs_checkpoint and direct IO flows. Also, there was a work to enhance
the page cache management used for compression. Other than them, we've
done typical work including sysfs, code clean-ups, tracepoint, sanity
check, in addition to bug fixes on corner cases.
Enhancements:
- use iomap for direct IO
- try to avoid lock contention to improve f2fs_ckpt speed
- avoid unnecessary memory allocation in compression flow
- POSIX_FADV_DONTNEED drops the page cache containing compression
pages
- add some sysfs entries (gc_urgent_high_remaining, pending_discard)
Bug fixes:
- try not to expose unwritten blocks to user by DIO (this was added
to avoid merge conflict; another patch is coming to address other
missing case)
- relax minor error condition for file pinning feature used in
Android OTA
- fix potential deadlock case in compression flow
- should not truncate any block on pinned file
In addition, we've done some code clean-ups and tracepoint/sanity
check improvement"
* tag 'f2fs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (29 commits)
f2fs: do not allow partial truncation on pinned file
f2fs: remove redunant invalidate compress pages
f2fs: Simplify bool conversion
f2fs: don't drop compressed page cache in .{invalidate,release}page
f2fs: fix to reserve space for IO align feature
f2fs: fix to check available space of CP area correctly in update_ckpt_flags()
f2fs: support fault injection to f2fs_trylock_op()
f2fs: clean up __find_inline_xattr() with __find_xattr()
f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr()
f2fs: do not bother checkpoint by f2fs_get_node_info
f2fs: avoid down_write on nat_tree_lock during checkpoint
f2fs: compress: fix potential deadlock of compress file
f2fs: avoid EINVAL by SBI_NEED_FSCK when pinning a file
f2fs: add gc_urgent_high_remaining sysfs node
f2fs: fix to do sanity check in is_alive()
f2fs: fix to avoid panic in is_alive() if metadata is inconsistent
f2fs: fix to do sanity check on inode type during garbage collection
f2fs: avoid duplicate call of mark_inode_dirty
f2fs: show number of pending discard commands
f2fs: support POSIX_FADV_DONTNEED drop compressed page cache
...
An active work can have poll armed, hence it's not enough to just do
the async work removal and return the value if it's different from "not
found". Rather than make poll removal special, just fall through to do
the remaining type lookups and removals.
Reported-by: Florian Fischer <florian.fl.fischer@fau.de>
Link: https://lore.kernel.org/io-uring/20220118151337.fac6cthvbnu7icoc@pasture/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We have a gap where a worker removes an item from the work list and to
when it gets added as the workers active work. In this state, the work
item cannot be found by cancelations. This is a small window, but it does
exist.
Add a temporary pointer to a work item that isn't on the pending work
list anymore, but also not the active work. This is needed as we need
to drop the wqe lock in between grabbing the work item and marking it
as active, to ensure that signal based cancelations are properly
ordered.
Reported-by: Florian Fischer <florian.fl.fischer@fau.de>
Link: https://lore.kernel.org/io-uring/20220118151337.fac6cthvbnu7icoc@pasture/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rather than split these into two separate lookups and matches, combine
them into one loop. This will become important when we can guarantee
that we don't have a window where a pending work item isn't discoverable
in either state.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_wqe_cancel_pending_work() grabs it internally, grab it upfront
instead. For the running work cancelation, grab the lock around it as
well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation to nesting it under the wqe lock (which is raw due to
being acquired from the scheduler side), change the io_worker lock from
a normal spinlock to a raw spinlock.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
MS-SMB2 describe session sign like the following.
Session.SigningRequired MUST be set to TRUE under the following conditions:
- If the SMB2_NEGOTIATE_SIGNING_REQUIRED bit is set in the SecurityMode
field of the client request.
- If the SMB2_SESSION_FLAG_IS_GUEST bit is not set in the SessionFlags
field and Session.IsAnonymous is FALSE and either Connection.ShouldSign
or global RequireMessageSigning is TRUE.
When trying guest account connection using nautilus, The login failure
happened on session setup. ksmbd does not allow this connection
when the user is a guest and the connection sign is set. Just do not set
session sign instead of error response as described in the specification.
And this change improves the guest connection in Nautilus.
Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
The "ksmbd_socket" variable is not initialized on this error path.
Cc: stable@vger.kernel.org
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
if the Channel of a SMB2 WRITE request is
SMB2_CHANNEL_RDMA_V1_INVALIDTE, a client
does not invalidate its memory regions but
ksmbd must do it by sending a SMB2 WRITE response
with IB_WR_SEND_WITH_INV.
But if errors occur while processing a SMB2
READ/WRITE request, ksmbd sends a response
with IB_WR_SEND. So a client could use memory
regions already in use.
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
In the October updates to MS-SMB2 two additional FSCTLs
were described. Add the missing defines for these,
as well as fix a typo in an earlier define.
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
I thought I was iterating over the array when actually the iteration is
over the values contained in the array?
Ugh, keep it simple.
Symptoms were a null deference in vfs_lock_file() when an NFSv3 client
that previously held a lock came back up and sent a notify.
Reported-by: Jonathan Woithe <jwoithe@just42.net>
Fixes: 7f024fcd5c ("Keep read and write fds with each nlm_file")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Remove these unused ia32 compat declarations; all the bits involved have
either been withdrawn or hoisted to the VFS.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
partial support for < MAX_ORDER - 1 granularity for virtio-mem
driver_override for vdpa
sysfs ABI documentation for vdpa
multiqueue config support for mlx5 vdpa
Misc fixes, cleanups.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmHiDHkPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpVT4H/3Veixt3uYPOmuLU2tSx+8X+sFTtik81hyiE
okz5fRJrxxA8SqS76FnmO10FS4hlPOGNk0Z5WVhr0yihwFvPLvpCM/xi2Lmrz9I7
pB0sXOIocEL1xApsxukR9K1Twpb2hfYsflbJYUVlRfhS5G0izKJNZp5I7OPrzd80
vVNNDWKW2iLDlfqsavumI4Kvm4nsFuCHG03jzMtcIa7YTXYV3DORD4ZGFFVUOIQN
t5F74TznwHOeYgJeg7TzjFjfPWmXjLetvx10QX1A1uOvwppWW/QY6My0UafTXNXj
VB3gOwJPf+gxXAXl/4bafq4NzM0xys6cpcPpjvhmU+erY4UuyAU=
=Y1eO
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"virtio,vdpa,qemu_fw_cfg: features, cleanups, and fixes.
- partial support for < MAX_ORDER - 1 granularity for virtio-mem
- driver_override for vdpa
- sysfs ABI documentation for vdpa
- multiqueue config support for mlx5 vdpa
- and misc fixes, cleanups"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (42 commits)
vdpa/mlx5: Fix tracking of current number of VQs
vdpa/mlx5: Fix is_index_valid() to refer to features
vdpa: Protect vdpa reset with cf_mutex
vdpa: Avoid taking cf_mutex lock on get status
vdpa/vdpa_sim_net: Report max device capabilities
vdpa: Use BIT_ULL for bit operations
vdpa/vdpa_sim: Configure max supported virtqueues
vdpa/mlx5: Report max device capabilities
vdpa: Support reporting max device capabilities
vdpa/mlx5: Restore cur_num_vqs in case of failure in change_num_qps()
vdpa: Add support for returning device configuration information
vdpa/mlx5: Support configuring max data virtqueue
vdpa/mlx5: Fix config_attr_mask assignment
vdpa: Allow to configure max data virtqueues
vdpa: Read device configuration only if FEATURES_OK
vdpa: Sync calls set/get config/status with cf_mutex
vdpa/mlx5: Distribute RX virtqueues in RQT object
vdpa: Provide interface to read driver features
vdpa: clean up get_config_size ret value handling
virtio_ring: mark ring unused on error
...
The "PAGE_SIZE - 2 - size" calculation in legacy_parse_param() is an
unsigned type so a large value of "size" results in a high positive
value instead of a negative value as expected. Fix this by getting rid
of the subtraction.
Signed-off-by: Jamie Hill-Daniel <jamie@hill-daniel.co.uk>
Signed-off-by: William Liu <willsroot@protonmail.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Tested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RHBZ: 2008434
Some servers, such as Windows2016 have a very low number of concurrent mounts that
they allow from each client.
This can be a problem if you have a more than a handful (==3 in this case)
of cifs entries in your fstab and cause a number of the mounts there to randomly fail.
Add a global mutex and use it to serialize all mount attempts.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix the size of a memory allocation in orangefs_bufmap_alloc()
Christophe JAILLET
use default_groups in kobj_type
Greg KH
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEIGSFVdO6eop9nER2z0QOqevODb4FAmHlgOQACgkQz0QOqevO
Db75Ng/+Lruql+eMqjMrmwRf8Rm1tErTjyll1RgynB608xd3A8mzRxk0U9+EQDCJ
U1fa0N1AJatSDh0X8PLke0lMR1YxGFNs+gmIswz4b0B0tpSuw+oJhUTrYSaeE5ud
boD+EwCIv7G3bTUqT8eC3sRFGZycIfQUMKS7Ml76OKxQXSd+9BaP4e7s87/kQItI
oRxBx/AzndLjhQWfPhCGVqYMzetto/pSAQaUfspH/pIAaDYPmuTCBvNf9crMjQ1k
TI7Pf0CMTGFdki0p5MkhK5co90X28aI+QHaycCtWMvwKxcASU1mKaJ5IqEzholkY
Q89cEyMeb8RkQjeOC8L1jG1GvkrBJNzdm30NVo4m7uQBp1ExXdvUnQRQ+FnJ7nEP
SAX5trxIrQuNKq1/hLqxngIQ3/duLxtnaFd9ZwmixTvQUgtxLTQL9Bued1WmcNTR
Z3VePxyIgqKKsdzzx0uBAd501/T77aB5TwSiVaZQNEDc44oZhr6rricrYP2tjnq4
3XLsAzb5djPUj3kEhbc4HKPatl67r7LsFumOMjZGnnj2nEsvQLH9o53snWlyfBYS
40CrulcEiIjD5dgt/xAojhPT20CyqCYH9mJyZ5NfvRlcR30vNT9/okeoFnUyKTB4
JBhHT/bu89CGD5F1M6G6wHGXNv/szxPDzW08/dUJA+BrZrHOYJQ=
=OM/4
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.17-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs fixes from Mike Marshall:
"Two fixes:
- Fix the size of a memory allocation in orangefs_bufmap_alloc()
(Christophe JAILLET)
- Use default_groups in kobj_type (Greg KH)"
* tag 'for-linus-5.17-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc()
orangefs: use default_groups in kobj_type
Windows SMB server responds with STATUS_OBJECT_NAME_INVALID code to
SMB2 QUERY_INFO request for "\<server>\<dfsname>\<linkpath>" DFS reference,
where <dfsname> contains non-ASCII unicode symbols.
Check such DFS reference and emulate -EREMOTE if it is actual.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215440
Signed-off-by: Eugene Korenevsky <ekorenevsky@astralinux.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
alloc_path_with_tree_prefix() concatenates tree prefix and the path.
Windows CIFS client does not add separator after the tree prefix if the path
is empty. Let's do the same.
This fixes mounting DFS namespaces with names containing non-ASCII symbols.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215440
Signed-off-by: Eugene Korenevsky <ekorenevsky@astralinux.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
Eliminate the follow smatch warning:
fs/cifs/sess.c:1581 sess_auth_rawntlmssp_authenticate() warn:
inconsistent indenting
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
One of my previous fixes:
cifs: send workstation name during ntlmssp session setup
...changed the prototype of build_ntlmssp_negotiate_blob
from being allocated by the caller to being allocated within
the function. The caller needs to free this object too.
While SMB2 version of the caller did it, I forgot to free
for the SMB1 version. Fixing that here.
Fixes: 49bd49f983 ("cifs: send workstation name during ntlmssp session setup")
Cc: stable@vger.kernel.org # 5.16
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Now that we've made these ioctls defunct, move them from xfs_fs.h to
xfs_ioctl.c, which effectively removes them from the publicly supported
ioctl interfaces for XFS.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
According to the glibc compat header for Irix 4, these ioctls originated
in April 1991 as a (somewhat clunky) way to preallocate space at the end
of a file on an EFS filesystem. XFS, which was released in Irix 5.3 in
December 1993, picked up these ioctls to maintain compatibility and they
were ported to Linux in the early 2000s.
Recently it was pointed out to me they still lurk in the kernel, even
though the Linux fallocate syscall supplanted the functionality a long
time ago. fstests doesn't seem to include any real functional or stress
tests for these ioctls, which means that the code quality is ... very
questionable. Most notably, it was a stale disk block exposure vector
for 21 years and nobody noticed or complained. As mature programmers
say, "If you're not testing it, it's broken."
Given all that, let's withdraw these ioctls from the XFS userspace API.
Normally we'd set a long deprecation process, but I estimate that there
aren't any real users, so let's trigger a warning in dmesg and return
-ENOTTY.
See: CVE-2021-4155
Augments: 983d8e60f5 ("xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Remove the definitions for these ioctls, since the functionality (and,
weirdly, the 32-bit compat ioctl definitions) were removed from the
kernel in November 2019.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmHk2A0ACgkQiiy9cAdy
T1GjmgwAhJA5Z8nr28Q/DrKdUkdKeHbPQUsroQlRVzO7gswnVQe2EKc/cfBj2bC2
a/uD8G3Kkgv3n6UCjbj5FkfAQRIcuQaZfxPEYMrlk6quBmSeqSTts/YOGICfCvw8
4Ra/pULtuF3Y27/3z98owbMxi/OScVP7vOxGVSQOgBJazTX0Lgtnt+UK2gKMYqjD
rGDaFScf8eq6J4Py2E2Ritn06v4Zk3/G1C+66SwJePnHdPfNh9ym+OKjiz8OkhB2
o968pt9QpBb6hizZlr/uO402lcxQkDVGmUKjHNA8xeCZvOvlBAlJFJY7FdfswlRv
62nxWhXg8hZBda1qJQS+7PLLK/2ovEFApcYk3ZZu67lCPHtk0cloxGkNmaJ7qcu1
/IDKFwPEiN3PqwE50hIHx9X3VaSdoOrL41MJVc1EzubM6kmgQHsq6FczKC8zWStH
ZioulKkc6r+/bJWCkllMIFoo7hb/M67i8QLF9VSaepgxz1A1oN23N78X6rShzcjE
56qh83F+
=Kauh
-----END PGP SIGNATURE-----
Merge tag '5.17-rc-part1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs updates from Steve French:
- multichannel patches mostly related to improving reconnect behavior
- minor cleanup patches
* tag '5.17-rc-part1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix FILE_BOTH_DIRECTORY_INFO definition
cifs: move superblock magic defitions to magic.h
cifs: Fix smb311_update_preauth_hash() kernel-doc comment
cifs: avoid race during socket reconnect between send and recv
cifs: maintain a state machine for tcp/smb/tcon sessions
cifs: fix hang on cifs_get_next_mid()
cifs: take cifs_tcp_ses_lock for status checks
cifs: reconnect only the connection and not smb session where possible
cifs: add WARN_ON for when chan_count goes below minimum
cifs: adjust DebugData to use chans_need_reconnect for conn status
cifs: use the chans_need_reconnect bitmap for reconnect status
cifs: track individual channel status using chans_need_reconnect
cifs: remove redundant assignment to pointer p
Prior to Linux v5.4 devtmpfs used mount_single() which treats the given
mount options as "remount" options, so it updates the configuration of
the single super_block on each mount.
Since that was changed, the mount options used for devtmpfs are ignored.
This is a regression which affect systemd - which mounts devtmpfs with
"-o mode=755,size=4m,nr_inodes=1m".
This patch restores the "remount" effect by calling reconfigure_single()
Fixes: d401727ea0 ("devtmpfs: don't mix {ramfs,shmem}_fill_super() with mount_single()")
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 2b3d047870 ("unicode: Add utf8-data module") changed the
generated utf8data file from 'utf8data.h' to 'utf8data.c', but didn't
change the comments or the .gitignore to match.
The comments should be updated too, but at least they don't cause any
visible breakage. But the gitignore file needs changing to avoid git
complaining about untracked files.
Fixes: 2b3d047870 ("unicode: Add utf8-data module")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull signal/exit/ptrace updates from Eric Biederman:
"This set of changes deletes some dead code, makes a lot of cleanups
which hopefully make the code easier to follow, and fixes bugs found
along the way.
The end-game which I have not yet reached yet is for fatal signals
that generate coredumps to be short-circuit deliverable from
complete_signal, for force_siginfo_to_task not to require changing
userspace configured signal delivery state, and for the ptrace stops
to always happen in locations where we can guarantee on all
architectures that the all of the registers are saved and available on
the stack.
Removal of profile_task_ext, profile_munmap, and profile_handoff_task
are the big successes for dead code removal this round.
A bunch of small bug fixes are included, as most of the issues
reported were small enough that they would not affect bisection so I
simply added the fixes and did not fold the fixes into the changes
they were fixing.
There was a bug that broke coredumps piped to systemd-coredump. I
dropped the change that caused that bug and replaced it entirely with
something much more restrained. Unfortunately that required some
rebasing.
Some successes after this set of changes: There are few enough calls
to do_exit to audit in a reasonable amount of time. The lifetime of
struct kthread now matches the lifetime of struct task, and the
pointer to struct kthread is no longer stored in set_child_tid. The
flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is
removed. Issues where task->exit_code was examined with
signal->group_exit_code should been examined were fixed.
There are several loosely related changes included because I am
cleaning up and if I don't include them they will probably get lost.
The original postings of these changes can be found at:
https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.orghttps://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.orghttps://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org
I trimmed back the last set of changes to only the obviously correct
once. Simply because there was less time for review than I had hoped"
* 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits)
ptrace/m68k: Stop open coding ptrace_report_syscall
ptrace: Remove unused regs argument from ptrace_report_syscall
ptrace: Remove second setting of PT_SEIZED in ptrace_attach
taskstats: Cleanup the use of task->exit_code
exit: Use the correct exit_code in /proc/<pid>/stat
exit: Fix the exit_code for wait_task_zombie
exit: Coredumps reach do_group_exit
exit: Remove profile_handoff_task
exit: Remove profile_task_exit & profile_munmap
signal: clean up kernel-doc comments
signal: Remove the helper signal_group_exit
signal: Rename group_exit_task group_exec_task
coredump: Stop setting signal->group_exit_task
signal: Remove SIGNAL_GROUP_COREDUMP
signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process
signal: Make coredump handling explicit in complete_signal
signal: Have prepare_signal detect coredumps using signal->core_state
signal: Have the oom killer detect coredumps using signal->core_state
exit: Move force_uaccess back into do_exit
exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit
...
This includes patches from Christoph Hellwig to split the large data
tables of the unicode subsystem into a loadable module, which allow
users to not have them around if case-insensitive filesystems are not to
be used. It also includes minor code fixes to unicode and its users,
from the same author.
There is a trivial conflict in the function encoding_show in
fs/f2fs/sysfs.c reported by linux-next between commit
84eab2a899 ("f2fs: replace snprintf in show functions with sysfs_emit")
and commit a440943e68 ("unicode: remove the charset field from struct
unicode_map"). from my tree.
All the patches here have been on linux-next releases for the past
months.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8jAUPq50yNjPBCi4QEuZqsMcppQFAmHeLp0ACgkQQEuZqsMc
ppRWdhAAstuibIlhUj1Vae070P92oaxM/Azz3IgyVFWensJyQV1PvbtFQDhyKM4w
M3tQ45eK49vVHn+JpLHbiAdZV66rD/sMSsruCVIf/8KNVDisOBQtFar5yxVr0Ion
AOMoG6/Xrk8BZlZH62fhtJGtu/EFmeFoGVdC81NdTSroe9G+26we3IULwHSE1lNH
XMJFCgU6otuLDOna16U7kL77Tu7GXRJcQe1+2nRJ+u6Agxy2xTo/s4FHuxzRK0/e
GsgO1scY6unWM23O6z+qJYazng2Zt3EOZtSGqU4TsvZwjUi2UtAYW1/vAQGc/q3Y
hGxPYGgKC1VrXLfIcuyng7j0vFPtADbdHMbsJPoyy+Nz4znDJ81IAKAHMO1in3C8
CHKjW+6InmXNye/uwdRt8Tx49jxUHmWUbQRT5FwMDpzC7MAL+DVdPpVVQgpLVM/H
gW3YpBEk5qQvVdh8DWZVW3rT3SnMX/v0+u+76FsMHKYNJMNrCnP6vXpCPQl/Gyut
ycgK7qVF3o/bgNBf072H3ZBZajTv7ePvacP4Wth7m9I2ykk+p4IjQLpTC5rJK0By
VC1xS4im2VqiIWE9eE5y9cXU1oa/AfOcOF+7FZcxT13IL6hKTtd4+H4yKgdcNsyk
7RjpGgjp+SU51/EilhEqMFgEe07CURxwGwhApizBSiTIOgZS96U=
=4q9x
-----END PGP SIGNATURE-----
Merge tag 'unicode-for-next-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode
Pull unicode updates from Gabriel Krisman Bertazi:
"This includes patches from Christoph Hellwig to split the large data
tables of the unicode subsystem into a loadable module, which allow
users to not have them around if case-insensitive filesystems are not
to be used. It also includes minor code fixes to unicode and its
users, from the same author.
All the patches here have been on linux-next releases for the past
months"
* tag 'unicode-for-next-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
unicode: only export internal symbols for the selftests
unicode: Add utf8-data module
unicode: cache the normalization tables in struct unicode_map
unicode: move utf8cursor to utf8-selftest.c
unicode: simplify utf8len
unicode: remove the unused utf8{,n}age{min,max} functions
unicode: pass a UNICODE_AGE() tripple to utf8_load
unicode: mark the version field in struct unicode_map unsigned
unicode: remove the charset field from struct unicode_map
f2fs: simplify f2fs_sb_read_encoding
ext4: simplify ext4_sb_read_encoding
New:
- The Real Time Linux Analysis (RTLA) tool is added to the tools directory.
- Can safely filter on user space pointers with: field.ustring ~ "match-string"
- eprobes can now be filtered like any other event.
- trace_marker(_raw) now uses stream_open() to allow multiple threads to safely
write to it. Note, this could possibly break existing user space, but we will
not know until we hear about it, and then can revert the change if need be.
- New field in events to display when bottom halfs are disabled.
- Sorting of the ftrace functions are now done at compile time instead of
at bootup.
Infrastructure changes to support future efforts:
- Added __rel_loc type for trace events. Similar to __data_loc but the offset
to the dynamic data is based off of the location of the descriptor and not
the beginning of the event. Needed for user defined events.
- Some simplification of event trigger code.
- Make synthetic events process its callback better to not hinder other
event callbacks that are registered. Needed for user defined events.
And other small fixes and clean ups.
-
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYeGvcxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qrZtAP9ICjJxX54MTErElhhUL/NFLV7wqhJi
OIAgmp6jGVRqPAD+JxQtBnGH+3XMd71ioQkTfQ1rp+jBz2ERBj2DmELUAg0=
=zmda
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"New:
- The Real Time Linux Analysis (RTLA) tool is added to the tools
directory.
- Can safely filter on user space pointers with: field.ustring ~
"match-string"
- eprobes can now be filtered like any other event.
- trace_marker(_raw) now uses stream_open() to allow multiple threads
to safely write to it. Note, this could possibly break existing
user space, but we will not know until we hear about it, and then
can revert the change if need be.
- New field in events to display when bottom halfs are disabled.
- Sorting of the ftrace functions are now done at compile time
instead of at bootup.
Infrastructure changes to support future efforts:
- Added __rel_loc type for trace events. Similar to __data_loc but
the offset to the dynamic data is based off of the location of the
descriptor and not the beginning of the event. Needed for user
defined events.
- Some simplification of event trigger code.
- Make synthetic events process its callback better to not hinder
other event callbacks that are registered. Needed for user defined
events.
And other small fixes and cleanups"
* tag 'trace-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits)
tracing: Add ustring operation to filtering string pointers
rtla: Add rtla timerlat hist documentation
rtla: Add rtla timerlat top documentation
rtla: Add rtla timerlat documentation
rtla: Add rtla osnoise hist documentation
rtla: Add rtla osnoise top documentation
rtla: Add rtla osnoise man page
rtla: Add Documentation
rtla/timerlat: Add timerlat hist mode
rtla: Add timerlat tool and timelart top mode
rtla/osnoise: Add the hist mode
rtla/osnoise: Add osnoise top mode
rtla: Add osnoise tool
rtla: Helper functions for rtla
rtla: Real-Time Linux Analysis tool
tracing/osnoise: Properly unhook events if start_per_cpu_kthreads() fails
tracing: Remove duplicate warnings when calling trace_create_file()
tracing/kprobes: 'nmissed' not showed correctly for kretprobe
tracing: Add test for user space strings when filtering on string pointers
tracing: Have syscall trace events use trace_event_buffer_lock_reserve()
...
- Bruce steps down as NFSD maintainer
- Prepare for dynamic nfsd thread management
- More work on supporting re-exporting NFS mounts
- One fs/locks patch on behalf of Jeff Layton
Notable bug fixes:
- Fix zero-length NFSv3 WRITEs
- Fix directory cinfo on FS's that do not support iversion
- Fix WRITE verifiers for stable writes
- Fix crash on COPY_NOTIFY with a special state ID
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmHcWOMACgkQM2qzM29m
f5dh0Q/+MjEL0IK551FdChx9Es1JqKRggv9KwJkLIoa1bw/PMSwP2pnKz6eL0Yun
mdhE9AZQgyFH1IAGdqjeLZKIYRin6bvAdDrnlqQ9SvTviPLWniSUI6AuyUqK6Zyk
wMcXpyOze0fhpxkYmz8/g7i66w967tmLh5MRvV1dkpOYAe99rYwGhvj+9ZeEWfNI
TgmptntMG6YEb+xY0E73otXZHMr2DL67ZYvOUYWemJA1uxcX4joaWBg8sx74dB6k
DUB4BFuoURk6viDD1QYh3qPU3dz9RCJNMz/cWd8+2t7BdaujTSXRIcaFslrQnKfL
Rm+O7pi5W+XohFDjeuMZ1g0c1ot/aoZSaAz00LoCVhejJ/sK9NiPAN1+LyY91Lja
cUBMVPNfW7ClIpiZcORP/chNmVn2qlaL2nxzSY/Uegnd5pIIeVD0pFVgx4+NlEat
mbrrQBcMpBRM0B+RzHS6AusqHrGdSEcwqWoVXWdxsBigJQT/AxWmii3U88k0Z54i
ooMWLaQ9EBBmygV01JN/OBySW2M/dvbfz3eFROvAVqsIP9JWP3FlUOlRDl8GcjXA
azi9fTysBom7WtL6NPcxDJbJ2t9hYr2YaztTpdo9YCHOuQbSQT6IWR5PAa3zvwMu
Bfz6Y8Hoo/KZHCqmkPGYM+x1ENCyDPv788E+erdnw1PFP5F3Pbo=
=/kX3
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"Bruce has announced he is leaving Red Hat at the end of the month and
is stepping back from his role as NFSD co-maintainer. As a result,
this includes a patch removing him from the MAINTAINERS file.
There is one patch in here that Jeff Layton was carrying in the locks
tree. Since he had only one for this cycle, he asked us to send it to
you via the nfsd tree.
There continues to be 0-day reports from Robert Morris @MIT. This time
we include a fix for a crash in the COPY_NOTIFY operation.
Highlights:
- Bruce steps down as NFSD maintainer
- Prepare for dynamic nfsd thread management
- More work on supporting re-exporting NFS mounts
- One fs/locks patch on behalf of Jeff Layton
Notable bug fixes:
- Fix zero-length NFSv3 WRITEs
- Fix directory cinfo on FS's that do not support iversion
- Fix WRITE verifiers for stable writes
- Fix crash on COPY_NOTIFY with a special state ID"
* tag 'nfsd-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (51 commits)
SUNRPC: Fix sockaddr handling in svcsock_accept_class trace points
SUNRPC: Fix sockaddr handling in the svc_xprt_create_error trace point
fs/locks: fix fcntl_getlk64/fcntl_setlk64 stub prototypes
nfsd: fix crash on COPY_NOTIFY with special stateid
MAINTAINERS: remove bfields
NFSD: Move fill_pre_wcc() and fill_post_wcc()
Revert "nfsd: skip some unnecessary stats in the v4 case"
NFSD: Trace boot verifier resets
NFSD: Rename boot verifier functions
NFSD: Clean up the nfsd_net::nfssvc_boot field
NFSD: Write verifier might go backwards
nfsd: Add a tracepoint for errors in nfsd4_clone_file_range()
NFSD: De-duplicate net_generic(nf->nf_net, nfsd_net_id)
NFSD: De-duplicate net_generic(SVC_NET(rqstp), nfsd_net_id)
NFSD: Clean up nfsd_vfs_write()
nfsd: Replace use of rwsem with errseq_t
NFSD: Fix verifier returned in stable WRITEs
nfsd: Retry once in nfsd_open on an -EOPENSTALE return
nfsd: Add errno mapping for EREMOTEIO
nfsd: map EBADF
...
- fix possible uninitialized memory usage for setattr
- fix fscache reading hole in a file just after it's been grown
- split net/9p/trans_fd.c in its own module like other transports
that module defaults to 9P_NET and is autoloaded if required so
users should not be impacted
- add Christian Schoenebeck to 9p reviewers
- some more trivial cleanup
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmHgt18ACgkQq06b7GqY
5nAdUA//ZHFTIcnbTiVkqbw0YztxedTOhdGCWcYsszux0pNQ/ZCjbP5NxESWiFxs
raYWcvE98Xvq4fbs8b+m7YKBJzWF+Km/v1MKfgvZZWFLZR3MfWiL8iojlsaUgG9U
rBmEUyaTWw2COIvFN7EqnwT5mwzNCli5d3AzaAmgffWsHEi/+EVU35YX70ySUkjW
nVf08oX3dBB425oArXOZApOZSVRsUr5YDSuQGiFHBL+hvPTrvPumu/AXbjLbTbmf
NuUxu1Akw8/jhWNATLHzjCdzJzBSfF0zRs+oH9Qt8MKkUrlTfjxc/2MbQYL1p1IN
84XxiG02ebw+Mx05cydP9/Ll7gOo2p6ORGXMHcoPIAMy6zFiCKo3TRdKDgmYDauC
K8c3v+osNwl2GFn1+2XoQIWnqXb7bJ1debkFtWVEGOPd/Yr6CYCZyfiZamoPJXUO
TjkoLAJXJGKlV66ZCuVeAsySdUxdtwbj2WxTliDUWtxvQ+m1jt06S3f/TbA7a9bB
8aBhx7FKzQ/UL8zhm4+3WLPlWkoSUXhFSZZUknvhuaHFV/S1xglXox5OAQJrJMy+
qOzmOjCg14T1TC2WkJlEsedLUH8ACU+XueAPT6uqSdA4rvQEVzk32zTm/GfFu3Q3
0RhcGlW5kAAYBTBLXvmKsjyW2OmPCSycgbWCu3E8A/1gubbRE40=
=o2Lc
-----END PGP SIGNATURE-----
Merge tag '9p-for-5.17-rc1' of git://github.com/martinetd/linux
Pull 9p updates from Dominique Martinet:
"Fixes, split 9p_net_fd, and new reviewer:
- fix possible uninitialized memory usage for setattr
- fix fscache reading hole in a file just after it's been grown
- split net/9p/trans_fd.c in its own module like other transports.
The new transport module defaults to 9P_NET and is autoloaded if
required so users should not be impacted
- add Christian Schoenebeck to 9p reviewers
- some more trivial cleanup"
* tag '9p-for-5.17-rc1' of git://github.com/martinetd/linux:
9p: fix enodata when reading growing file
net/9p: show error message if user 'msize' cannot be satisfied
MAINTAINERS: 9p: add Christian Schoenebeck as reviewer
9p: only copy valid iattrs in 9P2000.L setattr implementation
9p: Use BUG_ON instead of if condition followed by BUG.
net/p9: load default transports
9p/xen: autoload when xenbus service is available
9p/trans_fd: split into dedicated module
fs: 9p: remove unneeded variable
9p/trans_virtio: Fix typo in the comment for p9_virtio_create()
Merge misc updates from Andrew Morton:
"146 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
mm/damon: hide kernel pointer from tracepoint event
mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
mm/damon/dbgfs: remove an unnecessary variable
mm/damon: move the implementation of damon_insert_region to damon.h
mm/damon: add access checking for hugetlb pages
Docs/admin-guide/mm/damon/usage: update for schemes statistics
mm/damon/dbgfs: support all DAMOS stats
Docs/admin-guide/mm/damon/reclaim: document statistics parameters
mm/damon/reclaim: provide reclamation statistics
mm/damon/schemes: account how many times quota limit has exceeded
mm/damon/schemes: account scheme actions that successfully applied
mm/damon: remove a mistakenly added comment for a future feature
Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
Docs/admin-guide/mm/damon/usage: remove redundant information
Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
mm/damon: convert macro functions to static inline functions
mm/damon: modify damon_rand() macro to static inline function
mm/damon: move damon_rand() definition into damon.h
...
find_first{,_zero}_bit is a more effective analogue of 'next' version if
start == 0. This patch replaces 'next' with 'first' where things look
trivial.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
The size of FILE_BOTH_DIRECTORY_INFO.ShortName must be 24 bytes, not 12
(see MS-FSCC documentation).
Signed-off-by: Eugene Korenevsky <ekorenevsky@astralinux.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
Help userland apps to identify cifs and smb2 mounts.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add the description of @server in smb311_update_preauth_hash()
kernel-doc comment to remove warning found by running scripts/kernel-doc,
which is caused by using 'make W=1'.
fs/cifs/smb2misc.c:856: warning: Function parameter or member 'server'
not described in 'smb311_update_preauth_hash'
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pass "end - 1" instead of "end" when walking the interval tree in
hugetlb_vmdelete_list() to fix an inclusive vs. exclusive bug. The two
callers that pass a non-zero "end" treat it as exclusive, whereas the
interval tree iterator expects an inclusive "last". E.g. punching a
hole in a file that precisely matches the size of a single hugepage,
with a vma starting right on the boundary, will result in
unmap_hugepage_range() being called twice, with the second call having
start==end.
The off-by-one error doesn't cause functional problems as
__unmap_hugepage_range() turns into a massive nop due to
short-circuiting its for-loop on "address < end". But, the mmu_notifier
invocations to invalid_range_{start,end}() are passed a bogus zero-sized
range, which may be unexpected behavior for secondary MMUs.
The bug was exposed by commit ed922739c9 ("KVM: Use interval tree to
do fast hva lookup in memslots"), currently queued in the KVM tree for
5.17, which added a WARN to detect ranges with start==end.
Link: https://lkml.kernel.org/r/20211228234257.1926057-1-seanjc@google.com
Fixes: 1bfad99ab4 ("hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reported-by: syzbot+4e697fe80a31aa7efe21@syzkaller.appspotmail.com
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Various places in the kernel - largely in filesystems - respond to a
memory allocation failure by looping around and re-trying. Some of
these cannot conveniently use __GFP_NOFAIL, for reasons such as:
- a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on
- a need to check for the process being signalled between failures
- the possibility that other recovery actions could be performed
- the allocation is quite deep in support code, and passing down an
extra flag to say if __GFP_NOFAIL is wanted would be clumsy.
Many of these currently use congestion_wait() which (in almost all
cases) simply waits the given timeout - congestion isn't tracked for
most devices.
It isn't clear what the best delay is for loops, but it is clear that
the various filesystems shouldn't be responsible for choosing a timeout.
This patch introduces memalloc_retry_wait() with takes on that
responsibility. Code that wants to retry a memory allocation can call
this function passing the GFP flags that were used. It will wait
however is appropriate.
For now, it only considers __GFP_NORETRY and whatever
gfpflags_allow_blocking() tests. If blocking is allowed without
__GFP_NORETRY, then alloc_page either made some reclaim progress, or
waited for a while, before failing. So there is no need for much
further waiting. memalloc_retry_wait() will wait until the current
jiffie ends. If this condition is not met, then alloc_page() won't have
waited much if at all. In that case memalloc_retry_wait() waits about
200ms. This is the delay that most current loops uses.
linux/sched/mm.h needs to be included in some files now,
but linux/backing-dev.h does not.
Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch to add anonymous vma names causes a build failure in some
configurations:
include/linux/mm_types.h: In function 'is_same_vma_anon_name':
include/linux/mm_types.h:924:37: error: implicit declaration of function 'strcmp' [-Werror=implicit-function-declaration]
924 | return name && vma_name && !strcmp(name, vma_name);
| ^~~~~~
include/linux/mm_types.h:22:1: note: 'strcmp' is defined in header '<string.h>'; did you forget to '#include <string.h>'?
This should not really be part of linux/mm_types.h in the first place,
as that header is meant to only contain structure defintions and need a
minimum set of indirect includes itself.
While the header clearly includes more than it should at this point,
let's not make it worse by including string.h as well, which would pull
in the expensive (compile-speed wise) fortify-string logic.
Move the new functions into a separate header that only needs to be
included in a couple of locations.
Link: https://lkml.kernel.org/r/20211207125710.2503446-1-arnd@kernel.org
Fixes: "mm: add a field to store names for private anonymous memory"
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@google.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In many userspace applications, and especially in VM based applications
like Android uses heavily, there are multiple different allocators in
use. At a minimum there is libc malloc and the stack, and in many cases
there are libc malloc, the stack, direct syscalls to mmap anonymous
memory, and multiple VM heaps (one for small objects, one for big
objects, etc.). Each of these layers usually has its own tools to
inspect its usage; malloc by compiling a debug version, the VM through
heap inspection tools, and for direct syscalls there is usually no way
to track them.
On Android we heavily use a set of tools that use an extended version of
the logic covered in Documentation/vm/pagemap.txt to walk all pages
mapped in userspace and slice their usage by process, shared (COW) vs.
unique mappings, backing, etc. This can account for real physical
memory usage even in cases like fork without exec (which Android uses
heavily to share as many private COW pages as possible between
processes), Kernel SamePage Merging, and clean zero pages. It produces
a measurement of the pages that only exist in that process (USS, for
unique), and a measurement of the physical memory usage of that process
with the cost of shared pages being evenly split between processes that
share them (PSS).
If all anonymous memory is indistinguishable then figuring out the real
physical memory usage (PSS) of each heap requires either a pagemap
walking tool that can understand the heap debugging of every layer, or
for every layer's heap debugging tools to implement the pagemap walking
logic, in which case it is hard to get a consistent view of memory
across the whole system.
Tracking the information in userspace leads to all sorts of problems.
It either needs to be stored inside the process, which means every
process has to have an API to export its current heap information upon
request, or it has to be stored externally in a filesystem that somebody
needs to clean up on crashes. It needs to be readable while the process
is still running, so it has to have some sort of synchronization with
every layer of userspace. Efficiently tracking the ranges requires
reimplementing something like the kernel vma trees, and linking to it
from every layer of userspace. It requires more memory, more syscalls,
more runtime cost, and more complexity to separately track regions that
the kernel is already tracking.
This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
userspace-provided name for anonymous vmas. The names of named
anonymous vmas are shown in /proc/pid/maps and /proc/pid/smaps as
[anon:<name>].
Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name)
Setting the name to NULL clears it. The name length limit is 80 bytes
including NUL-terminator and is checked to contain only printable ascii
characters (including space), except '[',']','\','$' and '`'.
Ascii strings are being used to have a descriptive identifiers for vmas,
which can be understood by the users reading /proc/pid/maps or
/proc/pid/smaps. Names can be standardized for a given system and they
can include some variable parts such as the name of the allocator or a
library, tid of the thread using it, etc.
The name is stored in a pointer in the shared union in vm_area_struct
that points to a null terminated string. Anonymous vmas with the same
name (equivalent strings) and are otherwise mergeable will be merged.
The name pointers are not shared between vmas even if they contain the
same name. The name pointer is stored in a union with fields that are
only used on file-backed mappings, so it does not increase memory usage.
CONFIG_ANON_VMA_NAME kernel configuration is introduced to enable this
feature. It keeps the feature disabled by default to prevent any
additional memory overhead and to avoid confusing procfs parsers on
systems which are not ready to support named anonymous vmas.
The patch is based on the original patch developed by Colin Cross, more
specifically on its latest version [1] posted upstream by Sumit Semwal.
It used a userspace pointer to store vma names. In that design, name
pointers could be shared between vmas. However during the last
upstreaming attempt, Kees Cook raised concerns [2] about this approach
and suggested to copy the name into kernel memory space, perform
validity checks [3] and store as a string referenced from
vm_area_struct.
One big concern is about fork() performance which would need to strdup
anonymous vma names. Dave Hansen suggested experimenting with
worst-case scenario of forking a process with 64k vmas having longest
possible names [4]. I ran this experiment on an ARM64 Android device
and recorded a worst-case regression of almost 40% when forking such a
process.
This regression is addressed in the followup patch which replaces the
pointer to a name with a refcounted structure that allows sharing the
name pointer between vmas of the same name. Instead of duplicating the
string during fork() or when splitting a vma it increments the refcount.
[1] https://lore.kernel.org/linux-mm/20200901161459.11772-4-sumit.semwal@linaro.org/
[2] https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/
[3] https://lore.kernel.org/linux-mm/202009031022.3834F692@keescook/
[4] https://lore.kernel.org/linux-mm/5d0358ab-8c47-2f5f-8e43-23b89d6a8e95@intel.com/
Changes for prctl(2) manual page (in the options section):
PR_SET_VMA
Sets an attribute specified in arg2 for virtual memory areas
starting from the address specified in arg3 and spanning the
size specified in arg4. arg5 specifies the value of the attribute
to be set. Note that assigning an attribute to a virtual memory
area might prevent it from being merged with adjacent virtual
memory areas due to the difference in that attribute's value.
Currently, arg2 must be one of:
PR_SET_VMA_ANON_NAME
Set a name for anonymous virtual memory areas. arg5 should
be a pointer to a null-terminated string containing the
name. The name length including null byte cannot exceed
80 bytes. If arg5 is NULL, the name of the appropriate
anonymous virtual memory areas will be reset. The name
can contain only printable ascii characters (including
space), except '[',']','\','$' and '`'.
This feature is available only if the kernel is built with
the CONFIG_ANON_VMA_NAME option enabled.
[surenb@google.com: docs: proc.rst: /proc/PID/maps: fix malformed table]
Link: https://lkml.kernel.org/r/20211123185928.2513763-1-surenb@google.com
[surenb: rebased over v5.15-rc6, replaced userpointer with a kernel copy,
added input sanitization and CONFIG_ANON_VMA_NAME config. The bulk of the
work here was done by Colin Cross, therefore, with his permission, keeping
him as the author]
Link: https://lkml.kernel.org/r/20211019215511.3771969-2-surenb@google.com
Signed-off-by: Colin Cross <ccross@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Glauber <jan.glauber@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Landley <rob@landley.net>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Shaohua Li <shli@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dump_mapping() is a big chunk of dump_page(), and it'd be handy to be
able to call it when we don't have a struct page. Split it out and move
it to fs/inode.c. Take the opportunity to simplify some of the debug
messages a little.
Link: https://lkml.kernel.org/r/20211121121056.2870061-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__user annotations are used by the checker (e.g sparse) to mark user
pointers. However here __user is applied to a struct directly, without a
pointer being directly involved.
Although the presence of __user does not cause sparse to emit a warning,
__user should be removed for consistency with other uses of offsetof().
Note: No functional changes intended.
Link: https://lkml.kernel.org/r/20211122101256.7875-1-amit.kachhap@arm.com
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Cc: Kevin Brodsky <Kevin.Brodsky@arm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The variable 'free_space' is being initialized with a value that is not
read, it is being re-assigned later in the two paths of an if statement.
The early initialization is redundant and can be removed.
Link: https://lkml.kernel.org/r/20220112230411.1090761-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are currently two ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field.
Move the ocfs2 cluster sysfs code to use default_groups field which has
been the preferred way since aa30f47cf6 ("kobject: Add support for
default attribute groups to kobj_type") so that we can soon get rid of
the obsolete default_attrs field.
Link: https://lkml.kernel.org/r/20220106102028.3345634-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The variable 'root_bh' is being initialized with a value that is not
read, it is being re-assigned later on closer to its use. The early
initialization is redundant and can be removed.
Link: https://lkml.kernel.org/r/20211228013719.620923-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are currently two ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field.
Move the ocfs2 code to use default_groups field which has been the
preferred way since aa30f47cf6 ("kobject: Add support for default
attribute groups to kobj_type") so that we can soon get rid of the
obsolete default_attrs field.
Link: https://lkml.kernel.org/r/20211228144517.391660-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ocfs2_grab_pages_for_write() may return -EAGAIN if write context type is
mmap and it could not lock the target page. In this case, we exit with
no error and no target page. And then trigger the caller page_mkwrite()
to retry.
Since there are other caller types, e.g. buffer and direct io, make the
return value handling more clear.
Link: https://lkml.kernel.org/r/20211206065051.103353-1-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This issue was detected with the help of Coccinelle.
Link: https://lkml.kernel.org/r/20211105014424.75372-1-zhang.mingyu@zte.com.cn
Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit c1f6925e10 ("mm: put readahead pages in cache earlier") causes
the read performance of squashfs to deteriorate.Through testing, we find
that the performance will be back by closing the readahead of squashfs.
So we want to learn the way of ubifs, provides backing_dev_info and
disable read-ahead
We tested the following data by fio.
squashfs image blocksize=128K
test command:
fio --name basic --bs=? --filename="/mnt/test_file" --rw=? --iodepth=1 --ioengine=psync --runtime=200 --time_based
turn on squashfs readahead in 5.10 kernel
bs(k) read/randread MB/s
4 randread 271
128 randread 231
1024 randread 246
4 read 310
128 read 245
1024 read 247
turn off squashfs readahead in 5.10 kernel
bs(k) read/randread MB/s
4 randread 293
128 randread 330
1024 randread 363
4 read 338
128 read 360
1024 read 365
turn on squashfs readahead and revert the
commit c1f6925e1091("mm: put readahead
pages in cache earlier") in 5.10 kernel
bs(k) read/randread MB/s
4 randread 289
128 randread 306
1024 randread 335
4 read 337
128 read 336
1024 read 338
Link: https://lkml.kernel.org/r/20211116113141.1391026-1-zhengliang6@huawei.com
Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Hou Tao <houtao1@huawei.com>
Cc: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The comments for the file should not be in kernel-doc format:
/**
* attrib.c - NTFS attribute operations. Part of the Linux-NTFS
as it causes it to be incorrectly identified for function
ntfs_map_runlist_nolock(), causing some warnings found by running
scripts/kernel-doc.:
fs/ntfs/attrib.c:25: warning: Incorrect use of kernel-doc format: * ntfs_map_runlist_nolock - map (a part of) a runlist of an ntfs inode
fs/ntfs/attrib.c:71: warning: Function parameter or member 'ni' not described in 'ntfs_map_runlist_nolock'
fs/ntfs/attrib.c:71: warning: Function parameter or member 'vcn' not described in 'ntfs_map_runlist_nolock'
fs/ntfs/attrib.c:71: warning: Function parameter or member 'ctx' not described in 'ntfs_map_runlist_nolock'
fs/ntfs/attrib.c:71: warning: expecting prototype for attrib.c - NTFS attribute operations. Part of the Linux(). Prototype was for ntfs_map_runlist_nolock() instead
Link: https://lkml.kernel.org/r/20220106015145.67067-1-yang.lee@linux.alibaba.com
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix a minor locking inconsistency in readdir
- Fix incorrect fs feature bit validation for secondary superblocks
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHfFFYACgkQ+H93GTRK
tOvrQA//RnDMit4zLoPo3WeZTDjpv1CQkOXFgQSba+Owgz1RvGn5f29R4ICKW6Uy
NfnPcq5AeNhysOAi0uZPl4EC6140QgMrD4aJjKY1JAc/BkeBVEN05NOUFfCuBXU/
RMzL8nGD5zsN7fPIWVGS60iitou8IvoX8ky4dKx7XcsbFzbBMtnJIsUpfqVutY9u
i+zub6sNRkstr4uBRk+1S8uAqHUAW+21YwfKqB6pgpCoO5BHu2e4eN01ohWvF8Ru
0ujJ2j4YlfjYmtQypFk3rNgQoI0oXY6mYWZPKr7fYvhDpoKEodUvHRLIJEHqoD+y
fX6Ey5XxpHmDxSJnWRC7Vznl56VEmMndUcWEq2ZqROp/r/zZp7StXyHLO7DR9nEs
mp+55a4tcKhHa/KnjbqexAaN4a1NTpryMqjsPHP7VTNu5Dq8CK4kHtrcEVrKCZFq
ExRFMfoUDxap6iJaxoKAz5CtZyJeuZO8bLCa7jq/2F91EWp+2aclxEU6VY5r6B2X
dFlIY8XnZEfJxE3xnhH/aDs6IKWH4YmvgtwxNb+RIupyMTfJzpcysjV9NRER8fAv
9rLfWNz+nx0efNyXle+h+vrzT/zXgyi/0PSjAQS9/xErTPRFAmXFefLl/0oMQSHC
XFIVivcR0lxKpZhB5CIOfVj1Vu8VM52JvlXjHPU3ObqEbFDxtf4=
=RYJ4
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.17-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"These are the last few obvious fixes that I found while stress testing
online fsck for XFS prior to initiating a design review of the whole
giant machinery.
- Fix a minor locking inconsistency in readdir
- Fix incorrect fs feature bit validation for secondary superblocks"
* tag 'xfs-5.17-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix online fsck handling of v5 feature bits on secondary supers
xfs: take the ILOCK when readdir inspects directory mapping data
This will enable cleanups down the road.
The idea is to disable cbs, then add "flush_queued_cbs" callback
as a parameter, this way drivers can flush any work
queued after callbacks have been disabled.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Christoph Fritz is reporting that failure to copy up fileattr when upper
doesn't support fileattr or xattr results in a regression.
Return success in these failure cases; this reverts overlayfs to the old
behavior.
Add a pr_warn_once() in these cases to still let the user know about the
copy up failures.
Reported-by: Christoph Fritz <chf.fritz@googlemail.com>
Fixes: 72db82115d ("ovl: copy up sync/noatime fileattr flags")
Cc: <stable@vger.kernel.org> # v5.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This patch is fixing a NULL pointer dereference to get a recently
introduced warning message working.
Fixes: 5b0a414d06 ("ovl: fix filattr copy-up failure")
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Cc: <stable@vger.kernel.org> # v5.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
For each location returned in FS_LOCATION query, establish a
transport to the server, send EXCHANGE_ID and test for trunking,
if successful, add the transport to the exiting client.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
An fs_location attribute returns a string that can be ipv4, ipv6,
or DNS name. An ip location can have a port appended to it and if
no port is present a default port needs to be set. If rpc_pton()
fails to parse, try calling rpc_uaddr2socaddr() that can convert
an universal address.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Make nfs_parse_server_name available outside of nfs4namespace.c.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Query the server for other possible trunkable locations for a given
file system on a 4.1+ mount.
v2:
-- added missing static to nfs4_discover_trunking,
reported by the kernel test robot
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The uapi headers are missing the ceph definition. Move it there so
userland apps can ID cephfs.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The newcaps has already included the Ls, no need to check it again.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
CephFS is a bit unlike most other filesystems in that it only
conditionally does buffered I/O based on the caps that it gets from the
MDS. In most cases, unless there is contended access for an inode the
MDS does give Fbc caps to the client, so the unbuffered codepaths are
only infrequently traveled and are difficult to test.
At one time, the "-o sync" mount option would give you this behavior,
but that was removed in commit 7ab9b38070 ("ceph: Don't use
ceph-sync-mode for synchronous-fs.").
Add a new mount option to tell the client to ignore Fbc caps when doing
I/O, and to use the synchronous codepaths exclusively, even on
non-O_DIRECT file descriptors. We already have an ioctl that forces this
behavior on a per-file basis, so we can just always set the CEPH_F_SYNC
flag in the file description on such mounts.
Additionally, this patch also changes the client to not request Fbc when
doing direct I/O. We aren't using the cache with O_DIRECT so we don't
have any need for those caps.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
玮文 胡 reported seeing the WARN_RATELIMIT pop when writing to an
inode that had been transplanted into the stray dir. The client was
trying to look up the quotarealm info from the parent and that tripped
the warning.
Change the ceph_vino_is_reserved helper to not throw a warning for
MDS stray directories (0x100 - 0x1ff), only for reserved dirs that
are not in that range.
Also, fix ceph_has_realms_with_quotas to return false when encountering
a reserved inode.
URL: https://tracker.ceph.com/issues/53180
Reported-by: Hu Weiwen <sehuww@mail.scut.edu.cn>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This pops every second and isn't very useful.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Problem:
The statfs reports incorrect free/available space for quota less then
CEPH_BLOCK size (4M).
Solution:
For quota less than CEPH_BLOCK size, smaller block size of 4K is used.
But if quota is less than 4K, it is decided to go with binary use/free
of 4K block. For quota size less than 4K size, report the
total=used=4K,free=0 when quota is full and total=free=4K,used=0
otherwise.
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Add read-only module parameters for supported mount syntaxes. Primary
user is the user-space mount helper for catching v2 syntax bugs during
testing by cross verifying if the kernel supports v2 syntax on mount
failure.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Note that the new monitors are just shown in /proc/mounts.
Ceph does not (re)connect to new monitors yet.
[ jlayton: s/printk\(KERN_NOTICE/pr_notice(/
s/strcmp/strcmp_null/ ]
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Old mount device syntax (source) has the following problems:
- mounts to the same cluster but with different fsnames
and/or creds have identical device string which can
confuse xfstests.
- Userspace mount helper tool resolves monitor addresses
and fill in mon addrs automatically, but that means the
device shown in /proc/mounts is different than what was
used for mounting.
New device syntax is as follows:
cephuser@fsid.mycephfs2=/path
Note, there is no "monitor address" in the device string.
That gets passed in as mount option. This keeps the device
string same when monitor addresses change (on remounts).
Also note that the userspace mount helper tool is backward
compatible. I.e., the mount helper will fallback to using
old syntax after trying to mount with the new syntax.
[ idryomov: drop CEPH_MON_ADDR_MNTOPT_DELIM ]
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
... and remove hardcoded function name in ceph_parse_ips().
[ idryomov: delim parameter, drop CEPH_ADDR_PARSE_DEFAULT_DELIM ]
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
- Simplify the dax_operations API
- Eliminate bdev_dax_pgoff() in favor of the filesystem maintaining
and applying a partition offset to all its DAX iomap operations.
- Remove wrappers and device-mapper stacked callbacks for
->copy_from_iter() and ->copy_to_iter() in favor of moving
block_device relative offset responsibility to the
dax_direct_access() caller.
- Remove the need for an @bdev in filesystem-DAX infrastructure
- Remove unused uio helpers copy_from_iter_flushcache() and
copy_mc_to_iter() as only the non-check_copy_size() versions are
used for DAX.
- Prepare XFS for the pending (next merge window) DAX+reflink support
- Remove deprecated DEV_DAX_PMEM_COMPAT support
- Cleanup a straggling misuse of the GUID api
Tags offered after the branch was cut:
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/Ydb/3P+8nvjCjYfO@redhat.com
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYd3dTAAKCRDfioYZHlFs
Z//UAP9zetoTE+O7zJG7CXja4jSopSadbdbh6QKSXaqfKBPvQQD+N4US3wA2bGv8
f/qCY62j2Hj3hUTGHs9RvTyw3JsSYAA=
=QvDs
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax and libnvdimm updates from Dan Williams:
"The bulk of this is a rework of the dax_operations API after
discovering the obstacles it posed to the work-in-progress DAX+reflink
support for XFS and other copy-on-write filesystem mechanics.
Primarily the need to plumb a block_device through the API to handle
partition offsets was a sticking point and Christoph untangled that
dependency in addition to other cleanups to make landing the
DAX+reflink support easier.
The DAX_PMEM_COMPAT option has been around for 4 years and not only
are distributions shipping userspace that understand the current
configuration API, but some are not even bothering to turn this option
on anymore, so it seems a good time to remove it per the deprecation
schedule. Recall that this was added after the device-dax subsystem
moved from /sys/class/dax to /sys/bus/dax for its sysfs organization.
All recent functionality depends on /sys/bus/dax.
Some other miscellaneous cleanups and reflink prep patches are
included as well.
Summary:
- Simplify the dax_operations API:
- Eliminate bdev_dax_pgoff() in favor of the filesystem
maintaining and applying a partition offset to all its DAX iomap
operations.
- Remove wrappers and device-mapper stacked callbacks for
->copy_from_iter() and ->copy_to_iter() in favor of moving
block_device relative offset responsibility to the
dax_direct_access() caller.
- Remove the need for an @bdev in filesystem-DAX infrastructure
- Remove unused uio helpers copy_from_iter_flushcache() and
copy_mc_to_iter() as only the non-check_copy_size() versions are
used for DAX.
- Prepare XFS for the pending (next merge window) DAX+reflink support
- Remove deprecated DEV_DAX_PMEM_COMPAT support
- Cleanup a straggling misuse of the GUID api"
* tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (38 commits)
iomap: Fix error handling in iomap_zero_iter()
ACPI: NFIT: Import GUID before use
dax: remove the copy_from_iter and copy_to_iter methods
dax: remove the DAXDEV_F_SYNC flag
dax: simplify dax_synchronous and set_dax_synchronous
uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()
iomap: turn the byte variable in iomap_zero_iter into a ssize_t
memremap: remove support for external pgmap refcounts
fsdax: don't require CONFIG_BLOCK
iomap: build the block based code conditionally
dax: fix up some of the block device related ifdefs
fsdax: shift partition offset handling into the file systems
dax: return the partition offset from fs_dax_get_by_bdev
iomap: add a IOMAP_DAX flag
xfs: pass the mapping flags to xfs_bmbt_to_iomap
xfs: use xfs_direct_write_iomap_ops for DAX zeroing
xfs: move dax device handling into xfs_{alloc,free}_buftarg
ext4: cleanup the dax handling in ext4_fill_super
ext2: cleanup the dax handling in ext2_fill_super
fsdax: decouple zeroing from the iomap buffered I/O code
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmHeBGsACgkQ+7dXa6fL
C2tyLw/8C2Gs/XvOZvRO7KPetKI9BbQSFoCe7uvGbiPq5CEmgcjWzQxvQGklBiZD
qYa6pMNye1iGpsHOY3Yu210b7vMQiRLnnxvVle0UrjpZR7CcxYS0gGV+6yRdbDGy
W1X6GFiX06qiNsgBH4msYp0SmbhhfkTyAx1BeBZAEtX8iFgaPfOldPY2nLMcTDD6
6FT1nTzRcMHx9IUQZJtpeatzc70Qg8+fOr2UAY2nOIypXh6+vAMBO80xtUjGVU+1
pWD1E+8cXSLfwEEzquFWoWTsTX7hNfsesEN10FmBf1bVCH9ZDFE01MOl6B8+CkFl
+xfkvDNFC3yyUwAMVAV4+A4Be+cVLSqN2R91QIKJnAj9w1OjxASrwZJ1YeZp6KP4
h0XKuPs3sRwwbNPVL/nP0UPNexoJnOUAaHesl4uKkRrExmxz9xGOIqIri2+tUIO+
HkGyNns1huymj1K1ja4AQbDiZZX39GgYVleyg9g3uuy1FS4k+/myJcXo/CqWn3ON
4oeNwxwLvlcqIQnPrESvwev50lFZYB4pfwvez6T2C5dL/Wk/xdeJK9iG81RWgx7y
5XcDeoGDE08gMCGWVPjuhOCXypeiRGHhRNlcxTtq5kLwBZGkcYg/wFFnWn+6hzc4
kyXw2kS5WZq4Q/FPh7BdY0eHp6xv0EpAOZwceneLB9lhNINdxcQ=
=ISJ6
-----END PGP SIGNATURE-----
Merge tag 'fscache-rewrite-20220111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull fscache rewrite from David Howells:
"This is a set of patches that rewrites the fscache driver and the
cachefiles driver, significantly simplifying the code compared to
what's upstream, removing the complex operation scheduling and object
state machine in favour of something much smaller and simpler.
The series is structured such that the first few patches disable
fscache use by the network filesystems using it, remove the cachefiles
driver entirely and as much of the fscache driver as can be got away
with without causing build failures in the network filesystems.
The patches after that recreate fscache and then cachefiles,
attempting to add the pieces in a logical order. Finally, the
filesystems are reenabled and then the very last patch changes the
documentation.
[!] Note: I have dropped the cifs patch for the moment, leaving local
caching in cifs disabled. I've been having trouble getting that
working. I think I have it done, but it needs more testing (there
seem to be some test failures occurring with v5.16 also from
xfstests), so I propose deferring that patch to the end of the
merge window.
WHY REWRITE?
============
Fscache's operation scheduling API was intended to handle sequencing
of cache operations, which were all required (where possible) to run
asynchronously in parallel with the operations being done by the
network filesystem, whilst allowing the cache to be brought online and
offline and to interrupt service for invalidation.
With the advent of the tmpfile capacity in the VFS, however, an
opportunity arises to do invalidation much more simply, without having
to wait for I/O that's actually in progress: Cachefiles can simply
create a tmpfile, cut over the file pointer for the backing object
attached to a cookie and abandon the in-progress I/O, dismissing it
upon completion.
Future work here would involve using Omar Sandoval's vfs_link() with
AT_LINK_REPLACE[1] to allow an extant file to be displaced by a new
hard link from a tmpfile as currently I have to unlink the old file
first.
These patches can also simplify the object state handling as I/O
operations to the cache don't all have to be brought to a stop in
order to invalidate a file. To that end, and with an eye on to writing
a new backing cache model in the future, I've taken the opportunity to
simplify the indexing structure.
I've separated the index cookie concept from the file cookie concept
by C type now. The former is now called a "volume cookie" (struct
fscache_volume) and there is a container of file cookies. There are
then just the two levels. All the index cookie levels are collapsed
into a single volume cookie, and this has a single printable string as
a key. For instance, an AFS volume would have a key of something like
"afs,example.com,1000555", combining the filesystem name, cell name
and volume ID. This is freeform, but must not have '/' chars in it.
I've also eliminated all pointers back from fscache into the network
filesystem. This required the duplication of a little bit of data in
the cookie (cookie key, coherency data and file size), but it's not
actually that much. This gets rid of problems with making sure we keep
netfs data structures around so that the cache can access them.
These patches mean that most of the code that was in the drivers
before is simply gone and those drivers are now almost entirely new
code. That being the case, there doesn't seem any particular reason to
try and maintain bisectability across it. Further, there has to be a
point in the middle where things are cut over as there's a single
point everything has to go through (ie. /dev/cachefiles) and it can't
be in use by two drivers at once.
ISSUES YET OUTSTANDING
======================
There are some issues still outstanding, unaddressed by this patchset,
that will need fixing in future patchsets, but that don't stop this
series from being usable:
(1) The cachefiles driver needs to stop using the backing filesystem's
metadata to store information about what parts of the cache are
populated. This is not reliable with modern extent-based
filesystems.
Fixing this is deferred to a separate patchset as it involves
negotiation with the network filesystem and the VM as to how much
data to download to fulfil a read - which brings me on to (2)...
(2) NFS (and CIFS with the dropped patch) do not take account of how
the cache would like I/O to be structured to meet its granularity
requirements. Previously, the cache used page granularity, which
was fine as the network filesystems also dealt in page
granularity, and the backing filesystem (ext4, xfs or whatever)
did whatever it did out of sight. However, we now have folios to
deal with and the cache will now have to store its own metadata to
track its contents.
The change I'm looking at making for cachefiles is to store
content bitmaps in one or more xattrs and making a bit in the map
correspond to something like a 256KiB block. However, the size of
an xattr and the fact that they have to be read/updated in one go
means that I'm looking at covering 1GiB of data per 512-byte map
and storing each map in an xattr. Cachefiles has the potential to
grow into a fully fledged filesystem of its very own if I'm not
careful.
However, I'm also looking at changing things even more radically
and going to a different model of how the cache is arranged and
managed - one that's more akin to the way, say, openafs does
things - which brings me on to (3)...
(3) The way cachefilesd does culling is very inefficient for large
caches and it would be better to move it into the kernel if I can
as cachefilesd has to keep asking the kernel if it can cull a
file. Changing the way the backend works would allow this to be
addressed.
BITS THAT MAY BE CONTROVERSIAL
==============================
There are some bits I've added that may be controversial:
(1) I've provided a flag, S_KERNEL_FILE, that cachefiles uses to check
if a files is already being used by some other kernel service
(e.g. a duplicate cachefiles cache in the same directory) and
reject it if it is. This isn't entirely necessary, but it helps
prevent accidental data corruption.
I don't want to use S_SWAPFILE as that has other effects, but
quite possibly swapon() should set S_KERNEL_FILE too.
Note that it doesn't prevent userspace from interfering, though
perhaps it should. (I have made it prevent a marked directory from
being rmdir-able).
(2) Cachefiles wants to keep the backing file for a cookie open whilst
we might need to write to it from network filesystem writeback.
The problem is that the network filesystem unuses its cookie when
its file is closed, and so we have nothing pinning the cachefiles
file open and it will get closed automatically after a short time
to avoid EMFILE/ENFILE problems.
Reopening the cache file, however, is a problem if this is being
done due to writeback triggered by exit(). Some filesystems will
oops if we try to open a file in that context because they want to
access current->fs or suchlike.
To get around this, I added the following:
(A) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network
filesystem inode to indicate that we have a usage count on the
cookie caching that inode.
(B) A flag in struct writeback_control, unpinned_fscache_wb, that
is set when __writeback_single_inode() clears the last dirty
page from i_pages - at which point it clears
I_PINNING_FSCACHE_WB and sets this flag.
This has to be done here so that clearing I_PINNING_FSCACHE_WB
can be done atomically with the check of PAGECACHE_TAG_DIRTY
that clears I_DIRTY_PAGES.
(C) A function, fscache_set_page_dirty(), which if it is not set,
sets I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to
pin the cache resources.
(D) A function, fscache_unpin_writeback(), to be called by
->write_inode() to unuse the cookie.
(E) A function, fscache_clear_inode_writeback(), to be called when
the inode is evicted, before clear_inode() is called. This
cleans up any lingering I_PINNING_FSCACHE_WB.
The network filesystem can then use these tools to make sure that
fscache_write_to_cache() can write locally modified data to the
cache as well as to the server.
For the future, I'm working on write helpers for netfs lib that
should allow this facility to be removed by keeping track of the
dirty regions separately - but that's incomplete at the moment and
is also going to be affected by folios, one way or another, since
it deals with pages"
Link: https://lore.kernel.org/all/510611.1641942444@warthog.procyon.org.uk/
Tested-by: Dominique Martinet <asmadeus@codewreck.org> # 9p
Tested-by: kafs-testing@auristor.com # afs
Tested-by: Jeff Layton <jlayton@kernel.org> # ceph
Tested-by: Dave Wysochanski <dwysocha@redhat.com> # nfs
Tested-by: Daire Byrne <daire@dneg.com> # nfs
* tag 'fscache-rewrite-20220111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (67 commits)
9p, afs, ceph, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()
fscache: Add a tracepoint for cookie use/unuse
fscache: Rewrite documentation
ceph: add fscache writeback support
ceph: conversion to new fscache API
nfs: Implement cache I/O by accessing the cache directly
nfs: Convert to new fscache volume/cookie API
9p: Copy local writes to the cache when writing to the server
9p: Use fscache indexing rewrite and reenable caching
afs: Skip truncation on the server of data we haven't written yet
afs: Copy local writes to the cache when writing to the server
afs: Convert afs to use the new fscache API
fscache, cachefiles: Display stat of culling events
fscache, cachefiles: Display stats of no-space events
cachefiles: Allow cachefiles to actually function
fscache, cachefiles: Store the volume coherency data
cachefiles: Implement the I/O routines
cachefiles: Implement cookie resize for truncate
cachefiles: Implement begin and end I/O operation
cachefiles: Implement backing file wrangling
...
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYdw8xgAKCRDh3BK/laaZ
PFy3AQCHSltzy6f234CcsFk3mtJn0im0tDbRoEYFD731JOR1YAD9HQKtJRn/sMCF
r0PnZnOWJ35RWB3o8uEqptsgrZXJkQo=
=HosR
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Fix a regression introduced in 5.15
- Extend the size of the FUSE_INIT request to accommodate for more
flags. There's a slight possibility of a regression for obscure fuse
servers; if this happens, then more complexity will need to be added
to the protocol
- Allow the DAX property to be controlled by the server on a per-inode
basis in virtiofs
- Allow sending security context to the server when creating a file or
directory
* tag 'fuse-update-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
Documentation/filesystem/dax: DAX on virtiofs
fuse: mark inode DONT_CACHE when per inode DAX hint changes
fuse: negotiate per inode DAX in FUSE_INIT
fuse: enable per inode DAX
fuse: support per inode DAX in fuse protocol
fuse: make DAX mount option a tri-state
fuse: add fuse_should_enable_dax() helper
fuse: Pass correct lend value to filemap_write_and_wait_range()
fuse: send security context of inode on file
fuse: extend init flags
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmHek9YACgkQnJ2qBz9k
QNmgUQf+NpBuhoBI83dWI7NvwMZ1srtOiFdtVctq4jT8rS5m3KnwaTbq9HZ2y+WL
6+Tp8B2Qs/C2X6DbX87N1RBaP6mpMQCY4ADfbYNV6KOhUQqGBo/zxyM4nAk0WOBR
wXFWNhd/c3Xr3eb5Ggaus11UdQxXBFtZ72Azm2eUXnIKuSeAH1KWbyElQGzLqMDE
NLS68XFOcxLLScWA0lTExtouV6ZLUHm6EyYRH+TmIjAwb01JDwceXmdqzXNO7uiV
cKalWV8GcUH2OOLdX/KXvAwDVwuJwq4PCGCajGPEI8Ebct5Eh0YZuHVcvmfReOde
dnyZNiMOH8GwYVDHUDCNI8zD/BpyJQ==
=4gnz
-----END PGP SIGNATURE-----
Merge tag 'fs_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF / reiserfs updates from Jan Kara:
"One UDF fix and one reiserfs cleanup"
* tag 'fs_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix error handling in udf_new_inode()
reiserfs: don't use congestion_wait()
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmHektkACgkQnJ2qBz9k
QNk73Qf9G/8C29GC7lTZvW8ZOqM2onvP16/wpMU98jkw6/xpsdGLMmQlP4JCdB6Y
mTttvPxwMMVrdRmt+MNIgvWPOK2LKHdkbNbj0DRw4JxLEIm+zXddAXAcvX2vLeUn
N24idiVQxIKWS3zJhQ6H7sFjSrZ7ztjkeHWff70LIzURIupYO0+rRmKDgOqo6D5g
cKAdb8x9ZQLr0CTQoHKQxA8IeCSirSblVaDMdes2KrJH+i05s3aqS9jBvY16nrLo
Xvm4Vv7ecQXcmKDswuxVcmBVlVVsWxoYv1w8+v9qRGJnbgpQkjjgCGjCh54M18Do
/OovTVlqGWRfmf2Eo1FCXXz90ppGlQ==
=gvMh
-----END PGP SIGNATURE-----
Merge tag 'fsnotify_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fanotify updates from Jan Kara:
"Support for new FAN_RENAME fanotify event and support for reporting
child info in directory fanotify events (FAN_REPORT_TARGET_FID)"
* tag 'fsnotify_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: wire up FAN_RENAME event
fanotify: report old and/or new parent+name in FAN_RENAME event
fanotify: record either old name new name or both for FAN_RENAME
fanotify: record old and new parent and name in FAN_RENAME event
fanotify: support secondary dir fh and name in fanotify_info
fanotify: use helpers to parcel fanotify_info buffer
fanotify: use macros to get the offset to fanotify_info buffer
fsnotify: generate FS_RENAME event with rich information
fanotify: introduce group flag FAN_REPORT_TARGET_FID
fsnotify: separate mark iterator type from object type enum
fsnotify: clarify object type argument
This should be all that is needed for XFS to use large folios.
There is no code in this pull request to create large folios, but
no additional changes should be needed to XFS or iomap once they
are created.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmHcpaUACgkQDpNsjXcp
gj4MUAf+ItcKfgFo1QCMT+6Y0mohVqPme/vdyOCNv6yOOfZZqN5ZQc+2hmxXrRz9
XPOPwZKL0TttlHSYEJmrm8mqwN8UXl0kqMu4kQqOXMziiD9qpVlaLXOZ7iLdkQxu
z/xe1iACcGfJUaQCsaMP6BZqp6iETA4qP72dBE4jc6PC4H3OI0pN/900gEbAcLxD
Yn0a5NhrdS/EySU2aHLB6OcwhqnSiHBVjUbFiuXxuvOYyzLaERIh00Kx3jLdj4DR
82K4TF8h2IZpALfIDSt0JG+gHLCc+EfF7Yd/xkeEv0md3ncyi+jWvFCFPNJbyFjm
cYoDTSunfbxwszA2n01R4JM8/KkGwA==
=IeFX
-----END PGP SIGNATURE-----
Merge tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux
Pull iomap updates from Matthew Wilcox:
"Convert xfs/iomap to use folios.
This should be all that is needed for XFS to use large folios. There
is no code in this pull request to create large folios, but no
additional changes should be needed to XFS or iomap once they are
created.
Usually this would have come from Darrick, and we had intended that it
would come that route. Between the holidays and various things which
Darrick needed to work on, he asked if I could send things directly.
There weren't any other iomap patches pending for this release, which
probably also played a role"
* tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux: (26 commits)
iomap: Inline __iomap_zero_iter into its caller
xfs: Support large folios
iomap: Support large folios in invalidatepage
iomap: Convert iomap_migrate_page() to use folios
iomap: Convert iomap_add_to_ioend() to take a folio
iomap: Simplify iomap_do_writepage()
iomap: Simplify iomap_writepage_map()
iomap,xfs: Convert ->discard_page to ->discard_folio
iomap: Convert iomap_write_end_inline to take a folio
iomap: Convert iomap_write_begin() and iomap_write_end() to folios
iomap: Convert __iomap_zero_iter to use a folio
iomap: Allow iomap_write_begin() to be called with the full length
iomap: Convert iomap_page_mkwrite to use a folio
iomap: Convert readahead and readpage to use a folio
iomap: Convert iomap_read_inline_data to take a folio
iomap: Use folio offsets instead of page offsets
iomap: Convert bio completions to use folios
iomap: Pass the iomap_page into iomap_set_range_uptodate
iomap: Add iomap_invalidate_folio
iomap: Convert iomap_releasepage to use a folio
...
This patchset stops just short of actually enabling large folios.
It converts everything that I noticed needs to be converted, but there may
still be places I've overlooked which still have page size assumptions.
The big change here is using large entries in the page cache XArray
instead of many small entries. That only affects shmem for now, but
it's a pretty big change for shmem since it changes where memory needs
to be allocated (at split time instead of insertion).
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmHcraoACgkQDpNsjXcp
gj7C3wgAl0cjtdVzTpkLmbnInsicW1m3thnbkSXYbpqRccFjpu2kEBGj31PT+oGz
dzgXP7SNZ/VkFT+qWtmHSRF/J41B6f9bFojO81B2aQdpRiziU+5QbSbXbfUjwVhE
GJF0WGSJtVqySKynXP/iYTEt2zj6BiVperAwIqzhZpPY7gNoyDgeRD34Xy5bQqdD
ey6/Uwkh7oFHLEDcgxsEnyF0tUR3q+gpe5XZW1fb79p3crWw44xATc3UvKv8qCLC
Rd4oHmKkOj4MvdiUxJEfXI+XxgrkQ8XRO70B+p6ZljhDaoDZYw7ullxA0gvlSpNX
6pnjSQlKA1VQXsi6PMSt+9vf26XxaQ==
=KeYZ
-----END PGP SIGNATURE-----
Merge tag 'folio-5.17' of git://git.infradead.org/users/willy/pagecache
Pull folio conversion updates from Matthew Wilcox:
"Convert much of the page cache to use folios
This stops just short of actually enabling large folios. It converts
everything that I noticed needs to be converted, but there may still
be places I've overlooked which still have page size assumptions.
The big change here is using large entries in the page cache XArray
instead of many small entries. That only affects shmem for now, but
it's a pretty big change for shmem since it changes where memory needs
to be allocated (at split time instead of insertion)"
* tag 'folio-5.17' of git://git.infradead.org/users/willy/pagecache: (49 commits)
mm: Use multi-index entries in the page cache
XArray: Add xas_advance()
truncate,shmem: Handle truncates that split large folios
truncate: Convert invalidate_inode_pages2_range to folios
fs: Convert vfs_dedupe_file_range_compare to folios
mm: Remove pagevec_remove_exceptionals()
mm: Convert find_lock_entries() to use a folio_batch
filemap: Return only folios from find_get_entries()
filemap: Convert filemap_get_read_batch() to use a folio_batch
filemap: Convert filemap_read() to use a folio
truncate: Add invalidate_complete_folio2()
truncate: Convert invalidate_inode_pages2_range() to use a folio
truncate: Skip known-truncated indices
truncate,shmem: Add truncate_inode_folio()
shmem: Convert part of shmem_undo_range() to use a folio
mm: Add unmap_mapping_folio()
truncate: Add truncate_cleanup_folio()
filemap: Add filemap_release_folio()
filemap: Use a folio in filemap_page_mkwrite
filemap: Use a folio in filemap_map_pages
...
Define and store if server returns it supports fs_locations attribute
as a capability.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Remove the check for the zero length fs_locations reply in the
xdr decoding, and instead check for that in the migration code.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Here is the set of changes for the driver core for 5.17-rc1.
Lots of little things here, including:
- kobj_type cleanups
- auxiliary_bus documentation updates
- auxiliary_device conversions for some drivers (relevant
subsystems all have provided acks for these)
- kernfs lock contention reduction for some workloads
- other tiny cleanups and changes.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYd7deA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ym8ngCgw0ANwrRPE5b1dthEmfU2f8Knk5kAn0pHQv6R
VRZJypgNfU/Pt0ykstZD
=CO9J
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of changes for the driver core for 5.17-rc1.
Lots of little things here, including:
- kobj_type cleanups
- auxiliary_bus documentation updates
- auxiliary_device conversions for some drivers (relevant subsystems
all have provided acks for these)
- kernfs lock contention reduction for some workloads
- other tiny cleanups and changes.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (43 commits)
kobject documentation: remove default_attrs information
drivers/firmware: Add missing platform_device_put() in sysfb_create_simplefb
debugfs: lockdown: Allow reading debugfs files that are not world readable
driver core: Make bus notifiers in right order in really_probe()
driver core: Move driver_sysfs_remove() after driver_sysfs_add()
firmware: edd: remove empty default_attrs array
firmware: dmi-sysfs: use default_groups in kobj_type
qemu_fw_cfg: use default_groups in kobj_type
firmware: memmap: use default_groups in kobj_type
sh: sq: use default_groups in kobj_type
headers/uninline: Uninline single-use function: kobject_has_children()
devtmpfs: mount with noexec and nosuid
driver core: Simplify async probe test code by using ktime_ms_delta()
nilfs2: use default_groups in kobj_type
kobject: remove kset from struct kset_uevent_ops callbacks
driver core: make kobj_type constant.
driver core: platform: document registration-failure requirement
vdpa/mlx5: Use auxiliary_device driver data helpers
net/mlx5e: Use auxiliary_device driver data helpers
soundwire: intel: Use auxiliary_device driver data helpers
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHd8BkQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpkgED/4vyoNlKfuTqRzuHAZzq+PBYlMpVbn54ebm
z2AJ2StLrZ8rflz/9ERNVYxbSS6ELf/8H2Mdjnb5uCoRza/QkMf4JnGtgT0pBavV
uoruKTiYgQAQJmGuhxAo0TQwyW7F6qadp1eZH8CqxB2Cwj5wgSb3SB/yIlNkj6+2
fSzVGCh0CljhrkgFG10z507VGJYmgBqE9kl2tnfYkAA4blqyowZN4boTMx3l9fm5
2GWt0xg80lgMCzHb5m/sbqdOrMJV77TkuiAuVc+FjIfdGiVWo7XCn3q3e0qqJhRM
A3x9mmUyF/xNrRLMgTyYxHzKLrytBibr+VxvDLr/LmLtV8viVCsgJiP5EIiAWIAt
BCBks31QUxHaB9fyiQ43/nsRZLHmhXErLEK4D0loAXa50p4Xl4ZdaegD6txSH3FI
mGprv4Si1kMNqSxmzrOUfFk8Xdv/vC08RWL4UbRa8xkgwWUAIhpRaYmAtmw913kb
MgwFQuGpE9b7ae7HNdgWzyI424srCwY5UawgpzI25ZwGAVXTXDQ2qw1Lmyyo6swT
bsYVyH/vJLvCS1tjmBtrKfJQ0Mokm1sJpaMeTs/SfSyHmAUXEsEFdXVu6bRnVkYF
9vjHZKOl5jA5nx6/JDpW993GnHV3FGkCDfENXs3wUYY7Hu0DDYZt0CGiqGLjC7Oq
Ow4q3aEJQA==
=syIB
-----END PGP SIGNATURE-----
Merge tag 'for-5.17/io_uring-2022-01-11' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
- Support for prioritized work completions (Hao)
- Simplification of reissue (Pavel)
- Add support for CQE skip (Pavel)
- Memory leak fix going to 5.15-stable (Pavel)
- Re-write of internal poll. This both cleans up that code, and gets us
ready to fix the POLLFREE issue (Pavel)
- Various cleanups (GuoYong, Pavel, Hao)
* tag 'for-5.17/io_uring-2022-01-11' of git://git.kernel.dk/linux-block: (31 commits)
io_uring: fix not released cached task refs
io_uring: remove redundant tab space
io_uring: remove unused function parameter
io_uring: use completion batching for poll rem/upd
io_uring: single shot poll removal optimisation
io_uring: poll rework
io_uring: kill poll linking optimisation
io_uring: move common poll bits
io_uring: refactor poll update
io_uring: remove double poll on poll update
io_uring: code clean for some ctx usage
io_uring: batch completion in prior_task_list
io_uring: split io_req_complete_post() and add a helper
io_uring: add helper for task work execution code
io_uring: add a priority tw list for irq completion work
io-wq: add helper to merge two wq_lists
io_uring: reuse io_req_task_complete for timeouts
io_uring: tweak iopoll CQE_SKIP event counting
io_uring: simplify selected buf handling
io_uring: move up io_put_kbuf() and io_put_rw_kbuf()
...
While I was auditing the code in xfs_repair that adds feature bits to
existing V5 filesystems, I decided to have a look at how online fsck
handles feature bits, and I found a few problems:
1) ATTR2 is added to the primary super when an xattr is set to a file,
but that isn't consistently propagated to secondary supers. This isn't
a corruption, merely a discrepancy that repair will fix if it ever has
to restore the primary from a secondary. Hence, if we find a mismatch
on a secondary, this is a preen condition, not a corruption.
2) There are more compat and ro_compat features now than there used to
be, but we mask off the newer features from testing. This means we
ignore inconsistencies in the INOBTCOUNT and BIGTIME features, which is
wrong. Get rid of the masking and compare directly.
3) NEEDSREPAIR, when set on a secondary, is ignored by everyone. Hence
a mismatch here should also be flagged for preening, and online repair
should clear the flag. Right now we ignore it due to (2).
4) log_incompat features are ephemeral, since we can clear the feature
bit as soon as the log no longer contains live records for a particular
log feature. As such, the only copy we care about is the one in the
primary super. If we find any bits set in the secondary super, we
should flag that for preening, and clear the bits if the user elects to
repair it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
- set_fs removal
- Devicetree support
- Many cleanups from Al
- Various virtio and build related fixes
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmHbPpwWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wXkcD/9UfDRFqvgtZIlacQ/0IvN24xeq
+f4aXoXEVyVsCd02jv9pUk3IAezIQyJf3MGtNJ4D/UFXtYfjEYjK5kJpPDP6umaZ
ZDnpTzn29HW2aGlgOxW9gU7a3Yze629QasIRP6x7Ht+Hk5eXrvRYrgcmKtw1mm04
SA5v5ZqP3P5r623fpsFiw4Dvl7l6MhDyFeyA2tabNnmv93HgB76PHDtV2Z+SWrC+
ubjlfBQc87QGHW+eTvce+0qw9APMoJpNFjNN4H8P/9VcDTvw+KL2JqQ02HSMWh4z
HeHKsv6hbty+GskBhbaWDW7867fPJ3e08TFAAAjeEiBP/CDBwjOTSr3eOw1eHgzU
xdAqC2Bz0e5G3shClmVEzzvcP6R2cgNZjeBze5m3wQ1NKHEddk6N9t5K+4NrOpgp
gbNN5Q4FAVOBKeQsZWG81bJKGcu7SbShgiKjlxcaRpMyp6LwyD4naauGjmCzYsbf
Pd4ilLO1Yocf7nFs2C4vWxE4iAZ6hfQtukerIxCQfb/Y2BaWT3bcWWYHFRFy6Lq+
hTFGnjf+Ro65QCoa1idaLaUdhwAGi6U9sjjL6G/JdQCCE3ftcXLVA9TJz9CNdMb5
98IGznxhczOZc7rHNXOF4km5+OUrU6N+C0WRp3yOoUWcI+Ms4PXHzqIwC5cde/V7
O/o9O1BAoBP6LE1pPg==
=5J6E
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- set_fs removal
- Devicetree support
- Many cleanups from Al
- Various virtio and build related fixes
* tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (31 commits)
um: virtio_uml: Allow probing from devicetree
um: Add devicetree support
um: Extract load file helper from initrd.c
um: remove set_fs
hostfs: Fix writeback of dirty pages
um: Use swap() to make code cleaner
um: header debriding - sigio.h
um: header debriding - os.h
um: header debriding - net_*.h
um: header debriding - mem_user.h
um: header debriding - activate_ipi()
um: common-offsets.h debriding...
um, x86: bury crypto_tfm_ctx_offset
um: unexport handle_page_fault()
um: remove a dangling extern of syscall_trace()
um: kill unused cpu()
uml/i386: missing include in barrier.h
um: stop polluting the namespace with registers.h contents
logic_io instance of iounmap() needs volatile on argument
um: move amd64 variant of mmap(2) to arch/x86/um/syscalls_64.c
...
JFFS2:
- Fix for a deadlock in jffs2_write_begin()
UBI:
- Fixes in comments
UBIFS:
- Expose error counters in sysfs
- Many bugfixes found by Hulk Robot and others
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmHbQUIWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wQRhD/9dMymYBFc0pku9SPgQrbieCXTf
O73rtO/MQTgIZR7y0VueosoUE+6P5672VKMLKt8Dp67tqoqI/LI8YiC0eYGQwklm
U7Ui3ze3ClNoRoPWflFKpm7c9/5UthWpUdVIRQ0Q3U7qvUPHswlPMoEKzTdMZqXa
zaOKGuUGO+4USBod4Jooo5I8KyTUezvjVQRii2xcjIy9LOgKNXImuGIX4QnLkH8Q
GbzgSMtYsHvdo8pqE68sZJ8D7vgEpOHEKTqcVroqO3dO8cmnk41pI5hNMnYherN3
+2yPefRza0ukxowTquM+eJB7D+/Req1ZjpthRMGVtStmmhXxYwp+wX4JLWta6Tr2
A6mgd3j8IxU0BlhP+AAXwIcIPvYyPtGYcfY8tqjorErHHOESvk8Hhifmq49pB8qy
UpIPMGSohZDrIIN4DIvb5tahVI0lYq7ozgPz9MJbvVfTI5Z63lntbfwGydsLCJLy
VLuVl3tZaMg/Pwr0qIGERQbgAqTJ0nzeH68CumSTmQxOFR9J51H2jQZLJMkZOohn
b7B6XkJ4H8xv6U2cFU9JjaaNJCJK6A3DcdCJKyzIYKwoWKPXjEGPmYEg2SgKFIhY
jzAUYUlN+ZBI4UTZH/ua627pdgzjR/ho1Y6tqd1OetlGzZq1a6UnTmXifud8h/mX
B3kRcodjNzbEm9dx2w==
=rpfR
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull JFFS2, UBI and UBIFS updates from Richard Weinberger:
"JFFS2:
- Fix for a deadlock in jffs2_write_begin()
UBI:
- Fixes in comments
UBIFS:
- Expose error counters in sysfs
- Many bugfixes found by Hulk Robot and others"
* tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
jffs2: GC deadlock reading a page that is used in jffs2_write_begin()
ubifs: read-only if LEB may always be taken in ubifs_garbage_collect
ubifs: fix double return leb in ubifs_garbage_collect
ubifs: fix slab-out-of-bounds in ubifs_change_lp
ubifs: fix snprintf() length check
ubifs: Document sysfs nodes
ubifs: Export filesystem error counters
ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers
ubifs: Make use of the helper macro kthread_run()
ubi: Fix a mistake in comment
ubifs: Fix spelling mistakes
This set includes the normal collection of minor fixes and
cleanups, new kmem caches for network messaging structs,
a start on some basic tracepoints, and some new debugfs
files for inserting test messages.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJh3J8BAAoJEDgbc8f8gGmqC0IP/0xCyLKDLmv1KcR/SbUxiP9Y
ZmiuF1FdxqRLPN5HfU1GU4734EbWEiIZFGXDk36b4kE3UpbeddUMA6dDf9sDOJ73
H5BYgNfSgzybIq1D3wZe4C3nH4YTSrqvZhAI2bD3mcNnUDx20LzUMaGqLkzJcsOa
iFDVDlkNrXee/RPSuZDv/5U+L5YEC7WlKEHJ2u/INAFitakb0i1ChJGAgrJZADCw
giznPhfkmWyesgfOxOQ7JcxcVQSedQ+7vtHYhZ5tZ5v1h2yM9LPYJ0bNpNKfHUiA
0gf0DIpOjt4q9ClYxCsaLeK0t32qWOjiNoZaAm+lrLvZWE75CC+WP6KrDR+2x3p5
JlgBZ24h3v7t/ldYmEo7SSZ/1lfb4+0fyk7UjPzko9ErhRFrAopAfl924bRLN3ES
9iSgDslBT5apivNwrByDKI9flhDpMnfWtSufnYeMzFAbHzJCWJuUS8bbi9M+K0mK
ENjmXQu1OVOmht8WM7HOA91tIUcLhr/YD2fKtYNx054TErKKRjlRb/V+NHffyYJc
SqqyaLsAamhMBcLrAk3a88hj6bp8N4NYwrMQMus0xv3aZ23OaGCvtrj7fXuYMlcu
sjLOTQfFFDPN1B1WvI18TDdh/TIgP/8AC+KbHf4Y+hbntO0BgEzL1/CKqT3WqN8y
hAnK+Avj6CILRvO8ardm
=Cst4
-----END PGP SIGNATURE-----
Merge tag 'dlm-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This set includes the normal collection of minor fixes and cleanups,
new kmem caches for network messaging structs, a start on some basic
tracepoints, and some new debugfs files for inserting test messages"
* tag 'dlm-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: (32 commits)
fs: dlm: print cluster addr if non-cluster node connects
fs: dlm: memory cache for lowcomms hotpath
fs: dlm: memory cache for writequeue_entry
fs: dlm: memory cache for midcomms hotpath
fs: dlm: remove wq_alloc mutex
fs: dlm: use event based wait for pending remove
fs: dlm: check for pending users filling buffers
fs: dlm: use list_empty() to check last iteration
fs: dlm: fix build with CONFIG_IPV6 disabled
fs: dlm: replace use of socket sk_callback_lock with sock_lock
fs: dlm: don't call kernel_getpeername() in error_report()
fs: dlm: fix potential buffer overflow
fs: dlm:Remove unneeded semicolon
fs: dlm: remove double list_first_entry call
fs: dlm: filter user dlm messages for kernel locks
fs: dlm: add lkb waiters debugfs functionality
fs: dlm: add lkb debugfs functionality
fs: dlm: allow create lkb with specific id range
fs: dlm: add debugfs rawmsg send functionality
fs: dlm: let handle callback data as void
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmHdubYUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpngxAAg+N74AZhPeUoPsvbVf4LKJ8eJRTY
4329sUjORwpy2g2wN1I53vOnpSjFQ0uWxaOwsuBTmTzM2lr9+J3w/a7Chx5q6M7D
o8dHh4BAwKGmXoCqMxvSiBtFiD/cht7YjjGGb+Q6+qa8viLky3Oksps9SsaZqMCf
RD+33fflp3eppPZTVgo9n+zuxbMz4wT9GpS/GNZy8GhKhPM+v5cPCw1pV0Wy1b7T
2FXsKBtk+odTzbfj88gQ+b2hoRMMQ2FLqIsqyJKajMuBwiBpuPzrm03J2B9N1OJ0
R0Ir0mXV627SoQ4N7v12MfmSF3+XMUTEMrdxMGbjsOz35TtspfOc5+mmveVF4DBD
T4zsK9ju6N/4OtsetOB3MF1AV5Hj0zOvuFZ1y71Bk9RBAJvLrzbHc17KtMEYbVgQ
XUQYY6IGYiVBr8n8x9t3LaPhOzYXwvgFwAmetZE8vYkLZQEQlgRVYJymFR8iluLo
oIPy1ciZgyMRDCa1/OHAIV604qwTChRLzO6WnDhdwWPwF/Sx4cHnhNVyR/5cp1ub
GG6E4o/jopeblftQHO5j+GIZAAwo8abU4ahJXUJaPQVdyY3VjOPhd+ooTrNbEEug
qijL1fIojcalDyCG2cIsXlFtXSb9Txr3L1fxOkluwmwVdR9LHJSEeDxYvIN20L/+
yqnHctyg9+HNsIc=
=icLi
-----END PGP SIGNATURE-----
Merge tag 'gfs2-v5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
"Various minor gfs2 cleanups and fixes"
* tag 'gfs2-v5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: dump inode object for iopen glocks
gfs2: Fix gfs2_instantiate description
gfs2: Remove redundant check for GLF_INSTANTIATE_NEEDED
gfs2: remove redundant set of INSTANTIATE_NEEDED
gfs2: Fix __gfs2_holder_init function name in kernel-doc comment
I was poking around in the directory code while diagnosing online fsck
bugs, and noticed that xfs_readdir doesn't actually take the directory
ILOCK when it calls xfs_dir2_isblock. xfs_dir_open most probably loaded
the data fork mappings and the VFS took i_rwsem (aka IOLOCK_SHARED) so
we're protected against writer threads, but we really need to follow the
locking model like we do in other places.
To avoid unnecessarily cycling the ILOCK for fairly small directories,
change the block/leaf _getdents functions to consume the ILOCK hold that
the parent readdir function took to decide on a _getdents implementation.
It is ok to cycle the ILOCK in readdir because the VFS takes the IOLOCK
in the appropriate mode during lookups and writes, and we don't want to
be holding the ILOCK when we copy directory entries to userspace in case
there's a page fault. We really only need it to protect against data
fork lookups, like we do for other files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls. In addition the usual
large number of clean ups and bug fixes, in particular for the
fast_commit feature.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmHcr8sACgkQ8vlZVpUN
gaOuMwgAgunXEni8yED1QXAp+F1nigxx8K6O6fpjzVjiUFlH/cgc3M5PEZiV7Pgj
rIsNliCLQjDQPA7XxInvMuDXEvGRdleaqGS5MCKBR9898acaCTn0eBCK8ZKfhlIs
tJus706kOjy8IXRgEJkCrxEdrsJ57rgqHk/bstipNVg4lnfYYSUKH8k0ob4aaoiW
irufs1dSo7KmpwL0c/z9yIei3ST4DA2Dc/eBS0hsdgKtr/zMmwSanxqbQJTXz8ra
FTZ23HAu/zxWZ9GM4p3DhuVaMWaulm6eEqJKO+loeU9xOeItn0WmSJW260KR0cmj
0nPAflMmpQVBhtO9gjXcSiORbd6/zg==
=9HNa
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Convert ext4 to use the new mount API, and add support for the
FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls.
In addition the usual large number of clean ups and bug fixes, in
particular for the fast_commit feature"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (48 commits)
ext4: don't use the orphan list when migrating an inode
ext4: use BUG_ON instead of if condition followed by BUG
ext4: fix a copy and paste typo
ext4: set csum seed in tmp inode while migrating to extents
ext4: remove unnecessary 'offset' assignment
ext4: remove redundant o_start statement
ext4: drop an always true check
ext4: remove unused assignments
ext4: remove redundant statement
ext4: remove useless resetting io_end_size in mpage_process_page()
ext4: allow to change s_last_trim_minblks via sysfs
ext4: change s_last_trim_minblks type to unsigned long
ext4: implement support for get/set fs label
ext4: only set EXT4_MOUNT_QUOTA when journalled quota file is specified
ext4: don't use kfree() on rcu protected pointer sbi->s_qf_names
ext4: avoid trim error on fs with small groups
ext4: fix an use-after-free issue about data=journal writeback mode
ext4: fix null-ptr-deref in '__ext4_journal_ensure_credits'
ext4: initialize err_blk before calling __ext4_get_inode_loc
ext4: fix a possible ABBA deadlock due to busy PA
...
- Fix log recovery with da btree buffers when metauuid is in use.
- Fix type coercion problems in xattr buffer size validation.
- Fix a bug in online scrub dir leaf bestcount checking.
- Only run COW recovery when recovering the log.
- Fix symlink target buffer UAF problems and symlink locking problems
by not exposing xfs innards to the VFS.
- Fix incorrect quotaoff lock usage.
- Don't let transactions cancel cleanly if they have deferred work
items attached.
- Fix a UAF when we're deciding if we need to relog an intent item.
- Reduce kvmalloc overhead for log shadow buffers.
- Clean up sysfs attr group usage.
- Fix a bug where scrub's bmap/rmap checking could race with a quota
file block allocation due to insufficient locking.
- Teach scrub to complain about invalid project ids.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHXOL0ACgkQ+H93GTRK
tOulDQ/+PwvFyROQePQo+5Z7fDb92VAKX6UxFYzWZGXj3zhRmTX7fvNSniPxKqHO
K0GMnZ9rJxvuo/MRuSEPILl+2rKh59Efo9IPcs2TR+xO+vI/EKnch2/gPLBN1y7Y
A4peQuK6CvUjYTyMazR2PcK8y76ZW8iXyuQMxnpmYldZBCZ+v7myVoVlPCRrZ68W
EoEQ/cDf4fmsycSG4Eh9WTddbqd9bfl1+jXEz/pzlbhJSvQOaqynR+21ahxhEQ/f
CzjSM4kXQ9AWTfHOPnEXs+Hf02NfcKRMuguzO69gJZhooEXJlgeYAZXUKqXpdmbC
2Tc+x/rJnQv9PtzEw/WoWMem4rhwjBX7cVi2E/SJHB46MLI5rjhQilIdWzecWd8q
nOrjIZlReI0vxYSqe1ifejxiCsV6gv3XdyLNacjxf7ISkU4XF0caO8f+3/ocIScV
E7k+XN13pArUiRZXOt8UL571Hoa6nhtHjyAUg6i62AFI2R9XYUEjEXKMwX7ovRVO
8eGdeQqioHkmOkF8HGyyurB9+itccuW8dtl7YOqEQJMGXeyk7rsvpcPahKKKoiDr
8w3bpZlNiOfZEnRcNjO9B/Egu6LLMudti2ptwlUvfvWsfa8VTO8v0h5pqVhwuwrk
lwQThWJ5THe/MptWybdZ3kSjhMvzqi4GO75T30Hj6XhecC3dFfo=
=RYEb
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.17-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
"The big new feature here is that the mount code now only bothers to
try to free stale COW staging extents if the fs unmounted uncleanly.
This should reduce mount times, particularly on filesystems supporting
reflink and containing a large number of allocation groups.
Everything else this cycle are bugfixes, as the iomap folios
conversion should be plenty enough excitement for anyone. That and I
ran out of brain bandwidth after Thanksgiving last year.
Summary:
- Fix log recovery with da btree buffers when metauuid is in use.
- Fix type coercion problems in xattr buffer size validation.
- Fix a bug in online scrub dir leaf bestcount checking.
- Only run COW recovery when recovering the log.
- Fix symlink target buffer UAF problems and symlink locking problems
by not exposing xfs innards to the VFS.
- Fix incorrect quotaoff lock usage.
- Don't let transactions cancel cleanly if they have deferred work
items attached.
- Fix a UAF when we're deciding if we need to relog an intent item.
- Reduce kvmalloc overhead for log shadow buffers.
- Clean up sysfs attr group usage.
- Fix a bug where scrub's bmap/rmap checking could race with a quota
file block allocation due to insufficient locking.
- Teach scrub to complain about invalid project ids"
* tag 'xfs-5.17-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: warn about inodes with project id of -1
xfs: hold quota inode ILOCK_EXCL until the end of dqalloc
xfs: Remove redundant assignment of mp
xfs: reduce kvmalloc overhead for CIL shadow buffers
xfs: sysfs: use default_groups in kobj_type
xfs: prevent UAF in xfs_log_item_in_current_chkpt
xfs: prevent a WARN_ONCE() in xfs_ioc_attr_list()
xfs: Fix comments mentioning xfs_ialloc
xfs: check sb_meta_uuid for dabuf buffer recovery
xfs: fix a bug in the online fsck directory leaf1 bestcount check
xfs: only run COW extent recovery when there are no live extents
xfs: don't expose internal symlink metadata buffers to the vfs
xfs: fix quotaoff mutex usage now that we don't support disabling it
xfs: shut down filesystem if we xfs_trans_cancel with deferred work items
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmHcX64ACgkQxWXV+ddt
WDsIyA//UGMG0waxz40RksQL+4AnTIJ+apmmxq5VGEE8y6aUPAGKm++5Cs9cC+Os
5xuZhNnD8hjB+eWH+tRCx8ll+T4g4skGKDCyD0fWNtZAlW8pnsIbztdO0nNYx9C0
V++vu/hQR6M8E7ORlayEKBWy2/UnBG5p/XVLPG4RJ4vMETJPl2RLWVDpu3dj09kf
YnD3AY0vmKEyCu/b9NtSzfZMO2/lXT2U41ezLJJfmPAXcMJ0EeSAazACVDQyq24p
wnnr6xmdo7ZR0oGFLUmBmfxbKwd3l0JIUsi/XRysXe+8y7raIE0gKCg7NpCS276T
BZrKmefxHhdMCA1HLlH6AkrKmQUgIaceLkXTanTYv4cnVzb6XoeV6R4IO9/JQ0yv
YsdCL7eZ4Vl3ToPlQkYWdwUNP5UjVMg5qMwxchigbwq7jViLJtiu3WKy0O0TitzB
n4o0cCdlv3lHRp8FS6cFmbCrsavFT8/q3vz/aRdkrojOKE0jEWVJiz0bf39j5BVO
IiJ3RF3kbpG7TZz0+eNUbgebME8zwaWfyd6t2L7Ztvx9F4elzSW95iwGc9//TZvl
ciNFI9LQvnZRymUItxg0HNXfbJMXJAE9ImTxA8HLkfQYpaBlwbt4avkQK56LhpN+
nFCd5cxJy5HLFIvkRDfqpF+C247p5FICzd/hs59/cXlCynulglg=
=9lg+
-----END PGP SIGNATURE-----
Merge tag 'for-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"This end of the year branch is intentionally not that exciting. Most
of the changes are under the hood, but there are some minor user
visible improvements and several performance improvements too.
Features:
- make send work with concurrent block group relocation.
We're not allowed to prevent send failing or silently producing
some bad stream but with more fine grained locking and checks it's
possible. The send vs deduplication exclusion could reuse the same
logic in the future.
- new exclusive operation 'balance paused' to allow adding a device
to filesystem with paused balance
- new sysfs file for fsid stored in the per-device directory to help
distinguish devices when seeding is enabled, the fsid may differ
from the one reported by the filesystem
Performance improvements:
- less metadata needed for directory logging, directory deletion is
20-40% faster
- in zoned mode, cache zone information during mount to speed up
repeated queries (about 50% speedup)
- free space tree entries get indexed and searched by size (latency
-30%, search run time -30%)
- less contention in tree node locking when inserting a key and no
splits are needed (files/sec in fsmark improves by 1-20%)
Fixes:
- fix ENOSPC failure when attempting direct IO write into NOCOW range
- fix deadlock between quota enable and other quota operations
- global reserve minimum calculations fixed to account for free space
tree
- in zoned mode, fix condition for chunk allocation that may not find
the right zone for reuse and could lead to early ENOSPC
Core:
- global reserve stealing got simplified and cleaned up in evict
- remove async transaction commit based on manual transaction refs,
reuse existing kthread and mechanisms to let it commit transaction
before timeout
- preparatory work for extent tree v2, add wrappers for global tree
roots, truncation path cleanups
- remove readahead framework, it's a bit overengineered and used only
for scrub, and yet it does not cover all its needs, there is
another readahead built in the b-tree search that is now used,
performance drop on HDD is about 5% which is acceptable and scrub
is often throttled anyway, on SSDs there's no reported drop but
slight improvement
- self tests report extent tree state when error occurs
- replace assert with debugging information when an uncommitted
transaction is found at unmount time
Other:
- error handling improvements
- other cleanups and refactoring"
* tag 'for-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (115 commits)
btrfs: output more debug messages for uncommitted transaction
btrfs: respect the max size in the header when activating swap file
btrfs: fix argument list that the kdoc format and script verified
btrfs: remove unnecessary parameter type from compression_decompress_bio
btrfs: selftests: dump extent io tree if extent-io-tree test failed
btrfs: scrub: cleanup the argument list of scrub_stripe()
btrfs: scrub: cleanup the argument list of scrub_chunk()
btrfs: remove reada infrastructure
btrfs: scrub: use btrfs_path::reada for extent tree readahead
btrfs: scrub: remove the unnecessary path parameter for scrub_raid56_parity()
btrfs: refactor unlock_up
btrfs: skip transaction commit after failure to create subvolume
btrfs: zoned: fix chunk allocation condition for zoned allocator
btrfs: add extent allocator hook to decide to allocate chunk or not
btrfs: zoned: unset dedicated block group on allocation failure
btrfs: zoned: drop redundant check for REQ_OP_ZONE_APPEND and btrfs_is_zoned
btrfs: zoned: sink zone check into btrfs_repair_one_zone
btrfs: zoned: simplify btrfs_check_meta_write_pointer
btrfs: zoned: encapsulate inode locking for zoned relocation
btrfs: sysfs: add devinfo/fsid to retrieve actual fsid from the device
...
- add sysfs interface and a sysfs node to control sync decompression;
- add tail-packing inline support for compressed files;
- get rid of erofs_get_meta_page().
-----BEGIN PGP SIGNATURE-----
iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYduH/BEceGlhbmdAa2Vy
bmVsLm9yZwAKCRA5NzHcH7XmBNPdAP9hKomD1hRiFeCWlLA1nDXYkqGbbt6+D3HT
cm4G7DgVBAD+O+RWv6JVYg1zAAFlKmxqEKEfoDLKI65wAjH1V/h/dQE=
=cEam
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, tail-packing data inline for compressed files is now
supported so that tail pcluster can be stored and read together with
inode metadata in order to save data I/O and storage space.
In addition to that, to prepare for the upcoming subpage, folio and
fscache features, we also introduce meta buffers to get rid of
erofs_get_meta_page() since it was too close to the page itself.
In addition, in order to show supported kernel features and control
sync decompression strategy, new sysfs nodes are introduced in this
cycle as well.
Summary:
- add sysfs interface and a sysfs node to control sync decompression
- add tail-packing inline support for compressed files
- get rid of erofs_get_meta_page()"
* tag 'erofs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: use meta buffers for zmap operations
erofs: use meta buffers for xattr operations
erofs: use meta buffers for super operations
erofs: use meta buffers for inode operations
erofs: introduce meta buffer operations
erofs: add on-disk compressed tail-packing inline support
erofs: support inline data decompression
erofs: support unaligned data decompression
erofs: introduce z_erofs_fixup_insize
erofs: tidy up z_erofs_lz4_decompress
erofs: clean up erofs_map_blocks tracepoints
erofs: Replace zero-length array with flexible-array member
erofs: add sysfs node to control sync decompression strategy
erofs: add sysfs interface
erofs: rename lz4_0pading to zero_padding
In 9p, afs ceph, and nfs, gfpflags_allow_blocking() (which wraps a
test for __GFP_DIRECT_RECLAIM being set) is used to determine if
->releasepage() should wait for the completion of a DIO write to fscache
with something like:
if (folio_test_fscache(folio)) {
if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
return false;
folio_wait_fscache(folio);
}
Instead, current_is_kswapd() should be used instead.
Note that this is based on a patch originally by Zhaoyang Huang[1]. In
addition to extending it to the other network filesystems and putting it on
top of my fscache rewrite, it also needs to include linux/swap.h in a bunch
of places. Can current_is_kswapd() be moved to linux/mm.h?
Changes
=======
ver #5:
- Dropping the changes for cifs.
Originally-signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Co-developed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Steve French <smfrench@gmail.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: linux-cachefs@redhat.com
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/1638952658-20285-1-git-send-email-huangzhaoyang@gmail.com/ [1]
Link: https://lore.kernel.org/r/164021590773.640689.16777975200823659231.stgit@warthog.procyon.org.uk/ # v4
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYdRCkgAKCRCRxhvAZXjc
olrvAQCdp8LWkT8TauJSl8wmUm3mZhNy+5+fXuCUSwe3PyUtTQEAq4fxm41JpG8u
WCZTrrxVhaXwgUY3aWzzeQnLCZjtEQw=
=woqV
-----END PGP SIGNATURE-----
Merge tag 'fs.idmapped.v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull fs idmapping updates from Christian Brauner:
"This contains the work to enable the idmapping infrastructure to
support idmapped mounts of filesystems mounted with an idmapping.
In addition this contains various cleanups that avoid repeated
open-coding of the same functionality and simplify the code in quite a
few places.
We also finish the renaming of the mapping helpers we started a few
kernel releases back and move them to a dedicated header to not
continue polluting the fs header needlessly with low-level idmapping
helpers. With this series the fs header only contains idmapping
helpers that interact with fs objects.
Currently we only support idmapped mounts for filesystems mounted
without an idmapping themselves. This was a conscious decision
mentioned in multiple places (cf. [1]).
As explained at length in [3] it is perfectly fine to extend support
for idmapped mounts to filesystem's mounted with an idmapping should
the need arise. The need has been there for some time now (cf. [2]).
Before we can port any filesystem that is mountable with an idmapping
to support idmapped mounts in the coming cycles, we need to first
extend the mapping helpers to account for the filesystem's idmapping.
This again, is explained at length in our documentation at [3] and
also in the individual commit messages so here's an overview.
Currently, the low-level mapping helpers implement the remapping
algorithms described in [3] in a simplified manner as we could rely on
the fact that all filesystems supporting idmapped mounts are mounted
without an idmapping.
In contrast, filesystems mounted with an idmapping are very likely to
not use an identity mapping and will instead use a non-identity
mapping. So the translation step from or into the filesystem's
idmapping in the remapping algorithm cannot be skipped for such
filesystems.
Non-idmapped filesystems and filesystems not supporting idmapped
mounts are unaffected by this change as the remapping algorithms can
take the same shortcut as before. If the low-level helpers detect that
they are dealing with an idmapped mount but the underlying filesystem
is mounted without an idmapping we can rely on the previous shortcut
and can continue to skip the translation step from or into the
filesystem's idmapping. And of course, if the low-level helpers detect
that they are not dealing with an idmapped mount they can simply
return the relevant id unchanged; no remapping needs to be performed
at all.
These checks guarantee that only the minimal amount of work is
performed. As before, if idmapped mounts aren't used the low-level
helpers are idempotent and no work is performed at all"
Link: 2ca4dcc490 ("fs/mount_setattr: tighten permission checks") [1]
Link: https://github.com/containers/podman/issues/10374 [2]
Link: Documentations/filesystems/idmappings.rst [3]
Link: a65e58e791 ("fs: document and rename fsid helpers") [4]
* tag 'fs.idmapped.v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
fs: support mapped mounts of mapped filesystems
fs: add i_user_ns() helper
fs: port higher-level mapping helpers
fs: remove unused low-level mapping helpers
fs: use low-level mapping helpers
docs: update mapping documentation
fs: account for filesystem mappings
fs: tweak fsuidgid_has_mapping()
fs: move mapping helpers
fs: add is_idmapped_mnt() helper
A task can end up indefinitely sleeping in do_select() ->
poll_schedule_timeout() when the following race happens:
TASK1 (thread1) TASK2 TASK1 (thread2)
do_select()
setup poll_wqueues table
with 'fd'
write data to 'fd'
pollwake()
table->triggered = 1
closes 'fd' thread1 is
waiting for
poll_schedule_timeout()
- sees table->triggered
table->triggered = 0
return -EINTR
loop back in do_select()
But at this point when TASK1 loops back, the fdget() in the setup of
poll_wqueues fails. So now so we never find 'fd' is ready for reading
and sleep in poll_schedule_timeout() indefinitely.
Treat an fd that got closed as a fd on which some event happened. This
makes sure cannot block indefinitely in do_select().
Another option would be to return -EBADF in this case but that has a
potential of subtly breaking applications that excercise this behavior
and it happens to work for them. So returning fd as active seems like a
safer choice.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Before this patch, glock dumps would not dump the gl_object for iopen
glocks. This information can help us debug problems related to eviction:
when AN iopen glock is blocked we can see the status of its underlying
inode and its flags, etc.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reading from a file that was just extended by a write, but the write had
not yet reached the server would return ENODATA as illustrated by this
command:
$ xfs_io -c 'open -ft test' -c 'w 4096 1000' -c 'r 0 1000'
wrote 1000/1000 bytes at offset 4096
1000.000000 bytes, 1 ops; 0.0001 sec (5.610 MiB/sec and 5882.3529 ops/sec)
pread: No data available
Fix this case by having netfs assume zeroes when reads from server come
short like AFS and CEPH do
Link: https://lkml.kernel.org/r/20220110111444.926753-1-asmadeus@codewreck.org
Cc: stable@vger.kernel.org
Fixes: eb497943fa ("9p: Convert to using the netfs helper lib to do reads and caching")
Co-authored-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Core
----
- Defer freeing TCP skbs to the BH handler, whenever possible,
or at least perform the freeing outside of the socket lock section
to decrease cross-CPU allocator work and improve latency.
- Add netdevice refcount tracking to locate sources of netdevice
and net namespace refcount leaks.
- Make Tx watchdog less intrusive - avoid pausing Tx and restarting
all queues from a single CPU removing latency spikes.
- Various small optimizations throughout the stack from Eric Dumazet.
- Make netdev->dev_addr[] constant, force modifications to go via
appropriate helpers to allow us to keep addresses in ordered data
structures.
- Replace unix_table_lock with per-hash locks, improving performance
of bind() calls.
- Extend skb drop tracepoint with a drop reason.
- Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.
BPF
---
- New helpers:
- bpf_find_vma(), find and inspect VMAs for profiling use cases
- bpf_loop(), runtime-bounded loop helper trading some execution
time for much faster (if at all converging) verification
- bpf_strncmp(), improve performance, avoid compiler flakiness
- bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
for tracing programs, all inlined by the verifier
- Support BPF relocations (CO-RE) in the kernel loader.
- Further the support for BTF_TYPE_TAG annotations.
- Allow access to local storage in sleepable helpers.
- Convert verifier argument types to a composable form with different
attributes which can be shared across types (ro, maybe-null).
- Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
creating new, extensible ones where missing and deprecating those
to be removed.
Protocols
---------
- WiFi (mac80211/cfg80211):
- notify user space about long "come back in N" AP responses,
allow it to react to such temporary rejections
- allow non-standard VHT MCS 10/11 rates
- use coarse time in airtime fairness code to save CPU cycles
- Bluetooth:
- rework of HCI command execution serialization to use a common
queue and work struct, and improve handling errors reported
in the middle of a batch of commands
- rework HCI event handling to use skb_pull_data, avoiding packet
parsing pitfalls
- support AOSP Bluetooth Quality Report
- SMC:
- support net namespaces, following the RDMA model
- improve connection establishment latency by pre-clearing buffers
- introduce TCP ULP for automatic redirection to SMC
- Multi-Path TCP:
- support ioctls: SIOCINQ, OUTQ, and OUTQNSD
- support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
- support cmsgs: TCP_INQ
- improvements in the data scheduler (assigning data to subflows)
- support fastclose option (quick shutdown of the full MPTCP
connection, similar to TCP RST in regular TCP)
- MCTP (Management Component Transport) over serial, as defined by
DMTF spec DSP0253 - "MCTP Serial Transport Binding".
Driver API
----------
- Support timestamping on bond interfaces in active/passive mode.
- Introduce generic phylink link mode validation for drivers which
don't have any quirks and where MAC capability bits fully express
what's supported. Allow PCS layer to participate in the validation.
Convert a number of drivers.
- Add support to set/get size of buffers on the Rx rings and size of
the tx copybreak buffer via ethtool.
- Support offloading TC actions as first-class citizens rather than
only as attributes of filters, improve sharing and device resource
utilization.
- WiFi (mac80211/cfg80211):
- support forwarding offload (ndo_fill_forward_path)
- support for background radar detection hardware
- SA Query Procedures offload on the AP side
New hardware / drivers
----------------------
- tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
real-time requirements for isochronous communication with protocols
like OPC UA Pub/Sub.
- Qualcomm BAM-DMUX WWAN - driver for data channels of modems
integrated into many older Qualcomm SoCs, e.g. MSM8916 or
MSM8974 (qcom_bam_dmux).
- Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch
driver with support for bridging, VLANs and multicast forwarding
(lan966x).
- iwlmei driver for co-operating between Intel's WiFi driver and
Intel's Active Management Technology (AMT) devices.
- mse102x - Vertexcom MSE102x Homeplug GreenPHY chips
- Bluetooth:
- MediaTek MT7921 SDIO devices
- Foxconn MT7922A
- Realtek RTL8852AE
Drivers
-------
- Significantly improve performance in the datapaths of:
lan78xx, ax88179_178a, lantiq_xrx200, bnxt.
- Intel Ethernet NICs:
- igb: support PTP/time PEROUT and EXTTS SDP functions on
82580/i354/i350 adapters
- ixgbevf: new PF -> VF mailbox API which avoids the risk of
mailbox corruption with ESXi
- iavf: support configuration of VLAN features of finer granularity,
stacked tags and filtering
- ice: PTP support for new E822 devices with sub-ns precision
- ice: support firmware activation without reboot
- Mellanox Ethernet NICs (mlx5):
- expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
- support TC forwarding when tunnel encap and decap happen between
two ports of the same NIC
- dynamically size and allow disabling various features to save
resources for running in embedded / SmartNIC scenarios
- Broadcom Ethernet NICs (bnxt):
- use page frag allocator to improve Rx performance
- expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
- Other Ethernet NICs:
- amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support
- Microsoft cloud/virtual NIC (mana):
- add XDP support (PASS, DROP, TX)
- Mellanox Ethernet switches (mlxsw):
- initial support for Spectrum-4 ASICs
- VxLAN with IPv6 underlay
- Marvell Ethernet switches (prestera):
- support flower flow templates
- add basic IP forwarding support
- NXP embedded Ethernet switches (ocelot & felix):
- support Per-Stream Filtering and Policing (PSFP)
- enable cut-through forwarding between ports by default
- support FDMA to improve packet Rx/Tx to CPU
- Other embedded switches:
- hellcreek: improve trapping management (STP and PTP) packets
- qca8k: support link aggregation and port mirroring
- Qualcomm 802.11ax WiFi (ath11k):
- qca6390, wcn6855: enable 802.11 power save mode in station mode
- BSS color change support
- WCN6855 hw2.1 support
- 11d scan offload support
- scan MAC address randomization support
- full monitor mode, only supported on QCN9074
- qca6390/wcn6855: report signal and tx bitrate
- qca6390: rfkill support
- qca6390/wcn6855: regdb.bin support
- Intel WiFi (iwlwifi):
- support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
in cooperation with the BIOS
- support for Optimized Connectivity Experience (OCE) scan
- support firmware API version 68
- lots of preparatory work for the upcoming Bz device family
- MediaTek WiFi (mt76):
- Specific Absorption Rate (SAR) support
- mt7921: 160 MHz channel support
- RealTek WiFi (rtw88):
- Specific Absorption Rate (SAR) support
- scan offload
- Other WiFi NICs
- ath10k: support fetching (pre-)calibration data from nvmem
- brcmfmac: configure keep-alive packet on suspend
- wcn36xx: beacon filter support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmHbkZAACgkQMUZtbf5S
IruYkQ//XX7BggcwBfukPK83j0dONolClijqKcKR08g4vB5L8GXvv6OErKIWrh4k
h8JanCH352ZkbCSw3MvFdm825UYQv8vPMd6Qks/LJ4aSKqCuy4MIlAo+yOw4Km3O
i7++lRfma6DqHHI59wvLjWoxZSPu8lL+rI8UsZ5qMOlnNlGAOXsNrzRjaqQ3FddY
AMxZeBUtrPqUCCQZFq3U8apkYzUp7CA/3XR9zRcja3uPbrtOV2G+4whRF90qGNWz
Tm/QvJ9F/Ab292cbhxR4KuaQ3hUhaCQyDjbZk3+FZzZpAVhYTVqcNjny6+yXmbiP
NXRtwemnl1NlWKMnJM8lEeY48u626tRIkxA/Wtd61uoO5uKUSxfGP+UpUi+DfXbF
yIw50VQ7L2bpxXP/HjtmhVgZDaWKYyh22Zw4Hp/muMJz0hgUB0KODY3tf2jUWbjJ
0oEgocWyzhhwMQKqupTDCIaRgIs2ewYr4ZrFDhI3HnHC/vv1VjoPRUPIyxwppD2N
cXvZb3B1sWK8iX5gCbISGzyU4bB7I0rvJSTU42ueti7n6NqRFZ79qHQpYnnY+JdO
z1qOwY/d/yWfBoXVKRtRg2qz6CdEt5BQklwAgVEBgrFpf58gp694EwGMb1htY14J
r/k9bVpmyIFpUnBH2CPMRfBVA3tUTqzyzzFV4AMw40NYLKmhLdo=
=KLm3
-----END PGP SIGNATURE-----
Merge tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core
----
- Defer freeing TCP skbs to the BH handler, whenever possible, or at
least perform the freeing outside of the socket lock section to
decrease cross-CPU allocator work and improve latency.
- Add netdevice refcount tracking to locate sources of netdevice and
net namespace refcount leaks.
- Make Tx watchdog less intrusive - avoid pausing Tx and restarting
all queues from a single CPU removing latency spikes.
- Various small optimizations throughout the stack from Eric Dumazet.
- Make netdev->dev_addr[] constant, force modifications to go via
appropriate helpers to allow us to keep addresses in ordered data
structures.
- Replace unix_table_lock with per-hash locks, improving performance
of bind() calls.
- Extend skb drop tracepoint with a drop reason.
- Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.
BPF
---
- New helpers:
- bpf_find_vma(), find and inspect VMAs for profiling use cases
- bpf_loop(), runtime-bounded loop helper trading some execution
time for much faster (if at all converging) verification
- bpf_strncmp(), improve performance, avoid compiler flakiness
- bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
for tracing programs, all inlined by the verifier
- Support BPF relocations (CO-RE) in the kernel loader.
- Further the support for BTF_TYPE_TAG annotations.
- Allow access to local storage in sleepable helpers.
- Convert verifier argument types to a composable form with different
attributes which can be shared across types (ro, maybe-null).
- Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
creating new, extensible ones where missing and deprecating those
to be removed.
Protocols
---------
- WiFi (mac80211/cfg80211):
- notify user space about long "come back in N" AP responses,
allow it to react to such temporary rejections
- allow non-standard VHT MCS 10/11 rates
- use coarse time in airtime fairness code to save CPU cycles
- Bluetooth:
- rework of HCI command execution serialization to use a common
queue and work struct, and improve handling errors reported in
the middle of a batch of commands
- rework HCI event handling to use skb_pull_data, avoiding packet
parsing pitfalls
- support AOSP Bluetooth Quality Report
- SMC:
- support net namespaces, following the RDMA model
- improve connection establishment latency by pre-clearing buffers
- introduce TCP ULP for automatic redirection to SMC
- Multi-Path TCP:
- support ioctls: SIOCINQ, OUTQ, and OUTQNSD
- support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
- support cmsgs: TCP_INQ
- improvements in the data scheduler (assigning data to subflows)
- support fastclose option (quick shutdown of the full MPTCP
connection, similar to TCP RST in regular TCP)
- MCTP (Management Component Transport) over serial, as defined by
DMTF spec DSP0253 - "MCTP Serial Transport Binding".
Driver API
----------
- Support timestamping on bond interfaces in active/passive mode.
- Introduce generic phylink link mode validation for drivers which
don't have any quirks and where MAC capability bits fully express
what's supported. Allow PCS layer to participate in the validation.
Convert a number of drivers.
- Add support to set/get size of buffers on the Rx rings and size of
the tx copybreak buffer via ethtool.
- Support offloading TC actions as first-class citizens rather than
only as attributes of filters, improve sharing and device resource
utilization.
- WiFi (mac80211/cfg80211):
- support forwarding offload (ndo_fill_forward_path)
- support for background radar detection hardware
- SA Query Procedures offload on the AP side
New hardware / drivers
----------------------
- tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
real-time requirements for isochronous communication with protocols
like OPC UA Pub/Sub.
- Qualcomm BAM-DMUX WWAN - driver for data channels of modems
integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974
(qcom_bam_dmux).
- Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver
with support for bridging, VLANs and multicast forwarding
(lan966x).
- iwlmei driver for co-operating between Intel's WiFi driver and
Intel's Active Management Technology (AMT) devices.
- mse102x - Vertexcom MSE102x Homeplug GreenPHY chips
- Bluetooth:
- MediaTek MT7921 SDIO devices
- Foxconn MT7922A
- Realtek RTL8852AE
Drivers
-------
- Significantly improve performance in the datapaths of: lan78xx,
ax88179_178a, lantiq_xrx200, bnxt.
- Intel Ethernet NICs:
- igb: support PTP/time PEROUT and EXTTS SDP functions on
82580/i354/i350 adapters
- ixgbevf: new PF -> VF mailbox API which avoids the risk of
mailbox corruption with ESXi
- iavf: support configuration of VLAN features of finer
granularity, stacked tags and filtering
- ice: PTP support for new E822 devices with sub-ns precision
- ice: support firmware activation without reboot
- Mellanox Ethernet NICs (mlx5):
- expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
- support TC forwarding when tunnel encap and decap happen between
two ports of the same NIC
- dynamically size and allow disabling various features to save
resources for running in embedded / SmartNIC scenarios
- Broadcom Ethernet NICs (bnxt):
- use page frag allocator to improve Rx performance
- expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
- Other Ethernet NICs:
- amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support
- Microsoft cloud/virtual NIC (mana):
- add XDP support (PASS, DROP, TX)
- Mellanox Ethernet switches (mlxsw):
- initial support for Spectrum-4 ASICs
- VxLAN with IPv6 underlay
- Marvell Ethernet switches (prestera):
- support flower flow templates
- add basic IP forwarding support
- NXP embedded Ethernet switches (ocelot & felix):
- support Per-Stream Filtering and Policing (PSFP)
- enable cut-through forwarding between ports by default
- support FDMA to improve packet Rx/Tx to CPU
- Other embedded switches:
- hellcreek: improve trapping management (STP and PTP) packets
- qca8k: support link aggregation and port mirroring
- Qualcomm 802.11ax WiFi (ath11k):
- qca6390, wcn6855: enable 802.11 power save mode in station mode
- BSS color change support
- WCN6855 hw2.1 support
- 11d scan offload support
- scan MAC address randomization support
- full monitor mode, only supported on QCN9074
- qca6390/wcn6855: report signal and tx bitrate
- qca6390: rfkill support
- qca6390/wcn6855: regdb.bin support
- Intel WiFi (iwlwifi):
- support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
in cooperation with the BIOS
- support for Optimized Connectivity Experience (OCE) scan
- support firmware API version 68
- lots of preparatory work for the upcoming Bz device family
- MediaTek WiFi (mt76):
- Specific Absorption Rate (SAR) support
- mt7921: 160 MHz channel support
- RealTek WiFi (rtw88):
- Specific Absorption Rate (SAR) support
- scan offload
- Other WiFi NICs
- ath10k: support fetching (pre-)calibration data from nvmem
- brcmfmac: configure keep-alive packet on suspend
- wcn36xx: beacon filter support"
* tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits)
tcp: tcp_send_challenge_ack delete useless param `skb`
net/qla3xxx: Remove useless DMA-32 fallback configuration
rocker: Remove useless DMA-32 fallback configuration
hinic: Remove useless DMA-32 fallback configuration
lan743x: Remove useless DMA-32 fallback configuration
net: enetc: Remove useless DMA-32 fallback configuration
cxgb4vf: Remove useless DMA-32 fallback configuration
cxgb4: Remove useless DMA-32 fallback configuration
cxgb3: Remove useless DMA-32 fallback configuration
bnx2x: Remove useless DMA-32 fallback configuration
et131x: Remove useless DMA-32 fallback configuration
be2net: Remove useless DMA-32 fallback configuration
vmxnet3: Remove useless DMA-32 fallback configuration
bna: Simplify DMA setting
net: alteon: Simplify DMA setting
myri10ge: Simplify DMA setting
qlcnic: Simplify DMA setting
net: allwinner: Fix print format
page_pool: remove spinlock in page_pool_refill_alloc_cache()
amt: fix wrong return type of amt_send_membership_update()
...
When killing ksmbd server after connecting rdma, ksmbd threads does not
terminate properly because the rdma connection is still alive.
This patch add shutdown operation to disconnect rdma connection while
ksmbd threads terminate.
Signed-off-by: Yufan Chen <wiz.chen@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Due to restriction that cannot handle multiple
buffer descriptor structures, decrease the maximum
read/write size for Windows clients.
And set the maximum fragmented receive size
in consideration of the receive queue size.
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>