The order of these two parameters is just reversed. gcc didn't warn on
that, probably because 'void *' can be converted from or to other
pointer types without warning.
Cc: stable@vger.kernel.org
Fixes: 3d3c950467 ("netfs: Provide readahead and readpage netfs helpers")
Fixes: e1b1240c1f ("netfs: Add write_begin helper")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Link: https://lore.kernel.org/r/20211207031449.100510-1-jefflexu@linux.alibaba.com/ # v1
Taking sb_writers whilst holding mmap_lock isn't allowed and will result in
a lockdep warning like that below. The problem comes from cachefiles
needing to take the sb_writers lock in order to do a write to the cache,
but being asked to do this by netfslib called from readpage, readahead or
write_begin[1].
Fix this by always offloading the write to the cache off to a worker
thread. The main thread doesn't need to wait for it, so deadlock can be
avoided.
This can be tested by running the quick xfstests on something like afs or
ceph with lockdep enabled.
WARNING: possible circular locking dependency detected
5.15.0-rc1-build2+ #292 Not tainted
------------------------------------------------------
holetest/65517 is trying to acquire lock:
ffff88810c81d730 (mapping.invalidate_lock#3){.+.+}-{3:3}, at: filemap_fault+0x276/0x7a5
but task is already holding lock:
ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&mm->mmap_lock#2){++++}-{3:3}:
validate_chain+0x3c4/0x4a8
__lock_acquire+0x89d/0x949
lock_acquire+0x2dc/0x34b
__might_fault+0x87/0xb1
strncpy_from_user+0x25/0x18c
removexattr+0x7c/0xe5
__do_sys_fremovexattr+0x73/0x96
do_syscall_64+0x67/0x7a
entry_SYSCALL_64_after_hwframe+0x44/0xae
-> #1 (sb_writers#10){.+.+}-{0:0}:
validate_chain+0x3c4/0x4a8
__lock_acquire+0x89d/0x949
lock_acquire+0x2dc/0x34b
cachefiles_write+0x2b3/0x4bb
netfs_rreq_do_write_to_cache+0x3b5/0x432
netfs_readpage+0x2de/0x39d
filemap_read_page+0x51/0x94
filemap_get_pages+0x26f/0x413
filemap_read+0x182/0x427
new_sync_read+0xf0/0x161
vfs_read+0x118/0x16e
ksys_read+0xb8/0x12e
do_syscall_64+0x67/0x7a
entry_SYSCALL_64_after_hwframe+0x44/0xae
-> #0 (mapping.invalidate_lock#3){.+.+}-{3:3}:
check_noncircular+0xe4/0x129
check_prev_add+0x16b/0x3a4
validate_chain+0x3c4/0x4a8
__lock_acquire+0x89d/0x949
lock_acquire+0x2dc/0x34b
down_read+0x40/0x4a
filemap_fault+0x276/0x7a5
__do_fault+0x96/0xbf
do_fault+0x262/0x35a
__handle_mm_fault+0x171/0x1b5
handle_mm_fault+0x12a/0x233
do_user_addr_fault+0x3d2/0x59c
exc_page_fault+0x85/0xa5
asm_exc_page_fault+0x1e/0x30
other info that might help us debug this:
Chain exists of:
mapping.invalidate_lock#3 --> sb_writers#10 --> &mm->mmap_lock#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&mm->mmap_lock#2);
lock(sb_writers#10);
lock(&mm->mmap_lock#2);
lock(mapping.invalidate_lock#3);
*** DEADLOCK ***
1 lock held by holetest/65517:
#0: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c
stack backtrace:
CPU: 0 PID: 65517 Comm: holetest Not tainted 5.15.0-rc1-build2+ #292
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
dump_stack_lvl+0x45/0x59
check_noncircular+0xe4/0x129
? print_circular_bug+0x207/0x207
? validate_chain+0x461/0x4a8
? add_chain_block+0x88/0xd9
? hlist_add_head_rcu+0x49/0x53
check_prev_add+0x16b/0x3a4
validate_chain+0x3c4/0x4a8
? check_prev_add+0x3a4/0x3a4
? mark_lock+0xa5/0x1c6
__lock_acquire+0x89d/0x949
lock_acquire+0x2dc/0x34b
? filemap_fault+0x276/0x7a5
? rcu_read_unlock+0x59/0x59
? add_to_page_cache_lru+0x13c/0x13c
? lock_is_held_type+0x7b/0xd3
down_read+0x40/0x4a
? filemap_fault+0x276/0x7a5
filemap_fault+0x276/0x7a5
? pagecache_get_page+0x2dd/0x2dd
? __lock_acquire+0x8bc/0x949
? pte_offset_kernel.isra.0+0x6d/0xc3
__do_fault+0x96/0xbf
? do_fault+0x124/0x35a
do_fault+0x262/0x35a
? handle_pte_fault+0x1c1/0x20d
__handle_mm_fault+0x171/0x1b5
? handle_pte_fault+0x20d/0x20d
? __lock_release+0x151/0x254
? mark_held_locks+0x1f/0x78
? rcu_read_unlock+0x3a/0x59
handle_mm_fault+0x12a/0x233
do_user_addr_fault+0x3d2/0x59c
? pgtable_bad+0x70/0x70
? rcu_read_lock_bh_held+0xab/0xab
exc_page_fault+0x85/0xa5
? asm_exc_page_fault+0x8/0x30
asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x40192f
Code: ff 48 89 c3 48 8b 05 50 28 00 00 48 85 ed 7e 23 31 d2 4b 8d 0c 2f eb 0a 0f 1f 00 48 8b 05 39 28 00 00 48 0f af c2 48 83 c2 01 <48> 89 1c 01 48 39 d5 7f e8 8b 0d f2 27 00 00 31 c0 85 c9 74 0e 8b
RSP: 002b:00007f9931867eb0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00007f9931868700 RCX: 00007f993206ac00
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007ffc13e06ee0
RBP: 0000000000000100 R08: 0000000000000000 R09: 00007f9931868700
R10: 00007f99318689d0 R11: 0000000000000202 R12: 00007ffc13e06ee0
R13: 0000000000000c00 R14: 00007ffc13e06e00 R15: 00007f993206a000
Fixes: 726218fdc2 ("netfs: Define an interface to talk to a cache")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
cc: Jan Kara <jack@suse.cz>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20210922110420.GA21576@quack2.suse.cz/ [1]
Link: https://lore.kernel.org/r/163887597541.1596626.2668163316598972956.stgit@warthog.procyon.org.uk/ # v1
There's a small race here where the task_work could finish and drop
the worker itself, so that by the time that task_work_add() returns
with a successful addition we've already put the worker.
The worker callbacks clear this bit themselves, so we don't actually
need to manually clear it in the caller. Get rid of it.
Reported-by: syzbot+b60c982cb0efc5e05a47@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmGr0HsACgkQiiy9cAdy
T1FXmgv+LxMKh3FtGdLyzeJ9OJeUP87WtHCn8Tek5b0BUYxBrC12lzdLuYl8+zQr
O09RcJ8THZwWYKqTDlkaQg4NnNeMEVeouottGIEQPuBztLVWNmvXD6g5c1YPhYMI
0AfeUbkCQ9Gb+zoAJtXTLMCkqYj/x+jVrgO56V12gjIfZCB1XRPhnhJOao2BRo88
go35pSn8l3lv8dLeQxUCBZ73+z/R8Kwk/8NgkFPT/N7vQwuSK7Ax27h8pyV0m8Zx
lziNBiVM0HRTB5WREJsMWVD8NRcm8e+Xf7bFjhR0FbT7U3cSy4kLOPTd9RDKxxkj
sGI56r1OoRCQFrv456kz6SPLLYx2ecP9WzDp6ghOvGENJ3UYFoUcv0L9+aQL1gyy
Gxn+bTng6TMHxjpzhRgS4pJg3TME2US4HQepsRrrNZ0FXM67Gt4pFwoxtA/ZLy4X
WyuROJ1Xy46G7Yn0glTPWGJ3yDDud88etLdF3RbH+wWtB88WvBtIjyPQ0QCEOFpU
0UMD4ow9
=hChf
-----END PGP SIGNATURE-----
Merge tag '5.16-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Three SMB3 multichannel/fscache fixes and a DFS fix.
In testing multichannel reconnect scenarios recently various problems
with the cifs.ko implementation of fscache were found (e.g. incorrect
initialization of fscache cookies in some cases)"
* tag '5.16-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: avoid use of dstaddr as key for fscache client cookie
cifs: add server conn_id to fscache client cookie
cifs: wait for tcon resource_id before getting fscache super
cifs: fix missed refcounting of ipc tcon
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmGqehoQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpo+jD/4kPAd6kkKBnYWxBhnjZUuL5pio7G9wVM5a
OFDwJ8gh4blp+eqSFf7RnsTiP2xSIjOykBtCSl35jiUql1KPc0SN/rE9I+mxuTfj
k1nW+hTaFeQXUL3Pz8oP+vwfWhfCmgPPH5hrNd9n6vmNas1JH29ZOK5Z2AQU7OLl
h1a6Kwm7Jqmi346ZSOfdr4xCMVzXrUpZeN2AFEID9/rvpznKiENoMdO8dsDypgjc
bRLQIP3X10qSu1KglRJo1+zGFDB/xKOPZ4EdoD0xNFHhUJQpe1YE0wVCliheCmcq
kLVCPBeqh5UTW+orBfqVEI2UQlhhaqLW1Yq1zRMm35aS8tMQIaTfWDdvCO+Iq2Le
0VLEWCLohP17nRxlJETxxzOPfCw/vE/ZxygLD5TRgZGdNB1+A1PV4s565fhvFE8t
yRMhj4hYAY2XTBs44erJHJqBabt2VXLNw4A69DpdlDex+65lyNqGDez6x3VEKfZl
/4ohF3XCAeklZxegK/06EQdgwqx7bovSEnAIFZs4jrdcTqZmubWoIyLLFOG8jC68
ey50VOYZJhET+vb0KUn+zK2shcHJu5qSGH73ARHJ6pJpRsdEAdLaigTCSOKpUayA
Ml/jg2JIGIeUAsfsA5mhM6kTJ+ymc1fM7CQUkH1SSXB+qOUbsML4FKhkY7WL7vH2
IJ3zGtTNLw==
=Xy6S
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.16-2021-12-03' of git://git.kernel.dk/linux-block
Pull io_uring fix from Jens Axboe:
"Just a single fix preventing repeated retries of task_work based io-wq
thread creation, fixing a regression from when io-wq was made more (a
bit too much) resilient against signals"
* tag 'io_uring-5.16-2021-12-03' of git://git.kernel.dk/linux-block:
io-wq: don't retry task_work creation failure on fatal conditions
* Since commit 486408d690 ("gfs2: Cancel remote delete work
asynchronously"), inode create and lookup-by-number can overlap more
easily and we can end up with temporary duplicate inodes. Fix the
code to prevent that.
* Fix a BUG demoting weak glock holders from a remote node.
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmGrPtYUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrtgA/+MESXTK8h/HK+M+CAnsJ5jFSVXV8W
wmhhOtqikNGGruxnTJdkipe9PcPRsdwh4XLtmbfGairXnmFqJ+oEG8oRk/Tqoap8
9RxWN+B+Wzh79mNvI3Il3RtvUJGNQGrfDSuT5F4jZ6eZSfjsyY3cgO2KMt2TN5sL
7WXwYnj4VyPa8TdkSFIhfxwiXyal7aC8BVS6UX83CUIrhJvNG70A3hqVrIuKymUf
dNiLWURdQ2oiDnX7hMjq/RnX84zAcp2B2wfKZVsX0ZcIqVdmHS5Er4PKqffEVOzq
nf2SU4pAo4q7L7/66TdUtGoRgUSK74O83VTqsb1Lex5roON7unmVsjKyc7dmKUrR
xt2Is4/OsCSZEsErWdMgtqer3fOxG1oGdSHzmgdRwqA22+Im5COTKjOBJ+3MXHaW
E/36FGYYiugOLQn8hrqTqp3MIdPoZe8B/4ZF82qu6qYBQiA3PIHMlfElym5+45oR
fqWLcwids4LV5esHK3YzZdQ3f9+FVIvcBayUUx71ZCD0QkNDUzTnckb5vTxwF3r1
zU/jIsv84dvQiTsCMhTxQq+JDY98V544sapa+0k7chwtwpTdeajTHjocECGRKE1E
o5vCWX6gAiJQ/5RtgfBS2DWXrBQNueUsYpBQM9jp2fr6kxK+Hc5MT8JMdZWw7Rb7
1ZDw9PNs8TpByzc=
=7HvB
-----END PGP SIGNATURE-----
Merge tag 'gfs2-v5.16-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
- Since commit 486408d690 ("gfs2: Cancel remote delete work
asynchronously"), inode create and lookup-by-number can overlap more
easily and we can end up with temporary duplicate inodes. Fix the
code to prevent that.
- Fix a BUG demoting weak glock holders from a remote node.
* tag 'gfs2-v5.16-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: gfs2_create_inode rework
gfs2: gfs2_inode_lookup rework
gfs2: gfs2_inode_lookup cleanup
gfs2: Fix remote demote of weak glock holders
server->dstaddr can change when the DNS mapping for the
server hostname changes. But conn_id is a u64 counter
that is incremented each time a new TCP connection
is setup. So use only that as a key.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
The fscache client cookie uses the server address
(and port) as the cookie key. This is a problem when
nosharesock is used. Two different connections will
use duplicate cookies. Avoid this by adding
server->conn_id to the key, so that it's guaranteed
that cookie will not be duplicated.
Also, for secondary channels of a session, copy the
fscache pointer from the primary channel. The primary
channel is guaranteed not to go away as long as secondary
channels are in use. Also addresses minor problem found
by kernel test robot.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
The logic for initializing tcon->resource_id is done inside
cifs_root_iget. fscache super cookie relies on this for aux
data. So we need to push the fscache initialization to this
later point during mount.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix missed refcounting of IPC tcon used for getting domain-based DFS
root referrals. We want to keep it alive as long as mount is active
and can be refreshed. For standalone DFS root referrals it wouldn't
be a problem as the client ends up having an IPC tcon for both mount
and cache.
Fixes: c88f7dcd6d ("cifs: support nested dfs links over reconnect")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Jann Horn points out that there is another possible race wrt Unix domain
socket garbage collection, somewhat reminiscent of the one fixed in
commit cbcf01128d ("af_unix: fix garbage collect vs MSG_PEEK").
See the extended comment about the garbage collection requirements added
to unix_peek_fds() by that commit for details.
The race comes from how we can locklessly look up a file descriptor just
as it is in the process of being closed, and with the right artificial
timing (Jann added a few strategic 'mdelay(500)' calls to do that), the
Unix domain socket garbage collector could see the reference count
decrement of the close() happen before fget() took its reference to the
file and the file was attached onto a new file descriptor.
This is all (intentionally) correct on the 'struct file *' side, with
RCU lookups and lockless reference counting very much part of the
design. Getting that reference count out of order isn't a problem per
se.
But the garbage collector can get confused by seeing this situation of
having seen a file not having any remaining external references and then
seeing it being attached to an fd.
In commit cbcf01128d ("af_unix: fix garbage collect vs MSG_PEEK") the
fix was to serialize the file descriptor install with the garbage
collector by taking and releasing the unix_gc_lock.
That's not really an option here, but since this all happens when we are
in the process of looking up a file descriptor, we can instead simply
just re-check that the file hasn't been closed in the meantime, and just
re-do the lookup if we raced with a concurrent close() of the same file
descriptor.
Reported-and-tested-by: Jann Horn <jannh@google.com>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We don't want to be retrying task_work creation failure if there's
an actual signal pending for the parent task. If we do, then we can
enter an infinite loop of perpetually retrying and each retry failing
with -ERESTARTNOINTR because a signal is pending.
Fixes: 3146cba99a ("io-wq: make worker creation resilient against signals")
Reported-by: Florian Fischer <florian.fl.fischer@fau.de>
Link: https://lore.kernel.org/io-uring/20211202165606.mqryio4yzubl7ms5@pasture/
Tested-by: Florian Fischer <florian.fl.fischer@fau.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When gfs2_lookup_by_inum() calls gfs2_inode_lookup() for an uncached
inode, gfs2_inode_lookup() will place a new tentative inode into the
inode cache before verifying that there is a valid inode at the given
address. This can race with gfs2_create_inode() which doesn't check for
duplicates inodes. gfs2_create_inode() will try to assign the new inode
to the corresponding inode glock, and glock_set_object() will complain
that the glock is still in use by gfs2_inode_lookup's tentative inode.
We noticed this bug after adding commit 486408d690 ("gfs2: Cancel
remote delete work asynchronously") which allowed delete_work_func() to
race with gfs2_create_inode(), but the same race exists for
open-by-handle.
Fix that by switching from insert_inode_hash() to
insert_inode_locked4(), which does check for duplicate inodes. We know
we've just managed to to allocate the new inode, so an inode tentatively
created by gfs2_inode_lookup() will eventually go away and
insert_inode_locked4() will always succeed.
In addition, don't flush the inode glock work anymore (this can now only
make things worse) and clean up glock_{set,clear}_object for the inode
glock somewhat.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Rework gfs2_inode_lookup() to only set up the new inode's glocks after
verifying that the new inode is valid.
There is no need for flushing the inode glock work queue anymore now,
so remove that as well.
While at it, get rid of the useless wrapper around iget5_locked() and
its unnecessary is_bad_inode() check.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In gfs2_inode_lookup, once the inode has been looked up, we check if the
inode generation (no_formal_ino) is the one we're looking for. If it
isn't and the inode wasn't in the inode cache, we discard the newly
looked up inode. This is unnecessary, complicates the code, and makes
future changes to gfs2_inode_lookup harder, so change the code to retain
newly looked up inodes instead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When we mock up a temporary holder in gfs2_glock_cb to demote weak holders in
response to a remote locking conflict, we don't set the HIF_HOLDER flag. This
causes function may_grant to BUG. Fix by setting the missing HIF_HOLDER flag
in the mock glock holder.
In addition, define the mock glock holder where it is used.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This ASSERT in xfs_rename is a) incorrect, because
(RENAME_WHITEOUT|RENAME_NOREPLACE) is a valid combination, and
b) unnecessary, because actual invalid flag combinations are already
handled at the vfs level in do_renameat2() before we get called.
So, remove it.
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Ceph always inherits the SGID bit if it is set on the parent inode,
while the generic inode_init_owner does not do this in a few cases where
it can create a possible security problem (cf. [1]).
Update ceph to strip the SGID bit just as inode_init_owner would.
This bug was detected by the mapped mount testsuite in [3]. The
testsuite tests all core VFS functionality and semantics with and
without mapped mounts. That is to say it functions as a generic VFS
testsuite in addition to a mapped mount testsuite. While working on
mapped mount support for ceph, SIGD inheritance was the only failing
test for ceph after the port.
The same bug was detected by the mapped mount testsuite in XFS in
January 2021 (cf. [2]).
[1]: commit 0fa3ecd878 ("Fix up non-directory creation in SGID directories")
[2]: commit 01ea173e10 ("xfs: fix up non-directory creation in SGID directories")
[3]: https://git.kernel.org/fs/xfs/xfstests-dev.git
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The smatch static checker warned about an uninitialized symbol usage in
this function, in the case where ceph_mdsc_build_path returns an error.
It turns out that that case is harmless, but it just looks sketchy.
Initialize the variable at declaration time, and remove the unneeded
setting of it later.
Fixes: a33f6432b3 ("ceph: encode inodes' parent/d_name in cap reconnect message")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Newer compilers seem to determine that this variable being uninitialized
isn't a problem, but older compilers (from the RHEL8 era) seem to choke
on it and complain that it could be used uninitialized.
Go ahead and initialize the variable at declaration time to silence
potential compiler warnings.
Fixes: c3d8e0b5de ("ceph: return the real size read when it hits EOF")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
opened_inodes is incremented twice when the same inode is opened twice
with O_RDONLY and O_WRONLY respectively.
To reproduce, run this python script, then check the metrics:
import os
for _ in range(10000):
fd_r = os.open('a', os.O_RDONLY)
fd_w = os.open('a', os.O_WRONLY)
os.close(fd_r)
os.close(fd_w)
Fixes: 1dd8d47081 ("ceph: metrics for opened files, pinned caps and opened inodes")
Signed-off-by: Hu Weiwen <sehuww@mail.scut.edu.cn>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmGiaB4ACgkQiiy9cAdy
T1H4PAv+OG94BZe+MyMdXgoRbOLiEeye7dm/TZnlpdtV96WmyrMkA4/1nyOc8k7F
bNRn3ocuH3YCjjUB2kU6MZW5Kh9aSxULYgEC0zxuPcA9q4Ig4FehZD2U3r6fztFM
U5Do9n1xRXjIgS/gI7DfZHSQ+SYVwBMAzRB3opplcs1CW68N7+WmKa5CQ6wH5vZM
GqhU4+yFLhZGiXdxXTFF/bLKWer2PF9p5J4mLzRH0ugTG26tY7WnJcQIj+XrNTrN
ccBSXr+9fHFFa9iGcLY08pghk8s1F6dXl/BLv7DFswKOoje7HsDLcWpNSVtVDojO
0Fg5vVtuDKapCOrpmtPt8Khc4qgkxY6VpJyELMCwJuxrSzTE59C54gQcgJfZO97d
bpGeV9L6D6VTLTe1LhUFCRQbc0FFO3daPWRrxrZnvtJeef70xVSZPtrQqNtpnAtM
RwkN2Rf/Enl03w9U25nv3ymlKoHERiR2XZADLfWc5XdB40Lxc+fDccyoVP1wmxT8
uicPpySr
=bEt3
-----END PGP SIGNATURE-----
Merge tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd fixes from Steve French:
"Five ksmbd server fixes, four of them for stable:
- memleak fix
- fix for default data stream on filesystems that don't support xattr
- error logging fix
- session setup fix
- minor doc cleanup"
* tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix memleak in get_file_stream_info()
ksmbd: contain default data stream even if xattr is empty
ksmbd: downgrade addition info error msg to debug in smb2_get_info_sec()
docs: filesystem: cifs: ksmbd: Fix small layout issues
ksmbd: Fix an error handling path in 'smb2_sess_setup()'
NTFS_RW code allocates page size dependent arrays on the stack. This
results in build failures if the page size is 64k or larger.
fs/ntfs/aops.c: In function 'ntfs_write_mst_block':
fs/ntfs/aops.c:1311:1: error:
the frame size of 2240 bytes is larger than 2048 bytes
Since commit f22969a660 ("powerpc/64s: Default to 64K pages for 64 bit
book3s") this affects ppc:allmodconfig builds, but other architectures
supporting page sizes of 64k or larger are also affected.
Increasing the maximum frame size for affected architectures just to
silence this error does not really help. The frame size would have to
be set to a really large value for 256k pages. Also, a large frame size
could potentially result in stack overruns in this code and elsewhere
and is therefore not desirable. Make NTFS_RW dependent on page sizes
smaller than 64k instead.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix buffer resource leak that could lead to livelock on corrupt fs.
- Remove unused function xfs_inew_wait to shut up the build robots.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmGef+UACgkQ+H93GTRK
tOtAQQ/9EAzGgADQR2dcCXoZwRP3LKz5WGops0qCqywvH/BfbLmKmqzgfUXaf026
dMmrnP9+d6BFytoLk8IXmpydML8qK2/k8lmwJRUG8arPRbHQwVSckDM4vXVrI2X2
K4f8nu1CBR8MDavVS7cR8CZWO3XJMLKTZtxCTdOQlRAw+m9P1+S00LWkiuDTPTTX
YRAGkYEVCtCQLuqJgClf267/5+MaZXRJfFVh1hBQkUtPFXhu1LXfJqZ/thgacmlD
D6/tfwt6Ad7iKg4LjtoJC6zkvoTFN7rB39PGPGILWgS3Nimp0xWgKoTnJ6VLRblY
FwItA6zERXQoHRse7eGMfQ4ZnGT40pvIiN+JVZ4jju4hElY7dkigBfJv8oLkVm3D
xV2dA7YO4DcYS4UAifZ+C00T3pYo/rQnsIwfAGsbh28Xlshyi4+cEqGRTkqrJHnx
gkE4uzp0acs6HVTMC3S0pL6oTirPbHAQtt5tjS8ZtxALlYqF3+Y8xzQhMyB7Fo0b
is213My5aP6VSr2UZajxpkIOl+5OvQ4v37fhmBXMGKmiz7XCzatyvtUjEBjahFfF
FJeug4hRDom/uCKPM3eHfuZxGorlIB1GIeSe7+0ZIXcn+Wuo5seWfJrNP+1H/Asb
Vp6TNueGiCKLBD/+MLWp65zyIZu+b2gNEyYCuhwnOf2sQflhZmI=
=Fa6X
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Fixes for a resource leak and a build robot complaint about totally
dead code:
- Fix buffer resource leak that could lead to livelock on corrupt fs.
- Remove unused function xfs_inew_wait to shut up the build robots"
* tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove xfs_inew_wait
xfs: Fix the free logic of state in xfs_attr_node_hasname
- Fix an accounting problem where unaligned inline data reads can run
off the end of the read iomap iterator. iomap has historically
required that inline data mappings only exist at the end of a file,
though this wasn't documented anywhere.
- Document iomap_read_inline_data and change its return type to be
appropriate for the information that it's actually returning.
-----BEGIN PGP SIGNATURE-----
iQIyBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmGehZoACgkQ+H93GTRK
tOu7ig/44n/PzNzFv0dLAadoanBH/d3uMDs/s9DBWw6s7RYU0wjUMHgGyla9vFgT
aw0xxLZfppvk59Gkme5WPiv/ksRwB9ZcWvEFUOMX/zt55uSNueCXCVpckduf9j29
gKUzFvRGVssBQ2ACHvuH/s6c6hF9EOhCHadREHinqemU3zg/+eH/+L+dgHIitzMg
WiGdWEaojQX8brxD4kH7xfsNUbuFwCNO5UbGtndzsK/5b/8QGXIXOUrJ0JbefoBf
Kscz4opZjfJIuGjczhIhollgV0jihMOH3OIJYfHQHVOUGVwQ/2epHo2cwq6Wujk0
3qXsjuYloR5xkyQwoLfr382BBO4teQW75nNUt+ez4tvwYs7Ck3U1oOFG+j3KiU/P
gsMcSPzgQIiFdU1DRR5r6li6daLJJWK34PeZ4DtE0zFUKwslSUKytv+pT99yNPTG
xkhvdU6R4jchUOPJCZCh8zdARhofTiaxrLPlZ0xKenqxlLxdMm+W0fCkuGFrLHSq
g39CGJhuPe7OexK8lBY9fQ08zwI7LJIx/vQGQ3hbqZaxq14uCZVr/nGwcXlzBhzl
BufZslO9Aj/eG1/MzwSN/xDTo3Jl+RuCs9lBgpg8pu6WNzueXbRMQnURy6Vf47Ua
U6Awq0OrGbippg4Y7P/IiyYDKjFWPjgftYRJhoosxxfbO00BSA==
=hIEG
-----END PGP SIGNATURE-----
Merge tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:
"A single iomap bug fix and a cleanup for 5.16-rc2.
The bug fix changes how iomap deals with reading from an inline data
region -- whereas the current code (incorrectly) lets the iomap read
iter try for more bytes after reading the inline region (which zeroes
the rest of the page!) and hopes the next iteration terminates, we
surveyed the inlinedata implementations and realized that all
inlinedata implementations also require that the inlinedata region end
at EOF, so we can simply terminate the read.
The second patch documents these assumptions in the code so that
they're not subtle implications anymore, and cleans up some of the
grosser parts of that function.
Summary:
- Fix an accounting problem where unaligned inline data reads can run
off the end of the read iomap iterator. iomap has historically
required that inline data mappings only exist at the end of a file,
though this wasn't documented anywhere.
- Document iomap_read_inline_data and change its return type to be
appropriate for the information that it's actually returning"
* tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: iomap_read_inline_data cleanup
iomap: Fix inline extent handling in iomap_readpage
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmGiPsAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpr6oD/9i1PrY0nhHbNdpEKC5ZB2TeX8inxvCQj5i
HmAm421s9umLjUXrRZQH6VMsSOWAS6viaXA9MdeWkxXSA9950Stx4Tr5LpujN/iJ
hZneHgX1V3kfGkIOWdstPE62QTKCoMDtBFx1Jk+XZfZ7N9ogWHlmvr3C7iD3QLId
+RODIIZlOZXYP5cYIUon9hK4ydMRjAh5jGik73ckN6gG+vsEWlsVZPlKLgqQ+AyT
CXkXHh5Ad5tQt1vqio+PH2Fk42Ce/+CeY0vhNS2ZUWDHwQEKSz8qIHz3kx+zaCCU
Y3e5CVe8MW64MOZIQPayEeNyposT0YNxTDgOMZxRi2J3IMeYFluJTm57zMNnnH/N
sgKZHZDItAfsgkptVkptoIEeg+2nus3GWmhAKVWu7zPaS8LfAVROt+OCx6+2CUvr
DHJwuGbkG3XR4vwO69ugTbjgRh97P3VJJfg4t6QZZNE9JDSAviHUrZDu9X53xHN5
hAsxg98QRz5BqvqPq99KIB6lq2zQ8LBHLZnnjEXhF3q031LpxCAhpVNBNt+LgQo2
MiGyx4lgznAl6P7/2uXaq5VxCLhWWHI5BtLdlGvJH3v8Uckci/JQKYoO65RNL4D8
gzdGnLQwJ80mDVlHcoN/9vUIYirru4+E+BiW6ua1/5MGmTAQvt6zw4Lo88MZMa5i
mKOxZxL30g==
=h6A3
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block
Pull more io_uring fixes from Jens Axboe:
"The locking fixup that was applied earlier this rc has both a deadlock
and IRQ safety issue, let's get that ironed out before -rc3. This
contains:
- Link traversal locking fix (Pavel)
- Cancelation fix (Pavel)
- Relocate cond_resched() for huge buffer chain freeing, avoiding a
softlockup warning (Ye)
- Fix timespec validation (Ye)"
* tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block:
io_uring: Fix undefined-behaviour in io_issue_sqe
io_uring: fix soft lockup when call __io_remove_buffers
io_uring: fix link traversal locking
io_uring: fail cancellation for EXITING tasks
Highlights include:
Stable fixes:
- NFSv42: Fix pagecache invalidation after COPY/CLONE
Bugfixes:
- NFSv42: Don't fail clone() just because the server failed to return
post-op attributes
- SUNRPC: use different lockdep keys for INET6 and LOCAL
- NFSv4.1: handle NFS4ERR_NOSPC from CREATE_SESSION
- SUNRPC: fix header include guard in trace header
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmGiTnMACgkQZwvnipYK
APJoJQ//VZYSCx/mGaTIj5oUwjBKE/n/9rz2EUGS1cfYjZjPpb5Xgm1tn+1A4c01
Ztu9/hKgwrDqknqkmtKvP1GsX5vYUqgfqAlc880Q2nXAqaLJBBZgB6BFMmTtcoQx
C24L0tgxlZVD9Vw0DJEVDVgDxXA/9VmdSQK6uptQRQhcYf4VtR1wAzELHWdkdkfq
1WrREeAwGWw1BaPTrlPn9XwW9qTaMlBH05XRHh6dM7gFmoIe3td7kq7BOnFxFsnA
AQZ/nCgeMTE04kQQMzYqUc4YqZvnzUHxueZ6q8s0K1RKJBpNIQNgdWUMa285Qo4a
JA9oBCPPo0JjmsEge2Km12zyBJoA7lLQDfc6UQJON50ADF0sTu3wszgGuC63KkhE
V+kUogK2WOlnGky2yYrHmv43mcCcyoJ/g+g+38GXNYGorsFi/XUhvctEpFFFPF71
0umQwWhA6Dhc52hMj5DN4nfspp//hEuV9o7/zJzlPi0elC+xtVaBWZCorNvznlXW
C/O5yobJVd89PuIE17Stg+c0Rq3k7RVPPoyS2IaMkZ5cs7DtT5Tz3nKClPAA8Bur
mPLAMkHSOSLO26cy30SVZCIx1JDMJU9dx/PkFemCexkzYXQxgp9px/8wmM2Xe6Oc
/hDqi8V7ayJUuYpYuJ6sA8oUqj3j+NkoP4w5HhlnmZDBhFMEBhM=
=jeHm
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- NFSv42: Fix pagecache invalidation after COPY/CLONE
Bugfixes:
- NFSv42: Don't fail clone() just because the server failed to return
post-op attributes
- SUNRPC: use different lockdep keys for INET6 and LOCAL
- NFSv4.1: handle NFS4ERR_NOSPC from CREATE_SESSION
- SUNRPC: fix header include guard in trace header"
* tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: use different lock keys for INET6 and LOCAL
sunrpc: fix header include guard in trace header
NFSv4.1: handle NFS4ERR_NOSPC by CREATE_SESSION
NFSv42: Fix pagecache invalidation after COPY/CLONE
NFS: Add a tracepoint to show the results of nfs_set_cache_invalid()
NFSv42: Don't fail clone() unless the OP_CLONE operation failed
- Fix an ABBA deadlock introduced by XArray convention.
-----BEGIN PGP SIGNATURE-----
iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYaG1ZxEceGlhbmdAa2Vy
bmVsLm9yZwAKCRA5NzHcH7XmBFQqAP0Zk3Nozb1BnXnQ/xGZR+liyTimNlujsWqQ
YtHIwB4EiwEAopRnuZRY00txMqzYZNz3sLEV6ylATbN0i/EJ2brOpA8=
=ZHsQ
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:
"Fix an ABBA deadlock introduced by XArray conversion"
* tag 'erofs-for-5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix deadlock when shrink erofs slab
We got issue as follows:
================================================================================
UBSAN: Undefined behaviour in ./include/linux/ktime.h:42:14
signed integer overflow:
-4966321760114568020 * 1000000000 cannot be represented in type 'long long int'
CPU: 1 PID: 2186 Comm: syz-executor.2 Not tainted 4.19.90+ #12
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78
show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x170/0x1dc lib/dump_stack.c:118
ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161
handle_overflow+0x188/0x1dc lib/ubsan.c:192
__ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
ktime_set include/linux/ktime.h:42 [inline]
timespec64_to_ktime include/linux/ktime.h:78 [inline]
io_timeout fs/io_uring.c:5153 [inline]
io_issue_sqe+0x42c8/0x4550 fs/io_uring.c:5599
__io_queue_sqe+0x1b0/0xbc0 fs/io_uring.c:5988
io_queue_sqe+0x1ac/0x248 fs/io_uring.c:6067
io_submit_sqe fs/io_uring.c:6137 [inline]
io_submit_sqes+0xed8/0x1c88 fs/io_uring.c:6331
__do_sys_io_uring_enter fs/io_uring.c:8170 [inline]
__se_sys_io_uring_enter fs/io_uring.c:8129 [inline]
__arm64_sys_io_uring_enter+0x490/0x980 fs/io_uring.c:8129
invoke_syscall arch/arm64/kernel/syscall.c:53 [inline]
el0_svc_common+0x374/0x570 arch/arm64/kernel/syscall.c:121
el0_svc_handler+0x190/0x260 arch/arm64/kernel/syscall.c:190
el0_svc+0x10/0x218 arch/arm64/kernel/entry.S:1017
================================================================================
As ktime_set only judge 'secs' if big than KTIME_SEC_MAX, but if we pass
negative value maybe lead to overflow.
To address this issue, we must check if 'sec' is negative.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20211118015907.844807-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYZ+ZHwAKCRDh3BK/laaZ
PFcKAQC/1E+WRv8MT7BmtS3FBnnuH0hSBJe5KTVL+sQI72NC2AD/eKeX3FKXpimt
C3hrTAEegduHSRNMCZn+4juHZGFRtAw=
=n755
-----END PGP SIGNATURE-----
Merge tag 'fuse-fixes-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fix from Miklos Szeredi:
"Fix a regression caused by a bugfix in the previous release. The
symptom is a VM_BUG_ON triggered from splice to the fuse device.
Unfortunately the original bugfix was already backported to a number
of stable releases, so this fix-fix will need to be backported as
well"
* tag 'fuse-fixes-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: release pipe buf after last use
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmGg+PIACgkQxWXV+ddt
WDsWDBAAk//y15bs3SGQLPshFuFidcRmudQkXGE8431ftjWZtuz1UY046rVvC8I5
FkotiiDVrnuyg1kBU1k0vYGS0Fo687JHw2E+8abD2KCicF4MSNIAlc5D9M5B7jzp
GpSIPoZjs05H85kxcoy3BmdQR1DyjR/xOqvTe2IVswPQSj2B1qZKMCbU4927U+e8
Tu18kr7FTxpnzLhtt9Ahr8xVol6bcV/3zB0nC+O5hRbfg5gH87tpb7giLICwfipq
eYY8I361iYDKtDQlR/qvpmkUAfPO4ahYz1yumTxm0twuIv34PugcCP0oLtGCbWwU
71YflcuZa6L2vmXNK8cjXf/9Frg/7k0FlyepEAhjhbooqT92m9Sv4iCX5z9mrsqf
40aoLBXrrPCewa9j31Aw+JuUEjWC4G7U2v+TqJ3waHgUyfeDQUGhOGLmvQJqGyMd
SL4QhGz9aQGlLmUVkDlUemkmMtGBwn87sD1HCJkkNrHS0OWOHb12tpijOu7UAIp3
ih/nXtAayVZV5LS1hfxfYuXzpP8E6dXUnR2vWpvvDqihvfq6ubkt497yVN+vUFBw
6GEhvePwa/w0U1Kn4cZUZxFnalBAmeYcahBBL9ngrHLBD57IJL/yLcOue5nCgqqN
DvLKl8tJUDQTmayx1MjdpdPNckAbY3cp0OP8uEowL0rJjxJPyAE=
=/pKc
-----END PGP SIGNATURE-----
Merge tag 'for-5.16-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One more fix to the lzo code, a missing put_page causing memory leaks
when some error branches are taken"
* tag 'for-5.16-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix the memory leak caused in lzo_compress_pages()
WARNING: inconsistent lock state
5.16.0-rc2-syzkaller #0 Not tainted
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
ffff888078e11418 (&ctx->timeout_lock
){?.+.}-{2:2}
, at: io_timeout_fn+0x6f/0x360 fs/io_uring.c:5943
{HARDIRQ-ON-W} state was registered at:
[...]
spin_unlock_irq include/linux/spinlock.h:399 [inline]
__io_poll_remove_one fs/io_uring.c:5669 [inline]
__io_poll_remove_one fs/io_uring.c:5654 [inline]
io_poll_remove_one+0x236/0x870 fs/io_uring.c:5680
io_poll_remove_all+0x1af/0x235 fs/io_uring.c:5709
io_ring_ctx_wait_and_kill+0x1cc/0x322 fs/io_uring.c:9534
io_uring_release+0x42/0x46 fs/io_uring.c:9554
__fput+0x286/0x9f0 fs/file_table.c:280
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
exit_task_work include/linux/task_work.h:32 [inline]
do_exit+0xc14/0x2b40 kernel/exit.c:832
674ee8e1b4 ("io_uring: correct link-list traversal locking") fixed a
data race but introduced a possible deadlock and inconsistentcy in irq
states. E.g.
io_poll_remove_all()
spin_lock_irq(timeout_lock)
io_poll_remove_one()
spin_lock/unlock_irq(poll_lock);
spin_unlock_irq(timeout_lock)
Another type of problem is freeing a request while holding
->timeout_lock, which may leads to a deadlock in
io_commit_cqring() -> io_flush_timeouts() and other places.
Having 3 nested locks is also too ugly. Add io_match_task_safe(), which
would briefly take and release timeout_lock for race prevention inside,
so the actuall request cancellation / free / etc. code doesn't have it
taken.
Reported-by: syzbot+ff49a3059d49b0ca0eec@syzkaller.appspotmail.com
Reported-by: syzbot+847f02ec20a6609a328b@syzkaller.appspotmail.com
Reported-by: syzbot+3368aadcd30425ceb53b@syzkaller.appspotmail.com
Reported-by: syzbot+51ce8887cdef77c9ac83@syzkaller.appspotmail.com
Reported-by: syzbot+3cb756a49d2f394a9ee3@syzkaller.appspotmail.com
Fixes: 674ee8e1b4 ("io_uring: correct link-list traversal locking")
Cc: stable@kernel.org # 5.15+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/397f7ebf3f4171f1abe41f708ac1ecb5766f0b68.1637937097.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
WARNING: CPU: 1 PID: 20 at fs/io_uring.c:6269 io_try_cancel_userdata+0x3c5/0x640 fs/io_uring.c:6269
CPU: 1 PID: 20 Comm: kworker/1:0 Not tainted 5.16.0-rc1-syzkaller #0
Workqueue: events io_fallback_req_func
RIP: 0010:io_try_cancel_userdata+0x3c5/0x640 fs/io_uring.c:6269
Call Trace:
<TASK>
io_req_task_link_timeout+0x6b/0x1e0 fs/io_uring.c:6886
io_fallback_req_func+0xf9/0x1ae fs/io_uring.c:1334
process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
kthread+0x405/0x4f0 kernel/kthread.c:327
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
We need original task's context to do cancellations, so if it's dying
and the callback is executed in a fallback mode, fail the cancellation
attempt.
Fixes: 89b263f6d5 ("io_uring: run linked timeouts from task_work")
Cc: stable@kernel.org # 5.15+
Reported-by: syzbot+ab0cfe96c2b3cd1c1153@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4c41c5f379c6941ad5a07cd48cb66ed62199cf7e.1637937097.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[BUG]
Fstests generic/027 is pretty easy to trigger a slow but steady memory
leak if run with "-o compress=lzo" mount option.
Normally one single run of generic/027 is enough to eat up at least 4G ram.
[CAUSE]
In commit d4088803f5 ("btrfs: subpage: make lzo_compress_pages()
compatible") we changed how @page_in is released.
But that refactoring makes @page_in only released after all pages being
compressed.
This leaves error path not releasing @page_in. And by "error path"
things like incompressible data will also be treated as an error
(-E2BIG).
Thus it can cause a memory leak if even nothing wrong happened.
[FIX]
Add check under @out label to release @page_in when needed, so when we
hit any error, the input page is properly released.
Reported-by: Josef Bacik <josef@toxicpanda.com>
Fixes: d4088803f5 ("btrfs: subpage: make lzo_compress_pages() compatible")
Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmGfuAQQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpmBbD/9yy8RHZAFfPLhjuyVaUak6/KlLe39UEhqu
75tqBNyWkN/mBon34tIxoUJdDCw15GZXW3G+xDgHOP2+RgpsqZX97Oj7f1m/0L+Z
XpYHKc8j1GmNiXU22nWt7GGSAKFiGZtlW0ppe2BX75xVcqPCiPxge20Ev0wTZAcr
Vrm0r2XKZQ+WI1qLXoKIOxO+bNBzlzhZ/sV2W/6akz6lQIeztFqk6PhovERbBBwi
/UlyWJ4z+ZKQYjpomKA8f41IvVrh3MrQlILYFOHQFj83Wa1OrBM4mD8i1ZAJf907
wWEBgh0Em5GRLzQijXQo/vz3RNgklzX1sP4C5j6njlukgCklEOLhhunEFkk+qtps
Vbc+Hm+jAP2VJjHfMmsttMKCqlJYtR85j71dip+j3JXheaGirUxf97RDHWOFtCFO
+AFbVu7S7H2GI9LJb/WEr66EmbE5RlzCKSeyrBZb96Y7PYdfTwq8TgjzDPSZEm3B
tPvYQgXPXoiFjzlaCgN372YJJIfJhpVOhBTu0hJ5oyPW/Rsy/vGyrhOWEfoc+nGh
RY8ou7qOnnspVwa8PVknjk01BWTvSNgvqVp82WywqZHqQFxEj5YCeEwg2EVuHQt7
owXNO7dtJ+W1SNJDD3KfQx+cwNv8Alf1G02A4dl8xzTgETiHsng+R/t0IQGb7mrf
68hXzT+qSQ==
=y8h2
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.16-2021-11-25' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A locking fix for link traversal, and fixing up an outdated function
name in a comment"
* tag 'io_uring-5.16-2021-11-25' of git://git.kernel.dk/linux-block:
io_uring: correct link-list traversal locking
io_uring: fix missed comment from *task_file rename
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmGfEPEACgkQiiy9cAdy
T1F7Ogv+J9/ODordrHKl5AQWwtFfcWAVCv2K9hJHoaYbj4fpQVzreyBwuSFqryxe
udO8msORq/KySIOMATvPkse8+ZZ+0ZMFF+n4I5kKMaoJL3UZHqjJopUL+zv61817
Qwyd8KCwnrskpZ+q8OfbHDOvKxHpBt/YVP6RdE6R9DBtBIY0zJB8dcD3srYkfDi+
yWjab3cCFoe2p9dzs9PG6eWmmDMM/AcO3m5WEI12hBE30NS+n8CACVFbjImC/ZVV
8SokX33X4CvCZWn2BGdJBE1OFb4Nviz6QChbRd18Z0JOGiWAtlBWclxykedvMGcn
FR/pUFF+7RhXFFK4lTkzJwqj34FoovTiS9X81fLo7lCCXTT0Ki0yp+mKtoNQAaBj
YTDoISjG5+ZQfes12zy1NwARzpKWEwFAR6A8mHlj3QUVP7wkClNlKqZSw9bzRDE7
yOd+4PoH8Hy0ZiprmgTnV92s7+HiD5vucFkzurvBM7hqQAC9LcW1CEix925jMOw0
71Oow+aE
=sMmj
-----END PGP SIGNATURE-----
Merge tag '5.16-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Four small cifs/smb3 fixes:
- two multichannel fixes
- fix problem noted by kernel test robot
- update internal version number"
* tag '5.16-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal version number
smb2: clarify rc initialization in smb2_reconnect
cifs: populate server_hostname for extra channels
cifs: nosharesock should be set on new server
- Fix compilation warnings on csky and sparc
- Rename multipage folios to large folios
- Rename AS_THP_SUPPORT and FS_THP_SUPPORT
- Add functions to zero portions of a folio
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmGem6wACgkQDpNsjXcp
gj7uvwgAjNqDWOVgwYU98daN6nKQQf5Vv35f0bzeKcKcHIOEWZ2+MUeXkI55h8TD
ss5L3O86sPtQmpKUQJJChZC4AhpIPRyjPA0JW6vYqXQd912M331WpGgFFyX5eI+3
OxfKLRULmopeWP1RjWmkWqlhYQHL5OLgAMC4VaBSfDHd1UMRf+F9JNm9qR7GCp9Q
Vb0qcmBMaQYt/K5sWRQyPUACVTF+27RLKAs+Om37NGekv1UqgOPMzi9nAyi9RjCi
rRY6oGupNgC+Y41jzlpaNoL71RPS92H769FBh/Fe4qu55VSPjfcN77qAnVhX5Ykn
4RhzZcEUoqlx9xG9xynk0mmbx2Bf4g==
=kvqM
-----END PGP SIGNATURE-----
Merge tag 'folio-5.16b' of git://git.infradead.org/users/willy/pagecache
Pull folio fixes from Matthew Wilcox:
"In the course of preparing the folio changes for iomap for next merge
window, we discovered some problems that would be nice to address now:
- Renaming multi-page folios to large folios.
mapping_multi_page_folio_support() is just a little too long, so we
settled on mapping_large_folio_support(). That meant renaming, eg
folio_test_multi() to folio_test_large().
Rename AS_THP_SUPPORT to match
- I hadn't included folio wrappers for zero_user_segments(), etc.
Also, multi-page^W^W large folio support is now independent of
CONFIG_TRANSPARENT_HUGEPAGE, so machines with HIGHMEM always need
to fall back to the out-of-line zero_user_segments().
Remove FS_THP_SUPPORT to match
- The build bots finally got round to telling me that I missed a
couple of architectures when adding flush_dcache_folio(). Christoph
suggested that we just add linux/cacheflush.h and not rely on
asm-generic/cacheflush.h"
* tag 'folio-5.16b' of git://git.infradead.org/users/willy/pagecache:
mm: Add functions to zero portions of a folio
fs: Rename AS_THP_SUPPORT and mapping_thp_support
fs: Remove FS_THP_SUPPORT
mm: Remove folio_test_single
mm: Rename folio_test_multi to folio_test_large
Add linux/cacheflush.h
Checking buf->flags should be done before the pipe_buf_release() is called
on the pipe buffer, since releasing the buffer might modify the flags.
This is exactly what page_cache_pipe_buf_release() does, and which results
in the same VM_BUG_ON_PAGE(PageLRU(page)) that the original patch was
trying to fix.
Reported-by: Justin Forbes <jmforbes@linuxtx.org>
Fixes: 712a951025 ("fuse: fix page stealing")
Cc: <stable@vger.kernel.org> # v2.6.35
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
If xattr is not supported like exfat or fat, ksmbd server doesn't
contain default data stream in FILE_STREAM_INFORMATION response. It will
cause ppt or doc file update issue if local filesystem is such as ones.
This patch move goto statement to contain it.
Fixes: 9f6323311c ("ksmbd: add default data stream name in FILE_STREAM_INFORMATION")
Cc: stable@vger.kernel.org # v5.15
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
While file transfer through windows client, This error flood message
happen. This flood message will cause performance degradation and
misunderstand server has problem.
Fixes: e294f78d34 ("ksmbd: allow PROTECTED_DACL_SECINFO and UNPROTECTED_DACL_SECINFO addition information in smb2 set info security")
Cc: stable@vger.kernel.org # v5.15
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
All the error handling paths of 'smb2_sess_setup()' end to 'out_err'.
All but the new error handling path added by the commit given in the Fixes
tag below.
Fix this error handling path and branch to 'out_err' as well.
Fixes: 0d994cd482 ("ksmbd: add buffer validation in session setup")
Cc: stable@vger.kernel.org # v5.15
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Steve French <stfrench@microsoft.com>
Change iomap_read_inline_data to return 0 or an error code; this
simplifies the callers. Add a description.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[djwong: document the return value of iomap_read_inline_data explicitly]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
With the remove of xfs_dqrele_all_inodes, xfs_inew_wait and all the
infrastructure used to wake the XFS_INEW bit waitqueue is unused.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 777eb1fa85 ("xfs: remove xfs_dqrele_all_inodes")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
When testing xfstests xfs/126 on lastest upstream kernel, it will hang on some machine.
Adding a getxattr operation after xattr corrupted, I can reproduce it 100%.
The deadlock as below:
[983.923403] task:setfattr state:D stack: 0 pid:17639 ppid: 14687 flags:0x00000080
[ 983.923405] Call Trace:
[ 983.923410] __schedule+0x2c4/0x700
[ 983.923412] schedule+0x37/0xa0
[ 983.923414] schedule_timeout+0x274/0x300
[ 983.923416] __down+0x9b/0xf0
[ 983.923451] ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[ 983.923453] down+0x3b/0x50
[ 983.923471] xfs_buf_lock+0x33/0xf0 [xfs]
[ 983.923490] xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[ 983.923508] xfs_buf_get_map+0x4c/0x320 [xfs]
[ 983.923525] xfs_buf_read_map+0x53/0x310 [xfs]
[ 983.923541] ? xfs_da_read_buf+0xcf/0x120 [xfs]
[ 983.923560] xfs_trans_read_buf_map+0x1cf/0x360 [xfs]
[ 983.923575] ? xfs_da_read_buf+0xcf/0x120 [xfs]
[ 983.923590] xfs_da_read_buf+0xcf/0x120 [xfs]
[ 983.923606] xfs_da3_node_read+0x1f/0x40 [xfs]
[ 983.923621] xfs_da3_node_lookup_int+0x69/0x4a0 [xfs]
[ 983.923624] ? kmem_cache_alloc+0x12e/0x270
[ 983.923637] xfs_attr_node_hasname+0x6e/0xa0 [xfs]
[ 983.923651] xfs_has_attr+0x6e/0xd0 [xfs]
[ 983.923664] xfs_attr_set+0x273/0x320 [xfs]
[ 983.923683] xfs_xattr_set+0x87/0xd0 [xfs]
[ 983.923686] __vfs_removexattr+0x4d/0x60
[ 983.923688] __vfs_removexattr_locked+0xac/0x130
[ 983.923689] vfs_removexattr+0x4e/0xf0
[ 983.923690] removexattr+0x4d/0x80
[ 983.923693] ? __check_object_size+0xa8/0x16b
[ 983.923695] ? strncpy_from_user+0x47/0x1a0
[ 983.923696] ? getname_flags+0x6a/0x1e0
[ 983.923697] ? _cond_resched+0x15/0x30
[ 983.923699] ? __sb_start_write+0x1e/0x70
[ 983.923700] ? mnt_want_write+0x28/0x50
[ 983.923701] path_removexattr+0x9b/0xb0
[ 983.923702] __x64_sys_removexattr+0x17/0x20
[ 983.923704] do_syscall_64+0x5b/0x1a0
[ 983.923705] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 983.923707] RIP: 0033:0x7f080f10ee1b
When getxattr calls xfs_attr_node_get function, xfs_da3_node_lookup_int fails with EFSCORRUPTED in
xfs_attr_node_hasname because we have use blocktrash to random it in xfs/126. So it
free state in internal and xfs_attr_node_get doesn't do xfs_buf_trans release job.
Then subsequent removexattr will hang because of it.
This bug was introduced by kernel commit 07120f1abd ("xfs: Add xfs_has_attr and subroutines").
It adds xfs_attr_node_hasname helper and said caller will be responsible for freeing the state
in this case. But xfs_attr_node_hasname will free state itself instead of caller if
xfs_da3_node_lookup_int fails.
Fix this bug by moving the step of free state into caller.
Also, use "goto error/out" instead of returning error directly in xfs_attr_node_addname_find_attr and
xfs_attr_node_removename_setup function because we should free state ourselves.
Fixes: 07120f1abd ("xfs: Add xfs_has_attr and subroutines")
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
It is clearer to initialize rc at the beginning of the function.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Recently, a new field got added to the smb3_fs_context struct
named server_hostname. While creating extra channels, pick up
this field from primary channel.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Recent fix to maintain a nosharesock state on the
server struct caused a regression. It updated this
field in the old tcp session, and not the new one.
This caused the multichannel scenario to misbehave.
Fixes: c9f1c19cf7 (cifs: nosharesock should not share socket with future sessions)
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
We observed the following deadlock in the stress test under low
memory scenario:
Thread A Thread B
- erofs_shrink_scan
- erofs_try_to_release_workgroup
- erofs_workgroup_try_to_freeze -- A
- z_erofs_do_read_page
- z_erofs_collection_begin
- z_erofs_register_collection
- erofs_insert_workgroup
- xa_lock(&sbi->managed_pslots) -- B
- erofs_workgroup_get
- erofs_wait_on_workgroup_freezed -- A
- xa_erase
- xa_lock(&sbi->managed_pslots) -- B
To fix this, it needs to hold xa_lock before freezing the workgroup
since xarray will be touched then. So let's hold the lock before
accessing each workgroup, just like what we did with the radix tree
before.
[ Gao Xiang: Jianhua Hao also reports this issue at
https://lore.kernel.org/r/b10b85df30694bac8aadfe43537c897a@xiaomi.com ]
Link: https://lore.kernel.org/r/20211118135844.3559-1-huangjianan@oppo.com
Fixes: 64094a0441 ("erofs: convert workstn to XArray")
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Reported-by: Jianhua Hao <haojianhua1@xiaomi.com>
Signed-off-by: Gao Xiang <xiang@kernel.org>
Before commit 740499c784 ("iomap: fix the iomap_readpage_actor return
value for inline data"), when hitting an IOMAP_INLINE extent,
iomap_readpage_actor would report having read the entire page. Since
then, it only reports having read the inline data (iomap->length).
This will force iomap_readpage into another iteration, and the
filesystem will report an unaligned hole after the IOMAP_INLINE extent.
But iomap_readpage_actor (now iomap_readpage_iter) isn't prepared to
deal with unaligned extents, it will get things wrong on filesystems
with a block size smaller than the page size, and we'll eventually run
into the following warning in iomap_iter_advance:
WARN_ON_ONCE(iter->processed > iomap_length(iter));
Fix that by changing iomap_readpage_iter to return 0 when hitting an
inline extent; this will cause iomap_iter to stop immediately.
To fix readahead as well, change iomap_readahead_iter to pass on
iomap_readpage_iter return values less than or equal to zero.
Fixes: 740499c784 ("iomap: fix the iomap_readpage_actor return value for inline data")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Merge misc fixes from Andrew Morton:
"15 patches.
Subsystems affected by this patch series: ipc, hexagon, mm (swap,
slab-generic, kmemleak, hugetlb, kasan, damon, and highmem), and proc"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
proc/vmcore: fix clearing user buffer by properly using clear_user()
kmap_local: don't assume kmap PTEs are linear arrays in memory
mm/damon/dbgfs: fix missed use of damon_dbgfs_lock
mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation
kasan: test: silence intentional read overflow warnings
hugetlb, userfaultfd: fix reservation restore on userfaultfd error
hugetlb: fix hugetlb cgroup refcounting during mremap
mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
hexagon: ignore vmlinux.lds
hexagon: clean up timer-regs.h
hexagon: export raw I/O routines for modules
mm: emit the "free" trace report before freeing memory in kmem_cache_free()
shm: extend forced shm destroy to support objects from several IPC nses
ipc: WARN if trying to remove ipc object which is absent
mm/swap.c:put_pages_list(): reinitialise the page list
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmGYctEQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpqXjD/9Ef3i/OXjvUOOG19gEk4B1rr0youVioJg9
0HYmXYGDLHZZd01mBdOCrbvadYaJRw3SG+whzNyayekPhs38kFHTqBNHcXOGphm1
4snuDVFz0kElDswRjnh/q9QoJNDGwDidy6L+KHXMbHhTPuXALXshc+6U3GGWJ0Gg
5EEKwfJGRkIzJJ9fL9d9GbyAFMLq8xXr1pf9LdNJEZL2RaBkh4gI7Uu0q6vYGh88
N1iz2LdF4D4uopB6GkT6Eup/5iKakGbv2M2edcHpICdG/3EKn3Q8pnUajxVvXkV/
kOR4zRy2on7xYKIVZGKvq6/8e7Wde9yfBGVQ1D/45uzKIeTiDrP0MWo8ldP5vkbU
yTc3/BiLW5Nkk67RXs3Llg5QDX+n69BHNtXepO8W6DBMdLRFqZSiM0Y1Xrba/+yU
4TJif/wlArErjUUsWuWlnSzKrn1CyRLxxmewXfhxgpy12d4pTQsDLMKs7CojCmoF
t265dmvzXeNtAymuS/WSk0GlnobEO+wvfVUzDUQir6PukDSvz1vw5OIdsMaf2dmX
QDrcgVnGIjNQfLQVoX8FF0u52HcElP1+iikcs/XCNj67f9qImiBYsUYqDtfW6yFl
56IKkq4akSWPgA/qnV9oJ/Cf/AV8WI7rXV32hivo41nXaclrJouRfrarffTRzV9n
gHP0k9W6Cg==
=IqIF
-----END PGP SIGNATURE-----
Merge tag 'block-5.16-2021-11-19' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Flip a cap check to avoid a selinux error (Alistair)
- Fix for a regression this merge window where we can miss a queue ref
put (me)
- Un-mark pstore-blk as broken, as the condition that triggered that
change has been rectified (Kees)
- Queue quiesce and sync fixes (Ming)
- FUA insertion fix (Ming)
- blk-cgroup error path put fix (Yu)
* tag 'block-5.16-2021-11-19' of git://git.kernel.dk/linux-block:
blk-mq: don't insert FUA request with data into scheduler queue
blk-cgroup: fix missing put device in error path from blkg_conf_pref()
block: avoid to quiesce queue in elevator_init_mq
Revert "mark pstore-blk as broken"
blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release()
block: fix missing queue put in error path
block: Check ADMIN before NICE for IOPRIO_CLASS_RT
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmGYKGIACgkQiiy9cAdy
T1ED5wwAtDe73G0fLjajrlVaOfBgHVqRWg1QkGxrwe0Pk8BXJkrrfze5kizD7B5D
xgVDbAWmF3wfkoGukggpjrLpwe/36F/xdHHIATRH2xG7zielDSab/RHPZQx4xPJQ
Qz/F7f5N8cSvDYX5HdTYncAtF3yV6MM48n9N6fBoKTL43mDWK7EI90KM5EkL2Mdc
DT1wYzTpuNoR1qY4oBIftV8mau6DAVtE/GIdpijzIbCf07xADvaM62QmA1qFLCFT
zlya3RmgyTS2UtCV9pKnbzZ2o1rm7J/C6YWvqrggH24Fu7V4nGTAx5yY/X2Zm3iu
uyoMTT1uvaiaihFCVUYN2e4jhYX/SuWA2Q0WczHAcx0LFXWWsIe5pOl78i9aQkU3
UQhTk4G1HNd35CEtR53HEil8wt674p2D/kJpZ5VL6OIg7H1jgwG6UoterNEbEUVT
qFvKCbePXyt8kDdgk9DqsAQQZTkJKMeiuk+hwVUngDRd/jsp7N7p4ISdBQNzNVG3
JVtinWzM
=IGYd
-----END PGP SIGNATURE-----
Merge tag '5.16-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Three small cifs/smb3 fixes: two to address minor coverity issues and
one cleanup"
* tag '5.16-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: introduce cifs_ses_mark_for_reconnect() helper
cifs: protect srv_count with cifs_tcp_ses_lock
cifs: move debug print out of spinlock
To clear a user buffer we cannot simply use memset, we have to use
clear_user(). With a virtio-mem device that registers a vmcore_cb and
has some logically unplugged memory inside an added Linux memory block,
I can easily trigger a BUG by copying the vmcore via "cp":
systemd[1]: Starting Kdump Vmcore Save Service...
kdump[420]: Kdump is using the default log level(3).
kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[465]: saving vmcore-dmesg.txt complete
kdump[467]: saving vmcore
BUG: unable to handle page fault for address: 00007f2374e01000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
Call Trace:
read_vmcore+0x236/0x2c0
proc_reg_read+0x55/0xa0
vfs_read+0x95/0x190
ksys_read+0x4f/0xc0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention (SMAP)", which is used to detect wrong access from the kernel
to user buffers like this: SMAP triggers a permissions violation on
wrong access. In the x86-64 variant of clear_user(), SMAP is properly
handled via clac()+stac().
To fix, properly use clear_user() when we're dealing with a user buffer.
Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
Fixes: 997c136f51 ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmGWiSAACgkQxWXV+ddt
WDtKiA//VFrxg5I1yrTyyVvc2RqcPg0aCopO6wIAgcHV1yzseJ7AyP7two1p5dg8
3DPDKaXFvONZYXl8j9ZuzFiryKPGJxp1KSagKyt6EKDRYm50HOreTC1Qt2ZvLJHn
wHohwHX96yv+4gyKvpCBZVpp3dSIDbsbCxlpz3mm7kZv//wHxA5l0chZpHbTqUF6
JloRSrOIGlSeQYPog1Lnu1c92qoGzLL5n47aXS3s5afpkqqkOlKZLsyb90N4uJx4
M1htsl4ga7b3OB8jbR95wlbd/qXsB+dvaBUQHgDm4hafW6ma5ft9NhuePQnQlaVH
ub/rlfNTsKl6jly9eNJ6wGpqi/OBlhA4qCmQVbVDE+HhWUJbdUiQ5UgxoOrQlkOP
Pd3NvW+95qg+Lj/egUA/Mrtz1v/6oSKcf3gQVKMNIrnk6lOUVZWtQhBe5YS3qHih
PzxrCp4ThlvmVeemHS7783akiwkI49wUn7a6dUD87x81ghemUHJzC83/mgs1rl/0
7Q1QLetgfrZpko3W4GzS2J3WwKTB0tvBXxsZ8gU5gI0FNkx90bR8+xI0fVF8IGJo
QglHn9gepb6si7BCxyKDTlQNMt23s7GFH5/4hHtkomtlR6vpRbPJAq5mpOrqsLgJ
VGc/SwCJPSmynqRAxuCn+DqlfaMZZaqtvgVVWnhJl9ylKyUAQKU=
=ze0L
-----END PGP SIGNATURE-----
Merge tag 'for-5.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Several xes and one old ioctl deprecation. Namely there's fix for
crashes/warnings with lzo compression that was suspected to be caused
by first pull merge resolution, but it was a different bug.
Summary:
- regression fix for a crash in lzo due to missing boundary checks of
the page array
- fix crashes on ARM64 due to missing barriers when synchronizing
status bits between work queues
- silence lockdep when reading chunk tree during mount
- fix false positive warning in integrity checker on devices with
disabled write caching
- fix signedness of bitfields in scrub
- start deprecation of balance v1 ioctl"
* tag 'for-5.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: deprecate BTRFS_IOC_BALANCE ioctl
btrfs: make 1-bit bit-fields of scrub_page unsigned int
btrfs: check-integrity: fix a warning on write caching disabled disk
btrfs: silence lockdep when reading chunk tree during mount
btrfs: fix memory ordering between normal and ordered work functions
btrfs: fix a out-of-bound access in copy_compressed_data_to_page()
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmGWL+AACgkQnJ2qBz9k
QNnK3ggAtJLlaCwg4TxyIIGEWBP/KBKs2STLf80VDavhxZuplKKZ0o5/9IPZ2uu9
v9SgME4hnGpUNaECttVWqijT823SiIxA/4uFSBnN1hLlvPj2Wkwl0oRWb6vCFzZ5
OBUiZrycZKxEq78p2Ao8+FxeRsJ9RuXHnPkZULAqizkEue23d+gW33cO0x//UwB0
lB6i32yDqCmOAtLaT1Div5ttJ3oM/hGBbOURkh3YB7jV2MxojHX28ZjDv657S30k
lhpQwvmHF70jM7MPWbxtcbSTBdPKQdaF5+oDqpAFwDQn3nhEk1sjw6IrCUKr2NAK
Z+YFlRgv5eD+loRPEBVVU/6QrmfSEQ==
=ueAF
-----END PGP SIGNATURE-----
Merge tag 'fs_for_v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fix from Jan Kara:
"A fix for a long-standing UDF bug where we were not properly
validating directory position inside readdir"
* tag 'fs_for_v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix crash after seekdir
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYZZM2AAKCRCRxhvAZXjc
onQYAP4oKQBguYvFThF4H4VnJY1xoS/33rNOqwPumubdj/8P2AD9EFOegFKYBFr7
tCpokujZl3jZTOqV6h32JDQkPZB0kAc=
=AqiS
-----END PGP SIGNATURE-----
Merge tag 'fs.idmapped.v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull setattr idmapping fix from Christian Brauner:
"This contains a simple fix for setattr. When determining the validity
of the attributes the ia_{g,u}id fields contain the value that will be
written to inode->i_{g,u}id. When the {g,u}id attribute of the file
isn't altered and the caller's fs{g,u}id matches the current {g,u}id
attribute the attribute change is allowed.
The value in ia_{g,u}id does already account for idmapped mounts and
will have taken the relevant idmapping into account. So in order to
verify that the {g,u}id attribute isn't changed we simple need to
compare the ia_{g,u}id value against the inode's i_{g,u}id value.
This only has any meaning for idmapped mounts as idmapping helpers are
idempotent without them. And for idmapped mounts this really only has
a meaning when circular idmappings are used, i.e. mappings where e.g.
id 1000 is mapped to id 1001 and id 1001 is mapped to id 1000. Such
ciruclar mappings can e.g. be useful when sharing the same home
directory between multiple users at the same time.
Before this patch we could end up denying legitimate attribute changes
and allowing invalid attribute changes when circular mappings are
used. To even get into this situation the caller must've been
privileged both to create that mapping and to create that idmapped
mount.
This hasn't been seen in the wild anywhere but came up when expanding
the fstest suite during work on a series of hardening patches. All
idmapped fstests pass without any regressions and we're adding new
tests to verify the behavior of circular mappings.
The new tests can be found at [1]"
Link: https://lore.kernel.org/linux-fsdevel/20211109145713.1868404-2-brauner@kernel.org [1]
* tag 'fs.idmapped.v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
fs: handle circular mappings correctly
* The current iomap_file_buffered_write behavior of failing the entire
write when part of the user buffer cannot be faulted in leads to an
endless loop in gfs2. Work around that in gfs2 for now.
* Various other bugs all over the place.
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmGVjogUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTojHA/7BvjlQKdDG84pbgFwWa2JBHgkfB5u
ayd4gBR9mq3WwOzMO7Wp9dUmUgDYjjj2ZJZkExHYgla+52ScRJAbcTk0VkNpSgy6
pOAZmPpTjmj4KGUghAZuUUqm55S5lpB6A2VciscW0GFGWE7kEOOcs/Gw5heSTpHW
zSLM2cQZ303kqd34KE747DchG6ruwlzsRms8qxwaV6Qe5ZyYx6xSBoL7bJ+VVqdF
Ujn3KEpFd5+aRkffwyypltkIsMh1yQceOCoTOp5G3U8zwl+bT8hhpY6qMvtMtT90
Pl5A7abpM7kGzdbdi3z+RtJK3DwmDgtfFrP6z4LTa4QJtUE8Dn7akIPH5v2E1ajI
vpQVepIS+ryn614nqCyXgGed975gFqNNgZheqiZq2+mQMsAO/fxxI4jMYXpB2Hyi
k4PoFm7K+Gcn8ARkuhlbZiZoo8CldmorZz7lVthyb3hNNzg2EzMK/K14KAz7F7W9
ndcXLnMZdxS3R4wlGVKtfLi/E9+VsM0MPTsfvsJtubq/lfu5d2G47FoQvsYFRRDW
icCzMFC+MshFS+GurSYHGni3x11C6M6wsL7m+YvkPLkDC4Jn4Ok9mdf77DiaHy2L
znc0w32FHSQHQZuYN+Fakr8K+W5k0XGkHeJpVzRglkQf2qH0hClqXnzmgbt60ECu
2rxg2KuSsEiYtrU=
=/Vqx
-----END PGP SIGNATURE-----
Merge tag 'gfs2-v5.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
- The current iomap_file_buffered_write behavior of failing the entire
write when part of the user buffer cannot be faulted in leads to an
endless loop in gfs2. Work around that in gfs2 for now.
- Various other bugs all over the place.
* tag 'gfs2-v5.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Prevent endless loops in gfs2_file_buffered_write
gfs2: Fix "Introduce flag for glock holder auto-demotion"
gfs2: Fix length of holes reported at end-of-file
gfs2: release iopen glock early in evict
gfs2: Fix atomic bug in gfs2_instantiate
gfs2: Only dereference i->iov when iter_is_iovec(i)
When the client receives ERR_NOSPC on reply to CREATE_SESSION
it leads to a client hanging in nfs_wait_client_init_complete().
Instead, complete and fail the client initiation with an EIO
error which allows for the mount command to fail instead of
hanging.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The mechanism in use to allow the client to see the results of COPY/CLONE
is to drop those pages from the pagecache. This forces the client to read
those pages once more from the server. However, truncate_pagecache_range()
zeros out partial pages instead of dropping them. Let us instead use
invalidate_inode_pages2_range() with full-page offsets to ensure the client
properly sees the results of COPY/CLONE operations.
Cc: <stable@vger.kernel.org> # v4.7+
Fixes: 2e72448b07 ("NFS: Add COPY nfs operation")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This provides some insight into the client's invalidation behavior to show
both when the client uses the helper, and the results of calling the
helper which can vary depending on how the helper is called.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The failure to retrieve post-op attributes has no bearing on whether or
not the clone operation itself was successful. We must therefore ignore
the return value of decode_getfattr() when looking at the success or
failure of nfs4_xdr_dec_clone().
Fixes: 36022770de ("nfs42: add CLONE xdr functions")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAmGS3msVHGJmaWVsZHNA
ZmllbGRzZXMub3JnAAoJECebzXlCjuG+wCMQAIm7hCuZ7bNtJwdgabv/z3u9Cgre
2lBuFN5edrymgahKERBnA5bZCQGEnG/yVG6w69nB8LnqWphN2caG5ln17kSjfFCO
W/J7FBH9u7662GQhmxqZyrNVm/Td0vgyKH0uh2RTiaitN0JrPg+4gjAWOPUPq53I
lYVgm20Aj3LkH83MEwwp6K2u3pqJ5y+pqfDv6ROX6/HkPV+7yczleWLafB/EYQrs
zX4vSyrR7aLjJ5ZEFz4rokcsemq1iI4eqBr6fiwSZwIDbRBwPFdIlQTwuww4PGSW
ingM4y/RU3okUXV5exchex7ffzmPi8IvkTBOdn0RicHRcbm9f0Rky6wXiASJLTqu
QURh+rsvupfrHnLQ/b1bJtOrSJCdJXdidw8bA7vrpsmpatImnS+u+iWO9RkesL+g
sVdQJV+0ZmOtyLTvw6xpRtXXcpMaJvUksmtiHvySBZot9waX03X/h7TFEmtx+P3E
k0znywn9Ebu5d7X8vBwwDqq9f7Xe7pzo7zALkMeC1qXULIzCzYmTuvSTm870Cz9S
JPh0ojuYJvlvoKNkjYfRKRyE28VLZe/hEwtVWL+kgQD8zR8gPlQNpAcBRJ++Ett+
fwjffdAQ/rfb43W2T5AhoSK1173pu0AWSVEH9h1h604HDvf3iCCw0Wew3JsnJYCS
+tcj/6NetJG64aHe
=beJ8
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.16-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfix from Bruce Fields:
"This is just one bugfix for a buffer overflow in knfsd's xdr decoding"
* tag 'nfsd-5.16-1' of git://linux-nfs.org/~bfields/linux:
NFSD: Fix exposure in nfsd4_decode_bitmap()
Instead of setting a bit in the fs_flags to set a bit in the
address_space, set the bit in the address_space directly.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
When calling setattr_prepare() to determine the validity of the attributes the
ia_{g,u}id fields contain the value that will be written to inode->i_{g,u}id.
When the {g,u}id attribute of the file isn't altered and the caller's fs{g,u}id
matches the current {g,u}id attribute the attribute change is allowed.
The value in ia_{g,u}id does already account for idmapped mounts and will have
taken the relevant idmapping into account. So in order to verify that the
{g,u}id attribute isn't changed we simple need to compare the ia_{g,u}id value
against the inode's i_{g,u}id value.
This only has any meaning for idmapped mounts as idmapping helpers are
idempotent without them. And for idmapped mounts this really only has a meaning
when circular idmappings are used, i.e. mappings where e.g. id 1000 is mapped
to id 1001 and id 1001 is mapped to id 1000. Such ciruclar mappings can e.g. be
useful when sharing the same home directory between multiple users at the same
time.
As an example consider a directory with two files: /source/file1 owned by
{g,u}id 1000 and /source/file2 owned by {g,u}id 1001. Assume we create an
idmapped mount at /target with an idmapping that maps files owned by {g,u}id
1000 to being owned by {g,u}id 1001 and files owned by {g,u}id 1001 to being
owned by {g,u}id 1000. In effect, the idmapped mount at /target switches the
ownership of /source/file1 and source/file2, i.e. /target/file1 will be owned
by {g,u}id 1001 and /target/file2 will be owned by {g,u}id 1000.
This means that a user with fs{g,u}id 1000 must be allowed to setattr
/target/file2 from {g,u}id 1000 to {g,u}id 1000. Similar, a user with fs{g,u}id
1001 must be allowed to setattr /target/file1 from {g,u}id 1001 to {g,u}id
1001. Conversely, a user with fs{g,u}id 1000 must fail to setattr /target/file1
from {g,u}id 1001 to {g,u}id 1000. And a user with fs{g,u}id 1001 must fail to
setattr /target/file2 from {g,u}id 1000 to {g,u}id 1000. Both cases must fail
with EPERM for non-capable callers.
Before this patch we could end up denying legitimate attribute changes and
allowing invalid attribute changes when circular mappings are used. To even get
into this situation the caller must've been privileged both to create that
mapping and to create that idmapped mount.
This hasn't been seen in the wild anywhere but came up when expanding the
testsuite during work on a series of hardening patches. All idmapped fstests
pass without any regressions and we add new tests to verify the behavior of
circular mappings.
Link: https://lore.kernel.org/r/20211109145713.1868404-1-brauner@kernel.org
Fixes: 2f221d6f7b ("attr: handle idmapped mounts")
Cc: Seth Forshee <seth.forshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Fix comment referring to function "io_uring_del_task_file()", now called
"io_uring_del_tctx_node()".
Fixes: eef51daa72 ("io_uring: rename function *task_file")
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Link: https://lore.kernel.org/r/20211116175530.31608-1-kamal@canonical.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This reverts commit d07f3b081e.
pstore-blk was fixed to avoid the unwanted APIs in commit 7bb9557b48
("pstore/blk: Use the normal block device I/O path"), which landed in
the same release as the commit adding BROKEN.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211116181559.3975566-1-keescook@chromium.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use new cifs_ses_mark_for_reconnect() helper to mark all session
channels for reconnect instead of duplicating it in different places.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Updates to the srv_count field are protected elsewhere
with the cifs_tcp_ses_lock spinlock. Add one missing place
(cifs_get_tcp_sesion).
CC: Shyam Prasad N <sprasad@microsoft.com>
Addresses-Coverity: 1494149 ("Data Race Condition")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
It is better to print debug messages outside of the chan_lock
spinlock where possible.
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Addresses-Coverity: 1493854 ("Thread deadlock")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
The v2 balance ioctl has been introduced more than 9 years ago. Users of
the old v1 ioctl should have long been migrated to it. It's time we
deprecate it and eventually remove it.
The only known user is in btrfs-progs that tries v1 as a fallback in
case v2 is not supported. This is not necessary anymore.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The bitfields have_csum and io_error are currently signed which is not
recommended as the representation is an implementation defined
behaviour. Fix this by making the bit-fields unsigned ints.
Fixes: 2c36395430 ("btrfs: scrub: remove the anonymous structure from scrub_page")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When a disk has write caching disabled, we skip submission of a bio with
flush and sync requests before writing the superblock, since it's not
needed. However when the integrity checker is enabled, this results in
reports that there are metadata blocks referred by a superblock that
were not properly flushed. So don't skip the bio submission only when
the integrity checker is enabled for the sake of simplicity, since this
is a debug tool and not meant for use in non-debug builds.
fstests/btrfs/220 trigger a check-integrity warning like the following
when CONFIG_BTRFS_FS_CHECK_INTEGRITY=y and the disk with WCE=0.
btrfs: attempt to write superblock which references block M @5242880 (sdb2/5242880/0) which is not flushed out of disk's write cache (block flush_gen=1, dev->flush_gen=0)!
------------[ cut here ]------------
WARNING: CPU: 28 PID: 843680 at fs/btrfs/check-integrity.c:2196 btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
CPU: 28 PID: 843680 Comm: umount Not tainted 5.15.0-0.rc5.39.el8.x86_64 #1
Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
RIP: 0010:btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
RSP: 0018:ffffb642afb47940 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
RDX: 00000000ffffffff RSI: ffff8b722fc97d00 RDI: ffff8b722fc97d00
RBP: ffff8b5601c00000 R08: 0000000000000000 R09: c0000000ffff7fff
R10: 0000000000000001 R11: ffffb642afb476f8 R12: ffffffffffffffff
R13: ffffb642afb47974 R14: ffff8b5499254c00 R15: 0000000000000003
FS: 00007f00a06d4080(0000) GS:ffff8b722fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff5cff5ff0 CR3: 00000001c0c2a006 CR4: 00000000001706e0
Call Trace:
btrfsic_process_written_block+0x2f7/0x850 [btrfs]
__btrfsic_submit_bio.part.19+0x310/0x330 [btrfs]
? bio_associate_blkg_from_css+0xa4/0x2c0
btrfsic_submit_bio+0x18/0x30 [btrfs]
write_dev_supers+0x81/0x2a0 [btrfs]
? find_get_pages_range_tag+0x219/0x280
? pagevec_lookup_range_tag+0x24/0x30
? __filemap_fdatawait_range+0x6d/0xf0
? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
? find_first_extent_bit+0x9b/0x160 [btrfs]
? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
write_all_supers+0x1b3/0xa70 [btrfs]
? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
btrfs_commit_transaction+0x59d/0xac0 [btrfs]
close_ctree+0x11d/0x339 [btrfs]
generic_shutdown_super+0x71/0x110
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x31/0x70
cleanup_mnt+0xb8/0x140
task_work_run+0x6d/0xb0
exit_to_user_mode_prepare+0x1f0/0x200
syscall_exit_to_user_mode+0x12/0x30
do_syscall_64+0x46/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f009f711dfb
RSP: 002b:00007fff5cff7928 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 000055b68c6c9970 RCX: 00007f009f711dfb
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b68c6c9b50
RBP: 0000000000000000 R08: 000055b68c6ca900 R09: 00007f009f795580
R10: 0000000000000000 R11: 0000000000000246 R12: 000055b68c6c9b50
R13: 00007f00a04bf184 R14: 0000000000000000 R15: 00000000ffffffff
---[ end trace 2c4b82abcef9eec4 ]---
S-65536(sdb2/65536/1)
-->
M-1064960(sdb2/1064960/1)
Reviewed-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Ordered work functions aren't guaranteed to be handled by the same thread
which executed the normal work functions. The only way execution between
normal/ordered functions is synchronized is via the WORK_DONE_BIT,
unfortunately the used bitops don't guarantee any ordering whatsoever.
This manifested as seemingly inexplicable crashes on ARM64, where
async_chunk::inode is seen as non-null in async_cow_submit which causes
submit_compressed_extents to be called and crash occurs because
async_chunk::inode suddenly became NULL. The call trace was similar to:
pc : submit_compressed_extents+0x38/0x3d0
lr : async_cow_submit+0x50/0xd0
sp : ffff800015d4bc20
<registers omitted for brevity>
Call trace:
submit_compressed_extents+0x38/0x3d0
async_cow_submit+0x50/0xd0
run_ordered_work+0xc8/0x280
btrfs_work_helper+0x98/0x250
process_one_work+0x1f0/0x4ac
worker_thread+0x188/0x504
kthread+0x110/0x114
ret_from_fork+0x10/0x18
Fix this by adding respective barrier calls which ensure that all
accesses preceding setting of WORK_DONE_BIT are strictly ordered before
setting the flag. At the same time add a read barrier after reading of
WORK_DONE_BIT in run_ordered_work which ensures all subsequent loads
would be strictly ordered after reading the bit. This in turn ensures
are all accesses before WORK_DONE_BIT are going to be strictly ordered
before any access that can occur in ordered_func.
Reported-by: Chris Murphy <lists@colorremedies.com>
Fixes: 08a9ff3264 ("btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue")
CC: stable@vger.kernel.org # 4.4+
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2011928
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Chris Murphy <chris@colorremedies.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
The following script can cause btrfs to crash:
$ mount -o compress-force=lzo $DEV /mnt
$ dd if=/dev/urandom of=/mnt/foo bs=4k count=1
$ sync
The call trace looks like this:
general protection fault, probably for non-canonical address 0xe04b37fccce3b000: 0000 [#1] PREEMPT SMP NOPTI
CPU: 5 PID: 164 Comm: kworker/u20:3 Not tainted 5.15.0-rc7-custom+ #4
Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
RIP: 0010:__memcpy+0x12/0x20
Call Trace:
lzo_compress_pages+0x236/0x540 [btrfs]
btrfs_compress_pages+0xaa/0xf0 [btrfs]
compress_file_range+0x431/0x8e0 [btrfs]
async_cow_start+0x12/0x30 [btrfs]
btrfs_work_helper+0xf6/0x3e0 [btrfs]
process_one_work+0x294/0x5d0
worker_thread+0x55/0x3c0
kthread+0x140/0x170
ret_from_fork+0x22/0x30
---[ end trace 63c3c0f131e61982 ]---
[CAUSE]
In lzo_compress_pages(), parameter @out_pages is not only an output
parameter (for the number of compressed pages), but also an input
parameter, as the upper limit of compressed pages we can utilize.
In commit d4088803f5 ("btrfs: subpage: make lzo_compress_pages()
compatible"), the refactoring doesn't take @out_pages as an input, thus
completely ignoring the limit.
And for compress-force case, we could hit incompressible data that
compressed size would go beyond the page limit, and cause the above
crash.
[FIX]
Save @out_pages as @max_nr_page, and pass it to lzo_compress_pages(),
and check if we're beyond the limit before accessing the pages.
Note: this also fixes crash on 32bit architectures that was suspected to
be caused by merge of btrfs patches to 5.16-rc1. Reported in
https://lore.kernel.org/all/20211104115001.GU20319@twin.jikos.cz/ .
Reported-by: Omar Sandoval <osandov@fb.com>
Fixes: d4088803f5 ("btrfs: subpage: make lzo_compress_pages() compatible")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba@suse.com>
rtm@csail.mit.edu reports:
> nfsd4_decode_bitmap4() will write beyond bmval[bmlen-1] if the RPC
> directs it to do so. This can cause nfsd4_decode_state_protect4_a()
> to write client-supplied data beyond the end of
> nfsd4_exchange_id.spo_must_allow[] when called by
> nfsd4_decode_exchange_id().
Rewrite the loops so nfsd4_decode_bitmap() cannot iterate beyond
@bmlen.
Reported by: rtm@csail.mit.edu
Fixes: d1c263a031 ("NFSD: Replace READ* macros in nfsd4_decode_fattr()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* Clean up open-coded swap() calls.
* A little bit of #ifdef golf to complete the reunification of the
kernel and userspace libxfs source code.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmGNT1IACgkQ+H93GTRK
tOucKA//Qk2NX3QBm/8pCrFE5V+eqooPANhZmzeviJCN/6++jcNOy0f+YK6JXVRC
U2WdotHFH5fF6lsDkzNtMPHZ8JMZmOfEiPx5CGFiWT5iUW7FbLkROHm7GFtbwMoH
qm3Lt7PbdSzJqTuOTvaGCw1xWkjDXMLdsdFM7mx3JO5zT9a/fCqjjmyR2Kl0qcSP
RzfruVe20wUka2BeaXfZzSasgfLswratkU4xsiNiwA37yQaldzhrg8fg6uP3OSYi
dkWFXi6WdWwQzARnjWNPwigUwA3xVaYgV+I6+ME0DYsUBvywZzUg3pkowhRAHyA9
kv86L5Zt5K7kQcVqyd+lIvIuAcGrOZ9hA18PIXnwahLBqmjcqAJoF9XhTTZDMD4J
LfujGMrf7DSDcf0vH8G9wlQQthsPGUOoFia5rr8MhdVVNee/b1Qvwsh7kmyg0DOK
9WuNQxGPd7s+X+kwdmGrK7E6fqyPwEfC43l8wtCiBIyGz6QcorwD7kH9DGzv5xGF
NX7WQeKvcaoXn1XVfonb0YgdVOnbyqK4AiY3Po1Ood3IxGyiLGCgDnusvYu+C9/r
T0rRMbljkX1lUKqfzGkg2egOKPR+8RFgFKrKNSXUkDxl8TFLRd3ZObowPPlohq1I
9lIIirip5UFYRv+7srDU1oZPWkvwkpJmaMFgagD3w+OWdo6zwao=
=wsFu
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs cleanups from Darrick Wong:
"The most 'exciting' aspect of this branch is that the xfsprogs
maintainer and I have worked through the last of the code
discrepancies between kernel and userspace libxfs such that there are
no code differences between the two except for #includes.
IOWs, diff suffices to demonstrate that the userspace tools behave the
same as the kernel, and kernel-only bits are clearly marked in the
/kernel/ source code instead of just the userspace source.
Summary:
- Clean up open-coded swap() calls.
- A little bit of #ifdef golf to complete the reunification of the
kernel and userspace libxfs source code"
* tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: sync xfs_btree_split macros with userspace libxfs
xfs: #ifdef out perag code for userspace
xfs: use swap() to make dabtree code cleaner
This PR includes 5 commits that update the zstd library version:
1. Adds a new kernel-style wrapper around zstd. This wrapper API
is functionally equivalent to the subset of the current zstd API that is
currently used. The wrapper API changes to be kernel style so that the symbols
don't collide with zstd's symbols. The update to zstd-1.4.10 maintains the same
API and preserves the semantics, so that none of the callers need to be
updated. All callers are updated in the commit, because there are zero
functional changes.
2. Adds an indirection for `lib/decompress_unzstd.c` so it
doesn't depend on the layout of `lib/zstd/` to include every source file.
This allows the next patch to be automatically generated.
3. Imports the zstd-1.4.10 source code. This commit is automatically generated
from upstream zstd (https://github.com/facebook/zstd).
4. Adds me (terrelln@fb.com) as the maintainer of `lib/zstd`.
5. Fixes a newly added build warning for clang.
The discussion around this patchset has been pretty long, so I've included a
FAQ-style summary of the history of the patchset, and why we are taking this
approach.
Why do we need to update?
-------------------------
The zstd version in the kernel is based off of zstd-1.3.1, which is was released
August 20, 2017. Since then zstd has seen many bug fixes and performance
improvements. And, importantly, upstream zstd is continuously fuzzed by OSS-Fuzz,
and bug fixes aren't backported to older versions. So the only way to sanely get
these fixes is to keep up to date with upstream zstd. There are no known security
issues that affect the kernel, but we need to be able to update in case there
are. And while there are no known security issues, there are relevant bug fixes.
For example the problem with large kernel decompression has been fixed upstream
for over 2 years https://lkml.org/lkml/2020/9/29/27.
Additionally the performance improvements for kernel use cases are significant.
Measured for x86_64 on my Intel i9-9900k @ 3.6 GHz:
- BtrFS zstd compression at levels 1 and 3 is 5% faster
- BtrFS zstd decompression+read is 15% faster
- SquashFS zstd decompression+read is 15% faster
- F2FS zstd compression+write at level 3 is 8% faster
- F2FS zstd decompression+read is 20% faster
- ZRAM decompression+read is 30% faster
- Kernel zstd decompression is 35% faster
- Initramfs zstd decompression+build is 5% faster
On top of this, there are significant performance improvements coming down the
line in the next zstd release, and the new automated update patch generation
will allow us to pull them easily.
How is the update patch generated?
----------------------------------
The first two patches are preparation for updating the zstd version. Then the
3rd patch in the series imports upstream zstd into the kernel. This patch is
automatically generated from upstream. A script makes the necessary changes and
imports it into the kernel. The changes are:
- Replace all libc dependencies with kernel replacements and rewrite includes.
- Remove unncessary portability macros like: #if defined(_MSC_VER).
- Use the kernel xxhash instead of bundling it.
This automation gets tested every commit by upstream's continuous integration.
When we cut a new zstd release, we will submit a patch to the kernel to update
the zstd version in the kernel.
The automated process makes it easy to keep the kernel version of zstd up to
date. The current zstd in the kernel shares the guts of the code, but has a lot
of API and minor changes to work in the kernel. This is because at the time
upstream zstd was not ready to be used in the kernel envrionment as-is. But,
since then upstream zstd has evolved to support being used in the kernel as-is.
Why are we updating in one big patch?
-------------------------------------
The 3rd patch in the series is very large. This is because it is restructuring
the code, so it both deletes the existing zstd, and re-adds the new structure.
Future updates will be directly proportional to the changes in upstream zstd
since the last import. They will admittidly be large, as zstd is an actively
developed project, and has hundreds of commits between every release. However,
there is no other great alternative.
One option ruled out is to replay every upstream zstd commit. This is not feasible
for several reasons:
- There are over 3500 upstream commits since the zstd version in the kernel.
- The automation to automatically generate the kernel update was only added recently,
so older commits cannot easily be imported.
- Not every upstream zstd commit builds.
- Only zstd releases are "supported", and individual commits may have bugs that were
fixed before a release.
Another option to reduce the patch size would be to first reorganize to the new
file structure, and then apply the patch. However, the current kernel zstd is formatted
with clang-format to be more "kernel-like". But, the new method imports zstd as-is,
without additional formatting, to allow for closer correlation with upstream, and
easier debugging. So the patch wouldn't be any smaller.
It also doesn't make sense to import upstream zstd commit by commit going
forward. Upstream zstd doesn't support production use cases running of the
development branch. We have a lot of post-commit fuzzing that catches many bugs,
so indiviudal commits may be buggy, but fixed before a release. So going forward,
I intend to import every (important) zstd release into the Kernel.
So, while it isn't ideal, updating in one big patch is the only patch I see forward.
Who is responsible for this code?
---------------------------------
I am. This patchset adds me as the maintainer for zstd. Previously, there was no tree
for zstd patches. Because of that, there were several patches that either got ignored,
or took a long time to merge, since it wasn't clear which tree should pick them up.
I'm officially stepping up as maintainer, and setting up my tree as the path through
which zstd patches get merged. I'll make sure that patches to the kernel zstd get
ported upstream, so they aren't erased when the next version update happens.
How is this code tested?
------------------------
I tested every caller of zstd on x86_64 (BtrFS, ZRAM, SquashFS, F2FS, Kernel,
InitRAMFS). I also tested Kernel & InitRAMFS on i386 and aarch64. I checked both
performance and correctness.
Also, thanks to many people in the community who have tested these patches locally.
If you have tested the patches, please reply with a Tested-By so I can collect them
for the PR I will send to Linus.
Lastly, this code will bake in linux-next before being merged into v5.16.
Why update to zstd-1.4.10 when zstd-1.5.0 has been released?
------------------------------------------------------------
This patchset has been outstanding since 2020, and zstd-1.4.10 was the latest
release when it was created. Since the update patch is automatically generated
from upstream, I could generate it from zstd-1.5.0. However, there were some
large stack usage regressions in zstd-1.5.0, and are only fixed in the latest
development branch. And the latest development branch contains some new code that
needs to bake in the fuzzer before I would feel comfortable releasing to the
kernel.
Once this patchset has been merged, and we've released zstd-1.5.1, we can update
the kernel to zstd-1.5.1, and exercise the update process.
You may notice that zstd-1.4.10 doesn't exist upstream. This release is an
artifical release based off of zstd-1.4.9, with some fixes for the kernel
backported from the development branch. I will tag the zstd-1.4.10 release after
this patchset is merged, so the Linux Kernel is running a known version of zstd
that can be debugged upstream.
Why was a wrapper API added?
----------------------------
The first versions of this patchset migrated the kernel to the upstream zstd
API. It first added a shim API that supported the new upstream API with the old
code, then updated callers to use the new shim API, then transitioned to the
new code and deleted the shim API. However, Cristoph Hellwig suggested that we
transition to a kernel style API, and hide zstd's upstream API behind that.
This is because zstd's upstream API is supports many other use cases, and does
not follow the kernel style guide, while the kernel API is focused on the
kernel's use cases, and follows the kernel style guide.
Where is the previous discussion?
---------------------------------
Links for the discussions of the previous versions of the patch set.
The largest changes in the design of the patchset are driven by the discussions
in V11, V5, and V1. Sorry for the mix of links, I couldn't find most of the the
threads on lkml.org.
V12: https://www.spinics.net/lists/linux-crypto/msg58189.html
V11: https://lore.kernel.org/linux-btrfs/20210430013157.747152-1-nickrterrell@gmail.com/
V10: https://lore.kernel.org/lkml/20210426234621.870684-2-nickrterrell@gmail.com/
V9: https://lore.kernel.org/linux-btrfs/20210330225112.496213-1-nickrterrell@gmail.com/
V8: https://lore.kernel.org/linux-f2fs-devel/20210326191859.1542272-1-nickrterrell@gmail.com/
V7: https://lkml.org/lkml/2020/12/3/1195
V6: https://lkml.org/lkml/2020/12/2/1245
V5: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/
V4: https://www.spinics.net/lists/linux-btrfs/msg105783.html
V3: https://lkml.org/lkml/2020/9/23/1074
V2: https://www.spinics.net/lists/linux-btrfs/msg105505.html
V1: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/
Signed-off-by: Nick Terrell <terrelln@fb.com>
Tested By: Paul Jones <paul@pauljones.id.au>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEmIwAqlFIzbQodPwyuzRpqaNEqPUFAmGJyKIACgkQuzRpqaNE
qPXnmw/+PKyCn6LvRQqNfdpF5f59j/B1Fab15tkpVyz3UWnCw+EKaPZOoTfIsjRf
7TMUVm4iGsm+6xBO/YrGdRl4IxocNgXzsgnJ1lTGDbvfRC1tG+YNwuv+EEXwKYq5
Yz3DRwDotgsrV0Kg05b+VIgkmAuY3ukmu2n09LnAdKkxoIgmHw3MIDCdVZW2Br4c
sjJmYI+fiJd7nAlbDa42VOrdTiLzkl/2BsjWBqTv6zbiQ5uuJGsKb7P3kpcybWzD
5C118pyE3qlVyvFz+UFu8WbN0NSf47DP22KV/3IrhNX7CVQxYBe+9/oVuPWTgRx0
4Vl0G6u7rzh4wDZuGqTC3LYWwH9GfycI0fnVC0URP2XMOcGfPlGd3L0PEmmAeTmR
fEbaGAN4dr0jNO3lmbyAGe/G8tvtXQx/4ZjS9Pa3TlQP24GARU/f78/blbKR87Vz
BGMndmSi92AscgXb9buO3bCwAY1YtH5WiFaZT1XVk42cj4MiOLvPTvP4UMzDDxcZ
56ahmAP/84kd6H+cv9LmgEMqcIBmxdUcO1nuAItJ4wdrMUgw3+lrbxwFkH9xPV7I
okC1K0TIVEobADbxbdMylxClAylbuW+37Pko97NmAlnzNCPNE38f3s3gtXRrUTaR
IP8jv5UQ7q3dFiWnNLLodx5KM6s32GVBKRLRnn/6SJB7QzlyHXU=
=Xb18
-----END PGP SIGNATURE-----
Merge tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux
Pull zstd update from Nick Terrell:
"Update to zstd-1.4.10.
Add myself as the maintainer of zstd and update the zstd version in
the kernel, which is now 4 years out of date, to a much more recent
zstd release. This includes bug fixes, much more extensive fuzzing,
and performance improvements. And generates the kernel zstd
automatically from upstream zstd, so it is easier to keep the zstd
verison up to date, and we don't fall so far out of date again.
This includes 5 commits that update the zstd library version:
- Adds a new kernel-style wrapper around zstd.
This wrapper API is functionally equivalent to the subset of the
current zstd API that is currently used. The wrapper API changes to
be kernel style so that the symbols don't collide with zstd's
symbols. The update to zstd-1.4.10 maintains the same API and
preserves the semantics, so that none of the callers need to be
updated. All callers are updated in the commit, because there are
zero functional changes.
- Adds an indirection for `lib/decompress_unzstd.c` so it doesn't
depend on the layout of `lib/zstd/` to include every source file.
This allows the next patch to be automatically generated.
- Imports the zstd-1.4.10 source code. This commit is automatically
generated from upstream zstd (https://github.com/facebook/zstd).
- Adds me (terrelln@fb.com) as the maintainer of `lib/zstd`.
- Fixes a newly added build warning for clang.
The discussion around this patchset has been pretty long, so I've
included a FAQ-style summary of the history of the patchset, and why
we are taking this approach.
Why do we need to update?
-------------------------
The zstd version in the kernel is based off of zstd-1.3.1, which is
was released August 20, 2017. Since then zstd has seen many bug fixes
and performance improvements. And, importantly, upstream zstd is
continuously fuzzed by OSS-Fuzz, and bug fixes aren't backported to
older versions. So the only way to sanely get these fixes is to keep
up to date with upstream zstd.
There are no known security issues that affect the kernel, but we need
to be able to update in case there are. And while there are no known
security issues, there are relevant bug fixes. For example the problem
with large kernel decompression has been fixed upstream for over 2
years [1]
Additionally the performance improvements for kernel use cases are
significant. Measured for x86_64 on my Intel i9-9900k @ 3.6 GHz:
- BtrFS zstd compression at levels 1 and 3 is 5% faster
- BtrFS zstd decompression+read is 15% faster
- SquashFS zstd decompression+read is 15% faster
- F2FS zstd compression+write at level 3 is 8% faster
- F2FS zstd decompression+read is 20% faster
- ZRAM decompression+read is 30% faster
- Kernel zstd decompression is 35% faster
- Initramfs zstd decompression+build is 5% faster
On top of this, there are significant performance improvements coming
down the line in the next zstd release, and the new automated update
patch generation will allow us to pull them easily.
How is the update patch generated?
----------------------------------
The first two patches are preparation for updating the zstd version.
Then the 3rd patch in the series imports upstream zstd into the
kernel. This patch is automatically generated from upstream. A script
makes the necessary changes and imports it into the kernel. The
changes are:
- Replace all libc dependencies with kernel replacements and rewrite
includes.
- Remove unncessary portability macros like: #if defined(_MSC_VER).
- Use the kernel xxhash instead of bundling it.
This automation gets tested every commit by upstream's continuous
integration. When we cut a new zstd release, we will submit a patch to
the kernel to update the zstd version in the kernel.
The automated process makes it easy to keep the kernel version of zstd
up to date. The current zstd in the kernel shares the guts of the
code, but has a lot of API and minor changes to work in the kernel.
This is because at the time upstream zstd was not ready to be used in
the kernel envrionment as-is. But, since then upstream zstd has
evolved to support being used in the kernel as-is.
Why are we updating in one big patch?
-------------------------------------
The 3rd patch in the series is very large. This is because it is
restructuring the code, so it both deletes the existing zstd, and
re-adds the new structure. Future updates will be directly
proportional to the changes in upstream zstd since the last import.
They will admittidly be large, as zstd is an actively developed
project, and has hundreds of commits between every release. However,
there is no other great alternative.
One option ruled out is to replay every upstream zstd commit. This is
not feasible for several reasons:
- There are over 3500 upstream commits since the zstd version in the
kernel.
- The automation to automatically generate the kernel update was only
added recently, so older commits cannot easily be imported.
- Not every upstream zstd commit builds.
- Only zstd releases are "supported", and individual commits may have
bugs that were fixed before a release.
Another option to reduce the patch size would be to first reorganize
to the new file structure, and then apply the patch. However, the
current kernel zstd is formatted with clang-format to be more
"kernel-like". But, the new method imports zstd as-is, without
additional formatting, to allow for closer correlation with upstream,
and easier debugging. So the patch wouldn't be any smaller.
It also doesn't make sense to import upstream zstd commit by commit
going forward. Upstream zstd doesn't support production use cases
running of the development branch. We have a lot of post-commit
fuzzing that catches many bugs, so indiviudal commits may be buggy,
but fixed before a release. So going forward, I intend to import every
(important) zstd release into the Kernel.
So, while it isn't ideal, updating in one big patch is the only patch
I see forward.
Who is responsible for this code?
---------------------------------
I am. This patchset adds me as the maintainer for zstd. Previously,
there was no tree for zstd patches. Because of that, there were
several patches that either got ignored, or took a long time to merge,
since it wasn't clear which tree should pick them up. I'm officially
stepping up as maintainer, and setting up my tree as the path through
which zstd patches get merged. I'll make sure that patches to the
kernel zstd get ported upstream, so they aren't erased when the next
version update happens.
How is this code tested?
------------------------
I tested every caller of zstd on x86_64 (BtrFS, ZRAM, SquashFS, F2FS,
Kernel, InitRAMFS). I also tested Kernel & InitRAMFS on i386 and
aarch64. I checked both performance and correctness.
Also, thanks to many people in the community who have tested these
patches locally.
Lastly, this code will bake in linux-next before being merged into
v5.16.
Why update to zstd-1.4.10 when zstd-1.5.0 has been released?
------------------------------------------------------------
This patchset has been outstanding since 2020, and zstd-1.4.10 was the
latest release when it was created. Since the update patch is
automatically generated from upstream, I could generate it from
zstd-1.5.0.
However, there were some large stack usage regressions in zstd-1.5.0,
and are only fixed in the latest development branch. And the latest
development branch contains some new code that needs to bake in the
fuzzer before I would feel comfortable releasing to the kernel.
Once this patchset has been merged, and we've released zstd-1.5.1, we
can update the kernel to zstd-1.5.1, and exercise the update process.
You may notice that zstd-1.4.10 doesn't exist upstream. This release
is an artifical release based off of zstd-1.4.9, with some fixes for
the kernel backported from the development branch. I will tag the
zstd-1.4.10 release after this patchset is merged, so the Linux Kernel
is running a known version of zstd that can be debugged upstream.
Why was a wrapper API added?
----------------------------
The first versions of this patchset migrated the kernel to the
upstream zstd API. It first added a shim API that supported the new
upstream API with the old code, then updated callers to use the new
shim API, then transitioned to the new code and deleted the shim API.
However, Cristoph Hellwig suggested that we transition to a kernel
style API, and hide zstd's upstream API behind that. This is because
zstd's upstream API is supports many other use cases, and does not
follow the kernel style guide, while the kernel API is focused on the
kernel's use cases, and follows the kernel style guide.
Where is the previous discussion?
---------------------------------
Links for the discussions of the previous versions of the patch set
below. The largest changes in the design of the patchset are driven by
the discussions in v11, v5, and v1. Sorry for the mix of links, I
couldn't find most of the the threads on lkml.org"
Link: https://lkml.org/lkml/2020/9/29/27 [1]
Link: https://www.spinics.net/lists/linux-crypto/msg58189.html [v12]
Link: https://lore.kernel.org/linux-btrfs/20210430013157.747152-1-nickrterrell@gmail.com/ [v11]
Link: https://lore.kernel.org/lkml/20210426234621.870684-2-nickrterrell@gmail.com/ [v10]
Link: https://lore.kernel.org/linux-btrfs/20210330225112.496213-1-nickrterrell@gmail.com/ [v9]
Link: https://lore.kernel.org/linux-f2fs-devel/20210326191859.1542272-1-nickrterrell@gmail.com/ [v8]
Link: https://lkml.org/lkml/2020/12/3/1195 [v7]
Link: https://lkml.org/lkml/2020/12/2/1245 [v6]
Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v5]
Link: https://www.spinics.net/lists/linux-btrfs/msg105783.html [v4]
Link: https://lkml.org/lkml/2020/9/23/1074 [v3]
Link: https://www.spinics.net/lists/linux-btrfs/msg105505.html [v2]
Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v1]
Signed-off-by: Nick Terrell <terrelln@fb.com>
Tested By: Paul Jones <paul@pauljones.id.au>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
* tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux:
lib: zstd: Add cast to silence clang's -Wbitwise-instead-of-logical
MAINTAINERS: Add maintainer entry for zstd
lib: zstd: Upgrade to latest upstream zstd version 1.4.10
lib: zstd: Add decompress_sources.h for decompress_unzstd
lib: zstd: Add kernel-specific API
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmGP7AoQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgphulEACJ5c1STtVCWFDVdfMEiyPh/rVYyar6xKgr
c0iqjHvmSTLrFaiYaJwQvVE939JgV1wGmy/WyAnVBdAuThDxh5oyk/hGF1NrBxSI
cE2zi69QYyNBkvhvbDlO6vi6Yue5n1+jxhrsPAtGvpWqgxfuyY4q74pjhOVd/ID5
Vw5eskcmRDCiuQX7TpvP2dFof1DNp4upIdTbX1uMO8fqgReNAeLEWNNdkpUtFtSI
UyWqHQECSV4ez2bRGF2wWNwpiVxExjkLOoj8Z0n1u1L4f9hujHHmHpHRoCFhXWhP
nDaCUxBEisc1BNuCOy0JqVhmpDcr2fMrdioZoc8TR0h4+DILQPTPst1r0LPmBMq1
1RCTR7lwCPtkfCtIyUTa2OCpv3ldkExcZrdmks7K5EDwgh+PPNvbB1X+pvjOSOeH
WvA3LwF9x/jpjOvtC5f88g58jCfoRYLSuMnplJxTvBzq5ERtBtuC8JzV4K/nHNTW
PdQRnHj61FYBiduT6IK9LF4gu3nwbFyATfqHAKQlNKGeZrgtJ5lBc3aPe11ICJ70
Wew9fAUet/TONApA0YsY4E0inOBJg4HOh+L6hy2jJbnFcfIeKr3e7PqrHd4f41DW
m2cf8UfuyApXquZ6kAMu+zpSLFaEoItjHwrcfvTKVRo/X17phza4qdPZqKEbWU7H
sTmzYM92Cg==
=OWxn
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.16-2021-11-13' of git://git.kernel.dk/linux-block
Pull io_uring fix from Jens Axboe:
"Just a single fix here for a buffered write hash stall, which is also
affecting stable"
* tag 'io_uring-5.16-2021-11-13' of git://git.kernel.dk/linux-block:
io-wq: serialize hash clear with wakeup
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmGP2W4ACgkQiiy9cAdy
T1Eaawv/ZS5lgxqmCRBKlptlXvBwSFGNSwY8kZy5xAG08Pk8SMWBmpuPolZJhwmU
UlfOwCr0WzKlkcm9e6SLdhpO+wfITqzD84xrOq9sM5C4XC4h8gn45WIcsvDV7bu8
r4SB8AEHlU0lQWW8syclXppCW0s3+aqgyZVrgW3u0uykBm326CN+DtnqoHV84enO
deurEtPAbrJRzkGE3RHxfQEx5gt9IIkjZwoMk1AO6m/hEpdW7NTkDlMPFoAS+9PE
21m/fnilMV7gbLn6GNJKlYzM+ZCEq6tY2860fTR75uqgvo5qPKMSamv0Z/I9W23V
NKP6fMDWVZmac7eieBw3Hjzxkfj0/1nmXRyTghlRZk0PUkOW6atZPXSIcbdEJgCu
l/a+KV1U50FWp4/a5eTQpgJE6wDM71Lj9H5yomc5EafiRr8xAzhuJ1ZRoTVfNVRc
NEIJ4H81pxatbIukESz/wxB1SgV2FebsZWEEaTyudd66hNLVGiek0Q7qDT52YiPU
hvQzGkTx
=iwLL
-----END PGP SIGNATURE-----
Merge tag '5.16-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull more cifs updates from Steve French:
- improvements to reconnect and multichannel
- a performance improvement (additional use of SMB3 compounding)
- DFS code cleanup and improvements
- various trivial Coverity fixes
- two fscache fixes
- an fsync fix
* tag '5.16-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (23 commits)
cifs: do not duplicate fscache cookie for secondary channels
cifs: connect individual channel servers to primary channel server
cifs: protect session channel fields with chan_lock
cifs: do not negotiate session if session already exists
smb3: do not setup the fscache_super_cookie until fsinfo initialized
cifs: fix potential use-after-free bugs
cifs: fix memory leak of smb3_fs_context_dup::server_hostname
smb3: add additional null check in SMB311_posix_mkdir
cifs: release lock earlier in dequeue_mid error case
smb3: add additional null check in SMB2_tcon
smb3: add additional null check in SMB2_open
smb3: add additional null check in SMB2_ioctl
smb3: remove trivial dfs compile warning
cifs: support nested dfs links over reconnect
smb3: do not error on fsync when readonly
cifs: for compound requests, use open handle if possible
cifs: set a minimum of 120s for next dns resolution
cifs: split out dfs code from cifs_reconnect()
cifs: convert list_for_each to entry variant
cifs: introduce new helper for cifs_reconnect()
...
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmGN4ZsACgkQiiy9cAdy
T1E2lgwArPsjBHTZcogr14wjnoFJflT3YMomGC+NOYtMosFumAmVbxA16qC3QVBA
evK2zE92qkdtCaGfeTiJVdDhbtpRsiO53F9Gdk+YqZWF7shJozAovruVen6kFI+3
fjVUSWP/VvN33SWlekU9litSUf7pJCcAvbPQBTA9h7gKSaA6BwqpaubhbuMjdtIN
Ycrz+xBB8q8UW4CRDL70zmUQM5Jxa//bO68e+M2cgcO8LM2MC1ZbT0CA3Q3STFAq
n3yWwwlwV+y5qgEiFztNdKkgF8tGou21YI2PJErjrQWhWgEvximFOV4VWZ+5mL9R
nx6ziORBgN7SbbMDeXhmDgihWCMdvcfFlB31sx2d0dIRo2Tu+x1DnaXqNPtJ8W9Q
M3GbUesDK+2HXIAPvrd/o0PNW0wahlPqEx3vWIJI9f15XrSGJL1OQ0FeaXIa5mGE
+rnwp9ifCGNoojC4wurWz5eebrlML0VtUwBF2Osr58lTS1VSV+ejXhSGv5F3bpzt
YpXtbFd+
=sxsQ
-----END PGP SIGNATURE-----
Merge tag '5.16-rc-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd updates from Steve French:
"Several smb server fixes; three for stable:
- important fix for negotiation info validation
- fix alignment check in packet validation
- cleanup of dead code (like MD4)
- refactoring some protocol headers to use common code in smbfs_common"
* tag '5.16-rc-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: Use the SMB3_Create definitions from the shared
ksmbd: Move more definitions into the shared area
ksmbd: use the common definitions for NEGOTIATE_PROTOCOL
ksmbd: switch to use shared definitions where available
ksmbd: change LeaseKey data type to u8 array
ksmbd: remove smb2_buf_length in smb2_transform_hdr
ksmbd: remove smb2_buf_length in smb2_hdr
ksmbd: remove md4 leftovers
ksmbd: set unique value to volume serial field in FS_VOLUME_INFORMATION
ksmbd: don't need 8byte alignment for request length in ksmbd_check_message
ksmbd: Fix buffer length check in fsctl_validate_negotiate_info()
ksmbd: Remove redundant 'flush_workqueue()' calls
ksmdb: use cmd helper variable in smb2_get_ksmbd_tcon()
ksmbd: use ksmbd_req_buf_next() in ksmbd_smb2_check_message()
ksmbd: use ksmbd_req_buf_next() in ksmbd_verify_smb_message()
in 5.7 are now enabled by default. This should greatly speed up things
like rm, tar and rsync. To opt out, wsync mount option can be used.
Other than that we have a pile of bug fixes all across the filesystem
from Jeff, Xiubo and Kotresh and a metrics infrastructure rework from
Luis.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmGORY8THGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi5XmB/0SRQTW+BRAhYvSD/Ib2/c6uVZGnmJU
MUCO/uuD4fZvfxyMVb3qnAzHIGh3YlFWa/dgSZNStvLmY0L9P0MIvyaYolVuC4Tu
7llX1I+yckTns9VmiULBNy9D812eRY282nMbRzikMGPO1eb6Yqo3r50AdTvkam/R
Qs5pfwRqLerbP7VUv4vSsrBflwVyHOrCFZaUUVTu4f2kKz/FzZd/FCw5VV51KYzq
ygN+8eFotf5zluMX0tl0FXVgJ13N9vbg+YNxyOzHmOAV0AhSmmtV8vJ+j+m7+sOj
b7wNl5AuI5Ogeg8PHSGsXOHBVn6IbhdGDcougEVL+1gHkxxOQ5hfBHWQ
=22Vd
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-5.16-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"One notable change here is that async creates and unlinks introduced
in 5.7 are now enabled by default. This should greatly speed up things
like rm, tar and rsync. To opt out, wsync mount option can be used.
Other than that we have a pile of bug fixes all across the filesystem
from Jeff, Xiubo and Kotresh and a metrics infrastructure rework from
Luis"
* tag 'ceph-for-5.16-rc1' of git://github.com/ceph/ceph-client:
ceph: add a new metric to keep track of remote object copies
libceph, ceph: move ceph_osdc_copy_from() into cephfs code
ceph: clean-up metrics data structures to reduce code duplication
ceph: split 'metric' debugfs file into several files
ceph: return the real size read when it hits EOF
ceph: properly handle statfs on multifs setups
ceph: shut down mount on bad mdsmap or fsmap decode
ceph: fix mdsmap decode when there are MDS's beyond max_mds
ceph: ignore the truncate when size won't change with Fx caps issued
ceph: don't rely on error_string to validate blocklisted session.
ceph: just use ci->i_version for fscache aux info
ceph: shut down access to inode when async create fails
ceph: refactor remove_session_caps_cb
ceph: fix auth cap handling logic in remove_session_caps_cb
ceph: drop private list from remove_session_caps_cb
ceph: don't use -ESTALE as special return code in try_get_cap_refs
ceph: print inode numbers instead of pointer values
ceph: enable async dirops by default
libceph: drop ->monmap and err initialization
ceph: convert to noop_direct_IO
- fix unsafe pagevec reuse which could cause unexpected behaviors;
- get rid of the unused DELAYEDALLOC strategy.
-----BEGIN PGP SIGNATURE-----
iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYY53yxEceGlhbmdAa2Vy
bmVsLm9yZwAKCRA5NzHcH7XmBEm2AQD1wxu5TxXqOPP9cISHcS2IGiJOCxQThjSs
XT3jSvIKBAEA+nncAHua1vGzC7HyqLjsebRjoZ+dAjXULZFdg3FnPwY=
=z3VD
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-5.16-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- fix unsafe pagevec reuse which could cause unexpected behaviors
- get rid of the unused DELAYEDALLOC strategy that has been replaced by
TRYALLOC
* tag 'erofs-for-5.16-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: remove useless cache strategy of DELAYEDALLOC
erofs: fix unsafe pagevec reuse of hooked pclusters
In this cycle, we've applied relatively small number of patches which fix subtle
corner cases mainly, while introducing a new mount option to be able to fragment
the disk intentionally for performance tests.
Enhancement:
- add a mount option to fragmente on-disk layout to understand the performance
- support direct IO for multi-partitions
- add a fault injection of dquot_initialize
Bug fix:
- address some lockdep complaints
- fix a deadlock issue with quota
- fix a memory tuning condition
- fix compression condition to improve the ratio
- fix disabling compression on the non-empty compressed file
- invalidate cached pages before IPU/DIO writes
And, we've added some minor clean-ups as usual.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmGMILwACgkQQBSofoJI
UNJDRA/+KPyCXdY0OqL26BuGKj+z7hW6bz7tlh6h3wdnPdsR/W3ehbqQEr3GBb+q
yokmD75/in7vZwGsDHGowFWMAfWOHYEqHz5UAq91sHjhfZzLDNUgLFWJedBX2XJb
UoEAa7KzRt9M9K2p/5vSTs07RN3okUiRkFhVBBQJIaL7xi6MpadN/XAqpyoBqsiP
pAV6J3GF6WNF19P/hkN1CJI8rV+PFrvY6C23lMkP7mnsWh03jMSgDDuhLHMQpAba
EJYq7QbSatsLDRdR+jUQwIfMucvvzN7M6ja9+NTGlbeACvND8vXKYXOwngCq9+je
2PIU4J8zNqnEkLsPn8STm4zwZHCA7VFdeCobCZcaVZCZFBzVqCkVYE9wqFVaQmr1
bCrRFvEb+D1pkHYFujVXwCAfPlO6twiAInFNMa3WQ3FduJq2nhc8OLCJJ46D1KT2
ZzzLv2EIIlncxPvgLIhiEE9DgPOyV56PQAO3OTsBZcvycU32aHo4hyexju1ubKiD
CZFEHLnPbxX8Ulh3NX4uUxqPAEVhM/aw4l4e8xhmVRY3uj75geY7M6rt1vD+Y5Et
EwbUE8XbLy+GhqbbO/SX9G38pftOiIquH1J0RuhuVNNmkIDkQvnSNp8WqHTdjEJE
NiHZ5bkRkii34Wfrax9UccqGDswh/gjHAXEfGD8nFfcQZwLP1n8=
=KGQ3
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we've applied relatively small number of patches which
fix subtle corner cases mainly, while introducing a new mount option
to be able to fragment the disk intentionally for performance tests.
Enhancements:
- add a mount option to fragmente on-disk layout to understand the
performance
- support direct IO for multi-partitions
- add a fault injection of dquot_initialize
Bug fixes:
- address some lockdep complaints
- fix a deadlock issue with quota
- fix a memory tuning condition
- fix compression condition to improve the ratio
- fix disabling compression on the non-empty compressed file
- invalidate cached pages before IPU/DIO writes
And, we've added some minor clean-ups as usual"
* tag 'f2fs-for-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: fix UAF in f2fs_available_free_memory
f2fs: invalidate META_MAPPING before IPU/DIO write
f2fs: support fault injection for dquot_initialize()
f2fs: fix incorrect return value in f2fs_sanity_check_ckpt()
f2fs: compress: disallow disabling compress on non-empty compressed file
f2fs: compress: fix overwrite may reduce compress ratio unproperly
f2fs: multidevice: support direct IO
f2fs: introduce fragment allocation mode mount option
f2fs: replace snprintf in show functions with sysfs_emit
f2fs: include non-compressed blocks in compr_written_block
f2fs: fix wrong condition to trigger background checkpoint correctly
f2fs: fix to use WHINT_MODE
f2fs: fix up f2fs_lookup tracepoints
f2fs: set SBI_NEED_FSCK flag when inconsistent node block found
f2fs: introduce excess_dirty_threshold()
f2fs: avoid attaching SB_ACTIVE flag during mount
f2fs: quota: fix potential deadlock
f2fs: should use GFP_NOFS for directory inodes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmGNO7sACgkQ+7dXa6fL
C2sDTA//SLJBMoY719Z/RvZcZb3PUkuwYtqu8j/F6C1n231yg5TrlkchslV635Ph
4lUuy/+pEVtsWot4JBxBsziBsi5WCjucn6opFAO2UOyT0aiE9ucY2MG+fNo6/b5k
RRGqbEODKHScC7pm00AJlqd8gJUvHz6Zy08KetHvkSI4bqBz5VpKxDSNxcWvsbx1
T6FMY+E61vSd0bEOp1/sZ1gRKK5nG9BJrba9V/MuOzj94MqVd9Ajdarr2vfQ7IyS
Qe16rOBM8u6oCRJqRcwdz2Ma0Zf/0Bm6JpoP7LsEdbzXLKPiHPWM6kNd9WZFmMt1
KFGGC3xG3Yxufasdpf5KCa6wlw1U5hWVITqRubHGxg49IdnrwxNK2zLqpJNr/lZC
vsOg31PkAWmJiMCAxhwL+u++Qar27jlXcdiO6tyqIDYHWyzwGWqF5zlzZ46NR26W
SX7oB36drIzH+UMDqxKGti2hRYTPKKwjCJ6p0EfhEYQ0oGa/bNFJA3bPxupJCkJe
PD0pnXdEmhKwvY0fLH6Ghr/PAckQttstTFpaHZ40XFzglgd3Sm5DEcg2Xm38LQ+n
4elMUA+c807ZTDLSDkGTL9QKPmgoM3AFoAsVxV9eqtEYAzYdnYM1GJ0CGOWM1lvs
vDrB7rqn/CFB6ks8k1BOalq2DTmPVU1I1f1GwnkyOiVtlxyrslk=
=K23c
-----END PGP SIGNATURE-----
Merge tag 'netfs-folio-20211111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull netfs, 9p, afs and ceph (partial) foliation from David Howells:
"This converts netfslib, 9p and afs to use folios. It also partially
converts ceph so that it uses folios on the boundaries with netfslib.
To help with this, a couple of folio helper functions are added in the
first two patches.
These patches don't touch fscache and cachefiles as I intend to remove
all the code that deals with pages directly from there. Only nfs and
cifs are using the old fscache I/O API now. The new API uses iov_iter
instead.
Thanks to Jeff Layton, Dominique Martinet and AuriStor for testing and
retesting the patches"
* tag 'netfs-folio-20211111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Use folios in directory handling
netfs, 9p, afs, ceph: Use folios
folio: Add a function to get the host inode for a folio
folio: Add a function to change the private data attached to a folio
We allocate index cookies for each connection from the client.
However, we don't need this index for each channel in case of
multichannel. So making sure that we avoid creating duplicate
cookies by instantiating only for primary channel.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Today, we don't have any way to get the smb session for any
of the secondary channels. Introducing a pointer to the primary
server from server struct of any secondary channel. The value will
be NULL for the server of the primary channel. This will enable us
to get the smb session for any channel.
This will be needed for some of the changes that I'm planning
to make soon.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Introducing a new spin lock to protect all the channel related
fields in a cifs_ses struct. This lock should be taken
whenever dealing with the channel fields, and should be held
only for very short intervals which will not sleep.
Currently, all channel related fields in cifs_ses structure
are protected by session_mutex. However, this mutex is held for
long periods (sometimes while waiting for a reply from server).
This makes the codepath quite tricky to change.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
In cifs_get_smb_ses, if we find an existing matching session,
we should not send a negotiate request for the session if a
session reconnect is not necessary.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
We were calling cifs_fscache_get_super_cookie after tcon but before
we queried the info (QFS_Info) we need to initialize the cookie
properly. Also includes an additional check suggested by Paulo
to make sure we don't initialize super cookie twice.
Suggested-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ensure that share and prefix variables are set to NULL after kfree()
when looping through DFS targets in __tree_connect_dfs_target().
Also, get rid of @ref in __tree_connect_dfs_target() and just pass a
boolean to indicate whether we're handling link targets or not.
Fixes: c88f7dcd6d ("cifs: support nested dfs links over reconnect")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Although unlikely for it to be possible for rsp to be null here,
the check is safer to add, and quiets a Coverity warning.
Addresses-Coverity: 1437501 ("Explicit Null dereference")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>