Since statx, every filesystem should fill the attributes/attributes_mask
in routine getattr. But the generic_fillattr has not fill that, so add
ext2_getattr to do this. This can fix generic/424 while testing ext2.
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Refuse to mount a volume read-write without a coherent Logical Volume
Integrity Descriptor, because we can't generate truly unique IDs without
one.
This fixes a bug where all inodes created on a UDF filesystem following
mount without a coherent LVID are assigned unique ID 0 which can then
confuse other UDF implementations.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Make sure the CRC and tag checksum of the Logical Volume Integrity
Descriptor are valid before the structure is written out to disk.
Otherwise, unless the filesystem is unmounted gracefully, the on-disk
LVID will be invalid - which is unnecessary filesystem damage.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Centralize timestamping and CRC/checksum updating of the in-core
Logical Volume Integrity Descriptor, in preparation for adding
a third site where this functionality is needed.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When ext2 filesystem is created with 64k block size, ext2_max_size()
will return value less than 0. Also, we cannot write any file in this fs
since the sb->maxbytes is less than 0. The core of the problem is that
the size of block index tree for such large block size is more than
i_blocks can carry. So fix the computation to count with this
possibility.
File size limits computed with the new function for the full range of
possible block sizes look like:
bits file_size
10 17247252480
11 275415851008
12 2196873666560
13 2197948973056
14 2198486220800
15 2198754754560
16 2198888906752
CC: stable@vger.kernel.org
Reported-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When best_desc keeps NULL, best_group keeps -1, too. So we can
return best_group directly.
Signed-off-by: Liu Xiang <liu.xiang6@zte.com.cn>
Signed-off-by: Jan Kara <jack@suse.cz>
There is a plan to build the kernel with -Wimplicit-fallthrough and
these places in the code produced warnings (W=1).
This commit removes the following warnings:
fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Jan Kara <jack@suse.cz>
When setting the first xattr, we automatically enable
EXT2_FEATURE_COMPAT_EXT_ATTR. However we forget to call
ext2_update_dynamic_rev() so in theory if the filesystem was created as
ancient one without features support, this could be missed. The
consequences are minor anyway - since the feature is compat one, only
old e2fsck which does not understand xattrs could do something bad.
Reported-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
The case of (EXT2_INODE_SIZE(sb) == 0) is included in
(sbi->s_inode_size < EXT2_GOOD_OLD_INODE_SIZE).
So there is no need to check again.
Signed-off-by: Liu Xiang <liu.xiang6@zte.com.cn>
Signed-off-by: Jan Kara <jack@suse.cz>
Set proper return code when failing from allocating
memory in ext2_fill_super().
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jan Kara <jack@suse.cz>
- Fix console ramoops to show the previous boot logs (Sai Prakash Ranjan)
- Avoid allocation and leak of platform data
-----BEGIN PGP SIGNATURE-----
Comment: Kees Cook <kees@outflux.net>
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlxE/QYWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsVXD/94pJsNHogOle3XnZJst/1cV4+R
KTyQ6QQ1184lI5FPGJliV3veCuKnTY4Q7e5vCl2mzEvds+VRxu2yUm+LFxKVj04W
yiRNv8Yn3+3eKxNufqLa9ySbsSZyIpoCKQ/20nbC5cNM3r+Gq12GUnDBixtGdl9V
fB316emIaKR70Qvi9KlfwDTcjh+SWBR2u8L0YsmTaoJkjtaCJhekN3hfya6Wnjf5
1sj1fCDzteda/Ld0gXoImHPvt0UcHPJDp1+ZhY0ja64QIuoWwDxsisp4X4nS8iFD
oak9BWRDhRwSzPnyCu8BI3Hc4ayxcBYR/vmBS12LTkIJVdR1CaqfeDYvZexyX04n
TYZnmPeZ+2R3wzA9aCLjUDEAYNi403sLqIvR99jlrOoiMo+5Bf05TvBG8lND2dQg
A13854vg9ssxfiuEmvxJTg8SLtcFiqEtrzZhqaLVfuf4cqaN3o0Ed+A0gjZadwwD
TPxiMh4YEZuw+qszZwyzi7uECKOvJIJeLCL8CApPvihCNFy2vKF5wj58ZizXXmoT
Orq6DJ1ITx4DTkYIgtiTC9HKDuvNDZ/ncdb5QoqlpoTe01r8GQOzffZ+tD3xRTNc
8+yAdK0/bTjLVNbFX/Q4dwagh0I8EQSrdw8CmmQs0lsCvfyYTfHz8uPeBQ7rrzeN
2r1H7a/OKpL/duTkEw==
=m6sX
-----END PGP SIGNATURE-----
Merge tag 'pstore-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore fixes from Kees Cook:
- Fix console ramoops to show the previous boot logs (Sai Prakash
Ranjan)
- Avoid allocation and leak of platform data
* tag 'pstore-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Avoid allocation and leak of platform data
pstore/ram: Fix console ramoops to show the previous boot logs
Yue Hu noticed that when parsing device tree the allocated platform data
was never freed. Since it's not used beyond the function scope, this
switches to using a stack variable instead.
Reported-by: Yue Hu <huyue2@yulong.com>
Fixes: 35da60941e ("pstore/ram: add Device Tree bindings")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlxElHsACgkQxWXV+ddt
WDsF8Q/6A+l/Ku8D+xSmw+JGXGNroX1V62sxVYWqIgrkYNg6iSMjicqF3aN1bCbR
LqvOQ5skerrKteNYIPbTTOD5Xp37ccinSeEWEF0ktFkeU1G6yJo7aRsnQlO1sduk
2PKUOA1/PdeTBiOj1bej/1ybhtIW+d0MaoPtnUCMC8DD/ihfmU332+KC8VmRUYGZ
4kT1DvKfBOSVz1UTl6OJdWo76crvjz0eGVnH1YG7DoFbpVzAbphHE7+aC/WkHQme
X5Ux2NtYVMe/0IGAC7kJnj4jCZ1weAdlmvzzagjzGtWdhWgTxntiJ/FVJs9nO8Dm
G/pVtD8RVjgMkPOWcfw5fdderrBJqjVGgl4VDrDLqjO9OTGNFJs+HgcPRi6Oli28
sA+HG+U34YzSfKY0L9eAmpkNxMjWywBuXTQIAlMhHNZCL0vZ2K5EYa3dTp9OswAW
IIcOh/LfZxiomvMvUqQWcRCy5y/b+cYjOjbHwkrw+ewd3IWXVLG8YLMyZI3vnHKu
/f1xn6KCap9a1cS4LwyK6gzstEugn0MYmnmD/Jx8I1BJFBt55Q31ES6tPHgTmh/d
QjveRjMkxNCql4h5Hq0+LiXoSoocBmsO0wrs2QrWSx4PBnJsvjySnWr/8GfAOj79
BhnuQFxbr/BkNyBzvKrjoI+zZnrVm0cBU59lP6PzN75+kQTaIfs=
=hGlM
-----END PGP SIGNATURE-----
Merge tag 'for-5.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A handful of fixes (some of them in testing for a long time):
- fix some test failures regarding cleanup after transaction abort
- revert of a patch that could cause a deadlock
- delayed iput fixes, that can help in ENOSPC situation when there's
low space and a lot data to write"
* tag 'for-5.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: wakeup cleaner thread when adding delayed iput
btrfs: run delayed iputs before committing
btrfs: wait on ordered extents on abort cleanup
btrfs: handle delayed ref head accounting cleanup in abort
Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"
Stable bugfixes:
- Fix TCP receive code on archs with flush_dcache_page()
Other bugfixes:
- Fix error code in rpcrdma_buffer_create()
- Fix a double free in rpcrdma_send_ctxs_create()
- Fix kernel BUG at kernel/cred.c:825
- Fix unnecessary retry in nfs42_proc_copy_file_range()
- Ensure rq_bytes_sent is reset before request transmission
- Ensure we respect the RPCSEC_GSS sequence number limit
- Address Kerberos performance/behavior regression
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlxCH1kACgkQ18tUv7Cl
QOs4rBAAymqyhUzNgap1TX/KezFxqii7CVMVabrA5eGN+ZXbSVAZkwy7BMZWwVIp
tEvD7lxWtF11x7bQDw7Xz+ruBCjLdD0RQIFnlBpVKqsRy9oSRA4PsgSbuIFaw+gX
Bun4Z0xmOCPF7knRv6gQonArEZfHeokIIN8AtSBtWVByaOrnZwgDkNTIub8akpUl
FQlzgq7lTydVzNcju2ImBeubU7KgFEu0F2Zub5z/iR+F2Mx/bAju8Q4YeVlPyD8U
QJoIBlXAvgK8LK4bZCh40zPeEt0TMWXnW7o0JHgVQ0g6VbT+hp17I7fz91xEazye
qbjpIJIjv5daEv0REM8t5ZCZB3tEatVjb4EQWXp0gJYb0l5E3I/O+7MO44n4uMYx
s3UTxzM6NjwCtlgmn4tYUj+vEIExQHUUnwOl02e5iEa7bqNNY75ehAhj5Rh7iQBH
H4b+OVuqc608q87rNePdK1LRyh0/u1cDI1kDAQoIP2omlb5hJQGk0Nuz9G2BodIj
rP0x7nV+ykOXZtr6TR+RvaksL1W39PzVKYA0aL+e2gbcv4YO+Oq1phvNKwRWPM4a
g08r/kvifS5h6/Jq8Wmn83f1vAOX7Sf23RtEoj+t9hc4S4JbsV2iYK3PY3eWbSYE
Oz0Vt4gvBBJ+0rHJ10BsQ7686OQkyMKpIlvmx6O5mWVlthovbJM=
=6Nzz
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"These are mostly fixes for SUNRPC bugs, with a single v4.2
copy_file_range() fix mixed in.
Stable bugfixes:
- Fix TCP receive code on archs with flush_dcache_page()
Other bugfixes:
- Fix error code in rpcrdma_buffer_create()
- Fix a double free in rpcrdma_send_ctxs_create()
- Fix kernel BUG at kernel/cred.c:825
- Fix unnecessary retry in nfs42_proc_copy_file_range()
- Ensure rq_bytes_sent is reset before request transmission
- Ensure we respect the RPCSEC_GSS sequence number limit
- Address Kerberos performance/behavior regression"
* tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: Address Kerberos performance/behavior regression
SUNRPC: Ensure we respect the RPCSEC_GSS sequence number limit
SUNRPC: Ensure rq_bytes_sent is reset before request transmission
NFSv4.2 fix unnecessary retry in nfs4_copy_file_range
sunrpc: kernel BUG at kernel/cred.c:825!
SUNRPC: Fix TCP receive code on archs with flush_dcache_page()
xprtrdma: Double free in rpcrdma_sendctxs_create()
xprtrdma: Fix error code in rpcrdma_buffer_create()
The cleaner thread usually takes care of delayed iputs, with the
exception of the btrfs_end_transaction_throttle path. Delaying iputs
means we are potentially delaying the eviction of an inode and it's
respective space. The cleaner thread only gets woken up every 30
seconds, or when we require space. If there are a lot of inodes that
need to be deleted we could induce a serious amount of latency while we
wait for these inodes to be evicted. So instead wakeup the cleaner if
it's not already awake to process any new delayed iputs we add to the
list. If we suddenly need space we will less likely be backed up
behind a bunch of inodes that are waiting to be deleted, and we could
possibly free space before we need to get into the flushing logic which
will save us some latency.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Delayed iputs means we can have final iputs of deleted inodes in the
queue, which could potentially generate a lot of pinned space that could
be free'd. So before we decide to commit the transaction for ENOPSC
reasons, run the delayed iputs so that any potential space is free'd up.
If there is and we freed enough we can then commit the transaction and
potentially be able to make our reservation.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit e73e81b6d0.
This patch causes a few problems:
- adds latency to btrfs_finish_ordered_io
- as btrfs_finish_ordered_io is used for free space cache, generating
more work from btrfs_btree_balance_dirty_nodelay could end up in the
same workque, effectively deadlocking
12260 kworker/u96:16+btrfs-freespace-write D
[<0>] balance_dirty_pages+0x6e6/0x7ad
[<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90
[<0>] btrfs_finish_ordered_io+0x3da/0x770
[<0>] normal_work_helper+0x1c5/0x5a0
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x46/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
Transaction commit will wait on the freespace cache:
838 btrfs-transacti D
[<0>] btrfs_start_ordered_extent+0x154/0x1e0
[<0>] btrfs_wait_ordered_range+0xbd/0x110
[<0>] __btrfs_wait_cache_io+0x49/0x1a0
[<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0
[<0>] commit_cowonly_roots+0x215/0x2b0
[<0>] btrfs_commit_transaction+0x37e/0x910
[<0>] transaction_kthread+0x14d/0x180
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
And then writepages ends up waiting on transaction commit:
9520 kworker/u96:13+flush-btrfs-1 D
[<0>] wait_current_trans+0xac/0xe0
[<0>] start_transaction+0x21b/0x4b0
[<0>] cow_file_range_inline+0x10b/0x6b0
[<0>] cow_file_range.isra.69+0x329/0x4a0
[<0>] run_delalloc_range+0x105/0x3c0
[<0>] writepage_delalloc+0x119/0x180
[<0>] __extent_writepage+0x10c/0x390
[<0>] extent_write_cache_pages+0x26f/0x3d0
[<0>] extent_writepages+0x4f/0x80
[<0>] do_writepages+0x17/0x60
[<0>] __writeback_single_inode+0x59/0x690
[<0>] writeback_sb_inodes+0x291/0x4e0
[<0>] __writeback_inodes_wb+0x87/0xb0
[<0>] wb_writeback+0x3bb/0x500
[<0>] wb_workfn+0x40d/0x610
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x1e0/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
Eventually, we have every process in the system waiting on
balance_dirty_pages(), and nobody is able to make progress on page
writeback.
The original patch tried to fix an OOM condition, that happened on 4.4 but no
success reproducing that on later kernels (4.19 and 4.20). This is more likely
a problem in OOM itself.
Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/
Reported-by: Chris Mason <clm@fb.com>
CC: stable@vger.kernel.org # 4.18+
CC: ethanlien <ethanlien@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-----BEGIN PGP SIGNATURE-----
iQIVAwUAXECcyPu3V2unywtrAQIpGw//ctcGHg2sfUEra17pvlKEbpZe9bMeJxp6
UY2YR5gPpiYMmvNe8hLl3I68b43h03jGOx+KmqowqX7anNq3o2nMy0pDbuGmKtuS
5NmIOECAml8k2uPSpacAF2s6TsxB2lTDYwZdyeuRmZ4scOTujNby33RlijGIxX4s
87WJFRuCacm9I1KkiKKn4PWoYGjDdsZ7rsDyEeBmQ/MiKOSLG4QP5XuNr4X9zFMX
r8uF3N8h/NzJWefEirc2DPFfiWLJqkyclq9tgsTB1Z3l+x5u/MHnIg3rZpZhH0uC
GhGWjlGYqTxOwzYCaOOsNIDRF4rAGPi3lzuJXjONnhvbOO7DCGJ+Mo2obxkAqLL6
PrtFuQvgXIOl/k8y6AdckuPPu/OMHT1hyY0PQXmTGhHAAfPP3RHoPl7owEjmAbRg
hvRkFDSIKZ4Kr1nKP4vwaJYEKtxUQrkOwZKmN6ve31ZeJsyrH12MsCgWvp7oQwRJ
fVbk8DWVRtYzy4RaO3Xr+0WfD+03dDi6KKCPPiC2gtNKOO+1Kco/EtIqPi6SXg0m
ee/mFOkRsmEh++iNxS58qLxH37On6GSYOElSIMN0NJDNA2TzLrvidXlMSOTD1hj4
n6gL38E3br/CIimKPYm87qi6yC59CAtrJCulYiPOoMc20eaEUP4DIXN8yjgjLNQp
Hj5M9GRTwXk=
=cUrU
-----END PGP SIGNATURE-----
Merge tag 'afs-fixes-20190117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
"Here's a set of fixes for AFS:
- Use struct_size() for kzalloc() size calculation.
- When calling YFS.CreateFile rather than AFS.CreateFile, it is
possible to create a file with a file lock already held. The
default value indicating no lock required is actually -1, not 0.
- Fix an oops in inode/vnode validation if the target inode doesn't
have a server interest assigned (ie. a server that will notify us
of changes by third parties).
- Fix refcounting of keys in file locking.
- Fix a race in refcounting asynchronous operations in the event of
an error during request transmission. The provision of a dedicated
function to get an extra ref on a call is split into a separate
commit"
* tag 'afs-fixes-20190117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix race in async call refcounting
afs: Provide a function to get a ref on a call
afs: Fix key refcounting in file locking code
afs: Don't set vnode->cb_s_break in afs_validate()
afs: Set correct lock type for the yfs CreateFile
afs: Use struct_size() in kzalloc()
commit b05c950698 ("pstore/ram: Simplify ramoops_get_next_prz()
arguments") changed update assignment in getting next persistent ram zone
by adding a check for record type. But the check always returns true since
the record type is assigned 0. And this breaks console ramoops by showing
current console log instead of previous log on warm reset and hard reset
(actually hard reset should not be showing any logs).
Fix this by having persistent ram zone type check instead of record type
check. Tested this on SDM845 MTP and dragonboard 410c.
Reproducing this issue is simple as below:
1. Trigger hard reset and mount pstore. Will see console-ramoops
record in the mounted location which is the current log.
2. Trigger warm reset and mount pstore. Will see the current
console-ramoops record instead of previous record.
Fixes: b05c950698 ("pstore/ram: Simplify ramoops_get_next_prz() arguments")
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[kees: dropped local variable usage]
Signed-off-by: Kees Cook <keescook@chromium.org>
There's a race between afs_make_call() and afs_wake_up_async_call() in the
case that an error is returned from rxrpc_kernel_send_data() after it has
queued the final packet.
afs_make_call() will try and clean up the mess, but the call state may have
been moved on thereby causing afs_process_async_call() to also try and to
delete the call.
Fix this by:
(1) Getting an extra ref for an asynchronous call for the call itself to
hold. This makes sure the call doesn't evaporate on us accidentally
and will allow the call to be retained by the caller in a future
patch. The ref is released on leaving afs_make_call() or
afs_wait_for_call_to_complete().
(2) In the event of an error from rxrpc_kernel_send_data():
(a) Don't set the call state to AFS_CALL_COMPLETE until *after* the
call has been aborted and ended. This prevents
afs_deliver_to_call() from doing anything with any notifications
it gets.
(b) Explicitly end the call immediately to prevent further callbacks.
(c) Cancel any queued async_work and wait for the work if it's
executing. This allows us to be sure the race won't recur when we
change the state. We put the work queue's ref on the call if we
managed to cancel it.
(d) Put the call's ref that we got in (1). This belongs to us as long
as the call is in state AFS_CALL_CL_REQUESTING.
Fixes: 341f741f04 ("afs: Refcount the afs_call struct")
Signed-off-by: David Howells <dhowells@redhat.com>
Fix the refcounting of the authentication keys in the file locking code.
The vnode->lock_key member points to a key on which it expects to be
holding a ref, but it isn't always given an extra ref, however.
Fixes: 0fafdc9f88 ("afs: Fix file locking")
Signed-off-by: David Howells <dhowells@redhat.com>
A cb_interest record is not necessarily attached to the vnode on entry to
afs_validate(), which can cause an oops when we try to bring the vnode's
cb_s_break up to date in the default case (ie. no current callback promise
and the vnode has not been deleted).
Fix this by simply removing the line, as vnode->cb_s_break will be set when
needed by afs_register_server_cb_interest() when we next get a callback
promise from RPC call.
The oops looks something like:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
...
RIP: 0010:afs_validate+0x66/0x250 [kafs]
...
Call Trace:
afs_d_revalidate+0x8d/0x340 [kafs]
? __d_lookup+0x61/0x150
lookup_dcache+0x44/0x70
? lookup_dcache+0x44/0x70
__lookup_hash+0x24/0xa0
do_unlinkat+0x11d/0x2c0
__x64_sys_unlink+0x23/0x30
do_syscall_64+0x4d/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: ae3b7361dc ("afs: Fix validation/callback interaction")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Currently nfs42_proc_copy_file_range() can not return EAGAIN.
Fixes: e4648aa4f9 ("NFS recover from destination server reboot for copies")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
bd_set_size() updates also block device's block size. This is somewhat
unexpected from its name and at this point, only blkdev_open() uses this
functionality. Furthermore, this can result in changing block size under
a filesystem mounted on a loop device which leads to livelocks inside
__getblk_gfp() like:
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
01/01/2011
RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
...
Call Trace:
init_page_buffers+0x3e2/0x530 fs/buffer.c:904
grow_dev_page fs/buffer.c:947 [inline]
grow_buffers fs/buffer.c:1009 [inline]
__getblk_slow fs/buffer.c:1036 [inline]
__getblk_gfp+0x906/0xb10 fs/buffer.c:1313
__bread_gfp+0x2d/0x310 fs/buffer.c:1347
sb_bread include/linux/buffer_head.h:307 [inline]
fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
fat_ent_read_block fs/fat/fatent.c:441 [inline]
fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
__fat_get_block fs/fat/inode.c:148 [inline]
...
Trivial reproducer for the problem looks like:
truncate -s 1G /tmp/image
losetup /dev/loop0 /tmp/image
mkfs.ext4 -b 1024 /dev/loop0
mount -t ext4 /dev/loop0 /mnt
losetup -c /dev/loop0
l /mnt
Fix the problem by moving initialization of a block device block size
into a separate function and call it when needed.
Thanks to Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> for help with
debugging the problem.
Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlw7Y68ACgkQxWXV+ddt
WDs+Jw/9Hn71GLb1YNpNUlzJSQHUzbvFtXbtvEVyYeuuLkmjTG4FT2bkGqYhAeC9
2jvaqPaxI0VJ7vNulAaMVZayFwNEQrF9p+z+vzV/Ty2lu0Ep/7Uuwp9KL8X49req
5hEtb5p9Dbj9hXqa8XNOfF2GPDRTQ6D4eknYiNM7MY+aTkvMEtpuZg9kfwXrZ2au
JeFBzZo9SA5nxqrXZg1XlRYttDnOf44h6YJmEFOZOJuAouKcd5I7C8BshCRKDuWo
iJBjYTBTFqjgUgCrn10UM92T19hKufHiDE3WWQ/7zykqtThpKgBFR1opAUcNBJBf
HvOOsYnZcGbxIKOjFpucSpButTmjFcnI83f/7dmZYXUsyIzP/xH51uLiz/CLJqWj
JsRgtgtJ7l5s1M8GkFG6B9Tp89KxHnVKqNC5HyX+4AuFWiIJ+CWWSddGqNSfAIHe
o2ceQRMumvwbVKgfd0AfPcZ9v4sRM/DfwxWXgEQmCSNJikSuUjP9b3D51ttnXQ8c
q6lz7L6nRKK4mgBnfJCmpus3IArdvNwJ7CF8C1RejfRwL824RzH+yHjWdg36nVXB
oaBYKJf8GRRn+3In1W6npr65NxCQERIZV4M89EgST/RK7tW5u0Fotd1KJCmo+ayO
cSRbp16On9G6gBV+qrBs8X65KIWALZpdTycwyFplCU581DXdtnU=
=aCB2
-----END PGP SIGNATURE-----
Merge tag 'for-5.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- two regression fixes in clone/dedupe ioctls, the generic check
callback needs to lock extents properly and wait for io to avoid
problems with writeback and relocation
- fix deadlock when using free space tree due to block group creation
- a recently added check refuses a valid fileystem with seeding device,
make that work again with a quickfix, proper solution needs more
intrusive changes
* tag 'for-5.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: Use real device structure to verify dev extent
Btrfs: fix deadlock when using free space tree due to block group creation
Btrfs: fix race between reflink/dedupe and relocation
Btrfs: fix race between cloning range ending at eof and writeback
Here is one small sysfs change, and a documentation update for 5.0-rc2
The sysfs change moves from using BUG_ON to WARN_ON, as discussed in an
email thread on lkml while trying to track down another driver bug.
sysfs should not be crashing and preventing people from seeing where
they went wrong. Now it properly recovers and warns the developer.
The documentation update removes the use of BUS_ATTR() as the kernel is
moving away from this to use the specific BUS_ATTR_RW() and friends
instead. There are pending patches in all of the different subsystems
to remove the last users of this macro, but for now, don't advertise it
should be used anymore to keep new ones from being introduced.
Both have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXDsRFA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynbgACfTY/rAZpWTgMdPDfoOmF+s/XHQXsAoJKZ+v+f
Tpkcw76Wo1ESpPLuT1u1
=W0D+
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here is one small sysfs change, and a documentation update for 5.0-rc2
The sysfs change moves from using BUG_ON to WARN_ON, as discussed in
an email thread on lkml while trying to track down another driver bug.
sysfs should not be crashing and preventing people from seeing where
they went wrong. Now it properly recovers and warns the developer.
The documentation update removes the use of BUS_ATTR() as the kernel
is moving away from this to use the specific BUS_ATTR_RW() and friends
instead. There are pending patches in all of the different subsystems
to remove the last users of this macro, but for now, don't advertise
it should be used anymore to keep new ones from being introduced.
Both have been in linux-next with no reported issues"
* tag 'driver-core-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Documentation: driver core: remove use of BUS_ATTR
sysfs: convert BUG_ON to WARN_ON
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlw6VMAACgkQiiy9cAdy
T1GA4AwAiLxAK9gYkv7oCi4xkr+GeDr9/J0R4JZOglXrJQfbEDAgejpTqIVOeiTW
n2fDEe4JCj3Dk9VnhGIAcV3Lyp5pcXuiqKwBJsxlJIIM81rDa98K8NgpJqjFDt1P
QEsGHijO7/5rlefnMPUy0YrFdoGr9ZjDEYDx2RdTkuRAX2rqLI02ytqyB6AHMjSM
zx5Ck1JdltrZa9vN1k2QoUW92FLrHWHDQM6ooRTiTv1n5v3CtVCFNFEUD9AWCdKy
APFXva3cWPAh6Dvsh6Qtryn2WdSfDdy69iiqfkBrDaaLsJC0oCogK9TsQOL66Szu
onpXQROqxvOA0UoyJJgRaDhRjGzRLglChpitS9ps4ZqziXRMpKBsn9SzaZiJ3Jna
/u5Pv5CFr4JkXRnvLAx2h6coZ+FyfVmSgxtoCU+tdn8G0t2AXl7swUw2e2UX7k4y
FR20/w0hA9pDVSlrY1Z+Yg7TCZLS5677+6xFo+b+rnF6VW8XZQc/2F9MYlxyGAj5
30R/Q0ck
=410H
-----END PGP SIGNATURE-----
Merge tag '5.0-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"A set of cifs/smb3 fixes, 4 for stable, most from Pavel. His patches
fix an important set of crediting (flow control) problems, and also
two problems in cifs_writepages, ddressing some large i/o and also
compounding issues"
* tag '5.0-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
CIFS: Fix error paths in writeback code
CIFS: Move credit processing to mid callbacks for SMB3
CIFS: Fix credits calculation for cancelled requests
cifs: Fix potential OOB access of lock element array
cifs: Limit memory used by lock request calls to a page
cifs: move large array from stack to heap
CIFS: Do not hide EINTR after sending network packets
CIFS: Fix credit computation for compounded requests
CIFS: Do not set credits to 1 if the server didn't grant anything
CIFS: Fix adjustment of credits for MTU requests
cifs: Fix a tiny potential memory leak
cifs: Fix a debug message
edge case, marked for stable.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAlw4yfETHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi9loB/4iTBs/c5vI8oXSi+n4IAAG/LunP8Ff
HfxMr7Mpbpp0cLdzmJWO63Wejii95w7IkdepJTF5VlpH6dwJDUL6jSAyAXPO1a2G
phZHmnGYjFBKLGPftCZmGQCJeYeS9JJyRgTC4IZECVkQRiPRGXn8cAbEqMYOy6Am
mXmy0u4Me3RJTpcOCqaPPugontvtsZdfqvwVE2dGqXTj9Kh1hfum5ddrvf1CXugQ
MpXzW3mJCMMfcXlZOCVuQInKqPT4HRhTpzJxdECiJTxBXDQUd6e0ekOXmKW6jOYk
ZpNVnXHGee0uvzwQC0/VMWkRF6d+pOsYFQkDlX79h04rqnoBkUNc2Doe
=dJev
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-5.0-rc2' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"A patch to allow setting abort_on_full and a fix for an old "rbd
unmap" edge case, marked for stable"
* tag 'ceph-for-5.0-rc2' of git://github.com/ceph/ceph-client:
rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is set
ceph: use vmf_error() in ceph_filemap_fault()
libceph: allow setting abort_on_full for rbd
This patch aims to address writeback code problems related to error
paths. In particular it respects EINTR and related error codes and
stores and returns the first error occurred during writeback.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently we account for credits in the thread initiating a request
and waiting for a response. The demultiplex thread receives the response,
wakes up the thread and the latter collects credits from the response
buffer and add them to the server structure on the client. This approach
is not accurate, because it may race with reconnect events in the
demultiplex thread which resets the number of credits.
Fix this by moving credit processing to new mid callbacks that collect
credits granted by the server from the response in the demultiplex thread.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
If a request is cancelled, we can't assume that the server returns
1 credit back. Instead we need to wait for a response and process
the number of credits granted by the server.
Create a separate mid callback for cancelled request, parse the number
of credits in a response buffer and add them to the client's credits.
If the didn't get a response (no response buffer available) assume
0 credits granted. The latter most probably happens together with
session reconnect, so the client's credits are adjusted anyway.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
If maxBuf is small but non-zero, it could result in a zero sized lock
element array which we would then try and access OOB.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
The code tries to allocate a contiguous buffer with a size supplied by
the server (maxBuf). This could fail if memory is fragmented since it
results in high order allocations for commonly used server
implementations. It is also wasteful since there are probably
few locks in the usual case. Limit the buffer to be no larger than a
page to avoid memory allocation failures due to fragmentation.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
This addresses some compile warnings that you can
see depending on configuration settings.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently we hide EINTR code returned from sock_sendmsg()
and return 0 instead. This makes a caller think that we
successfully completed the network operation which is not
true. Fix this by properly returning EINTR to callers.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
In SMB3 protocol every part of the compound chain consumes credits
individually, so we need to call wait_for_free_credits() for each
of the PDUs in the chain. If an operation is interrupted, we must
ensure we return all credits taken from the server structure back.
Without this patch server can sometimes disconnect the session
due to credit mismatches, especially when first operation(s)
are large writes.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Currently we reset the number of total credits granted by the server
to 1 if the server didn't grant us anything int the response. This
violates the SMB3 protocol - we need to trust the server and use
the credit values from the response. Fix this by removing the
corresponding code.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Currently for MTU requests we allocate maximum possible credits
in advance and then adjust them according to the request size.
While we were adjusting the number of credits belonging to the
server, we were skipping adjustment of credits belonging to the
request. This patch fixes it by setting request credits to
CreditCharge field value of SMB2 packet header.
Also ask 1 credit more for async read and write operations to
increase parallelism and match the behavior of other operations.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
The most recent "it" allocation is leaked on this error path. I
believe that small allocations always succeed in current kernels so
this doesn't really affect run time.
Fixes: 54be1f6c1c ("cifs: Add DFS cache routines")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
This debug message was never shown because it was checking for NULL
returns but extract_hostname() returns error pointers.
Fixes: 93d5cb517d ("cifs: Add support for failover in cifs_reconnect()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
A lock type of 0 is "LockRead", which makes the fileserver record an
unintentional read lock on the new file. This will cause problems
later on if the file is the subject of locking operations.
The correct default value should be -1 ("LockNone").
Fix the operation marshalling code to set the value and provide an enum to
symbolise the values whilst we're at it.
Fixes: 30062bd13e ("afs: Implement YFS support in the fs client")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with
memory for some number of elements for that array. For example:
struct foo {
int stuff;
void *entry[];
};
instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:
instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
[BUG]
Linux v5.0-rc1 will fail fstests/btrfs/163 with the following kernel
message:
BTRFS error (device dm-6): dev extent devid 1 physical offset 13631488 len 8388608 is beyond device boundary 0
BTRFS error (device dm-6): failed to verify dev extents against chunks: -117
BTRFS error (device dm-6): open_ctree failed
[CAUSE]
Commit cf90d884b3 ("btrfs: Introduce mount time chunk <-> dev extent
mapping check") introduced strict check on dev extents.
We use btrfs_find_device() with dev uuid and fs uuid set to NULL, and
only dependent on @devid to find the real device.
For seed devices, we call clone_fs_devices() in open_seed_devices() to
allow us search seed devices directly.
However clone_fs_devices() just populates devices with devid and dev
uuid, without populating other essential members, like disk_total_bytes.
This makes any device returned by btrfs_find_device(fs_info, devid,
NULL, NULL) is just a dummy, with 0 disk_total_bytes, and any dev
extents on the seed device will not pass the device boundary check.
[FIX]
This patch will try to verify the device returned by btrfs_find_device()
and if it's a dummy then re-search in seed devices.
Fixes: cf90d884b3 ("btrfs: Introduce mount time chunk <-> dev extent mapping check")
CC: stable@vger.kernel.org # 4.19+
Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When modifying the free space tree we can end up COWing one of its extent
buffers which in turn might result in allocating a new chunk, which in
turn can result in flushing (finish creation) of pending block groups. If
that happens we can deadlock because creating a pending block group needs
to update the free space tree, and if any of the updates tries to modify
the same extent buffer that we are COWing, we end up in a deadlock since
we try to write lock twice the same extent buffer.
So fix this by skipping pending block group creation if we are COWing an
extent buffer from the free space tree. This is a case missed by commit
5ce555578e ("Btrfs: fix deadlock when writing out free space caches").
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202173
Fixes: 5ce555578e ("Btrfs: fix deadlock when writing out free space caches")
CC: stable@vger.kernel.org # 4.18+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>