linux-sg2042

David Howells ec0fa0b659 afs: Fix deadlock between writeback and truncate

The afs filesystem has a lock[*] that it uses to serialise I/O operations going to the server (vnode->io_lock), as the server will only perform one modification operation at a time on any given file or directory. This prevents the filesystem from filling up all the call slots to a server with calls that aren't going to be executed in parallel anyway, thereby allowing operations on other files to obtain slots.

  [*] Note that this is probably redundant for directories at least since i_rwsem is used to serialise directory modifications and lookup/reading vs modification. The server does allow parallel non-modification ops, however.

When a file truncation op completes, we truncate the in-memory copy of the file to match - but we do it whilst still holding the io_lock, the idea being to prevent races with other operations.

However, if writeback starts in a worker thread simultaneously with truncation (whilst notify_change() is called with i_rwsem locked, writeback pays it no heed), it may manage to set PG_writeback bits on the pages that will get truncated before afs_setattr_success() manages to call truncate_pagecache(). Truncate will then wait for those pages - whilst still inside io_lock:

    # cat /proc/8837/stack
    [<0>] wait_on_page_bit_common+0x184/0x1e7
    [<0>] truncate_inode_pages_range+0x37f/0x3eb
    [<0>] truncate_pagecache+0x3c/0x53
    [<0>] afs_setattr_success+0x4d/0x6e
    [<0>] afs_wait_for_operation+0xd8/0x169
    [<0>] afs_do_sync_operation+0x16/0x1f
    [<0>] afs_setattr+0x1fb/0x25d
    [<0>] notify_change+0x2cf/0x3c4
    [<0>] do_truncate+0x7f/0xb2
    [<0>] do_sys_ftruncate+0xd1/0x104
    [<0>] do_syscall_64+0x2d/0x3a
    [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

The writeback operation, however, stalls indefinitely because it needs to get the io_lock to proceed:

    # cat /proc/5940/stack
    [<0>] afs_get_io_locks+0x58/0x1ae
    [<0>] afs_begin_vnode_operation+0xc7/0xd1
    [<0>] afs_store_data+0x1b2/0x2a3
    [<0>] afs_write_back_from_locked_page+0x418/0x57c
    [<0>] afs_writepages_region+0x196/0x224
    [<0>] afs_writepages+0x74/0x156
    [<0>] do_writepages+0x2d/0x56
    [<0>] __writeback_single_inode+0x84/0x207
    [<0>] writeback_sb_inodes+0x238/0x3cf
    [<0>] __writeback_inodes_wb+0x68/0x9f
    [<0>] wb_writeback+0x145/0x26c
    [<0>] wb_do_writeback+0x16a/0x194
    [<0>] wb_workfn+0x74/0x177
    [<0>] process_one_work+0x174/0x264
    [<0>] worker_thread+0x117/0x1b9
    [<0>] kthread+0xec/0xf1
    [<0>] ret_from_fork+0x1f/0x30

and thus deadlock has occurred.

Note that whilst afs_setattr() calls filemap_write_and_wait(), the fact that the caller is holding i_rwsem doesn't preclude more pages being dirtied through an mmap'd region.

Fix this by:

 (1) Use the vnode validate_lock to mediate access between afs_setattr() and afs_writepages():

     (a) Exclusively lock validate_lock in afs_setattr() around the whole RPC operation.

     (b) If WB_SYNC_ALL isn't set on entry to afs_writepages(), try to shared-lock validate_lock and return immediately if we couldn't get it.

     (c) If WB_SYNC_ALL is set, wait for the lock.

     The validate_lock is also used to validate a file and to zap its cache if the file was altered by a third party, so it's probably a good fit for this.

 (2) Move the truncation outside of the io_lock in setattr, using the same hook as is used for local directory editing.

     This requires the old i_size to be retained in the operation record as we commit the revised status to the inode members inside the io_lock still, but we still need to know if we reduced the file size.

Fixes: `d2ddc776a4` ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2020-10-08 10:50:55 -07:00
Kconfig	docs: filesystems: fix renamed references	2020-04-20 15:45:22 -06:00
Makefile	afs: Detect cell aliases 1 - Cells with root volumes	2020-06-04 15:37:57 +01:00
addr_list.c	afs: Use kfree_rcu() instead of casting kfree() to rcu_callback_t	2020-03-13 10:47:33 -07:00
afs.h	afs: Implement client support for the YFSVL.GetCellName RPC op	2020-06-04 15:37:57 +01:00
afs_cm.h	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
afs_fs.h	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
afs_vl.h	afs: Implement client support for the YFSVL.GetCellName RPC op	2020-06-04 15:37:57 +01:00
cache.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
callback.c	afs: Fix the by-UUID server tree to allow servers with the same UUID	2020-06-04 15:37:57 +01:00
cell.c	afs: Fix storage of cell names	2020-06-27 22:04:24 -07:00
cmservice.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
dir.c	treewide: Remove uninitialized_var() usage	2020-07-16 12:35:15 -07:00
dir_edit.c	afs: Remove set but not used variables 'before', 'after'	2019-11-21 20:36:00 +00:00
dir_silly.c	afs: Fix silly rename	2020-06-16 22:00:28 +01:00
dynroot.c	afs: Fix NULL deref in afs_dynroot_depopulate()	2020-08-21 10:56:40 -07:00
file.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
flock.c	afs: Remove erroneous fallthough annotation	2020-08-27 14:33:01 -05:00
fs_operation.c	afs: Fix key ref leak in afs_put_operation()	2020-08-20 10:41:45 -07:00
fs_probe.c	rxrpc: Make rxrpc_kernel_get_srtt() indicate validity	2020-08-20 18:21:28 +01:00
fsclient.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
inode.c	afs: Fix deadlock between writeback and truncate	2020-10-08 10:50:55 -07:00
internal.h	afs: Fix deadlock between writeback and truncate	2020-10-08 10:50:55 -07:00
main.c	afs: Fix hang on rmmod due to outstanding timer	2020-06-20 12:01:58 -07:00
misc.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
mntpt.c	afs: Fix mountpoint parsing	2019-12-11 16:56:54 +00:00
proc.c	afs: Don't use VL probe running state to make decisions outside probe code	2020-08-20 18:21:28 +01:00
protocol_uae.h	afs: Add support for the UAE error table	2019-06-28 18:37:53 +01:00
protocol_yfs.h	afs: Implement client support for the YFSVL.GetCellName RPC op	2020-06-04 15:37:57 +01:00
rotate.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
rxrpc.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
security.c	treewide: Remove uninitialized_var() usage	2020-07-16 12:35:15 -07:00
server.c	afs: Fix hang on rmmod due to outstanding timer	2020-06-20 12:01:58 -07:00
server_list.c	afs: Reorganise volume and server trees to be rooted on the cell	2020-06-04 15:37:57 +01:00
super.c	afs: Fix afs_statfs() to not let the values go below zero	2020-06-04 15:37:58 +01:00
vl_alias.c	afs: Fix debugging statements with %px to be %p	2020-06-09 18:17:14 +01:00
vl_list.c	afs: Don't use VL probe running state to make decisions outside probe code	2020-08-20 18:21:28 +01:00
vl_probe.c	afs: Don't use VL probe running state to make decisions outside probe code	2020-08-20 18:21:28 +01:00
vl_rotate.c	afs: Fix error handling in VL server rotation	2020-08-20 18:21:28 +01:00
vlclient.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
volume.c	afs: Reorganise volume and server trees to be rooted on the cell	2020-06-04 15:37:57 +01:00
write.c	afs: Fix deadlock between writeback and truncate	2020-10-08 10:50:55 -07:00
xattr.c	afs: Build an abstraction around an "operation" concept	2020-06-04 15:37:17 +01:00
xdr_fs.h	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36	2019-05-24 17:27:11 +02:00
yfsclient.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00