OpenCloudOS-Kernel/fs
Dave Chinner 19f4e7cc81 xfs: Fix CIL throttle hang when CIL space used going backwards
A hang with tasks stuck on the CIL hard throttle was reported and
largely diagnosed by Donald Buczek, who discovered that it was a
result of the CIL context space usage decrementing in committed
transactions once the hard throttle limit had been hit and processes
were already blocked.  This resulted in the CIL push not waking up
those waiters because the CIL context was no longer over the hard
throttle limit.

The surprising aspect of this was the CIL space usage going
backwards regularly enough to trigger this situation. Assumptions
had been made in design that the relogging process would only
increase the size of the objects in the CIL, and so that space would
only increase.

This change and commit message fixes the issue and documents the
result of an audit of the triggers that can cause the CIL space to
go backwards, how large the backwards steps tend to be, the
frequency in which they occur, and what the impact on the CIL
accounting code is.

Even though the CIL ctx->space_used can go backwards, it will only
do so if the log item is already logged to the CIL and contains a
space reservation for it's entire logged state. This is tracked by
the shadow buffer state on the log item. If the item is not
previously logged in the CIL it has no shadow buffer nor log vector,
and hence the entire size of the logged item copied to the log
vector is accounted to the CIL space usage. i.e.  it will always go
up in this case.

If the item has a log vector (i.e. already in the CIL) and the size
decreases, then the existing log vector will be overwritten and the
space usage will go down. This is the only condition where the space
usage reduces, and it can only occur when an item is already tracked
in the CIL. Hence we are safe from CIL space usage underruns as a
result of log items decreasing in size when they are relogged.

Typically this reduction in CIL usage occurs from metadata blocks
being free, such as when a btree block merge occurs or a directory
enter/xattr entry is removed and the da-tree is reduced in size.
This generally results in a reduction in size of around a single
block in the CIL, but also tends to increase the number of log
vectors because the parent and sibling nodes in the tree needs to be
updated when a btree block is removed. If a multi-level merge
occurs, then we see reduction in size of 2+ blocks, but again the
log vector count goes up.

The other vector is inode fork size changes, which only log the
current size of the fork and ignore the previously logged size when
the fork is relogged. Hence if we are removing items from the inode
fork (dir/xattr removal in shortform, extent record removal in
extent form, etc) the relogged size of the inode for can decrease.

No other log items can decrease in size either because they are a
fixed size (e.g. dquots) or they cannot be relogged (e.g. relogging
an intent actually creates a new intent log item and doesn't relog
the old item at all.) Hence the only two vectors for CIL context
size reduction are relogging inode forks and marking buffers active
in the CIL as stale.

Long story short: the majority of the code does the right thing and
handles the reduction in log item size correctly, and only the CIL
hard throttle implementation is problematic and needs fixing. This
patch makes that fix, as well as adds comments in the log item code
that result in items shrinking in size when they are relogged as a
clear reminder that this can and does happen frequently.

The throttle fix is based upon the change Donald proposed, though it
goes further to ensure that once the throttle is activated, it
captures all tasks until the CIL push issues a wakeup, regardless of
whether the CIL space used has gone back under the throttle
threshold.

This ensures that we prevent tasks reducing the CIL slightly under
the throttle threshold and then making more changes that push it
well over the throttle limit. This is acheived by checking if the
throttle wait queue is already active as a condition of throttling.
Hence once we start throttling, we continue to apply the throttle
until the CIL context push wakes everything on the wait queue.

We can use waitqueue_active() for the waitqueue manipulations and
checks as they are all done under the ctx->xc_push_lock. Hence the
waitqueue has external serialisation and we can safely peek inside
the wait queue without holding the internal waitqueue locks.

Many thanks to Donald for his diagnostic and analysis work to
isolate the cause of this hang.

Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-06-21 10:06:14 -07:00
..
9p 9p for 5.13-rc1 2021-05-07 11:18:52 -07:00
adfs fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
affs idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
afs afs: Fix the nlink handling of dir-over-dir rename 2021-05-27 06:23:58 -10:00
autofs autofs: should_expire() argument is guaranteed to be positive 2021-03-24 14:14:27 -04:00
befs fs/befs: Delete obsolete TODO file 2021-03-30 16:54:49 -07:00
bfs fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
btrfs for-5.13-rc2-tag 2021-05-21 13:24:12 -10:00
cachefiles fscache, cachefiles: Add alternate API to use kiocb for read/write to cache 2021-04-23 10:14:32 +01:00
ceph Notable items here are a series to take advantage of David Howells' 2021-05-06 10:27:02 -07:00
cifs cifs: change format of CIFS_FULL_KEY_DUMP ioctl 2021-05-27 15:26:32 -05:00
coda coda: fix reference counting in coda_file_mmap error path 2021-04-23 14:42:39 -07:00
configfs treewide: remove editor modelines and cruft 2021-05-07 00:26:34 -07:00
cramfs cramfs: use %pD instead of messing with file_dentry()->d_name 2021-01-05 23:02:47 -05:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2021-04-26 08:51:23 -07:00
debugfs debugfs: fix security_locked_down() call for SELinux 2021-05-18 18:05:59 +02:00
devpts
dlm fs: dlm: fix missing unlock on error in accept_from_sock() 2021-03-29 13:28:18 -05:00
ecryptfs fs: ecryptfs: remove BUG_ON from crypt_scatterlist 2021-05-13 18:32:26 +02:00
efivarfs efivars: convert to fileattr 2021-04-12 15:04:29 +02:00
efs [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
erofs erofs: fix 1 lcluster-sized pcluster for big pcluster 2021-05-13 15:58:46 +08:00
exfat exfat: speed up iterate/lookup by fixing start point of traversing cluster chain 2021-04-27 20:45:07 +09:00
exportfs exportfs: Add a function to return the raw output from fh_to_dentry() 2020-12-09 09:39:38 -05:00
ext2 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-02 09:14:01 -07:00
ext4 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-02 09:14:01 -07:00
f2fs f2fs: return EINVAL for hole cases in swap file 2021-05-12 07:38:00 -07:00
fat fs: fat: fix spelling typo of values 2021-05-07 00:26:34 -07:00
freevxfs
fscache fscache, cachefiles: Add alternate API to use kiocb for read/write to cache 2021-04-23 10:14:32 +01:00
fuse Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-02 09:14:01 -07:00
gfs2 mm: introduce and use mapping_empty() 2021-05-05 11:27:19 -07:00
hfs fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
hfsplus hfsplus: prevent corruption in shrinking truncate 2021-05-14 19:41:32 -07:00
hostfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-02 09:14:01 -07:00
hpfs hpfs: replace one-element array with flexible-array member 2021-05-06 19:24:13 -07:00
hugetlbfs userfaultfd: hugetlbfs: fix new flag usage in error path 2021-05-22 15:09:07 -10:00
iomap mm/filemap: fix readahead return types 2021-05-14 19:41:32 -07:00
isofs isofs: fix fall-through warnings for Clang 2021-05-06 19:24:13 -07:00
jbd2 ext4: fix debug format string warning 2021-04-09 23:32:16 -04:00
jffs2 This pull request contains changes for JFFS2, UBI and UBIFS 2021-05-04 18:08:40 -07:00
jfs jfs: convert to fileattr 2021-04-12 15:04:29 +02:00
kernfs idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
lockd SUNRPC: Make trace_svc_process() display the RPC procedure symbolically 2021-01-25 09:36:23 -05:00
minix fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
netfs netfs: Make CONFIG_NETFS_SUPPORT auto-selected rather than manual 2021-05-25 13:48:04 +01:00
nfs nfs: Remove trailing semicolon in macros 2021-05-27 09:19:33 -04:00
nfs_common NFSD: Add an xdr_stream-based encoder for NFSv2/3 ACLs 2021-03-22 10:19:00 -04:00
nfsd NFS client updates for Linux 5.13 2021-05-07 11:23:41 -07:00
nilfs2 Merge branch 'akpm' (patches from Andrew) 2021-05-07 00:34:51 -07:00
nls
notify fanotify_user: use upper_32_bits() to verify mask 2021-03-25 15:33:45 +01:00
ntfs ntfs: check for valid standard information attribute 2021-02-24 13:38:26 -08:00
ocfs2 treewide: remove editor modelines and cruft 2021-05-07 00:26:34 -07:00
omfs fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
openpromfs openpromfs: don't do unlock_new_inode() until the new inode is set up 2021-03-12 22:15:22 -05:00
orangefs orangefs: leave files in the page cache for a few micro seconds at least 2021-04-29 08:06:05 -04:00
overlayfs overlayfs update for 5.13 2021-04-30 15:17:08 -07:00
proc proc: Check /proc/$pid/attr/ writes against file opener 2021-05-25 10:24:41 -10:00
pstore printk changes for 5.13 2021-04-27 18:09:44 -07:00
qnx4 [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
qnx6 [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
quota quota: Use 'hlist_for_each_entry' to simplify code 2021-05-10 16:27:49 +02:00
ramfs ramfs: support O_TMPFILE 2021-02-24 13:38:26 -08:00
reiserfs treewide: remove editor modelines and cruft 2021-05-07 00:26:34 -07:00
romfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
squashfs squashfs: fix divide error in calculate_skip() 2021-05-14 19:41:32 -07:00
sysfs sysfs: Support zapping of binary attr mmaps 2021-01-12 14:26:31 +01:00
sysv fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
tracefs tracing: Fix various typos in comments 2021-03-23 14:08:18 -04:00
ubifs This pull request contains changes for JFFS2, UBI and UBIFS 2021-05-04 18:08:40 -07:00
udf useful constants: struct qstr for ".." 2021-04-15 22:36:45 -04:00
ufs useful constants: struct qstr for ".." 2021-04-15 22:36:45 -04:00
unicode .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
vboxsf vboxsf: don't allow to change the inode type 2021-03-12 22:15:00 -05:00
verity fsverity: relax build time dependency on CRYPTO_SHA256 2021-04-22 17:31:32 +10:00
xfs xfs: Fix CIL throttle hang when CIL space used going backwards 2021-06-21 10:06:14 -07:00
zonefs \n 2021-04-29 11:06:13 -07:00
Kconfig NFS client updates for Linux 5.13 2021-05-07 11:23:41 -07:00
Kconfig.binfmt binfmt_flat: allow not offsetting data start 2021-04-19 09:56:37 +10:00
Makefile netfs: Provide readahead and readpage netfs helpers 2021-04-23 10:14:32 +01:00
aio.c Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio" 2021-04-30 11:20:39 -07:00
anon_inodes.c fs: anon_inodes: rephrase to appropriate kernel-doc 2021-01-15 12:17:25 -05:00
attr.c ima: handle idmapped mounts 2021-01-24 14:27:20 +01:00
bad_inode.c fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
binfmt_aout.c
binfmt_elf.c coredump: don't bother with do_truncate() 2021-03-08 10:21:11 -05:00
binfmt_elf_fdpic.c coredump: don't bother with do_truncate() 2021-03-08 10:21:11 -05:00
binfmt_em86.c
binfmt_flat.c binfmt_flat: allow not offsetting data start 2021-04-19 09:56:37 +10:00
binfmt_misc.c binfmt_misc: fix possible deadlock in bm_register_write 2021-03-13 11:27:30 -08:00
binfmt_script.c
block_dev.c block-5.13-2021-05-22 2021-05-22 07:40:34 -10:00
buffer.c Merge branch 'akpm' (patches from Andrew) 2021-05-05 13:50:15 -07:00
char_dev.c
compat_binfmt_elf.c get rid of COMPAT_ELF_EXEC_PAGESIZE 2021-01-06 08:42:51 -05:00
coredump.c coredump: don't bother with do_truncate() 2021-03-08 10:21:11 -05:00
d_path.c constify dentry argument of dentry_path()/dentry_path_raw() 2021-03-21 11:43:58 -04:00
dax.c dax fixes for 5.13-rc2 2021-05-15 08:28:08 -07:00
dcache.c useful constants: struct qstr for ".." 2021-04-15 22:36:45 -04:00
direct-io.c fs: direct-io: fix missing sdio->boundary 2021-04-09 14:54:23 -07:00
drop_caches.c
eventfd.c eventfd: Export eventfd_ctx_do_read() 2020-11-15 09:49:10 -05:00
eventpoll.c fs/epoll: restore waking from ep_done_scan() 2021-05-06 19:24:13 -07:00
exec.c fs: delete repeated words in comments 2021-02-24 13:38:26 -08:00
fcntl.c idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
fhandle.c fs: delete repeated words in comments 2021-02-24 13:38:26 -08:00
file.c Merge branch 'work.file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-03 11:05:28 -07:00
file_table.c epoll: take epitem list out of struct file 2020-10-25 20:02:08 -04:00
filesystems.c
fs-writeback.c fs: improve comments for writeback_single_inode() 2021-01-13 17:26:50 +01:00
fs_context.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
fs_parser.c vfs: fs_parser: clean up kernel-doc warnings 2021-04-30 11:20:35 -07:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
init.c init: handle idmapped mounts 2021-01-24 14:27:19 +01:00
inode.c mm: remove nrexceptional from inode: remove BUG_ON 2021-05-05 11:27:20 -07:00
internal.h idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
io-wq.c io-wq: Fix UAF when wakeup wqe in hash waitqueue 2021-05-26 09:03:56 -06:00
io-wq.h io_uring/io-wq: close io-wq full-stop gap 2021-05-25 19:39:58 -06:00
io_uring.c io_uring: fix data race to avoid potential NULL-deref 2021-05-27 07:44:49 -06:00
ioctl.c vfs: add fileattr ops 2021-04-12 15:04:23 +02:00
kernel_read_file.c fs/kernel_file_read: Add "offset" arg for partial reads 2020-10-05 13:37:04 +02:00
libfs.c libfs: fix kernel-doc for mnt_userns 2021-03-23 11:20:25 +01:00
locks.c Additional fixes and clean-ups for NFSD since tags/nfsd-5.13, 2021-05-05 13:44:19 -07:00
mbcache.c
mount.h mount: make {lock,unlock}_mount_hash() static 2021-01-24 14:29:34 +01:00
mpage.c block: rename BIO_MAX_PAGES to BIO_MAX_VECS 2021-03-11 07:47:48 -07:00
namei.c fs.idmapped.helpers.v5.13 2021-04-27 12:49:42 -07:00
namespace.c fs/mount_setattr: tighten permission checks 2021-05-12 14:13:16 +02:00
no-block.c
nsfs.c
open.c idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
pipe.c fs: delete repeated words in comments 2021-02-24 13:38:26 -08:00
pnode.c
pnode.h mount: fix mounting of detached mounts onto targets that reside on shared mounts 2021-03-08 15:18:43 +01:00
posix_acl.c fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
proc_namespace.c fs: introduce MOUNT_ATTR_IDMAP 2021-01-24 14:43:45 +01:00
read_write.c teach sendfile(2) to handle send-to-pipe directly 2021-01-25 23:29:36 -05:00
readdir.c readdir: make sure to verify directory entry for legacy interfaces too 2021-04-17 11:39:49 -07:00
remap_range.c ioctl: handle idmapped mounts 2021-01-24 14:27:19 +01:00
select.c kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data() 2021-03-16 22:13:10 +01:00
seq_file.c seq_file: Add a seq_bprintf function 2021-04-27 15:50:15 -07:00
signalfd.c signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo 2021-05-18 16:20:54 -05:00
splice.c for-5.12/block-2021-02-17 2021-02-21 11:02:48 -08:00
stack.c
stat.c fs: fix reporting supported extra file attributes for statx() 2021-04-17 23:03:50 -04:00
statfs.c s390,alpha: switch to 64-bit ino_t 2021-02-13 17:17:53 +01:00
super.c fs,security: Add sb_delete hook 2021-04-22 12:22:11 -07:00
sync.c
timerfd.c
userfaultfd.c userfaultfd: add UFFDIO_CONTINUE ioctl 2021-05-05 11:27:22 -07:00
utimes.c utimes: handle idmapped mounts 2021-01-24 14:27:18 +01:00
xattr.c xattr: fix kernel-doc for mnt_userns and vfs xattr helpers 2021-03-23 11:20:26 +01:00