linux-sg2042/fs
Filipe Manana 75b463d2b4 btrfs: do not commit logs and transactions during link and rename operations
Since commit d4682ba03e ("Btrfs: sync log after logging new name") we
started to commit logs, and fallback to transaction commits when we failed
to log the new names or commit the logs, after link and rename operations
when the target inodes (or their parents) were previously logged in the
current transaction. This was to avoid losing directories despite an
explicit fsync on them when they are ancestors of some inode that got a
new named logged, due to a link or rename operation. However that adds the
cost of starting IO and waiting for it to complete, which can cause higher
latencies for applications.

Instead of doing that, just make sure that when we log a new name for an
inode we don't mark any of its ancestors as logged, so that if any one
does an fsync against any of them, without doing any other change on them,
the fsync commits the log. This way we only pay the cost of a log commit
(or a transaction commit if something goes wrong or a new block group was
created) if the application explicitly asks to fsync any of the parent
directories.

Using dbench, which mixes several filesystems operations including renames,
revealed some significant latency gains. The following script that uses
dbench was used to test this:

  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/btrfs
  MOUNT_OPTIONS="-o ssd -o space_cache=v2"
  MKFS_OPTIONS="-m single -d single"
  THREADS=16

  echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  dbench -t 300 -D $MNT $THREADS

  umount $MNT

The test was run on bare metal, no virtualization, on a box with 12 cores
(Intel i7-8700), 64Gb of RAM and using a NVMe device, with a kernel
configuration that is the default of typical distributions (debian in this
case), without debug options enabled (kasan, kmemleak, slub debug, debug
of page allocations, lock debugging, etc).

Results before this patch:

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    10750455     0.011   155.088
 Close         7896674     0.001     0.243
 Rename         455222     2.158  1101.947
 Unlink        2171189     0.067   121.638
 Deltree           256     2.425     7.816
 Mkdir             128     0.002     0.003
 Qpathinfo     9744323     0.006    21.370
 Qfileinfo     1707092     0.001     0.146
 Qfsinfo       1786756     0.001    11.228
 Sfileinfo      875612     0.003    21.263
 Find          3767281     0.025     9.617
 WriteX        5356924     0.011   211.390
 ReadX        16852694     0.003     9.442
 LockX           35008     0.002     0.119
 UnlockX         35008     0.001     0.138
 Flush          753458     4.252  1102.249

Throughput 1128.35 MB/sec  16 clients  16 procs  max_latency=1102.255 ms

Results after this patch:

16 clients, after

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    11471098     0.012   448.281
 Close         8426396     0.001     0.925
 Rename         485746     0.123   267.183
 Unlink        2316477     0.080    63.433
 Deltree           288     2.830    11.144
 Mkdir             144     0.003     0.010
 Qpathinfo    10397420     0.006    10.288
 Qfileinfo     1822039     0.001     0.169
 Qfsinfo       1906497     0.002    14.039
 Sfileinfo      934433     0.004     2.438
 Find          4019879     0.026    10.200
 WriteX        5718932     0.011   200.985
 ReadX        17981671     0.003    10.036
 LockX           37352     0.002     0.076
 UnlockX         37352     0.001     0.109
 Flush          804018     5.015   778.033

Throughput 1201.98 MB/sec  16 clients  16 procs  max_latency=778.036 ms
(+6.5% throughput, -29.4% max latency, -75.8% rename latency)

Test case generic/498 from fstests tests the scenario that the previously
mentioned commit fixed.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:06:56 +02:00
..
9p treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
adfs treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
affs affs: fix basic permission bits to actually work 2020-08-31 12:20:31 +02:00
afs Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-03 18:50:48 -07:00
autofs autofs: use __kernel_write() for the autofs pipe writing 2020-09-29 17:18:34 -07:00
befs block: move struct block_device to blk_types.h 2020-06-24 09:16:02 -06:00
bfs
btrfs btrfs: do not commit logs and transactions during link and rename operations 2020-10-07 12:06:56 +02:00
cachefiles cachefiles: switch to kernel_write 2020-07-08 08:27:56 +02:00
ceph We have an inode number handling change, prompted by s390x which is 2020-08-28 10:33:04 -07:00
cifs cifs: fix DFS mount with cifsacl/modefromsid 2020-09-06 23:59:53 -05:00
coda
configfs treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
cramfs
crypto mm, treewide: rename kzfree() to kfree_sensitive() 2020-08-07 11:33:22 -07:00
debugfs debugfs: Fix module state check condition 2020-09-04 18:12:52 +02:00
devpts
dlm treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
ecryptfs mm, treewide: rename kzfree() to kfree_sensitive() 2020-08-07 11:33:22 -07:00
efivarfs efi/efivars: Expose RT service availability via efivars abstraction 2020-07-09 10:14:29 +03:00
efs block: move struct block_device to blk_types.h 2020-06-24 09:16:02 -06:00
erofs treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
exfat exfat: retain 'VolumeFlags' properly 2020-08-12 08:31:13 +09:00
exportfs
ext2 ext2: don't update mtime on COW faults 2020-09-05 10:00:05 -07:00
ext4 \n 2020-08-28 10:57:14 -07:00
f2fs f2fs: Return EOF on unaligned end of file DIO read 2020-09-08 20:31:33 -07:00
fat fat: fix fat_ra_init() for data clusters == 0 2020-08-12 10:58:01 -07:00
freevxfs
fscache Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-06-03 16:27:18 -07:00
fuse fuse: fix the ->direct_IO() treatment of iov_iter 2020-09-17 17:26:56 -04:00
gfs2 Fix memory leak on filesystem withdraw 2020-08-28 10:41:00 -07:00
hfs block: move block-related definitions out of fs.h 2020-06-24 09:16:02 -06:00
hfsplus treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
hostfs
hpfs hpfs: fix warning due to superfluous semicolon 2020-06-06 10:08:17 -07:00
hugetlbfs hugetlbfs: prevent filesystem stacking of hugetlbfs 2020-08-12 10:57:56 -07:00
iomap treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
isofs Remove uninitialized_var() macro for v5.9-rc1 2020-08-04 13:49:43 -07:00
jbd2 jbd2: clean up checksum verification in do_one_pass() 2020-08-19 12:04:35 -04:00
jffs2 treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
jfs block: move struct block_device to blk_types.h 2020-06-24 09:16:02 -06:00
kernfs fsnotify: pass dir and inode arguments to fsnotify() 2020-07-27 23:15:48 +02:00
lockd
minix fs/minix: remove expected error message in block_to_path() 2020-08-12 10:58:00 -07:00
nfs pNFS/flexfiles: Be consistent about mirror index types 2020-09-18 09:25:33 -04:00
nfs_common treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
nfsd Fixes: 2020-08-25 18:01:36 -07:00
nilfs2 treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
nls treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
notify treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
ntfs ntfs: fix ntfs_test_inode and ntfs_init_locked_inode function type 2020-08-07 11:33:21 -07:00
ocfs2 treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
omfs treewide: Remove uninitialized_var() usage 2020-07-16 12:35:15 -07:00
openpromfs
orangefs orangefs: remove unnecessary assignment to variable ret 2020-08-04 15:01:58 -04:00
overlayfs Remove uninitialized_var() macro for v5.9-rc1 2020-08-04 13:49:43 -07:00
proc mm, oom: make the calculation of oom badness more accurate 2020-08-12 10:57:56 -07:00
pstore treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
qnx4
qnx6 fs: convert mpage_readpages to mpage_readahead 2020-06-02 10:59:07 -07:00
quota treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
ramfs
reiserfs \n 2020-08-06 19:28:26 -07:00
romfs romfs: fix uninitialized memory leak in romfs_dev_read() 2020-08-21 09:52:53 -07:00
squashfs squashfs: avoid bio_alloc() failure with 1Mbyte blocks 2020-08-21 09:52:53 -07:00
sysfs RDMA 5.8 merge window pull request 2020-06-05 14:05:57 -07:00
sysv
tracefs
ubifs treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
udf treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
ufs treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
unicode
vboxsf Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-09-22 15:08:41 -07:00
verity fs-verity: use smp_load_acquire() for ->i_verity_info 2020-07-21 16:02:41 -07:00
xfs Fixes (2) for 5.9: 2020-09-05 10:04:53 -07:00
zonefs zonefs: add zone-capacity support 2020-08-11 17:42:24 +09:00
Kconfig tmpfs: support 64-bit inums per-sb 2020-08-07 11:33:24 -07:00
Kconfig.binfmt treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
Makefile init: add an init_mount helper 2020-07-31 08:17:51 +02:00
aio.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
anon_inodes.c
attr.c
bad_inode.c fs: move the fiemap definitions out of fs.h 2020-06-03 23:16:55 -04:00
binfmt_aout.c
binfmt_elf.c kill elf_fpxregs_t 2020-07-27 14:29:23 -04:00
binfmt_elf_fdpic.c Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
binfmt_em86.c Merge branch 'akpm' (patches from Andrew) 2020-06-04 19:18:29 -07:00
binfmt_flat.c binfmt_flat: revert "binfmt_flat: don't offset the data start" 2020-08-24 08:49:13 +10:00
binfmt_misc.c Merge branch 'akpm' (patches from Andrew) 2020-06-04 19:18:29 -07:00
binfmt_script.c Merge branch 'akpm' (patches from Andrew) 2020-06-04 19:18:29 -07:00
block_dev.c for-5.9/io_uring-20200802 2020-08-03 13:01:22 -07:00
buffer.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
char_dev.c
compat.c
compat_binfmt_elf.c Split the old READ_IMPLIES_EXEC workaround from executable PT_GNU_STACK 2020-06-05 13:45:21 -07:00
coredump.c coredump: add %f for executable filename 2020-08-12 10:58:01 -07:00
d_path.c
dax.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
dcache.c vfs: Use sequence counter with associated spinlock 2020-07-29 16:14:27 +02:00
dcookies.c
direct-io.c block: remove the bd_queue field from struct block_device 2020-07-01 08:08:20 -06:00
drop_caches.c
eventfd.c
eventpoll.c ep_create_wakeup_source(): dentry name can change under you... 2020-09-24 19:41:58 -04:00
exec.c mm/gup: remove task_struct pointer for all gup code 2020-08-12 10:58:04 -07:00
fcntl.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
fhandle.c
file.c Merge branch 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 09:40:34 -07:00
file_table.c Revert "fs: Do not check if there is a fsnotify watcher on pseudo inodes" 2020-06-29 09:40:55 -07:00
filesystems.c
fs-writeback.c fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype 2020-09-19 13:13:39 -07:00
fs_context.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
fs_parser.c
fs_pin.c
fs_struct.c vfs: Use sequence counter with associated spinlock 2020-07-29 16:14:27 +02:00
fs_types.c
fsopen.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
init.c init: add an init_dup helper 2020-08-04 21:02:38 -04:00
inode.c AFS Changes 2020-06-05 16:26:36 -07:00
internal.h Merge branch 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 09:40:34 -07:00
io-wq.c io-wq: fix hang after cancelling pending hashed work 2020-08-23 11:38:50 -06:00
io-wq.h io_uring/io-wq: move RLIMIT_FSIZE to io-wq 2020-07-24 13:00:44 -06:00
io_uring.c io_uring-5.9-2020-10-02 2020-10-02 14:38:10 -07:00
ioctl.c fs: remove ksys_ioctl 2020-07-31 08:16:01 +02:00
libfs.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
locks.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
mbcache.c
mount.h
mpage.c fs: convert mpage_readpages to mpage_readahead 2020-06-02 10:59:07 -07:00
namei.c exec: restore EACCES of S_ISDIR execve() 2020-08-14 19:56:56 -07:00
namespace.c Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 21:03:25 -07:00
no-block.c
nsfs.c
open.c exec: move S_ISREG() check earlier 2020-08-12 10:58:01 -07:00
pipe.c pipe: remove pipe_wait() and fix wakeup race with splice 2020-10-01 19:14:36 -07:00
pnode.c
pnode.h
posix_acl.c vfs: clean up posix_acl_permission() logic aroudn MAY_NOT_BLOCK 2020-06-08 11:04:19 -07:00
proc_namespace.c Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-06-04 13:54:34 -07:00
read_write.c autofs: use __kernel_write() for the autofs pipe writing 2020-09-29 17:18:34 -07:00
readdir.c fs: remove ksys_getdents64 2020-07-31 08:16:00 +02:00
select.c pselect6() and friends: take handling the combined 6th/7th args into helper 2020-05-29 19:10:42 -04:00
seq_file.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
signalfd.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
splice.c pipe: remove pipe_wait() and fix wakeup race with splice 2020-10-01 19:14:36 -07:00
stack.c
stat.c New code for 5.8: 2020-06-02 19:45:12 -07:00
statfs.c
super.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-06-10 16:09:11 -07:00
sync.c overlayfs update for 5.8 2020-06-09 15:40:50 -07:00
timerfd.c
userfaultfd.c A set of locking fixes and updates: 2020-08-10 19:07:44 -07:00
utimes.c fs: expose utimes_common 2020-07-31 08:16:01 +02:00
xattr.c xattr: add a function to check if a namespace is supported 2020-07-13 17:27:03 -04:00