OpenCloudOS-Kernel

Filipe Manana 73e339e6ab btrfs: cache sharedness of the last few data extents during fiemap

During fiemap we process all the file extent items of an inode, by their
file offset order (left to right b+tree order), and then check if the data
extent they point at is shared or not. Until now we didn't cache those
results, we only did it for b+tree nodes/leaves since for each unique b+tree
path we have access to hundreds of file extent items. However, it is also
common to repeat checking the sharedness of a particular data extent in a
very short time window, and the cases that lead to that are the following:

1) COW writes.

   If we have a file extent item like this:

                  [ bytenr X, offset = 0, num_bytes = 512K ]
   file offset    0                                        512K

   Then a 4K write into file offset 64K happens, we end up with the
   following file extent item layout:

                  [ bytenr X, offset = 0, num_bytes = 64K ]
   file offset    0                                       64K

                  [ bytenr Y, offset = 0, num_bytes = 4K ]
   file offset    64K                                    68K

                  [ bytenr X, offset = 68K, num_bytes = 444K ]
   file offset    68K                                        512K

   So during fiemap we will check for the sharedness of the data extent
   with bytenr X twice. Typically for COW writes and for at least
   moderately updated files, we end up with many file extent items that
   point to different sections of the same data extent.

2) Writing into a NOCOW file after a snapshot is taken.

   This happens if the target extent was created in a generation older
   than the generation where the last snapshot for the root (the tree the
   inode belongs to) was made.

   This leads to a scenario like the previous one.

3) Writing into sections of a preallocated extent.

   For example if a file has the following layout:

   [ bytenr X, offset = 0, num_bytes = 1M, type = prealloc ]
   0                                                       1M

   After doing a 4K write into file offset 0 and another 4K write into
   offset 512K, we get the following layout:

   [ bytenr X, offset = 0, num_bytes = 4K, type = regular ]
   0                                                      4K

   [ bytenr X, offset = 4K, num_bytes = 508K, type = prealloc ]
   4K                                                         512K

   [ bytenr X, offset = 512K, num_bytes = 4K, type = regular ]
   512K                                                      516K

   [ bytenr X, offset = 516K, num_bytes = 508K, type = prealloc ]
   516K                                                         1M

   So we end up with 4 consecutive file extent items pointing to the data
   extent at bytenr X.

4) Hole punching in the middle of an extent.

   For example if a file has the following file extent item:

   [ bytenr X, offset = 0, num_bytes = 8M ]
   0                                      8M

   And then a hole is punched for the file range [4M, 6M[, we get our file
   extent item split into two:

   [ bytenr X, offset = 0, num_bytes = 4M ]
   0                                      4M

   [ 2M hole, implicit or explicit depending on NO_HOLES feature ]
   4M                                                            6M

   [ bytenr X, offset = 6M, num_bytes = 2M ]
   6M                                       8M

   Again, we end up with two file extent items pointing to the same
   data extent.

5) When reflinking (clone and deduplication) within the same file.
   This is probably the least common case of all.

In cases 1, 2, 3 and 4, when we have multiple file extent items that point
to the same data extent, their distance is usually short, typically
separated by a few slots in a b+tree leaf (or across sibling leaves). For
case 5, the distance can vary a lot, but it's typically the less common
case.

This change caches the result of the sharedness checks for data extents,
but only for the last 8 extents that we notice that our inode refers to
with multiple file extent items. Whenever we want to check if a data extent
is shared, we look up the cache, which consists of doing a linear scan of an
8 element array, and if we find the data extent there, we return the
result and don't check the extent tree and delayed refs.

The array/cache is small so that doing the search has no noticeable
negative impact on the performance in case we don't have file extent items
within a distance of 8 slots that point to the same data extent.

Slots in the cache/array are overwritten in a simple round robin fashion,
as that approach fits very well.

Using this simple approach with only the last 8 data extents seen is
effective as usually when multiple file extent items point to the same
data extent, their distance is within 8 slots. It also uses very little
memory and the time to cache a result or look up the cache is negligible.

The following test was run on a non-debug kernel (Debian's default kernel
config) to measure the impact in the case of COW writes (first example
given above), where we run fiemap after overwriting 33% of the blocks of
a file:

   $ cat test.sh
   #!/bin/bash

   DEV=/dev/sdi
   MNT=/mnt/sdi

   umount $DEV &> /dev/null
   mkfs.btrfs -f $DEV
   mount $DEV $MNT

   FILE_SIZE=$((1 * 1024 * 1024 * 1024))

   # Create the file full of 1M extents.
   xfs_io -f -s -c "pwrite -b 1M -S 0xab 0 $FILE_SIZE" $MNT/foobar

   block_count=$((FILE_SIZE / 4096))
   # Overwrite about 33% of the file blocks.
   overwrite_count=$((block_count / 3))

   echo -e "\nOverwriting $overwrite_count 4K blocks (out of $block_count)..."
   RANDOM=123
   for ((i = 1; i <= $overwrite_count; i++)); do
       off=$(((RANDOM % block_count) * 4096))
       xfs_io -c "pwrite -S 0xcd $off 4K" $MNT/foobar > /dev/null
       echo -ne "\r$i blocks overwritten..."
   done
   echo -e "\n"

   # Unmount and mount to clear all cached metadata.
   umount $MNT
   mount $DEV $MNT

   start=$(date +%s%N)
   filefrag $MNT/foobar
   end=$(date +%s%N)
   dur=$(( (end - start) / 1000000 ))
   echo "fiemap took $dur milliseconds"

   umount $MNT

Result before applying this patch:

   fiemap took 128 milliseconds

Result after applying this patch:

   fiemap took 92 milliseconds   (-28.1%)

The test is somewhat limited in the sense the gains may be higher in
practice, because in the test the filesystem is small, so we have small
fs and extent trees, plus there's no concurrent access to the trees as
well, therefore no lock contention there.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2022-12-05 18:00:39 +01:00
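The caching scheme described in the commit message above (an 8 slot array, linear scan on lookup, round robin overwrite) can be illustrated with a small userspace sketch. This is not the code from backref.c: the names `share_cache`, `prev_extent`, `cache_lookup`, `cache_store`, `extent_is_shared` and `slow_check` are hypothetical, and `slow_check` is only a stand-in for the expensive extent tree and delayed refs lookup. As a simplification, the sketch caches every extent it checks, whereas the commit message says only extents referenced by multiple file extent items of the inode are cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cache size stated in the commit message. */
#define PREV_EXTENTS_SIZE 8

/* One cached sharedness result (hypothetical layout). */
struct prev_extent {
	uint64_t bytenr;   /* logical address of the data extent */
	bool is_shared;    /* cached result of the sharedness check */
	bool valid;        /* false until the slot has been filled */
};

struct share_cache {
	struct prev_extent slots[PREV_EXTENTS_SIZE];
	int next_slot;     /* slot to overwrite next (round robin) */
};

/* Linear scan of the array; returns true on a hit. */
static bool cache_lookup(const struct share_cache *c, uint64_t bytenr,
			 bool *is_shared)
{
	for (int i = 0; i < PREV_EXTENTS_SIZE; i++) {
		if (c->slots[i].valid && c->slots[i].bytenr == bytenr) {
			*is_shared = c->slots[i].is_shared;
			return true;
		}
	}
	return false;
}

/* Store a result, overwriting slots in round robin order. */
static void cache_store(struct share_cache *c, uint64_t bytenr, bool is_shared)
{
	assert(c->next_slot >= 0 && c->next_slot < PREV_EXTENTS_SIZE);
	c->slots[c->next_slot] = (struct prev_extent){
		.bytenr = bytenr, .is_shared = is_shared, .valid = true,
	};
	c->next_slot = (c->next_slot + 1) % PREV_EXTENTS_SIZE;
}

/* Counts how often the expensive path runs (illustration only). */
static int slow_checks;

/* Hypothetical stand-in for the extent tree + delayed refs check. */
static bool slow_check(uint64_t bytenr)
{
	slow_checks++;
	return bytenr % 2 == 0; /* arbitrary rule, for the demo only */
}

/* Check the cache first; fall back to the slow path on a miss. */
static bool extent_is_shared(struct share_cache *c, uint64_t bytenr)
{
	bool shared;

	if (cache_lookup(c, bytenr, &shared))
		return shared;
	shared = slow_check(bytenr);
	cache_store(c, bytenr, shared);
	return shared;
}
```

With a zero-initialized `struct share_cache`, calling `extent_is_shared()` for the bytenr sequence X, Y, X from the COW example runs the slow path only twice, since the second check of X is answered from the cache. Once 8 other extents have been stored, the entry for X has been overwritten and the next check of X takes the slow path again.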
..
tests	for-6.1-rc4-tag	2022-11-10 08:58:29 -08:00
Kconfig	btrfs: use generic Kconfig option for 256kB page size limit	2022-01-20 08:52:55 +02:00
Makefile	btrfs: move extent state init and alloc functions to their own file	2022-09-26 12:28:03 +02:00
acl.c	btrfs: reserve correct number of items for inode creation	2022-05-16 17:03:08 +02:00
async-thread.c	btrfs: simplify WQ_HIGHPRI handling in struct btrfs_workqueue	2022-05-16 17:03:15 +02:00
async-thread.h	btrfs: remove unused typedefs get_extent_t and btrfs_work_func_t	2022-07-25 17:45:36 +02:00
backref.c	btrfs: cache sharedness of the last few data extents during fiemap	2022-12-05 18:00:39 +01:00
backref.h	btrfs: cache sharedness of the last few data extents during fiemap	2022-12-05 18:00:39 +01:00
block-group.c	btrfs: move btrfs_should_fragment_free_space into block-group.c	2022-12-05 18:00:37 +01:00
block-group.h	btrfs: move btrfs_should_fragment_free_space into block-group.c	2022-12-05 18:00:37 +01:00
block-rsv.c	btrfs: introduce BTRFS_RESERVE_FLUSH_EMERGENCY	2022-12-05 18:00:38 +01:00
block-rsv.h	btrfs: add KCSAN annotations for unlocked access to block_rsv->full	2022-09-26 12:28:02 +02:00
btrfs_inode.h	btrfs: move btrfs_print_data_csum_error into inode.c	2022-12-05 18:00:37 +01:00
check-integrity.c	fs/btrfs: Use the enum req_op and blk_opf_t types	2022-07-14 12:14:32 -06:00
check-integrity.h	btrfs: check-integrity: split submit_bio from btrfsic checking	2022-05-16 17:03:12 +02:00
compression.c	fs: fix leaked psi pressure state	2022-11-08 15:57:25 -08:00
compression.h	for-5.20-tag	2022-08-03 14:54:52 -07:00
ctree.c	btrfs: move btrfs_next_old_item into ctree.c	2022-12-05 18:00:37 +01:00
ctree.h	btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h	2022-12-05 18:00:37 +01:00
delalloc-space.c	btrfs: add the ability to use NO_FLUSH for data reservations	2022-09-29 17:08:28 +02:00
delalloc-space.h	btrfs: add the ability to use NO_FLUSH for data reservations	2022-09-29 17:08:28 +02:00
delayed-inode.c	btrfs: move flush related definitions to space-info.h	2022-12-05 18:00:37 +01:00
delayed-inode.h	btrfs: use delayed items when logging a directory	2022-09-26 12:27:57 +02:00
delayed-ref.c	btrfs: switch btrfs_block_rsv::full to bool	2022-07-25 17:45:40 +02:00
delayed-ref.h	btrfs: remove btrfs_delayed_extent_op::is_data	2022-05-16 17:17:31 +02:00
dev-replace.c	btrfs: don't take a bio_counter reference for cloned bios	2022-09-26 12:27:58 +02:00
dev-replace.h	btrfs: add struct declarations in dev-replace.h	2022-09-26 12:28:07 +02:00
dir-item.c	btrfs: use btrfs_for_each_slot in btrfs_search_dir_index_item	2022-05-16 17:03:07 +02:00
discard.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
discard.h	btrfs: cleanup btrfs_discard_update_discardable usage	2020-12-08 15:54:02 +01:00
disk-io.c	btrfs: move btrfs_get_block_group helper out of disk-io.h	2022-12-05 18:00:36 +01:00
disk-io.h	btrfs: move btrfs_get_block_group helper out of disk-io.h	2022-12-05 18:00:36 +01:00
export.c	btrfs: fix type of parameter generation in btrfs_get_dentry	2022-10-24 15:28:58 +02:00
export.h	btrfs: fix type of parameter generation in btrfs_get_dentry	2022-10-24 15:28:58 +02:00
extent-io-tree.c	btrfs: cache the failed state when locking extents	2022-12-05 18:00:36 +01:00
extent-io-tree.h	btrfs: cache the failed state when locking extents	2022-12-05 18:00:36 +01:00
extent-tree.c	btrfs: fix tree mod log mishandling of reallocated nodes	2022-10-24 15:28:07 +02:00
extent_io.c	btrfs: move ulists to data extent sharedness check context	2022-12-05 18:00:39 +01:00
extent_io.h	btrfs: move extent io tree unrelated prototypes to their appropriate header	2022-09-26 12:28:04 +02:00
extent_map.c	btrfs: get the next extent map during fiemap/lseek more efficiently	2022-12-05 18:00:38 +01:00
extent_map.h	btrfs: get the next extent map during fiemap/lseek more efficiently	2022-12-05 18:00:38 +01:00
file-item.c	btrfs: make can_nocow_extent nowait compatible	2022-09-29 17:08:26 +02:00
file.c	btrfs: skip unnecessary delalloc search during fiemap and lseek	2022-12-05 18:00:38 +01:00
free-space-cache.c	btrfs: move free space cachep's out of ctree.h	2022-12-05 18:00:37 +01:00
free-space-cache.h	btrfs: move free space cachep's out of ctree.h	2022-12-05 18:00:37 +01:00
free-space-tree.c	btrfs: get rid of block group caching progress logic	2022-09-26 12:27:58 +02:00
free-space-tree.h	…
inode-item.c	btrfs: move flush related definitions to space-info.h	2022-12-05 18:00:37 +01:00
inode-item.h	btrfs: add inode to truncate control	2022-01-07 14:18:24 +01:00
inode.c	btrfs: move free space cachep's out of ctree.h	2022-12-05 18:00:37 +01:00
ioctl.c	btrfs: free btrfs_path before copying subvol info to userspace	2022-11-15 17:15:45 +01:00
locking.c	btrfs: implement a nowait option for tree searches	2022-09-26 12:46:42 +02:00
locking.h	btrfs: implement a nowait option for tree searches	2022-09-26 12:46:42 +02:00
lzo.c	btrfs: replace kmap() with kmap_local_page() in lzo.c	2022-07-25 17:45:33 +02:00
misc.h	btrfs: convert the io_failure_tree to a plain rb_tree	2022-09-26 12:28:02 +02:00
ordered-data.c	btrfs: use cached_state for btrfs_check_nocow_lock	2022-12-05 18:00:36 +01:00
ordered-data.h	btrfs: use cached_state for btrfs_check_nocow_lock	2022-12-05 18:00:36 +01:00
orphan.c	…
print-tree.c	btrfs: unify the error handling pattern for read_tree_block()	2022-03-14 13:13:53 +01:00
print-tree.h	btrfs: print the actual offset in btrfs_root_name	2021-01-07 17:25:05 +01:00
props.c	btrfs: move flush related definitions to space-info.h	2022-12-05 18:00:37 +01:00
props.h	btrfs: move common inode creation code into btrfs_create_new_inode()	2022-05-16 17:03:08 +02:00
qgroup.c	btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()	2022-11-21 14:57:52 +01:00
qgroup.h	btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip qgroup accounting	2022-09-26 12:28:01 +02:00
raid56.c	btrfs: raid56: make it more explicit that cache rbio should have all its data sectors uptodate	2022-12-05 18:00:38 +01:00
raid56.h	btrfs: properly abstract the parity raid bio handling	2022-09-26 12:27:59 +02:00
rcu-string.h	…
ref-verify.c	btrfs: stop accessing ->extent_root directly	2022-01-03 15:09:49 +01:00
ref-verify.h	…
reflink.c	btrfs: replace delete argument with EXTENT_CLEAR_ALL_BITS	2022-09-26 12:28:05 +02:00
reflink.h	…
relocation.c	btrfs: move flush related definitions to space-info.h	2022-12-05 18:00:37 +01:00
root-tree.c	btrfs: simplify error handling at btrfs_del_root_ref()	2022-09-26 12:27:58 +02:00
scrub.c	btrfs: move BTRFS_MAX_MIRRORS into scrub.c	2022-12-05 18:00:37 +01:00
send.c	btrfs: send: avoid unaligned encoded writes when attempting to clone range	2022-11-21 14:41:41 +01:00
send.h	btrfs: send: allow protocol version 3 with CONFIG_BTRFS_DEBUG	2022-10-11 14:46:55 +02:00
space-info.c	btrfs: introduce BTRFS_RESERVE_FLUSH_EMERGENCY	2022-12-05 18:00:38 +01:00
space-info.h	btrfs: introduce BTRFS_RESERVE_FLUSH_EMERGENCY	2022-12-05 18:00:38 +01:00
struct-funcs.c	btrfs: remove redundant check in up check_setget_bounds	2022-07-25 17:45:33 +02:00
subpage.c	btrfs: convert process_page_range() to use filemap_get_folios_contig()	2022-09-11 20:26:03 -07:00
subpage.h	btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage routine	2022-05-16 17:03:11 +02:00
super.c	btrfs: move free space cachep's out of ctree.h	2022-12-05 18:00:37 +01:00
sysfs.c	btrfs: sysfs: normalize the error handling branch in btrfs_init_sysfs()	2022-11-23 16:52:22 +01:00
sysfs.h	…
transaction.c	btrfs: move trans_handle_cachep out of ctree.h	2022-12-05 18:00:37 +01:00
transaction.h	btrfs: move trans_handle_cachep out of ctree.h	2022-12-05 18:00:37 +01:00
tree-checker.c	btrfs: tree-checker: check for overlapping extent items	2022-08-17 16:20:25 +02:00
tree-checker.h	btrfs: tree-checker: check extent buffer owner against owner rootid	2022-05-16 17:03:09 +02:00
tree-defrag.c	btrfs: remove unnecessary extent root check in btrfs_defrag_leaves	2022-01-03 15:09:48 +01:00
tree-log.c	btrfs: do not modify log tree while holding a leaf from fs tree locked	2022-11-23 16:52:15 +01:00
tree-log.h	btrfs: use delayed items when logging a directory	2022-09-26 12:27:57 +02:00
tree-mod-log.c	btrfs: fix race when picking most recent mod log operation for an old root	2021-04-20 19:27:17 +02:00
tree-mod-log.h	btrfs: add and use helper to get lowest sequence number for the tree mod log	2021-04-19 17:25:17 +02:00
ulist.c	…
ulist.h	…
uuid-tree.c	btrfs: drop the _nr from the item helpers	2022-01-03 15:09:43 +01:00
verity.c	btrfs: send: add support for fs-verity	2022-09-26 12:27:55 +02:00
volumes.c	btrfs: zoned: initialize device's zone info for seeding	2022-11-07 14:35:24 +01:00
volumes.h	btrfs: zoned: initialize device's zone info for seeding	2022-11-07 14:35:24 +01:00
xattr.c	btrfs: check if root is readonly while setting security xattr	2022-08-22 18:06:30 +02:00
xattr.h	…
zlib.c	btrfs: zlib: replace kmap() with kmap_local_page() in zlib_decompress_bio()	2022-07-25 17:45:41 +02:00
zoned.c	btrfs: use kvcalloc in btrfs_get_dev_zone_info	2022-11-23 16:51:50 +01:00
zoned.h	btrfs: zoned: clone zoned device info when cloning a device	2022-11-07 14:35:21 +01:00
zstd.c	btrfs: zstd: replace kmap() with kmap_local_page()	2022-07-25 17:45:40 +02:00