OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Nikolay Borisov	7087a9d8db	btrfs: Remove extent_io_ops::writepage_end_io_hook This callback is ony ever called for data page writeout so there is no need to actually abstract it via extent_io_ops. Lets just export it, remove the definition of the callback and call it directly in the functions that invoke the callback. Also rename the function to btrfs_writepage_endio_finish_ordered since what it really does is account finished io in the ordered extent data structures. No functional changes. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-12-17 14:51:28 +01:00
Linus Torvalds	dad4f140ed	Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax Pull XArray conversion from Matthew Wilcox: "The XArray provides an improved interface to the radix tree data structure, providing locking as part of the API, specifying GFP flags at allocation time, eliminating preloading, less re-walking the tree, more efficient iterations and not exposing RCU-protected pointers to its users. This patch set 1. Introduces the XArray implementation 2. Converts the pagecache to use it 3. Converts memremap to use it The page cache is the most complex and important user of the radix tree, so converting it was most important. Converting the memremap code removes the only other user of the multiorder code, which allows us to remove the radix tree code that supported it. I have 40+ followup patches to convert many other users of the radix tree over to the XArray, but I'd like to get this part in first. The other conversions haven't been in linux-next and aren't suitable for applying yet, but you can see them in the xarray-conv branch if you're interested" * 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits) radix tree: Remove multiorder support radix tree test: Convert multiorder tests to XArray radix tree tests: Convert item_delete_rcu to XArray radix tree tests: Convert item_kill_tree to XArray radix tree tests: Move item_insert_order radix tree test suite: Remove multiorder benchmarking radix tree test suite: Remove __item_insert memremap: Convert to XArray xarray: Add range store functionality xarray: Move multiorder_check to in-kernel tests xarray: Move multiorder_shrink to kernel tests xarray: Move multiorder account test in-kernel radix tree test suite: Convert iteration test to XArray radix tree test suite: Convert tag_tagged_items to XArray radix tree: Remove radix_tree_clear_tags radix tree: Remove radix_tree_maybe_preload_order radix tree: Remove split/join code radix tree: Remove radix_tree_update_node_t page cache: Finish XArray conversion dax: Convert page fault handlers to XArray ...	2018-10-28 11:35:40 -07:00
Matthew Wilcox	0a943c65e7	btrfs: Convert page cache to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: David Sterba <dsterba@suse.com>	2018-10-21 10:46:41 -04:00
Colin Ian King	29c5e5d496	btrfs: remove unused pointer 'tree' in btrfs_submit_compressed_read Pointer 'tree' is being assigned but is never used hence it is redundant and can be removed. This is a leftover from cleanup patch `00032d38ea` ("btrfs: drop extent_io_ops::merge_bio_hook callback"). Cleans up clang warning: warning: variable 'tree' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-10-15 17:23:28 +02:00
Matthew Wilcox	3159f943aa	xarray: Replace exceptional entries Introduce xarray value entries and tagged pointers to replace radix tree exceptional entries. This is a slight change in encoding to allow the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry). It is also a change in emphasis; exceptional entries are intimidating and different. As the comment explains, you can choose to store values or pointers in the xarray and they are both first-class citizens. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Josef Bacik <jbacik@fb.com>	2018-09-29 22:47:49 -04:00
David Sterba	00032d38ea	btrfs: drop extent_io_ops::merge_bio_hook callback The data and metadata callback implementation both use the same function. We can remove the call indirection completely. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:56 +02:00
David Sterba	ebcc326316	btrfs: open-code bio_set_op_attrs The helper is trivial and marked as deprecated. Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:44 +02:00
David Sterba	d7f663fa3f	btrfs: prune unused includes Remove includes if none of the interfaces and exports is used in the given source file. Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:43 +02:00
David Sterba	093258e6eb	btrfs: replace waitqueue_actvie with cond_wake_up Use the wrappers and reduce the amount of low-level details about the waitqueue management. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-05-28 18:23:09 +02:00
Linus Torvalds	e37563bb6c	for-4.17-part2-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlrTQ4sACgkQxWXV+ddt WDti3Q/+MAeqsLTjvre2RQ3ka5hNyCuVftUIBmcP3YfJbt+xZYQyaewW4Xkfi3cm cbJE+zehzf5ag+RJhxk3OwFTvNfLGIO9asWs3b08NGUi6VzwL0/8B/iOdZPuHSAV TrecQIBE2Tp+xax9cQEnxav34D4dUtXNaDweGjp1MIIUkDneQP/I0vlTu7vafBgX UVxP6riL/MCs7sjTHGIPs0lv8L/fgdmo+dk5SnNuIPTOcFTQXgVrtHjw9IvbKWd4 aq+sbNWoSrhXUfllbFg/wZqDe9tWn9E2f6m/H0ThSoNdxusSVgacOjFRYh20NKLW WGB8Amd/ItGtJwJ1CIypa7VX2U11UAi0XT7BeiK82rUNEJ6moRqFOXG861gRLoTZ SpH8uWO+e+CogfXob1KCndn5lot4AM2ZTkCqfrjpM35Nul72PZdne0CxNlmiRupY Fdt5GB+sg8plcMaRiYr++BbbHP5tggX1MrhLGEbx2XBs2eRdn+2Lv2I1Ig/U4NUb Vf+xk/tFLKGOTSZlbv7SV7ekXxG/3+7gAuL7A1XMETZCwBF4L3hwyW7CgkEBb3PC TqX8BwMaRpyp/FgW/QL6edjXZ3a64VaHIfqRPNks3lWWcCHVbzyiVPfEjx+AEJ/0 abx6jXTJhLUBPuPxEgb5rsSv62RoxPoYHqCErrG95XwnZ8rSCts= =I0y0 -----END PGP SIGNATURE----- Merge tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull more btrfs updates from David Sterba: "We have queued a few more fixes (error handling, log replay, softlockup) and the rest is SPDX updates that touche almost all files so the diffstat is long" * tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: Only check first key for committed tree blocks btrfs: add SPDX header to Kconfig btrfs: replace GPL boilerplate by SPDX -- sources btrfs: replace GPL boilerplate by SPDX -- headers Btrfs: fix loss of prealloc extents past i_size after fsync log replay Btrfs: clean up resources during umount after trans is aborted btrfs: Fix possible softlock on single core machines Btrfs: bail out on error during replay_dir_deletes Btrfs: fix NULL pointer dereference in log_dir_items	2018-04-15 18:08:35 -07:00
David Sterba	c1d7c514f7	btrfs: replace GPL boilerplate by SPDX -- sources Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX header. Signed-off-by: David Sterba <dsterba@suse.com>	2018-04-12 16:29:51 +02:00
Matthew Wilcox	b93b016313	page cache: use xa_lock Remove the address_space ->tree_lock and use the xa_lock newly added to the radix_tree_root. Rename the address_space ->page_tree to ->i_pages, since we don't really care that it's a tree. [willy@infradead.org: fix nds32, fs/dax.c] Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Jeff Layton <jlayton@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-11 10:28:39 -07:00
David Sterba	e67c718b5b	btrfs: add more __cold annotations The __cold functions are placed to a special section, as they're expected to be called rarely. This could help i-cache prefetches or help compiler to decide which branches are more/less likely to be taken without any other annotations needed. Though we can't add more __exit annotations, it's still possible to add __cold (that's also added with __exit). That way the following function categories are tagged: - printf wrappers, error messages - exit helpers Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:39 +02:00
Linus Torvalds	31466f3ed7	for-4.16-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlpvikQACgkQxWXV+ddt WDs6qA//ZE7eEH0sKpD4Z+3gUevk/MMXwE9prRijEdjXz/K/UXtvpq0sI7HMQskZ Ls9Wmzof+3WEQoa08RQZFzwuclW1Udm09SqE2oHP2gXQB5rC0BtWdrlMaKUJy03y NUwxHetbE6TsFLU5HIVmi05NexNx9SVV6oJTWt00RlXTePw9Aoc88ikoXXUE2vqH wbH9/ccmM9EkDFxdG+YG5QX054kQV8/5RXdqBJnIiGVRX5ZsAY84AN9x9YoRCVUw wq9TfPu6XmeA6Uq6knpeLlXDms5w+FE3n5CduROk7Q7YNgpoZekF20c8uK8HzT4T KF8hc0QpQgRCVBJ8I4MbPSMRIDf3IWfZmWSDEDda/6/ep6Bl99b8PFvdDKDBMUct 8wsgGrwGbHuz2l2QUIXjpBL9Cv9Tbu8vjmg0h2hFrpiH1c8JaXjKtJXAMtigWsZ1 DdX+5Y0zqvV/YLpzKF4aMDWXIteN4qaznvjdmj3B7BxgcnITOV/cmPCyMplNrtUa Cs2fzGV5tpxhBzxE490v+frMULmLq2W1e6WPfFCqPKBCqulcR75TozDQS9M2/h4k uAZzVKoguHrUPP1ONVas9aVC05K473nbYPd28eecYwkZ4z32hK4SAdnQY1aPreTe axoV7p7a+i1bkzT5LK6gIfpddVWth8w45nz4P0lwxp0Z6XlkbJE= =Irul -----END PGP SIGNATURE----- Merge tag 'for-4.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Features or user visible changes: - fallocate: implement zero range mode - avoid losing data raid profile when deleting a device - tree item checker: more checks for directory items and xattrs Notable fixes: - raid56 recovery: don't use cached stripes, that could be potentially changed and a later RMW or recovery would lead to corruptions or failures - let raid56 try harder to rebuild damaged data, reading from all stripes if necessary - fix scrub to repair raid56 in a similar way as in the case above Other: - cleanups: device freeing, removed some call indirections, redundant bio_put/_get, unused parameters, refactorings and renames - RCU list traversal fixups - simplify mount callchain, remove recursing back when mounting a subvolume - plug for fsync, may improve bio merging on multiple devices - compression heurisic: replace heap sort with radix sort, gains some performance - add extent map selftests, buffered write vs dio" * tag 'for-4.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (155 commits) btrfs: drop devid as device_list_add() arg btrfs: get device pointer from device_list_add() btrfs: set the total_devices in device_list_add() btrfs: move pr_info into device_list_add btrfs: make btrfs_free_stale_devices() to match the path btrfs: rename btrfs_free_stale_devices() arg to skip_dev btrfs: make btrfs_free_stale_devices() argument optional btrfs: make btrfs_free_stale_device() to iterate all stales btrfs: no need to check for btrfs_fs_devices::seeding btrfs: Use IS_ALIGNED in btrfs_truncate_block instead of opencoding it Btrfs: noinline merge_extent_mapping Btrfs: add WARN_ONCE to detect unexpected error from merge_extent_mapping Btrfs: extent map selftest: dio write vs dio read Btrfs: extent map selftest: buffered write vs dio read Btrfs: add extent map selftests Btrfs: move extent map specific code to extent_map.c Btrfs: add helper for em merge logic Btrfs: fix unexpected EEXIST from btrfs_get_extent Btrfs: fix incorrect block_len in merge_extent_mapping btrfs: Remove unused readahead spinlock ...	2018-01-29 14:04:23 -08:00
Nikolay Borisov	32506af595	btrfs: Remove redundant bio_get/set calls in compressed read/write paths bio_get/set is necessary only if the bio is going to be referenced following submissions. In the code paths where such calls are made we don't really need them since the bio is referenced only if btrfs_map_bio returns an error. And this function can return an error prior to submission only. So referencing the bio is safe. Furthermore we do call bio_endio which will consume the last reference. So let's remove the redundant calls. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-01-22 16:08:19 +01:00
David Sterba	36243c9199	btrfs: heuristic: call get4bits directly As it's a single instance and local to the file, we don't need to pass it as an argument. Reviewed-by: Timofey Titovets <nefelim4ag@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-01-22 16:08:19 +01:00
David Sterba	7add17befc	btrfs: heuristic: open code copy_call callback of radix sort The callback is trivial and we don't need the abstraction for our purposes. Let's open code it. Reviewed-by: Timofey Titovets <nefelim4ag@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-01-22 16:08:19 +01:00
David Sterba	23ae8c63aa	btrfs: heuristic: open code get_num callback of radix sort The callback is trivial and we don't need the abstraction for our purposes. Let's open code it and also make the array types explicit. Reviewed-by: Timofey Titovets <nefelim4ag@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-01-22 16:08:19 +01:00
David Sterba	e128f9c3f7	btrfs: compression: add helper for type to string conversion There are several places opencoding this conversion, add a helper now that we have 3 compression algorithms. Signed-off-by: David Sterba <dsterba@suse.com>	2018-01-22 16:08:16 +01:00
Timofey Titovets	440c840cb4	Btrfs: compression heuristic: replace heap sort with radix sort Slowest part of heuristic for now is kernel heap sort() It's can take up to 55% of runtime on sorting bucket items. As sorting will always call on most data sets to get correctly byte_core_set_size, the only way to speed up heuristic, is to speed up sort on bucket. Add a general radix_sort function. Radix sort require 2 buffers, one full size of input array and one for store counters (jump addresses). That increase usage per heuristic workspace +1KiB 8KiB + 1KiB -> 8KiB + 2KiB That is LSD Radix, i use 4 bit as a base for calculating, to make counters array acceptable small (16 elements * 8 byte). That Radix sort implementation have several points to adjust, I added him to make radix sort general usable in kernel, like heap sort, if needed. Performance tested in userspace copy of heuristic code, throughput: - average <-> random data: ~3500 MiB/s - heap sort - average <-> random data: ~6000 MiB/s - radix sort Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> [ coding style fixes ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-01-22 16:08:15 +01:00
Ming Lei	c45a8f2def	fs: convert to bio_last_bvec_all() This patch converts 3 users to bio_last_bvec_all(), so that we can go ahead and convert to multipage bvec. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-06 09:18:00 -07:00
Ming Lei	263663cd3c	block: convert to bio_first_bvec_all & bio_first_page_all This patch converts to bio_first_bvec_all() & bio_first_page_all() for retrieving the 1st bvec/page, and prepares for supporting multipage bvec. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-06 09:18:00 -07:00
Qu Wenruo	eae8d82529	btrfs: Fix wild memory access in compression level parser [BUG] Kernel panic when mounting with "-o compress" mount option. KASAN will report like: ------ ================================================================== BUG: KASAN: wild-memory-access in strncmp+0x31/0xc0 Read of size 1 at addr d86735fce994f800 by task mount/662 ... Call Trace: dump_stack+0xe3/0x175 kasan_report+0x163/0x370 __asan_load1+0x47/0x50 strncmp+0x31/0xc0 btrfs_compress_str2level+0x20/0x70 [btrfs] btrfs_parse_options+0xff4/0x1870 [btrfs] open_ctree+0x2679/0x49f0 [btrfs] btrfs_mount+0x1b7f/0x1d30 [btrfs] mount_fs+0x49/0x190 vfs_kern_mount.part.29+0xba/0x280 vfs_kern_mount+0x13/0x20 btrfs_mount+0x31e/0x1d30 [btrfs] mount_fs+0x49/0x190 vfs_kern_mount.part.29+0xba/0x280 do_mount+0xaad/0x1a00 SyS_mount+0x98/0xe0 entry_SYSCALL_64_fastpath+0x1f/0xbe ------ [Cause] For 'compress' and 'compress_force' options, its token doesn't expect any parameter so its args[0] contains uninitialized data. Accessing args[0] will cause above wild memory access. [Fix] For Opt_compress and Opt_compress_force, set compression level to the default. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ set the default in advance ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-11-27 17:01:11 +01:00
Liu Bo	f82b735936	Btrfs: add write_flags for compression bio Compression code path has only flaged bios with REQ_OP_WRITE no matter where the bios come from, but it could be a sync write if fsync starts this writeback or a normal writeback write if wb kthread starts a periodic writeback. It breaks the rule that sync writes and writeback writes need to be differentiated from each other, because from the POV of block layer, all bios need to be recognized by these flags in order to do some management, e.g. throttlling. This passes writeback_control to compression write path so that it can send bios with proper flags to block layer. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-11-15 14:44:31 +01:00
Timofey Titovets	19562430c6	Btrfs: heuristic: add Shannon entropy calculation Byte distribution check in heuristic will filter edge data cases and some time fail to classify input data. Let's fix that by adding Shannon entropy calculation, that will cover classification of most other data types. As Shannon entropy needs log2 with some precision to work, let's use ilog2(N) and for increased precision, by do ilog2(pow(N, 4)). Shannon entropy has been slightly changed to avoid signed numbers and division. The calculation is direct by the formula, successor of precalculated table or chains of if-else. The accuracy errors of ilog2 are compensated by @ENTROPY_LVL_ACEPTABLE 70 -> 65 @ENTROPY_LVL_HIGH 85 -> 80 Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update comments ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-11-01 20:45:36 +01:00
Timofey Titovets	858177d38d	Btrfs: heuristic: add byte core set calculation Calculate byte core set for data sample: - sort buckets' numbers in decreasing order - count how many values cover 90% of the sample If the core set size is low (<=25%), data are easily compressible. If the core set size is high (>=80%), data are not compressible. Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update comments ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-11-01 20:45:36 +01:00
Timofey Titovets	a288e92cac	Btrfs: heuristic: add byte set calculation Calculate byte set size for data sample: - calculate how many unique bytes have been in the sample - for all bytes count > 0, check if we're still in the low count range (~25%), such data are easily compressible, otherwise furhter analysis is needed Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update comments ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-11-01 20:45:36 +01:00
Timofey Titovets	1fe4f6fa5a	Btrfs: heuristic: add detection of repeated data patterns Walk over data sample and use memcmp to detect repeated patterns, like zeros, but a bit more general. Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minor coding style fixes ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-11-01 20:45:36 +01:00
Timofey Titovets	a440d48c7f	Btrfs: heuristic: implement sampling logic Copy sample data from the input data range to sample buffer then calculate byte value count for that sample into bucket. Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> [ minor comment updates ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-11-01 20:45:36 +01:00
Timofey Titovets	17b5a6c17e	Btrfs: heuristic: add bucket and sample counters and other defines Add basic defines and structures for data sampling. Added macros: - For future sampling algo - For bucket size Heuristic workspace: - Add bucket for storing byte type counters - Add sample array for storing partial copy of input data range - Add counter for store current sample size to workspace Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minor coding style fixes, comments updated ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-11-01 20:45:36 +01:00
Timofey Titovets	4e439a0b18	Btrfs: compression: separate heuristic/compression workspaces Compression heuristic itself is not a compression type, as current infrastructure provides workspaces for several compression types, it's difficult to just add heuristic workspace. Just refactor the code to support compression/heuristic workspaces with maximum code sharing and minimum changes in it. Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> [ coding style fixes ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-11-01 20:45:35 +01:00
Adam Borowski	fa4d885a48	btrfs: allow setting zlib compression level via :9 This is bikeshedding, but it seems people are drastically more likely to understand "zlib:9" as compression level rather than an algorithm version compared to "zlib9". Based on feedback on the mailinglist, the ":9" will be the only accepted syntax. The level must be a single digit. Unrecognized format will result to the default, for forward compatibility in a similar way the compression algorithm specifier was relaxed in commit `a7164fa4e0` ("btrfs: prepare for extensions in compression options"). Signed-off-by: Adam Borowski <kilobyte@angband.pl> Reviewed-by: David Sterba <dsterba@suse.com> [ tighten the accepted format ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-11-01 20:45:34 +01:00
David Sterba	f51d2b5912	btrfs: allow to set compression level for zlib Preliminary support for setting compression level for zlib, the following works: $ mount -o compess=zlib # default $ mount -o compess=zlib0 # same $ mount -o compess=zlib9 # level 9, slower sync, less data $ mount -o compess=zlib1 # level 1, faster sync, more data $ mount -o remount,compress=zlib3 # level set by remount The compress-force works the same as compress'. The level is visible in the same format in /proc/mounts. Level set via file property does not work yet. Required patch: "btrfs: prepare for extensions in compression options" Signed-off-by: David Sterba <dsterba@suse.com>	2017-11-01 20:45:29 +01:00
Anand Jain	2dbe0c7718	btrfs: use BLK_STS defines where needed At few places we could use BLK_STS_OK and BLK_STS_NOSUPP. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Satoru Taekeuchi <satoru.takeuchi@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> [ dropped first hunk btrfs_endio_direct_read ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-10-30 12:28:01 +01:00
Linus Torvalds	5ba88cd6e9	Merge branch 'for-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "We've collected a bunch of isolated fixes, for crashes, user-visible behaviour or missing bits from other subsystem cleanups from the past. The overall number is not small but I was not able to make it significantly smaller. Most of the patches are supposed to go to stable" * 'for-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: log csums for all modified extents Btrfs: fix unexpected result when dio reading corrupted blocks btrfs: Report error on removing qgroup if del_qgroup_item fails Btrfs: skip checksum when reading compressed data if some IO have failed Btrfs: fix kernel oops while reading compressed data Btrfs: use btrfs_op instead of bio_op in __btrfs_map_block Btrfs: do not backup tree roots when fsync btrfs: remove BTRFS_FS_QUOTA_DISABLING flag btrfs: propagate error to btrfs_cmp_data_prepare caller btrfs: prevent to set invalid default subvolid Btrfs: send: fix error number for unknown inode types btrfs: fix NULL pointer dereference from free_reloc_roots() btrfs: finish ordered extent cleaning if no progress is found btrfs: clear ordered flag on cleaning up ordered extents Btrfs: fix incorrect {node,sector}size endianness from BTRFS_IOC_FS_INFO Btrfs: do not reset bio->bi_ops while writing bio Btrfs: use the new helper wbc_to_write_flags	2017-09-29 12:57:35 -07:00
Liu Bo	e6311f240c	Btrfs: skip checksum when reading compressed data if some IO have failed Currently even if the underlying disk reports failure on IO, compressed read endio still gets to verify checksum and reports it as a checksum error. In fact, if some IO have failed during reading a compressed data extent , there's no way the checksum could match, therefore, we can skip that in order to return error quickly to the upper layer. Please note that we need to do this after recording the failed mirror index so that read-repair in the upper layer's endio can work properly. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Tested-by: Paul Jones <paul@pauljones.id.au> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-09-26 14:53:26 +02:00
Liu Bo	cf1167d5c1	Btrfs: fix kernel oops while reading compressed data The kernel oops happens at kernel BUG at fs/btrfs/extent_io.c:2104! ... RIP: clean_io_failure+0x263/0x2a0 [btrfs] It's showing that read-repair code is using an improper mirror index. This is due to the fact that compression read's endio hasn't recorded the failed mirror index in %cb->orig_bio. With this, btrfs's read-repair can work properly on reading compressed data. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reported-by: Paul Jones <paul@pauljones.id.au> Tested-by: Paul Jones <paul@pauljones.id.au> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-09-26 14:53:23 +02:00
Linus Torvalds	e7cdb60fd2	Merge branch 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull zstd support from Chris Mason: "Nick Terrell's patch series to add zstd support to the kernel has been floating around for a while. After talking with Dave Sterba, Herbert and Phillip, we decided to send the whole thing in as one pull request. zstd is a big win in speed over zlib and in compression ratio over lzo, and the compression team here at FB has gotten great results using it in production. Nick will continue to update the kernel side with new improvements from the open source zstd userland code. Nick has a number of benchmarks for the main zstd code in his lib/zstd commit: I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor, 16 GB of RAM, and a SSD. I benchmarked using `silesia.tar` [3], which is 211,988,480 B large. Run the following commands for the benchmark: sudo modprobe zstd_compress_test sudo mknod zstd_compress_test c 245 0 sudo cp silesia.tar zstd_compress_test The time is reported by the time of the userland `cp`. The MB/s is computed with 1,536,217,008 B / time(buffer size, hash) which includes the time to copy from userland. The Adjusted MB/s is computed with 1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)). The memory reported is the amount of memory the compressor requests. \| Method \| Size (B) \| Time (s) \| Ratio \| MB/s \| Adj MB/s \| Mem (MB) \| \|----------\|----------\|----------\|-------\|---------\|----------\|----------\| \| none \| 11988480 \| 0.100 \| 1 \| 2119.88 \| - \| - \| \| zstd -1 \| 73645762 \| 1.044 \| 2.878 \| 203.05 \| 224.56 \| 1.23 \| \| zstd -3 \| 66988878 \| 1.761 \| 3.165 \| 120.38 \| 127.63 \| 2.47 \| \| zstd -5 \| 65001259 \| 2.563 \| 3.261 \| 82.71 \| 86.07 \| 2.86 \| \| zstd -10 \| 60165346 \| 13.242 \| 3.523 \| 16.01 \| 16.13 \| 13.22 \| \| zstd -15 \| 58009756 \| 47.601 \| 3.654 \| 4.45 \| 4.46 \| 21.61 \| \| zstd -19 \| 54014593 \| 102.835 \| 3.925 \| 2.06 \| 2.06 \| 60.15 \| \| zlib -1 \| 77260026 \| 2.895 \| 2.744 \| 73.23 \| 75.85 \| 0.27 \| \| zlib -3 \| 72972206 \| 4.116 \| 2.905 \| 51.50 \| 52.79 \| 0.27 \| \| zlib -6 \| 68190360 \| 9.633 \| 3.109 \| 22.01 \| 22.24 \| 0.27 \| \| zlib -9 \| 67613382 \| 22.554 \| 3.135 \| 9.40 \| 9.44 \| 0.27 \| I benchmarked zstd decompression using the same method on the same machine. The benchmark file is located in the upstream zstd repo under `contrib/linux-kernel/zstd_decompress_test.c` [4]. The memory reported is the amount of memory required to decompress data compressed with the given compression level. If you know the maximum size of your input, you can reduce the memory usage of decompression irrespective of the compression level. \| Method \| Time (s) \| MB/s \| Adjusted MB/s \| Memory (MB) \| \|----------\|----------\|---------\|---------------\|-------------\| \| none \| 0.025 \| 8479.54 \| - \| - \| \| zstd -1 \| 0.358 \| 592.15 \| 636.60 \| 0.84 \| \| zstd -3 \| 0.396 \| 535.32 \| 571.40 \| 1.46 \| \| zstd -5 \| 0.396 \| 535.32 \| 571.40 \| 1.46 \| \| zstd -10 \| 0.374 \| 566.81 \| 607.42 \| 2.51 \| \| zstd -15 \| 0.379 \| 559.34 \| 598.84 \| 4.61 \| \| zstd -19 \| 0.412 \| 514.54 \| 547.77 \| 8.80 \| \| zlib -1 \| 0.940 \| 225.52 \| 231.68 \| 0.04 \| \| zlib -3 \| 0.883 \| 240.08 \| 247.07 \| 0.04 \| \| zlib -6 \| 0.844 \| 251.17 \| 258.84 \| 0.04 \| \| zlib -9 \| 0.837 \| 253.27 \| 287.64 \| 0.04 \| I ran a long series of tests and benchmarks on the btrfs side and the gains are very similar to the core benchmarks Nick ran" * 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: squashfs: Add zstd support btrfs: Add zstd support lib: Add zstd modules lib: Add xxhash module	2017-09-14 17:30:49 -07:00
Timofey Titovets	c2fcdcdf36	Btrfs: add skeleton code for compression heuristic Add skeleton code for compresison heuristics. Now it iterates over all the pages, but in the end always says "yes, compress please", ie it does not change the current behaviour. In the future we're going to add various heuristics to analyze the data. This patch can be used as a baseline for measuring if the effectivness and performance. Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> [ enhanced changelog, modified comments ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-08-16 16:12:04 +02:00
Nick Terrell	26b28dce50	btrfs: Keep one more workspace around find_workspace() allocates up to num_online_cpus() + 1 workspaces. free_workspace() will only keep num_online_cpus() workspaces. When (de)compressing we will allocate num_online_cpus() + 1 workspaces, then free one, and repeat. Instead, we can just keep num_online_cpus() + 1 workspaces around, and never have to allocate/free another workspace in the common case. I tested on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. I mounted a BtrFS partition with -o compress-force={lzo,zlib,zstd} and logged whenever a workspace was allocated of freed. Then I copied vmlinux (527 MB) to the partition. Before the patch, during the copy it would allocate and free 5-6 workspaces. After, it only allocated the initial 3. This held true for lzo, zlib, and zstd. The time it took to execute cp vmlinux /mnt/btrfs && sync dropped from 1.70s to 1.44s with lzo compression, and from 2.04s to 1.80s for zstd compression. Signed-off-by: Nick Terrell <terrelln@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-08-16 16:12:02 +02:00
Nick Terrell	5c1aab1dd5	btrfs: Add zstd support Add zstd compression and decompression support to BtrFS. zstd at its fastest level compresses almost as well as zlib, while offering much faster compression and decompression, approaching lzo speeds. I benchmarked btrfs with zstd compression against no compression, lzo compression, and zlib compression. I benchmarked two scenarios. Copying a set of files to btrfs, and then reading the files. Copying a tarball to btrfs, extracting it to btrfs, and then reading the extracted files. After every operation, I call `sync` and include the sync time. Between every pair of operations I unmount and remount the filesystem to avoid caching. The benchmark files can be found in the upstream zstd source repository under `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}` [1] [2]. I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor, 16 GB of RAM, and a SSD. The first compression benchmark is copying 10 copies of the unzipped Silesia corpus [3] into a BtrFS filesystem mounted with `-o compress-force=Method`. The decompression benchmark times how long it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is measured by comparing the output of `df` and `du`. See the benchmark file [1] for details. I benchmarked multiple zstd compression levels, although the patch uses zstd level 1. \| Method \| Ratio \| Compression MB/s \| Decompression speed \| \|---------\|-------\|------------------\|---------------------\| \| None \| 0.99 \| 504 \| 686 \| \| lzo \| 1.66 \| 398 \| 442 \| \| zlib \| 2.58 \| 65 \| 241 \| \| zstd 1 \| 2.57 \| 260 \| 383 \| \| zstd 3 \| 2.71 \| 174 \| 408 \| \| zstd 6 \| 2.87 \| 70 \| 398 \| \| zstd 9 \| 2.92 \| 43 \| 406 \| \| zstd 12 \| 2.93 \| 21 \| 408 \| \| zstd 15 \| 3.01 \| 11 \| 354 \| The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it measures the compression ratio, extracts the tar, and deletes the tar. Then it measures the compression ratio again, and `tar`s the extracted files into `/dev/null`. See the benchmark file [2] for details. \| Method \| Tar Ratio \| Extract Ratio \| Copy (s) \| Extract (s)\| Read (s) \| \|--------\|-----------\|---------------\|----------\|------------\|----------\| \| None \| 0.97 \| 0.78 \| 0.981 \| 5.501 \| 8.807 \| \| lzo \| 2.06 \| 1.38 \| 1.631 \| 8.458 \| 8.585 \| \| zlib \| 3.40 \| 1.86 \| 7.750 \| 21.544 \| 11.744 \| \| zstd 1 \| 3.57 \| 1.85 \| 2.579 \| 11.479 \| 9.389 \| [1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia [4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz zstd source repository: https://github.com/facebook/zstd Signed-off-by: Nick Terrell <terrelln@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2017-08-15 09:02:09 -07:00
Linus Torvalds	bc243704fb	Merge branch 'for-4.13-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "We've identified and fixed a silent corruption (introduced by code in the first pull), a fixup after the blk_status_t merge and two fixes to incremental send that Filipe has been hunting for some time" * 'for-4.13-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix unexpected return value of bio_readpage_error btrfs: btrfs_create_repair_bio never fails, skip error handling btrfs: cloned bios must not be iterated by bio_for_each_segment_all Btrfs: fix write corruption due to bio cloning on raid5/6 Btrfs: incremental send, fix invalid memory access Btrfs: incremental send, fix invalid path for link commands	2017-07-14 22:55:52 -07:00
David Sterba	c09abff87f	btrfs: cloned bios must not be iterated by bio_for_each_segment_all We've started using cloned bios more in 4.13, there are some specifics regarding the iteration. Filipe found [1] that the raid56 iterated a cloned bio using bio_for_each_segment_all, which is incorrect. The cloned bios have wrong bi_vcnt and this could lead to silent corruptions. This patch adds assertions to all remaining bio_for_each_segment_all cases. [1] https://patchwork.kernel.org/patch/9838535/ Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-07-14 20:39:31 +02:00
Linus Torvalds	8c27cb3566	Merge branch 'for-4.13-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "The core updates improve error handling (mostly related to bios), with the usual incremental work on the GFP_NOFS (mis)use removal, refactoring or cleanups. Except the two top patches, all have been in for-next for an extensive amount of time. User visible changes: - statx support - quota override tunable - improved compression thresholds - obsoleted mount option alloc_start Core updates: - bio-related updates: - faster bio cloning - no allocation failures - preallocated flush bios - more kvzalloc use, memalloc_nofs protections, GFP_NOFS updates - prep work for btree_inode removal - dir-item validation - qgoup fixes and updates - cleanups: - removed unused struct members, unused code, refactoring - argument refactoring (fs_info/root, caller -> callee sink) - SEARCH_TREE ioctl docs" * 'for-4.13-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (115 commits) btrfs: Remove false alert when fiemap range is smaller than on-disk extent btrfs: Don't clear SGID when inheriting ACLs btrfs: fix integer overflow in calc_reclaim_items_nr btrfs: scrub: fix target device intialization while setting up scrub context btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges btrfs: qgroup: Introduce extent changeset for qgroup reserve functions btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quotas being enabled btrfs: qgroup: Return actually freed bytes for qgroup release or free data btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function btrfs: qgroup: Add quick exit for non-fs extents Btrfs: rework delayed ref total_bytes_pinned accounting Btrfs: return old and new total ref mods when adding delayed refs Btrfs: always account pinned bytes when dropping a tree block ref Btrfs: update total_bytes_pinned when pinning down extents Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT() Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64 btrfs: fix validation of XATTR_ITEM dir items btrfs: Verify dir_item in iterate_object_props btrfs: Check name_len before in btrfs_del_root_ref btrfs: Check name_len before reading btrfs_get_name ...	2017-07-05 16:41:23 -07:00
Dan Carpenter	0e9350de2e	btrfs: use new block error code This function is supposed to return blk_status_t error codes now but there was a stray -ENOMEM left behind. Fixes: `4e4cbee93d` ("block: switch bios to blk_status_t") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-21 07:47:34 -06:00
David Sterba	c821e7f3da	btrfs: pass bytes to btrfs_bio_alloc Most callers of btrfs_bio_alloc convert from bytes to sectors. Hide that in the helper and simplify the logic in the callsers. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:26:03 +02:00
David Sterba	9886b17433	btrfs: opencode trivial compressed_bio_alloc, simplify error handling compressed_bio_alloc is now a trivial wrapper around btrfs_bio_alloc, no point keeping it. The error handling can be simplified, as we know btrfs_bio_alloc will never fail. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:26:03 +02:00
David Sterba	9f2179a5e7	btrfs: remove redundant parameters from btrfs_bio_alloc All callers pass gfp_flags=GFP_NOFS and nr_vecs=BIO_MAX_PAGES. submit_extent_page adds __GFP_HIGH that does not make a difference in our case as it allows access to memory reserves but otherwise does not change the constraints. Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:26:03 +02:00
David Sterba	fe30853307	btrfs: add memalloc_nofs protections around alloc_workspace callback The workspaces are preallocated at the beginning where we can safely use GFP_KERNEL, but in some cases the find_workspace might reach the allocation again, now in a more restricted context when the bios or pages are being compressed. To avoid potential lockup when alloc_workspace -> vmalloc would silently use the GFP_KERNEL, add the memalloc_nofs helpers around the critical call site. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:26:02 +02:00
Anand Jain	e1ddce71d6	btrfs: reduce arguments for decompress_bio ops struct compressed_bio pointer can be used instead. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:26:00 +02:00

1 2 3 4

160 Commits