linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Qu Wenruo	dba213242f	btrfs: qgroup: Make qgroup_reserve and its callers to use separate reservation type Since most callers of qgroup_reserve() are already defined by type, converting qgroup_reserve() is quite an easy work. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:13 +02:00
Qu Wenruo	f59c0347d4	btrfs: qgroup: Introduce helpers to update and access new qgroup rsv Introduce helpers to: 1) Get total reserved space For limit calculation 2) Add/release reserved space for given type With underflow detection and warning 3) Add/release reserved space according to child qgroup Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:13 +02:00
Qu Wenruo	d4e5c92055	btrfs: qgroup: Skeleton to support separate qgroup reservation type Instead of single qgroup->reserved, use a new structure btrfs_qgroup_rsv to store different types of reservation. This patch only updates the header needed to compile. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:13 +02:00
Omar Sandoval	0a0d4415e3	Btrfs: delete dead code in btrfs_orphan_add() btrfs_orphan_add() has had this case commented out since it was first introduced in commit `d68fc57b7e` ("Btrfs: Metadata reservation for orphan inodes"). Most of the orphan cleanup code has been rewritten since then, so it's safe to say that this code isn't needed. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> [ switch to bool ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:12 +02:00
Misono, Tomohiro	4408ea7c5f	btrfs: ctree.h: Fix wrong comment position about csum size Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:12 +02:00
Jeff Mahoney	75cb379d26	btrfs: defer adding raid type kobject until after chunk relocation Any time the first block group of a new type is created, we add a new kobject to sysfs to hold the attributes for that type. Kobject-internal allocations always use GFP_KERNEL, making them prone to fs-reclaim races. While it appears as if this can occur any time a block group is created, the only times the first block group of a new type can be created in memory is at mount and when we create the first new block group during raid conversion. This patch adds a new list to track pending kobject additions and then handles them after we do chunk relocation. Between relocating the target chunk (or forcing allocation of a new chunk in the case of data) and removing the old chunk, we're in a safe place for fs-reclaim to occur. We're holding the volume mutex, which is already held across page faults, and the delete_unused_bgs_mutex, which will only stall the cleaner thread. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:12 +02:00
Jeff Mahoney	dc2d3005d2	btrfs: remove dead create_space_info calls Since commit `2be12ef79` (btrfs: Separate space_info create/update), we've separated out the creation and updating of the space info structures. That commit was a straightforward refactoring of the two parts of update_space_info, but we can go a step further. Since commits `c59021f84` (Btrfs: fix OOPS of empty filesystem after balance) and `b742bb82f` (Btrfs: Link block groups of different raid types), we know that the space_info structures will be created at mount and there will only ever be, at most, three of them. This patch cleans out the create_space_info calls after __find_space_info returns NULL since __find_space_info can't return NULL. The initial cause for reviewing this was the kobject_add calls from create_space_info occuring in sites where fs-reclaim wasn't allowed. Now we are certain they occur only early in the mount process and are safe. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:12 +02:00
Liu Bo	580c6efaf9	Btrfs: replace: cache rbio when rebuild data on missing device Rebuild on missing device is as same as recover, after it's done, rbio has data which is consistent with on-disk data, so it can be cached to avoid further reads. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:12 +02:00
Jeff Mahoney	8a5a916d9a	btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers While running btrfs/011, I hit the following lockdep splat. This is the important bit: pcpu_alloc+0x1ac/0x5e0 __percpu_counter_init+0x4e/0xb0 btrfs_init_fs_root+0x99/0x1c0 [btrfs] btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs] resolve_indirect_refs+0x130/0x830 [btrfs] find_parent_nodes+0x69e/0xff0 [btrfs] btrfs_find_all_roots_safe+0xa0/0x110 [btrfs] btrfs_find_all_roots+0x50/0x70 [btrfs] btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs] btrfs_commit_transaction+0x3ce/0x9b0 [btrfs] The percpu_counter_init call in btrfs_alloc_subvolume_writers uses GFP_KERNEL, which we can't do during transaction commit. This switches it to GFP_NOFS. ======================================================== WARNING: possible irq lock inversion dependency detected 4.12.14-kvmsmall #8 Tainted: G W -------------------------------------------------------- kswapd0/50 just changed the state of lock: (&delayed_node->mutex){+.+.-.}, at: [<ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] but this lock took another, RECLAIM_FS-unsafe lock in the past: (pcpu_alloc_mutex){+.+.+.} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Chain exists of: &delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(pcpu_alloc_mutex); local_irq_disable(); lock(&delayed_node->mutex); lock(&found->groups_sem); <Interrupt> lock(&delayed_node->mutex); * DEADLOCK * 2 locks held by kswapd0/50: #0: (shrinker_rwsem){++++..}, at: [<ffffffff811dc11f>] shrink_slab+0x7f/0x5b0 #1: (&type->s_umount_key#30){+++++.}, at: [<ffffffff8126dec6>] trylock_super+0x16/0x50 the shortest dependencies between 2nd lock and 1st lock: -> (pcpu_alloc_mutex){+.+.+.} ops: 4904 { HARDIRQ-ON-W at: __mutex_lock+0x4e/0x8c0 pcpu_alloc+0x1ac/0x5e0 alloc_kmem_cache_cpus.isra.70+0x25/0xa0 __do_tune_cpucache+0x2c/0x220 do_tune_cpucache+0x26/0xc0 enable_cpucache+0x6d/0xf0 kmem_cache_init_late+0x42/0x75 start_kernel+0x343/0x4cb x86_64_start_kernel+0x127/0x134 secondary_startup_64+0xa5/0xb0 SOFTIRQ-ON-W at: __mutex_lock+0x4e/0x8c0 pcpu_alloc+0x1ac/0x5e0 alloc_kmem_cache_cpus.isra.70+0x25/0xa0 __do_tune_cpucache+0x2c/0x220 do_tune_cpucache+0x26/0xc0 enable_cpucache+0x6d/0xf0 kmem_cache_init_late+0x42/0x75 start_kernel+0x343/0x4cb x86_64_start_kernel+0x127/0x134 secondary_startup_64+0xa5/0xb0 RECLAIM_FS-ON-W at: __kmalloc+0x47/0x310 pcpu_extend_area_map+0x2b/0xc0 pcpu_alloc+0x3ec/0x5e0 alloc_kmem_cache_cpus.isra.70+0x25/0xa0 __do_tune_cpucache+0x2c/0x220 do_tune_cpucache+0x26/0xc0 enable_cpucache+0x6d/0xf0 __kmem_cache_create+0x1bf/0x390 create_cache+0xba/0x1b0 kmem_cache_create+0x1f8/0x2b0 ksm_init+0x6f/0x19d do_one_initcall+0x50/0x1b0 kernel_init_freeable+0x201/0x289 kernel_init+0xa/0x100 ret_from_fork+0x3a/0x50 INITIAL USE at: __mutex_lock+0x4e/0x8c0 pcpu_alloc+0x1ac/0x5e0 alloc_kmem_cache_cpus.isra.70+0x25/0xa0 setup_cpu_cache+0x2f/0x1f0 __kmem_cache_create+0x1bf/0x390 create_boot_cache+0x8b/0xb1 kmem_cache_init+0xa1/0x19e start_kernel+0x270/0x4cb x86_64_start_kernel+0x127/0x134 secondary_startup_64+0xa5/0xb0 } ... key at: [<ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0 ... acquired at: pcpu_alloc+0x1ac/0x5e0 __percpu_counter_init+0x4e/0xb0 btrfs_init_fs_root+0x99/0x1c0 [btrfs] btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs] resolve_indirect_refs+0x130/0x830 [btrfs] find_parent_nodes+0x69e/0xff0 [btrfs] btrfs_find_all_roots_safe+0xa0/0x110 [btrfs] btrfs_find_all_roots+0x50/0x70 [btrfs] btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs] btrfs_commit_transaction+0x3ce/0x9b0 [btrfs] transaction_kthread+0x176/0x1b0 [btrfs] kthread+0x102/0x140 ret_from_fork+0x3a/0x50 -> (&fs_info->commit_root_sem){++++..} ops: 1566382 { HARDIRQ-ON-W at: down_write+0x3e/0xa0 cache_block_group+0x287/0x420 [btrfs] find_free_extent+0x106c/0x12d0 [btrfs] btrfs_reserve_extent+0xd8/0x170 [btrfs] cow_file_range.isra.66+0x133/0x470 [btrfs] run_delalloc_range+0x121/0x410 [btrfs] writepage_delalloc.isra.50+0xfe/0x180 [btrfs] __extent_writepage+0x19a/0x360 [btrfs] extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs] extent_writepages+0x4d/0x60 [btrfs] do_writepages+0x1a/0x70 __filemap_fdatawrite_range+0xa7/0xe0 btrfs_rename+0x5ee/0xdb0 [btrfs] vfs_rename+0x52a/0x7e0 SyS_rename+0x351/0x3b0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 HARDIRQ-ON-R at: down_read+0x35/0x90 caching_thread+0x57/0x560 [btrfs] normal_work_helper+0x1c0/0x5e0 [btrfs] process_one_work+0x1e0/0x5c0 worker_thread+0x44/0x390 kthread+0x102/0x140 ret_from_fork+0x3a/0x50 SOFTIRQ-ON-W at: down_write+0x3e/0xa0 cache_block_group+0x287/0x420 [btrfs] find_free_extent+0x106c/0x12d0 [btrfs] btrfs_reserve_extent+0xd8/0x170 [btrfs] cow_file_range.isra.66+0x133/0x470 [btrfs] run_delalloc_range+0x121/0x410 [btrfs] writepage_delalloc.isra.50+0xfe/0x180 [btrfs] __extent_writepage+0x19a/0x360 [btrfs] extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs] extent_writepages+0x4d/0x60 [btrfs] do_writepages+0x1a/0x70 __filemap_fdatawrite_range+0xa7/0xe0 btrfs_rename+0x5ee/0xdb0 [btrfs] vfs_rename+0x52a/0x7e0 SyS_rename+0x351/0x3b0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 SOFTIRQ-ON-R at: down_read+0x35/0x90 caching_thread+0x57/0x560 [btrfs] normal_work_helper+0x1c0/0x5e0 [btrfs] process_one_work+0x1e0/0x5c0 worker_thread+0x44/0x390 kthread+0x102/0x140 ret_from_fork+0x3a/0x50 INITIAL USE at: down_write+0x3e/0xa0 cache_block_group+0x287/0x420 [btrfs] find_free_extent+0x106c/0x12d0 [btrfs] btrfs_reserve_extent+0xd8/0x170 [btrfs] cow_file_range.isra.66+0x133/0x470 [btrfs] run_delalloc_range+0x121/0x410 [btrfs] writepage_delalloc.isra.50+0xfe/0x180 [btrfs] __extent_writepage+0x19a/0x360 [btrfs] extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs] extent_writepages+0x4d/0x60 [btrfs] do_writepages+0x1a/0x70 __filemap_fdatawrite_range+0xa7/0xe0 btrfs_rename+0x5ee/0xdb0 [btrfs] vfs_rename+0x52a/0x7e0 SyS_rename+0x351/0x3b0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 } ... key at: [<ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs] ... acquired at: cache_block_group+0x287/0x420 [btrfs] find_free_extent+0x106c/0x12d0 [btrfs] btrfs_reserve_extent+0xd8/0x170 [btrfs] btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs] btrfs_create_tree+0xbb/0x2a0 [btrfs] btrfs_create_uuid_tree+0x37/0x140 [btrfs] open_ctree+0x23c0/0x2660 [btrfs] btrfs_mount+0xd36/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 btrfs_mount+0x18c/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 do_mount+0x1c1/0xcc0 SyS_mount+0x7e/0xd0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 -> (&found->groups_sem){++++..} ops: 2134587 { HARDIRQ-ON-W at: down_write+0x3e/0xa0 __link_block_group+0x34/0x130 [btrfs] btrfs_read_block_groups+0x33d/0x7b0 [btrfs] open_ctree+0x2054/0x2660 [btrfs] btrfs_mount+0xd36/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 btrfs_mount+0x18c/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 do_mount+0x1c1/0xcc0 SyS_mount+0x7e/0xd0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 HARDIRQ-ON-R at: down_read+0x35/0x90 btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs] open_ctree+0x207b/0x2660 [btrfs] btrfs_mount+0xd36/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 btrfs_mount+0x18c/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 do_mount+0x1c1/0xcc0 SyS_mount+0x7e/0xd0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 SOFTIRQ-ON-W at: down_write+0x3e/0xa0 __link_block_group+0x34/0x130 [btrfs] btrfs_read_block_groups+0x33d/0x7b0 [btrfs] open_ctree+0x2054/0x2660 [btrfs] btrfs_mount+0xd36/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 btrfs_mount+0x18c/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 do_mount+0x1c1/0xcc0 SyS_mount+0x7e/0xd0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 SOFTIRQ-ON-R at: down_read+0x35/0x90 btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs] open_ctree+0x207b/0x2660 [btrfs] btrfs_mount+0xd36/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 btrfs_mount+0x18c/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 do_mount+0x1c1/0xcc0 SyS_mount+0x7e/0xd0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 INITIAL USE at: down_write+0x3e/0xa0 __link_block_group+0x34/0x130 [btrfs] btrfs_read_block_groups+0x33d/0x7b0 [btrfs] open_ctree+0x2054/0x2660 [btrfs] btrfs_mount+0xd36/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 btrfs_mount+0x18c/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 do_mount+0x1c1/0xcc0 SyS_mount+0x7e/0xd0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 } ... key at: [<ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs] ... acquired at: find_free_extent+0xcb4/0x12d0 [btrfs] btrfs_reserve_extent+0xd8/0x170 [btrfs] btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs] __btrfs_cow_block+0x110/0x5b0 [btrfs] btrfs_cow_block+0xd7/0x290 [btrfs] btrfs_search_slot+0x1f6/0x960 [btrfs] btrfs_lookup_inode+0x2a/0x90 [btrfs] __btrfs_update_delayed_inode+0x65/0x210 [btrfs] btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs] btrfs_evict_inode+0x3fe/0x6a0 [btrfs] evict+0xc4/0x190 __dentry_kill+0xbf/0x170 dput+0x2ae/0x2f0 SyS_rename+0x2a6/0x3b0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 -> (&delayed_node->mutex){+.+.-.} ops: 5580204 { HARDIRQ-ON-W at: __mutex_lock+0x4e/0x8c0 btrfs_delayed_update_inode+0x46/0x6e0 [btrfs] btrfs_update_inode+0x83/0x110 [btrfs] btrfs_dirty_inode+0x62/0xe0 [btrfs] touch_atime+0x8c/0xb0 do_generic_file_read+0x818/0xb10 __vfs_read+0xdc/0x150 vfs_read+0x8a/0x130 SyS_read+0x45/0xa0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 SOFTIRQ-ON-W at: __mutex_lock+0x4e/0x8c0 btrfs_delayed_update_inode+0x46/0x6e0 [btrfs] btrfs_update_inode+0x83/0x110 [btrfs] btrfs_dirty_inode+0x62/0xe0 [btrfs] touch_atime+0x8c/0xb0 do_generic_file_read+0x818/0xb10 __vfs_read+0xdc/0x150 vfs_read+0x8a/0x130 SyS_read+0x45/0xa0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 IN-RECLAIM_FS-W at: __mutex_lock+0x4e/0x8c0 __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] btrfs_evict_inode+0x22c/0x6a0 [btrfs] evict+0xc4/0x190 dispose_list+0x35/0x50 prune_icache_sb+0x42/0x50 super_cache_scan+0x139/0x190 shrink_slab+0x262/0x5b0 shrink_node+0x2eb/0x2f0 kswapd+0x2eb/0x890 kthread+0x102/0x140 ret_from_fork+0x3a/0x50 INITIAL USE at: __mutex_lock+0x4e/0x8c0 btrfs_delayed_update_inode+0x46/0x6e0 [btrfs] btrfs_update_inode+0x83/0x110 [btrfs] btrfs_dirty_inode+0x62/0xe0 [btrfs] touch_atime+0x8c/0xb0 do_generic_file_read+0x818/0xb10 __vfs_read+0xdc/0x150 vfs_read+0x8a/0x130 SyS_read+0x45/0xa0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 } ... key at: [<ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs] ... acquired at: __lock_acquire+0x264/0x11c0 lock_acquire+0xbd/0x1e0 __mutex_lock+0x4e/0x8c0 __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] btrfs_evict_inode+0x22c/0x6a0 [btrfs] evict+0xc4/0x190 dispose_list+0x35/0x50 prune_icache_sb+0x42/0x50 super_cache_scan+0x139/0x190 shrink_slab+0x262/0x5b0 shrink_node+0x2eb/0x2f0 kswapd+0x2eb/0x890 kthread+0x102/0x140 ret_from_fork+0x3a/0x50 stack backtrace: CPU: 1 PID: 50 Comm: kswapd0 Tainted: G W 4.12.14-kvmsmall #8 SLE15 (unreleased) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0x78/0xb7 print_irq_inversion_bug.part.38+0x19f/0x1aa check_usage_forwards+0x102/0x120 ? ret_from_fork+0x3a/0x50 ? check_usage_backwards+0x110/0x110 mark_lock+0x16c/0x270 __lock_acquire+0x264/0x11c0 ? pagevec_lookup_entries+0x1a/0x30 ? truncate_inode_pages_range+0x2b3/0x7f0 lock_acquire+0xbd/0x1e0 ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] __mutex_lock+0x4e/0x8c0 ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] ? btrfs_evict_inode+0x1f6/0x6a0 [btrfs] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] btrfs_evict_inode+0x22c/0x6a0 [btrfs] evict+0xc4/0x190 dispose_list+0x35/0x50 prune_icache_sb+0x42/0x50 super_cache_scan+0x139/0x190 shrink_slab+0x262/0x5b0 shrink_node+0x2eb/0x2f0 kswapd+0x2eb/0x890 kthread+0x102/0x140 ? mem_cgroup_shrink_node+0x2c0/0x2c0 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x3a/0x50 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:11 +02:00
Anand Jain	8ba0ae7821	btrfs: drop optimal argument from find_live_mirror() Drop optimal argument from the function find_live_mirror() as we can deduce it in the function itself. Also rename optimal to preferred_mirror. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:11 +02:00
Anand Jain	99f92a7c1e	btrfs: drop num argument from find_live_mirror() Obtain the stripes info from the map directly and so no need to pass it as an argument. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:11 +02:00
Nikolay Borisov	0a1e458a1e	btrfs: Drop fs_info parameter from __btrfs_run_delayed_refs It's provided by transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:11 +02:00
Nikolay Borisov	5ead2dd02c	btrfs: Drop fs_info parameter from btrfs_finish_extent_commit It's provided by the transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:11 +02:00
Nikolay Borisov	460fb20a4b	btrfs: Drop fs_info parameter from btrfs_qgroup_account_extents It's provided by the transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:10 +02:00
Nikolay Borisov	c79a70b133	btrfs: drop fs_info parameter from btrfs_run_delayed_refs It's provided by the transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:10 +02:00
Nikolay Borisov	39d7d09dc2	btrfs: Remove unused flush var in shrink_delalloc Added by `08e007d2e5` ("Btrfs: improve the noflush reservation") and made redundant by `17024ad0a0` ("Btrfs: fix early ENOSPC due to delalloc"). Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:10 +02:00
Nikolay Borisov	101d2dc0b2	btrfs: Remove unused extent_root var from caching_thread Added by `b4570aa994` ("btrfs: fix compiling with CONFIG_BTRFS_DEBUG enabled.") and obsoleted by `2ff7e61e0d` ("btrfs: take an fs_info directly when the root is not used otherwise"). Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:10 +02:00
Nikolay Borisov	338dae1ae6	btrfs: remove max_active var from open_ctree Introduced by `5cdc7ad337` ("btrfs: Replace fs_info->workers with btrfs_workqueue.") but obsoleted by `2a4581983f` ("btrfs: factor btrfs_init_workqueues() out of open_ctree()"). Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:57 +02:00
Nikolay Borisov	8535dc1967	btrfs: Remove unused root var from relink_file_extents Added in `38c227d87c` ("Btrfs: snapshot-aware defrag") but subsequently made redundant by `0b246afa62` ("btrfs: root->fs_info cleanup, add fs_info convenience variables"). Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:57 +02:00
Nikolay Borisov	4eeb97c67a	btrfs: Remove unused tot_len var from lzo_decompress Added already unused in `a6fa6fae40` ("btrfs: Add lzo compression support"). Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:57 +02:00
Nikolay Borisov	d6e823a578	btrfs: Remove unused length var from scrub_handle_errored_block Added in `b5d67f64f9` ("Btrfs: change scrub to support big blocks") but rendered redundant by `be50a8ddaa` ("Btrfs: Simplify scrub_setup_recheck_block()'s argument"). Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:57 +02:00
Nikolay Borisov	a6dbceafb9	btrfs: Remove unused op_key var from add_delayed_refs Added as part of `86d5f99442` ("btrfs: convert prelimary reference tracking to use rbtrees") but never used. tmp_op_key essentially subsumed that variable. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:57 +02:00
Qu Wenruo	ba89b80268	btrfs: volumes: Remove the meaningless condition of minimal nr_devs when allocating a chunk When checking the minimal nr_devs, there is one dead and meaningless condition: if (ndevs < devs_increment * sub_stripes \|\| ndevs < devs_min) { ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This condition is meaningless, @devs_increment has nothing to do with @sub_stripes. In fact, in btrfs_raid_array[], profile with sub_stripes larger than 1 (RAID10) already has the @devs_increment set to 2. So no need to multiple it by @sub_stripes. And above condition is also dead. For RAID10, @devs_increment * @sub_stripes equals 4, which is also the @devs_min of RAID10. For other profiles, @sub_stripes is always 1, and since @ndevs is rounded down to @devs_increment, the condition will always be true. Remove the meaningless condition to make later reader wander less. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:56 +02:00
Nikolay Borisov	6f47c706d9	btrfs: Document parameters of btrfs_reserve_extent This function is the entry to the extent allocator and as such has quite a number of parameters. Some of those have subtle effects on the allocation algorithm. Document the parameters. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:56 +02:00
Nikolay Borisov	d87ff75863	btrfs: Handle error from btrfs_uuid_tree_rem call in _btrfs_ioctl_set_received_subvol As with every function which deals with modifying the btree btrfs_uuid_tree_rem can fail for any number of reasons (ie. EIO/ENOMEM). Handle return error value from this function gracefully by aborting the transaction. Fixes: `dd5f9615fc` ("Btrfs: maintain subvolume items in the UUID tree") Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:56 +02:00
Nikolay Borisov	776c4a7ce8	btrfs: Use sizeof directly instead of a constant variable The kernel would like to have all stack VLA usage removed[1]. Unfortunately using an integer constant variable as the size of an array is still considered a VLA. Instead let's use directly sizeof(var) which removes the VLA usage. Use the occasion to remove csum_size altogether and use sizeof() also for the size passed to memcmp [1]: https://lkml.org/lkml/2018/3/7/621 Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:56 +02:00
David Sterba	d0ee393493	btrfs: rename submit callbacks and drop double underscores Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:56 +02:00
David Sterba	6c55343587	btrfs: remove unused parameters from extent_submit_bio_done_t Remove parameters not used by any of the callbacks. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:55 +02:00
David Sterba	d0779291b1	btrfs: remove unused parameters from extent_submit_bio_start_t Remove parameters not used by any of the callbacks. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:55 +02:00
David Sterba	a758781d4b	btrfs: separate types for submit_bio_start and submit_bio_done The callbacks make use of different parameters that are passed to the other type unnecessarily. This patch adds separate types for each and the unused parameters will be removed. The type extent_submit_bio_hook_t keeps all parameters and can be used where the start/done types are not appropriate. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:55 +02:00
David Sterba	d9d19a010b	btrfs: kill tree_mod_log_set_root_pointer helper A useless wrapper around tree_mod_log_insert_root that hides missing error handling. Move it to the callers. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:55 +02:00
David Sterba	0e82bcfe3c	btrfs: kill tree_mod_log_set_node_key helper A trivial wrapper that can be simply opencoded and makes the GFP allocation request more visible. The error handling is now moved to the callers. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:55 +02:00
David Sterba	bf1d342510	btrfs: kill trivial wrapper tree_mod_log_eb_move The wrapper is effectively an alias for tree_mod_log_insert_move but also hides the missing error handling. To make that more visible, lift the BUG_ON to the callers. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:55 +02:00
David Sterba	b1a09f1ec5	btrfs: remove trivial locking wrappers of tree mod log The wrappers are trivial and do not bring any extra value on top of the plain locking primitives. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:54 +02:00
David Sterba	bcd24dabe0	btrfs: drop fs_info parameter from __tree_mod_log_oldest_root It's provided by the extent_buffer. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:54 +02:00
David Sterba	b6dfa35bd5	btrfs: embed tree_mod_move structure to tree_mod_elem The tree_mod_move is not used anywhere and can be embedded as anonymous structure. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:54 +02:00
David Sterba	a446a979ff	btrfs: drop unused fs_info parameter from tree_mod_log_eb_move Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:54 +02:00
David Sterba	95b757c164	btrfs: drop fs_info parameter from tree_mod_log_free_eb It's provided by the extent_buffer. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:54 +02:00
David Sterba	db7279a20b	btrfs: drop fs_info parameter from tree_mod_log_free_eb It's provided by the extent_buffer. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:53 +02:00
David Sterba	e09c2efe7e	btrfs: drop fs_info parameter from tree_mod_log_insert_key It's provided by the extent_buffer. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:53 +02:00
David Sterba	6074d45f60	btrfs: drop fs_info parameter from tree_mod_log_insert_move It's provided by the extent_buffer. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:53 +02:00
David Sterba	3ac6de1abd	btrfs: drop fs_info parameter from tree_mod_log_set_node_key It's provided by the extent_buffer. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:53 +02:00
David Sterba	b8b3d625ce	btrfs: document more parameters of submit_extent_page Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:53 +02:00
David Sterba	0c8508a6e7	btrfs: cleanup merging conditions in submit_extent_page The merge call was factored out to a separate helper but it's a trivial one and arguably we can opencode it and cache the value. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:53 +02:00
David Sterba	8eec8296a0	btrfs: remove redundant variable in __do_readpage The value of page_end is only stored to end, no other use. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:52 +02:00
David Sterba	5c2b1fd753	btrfs: assume that bio_ret is always valid in submit_extent_page All callers pass a valid pointer so we can drop the redundant checks. The call to submit_one_bio never happend and can be removed. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:52 +02:00
Liu Bo	6ca1765b36	Btrfs: scrub: batch rebuild for raid56 In case of raid56, writes and rebuilds always take BTRFS_STRIPE_LEN(64K) as unit, however, scrub_extent() sets blocksize as unit, so rebuild process may be triggered on every block on a same stripe. A typical example would be that when we're replacing a disappeared disk, all reads on the disks get -EIO, every block (size is 4K if blocksize is 4K) would go thru these, scrub_handle_errored_block scrub_recheck_block # re-read pages one by one scrub_recheck_block # rebuild by calling raid56_parity_recover() page by page Although with raid56 stripe cache most of reads during rebuild can be avoided, the parity recover calculation(xor or raid6 algorithms) needs to be done $(BTRFS_STRIPE_LEN / blocksize) times. This makes it smarter by doing raid56 scrub/replace on stripe length. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:52 +02:00
David Sterba	416a72022e	btrfs: sort and group mount option definitions Sort mount options by the primary name, followed by the 'no-' counterpart if it exists. Group the deprecated and debugging options. Enum and token defintions are synced. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:52 +02:00
Howard McLauchlan	62b8e07731	btrfs: Add nossd_spread mount option Btrfs has two mount options for SSD optimizations: ssd and ssd_spread. Presently there is an option to disable all SSD optimizations, but there isn't an option to disable just ssd_spread. This patch adds a mount option nossd_spread that disables ssd_spread only. Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:52 +02:00
Nikolay Borisov	92e2f7e370	btrfs: Remove btrfs_fs_info::open_ioctl_trans Since userspace transaction have been removed we no longer have use for this field so delete it. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:51 +02:00
Nikolay Borisov	bcf3a3e7fb	btrfs: Remove code referencing unused TRANS_USERSPACE Now that the userspace transaction ioctls have been removed, TRANS_USERSPACE is no longer used hence we can remove it. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:51 +02:00
Nikolay Borisov	859e682d58	btrfs: Remove btrfs_file_private::trans Now that the userspace transaction IOCTL have been removed, this member is no longer used so just remove it Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:51 +02:00
Nikolay Borisov	7a5a07a810	btrfs: Remove userspace transaction ioctls Commit `3558d4f88e` ("btrfs: Deprecate userspace transaction ioctls") marked the beginning of the end of userspace transaction. This commit finishes the job! There are no known users and ceph does not use the ioctl anymore. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Acked-by: Sage Weil <sage@redhat.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:51 +02:00
Qu Wenruo	4d31778aa2	btrfs: qgroup: Fix root item corruption when multiple same source snapshots are created with quota enabled When multiple pending snapshots referring to the same source subvolume are executed, enabled quota will cause root item corruption, where root items are using old bytenr (no backref in extent tree). This can be triggered by fstests btrfs/152. The cause is when source subvolume is still dirty, extra commit (simplied transaction commit) of qgroup_account_snapshot() can skip dirty roots not recorded in current transaction, making root item of source subvolume not updated. Fix it by forcing recording source subvolume in current transaction before qgroup sub-transaction commit. Reported-by: Justin Maggard <jmaggard@netgear.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:51 +02:00
Nikolay Borisov	2e32ef87b0	btrfs: Relax memory barrier in btrfs_tree_unlock When performing an unlock on an extent buffer we'd like to order the decrement of extent_buffer::blocking_writers with waking up any waiters. In such situations it's sufficient to use smp_mb__after_atomic rather than the heavy smp_mb. On architectures where atomic operations are fully ordered (such as x86 or s390) unconditionally executing a heavyweight smp_mb instruction causes a severe hit to performance while bringin no improvements in terms of correctness. The better thing is to use the appropriate smp_mb__after_atomic routine which will do the correct thing (invoke a full smp_mb or in the case of ordered atomics insert a compiler barrier). Put another way, an RMW atomic op + smp_load__after_atomic equals, in terms of semantics, to a full smp_mb. This ensures that none of the problems described in the accompanying comment of waitqueue_active occur. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:51 +02:00
Anand Jain	7c829b722d	btrfs: add define for oldest generation Some functions can filter metadata by the generation. Add a define that will annotate such arguments. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:50 +02:00
David Sterba	051c98eb11	btrfs: open code trivial helper btrfs_page_exists_in_range The called function name is self explanatory. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:50 +02:00
Matthew Wilcox	965aab1cfc	btrfs: Use filemap_range_has_page() The current implementation of btrfs_page_exists_in_range() gives the wrong answer if the workingset code has stored a shadow entry in the page cache. The filemap_range_has_page() function does not have this problem, and it's shared code, so use it instead. eigned-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:26:45 +02:00
Liu Bo	4759700a71	Btrfs: dev-replace: make sure target is identical to source when raid56 rebuild fails In the last step of scrub_handle_error_block, we try to combine good copies on all possible mirrors, this works fine for raid1 and raid10, but not for raid56 as it's doing parity rebuild. If parity rebuild doesn't get back with correct data which matches its checksum, in case of replace we'd rather write what is stored in the source device than the data calculuated from parity. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:43 +02:00
Liu Bo	d6a691350b	Btrfs: raid56: remove redundant async_missing_raid56 async_missing_raid56() is identical to async_read_rebuild(). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:43 +02:00
Su Yue	005d67127f	btrfs: adjust return values of btrfs_inode_by_name Previously, btrfs_inode_by_name() returned 0 which left caller to check objectid of location even location if the type was invalid. Let btrfs_inode_by_name() return -EUCLEAN if a corrupted location of a dir entry is found. Removal of label out_err also simplifies the function. Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> [ drop unlikely ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:43 +02:00
Anand Jain	9b99b11564	btrfs: rename btrfs_close_extra_device to btrfs_free_extra_devids This function btrfs_close_extra_devices() is about freeing extra devids which once it may have belonged to this filesystem. So rename it and add the comment. The _devid suffix is appropriate as this function won't handle devices which are outside of the filesytem being mounted. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:42 +02:00
Nikolay Borisov	d02c0e2019	btrfs: Remove root argument from cow_file_range_inline This argument is always set to the root of the inode, which is also passed. So let's get a reference inside the function and simplify the arg list. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:42 +02:00
Liu Bo	895a72be41	Btrfs: send: fix typo in TLV_PUT According to tlv_put()'s prototype, data and attrlen needs to be exchanged in the macro, but seems all callers are already aware of this misorder and are therefore not affected. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:42 +02:00
Nikolay Borisov	e5b84f7a25	btrfs: Remove root argument from btrfs_log_dentry_safe Now that nothing uses the root arg of btrfs_log_dentry_safe it can be safely removed. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:42 +02:00
Nikolay Borisov	f882274b2d	btrfs: Remove root arg from btrfs_log_inode_parent btrfs_log_inode_parent is called from 2 places (btrfs_log_dentry_safe and btrfs_log_new_name) both of which pass inode->root as the root argument and the inode itself. Remove the redundant root argument and get a reference to the root directly from the inode, also remove redundant root != inode->root check from the same function. No functional change. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:42 +02:00
Nikolay Borisov	448f3a17ac	btrfs: Remove redundant comment from btrfs_search_forward This function always sets keep_locks to 1 and saves the old value of keep_locks which is restored at the end. So there is no way it can be called without keep_locks being set. Remove comment imposing redundant requirement on callers. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:41 +02:00
David Sterba	738c93d42c	btrfs: move btrfs_listxattr prototype to xattr.h There's a proper header for xattr handlers. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:41 +02:00
David Sterba	bcadd7050a	btrfs: adjust return type of btrfs_getxattr The xattr_handler::get prototype returns int, use it. The only ssize_t exception is the per-inode listxattr handler. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:41 +02:00
David Sterba	ab0d093616	btrfs: drop extern from function declarations Extern for functions does not make any difference, there are only a few so let's remove them before it's too late. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:41 +02:00
David Sterba	7852781d94	btrfs: drop underscores from exported xattr functions Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:41 +02:00
Filipe Manana	ffa7c4296e	Btrfs: send, do not issue unnecessary truncate operations When send finishes processing an inode representing a regular file, it always issues a truncate operation for that file, even if its size did not change or the last write sets the file size correctly. In the most common cases, the issued write operations set the file to correct size (either full or incremental sends) or the file size did not change (for incremental sends), so the only case where a truncate operation is needed is when a file size becomes smaller in the send snapshot when compared to the parent snapshot. By not issuing unnecessary truncate operations we reduce the stream size and save time in the receiver. Currently truncating a file to the same size triggers writeback of its last page (if it's dirty) and waits for it to complete (only if the file size is not aligned with the filesystem's sector size). This is being fixed by another patch and is independent of this change (that patch's title is "Btrfs: skip writeback of last page when truncating file to same size"). The following script was used to measure time spent by a receiver without this change applied, with this change applied, and without this change and with the truncate fix applied (the fix to not make it start and wait for writeback to complete). $ cat test_send.sh #!/bin/bash SRC_DEV=/dev/sdc DST_DEV=/dev/sdd SRC_MNT=/mnt/sdc DST_MNT=/mnt/sdd mkfs.btrfs -f $SRC_DEV >/dev/null mkfs.btrfs -f $DST_DEV >/dev/null mount $SRC_DEV $SRC_MNT mount $DST_DEV $DST_MNT echo "Creating source filesystem" for ((t = 0; t < 10; t++)); do ( for ((i = 1; i <= 20000; i++)); do xfs_io -f -c "pwrite -S 0xab 0 5000" \ $SRC_MNT/file_$i > /dev/null done ) & worker_pids[$t]=$! done wait ${worker_pids[@]} echo "Creating and sending snapshot" btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null /usr/bin/time -f "send took %e seconds" \ btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1 /usr/bin/time -f "receive took %e seconds" \ btrfs receive -f $SRC_MNT/send_file $DST_MNT umount $SRC_MNT umount $DST_MNT The results, which are averages for 5 runs for each case, were the following: * Without this change average receive time was 26.49 seconds standard deviation of 2.53 seconds * Without this change and with the truncate fix average receive time was 12.51 seconds standard deviation of 0.32 seconds * With this change and without the truncate fix average receive time was 10.02 seconds standard deviation of 1.11 seconds Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:41 +02:00
Filipe Manana	213e8c5520	Btrfs: skip writeback of last page when truncating file to same size When we truncate a file to the same size and that size is not aligned with the sector size, we end up triggering writeback (and wait for it to complete) of the last page. This is unncessary as we can not have delayed allocation beyond the inode's i_size and the goal of truncating a file to its own size is to discard prealloc extents (allocated via the fallocate(2) system call). Besides the unnecessary IO start and wait, it also breaks the oppurtunity for larger contiguous extents on disk, as before the last dirty page there might be other dirty pages. This scenario is probably not very common in general, however it is common for btrfs receive implementations because currently the send stream always issues a truncate operation for each processed inode as the last operation for that inode (this truncate operation is not always needed and the send implementation will be addressed to avoid them). So improve this by not starting and waiting for writeback of the inode's last page when we are truncating to exactly the same size. The following script was used to quickly measure the time a receive operation takes: $ cat test_send.sh #!/bin/bash SRC_DEV=/dev/sdc DST_DEV=/dev/sdd SRC_MNT=/mnt/sdc DST_MNT=/mnt/sdd mkfs.btrfs -f $SRC_DEV >/dev/null mkfs.btrfs -f $DST_DEV >/dev/null mount $SRC_DEV $SRC_MNT mount $DST_DEV $DST_MNT echo "Creating source filesystem" for ((t = 0; t < 10; t++)); do ( for ((i = 1; i <= 20000; i++)); do xfs_io -f -c "pwrite -S 0xab 0 5000" \ $SRC_MNT/file_$i > /dev/null done ) & worker_pids[$t]=$! done wait ${worker_pids[@]} echo "Creating and sending snapshot" btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null /usr/bin/time -f "send took %e seconds" \ btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1 /usr/bin/time -f "receive took %e seconds" \ btrfs receive -f $SRC_MNT/send_file $DST_MNT umount $SRC_MNT umount $DST_MNT The results for 5 runs were the following: * Without this change average receive time was 26.49 seconds standard deviation of 2.53 seconds * With this change average receive time was 12.51 seconds standard deviation of 0.32 seconds Reported-by: Robbie Ko <robbieko@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:40 +02:00
Liu Bo	ed5d5f37e6	Btrfs: dev-replace: skip prealloc extents when copy nocow pages It doens't make sense to process prealloc extents as pages will be filled with zero when reading prealloc extents. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:40 +02:00
Anand Jain	d612ac59ef	btrfs: unify types for metadata_ratio and data_chunk_allocations We have btrfs_fs_info::data_chunk_allocations and btrfs_fs_info::metadata_ratio declared as unsigned which would be unsinged int and kernel style prefers unsigned int over bare unsigned. So this patch changes them to u32. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:40 +02:00
Nikolay Borisov	de224b7c56	btrfs: Remove redundant memory barriers around dio_private error status Using any kind of memory barriers around atomic operations which have a return value is redundant, since those operations themselves are fully ordered. atomic_t.txt states: - RMW operations that have a return value are fully ordered; Fully ordered primitives are ordered against everything prior and everything subsequent. Therefore a fully ordered primitive is like having an smp_mb() before and an smp_mb() after the primitive. Given this let's replace the extra memory barriers with comments. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:40 +02:00
Anand Jain	16db5758fe	btrfs: remove assert in btrfs_init_dev_replace_tgtdev() In the same function we just ran btrfs_alloc_device() which means the btrfs_device::resized_list is sure to be empty and we are protected with the btrfs_fs_info::volume_mutex. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:40 +02:00
David Sterba	e67c718b5b	btrfs: add more __cold annotations The __cold functions are placed to a special section, as they're expected to be called rarely. This could help i-cache prefetches or help compiler to decide which branches are more/less likely to be taken without any other annotations needed. Though we can't add more __exit annotations, it's still possible to add __cold (that's also added with __exit). That way the following function categories are tagged: - printf wrappers, error messages - exit helpers Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:39 +02:00
David Sterba	ffc5a3794f	btrfs: add (the only possible) __exit annotation Recently, the __init annotations have been added. There's unfortunatelly only one case where we can add __exit, because most of the cleanup helpers are also called from the __init phase. As the __exit annotated functions get discarded completely for a built-in code, we'd miss them from the init phase. Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:39 +02:00
Anand Jain	ccb0e7d1c1	btrfs: verify subvolid mount parameter We aren't verifying the parameter passed to the subvolid mount option, so we won't report and fail the mount if a junk value is specified for example, -o subvolid=abc. This patch verifies the subvolid option with match_u64. Up to now the memparse function accepts the K/M/G/ suffixes, that are usually meant for size values and do not make sense for a subvolume it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:39 +02:00
Liu Bo	5811375325	Btrfs: fix unexpected cow in run_delalloc_nocow Fstests generic/475 provides a way to fail metadata reads while checking if checksum exists for the inode inside run_delalloc_nocow(), and csum_exist_in_range() interprets error (-EIO) as inode having checksum and makes its caller enter the cow path. In case of free space inode, this ends up with a warning in cow_file_range(). The same problem applies to btrfs_cross_ref_exist() since it may also read metadata in between. With this, run_delalloc_nocow() bails out when errors occur at the two places. cc: <stable@vger.kernel.org> v2.6.28+ Fixes: `17d217fe97` ("Btrfs: fix nodatasum handling in balancing code") Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:39 +02:00
Nikolay Borisov	9678c54388	btrfs: Remove custom crc32c init code The custom crc32 init code was introduced in `14a958e678` ("Btrfs: fix btrfs boot when compiled as built-in") to enable using btrfs as a built-in. However, later as pointed out by `60efa5eb2e` ("Btrfs: use late_initcall instead of module_init") this wasn't enough and finally btrfs was switched to late_initcall which comes after the generic crc32c implementation is initiliased. The latter commit superseeded the former. Now that we don't have to maintain our own code let's just remove it and switch to using the generic implementation. Despite touching a lot of files the patch is really simple. Here is the gist of the changes: 1. Select LIBCRC32C rather than the low-level modules. 2. s/btrfs_crc32c/crc32c/g 3. replace hash.h with linux/crc32c.h 4. Move the btrfs namehash funcs to ctree.h and change the tree accordingly. I've tested this with btrfs being both a module and a built-in and xfstest doesn't complain. Does seem to fix the longstanding problem of not automatically selectiong the crc32c module when btrfs is used. Possibly there is a workaround in dracut. The modinfo confirms that now all the module dependencies are there: before: depends: zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate after: depends: libcrc32c,zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add more info to changelog from mails ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:39 +02:00
Qu Wenruo	3e72ee8874	btrfs: Refactor __get_raid_index() to btrfs_bg_flags_to_raid_index() Function __get_raid_index() is used to convert block group flags into raid index, which can be used to get various info directly from btrfs_raid_array[]. Refactor this function a little: 1) Rename to btrfs_bg_flags_to_raid_index() Double underline prefix is normally for internal functions, while the function is used by both extent-tree and volumes. Although the name is a little longer, but it should explain its usage quite well. 2) Move it to volumes.h and make it static inline Just several if-else branches, really no need to define it as a normal function. This also makes later code re-use between kernel and btrfs-progs easier. 3) Remove function get_block_group_index() Really no need to do such a simple thing as an exported function. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:38 +02:00
Qu Wenruo	2f659546c9	btrfs: tree-checker: Replace root parameter with fs_info When inspecting the error message with real corruption, the "root=%llu" always shows "1" (root tree), instead of the correct owner. The problem is that we are getting @root from page->mapping->host, which points the same btree inode, so we will always get the same root. This makes the root owner output meaningless, and harder to port tree-checker to btrfs-progs. So get rid of the false and meaningless @root parameter and replace it with @fs_info. To get the owner, we can only rely on btrfs_header_owner() now. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:38 +02:00
Liu Bo	393da91819	Btrfs: add tracepoint for em's EEXIST case This is adding a tracepoint 'btrfs_handle_em_exist' to help debug the subtle bugs around merge_extent_mapping. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:38 +02:00
Nikolay Borisov	5d23515be6	btrfs: Move qgroup rescan on quota enable to btrfs_quota_enable Currently btrfs_run_qgroups is doing a bit too much. Not only is it responsible for synchronizing in-memory state of qgroups to disk but it also contains code to trigger the initial qgroup rescan when quota is enabled initially. This condition is detected by checking that BTRFS_FS_QUOTA_ENABLED is not set and BTRFS_FS_QUOTA_ENABLING is set. Nothing really requires from the code to be structured (and scattered) the way it is so let's streamline things. First move the quota rescan code into btrfs_quota_enable, where its invocation is closer to the use. This also makes the FS_QUOTA_ENABLING flag redundant so let's remove it as well. This has been tested with a full xfstest run with qgroups enabled on the scratch device of every xfstest and no regressions were observed. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:38 +02:00
Gu JinXiang	7ce311d552	btrfs: use reada direction enum instead of constant value in load_free_space_tree load_free_space_tree calls either function load_free_space_bitmaps or load_free_space_extents. And either of those two will lead to call btrfs_next_item. So in function load_free_space_tree, use READA_FORWARD to read forward ahead. This also changes the value from READA_BACK to READA_FORWARD, since according to the logic, it should reada_for_search forward, not backward. Signed-off-by: Gu JinXiang <gujx@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:37 +02:00
Gu Jinxiang	019599ada7	btrfs: use reada direction enum instead of constant value in populate_free_space_tree populate_free_space_tree calls function btrfs_search_slot_for_read with parameter int find_higher = 1, it means that, if no exact match is found, then use the next higher item. So in function populate_free_space_tree, use READA_FORWARD to read forward ahead. This also changes the value from READA_BACK to READA_FORWARD, since according to the logic, it should reada_for_search forward, not backward. Signed-off-by: Gu JinXiang <gujx@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:37 +02:00
Nikolay Borisov	c1c3fac2a9	btrfs: Remove btrfs_inode::delayed_iput_count delayed_iput_count wa supposed to be used to implement, well, delayed iput. The idea is that we keep accumulating the number of iputs we do until eventually the inode is deleted. Turns out we never really switched the delayed_iput_count from 0 to 1, hence all conditional code relying on the value of that member being different than 0 was never executed. This, as it turns out, didn't cause any problem due to the simple fact that the generic inode's i_count member was always used to count the number of iputs. So let's just remove the unused member and all unused code. This patch essentially provides no functional changes. While at it, also add proper documentation for btrfs_add_delayed_iput Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ reformat comment ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:37 +02:00
Qu Wenruo	793ff2c88c	btrfs: volumes: Cleanup stripe size calculation Cleanup the following things: 1) open coded SZ_16M round up 2) use min() to replace open-coded size comparison 3) code style Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com> [ reformat comment ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:37 +02:00
Nikolay Borisov	da07d4ab33	btrfs: Streamline btrfs_delalloc_reserve_metadata initial operations The behavior of btrfs_delalloc_reserve_metadata depends on whether the inode we are allocating for is the freespace inode or not. As it stands if we are the free node we set 'flush' and 'delalloc_lock' variable to certain values. Subsequently we check the values of those vars and act accordingly. Instead, simplify things by having 1 if which checks whether we are the freespace inode or not and do any specific operation in either branches of that if. This makes the code a bit easier to understand, as an added bonus it also shrinks the compiled size: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-17 (-17) Function old new delta btrfs_delalloc_reserve_metadata 1876 1859 -17 Total: Before=85966, After=85949, chg -0.02% No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Edmund Nadolski <enadolski@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:37 +02:00
Anand Jain	b1b8e38622	btrfs: insert newly opened device to the end of the list Add opened device to the tail of dev_alloc_list instead of head, so that it maintains the same order as dev_list. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:37 +02:00
Anand Jain	f8e10cd3f8	btrfs: keep device list sorted By maintaining the device list sorted lets us reproduce the problems related to missing chunk in the degraded mode much more consistent. So fix this by sorting the devices by devid within the kernel. So that we know which device is assigned to the struct fs_info::latest_bdev when all the devices are having and same SB generation. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:36 +02:00
Liu Bo	3d5addafd0	Btrfs: do not check inode's runtime flags under root->orphan_lock It's not necessary to hold ->orphan_lock when checking inode's runtime flags. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:36 +02:00
Nikolay Borisov	bc5511d0ed	btrfs: Use schedule_timeout_interruptible Instead of manually fiddling with the state of the task (RUNNING->INTERRUPTIBLE->RUNNING) again just use schedule_timeout_interruptible which adjusts the task state as needed. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:36 +02:00
Nikolay Borisov	f9cacae314	btrfs: Move error handling of btrfs_start_dirty_block_groups closer to call site Even though btrfs_start_dirty_block_groups is fairly in the beginning of btrfs_commit_transaction outside of the critical section defined by the transaction states it can only be run by a single comitter. In other words it defines its own critical section thanks to the BTRFS_TRANS_DIRTY_BG run flag and ro_block_group_mutex. However, its error handling is outside of this critical section which is a bit counter-intuitive. So move the error handling righ after the function is executed and let the sole runner of dirty block groups handle the return value. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:36 +02:00
Anand Jain	7ef2d6a722	btrfs: not a disk error if the bio_add_page fails bio_add_page() can fail for logical reasons as from the bio_add_page() comments: /* * This will only fail if either bio->bi_vcnt == bio->bi_max_vecs or * it's a cloned bio. */ Here we have just allocated the bio, so both of those failures can't occur. So drop the check. We can also drop the error stats for write error. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:36 +02:00
Qu Wenruo	4117f207d4	btrfs: Add chunk allocation ENOSPC debug message for enospc_debug mount option Enospc_debug makes extent allocator print more debug messages, however for chunk allocation, there is no debug message for enospc_debug at all. This patch will add message for the following parts of chunk allocator: 1) No rw device at all Quite rare, but at least output one message for this case. 2) Not enough space for some device This debug message is quite handy for unbalanced disks with stripe based profiles (RAID0/10/5/6). 3) Not enough free devices This debug message should tell us if current chunk allocator is working correctly under minimal device requirements. Although in most cases, we will hit other ENOSPC before we even hit a chunk allocator ENOSPC, but such debug info won't help. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:35 +02:00
Anand Jain	566b1760b4	btrfs: use ASSERT to report logical error in cow_file_range() Use ASSERT to report logical error in cow_file_range(), also move it a bit closer to when the num_bytes is derived. The extent start could be (u64)-1 in some cases, the assert should catch that we do not accidentally pass it to cow_file_range. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:35 +02:00
Anand Jain	3752d22fce	btrfs: cow_file_range() num_bytes and disk_num_bytes are same This patch deletes local variable disk_num_bytes as its value is same as num_bytes in the function cow_file_range(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-26 15:09:35 +02:00

1 2 3 4 5 ...

6933 Commits