linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Chris Mason	b9fab919b7	Btrfs: avoid sleeping in verify_parent_transid while atomic verify_parent_transid needs to lock the extent range to make sure no IO is underway, and so it can safely clear the uptodate bits if our checks fail. But, a few callers are using it with spinlocks held. Most of the time, the generation numbers are going to match, and we don't want to switch to a blocking lock just for the error case. This adds an atomic flag to verify_parent_transid, and changes it to return EAGAIN if it needs to block to properly verifiy things. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-05-06 07:23:47 -04:00
Stefan Behrens	1f699d38b6	Btrfs: fix block_rsv and space_info lock ordering may_commit_transaction() calls spin_lock(&space_info->lock); spin_lock(&delayed_rsv->lock); and update_global_block_rsv() calls spin_lock(&block_rsv->lock); spin_lock(&sinfo->lock); Lockdep complains about this at run time. Everywhere except in update_global_block_rsv(), the space_info lock is the outer lock, therefore the locking order in update_global_block_rsv() is changed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-04-27 13:55:14 -04:00
Arne Jansen	b9688bb845	btrfs: don't return EINTR It is basically a good thing if we are interruptible when waiting for free space, but the generality in which it is implemented currently leads to system calls being interruptible that are not documented this way. For example git can't handle interrupted unlink(), leading to corrupt repos under space pressure. Instead we raise the bar to only be interruptible by SIGKILL. Thanks to David Sterba for suggesting this. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-04-18 19:22:33 +02:00
Dan Carpenter	253beebd5a	Btrfs: double unlock bug in error handling The caller expects this function to return with the lock held and releases it immediately on error. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>	2012-04-18 19:22:31 +02:00
Josef Bacik	d53ba47484	Btrfs: use commit root when loading free space cache A user reported that booting his box up with btrfs root on 3.4 was way slower than on 3.3 because I removed the ideal caching code. It turns out that we don't load the free space cache if we're in a commit for deadlock reasons, but since we're reading the cache and it hasn't changed yet we are safe reading the inode and free space item from the commit root, so do that and remove all of the deadlock checks so we don't unnecessarily skip loading the free space cache. The user reported this fixed the slowness. Thanks, Tested-by: Calvin Walton <calvin.walton@kepstin.ca> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-04-12 20:54:01 -04:00
Ilya Dryomov	c6664b42c4	Btrfs: remove lock assert from get_restripe_target() This fixes a regression introduced by `fc67c450`. spin_is_locked() always returns 0 on UP kernels, which caused assert in get_restripe_target() to be fired on every call from btrfs_reduce_alloc_profile() on UP systems. Remove it completely for now, it's not clear if it's going to be needed in future. Reported-by: Bobby Powers <bobbypowers@gmail.com> Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org> Tested-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-04-12 16:03:56 -04:00
Chris Mason	8e62c2de6e	Revert "Btrfs: increase the global block reserve estimates" This reverts commit `5500cdbe14`. We've had a number of complaints of early enospc that bisect down to this patch. We'll hae to fix the reservations differently. CC: stable@kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-04-12 13:46:48 -04:00
Liu Bo	15d1ff8111	Btrfs: fix deadlock during allocating chunks This deadlock comes from xfstests 251. We'll hold the chunk_mutex throughout the whole of a chunk allocation. But if we find that we've used up system chunk space, we need to allocate a new system chunk, but this will lead to a recursion of chunk allocation and end up with a deadlock on chunk_mutex. So instead we need to allocate the system chunk first if we find we're in ENOSPC. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-29 09:57:44 -04:00
Liu Bo	2bcc0328c3	Btrfs: show useful info in space reservation tracepoint o For space info, the type of space info is useful for debug. o For transaction handle, its transid is useful. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-29 09:57:44 -04:00
Chris Mason	1c691b330a	Merge branch 'for-chris' of git://github.com/idryomov/btrfs-unstable into for-linus	2012-03-28 20:32:46 -04:00
Chris Mason	1d4284bd6e	Merge branch 'error-handling' into for-linus Conflicts: fs/btrfs/ctree.c fs/btrfs/disk-io.c fs/btrfs/extent-tree.c fs/btrfs/extent_io.c fs/btrfs/extent_io.h fs/btrfs/inode.c fs/btrfs/scrub.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-28 20:31:37 -04:00
Ilya Dryomov	4a5e98f5d6	Btrfs: improve the logic in btrfs_can_relocate() Currently if we don't have enough space allocated we go ahead and loop though devices in the hopes of finding enough space for a chunk of the same type as the one we are trying to relocate. The problem with that is that if we are trying to restripe the chunk its target type can be more relaxed than the current one (eg require less devices or less space). So, when restriping, run checks against the target profile instead of the current one. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:17 +03:00
Ilya Dryomov	7738a53a3a	Btrfs: add __get_block_group_index() helper Add __get_block_group_index() helper to be able to derive block group index from an arbitary set of flags. Implement get_block_group_index() in terms of it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:17 +03:00
Ilya Dryomov	fc67c45083	Btrfs: add get_restripe_target() helper Add get_restripe_target() helper and switch everybody to use it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:17 +03:00
Ilya Dryomov	0c460c0d70	Btrfs: move alloc_profile_is_valid() to volumes.c Header file is not a good place to define functions. This also moves a call to alloc_profile_is_valid() down the stack and removes a redundant check from __btrfs_alloc_chunk() - alloc_profile_is_valid() takes it into account. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:17 +03:00
Ilya Dryomov	e8920a640b	Btrfs: make profile_is_valid() check more strict "0" is a valid value for an on-disk chunk profile, but it is not a valid extended profile. (We have a separate bit for single chunks in extended case) Also rename it to alloc_profile_is_valid() for clarity. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:17 +03:00
Ilya Dryomov	899c81eac8	Btrfs: add wrappers for working with alloc profiles Add functions to abstract the conversion between chunk and extended allocation profile formats and switch everybody to use them. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:16 +03:00
Ilya Dryomov	e3176ca276	Btrfs: stop silently switching single chunks to raid0 on balance This has been causing a lot of confusion for quite a while now and a lot of users were surprised by this (some of them were even stuck in a ENOSPC situation which they couldn't easily get out of). The addition of restriper gives users a clear choice between raid0 and drive concat setup so there's absolutely no excuse for us to keep doing this. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:16 +03:00
Josef Bacik	3083ee2e18	Btrfs: introduce free_extent_buffer_stale Because btrfs cow's we can end up with extent buffers that are no longer necessary just sitting around in memory. So instead of evicting these pages, we could end up evicting things we actually care about. Thus we have free_extent_buffer_stale for use when we are freeing tree blocks. This will make it so that the ref for the eb being in the radix tree is dropped as soon as possible and then is freed when the refcount hits 0 instead of waiting to be released by releasepage. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-03-26 16:51:08 -04:00
Josef Bacik	81c9ad237c	Btrfs: remove search_start and search_end from find_free_extent and callers We have been passing nothing but (u64)-1 to find_free_extent for search_end in all of the callers, so it's completely useless, and we've always been passing 0 in as search_start, so just remove them as function arguments and move search_start into find_free_extent. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-03-26 14:42:51 -04:00
Josef Bacik	285ff5af6c	Btrfs: remove the ideal caching code This is a relic from before we had the disk space cache and it was to make bootup times when you had btrfs as root not be so damned slow. Now that we have the disk space cache this isn't a problem anymore and really having this code casues uneeded fragmentation and complexity, so just remove it. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-03-26 14:42:51 -04:00
Jeff Mahoney	79787eaab4	btrfs: replace many BUG_ONs with proper error handling btrfs currently handles most errors with BUG_ON. This patch is a work-in- progress but aims to handle most errors other than internal logic errors and ENOMEM more gracefully. This iteration prevents most crashes but can run into lockups with the page lock on occasion when the timing "works out." Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 11:52:54 +01:00
Jeff Mahoney	2c536799f1	btrfs: btrfs_drop_snapshot should return int Commit `cb1b69f4` (Btrfs: forced readonly when btrfs_drop_snapshot() fails) made btrfs_drop_snapshot return void because there were no callers checking the return value. That is the wrong order to handle error propogation since the caller will have no idea that an error has occured and continue on as if nothing went wrong. Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 01:45:36 +01:00
Jeff Mahoney	143bede527	btrfs: return void in functions without error conditions Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 01:45:34 +01:00
Jeff Mahoney	538042801a	btrfs: avoid NULL deref in btrfs_reserve_extent with DEBUG_ENOSPC __find_space_info can return NULL but we don't check it before calling dump_space_info(). Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 01:45:32 +01:00
Chris Mason	e77266e4c4	Btrfs: fix compiler warnings on 32 bit systems The enospc tracing code added some interesting uses of u64 pointer casts. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-02-24 10:39:05 -05:00
Liu Bo	5500cdbe14	Btrfs: increase the global block reserve estimates When doing IO with large amounts of data fragmentation, the global block reserve calulations are too low. This increases them to avoid ENOSPC crashes. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-02-23 10:49:04 -05:00
Liu Bo	d9b0218f6c	Btrfs: fix a bug on overcommit stuff When overcommitting, we should check the sum of pinned space and bytes for delayed item. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-02-16 17:23:18 +01:00
Liu Bo	2cac13e41b	Btrfs: fix trim 0 bytes after a device delete A user reported a bug of btrfs's trim, that is we will trim 0 bytes after a device delete. The reproducer: $ mkfs.btrfs disk1 $ mkfs.btrfs disk2 $ mount disk1 /mnt $ fstrim -v /mnt $ btrfs device add disk2 /mnt $ btrfs device del disk1 /mnt $ fstrim -v /mnt This is because after we delete the device, the block group may start from a non-zero place, which will confuse trim to discard nothing. Reported-by: Lutz Euler <lutz.euler@freenet.de> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-02-15 16:40:23 +01:00
Miao Xie	9e622d6bea	Btrfs: fix enospc error caused by wrong checks of the chunk When we did sysbench test for inline files, enospc error happened easily though there was lots of free disk space which could be allocated for new chunks. Reproduce steps: # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition> # mount <test partition> /mnt # ulimit -n 102400 # cd /mnt # sysbench --num-threads=1 --test=fileio --file-num=81920 \ > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \ > --file-test-mode=seqwr prepare # sysbench --num-threads=1 --test=fileio --file-num=81920 \ > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \ > --file-test-mode=seqwr run <soon later, BUG_ON() was triggered by enospc error> The reason of this bug is: Now, we can reserve space which is larger than the free space in the chunks if we have enough free disk space which can be used for new chunks. By this way, the space allocator should allocate a new chunk by force if there is no free space in the free space cache. But there are two wrong checks which break this operation. One is if (ret == -ENOSPC && num_bytes > min_alloc_size) in btrfs_reserve_extent(), it is wrong, we should try to allocate a new chunk even we fail to allocate free space by minimum allocable size. The other is if (space_info->force_alloc) force = space_info->force_alloc; in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE If someone sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen. Fix these two wrong checks. Especially the second one, we fix it by changing the value of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, and make CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has higher priority. And if the value which is passed in by the caller is greater than ->force_alloc, use the passed value. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:12 -05:00
Chris Mason	96bdc7dc61	Btrfs: use larger system chunks system chunks by default are very small. This makes them slightly larger and also fixes the conditional checks to make sure we don't allocate a billion of them at once. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-16 15:38:24 -05:00
Josef Bacik	f248679e86	Btrfs: add a delalloc mutex to inodes for delalloc reservations I was using i_mutex for this, but we're getting bogus lockdep warnings by doing that and theres no real way to get rid of those, so just stop using i_mutex to protect delalloc metadata reservations and use a delalloc mutex instead. This shouldn't be contended often at all, only if you are writing and mmap writing to the file at the same time. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-01-16 15:29:43 -05:00
Josef Bacik	8c2a3ca20f	Btrfs: space leak tracepoints This in addition to a script in my btrfs-tracing tree will help track down space leaks when we're getting space left over in block groups on umount. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-01-16 15:29:43 -05:00
Josef Bacik	3f7de037fb	Btrfs: add allocator tracepoints I used these tracepoints when figuring out what the cluster stuff was doing, so add them to mainline in case we need to profile this stuff again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-01-16 15:29:42 -05:00
Chris Mason	9785dbdf26	Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into integration	2012-01-16 15:26:31 -05:00
Chris Mason	d756bd2d93	Merge branch 'for-chris' of git://repo.or.cz/linux-btrfs-devel into integration Conflicts: fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-16 15:26:17 -05:00
Chris Mason	27263e2832	Merge branch 'restriper' of git://github.com/idryomov/btrfs-unstable into integration	2012-01-16 15:26:02 -05:00
Ilya Dryomov	e4d8ec0f65	Btrfs: implement online profile changing Profile changing is done by launching a balance with BTRFS_BALANCE_CONVERT bits set and target fields of respective btrfs_balance_args structs initialized. Profile reducing code in this case will pick restriper's target profile if it's available instead of doing a blind reduce. If target profile is not yet available it goes back to a plain reduce. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:48 +02:00
Ilya Dryomov	70922617b0	Btrfs: do not reduce profile in do_chunk_alloc() Every caller of do_chunk_alloc() feeds it the reduced allocation profile, so stop trying to reduce it one more time. Instead check the validity of the passed profile. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:48 +02:00
Ilya Dryomov	10ea00f55a	Btrfs: make avail_*_alloc_bits fields dynamic Currently when new chunks are created respective avail_alloc_bits field is updated to reflect profiles of all chunks present in the system. However when chunks are removed profile bits are never cleared. This patch clears profile bit of respective avail_alloc_bits field when the last chunk with that profile is removed. Restriper needs this to properly operate when "downgrading". Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:47 +02:00
Ilya Dryomov	a46d11a8b0	Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit Right now on-disk BTRFS_BLOCK_GROUP_* profile bits are used for avail_{data,metadata,system}_alloc_bits fields, which gather info about available allocation profiles in the FS. When chunk is created or read from disk, its profile is OR'ed with the corresponding avail_alloc_bits field. Since SINGLE is denoted by 0 in the on-disk format, currently there is no way to tell when such chunks become avaialble. Restriper needs that information, so add a separate bit for SINGLE profile. This bit is going to be in-memory only, it should never be written out to disk, so it's not a disk format change. However to avoid remappings in future, reserve corresponding on-disk bit. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:47 +02:00
Ilya Dryomov	52ba692972	Btrfs: introduce masks for chunk type and profile Chunk's type and profile are encoded in u64 flags field. Introduce masks to easily access them. Also fix the type of BTRFS_BLOCK_GROUP_* constants, it should be ULL. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:47 +02:00
Ilya Dryomov	6fef8df1dc	Btrfs: get rid of *_alloc_profile fields {data,metadata,system}_alloc_profile fields have been unused for a long time now. Get rid of them. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:47 +02:00
Li Zefan	c7c144db53	Btrfs: update global block_rsv when creating a new block group A bug was triggered while using seed device: # mkfs.btrfs /dev/loop1 # btrfstune -S 1 /dev/loop1 # mount -o /dev/loop1 /mnt # btrfs dev add /dev/loop2 /mnt btrfs: block rsv returned -28 ------------[ cut here ]------------ WARNING: at fs/btrfs/extent-tree.c:5969 btrfs_alloc_free_block+0x166/0x396 [btrfs]() ... Call Trace: ... [<f7b7c31c>] btrfs_cow_block+0x101/0x147 [btrfs] [<f7b7eaa6>] btrfs_search_slot+0x1b8/0x55f [btrfs] [<f7b7f844>] btrfs_insert_empty_items+0x42/0x7f [btrfs] [<f7b7f8c1>] btrfs_insert_item+0x40/0x7e [btrfs] [<f7b8ac02>] btrfs_make_block_group+0x243/0x2aa [btrfs] [<f7bb3f53>] __btrfs_alloc_chunk+0x672/0x70e [btrfs] [<f7bb41ff>] init_first_rw_device+0x77/0x13c [btrfs] [<f7bb5a62>] btrfs_init_new_device+0x664/0x9fd [btrfs] [<f7bbb65a>] btrfs_ioctl+0x694/0xdbe [btrfs] [<c04f55f7>] do_vfs_ioctl+0x496/0x4cc [<c04f5660>] sys_ioctl+0x33/0x4f [<c07b9edf>] sysenter_do_call+0x12/0x38 ---[ end trace 906adac595facc7d ]--- Since seed device is readonly, there's no usable space in the filesystem. Afterwards we add a sprout device to it, and the kernel creates a METADATA block group and a SYSTEM block group where comes free space we can reserve, but we still get revervation failure because the global block_rsv hasn't been updated accordingly. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2012-01-11 10:26:52 +08:00
Li Zefan	125ccb0ae6	Btrfs: don't pass a trans handle unnecessarily in volumes.c Some functions never use the transaction handle passed to them. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2012-01-11 10:26:42 +08:00
Alexandre Oliva	fc7c1077ce	Btrfs: don't set up allocation result twice We store the allocation start and length twice in ins, once right after the other, but with intervening calls that may prevent the duplicate from being optimized out by the compiler. Remove one of the assignments. Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-07 19:15:14 -05:00
Alexandre Oliva	a5f6f719a5	Btrfs: test free space only for unclustered allocation Since the clustered allocation may be taking extents from a different block group, there's no point in spin-locking and testing the current block group free space before attempting to allocate space from a cluster, even more so when we might refrain from even trying the cluster in the current block group because, after the cluster was set up, not enough free space remained. Furthermore, cluster creation attempts fail fast when the block group doesn't have enough free space, so the test was completely superfluous. I've move the free space test past the cluster allocation attempt, where it is more useful, and arranged for a cluster in the current block group to be released before trying an unclustered allocation, when we reach the LOOP_NO_EMPTY_SIZE stage, so that the free space in the cluster stands a chance of being combined with additional free space in the block group so as to succeed in the allocation attempt. Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-06 15:48:21 -05:00
Chris Mason	cf1d72c9ce	Btrfs: lower the bar for chunk allocation The chunk allocation code has tried to keep a pretty tight lid on creating new metadata chunks. This is partially because in the past the reservation code didn't give us an accurate idea of how much space was being used. The new code is much more accurate, so we're able to get rid of some of these checks. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-06 15:41:34 -05:00
Chris Mason	203bf287cb	Btrfs: run chunk allocations while we do delayed refs Btrfs tries to batch extent allocation tree changes to improve performance and reduce metadata trashing. But it doesn't allocate new metadata chunks while it is doing allocations for the extent allocation tree. This commit changes the delayed refence code to do chunk allocations if we're getting low on room. It prevents crashes and improves performance. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-06 15:23:57 -05:00
Jan Schmidt	a168650c08	Btrfs: add waitqueue instead of doing busy waiting for more delayed refs Now that we may be holding back delayed refs for a limited period, we might end up having no runnable delayed refs. Without this commit, we'd do busy waiting in that thread until another (runnable) ref arives. Instead, we're detecting this situation and use a waitqueue, such that we only try to run more refs after a) another runnable ref was added or b) delayed refs are no longer held back Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-01-04 16:12:48 +01:00

1 2 3 4 5 ...

625 Commits