OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Sage Weil	462045928b	Btrfs: add START_SYNC, WAIT_SYNC ioctls START_SYNC will start a sync/commit, but not wait for it to complete. Any modification started after the ioctl returns is guaranteed not to be included in the commit. If a non-NULL pointer is passed, the transaction id will be returned to userspace. WAIT_SYNC will wait for any in-progress commit to complete. If a transaction id is specified, the ioctl will block and then return (success) when the specified transaction has committed. If it has already committed when we call the ioctl, it returns immediately. If the specified transaction doesn't exist, it returns EINVAL. If no transaction id is specified, WAIT_SYNC will wait for the currently committing transaction to finish its commit to disk. If there is no currently committing transaction, it returns success. These ioctls are useful for applications which want to impose an ordering on when fs modifications reach disk, but do not want to wait for the full (slow) commit process to do so. Picky callers can take the transid returned by START_SYNC and feed it to WAIT_SYNC, and be certain to wait only as long as necessary for the transaction _they_ started to reach disk. Sloppy callers can START_SYNC and WAIT_SYNC without a transid, and provided they didn't wait too long between the calls, they will get the same result. However, if a second commit starts before they call WAIT_SYNC, they may end up waiting longer for it to commit as well. Even so, a START_SYNC+WAIT_SYNC still guarantees that any operation completed before the START_SYNC reaches disk. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-10-29 15:41:32 -04:00
Sage Weil	fccdae435c	Btrfs: fix lockdep warning on clone ioctl I'm no lockdep expert, but this appears to make the lockdep warning go away for the i_mutex locking in the clone ioctl. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-10-29 15:37:33 -04:00
Sage Weil	050006a753	Btrfs: fix clone ioctl where range is adjacent to extent We had an edge case issue where the requested range was just following an existing extent. Instead of skipping to the next extent, we used the previous one which led to having zero sized extents. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-10-29 15:37:33 -04:00
Sage Weil	9a019196ec	Btrfs: fix delalloc checks in clone ioctl The lookup_first_ordered_extent() was done on the wrong inode, and the ->delalloc_bytes test was wrong, as the following btrfs_wait_ordered_range() would only invoke a range write and wouldn't write the entire file data range. Also, a bad parameter was passed to btrfs_wait_ordered_range(). Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-10-29 15:37:33 -04:00
Andi Kleen	559af82114	Btrfs: cleanup warnings from gcc 4.6 (nonbugs) These are all the cases where a variable is set, but not read which are not bugs as far as I can see, but simply leftovers. Still needs more review. Found by gcc 4.6's new warnings Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-10-29 15:14:37 -04:00
Julia Lawall	2354d08fe9	Btrfs: use memdup_user helpers Use memdup_user when user data is immediately copied into the allocated region. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression from,to,size,flag; position p; identifier l1,l2; @@ - to = \(kmalloc@p\\|kzalloc@p\)(size,flag); + to = memdup_user(from,size); if ( - to==NULL + IS_ERR(to) \|\| ...) { <+... when != goto l1; - -ENOMEM + PTR_ERR(to) ...+> } - if (copy_from_user(to, from, size) != 0) { - <+... when != goto l2; - -EFAULT - ...+> - } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-10-29 15:14:18 -04:00
Josef Bacik	bf5fc093c5	Btrfs: fix the df ioctl to report raid types The new ENOSPC stuff broke the df ioctl since we no longer create separate space info's for each RAID type. So instead, loop through each space info's raid lists so we can get the right RAID information which will allow the df ioctl to tell us RAID types again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2010-10-22 15:54:53 -04:00
Dan Rosenberg	2ebc346478	Btrfs: fix checks in BTRFS_IOC_CLONE_RANGE 1. The BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctls should check whether the donor file is append-only before writing to it. 2. The BTRFS_IOC_CLONE_RANGE ioctl appears to have an integer overflow that allows a user to specify an out-of-bounds range to copy from the source file (if off + len wraps around). I haven't been able to successfully exploit this, but I'd imagine that a clever attacker could use this to read things he shouldn't. Even if it's not exploitable, it couldn't hurt to be safe. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> cc: stable@kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-07-19 16:58:20 -04:00
Sage Weil	b5384d48f4	Btrfs: fix CLONE ioctl destination file size expansion to block boundary The CLONE and CLONE_RANGE ioctls round up the range of extents being cloned to the block size when the range to clone extends to the end of file (this is always the case with CLONE). It was then using that offset when extending the destination file's i_size. Fix this by not setting i_size beyond the originally requested ending offset. This bug was introduced by `a22285a6` (2.6.35-rc1). Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-07-19 16:15:06 -04:00
Dan Carpenter	cf1e99a4e0	Btrfs: btrfs_lookup_dir_item() can return ERR_PTR btrfs_lookup_dir_item() can return either ERR_PTRs or null. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:37 -04:00
Dan Carpenter	d327099a23	Btrfs: unwind after btrfs_start_transaction() errors This was added by a22285a6a3: "Btrfs: Integrate metadata reservation with start_transaction". If we goto out here then we skip all the unwinding and there are locks still held etc. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:35 -04:00
Yan, Zheng	d68fc57b7e	Btrfs: Metadata reservation for orphan inodes reserve metadata space for handling orphan inodes Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-05-25 10:34:52 -04:00
Yan, Zheng	8929ecfa50	Btrfs: Introduce global metadata reservation Reserve metadata space for extent tree, checksum tree and root tree Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-05-25 10:34:52 -04:00
Yan, Zheng	0ca1f7ceb1	Btrfs: Update metadata reservation for delayed allocation Introduce metadata reservation context for delayed allocation and update various related functions. This patch also introduces EXTENT_FIRST_DELALLOC control bit for set/clear_extent_bit. It tells set/clear_bit_hook whether they are processing the first extent_state with EXTENT_DELALLOC bit set. This change is important if set/clear_extent_bit involves multiple extent_state. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-05-25 10:34:51 -04:00
Yan, Zheng	a22285a6a3	Btrfs: Integrate metadata reservation with start_transaction Besides simplify the code, this change makes sure all metadata reservation for normal metadata operations are released after committing transaction. Changes since V1: Add code that check if unlink and rmdir will free space. Add ENOSPC handling for clone ioctl. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-05-25 10:34:50 -04:00
Linus Torvalds	18e41da89d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: check for read permission on src file in the clone ioctl	2010-05-15 12:55:31 -07:00
Dan Rosenberg	5dc6416414	Btrfs: check for read permission on src file in the clone ioctl The existing code would have allowed you to clone a file that was only open for writing Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-05-15 12:05:50 -04:00
Linus Torvalds	795d580bae	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: add check for changed leaves in setup_leaf_for_split Btrfs: create snapshot references in same commit as snapshot Btrfs: fix small race with delalloc flushing waitqueue's Btrfs: use add_to_page_cache_lru, use __page_cache_alloc Btrfs: fix chunk allocate size calculation Btrfs: kill max_extent mount option Btrfs: fail to mount if we have problems reading the block groups Btrfs: check btrfs_get_extent return for IS_ERR() Btrfs: handle kmalloc() failure in inode lookup ioctl Btrfs: dereferencing freed memory Btrfs: Simplify num_stripes's calculation logical for __btrfs_alloc_chunk() Btrfs: Add error handle for btrfs_search_slot() in btrfs_read_chunk_tree() Btrfs: Remove unnecessary finish_wait() in wait_current_trans() Btrfs: add NULL check for do_walk_down() Btrfs: remove duplicate include in ioctl.c Fix trivial conflict in fs/btrfs/compression.c due to slab.h include cleanups.	2010-04-05 13:21:15 -07:00
Dan Carpenter	6cf8bfbf5e	Btrfs: check btrfs_get_extent return for IS_ERR() btrfs_get_extent() never returns NULL, only a valid pointer or ERR_PTR() Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-30 21:19:09 -04:00
Dan Carpenter	c2b96929e2	Btrfs: handle kmalloc() failure in inode lookup ioctl Return -ENOMEM if kmalloc() fails. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-30 21:19:09 -04:00
Dan Carpenter	683be16eb6	Btrfs: dereferencing freed memory The original code dereferenced range on the next line. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-30 21:19:09 -04:00
Andrea Gelmini	2f3014fc2a	Btrfs: remove duplicate include in ioctl.c fs/btrfs/ioctl.c: ctree.h is included more than once. Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-30 21:19:08 -04:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Chris Mason	8ad6fcab56	Btrfs: fix the inode ref searches done by btrfs_search_path_in_tree This is used by the inode lookup ioctl to follow all the backrefs up to the subvol root. But the search being done would sometimes land one past the last item in the leaf instead of finding the backref. This changes the search to look for the highest possible backref and hop back one item. It also fixes a leaked path on failure to find the root. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-18 12:23:10 -04:00
Chris Mason	1b53ac4d1b	Btrfs: allow treeid==0 in the inode lookup ioctl When a root id of 0 is sent to the inode lookup ioctl, it will use the root of the file we're ioctling and pass the root id back to userland along with the results. This allows userland to do searches based on that root later on. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-18 12:17:05 -04:00
Chris Mason	90fdde147f	Btrfs: return keys for large items to the search ioctl The search ioctl was skipping large items entirely (ones that are too big for the results buffer). This changes things to at least copy the item header so that we can send information about the item back to userland. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-18 12:14:54 -04:00
Chris Mason	abc6e1341b	Btrfs: fix key checks and advance in the search ioctl The search ioctl was working well for finding tree roots, but using it for generic searches requires a few changes to how the keys are advanced. This treats the search control min fields for objectid, type and offset more like a key, where we drop the offset to zero once we bump the type, etc. The downside of this is that we are changing the min_type and min_offset fields during the search, and so the ioctl caller needs extra checks to make sure the keys in the result are the ones it wanted. This also changes key_in_sk to use btrfs_comp_cpu_keys, just to make things more readable. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-18 12:10:08 -04:00
Chris Mason	7fde62bffb	Btrfs: buffer results in the space_info ioctl The space_info ioctl was using copy_to_user inside rcu_read_lock. This commit changes things to copy into a buffer first and then dump the result down to userland. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-16 15:40:10 -04:00
Sage Weil	854d2c3531	Btrfs: fix search_ioctl key advance key->type is u8, not u64. fs/btrfs/ioctl.c: In function 'copy_to_sk': fs/btrfs/ioctl.c:1024: warning: comparison is always true due to limited range of data type Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-16 14:24:27 -04:00
Akinobu Mita	91748467a5	btrfs: use memparse Use memparse() instead of its own private implementation. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-15 11:00:14 -04:00
Josef Bacik	1406e4327b	Btrfs: add a "df" ioctl for btrfs df is a very loaded question in btrfs. This gives us a way to get the per-space usage information so we can tell exactly what is in use where. This will help us figure out ENOSPC problems, and help users better understand where their disk space is going. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-15 11:00:14 -04:00
Josef Bacik	2ac55d41b5	Btrfs: cache the extent state everywhere we possibly can V2 This patch just goes through and fixes everybody that does lock_extent() blah unlock_extent() to use lock_extent_bits() blah unlock_extent_cached() and pass around an extent_state so we only have to do the searches once per function. This gives me about a 3 mb/s boost on my random write test. I have not converted some things, like the relocation and ioctl's, since they aren't heavily used and the relocation stuff is in the middle of being re-written. I also changed the clear_extent_bit() to only unset the cached state if we are clearing EXTENT_LOCKED and related stuff, so we can do things like this lock_extent_bits() clear delalloc bits unlock_extent_cached() without losing our cached state. I tested this thoroughly and turned on LEAK_DEBUG to make sure we weren't leaking extent states, everything worked out fine. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-15 11:00:13 -04:00
Chris Mason	1e701a3292	Btrfs: add new defrag-range ioctl. The btrfs defrag ioctl was limited to doing the entire file. This commit adds a new interface that can defrag a specific range inside the file. It can also force compression on the file, allowing you to selectively compress individual files after they were created, even when mount -o compress isn't turned on. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-15 11:00:10 -04:00
Chris Mason	940100a4a7	Btrfs: be more selective in the defrag ioctl The btrfs defrag ioctl had some bugs around delalloc accounting, and it wasn't properly skipping pages that were not in the mapping. It wasn't properly clearing the page checked flag, which could make the writeback code ignore the page forever while pinning it as dirty. This commit fixes those problems and makes defrag a little smarter. It skips holes and it doesn't waste time defragging large extents. If a tiny extent comes before a very large extent, it will defrag both of them to make sure the tiny extent ends up next to something big. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-15 11:00:10 -04:00
Josef Bacik	6ef5ed0d38	Btrfs: add ioctl and incompat flag to set the default mount subvol This patch needs to go along with my previous patch. This lets us set the default dir item's location to whatever root we want to use as our default mounting subvol. With this we don't have to use mount -o subvol=<tree id> anymore to mount a different subvol, we can just set the new one and it will just magically work. I've done some moderate testing with this, mostly just switching the default mount around, mounting subvols and the default mount at the same time and such, everything seems to work. Thanks, Older kernels would generally be able to still mount the filesystem with the default subvolume set, but it would result in a different volume being mounted, which could be an even more unpleasant surprise for users. So if you set your default subvolume, you can't go back to older kernels. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-15 11:00:08 -04:00
Chris Mason	ac8e9819d7	Btrfs: add search and inode lookup ioctls The search ioctl is a generic tool for doing btree searches from userland applications. The first user of the search ioctl is a subvolume listing feature, but we'll also use it to find new files in a subvolume. The search ioctl allows you to specify min and max keys to search for, along with min and max transid. It returns the items along with a header that includes the item key. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-15 10:55:10 -04:00
TARUISI Hiroaki	98d377a089	Btrfs: add a function to lookup a directory path by following backrefs This will be used by the inode lookup ioctl. Signed-off-by: TARUISI Hiroaki <taruishi.hiroak@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-03-15 10:55:09 -04:00
Yan, Zheng	86b9f2eca5	Btrfs: Fix per root used space accounting The bytes_used field in root item was originally planned to trace the amount of used data and tree blocks. But it never worked right since we can't trace freeing of data accurately. This patch changes it to only trace the amount of tree blocks. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-12-17 12:33:35 -05:00
Yan, Zheng	2e4bfab970	Btrfs: Avoid orphan inodes cleanup during committing transaction btrfs_lookup_dentry may trigger orphan cleanup, so it's not good to call it while committing a transaction. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-12-17 12:33:33 -05:00
Yan, Zheng	920bbbfb05	Btrfs: Rewrite btrfs_drop_extents Rewrite btrfs_drop_extents by using btrfs_duplicate_item, so we can avoid calling lock_extent within transaction. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-12-15 21:24:52 -05:00
Chris Mason	ac6889cbb2	Btrfs: fix file clone ioctl for bookend extents The file clone ioctl was incorrectly taking the offset into the extent on disk into account when calculating the length of the cloned extent. The length never changes based on the offset into the physical extent. Test case: fallocate -l 1g image mke2fs image bcp image image2 e2fsck -f image2 (errors on image2) The math bug ends up wrapping the length of the extent, and things go wrong from there. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-09 11:29:53 -04:00
Yan, Zheng	efefb1438b	Btrfs: remove negative dentry when deleting subvolumne The use of btrfs_dentry_delete is removing dentries from the dcache when deleting subvolumne. btrfs_dentry_delete ignores negative dentries. This is incorrect since if we don't remove the negative dentry, its parent dentry can't be removed. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-09 09:25:16 -04:00
Sage Weil	1ab86aedbc	Btrfs: fix error cases for ioctl transactions Fix leak of vfsmount write reference and open_ioctl_trans reference on ENOMEM. Clean up the error paths while we're at it. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-29 18:38:44 -04:00
Josef Bacik	9ed74f2dba	Btrfs: proper -ENOSPC handling At the start of a transaction we do a btrfs_reserve_metadata_space() and specify how many items we plan on modifying. Then once we've done our modifications and such, just call btrfs_unreserve_metadata_space() for the same number of items we reserved. For keeping track of metadata needed for data I've had to add an extent_io op for when we merge extents. This lets us track space properly when we are doing sequential writes, so we don't end up reserving way more metadata space than what we need. The only place where the metadata space accounting is not done is in the relocation code. This is because Yan is going to be reworking that code in the near future, so running btrfs-vol -b could still possibly result in a ENOSPC related panic. This patch also turns off the metadata_ratio stuff in order to allow users to more efficiently use their disk space. This patch makes it so we track how much metadata we need for an inode's delayed allocation extents by tracking how many extents are currently waiting for allocation. It introduces two new callbacks for the extent_io tree's, merge_extent_hook and split_extent_hook. These help us keep track of when we merge delalloc extents together and split them up. Reservations are handled prior to any actually dirty'ing occurs, and then we unreserve after we dirty. btrfs_unreserve_metadata_for_delalloc() will make the appropriate unreservations as needed based on the number of reservations we currently have and the number of extents we currently have. Doing the reservation outside of doing any of the actual dirty'ing lets us do things like filemap_flush() the inode to try and force delalloc to happen, or as a last resort actually start allocation on all delalloc inodes in the fs. This has survived dbench, fs_mark and an fsx torture test. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-28 16:29:42 -04:00
Sage Weil	1fb58a6051	Btrfs: fix arithmetic error in clone ioctl Fix an arithmetic error that was breaking extents cloned via the clone ioctl starting in the second half of a file. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-21 16:00:27 -04:00
Yan, Zheng	76dda93c6a	Btrfs: add snapshot/subvolume destroy ioctl This patch adds snapshot/subvolume destroy ioctl. A subvolume that isn't being used and doesn't contains links to other subvolumes can be destroyed. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-21 16:00:26 -04:00
Yan, Zheng	4df27c4d5c	Btrfs: change how subvolumes are organized btrfs allows subvolumes and snapshots anywhere in the directory tree. If we snapshot a subvolume that contains a link to other subvolume called subvolA, subvolA can be accessed through both the original subvolume and the snapshot. This is similar to creating hard link to directory, and has the very similar problems. The aim of this patch is enforcing there is only one access point to each subvolume. Only the first directory entry (the one added when the subvolume/snapshot was created) is treated as valid access point. The first directory entry is distinguished by checking root forward reference. If the corresponding root forward reference is missing, we know the entry is not the first one. This patch also adds snapshot/subvolume rename support, the code allows rename subvolume link across subvolumes. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-21 15:56:00 -04:00
Chris Mason	83ebade34b	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable	2009-09-11 19:07:25 -04:00
Chris Mason	a1ed835e1a	Btrfs: Fix extent replacment race Data COW means that whenever we write to a file, we replace any old extent pointers with new ones. There was a window where a readpage might find the old extent pointers on disk and cache them in the extent_map tree in ram in the middle of a given write replacing them. Even though both the readpage and the write had their respective bytes in the file locked, the extent readpage inserts may cover more bytes than it had locked down. This commit closes the race by keeping the new extent pinned in the extent map tree until after the on-disk btree is properly setup with the new extent pointers. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:07 -04:00
Alexey Dobriyan	405f55712d	headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-12 12:22:34 -07:00
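
Note on 2ebc346478 (fix checks in BTRFS_IOC_CLONE_RANGE): the message describes a range whose off + len wraps around and so escapes the bounds check. A minimal sketch of that kind of check follows. It is not the kernel's code; the helper name clone_range_ok and its parameters are hypothetical, and 64-bit unsigned offsets are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from fs/btrfs/ioctl.c.
 * Returns true when the byte range [off, off + len) lies inside a
 * source file of src_size bytes. The first test rejects ranges where
 * off + len wraps past the 64-bit limit, which is the case the commit
 * message warns about.
 */
static bool clone_range_ok(uint64_t off, uint64_t len, uint64_t src_size)
{
	if (off + len < off)        /* unsigned wraparound */
		return false;
	if (off + len > src_size)   /* range runs past the end of the source */
		return false;
	return true;
}
```

Without the first test, a very large off combined with a small len can produce a small sum that passes the second comparison even though the range starts far outside the file.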


108 Commits
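
Note on abc6e1341b and 854d2c3531 (search ioctl key advance): the messages say the min objectid, type and offset fields are treated like a key, that the offset drops to zero once the type is bumped, and that key->type is u8. A minimal sketch of advancing such a key to its successor follows. The struct sketch_key and the function sketch_key_advance are hypothetical illustrations, not the kernel's structures, and the exact carry order is an assumption based on those two messages.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a search key (hypothetical). Field widths
 * follow the commit note that type is u8; the others are 64-bit. */
struct sketch_key {
	uint64_t objectid;
	uint8_t  type;
	uint64_t offset;
};

/*
 * Advance k to the next key in (objectid, type, offset) order.
 * When a field is already at its maximum, carry into the next more
 * significant field and reset the less significant ones to zero.
 * Returns false if k is already the largest possible key.
 */
static bool sketch_key_advance(struct sketch_key *k)
{
	if (k->offset < UINT64_MAX) {
		k->offset++;
	} else if (k->type < UINT8_MAX) {
		k->type++;
		k->offset = 0;      /* offset drops to zero once type is bumped */
	} else if (k->objectid < UINT64_MAX) {
		k->objectid++;
		k->type = 0;
		k->offset = 0;
	} else {
		return false;       /* nothing larger to search for */
	}
	return true;
}
```

Comparing type against UINT8_MAX rather than a 64-bit limit reflects the 854d2c3531 fix, where a comparison against a wider maximum was always true.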