OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Yan Zheng	a76a3cd40c	Btrfs: Count space allocated to file in bytes This patch makes btrfs count space allocated to file in bytes instead of 512 byte sectors. Everything else in btrfs uses a byte count instead of sector sizes or blocks sizes, so this fits better. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>	2008-10-09 11:46:29 -04:00
Chris Mason	a62b940160	Btrfs: cast bio->bi_sector to a u64 before shifting On 32 bit machines without CONFIG_LBD, the bi_sector field is only 32 bits. Btrfs needs to cast it before shifting up, or we end up doing IO into the wrong place. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-10-03 16:31:08 -04:00
Chris Mason	323ac95bce	Btrfs: don't read leaf blocks containing only checksums during truncate Checksum items take up a significant portion of the metadata for large files. It is possible to avoid reading them during truncates by checking the keys in the higher level nodes. If a given leaf is followed by another leaf where the lowest key is a checksum item from the same file, we know we can safely delete the leaf without reading it. For a 32GB file on a 6 drive raid0 array, Btrfs needs 8s to delete the file with a cold cache. It is read bound during the run. With this change, Btrfs is able to delete the file in 0.5s Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-10-01 19:05:46 -04:00
Chris Mason	d352ac6814	Btrfs: add and improve comments This improves the comments at the top of many functions. It didn't dive into the guts of functions because I was trying to avoid merging problems with the new allocator and back reference work. extent-tree.c and volumes.c were both skipped, and there is definitely more work to do in cleaning and commenting the code. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-29 15:18:18 -04:00
Chris Mason	8c8bee1d7c	Btrfs: Wait for IO on the block device inodes of newly added devices btrfs-vol -a /dev/xxx will zero the first and last two MB of the device. The kernel code needs to wait for this IO to finish before it adds the device. btrfs metadata IO does not happen through the block device inode. A separate address space is used, allowing the zero filled buffer heads in the block device inode to be written to disk after FS metadata starts going down to the disk via the btrfs metadata inode. The end result is zero filled metadata blocks after adding new devices into the filesystem. The fix is a simple filemap_write_and_wait on the block device inode before actually inserting it into the pool of available devices. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-29 11:19:10 -04:00
Zheng Yan	5b21f2ed3f	Btrfs: extent_map and data=ordered fixes for space balancing * Add an EXTENT_BOUNDARY state bit to keep the writepage code from merging data extents that are in the process of being relocated. This allows us to do accounting for them properly. * The balancing code relocates data extents independent of the underlying inode. The extent_map code was modified to properly account for things moving around (invalidating extent_map caches in the inode). * Don't take the drop_mutex in the create_subvol ioctl. It isn't required. * Fix walking of the ordered extent list to avoid races with sys_unlink * Change the lock ordering rules. Transaction start goes outside the drop_mutex. This allows btrfs_commit_transaction to directly drop the relocation trees. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-26 10:05:38 -04:00
Chris Mason	2b1f55b0f0	Remove Btrfs compat code for older kernels Btrfs had compatibility code for kernels back to 2.6.18. These have been removed, and will be maintained in a separate backport git tree from now on. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 15:41:59 -04:00
Chris Mason	3435302953	Btrfs: Fix race against disk_i_size updates The code to update the on disk i_size happens before the ordered_extent record is removed. So, it is possible for multiple ordered_extent completion routines to run at the same time, and to find each other in the ordered tree. The end result is they both decide not to update disk_i_size, leaving it too small. This temporary fix just puts the updates inside the extent_mutex. A real solution would be stronger ordering of disk_i_size updates against removing the ordered extent from the tree. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:07 -04:00
Zheng Yan	31840ae1a6	Btrfs: Full back reference support This patch makes the back reference system explicitly record the location of the parent node for all types of extents. The location of the parent node is placed into the offset field of the backref key. Every time a tree block is balanced, the back references for the affected lower level extents are updated. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:07 -04:00
Josef Bacik	0f9dd46cda	Btrfs: free space accounting redo 1) replace the per fs_info extent_io_tree that tracked free space with two rb-trees per block group to track free space areas via offset and size. The reason to do this is because most allocations come with a hint byte where to start, so we can usually find a chunk of free space at that hint byte to satisfy the allocation and get good space packing. If we cannot find free space at or after the given offset we fall back on looking for a chunk of the given size as close to that given offset as possible. When we fall back on the size search we also try to find a slot as close to the size we want as possible, to avoid breaking small chunks off of huge areas if possible. 2) remove the extent_io_tree that tracked the block group cache from fs_info and replaced it with an rb-tree that tracks block group cache via offset. also added a per space_info list that tracks the block group cache for the particular space so we can lookup related block groups easily. 3) cleaned up the allocation code to make it a little easier to read and a little less complicated. Basically there are 3 steps, first look from our provided hint. If we couldn't find from that given hint, start back at our original search start and look for space from there. If that fails try to allocate space if we can and start looking again. If not we're screwed and need to start over again. 4) small fixes. there were some issues in volumes.c where we wouldn't allocate the rest of the disk. fixed cow_file_range to actually pass the alloc_hint, which has helped a good bit in making the fs_mark test I run have semi-normal results as we run out of space. Generally with data allocations we don't track where we last allocated from, so every time we did a data allocation we'd search through every block group that we have looking for free space. Now searching a block group with no free space isn't terribly time consuming, it was causing a slight degradation as we got more data block groups. The alloc_hint has fixed this slight degradation and made things semi-normal. There is still one nagging problem I'm working on where we will get ENOSPC when there is definitely plenty of space. This only happens with metadata allocations, and only when we are almost full. So you generally hit the 85% mark first, but sometimes you'll hit the BUG before you hit the 85% wall. I'm still tracking it down, but until then this seems to be pretty stable and make a significant performance gain. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:07 -04:00
Chris Mason	49eb7e46d4	Btrfs: Dir fsync optimizations Drop i_mutex during the commit Don't bother doing the fsync at all unless the dir is marked as dirtied and needing fsync in this transaction. For directories, this means that someone has unlinked a file from the dir without fsyncing the file. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:07 -04:00
Chris Mason	98509cfc5a	Btrfs: Fix releasepage to properly keep dirty and writeback pages Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:07 -04:00
Chris Mason	8d5bf1cb35	Btrfs: Update the highest objectid in a root after log replay is done Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:07 -04:00
Christoph Hellwig	a237d2a2bd	remove unused function btrfs_ilookup btrfs_ilookup is unused, which is good because a normal filesystem should never have to use ilookup anyway. Remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:07 -04:00
Chris Mason	e02119d5a7	Btrfs: Add a write ahead tree log to optimize synchronous operations File syncs and directory syncs are optimized by copying their items into a special (copy-on-write) log tree. There is one log tree per subvolume and the btrfs super block points to a tree of log tree roots. After a crash, items are copied out of the log tree and back into the subvolume. See tree-log.c for all the details. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:07 -04:00
Christoph Hellwig	95819c0573	Btrfs: optimize btrget/set/removexattr btrfs actually stores the whole xattr name, including the prefix ondisk, so using the generic resolver that strips off the prefix is not very helpful. Instead do the real ondisk xattrs manually and only use the generic resolver for synthetic xattrs like ACLs. (Sorry Josef for guiding you towards the wrong direction here initially) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:07 -04:00
David Woodhouse	f2322b1c65	Btrfs: Optimise NFS readdir hack slightly; don't call readdir() again when done Date: Sun, 17 Aug 2008 17:12:56 +0100 Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
David Woodhouse	49593bfa57	Minor cleanup of btrfs_real_readdir() Date: Sun, 17 Aug 2008 17:08:36 +0100 Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
David Woodhouse	5ecc7e5d1d	Btrfs: Remove special cases for "." and ".." Date: Sun, 17 Aug 2008 15:14:48 +0100 We never get asked by the VFS to lookup either of them, and we can handle the readdir() case a lot more simply, too. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
David Woodhouse	cbdf5a2442	Btrfs: Implement our own copy of the nfsd readdir hack, for older kernels Date: Wed, 6 Aug 2008 19:42:33 +0100 Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Balaji Rao	1a54ef8c11	Introduce btrfs_iget helper Date: Mon, 21 Jul 2008 02:01:04 +0530 This patch introduces a btrfs_iget helper to be used in NFS support. Signed-off-by: Balaji Rao <balajirrao@gmail.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	4d1b5fb4d7	Btrfs: Lookup readpage checksums on bio submission again This optimization had been removed because I thought it was triggering csum errors. The real cause of the errors was elsewhere, and so this optimization is back. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	7c2fe32a23	Btrfs: Fix add_extent_mapping to check for duplicates across the whole range add_extent_mapping was allowing the insertion of overlapping extents. This never used to happen because it only inserted the extents from disk and those were never overlapping. But, with the data=ordered code, the disk and memory representations of the file are not the same. add_extent_mapping needs to ensure a new extent does not overlap before it inserts. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	53863232ef	Btrfs: Lower contention on the csum mutex This takes the csum mutex deeper in the call chain and releases it more often. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	db69e0ebae	Btrfs: Init address_space->writeback_index properly The writeback_index field is used by write_cache_pages to pick up where writeback on a given inode left off. But, it is never set to a sane value, so writeback can often start at a random offset in the file. Kernels 2.6.28 and higher will have this fixed, but for everyone else, we also fill in the value in btrfs. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	4ca8b41e3f	Btrfs: Avoid calling into the FS for the final iput on fake root inodes Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Yan Zheng	7ea394f119	Btrfs: Fix nodatacow for the new data=ordered mode Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	00e4e6b33a	Get rid of BTRFS_I(inode)->index and use local vars instead rename and link don't always have a lock on the source inode, and our use of a per-inode index variable was racy. This changes things to store the index in a local variable instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	3de9d6b649	btrfs_lookup_bio_sums seems broken, go back to the readpage_io_hook for now Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	ea8c281947	Btrfs: Maintain a list of inodes that are delalloc and a way to wait on them Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	6dab815743	Btrfs: Hold csum mutex while reading in sums during readpages Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	3ce7e67a06	Btrfs: Drop some debugging around the extent_map pinned flag Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	61b4944018	Btrfs: Fix streaming read performance with checksumming on Large streaming reads make for large bios, which means each entry on the list async work queues represents a large amount of data. IO congestion throttling on the device was kicking in before the async worker threads decided a single thread was busy and needed some help. The end result was that a streaming read would result in a single CPU running at 100% instead of balancing the work off to other CPUs. This patch also changes the pre-IO checksum lookup done by reads to work on a per-bio basis instead of a per-page. This results in many extra btree lookups on large streaming reads. Doing the checksum lookup right before bio submit allows us to reuse searches while processing adjacent offsets. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Sven Wegener	0ee0fda06b	Btrfs: Add compatibility for kernels >= 2.6.27-rc1 Add a couple of #if's to follow API changes. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Yan	bcc63abbf3	Btrfs: implement memory reclaim for leaf reference cache The memory reclaiming issue happens when snapshot exists. In that case, some cache entries may not be used during old snapshot dropping, so they will remain in the cache until umount. The patch adds a field to struct btrfs_leaf_ref to record create time. Besides, the patch makes all dead roots of a given snapshot linked together in order of create time. After an old snapshot was completely dropped, we check the dead root list and remove all cache entries created before the oldest dead root in the list. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Yan Zheng	f321e49103	Btrfs: Update and fix mount -o nodatacow To check whether a given file extent is referenced by multiple snapshots, the checker walks down the fs tree through dead root and checks all tree blocks in the path. We can easily detect whether a given tree block is directly referenced by other snapshot. We can also detect any indirect reference from other snapshot by checking reference's generation. The checker can always detect multiple references, but can't reliably detect cases of single reference. So btrfs may do file data cow even if there is only one reference. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	ab78c84de1	Btrfs: Throttle operations if the reference cache gets too large A large reference cache is directly related to a lot of work pending for the cleaner thread. This throttles back new operations based on the size of the reference cache so the cleaner thread will be able to keep up. Overall, this actually makes the FS faster because the cleaner thread will be more likely to find things in cache. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	017e5369eb	Btrfs: Leaf reference cache update This changes the reference cache to make a single cache per root instead of one cache per transaction, and to key by the byte number of the disk block instead of the keys inside. This makes it much less likely to have cache misses if a snapshot or something has an extra reference on a higher node or a leaf while the first transaction that added the leaf into the cache is dropping. Some throttling is added to functions that free blocks heavily so they wait for old transactions to drop. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Yan	445dceb78f	Btrfs: Fix .. lookup corner case Inode ref item can be in the next leaf when we find "path->slots[0] == btrfs_header_nritems(...)". Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Balaji Rao	45467261ed	Btrfs: Remove unused variable in fixup_tree_root_location Remove an unused variable 'path' in fixup_tree_root_location. Signed-off-by: Balaji Rao <balajirrao@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Josef Bacik	7b12876623	Btrfs: Create orphan inode records to prevent lost files after a crash Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Josef Bacik	33268eaf0b	Btrfs: Add ACL support Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Josef Bacik	aec7477b3b	Btrfs: Implement new dir index format Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	89642229a5	Btrfs: Search data ordered extents first for checksums on read Checksum items are not inserted into the tree until all of the io from a given extent is complete. This means one dirty page from an extent may be written, freed, and then read again before the entire extent is on disk and the checksum item is inserted. The checksums themselves are stored in the ordered extent so they can be inserted in bulk when IO is complete. On read, if a checksum item isn't found, the ordered extents were being searched for a checksum record. This all worked most of the time, but the checksum insertion code tries to reduce the number of tree operations by pre-inserting checksum items based on i_size and a few other factors. This means the read code might find a checksum item that hasn't yet really been filled in. This commit changes things to check the ordered extents first and only dive into the btree if nothing was found. This removes the need for extra locking and is more reliable. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	ed98b56a63	Btrfs: Take the csum mutex while reading checksums Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	f421950f86	Btrfs: Fix some data=ordered related data corruptions Stress testing was showing data checksum errors, most of which were caused by a lookup bug in the extent_map tree. The tree was caching the last pointer returned, and searches would check the last pointer first. But, search callers also expect the search to return the very first matching extent in the range, which wasn't always true with the last pointer usage. For now, the code to cache the last return value is just removed. It is easy to fix, but I think lookups are rare enough that it isn't required anymore. This commit also replaces do_sync_mapping_range with a local copy of the related functions. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	6af118ce51	Btrfs: Index extent buffers in an rbtree Before, extent buffers were a temporary object, meant to map a number of pages at once and collect operations on them. But, a few extra fields have crept in, and they are also the best place to store a per-tree block lock field as well. This commit puts the extent buffers into an rbtree, and ensures a single extent buffer for each tree block. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	4a09675279	Btrfs: Data ordered fixes * In btrfs_delete_inode, wait for ordered extents after calling truncate_inode_pages. This is much faster, and more correct * Properly clear out the PageChecked bit everywhere we redirty the page. * Change the writepage fixup handler to lock the page range and check to see if an ordered extent had been inserted since the improperly dirtied page was discovered * Wait for ordered extents outside the transaction. This isn't required for locking rules but does improve transaction latencies * Reduce contention on the alloc_mutex by dropping it while incrementing refs on a node/leaf and while dropping refs on a leaf. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	e5a2217ef6	Fix btrfs_wait_ordered_extent_range to properly wait Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	7f3c74fb83	Btrfs: Keep extent mappings in ram until pending ordered extents are done It was possible for stale mappings from disk to be used instead of the new pending ordered extent. This adds a flag to the extent map struct to keep it pinned until the pending ordered extent is actually on disk. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	211f90e68b	Btrfs: Don't allow releasepage to succeed if EXTENT_ORDERED is set Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	3edf7d33f4	Btrfs: Handle data checksumming on bios that span multiple ordered extents Data checksumming is done right before the bio is sent down the IO stack, which means a single bio might span more than one ordered extent. In this case, the checksumming data is split between two ordered extents. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	eb84ae039e	Btrfs: Cleanup and comment ordered-data.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	ee6e6504e1	Add a per-inode lock around btrfs_drop_extents btrfs_drop_extents is always called with a range lock held on the inode. But, it may operate on extents outside that range as it drops and splits them. This patch adds a per-inode mutex that is held while calling btrfs_drop_extents and while inserting new extents into the tree. It prevents races from two procs working against adjacent ranges in the tree. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	ba1da2f442	Btrfs: Don't pin pages in ram until the entire ordered extent is on disk. Checksum items are not inserted until the entire ordered extent is on disk, but individual pages might be clean and available for reclaim long before the whole extent is on disk. In order to allow those pages to be freed, we need to be able to search the list of ordered extents to find the checksum that is going to be inserted in the tree. This way if the page needs to be read back in before the checksums are in the btree, we'll be able to verify the checksum on the page. This commit adds the ability to search the pending ordered extents for a given offset in the file, and changes btrfs_releasepage to allow ordered pages to be freed. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	f929574938	btrfs_start_transaction: wait for commits in progress to finish btrfs_commit_transaction has to loop waiting for any writers in the transaction to finish before it can proceed. btrfs_start_transaction should be polite and not join a transaction that is in the process of being finished off. There are a few places that can't wait, basically the ones doing IO that might be needed to finish the transaction. For them, btrfs_join_transaction is added. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	dbe674a99c	Btrfs: Update on disk i_size only after pending ordered extents are done This changes the ordered data code to update i_size after the extent is on disk. An on disk i_size is maintained in the in-memory btrfs inode structures, and this is updated as extents finish. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	247e743cbe	Btrfs: Use async helpers to deal with pages that have been improperly dirtied Higher layers sometimes call set_page_dirty without asking the filesystem to help. This causes many problems for the data=ordered and cow code. This commit detects pages that haven't been properly setup for IO and kicks off an async helper to deal with them. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	e6dcd2dc9c	Btrfs: New data=ordered implementation The old data=ordered code would force commit to wait until all the data extents from the transaction were fully on disk. This introduced large latencies into the commit and stalled new writers in the transaction for a long time. The new code changes the way data allocations and extents work: * When delayed allocation is filled, data extents are reserved, and the extent bit EXTENT_ORDERED is set on the entire range of the extent. A struct btrfs_ordered_extent is allocated and inserted into a per-inode rbtree to track the pending extents. * As each page is written EXTENT_ORDERED is cleared on the bytes corresponding to that page. * When all of the bytes corresponding to a single struct btrfs_ordered_extent are written, the previously reserved extent is inserted into the FS btree and into the extent allocation trees. The checksums for the file data are also updated. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	1b1e2135dc	Btrfs: Add a per-inode csum mutex to avoid races creating csum items Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	89ce8a63d0	Add btrfs_end_transaction_throttle to force writers to wait for pending commits The existing throttle mechanism was often not sufficient to prevent new writers from coming in and making a given transaction run forever. This adds an explicit wait at the end of most operations so they will allow the current transaction to close. There is no wait inside file_write, inode updates, or cow filling, all which have different deadlock possibilities. This is a temporary measure until better asynchronous commit support is added. This code leads to stalls as it waits for data=ordered writeback, and it really needs to be fixed. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	594a24eb0e	Fix btrfs_del_ordered_inode to allow forcing the drop during unlinks This allows us to delete an unlinked inode with dirty pages from the list instead of forcing commit to write these out before deleting the inode. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	a213501153	Btrfs: Replace the big fs_mutex with a collection of other locks Extent allocations are still protected by a large alloc_mutex. Objectid allocations are covered by an objectid mutex. Other btree operations are protected by a lock on individual btree nodes. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	925baeddc5	Btrfs: Start btree concurrency work. The allocation trees and the chunk trees are serialized via their own dedicated mutexes. This means allocation location is still not very fine grained. The main FS btree is protected by locks on each block in the btree. Locks are taken top / down, and as processing finishes on a given level of the tree, the lock is released after locking the lower level. The end result of a search is now a path where only the lowest level is locked. Releasing or freeing the path drops any locks held. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Christoph Hellwig	f46b5a66b3	Btrfs: split out ioctl.c Split the ioctl handling out of inode.c into a file of its own. Also fix up checkpatch.pl warnings for the moved code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	8b71284292	Btrfs: Add async worker threads for pre and post IO checksumming Btrfs has been using workqueues to spread the checksumming load across other CPUs in the system. But, workqueues only schedule work on the same CPU that queued the work, giving them a limited benefit for systems with higher CPU counts. This code adds a generic facility to schedule work with pools of kthreads, and changes the bio submission code to queue bios up. The queueing is important to make sure large numbers of procs on the system don't turn streaming workloads into random workloads by sending IO down concurrently. The end result of all of this is much higher performance (and CPU usage) when doing checksumming on large machines. Two worker pools are created, one for writes and one for endio processing. The two could deadlock if we tried to service both from a single pool. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Sage Weil	6bf13c0cc8	Btrfs: transaction ioctls These ioctls let a user application hold a transaction open while it performs a series of operations. A final ioctl does a sync on the fs (closing the current transaction). This is the main requirement for Ceph's OSD to be able to keep the data it's storing in a btrfs volume consistent, and AFAICS it works just fine. The application would do something like fd = ::open("some/file", O_RDONLY); ::ioctl(fd, BTRFS_IOC_TRANS_START); /* do a bunch of stuff */ ::ioctl(fd, BTRFS_IOC_TRANS_END); or just ::close(fd); And to ensure it commits to disk, ::ioctl(fd, BTRFS_IOC_SYNC); When a transaction is held open, the trans_handle is attached to the struct file (via private_data) so that it will get cleaned up if the process dies unexpectedly. A held transaction is also ended on fsync() to avoid a deadlock. A misbehaving application could also deliberately hold a transaction open, effectively locking up the FS, so it may make sense to restrict something like this to root or something. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Sven Wegener	3b96362cc8	Btrfs: Invalidate dcache entry after creating snapshot and We need to invalidate an existing dcache entry after creating a new snapshot or subvolume, because a negative dcache entry will stop us from accessing the new snapshot or subvolume. --- ctree.h \| 23 +++++++++++++++++++++++ inode.c \| 4 ++++ transaction.c \| 4 ++++ 3 files changed, 31 insertions(+) Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Mingming	e1b81e6761	btrfs delete ordered inode handling fix Use btrfs_release_file instead of a put_inode call Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	211c17f51f	Fix corners in writepage and btrfs_truncate_page The extent_io writepage calls needed an extra check for discarding pages that started on the last byte in the file. btrfs_truncate_page needed checks to make sure the page was still part of the file after reading it, and most importantly, needed to wait for all IO to the page to finish before freeing the corresponding extents on disk. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	1259ab75c6	Btrfs: Handle write errors on raid1 and raid10 When duplicate copies exist, writes are allowed to fail to one of those copies. This changeset includes a few changes that allow the FS to continue even when some IOs fail. It also adds verification of the parent generation number for btree blocks. This generation is stored in the pointer to a block, and it ensures that missed writes to are detected. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	bbaf549e0c	Btrfs: A number of nodatacow fixes Once part of a delalloc request fails the cow checks, just cow the entire range It is possible for the back references to all be from the same root, but still have snapshots against an extent. The checks are now more strict, forcing cow any time there are multiple refs against the data extent. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	a68d5933a0	Btrfs: Update nodatacow mode to support cloned single files and resizing Before, nodatacow only checked to make sure multiple roots didn't have references on a single extent. This check makes sure that multiple inodes don't have references. nodatacow needed an extra check to see if the block group was currently readonly. This way cows forced by the chunk relocation code are honored. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	a061fc8da7	Btrfs: Add support for online device removal This required a few structural changes to the code that manages bdev pointers: The VFS super block now gets an anon-bdev instead of a pointer to the lowest bdev. This allows us to avoid swapping the super block bdev pointer around at run time. The code to read in the super block no longer goes through the extent buffer interface. Things got ugly keeping the mapping constant. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	5d9cd9ecbf	Btrfs: Fix clone ioctl to not hold the path over inserts Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	b9d86667c9	Btrfs: Silence bogus inode.c compiler warnings Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Sage Weil	f2eb0a241f	Btrfs: Clone file data ioctl Add a new ioctl to clone file data Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	ec44a35cbe	Btrfs: Add balance ioctl to restripe the chunks Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	788f20eb5a	Btrfs: Add new ioctl to add devices Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	8e7bf94fd5	Btrfs: Do more optimal file RA during shrinking and defrag Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	8f18cf1339	Btrfs: Make the resizer work based on shrinking and growing devices Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	81d7ed29ff	Btrfs: Throttle file_write when data=ordered is flushing the inode Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	bcbfce8abd	Btrfs: Fix the unplug_io_fn to grab a consistent copy of page->mapping Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	e1c4b7451e	Fix btrfs_get_extent and get_block corner cases, and disable O_DIRECT reads The generic O_DIRECT code assumes all the bios have the same bdev, which isn't true for multi-device btrfs. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	f2d8d74d78	Btrfs: Make an unplug function that doesn't unplug every spindle Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	4ef64eae28	Btrfs: Remove debugging statements from the invalidatepage calls Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	9ad6b7bc2e	Force page->private removal in btrfs_invalidatepage btrfs_invalidatepage is not allowed to leave pages around on the lru. Any such pages will trigger an oops later on because the VM will see page->private and assume it is a buffer head. This also forces extra flushes of the async work queues before dropping all the pages on the btree inode during unmount. Left over items on the work queues are one possible cause of busy state ranges during truncate_inode_pages. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	3b951516ed	Btrfs: Use the extent map cache to find the logical disk block during data retries The data read retry code needs to find the logical disk block before it can resubmit new bios. But, finding this block isn't allowed to take the fs_mutex because that will deadlock with a number of different callers. This changes the retry code to use the extent map cache instead, but that requires the extent map cache to have the extent we're looking for. This is a problem because btrfs_drop_extent_cache just drops the entire extent instead of the little tiny part it is invalidating. The bulk of the code in this patch changes btrfs_drop_extent_cache to invalidate only a portion of the extent cache, and changes btrfs_get_extent to deal with the results. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	699122f559	Btrfs: Don't wait on tree block writeback before freeing them anymore This isn't required anymore because we don't reallocate blocks that have already been written in this transaction. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	e015640f9c	Btrfs: Write bio checksumming outside the FS mutex This significantly improves streaming write performance by allowing concurrency in the data checksumming. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	44b8bd7edd	Btrfs: Create a work queue for bio writes This allows checksumming to happen in parallel among many cpus, and keeps us from bogging down pdflush with the checksumming code. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	98d20f67cf	Add a min size parameter to btrfs_alloc_extent On huge machines, delayed allocation may try to allocate massive extents. This change allows btrfs_alloc_extent to return something smaller than the caller asked for, and the data allocation routines will loop over the allocations until it fills the whole delayed alloc. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	587f77043a	Btrfs: Fixup a few u64<->pointer casts for 32 bit Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	1643298592	Btrfs: Add O_DIRECT read and write (writes == buffered + cache flush) This adds basic O_DIRECT read and write support. In the write case, we just do a normal buffered write followed by a cache flush. O_DIRECT + O_SYNC are required to trigger metadata syncs. In the read case, there is a basic btrfs_get_block call for use by the generic O_DIRECT code. This does honor multi-volume mapping rules but it skips all checksumming. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	7e38326f5b	Btrfs: Handle checksumming errors while reading data blocks Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	f188591e98	Btrfs: Retry metadata reads in the face of checksum failures Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	22c599485b	Btrfs: Handle data block end_io through the async work queue Before it was done by the bio end_io routine, the work queue code is able to scale much better with faster IO subsystems. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	cea9e4452e	Change btrfs_map_block to return a structure with mappings for all stripes Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	8790d502e4	Btrfs: Add support for mirroring across drives Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	0416008814	Create a btrfs backing dev info This allows intelligent versions of unplug and congestion functions Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	593060d756	Btrfs: Implement raid0 when multiple devices are present Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	239b14b32d	Btrfs: Bring back mount -o ssd optimizations Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	6324fbf334	Btrfs: Dynamic chunk and block group allocation Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	0b86a832a1	Btrfs: Add support for multiple devices per filesystem Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	6885f308b5	Btrfs: Misc 2.6.25 updates Remove the btrfs read_inode method, and use save_mount_options Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	065631f6dc	Btrfs: checksum file data at bio submission time instead of during writepage When we checksum file data during writepage, the checksumming is done one page at a time, making it difficult to do bulk metadata modifications to insert checksums for large ranges of the file at once. This patch changes btrfs to checksum on a per-bio basis instead. The bios are checksummed before they are handed off to the block layer, so each bio is contiguous and only has pages from the same inode. Checksumming on a bio basis allows us to insert and modify the file checksum items in large groups. It also allows the checksumming to be done more easily by async worker threads. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Yan Zheng	5e591a0703	Btrfs: Fix looping on readdir of the subvol roots Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	9069218d44	Btrfs: Fix i_blocks accounting Now that delayed allocation accounting works, i_blocks accounting is changed to only modify i_blocks when extents inserted or removed. The fillattr call is changed to include the delayed allocation byte count in the i_blocks result. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Yan	c2e639f02c	Btrfs: Fix typo in extent_io.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	b0c68f8bed	Btrfs: Enable delalloc accounting Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	1b0f7c29e2	Fix hole start calculation in btrfs_setattr Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	f392a938f3	Properly align the hole size in btrfs_setattr Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Yan	b1632b10c0	Btrfs: Align extent length to sectorsize in --- Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	291d673e6a	Btrfs: Do delalloc accounting via hooks in the extent_state code Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	9c58309d6c	Btrfs: Add inode item and backref in one insert, reducing cpu usage Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	85e21bac16	Btrfs: During deletes and truncate, remove many items at once from the tree Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	70dec8079d	Btrfs: extent_io and extent_state optimizations The end_bio routines are changed to take a pointer to the extent state struct, and the state tree is walked in order to set/clear appropriate bits as IO completes. This greatly reduces the number of rbtree searches done by the end_bio handlers, and reduces lock contention. The extent_io releasepage function is changed to avoid expensive searches for locked state. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	aadfeb6e39	Btrfs: Add some extra debugging around file data checksum failures Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	c2a8b6e110	Btrfs: Force f_pos to the max when a readdir hits the end of the directory. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	d1310b2e0c	Btrfs: Split the extent_map code into two parts There is now extent_map for mapping offsets in the file to disk and extent_io for state tracking, IO submission and extent_buffers. The new extent_map code shifts from [start,end] pairs to [start,len], and pushes the locking out into the caller. This allows a few performance optimizations and is easier to use. A number of extent_map usage bugs were fixed, mostly with failing to remove extent_map entries when changing the file. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	5f56406aab	Btrfs: Fix hole insertion corner cases There were a few places that could cause duplicate extent insertion, this adjusts the code that creates holes to avoid it. lookup_extent_map is changed to correctly return all of the extents in a range, even when there are none matching at the start of the range. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Yan	fb4bc1e056	Btrfs: Fix compile on 2.6.22 kernel This patch fixes compile error on kernel-2.6.22 Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	2da98f003f	Btrfs: Run igrab on data=ordered inodes to prevent deadlocks during writeout Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	9cce6c3bfc	Btrfs: Disable delalloc accounting for now Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	cee36a03e8	Rework btrfs_drop_inode to avoid scheduling Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	61295eb866	Btrfs: Add drop inode func to avoid data=ordered deadlock Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	8c416c9e0d	Btrfs: Delete any remaining extent_maps before freeing the inode Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Yan	fdebe2bd70	Btrfs: Add readonly inode flag This patch adds readonly inode flag support. A file with this flag can't be modified, but can be deleted. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Yan	b98b6767a0	Btrfs: Add inode flags support This patch adds NODATASUM & NODATACOW inode flags support. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	c31f8830f0	Btrfs: online shrinking fixes While shrinking the FS, the allocation functions need to make sure they don't try to allocate bytes past the end of the FS. nodatacow needed an extra check to force cows when the existing extents are past the end of the FS. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	e2008b6140	Btrfs: Add some simple throttling to wait for data=ordered and snapshot deletion Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	3063d29f2a	Btrfs: Move snapshot creation to commit time It is very difficult to create a consistent snapshot of the btree when other writers may update the btree before the commit is done. This changes the snapshot creation to happen during the commit, while no other updates are possible. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	dc17ff8f11	Btrfs: Add data=ordered support This forces file data extents down the disk along with the metadata that references them. The current implementation is fairly simple, and just writes out all of the dirty pages in an inode before the commit. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	d666746207	Btrfs: Change st_blocksize to 4k Some programs (python) do rmw cycles at the granularity returned by stat. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	bd09835d9a	count_snapshots: Properly update the leaf pointer after btrfs_next_leaf Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	f9ef6604ac	Btrfs: 32 bit compile fixes for the resizer and enospc checks Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	4313b3994d	Btrfs: Reduce stack usage in the resizer, fix 32 bit compiles Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	56b453c92f	Btrfs: Explicitly send a root objectid to count_snapshots_in_path Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	8f662a76c6	Btrfs: Add readahead to the online shrinker, and a mount -o alloc_start= for testing Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	e52ec0eb62	Btrfs: Fix NULL block groups on reading the inode Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	edbd8d4efe	Btrfs: Support for online FS resize (grow and shrink) Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	5d4fb734b4	Btrfs: Fix an off by one in the extent_map prepare write code Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	1832a6d5ee	Btrfs: Implement basic support for -ENOSPC This is intended to prevent accidentally filling the drive. A determined user can still make things oops. It includes some accounting of the current bytes under delayed allocation, but this will change as things get optimized Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	879c1cfc31	Btrfs: Fix nodatacow extent lookup Yan Zheng noticed the offset into the extent was incorrectly being added to the extent start before trying to find it in the extent allocation tree. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	190662b212	Btrfs: Fix delayed allocation to avoid missing delalloc extents find_lock_delalloc_range could exit out too early Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	4aec2b5232	kmalloc a few large stack objects in the btrfs_ioctl path Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	6da6abae02	Btrfs: Back port to 2.6.18-el kernels Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	c59f8951d4	Btrfs: Add mount option to enforce a max extent size Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	be20aa9dba	Btrfs: Add mount option to turn off data cow A number of workloads do not require copy on write data or checksumming. mount -o nodatasum to disable checksums and -o nodatacow to disable both copy on write and checksumming. In nodatacow mode, copy on write is still performed when a given extent is under snapshot. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	b6cda9bcb4	Btrfs: Add mount -o nodatasum to turn off file data checksumming Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	e9906a9849	Fixes for loopback files in btrfs Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	7a7205367d	Btrfs: Fix typo in .. check (thanks Yan) Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	76fea00a05	Btrfs: Add backrefs for symbolic link inodes Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	3954401fa6	Btrfs: Add back pointers from the inode to the directory that references it Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	d8d5f3e16d	Btrfs: Add lowest key information to back refs for extent tree blocks as well. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	7bb86316c3	Btrfs: Add back pointers from extents to the btree or file referencing them Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Yan	9691975dd6	Btrfs: Fix buffer get/release issue in create_snapshot btrfs_cow_block expects a reference to be held on the buffer being cow'd. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Josef Bacik	5103e947b9	xattr support for btrfs Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Chris Mason	3ab2fb5a8c	Btrfs: Add readpages support Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Yan	008630c17c	Properly delete csum item in btrfs_truncate_in_trans. When 'item_end' is equal to 'inode->i_size', 'found_type' is updated and current item is skipped. This behavior is correct for extent item, but incorrect for csum item. For example, there is a csum item with 'offset == 0'. When deleting the inode, 'inode->i_size' is set to 0, so the csum item isn't deleted. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Chris Mason	b293f02e14	Btrfs: Add writepages support Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Chris Mason	179e29e488	Btrfs: Fix a number of inline extent problems that Yan Zheng reported. The fixes do a number of things: 1) Most btrfs_drop_extent callers will try to leave the inline extents in place. It can truncate bytes off the beginning of the inline extent if required. 2) writepage can now update the inline extent, allowing mmap writes to go directly into the inline extent. 3) btrfs_truncate_in_transaction truncates inline extents 4) extent_map.c fixed to not merge inline extent mappings and hole mappings together Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Chris Mason	35ebb934bd	Btrfs: Fix PAGE_CACHE_SHIFT shifts on 32 bit machines Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Yan	689f934661	Fix inline extent handling in btrfs_get_extent 1. Reorder kmap and the test for 'page != NULL' 2. Zero-fill the rest of a block when the inline extent isn't big enough. 3. Do not insert extent_map into the map tree when page == NULL. (If the extent_map is inserted into the map tree, subsequent read requests will find it in the map tree directly and the corresponding inline extent data isn't copied into the page by the get_extent function. extent_read_full_page can't handle that case.) Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Chris Mason	44ec0b7179	Btrfs: Compile fixes for 2.6.24-rc1 Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Yan	134d451201	Fix ENOTEMPTY check in btrfs_rmdir The ENOTEMPTY check in btrfs_rmdir isn't reliable. It's possible that the backward search finds . or .. at first, then some other directory entry. In that case, btrfs_rmdir deletes . or .. improperly. The patch also fixes a fs_mutex unlock issue in btrfs_rmdir. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Yan	0d9f7f3e27	btrfs_inode_by_name return random value. When inode is found, the return value is from the uninitialized variable 'ret'. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Yan	65555a06b4	Btrfs: Off by one fixes in extent_map.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Chris Mason	f578d4bd7e	Btrfs: Optimize csum insertion to create larger items when possible This reduces the number of calls to btrfs_extend_item and greatly lowers the cpu usage while writing large files. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Jens Axboe	bbf0d0062d	Btrfs: KM_IRQ0 usage in end_io handling endio handling is typically called with interrupts disabled, but can also be called with it enabled. So save interrupts before using KM_IRQ0 to be completely safe. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Jens Axboe	ae2f5411c4	btrfs: 32-bit type problems An assorted set of casts to get rid of the warnings on 32-bit archs. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Chris Mason	ff79f8190b	Btrfs: Add back file data checksumming Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Chris Mason	19c00ddcc3	Btrfs: Add back metadata checksumming Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Chris Mason	3326d1b07c	Btrfs: Allow tails larger than one page Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Chris Mason	db94535db7	Btrfs: Allow tree blocks larger than the page size Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Chris Mason	5f39d397df	Btrfs: Create extent_buffer interface for large blocksizes Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Chris Mason	50b78c24d5	btrfs_get_extent should treat inline extents as though they hold a whole block Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Christoph Hellwig	b3cfa35a49	Btrfs: factor page private preparations into a helper Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Yan	8e1cd76664	Btrfs: Fix double free and off by one in inode.c The first change removes a potential double free, the second fixes an off by one error. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Yan	bab9fb035f	Btrfs: truncate: don't update inode->i_blocks when extent is a hole I think checking whether the extent is a hole before updating 'inode->i_blocks' is unconditionally required. (The original code checks it only when del_item isn't equal to 0.) Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Yan	23223584e4	create btrfs_path slab with the correct size Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Yan	a61721d5b7	fix found_type decrement in btrfs_truncate_in_trans found_type has already been decreased by the code above the change, so I think decreasing it by one again doesn't make sense. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Chris Mason	d3c2fdcf7b	Btrfs: Use balance_dirty_pages_nr on btree blocks btrfs_btree_balance_dirty is changed to pass the number of pages dirtied for more accurate dirty throttling. This lets the VM make better decisions about when to force some writeback. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:00:48 -04:00
Christoph Hellwig	d03581f434	split up btrfs_ioctl Add a helper per ioctl function to make the code more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-09-14 10:22:57 -04:00
Christoph Hellwig	34287aa360	Btrfs: use unlocked_ioctl No reason to grab the BKL before calling into the btrfs ioctl code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-09-14 10:22:47 -04:00
Chris Mason	93a6925ec1	Btrfs: Fix extra link count dec in rename Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-09-14 09:42:31 -04:00
Christoph Hellwig	d396c6f554	Btrfs: [PATCH] extent_map: provide generic bmap generic_bmap is completely trivial, while the extent to bh mapping in btrfs is rather complex. So provide a extent_bmap instead that takes a get_extent callback and can be used by filesystem using the extent_map code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-09-10 20:02:30 -04:00
Chris Mason	011410bd85	Btrfs: Add more synchronization before creating a snapshot File data checksums are only done during writepage, so we have to make sure all pages are written when the snapshot is taken. This also adds some locking so that new writes don't race in and add new dirty pages. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-09-10 19:58:36 -04:00
Chris Mason	86479a04ee	Add support for defragging files via btrfsctl -d. Avoid OOM on extent tree defrag. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-09-10 19:58:16 -04:00
Chris Mason	2bf5a725a3	Btrfs: fsx delalloc fixes Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-30 11:54:02 -04:00
Chris Mason	07157aacb1	Btrfs: Add file data csums back in via hooks in the extent map code Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-30 08:50:51 -04:00
Yan	1b4ab1bb4b	Btrfs: Fix mknod to properly send rdev info back to disk Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-29 09:11:44 -04:00
Josef Bacik	58176a9604	Btrfs: Add per-root block accounting and sysfs entries Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-29 15:47:34 -04:00
Chris Mason	b888db2bd7	Btrfs: Add delayed allocation to the extent based page tree code Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-27 16:49:44 -04:00
Chris Mason	a52d9a8033	Btrfs: Extent based page cache code. This uses an rbtree of extents and tests instead of buffer heads. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-27 16:49:44 -04:00
Chris Mason	83df7c1d8b	Btrfs: Make sure to cow the root during a snapshot Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-27 16:49:44 -04:00
Chris Mason	2cc58cf24f	Btrfs: Do more extensive readahead during tree searches Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-27 16:49:44 -04:00
Josef Bacik	15ee9bc7ed	Btrfs: delay commits during fsync to allow more writers Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-10 16:22:09 -04:00
Chris Mason	e9d0b13b5b	Btrfs: Btree defrag on the extent-mapping tree as well Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-10 14:06:19 -04:00
Chris Mason	6702ed490c	Btrfs: Add run time btree defrag, and an ioctl to force btree defrag This adds two types of btree defrag, a run time form that tries to defrag recently allocated blocks in the btree when they are still in ram, and an ioctl that forces defrag of all btree blocks. File data blocks are not defragged yet, but this can make a huge difference in sequential btree reads. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-07 16:15:09 -04:00
Chris Mason	3c69faecb8	Btrfs: Fold some btree readahead routines into something more generic. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-07 15:52:22 -04:00
Chris Mason	92fee66d49	Btrfs: deal with api changes in 2.6.23-rc1 Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-07-25 12:31:35 -04:00
Josef Bacik	618e21d595	Btrfs: Implement mknod Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-07-11 10:18:17 -04:00
Zach Brown	ec6b910fb3	Btrfs: trivial include fixups Almost none of the files including module.h need to do so, remove them. Include sched.h in extent-tree.c to silence a warning about cond_resched() being undeclared. Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-07-11 10:00:37 -04:00
Chris Mason	ccd467d60e	Btrfs: crash recovery fixes Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-28 15:57:36 -04:00
Chris Mason	79c44584ea	Btrfs: Fix mtime and ctime updates on parent dirs Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-25 10:09:33 -04:00
Chris Mason	5eda7b5e9b	Btrfs: Add the ability to find and remove dead roots after a crash. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-22 14:16:25 -04:00
Chris Mason	54aa1f4dfd	Btrfs: Audit callers and return codes to make sure -ENOSPC gets up the stack Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-22 14:16:25 -04:00
Chris Mason	8c2383c3dd	Rework btrfs_file_write to only allocate while page locks are held Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-18 09:57:58 -04:00
Chris Mason	9ebefb180b	Btrfs: patch queue: page_mkwrite Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-15 13:50:00 -04:00
Aneesh	f1ace244c8	btrfs: Code cleanup Attached below are some of the code cleanups that I came across while reading the code. a) alloc_path already calls init_path. b) Mention that btrfs_inode is the in memory copy. Ext4 has ext4_inode_info as the in memory copy and ext4_inode as the disk copy. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-13 16:18:26 -04:00
Chris Mason	340887809d	Btrfs: i386 fixes from axboe Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-12 11:36:58 -04:00
Chris Mason	6cbd557078	Btrfs: add GPLv2 Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-12 09:07:21 -04:00
Chris Mason	8a712645c3	Btrfs: no slashes in subvolume names Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-12 08:21:35 -04:00
Chris Mason	39279cc3d2	Btrfs: split up super.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-12 06:35:45 -04:00

415 Commits