Commit Graph

136 Commits

Author SHA1 Message Date
Tao Ma 555936bfcb ocfs2: Abstract extent split process.
ocfs2_mark_extent_written actually does the following things:
1. check the parameters.
2. initialize the left_path and split_rec.
3. call __ocfs2_mark_extent_written. it will do:
   1) check the flags of unwritten
   2) do the real split work.
The whole process is packed tightly somehow. So this patch
will abstract 2 different functions so that future b-tree
operation can work with it.

1. __ocfs2_split_extent will accept path and split_rec and do
  the real split work.
2. ocfs2_change_extent_flag will accept a new flag and initialize
   path and split_rec.

So now ocfs2_mark_extent_written will do:
1. check the parameters.
2. call ocfs2_change_extent_flag.
   1) initalize the left_path and split_rec.
   2) check whether the new flags conflict with the old one.
   3) call __ocfs2_split_extent to do the split.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 20:09:31 -07:00
Tao Ma 853a3a1439 ocfs2: Wrap ocfs2_extent_contig in ocfs2_extent_tree.
Add a new operation eo_ocfs2_extent_contig int the extent tree's
operations vector. So that with the new refcount tree, We want
this so that refcount trees can always return CONTIG_NONE and
prevent extent merging.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 20:09:30 -07:00
Joel Becker 5e404e9ed1 ocfs2: Pass ocfs2_caching_info into ocfs_init_*_extent_tree().
With this commit, extent tree operations are divorced from inodes and
rely on ocfs2_caching_info.  Phew!

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:13 -07:00
Joel Becker a1cf076ba9 ocfs2: __ocfs2_mark_extent_written() doesn't need struct inode.
We only allow unwritten extents on data, so the toplevel
ocfs2_mark_extent_written() can use an inode all it wants.  But the
subfunction isn't even using the inode argument.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:12 -07:00
Joel Becker f3868d0fa2 ocfs2: Teach ocfs2_replace_extent_rec() to use an extent_tree.
Don't use a struct inode anymore.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:11 -07:00
Joel Becker d231129f44 ocfs2: ocfs2_split_and_insert() no longer needs struct inode.
It already has an extent_tree.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:11 -07:00
Joel Becker dbdcf6a48a ocfs2: ocfs2_remove_extent() no longer needs struct inode.
One more generic btree function that is isolated from struct inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:10 -07:00
Joel Becker cbee7e1a6a ocfs2: ocfs2_add_clusters_in_btree() no longer needs struct inode.
One more function that doesn't need a struct inode to pass to its
children.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:09 -07:00
Joel Becker cc79d8c19e ocfs2: ocfs2_insert_extent() no longer needs struct inode.
One more function down, no inode in the entire insert-extent chain.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:09 -07:00
Joel Becker 92ba470c44 ocfs2: Make extent map insertion an extent_tree_operation.
ocfs2_insert_extent() wants to insert a record into the extent map if
it's an inode data extent.  But since many btrees can call that
function, let's make it an op on ocfs2_extent_tree.  Other tree types
can leave it empty.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:08 -07:00
Joel Becker 627961b77e ocfs2: ocfs2_figure_insert_type() no longer needs struct inode.
It's not using it, so remove it from the parameter list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:08 -07:00
Joel Becker 1ef61b3314 ocfs2: Remove inode from ocfs2_figure_extent_contig().
It already has an ocfs2_extent_tree and doesn't need the inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:07 -07:00
Joel Becker a29702914a ocfs2: Swap inode for extent_tree in ocfs2_figure_merge_contig_type().
We don't want struct inode in generic btree operations.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:07 -07:00
Joel Becker b4a176515c ocfs2: ocfs2_extent_contig() only requires the superblock.
Don't pass the inode in.  We don't want it around for generic btree
operations.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:05 -07:00
Joel Becker 3505bec018 ocfs2: ocfs2_do_insert_extent() and ocfs2_insert_path() no longer need an inode.
They aren't using it, so remove it from their parameter lists.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:05 -07:00
Joel Becker c38e52bb1c ocfs2: Give ocfs2_split_record() an extent_tree instead of an inode.
Another on the way to generic btree functions.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:05 -07:00
Joel Becker d562862314 ocfs2: ocfs2_insert_at_leaf() doesn't need struct inode.
Give it an ocfs2_extent_tree and it is happy.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:04 -07:00
Joel Becker 4c911eefca ocfs2: Make truncating the extent map an extent_tree_operation.
ocfs2_remove_extent() wants to truncate the extent map if it's
truncating an inode data extent.  But since many btrees can call that
function, let's make it an op on ocfs2_extent_tree.  Other tree types
can leave it empty.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:03 -07:00
Joel Becker 043beebb6c ocfs2: ocfs2_truncate_rec() doesn't need struct inode.
It's not using it anymore.  Remove it from the parameter list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:03 -07:00
Joel Becker d401dc12fc ocfs2: ocfs2_grow_branch() and ocfs2_append_rec_to_path() lose struct inode.
ocfs2_grow_branch() not really using it other than to pass it to the
subfunctions ocfs2_shift_tree_depth(), ocfs2_find_branch_target(), and
ocfs2_add_branch().  The first two weren't it either, so they drop the
argument.  ocfs2_add_branch() only passed it to
ocfs2_adjust_rightmost_branch(), which drops the inode argument and uses
the ocfs2_extent_tree as well.

ocfs2_append_rec_to_path() can be take an ocfs2_extent_tree instead of
the inode.  The function ocfs2_adjust_rightmost_records() goes along for
the ride.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:02 -07:00
Joel Becker c495dd24ac ocfs2: ocfs2_try_to_merge_extent() doesn't need struct inode.
It's not using it, so remove it from the parameter list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:02 -07:00
Joel Becker 4fe82c312a ocfs2: ocfs2_merge_rec_left/right() no longer need struct inode.
Drop it from the parameters - they already have ocfs2_extent_list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:01 -07:00
Joel Becker 70f18c08b4 ocfs2: ocfs2_rotate_tree_left() no longer needs struct inode.
It already gets ocfs2_extent_tree, so we can just use that.  This chains
to the same modification for ocfs2_remove_rightmost_path() and
ocfs2_rotate_rightmost_leaf_left().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:00 -07:00
Joel Becker e46f74dc35 ocfs2: __ocfs2_rotate_tree_left() doesn't need struct inode.
It already has struct ocfs2_extent_tree, which has the caching info.  So
we don't need to pass it struct inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:59 -07:00
Joel Becker 1e2dd63fe0 ocfs2: ocfs2_rotate_subtree_left() doesn't need struct inode.
It already has struct ocfs2_extent_tree, which has the caching info.  So
we don't need to pass it struct inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:59 -07:00
Joel Becker 09106bae05 ocfs2: ocfs2_update_edge_lengths() doesn't need struct inode.
Pass in the extent tree, which is all we need.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:58 -07:00
Joel Becker 1bbf0b8d60 ocfs2: ocfs2_rotate_tree_right() doesn't need struct inode.
We don't need struct inode in ocfs2_rotate_tree_right() anymore.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:58 -07:00
Joel Becker 6136ca5f5f ocfs2: Drop struct inode from ocfs2_extent_tree_operations.
We can get to the inode from the caching information.  Other parent
types don't need it.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:57 -07:00
Joel Becker 7dc0280567 ocfs2: Pass ocfs2_extent_tree to ocfs2_get_subtree_root()
Get rid of the inode argument.  Use extent_tree instead.  This means a
few more functions have to pass an extent_tree around.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:55 -07:00
Joel Becker 5c601aba8c ocfs2: Get inode out of ocfs2_rotate_subtree_root_right().
Pass the ocfs2_extent_list down through ocfs2_rotate_tree_right() and
get rid of struct inode in ocfs2_rotate_subtree_root_right().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:55 -07:00
Joel Becker 4619c73e7c ocfs2: ocfs2_complete_edge_insert() doesn't need struct inode at all.
Completely unused argument.  Get rid of it.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:54 -07:00
Joel Becker 6641b0ce32 ocfs2: Pass ocfs2_extent_tree to ocfs2_unlink_path()
ocfs2_unlink_path() doesn't need struct inode, so let's pass it struct
ocfs2_extent_tree.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:54 -07:00
Joel Becker 42a5a7a9a5 ocfs2: ocfs2_create_new_meta_bhs() doesn't need struct inode.
Pass struct ocfs2_extent_tree into ocfs2_create_new_meta_bhs().  It no
longer needs struct inode or ocfs2_super.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:53 -07:00
Joel Becker facdb77f54 ocfs2: ocfs2_find_path() only needs the caching info
ocfs2_find_path and ocfs2_find_leaf() walk our btrees, reading extent
blocks.  They need struct ocfs2_caching_info for that, but not struct
inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:53 -07:00
Joel Becker 3d03a305de ocfs2: Pass ocfs2_caching_info to ocfs2_read_extent_block().
extent blocks belong to btrees on more than just inodes, so we want to
pass the ocfs2_caching_info structure directly to
ocfs2_read_extent_block().  A number of places in alloc.c can now drop
struct inode from their argument list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:52 -07:00
Joel Becker d9a0a1f83b ocfs2: Store the ocfs2_caching_info on ocfs2_extent_tree.
What do we cache?  Metadata blocks.  What are most of our non-inode metadata
blocks?  Extent blocks for our btrees.  struct ocfs2_extent_tree is the
main structure for managing those.  So let's store the associated
ocfs2_caching_info there.

This means that ocfs2_et_root_journal_access() doesn't need struct inode
anymore, and any place that has an et can refer to et->et_ci instead of
INODE_CACHE(inode).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:51 -07:00
Joel Becker 0cf2f7632b ocfs2: Pass struct ocfs2_caching_info to the journal functions.
The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions.  Thus the
journal locks a metadata cache with the cache io_lock function.  It also
can compare ci_last_trans and ci_created_trans directly.

This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:50 -07:00
Joel Becker 8cb471e8f8 ocfs2: Take the inode out of the metadata read/write paths.
We are really passing the inode into the ocfs2_read/write_blocks()
functions to get at the metadata cache.  This commit passes the cache
directly into the metadata block functions, divorcing them from the
inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:48 -07:00
Tao Ma 60e2ec4866 ocfs2: release the buffer head in ocfs2_do_truncate.
In ocfs2_do_truncate, we forget to release last_eb_bh which
will cause memleak. So call brelse in the end.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-08-17 12:50:35 -07:00
Tao Ma 82e12644cf ocfs2: Use ocfs2_rec_clusters in ocfs2_adjust_adjacent_records.
In ocfs2_adjust_adjacent_records, we will adjust adjacent records
according to the extent_list in the lower level. But actually
the lower level tree will either be a leaf or a branch. If we only
use ocfs2_is_empty_extent we will meet with some problem if the lower
tree is a branch (tree_depth > 1). So use !ocfs2_rec_clusters instead.
And actually only the leaf record can have holes. So add a BUG_ON
for non-leaf branch.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23 10:58:46 -07:00
Tao Ma 3c5e10683e ocfs2: Add extra credits and access the modified bh in update_edge_lengths.
In normal tree rotation left process, we will never touch the tree
branch above subtree_index and ocfs2_extend_rotate_transaction doesn't
reserve the credits for them either.

But when we want to delete the rightmost extent block, we have to update
the rightmost records for all the rightmost branch(See
ocfs2_update_edge_lengths), so we have to allocate extra credits for them.
What's more, we have to access them also.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-21 14:41:54 -07:00
Tao Ma 6b791bcc8b ocfs2: Adjust rightmost path in ocfs2_add_branch.
In ocfs2_add_branch, we use the rightmost rec of the leaf extent block
to generate the e_cpos for the newly added branch. In the most case, it
is OK but if the parent extent block's rightmost rec covers more clusters
than the leaf does, it will cause kernel panic if we insert some clusters
in it. The message is something like:
(7445,1):ocfs2_insert_at_leaf:3775 ERROR: bug expression:
le16_to_cpu(el->l_next_free_rec) >= le16_to_cpu(el->l_count)
(7445,1):ocfs2_insert_at_leaf:3775 ERROR: inode 66053, depth 0, count 28,
next free 28, rec.cpos 270, rec.clusters 1, insert.cpos 275, insert.clusters 1
 [<fa7ad565>] ? ocfs2_do_insert_extent+0xb58/0xda0 [ocfs2]
 [<fa7b08f2>] ? ocfs2_insert_extent+0x5bd/0x6ba [ocfs2]
 [<fa7b1b8b>] ? ocfs2_add_clusters_in_btree+0x37f/0x564 [ocfs2]
...

The panic can be easily reproduced by the following small test case
(with bs=512, cs=4K, and I remove all the error handling so that it looks
clear enough for reading).

int main(int argc, char **argv)
{
	int fd, i;
	char buf[5] = "test";

	fd = open(argv[1], O_RDWR|O_CREAT);

	for (i = 0; i < 30; i++) {
		lseek(fd, 40960 * i, SEEK_SET);
		write(fd, buf, 5);
	}

	ftruncate(fd, 1146880);

	lseek(fd, 1126400, SEEK_SET);
	write(fd, buf, 5);

	close(fd);

	return 0;
}

The reason of the panic is that:
the 30 writes and the ftruncate makes the file's extent list looks like:

	Tree Depth: 1   Count: 19   Next Free Rec: 1
	## Offset        Clusters       Block#
	0  0             280            86183
	SubAlloc Bit: 7   SubAlloc Slot: 0
	Blknum: 86183   Next Leaf: 0
	CRC32: 00000000   ECC: 0000
	Tree Depth: 0   Count: 28   Next Free Rec: 28
	## Offset        Clusters       Block#          Flags
	0  0             1              143368          0x0
	1  10            1              143376          0x0
	...
	26 260           1              143576          0x0
	27 270           1              143584          0x0

Now another write at 1126400(275 cluster) whiich will write at the gap
between 271 and 280 will trigger ocfs2_add_branch, but the result after
the function looks like:
	Tree Depth: 1   Count: 19   Next Free Rec: 2
	## Offset        Clusters       Block#
	0  0             280            86183
	1  271           0             143592
So the extent record is intersected and make the following operation bug out.

This patch just try to remove the gap before we add the new branch, so that
the root(branch) rightmost rec will cover the same right position. So in the
above case, before adding branch the tree will be changed to
	Tree Depth: 1   Count: 19   Next Free Rec: 1
	## Offset        Clusters       Block#
	0  0             271            86183
	SubAlloc Bit: 7   SubAlloc Slot: 0
	Blknum: 86183   Next Leaf: 0
	CRC32: 00000000   ECC: 0000
	Tree Depth: 0   Count: 28   Next Free Rec: 28
	## Offset        Clusters       Block#          Flags
	0  0             1              143368          0x0
	1  10            1              143376          0x0
	...
	26 260           1              143576          0x0
	27 270           1              143584          0x0
And after branch add, the tree looks like
	Tree Depth: 1   Count: 19   Next Free Rec: 2
	## Offset        Clusters       Block#
	0  0             271            86183
	1  271           0             143592

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-15 14:49:43 -07:00
Mark Fasheh 9b7895efac ocfs2: Add a name indexed b-tree to directory inodes
This patch makes use of Ocfs2's flexible btree code to add an additional
tree to directory inodes. The new tree stores an array of small,
fixed-length records in each leaf block. Each record stores a hash value,
and pointer to a block in the traditional (unindexed) directory tree where a
dirent with the given name hash resides. Lookup exclusively uses this tree
to find dirents, thus providing us with constant time name lookups.

Some of the hashing code was copied from ext3. Unfortunately, it has lots of
unfixed checkpatch errors. I left that as-is so that tracking changes would
be easier.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
2009-04-03 11:39:15 -07:00
Tao Ma 74e77eb30d ocfs2: Fix a bug found by sparse check.
We need to use le32_to_cpu to test rec->e_cpos in
ocfs2_dinode_insert_check.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-03-12 16:46:01 -07:00
Tao Ma 47be12e4ee ocfs2: Access and dirty the buffer_head in mark_written.
In __ocfs2_mark_extent_written, when we meet with the situation
of c_split_covers_rec, the old solution just replace the extent
record and forget to access and dirty the buffer_head. This will
cause a problem when the unwritten extent is in an extent block.
So access and dirty it.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-26 11:51:09 -08:00
Mark Fasheh fd4ef23196 ocfs2: add quota call to ocfs2_remove_btree_range()
We weren't reclaiming the clusters which get free'd from this function,
so any user punching holes in a file would still have those bytes accounted
against him/her. Add the call to vfs_dq_free_space_nodirty() to fix this.
Interestingly enough, the journal credits calculation already took this into
account.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Jan Kara <jack@suse.cz>
2009-02-02 14:20:20 -08:00
Fernando Carrijo c19a28e119 remove lots of double-semicolons
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:14 -08:00
Tao Ma 9047beabb8 ocfs2: Access the right buffer_head in ocfs2_merge_rec_left.
In commit "ocfs2: Use metadata-specific ocfs2_journal_access_*()
functions", the wrong buffer_head is accessed. So change it
to the right buffer_head.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:37 -08:00
Joel Becker 2a50a743bd ocfs2: Create ocfs2_xattr_value_buf.
When an ocfs2 extended attribute is large enough to require its own
allocation tree, we root it with an ocfs2_xattr_value_root.  However,
these roots can be a part of inodes, xattr blocks, or xattr buckets.
Thus, they need a different journal access function for each container.

We wrap the bh, its journal access function, and the value root (xv) in
a structure called ocfs2_xattr_valu_buf.  This is a package that can
be passed around.  In this first pass, we simply pass it to the
extent tree code.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:32 -08:00
Joel Becker 13723d00e3 ocfs2: Use metadata-specific ocfs2_journal_access_*() functions.
The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
commit triggers and allow us to compute metadata ecc right before the
buffers are written out.  This commit provides ecc for inodes, extent
blocks, group descriptors, and quota blocks.  It is not safe to use
extened attributes and metaecc at the same time yet.

The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
the type of block at their root.  Before, it didn't matter, but now the
root block must use the appropriate ocfs2_journal_access_*() function.
To keep this abstract, the structures now have a pointer to the matching
journal_access function and a wrapper call to call it.

A few places use naked ocfs2_write_block() calls instead of adding the
blocks to the journal.  We make sure to calculate their checksum and ecc
before the write.

Since we pass around the journal_access functions.  Let's typedef them
in ocfs2.h.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:32 -08:00