Commit Graph

235 Commits

Author SHA1 Message Date
Steven Whitehouse 6dbd822487 [GFS2] Reduce inode size by moving i_alloc out of line
It is possible to reduce the size of GFS2 inodes by taking the i_alloc
structure out of the gfs2_inode. This patch allocates the i_alloc
structure whenever its needed, and frees it afterward. This decreases
the amount of low memory we use at the expense of requiring a memory
allocation for each page or partial page that we write. A quick test
with postmark shows that the overhead is not measurable and I also note
that OCFS2 use the same approach.

In the future I'd like to solve the problem by shrinking down the size
of the members of the i_alloc structure, but for now, this reduces the
immediate problem of using too much low-memory on x86 and doesn't add
too much overhead.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:18:25 +00:00
Steven Whitehouse 65a6290998 [GFS2] Remove unused variable
The go_drop_th function is never called or referenced.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:16:19 +00:00
Bob Peterson c3f60b6e3a [GFS2] Eliminate the no longer needed sd_statfs_mutex
This patch eliminates the unneeded sd_statfs_mutex mutex but preserves
the ordering as discussed.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:15:16 +00:00
Bob Peterson da6dd40d59 [GFS2] Journal extent mapping
This patch saves a little time when gfs2 writes to the journals by
keeping a mapping between logical and physical blocks on disk.
That's better than constantly looking up indirect pointers in
buffers, when the journals are several levels of indirection
(which they typically are).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:11:46 +00:00
Steven Whitehouse e35b921185 [GFS2] Don't periodically update the jindex
We only care about the content of the jindex in two cases,
one is when we mount the fs and the other is when we need
to recover another journal. In both cases we have to update
the jindex anyway, so there is no point in updating it
periodically between times, so this removes it to simplify
gfs2_logd.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:59 +00:00
Steven Whitehouse fd041f0b40 [GFS2] Use atomic_t for journal free blocks counter
This patch changes the counter which keeps track of the free
blocks in the journal to an atomic_t in preparation for the
following patch which will update the log reservation code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:54 +00:00
Steven Whitehouse 2bcd610d2f [GFS2] Don't add glocks to the journal
The only reason for adding glocks to the journal was to keep track
of which locks required a log flush prior to release. We add a
flag to the glock to allow this check to be made in a simpler way.

This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64)
and means that we can avoid extra work during the journal flush.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:52 +00:00
Steven Whitehouse e589665eb9 [GFS2] Remove flags no longer required
The HIF_MUTEX and HIF_PROMOTE flags were set on the glock holders
depending upon which of the two waiters lists they were going to
be queued upon. They were then tested when the holders were taken
off the lists to ensure that the right type of holder was being
dequeued.

Since we are already using separate lists, there doesn't seem a
lot of point having these flags as well, and since setting them
and testing them is in the fast path for locking and unlocking
glock, this patch removes them.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:44 +00:00
Steven Whitehouse c2932e03db [GFS2] Remove "reclaim limit"
This call to reclaim glocks is not needed, and in particular we don't want it
in the fast path for locking glocks. The limit was entirely arbitrary anyway
and we can't expect users to adjust things like this, the remaining code will
do the right thing on its own.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:37 +00:00
Steven Whitehouse 60b0d08779 [GFS2] Remove unused variables
These haven't been used for some time, remove them.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:35 +00:00
Steven Whitehouse bf36a71316 [GFS2] Add gfs2_is_writeback()
This adds a function "gfs2_is_writeback()" along the lines of the
existing "gfs2_is_jdata()" in order to clean up the code and make
the various tests for the inode mode more obvious. It also fixes
the PageChecked() logic where we were resetting the flag too early
in the case of an error path.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:21 +00:00
Steven Whitehouse e7e36f1435 [GFS2] Remove unused field in struct gfs2_inode
Removes a field that is not used.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:18 +00:00
Steven Whitehouse f91a0d3e24 [GFS2] Remove useless i_cache from inodes
The i_cache was designed to keep references to the indirect blocks
used during block mapping so that they didn't have to be looked
up continually. The idea failed because there are too many places
where the i_cache needs to be freed, and this has in the past been
the cause of many bugs.

In addition there was no performance benefit being gained since the
disk blocks in question were cached anyway. So this patch removes
it in order to simplify the code to prepare for other changes which
would otherwise have had to add further support for this feature.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:16 +00:00
Steven Whitehouse 3cc3f710ce [GFS2] Use ->page_mkwrite() for mmap()
This cleans up the mmap() code path for GFS2 by implementing the
page_mkwrite function for GFS2. We are thus able to use the
generic filemap_fault function for our ->fault() implementation.

This now means that shared writable mappings will be much more
efficiently shared across the cluster if there is a reasonable
proportion of read activity (the greater proportion, the better).

As a side effect, it also reduces the size of the code, removes
special cases from readpage and readpages, and makes the code
path easier to follow.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:13 +00:00
Wendy Cheng cc7e79b168 [GFS2] Handle multiple glock demote requests
Fix a race condition where multiple glock demote requests are sent to
a node back-to-back. This patch does a check inside handle_callback()
to see whether a demote request is in progress. If true, it sets a flag
to make sure run_queue() will loop again to handle the new request,
instead of erronously setting gl_demote_state to a different state.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:09 +00:00
Steven Whitehouse 16615be18c [GFS2] Clean up journaled data writing
This patch cleans up the code for writing journaled data into the log.
It also removes the need to allocate a small "tag" structure for each
block written into the log. Instead we just keep count of the outstanding
I/O so that we can be sure that its all been written at the correct time.
Another result of this patch is that a number of ll_rw_block() calls
have become submit_bh() calls, closing some races at the same time.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:56:24 +01:00
Steven Whitehouse 82e86087bb [GFS2] Replace revoke structure with bufdata structure
Both the revoke structure and the bufdata structure are quite similar.
They are basically small tags which are put on lists. In addition to
which the revoke structure is always allocated when there is a bufdata
structure which is (or can be) freed. As such it should be possible to
reduce the number of frees and allocations by using the same structure
for both purposes.

This patch is the first step along that path. It replaces existing uses
of the revoke structure with the bufdata structure.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:56:07 +01:00
Steven Whitehouse d7b616e252 [GFS2] Clean up ordered write code
The following patch removes the ordered write processing from
databuf_lo_before_commit() and moves it to log.c. This has the effect of
greatly simplyfying databuf_lo_before_commit() and well as potentially
making the ordered write code more efficient.

As a side effect of this, its now possible to remove ordered buffers
from the ordered buffer list at any time, so we now make use of this in
invalidatepage and releasepage to ensure timely release of these
buffers.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:56:03 +01:00
Benjamin Marzinski c4f68a130f [GFS2] delay glock demote for a minimum hold time
When a lot of IO, with some distributed mmap IO, is run on a GFS2 filesystem in
a cluster, it will deadlock. The reason is that do_no_page() will repeatedly
call gfs2_sharewrite_nopage(), because each node keeps giving up the glock
too early, and is forced to call unmap_mapping_range(). This bumps the
mapping->truncate_count sequence count, forcing do_no_page() to retry. This
patch institutes a minimum glock hold time a tenth a second.  This insures
that even in heavy contention cases, the node has enough time to get some
useful work done before it gives up the glock.

A second issue is that when gfs2_glock_dq() is called from within a page fault
to demote a lock, and the associated page needs to be written out, it will
try to acqire a lock on it, but it has already been locked at a higher level.
This patch puts makes gfs2_glock_dq() use the work queue as well, to avoid this
issue. This is the same patch as Steve Whitehouse originally proposed to fix
this issue, execpt that gfs2_glock_dq() now grabs a reference to the glock
before it queues up the work on it.

Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:48 +01:00
Steven Whitehouse 8fbbfd214c [GFS2] Reduce number of gfs2_scand processes to one
We only need a single gfs2_scand process rather than the one
per filesystem which we had previously. As a result the parameter
determining the frequency of gfs2_scand runs becomes a module
parameter rather than a mount parameter as it was before.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:08 +01:00
Robert Peterson 2332c4435b [GFS2] assertion failure after writing to journaled file, umount
This patch passes all my nasty tests that were causing the code to
fail under one circumstance or another.  Here is a complete summary
of all changes from today's git tree, in order of appearance:

1. There are now separate variables for metadata buffer accounting.
2. Variable sd_log_num_hdrs is no longer needed, since the header
   accounting is taken care of by the reserve/refund sequence.
3. Fixed a tiny grammatical problem in a comment.
4. Added a new function "calc_reserved" to calculate the reserved
   log space.  This isn't entirely necessary, but it has two benefits:
   First, it simplifies the gfs2_log_refund function greatly.
   Second, it allows for easier debugging because I could sprinkle the
   code with calls to this function to make sure the accounting is
   proper (by adding asserts and printks) at strategic point of the code.
5. In log_pull_tail there apparently was a kludge to fix up the
   accounting based on a "pull" parameter.  The buffer accounting is
   now done properly, so the kludge was removed.
6. File sync operations were making a call to gfs2_log_flush that
   writes another journal header.  Since that header was unplanned
   for (reserved) by the reserve/refund sequence, the free space had
   to be decremented so that when log_pull_tail gets called, the free
   space is be adjusted properly.  (Did I hear you call that a kludge?
   well, maybe, but a lot more justifiable than the one I removed).
7. In the gfs2_log_shutdown code, it optionally syncs the log by
   specifying the PULL parameter to log_write_header.  I'm not sure
   this is necessary anymore.  It just seems to me there could be
   cases where shutdown is called while there are outstanding log
   buffers.
8. In the (data)buf_lo_before_commit functions, I changed some offset
   values from being calculated on the fly to being constants.	That
   simplified some code and we might as well let the compiler do the
   calculation once rather than redoing those cycles at run time.
9. This version has my rewritten databuf_lo_add function.
   This version is much more like its predecessor, buf_lo_add, which
   makes it easier to understand.  Again, this might not be necessary,
   but it seems as if this one works as well as the previous one,
   maybe even better, so I decided to leave it in.
10. In databuf_lo_before_commit, a previous data corruption problem
   was caused by going off the end of the buffer.  The proper solution
   is to have the proper limit in place, rather than stopping earlier.
   (Thus my previous attempt to fix it is wrong).
   If you don't wrap the buffer, you're stopping too early and that
   causes more log buffer accounting problems.
11. In lops.h there are two new (previously mentioned) constants for
   figuring out the data offset for the journal buffers.
12. There are also two new functions, buf_limit and databuf_limit to
   calculate how many entries will fit in the buffer.
13. In function gfs2_meta_wipe, it needs to distinguish between pinned
   metadata buffers and journaled data buffers for proper journal buffer
   accounting.	It can't use the JDATA gfs2_inode flag because it's
   sometimes passed the "real" inode and sometimes the "metadata
   inode" and the inode flags will be random bits in a metadata
   gfs2_inode.	It needs to base its decision on which was passed in.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:47 +01:00
Steven Whitehouse c8cdf47937 [GFS2] Recovery for lost unlinked inodes
Under certain circumstances its possible (though rather unlikely) that
inodes which were unlinked by one node while still open on another might
get "lost" in the sense that they don't get deallocated if the node
which held the inode open crashed before it was unlinked.

This patch adds the recovery code which allows automatic deallocation of
the inode if its found during block allocation (the sensible time to
look for such inodes since we are scanning the rgrp's bitmaps anyway at
this time, so it adds no overhead to do this).

Since the inode will have had its i_nlink set to zero, all we need to
trigger recovery is a lookup and an iput(), and the normal deallocation
code takes care of the rest.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:26 +01:00
Steven Whitehouse bb8d8a6f54 [GFS2] Fix sign problem in quota/statfs and cleanup _host structures
This patch fixes some sign issues which were accidentally introduced
into the quota & statfs code during the endianess annotation process.
Also included is a general clean up which moves all of the _host
structures out of gfs2_ondisk.h (where they should not have been to
start with) and into the places where they are actually used (often only
one place). Also those _host structures which are not required any more
are removed entirely (which is the eventual plan for all of them).

The conversion routines from ondisk.c are also moved into the places
where they are actually used, which for almost every one, was just one
single place, so all those are now static functions. This also cleans up
the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.

The net result is a reduction of about 100 lines of code, many functions
now marked static plus the bug fixes as mentioned above. For good
measure I ran the code through sparse after making these changes to
check that there are no warnings generated.

This fixes Red Hat bz #239686

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:10 +01:00
Abhijith Das 2a87ab0806 [GFS2] Quotas non-functional - fix bug
This patch fixes an error in the quota code where a 'struct
gfs2_quota_lvb*' was being passed to gfs2_adjust_quota() instead of a
'struct gfs2_quota_data*'. Also moved 'struct gfs2_quota_lvb' from
fs/gfs2/incore.h to include/linux/gfs2_ondisk.h as per Steve's suggestion.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:26 +01:00
Steven Whitehouse dbb7cae2a3 [GFS2] Clean up inode number handling
This patch cleans up the inode number handling code. The main difference
is that instead of looking up the inodes using a struct gfs2_inum_host
we now use just the no_addr member of this structure. The tests relating
to no_formal_ino can then be done by the calling code. This has
advantages in that we want to do different things in different code
paths if the no_formal_ino doesn't match. In the NFS patch we want to
return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
no_formal_ino doesn't match and thus we can withdraw in this case.

In order to later fix bz #201012, we need to be able to look up an inode
without knowing no_formal_ino, as the only information that is known to
us is the on-disk location of the inode in question.

This patch will also help us to fix bz #236099 at a later date by
cleaning up a lot of the code in that area.

There are no user visible changes as a result of this patch and there
are no changes to the on-disk format either.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:24 +01:00
Robert Peterson 5f8820960c [GFS2] lockdump improvements
The patch below consists of the following changes (in code order):

1. I fixed a minor compiler warning regarding the printing of
   a kernel symbol address.
2. I implemented a suggestion from Dave Teigland that moves
   the debugfs information for gfs2 into a subdirectory so
   we can easily expand our use of debugfs in the future.
   The current code keeps the glock information in:
   /debug/gfs2/<fs>
   With the patch, the new code keeps the glock information in:
   /debug/gfs2/<fs>/glock
   That will allow us to create more debugfs files in the future.
3. This fixes a bug whereby a failed mount attempt causes the
   debugfs file to not be deleted.  Failed mount attempts should
   always clean up after themselves, including deleting the
   debugfs file and/or directory.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:33 +01:00
Robert Peterson 04b933f27b [GFS2] Red Hat bz 228540: owner references
In Testing the previously posted and accepted patch for
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228540
I uncovered some gfs2 badness.  It turns out that the current
gfs2 code saves off a process pointer when glocks is taken
in both the glock and glock holder structures.  Those
structures will persist in memory long after the process has
ended; pointers to poisoned memory.

This problem isn't caused by the 228540 fix; the new capability
introduced by the fix just uncovered the problem.

I wrote this patch that avoids saving process pointers
and instead saves off the process pid.  Rather than
referencing the bad pointers, it now does process lookups.
There is special code that makes the output nicer for
printing holder information for processes that have ended.

This patch also adds a stub for the new "sprint_symbol"
function that exists in Andrew Morton's -mm patch set, but
won't go into the base kernel until 2.6.22, since it adds
functionality but doesn't fix a bug.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:55 +01:00
Steven Whitehouse 3b8249f617 [GFS2] Fix bz 224480 and cleanup glock demotion code
This patch prevents the printing of a warning message in cases where
the fs is functioning normally by handing off responsibility for
unlinked, but still open inodes, to another node for eventual deallocation.
Also, there is now an improved system for ensuring that such requests
to other nodes do not get lost. The callback on the iopen lock is
only ever called when i_nlink == 0 and when a node is unable to deallocate
it due to it still being in use on another node. When a node receives
the callback therefore, it knows that i_nlink must be zero, so we mark
it as such (in gfs2_drop_inode) in order that it will then attempt
deallocation of the inode itself.

As an additional benefit, queuing a demote request no longer requires
a memory allocation. This simplifies the code for dealing with gfs2_holders
as it removes one special case.

There are two new fields in struct gfs2_glock. gl_demote_state is the
state which the remote node has requested and gl_demote_time is the
time when the request came in. Both fields are only valid when the
GLF_DEMOTE flag is set in gl_flags.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:39 +01:00
Robert Peterson 7c52b166c5 [GFS2] Add gfs2_tool lockdump support to gfs2 (bz 228540)
The attached patch resolves bz 228540.  This adds the capability
for gfs2 to dump gfs2 locks through the debugfs file system.
This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from
gfs2 because all the ioctls were stripped out.  Please see the bugzilla
for more history about the fix.  This patch is also attached to the bugzilla
record.

The patch is against Steve Whitehouse's latest nmw git tree kernel
(2.6.21-rc1) and has been tested on system trin-10.

Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:29 +01:00
Steven Whitehouse 631c42e170 [GFS2] go_drop_bh is never used, so remove it
The ->go_drop_bh function is never used, so this removes it and the single
caller,

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07 14:02:53 -05:00
Steven Whitehouse 04b159b132 [GFS2] Remove unused variable
Remove an unused variable.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07 14:02:30 -05:00
Steven Whitehouse b5d32bead1 [GFS2] Tidy up glops calls
This patch doesn't make any changes to the ordering of the various
operations related to glocking, but it does tidy up the calls to the
glops.c functions to make the structure more obvious.

The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be
made static within glock.c since they are called by every set of glock
operations. The xmote_th and drop_th glock operations are then made
conditional upon those two routines existing and called from the
previously mentioned functions in glock.c respectively.

Also it can be seen that the go_sync operation isn't needed since it can
easily be replaced by calls to xmote_bh and drop_bh respectively. This
results in no longer (confusingly) calling back into routines in glock.c
from glops.c and also reducing the glock operations by one member.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:26 -05:00
Steven Whitehouse 6bd9c8c2fb [GFS2] Remove unused go_callback operation
This is never used, so we might as well remove it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:17 -05:00
Steven Whitehouse e5dab552c8 [GFS2] Remove the "greedy" function from glock.[ch]
The "greedy" code was an attempt to retain glocks for a minimum length
of time when they relate to mmap()ed files. The current implementation
of this feature is not, however, ideal in that it required allocating
memory in order to do this and its overly complicated.

It also misses the mark by ignoring the other I/O operations which are
just as likely to suffer from the same problem. So the plan is to remove
this now and then add the functionality back as part of the glock state
machine at a later date (and thus take into account all the possible
users of this feature)

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:14 -05:00
Steven Whitehouse fee852e374 [GFS2] Shrink gfs2_inode memory by half
Here is something I spotted (while looking for something entirely
different) the other day.

Rather than using a completion in each and every struct gfs2_holder,
this removes it in favour of hashed wait queues, thus saving a
considerable amount of memory both on the stack (where a number of
gfs2_holder structures are allocated) and in particular in the
gfs2_inode which has 8 gfs2_holder structures embedded within it.

As a result on x86_64 the gfs2_inode shrinks from 2488 bytes to
1912 bytes, a saving of 576 bytes per inode (no thats not a typo!).
In actual practice we get a much better result than that since
now that a gfs2_inode is under the 2048 byte barrier, we get two
per 4k slab page effectively halving the amount of memory required
to store gfs2_inodes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:11 -05:00
Steven Whitehouse 330005c2b2 [GFS2] Remove max_atomic_write tunable
This removes an unused sysfs tunable parameter.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:08 -05:00
Steven Whitehouse 3699e3a44b [GFS2] Clean up/speed up readdir
This removes the extra filldir callback which gfs2 was using to
enclose an attempt at readahead for inodes during readdir. The
code was too complicated and also hurts performance badly in the
case that the getdents64/readdir call isn't being followed by
stat() and it wasn't even getting it right all the time when it
was.

As a result, on my test box an "ls" of a directory containing 250000
files fell from about 7mins (freshly mounted, so nothing cached) to
between about 15 to 25 seconds. When the directory content was cached,
the time taken fell from about 3mins to about 4 or 5 seconds.

Interestingly in the cached case, running "ls -l" once reduced the time
taken for subsequent runs of "ls" to about 6 secs even without this
patch. Now it turns out that there was a special case of glocks being
used for prefetching the metadata, but because of the timeouts for these
locks (set to 10 secs) the metadata was being timed out before it was
being used and this the prefetch code was constantly trying to prefetch
the same data over and over.

Calling "ls -l" meant that the inodes were brought into memory and once
the inodes are cached, the glocks are not disposed of until the inodes
are pushed out of the cache, thus extending the lifetime of the glocks,
and thus bringing down the time for subsequent runs of "ls"
considerably.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:04 -05:00
Steven Whitehouse 1a14d3a68f [GFS2] Simplify glops functions
The go_sync callback took two flags, but one of them was set on every
call, so this patch removes once of the flags and makes the previously
conditional operations (on this flag), unconditional.

The go_inval callback took three flags, each of which was set on every
call to it. This patch removes the flags and makes the operations
unconditional, which makes the logic rather more obvious.

Two now unused flags are also removed from incore.h.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:36:30 -05:00
Steven Whitehouse bfded27ba0 [GFS2] Shrink gfs2_inode (8) - i_vn
This shrinks the size of the gfs2_inode by 8 bytes by
replacing the version counter with a one bit valid/invalid
flag.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:30 -05:00
Al Viro bd209cc017 [GFS2] split and annotate gfs2_statfs_change
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:38 -05:00
Al Viro 629a21e7ec [GFS2] split and annotate gfs2_inum
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:32 -05:00
Al Viro 1e81c4c3e0 [GFS2] split and annotate gfs_rindex
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:29 -05:00
Al Viro 5516762261 [GFS2] split and annotate gfs2_log_head
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:14 -05:00
Al Viro 68826664d1 [GFS2] split and annotate gfs2_rgrp
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:07 -05:00
Al Viro f50dfaf78c [GFS2] split gfs2_sb
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:00 -05:00
Al Viro 5c6edb576f [GFS2] gfs2_dinode_host fields are host-endian
Annotated scalar fields, dropped unused ones.  Note that
it's not at all obvious that we want to convert all of them
to host-endian...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:32:55 -05:00
Steven Whitehouse 7276b3b0c7 [GFS2] Tidy up meta_io code
Fix a bug in the directory reading code, where we might have dereferenced
a NULL pointer in case of OOM. Updated the directory code to use the new
& improved version of gfs2_meta_ra() which now returns the first block
that was being read. Previously it was releasing it requiring following
code to grab the block again at each point it was called.

Also turned off readahead on directory lookups since we are reading a
hash table, and therefore reading the entries in order is very
unlikely. Readahead is still used for all other calls to the
directory reading function (e.g. when growing the hash table).

Removed the DIO_START constant. Everywhere this was used, it was
used to unconditionally start i/o aside from a couple of places, so
I've removed it and made the couple of exceptions to this rule into
separate functions.

Also hunted through the other DIO flags and removed them as arguments
from functions which were always called with the same combination of
arguments.

Updated gfs2_meta_indirect_buffer to be a bit more efficient and
hopefully also be a bit easier to read.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-21 17:05:23 -04:00
Steven Whitehouse 56965536b8 [GFS2] Remove unused constants
Three of the DIO constants were not being used, so remove them.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-20 15:48:09 -04:00
Steven Whitehouse 16feb9fec0 [GFS2] Use atomic_t rather than kref in glock.c
Use atomic_t as the ref count in glocks rather than a kref.
This is another step towards using RCU for the glock hash.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-13 10:43:37 -04:00
Steven Whitehouse b6397893a5 [GFS2] Use hlist for glock hash chains
This results in smaller list heads, so that we can have more chains
in the same amount of memory (twice as many). I've multiplied the
size of the table by four though - this is because we are saving
memory by not having one lock per chain any more. So we land up
using about the same amount of memory for the hash table as we
did before I started these changes, the difference being that we
now have four times as many hash chains.

The reason that I say "about the same amount of memory" is that the
actual amount now depends upon the NR_CPUS and some of the config
variables, so that its not exact and in some cases we do use more
memory. Eventually we might want to scale the hash table size
according to the size of physical ram as measured on module load.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-12 10:10:01 -04:00
Steven Whitehouse 2426443460 [GFS2] Rewrite of examine_bucket()
The existing implementation of this function in glock.c was not
very efficient as it relied upon keeping a cursor element upon the
hash chain in question and moving it along. This new version improves
upon this by using the current element as a cursor. This is possible
since we only look at the "next" element in the list after we've
taken the read_lock() subsequent to calling the examiner function.
Obviously we have to eventually drop the ref count that we are then
left with and we cannot do that while holding the read_lock, so we
do that next time we drop the lock. That means either just before
we examine another glock, or when the loop has terminated.

The new implementation has several advantages: it uses only a
read_lock() rather than a write_lock(), so it can run simnultaneously
with other code, it doesn't need a "plug" element, so that it removes
a test not only from this list iterator, but from all the other glock
list iterators too. So it makes things faster and smaller.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-11 21:40:30 -04:00
Steven Whitehouse 087efdd391 [GFS2] Make glock hash locks proportional to NR_CPUS
Make the number of locks used for hash chains in glock.c
proportional to NR_CPUS. Also move constants for the number
of hash chains into glock.c from incore.h since they are
not used outside of glock.c.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-09 16:59:11 -04:00
Steven Whitehouse 37b2fa6a24 [GFS2] Move rwlocks in glock.c into their own array
This splits the rwlocks guarding the hash chains of the glock hash
table into their own array. This will reduce memory usage in some
cases due to better alignment, although the real reason for doing it
is to allow the two tables to be different sizes in future (i.e.
the locks will be sized proportionally with the max number of CPUs
and the hash chains sized proportinally with the size of physical memory)

In order to allow this, the gl_bucket member of struct gfs2_glock has
now become gl_hash, so we record the hash rather than a pointer to the
bucket itself.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-08 13:35:56 -04:00
Steven Whitehouse 9b47c11d1c [GFS2] Use void * instead of typedef for locking module interface
As requested by Jan Engelhardt, this removes the typedefs in the
locking module interface and replaces them with void *. Also
since we are changing the interface, I've added a few consts
as well.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Cc: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-08 10:17:58 -04:00
Steven Whitehouse 85d1da67f7 [GFS2] Move glock hash table out of superblock
There are several reasons why we want to do this:
 - Firstly its large and thus we'll scale better with multiple
   GFS2 fs mounted at the same time
 - Secondly its easier to scale its size as required (thats a plan
   for later patches)
 - Thirdly, we can use kzalloc rather than vmalloc when allocating
   the superblock (its now only 4888 bytes)
 - Fourth its all part of my plan to eventually be able to use RCU
   with the glock hash.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-07 14:40:21 -04:00
Steven Whitehouse f2f7ba5237 [GFS2] Make headers compile on their own
As per Jan Engelhardt's comments, this should make all the headers
compile on their own by including and/or declaring structures
early.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-05 10:39:21 -04:00
Steven Whitehouse cd915493fc [GFS2] Change all types to uX style
This makes all fixed size types have consistent names.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-04 12:49:07 -04:00
Steven Whitehouse e9fc2aa091 [GFS2] Update copyright, tidy up incore.h
As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this
updates the copyright message to say "version" in full rather than
"v.2". Also incore.h has been updated to remove forward structure
declarations which are not required.

The gfs2_quota_lvb structure has now had endianess annotations added
to it. Also quota.c has been updated so that we now store the
lvb data locally in endian independant format to avoid needing
a structure in host endianess too. As a result the endianess
conversions are done as required at various points and thus the
conversion routines in lvb.[ch] are no longer required. I've
moved the one remaining constant in lvb.h thats used into lm.h
and removed the unused lvb.[ch].

I have not changed the HIF_ constants. That is left to a later patch
which I hope will unify the gh_flags and gh_iflags fields of the
struct gfs2_holder.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-01 11:05:15 -04:00
Steven Whitehouse 5e2b0613ed [GFS2] Remove unused code from glock layer
Remove the unused sync feature from glocks. This is currently done by
calling the required functions to sync pages/blocks directly so this
code isn't needed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-30 09:38:30 -04:00
Steven Whitehouse 8fb4b536e7 [GFS2] Make glock operations const
For all the usual reasons of enforcing correctness and potentially
reducing code size, this patch makes the glock operations const.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-30 09:30:00 -04:00
Abhijith Das 8638460540 [GFS2] Allow mounting of gfs2 and gfs2meta at the same time
This patch allows the simultaneous mounting of gfs2meta and gfs2
filesystems. A restriction however is that a gfs2meta fs may only be
mounted if its corresponding gfs2 filesystem is also mounted. Also, a
gfs2 filesystem cannot be unmounted before its gfs2meta filesystem.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-25 17:19:55 -04:00
Steven Whitehouse a2242db090 [GFS2] Speed up scanning of glocks
I noticed the gfs2_scand seemed to be taking a lot of CPU,
so in order to cut that down a bit, here is a patch. Firstly
the type of a glock is a constant during its lifetime, so that
its possible to check this without needing locking. I've moved
the (common) case of testing for an inode glock outside of
the glmutex lock.

Also there was a mutex left over from when the glock cache was
master of the inode cache. That isn't required any more so I've
removed that too.

There is probably scope for further speed ups in the future
in this area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-24 17:03:05 -04:00
Steven Whitehouse 59a1cc6bda [GFS2] Fix lock ordering bug in page fault path
Mmapped files were able to trigger a lock ordering bug. Private
maps do not need to take the glock so early on. Shared maps do
unfortunately, however we can get around that by adding a flag
into the flags for the struct gfs2_file. This only works because
we are taking an exclusive lock at this point, so we know that
nobody else can be racing with us.

Fixes Red Hat bugzilla: #201196

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-04 15:41:22 -04:00
Steven Whitehouse bdd512aeea [GFS2] Remove unused flag
The flag GIF_MIN_INIT is no longer used or required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-06-22 15:26:33 -04:00
David Woodhouse 695165dfba [GFS2] Fix use of bitops on unsigned int (struct gfs2_holder->gh_iflags)
fs/gfs2/glock.c: In function ‘gfs2_holder_get’:
fs/gfs2/glock.c:439: warning: passing argument 2 of ‘set_bit’ from incompatible pointer type
fs/gfs2/glock.c: In function ‘rq_promote’:
fs/gfs2/glock.c:512: warning: passing argument 2 of ‘set_bit’ from incompatible pointer type
fs/gfs2/glock.c:526: warning: passing argument 2 of ‘set_bit’ from incompatible pointer type
 ...

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-20 13:44:27 +01:00
Steven Whitehouse feaa7bba02 [GFS2] Fix unlinked file handling
This patch fixes the way we have been dealing with unlinked,
but still open files. It removes all limits (other than memory
for inodes, as per every other filesystem) on numbers of these
which we can support on GFS2. It also means that (like other
fs) its the responsibility of the last process to close the file
to deallocate the storage, rather than the person who did the
unlinking. Note that with GFS2, those two events might take place
on different nodes.

Also there are a number of other changes:

 o We use the Linux inode subsystem as it was intended to be
used, wrt allocating GFS2 inodes
 o The Linux inode cache is now the point which we use for
local enforcement of only holding one copy of the inode in
core at once (previous to this we used the glock layer).
 o We no longer use the unlinked "special" file. We just ignore it
completely. This makes unlinking more efficient.
 o We now use the 4th block allocation state. The previously unused
state is used to track unlinked but still open inodes.
 o gfs2_inoded is no longer needed
 o Several fields are now no longer needed (and removed) from the in
core struct gfs2_inode
 o Several fields are no longer needed (and removed) from the in core
superblock

There are a number of future possible optimisations and clean ups
which have been made possible by this patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-06-14 15:32:57 -04:00
Steven Whitehouse 6b61b072a8 [GFS2] Move some fields around to reduce wasted space
We can reclaim some space by moving fields in some structures
in order to allow them to pack better on 64 bit architectures.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-06-06 14:49:39 -04:00
Steven Whitehouse 320dd101e2 [GFS2] glock debugging and inode cache changes
This adds some extra debugging to glock.c and changes
inode.c's deallocation code to call the debugging code
at a suitable moment. I'm chasing down a particular bug
to do with deallocation at the moment and the code can
go again once the bug is fixed.

Also this includes the first part of some changes to unify
the Linux struct inode and GFS2's struct gfs2_inode. This
transformation will happen in small parts over the next short
period.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-18 16:25:27 -04:00
Steven Whitehouse 3a8a9a1034 [GFS2] Update copyright date to 2006
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-18 15:09:15 -04:00
Steven Whitehouse 5965b1f479 [GFS2] Don't do recursive locking in glock layer
This patch changes the last user of recursive locking so that
it no longer needs this feature and removes it from the glock
layer. This makes the glock code a lot simpler and easier to
understand. Its also a prerequsite to adding support for the
AOP_TRUNCATED_PAGE return code (or at least it is if you don't
want your brain to melt in the process)

I've left in a couple of checks just in case there is some place
else in the code which is still using this feature that I didn't
spot yet, but they can probably be removed long term.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-26 13:21:55 -04:00
Steven Whitehouse fe1bdedc6c [GFS2] Use vmalloc() in dir code
When allocating memory to sort directory entries, use vmalloc()
rather than kmalloc() since for larger directories, the required
size can easily be graeter than the 128k maximum of kmalloc().

Also adding the first steps towards getting the AOP_TRUNCATED_PAGE
return code get in the glock code by flagging all places where we
request a glock and we are holding a page lock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-18 10:09:15 -04:00
Steven Whitehouse b09e593d79 [GFS2] Fix a ref count bug and other clean ups
This fixes a ref count bug that sometimes showed up a umount time
(causing it to hang) but it otherwise mostly harmless. At the same
time there are some clean ups including making the log operations
structures const, moving a memory allocation so that its not done
in the fast path of checking to see if there is an outstanding
transaction related to a particular glock.

Removes the sd_log_wrap varaible which was updated, but never actually
used anywhere. Updates the gfs2 ioctl() to run without the kernel lock
(which it never needed anyway). Removes the "invalidate inodes" loop
from GFS2's put_super routine. This is done in kill super anyway so
we don't need to do it here. The loop was also bogus in that if there
are any inodes "stuck" at this point its a bug and we need to know
about it rather than hide it by hanging forever.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-07 11:17:32 -04:00
Steven Whitehouse 55eccc6d00 [GFS2] Finish off ioctl support
This puts the finishing touches to the ioctl support and also
removes a couple of unused fields from GFS2's private per file
structure.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-04 14:29:30 -04:00
Steven Whitehouse d0dc80dbaf [GFS2] Update debugging code
Update the debugging code in trans.c and at the same time improve
the debugging code for gfs2_holders. The new code should be pretty
fast during the normal case and provide just as much information
in case of errors (or more).

One small function from glock.c has moved to glock.h as a static inline so
that its return address won't get in the way of the debugging.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-03-29 14:36:49 -05:00
Steven Whitehouse 484adff8a0 [GFS2] Update locking in log.c
Replace the lock_for_trans()/lock_for_flush() functions with an rwsem.
In fact the sd_log_flush_lock becomes an rwsem (the write part of it)
and is extended slightly to cover everything that the lock_for_flush()
used to cover. The read part of the lock is instead of lock_for_trans().

This corrects the races in the original code and reduces the code size.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-03-29 09:12:12 -05:00
Steven Whitehouse 71b86f562b [GFS2] Further updates to dir and logging code
This reduces the size of the directory code by about 3k and gets
readdir() to use the functions which were introduced in the previous
directory code update.

Two memory allocations are merged into one. Eliminates zeroing of some
buffers which were never used before they were initialised by
other data.

There is still scope for further improvement in the directory code.

On the logging side, a hand created mutex has been replaced by a
standard Linux mutex in the log allocation code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-03-28 14:14:04 -05:00
Steven Whitehouse c9fd43078f [GFS2] Tidy up mount code.
We no longer lookup ".gfs2_admin" in the root directory in order to
find it, but instead use the inode number given in the superblock.
Both the root directory and the admin directory are now looked up using
the same routine, so the redundant code is removed.

Also, there is no longer a reference to the root inode in the
GFS2 super block. When required this can be retreived via
sb->s_root->d_inode instead.

Assuming that we introduce a metadata filesystem type for GFS, then
this is a first step towards that goal.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-03-01 15:31:02 -05:00
Steven Whitehouse e317ffcb7c [GFS2] Remove uneeded memory allocation
For every filesystem operation where we need a transaction, we
now make one less memory allocation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-03-01 11:39:37 -05:00
Steven Whitehouse 5c676f6d35 [GFS2] Macros removal in gfs2.h
As suggested by Pekka Enberg <penberg@cs.helsinki.fi>.

The DIV_RU macro is renamed DIV_ROUND_UP and and moved to kernel.h
The other macros are gone from gfs2.h as (although not requested
by Pekka Enberg) are a number of included header file which are now
included individually. The inode number comparison function is
now an inline function.

The DT2IF and IF2DT may be addressed in a future patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-02-27 17:23:27 -05:00
David Teigland 6a6b3d018f [GFS2] Patch to remove stats gathering from GFS2
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-02-23 10:11:47 +00:00
Steven Whitehouse f55ab26a8f [GFS2] Use mutices rather than semaphores
As well as a number of minor bug fixes, this patch changes GFS
to use mutices rather than semaphores. This results in better
information in case there are any locking problems.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-02-21 12:51:39 +00:00
Steven Whitehouse 7359a19cc7 [GFS2] Fix for root inode ref count bug
Umount is now working correctly again. The bug was due to
not getting an extra ref count when mounting the fs. We
should have bumped it by two (once for the internal pointer
to the root inode from the super block and once for the
inode hanging off the dcache entry for root).

Also this patch tidys up the code dealing with looking up
and creating inodes. We now pass Linux inodes (with gfs2_inodes
attached) rather than the other way around and this reduces code
duplication in various places.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-02-13 12:27:43 +00:00
Steven Whitehouse f42faf4fa4 [GFS2] Add gfs2_internal_read()
Add the new external read function. Its temporarily in jdata.c
even though the protoype is in ops_file.h - this will change
shortly. The current implementation will change to a page cache
one when that happens.

In order to effect the above changes, the various internal inodes
now have Linux inodes attached to them. We keep the references to
the Linux inodes, rather than the gfs2_inodes in the super block.

In order to get everything to work correctly I've had to reorder
the init sequence on mount (which I should probably have done
earlier when .gfs2_admin was made visible).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-01-30 18:34:10 +00:00
Steven Whitehouse 64fb4eb7d4 [GFS2] Remove gfs2_databuf in favour of gfs2_bufdata structure
Removing the gfs2_databuf structure and using gfs2_bufdata instead
is a step towards allowing journaling of data without requiring the
metadata header on each journaled block. The idea is to merge the
code paths for ordered data with that of journaled data, with the
log operations in lops.c tacking account of the different types of
buffers as they are presented to it. Largely the code path for
metadata will be similar too, but obviously through a different set
of log operations.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-01-18 13:14:40 +00:00
David Teigland b3b94faa5f [GFS2] The core of GFS2
This patch contains all the core files for GFS2.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-01-16 16:50:04 +00:00