OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Bob Peterson	06dfc30641	GFS2: Rename glops go_xmote_th to go_sync [Editorial: This is a nit, but has been a minor irritation for a long time:] This patch renames glops structure item for go_xmote_th to go_sync. The functionality is unchanged; it's just for readability. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 13:31:57 +00:00
Steven Whitehouse	8eae1ca003	GFS2: Review bug traps in glops.c Two of the bug traps here could really be warnings. The others are converted from BUG() to GLOCK_BUG_ON() since we'll most likely need to know the glock state in order to debug any issues which arise. As a result of this, __dump_glock has to be renamed and is no longer static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 13:31:07 +00:00
Bob Peterson	e5dc76b9af	GFS2: Eliminate redundant calls to may_grant Function add_to_queue was checking may_grant for the passed-in holder for every iteration of its gh2 loop. Now it only checks it once at the beginning to see if a try lock is futile. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:12 +01:00
Bob Peterson	81e1d45061	GFS2: Combine functions gfs2_glock_dq_wait and wait_on_demote Function gfs2_glock_dq_wait called two-line function wait_on_demote, so they were combined. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:10 +01:00
Bob Peterson	07a7904942	GFS2: Combine functions gfs2_glock_wait and wait_on_holder Function gfs2_glock_wait only called function wait_on_holder and returned its return code, so they were combined for readability. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:09 +01:00
Bob Peterson	4abb6ad9ea	GFS2: inline __gfs2_glock_schedule_for_reclaim Since function gfs2_glock_schedule_for_reclaim is only two significant lines, we can eliminate it, simplifying the code and making it more readable. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:07 +01:00
Steven Whitehouse	0fe2f1e929	GFS2: Size seq_file buffer more carefully This places a limit on the buffer size for archs with larger PAGE_SIZE. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>	2012-06-11 13:49:47 +01:00
Steven Whitehouse	1bb49303b7	GFS2: Use seq_vprintf for glocks debugfs file Make use of the newly added seq_vprintf() function. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk>	2012-06-11 13:26:50 +01:00
Benjamin Marzinski	90306c41dc	GFS2: Use lvbs for storing rgrp information with mount option Instead of reading in the resource groups when gfs2 is checking for free space to allocate from, gfs2 can store the necessary information in the resource group's lvb. Also, instead of searching for unlinked inodes in every resource group that's checked for free space, gfs2 can store the number of unlinked inodes in the lvb, and only check for unlinked inodes if it will find some. The first time a resource group is locked, the lvb must be initialized. Since this involves counting the unlinked inodes in the resource group, this takes a little extra time. But after that, if the resource group is locked with GL_SKIP, the buffer head won't be read in unless it's actually needed. Enabling the resource group lvbs is done via the rgrplvb mount option. If this option isn't set, the lvbs will still be set and updated, but they won't be verified or used by the filesystem. To safely turn on this option, all of the nodes mounting the filesystem must be running code with this patch, and the filesystem must have been completely unmounted since they were updated. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-06-08 11:50:01 +01:00
Steven Whitehouse	ba1ddcb6ca	GFS2: Cache last hash bucket for glock seq_files For the glocks and glstats seq_files, which are exposed via debugfs we should cache the most recent hash bucket, along with the offset into that bucket. This allows us to restart from that point, rather than having to begin at the beginning each time. This is an idea from Eric Dumazet, however I've slightly extended it so that if the position from which we are due to start is at any point beyond the last cached point, we start from the last cached point, plus whatever is the appropriate offset. I don't really expect people to be lseeking around these files, but if they did so with only positive offsets, then we'd still get some of the benefit of using a cached offset. With my simple test of around 200k entries in the file, I'm seeing an approx 10x speed up. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-06-08 11:16:22 +01:00
Steven Whitehouse	df5d2f5560	GFS2: Increase buffer size for glocks and glstats debugfs files As per Al Viro's suggestion, this increases the buffer size used for these two files. This provides a speed up of slightly less than 8x (i.e. proportional to the buffer size) for cases when we have large numbers of glocks. Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-06-07 13:30:16 +01:00
Steven Whitehouse	a245769f25	GFS2: glock statistics gathering The stats are divided into two sets: those relating to the super block and those relating to an individual glock. The super block stats are done on a per cpu basis in order to try and reduce the overhead of gathering them. They are also further divided by glock type. In the case of both the super block and glock statistics, the same information is gathered in each case. The super block statistics are used to provide default values for most of the glock statistics, so that newly created glocks should have, as far as possible, a sensible starting point. The statistics are divided into three pairs of mean and variance, plus two counters. The mean/variance pairs are smoothed exponential estimates and the algorithm used is one which will be very familiar to those used to calculation of round trip times in network code. The three pairs of mean/variance measure the following things: 1. DLM lock time (non-blocking requests) 2. DLM lock time (blocking requests) 3. Inter-request time (again to the DLM) A non-blocking request is one which will complete right away, whatever the state of the DLM lock in question. That currently means any requests when (a) the current state of the lock is exclusive (b) the requested state is either null or unlocked or (c) the "try lock" flag is set. A blocking request covers all the other lock requests. There are two counters. The first is there primarily to show how many lock requests have been made, and thus how much data has gone into the mean/variance calculations. The other counter is counting queueing of holders at the top layer of the glock code. Hopefully that number will be a lot larger than the number of dlm lock requests issued. So why gather these statistics? There are several reasons we'd like to get a better idea of these timings: 1. To be able to better set the glock "min hold time" 2. To spot performance issues more easily 3. To improve the algorithm for selecting resource groups for allocation (to base it on lock wait time, rather than blindly using a "try lock") Due to the smoothing action of the updates, a step change in some input quantity being sampled will only fully be taken into account after 8 samples (or 4 for the variance) and this needs to be carefully considered when interpreting the results. Knowing both the time it takes a lock request to complete and the average time between lock requests for a glock means we can compute the total percentage of the time for which the node is able to use a glock vs. time that the rest of the cluster has its share. That will be very useful when setting the lock min hold time. The other point to remember is that all times are in nanoseconds. Great care has been taken to ensure that we measure exactly the quantities that we want, as accurately as possible. There are always inaccuracies in any measuring system, but I hope this is as accurate as we can reasonably make it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-02-28 17:09:42 +00:00
Steven Whitehouse	4043b886b0	GFS2: Fix race between lru_list and glock ref count This patch fixes a narrow race window between the glock ref count hitting zero and glocks being removed from the lru_list. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-02-28 09:43:07 +00:00
David Teigland	e0c2a9aa1e	GFS2: dlm based recovery coordination This new method of managing recovery is an alternative to the previous approach of using the userland gfs_controld. - use dlm slot numbers to assign journal id's - use dlm recovery callbacks to initiate journal recovery - use a dlm lock to determine the first node to mount fs - use a dlm lock to track journals that need recovery Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-01-11 09:23:05 +00:00
Bob Peterson	7cf8dcd3b6	GFS2: Automatically adjust glock min hold time This patch is a performance improvement for GFS2 in a clustered environment. It makes the glock hold time self-adjusting. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-07-15 09:32:11 +01:00
Linus Torvalds	d205df9955	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Processes waiting on inode glock that no processes are holding	2011-06-07 18:44:10 -07:00
Ying Han	1495f230fa	vmscan: change shrinker API by passing shrink_control struct Change each shrinker's API by consolidating the existing parameters into shrink_control struct. This will simplify any further features added w/o touching each file of shrinker. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: fix warning] [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API] [akpm@linux-foundation.org: fix xfs warning] [akpm@linux-foundation.org: update gfs2] Signed-off-by: Ying Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:26 -07:00
Bob Peterson	f90e5b5b13	GFS2: Processes waiting on inode glock that no processes are holding This patch fixes a race in the GFS2 glock state machine that may result in lockups. The symptom is that all nodes but one will hang, waiting for a particular glock. All the holder records will have the "W" (Waiting) bit set. The other node will typically have the glock stuck in Exclusive mode (EX) with no holder records, but the dinode will be cached. In other words, an entry with "I:" will appear in the glock dump for that glock, but nothing else. The race has to do with the glock "Pending Demote" bit, which can be set, then immediately reset, thus losing the fact that another node needs the glock. The sequence of events is: 1. Something schedules the glock workqueue (e.g. glock request from fs) 2. The glock workqueue gets to the point between the test of the reply pending bit and the spin lock: if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) { finish_xmote(gl, gl->gl_reply); drop_ref = 1; } down_read(&gfs2_umount_flush_sem); <---- i.e. here spin_lock(&gl->gl_spin); 3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and (b) the demote request which sets GLF_PENDING_DEMOTE 4. The following test is executed: if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) && gl->gl_state != LM_ST_UNLOCKED && gl->gl_demote_state != LM_ST_EXCLUSIVE) { This resets the pending demote flag, and gl->gl_demote_state is not equal to exclusive, however because the reply from the dlm arrived after we checked for the GLF_REPLY_PENDING flag, gl->gl_state is still equal to unlocked, so although we reset the GLF_PENDING_DEMOTE flag, we didn't then set the GLF_DEMOTE flag or reinstate the GLF_PENDING_DEMOTE_FLAG. The patch closes the timing window by only transitioning the "Pending demote" bit to the "demote" flag once we know the other conditions (not unlocked and not exclusive) are met. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-25 10:37:11 +01:00
Linus Torvalds	6c1b8d94bc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (32 commits) GFS2: Move all locking inside the inode creation function GFS2: Clean up symlink creation GFS2: Clean up mkdir GFS2: Use UUID field in generic superblock GFS2: Rename ops_inode.c to inode.c GFS2: Inode.c is empty now, remove it GFS2: Move final part of inode.c into super.c GFS2: Move most of the remaining inode.c into ops_inode.c GFS2: Move gfs2_refresh_inode() and friends into glops.c GFS2: Remove gfs2_dinode_print() function GFS2: When adding a new dir entry, inc link count if it is a subdir GFS2: Make gfs2_dir_del update link count when required GFS2: Don't use gfs2_change_nlink in link syscall GFS2: Don't use a try lock when promoting to a higher mode GFS2: Double check link count under glock GFS2: Improve bug trap code in ->releasepage() GFS2: Fix ail list traversal GFS2: make sure fallocate bytes is a multiple of blksize GFS2: Add an AIL writeback tracepoint GFS2: Make writeback more responsive to system conditions ...	2011-05-20 13:28:45 -07:00
Steven Whitehouse	588da3b3be	GFS2: Don't use a try lock when promoting to a higher mode Previously we marked all locks being promoted to a higher mode with the try flag to avoid any potential deadlock issues. The DLM is able to detect these and report them in a way that GFS2 can deal with them correctly. So we can just request the required mode and wait for a response without needing to perform this check. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-05 12:36:38 +01:00
Christoph Hellwig	1879fd6a26	add hlist_bl_lock/unlock helpers Now that the whole dcache_hash_bucket crap is gone, go all the way and also remove the weird locking layering violations for locking the hash buckets. Add hlist_bl_lock/unlock helpers to move the locking into the list abstraction instead of requiring each caller to open code it. After all allowing for the bit locks is the whole point of these helpers over the plain hlist variant. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-04-25 18:14:10 -07:00
Steven Whitehouse	4667a0ec32	GFS2: Make writeback more responsive to system conditions This patch adds writeback_control to writing back the AIL list. This means that we can then take advantage of the information we get in ->write_inode() in order to set off some pre-emptive writeback. In addition, the AIL code is cleaned up a bit to make it a bit simpler to understand. There is still more which can usefully be done in this area, but this is a good start at least. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-04-20 09:01:37 +01:00
Steven Whitehouse	f42ab08529	GFS2: Optimise glock lru and end of life inodes The GLF_LRU flag introduced in the previous patch can be used to check if a glock is on the lru list when a new holder is queued and if so remove it, without having first to get the lru_lock. The main purpose of this patch however is to optimise the glocks left over when an inode at end of life is being evicted. Previously such glocks were left with the GLF_LFLUSH flag set, so that when reclaimed, each one required a log flush. This patch resets the GLF_LFLUSH flag when there is nothing left to flush thus preventing later log flushes as glocks are reused or demoted. In order to do this, we need to keep track of the number of revokes which are outstanding, and also to clear the GLF_LFLUSH bit after a log commit when only revokes have been processed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-04-20 09:01:17 +01:00
Steven Whitehouse	627c10b7e4	GFS2: Improve tracing support (adds two flags) This adds support for two new flags. One keeps track of whether the glock is on the LRU list or not. The other isn't really a flag as such, but an indication of whether the glock has an attached object or not. This indication is reported without any locking, which is ok since we do not dereference the object pointer but merely report whether it is NULL or not. Also, this fixes one place where a tracepoint was missing, which was at the point we remove deallocated blocks from the journal. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-04-20 09:00:59 +01:00
Steven Whitehouse	29687a2ac8	GFS2: Alter point of entry to glock lru list for glocks with an address_space Rather than allowing the glocks to be scheduled for possible reclaim as soon as they have exited the journal, this patch delays their entry to the list until the glocks in question are no longer in use. This means that we will rely on the vm for writeback of all dirty data and metadata from now on. When glocks are added to the lru list they should be freeable much faster since all the I/O required to free them should have already been completed. This should lead to much better I/O patterns under low memory conditions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-04-20 08:59:48 +01:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Linus Torvalds	3ae2a1ce2e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Don't use _raw version of RCU dereference GFS2: Adding missing unlock_page() GFS2: Update to AIL list locking GFS2: introduce AIL lock GFS2: fix block allocation check for fallocate GFS2: Optimize glock multiple-dequeue code GFS2: Remove potential race in flock code GFS2: Fix glock deallocation race GFS2: quota allows exceeding hard limit GFS2: deallocation performance patch GFS2: panics on quotacheck update GFS2: Improve cluster mmap scalability GFS2: Fix glock queue trace point GFS2: Post-VFS scale update for RCU path walk GFS2: Use RCU for glock hash table	2011-03-16 08:58:43 -07:00
Steven Whitehouse	7e32d02613	GFS2: Don't use _raw version of RCU dereference As per RCU glock patch review comments, don't use the _raw version of this function here. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-03-15 08:58:17 +00:00
Bob Peterson	fa1bbdea30	GFS2: Optimize glock multiple-dequeue code This is a small patch that optimizes multiple glock dequeue operations. It changes the unlock order to be more efficient and makes it easier for lock debugging tools to unravel. It also eliminates the need for the temp variable x, although that would likely be optimized out. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 09:24:54 +00:00
Steven Whitehouse	fc0e38dae6	GFS2: Fix glock deallocation race This patch fixes a race in deallocating glocks which was introduced in the RCU glock patch. We need to ensure that the glock count is kept correct even in the case that there is a race to add a new glock into the hash table. Also, to avoid having to wait for an RCU grace period, the glock counter can be decremented before call_rcu() is called. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-09 10:58:04 +00:00
Tejun Heo	58a69cb47e	workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable' There are two spellings in use for 'freeze' + 'able' - 'freezable' and 'freezeable'. The former is the more prominent one. The latter is mostly used by workqueue and in a few other odd places. Unify the spelling to 'freezable'. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Dmitry Torokhov <dtor@mail.ru> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alex Dubov <oakad@yahoo.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Steven Whitehouse <swhiteho@redhat.com>	2011-02-16 17:48:59 +01:00
Steven Whitehouse	edae38a643	GFS2: Fix glock queue trace point Somehow this tracepoint landed up in the wrong place. This moves it to where it should be. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-01-31 09:38:12 +00:00
Steven Whitehouse	bc015cb841	GFS2: Use RCU for glock hash table This has a number of advantages: - Reduces contention on the hash table lock - Makes the code smaller and simpler - Should speed up glock dumps when under load - Removes ref count changing in examine_bucket - No longer need hash chain lock in glock_put() in common case There are some further changes which this enables and which we may do in the future. One is to look at using SLAB_RCU, and another is to look at using a per-cpu counter for the per-sb glock counter, since that is touched twice in the lifetime of each glock (but only used at umount time). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-01-21 09:39:08 +00:00
Steven Whitehouse	47a25380e3	GFS2: Merge glock state fields into a bitfield We can only merge the fields into a bitfield if the locking rules for them are the same. In this case gl_spin covers all of the fields (write side) but a couple of them are used with GLF_LOCK as the read side lock, which should be ok since we know that the field in question won't be changing at the time. The gl_req setting has to be done earlier (in glock.c) in order to place it under gl_spin. The gl_reply setting also has to be brought under gl_spin in order to comply with the new rules. This saves 4*sizeof(unsigned int) per glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>	2010-11-30 15:49:31 +00:00
Steven Whitehouse	921169ca2f	GFS2: Clean up of gdlm_lock function The DLM never returns -EAGAIN in response to dlm_lock(), and even if it did, the test in gdlm_lock() was wrong anyway. Once that test is removed, it is possible to greatly simplify this code by simply using a "normal" error return code (0 for success). We then no longer need the LM_OUT_ASYNC return code which can be removed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 10:31:48 +00:00
Joe Perches	5e69069c1a	GFS2: fs/gfs2/glock.c: Use printf extension %pV Using %pV reduces the number of printk calls and eliminates any possible message interleaving from other printk calls. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 10:30:41 +00:00
Joe Perches	cc18152eb7	GFS2: fs/gfs2/glock.c: Convert sprintf_symbol to %pS Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 10:22:19 +00:00
Steven Whitehouse	d2115778c7	GFS2: Change two WQ_RESCUERs into WQ_MEM_RECLAIM The WQ_RESCUER flag should only be used internally to the workqueue implementation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Tejun Heo <tj@kernel.org>	2010-11-30 10:21:55 +00:00
Steven Whitehouse	044b9414c7	GFS2: Fix inode deallocation race This area of the code has always been a bit delicate due to the subtleties of lock ordering. The problem is that for "normal" alloc/dealloc, we always grab the inode locks first and the rgrp lock later. In order to ensure no races in looking up the unlinked, but still allocated inodes, we need to hold the rgrp lock when we do the lookup, which means that we can't take the inode glock. The solution is to borrow the technique already used by NFS to solve what is essentially the same problem (given an inode number, look up the inode carefully, checking that it really is in the expected state). We cannot do that directly from the allocation code (lock ordering again) so we give the job to the pre-existing delete workqueue and carry on with the allocation as normal. If we find there is no space, we do a journal flush (required anyway if space from a deallocation is to be released) which should block against the pending deallocations, so we should always get the space back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-15 12:44:42 +00:00
Steven Whitehouse	c741c45512	GFS2: Fix spectator umount issue The tests further down the recovery function relating to unlocking the journal need to be updated to match the initial test. Also, a test in the umount code which was surplus to requirements has been removed. Umounting spectator mounts now works correctly, as expected. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-29 14:20:52 +01:00
Steven Whitehouse	9fa0ea9f26	GFS2: Use new workqueue scheme The recovery workqueue can be freezable since we want it to finish what it is doing if the system is to be frozen (although why you'd want to freeze a cluster node is beyond me since it will result in it being ejected from the cluster). It does still make sense for single node GFS2 filesystems though. The glock workqueue will benefit from being able to run more work items concurrently. A test running postmark shows improved performance and multi-threaded workloads are likely to benefit even more. It needs to be high priority because the latency directly affects the latency of filesystem glock operations. The delete workqueue is similar to the recovery workqueue in that it must not get blocked by memory allocations, and may run for a long time. Potentially other GFS2 threads might also be converted to workqueues, but I'll leave that for a later patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Tejun Heo <tj@kernel.org>	2010-09-20 11:20:36 +01:00
Steven Whitehouse	7b5e3d5fcf	GFS2: Don't enforce min hold time when two demotes occur in rapid succession Due to the design of the VFS, it is quite usual for operations on GFS2 to consist of a lookup (requiring a shared lock) followed by an operation requiring an exclusive lock. If a remote node has cached an exclusive lock, then it will receive two demote events in rapid succession firstly for a shared lock and then to unlocked. The existing min hold time code was triggering in this case, even if the node was otherwise idle since the state change time was being updated by the initial demote. This patch introduces logic to skip the min hold timer in the case that a "double demote" of this kind has occurred. The min hold timer will still be used in all other cases. A new glock flag is introduced which is used to keep track of whether there have been any newly queued holders since the last glock state change. The min hold time is only applied if the flag is set. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Abhijith Das <adas@redhat.com>	2010-09-20 11:19:50 +01:00
Steven Whitehouse	0809f6ec18	GFS2: Fix recovery stuck bug (try #2 ) This is a clean up of the code which deals with LM_FLAG_NOEXP which aims to remove any possible race conditions by using gl_spin to cover the gap between testing for the LM_FLAG_NOEXP and the GL_FROZEN flag. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-08-02 10:15:17 +01:00
Steven Whitehouse	7cdee5dbf4	Revert "GFS2: recovery stuck on transaction lock" This reverts commit `b7dc2df572`. The initial patch didn't quite work since it doesn't cover all the possible routes by which the GLF_FROZEN flag might be set. A revised fix is coming up in the next patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-07-29 14:39:29 +01:00
Steven Whitehouse	d5341a9241	GFS2: Make "try" lock not try quite so hard This looks like a big change, but in reality it's only a single line of actual code change, the rest is just moving a function to before its new caller. The "try" flag for glocks is a rather subtle and delicate setting since it requires that the state machine tries just hard enough to ensure that it has a good chance of getting the requested lock, but not so hard that the request can land up blocked behind another. The patch adds in an additional check which will fail any queued try locks if there is another request blocking the try lock request which is not granted and compatible, nor in progress already. The check is made only after all pending locks which may be granted have been granted. I've checked this with the reproducer for the reported flock bug which this is intended to fix, and it now passes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-07-29 09:37:38 +01:00
Dave Chinner	7f8275d0d6	mm: add context argument to shrinker callback The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-07-19 14:56:17 +10:00
Bob Peterson	b7dc2df572	GFS2: recovery stuck on transaction lock This patch fixes bugzilla bug #590878: GFS2: recovery stuck on transaction lock. We set the frozen flag on the glock when we receive a completion that cannot be delivered due to blocked locks. At that point we check to see whether the first waiting holder has the noexp flag set. If the noexp lock is queued later, then we need to unfreeze the glock at that point in time, namely, in the glock work function. This patch was originally written by Steve Whitehouse, but since he's on holiday, I'm submitting it. It's been well tested with a complex recovery test called revolver. Signed-off-by: Steve Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2010-07-15 09:05:57 +01:00
Bob Peterson	1a0eae8848	GFS2: glock livelock This patch fixes a couple gfs2 problems with the reclaiming of unlinked dinodes. First, there were a couple of livelocks where everything would come to a halt waiting for a glock that was seemingly held by a process that no longer existed. In fact, the process did exist, it just had the wrong pid number in the holder information. Second, there was a lock ordering problem between inode locking and glock locking. Third, glock/inode contention could sometimes cause inodes to be improperly marked invalid by iget_failed. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2010-04-14 16:48:05 +01:00
Bob Peterson	4818972efb	GFS2: print glock numbers in hex This patch changes glock numbers from printing in decimal to hex. Since DLM prints corresponding resource IDs in hex, it makes debugging easier. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-03-01 14:09:04 +00:00
Steven Whitehouse	c1184f8ab7	GFS2: Remove loopy umount code As a consequence of the previous patch, we can now remove the loop which used to be required due to the circular dependency between the inodes and glocks. Instead we can just invalidate the inodes, and then clear up any glocks which are left. Also we no longer need the rwsem since there is no longer any danger of the inode invalidation calling back into the glock code (and from there back into the inode code). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-03-01 14:07:53 +00:00
Steven Whitehouse	009d851837	GFS2: Metadata address space clean up Since the start of GFS2, an "extra" inode has been used to store the metadata belonging to each inode. The only reason for using this inode was to have an extra address space; the other fields were unused. This means that the memory usage was rather inefficient. The reason for keeping each inode's metadata in a separate address space is that when glocks are requested on remote nodes, we need to be able to efficiently locate the data and metadata relating to that glock (inode) in order to sync or sync and invalidate it (depending on the remotely requested lock mode). This patch adds a new type of glock which, in addition to its normal fields, has an address space. This applies to all inode and rgrp glocks (but to no other glock types, which remain as before). As a result, we no longer need to have the second inode. This results in three major improvements: 1. A saving of approx 25% of memory used in caching inodes 2. A removal of the circular dependency between inodes and glocks 3. No confusion between "normal" and "metadata" inodes in super.c Although the first of these is the more immediately apparent, the second is just as important as it now enables a number of clean ups at umount time. Those will be the subject of future patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-03-01 14:07:37 +00:00
Steven Whitehouse	8f05228ee7	GFS2: Extend umount wait coverage to full glock lifetime Although all glocks are, by the time of the umount glock wait, scheduled for demotion, some of them haven't made it far enough through the process for the original set of waiting code to wait for them. This extends the ref count to the whole glock lifetime in order to ensure that the waiting does catch all glocks. It does make it a bit more invasive, but it seems the only sensible solution at the moment. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-02-03 09:56:21 +00:00
Steven Whitehouse	26bb7505cf	GFS2: Fix glock refcount issues This patch fixes some ref counting issues. Firstly by moving the point at which we drop the ref count after a dlm lock operation has completed we ensure that we never call gfs2_glock_hold() on a lock with a zero ref count. Secondly, by using atomic_dec_and_lock() in gfs2_glock_put() we ensure that at no time will a glock with zero ref count appear on the lru_list. That means that we can remove the check for this in our shrinker (which was racy). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-12-03 12:00:12 +00:00
Steven Whitehouse	7e71c55ee7	GFS2: Fix potential race in glock code We need to be careful of the ordering between clearing the GLF_LOCK bit and scheduling the workqueue. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-12-03 11:42:25 +00:00
Benjamin Marzinski	b94a170e96	GFS2: remove dcache entries for remote deleted inodes When a file is deleted from a gfs2 filesystem on one node, a dcache entry for it may still exist on other nodes in the cluster. If this happens, gfs2 will be unable to free this file on disk. Because of this, it's possible to have a gfs2 filesystem with no files on it and no free space. With this patch, when a node receives a callback notifying it that the file is being deleted on another node, it schedules a new workqueue thread to remove the file's dcache entry. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-07-30 11:01:03 +01:00
Benjamin Marzinski	8ff22a6f9b	GFS2: Don't put unlikely reclaim candidates on the reclaim list. GFS2 was placing far too many glocks on the reclaim list that were not good candidates for freeing up from cache. These locks would sit there and repeatedly get scanned to see if they could be reclaimed, wasting a lot of time when there was memory pressure. This fix does more checks on the locks to see if they are actually likely to be removable from cache. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-07-30 11:00:09 +01:00
Benjamin Marzinski	a51b56fff3	GFS2: Fix panic in glock memory shrinker It is possible for gfs2_shrink_glock_memory() to check a glock for demotion that's in the process of being freed by gfs2_glock_put(). In this case, gfs2_shrink_glock_memory() will acquire a new reference to this glock, and then try to free the glock itself when it drops the reference. To solve this, gfs2_shrink_glock_memory() just needs to check if the glock is in the process of being freed, and if so skip it without ever unlocking the lru_lock. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-07-30 10:59:28 +01:00
Steven Whitehouse	2163b1e616	GFS2: Shrink the shrinker This patch removes some of the special cases that the shrinker was trying to deal with. As a result we leave fewer items on the list and none at all which cannot be demoted. This makes the list scanning more efficient and solves some issues seen with large numbers of inodes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-07-30 10:52:14 +01:00
Steven Whitehouse	63997775b7	GFS2: Add tracepoints This patch adds the ability to trace various aspects of the GFS2 filesystem. The trace points are divided into three groups, glocks, logging and bmap. These points have been chosen because they allow inspection of the major internal functions of GFS2 and they are also generic enough that they are unlikely to need any major changes as the filesystem evolves. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-06-12 08:49:20 +01:00
Steven Whitehouse	fe64d517df	GFS2: Umount recovery race fix This patch fixes a race condition where we can receive recovery requests part way through processing a umount. This was causing problems since the recovery thread had already gone away. Looking in more detail at the recovery code, it was really trying to implement a slight variation on a work queue, and that happens to align nicely with the recently introduced slow-work subsystem. As a result I've updated the code to use slow-work, rather than its own home grown variety of work queue. When using the wait_on_bit() function, I noticed that the wait function that was supplied as an argument was appearing in the WCHAN field, so I've updated the function names in order to produce more meaningful output. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-05-19 10:01:18 +01:00
Steven Whitehouse	0c7a531a20	GFS2: Fix glock ref counting bug Depending on the ordering of events as we go around the glock shrinker loop, it is possible to drop the ref count of a glock incorrectly. It doesn't happen very often. This patch corrects the got_ref variable, fixing the problem. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-05-09 15:15:17 +01:00
Steven Whitehouse	a228df6339	GFS2: Move umount flush rwsem The rwsem, used only on umount, is in the wrong place in glock.c. This patch moves it up a bit so that it does not get called under a spinlock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-04-15 10:16:13 +01:00
Steven Whitehouse	64d576ba23	GFS2: Add a "demote a glock" interface to sysfs This adds a sysfs file called demote_rq to GFS2's per filesystem directory. It's possible to use this file to demote arbitrary glocks in exactly the same way as if a request had come in from a remote node. This is intended for testing issues relating to caching of data under glocks. Despite that, the interface is generic enough to send requests to any type of glock, but be careful as it's not always safe to send an arbitrary message to an arbitrary glock. For that reason and to prevent DoS, this interface is restricted to root only. The messages look like this: <type>:<glocknumber> <mode> Example: echo -n "2:13324 EX" >/sys/fs/gfs2/unity:myfs/demote_rq Which means "please demote inode glock (type 2) number 13324 so that I can get an EX (exclusive) lock". The lock modes are those which would normally be sent by a remote node in its callback so if you want to unlock a glock, you use EX, to demote to shared, use SH or PR (depending on whether you like GFS2 or DLM lock modes better!). If the glock doesn't exist, you'll get -ENOENT returned. If the arguments don't make sense, you'll get -EINVAL returned. The plan is that this interface will be used in combination with the blktrace patch which I recently posted for comments although it is, of course, still useful in its own right. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-03-24 11:21:22 +00:00
Steven Whitehouse	d8348de06f	GFS2: Fix deadlock on journal flush This patch fixes a deadlock when the journal is flushed and there are dirty inodes other than the one which caused the journal flush. Originally the journal flushing code was trying to obtain the transaction glock while running the flush code for an inode glock. We no longer require the transaction glock at this point in time since we know that any attempt to get the transaction glock from another node will result in a journal flush. So if we are flushing the journal, we can be sure that the transaction lock is still cached from when the transaction was started. By inlining a version of gfs2_trans_begin() (minus the bit which gets the transaction glock) we can avoid the deadlock problems caused if there is a demote request queued up on the transaction glock. In addition I've also moved the umount rwsem so that it covers the glock workqueue, since all demotions are done by this workqueue now. That fixes a bug on umount which I came across while fixing the original problem. Reported-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-03-24 11:21:18 +00:00
Steven Whitehouse	ac2425e7d3	GFS2: Remove unused field from glock The time stamp field is unused in the glock now that we are using a shrinker, so that we can remove it and save sizeof(unsigned long) bytes in each glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-03-24 11:21:17 +00:00
Steven Whitehouse	f057f6cdf6	GFS2: Merge lock_dlm module into GFS2 This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as: o Reducing overhead by eliminating duplicated fields between structures o Simplification of the code (reduces the code size by a fair bit) o The locking interface is now the DLM interface itself as proposed some time ago. o Fewer lookups of glocks when processing replies from the DLM o Fewer memory allocations/deallocations for each glock o Scope to do further optimisations in the future (but this patch is more than big enough for now!) Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem without requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before it's merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off for the last three months and it's passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-03-24 11:21:14 +00:00
Julia Lawall	eb8374e71f	GFS2: Use DEFINE_SPINLOCK SPIN_LOCK_UNLOCKED is deprecated. The following makes the change suggested in Documentation/spinlocks.txt The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ declarer name DEFINE_SPINLOCK; identifier xxx_lock; @@ - spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED; + DEFINE_SPINLOCK(xxx_lock); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-01-05 07:45:02 +00:00
Steven Whitehouse	fefc03bfed	Revert "GFS2: Fix use-after-free bug on umount" This reverts commit 78802499912f1ba31ce83a94c55b5a980f250a43. The original patch is causing problems in relation to order of operations at umount in relation to jdata files. I need to fix this a different way. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-01-05 07:39:18 +00:00
Steven Whitehouse	3af165ac4d	GFS2: Fix use-after-free bug on umount There was a use-after-free with the GFS2 super block during umount. This patch moves almost all of the umount code from ->put_super into ->kill_sb, the only bit that cannot be moved being the glock hash clearing which has to remain as ->put_super due to umount ordering requirements. As a result it's now obvious that the kfree is the final operation, whereas before it was hidden in ->put_super. Also gfs2_jindex_free is then only referenced from a single file so that's moved and marked static too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-01-05 07:39:14 +00:00
Steven Whitehouse	2bfb6449b7	GFS2: Move four functions from super.c The functions which are being moved can all be marked static in their new locations, since they only have a single caller each. Their new locations are more logical than before and some of the functions are small enough that the compiler might well inline them. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-01-05 07:39:12 +00:00
Steven Whitehouse	97cc1025b1	GFS2: Kill two daemons with one patch This patch removes the two daemons, gfs2_scand and gfs2_glockd and replaces them with a shrinker which is called from the VM. The net result is that GFS2 responds better when there is memory pressure, since it shrinks the glock cache at the same rate as the VFS shrinks the dcache and icache. There are no longer any time based criteria for shrinking glocks, they are kept until such time as the VM asks for more memory and then we demote just as many glocks as required. There are potential future changes to this code, including the possibility of sorting the glocks which are to be written back into inode number order, to get a better I/O ordering. It would be very useful to have an elevator based workqueue implementation for this, as that would automatically deal with the read I/O cases at the same time. This patch is my answer to Andrew Morton's remark, made during the initial review of GFS2, asking why GFS2 needs so many kernel threads, the answer being that it doesn't :-) This patch is a net loss of about 200 lines of code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-01-05 07:39:09 +00:00
Steven Whitehouse	813e0c46c9	GFS2: Fix "truncate in progress" hang Following on from the recent clean up of gfs2_quotad, this patch moves the processing of "truncate in progress" inodes from the glock workqueue into gfs2_quotad. This fixes a hang due to the "truncate in progress" processing requiring glocks in order to complete. It might seem odd to use gfs2_quotad for this particular item, but we have to use a pre-existing thread since creating a thread implies a GFP_KERNEL memory allocation which is not allowed from the glock workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd may deadlock if used for this operation. gfs2_scand and gfs2_glockd are both scheduled for removal at some (hopefully not too distant) future point. That leaves only gfs2_quotad whose workload is generally fairly light and is easily adapted for this extra task. Also, as a result of this change, it opens the way for a future patch to make the reading of the inode's information asynchronous with respect to the glock workqueue, which is another improvement that has been on the list for some time now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-01-05 07:39:06 +00:00
Harvey Harrison	55ba474dae	GFS2: sparse annotation of gl->gl_spin fs/gfs2/glock.c:308:5: warning: context problem in 'do_promote': '_spin_unlock' expected different context fs/gfs2/glock.c:308:5: context 'gl+28': wanted >= 1, got 0 fs/gfs2/glock.c:529:2: warning: context problem in 'do_xmote': '_spin_unlock' expected different context fs/gfs2/glock.c:529:2: context 'gl+28': wanted >= 1, got 0 fs/gfs2/glock.c:925:3: warning: context problem in 'add_to_queue': '_spin_unlock' expected different context fs/gfs2/glock.c:925:3: context '*gl+28': wanted >= 1, got 0 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-01-05 07:38:50 +00:00
Steven Whitehouse	719ee34467	GFS2: high time to take some time over atime Until now, we've used the same scheme as GFS1 for atime. This has failed since atime is a per vfsmnt flag, not a per fs flag and as such the "noatime" flag was not getting passed down to the filesystems. This patch removes all the "special casing" around atime updates and we simply use the VFS's atime code. The net result is that GFS2 will now support all the same atime related mount options of any other filesystem on a per-vfsmnt basis. We do lose the "lazy atime" updates, but we gain "relatime". We could add lazy atime to the VFS at a later date, if there is a requirement for that variant still - I suspect relatime will be enough. Also we lose about 100 lines of code after this patch has been applied, and I have a suspicion that it will speed things up a bit, even when atime is "on". So it seems like a nice clean up as well. From a user perspective, everything stays the same except the loss of the per-fs atime quantum tweakable (ought to be per-vfsmnt at the very least, and to be honest I don't think anybody ever used it) and that a number of options which were ignored before now work correctly. Please let me know if you've got any comments. I'm pushing this out early so that you can all see what my plans are. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-09-18 13:53:59 +01:00
Steven Whitehouse	dff5257473	GFS2: Fix race relating to glock min-hold time In the case that a request for a glock arrives right after the grant reply has arrived, it sometimes means that the gl_tstamp field hasn't been updated recently enough. The net result is that the min-hold time for the glock is ignored. If this happens often enough, it leads to poor performance. This patch adds an additional test, so that if the reply pending bit is set on a glock, then it will select the maximum length of time for the min-hold time, rather than looking at gl_tstamp. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-09-05 14:18:02 +01:00
Steven Whitehouse	c1e817d03a	GFS2: Fix debugfs glock file iterator Due to an incorrect iterator, some glocks were being missed from the glock dumps obtained via debugfs. This patch fixes the problem and ensures that we don't miss any glocks in future. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-08-13 09:59:10 +01:00
Steven Whitehouse	209806aba9	[GFS2] Allow local DF locks when holding a cached EX glock We already allow local SH locks while we hold a cached EX glock, so here we allow DF locks as well. This works only because we rely on the VFS's invalidation for locally cached data, and because if we hold an EX lock, then we know that no other node can be caching data relating to this file. It dramatically speeds up initial writes to O_DIRECT files since we fall back to buffered I/O for this and would otherwise bounce between DF and EX modes on each and every write call. The lessons to be learned from that are to ensure that (for the time being anyway) O_DIRECT files are preallocated and that they are written to using reasonably large I/O sizes. Even so this change fixes that corner case nicely Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-07-07 10:07:28 +01:00
Steven Whitehouse	265d529cef	[GFS2] Fix delayed demote race There is a race in the delayed demote code where it does the wrong thing if a demotion to UN has occurred for other reasons before the delay has expired. This patch adds an assert to catch that condition as well as fixing the root cause by adding an additional check for the UN state. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>	2008-07-07 10:02:36 +01:00
Steven Whitehouse	1bdad60633	[GFS2] Remove remote lock dropping code There are several reasons why this is undesirable: 1. It never happens during normal operation anyway 2. If it does happen it causes performance to be very, very poor 3. It isn't likely to solve the original problem (memory shortage on remote DLM node) it was supposed to solve 4. It uses a bunch of arbitrary constants which are unlikely to be correct for any particular situation and for which the tuning seems to be a black art. 5. In an N node cluster, only 1/N of the dropped locked will actually contribute to solving the problem on average. So all in all we are better off without it. This also makes merging the lock_dlm module into GFS2 a bit easier. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-06-27 09:39:44 +01:00
Steven Whitehouse	048bca2237	[GFS2] No lock_nolock This patch merges the lock_nolock module into GFS2 itself. As well as removing some of the overhead of the module, it also means that it's now impossible to build GFS2 without a lock module (which would be a pointless thing to do anyway). We also plan to merge lock_dlm into GFS2 in the future, but that is a more tricky task, and will therefore be a separate patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: David Teigland <teigland@redhat.com>	2008-06-27 09:39:28 +01:00
Steven Whitehouse	6802e3400f	[GFS2] Clean up the glock core This patch implements a number of cleanups to the core of the GFS2 glock code. As a result a lot of code is removed. It looks like a really big change, but actually a large part of this patch is either removing or moving existing code. There are some new bits too though, such as the new run_queue() function which is considerably streamlined. Highlights of this patch include: o Fixes a cluster coherency bug during SH -> EX lock conversions o Removes the "glmutex" code in favour of a single bit lock o Removes the ->go_xmote_bh() for inodes since it was duplicating ->go_lock() o We now only use the ->lm_lock() function for both locks and unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED) o The fast path is considerably shorter, giving performance gains especially with lock_nolock o The glock_workqueue is now used for all the callbacks from the DLM which allows us to simplify the lock_dlm module (see following patch) o The way is now open to make further changes such as eliminating the two threads (gfs2_glockd and gfs2_scand) in favour of a more efficient scheme. This patch has undergone extensive testing with various test suites so it should be pretty stable by now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>	2008-06-27 09:39:22 +01:00
Benjamin Marzinski	58e9fee13e	[GFS2] Invalidate cache at correct point GFS2 wasn't invalidating its cache before it called into the lock manager with a request that could potentially drop a lock. This was leaving a window where the lock could actually be held by another node, but the file's page cache would still appear valid, causing coherency problems. This patch moves the cache invalidation to before the lock manager call when dropping a lock. It also adds the option to the lock_dlm lock manager to not use conversion mode deadlock avoidance, which, on a conversion from shared to exclusive, could internally drop the lock, and then reacquire it. GFS2 now asks lock_dlm to not do this. Instead, GFS2 manually drops the lock and reacquires it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:44 +01:00
Steven Whitehouse	840ca0ec70	[GFS2] Fix bug where we called drop_bh incorrectly As a result of an earlier patch, drop_bh was being called in cases when it shouldn't have been. Since we never have a gh in the drop case and we always have a gh in the promote case, we can use that extra information to tell which case has been seen. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>	2008-03-31 10:41:01 +01:00
Bob Peterson	cf45b752c9	[GFS2] Remove rgrp and glock version numbers This patch further reduces GFS2's memory requirements by eliminating the 64-bit version number fields in lieu of a couple bits. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:29 +01:00
Steven Whitehouse	da755fdb41	[GFS2] Remove lm.[ch] and distribute content The functions in lm.c were just wrappers which were mostly only used in one other file. By moving the functions to the files where they are being used, they can be marked static and also this will usually result in them being inlined since they are often only used from one point in the code. A couple of really trivial functions have been inlined by hand into the function which called them as it makes the code clearer to do that. We also gain from one fewer function call in the glock lock and unlock paths. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:26 +01:00
Bob Peterson	ab0d756681	[GFS2] Eliminate gl_req_bh This patch further reduces the memory needs of GFS2 by eliminating the gl_req_bh variable from struct gfs2_glock. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:23 +01:00
Bob Peterson	29d38cd163	[GFS2] Get rid of gl_waiters2 This patch reduces memory by replacing the int variable gl_waiters2 by a single bit in the gl_flags. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:13 +01:00
Adrian Bunk	048786f1e6	[GFS2] make gfs2_glock_hold() static gfs2_glock_hold() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:02 +01:00
Bob Peterson	ef8c441cb7	[GFS2] Only wake the reclaim daemon if we need to This patch only wakes up the glock reclaim daemon if there is actually something to be reclaimed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:00 +01:00
Pavel Emelyanov	eccba06891	gfs2: make gfs2_glock.gl_owner_pid be a struct pid * The gl_owner_pid field is used to get the lock owning task by its pid, so make it in a proper manner, i.e. by using the struct pid pointer and pid_task() function. The pid_task() becomes exported for the gfs2 module. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-07 08:42:06 -08:00
Pavel Emelyanov	b1e058da50	gfs2: make gfs2_holder.gh_owner_pid be a struct pid * The gh_owner_pid field is used to get the holder task by its pid and check whether the current is a holder, so make it in a proper manner, i.e. via the struct pid * manipulations. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-07 08:42:06 -08:00
Bob Peterson	398bbe6832	[GFS2] Reorganize function gfs2_glmutex_lock This patch optimizes the function gfs2_glmutex_lock. The basic theory is: Why bother initializing a holder, setting up wait bits and then waiting on them, if you know the glock can be yours. So the holder stuff is placed inside the if checking if the glock is locked. This one needs careful scrutiny because changing anything to do with locking should strike terror into one's heart. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:13:52 +00:00
Fabio Massimo Di Nitto	1a2781cfa5	[GFS2] Fix runtime issue with UP kernels The issue is indeed UP vs SMP and it is totally random. spin_is_locked() is a bad assertion because there is no correct answer on UP. On UP, spin_is_locked() has to return either one value or another, always. This means that in my setup I am lucky enough to trigger the issue and you are lucky enough not to. The patch in attachment removes the bogus calls to BUG_ON, and according to David (in CC, and thanks for the long explanation of the problem) we can rely upon things like lockdep to find the problems they might be trying to catch. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:06 +00:00
Steven Whitehouse	2bcd610d2f	[GFS2] Don't add glocks to the journal The only reason for adding glocks to the journal was to keep track of which locks required a log flush prior to release. We add a flag to the glock to allow this check to be made in a simpler way. This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64) and means that we can avoid extra work during the journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:52 +00:00
Steven Whitehouse	e589665eb9	[GFS2] Remove flags no longer required The HIF_MUTEX and HIF_PROMOTE flags were set on the glock holders depending upon which of the two waiters lists they were going to be queued upon. They were then tested when the holders were taken off the lists to ensure that the right type of holder was being dequeued. Since we are already using separate lists, there doesn't seem a lot of point having these flags as well, and since setting them and testing them is in the fast path for locking and unlocking glock, this patch removes them. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:44 +00:00
Steven Whitehouse	3042a2ccd6	[GFS2] Reorder writeback for glock sync Previously we were doing (write data, wait for data, write metadata, wait for metadata). After this patch we do (write metadata, write data, wait for data, wait for metadata) which should be more efficient. Also I noticed that the drop_bh and xmote_bh functions were almost identical. In fact the only difference was a single test, and that test is such that in the drop_bh case, it would always evaluate to the correct result. As such we can use the xmote_bh functions in all the places where we were using the drop_bh function and remove the drop_bh functions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:42 +00:00
Steven Whitehouse	c2932e03db	[GFS2] Remove "reclaim limit" This call to reclaim glocks is not needed, and in particular we don't want it in the fast path for locking glocks. The limit was entirely arbitrary anyway and we can't expect users to adjust things like this, the remaining code will do the right thing on its own. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:37 +00:00
Wendy Cheng	cc7e79b168	[GFS2] Handle multiple glock demote requests Fix a race condition where multiple glock demote requests are sent to a node back-to-back. This patch does a check inside handle_callback() to see whether a demote request is in progress. If true, it sets a flag to make sure run_queue() will loop again to handle the new request, instead of erroneously setting gl_demote_state to a different state. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:09 +00:00
Wendy Cheng	49e61f2ef6	[GFS2] Move inode deletion out of blocking_cb Move inode deletion code out of blocking_cb handle_callback route to avoid racy conditions that end up blocking lock_dlm1 thread. Fix bugzilla 286821. Signed-off-by: Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:56:17 +01:00
Abhijith Das	b4c20166dc	[GFS2] flocks from same process trip kernel BUG at fs/gfs2/glock.c:1118! This patch adds a new flag to the gfs2_holder structure GL_FLOCK. It is set on holders of glocks representing flocks. This flag is checked in add_to_queue() and a process is permitted to queue more than one holder onto a glock if it is set. This solves the issue of a process not being able to do multiple flocks on the same file. Through a single descriptor, a process can now promote and demote flocks. Through multiple descriptors a process can now queue multiple flocks on the same file. There's still the problem of a process deadlocking itself (because gfs2 blocking locks are not interruptible) by queueing incompatible deadlock. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:56:14 +01:00
Benjamin Marzinski	c4f68a130f	[GFS2] delay glock demote for a minimum hold time When a lot of IO, with some distributed mmap IO, is run on a GFS2 filesystem in a cluster, it will deadlock. The reason is that do_no_page() will repeatedly call gfs2_sharewrite_nopage(), because each node keeps giving up the glock too early, and is forced to call unmap_mapping_range(). This bumps the mapping->truncate_count sequence count, forcing do_no_page() to retry. This patch institutes a minimum glock hold time of a tenth of a second. This ensures that even in heavy contention cases, the node has enough time to get some useful work done before it gives up the glock. A second issue is that when gfs2_glock_dq() is called from within a page fault to demote a lock, and the associated page needs to be written out, it will try to acquire a lock on it, but it has already been locked at a higher level. This patch makes gfs2_glock_dq() use the work queue as well, to avoid this issue. This is the same patch as Steve Whitehouse originally proposed to fix this issue, except that gfs2_glock_dq() now grabs a reference to the glock before it queues up the work on it. Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:55:48 +01:00
Abhijith Das	a947e03356	[GFS2] Wendy's dump lockname in hex & fix glock dump With this patch, gfs2 glockdump through the debugfs filesystem will only dump glocks for the specified filesystem instead of all glocks. Also, to aid debugging, the glock number is dumped in hex instead of decimal. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Abhijith Das <adas@redhat.com>	2007-10-10 08:55:41 +01:00
Steven Whitehouse	8fbbfd214c	[GFS2] Reduce number of gfs2_scand processes to one We only need a single gfs2_scand process rather than the one per filesystem which we had previously. As a result the parameter determining the frequency of gfs2_scand runs becomes a module parameter rather than a mount parameter as it was before. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:55:08 +01:00
Denis Cheng	4ef290025c	[GFS2] mark struct _operations const these struct _operations are all method tables, thus should be const. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:55:03 +01:00
Steven Whitehouse	7b08fc6201	[GFS2] Fix an oops in glock dumping This fixes an oops which was occurring during glock dumping due to the seq file code not taking a reference to the glock. Also this fixes a memory leak which occurred in certain cases, in turn preventing the filesystem from unmounting. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:54:49 +01:00
Jesper Juhl	aa0481e58a	[GFS2] Clean up duplicate includes in fs/gfs2/ This patch cleans up duplicate includes in fs/gfs2/ Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:54:44 +01:00
Josef Whiter	26caee5bc6	[GFS2] Fix calculation of demote state If a glock is in the exclusive state and a request for demote to deferred has been received, then further requests for demote to shared are being ignored. This patch fixes that by ensuring that we demote to unlocked in that case. Signed-off-by: Josef Whiter <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:54:42 +01:00
Steven Whitehouse	87124e581b	[GFS2] Fix two races relating to glock callbacks One of the races relates to referencing a variable while not holding its protecting spinlock. The patch simply moves the test inside the spin lock. The other races occurs when a demote to unlocked request occurs during the time a demote to shared request is already running. This of course only happens in the case that the lock was in the exclusive mode to start with. The patch adds a check to see if another demote request has occurred in the mean time and if it has, then it performs a second demote. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:54:39 +01:00
Steven Whitehouse	eaf5bd3cac	[GFS2] Simplify multiple glock aquisition There is a bug in the code which acquires multiple glocks where if the initial out-of-order attempt fails part way through we can land up trying to acquire the wrong number of glocks. This is part of the fix for red hat bz #239737. The other part of the bz doesn't apply to upstream kernels since it was fixed by: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d3717bdf8f08a0e1039158c8bab2c24d20f492b6 Since the out-of-order code doesn't appear to add anything to the performance of GFS2, this patch just removed it rather than trying to fix it. It should be much easier to see what's going on here now. In addition, we don't allocate any memory unless we are using a lot of glocks (which is a relatively uncommon case). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:50 +01:00
Abhijith Das	d93cfa9884	[GFS2] Fix deallocation issues There were two issues during deallocation of unlinked inodes. The first was relating to the use of a "try" lock which in the case of the inode lock wasn't trying hard enough to deallocate in all circumstances (now changed to a normal glock) and in the case of the iopen lock didn't wait for the demotion of the shared lock before attempting to get the exclusive lock, and thereby sometimes (timing dependent) not completing the deallocation when it should have done. The second issue related to the lack of a way to invalidate dcache entries on remote nodes (now fixed by this patch) which meant that unlinks were taking a long time to return disk space to the fs. By adding some code to invalidate the dcache entries across the cluster for unlinked inodes, that is now fixed. This patch was written jointly by Abhijith Das and Steven Whitehouse. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:36 +01:00
Steven Whitehouse	dbb7cae2a3	[GFS2] Clean up inode number handling This patch cleans up the inode number handling code. The main difference is that instead of looking up the inodes using a struct gfs2_inum_host we now use just the no_addr member of this structure. The tests relating to no_formal_ino can then be done by the calling code. This has advantages in that we want to do different things in different code paths if the no_formal_ino doesn't match. In the NFS patch we want to return -ESTALE, but in the ->lookup() path, its a bug in the fs if the no_formal_ino doesn't match and thus we can withdraw in this case. In order to later fix bz #201012, we need to be able to look up an inode without knowing no_formal_ino, as the only information that is known to us is the on-disk location of the inode in question. This patch will also help us to fix bz #236099 at a later date by cleaning up a lot of the code in that area. There are no user visible changes as a result of this patch and there are no changes to the on-disk format either. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:24 +01:00
Robert Peterson	cd81a4bac6	[GFS2] Addendum patch 2 for gfs2_grow This addendum patch 2 corrects three things: 1. It fixes a stupid mistake in the previous addendum that broke gfs2. Ref: https://www.redhat.com/archives/cluster-devel/2007-May/msg00162.html 2. It fixes a problem that Dave Teigland pointed out regarding the external declarations in ops_address.h being in the wrong place. 3. It recasts a couple more %llu printks to (unsigned long long) as requested by Steve Whitehouse. I would have loved to put this all in one revised patch, but there was a rush to get some patches for RHEL5. Therefore, the previous patches were applied to the git tree "as is" and therefore, I'm posting another addendum. Sorry. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:19 +01:00
Steven Whitehouse	37fde8ca6c	[GFS2] Uncomment sprintf_symbol calling code Now that the patch from -mm has gone upstream, we can uncomment the code in GFS2 which uses sprintf_symbol. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Robert Peterson <rpeterso@redhat.com>	2007-05-01 09:51:39 +01:00
Robert Peterson	5f8820960c	[GFS2] lockdump improvements The patch below consists of the following changes (in code order): 1. I fixed a minor compiler warning regarding the printing of a kernel symbol address. 2. I implemented a suggestion from Dave Teigland that moves the debugfs information for gfs2 into a subdirectory so we can easily expand our use of debugfs in the future. The current code keeps the glock information in: /debug/gfs2/<fs> With the patch, the new code keeps the glock information in: /debug/gfs2/<fs>/glock That will allow us to create more debugfs files in the future. 3. This fixes a bug whereby a failed mount attempt causes the debugfs file to not be deleted. Failed mount attempts should always clean up after themselves, including deleting the debugfs file and/or directory. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:33 +01:00
Robert Peterson	7a0079d9e3	[GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump) This is for Bugzilla Bug 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump) seen at the "gfs2 summit". This also fixes the bug that caused garbage to be printed by the "initialized at" field. I apologize for the kludge, but that code will all be ripped out anyway when the official sprint_symbol function becomes available in the Linux kernel. I also changed some formatting so that spaces are replaced by proper tabs. Signed-off-by: Robert Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:28 +01:00
Robert Peterson	04b933f27b	[GFS2] Red Hat bz 228540: owner references In Testing the previously posted and accepted patch for https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228540 I uncovered some gfs2 badness. It turns out that the current gfs2 code saves off a process pointer when glocks is taken in both the glock and glock holder structures. Those structures will persist in memory long after the process has ended; pointers to poisoned memory. This problem isn't caused by the 228540 fix; the new capability introduced by the fix just uncovered the problem. I wrote this patch that avoids saving process pointers and instead saves off the process pid. Rather than referencing the bad pointers, it now does process lookups. There is special code that makes the output nicer for printing holder information for processes that have ended. This patch also adds a stub for the new "sprint_symbol" function that exists in Andrew Morton's -mm patch set, but won't go into the base kernel until 2.6.22, since it adds functionality but doesn't fix a bug. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:55 +01:00
Steven Whitehouse	420d2a1028	[GFS2] Fix a bug on i386 due to evaluation order Since gcc didn't evaluate the last two terms of the expression in glock.c:1881 as a constant expression, it resulted in an error on i386 due to the lack of a 64bit divide instruction. This adds some brackets to fix the problem. This was reported by Andrew Morton. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org>	2007-05-01 09:10:42 +01:00
Steven Whitehouse	3b8249f617	[GFS2] Fix bz 224480 and cleanup glock demotion code This patch prevents the printing of a warning message in cases where the fs is functioning normally by handing off responsibility for unlinked, but still open inodes, to another node for eventual deallocation. Also, there is now an improved system for ensuring that such requests to other nodes do not get lost. The callback on the iopen lock is only ever called when i_nlink == 0 and when a node is unable to deallocate it due to it still being in use on another node. When a node receives the callback therefore, it knows that i_nlink must be zero, so we mark it as such (in gfs2_drop_inode) in order that it will then attempt deallocation of the inode itself. As an additional benefit, queuing a demote request no longer requires a memory allocation. This simplifies the code for dealing with gfs2_holders as it removes one special case. There are two new fields in struct gfs2_glock. gl_demote_state is the state which the remote node has requested and gl_demote_time is the time when the request came in. Both fields are only valid when the GLF_DEMOTE flag is set in gl_flags. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:39 +01:00
Josef Whiter	5c7342d894	[GFS2] fix bz 231369, gfs2 will oops if you specify an invalid mount option If you specify an invalid mount option when trying to mount a gfs2 filesystem, gfs2 will oops. The attached patch resolves this problem. Signed-off-by: Josef Whiter <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:32 +01:00
Robert Peterson	7c52b166c5	[GFS2] Add gfs2_tool lockdump support to gfs2 (bz 228540) The attached patch resolves bz 228540. This adds the capability for gfs2 to dump gfs2 locks through the debugfs file system. This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from gfs2 because all the ioctls were stripped out. Please see the bugzilla for more history about the fix. This patch is also attached to the bugzilla record. The patch is against Steve Whitehouse's latest nmw git tree kernel (2.6.21-rc1) and has been tested on system trin-10. Signed-off-by: Robert Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:29 +01:00
akpm@linux-foundation.org	95d97b7dd7	[GFS2] build fix fs/gfs2/glock.c:2198: error: 'THIS_MODULE' undeclared here (not in a function) Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-03-07 14:03:25 -05:00
Steven Whitehouse	631c42e170	[GFS2] go_drop_bh is never used, so remove it The ->go_drop_bh function is never used, so this removes it and the single caller, Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-03-07 14:02:53 -05:00
Steven Whitehouse	61be084efc	[GFS2] Put back semaphore to avoid umount problem Dave Teigland fixed this bug a while back, but I managed to mistakenly remove the semaphore during later development. It is required to avoid the list of inodes changing during an invalidate_inodes call. I have made it an rwsem since the read side will be taken frequently during normal filesystem operation. The write site will only happen during umount of the file system. Also the bug only triggers when using the DLM lock manager and only then under certain conditions as its timing related. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: David Teigland <teigland@redhat.com>	2007-02-05 13:38:14 -05:00
Steven Whitehouse	d043e1900c	[GFS2] Fix typo in glock.c This is a one letter typo fix in glock.c, spotted by Rob Kenna. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-02-05 13:37:41 -05:00
Steven Whitehouse	90101c3186	[GFS2] Compile fix for glock.c This one liner got missed from the previous patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-02-05 13:37:35 -05:00
Steven Whitehouse	12132933c4	[GFS2] Remove queue_empty() function This function is no longer required since we do not do recursive locking in the glock layer. As a result all its callers can be replaced with list_empty() calls. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-02-05 13:37:32 -05:00
Steven Whitehouse	b5d32bead1	[GFS2] Tidy up glops calls This patch doesn't make any changes to the ordering of the various operations related to glocking, but it does tidy up the calls to the glops.c functions to make the structure more obvious. The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be made static within glock.c since they are called by every set of glock operations. The xmote_th and drop_th glock operations are then made conditional upon those two routines existing and called from the previously mentioned functions in glock.c respectively. Also it can be seen that the go_sync operation isn't needed since it can easily be replaced by calls to xmote_bh and drop_bh respectively. This results in no longer (confusingly) calling back into routines in glock.c from glops.c and also reducing the glock operations by one member. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-02-05 13:37:26 -05:00
Steven Whitehouse	1c0f4872dc	[GFS2] Remove local exclusive glock mode Here is a patch for GFS2 to remove the local exclusive flag. In the places it was used, mutex's are always held earlier in the call path, so it appears redundant in the LM_ST_SHARED case. Also, the GFS2 holders were setting local exclusive in any case where the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock code where the flag was tested have been replaced with tests for the lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well as globally exclusive). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-02-05 13:37:20 -05:00
Steven Whitehouse	6bd9c8c2fb	[GFS2] Remove unused go_callback operation This is never used, so we might as well remove it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-02-05 13:37:17 -05:00
Steven Whitehouse	e5dab552c8	[GFS2] Remove the "greedy" function from glock.[ch] The "greedy" code was an attempt to retain glocks for a minimum length of time when they relate to mmap()ed files. The current implementation of this feature is not, however, ideal in that it required allocating memory in order to do this and its overly complicated. It also misses the mark by ignoring the other I/O operations which are just as likely to suffer from the same problem. So the plan is to remove this now and then add the functionality back as part of the glock state machine at a later date (and thus take into account all the possible users of this feature) Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-02-05 13:37:14 -05:00
Steven Whitehouse	fee852e374	[GFS2] Shrink gfs2_inode memory by half Here is something I spotted (while looking for something entirely different) the other day. Rather than using a completion in each and every struct gfs2_holder, this removes it in favour of hashed wait queues, thus saving a considerable amount of memory both on the stack (where a number of gfs2_holder structures are allocated) and in particular in the gfs2_inode which has 8 gfs2_holder structures embedded within it. As a result on x86_64 the gfs2_inode shrinks from 2488 bytes to 1912 bytes, a saving of 576 bytes per inode (no thats not a typo!). In actual practice we get a much better result than that since now that a gfs2_inode is under the 2048 byte barrier, we get two per 4k slab page effectively halving the amount of memory required to store gfs2_inodes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-02-05 13:37:11 -05:00
Steven Whitehouse	3699e3a44b	[GFS2] Clean up/speed up readdir This removes the extra filldir callback which gfs2 was using to enclose an attempt at readahead for inodes during readdir. The code was too complicated and also hurts performance badly in the case that the getdents64/readdir call isn't being followed by stat() and it wasn't even getting it right all the time when it was. As a result, on my test box an "ls" of a directory containing 250000 files fell from about 7mins (freshly mounted, so nothing cached) to between about 15 to 25 seconds. When the directory content was cached, the time taken fell from about 3mins to about 4 or 5 seconds. Interestingly in the cached case, running "ls -l" once reduced the time taken for subsequent runs of "ls" to about 6 secs even without this patch. Now it turns out that there was a special case of glocks being used for prefetching the metadata, but because of the timeouts for these locks (set to 10 secs) the metadata was being timed out before it was being used and thus the prefetch code was constantly trying to prefetch the same data over and over. Calling "ls -l" meant that the inodes were brought into memory and once the inodes are cached, the glocks are not disposed of until the inodes are pushed out of the cache, thus extending the lifetime of the glocks, and thus bringing down the time for subsequent runs of "ls" considerably. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-02-05 13:37:04 -05:00
Linus Torvalds	1c1afa3c05	Merge master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (73 commits) [DLM] Clean up lowcomms [GFS2] Change gfs2_fsync() to use write_inode_now() [GFS2] Fix indent in recovery.c [GFS2] Don't flush everything on fdatasync [GFS2] Add a comment about reading the super block [GFS2] Mount problem with the GFS2 code [GFS2] Remove gfs2_check_acl() [DLM] fix format warnings in rcom.c and recoverd.c [GFS2] lock function parameter [DLM] don't accept replies to old recovery messages [DLM] fix size of STATUS_REPLY message [GFS2] fs/gfs2/log.c:log_bmap() fix printk format warning [DLM] fix add_requestqueue checking nodes list [GFS2] Fix recursive locking in gfs2_getattr [GFS2] Fix recursive locking in gfs2_permission [GFS2] Reduce number of arguments to meta_io.c:getbuf() [GFS2] Move gfs2_meta_syncfs() into log.c [GFS2] Fix journal flush problem [GFS2] mark_inode_dirty after write to stuffed file [GFS2] Fix glock ordering on inode creation ...	2006-12-07 09:13:20 -08:00
Randy Dunlap	0ac230699a	[GFS2] lock function parameter Fix function parameter typing: fs/gfs2/glock.c:100: warning: function declaration isn't a prototype Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-11-30 10:37:18 -05:00
Steven Whitehouse	b004157ab5	[GFS2] Fix journal flush problem This fixes a bug which resulted in poor performance due to flushing the journal too often. The code path in question was via the inode_go_sync() function in glops.c. The solution is not to flush the journal immediately when inodes are ejected from memory, but batch up the work for glockd to deal with later on. This means that glocks may now live on beyond the end of the lifetime of their inodes (but not very much longer in the normal case). Also fixed in this patch is a bug (which was hidden by the bug mentioned above) in calculation of the number of free journal blocks. The gfs2_logd process has been altered to be more responsive to the journal filling up. We now wake it up when the number of uncommitted journal blocks has reached the threshold level rather than trying to flush directly at the end of each transaction. This again means doing fewer, but larger, log flushes in general. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-11-30 10:36:42 -05:00
Steven Whitehouse	1a14d3a68f	[GFS2] Simplify glops functions The go_sync callback took two flags, but one of them was set on every call, so this patch removes one of the flags and makes the previously conditional operations (on this flag), unconditional. The go_inval callback took three flags, each of which was set on every call to it. This patch removes the flags and makes the operations unconditional, which makes the logic rather more obvious. Two now unused flags are also removed from incore.h. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-11-30 10:36:30 -05:00
Steven Whitehouse	ab923031ce	[GFS2] Fix memory allocation in glock.c Change from GFP_KERNEL to GFP_NOFS as this was causing a slow down when trying to push inodes from cache. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-11-30 10:35:46 -05:00
Steven Whitehouse	c594d88664	[GFS2] Remove unused GL_DUMP flag There is no way to set the GL_DUMP flag, and in any case the same thing can be done with systemtap if required for debugging, so this removes it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-11-30 10:34:40 -05:00
Steven Whitehouse	b60623c238	[GFS2] Shrink gfs2_inode (3) - di_mode This removes the duplicate di_mode field in favour of using the inode->i_mode field. This saves 4 bytes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-11-30 10:34:14 -05:00
David Howells	c4028958b6	WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:57:56 +00:00
Steven Whitehouse	907b9bceb4	[GFS2/DLM] Fix trailing whitespace As per Andrew Morton's request, removed trailing whitespace. Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-25 09:26:04 -04:00
Fabio Massimo Di Nitto	7d308590ae	[GFS2] Export lm_interface to kernel headers lm_interface.h has a few out of the tree clients such as GFS1 and userland tools. Right now, these clients keeps a copy of the file in their build tree that can go out of sync. Move lm_interface.h to include/linux, export it to userland and clean up fs/gfs2 to use the new location. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-19 08:45:18 -04:00
Steven Whitehouse	a8336344a5	[GFS2] Fix glock hash clearing A one liner bug fix to prevent the return value being wrong when more than one superblock is mounted. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-14 13:57:38 -04:00
Steven Whitehouse	16feb9fec0	[GFS2] Use atomic_t rather than kref in glock.c Use atomic_t as the ref count in glocks rather than a kref. This is another step towards using RCU for the glock hash. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-13 10:43:37 -04:00
Steven Whitehouse	b6397893a5	[GFS2] Use hlist for glock hash chains This results in smaller list heads, so that we can have more chains in the same amount of memory (twice as many). I've multiplied the size of the table by four though - this is because we are saving memory by not having one lock per chain any more. So we land up using about the same amount of memory for the hash table as we did before I started these changes, the difference being that we now have four times as many hash chains. The reason that I say "about the same amount of memory" is that the actual amount now depends upon the NR_CPUS and some of the config variables, so that its not exact and in some cases we do use more memory. Eventually we might want to scale the hash table size according to the size of physical ram as measured on module load. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-12 10:10:01 -04:00
Steven Whitehouse	2426443460	[GFS2] Rewrite of examine_bucket() The existing implementation of this function in glock.c was not very efficient as it relied upon keeping a cursor element upon the hash chain in question and moving it along. This new version improves upon this by using the current element as a cursor. This is possible since we only look at the "next" element in the list after we've taken the read_lock() subsequent to calling the examiner function. Obviously we have to eventually drop the ref count that we are then left with and we cannot do that while holding the read_lock, so we do that next time we drop the lock. That means either just before we examine another glock, or when the loop has terminated. The new implementation has several advantages: it uses only a read_lock() rather than a write_lock(), so it can run simultaneously with other code, it doesn't need a "plug" element, so that it removes a test not only from this list iterator, but from all the other glock list iterators too. So it makes things faster and smaller. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-11 21:40:30 -04:00
Steven Whitehouse	94610610f1	[GFS2] Remove unused function from glock.c The callback for iopen locks is unused, so this removes it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-09 18:59:27 -04:00
Steven Whitehouse	a5e08a9ef5	[GFS2] Add consts to glock sorting function Add back the consts which were casted away in the glock sorting function. Also add early exit code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-09 17:07:05 -04:00
Steven Whitehouse	087efdd391	[GFS2] Make glock hash locks proportional to NR_CPUS Make the number of locks used for hash chains in glock.c proportional to NR_CPUS. Also move constants for the number of hash chains into glock.c from incore.h since they are not used outside of glock.c. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-09 16:59:11 -04:00
Steven Whitehouse	37b2fa6a24	[GFS2] Move rwlocks in glock.c into their own array This splits the rwlocks guarding the hash chains of the glock hash table into their own array. This will reduce memory usage in some cases due to better alignment, although the real reason for doing it is to allow the two tables to be different sizes in future (i.e. the locks will be sized proportionally with the max number of CPUs and the hash chains sized proportionally with the size of physical memory) In order to allow this, the gl_bucket member of struct gfs2_glock has now become gl_hash, so we record the hash rather than a pointer to the bucket itself. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-08 13:35:56 -04:00


292 Commits