Commit Graph

2721 Commits

Author SHA1 Message Date
Jens Axboe f84d751994 [PATCH] pipe: introduce ->pin() buffer operation
The ->map() function is really expensive on highmem machines right now,
since it has to use the slower kmap() instead of kmap_atomic(). Splice
rarely needs to access the virtual address of a page, so it's a waste
of time doing it.

Introduce ->pin() to take over the responsibility of making sure the
page data is valid. ->map() is then reduced to just kmap(). That way we
can also share most of the pipe buffer ops between pipe.c and splice.c.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 19:59:03 +02:00
Jens Axboe 0568b409c7 [PATCH] splice: fix bugs in pipe_to_file()
Found by Oleg Nesterov <oleg@tv-sign.ru>, fixed by me.

- Only allow full pages to go to the page cache.
- Check page != buf->page instead of using PIPE_BUF_FLAG_STOLEN.
- Remember to clear 'stolen' if add_to_page_cache() fails.

And as a cleanup on that:

- Make the bottom fall-through logic a little less convoluted. Also make
  the steal path hold an extra reference to the page, so we don't have
  to differentiate between stolen and non-stolen at the end.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 19:50:48 +02:00
Jens Axboe 46e678c96b [PATCH] splice: fix bugs with stealing regular pipe pages
- Check that page has suitable count for stealing in the regular pipes.
- pipe_to_file() assumes that the page is locked on successful steal, so
  do that in the pipe steal hook.
- Missing unlock_page() in add_to_page_cache() failure.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-30 16:36:32 +02:00
Steven Whitehouse 56409abbf8 [GFS2] Remove some unused code
Remove some of the unused code flagged up by Adrian Bunk.

Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse
2006-04-28 11:48:45 -04:00
Adrian Bunk 08bc2dbc73 [GFS2] [-mm patch] fs/gfs2/: possible cleanups
This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 unused functions
- remove the following global function that was both unused and
  unimplemented:
  - super.c: gfs2_do_upgrade()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-28 10:59:12 -04:00
David Teigland c56b39cd2c [DLM] PATCH 3/3 dlm: show recover state
Expose the current recovery state in sysfs to help in debugging.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-28 10:51:53 -04:00
David Teigland 1c032c0311 [DLM] PATCH 2/3 dlm: lowcomms close
When a node is removed from a lockspace configuration, close our
connection to it, clearing any remaining messages for it.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-28 10:50:41 -04:00
David Teigland ae118962b9 [DLM] PATCH 1/3 dlm: force free user lockspace
Lockspaces created from user space should be forcibly freed without
requiring any further user space interaction.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-28 10:48:59 -04:00
Steven Whitehouse 363275216c [GFS2] Reordering in deallocation to avoid recursive locking
Despite my earlier careful search, there was a recursive lock left
in the deallocation code. This removes it. It should also speed up
deallocation by reducing the number of locking operations: we now use
two "try lock" operations on the two locks involved in inode
deallocation, which allows us to grab the locks out of order
(compared with NFS, which grabs the inode lock first and the iopen
lock later). It is ok for this to fail, since a failure means that
someone else is still using the inode, so it could not be
deallocated anyway.

This fixes the bug reported to me by Rob Kenna.
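A userspace analogue of the two "try lock" operations, using POSIX mutexes as hypothetical stand-ins for the inode and iopen glocks (this is not the GFS2 glock API). Because neither acquisition blocks, taking the locks in the opposite order to other users cannot deadlock, and a failure simply means the inode is still busy.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two locks involved in inode deallocation. */
struct dealloc_locks {
	pthread_mutex_t iopen;  /* taken first here, i.e. "out of order" */
	pthread_mutex_t inode;
};

/*
 * Try to take both locks without blocking. Returns true with both held,
 * or false with neither held. Since neither call waits on another holder,
 * the acquisition order cannot cause a deadlock.
 */
static bool try_lock_for_dealloc(struct dealloc_locks *l)
{
	if (pthread_mutex_trylock(&l->iopen) != 0)
		return false;            /* inode still open elsewhere */
	if (pthread_mutex_trylock(&l->inode) != 0) {
		pthread_mutex_unlock(&l->iopen);
		return false;            /* inode busy: could not deallocate anyway */
	}
	return true;
}

static void unlock_after_dealloc(struct dealloc_locks *l)
{
	pthread_mutex_unlock(&l->inode);
	pthread_mutex_unlock(&l->iopen);
}
```

On failure the caller just skips deallocation; whoever still holds the inode will get another chance later.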

Cc: Rob Kenna <rkenna@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-28 10:46:21 -04:00
Andreas Schwab 2833c28aa0 [PATCH] powerpc: Wire up *at syscalls
Wire up *at syscalls.

This patch has been tested on ppc64 (using glibc's testsuite, both 32bit
and 64bit), and compile-tested for ppc32 (I have currently no ppc32 system
available, but I expect no problems).

Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-28 21:04:59 +10:00
David Teigland d26046bb0a Merge branch 'master' 2006-04-27 11:49:55 -04:00
David Teigland e7f5c01cad [GFS2] Remove redundant casts to/from void
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-27 11:25:45 -04:00
Jens Axboe eb20796bf6 [PATCH] splice: make the read-side do batched page lookups
Use the new find_get_pages_contig() to potentially look up the entire
splice range in one single call. This speeds up generic_file_splice_read()
quite a bit.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-27 11:05:22 +02:00
Jens Axboe eb645a24de [PATCH] splice: switch to using page_cache_readahead()
Avoids doing useless work when the file is fully cached.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-27 08:59:48 +02:00
David Teigland 6bd70aba5a [DLM] lock_dlm recover_status patch
This saves the journal recovery result and makes it visible through sysfs.
User space needs to know if the node actually recovered the journal or
tried and gave up.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-26 15:56:35 -04:00
Steven Whitehouse 579b78a43b [GFS2] Remove GL_NEVER_RECURSE flag
There is no point in keeping this flag since recursion is not
now allowed for any glock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-26 14:58:26 -04:00
Steven Whitehouse 5965b1f479 [GFS2] Don't do recursive locking in glock layer
This patch changes the last user of recursive locking so that
it no longer needs this feature and removes it from the glock
layer. This makes the glock code a lot simpler and easier to
understand. It's also a prerequisite to adding support for the
AOP_TRUNCATED_PAGE return code (or at least it is if you don't
want your brain to melt in the process).

I've left in a couple of checks just in case there is some place
else in the code which is still using this feature that I didn't
spot yet, but they can probably be removed long term.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-26 13:21:55 -04:00
James Morris e7edf9cded [PATCH] LSM: add missing hook to do_compat_readv_writev()
This patch addresses a flaw in LSM, where there is no mediation of readv()
and writev() for 32-bit compat apps running on a 64-bit kernel.

This bug was discovered and fixed initially in the native readv/writev
code [1], but was not fixed in the compat code.  Thanks to Al for spotting
this one.

  [1] http://lwn.net/Articles/154282/

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-26 07:52:21 -07:00
Al Viro a090d9132c [PATCH] protect ext3 ioctl modifying append_only, immutable, etc. with i_mutex
All modifications of ->i_flags in inodes that might be visible to
somebody else must be under ->i_mutex.  This patch fixes ext3 ioctl()
setting S_APPEND and friends.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-26 07:52:21 -07:00
Al Viro de0bb97aff [PATCH] forgotten ->b_data in memcpy() call in ext3/resize.c (oopsable)
sbi->s_group_desc is an array of pointers to buffer_head.  memcpy() of
buffer size from the address of a buffer_head is a bad idea: it will copy
junk in any case, may oops if the buffer_head is close to the end of a slab
page and the next page is not mapped, and isn't what was intended there.
IOW, ->b_data is missing in that call.  Fortunately, the result doesn't go
into the primary on-disk data structures, so only backup ones get crap
written to them; that had allowed this bug to remain unnoticed until
now.
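A minimal illustration with a hypothetical mock `struct buffer_head` (only `b_data` and `b_size`, not the kernel definition): the buffer_head merely points at the block, so the copy source has to be `bh->b_data`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mock: describes a block buffer through a pointer. */
struct buffer_head {
	char *b_data;          /* points at the block contents */
	unsigned long b_size;  /* size of the block in bytes */
};

/*
 * Copy the block contents. The buggy form was effectively
 * memcpy(dst, bh, bh->b_size), which reads b_size bytes starting at the
 * buffer_head structure itself instead of at the data it describes.
 */
static void copy_group_desc(char *dst, const struct buffer_head *bh)
{
	memcpy(dst, bh->b_data, bh->b_size);
}
```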

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-26 07:52:21 -07:00
Linus Torvalds 7b97ebfb93 Merge branch 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] splice: add ->splice_write support for /dev/null
  [PATCH] splice: rearrange moving to/from pipe helpers
  [PATCH] Add support for the sys_vmsplice syscall
  [PATCH] splice: fix offset problems
  [PATCH] splice: fix min() warning
2006-04-26 07:47:55 -07:00
Jens Axboe 00522fb41a [PATCH] splice: rearrange moving to/from pipe helpers
We need these for people writing their own ->splice_read/write hooks.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-26 14:39:29 +02:00
Jens Axboe 912d35f867 [PATCH] Add support for the sys_vmsplice syscall
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice()
moves data to a pipe, with the input being a user address range instead.

This uses an approach suggested by Linus, where we can hold partial ranges
inside the pages[] map. Hopefully this will be useful for network
receive support as well.
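A small userspace sketch of the syscall as exposed by glibc (`vmsplice()` in `<fcntl.h>` with `_GNU_SOURCE`): a user buffer is moved into a pipe and read back from the other end. `vmsplice_roundtrip()` is a hypothetical helper; it assumes `len` fits in the pipe (64 KiB by default on current kernels), otherwise vmsplice() would block waiting for a reader.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>     /* vmsplice() with _GNU_SOURCE */
#include <sys/uio.h>   /* struct iovec */
#include <unistd.h>

/*
 * Hypothetical helper: move len bytes of a user buffer into a pipe with
 * vmsplice(), then read them back from the pipe's read end into dst.
 * Without SPLICE_F_GIFT the pipe may refer to the user pages rather than
 * a copy, so src is not modified before the data is read.
 * Returns the number of bytes read, or -1.
 */
static ssize_t vmsplice_roundtrip(const char *src, size_t len, char *dst)
{
	int p[2];

	assert(len > 0);
	if (pipe(p) < 0)
		return -1;

	struct iovec iov = { .iov_base = (void *)src, .iov_len = len };
	ssize_t n = vmsplice(p[1], &iov, 1, 0);   /* user range -> pipe */
	if (n > 0)
		n = read(p[0], dst, (size_t)n);     /* pipe -> dst */

	close(p[0]);
	close(p[1]);
	return n;
}
```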

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-26 10:59:21 +02:00
Miklos Szeredi 8aa09a50b5 [fuse] fix race between checking and setting file->private_data
BKL does not protect against races if the task may sleep between
checking and setting a value.  So move checking of file->private_data
near to setting it in fuse_fill_super().
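The same check-then-set shape in userspace, with a pthread mutex as a hypothetical stand-in for the lock and `struct fake_file` for `struct file`. Unlike the BKL, a mutex stays held if the holder sleeps, but the point carries over: the NULL check and the assignment sit in one critical section, so two racing callers cannot both see NULL. Returning -EINVAL for an already-claimed file is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a file whose private_data is claimed once. */
struct fake_file {
	pthread_mutex_t lock;
	void *private_data;
};

/* Check and set with nothing in between that could let another caller in. */
static int claim_private_data(struct fake_file *f, void *owner)
{
	int err = 0;

	pthread_mutex_lock(&f->lock);
	if (f->private_data != NULL)
		err = -EINVAL;             /* already claimed */
	else
		f->private_data = owner;
	pthread_mutex_unlock(&f->lock);
	return err;
}
```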

Found by Al Viro.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-26 10:49:16 +02:00
Miklos Szeredi 6dbbcb1205 [fuse] fix deadlock between fuse_put_super() and request_end(), try #2
A deadlock was possible, when the last reference to the superblock was
held due to a background request containing a file reference.

Releasing the file would release the vfsmount which in turn would
release the superblock.  Since sbput_sem is held during the fput() and
fuse_put_super() tries to acquire this same semaphore, a deadlock
results.

The solution is to move the fput() outside the region protected by
sbput_sem.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-26 10:49:06 +02:00
Miklos Szeredi 5a5fb1ea74 Revert "[fuse] fix deadlock between fuse_put_super() and request_end()"
This reverts commit 73ce8355c2.

It was wrong because it didn't take into account the requirement
that iput() for background requests must be performed synchronously
with ->put_super(); otherwise active inodes may remain after unmount.

The right solution is to keep the sbput_sem and perform iput() within
the locked region, but move fput() outside sbput_sem.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-26 10:48:55 +02:00
Jens Axboe 016b661e2f [PATCH] splice: fix offset problems
Make the move_from_pipe() actors return number of bytes processed, then
move_from_pipe() can decide more cleverly when to move on to the next
buffer.

This fixes problems with pipe offset and differing file offset.
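A simplified model of the new contract, with hypothetical types (`struct pbuf`, `actor_fn`, `feed_buffers()`, `struct sink`) rather than the real splice structures: the actor reports how many bytes it consumed, and the loop advances the buffer offset by exactly that much, retiring a buffer only once it is empty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pipe buffer: the window [offset, offset + len) of page. */
struct pbuf {
	const char *page;
	size_t offset;
	size_t len;
};

/* Actor: processes up to len bytes of buf and returns how many it took. */
typedef size_t (*actor_fn)(const struct pbuf *buf, size_t len, void *priv);

/*
 * Feed buffers to the actor, advancing by exactly what it reports. A short
 * return leaves the remainder in place; a buffer is retired only when it
 * is fully consumed. Stops if the actor makes no progress.
 */
static size_t feed_buffers(struct pbuf *bufs, size_t nbufs, actor_fn actor,
			   void *priv)
{
	size_t total = 0, i = 0;

	while (i < nbufs) {
		struct pbuf *buf = &bufs[i];
		if (buf->len == 0) {       /* fully consumed: next buffer */
			i++;
			continue;
		}
		size_t done = actor(buf, buf->len, priv);
		assert(done <= buf->len);
		if (done == 0)
			break;                  /* destination cannot take more */
		buf->offset += done;
		buf->len -= done;
		total += done;
	}
	return total;
}

/* Example actor: accepts at most s->chunk bytes per call, like a partial write. */
struct sink {
	char out[64];
	size_t pos;
	size_t chunk;
};

static size_t copy_chunk_actor(const struct pbuf *buf, size_t len, void *priv)
{
	struct sink *s = priv;
	size_t n = len < s->chunk ? len : s->chunk;

	if (n > sizeof(s->out) - s->pos)
		n = sizeof(s->out) - s->pos;
	memcpy(s->out + s->pos, buf->page + buf->offset, n);
	s->pos += n;
	return n;
}
```

With a sink that takes two bytes per call, a five-byte buffer is consumed in three calls before the loop moves on to the next one.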

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-26 10:33:34 +02:00
Andrew Morton ba5f5d90c4 [PATCH] splice: fix min() warning
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-26 10:33:34 +02:00
David Teigland 3a2a9c96ac [GFS2] Update plock code in DLM locking module
We should be using fl_pid not fl_owner.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-25 15:45:51 -04:00
Patrick Caulfield 714dc65c34 [DLM] Convert a semaphore to a completion
Convert a semaphore into a completion in device.c.

Cc: David Teigland <teigland@redhat.com>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-25 14:49:01 -04:00
Steven Whitehouse 96c2c0083d [DLM] Update Kconfig in the light of comments on lkml
We now depend on user selectable options rather than
selecting them. There is no dependency on SYSFS since this
selection is independent of the DLM (even though it wouldn't
be sensible to build the DLM without it).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-25 13:23:09 -04:00
Steven Whitehouse 4bcf7091f9 [GFS2] Remove inherited flags from exported flags.
We don't need the inherited flags since this action can be
implied by setting the flags on directories where they
wouldn't otherwise make sense. It reduces the number of extra
flags by two. Also updated the list of flags to take account of
one extra ext2/3 flag.

Cc: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-25 13:20:27 -04:00
Steven Whitehouse b5ea3e1ef3 [GFS2] Tidy up Makefile & Kconfig
Removed select of SYSFS as requested by Greg KH. Changed whitespace to
tabs rather than spaces in places where it was incorrect and removed
'default m' as suggested by Adrian Bunk.

Reorganised Makefile as suggested by Sam Ravnborg.

Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-24 14:14:42 -04:00
Steven Whitehouse b800a1cb39 [GFS2] Tidy up daemon.c
As per Andrew Morton's comments, remove unneeded casts and use
wait_event_interruptible() rather than open-coding the wait.

Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-24 13:13:56 -04:00
Steve French 301dc3e6f6 [CIFS] Fix compile error when CONFIG_CIFS_EXPERIMENTAL is undefined
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-04-24 16:24:54 +00:00
Steven Whitehouse 61e085a88c [GFS2] Tidy up dir code as per Christoph Hellwig's comments
1. Comment whitespace fix
2. Removed unused header files from dir.c
3. Split the gfs2_dir_get_buffer() function into two functions

Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-24 10:07:13 -04:00
Linus Torvalds 41bc3982b9 Merge master.kernel.org:/pub/scm/linux/kernel/git/sfrench/cifs-2.6-stable
* master.kernel.org:/pub/scm/linux/kernel/git/sfrench/cifs-2.6-stable:
  [CIFS] Fix typo in previous
  [CIFS] Readdir fixes to allow search to start at arbitrary position
  [CIFS] Use the kthread_ API instead of opencoding lots of hairy code for kernel
  [CIFS] Don't allow a backslash in a path component
  [CIFS] [CIFS] Do not take rename sem on most path based calls (during
2006-04-23 09:38:09 -07:00
Steve French b66ac3ea21 [CIFS] Fix typo in previous
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-04-23 01:54:50 +00:00
Jan Kara b9251b823b [PATCH] Fix reiserfs deadlock
reiserfs_cache_default_acl() should return whether we successfully found
the acl or not.  We have to return the correct value even if reiserfs_get_acl()
returns an error code, not just 0.  Otherwise callers such as
reiserfs_mkdir() can unnecessarily lock the xattrs and later functions such
as reiserfs_new_inode() fail to notice that we have already taken the lock
and try to take it again with obvious consequences.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-22 09:19:53 -07:00
Steve French 60808233f3 [CIFS] Readdir fixes to allow search to start at arbitrary position
in directory

Also includes first part of fix to compensate for servers which forget
to return . and .. as well as updates to changelog and cifs readme.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-04-22 15:53:05 +00:00
Steve French 45af7a0f2e [CIFS] Use the kthread_ API instead of opencoding lots of hairy code for kernel
thread creation and teardown.

It does not move the cifsd thread handling to kthread due to problems
found in testing with wakeup of threads blocked in the socket peek api,
but the other cifs kernel threads now use kthread.
Also clean up cifs_init to properly unwind when thread creation fails.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-04-21 22:52:25 +00:00
Steven Whitehouse 1e09ae544e [GFS2] Move BUG() back into the header file
In order to make the file and line number reporting work
correctly, this has been moved back into the header file.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-21 15:52:46 -04:00
Steven Whitehouse 1dde2dbfc7 [GFS2] Add back missing BUG()
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-21 15:39:02 -04:00
Steven Whitehouse a74604bee2 [GFS2] sem -> mutex conversion in locking.c
Convert a semaphore to a mutex in locking.c and also tidy
up one or two loose ends.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-21 15:10:46 -04:00
Steve French 296034f7de [CIFS] Don't allow a backslash in a path component
Unless Posix paths have been negotiated, the backslash, "\", is not a valid
character in a path component.
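A hypothetical helper showing the rule as stated (not the actual CIFS function): outside POSIX path mode a backslash is the SMB path separator, so it cannot appear inside a single component.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check for one path component sent to the server. */
static bool cifs_component_ok(const char *name, bool posix_paths)
{
	/* "\" is only an ordinary character once POSIX paths are negotiated. */
	return posix_paths || strchr(name, '\\') == NULL;
}
```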

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French  <sfrench@us.ibm.com>
2006-04-21 18:18:37 +00:00
Steve French 0bd4fa977f [CIFS] [CIFS] Do not take rename sem on most path based calls (during
building of full path) to avoid rename/readdir hang

Reported by Alan Tyson

Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-04-21 18:17:42 +00:00
Steven Whitehouse a748422ee4 Merge branch 'master' 2006-04-21 12:52:36 -04:00
David Teigland c63e31c2cc [GFS2] journal recovery patch
This is one of the changes related to journal recovery I mentioned a
couple weeks ago.  We can get into a situation where there are only
readonly nodes currently mounting the fs, but there are journals that need
to be recovered.  Since the readonly nodes can't recover journals, the
next rw mounter needs to go through and check all journals and recover any
that are dirty (i.e. what the first node to mount the fs does).  This rw
mounter needs to skip the journals held by the existing readonly nodes.
Skipping those journals amounts to using the TRY flag on the journal locks,
so acquiring the lock of a journal held by a readonly node will fail
instead of blocking indefinitely.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-20 17:03:48 -04:00
Steven Whitehouse 190562bd84 [GFS2] Fix a bug: scheduling under a spinlock
At some stage, a mutex was added to gfs2_glock_put() without
checking all its call sites. Two of them were called from
under a spinlock, where sleeping on the mutex is not allowed,
causing random delays at various points and crashes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-20 16:57:23 -04:00
Jens Axboe 82aa5d6183 [PATCH] splice: fix smaller sized splice reads
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-20 13:05:48 +02:00
Linus Torvalds 949b211235 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
  SUNRPC: Dead code in net/sunrpc/auth_gss/auth_gss.c
  NFS: remove needless check in nfs_opendir()
  NFS: nfs_show_stats; for_each_possible_cpu(), not NR_CPUS
  NFS: make 2 functions static
  NFS,SUNRPC: Fix compiler warnings if CONFIG_PROC_FS & CONFIG_SYSCTL are unset
  NFS: fix PROC_FS=n compile error
  VFS: Fix another open intent Oops
  RPCSEC_GSS: fix leak in krb5 code caused by superfluous kmalloc
2006-04-19 10:46:59 -07:00
Carsten Otte 7451c4f0ee NFS: remove needless check in nfs_opendir()
Local variable res was initialized to 0 - no check needed here.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-04-19 13:06:37 -04:00
John Hawkes b9d9506d94 NFS: nfs_show_stats; for_each_possible_cpu(), not NR_CPUS
Convert a for-loop that explicitly references "NR_CPUS" into the
potentially more efficient for_each_possible_cpu() construct.

Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-04-19 13:06:20 -04:00
Adrian Bunk ec535ce154 NFS: make 2 functions static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-04-19 12:43:47 -04:00
Trond Myklebust e99170ff3b NFS,SUNRPC: Fix compiler warnings if CONFIG_PROC_FS & CONFIG_SYSCTL are unset
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-04-19 12:43:47 -04:00
Trond Myklebust 95cf959b24 VFS: Fix another open intent Oops
If the call to nfs_intent_set_file() fails to open a file in
nfs4_proc_create(), we should return an error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-04-19 12:43:46 -04:00
Linus Torvalds 0efd9323f3 Merge branch 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] splice: fixup writeout path after ->map changes
  [PATCH] splice: offset fixes
  [PATCH] tee: link_pipe() must be careful when dropping one of the pipe locks
  [PATCH] splice: cleanup the SPLICE_F_NONBLOCK handling
  [PATCH] splice: close i_size truncate races on read
2006-04-19 09:25:52 -07:00
Dipankar Sarma ca99c1da08 [PATCH] Fix file lookup without ref
There are places in the kernel where we look up files in fd tables and
access the file structure without holding references to the file.  So, we
need special care to avoid the race between looking up files in the fd
table and tearing down of the file on another CPU.  Otherwise, one might
see a NULL f_dentry or some other torn-down version of the file.  This patch
fixes those special places where such a race may happen.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-19 09:13:51 -07:00
Arthur Othieno dda27d1a55 [PATCH] hugetlbfs: add Kconfig help text
In kernel bugzilla #6248 (http://bugzilla.kernel.org/show_bug.cgi?id=6248),
Adrian Bunk <bunk@stusta.de> notes that CONFIG_HUGETLBFS is missing Kconfig
help text.

Signed-off-by: Arthur Othieno <apgo@patchbomb.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-19 09:13:50 -07:00
Eric W. Biederman 5e85d4abe3 [PATCH] task: Make task list manipulations RCU safe
While we can currently walk through thread groups, process groups, and
sessions with just the rcu_read_lock, this opens the door to walking the
entire task list.

We already have all of the other RCU guarantees, so there is no cost in
doing this; this should be enough so that proc can stop taking the
tasklist lock during readdir.

prev_task was killed because it has no users, and using it will miss new
tasks when doing an rcu traversal.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-19 09:13:49 -07:00
Jens Axboe 9e0267c26e [PATCH] splice: fixup writeout path after ->map changes
Since ->map() no longer locks the page, we need to adjust the handling
of those pages (and stealing) a little. This now passes full regressions
again.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-19 15:57:31 +02:00
Jens Axboe a4514ebd8e [PATCH] splice: offset fixes
- We need to adjust *ppos for writes as well.
- Copy back modified offset value if one was passed in, similar to
  what sendfile does.
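A userspace sketch of the sendfile-like offset convention, using glibc's `splice()` wrapper (`_GNU_SOURCE`, `<fcntl.h>`): when `off_in` is non-NULL, data is read from `*off_in`, the updated offset is copied back, and the file's own position is left alone. `splice_at()` is a hypothetical helper, and the behaviour shown is that of current kernels as documented in splice(2).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* splice() with _GNU_SOURCE */
#include <stddef.h>
#include <sys/types.h>  /* loff_t */
#include <unistd.h>

/*
 * Hypothetical helper: splice up to len bytes from fd_in, starting at
 * *off, into a pipe and read them back into dst. The kernel advances *off
 * by the bytes moved; fd_in's file position is not changed.
 */
static ssize_t splice_at(int fd_in, loff_t *off, size_t len, char *dst)
{
	int p[2];

	if (pipe(p) < 0)
		return -1;

	ssize_t n = splice(fd_in, off, p[1], NULL, len, 0);  /* file -> pipe */
	if (n > 0)
		n = read(p[0], dst, (size_t)n);                 /* pipe -> dst */

	close(p[0]);
	close(p[1]);
	return n;
}
```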

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-19 15:57:05 +02:00
Jens Axboe 2a27250e6c [PATCH] tee: link_pipe() must be careful when dropping one of the pipe locks
We need to ensure that we only drop a lock that is ordered last, to avoid
ABBA deadlocks with competing processes.
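One common way to get a fixed lock order is by address. The sketch below assumes that rule (it may not be the ordering link_pipe() actually uses) and a hypothetical `struct fake_pipe` holding a pthread mutex: if one of the two locks has to be dropped temporarily, it must be the one ordered last, so that re-taking it while still holding the other never inverts the order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical pipe stand-in; only the mutex matters here. */
struct fake_pipe {
	pthread_mutex_t mutex;
};

/* Assumed global rule: the lower address is always locked first. */
static int ordered_first(const struct fake_pipe *a, const struct fake_pipe *b)
{
	return (uintptr_t)a < (uintptr_t)b;
}

static void double_lock(struct fake_pipe *a, struct fake_pipe *b)
{
	if (ordered_first(a, b)) {
		pthread_mutex_lock(&a->mutex);
		pthread_mutex_lock(&b->mutex);
	} else {
		pthread_mutex_lock(&b->mutex);
		pthread_mutex_lock(&a->mutex);
	}
}

/*
 * The lock that may be dropped (e.g. to wait) while keeping the other:
 * the one ordered last. Re-taking it while holding the first-ordered lock
 * respects the order; dropping the first-ordered one instead would mean
 * re-taking it while holding the last-ordered one, an ABBA inversion.
 */
static struct fake_pipe *droppable(struct fake_pipe *a, struct fake_pipe *b)
{
	return ordered_first(a, b) ? b : a;
}
```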

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-19 15:56:40 +02:00
Jens Axboe c4f895cbe1 [PATCH] splice: cleanup the SPLICE_F_NONBLOCK handling
- Make generic_file_splice_read() more readable and correct.
- Don't bail on page allocation with NONBLOCK set, just don't allow
  direct blocking on IO (e.g. lock_page).

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-19 15:56:12 +02:00
Jens Axboe 91ad66ef44 [PATCH] splice: close i_size truncate races on read
We need to check i_size after doing a blocking readpage.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-19 15:55:10 +02:00
Linus Torvalds 385910f2b2 x86: be careful about tailcall breakage for sys_open[at] too
Came up through a quick grep for other cases similar to the ftruncate()
one in commit 0a489cb3b6.

Also, add a comment, so that people who read the code understand why we
do what looks like a no-op.

(Again, this won't actually matter to any sane user, since libc will
save and restore the register gcc stomps on, but it's still wrong to
stomp on it)

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-18 13:22:59 -07:00
Linus Torvalds 0a489cb3b6 x86: don't allow tail-calls in sys_ftruncate[64]()
Gcc thinks it owns the incoming argument stack, but that's not true for
"asmlinkage" functions, and it corrupts the caller-set-up argument stack
when it pushes the third argument onto the stack.  Which can result in
%ebx getting corrupted in user space.

Now, normally nobody sane would ever notice, since libc will save and
restore %ebx anyway over the system call, but it's still wrong.

I'd much rather have "asmlinkage" tell gcc directly that it doesn't own
the stack, but no such attribute exists, so we're stuck with our hacky
manual "prevent_tail_call()" macro once more (we've had the same issue
before with sys_waitpid() and sys_wait4()).
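For illustration, a macro along the lines of the i386 `prevent_tail_call()` (the exact kernel definition may differ; this relies on GCC/Clang inline-asm syntax). `do_work()` and `sys_wrapper()` are hypothetical stand-ins for do_sys_ftruncate() and sys_ftruncate().

```c
#include <assert.h>

/*
 * An empty asm statement that claims to read and rewrite 'ret'. Because it
 * runs after the call returns, the call is no longer in tail position, so
 * the compiler cannot turn it into a jump that overwrites the incoming
 * argument slots.
 */
#define prevent_tail_call(ret) __asm__ ("" : "=r" (ret) : "0" (ret))

/* Hypothetical worker standing in for do_sys_ftruncate(). */
static long do_work(unsigned int fd, unsigned long length)
{
	return (long)fd + (long)length;
}

/* Wrapper in the style of sys_ftruncate(). */
static long sys_wrapper(unsigned int fd, unsigned long length)
{
	long ret = do_work(fd, length);

	prevent_tail_call(ret);
	return ret;
}
```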

Thanks to Hans-Werner Hilse <hilse@sub.uni-goettingen.de> for reporting
the issue and testing the fix.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-18 13:02:48 -07:00
Steven Whitehouse fe1bdedc6c [GFS2] Use vmalloc() in dir code
When allocating memory to sort directory entries, use vmalloc()
rather than kmalloc() since for larger directories, the required
size can easily be greater than the 128k maximum of kmalloc().

Also add the first steps towards supporting the AOP_TRUNCATED_PAGE
return code in the glock code by flagging all places where we
request a glock while holding a page lock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-18 10:09:15 -04:00
Ananiev, Leonid I 75616cf985 [PATCH] ext3: Fix missed mutex unlock
Add the missing unlock_super() call in an error path.

Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-17 14:24:57 -07:00
Stephen Rothwell 2436f039d2 [PATCH] Fix block device symlink name
As noted further on in this file, some block devices have a / in their
name, so fix the "block:..." symlink name the same way as the /sys/block name.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-17 14:24:57 -07:00
Kay Sievers d4d7e5dffc [PATCH] BLOCK: delay all uevents until partition table is scanned
[BLOCK] delay all uevents until partition table is scanned

Here we delay the announcement of all block device events until the
disk's partition table is scanned, all partition devices are created,
and sysfs is populated.

We have a bunch of old bugs for removable storage handling where we
probe successfully for a filesystem on the raw disk, but at the
same time the kernel recognizes a partition table and creates partition
devices.
Currently there is no sane way to tell if partitions will show up or not
at the time the disk device is announced to userspace. With the delayed
events we can simply skip any probe for a filesystem on the raw disk when
we find already present partitions.

Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-14 11:41:24 -07:00
NeilBrown 4508a7a734 [PATCH] sysfs: Allow sysfs attribute files to be pollable
It works like this:
  Open the file.
  Read all the contents.
  Call poll requesting POLLERR or POLLPRI (so select/exceptfds works).
  When poll returns, either
     close the file and go back to the top of the loop, or
     lseek to the start of the file and go back to the 'read'.

Events are signaled by an object manager calling
   sysfs_notify(kobj, dir, attr);

If the dir is non-NULL, it is used to find a subdirectory which
contains the attribute (presumably created by sysfs_create_group).

This has a cost of one int per attribute, one wait_queue_head per kobject,
and one int per open file.

The name "sysfs_notify" may be confused with the inotify
functionality.  Maybe it would be nice to support inotify for sysfs
attributes as well?

This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
to be pollable
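A userspace sketch of the read/poll loop described above. `open_attr()`, `read_attr()` and `wait_attr_event()` are hypothetical helpers, and the single-read assumption relies on sysfs attributes being at most one page.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Re-read the attribute from the start into buf (NUL-terminated). */
static ssize_t read_attr(int fd, char *buf, size_t len)
{
	assert(len > 0);
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	ssize_t n = read(fd, buf, len - 1);  /* one read assumed to get it all */
	if (n >= 0)
		buf[n] = '\0';
	return n;
}

/* Open the attribute and read its initial value; returns the fd or -1. */
static int open_attr(const char *path, char *buf, size_t len)
{
	int fd = open(path, O_RDONLY);

	if (fd >= 0 && read_attr(fd, buf, len) < 0) {
		close(fd);
		fd = -1;
	}
	return fd;
}

/*
 * Wait for sysfs_notify() on this attribute. POLLPRI/POLLERR are requested
 * so the same event also shows up in select()'s exceptfds. Returns poll()'s
 * result: >0 on an event, 0 on timeout, -1 on error.
 */
static int wait_attr_event(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };

	return poll(&pfd, 1, timeout_ms);
}
```

A monitor would call open_attr() once, then alternate wait_attr_event(fd, -1) and read_attr() for each change.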

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-14 11:41:24 -07:00
Linus Torvalds 9a7e9f1c60 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/mszeredi/fuse:
  [fuse] Direct I/O  should not use fuse_reset_request
  [fuse] Don't init request twice
  [fuse] Fix accounting the number of waiting requests
  [fuse] fix deadlock between fuse_put_super() and request_end()
2006-04-14 09:11:34 -07:00
Linus Torvalds 9ca686626c Merge branch 'tee' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'tee' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] splice: add support for sys_tee()
  [PATCH] splice: pass offset around for ->splice_read() and ->splice_write()
2006-04-14 09:02:07 -07:00
Eric W. Biederman c06511d12d [PATCH] de_thread: Don't change our parents and ptrace flags.
This is two distinct changes.
 - Not changing our real parents.
 - Not changing our ptrace parents.

Not changing our real parents is trivially correct because both tasks
have the same real parents as they are part of a thread group.  Now that
we demote the leader to a thread there is no longer any reason to change
its parentage.

Not changing our ptrace parents is a user visible change if someone
looks hard enough.  I don't think user space applications will care or
even notice.

In the practical and I think common case a debugger will have attached
to all of the threads using the same ptrace flags.  From my quick skim
of strace and gdb that appears to be the case.  Which if true means
debuggers will not notice a change.

Before this point we have already generated a ptrace event in do_exit
that reports the leader's pid has died, so de_thread is visible to a
debugger.  This means attempting to hide this case by copying flags
around appears excessive.

By not doing anything it avoids all of the weird locking issues between
de_thread and ptrace attach, and removes one case from consideration for
fixing the ptrace locking.

This only addresses Oleg's first concern with ptrace_attach, that of the
problems caused by reparenting.  Oleg's second concern is essentially a
race between ptrace_attach and release_task that causes an oops when we
get to force_sig_specific.  There is nothing special about de_thread
with respect to that race.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-14 08:49:19 -07:00
Steven Whitehouse 4d8012b60e [GFS2] Fix bug which was causing postmark to fail
A typo in the directory code was causing postmark to fail
somewhere in the allocation code, since it was unable to
find newly allocated directory leaf blocks under certain
circumstances.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-12 17:39:45 -04:00
Miklos Szeredi 56cf34ff07 [fuse] Direct I/O should not use fuse_reset_request
It's cleaner to allocate a new request; otherwise the uid/gid/pid
fields of the request won't be filled in.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-11 21:16:51 +02:00
Miklos Szeredi 4858cae4f0 [fuse] Don't init request twice
Request is already initialized in fuse_request_alloc() so no need to
do it again in fuse_get_req().

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-11 21:16:38 +02:00
Miklos Szeredi 9bc5dddad1 [fuse] Fix accounting the number of waiting requests
Properly accounting the number of waiting requests was forgotten in
"clean up request accounting" patch.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-11 21:16:09 +02:00
Miklos Szeredi 73ce8355c2 [fuse] fix deadlock between fuse_put_super() and request_end()
A deadlock was possible when the last reference to the superblock was
held due to a background request containing a file reference.

Releasing the file would release the vfsmount which in turn would
release the superblock.  Since sbput_sem is held during the fput() and
fuse_put_super() tries to acquire this same semaphore, a deadlock
results.

The chosen solution is to get rid of sbput_sem and instead use the
spinlock to ensure the referenced inodes/file are released only once.
Since the actual release may sleep, it is deferred until after the locked
region, using local variables instead of the structure members.

This is a much more robust solution.
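The "detach under the lock, release outside it" pattern described above can be sketched in userspace C. This is a minimal illustration, not the FUSE code: a C11 `atomic_flag` stands in for the kernel spinlock, and `struct ref`, `struct bg_request`, `put_ref()` and `release_refs()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an object whose release may sleep. */
struct ref {
	int count;
};

static void put_ref(struct ref *r)
{
	/* In the kernel this could sleep (e.g. fput()), so it must not run
	 * under a spinlock.  Here it only drops a counter. */
	r->count--;
}

struct bg_request {
	atomic_flag lock;	/* stand-in for the spinlock */
	struct ref *inode;	/* references to drop exactly once */
	struct ref *file;
};

static void spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* busy-wait */
}

static void spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Move the references into locals under the lock, then release them after
 * dropping it.  A second caller finds NULL pointers and does nothing, so
 * each reference is released only once. */
static void release_refs(struct bg_request *req)
{
	struct ref *inode, *file;

	spin_lock(&req->lock);
	inode = req->inode;
	file = req->file;
	req->inode = NULL;
	req->file = NULL;
	spin_unlock(&req->lock);

	if (inode)
		put_ref(inode);
	if (file)
		put_ref(file);
}
```

Calling `release_refs()` twice on the same request drops each reference once. The second call only takes and releases the lock.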

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-11 21:14:26 +02:00
Steven Whitehouse f4154ea039 [GFS2] Update journal accounting code.
A small update to the journaling code to change the way that
the "extra" blocks are accounted for in the journal. These are
used at a rate of one per 503 metadata blocks or one per 251
journaled data blocks (or just one if the total number of journaled
blocks in the transaction is smaller). Since we are using them at
two different rates the old method of accounting for them no longer
works and we count them up as required.

Since the "per transaction" accounting can't handle this (there is no
fixed number of header blocks per transaction) we have to account for
it in the general journal code. We now require that each transaction
reserves more blocks than it actually needs to take account of the
possible extra blocks.
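The rates quoted above can be turned into a rough reservation estimate. This sketch assumes metadata and journaled data are counted separately, with each kind rounded up to a whole extra block. That is an illustrative upper bound and is not taken from the GFS2 source. `reserve_blocks()` is a hypothetical helper.

```c
/* Rates quoted in the commit message. */
#define META_PER_EXTRA 503u	/* metadata blocks per extra block */
#define DATA_PER_EXTRA 251u	/* journaled data blocks per extra block */

/* Ceiling division: number of groups of `per` needed to cover n blocks. */
static unsigned int div_round_up(unsigned int n, unsigned int per)
{
	return (n + per - 1) / per;
}

/* Conservative reservation: the blocks actually written, plus the
 * worst-case number of extra blocks for each kind.  A non-empty, small
 * transaction still needs one extra block. */
static unsigned int reserve_blocks(unsigned int meta, unsigned int data)
{
	return meta + data
	       + div_round_up(meta, META_PER_EXTRA)
	       + div_round_up(data, DATA_PER_EXTRA);
}
```

For example, 10 metadata blocks reserve 11 blocks, and 504 metadata blocks reserve 506.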

Also a final fix to dir.c to ensure that all ref counts are handled
correctly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-11 14:49:06 -04:00
Jens Axboe 70524490ee [PATCH] splice: add support for sys_tee()
Basically an in-kernel implementation of tee, which uses splice and the
pipe buffers as an intelligent way to pass data around by reference.

Where the user space tee consumes the input and produces a stdout and
file output, this syscall merely duplicates the data inside a pipe to
another pipe. No data is copied, the output just grabs a reference to the
input pipe data.
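From userspace, the new syscall is reached through glibc's `tee(2)` wrapper, which needs `_GNU_SOURCE`. The sketch writes into one pipe, duplicates the contents into a second pipe with `tee()`, and then reads both copies. The first pipe still holds its data after the `tee()` call. `tee_demo()` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Write msg into pipe a, duplicate it into pipe b with tee(), then read
 * both copies into out_a and out_b.  Returns 0 on success, -1 on error. */
static int tee_demo(const char *msg, size_t len, char *out_a, char *out_b)
{
	int a[2] = { -1, -1 }, b[2] = { -1, -1 };
	int ret = -1;

	if (pipe(a) < 0 || pipe(b) < 0)
		goto out;
	if (write(a[1], msg, len) != (ssize_t)len)
		goto out;

	/* tee() does not consume the data in a: b now references the same
	 * pipe buffers, and no bytes are copied. */
	if (tee(a[0], b[1], len, 0) != (ssize_t)len)
		goto out;

	if (read(a[0], out_a, len) != (ssize_t)len ||
	    read(b[0], out_b, len) != (ssize_t)len)
		goto out;
	ret = 0;
out:
	for (int i = 0; i < 2; i++) {
		if (a[i] >= 0)
			close(a[i]);
		if (b[i] >= 0)
			close(b[i]);
	}
	return ret;
}
```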

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 15:51:17 +02:00
Jens Axboe cbb7e577e7 [PATCH] splice: pass offset around for ->splice_read() and ->splice_write()
We need not use ->f_pos as the offset for the file input/output. If the
user passed an offset pointer in through sys_splice(), just use that and
leave ->f_pos alone.
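The userspace side of this behaviour is the `off_in` argument of `splice(2)`. When a pointer is passed, the kernel reads from that offset and advances it, leaving the descriptor's file position alone. In this sketch, `splice_at()` is a hypothetical helper that splices a range of a file into a private pipe and reads it back.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Splice up to len bytes starting at *off from fd into a pipe, then read
 * them into buf.  *off is advanced; the fd's file position is not.
 * Returns the number of bytes read back, or -1 on error. */
static ssize_t splice_at(int fd, loff_t *off, char *buf, size_t len)
{
	int p[2];
	ssize_t n, got = -1;

	if (pipe(p) < 0)
		return -1;
	n = splice(fd, off, p[1], NULL, len, 0);
	if (n >= 0)
		got = read(p[0], buf, (size_t)n);
	close(p[0]);
	close(p[1]);
	return got;
}
```

Splicing 4 bytes from offset 3 of a file containing `0123456789` should yield `3456` and move the offset to 7. `lseek(fd, 0, SEEK_CUR)` still reports the old position.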

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 15:47:07 +02:00
Linus Torvalds 88dd9c16ce Merge branch 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] vfs: add splice_write and splice_read to documentation
  [PATCH] Remove sys_ prefix of new syscalls from __NR_sys_*
  [PATCH] splice: warning fix
  [PATCH] another round of fs/pipe.c cleanups
  [PATCH] splice: comment styles
  [PATCH] splice: add Ingo as addition copyright holder
  [PATCH] splice: unlikely() optimizations
  [PATCH] splice: speedups and optimizations
  [PATCH] pipe.c/fifo.c code cleanups
  [PATCH] get rid of the PIPE_*() macros
  [PATCH] splice: speedup __generic_file_splice_read
  [PATCH] splice: add direct fd <-> fd splicing support
  [PATCH] splice: add optional input and output offsets
  [PATCH] introduce a "kernel-internal pipe object" abstraction
  [PATCH] splice: be smarter about calling do_page_cache_readahead()
  [PATCH] splice: optimize the splice buffer mapping
  [PATCH] splice: cleanup __generic_file_splice_read()
  [PATCH] splice: only call wake_up_interruptible() when we really have to
  [PATCH] splice: potential !page dereference
  [PATCH] splice: mark the io page as accessed
2006-04-11 06:34:02 -07:00
NeilBrown 358dd55aa3 [PATCH] knfsd: nfsd4: grant delegations more frequently
Keep unused openowners around for at least one lease period, to avoid the need
for as many open confirmations and to allow handing out more delegations.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:53 -07:00
NeilBrown ef0f3390eb [PATCH] knfsd: nfsd4: limit number of delegations handed out.
It's very easy for the server to DOS itself by just giving out too many
delegations.

For now we just solve the problem with a dumb hard limit.  Eventually we'll
want a smarter policy.
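A "dumb hard limit" of this kind can be sketched as a counter checked before each grant. The cap value, `may_delegate()` and `delegation_released()` are hypothetical, because the commit does not give the real limit. The sketch relies on the fact that delegations are optional in NFSv4, so declining one does not fail the open.

```c
#include <stdbool.h>

/* Hypothetical cap, chosen for illustration only. */
#define MAX_DELEGATIONS 4

static unsigned int num_delegations;

/* Called before handing out a delegation.  Refuse once the cap is hit;
 * the open itself can still succeed without a delegation. */
static bool may_delegate(void)
{
	if (num_delegations >= MAX_DELEGATIONS)
		return false;
	num_delegations++;
	return true;
}

/* Called when a delegation is returned or revoked. */
static void delegation_released(void)
{
	if (num_delegations > 0)
		num_delegations--;
}
```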

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:53 -07:00
NeilBrown 4e2fd495b5 [PATCH] knfsd: nfsd4: add missing rpciod_down()
We should be shutting down rpciod for the callback channel when we shut down
the server.

Also note that we do rpciod_up() and create the callback client *before*
setting cb_set--the cb_set only determines whether the initial null was
successful.  So cb_set is not a reliable determiner of whether we need to clean
up, only cb_client is.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:53 -07:00
NeilBrown 541e0e0981 [PATCH] knfsd: nfsd4: nfsd4_probe_callback cleanup
Some obvious cleanup.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:53 -07:00
NeilBrown 5e8d5c2948 [PATCH] knfsd: nfsd4: fix laundromat shutdown race
We need to make sure the laundromat work doesn't reschedule itself just when
we try to cancel it.  Also, we shouldn't be waiting for it to finish running
while holding the state lock, as that's a potential deadlock.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:52 -07:00
NeilBrown bb6e8a9f40 [PATCH] knfsd: nfsd4: fix corruption on readdir encoding with 64k pages
Fix corruption on readdir encoding with 64k pages.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:52 -07:00
NeilBrown 6ed6decccf [PATCH] knfsd: nfsd4: fix corruption of returned data when using 64k pages
In v4 we grab an extra page just for the padding of returned data.  The
formula that the rpc server uses to allocate pages for the response doesn't
take into account this extra page.

Instead of adjusting those formulae, we adopt the same solution as v2 and v3,
and put the "tail" data in the same page as the "head" data.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:52 -07:00
NeilBrown f0e2993e9e [PATCH] knfsd: nfsd4: remove nfsd_setuser from putrootfh
Since nfsd_setuser() is already called from any operation that uses the
current filehandle (because it's called from fh_verify), there's no reason to
call it from putrootfh.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:52 -07:00
NeilBrown 54cceebb67 [PATCH] knfsd: nfsd: nfsd_setuser doesn't really need to modify rqstp->rq_cred.
In addition to setting the process's filesystem ids, nfsd_setuser also
modifies the value of rq_cred, which stores the ids that originally came
from the rpc call, for example to reflect root squashing.

There's no real reason to do that--the only case where rqstp->rq_cred is
actually used later on is in the NFSv4 SETCLIENTID/SETCLIENTID_CONFIRM
operations, and there the results are the opposite of what we want--those two
operations don't deal with the filesystem at all, they only record the
credentials used with the rpc call for later reference (so that we may require
the same credentials be used on later operations), and the credentials
shouldn't vary just because there was or wasn't a previous operation in the
compound that referred to some export.

This fixes a bug which caused mounts from Solaris clients to fail.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:52 -07:00
NeilBrown cd15654963 [PATCH] knfsd: nfsd: oops exporting nonexistent directory
Export a directory that does not exist:
	exportfs -orw,fsid=0,insecure,no_subtree_check client:/home/NFS4

Try to mount from client with nfs4. Mount hangs (I'm not sure why -
that's another issue).

While the client is hung, back on the server:

	mkdir /home/NFS4

The server panics in dput.  I traced the problem back to svc_export_parse()
calling path_release() even though path_lookup() failed (it happens to fill in
the nameidata structure with a negative dentry - so the test after out:
succeeds).
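The bug follows a common error-handling pattern: the caller should release only what a successful lookup acquired. It should not release something just because the result structure happens to look filled in. This sketch models that rule with hypothetical `do_lookup()`, `do_release()` and `parse_export()` functions and a reference counter. It does not use the real `path_lookup()`/`path_release()` API.

```c
#include <errno.h>

/* Hypothetical bookkeeping: references held and release calls made. */
struct lookup_result {
	int held;
	int released;
};

/* A successful lookup takes one reference.  A failed lookup takes none,
 * even if it leaves partially filled state behind. */
static int do_lookup(int exists, struct lookup_result *r)
{
	if (!exists)
		return -ENOENT;
	r->held++;
	return 0;
}

static void do_release(struct lookup_result *r)
{
	r->released++;
	r->held--;
}

/* Fixed pattern: return early on failure, with nothing to release. */
static int parse_export(int exists, struct lookup_result *r)
{
	int err = do_lookup(exists, r);

	if (err)
		return err;
	/* ... use the looked-up path ... */
	do_release(r);
	return 0;
}
```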

After patching and recreating the problem, the client mount still takes some
time before finally exiting with a message "couldn't read superblock".

Here is a simple patch to resolve this issue:

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:52 -07:00
NeilBrown b5872b0dcc [PATCH] knfsd: nfsd4: fix acl xattr length return
We should be using the length from the second vfs_getxattr, in case it
changed.  (Note: there's still a small race here; we could end up returning
-ENOMEM if the length increased between the first and second call.  I don't
know whether it's worth spending a lot of effort to fix that.)

This makes XFS ACLs usable on NFS exports, which they currently aren't, since
XFS appears to be returning a too-large value for vfs_getxattr() when it's
passed a NULL buffer.  So there's probably an XFS bug here too, though since
getxattr with a NULL buffer is usually used to decide how much memory to
allocate, it may be a fairly harmless bug in most cases.
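The two-call idiom (ask for the size with a NULL buffer, allocate, fetch again) is also how userspace `getxattr(2)` is normally used. The point of the fix is to trust the length returned by the second call. To model a filesystem that over-reports the size, this sketch takes a hypothetical getter callback instead of calling a real xattr API. `getter_fn` and `fetch_value()` are illustrative names.

```c
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical getter with vfs_getxattr()-like semantics: with buf == NULL
 * it returns the size needed (possibly an over-estimate); otherwise it
 * copies the value and returns the number of bytes actually written, or a
 * negative errno if size is too small. */
typedef ssize_t (*getter_fn)(void *buf, size_t size);

/* Returns a malloc()ed buffer and stores the real length in *len_out,
 * or returns NULL on error. */
static void *fetch_value(getter_fn get, ssize_t *len_out)
{
	ssize_t size = get(NULL, 0);
	void *buf;

	if (size <= 0)
		return NULL;
	buf = malloc((size_t)size);
	if (!buf)
		return NULL;
	/* Use the second call's length, not the first call's estimate. */
	*len_out = get(buf, (size_t)size);
	if (*len_out < 0) {
		free(buf);
		return NULL;
	}
	return buf;
}
```

If the getter reports 64 bytes for a NULL buffer but then writes only 3, `fetch_value()` returns a length of 3.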

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:51 -07:00
NeilBrown b905b7b0a0 [PATCH] knfsd: nfsd4: better nfs4acl errors
We're returning -1 in a few places in the NFSv4<->POSIX acl translation code
where we could return a reasonable error.

Also allows some minor simplification elsewhere.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:51 -07:00
NeilBrown 249920527f [PATCH] knfsd: nfsd4: Wrong error handling in nfs4acl
This fixes Coverity id #3.  Coverity detected dead code, since the == -1
comparison only assigns 0 or 1 to error.  Therefore the if ( error < 0 )
statement was always false.  It seems that this was an if( error = nfs4...  )
statement some time ago, which got broken during cleanup.
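The dead-code pattern is easy to reproduce. Because `==` binds tighter than `=`, `error = f() == -1` stores the 0-or-1 result of the comparison, so `error < 0` can never be true. In this sketch, `convert()`, `check_buggy()` and `check_fixed()` are hypothetical functions.

```c
/* Hypothetical conversion routine that fails by returning -1. */
static int convert(int fail)
{
	return fail ? -1 : 0;
}

/* Buggy form: error becomes (convert(fail) == -1), i.e. 0 or 1. */
static int check_buggy(int fail)
{
	int error;

	error = convert(fail) == -1;
	if (error < 0)		/* never true: the failure is lost */
		return error;
	return 0;
}

/* Fixed form: keep the return value itself and test it. */
static int check_fixed(int fail)
{
	int error;

	error = convert(fail);
	if (error < 0)
		return error;
	return 0;
}
```

Here `check_buggy(1)` returns 0 and hides the failure, while `check_fixed(1)` returns -1.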

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:51 -07:00
Adrian Bunk e465a77f94 [PATCH] fs/nfsd/nfs4state.c: make a struct static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Marc Eshel <eshel@almaden.ibm.com>
Cc: Andy Adamson <andros@citi.umich.edu>
Cc: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:51 -07:00
NeilBrown d5b9026a67 [PATCH] knfsd: locks: flag NFSv4-owned locks
Use the fl_lmops field to identify which locks are ours, instead of trying to
look them up in our private hash.  This is safer and more efficient.

Earlier versions of this patch used a lock flag instead, but Trond pointed out
that adding a new flag for each lock manager wasn't going to scale well, and
suggested this approach instead; a separate patch converts lockd to using
fl_lmops in the same way.

In the NFSv4 case this looks like a bit of a hack, since the NFSv4 server
isn't currently actually defining a lock_manager_operations struct, so we end
up defining one *just* to serve as a cookie to identify our locks.

But it works, and we actually do expect to start using the
lock_manager_operations at some point anyway.
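Using an ops pointer as an ownership cookie reduces the "is this lock ours?" check to a single pointer comparison. The structures in this sketch are heavily simplified stand-ins for the kernel types. `nfsv4_lock_cookie_ops`, its `placeholder` member and `is_nfsv4_lock()` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures named above. */
struct lock_manager_operations {
	int placeholder;	/* hypothetical: real members omitted */
};

struct file_lock {
	const struct lock_manager_operations *fl_lmops;
};

/* An essentially empty ops table whose address identifies our locks. */
static const struct lock_manager_operations nfsv4_lock_cookie_ops = { 0 };

static bool is_nfsv4_lock(const struct file_lock *fl)
{
	/* Pointer comparison only: no lookup in a private hash. */
	return fl->fl_lmops == &nfsv4_lock_cookie_ops;
}
```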

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:51 -07:00
NeilBrown 7775f4c85d [PATCH] knfsd: Correct reserved reply space for read requests.
NFSd makes sure there is enough space to hold the maximum possible reply
before accepting a request.  The units for this maximum are (4-byte) words.
However, in three places, particularly for read requests, the number given is
a number of bytes.

This means too much space is reserved which is slightly wasteful.
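Converting a byte count to 4-byte XDR words is a round-up division by four. The kernel's sunrpc headers provide this as `XDR_QUADLEN()`. Passing bytes where words are expected therefore over-reserves by roughly a factor of four. `reply_words_for_read()` is a hypothetical helper.

```c
/* Round a byte count up to whole 4-byte XDR words. */
#define XDR_QUADLEN(l) (((l) + 3) >> 2)

/* Reply space must be expressed in words, so convert the byte count. */
static unsigned int reply_words_for_read(unsigned int max_read_bytes)
{
	return XDR_QUADLEN(max_read_bytes);
}
```

A 32768-byte read needs 8192 words. Reserving 32768 "words" instead would set aside 131072 bytes.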

This is the sort of patch that could uncover a deeper bug, and it is not
critical, so it would be best for it to spend a while in -mm before going
into mainline.

(akpm: target 2.6.17-rc2, 2.6.16.3 (approx))

Discovered-by: "Eivind  Sarto" <ivan@kasenna.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:51 -07:00
Miklos Szeredi 08a53cdce6 [PATCH] fuse: account background requests
The previous patch removed limiting the number of outstanding requests.  This
patch adds a much simpler limiting, that is also compatible with file locking
operations.

A task may have at most one synchronous request allocated.  So these requests
need not be otherwise limited.

However the number of background requests (release, forget, asynchronous
reads, interrupted requests) can grow indefinitely.  This can be used by a
malicious user to cause FUSE to allocate arbitrary amounts of unswappable
kernel memory, denying service.

For this reason add a limit for the number of background requests, and block
allocations of new requests until the number goes below the limit.

Also use this mechanism to block all requests until the INIT reply is
received.
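The blocking condition described above combines two checks: the connection is not yet initialized, or the background count has reached the limit. This sketch only reports whether a new request would have to wait. In FUSE the caller would sleep until the condition clears. `MAX_BACKGROUND`, `struct conn_sketch` and `must_block()` are hypothetical, and the limit value is chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical limit; the commit does not state the value. */
#define MAX_BACKGROUND 3

struct conn_sketch {
	unsigned int num_background;	/* outstanding background requests */
	bool initialized;		/* set once the INIT reply arrives */
};

/* Would allocating a new request have to wait? */
static bool must_block(const struct conn_sketch *fc)
{
	return !fc->initialized || fc->num_background >= MAX_BACKGROUND;
}
```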

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:49 -07:00
Miklos Szeredi ce1d5a491f [PATCH] fuse: clean up request accounting
FUSE allocated most requests from a fixed size pool filled at mount time.
However in some cases (release/forget) non-pool requests were used.  File
locking operations aren't well served by the request pool, since they may
block indefinitely, thus exhausting the pool.

This patch removes the request pool and always allocates requests on demand.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:49 -07:00
Miklos Szeredi a87046d822 [PATCH] fuse: consolidate device errors
Return consistent error values for the case when the opened device file has no
mount associated yet.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:48 -07:00
Miklos Szeredi d713311464 [PATCH] fuse: use a per-mount spinlock
Remove the global spinlock in favor of a per-mount one.

This patch is basically find & replace.  The difficult part has already been
done by the previous patch.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:48 -07:00
Miklos Szeredi 0720b31597 [PATCH] fuse: simplify locking
This is in preparation for removing the global spinlock in favor of a
per-mount one.

The only critical part is the interaction between fuse_dev_release() and
fuse_fill_super(): fuse_dev_release() must see the assignment to
file->private_data, otherwise it will leak the reference to fuse_conn.

This is ensured by the fput() operation, which will synchronize the assignment
with other CPUs that may do a final fput() soon after this.

Also redundant locking is removed from fuse_fill_super(), where exclusion is
already ensured by the BKL held for this function by the VFS.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:48 -07:00
Jeff Dike e5ac1d1e70 [PATCH] fuse: add O_NONBLOCK support to FUSE device
I don't like duplicating the connected and list_empty tests in fuse_dev_readv,
but this seemed cleaner than adding the f_flags test to request_wait.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:48 -07:00
Jeff Dike 385a17bfc3 [PATCH] fuse: add O_ASYNC support to FUSE device
This adds asynchronous notification to FUSE - a FUSE server can request
O_ASYNC on a /dev/fuse file descriptor and receive SIGIO when there is input
available.

One subtlety - fuse_dev_fasync, which is called when O_ASYNC is requested,
does no locking, unlike the other methods.  I think it's unnecessary, as the
fuse_conn.fasync list is manipulated only by fasync_helper and kill_fasync,
which provide their own locking.  It would also be wrong to use the fuse_lock,
as it's a spin lock and fasync_helper can sleep.  My one concern with this is
the fuse_conn going away underneath fuse_dev_fasync - sys_fcntl takes a
reference on the file struct, so this seems not to be a problem.
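A FUSE server would request SIGIO with the standard `fcntl(2)` calls. `F_SETOWN` chooses the process that receives the signal, and `F_SETFL` then adds `O_ASYNC`. This is a minimal sketch under the assumption that any descriptor whose driver implements `->fasync` behaves the same way, so a pipe can stand in for `/dev/fuse` when trying it out. `enable_sigio()` and `on_sigio()` are hypothetical names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigio;

static void on_sigio(int sig)
{
	(void)sig;
	got_sigio = 1;
}

/* Install a SIGIO handler, make this process the owner that receives the
 * signal, then turn on O_ASYNC for fd.  Returns 0 on success, -1 on error. */
static int enable_sigio(int fd)
{
	struct sigaction sa;
	int flags;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigio;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGIO, &sa, NULL) < 0)
		return -1;
	if (fcntl(fd, F_SETOWN, getpid()) < 0)
		return -1;
	flags = fcntl(fd, F_GETFL);
	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_ASYNC);
}
```

For example, after calling `enable_sigio()` on the read end of a pipe, writing to the write end should raise SIGIO and set `got_sigio`.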

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:48 -07:00
Miklos Szeredi 7025d9ad10 [PATCH] fuse: fix fuse_dev_poll() return value
fuse_dev_poll() returned an error value instead of a poll mask.  Luckily (or
unluckily) -ENODEV does contain the POLLERR bit.

There's also a race if the filesystem is unmounted between fuse_get_conn() and
spin_lock(), in which case this event will be missed by poll().
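The arithmetic behind "luckily (or unluckily)" can be checked directly. On Linux, `ENODEV` is 19, so `-ENODEV` as a 32-bit unsigned mask is `0xFFFFFFED`. That value has `POLLERR` (0x8) set, but it also sets unrelated bits such as `POLLIN` (0x1) and `POLLOUT` (0x4). The two functions in this sketch are hypothetical miniatures of the buggy and fixed return paths.

```c
#include <errno.h>
#include <poll.h>

/* Buggy: a negative errno returned where a poll mask is expected. */
static unsigned int poll_buggy(void)
{
	return (unsigned int)-ENODEV;	/* 0xFFFFFFED: many bits set */
}

/* Fixed: report the error condition through the mask itself. */
static unsigned int poll_fixed(void)
{
	return POLLERR;
}
```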

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:47 -07:00
Miklos Szeredi d3406ffa4a [PATCH] fuse: fix oops in fuse_send_readpages()
During heavy parallel filesystem activity it was possible to Oops the kernel.
The reason is that read_cache_pages() could skip pages which have already been
inserted into the cache by another task.  Occasionally this may result in zero
pages actually being sent, while fuse_send_readpages() relies on at least one
page being in the request.

So check this corner case and just free the request instead of trying to send
it.
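The corner-case check amounts to one branch before sending. In this sketch, `struct read_req`, `send_req()` and `finish_readpages()` are hypothetical simplifications, and a counter records how many requests would actually be sent.

```c
#include <stdlib.h>

/* Hypothetical, simplified request carrying pages to read. */
struct read_req {
	unsigned int num_pages;
};

static int sent;	/* requests actually sent, for illustration */

static void send_req(struct read_req *req)
{
	(void)req;
	sent++;
}

/* Send the batched read only if at least one page was added (for example,
 * every page may already have been cached by another task); either way
 * the request is freed here. */
static void finish_readpages(struct read_req *req)
{
	if (req->num_pages)
		send_req(req);
	free(req);
}
```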

Reported and tested by Konstantin Isakov.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:47 -07:00
Ananiev, Leonid I 389ed39b97 [PATCH] ext3: Fix missed mutex unlock
Add the missing unlock_super() call in an error path.

Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:46 -07:00
Arnd Bergmann 091e881d0e [PATCH] inotify: check for NULL inode in inotify_d_instantiate
The spufs file system creates files in a directory before instantiating the
directory itself, which causes a NULL pointer access in
inotify_d_instantiate since c32ccd87bf.

I'd like to keep this behavior since it means that the user will not have
access to files in the directory before I know that I succeed in creating
everything in it.  This patch adds a simple check for the inode to keep
that working.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:45 -07:00
Vivek Goyal 68250ba5df [PATCH] kdump: enable CONFIG_PROC_VMCORE by default
Everybody seems to be using /proc/vmcore as a method to access the kernel
crash dump.  Hence it probably makes sense to enable CONFIG_PROC_VMCORE by
default if CONFIG_CRASH_DUMP is selected.  This makes kdump configuration
easier for users.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:45 -07:00
Roland McGrath f5e902817f [PATCH] process accounting: take original leader's start_time in non-leader exec
The only record we have of the real-time age of a process, regardless of
execs it has done, is start_time.  When a non-leader thread execs, the
original start_time of the process is lost.  Things looking at the
real-time age of the process are fooled, for example the process accounting
record when the process finally dies.  This change makes the oldest
start_time stick around with the process after a non-leader exec.  This way
the association between PID and start_time is kept constant, which seems
correct to me.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:42 -07:00
Davide Libenzi 2395140ee2 [PATCH] uniform POLLRDHUP handling between epoll and poll/select
As reported by Michael Kerrisk, POLLRDHUP handling was not consistent
between epoll and poll/select, since in epoll it was unmaskable.  This
patch brings uniformity in POLLRDHUP handling.
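With `poll()`, POLLRDHUP (a Linux extension exposed with `_GNU_SOURCE`) is reported only when it is requested in `events`. The intent of this change is that epoll honours the requested mask in the same way. This sketch shows `poll()` reporting POLLRDHUP on a UNIX stream socket pair after the peer shuts down its writing side. `rdhup_revents()` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the revents poll() reports on one end of a socket pair after
 * the other end calls shutdown(SHUT_WR), or -1 on error. */
static int rdhup_revents(void)
{
	int sv[2];
	struct pollfd pfd;
	int ret = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (shutdown(sv[1], SHUT_WR) == 0) {
		pfd.fd = sv[0];
		pfd.events = POLLIN | POLLRDHUP;	/* must be requested */
		pfd.revents = 0;
		if (poll(&pfd, 1, 0) == 1)
			ret = pfd.revents;
	}
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```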

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:42 -07:00
Vivek Goyal 80e8ff6341 [PATCH] kdump proc vmcore size oveflow fix
A couple of /proc/vmcore data structures overflow on 32-bit systems with
more than 4G of memory.  This patch fixes those.
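This sketch illustrates the class of bug. It is not one of the actual /proc/vmcore structures. On a 32-bit system `unsigned long` is 32 bits wide, modelled here with `uint32_t`, so a running total of memory sizes wraps once it passes 4 GiB. A 64-bit accumulator does not wrap. `total_size_32()` and `total_size_64()` are hypothetical.

```c
#include <stdint.h>

/* 32-bit accumulator: wraps modulo 2^32 past 4 GiB. */
static uint32_t total_size_32(const uint64_t *sizes, int n)
{
	uint32_t total = 0;

	for (int i = 0; i < n; i++)
		total += (uint32_t)sizes[i];
	return total;
}

/* 64-bit accumulator: holds the true total. */
static uint64_t total_size_64(const uint64_t *sizes, int n)
{
	uint64_t total = 0;

	for (int i = 0; i < n; i++)
		total += sizes[i];
	return total;
}
```

With regions of 3 GiB and 2 GiB, the 32-bit total comes out as 1 GiB instead of 5 GiB.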

Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:42 -07:00
Mitchell Blank Jr b04eb6aa08 [PATCH] select: don't overflow if (SELECT_STACK_ALLOC % sizeof(long) != 0)
If SELECT_STACK_ALLOC is not a multiple of sizeof(long) then stack_fds[]
would be shorter than SELECT_STACK_ALLOC bytes and could overflow later in
the function.  Fixed by simply rearranging the test later to work on
sizeof(stack_fds).  Currently SELECT_STACK_ALLOC is 256, so this doesn't
happen, but it's nasty to have things like this hidden in the code.  What
if later someone decides to change SELECT_STACK_ALLOC to 300?
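The hazard described above can be shown with the commit's hypothetical value of 300. Sizing the array in longs rounds down: on an LP64 system, 300 / 8 = 37 longs, or 296 bytes. A size check against the macro then accepts 300 bytes that the array cannot hold. The array here is at file scope purely for illustration, whereas the kernel keeps it on the stack. `fits_buggy()` and `fits_fixed()` are hypothetical.

```c
#include <stddef.h>

/* Value from the commit's "what if" scenario. */
#define SELECT_STACK_ALLOC 300

static long stack_fds[SELECT_STACK_ALLOC / sizeof(long)];

/* Buggy check: compares against the macro, not the real array size. */
static int fits_buggy(size_t bytes)
{
	return bytes <= SELECT_STACK_ALLOC;
}

/* Fixed check: compares against the array that was actually declared. */
static int fits_fixed(size_t bytes)
{
	return bytes <= sizeof(stack_fds);
}
```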

Signed-off-by: Mitchell Blank Jr <mitch@sfgoth.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:41 -07:00
Eric Van Hensbergen 00fbc6dfe7 [PATCH] 9p: handle sget() failure
Handle a failing sget() in v9fs_get_sb().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:41 -07:00
Herbert Poetzl f6422f17d3 [PATCH] vfs: propagate mnt_flags into do_loopback/vfsmount
The mnt_flags are propagated into do_loopback(), so that they can be stored
with the vfsmount.

Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:41 -07:00
Andrew Morton 5246d05031 [PATCH] sync_file_range(): use unsigned for flags
Ulrich suggested that the `flags' arg to sync_file_range() become unsigned.

Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:40 -07:00
Jeff Dike 7b04d7170e [PATCH] Add GFP_NOWAIT
Introduce GFP_NOWAIT, as an alias for GFP_ATOMIC & ~__GFP_HIGH.

This also changes XFS, which is the only in-tree user of this idiom that I
could find.  The XFS piece is compile-tested only.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Acked-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:35 -07:00
Andrew Morton 29ff2db551 [PATCH] select() warning fixes
fs/select.c: In function `core_sys_select':
fs/select.c:339: warning: assignment from incompatible pointer type
fs/select.c:376: warning: comparison of distinct pointer types lacks a cast

By using a void* we can remove lots of casts rather than adding more.

Cc: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:30 -07:00
Ingo Molnar 341b446bc5 [PATCH] another round of fs/pipe.c cleanups
make pipe.c a bit more readable and hackable.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:57:45 +02:00
Ingo Molnar 73d62d83ec [PATCH] splice: comment styles
 - capitalize consistently
 - end sentences in one way or another
 - update comment text to match the implementation

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:57:21 +02:00
Jens Axboe c2058e0611 [PATCH] splice: add Ingo as addition copyright holder
The comment is also somewhat out of date, correct that as well.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:56:34 +02:00
Jens Axboe 49570e9b29 [PATCH] splice: unlikely() optimizations
Also corrects a few comments. Patch mainly from Ingo, changes by me.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:56:09 +02:00
Jens Axboe 6f767b0425 [PATCH] splice: speedups and optimizations
- Kill the local variables that cache ->nrbufs, they just take up space.

- Only set do_wakeup for a real pipe. This is a big win for direct splicing.

- Kill i_mutex lock around ->f_pos update, regular io paths don't do this
  either.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:53:56 +02:00
Ingo Molnar 923f4f2394 [PATCH] pipe.c/fifo.c code cleanups
more code cleanups after the macro conversion:

 - standardize on 'struct pipe_inode_info *pipe' variable names
 - introduce 'pipe' temporaries to reduce mass inode->i_pipe dereferencing

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:53:33 +02:00
Ingo Molnar 9aeedfc471 [PATCH] get rid of the PIPE_*() macros
get rid of the PIPE_*() macros. Scripted transformation.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:53:10 +02:00
Jens Axboe 7480a90435 [PATCH] splice: speedup __generic_file_splice_read
Using find_get_page() is a lot faster than find_or_create_page(). This
gets splice a lot closer to sendfile() for fd -> socket transfers.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:52:47 +02:00
Jens Axboe b92ce55893 [PATCH] splice: add direct fd <-> fd splicing support
It's more efficient for sendfile() emulation. Basically we cache an
internal private pipe and just use that as the intermediate area for
pages. Direct splicing is not available from sys_splice(), it is only
meant to be used for sendfile() emulation.

Additional patch from Ingo Molnar to avoid the PIPE_BUFFERS loop at
exit for the normal fast path.
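Direct splicing is internal and not reachable from sys_splice(). A userspace program can still build the same file → private pipe → fd path with two `splice(2)` calls, which shows the idea of a pipe used only as an intermediate holding area for pages. `copy_via_pipe()` is a hypothetical helper, and both descriptors use and advance their own file positions.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Copy up to len bytes from in_fd to out_fd through a private pipe.
 * Returns the number of bytes copied, or -1 on error. */
static ssize_t copy_via_pipe(int in_fd, int out_fd, size_t len)
{
	int p[2];
	size_t done = 0;

	if (pipe(p) < 0)
		return -1;
	while (done < len) {
		/* Move page references from the input into the pipe. */
		ssize_t in = splice(in_fd, NULL, p[1], NULL, len - done, 0);

		if (in < 0)
			goto fail;
		if (in == 0)
			break;	/* end of input */
		/* Drain everything just placed in the pipe. */
		while (in > 0) {
			ssize_t out = splice(p[0], NULL, out_fd, NULL,
					     (size_t)in, 0);
			if (out <= 0)
				goto fail;
			in -= out;
			done += (size_t)out;
		}
	}
	close(p[0]);
	close(p[1]);
	return (ssize_t)done;
fail:
	close(p[0]);
	close(p[1]);
	return -1;
}
```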

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:52:07 +02:00
Nathan Scott 019ff2d57b [XFS] Fix a problem in aligning inode allocations to stripe unit
boundaries.

SGI-PV: 951862
SGI-Modid: xfs-linux-melb:xfs-kern:25726a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-04-11 15:45:05 +10:00
Nathan Scott 8c0b5113a5 [XFS] Fix utime(2) in the case that no times parameter was passed in.
SGI-PV: 949858
SGI-Modid: xfs-linux-melb:xfs-kern:25717a

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-04-11 15:12:45 +10:00
David Chinner 58829e490e [XFS] Fix an inode use-after-free during an unpin. When reclaiming inodes
that have been unlinked, we may need to execute transactions during
reclaim. By the time the transaction has hit the disk, the linux inode and
xfs vnode may already have been freed so we can't reference them safely.
Use the known xfs inode state to determine if it is safe to reference the
vnode and linux inode during the unpin operation.

SGI-PV: 946321
SGI-Modid: xfs-linux-melb:xfs-kern:25687a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-04-11 15:11:20 +10:00
David Chinner 1fc5d959d8 [XFS] Fix inode reclaim scalability regression. When a filesystem has
millions of inodes cached and has sparse cluster population, removing
inodes from the cluster hash consumes excessive amounts of CPU time.
Reduce the CPU cost by making removal O(1) via use of a doubly linked list
for the hash chains.
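The O(1) removal comes from each node knowing both of its neighbours, so unlinking requires no walk of the chain to find the predecessor. This minimal intrusive circular list is similar in spirit to the kernel's `list_head`. It is an illustration only, not the XFS code, and `struct node`, `list_init()`, `list_add()` and `list_del()` are hypothetical names.

```c
/* Intrusive circular doubly linked list node. */
struct node {
	struct node *prev, *next;
};

/* An empty list: the head points at itself in both directions. */
static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

/* Insert n immediately after head. */
static void list_add(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* O(1) unlink: patch the two neighbours, then make n self-referential. */
static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}
```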

SGI-PV: 951551
SGI-Modid: xfs-linux-melb:xfs-kern:25683a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-04-11 15:11:12 +10:00
Nathan Scott 8272145c05 [XFS] Fix a writepage regression where we accidentally stopped honouring
nonblock mode with the new IO path code (since 2.6.16).

SGI-PV: 951662
SGI-Modid: xfs-linux-melb:xfs-kern:25676a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-04-11 15:10:55 +10:00
Nathan Scott e50bd16fe4 [XFS] Fix superblock validation regression for the zero imaxpct case.
Thanks to kjamieson for noticing.

SGI-PV: 951661
SGI-Modid: xfs-linux-melb:xfs-kern:25675a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-04-11 15:10:45 +10:00
Linus Torvalds e38d557896 Merge branch 'upstream-linus' of git://oss.oracle.com/home/sourcebo/git/ocfs2
* 'upstream-linus' of git://oss.oracle.com/home/sourcebo/git/ocfs2:
  [PATCH] CONFIGFS_FS must depend on SYSFS
  [PATCH] Bogus NULL pointer check in fs/configfs/dir.c
  ocfs2: Better I/O error handling in heartbeat
  ocfs2: test and set teardown flag early in user_dlm_destroy_lock()
  ocfs2: Handle the DLM_CANCELGRANT case in user_unlock_ast()
  ocfs2: catch an invalid ast case in dlmfs
  ocfs2: remove an overly aggressive BUG() in dlmfs
  ocfs2: multi node truncate fix
2006-04-10 16:44:09 -07:00
Eric W. Biederman de12a7878c [PATCH] de_thread: Don't confuse users of do_each_thread.
Oleg Nesterov spotted two interesting bugs with the current de_thread
code.  The simpler is a long-standing double decrement of
__get_cpu_var(process_counts) in __unhash_process, caused by
two processes exiting when only one was created.

The other is that, since we no longer detach from the thread_group list,
it is possible for do_each_thread, when run under the tasklist_lock, to
see the same task_struct twice: once on the task list as a
thread_group_leader, and once on the thread list of another
thread.

The double appearance in do_each_thread can cause a double increment
of mm_core_waiters in zap_threads resulting in problems later on in
coredump_wait.

To remedy those two problems, this patch takes the simple approach
of changing the old thread group leader into a child thread.
The only routine in release_task that cares is __unhash_process,
and it can be trivially seen that we handle cleaning up a
thread group leader properly.

Since de_thread doesn't change the pid of the exiting leader process,
and instead shares it with the new leader process, I change
thread_group_leader to recognize group leadership based on the
group_leader field and not based on pids.  This should also be
slightly cheaper than the existing thread_group_leader macro.

I performed a quick audit and I couldn't see any user of
thread_group_leader that cared about the difference.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-10 16:36:50 -07:00
Adrian Bunk 65714b9184 [PATCH] CONFIGFS_FS must depend on SYSFS
This patch fixes a compile error with CONFIG_SYSFS=n.

As a matter of policy, configfs creates the /sys/kernel/config
mount point, so it requires CONFIG_SYSFS.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-04-10 11:17:21 -07:00
Eric Sesterhenn cbca692c24 [PATCH] Bogus NULL pointer check in fs/configfs/dir.c
We check the "group" pointer after we dereference it.  This check is
bogus, as the pointer cannot be NULL coming in.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-04-10 11:16:17 -07:00
Ingo Molnar 529565dcb1 [PATCH] splice: add optional input and output offsets
Add optional input and output offsets to sys_splice(), for seekable file
descriptors:

 asmlinkage long sys_splice(int fd_in, loff_t __user *off_in,
                            int fd_out, loff_t __user *off_out,
                            size_t len, unsigned int flags);

The semantics are straightforward: f_pos will be updated with the offset
provided by user space before the splice transfer begins.
Providing a NULL offset pointer means the existing f_pos will be used
(and updated in situ).  Providing an offset for a pipe results in
-ESPIPE. Providing an invalid offset pointer results in -EFAULT.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 15:18:58 +02:00
Ingo Molnar 3a326a2ce8 [PATCH] introduce a "kernel-internal pipe object" abstraction
Separate out the 'internal pipe object' abstraction, and make it
usable by splice. This cleans up and fixes several aspects of the
internal splice APIs and the pipe code:

 - pipes: the allocation and freeing of pipe_inode_info is now more symmetric
   and more streamlined with existing kernel practices.

 - splice: micro-optimization: fewer pointer dereferences in splice
   methods

Signed-off-by: Ingo Molnar <mingo@elte.hu>

Update XFS for the ->splice_read/->splice_write changes.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 15:18:35 +02:00
Jens Axboe 0b749ce380 [PATCH] splice: be smarter about calling do_page_cache_readahead()
We don't want to call into the read-ahead logic unless we are at the
start of a page, _or_ we have multiple pages to read.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 09:05:04 +02:00
Jens Axboe 49d0b21be2 [PATCH] splice: optimize the splice buffer mapping
We don't really need to lock down the pages, just make sure they
are uptodate.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 09:04:41 +02:00
Jens Axboe 16c523ddab [PATCH] splice: cleanup __generic_file_splice_read()
The whole shadow/pages logic got overly complex, and this simpler
approach is actually faster in testing.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 09:03:58 +02:00
Jens Axboe c0bd1f650b [PATCH] splice: only call wake_up_interruptible() when we really have to
__wake_up_common() is pretty heavy in the kernel profiles; this patch
brings it down to a more acceptable level.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 09:03:32 +02:00
Dave Jones 9aefe431f5 [PATCH] splice: potential !page dereference
We can get to out: with a NULL page, which we probably
don't want to be calling page_cache_release() on.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 09:02:40 +02:00
Jens Axboe c7f21e4f5a [PATCH] splice: mark the io page as accessed
We should do that, since we do the LRU manipulation ourselves now. Suggested
by Nick Piggin.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 09:01:01 +02:00
Mark Fasheh a9e2ae3917 ocfs2: Better I/O error handling in heartbeat
Propagate errors received in o2hb_bio_end_io() back to the heartbeat thread
so it can skip re-arming the timer.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-04-07 18:03:09 -07:00
Mark Fasheh 2cd9888590 ocfs2: test and set teardown flag early in user_dlm_destroy_lock()
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-04-07 17:39:43 -07:00