Commit Graph

8398 Commits

Author SHA1 Message Date
Dave Young 08ca0db8aa zisofs: fix readpage() outside i_size
A read request outside i_size will be handled in do_generic_file_read().  So
we just return 0 to avoid getting -EIO as normal reading, let
do_generic_file_read do the rest.

At the same time we need unlock the page to avoid system stuck.

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10227

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Jan Kara <jack@suse.cz>
Report-by: Christian Perle <chris@linuxinfotag.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
Randy Dunlap a6b91919e0 fs: fix kernel-doc notation warnings
Fix kernel-doc notation warnings in fs/.

Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
 *	mark_files_ro
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
 *	lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
 *	lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
 * lookup_one_len:  filesystem helper to lookup single pathname component
Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
 * bh_uptodate_or_lock: Test whether the buffer is uptodate
Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
 * bh_submit_read: Submit a locked buffer for reading
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
 * writeback_acquire: attempt to get exclusive writeback access to a device
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
 * writeback_in_progress: determine whether there is writeback in progress
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
 * writeback_release: relinquish exclusive writeback access against a device.
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
 * void journal_invalidatepage()

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
Michael Halcrow 5366dc9fd1 eCryptfs: Swap dput() and mntput()
ecryptfs_d_release() is doing a mntput before doing the dput.  This patch
moves the dput before the mntput.

Thanks to Rajouri Jammu for reporting this.

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Rajouri Jammu <rajouri.jammu@gmail.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
Duane Griffin d00256766a jbd2: correctly unescape journal data blocks
Fix a long-standing typo (predating git) that will cause data corruption if a
journal data block needs unescaping.  At the moment the wrong buffer head's
data is being unescaped.

To test this case mount a filesystem with data=journal, start creating and
deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then
pull the plug on the device.  Without this patch the files will contain zeros
instead of the correct data after recovery.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
Duane Griffin 439aeec639 jbd: correctly unescape journal data blocks
Fix a long-standing typo (predating git) that will cause data corruption if a
journal data block needs unescaping.  At the moment the wrong buffer head's
data is being unescaped.

To test this case mount a filesystem with data=journal, start creating and
deleting a bunch of files containing only JFS_MAGIC_NUMBER (0xc03b3998), then
pull the plug on the device.  Without this patch the files will contain zeros
instead of the correct data after recovery.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
David Howells 4ebf89845b ROMFS: Fix up an error in iget removal
Fix up an error in iget removal in which romfs_lookup() making a successful
call to romfs_iget() continues through the negative/error handling (previously
the successful case jumped around the negative/error handling case):

 (1) inode is initialised to NULL at the top of the function, eliminating the
     need for specific negative-inode handling.  This means the positive
     success handling now flows straight through.

 (2) Rename the labels to be clearer about what they mean.

Also make romfs_lookup()'s result variable of type long so as to avoid
32-bit/64-bit conversions with PTR_ERR() and friends.

Based upon a report and patch from Adam Richter.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: "Adam J. Richter" <adam@yggdrasil.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
Josef Bacik c587f0c0a6 ext3: fix wrong gfp type under transaction
There are several places where we make allocations with GFP_KERNEL while under
a transaction, which could lead to an assertion panic or lockup if under
memory pressure.  This patch switches these problem areas to use GFP_NOFS to
keep these problems from happening.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:36 -07:00
Jan Kara 87cb055bc1 quota: add possibly missing iput() when quotaon and quotaoff races
We should always put inode we have reference to, even if quota was reenabled
in the mean time.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:35 -07:00
Randy Dunlap 0cf01f6685 jbd: fix jbd kernel-doc notation
Fix kernel-doc notation in jbd.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:35 -07:00
Quentin Barnes 6cb2a21049 aio: bad AIO race in aio_complete() leads to process hang
My group ran into a AIO process hang on a 2.6.24 kernel with the process
sleeping indefinitely in io_getevents(2) waiting for the last wakeup to come
and it never would.

We ran the tests on x86_64 SMP.  The hang only occurred on a Xeon box
("Clovertown") but not a Core2Duo ("Conroe").  On the Xeon, the L2 cache isn't
shared between all eight processors, but is L2 is shared between between all
two processors on the Core2Duo we use.

My analysis of the hang is if you go down to the second while-loop
in read_events(), what happens on processor #1:
	1) add_wait_queue_exclusive() adds thread to ctx->wait
	2) aio_read_evt() to check tail
	3) if aio_read_evt() returned 0, call [io_]schedule() and sleep

In aio_complete() with processor #2:
	A) info->tail = tail;
	B) waitqueue_active(&ctx->wait)
	C) if waitqueue_active() returned non-0, call wake_up()

The way the code is written, step 1 must be seen by all other processors
before processor 1 checks for pending events in step 2 (that were recorded by
step A) and step A by processor 2 must be seen by all other processors
(checked in step 2) before step B is done.

The race I believed I was seeing is that steps 1 and 2 were
effectively swapped due to the __list_add() being delayed by the L2
cache not shared by some of the other processors.  Imagine:
proc 2: just before step A
proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet
proc 1, step 2: checks tail and sees no pending events
proc 2, step A: updates tail
proc 1, step 3: calls [io_]schedule() and sleeps
proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup
                so proc 1 sleeps indefinitely

My patch adds a memory barrier between steps A and B.  It ensures that the
update in step 1 gets seen on processor 2 before continuing.  If processor 1
was just before step 1, the memory barrier makes sure that step A (update
tail) gets seen by the time processor 1 makes it to step 2 (check tail).

Before the patch our AIO process would hang virtually 100% of the time.  After
the patch, we have yet to see the process ever hang.

Signed-off-by: Quentin Barnes <qbarnes+linux@yahoo-inc.com>
Reviewed-by: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: <stable@kernel.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ We should probably disallow that "if (waitqueue_active()) wake_up()"
  coding pattern, because it's so often buggy wrt memory ordering ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-19 18:53:35 -07:00
Fred Isaman f8512ad0da nfs: don't ignore return value from nfs_pageio_add_request
Ignoring the return value from nfs_pageio_add_request can cause deadlocks.

In read path:
  call nfs_pageio_add_request from readpage_async_filler
  assume at this point that there are requests already in desc, that
    can't be merged with the current request.
  so nfs_pageio_doio is fired up to clear out desc.
  assume something goes wrong in setting up the io, so desc->pg_error is set.
  This causes nfs_pageio_add_request to return 0, *WITHOUT* adding the original
    request.
  BUT, since return code is ignored, readpage_async_filler assumes it has
    been added, and does nothing further, leaving page locked.
  do_generic_mapping_read will eventually call lock_page, resulting in deadlock

In write path:
  page is marked dirty by generic_perform_write
  nfs_writepages is called
  call nfs_pageio_add_request from nfs_page_async_flush
  assume at this point that there are requests already in desc, that
    can't be merged with the current request.
  so nfs_pageio_doio is fired up to clear out desc.
  assume something goes wrong in setting up the io, so desc->pg_error is set.
  This causes nfs_page_async_flush to return 0, *WITHOUT* adding the original
    request, yet marking the request as locked (PG_BUSY) and in writeback,
    clearing dirty marks.
  The next time a write is done to the page, deadlock will result as
    nfs_write_end calls nfs_update_request

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 17:59:02 -04:00
Al Viro a02f76c34d [PATCH] get stack footprint of pathname resolution back to relative sanity
Somebody had put struct nameidata in stack frame of link_path_walk().
Unfortunately, there are certain realities to deal with:
	* It's in the middle of recursion.  Depth is equal to the nesting
depth of symlinks, i.e. up to 8.
	* struct namiedata is, even if one discards the intent junk,
at least 12 pointers + 5 ints.
	* moreover, adding a stack frame is not free in that situation.
	* there are fs methods called on top of that, and they also have
stack footprint.
	* kernel stack is not infinite.

The thing is, even if one chooses to deal with -ESTALE that way (and it's
one hell of an overkill), the only thing that needs to be preserved is
vfsmount + dentry, not the entire struct nameidata.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-19 06:55:46 -04:00
Al Viro b4d232e65f [PATCH] double iput() on failure exit in hugetlb
once we'd done d_instantiate(), we should only do dput().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-19 06:55:01 -04:00
Dave Hansen 430e285e08 [PATCH] fix up new filp allocators
Some new uses of get_empty_filp() have crept in; switched
to alloc_file() to make sure that pieces of initialization
won't be missing.

We really need to kill get_empty_filp().

[AV] fixed dentry leak on failure exit in anon_inode_getfd()

Cc: Erez Zadok <ezk@cs.sunysb.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J Bruce Fields" <bfields@fieldses.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-19 06:54:05 -04:00
Christoph Hellwig 322ee5b36e [PATCH] check for null vfsmount in dentry_open()
Make sure no-one calls dentry_open with a NULL vfsmount argument and crap
out with a stacktrace otherwise.  A NULL file->f_vfsmnt has always been
problematic, but with the per-mount r/o tracking we can't accept anymore
at all.

[AV] the last place that passed NULL had been eliminated by the previous
patch (reiserfs xattr stuff)

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-19 06:50:44 -04:00
Jeff Mahoney 3227e14c3c [PATCH] reiserfs: eliminate private use of struct file in xattr
After several posts and bug reports regarding interaction with the NULL
nameidata, here's a patch to clean up the mess with struct file in the
reiserfs xattr code.

As observed in several of the posts, there's really no need for struct file
to exist in the xattr code.  It was really only passed around due to the
f_op->readdir() and a_ops->{prepare,commit}_write prototypes requiring it.

reiserfs_prepare_write() and reiserfs_commit_write() don't actually use the
struct file passed to it, and the xattr code uses a private version of
reiserfs_readdir() to enumerate the xattr directories.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-19 06:49:36 -04:00
Al Viro f382d6e631 [PATCH] sanitize hppfs
* hppfs_iget() and its users are racy; there's no need to pollute icache
  anyway, new_inode() works fine and is safe, unlike the current kludges
  (these relied on overwriting ->i_ino before another iget_locked() gets
  to that one - and did it after unlocking).
* merge hppfs_iget()/init_inode()/hppfs_read_inode(), while we are
  at it.
* to pass proper vfsmount to dentry_open() store the reference
  in hppfs superblock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 --
2008-03-19 06:42:18 -04:00
Dave Hansen 1dd0dd111f hppfs pass vfsmount to dentry_open()
Here's patch for hppfs that uses vfs_kern_mount to make sure it always has a
procfs instance and passed the vfsmount on through the inode private data.
Also fixes a procfs file_system_type leak for every attempted hppfs mount.

[ jdike - gave this file a style workover, plus deleted hppfs_dentry_ops ]

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-19 06:39:56 -04:00
Linus Torvalds f920bb6f5f Merge branch 'audit.b49' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b49' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  [PATCH] export sessionid alongside the loginuid in procfs
2008-03-18 08:43:59 -07:00
Eric Paris 1e0bd7550e [PATCH] export sessionid alongside the loginuid in procfs
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-18 10:51:22 -04:00
Linus Torvalds 92f53c6f1e Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Revert "unexport bio_{,un}map_user"
  relay: fix subbuf_splice_actor() adding too many pages
  The ps2esdi driver was marked as BROKEN more than two years ago due to being
2008-03-18 07:43:14 -07:00
David S. Miller 2f633928cb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-03-17 23:44:31 -07:00
Al Viro e6f1cebf71 [NET] endianness noise: INADDR_ANY
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-17 22:44:53 -07:00
Al Viro 8a4e98d9d7 [PATCH] restore export of do_kern_mount()
vfs_kern_mount() requires having a reference to fs type, which
makes it impossible for module to create procfs, etc. private
mount.  Open-coding is not an option, since e.g. put_filesystem()
is _not_ exported, and for a good reason.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-17 22:58:04 -04:00
Jens Axboe 40044ce0bf Revert "unexport bio_{,un}map_user"
Outside users like asmlib uses the mapping functions. API wise, the
export is definitely sane. It's a better idea to keep this export
than to require external users to open-code this piece of code instead.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-03-17 21:14:40 +01:00
Al Viro 3d10a15d69 hfs_bnode_find() can fail, resulting in hfs_bnode_split() breakage
oops and fs corruption; the latter can happen even on valid fs in case of oom.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-17 09:46:55 -07:00
J. Bruce Fields b663c6fd98 nfsd: fix oops on access from high-numbered ports
This bug was always here, but before my commit 6fa02839bf
("recheck for secure ports in fh_verify"), it could only be triggered by
failure of a kmalloc().  After that commit it could be triggered by a
client making a request from a non-reserved port for access to an export
marked "secure".  (Exports are "secure" by default.)

The result is a struct svc_export with a reference count one too low,
resulting in likely oopses next time the export is accessed.

The reference counting here is not straightforward; a later patch will
clean up fh_verify().

Thanks to Lukas Hejtmanek for the bug report and followup.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Lukas Hejtmanek <xhejtman@ics.muni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-14 16:49:15 -07:00
Steve French 8b1327f6ed [CIFS] file create with acl support enabled is slow
Shirish Pargaonkar noted:
With cifsacl mount option, when a file is created on the Windows server,
exclusive oplock is broken right away because the get cifs acl code
again opens the file to obtain security descriptor.
The client does not have the newly created file handle or inode in any
of its lists yet so it does not respond to oplock break and server waits for
its duration and then responds to the second open. This slows down file
creation signficantly.  The fix is to pass the file descriptor to the get
cifsacl code wherever available so that get cifs acl code does not send
second open (NT Create ANDX) and oplock is not broken.

CC: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-03-14 22:37:16 +00:00
Steve French ebe8912be2 Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-03-14 19:29:18 +00:00
Steve French 50531444fa [CIFS] Fix mtime on cp -p when file data cached but written out too late
Kukks noticed that cp -p can write out file data too late, after the timestamp
is already set.  This was introduced as an unintentional sideeffect of the change
in an earlier patch (see below) which fixed some delayed return code propagation.

cea218054a
Author: Jeff Layton <jlayton@redhat.com>
Date:   Tue Nov 20 23:19:03 2007 +0000

Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-03-14 19:21:31 +00:00
Marcelo Tosatti fb39380b8d pagemap: proper read error handling
Fix pagemap_read() error handling by releasing acquired resources and checking
for get_user_pages() partial failure.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-13 13:11:43 -07:00
Linus Torvalds 609eb39c8d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
  [SCTP]: Fix local_addr deletions during list traversals.
  net: fix build with CONFIG_NET=n
  [TCP]: Prevent sending past receiver window with TSO (at last skb)
  rt2x00: Add new D-Link USB ID
  rt2x00: never disable multicast because it disables broadcast too
  libertas: fix the 'compare command with itself' properly
  drivers/net/Kconfig: fix whitespace for GELIC_WIRELESS entry
  [NETFILTER]: nf_queue: don't return error when unregistering a non-existant handler
  [NETFILTER]: nfnetlink_queue: fix EPERM when binding/unbinding and instance 0 exists
  [NETFILTER]: nfnetlink_log: fix EPERM when binding/unbinding and instance 0 exists
  [NETFILTER]: nf_conntrack: replace horrible hack with ksize()
  [NETFILTER]: nf_conntrack: add \n to "expectation table full" message
  [NETFILTER]: xt_time: fix failure to match on Sundays
  [NETFILTER]: nfnetlink_log: fix computation of netlink skb size
  [NETFILTER]: nfnetlink_queue: fix computation of allocated size for netlink skb.
  [NETFILTER]: nfnetlink: fix ifdef in nfnetlink_compat.h
  [NET]: include <linux/types.h> into linux/ethtool.h for __u* typedef
  [NET]: Make /proc/net a symlink on /proc/self/net (v3)
  RxRPC: fix rxrpc_recvmsg()'s returning of msg_name
  net/enc28j60: oops fix
  ...
2008-03-12 13:08:09 -07:00
Andrew Morton b2211a361a net: fix build with CONFIG_NET=n
fs/built-in.o:(.rodata+0x1134): undefined reference to `proc_net_inode_operations'
fs/built-in.o:(.rodata+0x1138): undefined reference to `proc_net_operations'

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-11 18:03:35 -07:00
Steve French bc5b6e24a1 [CIFS] Fix build problem
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-03-11 21:07:48 +00:00
Steve French 5b4d4771e2 Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-03-11 19:15:41 +00:00
Tao Ma cdef59a94c ocfs2: Fix NULL pointer dereferences in o2net
In some situations, ocfs2_set_nn_state might get called with sc = NULL and
valid = 0. If sc = NULL, we can't dereference it to get the o2nm_node
member. Instead, do what o2net_initialize_handshake does and use NULL when
calling o2net_reconnect_delay and o2net_idle_timeout.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:14:19 -07:00
Sunil Mushran c824c3c723 ocfs2/dlm: dlm_thread should not sleep while holding the dlm_spinlock
This patch addresses the bug in which the dlm_thread could go to sleep
while holding the dlm_spinlock.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:14:17 -07:00
Sunil Mushran 535f7026fd ocfs2/dlm: Print message showing the recovery master
Knowing the dlm recovery master helps in debugging recovery
issues. This patch prints a message on the recovery master node.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:14:14 -07:00
Sunil Mushran b31cfc0237 ocfs2/dlm: Add missing dlm_lockres_put()s
dlm_master_request_handler() forgot to put a lockres when
dlm_assert_master_worker() failed or was skipped.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:14:12 -07:00
Sunil Mushran 52987e2ab4 ocfs2/dlm: Add missing dlm_lockres_put()s in migration path
During migration, the recovery master node may be asked to master a lockres
it may not know about. In that case, it would not only have to create a
lockres and add it to the hash, but also remember to to do the _put_
corresponding to the kref_init in dlm_init_lockres(), as soon as the migration
is completed. Yes, we don't wait for the dlm_purge_lockres() to do that
matching put. Note the ref added for it being in the hash protects the lockres
from being freed prematurely.

This patch adds that missing put, as described above, to plug a memleak.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:14:11 -07:00
Sunil Mushran 2c5c54aca9 ocfs2/dlm: Add missing dlm_lock_put()s
Normally locks for remote nodes are freed when that node sends an UNLOCK
message to the master. The master node tags an DLM_UNLOCK_FREE_LOCK action
to do an extra put on the lock at the end.

However, there are times when the master node has to free the locks for the
remote nodes forcibly.

Two cases when this happens are:
1. When the master has migrated the lockres plus all locks to another node.
2. When the master is clearing all the locks of a dead node.

It was in the above two conditions that the dlm was missing the extra put.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:14:09 -07:00
Tao Ma 4338ab6a75 ocfs2: Fix an endian bug in online resize.
In ocfs2_group_add, 'cr' is a disk field of type 'ocfs2_chain_rec', and we
were putting cpu byteorder values into it. Swap things to the right endian
before storing.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:14:07 -07:00
Jan Engelhardt 90d99779a4 [PATCH] [OCFS2]: constify function pointer tables
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:13:57 -07:00
Joel Becker 0f71b7b40f ocfs2: Fix endian bug in o2dlm protocol negotiation.
struct dlm_query_join_packet is made up of four one-byte fields.  They
are effectively in big-endian order already.  However, little-endian
machines swap them before putting the packet on the wire (because
query_join's response is a status, and that status is treated as a u32
on the wire).  Thus, a big-endian and little-endian machines will
treat this structure differently.

The solution is to have little-endian machines swap the structure when
converting from the structure to the u32 representation.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:13:54 -07:00
Tao Ma 2af37ce82d ocfs2: Use dlm_print_one_lock_resource for lock resource print
__dlm_print_one_lock_resource must be called with spin_lock
the res->spinlock. While in some cases, we use it without this
precondition and lead to the failure of assert_spin_locked.
So call dlm_print_one_lock_resource instead.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:13:50 -07:00
Andrew Morton 3a4780a85d [PATCH] fs/ocfs2/dlm/dlmdomain.c: fix printk warning
fs/ocfs2/dlm/dlmdomain.c: In function 'dlm_send_join_cancels':
fs/ocfs2/dlm/dlmdomain.c:983: warning: format '%u' expects type 'unsigned int', but argument 7 has type 'long unsigned int'

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:13:39 -07:00
Harvey Harrison 55f78e1771 [CIFS] cifs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-03-10 17:14:34 +00:00
Igor Mammedov 7962670e64 [CIFS] DFS patch that connects inode with dfs handling ops
if DFS junction point

Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-03-09 03:44:18 +00:00
Linus Torvalds 4c1aa6f8b9 Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings
  NFS: Fix the fsid revalidation in nfs_update_inode()
  SUNRPC: Fix a nfs4 over rdma transport oops
  NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c
2008-03-07 12:08:07 -08:00
Trond Myklebust 4e99a1ff34 NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings
As long as the directory contents haven't changed, we should just let the
path walk proceed to cross the mountpoint. Apart from being an optimisation
in the case of 'nohide' mountpoint traversals, it also fixes an issue with
referrals: referral inodes don't have valid filehandles, so calling
nfs_revalidate_inode() on them is a bug.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-07 14:35:41 -05:00
Trond Myklebust c37dcd334c NFS: Fix the fsid revalidation in nfs_update_inode()
When we detect that we've crossed a mountpoint on the remote server, we
must take care not to use that inode to revalidate the fsid on our
current superblock. To do so, we label the inode as a remote mountpoint,
and check for that in nfs_update_inode().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-07 14:35:37 -05:00
Trond Myklebust af1b8c2ff7 NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c
O_SYNC is stored in filp->f_flags.
Thanks to Al Viro for pointing out the bug.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-07 14:33:40 -05:00
Pavel Emelyanov e9720acd72 [NET]: Make /proc/net a symlink on /proc/self/net (v3)
Current /proc/net is done with so called "shadows", but current
implementation is broken and has little chances to get fixed.

The problem is that dentries subtree of /proc/net directory has
fancy revalidation rules to make processes living in different
net namespaces see different entries in /proc/net subtree, but
currently, tasks see in the /proc/net subdir the contents of any
other namespace, depending on who opened the file first.

The proposed fix is to turn /proc/net into a symlink, which points
to /proc/self/net, which in turn shows what previously was in
/proc/net - the network-related info, from the net namespace the
appropriate task lives in.

# ls -l /proc/net
lrwxrwxrwx  1 root root 8 Mar  5 15:17 /proc/net -> self/net

In other words - this behaves like /proc/mounts, but unlike
"mounts", "net" is not a file, but a directory.

Changes from v2:
* Fixed discrepancy of /proc/net nlink count and selinux labeling
  screwup pointed out by Stephen.

  To get the correct nlink count the ->getattr callback for /proc/net
  is overridden to read one from the net->proc_net entry.

  To make selinux still work the net->proc_net entry is initialized
  properly, i.e. with the "net" name and the proc_net parent.

Selinux fixes are
Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>

Changes from v1:
* Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-07 11:08:40 -08:00
Linus Torvalds 910da1a48e Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
  [XFS] fix inode leak in xfs_iget_core()
  [XFS] 977545 977545 977545 977545 977545 977545 xfsaild causing too many
2008-03-06 08:14:00 -08:00
David Chinner 72772a3b5b [XFS] fix inode leak in xfs_iget_core()
If the radix_tree_preload() fails, we need to destroy the inode we just
read in before trying again. This could leak xfs_vnode structures when
there is memory pressure. Noticed by Christoph Hellwig.

SGI-PV: 977823
SGI-Modid: xfs-linux-melb:xfs-kern:30606a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-03-06 16:38:50 +11:00
David Chinner 92d9cd1059 [XFS] 977545 977545 977545 977545 977545 977545 xfsaild causing too many
wakeups

Idle state is not being detected properly by the xfsaild push code. The
current idle state is detected by an empty list which may never happen
with mostly idle filesystem or one using lazy superblock counters. A
single dirty item in the list that exists beyond the push target can
result repeated looping attempting to push up to the target because it
fails to check if the push target has been acheived or not.

Fix by considering a dirty list with everything past the target as an idle
state and set the timeout appropriately.

SGI-PV: 977545
SGI-Modid: xfs-linux-melb:xfs-kern:30532a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-03-06 16:38:17 +11:00
Eric Paris f9c3a38021 NFS: use new LSM interfaces to explicitly set mount options
NFS and SELinux worked together previously because SELinux had NFS
specific knowledge built in.  This design was approved by both groups
back in 2004 but the recent NFS changes to use nfs_parsed_mount_data and
the usage of nfs_clone_mount_data showed this to be a poor fragile
solution.  This patch fixes the NFS functionality regression by making
use of the new LSM interfaces to allow an FS to explicitly set its own
mount options.

The explicit setting of mount options is done in the nfs get_sb
functions which are called before the generic vfs hooks try to set mount
options for filesystems which use text mount data.

This does not currently support NFSv4 as that functionality did not
exist in previous kernels and thus there is no regression.  I will be
adding the needed code, which I believe to be the exact same as the v3
code, in nfs4_get_sb for 2.6.26.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-03-06 08:40:59 +11:00
Eric Paris e000752989 LSM/SELinux: Interfaces to allow FS to control mount options
Introduce new LSM interfaces to allow an FS to deal with their own mount
options.  This includes a new string parsing function exported from the
LSM that an FS can use to get a security data blob and a new security
data blob.  This is particularly useful for an FS which uses binary
mount data, like NFS, which does not pass strings into the vfs to be
handled by the loaded LSM.  Also fix a BUG() in both SELinux and SMACK
when dealing with binary mount data.  If the binary mount data is less
than one page the copy_page() in security_sb_copy_data() can cause an
illegal page fault and boom.  Remove all NFSisms from the SELinux code
since they were broken by past NFS changes.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-03-06 08:40:53 +11:00
Harvey Harrison 15732a1cb5 jfs: replace __inline with inline
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2008-03-05 14:38:22 -06:00
Linus Torvalds 2c6f2db13a Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
  debugfs: fix sparse warnings
  Driver core: Fix cleanup when failing device_add().
  driver core: Remove dpm_sysfs_remove() from error path of device_add()
  PM: fix new mutex-locking bug in the PM core
  PM: Do not acquire device semaphores upfront during suspend
  kobject: properly initialize ksets
  sysfs: CONFIG_SYSFS_DEPRECATED fix
  driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATED
2008-03-04 16:37:35 -08:00
Josef Bacik 92587216f8 ext3: fix mount option parsing
The "resize" option won't be noticed as it comes after the NULL option, so if
you try to mount (or in this case remount) with that option it won't be
recognized.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04 16:35:18 -08:00
Michael Halcrow e4465fdaeb eCryptfs: make ecryptfs_prepare_write decrypt the page
When the page is not up to date, ecryptfs_prepare_write() should be
acting much like ecryptfs_readpage(). This includes the painfully
obvious step of actually decrypting the page contents read from the
lower encrypted file.

Note that this patch resolves a bug in eCryptfs in 2.6.24 that one can
produce with these steps:

# mount -t ecryptfs /secret /secret
# echo "abc" > /secret/file.txt
# umount /secret
# mount -t ecryptfs /secret /secret
# echo "def" >> /secret/file.txt
# cat /secret/file.txt

Without this patch, the resulting data returned from cat is likely to
be something other than "abc\ndef\n".

(Thanks to Benedikt Driessen for reporting this.)

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Benedikt Driessen <bdriessen@escrypt.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04 16:35:16 -08:00
Julia Lawall acc1f3ede9 fs/reiserfs/super.c: correct use of ! and &
In commit e6bafba5b4 ("wmi: (!x & y)
strikes again"), a bug was fixed that involved converting !x & y to !(x
& y).  The code below shows the same pattern, and thus should perhaps be
fixed in the same way.

This is not tested and clearly changes the semantics, so it is only
something to consider.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@ expression E1,E2; @@
(
  !E1 & !E2
|
- !E1 & E2
+ !(E1 & E2)
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04 16:35:16 -08:00
Jan Kara e3892296de vfs: fix NULL pointer dereference in fsync_buffers_list()
Fix NULL pointer dereference in fsync_buffers_list() introduced by recent fix
of races in private_list handling.  Since bh->b_assoc_map has been cleared in
__remove_assoc_queue() we should really use original value stored in the
'mapping' variable.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04 16:35:10 -08:00
Roland McGrath d31472b6d4 core dump: user_regset writeback
This makes the user_regset-based core dump code call user_regset writeback
hooks when available.  This is necessary groundwork to allow IA64 to set
CORE_DUMP_USE_REGSET.

Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04 16:35:10 -08:00
Harvey Harrison 3634634edd debugfs: fix sparse warnings
extern does not belong in C files, move declaration to linux/debugfs.h
fs/debugfs/file.c:42:30: warning: symbol 'debugfs_file_operations' was not declared. Should it be static?
fs/debugfs/file.c:54:31: warning: symbol 'debugfs_link_operations' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-03-04 14:47:06 -08:00
Linus Torvalds c5750ee69f Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  [PATCH] fs/ocfs2/aops.c: Correct use of ! and &
  [2.6 patch] ocfs2: make dlm_do_assert_master() static
  [2.6 patch] make ocfs2_downconvert_thread() static
  [2.6 patch] fs/ocfs2/: possible cleanups
  [PATCH] ocfs2: le*_add_cpu conversion
  ocfs2: Fix writeout in ocfs2_data_convert_worker()
  ocfs2: Enable localalloc for local mounts
2008-03-04 11:27:32 -08:00
Linus Torvalds ce932967b9 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: fix blkdev_issue_flush() not detecting and passing EOPNOTSUPP back
  block: fix shadowed variable warning in blk-map.c
  block: remove extern on function definition
  cciss: remove READ_AHEAD define and use block layer defaults
  make cdrom.c:check_for_audio_disc() static
  block/genhd.c: proper externs
  unexport blk_rq_map_user_iov
  unexport blk_{get,put}_queue
  block/genhd.c: cleanups
  proper prototype for blk_dev_init()
  block/blk-tag.c should #include "blk.h"
  Fix DMA access of block device in 64-bit kernel on some non-x86 systems with 4GB or upper 4GB memory
  block: separate out padding from alignment
  block: restore the meaning of rq->data_len to the true data length
  resubmit: cciss: procfs updates to display info about many
  splice: only return -EAGAIN if there's hope of more data
  block: fix kernel-docbook parameters and files
2008-03-04 08:08:05 -08:00
Adrian Bunk a0db701a6b block/genhd.c: proper externs
This patch adds proper externs for two structs in include/linux/genhd.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-03-04 11:28:36 +01:00
Jens Axboe 02cf01aea5 splice: only return -EAGAIN if there's hope of more data
sys_tee() currently is a bit eager in returning -EAGAIN, it may do so
even if we don't have a chance of anymore data becoming available. So
improve the logic and only return -EAGAIN if we have an attached writer
to the input pipe.

Reported by Johann Felix Soden <johfel@gmx.de> and
Patrick McManus <mcmanus@ducksong.com>.

Tested-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-03-04 11:14:39 +01:00
Steve French 966ea8c4b7 Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-03-04 04:22:00 +00:00
Julia Lawall 86c838b03d [PATCH] fs/ocfs2/aops.c: Correct use of ! and &
In commit e6bafba5b4, a bug was fixed that
involved converting !x & y to !(x & y).  The code below shows the same
pattern, and thus should perhaps be fixed in the same way.

This is not tested and clearly changes the semantics, so it is only
something to consider.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03 15:50:21 -08:00
Adrian Bunk 05488bbebe [2.6 patch] ocfs2: make dlm_do_assert_master() static
This patch makes the needlessly global dlm_do_assert_master() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03 15:50:21 -08:00
Adrian Bunk 200bfae37a [2.6 patch] make ocfs2_downconvert_thread() static
This patch makes the needlessly global ocfs2_downconvert_thread()
static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03 15:50:21 -08:00
Adrian Bunk 006000566d [2.6 patch] fs/ocfs2/: possible cleanups
This patch contains the following cleanups that are now possible:
- make the following needlessly global functions static:
  - dlmglue.c:ocfs2_process_blocked_lock()
  - heartbeat.c:ocfs2_node_map_init()
- #if 0 the following unused global function plus support functions:
  - heartbeat.c:ocfs2_node_map_is_only()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03 15:50:21 -08:00
Marcin Slusarz 0dd3256e04 [PATCH] ocfs2: le*_add_cpu conversion
replace all:
little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
					expression_in_cpu_byteorder);
with:
	leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03 15:50:21 -08:00
Mark Fasheh 1044e401af ocfs2: Fix writeout in ocfs2_data_convert_worker()
Commit f1f540688e "optimized"
ocfs2_data_convert_worker() to "only do work for regular files".
Unfortunately, I left out a '!', which casued it to *skip* regular files.
This was hidden from testing until recently because the default data
journaling mode (data=ordered) doesn't exercise this code.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-03-03 15:50:21 -08:00
Sunil Mushran 7ad8b3d30e ocfs2: Enable localalloc for local mounts
Commit 2fbe8d1ebe disabled localalloc
for local mounts. This caused issues as ocfs2 uses localalloc to
provide write locality. This patch enables localalloc for local mounts.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03 15:50:21 -08:00
Randy Dunlap 78a4a50a86 docbook: fix filesystems.tmpl source files
Fix docbook problems in filesystems.tmpl.
These cause the generated docbook to be incorrect.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-03 10:47:13 -08:00
Linus Torvalds a64e715fc7 Allow ARG_MAX execve string space even with a small stack limit
The new code that removed the limitation on the execve string size
(which was historically 32 pages) replaced it with a much softer limit
based on RLIMIT_STACK which is usually much larger than the traditional
limit.  See commit b6a2fea393 ("mm:
variable length argument support") for details.

However, if you have a small stack limit (perhaps because you need lots
of stacks in a threaded environment), the new heuristic of allowing up
to 1/4th of RLIMIT_STACK to be used for argument and environment strings
could actually be smaller than the old limit.

So just say that it's ok to have up to ARG_MAX strings regardless of the
value of RLIMIT_STACK, and check the rlimit only when going over that
traditional limit.

(Of course, if you actually have a *really* small stack limit, the whole
stack itself will be limited before you hit ARG_MAX, but that has always
been true and is clearly the right behaviour anyway).

Acked-by: Carlos O'Donell <carlos@codesourcery.com>
Cc: Michael Kerrisk <michael.kerrisk@googlemail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ollie Wild <aaw@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-03 10:12:14 -08:00
Steve French 0dbd888936 Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-03-01 18:29:55 +00:00
Josef Jeff Sipek 1bd960ee2b [XFS] If you mount an XFS filesystem with no mount options at all, then
the "ikeep" option is set rather than "noikeep".

This regression was introduced in 970451.

With no mount options specified, xfs_parseargs() does the following:

int ikeep = 0;

args->flags |= XFSMNT_BARRIER;

args->flags2 |= XFSMNT2_COMPAT_IOSIZE;

if (!options)

goto done;

It only sets the above two options by default and before, it also used to
set XFSMNT_IDELETE by default.

If options are specified, then

if (!(args->flags & XFSMNT_DMAPI) && !ikeep)

args->flags |= XFSMNT_IDELETE;

is executed later on which is skipped by the "goto done;" above.

The solution is to invert the logic.

SGI-PV: 977771
SGI-Modid: xfs-linux-melb:xfs-kern:30590a

Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Barry Naujok <bnaujok@sgi.com>
Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-28 20:37:56 -08:00
Linus Torvalds 7704a8b6fc Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
  [XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac
  [XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac
  Remove empty file fs/xfs/Makefile-linux-2.6.
2008-02-26 07:55:29 -08:00
Linus Torvalds adefe11c53 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: add missing ext4_journal_stop()
  ext4: ext4_find_next_zero_bit needs an aligned address on some arch
  ext4: set EXT4_EXTENTS_FL only for directory and regular files
  ext4: Don't mark filesystem error if fallocate fails
  ext4: Fix BUG when writing to an unitialized extent
  ext4: Don't use ext4_dec_count() if not needed
  ext4: modify block allocation algorithm for the last group
  ext4: Don't claim block from group which has corrupt bitmap
  ext4: Get journal write access before modifying the extent tree
  ext4: Fix memory and buffer head leak in callers to ext4_ext_find_extent()
  ext4: Don't leave behind a half-created inode if ext4_mkdir() fails
  ext4: Fix kernel BUG at fs/ext4/mballoc.c:910!
  ext4: Fix locking hierarchy violation in ext4_fallocate()
  Remove incorrect BKL comments in ext4
2008-02-26 07:50:16 -08:00
Lachlan McIlroy ef8ece55d9 [XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac
platform.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30559a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-26 17:05:44 +11:00
Lachlan McIlroy db69c915e6 [XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac
platform.

SGI-PV: 974005
SGI-Modid: xfs-linux-melb:xfs-kern:30558a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-26 17:05:37 +11:00
Steve French 0b442d2c28 [CIFS] remove unused variable
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-26 03:44:02 +00:00
Akinobu Mita 5606bf5d0c ext4: add missing ext4_journal_stop()
Add missing ext4_journal_stop() in error handling.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: adilger@clusterfs.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mingming Cao <cmm@us.ibm.com>
2008-02-25 15:37:42 -05:00
Christoph Hellwig 75f12983d9 [CIFS] consolidate duplicate code in posix/unix inode handling
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-25 20:25:21 +00:00
Hiroshi Shimamoto 13d77c37ca latencytop: change /proc task_struct access method
Change getting task_struct by get_proc_task() at read or write time,
and returns -ESRCH if get_proc_task() returns NULL.
This is same behavior as other /proc files.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-25 16:34:18 +01:00
Hiroshi Shimamoto d6643d12cb latencytop: fix memory leak on latency proc file
At lstats_open(), calling get_proc_task() gets task struct, but it never put.
put_task_struct() should be called when releasing.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-25 16:34:17 +01:00
Hiroshi Shimamoto ae0027869d latencytop: fix kernel panic while reading latency proc file
Reading /proc/<pid>/latency or /proc/<pid>/task/<tid>/latency could cause
NULL pointer dereference.

In lstats_open(), get_proc_task() can return NULL, in which case the kernel
will oops at lstats_show_proc() because m->private is NULL.

When get_proc_task() returns NULL, the kernel should return -ENOENT.

This can be reproduced by the following script.
while :
do
        date
        bash -c 'ls > ls.$$' &
        pid=$!
        cat /proc/$pid/latency &
        cat /proc/$pid/latency &
        cat /proc/$pid/latency &
        cat /proc/$pid/latency
done

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-25 16:34:17 +01:00
Eugene Teo 8808117ca5 proc: add RLIMIT_RTTIME to /proc/<pid>/limits
RLIMIT_RTTIME was introduced to allow the user to set a runtime timeout on
real-time tasks: http://lkml.org/lkml/2007/12/18/218. This patch updates
/proc/<pid>/limits with the new rlimit.

Signed-off-by: Eugene Teo <eugeneteo@kernel.sg>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 17:12:15 -08:00
Christoph Hellwig 45254b4fb2 efs: move headers out of include/linux/
Merge include/linux/efs_fs{_i,_dir}.h into fs/efs/efs.h.  efs_vh.h remains
there because this is the IRIX volume header and shouldn't really be
handled by efs but by the partitioning code.  efs_sb.h remains there for
now because it's exported to userspace.  Of course this wrong and aboot
should have a copy of it's own, but I'll leave that to a separate patch to
avoid any contention.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 17:12:15 -08:00
Hans Rosenfeld 745329c4a2 /proc/pid/pagemap: fix PM_SPECIAL macro
There seems to be a bug in the PM_SPECIAL macro for /proc/pid/pagemap.  I
think masking out those other bits makes more sense then setting all those
mask bits.

Signed-off-by: Hans Rosenfeld <Hans.Rosenfeld@amd.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 17:12:13 -08:00
Roel Kluin f81e8a4387 ufs: fix parenthesisation in ufs_set_fs_state()
This bug snuck in with

commit 252e211e90
Author: Mark Fortescue <mark@mtfhpc.demon.co.uk>
Date:   Tue Oct 16 23:26:31 2007 -0700

    Add in SunOS 4.1.x compatible mode for UFS

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Acked-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Mark Fortescue <mark@mtfhpc.demon.co.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 17:12:13 -08:00
Miklos Szeredi 1a823ac9ff fuse: fix permission checking
I added a nasty local variable shadowing bug to fuse in 2.6.24, with the
result, that the 'default_permissions' mount option is basically ignored.

How did this happen?

 - old err declaration in inner scope
 - new err getting declared in outer scope
 - 'return err' from inner scope getting removed
 - old declaration not being noticed

-Wshadow would have saved us, but it doesn't seem practical for
the kernel :(

More testing would have also saved us :((

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 17:12:13 -08:00
Aneesh Kumar K.V ffad0a44b7 ext4: ext4_find_next_zero_bit needs an aligned address on some arch
ext4_find_next_zero_bit and ext4_find_next_bit needs a long aligned
address on x8_64. Add mb_find_next_zero_bit and mb_find_next_bit
and use them in the mballoc.

Fix: https://bugzilla.redhat.com/show_bug.cgi?id=433286

Eric Sandeen debugged the problem and suggested the fix.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by:      Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-23 01:38:34 -05:00
Aneesh Kumar K.V 42bf0383d1 ext4: set EXT4_EXTENTS_FL only for directory and regular files
In addition, don't inherit EXT4_EXTENTS_FL from parent directory.
If we have a directory with extent flag set and later mount the file
system with -o noextents, the files created in that directory will also
have extent flag set but we would not have called ext4_ext_tree_init for
them. This will cause error later when we are verifying the extent header

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-25 16:38:03 -05:00
Aneesh Kumar K.V 2c98615d3b ext4: Don't mark filesystem error if fallocate fails
If we fail to allocate blocks don't call ext4_error. Also don't hide
errors from ext4_get_blocks_wrap

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-25 15:41:35 -05:00
Mingming Cao f5ab0d1f8f ext4: Fix BUG when writing to an unitialized extent
This patch fixes a bug when writing to preallocated but uninitialized
blocks, which resulted in a BUG in fs/buffer.c saying that the buffer
is not mapped.

When writing to a file, ext4_get_block_wrap() is called with create=1 in
order to request that blocks be allocated if necessary.  It currently
calls ext4_get_blocks() with create=0 in order to do a lookup first.  If
the inode contains an unitialized data block, the buffer head is left
unampped, which ext4_get_blocks_wrap() returns, causing the BUG.

We fix this by checking to see if the buffer head is unmapped, and if
so, we make sure the the buffer head is mapped by calling
ext4_ext_get_blocks with create=1.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-25 15:29:55 -05:00
Linus Torvalds 1a4c6be4ac Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  Wrap buffers used for rpc debug printks into RPC_IFDEBUG
  nfs: fix sparse warnings
  NFS: flush signals before taking down callback thread
2008-02-21 17:19:48 -08:00
Pavel Emelyanov 5216a8e70e Wrap buffers used for rpc debug printks into RPC_IFDEBUG
Sorry for the noise, but here's the v3 of this compilation fix :)

There are some places, which declare the char buf[...] on the stack
to push it later into dprintk(). Since the dprintk sometimes (if the
CONFIG_SYSCTL=n) becomes an empty do { } while (0) stub, these buffers
cause gcc to produce appropriate warnings.

Wrap these buffers with RPC_IFDEBUG macro, as Trond proposed, to
compile them out when not needed.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-21 18:42:29 -05:00
David Teigland 599e0f584d dlm: fix rcom_names message to self
The recent patch to validate data lengths in rcom_names messages
failed to account for fake messages a node directs to itself before
ever sending it.  In this case we need to fill in the message length
in the header for the validation code to use.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-21 15:19:54 -06:00
Linus Torvalds 1803f3389b Remove empty file remnants that were left in the tree by mistake
Noted by various people (Sam, Jeff, Roland..)

Commit 58b7983d15 intended to remove the
xfs "Makefile-linux-2.6" file, but it was mistakenly still left in the
tree as a empty file, and would cause git to correctly complain about a
tracked file being removed after a "make distclean" (which removes empty
files as garbage).

And the asm-x86/desc_64.h file was supposed to be removed by commit
c81c6ca45a, but instead stayed around
containing just a single newline.

Get rid of them both properly.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-20 19:56:01 -08:00
Harvey Harrison 90dc7d2796 nfs: fix sparse warnings
fs/nfs/nfs4state.c:788:34: warning: Using plain integer as NULL pointer
fs/nfs/delegation.c:52:34: warning: Using plain integer as NULL pointer
fs/nfs/idmap.c:312:12: warning: Using plain integer as NULL pointer
fs/nfs/callback_xdr.c:257:6: warning: Using plain integer as NULL pointer
fs/nfs/callback_xdr.c:270:6: warning: Using plain integer as NULL pointer
fs/nfs/callback_xdr.c:281:6: warning: Using plain integer as NULL pointer

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-20 16:15:44 -05:00
Jeff Layton 1227a74e2e NFS: flush signals before taking down callback thread
Now that the reference counting on the callback thread is working as
expected, it uncovers another problem.  Peter Staubach noticed while
testing that patch on an older kernel that he would occasionally see
this printk in rpc_register fire:

    "RPC: failed to contact portmap (errno -512).

The NFSv4 callback thread is signaled by nfs_callback_down(), but never
flushes that signal. All of the shutdown processing is done with that
signal pending. This makes it fail the call to unregister the port with
the portmapper.

In actuality, this rpc_register call isn't necessary at all since the
port isn't actually registered with the portmapper anymore. Regardless,
there doesn't seem to be any reason to leave the signal pending while
the thread is being shut down and flushing it should generally silence
that printk.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-20 13:32:43 -05:00
Adrian Bunk 86b6c7a7f7 fs/block_dev.c: remove #if 0'ed code
Commit b2e895dbd8 #if 0'ed this code stating:

<--  snip  -->

    [PATCH] revert blockdev direct io back to 2.6.19 version

    Andrew Vasquez is reporting as-iosched oopses and a 65% throughput
    slowdown due to the recent special-casing of direct-io against
    blockdevs.  We don't know why either of these things are occurring.

    The patch minimally reverts us back to the 2.6.19 code for a 2.6.20
    release.

<--  snip  -->

It has since been dead code, and unless someone wants to revive it now
it's time to remove it.

This patch also makes bio_release_pages() static again and removes the
ki_bio_count member from struct kiocb, reverting changes that had been
done for this dead code.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
2008-02-19 10:04:00 +01:00
Adrian Bunk 4c54ac62dc make struct def_blk_aops static
This patch makes the needlessly global struct def_blk_aops static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
2008-02-19 10:04:00 +01:00
Steve French e086fcea86 [CIFS] fix build break when proc disabled
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-18 04:03:58 +00:00
Lachlan McIlroy c58310bf49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into for-linus 2008-02-18 13:51:42 +11:00
Lachlan McIlroy 269cdfaf76 [XFS] Added quota targets and removed dmapi directory
Fixes build failures introduced by bad merge to mainline.
2008-02-18 13:06:17 +11:00
Eric Sandeen 794f744b22 [XFS] Fix up xfs out-of-tree builds. (a.k.a. external modules)
Change -I include directives to find headers in the out-of-tree spot. This
allows a directory containing only xfs files to be built as:

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29878a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-18 12:59:11 +11:00
Andi Kleen 58b7983d15 [XFS] Remove Makefile wrappers in XFS
Makefile (and Kbuild) would include Makefile-linux-26 I doubt XFS will
really still compile on 2.4; so drop that. This moves Makefile-linux-26
into Makefile and drops Kbuild. Also having wrappers as both Kbuild and
Makefile seemed redundant anyways.

The patch is relatively large because it renames a file, but no functional
changes.

SGI-PV: 971050
SGI-Modid: xfs-linux-melb:xfs-kern:29781a

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-18 12:48:03 +11:00
Steve French 0a3abcf75b Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-02-15 21:06:08 +00:00
Christoph Hellwig 70eff55d2d [CIFS] factoring out common code in get_inode_info functions
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-15 20:55:05 +00:00
Theodore Ts'o 825f1481ea ext4: Don't use ext4_dec_count() if not needed
The ext4_dec_count() function is only needed when dropping the i_nlink
count on inodes which are (or which could be) directories.  If we
*know* that the inode in question can't possibly be a directory, use
drop_nlink or clear_nlink() if we know i_nlink is 1.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-15 15:00:38 -05:00
Steve French c2d68ea65b [CIFS] fix prepath conversion when server supports posix paths
Jeff Layton that we were converting \ to / in the posix path case which is
not always right (depends on what the old delim was).

CC: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-15 19:20:18 +00:00
Igor Mammedov 11b6d6450c [CIFS] Only convert / when server does not support posix paths
Also add warning if posix path setting changes on reconnect

Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-15 19:06:04 +00:00
Valerie Clement 74d3487fc8 ext4: modify block allocation algorithm for the last group
When a directory inode is allocated in the last group and the last group
contains less than s_blocks_per_group blocks, the initial block allocated
for the directory is not always allocated in the same group as the
directory inode, but in one of the first groups of the filesystem (group 1
for example).
Depending on the current process's pid, ext4_find_near() and 
ext4_ext_find_goal() can return a block number greater than the maximum
blocks count in the filesystem and in that case the block will be not
allocated in the same group as the inode.

The following patch fixes the problem.

Should the modification also be done in ext2/3 code?

Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-15 13:43:07 -05:00
Aneesh Kumar K.V e56eb65906 ext4: Don't claim block from group which has corrupt bitmap
In ext4_mb_complex_scan_group, if the extent length of the newly
found extentet is greater than than the total free blocks counted
in group info, break without claiming the block.

Document different ext4_error usage, explaining the state with which we
continue if we mount with errors=continue

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-15 13:48:21 -05:00
Aneesh Kumar K.V 9df5643ad1 ext4: Get journal write access before modifying the extent tree
When the user was writing into an unitialized extent,
ext4_ext_convert_to_initialize() was not requesting journal write access
before it started to modify the extent tree.   Fix this oversight.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-22 06:17:31 -05:00
Aneesh Kumar K.V b35905c16a ext4: Fix memory and buffer head leak in callers to ext4_ext_find_extent()
The path variable returned via ext4_ext_find_extent is a kmalloc
variable and needs to be freeded.  It also contains a reference to
buffer_head which needs to be dropped.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-25 16:54:37 -05:00
Aneesh Kumar K.V 4cdeed861b ext4: Don't leave behind a half-created inode if ext4_mkdir() fails
If ext4_mkdir() fails to allocate the initial block for the directory,
don't leave behind a half-created directory inode with the link count
left at one.  This was caused by an inappropriate call to ext4_dec_count().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-22 06:17:31 -05:00
Valerie Clement b73fce69ec ext4: Fix kernel BUG at fs/ext4/mballoc.c:910!
With the flex_bg feature enabled, a large file creation oopses the
kernel.   The BUG_ON is:
	BUG_ON(len >= EXT4_BLOCKS_PER_GROUP(sb));

As the allocation of the bitmaps and the inode table can be done
outside the block group with flex_bg, this allows to allocate up to
EXT4_BLOCKS_PER_GROUP blocks in a group.

This patch fixes the oops.

Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-15 13:48:51 -05:00
Trond Myklebust 52833e897f Merge branch 'linus_origin' into hotfixes 2008-02-15 13:36:30 -05:00
Igor Mammedov 8aad018b6c [CIFS] Fix mixed case name in structure dfs_info3_param
Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-15 18:21:49 +00:00
Aneesh Kumar K.V 55bd725aa3 ext4: Fix locking hierarchy violation in ext4_fallocate()
ext4_fallocate() was trying to acquire i_data_sem outside of
jbd2_start_transaction/jbd2_journal_stop, which violates ext4's locking
hierarchy.  So we take i_mutex to prevent writes and truncates during
the complete fallocate operation, and use ext4_get_block_wrap() which
acquires and releases i_data_sem for each block allocation.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-15 12:47:21 -05:00
Andi Kleen 642be6ec21 Remove incorrect BKL comments in ext4
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-25 17:20:46 -05:00
Christoph Lameter 4a0962abd1 dentries: Extract common code to remove dentry from lru
Extract the common code to remove a dentry from the lru into a new function
dentry_lru_remove().

Two call sites used list_del() instead of list_del_init().  AFAIK the
performance of both is the same.  dentry_lru_remove() does a list_del_init().

As a result dentry->d_lru is now always empty when a dentry is freed.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:09 -08:00
Jan Blunck cf28b4863f d_path: Make d_path() use a struct path
d_path() is used on a <dentry,vfsmount> pair.  Lets use a struct path to
reflect this.

[akpm@linux-foundation.org: fix build in mm/memory.c]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Bryan Wu <bryan.wu@analog.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:09 -08:00
Jan Blunck c32c2f63a9 d_path: Make seq_path() use a struct path argument
seq_path() is always called with a dentry and a vfsmount from a struct path.
Make seq_path() take it directly as an argument.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:08 -08:00
Jan Blunck e83aece3af Use struct path in struct svc_expkey
I'm embedding struct path into struct svc_expkey.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:08 -08:00
Jan Blunck 5477549161 Use struct path in struct svc_export
I'm embedding struct path into struct svc_export.

[akpm@linux-foundation.org: coding-style fixes]
[ezk@cs.sunysb.edu: NFSD: fix wrong mnt_writer count in rename]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:08 -08:00
Jan Blunck 448678a0f3 d_path: Make get_dcookie() use a struct path argument
get_dcookie() is always called with a dentry and a vfsmount from a struct
path.  Make get_dcookie() take it directly as an argument.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:08 -08:00
Jan Blunck 3dcd25f37c d_path: Make proc_get_link() use a struct path argument
proc_get_link() is always called with a dentry and a vfsmount from a struct
path.  Make proc_get_link() take it directly as an argument.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:08 -08:00
Jan Blunck a03a8a709a d_path: kerneldoc cleanup
Move and update d_path() kernel API documentation.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:08 -08:00
Jan Blunck 329c97f0af One less parameter to __d_path
All callers to __d_path pass the dentry and vfsmount of a struct path to
__d_path.  Pass the struct path directly, instead.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:08 -08:00
Jan Blunck ac748a09fc Make set_fs_{root,pwd} take a struct path
In nearly all cases the set_fs_{root,pwd}() calls work on a struct
path. Change the function to reflect this and use path_get() here.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Jan Blunck 6ac08c39a1 Use struct path in fs_struct
* Use struct path in fs_struct.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Jan Blunck 5dd784d049 Introduce path_get()
This introduces the symmetric function to path_put() for getting a reference
to the dentry and vfsmount of a struct path in the right order.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Jan Blunck 09da5916ba Use path_put() in a few places instead of {mnt,d}put()
Use path_put() in a few places instead of {mnt,d}put()

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Jan Blunck 1d957f9bf8 Introduce path_put()
* Add path_put() functions for releasing a reference to the dentry and
  vfsmount of a struct path in the right order

* Switch from path_release(nd) to path_put(&nd->path)

* Rename dput_path() to path_put_conditional()

[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Jan Blunck 4ac9137858 Embed a struct path into struct nameidata instead of nd->{dentry,mnt}
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.

Together with the other patches of this series
- it enforced the correct order of getting/releasing the reference count on
  <dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
  struct path in every place where the stack can be traversed
- it reduces the overall code size:

without patch series:
   text    data     bss     dec     hex filename
5321639  858418  715768 6895825  6938d1 vmlinux

with patch series:
   text    data     bss     dec     hex filename
5320026  858418  715768 6894212  693284 vmlinux

This patch:

Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Jan Blunck 429731b155 Remove path_release_on_umount()
path_release_on_umount() should only be called from sys_umount(). I merged the
function into sys_umount() instead of having in in namei.c.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:32 -08:00
Mike Frysinger e2a366dc5c FLAT binaries: drop BINFMT_FLAT bad header magic warning
The warning issued by fs/binfmt_flat.c when the format handler is given a
non-FLAT and non-script executable is annoying to say the least when working
with FDPIC ELF objects.  If you build a kernel that supports both FLAT and
FDPIC ELFs on no-mmu, every time you execute an FDPIC ELF, the kernel spits
out this message.  While I understand a lot of newcomers to the no-mmu world
screw up generation of FLAT binaries, this warning is not usable for systems
that support more than just FLAT.

Signed-off-by: Jie Zhang <jie.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Bernd Schmidt <bernds_cb1@t-online.de>
Acked-by: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 20:58:05 -08:00
Harvey Harrison 3c828e4945 inotify: make variables static in inotify_user.c
inotify_max_user_instances, inotify_max_user_watches,
inotify_max_queued_events can all be made static.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 20:58:04 -08:00
Steve French 03a143c909 [CIFS] fixup prefixpaths which contain multiple path components
Currently, when we get a prefixpath as part of mount, the kernel only
changes the first character to be a '/' or '\' depending on whether
posix extensions are enabled. This is problematic as it expects
mount.cifs to pass in the correct delimiter in the rest of the
prefixpath. But, mount.cifs may not know *what* the correct delimiter
is. It's a chicken and egg problem.

Note that mount.cifs should not do conversion of the
prefixpath - if we want posix behavior then '\' is legal in a path
(and we have had bugs in the distant path to prove to me that
customers sometimes have apps that require '\').  The kernel code
assumes that the path passed in is posix (and current code will handle
the first path component fine but was broken for Windows mounts
for "deep" prefixpaths unless the user specified a prefixpath with '\'
deep in it.   So e.g. with current kernel code:

1) mount to //server/share/dir1 will work to all server types
2) mount to //server/share/dir1/subdir1 will work to Samba
3) mount to //server/share/dir1\\subdir1 will work to Windows

But case two would fail to Windows without the fix.
With the kernel cifs module fix case two now works.

First analyzed by Jeff Layton and Simo Sorce

CC: Jeff Layton <jlayton@redhat.com>
CC: Simo Sorce <simo@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-14 06:38:30 +00:00
Olga Kornievskaia 8d042218b0 NFS: add missing spkm3 strings to mount option parser
This patch adds previous missing spkm3 string values that are needed
to parse mount options in the kernel.
2008-02-13 23:24:08 -05:00
Jeff Layton 25606656b1 NFS: remove error field from nfs_readdir_descriptor_t
The error field in nfs_readdir_descriptor_t is never used outside of the
function in which it is set. Remove the field and change the place that
does use it to use an existing local variable.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13 23:24:07 -05:00
Dan Muntz 497799e7c0 NFS: missing spaces in KERN_WARNING
The warning message for a v4 server returning various bad sequence-ids is
missing spaces.

Signed-off-by: Dan Muntz <dmuntz@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13 23:24:06 -05:00
Chuck Lever 4267c9561d NFS: Allow text-based mounts via compat_sys_mount
The compat_sys_mount() system call throws EINVAL for text-based NFSv4
mounts.

The text-based mount interface assumes that any mount option blob that
doesn't set the version field to "1" is a C string (ie not a legacy
mount request).  The compat_sys_mount() call treats blobs that don't
set the version field to "1" as an error.  We just relax the check in
compat_sys_mount() a bit to allow C strings to be passed down to the NFSv4
client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13 23:24:05 -05:00
Jeff Layton 8e60029f40 NFS: fix reference counting for NFSv4 callback thread
The reference counting for the NFSv4 callback thread stays artificially
high. When this thread comes down, it doesn't properly tear down the
svc_serv, causing a memory leak. In my testing on an older kernel on
x86_64, memory would leak out of the 8k kmalloc slab. So, we're leaking
at least a page of memory every time the thread comes down.

svc_create() creates the svc_serv with a sv_nrthreads count of 1, and
then svc_create_thread() increments that count. Whenever the callback
thread is started it has a sv_nrthreads count of 2. When coming down, it
calls svc_exit_thread() which decrements that count and if it hits 0, it
tears everything down. That never happens here since the count is always
at 2 when the thread exits.

The problem is that nfs_callback_up() should be calling svc_destroy() on
the svc_serv on both success and failure. This is how lockd_up_proto()
handles the reference counting, and doing that here fixes the leak.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13 23:24:04 -05:00
Marcin Slusarz cba44359d1 udf: fix udf_add_free_space
In commit 742ba02a51 (udf: create common
function for changing free space counter) by accident I reversed safety
condition which lead to null pointer dereference in case of media error and
wrong counting of free space in normal situation

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13 16:21:20 -08:00
Jan Kara e28d80f182 udf: fix directory offset handling
Patch cleaning up UDF directory offset handling missed modifications in dir.c
(because I've submitted an old version :(). Fix it.

Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Tested-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13 16:21:20 -08:00
Sergio Luis ed58f80279 fs/smbfs/inode.c: fix warning message deprecating smbfs
Fix the warning message regarding smbfs to

"smbfs is deprecated and will be removed from the 2.6.27 kernel. Please migrate to cifs"

instead of

"smbfs is deprecated and will be removedfrom the 2.6.27 kernel.  Please migrate to cifs"

Signed-off-by: Sergio Luis <sergio@uece.br>
Screwed-up-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13 16:21:19 -08:00
Marcin Slusarz 413d57c990 xfs: convert beX_add to beX_add_cpu (new common API)
remove beX_add functions and replace all uses with beX_add_cpu

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Dave Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13 16:21:19 -08:00
Randy Dunlap b51d63c6d3 kernel-doc: fix fs/pipe.c notation
Fix several kernel-doc notation errors in fs/pipe.c.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13 16:21:19 -08:00
Marcin Slusarz 8914562278 jfs: le*_add_cpu conversion
replace all:
little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
                                        expression_in_cpu_byteorder);
with:
        leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: jfs-discussion@lists.sourceforge.net
2008-02-13 15:34:20 -06:00
Steve French c1ce264470 [CIFS] fix typo
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-13 02:59:36 +00:00
Shirish Pargaonkar d9f382eff6 [CIFS] patch to fix incorrect encoding of number of aces on set mode
This patch fixes an error in the experimental cifs acl code. During chmod,
set security descriptor data (num aces) is not sent with little-endian encoding.

Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-12 20:46:26 +00:00
Roel Kluin 6f7e8f3763 [CIFS] Fix typo in quota operations
Although these experimental operations are not fully implemented, fix the
typo in the definition of the quotactl operations for cifs.

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-12 20:38:10 +00:00
Steve French 90c81e0b0e [CIFS] clean up some hard to read ifdefs
Christoph had noticed too many ifdefs in the CIFS code making it
hard to read.  This patch removes about a quarter of them from
the C files in cifs by improving a few key ifdefs in the .h files.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-12 20:32:36 +00:00
Linus Torvalds 0c0d61ca93 Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux
* 'for-linus' of git://linux-nfs.org/~bfields/linux:
  SUNPRC: Fix printk format warning
  nfsd: clean up svc_reserve_auth()
  NLM: don't requeue block if it was invalidated while GRANT_MSG was in flight
  NLM: don't reattempt GRANT_MSG when there is already an RPC in flight
  NLM: have server-side RPC clients default to soft RPC tasks
  NLM: set RPC_CLNT_CREATE_NOPING for NLM RPC clients
2008-02-11 09:19:47 -08:00
Jeff Layton c64e80d55d NLM: don't requeue block if it was invalidated while GRANT_MSG was in flight
It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback
is in flight. If this happens we don't want lockd to insert the block
back into the nlm_blocked list.

This helps that situation, but there's still a possible race. Fixing
that will mean adding real locking for nlm_blocked.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-10 18:09:36 -05:00
Jeff Layton 9706501e43 NLM: don't reattempt GRANT_MSG when there is already an RPC in flight
With the current scheme in nlmsvc_grant_blocked, we can end up with more
than one GRANT_MSG callback for a block in flight. Right now, we requeue
the block unconditionally so that a GRANT_MSG callback is done again in
30s. If the client is unresponsive, it can take more than 30s for the
call already in flight to time out.

There's no benefit to having more than one GRANT_MSG RPC queued up at a
time, so put it on the list with a timeout of NLM_NEVER before doing the
RPC call. If the RPC call submission fails, we requeue it with a short
timeout. If it works, then nlmsvc_grant_callback will end up requeueing
it with a shorter timeout after it completes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-10 18:09:36 -05:00
Jeff Layton 90bd17c878 NLM: have server-side RPC clients default to soft RPC tasks
Now that it no longer does an RPC ping, lockd always ends up queueing
an RPC task for the GRANT_MSG callback. But, it also requeues the block
for later attempts. Since these are hard RPC tasks, if the client we're
calling back goes unresponsive the GRANT_MSG callbacks can stack up in
the RPC queue.

Fix this by making server-side RPC clients default to soft RPC tasks.
lockd requeues the block anyway, so this should be OK.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-10 18:09:36 -05:00
Jeff Layton 031fd3aa20 NLM: set RPC_CLNT_CREATE_NOPING for NLM RPC clients
It's currently possible for an unresponsive NLM client to completely
lock up a server's lockd. The scenario is something like this:

1) client1 (or a process on the server) takes a lock on a file
2) client2 tries to take a blocking lock on the same file and
   awaits the callback
3) client2 goes unresponsive (plug pulled, network partition, etc)
4) client1 releases the lock

...at that point the server's lockd will try to queue up a GRANT_MSG
callback for client2, but first it requeues the block with a timeout of
30s. nlm_async_call will attempt to bind the RPC client to client2 and
will call rpc_ping. rpc_ping entails a sync RPC call and if client2 is
unresponsive it will take around 60s for that to time out. Once it times
out, it's already time to retry the block and the whole process repeats.

Once in this situation, nlmsvc_retry_blocked will never return until
the host starts responding again. lockd won't service new calls.

Fix this by skipping the RPC ping on NLM RPC clients. This makes
nlm_async_call return quickly when called.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-10 18:09:36 -05:00
Bastian Blank 712a30e63c splice: fix user pointer access in get_iovec_page_array()
Commit 8811930dc7 ("splice: missing user
pointer access verification") added the proper access_ok() calls to
copy_from_user_mmap_sem() which ensures we can copy the struct iovecs
from userspace to the kernel.

But we also must check whether we can access the actual memory region
pointed to by the struct iovec to fix the access checks properly.

Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Oliver Pinter <oliver.pntr@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-10 10:27:21 -08:00
Theodore Tso 469108ff3d ext4: Add new "development flag" to the ext4 filesystem
This flag is simply a generic "this is a crash/burn test filesystem"
marker.  If it is set, then filesystem code which is "in development"
will be allowed to mount the filesystem.  Filesystem code which is not
considered ready for prime-time will check for this flag, and if it is
not set, it will refuse to touch the filesystem.

As we start rolling ext4 out to distro's like Fedora, et. al, this makes
it less likely that a user might accidentally start using ext4 on a
production filesystem; a bad thing, since that will essentially make it
be unfsckable until e2fsprogs catches up.

Signed-off-by: Theodore Tso <tytso@MIT.EDU>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2008-02-10 01:11:44 -05:00
Aneesh Kumar K.V 26346ff681 ext4: Don't panic in case of corrupt bitmap
Multiblock allocator calls BUG_ON in many case if the free and used
blocks count obtained looking at the bitmap is different from what
the allocator internally accounted for. Use ext4_error in such case
and don't panic the system.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-10 01:10:04 -05:00
Eric Sandeen 256bdb497c ext4: allocate struct ext4_allocation_context from a kmem cache
struct ext4_allocation_context is rather large, and this bloats
the stack of many functions which use it.  Allocating it from
a named slab cache will alleviate this.

For example, with this change (on top of the noinline patch sent earlier):

-ext4_mb_new_blocks		200
+ext4_mb_new_blocks		 40

-ext4_mb_free_blocks		344
+ext4_mb_free_blocks		168

-ext4_mb_release_inode_pa	216
+ext4_mb_release_inode_pa	 40

-ext4_mb_release_group_pa	192
+ext4_mb_release_group_pa	 24

Most of these stack-allocated structs are actually used only for
mballoc history; and in those cases often a smaller struct would do.
So changing that may be another way around it, at least for those
functions, if preferred.  For now, in those cases where the ac
is only for history, an allocation failure simply skips the history
recording, and does not cause any other failures.


Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-10 01:13:33 -05:00
Dave Kleikamp c4e35e07af JBD2: Clear buffer_ordered flag for barried IO request on success
In JBD2 jbd2_journal_write_commit_record(), clear the buffer_ordered
flag for the bh after barried IO has succeed. This prevents later, if
the same buffer head were submitted to the underlying device, which has
been reconfigured to not support barrier request, the JBD2 commit code
could treat it as a normal IO (without barrier).

This is a port from JBD/ext3 fix from Neil Brown.

More details from Neil:

Some devices - notably dm and md - can change their behaviour in
response to BIO_RW_BARRIER requests.  They might start out accepting
such requests but on reconfiguration, they find out that they cannot
any more. JBD2 deal with this by always testing if BIO_RW_BARRIER
requests fail with EOPNOTSUPP, and retrying the write
requests without the barrier (probably after waiting for any pending
writes to complete).

However there is a bug in the handling this in JBD2 for ext4 .

When ext4/JBD2 to submit a BIO_RW_BARRIER request,
it sets the buffer_ordered flag on the buffer head.
If the request completes successfully, the flag STAYS SET.

Other code might then write the same buffer_head after the device has
been reconfigured to not accept barriers.  This write will then fail,
but the "other code" is not ready to handle EOPNOTSUPP errors and the
error will be treated as fatal.

Cc:  Neil Brown <neilb@suse.de>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-10 01:09:32 -05:00
Jan Kara 7fb5409df0 ext4: Fix Direct I/O locking
We cannot start transaction in ext4_direct_IO() and just let it last
during the whole write because dio_get_page() acquires mmap_sem which
ranks above transaction start (e.g. because we have dependency chain
mmap_sem->PageLock->journal_start, or because we update atime while
holding mmap_sem) and thus deadlocks could happen. We solve the problem
by starting a transaction separately for each ext4_get_block() call.

We *could* have a problem that we allocate a block and before its data
are written out the machine crashes and thus we expose stale data. But
that does not happen because for hole-filling generic code falls back to
buffered writes and for file extension, we add inode to orphan list and
thus in case of crash, journal replay will truncate inode back to the
original size.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-10 01:08:38 -05:00
Aneesh Kumar K.V 8009f9fb30 ext4: Fix circular locking dependency with migrate and rm.
In order to prevent a circular locking dependency when an unlink
operation is racing with an ext4 migration, we delay taking i_data_sem
until just before switch the inode format, and use i_mutex to prevent
writes and truncates during the first part of the migration operation.

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-10 01:20:05 -05:00
Steve French ad7a2926b9 [CIFS] reduce checkpatch warnings
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-07 23:25:02 +00:00
Lachlan McIlroy de2eeea609 [XFS] add __init/__exit mark to specific init/cleanup functions
SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30459a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Denis Cheng <crquan@gmail.com>
2008-02-07 18:25:19 +11:00
David Chinner 450790a2c5 [XFS] Fix oops in xfs_file_readdir()
When xfs_file_readdir() exactly fills a buffer, it can move it's index
past the end of the buffer and dereference it even though the result of
the dereference is never used. On some platforms this causes an oops.

SGI-PV: 976923
SGI-Modid: xfs-linux-melb:xfs-kern:30458a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:24:13 +11:00
Christoph Hellwig cbc89dcfd2 [XFS] kill xfs_root
The only caller (xfs_fs_fill_super) can simplify call igrab on the root
inode.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30393a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:24:00 +11:00
Christoph Hellwig 4188c78d95 [XFS] keep i_nlink updated and use proper accessors
To get the read-only bind mounts in -mm to work correctly with XFS we need
to call the drop_nlink and inc_nlink helpers to monitor the link count.
Add calls to these to xfs_bumplink and xfs_droplink and stop copying over
di_nlink to i_nlink in xfs_validate_fields and vn_revalidate.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30392a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:23:38 +11:00
Christoph Hellwig 222096ae7f [XFS] stop updating inode->i_blocks
The VFS doesn't use i_blocks, it's only used by generic_fillattr and the
generic quota code which XFS doesn't use. In XFS there is one use to check
whether we have an inline or out of line sumlink, but we can replace that
with a check of the XFS_IFINLINE inode flag.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30391a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:23:15 +11:00
David Chinner de08dbc197 [XFS] Make xfs_ail_check check less by default
Checking the entire AIL on every insert and remove is prohibitively
expensive - the sustained sequntial create rate on a single disk drops
from about 1800/s to 60/s because of this checking resulting in the
xfslogd becoming cpu bound.

By default on debug builds, only check the next and previous entries in
the list to ensure they are ordered correctly. If you really want, define
XFS_TRANS_DEBUG to use the old behaviour.

SGI-PV: 972759
SGI-Modid: xfs-linux-melb:xfs-kern:30372a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:23:05 +11:00
David Chinner 249a8c1124 [XFS] Move AIL pushing into it's own thread
When many hundreds to thousands of threads all try to do simultaneous
transactions and the log is in a tail-pushing situation (i.e. full), we
can get multiple threads walking the AIL list and contending on the AIL
lock.

The AIL push is, in effect, a simple I/O dispatch algorithm complicated by
the ordering constraints placed on it by the transaction subsystem. It
really does not need multiple threads to push on it - even when only a
single CPU is pushing the AIL, it can push the I/O out far faster that
pretty much any disk subsystem can handle.

So, to avoid contention problems stemming from multiple list walkers, move
the list walk off into another thread and simply provide a "target" to
push to. When a thread requires a push, it sets the target and wakes the
push thread, then goes to sleep waiting for the required amount of space
to become available in the log.

This mechanism should also be a lot fairer under heavy load as the waiters
will queue in arrival order, rather than queuing in "who completed a push
first" order.

Also, by moving the pushing to a separate thread we can do more
effectively overload detection and prevention as we can keep context from
loop iteration to loop iteration. That is, we can push only part of the
list each loop and not have to loop back to the start of the list every
time we run. This should also help by reducing the number of items we try
to lock and/or push items that we cannot move.

Note that this patch is not intended to solve the inefficiencies in the
AIL structure and the associated issues with extremely large list
contents. That needs to be addresses separately; parallel access would
cause problems to any new structure as well, so I'm only aiming to isolate
the structure from unbounded parallelism here.

SGI-PV: 972759
SGI-Modid: xfs-linux-melb:xfs-kern:30371a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:22:51 +11:00
Christoph Hellwig 4576758db5 [XFS] use generic_permission
Now that all direct caller of xfs_iaccess are gone we can kill xfs_iaccess
and xfs_access and just use generic_permission with a check_acl callback.
This is required for the per-mount read-only patchset in -mm to work
properly with XFS.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30370a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:22:38 +11:00
Christoph Hellwig f6aa7f2184 [XFS] stop re-checking permissions in xfs_swapext
xfs_swapext should simplify check if we have a writeable file descriptor
instead of re-checking the permissions using xfs_iaccess. Add an
additional check to refuse O_APPEND file descriptors because swapext is
not an append-only write operation.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30369a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:22:24 +11:00
Christoph Hellwig 35fec8df65 [XFS] clean up xfs_swapext
- stop using vnodes
- use proper multiple label goto unwinding
- give the struct file * variables saner names

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30366a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:22:02 +11:00
Christoph Hellwig 199037c598 [XFS] remove permission check from xfs_change_file_space
Both callers of xfs_change_file_space alreaedy do the file->f_mode &
FMODE_WRITE check to ensure we have a file descriptor that has been opened
for write mode, so there is no need to re-check that with xfs_iaccess.
Especially as the later might wrongly deny it for corner cases like file
descriptor passing through unix domain sockets.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30365a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:21:14 +11:00
Lachlan McIlroy 9742bb93da [XFS] prevent panic during log recovery due to bogus op_hdr length
A problem was reported where a system panicked in log recovery due to a
corrupt log record. The cause of the corruption is not known but this
change will at least prevent a crash for this specific scenario. Log
recovery definitely needs some more work in this area.

SGI-PV: 974151
SGI-Modid: xfs-linux-melb:xfs-kern:30318a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-02-07 18:20:58 +11:00
Christoph Hellwig f71354bc3a [XFS] Cleanup various fid related bits:
- merge xfs_fid2 into it's only caller xfs_dm_inode_to_fh.
- remove xfs_vget and opencode it in the two callers, simplifying
  both of them by avoiding the awkward calling convetion.
- assign directly to the dm_fid_t members in various places in the
  dmapi code instead of casting them to xfs_fid_t first (which
  is identical to dm_fid_t)

SGI-PV: 974747
SGI-Modid: xfs-linux-melb:xfs-kern:30258a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:20:11 +11:00
David Chinner edd319dc52 [XFS] Fix xfs_lowbit64
xfs_lowbit64 was broken on 32 bit platforms in a recent cleanup of the xfs
bitops. Fix it back up again.

SGI-PV: 974005
SGI-Modid: xfs-linux-melb:xfs-kern:30202a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:19:41 +11:00
Christoph Hellwig 45ba598e56 [XFS] Remove CFORK macros and use code directly in IFORK and DFORK macros.
Currently XFS_IFORK_* and XFS_DFORK* are implemented by means of
XFS_CFORK* macros. But given that XFS_IFORK_* operates on an xfs_inode
that embedds and xfs_icdinode_core and XFS_DFORK_* operates on an
xfs_dinode that embedds a xfs_dinode_core one will have to do endian
swapping while the other doesn't. Instead of having the current mess with
the CFORK macros that have byteswapping and non-byteswapping version
(which are inconsistantly named while we're at it) just define each family
of the macros to stand by itself and simplify the whole matter.

A few direct references to the CFORK variants were cleaned up to use IFORK
or DFORK to make this possible.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30163a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:19:24 +11:00
Christoph Hellwig a9759f2de3 [XFS] kill superflous buffer locking (2nd attempt)
There is no need to lock any page in xfs_buf.c because we operate on our
own address_space and all locking is covered by the buffer semaphore. If
we ever switch back to main blockdeive address_space as suggested e.g. for
fsblock with a similar scheme the locking will have to be totally revised
anyway because the current scheme is neither correct nor coherent with
itself.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30156a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:18:50 +11:00
Robert P. J. Day 40ebd81d1a [XFS] Use kernel-supplied "roundup_pow_of_two" for simplicity
SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30098a

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:18:19 +11:00
Tim Shimmin e6a4b37f38 [XFS] Remove the BPCSHIFT and NB* based macros from XFS.
The BPCSHIFT based macros, btoc*, ctob*, offtoc* and ctooff are either not
used or don't need to be used. The NDPP, NDPP, NBBY macros don't need to
be used but instead are replaced directly by PAGE_SIZE and PAGE_CACHE_SIZE
where appropriate. Initial patch and motivation from Nicolas Kaiser.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30096a

Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:17:58 +11:00
Niv Sardi f7b7c3673e [XFS] Remove bogus assert
This assert is bogus. We can have a forced shutdown occur
between the check for the XLOG_FORCED_SHUTDOWN and the ASSERT. Also, the
logging system shouldn't care about the state of XFS_FORCED_SHUTDOWN, it
should only check XLOG_FORCED_SHUTDOWN. The logging system has it's own
forced shutdown flag so, for the case of a forced shutdown that's not due
to a logging error, we can flush the log.

SGI-PV: 972985
SGI-Modid: xfs-linux-melb:xfs-kern:30029a

Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:17:39 +11:00
Eric Sandeen 71ddabb94a [XFS] optimize XFS_IS_REALTIME_INODE w/o realtime config
Use XFS_IS_REALTIME_INODE in more places, and #define it to 0 if
CONFIG_XFS_RT is off. This should be safe because mount checks in
xfs_rtmount_init:

so if we get mounted w/o CONFIG_XFS_RT, no realtime inodes should be
encountered after that.

Defining XFS_IS_REALTIME_INODE to 0 saves a bit of stack space,
presumeably gcc can optimize around the various "if (0)" type checks:

xfs_alloc_file_space -8 xfs_bmap_adjacent -16 xfs_bmapi -8
xfs_bmap_rtalloc -16 xfs_bunmapi -28 xfs_free_file_space -64 xfs_imap +8
<-- ? hmm. xfs_iomap_write_direct -12 xfs_qm_dqusage_adjust -4
xfs_qm_vop_chown_reserve -4

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30014a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:16:43 +11:00
David Chinner a67d7c5f5d [XFS] Move platform specific mount option parse out of core XFS code
Mount option parsing is platform specific. Move it out of core code into
the platform specific superblock operation file.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30012a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:16:30 +11:00
David Chinner 3ed6526441 [XFS] Implement fallocate.
Implement the new generic callout for file preallocation. Atomically
change the file size if requested.

SGI-PV: 972756
SGI-Modid: xfs-linux-melb:xfs-kern:30009a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:16:17 +11:00
David Chinner 5d51eff453 [XFS] Fix inode allocation latency
The log force added in xfs_iget_core() has been a performance issue since
it was introduced for tight loops that allocate then unlink a single file.
under heavy writeback, this can introduce unnecessary latency due tothe
log I/o getting stuck behind bulk data writes.

Fix this latency problem by avoinding the need for the log force by moving
the place we mark linux inode dirty to the transaction commit rather than
on transaction completion.

This also closes a potential hole in the sync code where a linux inode is
not dirty between the time it is modified and the time the log buffer has
been written to disk.

SGI-PV: 972753
SGI-Modid: xfs-linux-melb:xfs-kern:30007a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:16:07 +11:00
David Chinner e4143a1cf5 [XFS] Fix transaction overrun during writeback.
Prevent transaction overrun in xfs_iomap_write_allocate() if we race with
a truncate that overlaps the delalloc range we were planning to allocate.

If we race, we may allocate into a hole and that requires block
allocation. At this point in time we don't have a reservation for block
allocation (apart from metadata blocks) and so allocating into a hole
rather than a delalloc region results in overflowing the transaction block
reservation.

Fix it by only allowing a single extent to be allocated at a time.

SGI-PV: 972757
SGI-Modid: xfs-linux-melb:xfs-kern:30005a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:15:55 +11:00
David Chinner 786f486f81 [XFS] Show all mount args in /proc/mounts
There are several mount options that don't show up in /proc/mounts. Add
them in and clean up the showargs code at the same time.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30004a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:15:43 +11:00
David Chinner 8ae2c0f64a [XFS] Fix sparse warning in xlog_recover_do_efd_trans.
Sparse trips over the locking order in xlog_recover_do_efd_trans() when
xfs_trans_delete_ail() drops the ail lock. Because the unlock is
conditional, we need to either annotate with a "fake unlock" or change the
structure of the code so sparse thinks the function always unlocks.

Reordering the code makes it simpler, so do that.

SGI-PV: 972755
SGI-Modid: xfs-linux-melb:xfs-kern:30003a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:15:29 +11:00
David Chinner a8272ce0c1 [XFS] Fix up sparse warnings.
These are mostly locking annotations, marking things static, casts where
needed and declaring stuff in header files.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30002a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:14:38 +11:00
David Chinner a69b176df2 [XFS] Use the generic bitops rather than implementing them ourselves.
Patch inspired by Andi Kleen.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:30000a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:14:22 +11:00
Vlad Apostolov c319b58b13 [XFS] Make xfs_bulkstat() to report unlinked but referenced inodes
We need xfs_bulkstat() to report inode stat for inodes with link count
zero but reference count non zero.

The fix here:

http://oss.sgi.com/archives/xfs/2007-09/msg00266.html

changed this behavior and made xfs_bulkstat() to filter all unlinked
inodes including those that are not destroyed yet but held by reference.

The attached patch returns back to the original behavior by marking the
on-disk inode buffer "dirty" when di_mode is cleared (at that time both
inode link and reference counter are zero).

SGI-PV: 972004
SGI-Modid: xfs-linux-melb:xfs-kern:29914a

Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:13:37 +11:00
Lachlan McIlroy 98ce2b5b1b [XFS] 971186 Undo mod xfs-linux-melb:xfs-kern:29845a due to a regression
SGI-PV: 971596
SGI-Modid: xfs-linux-melb:xfs-kern:29902a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07 18:13:27 +11:00
Eric Sandeen bc58f9bb6b [XFS] fix 32-bit compat ioctls for GETXFLAGS, SETXFLAGS, GETVERSION
XFS_IOC_GETVERSION, XFS_IOC_GETXFLAGS and XFS_IOC_SETXFLAGS all take a
"long" which changes size between 32 and 64 bit platforms.

So, the ioctl cmds that come in from a 32-bit app aren't as expected, for
example on GETXFLAGS,

unknown cmd fd(3) cmd(80046601){t:'f';sz:4}

due to the size mismatch.

So, use instead the 32-bit version of the commands for compat ioctls, and
other than that it doesn't take any more manipulation.

Also, for both native and compat versions, just define them to the values
as defined in fs.h

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29849a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 18:13:17 +11:00
Eric Sandeen d4f3cc016f [XFS] lose xfs_hex_dump in favor of print_hex_dump
No need for xfs to have its own hex dumping routine now that the kernel
has one.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29847a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 18:13:05 +11:00
Christoph Hellwig 91906a882a [XFS] kill XFS_INOBT_IS_FREE_DISK
This macro is unused an all other acros in this family operate on native
types, so we most likely won't grow a user either.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29846a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 18:12:41 +11:00
Christoph Hellwig c40ea74101 [XFS] kill superflous buffer locking
There is no need to lock any page in xfs_buf.c because we operate on our
own address_space and all locking is covered by the buffer semaphore. If
we ever switch back to main blockdeive address_space as suggested e.g. for
fsblock with a similar scheme the locking will have to be totally revised
anyway because the current scheme is neither correct nor coherent with
itself.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29845a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 18:12:07 +11:00
Eric Sandeen 0771fb4515 [XFS] Refactor xfs_mountfs
Refactoring xfs_mountfs() to call sub-functions for logical chunks can
help save a bit of stack, and can make it easier to read this long
function.

The mount path is one of the longest common callchains, easily getting to
within a few bytes of the end of a 4k stack when over lvm, quotas are
enabled, and quotacheck must be done.

With this change on top of the other stack-related changes I've sent, I
can get xfs to survive a normal xfsqa run on 4k stacks over lvm.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29834a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 18:11:56 +11:00
Christoph Hellwig b53e675dc8 [XFS] xlog_rec_header/xlog_rec_ext_header endianess annotations
Mostly trivial conversion with one exceptions: h_num_logops was kept in
native endian previously and only converted to big endian in xlog_sync,
but we always keep it big endian now. With todays cpus fast byteswap
instructions that's not an issue but the new variant keeps the code clean
and maintainable.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29821a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 18:11:47 +11:00
Christoph Hellwig 67fcb7bfb6 [XFS] clean up some xfs_log_priv.h macros
- the various assign lsn macros are replaced by a single inline,
xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except
for a more sane calling convention. ASSIGN_LSN_DISK is replaced
by xlog_assign_lsn and a manual bytespap, and ASSIGN_LSN by the same,
except we pass the cycle and block arguments explicitly instead of a
log paramter. The latter two variants only had 2, respectively one
user anyway.
- the GET_CYCLE is replaced by a xlog_get_cycle inline with exactly the
same calling conventions.
- GET_CLIENT_ID is replaced by xlog_get_client_id which leaves away
the unused arch argument. Instead of conditional defintions
depending on host endianess we now do an unconditional swap and shift
then, which generates equal code.
- the unused XLOG_SET macro is removed.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29820a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 18:11:38 +11:00
Christoph Hellwig 03bea6fe6c [XFS] clean up some xfs_log_priv.h macros
- the various assign lsn macros are replaced by a single inline,
xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except
for a more sane calling convention. ASSIGN_LSN_DISK is replaced
by xlog_assign_lsn and a manual bytespap, and ASSIGN_LSN by the same,
except we pass the cycle and block arguments explicitly instead of a
log paramter. The latter two variants only had 2, respectively one
user anyway.
- the GET_CYCLE is replaced by a xlog_get_cycle inline with exactly the
same calling conventions.
- GET_CLIENT_ID is replaced by xlog_get_client_id which leaves away
the unused arch argument. Instead of conditional defintions
depending on host endianess we now do an unconditional swap and shift
then, which generates equal code.
- the unused XLOG_SET macro is removed.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29819a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 18:10:31 +11:00
Christoph Hellwig 9909c4aa1a [XFS] kill xfs_freeze.
No need to have a wrapper just two call two more functions.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29816a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 18:09:56 +11:00
Christoph Hellwig 10090be25c [XFS] cleanup vnode useage in xfs_iget.c
Get rid of vnode useage in xfs_iget.c and pass Linux inode / xfs_inode
where apropinquate. And kill some useless helpers while we're at it.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29808a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:55:46 +11:00
Christoph Hellwig 6e7f75eafb [XFS] cleanup vnode useage in xfs_ioctl.c
xfs_ioctl.c passes around vnode pointers quite a lot, but all places
already have the Linux inode which is identical to the vnode these days.
Clean the code up to always use the Linux inode.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29807a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:55:35 +11:00
Christoph Hellwig 4ca488eb45 [XFS] Kill off xfs_statvfs.
We were already filling the Linux struct statfs anyway, and doing this
trivial task directly in xfs_fs_statfs makes the code quite a bit cleaner.
While I was at it I also moved copying attributes that don't change over
the lifetime of the filesystem outside the superblock lock.

xfs_fs_fill_super used to get the magic number and blocksize through
xfs_statvfs, but assigning them directly is a lot cleaner and will save
some stack space during mount.

SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29802a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:53:27 +11:00
Christoph Hellwig c43f408795 [XFS] simplify xfs_vn_getattr
Just fill in struct kstat directly from the xfs_inode instead of doing a
detour through a bhv_vattr_t and xfs_getattr.

SGI-PV: 970980
SGI-Modid: xfs-linux-melb:xfs-kern:29770a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:49:06 +11:00
Christoph Hellwig 613d70436c [XFS] kill xfs_iocore_t
xfs_iocore_t is a structure embedded in xfs_inode. Except for one field it
just duplicates fields already in xfs_inode, and there is nothing this
abstraction buys us on XFS/Linux. This patch removes it and shrinks source
and binary size of xfs aswell as shrinking the size of xfs_inode by 60/44
bytes in debug/non-debug builds.

SGI-PV: 970852
SGI-Modid: xfs-linux-melb:xfs-kern:29754a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:48:58 +11:00
Eric Sandeen 007c61c686 [XFS] Remove spin.h
remove spinlock init abstraction macro in spin.h, remove the callers, and
remove the file. Move no-op spinlock_destroy to xfs_linux.h Cleanup
spinlock locals in xfs_mount.c

SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29751a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:47:45 +11:00
Eric Sandeen 36e41eebda [XFS] Cleanup lock goop.
Switch last couple lock_t's to spinlock_t's. Remove now-unused
spinlock-related macros & types.

SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29748a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:47:35 +11:00
Eric Sandeen 3a0e487034 [XFS] ktrace kt_lock is unused, remove it.
SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29747a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:47:25 +11:00
Eric Sandeen 3685c2a1d7 [XFS] Unwrap XFS_SB_LOCK.
Un-obfuscate XFS_SB_LOCK, remove XFS_SB_LOCK->mutex_lock->spin_lock
macros, call spin_lock directly, remove extraneous cookie holdover from
old xfs code, and change lock type to spinlock_t.

SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29746a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:47:15 +11:00
Eric Sandeen ba74d0cba5 [XFS] Unwrap mru_lock.
Un-obfuscate mru_lock, remove mutex_lock->spin_lock macros, call spin_lock
directly, remove extraneous cookie holdover from old xfs code.

SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29745a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:47:01 +11:00
Eric Sandeen 703e1f0fd2 [XFS] Unwrap xfs_dabuf_global_lock
Un-obfuscate dabuf_global_lock, remove mutex_lock->spin_lock macros, call
spin_lock directly, remove extraneous cookie holdover from old xfs code,
and change lock type to spinlock_t.

SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29744a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:46:48 +11:00
Eric Sandeen 64137e56d7 [XFS] Unwrap pagb_lock.
Un-obfuscate pagb_lock, remove mutex_lock->spin_lock macros, call
spin_lock directly, remove extraneous cookie holdover from old xfs code,
and change lock type to spinlock_t.

SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29743a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:46:39 +11:00
Eric Sandeen 869b906078 [XFS] Unwrap XFS_DQ_PINUNLOCK.
Un-obfuscate DQ_PINLOCK, remove DQ_PINLOCK->mutex_lock->spin_lock macros,
call spin_lock directly, remove extraneous cookie holdover from old xfs
code, and change lock type to spinlock_t.

SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29742a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:44:50 +11:00
Eric Sandeen c8b5ea289f [XFS] Unwrap GRANT_LOCK.
Un-obfuscate GRANT_LOCK, remove GRANT_LOCK->mutex_lock->spin_lock macros,
call spin_lock directly, remove extraneous cookie holdover from old xfs
code, and change lock type to spinlock_t.

SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29741a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:44:41 +11:00
Eric Sandeen b22cd72c95 [XFS] Unwrap LOG_LOCK.
Un-obfuscate LOG_LOCK, remove LOG_LOCK->mutex_lock->spin_lock macros, call
spin_lock directly, remove extraneous cookie holdover from old xfs code,
and change lock type to spinlock_t.

SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29740a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:44:32 +11:00
Donald Douwsma 287f3dad14 [XFS] Unwrap AIL_LOCK
SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29739a

Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:44:23 +11:00
Lachlan McIlroy 541d7d3c4b [XFS] kill unnessecary ioops indirection
Currently there is an indirection called ioops in the XFS data I/O path.
Various functions are called by functions pointers, but there is no
coherence in what this is for, and of course for XFS itself it's entirely
unused. This patch removes it instead and significantly reduces source and
binary size of XFS while making maintaince easier.

SGI-PV: 970841
SGI-Modid: xfs-linux-melb:xfs-kern:29737a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:44:14 +11:00
Christoph Hellwig 21a62542b6 [XFS] simplify vn_revalidate
No need to allocate a bhv_vattr_t on stack and call xfs_getattr to update
a few fields in the Linux inode from the XFS inode, just do it directly.

And yes, this function is in dire need of a better name and prototype,
I'll do in a separate patch, though.

SGI-PV: 970705
SGI-Modid: xfs-linux-melb:xfs-kern:29713a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:44:04 +11:00
Lachlan McIlroy 15947f2d4f [XFS] more vnode/inode tracing fixes
SGI-PV: 970335
SGI-Modid: xfs-linux-melb:xfs-kern:29697a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:43:54 +11:00
Christoph Hellwig 7642861b7e [XFS] kill BMAPI_UNWRITTEN
There is no reason to go through xfs_iomap for the BMAPI_UNWRITTEN because
it has nothing in common with the other cases. Instead check for the
shutdown filesystem in xfs_end_bio_unwritten and perform a direct call to
xfs_iomap_write_unwritten (which should be renamed to something more
sensible one day)

SGI-PV: 970241
SGI-Modid: xfs-linux-melb:xfs-kern:29681a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:43:44 +11:00
Christoph Hellwig 6214ed4461 [XFS] kill BMAPI_DEVICE
There is no reason to go into the iomap machinery just to get the right
block device for an inode. Instead look at the realtime flag in the inode
and grab the right device from the mount structure.

I created a new helper, xfs_find_bdev_for_inode instead of opencoding it
because I plan to use it in other places in the future.

SGI-PV: 970240
SGI-Modid: xfs-linux-melb:xfs-kern:29680a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:43:35 +11:00
Lachlan McIlroy cf441eeb79 [XFS] clean up vnode/inode tracing
Simplify vnode tracing calls by embedding function name & return addr in
the calling macro.

Also do a lot of vnode->inode renaming for consistency, while we're at it.

SGI-PV: 970335
SGI-Modid: xfs-linux-melb:xfs-kern:29650a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:42:19 +11:00
Lachlan McIlroy 44866d3928 [XFS] remove dead SYNC_BDFLUSH case in xfs_sync_inodes
A large part of xfs_sync_inodes is conditional on the SYNC_BDFLUSH which
is never passed to it. This patch removes it and adds an assert that
triggers in case some new code tries to pass SYNC_BDFLUSH to it.

SGI-PV: 970242
SGI-Modid: xfs-linux-melb:xfs-kern:29630a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-02-07 16:40:53 +11:00
Steve French f315ccb3e6 Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-02-06 16:04:00 +00:00
Eric Sandeen 0040d9875d allow in-inode EAs on ext4 root inode
The ext3 root inode was treated specially with respect
to in-inode extended attributes, for reasons detailed
in the removed comment below.  The first mkfs-created
inodes would not get extra_i_size or the EXT3_STATE_XATTR
flag set in ext3_read_inode, which disallowed reading or
setting in-inode EAs on the root.

However, in ext4, ext4_mark_inode_dirty calls
ext4_expand_extra_isize for all inodes; once this is done
EAs may be placed in the root ext4 inode body.

But for reasons above, it won't be found after a reboot.

testcase:

setfattr -n user.name -v value mntpt/
setfattr -n user.name2 -v value2 mntpt/
umount mntpt/; remount mntpt/
getfattr -d mntpt/

name2/value2 has gone missing; debugfs shows it in the
inode body, but it is not found there by getattr.

The following fixes it up; newer mkfs appears to properly
zero the inodes, so this workaround isn't needed for ext4.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-02-05 22:36:43 -05:00
Aneesh Kumar K.V 42a10add85 ext4: Fix null bh pointer dereference in mballoc
Repoted by Adrian Bunk <bunk@kernel.org>:

The Coverity checker spotted the following NULL dereference:

static int ext4_mb_mark_diskspace_used
{
	...
	if (!bitmap_bh)
		goto out_err;
	...
out_err:
	sb->s_dirt = 1;
	put_bh(bitmap_bh);
	...

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2008-02-10 01:07:28 -05:00
Andrew Morton c773633916 deprecate smbfs in favour of cifs
smbfs is a bit buggy and has no maintainer.  Change it to shout at the user on
the first five mount attempts - tell them to switch to CIFS.

Come December we'll mark it BROKEN and see what happens.

[olecom@flower.upol.cz: documentation update]
Cc: Urban Widmark <urban@teststation.com>
Acked-by: Steven French <sfrench@us.ibm.com>
Signed-off-by: Oleg Verych <olecom@flower.upol.cz>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 14:37:15 -08:00
Dominique Quatravaux d7b88513c5 uml: fix hostfs tv_usec calculations
To convert from tv_nsec to tv_usec, one needs to divide by 1000, not multiply.

Signed-off-by: Dominique Quatravaux <dominique@quatravaux.org>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:30 -08:00
Andrew Morgan e338d263a7 Add 64-bit capability support to the kernel
The patch supports legacy (32-bit) capability userspace, and where possible
translates 32-bit capabilities to/from userspace and the VFS to 64-bit
kernel space capabilities.  If a capability set cannot be compressed into
32-bits for consumption by user space, the system call fails, with -ERANGE.

FWIW libcap-2.00 supports this change (and earlier capability formats)

 http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/

[akpm@linux-foundation.org: coding-syle fixes]
[akpm@linux-foundation.org: use get_task_comm()]
[ezk@cs.sunysb.edu: build fix]
[akpm@linux-foundation.org: do not initialise statics to 0 or NULL]
[akpm@linux-foundation.org: unused var]
[serue@us.ibm.com: export __cap_ symbols]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:20 -08:00
David P. Quigley 4bea58053f VFS: Reorder vfs_getxattr to avoid unnecessary calls to the LSM
Originally vfs_getxattr would pull the security xattr variable using
the inode getxattr handle and then proceed to clobber it with a subsequent call
to the LSM.

This patch reorders the two operations such that when the xattr requested is
in the security namespace it first attempts to grab the value from the LSM
directly.

If it fails to obtain the value because there is no module present or the
module does not support the operation it will fall back to using the inode
getxattr operation.

In the event that both are inaccessible it returns EOPNOTSUPP.

Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:20 -08:00
David P. Quigley 4249259404 VFS/Security: Rework inode_getsecurity and callers to return resulting buffer
This patch modifies the interface to inode_getsecurity to have the function
return a buffer containing the security blob and its length via parameters
instead of relying on the calling function to give it an appropriately sized
buffer.

Security blobs obtained with this function should be freed using the
release_secctx LSM hook.  This alleviates the problem of the caller having to
guess a length and preallocate a buffer for this function allowing it to be
used elsewhere for Labeled NFS.

The patch also removed the unused err parameter.  The conversion is similar to
the one performed by Al Viro for the security_getprocattr hook.

Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:20 -08:00
Fengguang Wu 8bc3be2751 writeback: speed up writeback of big dirty files
After making dirty a 100M file, the normal behavior is to start the
writeback for all data after 30s delays.  But sometimes the following
happens instead:

	- after 30s:    ~4M
	- after 5s:     ~4M
	- after 5s:     all remaining 92M

Some analyze shows that the internal io dispatch queues goes like this:

		s_io            s_more_io
		-------------------------
	1)	100M,1K         0
	2)	1K              96M
	3)	0               96M
1) initial state with a 100M file and a 1K file

2) 4M written, nr_to_write <= 0, so write more

3) 1K written, nr_to_write > 0, no more writes(BUG)

nr_to_write > 0 in (3) fools the upper layer to think that data have all
been written out.  The big dirty file is actually still sitting in
s_more_io.  We cannot simply splice s_more_io back to s_io as soon as s_io
becomes empty, and let the loop in generic_sync_sb_inodes() continue: this
may starve newly expired inodes in s_dirty.  It is also not an option to
draw inodes from both s_more_io and s_dirty, an let the loop go on: this
might lead to live locks, and might also starve other superblocks in sync
time(well kupdate may still starve some superblocks, that's another bug).

We have to return when a full scan of s_io completes.  So nr_to_write > 0
does not necessarily mean that "all data are written".  This patch
introduces a flag writeback_control.more_io to indicate that more io should
be done.  With it the big dirty file no longer has to wait for the next
kupdate invokation 5s later.

In sync_sb_inodes() we only set more_io on super_blocks we actually
visited.  This avoids the interaction between two pdflush deamons.

Also in __sync_single_inode() we don't blindly keep requeuing the io if the
filesystem cannot progress.  Failing to do so may lead to 100% iowait.

Tested-by: Mike Snitzer <snitzer@gmail.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:19 -08:00
Qi Yong 2d544564f9 skip writing data pages when inode is under I_SYNC
Since I_SYNC was split out from I_LOCK, the concern in commit
4b89eed93e ("Write back inode data pages
even when the inode itself is locked") is not longer valid.

We should revert to the original behavior: in __writeback_single_inode(),
when we find an I_SYNC-ed inode and we're not doing a data-integrity sync,
skip writing entirely.  Otherwise, we are double calling do_writepages()

Signed-off-by: Qi Yong <qiyong@fc-cn.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Joern Engel <joern@wohnheim.fh-wedel.de>
Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:18 -08:00
Andrea Arcangeli 7766755a2f Fix /proc dcache deadlock in do_exit
This patch fixes a sles9 system hang in start_this_handle from a customer
with some heavy workload where all tasks are waiting on kjournald to commit
the transaction, but kjournald waits on t_updates to go down to zero (it
never does).

This was reported as a lowmem shortage deadlock but when checking the debug
data I noticed the VM wasn't under pressure at all (well it was really
under vm pressure, because lots of tasks hanged in the VM prune_dcache
methods trying to flush dirty inodes, but no task was hanging in GFP_NOFS
mode, the holder of the journal handle should have if this was a vm issue
in the first place).

No task was apparently holding the leftover handle in the committing
transaction, so I deduced t_updates was stuck to 1 because a journal_stop
was never run by some path (this turned out to be correct).  With a debug
patch adding proper reverse links and stack trace logging in ext3 deployed
in production, I found journal_stop is never run because
mark_inode_dirty_sync is called inside release_task called by do_exit.
(that was quite fun because I would have never thought about this
subtleness, I thought a regular path in ext3 had a bug and it forgot to
call journal_stop)

do_exit->release_task->mark_inode_dirty_sync->schedule() (will never
come back to run journal_stop)

The reason is that shrink_dcache_parent is racy by design (feature not
a bug) and it can do blocking I/O in some case, but the point is that
calling shrink_dcache_parent at the last stage of do_exit isn't safe
for self-reaping tasks.

I guess the memory pressure of the unbalanced highmem system allowed
to trigger this more easily.

Now mainline doesn't have this line in iput (like sles9 has):

    	     if (inode->i_state & I_DIRTY_DELAYED)
	     			mark_inode_dirty_sync(inode);

so it will probably not crash with ext3, but for example ext2 implements an
I/O-blocking ext2_put_inode that will lead to similar screwups with
ext2_free_blocks never coming back and it's definitely wrong to call
blocking-IO paths inside do_exit.  So this should fix a subtle bug in
mainline too (not verified in practice though).  The equivalent fix for
ext3 is also not verified yet to fix the problem in sles9 but I don't have
doubt it will (it usually takes days to crash, so it'll take weeks to be
sure).

An alternate fix would be to offload that work to a kernel thread, but I
don't think a reschedule for this is worth it, the vm should be able to
collect those entries for the synchronous release_task.

Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Cc: Jan Kara <jack@ucw.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:18 -08:00
Matt Mackall 1e88328111 maps4: make page monitoring /proc file optional
Make /proc/ page monitoring configurable

This puts the following files under an embedded config option:

/proc/pid/clear_refs
/proc/pid/smaps
/proc/pid/pagemap
/proc/kpagecount
/proc/kpageflags

[akpm@linux-foundation.org: Kconfig fix]
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:17 -08:00
Matt Mackall 304daa8132 maps4: add /proc/kpageflags interface
This makes a subset of physical page flags available to userspace. Together
with /proc/pid/kpagemap, this allows tracking of a wide variety of VM behaviors.

Exported flags are decoupled from the kernel's internal flags. This
allows us to reorder flag bits, and synthesize any bits that get
redefined in terms of other bits.

[akpm@linux-foundation.org: remove unneeded access_ok()]
[akpm@linux-foundation.org: s/0/NULL/]
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:17 -08:00
Matt Mackall 161f47bf41 maps4: add /proc/kpagecount interface
This makes physical page map counts available to userspace. Together
with /proc/pid/pagemap and /proc/pid/clear_refs, this can be used to
monitor memory usage on a per-page basis.

[akpm@linux-foundation.org: remove unneeded access_ok()]
[bunk@stusta.de: make struct proc_kpagemap static]
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:17 -08:00
Matt Mackall 85863e475e maps4: add /proc/pid/pagemap interface
This interface provides a mapping for each page in an address space to its
physical page frame number, allowing precise determination of what pages are
mapped and what pages are shared between processes.

New in this version:

- headers gone again (as recommended by Dave Hansen and Alan Cox)
- 64-bit entries (as per discussion with Andi Kleen)
- swap pte information exported (from Dave Hansen)
- page walker callback for holes (from Dave Hansen)
- direct put_user I/O (as suggested by Rusty Russell)

This patch folds in cleanups and swap PTE support from Dave Hansen
<haveblue@us.ibm.com>.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:16 -08:00
Matt Mackall a6198797cc maps4: regroup task_mmu by interface
Reorder source so that all the code and data for each interface is together.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:16 -08:00
Matt Mackall f248dcb34d maps4: move clear_refs code to task_mmu.c
This puts all the clear_refs code where it belongs and probably lets things
compile on MMU-less systems as well.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:16 -08:00
Matt Mackall 4752c36978 maps4: simplify interdependence of maps and smaps
This pulls the shared map display code out of show_map and puts it in
show_smap where it belongs.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:16 -08:00
Matt Mackall b3ae5acbbb maps4: use pagewalker in clear_refs and smaps
Use the generic pagewalker for smaps and clear_refs

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:16 -08:00
Fengguang Wu ec4dd3eb35 maps4: add proportional set size accounting in smaps
The "proportional set size" (PSS) of a process is the count of pages it has
in memory, where each page is divided by the number of processes sharing
it.  So if a process has 1000 pages all to itself, and 1000 shared with one
other process, its PSS will be 1500.

               - lwn.net: "ELC: How much memory are applications really using?"

The PSS proposed by Matt Mackall is a very nice metic for measuring an
process's memory footprint.  So collect and export it via
/proc/<pid>/smaps.

Matt Mackall's pagemap/kpagemap and John Berthels's exmap can also do the
job.  They are comprehensive tools.  But for PSS, let's do it in the simple
way.

Cc: John Berthels <jjberthels@gmail.com>
Cc: Bernardo Innocenti <bernie@codewiz.org>
Cc: Padraig Brady <P@draigBrady.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:16 -08:00
Ken Chen 75897d60a5 hugetlb: allow sticky directory mount option
Allow sticky directory mount option for hugetlbfs.  This allows admin
to create a shared hugetlbfs mount point for multiple users, while
prevent accidental file deletion that users may step on each other.
It is similiar to default tmpfs mount option, or typical option used
on /tmp.

Signed-off-by: Ken Chen <kenchen@google.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:14 -08:00
Christoph Lameter b98938c373 bufferhead: revert constructor removal
The constructor for buffer_head slabs was removed recently.  We need the
constructor back in slab defrag in order to insure that slab objects always
have a definite state even before we allocated them.

I think we mistakenly merged the removal of the constuctor into a cleanup
patch.  You (ie: akpm) had a test that showed that the removal of the
constructor led to a small regression.  The prior state makes things easier
for slab defrag.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:14 -08:00
Christoph Lameter 9e2779fa28 is_vmalloc_addr(): Check if an address is within the vmalloc boundaries
Checking if an address is a vmalloc address is done in a couple of places.
Define a common version in mm.h and replace the other checks.

Again the include structures suck.  The definition of VMALLOC_START and
VMALLOC_END is not available in vmalloc.h since highmem.c cannot be included
there.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:14 -08:00
Christoph Lameter eebd2aa355 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user
Simplify page cache zeroing of segments of pages through 3 functions

zero_user_segments(page, start1, end1, start2, end2)

        Zeros two segments of the page. It takes the position where to
        start and end the zeroing which avoids length calculations and
	makes code clearer.

zero_user_segment(page, start, end)

        Same for a single segment.

zero_user(page, start, length)

        Length variant for the case where we know the length.

We remove the zero_user_page macro. Issues:

1. Its a macro. Inline functions are preferable.

2. The KM_USER0 macro is only defined for HIGHMEM.

   Having to treat this special case everywhere makes the
   code needlessly complex. The parameter for zeroing is always
   KM_USER0 except in one single case that we open code.

Avoiding KM_USER0 makes a lot of code not having to be dealing
with the special casing for HIGHMEM anymore. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.

Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
function could not be a macro. zero_user_* functions introduced
here can be be inline because that constant is not used when these
functions are called.

Also extract the flushing of the caches to be outside of the kmap.

[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:13 -08:00
Davide Libenzi 4d672e7ac7 timerfd: new timerfd API
This is the new timerfd API as it is implemented by the following patch:

int timerfd_create(int clockid, int flags);
int timerfd_settime(int ufd, int flags,
		    const struct itimerspec *utmr,
		    struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);

The timerfd_create() API creates an un-programmed timerfd fd.  The "clockid"
parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.

The timerfd_settime() API give new settings by the timerfd fd, by optionally
retrieving the previous expiration time (in case the "otmr" parameter is not
NULL).

The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
is set in the "flags" parameter.  Otherwise it's a relative time.

The timerfd_gettime() API returns the next expiration time of the timer, or
{0, 0} if the timerfd has not been set yet.

Like the previous timerfd API implementation, read(2) and poll(2) are
supported (with the same interface).  Here's a simple test program I used to
exercise the new timerfd APIs:

http://www.xmailserver.org/timerfd-test2.c

[akpm@linux-foundation.org: coding-style cleanups]
[akpm@linux-foundation.org: fix ia64 build]
[akpm@linux-foundation.org: fix m68k build]
[akpm@linux-foundation.org: fix mips build]
[akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
[heiko.carstens@de.ibm.com: fix s390]
[akpm@linux-foundation.org: fix powerpc build]
[akpm@linux-foundation.org: fix sparc64 more]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:07 -08:00
Oleg Nesterov ed5d2cac11 exec: rework the group exit and fix the race with kill
As Roland pointed out, we have the very old problem with exec.  de_thread()
sets SIGNAL_GROUP_EXIT, kills other threads, changes ->group_leader and then
clears signal->flags.  All signals (even fatal ones) sent in this window
(which is not too small) will be lost.

With this patch exec doesn't abuse SIGNAL_GROUP_EXIT.  signal_group_exit(),
the new helper, should be used to detect exit_group() or exec() in progress.
It can have more users, but this patch does only strictly necessary changes.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robin Holt <holt@sgi.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:07 -08:00
Andrew Morton 59714d65df get_task_comm(): return the result
It was dumb to make get_task_comm() return void.  Change it to return a
pointer to the resulting output for caller convenience.

Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:07 -08:00
Peter Zijlstra 0ccf831cbe lockdep: annotate epoll
On Sat, 2008-01-05 at 13:35 -0800, Davide Libenzi wrote:

> I remember I talked with Arjan about this time ago. Basically, since 1)
> you can drop an epoll fd inside another epoll fd 2) callback-based wakeups
> are used, you can see a wake_up() from inside another wake_up(), but they
> will never refer to the same lock instance.
> Think about:
>
> 	dfd = socket(...);
> 	efd1 = epoll_create();
> 	efd2 = epoll_create();
> 	epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, ...);
> 	epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
>
> When a packet arrives to the device underneath "dfd", the net code will
> issue a wake_up() on its poll wake list. Epoll (efd1) has installed a
> callback wakeup entry on that queue, and the wake_up() performed by the
> "dfd" net code will end up in ep_poll_callback(). At this point epoll
> (efd1) notices that it may have some event ready, so it needs to wake up
> the waiters on its poll wait list (efd2). So it calls ep_poll_safewake()
> that ends up in another wake_up(), after having checked about the
> recursion constraints. That are, no more than EP_MAX_POLLWAKE_NESTS, to
> avoid stack blasting. Never hit the same queue, to avoid loops like:
>
> 	epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
> 	epoll_ctl(efd3, EPOLL_CTL_ADD, efd2, ...);
> 	epoll_ctl(efd4, EPOLL_CTL_ADD, efd3, ...);
> 	epoll_ctl(efd1, EPOLL_CTL_ADD, efd4, ...);
>
> The code "if (tncur->wq == wq || ..." prevents re-entering the same
> queue/lock.

Since the epoll code is very careful to not nest same instance locks
allow the recursion.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:07 -08:00
Valerie Clement b8356c465b ext4: Don't set EXTENTS_FL flag for fast symlinks
For fast symbolic links, the file content is stored in the i_block[]
array, which is not compatible with the new file extents format.
e2fsck reports error on such files because EXTENTS_FL is set.
Don't set the EXTENTS_FL flag when creating fast symlinks.

In the case of file migration, skip fast symbolic links.

Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-05 10:56:37 -05:00
Aneesh Kumar K.V 4d60517972 JBD2: Use the incompat macro for testing the incompat feature.
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT needs to be checked with
JBD2_HAS_INCOMPAT_FEATURE

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-05 10:56:15 -05:00
Aneesh Kumar K.V c4b8e635f5 jbd2: Fix reference counting on the journal commit block's buffer head
With journal checksum patch we added asynchronous commits of journal
commit headers, and accidentally dropped taking a reference on the
buffer head.

(Before the change, sync_dirty_buffer did the get_bh(). The associative
put_bh is done by journal_wait_on_commit_record().)

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-05 10:55:26 -05:00
Andrew Morton ead03e30b0 [CIFS] fix warning in cifs_spnego.c
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-02-05 15:51:24 +00:00
Linus Torvalds f5bb3a5e9d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (79 commits)
  Jesper Juhl is the new trivial patches maintainer
  Documentation: mention email-clients.txt in SubmittingPatches
  fs/binfmt_elf.c: spello fix
  do_invalidatepage() comment typo fix
  Documentation/filesystems/porting fixes
  typo fixes in net/core/net_namespace.c
  typo fix in net/rfkill/rfkill.c
  typo fixes in net/sctp/sm_statefuns.c
  lib/: Spelling fixes
  kernel/: Spelling fixes
  include/scsi/: Spelling fixes
  include/linux/: Spelling fixes
  include/asm-m68knommu/: Spelling fixes
  include/asm-frv/: Spelling fixes
  fs/: Spelling fixes
  drivers/watchdog/: Spelling fixes
  drivers/video/: Spelling fixes
  drivers/ssb/: Spelling fixes
  drivers/serial/: Spelling fixes
  drivers/scsi/: Spelling fixes
  ...
2008-02-04 07:58:52 -08:00
Linus Torvalds 9853832c49 Merge branch 'locks' of git://linux-nfs.org/~bfields/linux
* 'locks' of git://linux-nfs.org/~bfields/linux:
  pid-namespaces-vs-locks-interaction
  file locks: Use wait_event_interruptible_timeout()
  locks: clarify posix_locks_deadlock
2008-02-04 07:58:03 -08:00
Nick Piggin 2f98735c9c vm audit: add VM_DONTEXPAND to mmap for drivers that need it
Drivers that register a ->fault handler, but do not range-check the
offset argument, must set VM_DONTEXPAND in the vm_flags in order to
prevent an expanding mremap from overflowing the resource.

I've audited the tree and attempted to fix these problems (usually by
adding VM_DONTEXPAND where it is not obvious).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-04 07:55:38 -08:00
Vitaliy Gusev ab1f161165 pid-namespaces-vs-locks-interaction
fcntl(F_GETLK,..) can return pid of process for not current pid namespace
(if process is belonged to the several namespaces).  It is true also for
pids in /proc/locks.  So correct behavior is saving pointer to the struct
pid of the process lock owner.

Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-03 17:51:36 -05:00
Matthew Wilcox 4321e01e7d file locks: Use wait_event_interruptible_timeout()
interruptible_sleep_on_locked() is just an open-coded
wait_event_interruptible_timeout(), with the one difference that
interruptible_sleep_on_locked() doesn't bother to check the condition on
which it is waiting, depending instead on the BKL to avoid the case
where it blocks after the wakeup has already been called.

locks_block_on_timeout() is only used in one place, so it's actually
simpler to inline it into its caller.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-03 17:51:36 -05:00
J. Bruce Fields b533184fc3 locks: clarify posix_locks_deadlock
For such a short function (with such a long comment),
posix_locks_deadlock() seems to cause a lot of confusion.  Attempt to
make it a bit clearer:

	- Remove the initial posix_same_owner() check, which can never
	  pass (since this is only called in the case that block_fl and
	  caller_fl conflict)
	- Use an explicit loop (and a helper function) instead of a goto.
	- Rewrite the comment, attempting a clearer explanation, and
	  removing some uninteresting historical detail.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-03 17:51:36 -05:00
Ohad Ben-Cohen 09c6dd3c9d fs/binfmt_elf.c: spello fix
s/litle/little

Signed-off-by: Ohad Ben-Cohen <ohad@bencohen.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 18:05:15 +02:00
Joe Perches c78bad11fb fs/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:33:42 +02:00
Paulius Zaleckas efad798b9f Spelling fixes: lenght->length
Signed-off-by: Paulius Zaleckas <pauliusz@yahoo.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 15:42:53 +02:00
Robert P. J. Day e1b8513d21 Typoes: "whith" -> "with"
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 15:14:02 +02:00
Robert P. J. Day 14e4a0f2bb Fix a small number of "memeber" typoes.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 15:12:15 +02:00
Linus Torvalds 63e9b66e29 Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux
* 'for-linus' of git://linux-nfs.org/~bfields/linux: (100 commits)
  SUNRPC: RPC program information is stored in unsigned integers
  SUNRPC: Move exported symbol definitions after function declaration part 2
  NLM: tear down RPC clients in nlm_shutdown_hosts
  SUNRPC: spin svc_rqst initialization to its own function
  nfsd: more careful input validation in nfsctl write methods
  lockd: minor log message fix
  knfsd: don't bother mapping putrootfh enoent to eperm
  rdma: makefile
  rdma: ONCRPC RDMA protocol marshalling
  rdma: SVCRDMA sendto
  rdma: SVCRDMA recvfrom
  rdma: SVCRDMA Core Transport Services
  rdma: SVCRDMA Transport Module
  rdma: SVCRMDA Header File
  svc: Add svc_xprt_names service to replace svc_sock_names
  knfsd: Support adding transports by writing portlist file
  svc: Add svc API that queries for a transport instance
  svc: Add /proc/sys/sunrpc/transport files
  svc: Add transport hdr size for defer/revisit
  svc: Move the xprt independent code to the svc_xprt.c file
  ...
2008-02-02 14:31:28 +11:00
Jeff Layton d801b86168 NLM: tear down RPC clients in nlm_shutdown_hosts
It's possible for a RPC to outlive the lockd daemon that created it, so
we need to make sure that all RPC's are killed when lockd is coming
down. When nlm_shutdown_hosts is called, kill off all RPC tasks
associated with the host. Since we need to wait until they have all gone
away, we might as well just shut down the RPC client altogether.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:15 -05:00
J. Bruce Fields 87d26ea777 nfsd: more careful input validation in nfsctl write methods
Neil Brown points out that we're checking buf[size-1] in a couple places
without first checking whether size is zero.

Actually, given the implementation of simple_transaction_get(), buf[-1]
is zero, so in both of these cases the subsequent check of the value of
buf[size-1] will catch this case.

But it seems fragile to depend on that, so add explicit checks for this
case.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: NeilBrown <neilb@suse.de>
2008-02-01 16:42:15 -05:00
J. Bruce Fields 50431d94e7 lockd: minor log message fix
Wendy Cheng noticed that function name doesn't agree here.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Wendy Cheng <wcheng@redhat.com>
2008-02-01 16:42:15 -05:00
J. Bruce Fields f7b8066f9f knfsd: don't bother mapping putrootfh enoent to eperm
Neither EPERM and ENOENT map to valid errors for PUTROOTFH according to
rfc 3530, and, if anything, ENOENT is likely to be slightly more
informative; so don't bother mapping ENOENT to EPERM.  (Probably this
was originally done because one likely cause was that there is an fsid=0
export but that it isn't permitted to this particular client.  Now that
we allow WRONGSEC returns, this is somewhat less likely.)

In the long term we should work to make this situation less likely,
perhaps by turning off nfsv4 service entirely in the absence of the
pseudofs root, or constructing a pseudofilesystem root ourselves in the
kernel as necessary.

Thanks to Benny Halevy <bhalevy@panasas.com> for pointing out this
problem.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Benny Halevy <bhalevy@panasas.com>
2008-02-01 16:42:15 -05:00
Tom Tucker 9571af18fa svc: Add svc_xprt_names service to replace svc_sock_names
Create a transport independent version of the svc_sock_names function.

The toclose capability of the svc_sock_names service can be implemented
using the svc_xprt_find and svc_xprt_close services.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:14 -05:00
Tom Tucker a217813f90 knfsd: Support adding transports by writing portlist file
Update the write handler for the portlist file to allow creating new
listening endpoints on a transport. The general form of the string is:

<transport_name><space><port number>

For example:

echo "tcp 2049" > /proc/fs/nfsd/portlist

This is intended to support the creation of a listening endpoint for
RDMA transports without adding #ifdef code to the nfssvc.c file.

Transports can also be removed as follows:

'-'<transport_name><space><port number>

For example:

echo "-tcp 2049" > /proc/fs/nfsd/portlist

Attempting to add a listener with an invalid transport string results
in EPROTONOSUPPORT and a perror string of "Protocol not supported".

Attempting to remove an non-existent listener (.e.g. bad proto or port)
results in ENOTCONN and a perror string of
"Transport endpoint is not connected"

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker 7fcb98d58c svc: Add svc API that queries for a transport instance
Add a new svc function that allows a service to query whether a
transport instance has already been created. This is used in lockd
to determine whether or not a transport needs to be created when
a lockd instance is brought up.

Specifying 0 for the address family or port is effectively a wild-card,
and will result in matching the first transport in the service's list
that has a matching class name.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker 7a18208383 svc: Make close transport independent
Move sk_list and sk_ready to svc_xprt. This involves close because these
lists are walked by svcs when closing all their transports. So I combined
the moving of these lists to svc_xprt with making close transport independent.

The svc_force_sock_close has been changed to svc_close_all and takes a list
as an argument. This removes some svc internals knowledge from the svcs.

This code races with module removal and transport addition.

Thanks to Simon Holm Thøgersen for a compile fix.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Simon Holm Thøgersen <odie@cs.aau.dk>
2008-02-01 16:42:11 -05:00
Tom Tucker d7c9f1ed97 svc: Change services to use new svc_create_xprt service
Modify the various kernel RPC svcs to use the svc_create_xprt service.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:09 -05:00
Oleg Drokin 54ca95eb36 Leak in nlmsvc_testlock for async GETFL case
Fix nlm_block leak for the case of supplied blocking lock info.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:07 -05:00
J. Bruce Fields 8838dc43d6 nfsd4: clean up access_valid, deny_valid checks.
Document these checks a little better and inline, as suggested by Neil
Brown (note both functions have two callers).  Remove an obviously bogus
check while we're there (checking whether unsigned value is negative).

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
2008-02-01 16:42:07 -05:00
J. Bruce Fields 5c002b3bb2 nfsd: allow root to set uid and gid on create
The server silently ignores attempts to set the uid and gid on create.
Based on the comment, this appears to have been done to prevent some
overly-clever IRIX client from causing itself problems.

Perhaps we should remove that hack completely.  For now, at least, it
makes sense to allow root (when no_root_squash is set) to set uid and
gid.

While we're there, since nfsd_create and nfsd_create_v3 share the same
logic, pull that out into a separate function.  And spell out the
individual modifications of ia_valid instead of doing them both at once
inside a conditional.

Thanks to Roger Willcocks <roger@filmlight.ltd.uk> for the bug report
and original patch on which this is based.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:07 -05:00
Oleg Drokin 29dbf54615 lockd: fix a leak in nlmsvc_testlock asynchronous request handling
Without the patch, there is a leakage of nlmblock structure refcount
that holds a reference nlmfile structure, that holds a reference to
struct file, when async GETFL is used (-EINPROGRESS return from
file_ops->lock()), and also in some error cases.

Fix up a style nit while we're here.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00
Frank Filz 406a7ea97d nfsd: Allow AIX client to read dir containing mountpoints
This patch addresses a compatibility issue with a Linux NFS server and
AIX NFS client.

I have exported /export as fsid=0 with sec=krb5:krb5i
I have mount --bind /home onto /export/home
I have exported /export/home with sec=krb5i

The AIX client mounts / -o sec=krb5:krb5i onto /mnt

If I do an ls /mnt, the AIX client gets a permission error. Looking at
the network traceIwe see a READDIR looking for attributes
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID. The response gives a
NFS4ERR_WRONGSEC which the AIX client is not expecting.

Since the AIX client is only asking for an attribute that is an
attribute of the parent file system (pseudo root in my example), it
seems reasonable that there should not be an error.

In discussing this issue with Bruce Fields, I initially proposed
ignoring the error in nfsd4_encode_dirent_fattr() if all that was being
asked for was FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID, however,
Bruce suggested that we avoid calling cross_mnt() if only these
attributes are requested.

The following patch implements bypassing cross_mnt() if only
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID are called. Since there
is some complexity in the code in nfsd4_encode_fattr(), I didn't want to
duplicate code (and introduce a maintenance nightmare), so I added a
parameter to nfsd4_encode_fattr() that indicates whether it should
ignore cross mounts and simply fill in the attribute using the passed in
dentry as opposed to it's parent.

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00
J. Bruce Fields 39325bd03f nfsd4: fix bad seqid on lock request incompatible with open mode
The failure to return a stateowner from nfs4_preprocess_seqid_op() means
in the case where a lock request is of a type incompatible with an open
(due to, e.g., an application attempting a write lock on a file open for
read), means that fs/nfsd/nfs4xdr.c:ENCODE_SEQID_OP_TAIL() never bumps
the seqid as it should.  The client, attempting to close the file
afterwards, then gets an (incorrect) bad sequence id error.  Worse, this
prevents the open file from ever being closed, so we leak state.

Thanks to Benny Halevy and Trond Myklebust for analysis, and to Steven
Wilton for the report and extensive data-gathering.

Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Steven Wilton <steven.wilton@team.eftel.com.au>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00
Oleg Drokin b7e6b86948 lockd: fix reference count leaks in async locking case
In a number of places where we wish only to translate nlm_drop_reply to
rpc_drop_reply errors we instead return early with rpc_drop_reply,
skipping some important end-of-function cleanup.

This results in reference count leaks when lockd is doing posix locking
on GFS2.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00
J. Bruce Fields 404ec117be nfsd4: recognize callback channel failure earlier
When the callback channel fails, we inform the client of that by
returning a cb_path_down error the next time it tries to renew its
lease.

If we wait most of a lease period before deciding that a callback has
failed and that the callback channel is down, then we decrease the
chances that the client will find out in time to do anything about it.

So, mark the channel down as soon as we recognize that an rpc has
failed.  However, continue trying to recall delegations anyway, in hopes
it will come back up.  This will prevent more delegations from being
given out, and ensure cb_path_down is returned to renew calls earlier,
while still making the best effort to deliver recalls of existing
delegations.

Also fix a couple comments and remove a dprink that doesn't seem likely
to be useful.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00
J. Bruce Fields 35bba9a37e nfsd4: miscellaneous nfs4state.c style fixes
Fix various minor style violations.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00