OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Andrew Morton	9df130392f	fs/ufs/balloc.c: fix sparc64 printk warning fs/ufs/balloc.c: In function `ufs_change_blocknr': fs/ufs/balloc.c:317: warning: long long unsigned int format, long unsigned int arg (arg 2) fs/ufs/balloc.c:317: warning: long long unsigned int format, long unsigned int arg (arg 3) sector_t is u64 and we don't know what type the architecture uses to implement u64. Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:37 -07:00
Dave Young	08ca0db8aa	zisofs: fix readpage() outside i_size A read request outside i_size will be handled in do_generic_file_read(). So we just return 0 to avoid getting -EIO as normal reading, let do_generic_file_read do the rest. At the same time we need unlock the page to avoid system stuck. Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10227 Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Report-by: Christian Perle <chris@linuxinfotag.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
Randy Dunlap	a6b91919e0	fs: fix kernel-doc notation warnings Fix kernel-doc notation warnings in fs/. Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line: * mark_files_ro Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line: * lease_get_mtime Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line: * lease_get_mtime Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line: * lookup_one_len: filesystem helper to lookup single pathname component Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line: * bh_uptodate_or_lock: Test whether the buffer is uptodate Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line: * bh_submit_read: Submit a locked buffer for reading Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line: * writeback_acquire: attempt to get exclusive writeback access to a device Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line: * writeback_in_progress: determine whether there is writeback in progress Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line: * writeback_release: relinquish exclusive writeback access against a device. Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line: * void journal_invalidatepage() Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
Michael Halcrow	5366dc9fd1	eCryptfs: Swap dput() and mntput() ecryptfs_d_release() is doing a mntput before doing the dput. This patch moves the dput before the mntput. Thanks to Rajouri Jammu for reporting this. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Rajouri Jammu <rajouri.jammu@gmail.com> Cc: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
Duane Griffin	d00256766a	jbd2: correctly unescape journal data blocks Fix a long-standing typo (predating git) that will cause data corruption if a journal data block needs unescaping. At the moment the wrong buffer head's data is being unescaped. To test this case mount a filesystem with data=journal, start creating and deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then pull the plug on the device. Without this patch the files will contain zeros instead of the correct data after recovery. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
Duane Griffin	439aeec639	jbd: correctly unescape journal data blocks Fix a long-standing typo (predating git) that will cause data corruption if a journal data block needs unescaping. At the moment the wrong buffer head's data is being unescaped. To test this case mount a filesystem with data=journal, start creating and deleting a bunch of files containing only JFS_MAGIC_NUMBER (0xc03b3998), then pull the plug on the device. Without this patch the files will contain zeros instead of the correct data after recovery. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
David Howells	4ebf89845b	ROMFS: Fix up an error in iget removal Fix up an error in iget removal in which romfs_lookup() making a successful call to romfs_iget() continues through the negative/error handling (previously the successful case jumped around the negative/error handling case): (1) inode is initialised to NULL at the top of the function, eliminating the need for specific negative-inode handling. This means the positive success handling now flows straight through. (2) Rename the labels to be clearer about what they mean. Also make romfs_lookup()'s result variable of type long so as to avoid 32-bit/64-bit conversions with PTR_ERR() and friends. Based upon a report and patch from Adam Richter. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: "Adam J. Richter" <adam@yggdrasil.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
Josef Bacik	c587f0c0a6	ext3: fix wrong gfp type under transaction There are several places where we make allocations with GFP_KERNEL while under a transaction, which could lead to an assertion panic or lockup if under memory pressure. This patch switches these problem areas to use GFP_NOFS to keep these problems from happening. Signed-off-by: Josef Bacik <jbacik@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
Jan Kara	87cb055bc1	quota: add possibly missing iput() when quotaon and quotaoff races We should always put inode we have reference to, even if quota was reenabled in the mean time. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:35 -07:00
Randy Dunlap	0cf01f6685	jbd: fix jbd kernel-doc notation Fix kernel-doc notation in jbd. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:35 -07:00
Quentin Barnes	6cb2a21049	aio: bad AIO race in aio_complete() leads to process hang My group ran into a AIO process hang on a 2.6.24 kernel with the process sleeping indefinitely in io_getevents(2) waiting for the last wakeup to come and it never would. We ran the tests on x86_64 SMP. The hang only occurred on a Xeon box ("Clovertown") but not a Core2Duo ("Conroe"). On the Xeon, the L2 cache isn't shared between all eight processors, but is L2 is shared between between all two processors on the Core2Duo we use. My analysis of the hang is if you go down to the second while-loop in read_events(), what happens on processor #1: 1) add_wait_queue_exclusive() adds thread to ctx->wait 2) aio_read_evt() to check tail 3) if aio_read_evt() returned 0, call [io_]schedule() and sleep In aio_complete() with processor #2: A) info->tail = tail; B) waitqueue_active(&ctx->wait) C) if waitqueue_active() returned non-0, call wake_up() The way the code is written, step 1 must be seen by all other processors before processor 1 checks for pending events in step 2 (that were recorded by step A) and step A by processor 2 must be seen by all other processors (checked in step 2) before step B is done. The race I believed I was seeing is that steps 1 and 2 were effectively swapped due to the __list_add() being delayed by the L2 cache not shared by some of the other processors. Imagine: proc 2: just before step A proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet proc 1, step 2: checks tail and sees no pending events proc 2, step A: updates tail proc 1, step 3: calls [io_]schedule() and sleeps proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup so proc 1 sleeps indefinitely My patch adds a memory barrier between steps A and B. It ensures that the update in step 1 gets seen on processor 2 before continuing. If processor 1 was just before step 1, the memory barrier makes sure that step A (update tail) gets seen by the time processor 1 makes it to step 2 (check tail). Before the patch our AIO process would hang virtually 100% of the time. After the patch, we have yet to see the process ever hang. Signed-off-by: Quentin Barnes <qbarnes+linux@yahoo-inc.com> Reviewed-by: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: <stable@kernel.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ We should probably disallow that "if (waitqueue_active()) wake_up()" coding pattern, because it's so often buggy wrt memory ordering ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:35 -07:00
Fred Isaman	f8512ad0da	nfs: don't ignore return value from nfs_pageio_add_request Ignoring the return value from nfs_pageio_add_request can cause deadlocks. In read path: call nfs_pageio_add_request from readpage_async_filler assume at this point that there are requests already in desc, that can't be merged with the current request. so nfs_pageio_doio is fired up to clear out desc. assume something goes wrong in setting up the io, so desc->pg_error is set. This causes nfs_pageio_add_request to return 0, WITHOUT adding the original request. BUT, since return code is ignored, readpage_async_filler assumes it has been added, and does nothing further, leaving page locked. do_generic_mapping_read will eventually call lock_page, resulting in deadlock In write path: page is marked dirty by generic_perform_write nfs_writepages is called call nfs_pageio_add_request from nfs_page_async_flush assume at this point that there are requests already in desc, that can't be merged with the current request. so nfs_pageio_doio is fired up to clear out desc. assume something goes wrong in setting up the io, so desc->pg_error is set. This causes nfs_page_async_flush to return 0, WITHOUT adding the original request, yet marking the request as locked (PG_BUSY) and in writeback, clearing dirty marks. The next time a write is done to the page, deadlock will result as nfs_write_end calls nfs_update_request Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-03-19 17:59:02 -04:00
Al Viro	a02f76c34d	[PATCH] get stack footprint of pathname resolution back to relative sanity Somebody had put struct nameidata in stack frame of link_path_walk(). Unfortunately, there are certain realities to deal with: * It's in the middle of recursion. Depth is equal to the nesting depth of symlinks, i.e. up to 8. * struct namiedata is, even if one discards the intent junk, at least 12 pointers + 5 ints. * moreover, adding a stack frame is not free in that situation. * there are fs methods called on top of that, and they also have stack footprint. * kernel stack is not infinite. The thing is, even if one chooses to deal with -ESTALE that way (and it's one hell of an overkill), the only thing that needs to be preserved is vfsmount + dentry, not the entire struct nameidata. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-19 06:55:46 -04:00
Al Viro	b4d232e65f	[PATCH] double iput() on failure exit in hugetlb once we'd done d_instantiate(), we should only do dput(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-19 06:55:01 -04:00
Dave Hansen	430e285e08	[PATCH] fix up new filp allocators Some new uses of get_empty_filp() have crept in; switched to alloc_file() to make sure that pieces of initialization won't be missing. We really need to kill get_empty_filp(). [AV] fixed dentry leak on failure exit in anon_inode_getfd() Cc: Erez Zadok <ezk@cs.sunysb.edu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J Bruce Fields" <bfields@fieldses.org> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-19 06:54:05 -04:00
Christoph Hellwig	322ee5b36e	[PATCH] check for null vfsmount in dentry_open() Make sure no-one calls dentry_open with a NULL vfsmount argument and crap out with a stacktrace otherwise. A NULL file->f_vfsmnt has always been problematic, but with the per-mount r/o tracking we can't accept anymore at all. [AV] the last place that passed NULL had been eliminated by the previous patch (reiserfs xattr stuff) Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-19 06:50:44 -04:00
Jeff Mahoney	3227e14c3c	[PATCH] reiserfs: eliminate private use of struct file in xattr After several posts and bug reports regarding interaction with the NULL nameidata, here's a patch to clean up the mess with struct file in the reiserfs xattr code. As observed in several of the posts, there's really no need for struct file to exist in the xattr code. It was really only passed around due to the f_op->readdir() and a_ops->{prepare,commit}_write prototypes requiring it. reiserfs_prepare_write() and reiserfs_commit_write() don't actually use the struct file passed to it, and the xattr code uses a private version of reiserfs_readdir() to enumerate the xattr directories. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-19 06:49:36 -04:00
Al Viro	f382d6e631	[PATCH] sanitize hppfs * hppfs_iget() and its users are racy; there's no need to pollute icache anyway, new_inode() works fine and is safe, unlike the current kludges (these relied on overwriting ->i_ino before another iget_locked() gets to that one - and did it after unlocking). * merge hppfs_iget()/init_inode()/hppfs_read_inode(), while we are at it. * to pass proper vfsmount to dentry_open() store the reference in hppfs superblock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --	2008-03-19 06:42:18 -04:00
Dave Hansen	1dd0dd111f	hppfs pass vfsmount to dentry_open() Here's patch for hppfs that uses vfs_kern_mount to make sure it always has a procfs instance and passed the vfsmount on through the inode private data. Also fixes a procfs file_system_type leak for every attempted hppfs mount. [ jdike - gave this file a style workover, plus deleted hppfs_dentry_ops ] Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-19 06:39:56 -04:00
Linus Torvalds	f920bb6f5f	Merge branch 'audit.b49' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b49' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] export sessionid alongside the loginuid in procfs	2008-03-18 08:43:59 -07:00
Eric Paris	1e0bd7550e	[PATCH] export sessionid alongside the loginuid in procfs Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-18 10:51:22 -04:00
Linus Torvalds	92f53c6f1e	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: Revert "unexport bio_{,un}map_user" relay: fix subbuf_splice_actor() adding too many pages The ps2esdi driver was marked as BROKEN more than two years ago due to being	2008-03-18 07:43:14 -07:00
David S. Miller	2f633928cb	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-03-17 23:44:31 -07:00
Al Viro	e6f1cebf71	[NET] endianness noise: INADDR_ANY Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-17 22:44:53 -07:00
Al Viro	8a4e98d9d7	[PATCH] restore export of do_kern_mount() vfs_kern_mount() requires having a reference to fs type, which makes it impossible for module to create procfs, etc. private mount. Open-coding is not an option, since e.g. put_filesystem() is _not_ exported, and for a good reason. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-17 22:58:04 -04:00
Jens Axboe	40044ce0bf	Revert "unexport bio_{,un}map_user" Outside users like asmlib uses the mapping functions. API wise, the export is definitely sane. It's a better idea to keep this export than to require external users to open-code this piece of code instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-03-17 21:14:40 +01:00
Al Viro	3d10a15d69	hfs_bnode_find() can fail, resulting in hfs_bnode_split() breakage oops and fs corruption; the latter can happen even on valid fs in case of oom. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-17 09:46:55 -07:00
J. Bruce Fields	b663c6fd98	nfsd: fix oops on access from high-numbered ports This bug was always here, but before my commit `6fa02839bf` ("recheck for secure ports in fh_verify"), it could only be triggered by failure of a kmalloc(). After that commit it could be triggered by a client making a request from a non-reserved port for access to an export marked "secure". (Exports are "secure" by default.) The result is a struct svc_export with a reference count one too low, resulting in likely oopses next time the export is accessed. The reference counting here is not straightforward; a later patch will clean up fh_verify(). Thanks to Lukas Hejtmanek for the bug report and followup. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Lukas Hejtmanek <xhejtman@ics.muni.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-14 16:49:15 -07:00
Steve French	8b1327f6ed	[CIFS] file create with acl support enabled is slow Shirish Pargaonkar noted: With cifsacl mount option, when a file is created on the Windows server, exclusive oplock is broken right away because the get cifs acl code again opens the file to obtain security descriptor. The client does not have the newly created file handle or inode in any of its lists yet so it does not respond to oplock break and server waits for its duration and then responds to the second open. This slows down file creation signficantly. The fix is to pass the file descriptor to the get cifsacl code wherever available so that get cifs acl code does not send second open (NT Create ANDX) and oplock is not broken. CC: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-03-14 22:37:16 +00:00
Steve French	ebe8912be2	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-03-14 19:29:18 +00:00
Steve French	50531444fa	[CIFS] Fix mtime on cp -p when file data cached but written out too late Kukks noticed that cp -p can write out file data too late, after the timestamp is already set. This was introduced as an unintentional sideeffect of the change in an earlier patch (see below) which fixed some delayed return code propagation. `cea218054a` Author: Jeff Layton <jlayton@redhat.com> Date: Tue Nov 20 23:19:03 2007 +0000 Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-03-14 19:21:31 +00:00
Marcelo Tosatti	fb39380b8d	pagemap: proper read error handling Fix pagemap_read() error handling by releasing acquired resources and checking for get_user_pages() partial failure. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-13 13:11:43 -07:00
Linus Torvalds	609eb39c8d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits) [SCTP]: Fix local_addr deletions during list traversals. net: fix build with CONFIG_NET=n [TCP]: Prevent sending past receiver window with TSO (at last skb) rt2x00: Add new D-Link USB ID rt2x00: never disable multicast because it disables broadcast too libertas: fix the 'compare command with itself' properly drivers/net/Kconfig: fix whitespace for GELIC_WIRELESS entry [NETFILTER]: nf_queue: don't return error when unregistering a non-existant handler [NETFILTER]: nfnetlink_queue: fix EPERM when binding/unbinding and instance 0 exists [NETFILTER]: nfnetlink_log: fix EPERM when binding/unbinding and instance 0 exists [NETFILTER]: nf_conntrack: replace horrible hack with ksize() [NETFILTER]: nf_conntrack: add \n to "expectation table full" message [NETFILTER]: xt_time: fix failure to match on Sundays [NETFILTER]: nfnetlink_log: fix computation of netlink skb size [NETFILTER]: nfnetlink_queue: fix computation of allocated size for netlink skb. [NETFILTER]: nfnetlink: fix ifdef in nfnetlink_compat.h [NET]: include <linux/types.h> into linux/ethtool.h for __u* typedef [NET]: Make /proc/net a symlink on /proc/self/net (v3) RxRPC: fix rxrpc_recvmsg()'s returning of msg_name net/enc28j60: oops fix ...	2008-03-12 13:08:09 -07:00
Andrew Morton	b2211a361a	net: fix build with CONFIG_NET=n fs/built-in.o:(.rodata+0x1134): undefined reference to `proc_net_inode_operations' fs/built-in.o:(.rodata+0x1138): undefined reference to `proc_net_operations' Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-11 18:03:35 -07:00
Steve French	bc5b6e24a1	[CIFS] Fix build problem Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-03-11 21:07:48 +00:00
Steve French	5b4d4771e2	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-03-11 19:15:41 +00:00
Tao Ma	cdef59a94c	ocfs2: Fix NULL pointer dereferences in o2net In some situations, ocfs2_set_nn_state might get called with sc = NULL and valid = 0. If sc = NULL, we can't dereference it to get the o2nm_node member. Instead, do what o2net_initialize_handshake does and use NULL when calling o2net_reconnect_delay and o2net_idle_timeout. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:19 -07:00
Sunil Mushran	c824c3c723	ocfs2/dlm: dlm_thread should not sleep while holding the dlm_spinlock This patch addresses the bug in which the dlm_thread could go to sleep while holding the dlm_spinlock. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:17 -07:00
Sunil Mushran	535f7026fd	ocfs2/dlm: Print message showing the recovery master Knowing the dlm recovery master helps in debugging recovery issues. This patch prints a message on the recovery master node. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:14 -07:00
Sunil Mushran	b31cfc0237	ocfs2/dlm: Add missing dlm_lockres_put()s dlm_master_request_handler() forgot to put a lockres when dlm_assert_master_worker() failed or was skipped. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:12 -07:00
Sunil Mushran	52987e2ab4	ocfs2/dlm: Add missing dlm_lockres_put()s in migration path During migration, the recovery master node may be asked to master a lockres it may not know about. In that case, it would not only have to create a lockres and add it to the hash, but also remember to to do the _put_ corresponding to the kref_init in dlm_init_lockres(), as soon as the migration is completed. Yes, we don't wait for the dlm_purge_lockres() to do that matching put. Note the ref added for it being in the hash protects the lockres from being freed prematurely. This patch adds that missing put, as described above, to plug a memleak. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:11 -07:00
Sunil Mushran	2c5c54aca9	ocfs2/dlm: Add missing dlm_lock_put()s Normally locks for remote nodes are freed when that node sends an UNLOCK message to the master. The master node tags an DLM_UNLOCK_FREE_LOCK action to do an extra put on the lock at the end. However, there are times when the master node has to free the locks for the remote nodes forcibly. Two cases when this happens are: 1. When the master has migrated the lockres plus all locks to another node. 2. When the master is clearing all the locks of a dead node. It was in the above two conditions that the dlm was missing the extra put. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:09 -07:00
Tao Ma	4338ab6a75	ocfs2: Fix an endian bug in online resize. In ocfs2_group_add, 'cr' is a disk field of type 'ocfs2_chain_rec', and we were putting cpu byteorder values into it. Swap things to the right endian before storing. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:07 -07:00
Jan Engelhardt	90d99779a4	[PATCH] [OCFS2]: constify function pointer tables Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:13:57 -07:00
Joel Becker	0f71b7b40f	ocfs2: Fix endian bug in o2dlm protocol negotiation. struct dlm_query_join_packet is made up of four one-byte fields. They are effectively in big-endian order already. However, little-endian machines swap them before putting the packet on the wire (because query_join's response is a status, and that status is treated as a u32 on the wire). Thus, a big-endian and little-endian machines will treat this structure differently. The solution is to have little-endian machines swap the structure when converting from the structure to the u32 representation. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:13:54 -07:00
Tao Ma	2af37ce82d	ocfs2: Use dlm_print_one_lock_resource for lock resource print __dlm_print_one_lock_resource must be called with spin_lock the res->spinlock. While in some cases, we use it without this precondition and lead to the failure of assert_spin_locked. So call dlm_print_one_lock_resource instead. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:13:50 -07:00
Andrew Morton	3a4780a85d	[PATCH] fs/ocfs2/dlm/dlmdomain.c: fix printk warning fs/ocfs2/dlm/dlmdomain.c: In function 'dlm_send_join_cancels': fs/ocfs2/dlm/dlmdomain.c:983: warning: format '%u' expects type 'unsigned int', but argument 7 has type 'long unsigned int' Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:13:39 -07:00
Harvey Harrison	55f78e1771	[CIFS] cifs: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-03-10 17:14:34 +00:00
Igor Mammedov	7962670e64	[CIFS] DFS patch that connects inode with dfs handling ops if DFS junction point Signed-off-by: Igor Mammedov <niallain@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-03-09 03:44:18 +00:00
Linus Torvalds	4c1aa6f8b9	Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings NFS: Fix the fsid revalidation in nfs_update_inode() SUNRPC: Fix a nfs4 over rdma transport oops NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c	2008-03-07 12:08:07 -08:00
Trond Myklebust	4e99a1ff34	NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings As long as the directory contents haven't changed, we should just let the path walk proceed to cross the mountpoint. Apart from being an optimisation in the case of 'nohide' mountpoint traversals, it also fixes an issue with referrals: referral inodes don't have valid filehandles, so calling nfs_revalidate_inode() on them is a bug. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-03-07 14:35:41 -05:00
Trond Myklebust	c37dcd334c	NFS: Fix the fsid revalidation in nfs_update_inode() When we detect that we've crossed a mountpoint on the remote server, we must take care not to use that inode to revalidate the fsid on our current superblock. To do so, we label the inode as a remote mountpoint, and check for that in nfs_update_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-03-07 14:35:37 -05:00
Trond Myklebust	af1b8c2ff7	NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c O_SYNC is stored in filp->f_flags. Thanks to Al Viro for pointing out the bug. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-03-07 14:33:40 -05:00
Pavel Emelyanov	e9720acd72	[NET]: Make /proc/net a symlink on /proc/self/net (v3) Current /proc/net is done with so called "shadows", but current implementation is broken and has little chances to get fixed. The problem is that dentries subtree of /proc/net directory has fancy revalidation rules to make processes living in different net namespaces see different entries in /proc/net subtree, but currently, tasks see in the /proc/net subdir the contents of any other namespace, depending on who opened the file first. The proposed fix is to turn /proc/net into a symlink, which points to /proc/self/net, which in turn shows what previously was in /proc/net - the network-related info, from the net namespace the appropriate task lives in. # ls -l /proc/net lrwxrwxrwx 1 root root 8 Mar 5 15:17 /proc/net -> self/net In other words - this behaves like /proc/mounts, but unlike "mounts", "net" is not a file, but a directory. Changes from v2: * Fixed discrepancy of /proc/net nlink count and selinux labeling screwup pointed out by Stephen. To get the correct nlink count the ->getattr callback for /proc/net is overridden to read one from the net->proc_net entry. To make selinux still work the net->proc_net entry is initialized properly, i.e. with the "net" name and the proc_net parent. Selinux fixes are Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Changes from v1: * Fixed a task_struct leak in get_proc_task_net, pointed out by Paul. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-07 11:08:40 -08:00
Linus Torvalds	910da1a48e	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] fix inode leak in xfs_iget_core() [XFS] 977545 977545 977545 977545 977545 977545 xfsaild causing too many	2008-03-06 08:14:00 -08:00
David Chinner	72772a3b5b	[XFS] fix inode leak in xfs_iget_core() If the radix_tree_preload() fails, we need to destroy the inode we just read in before trying again. This could leak xfs_vnode structures when there is memory pressure. Noticed by Christoph Hellwig. SGI-PV: 977823 SGI-Modid: xfs-linux-melb:xfs-kern:30606a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-03-06 16:38:50 +11:00
David Chinner	92d9cd1059	[XFS] 977545 977545 977545 977545 977545 977545 xfsaild causing too many wakeups Idle state is not being detected properly by the xfsaild push code. The current idle state is detected by an empty list which may never happen with mostly idle filesystem or one using lazy superblock counters. A single dirty item in the list that exists beyond the push target can result repeated looping attempting to push up to the target because it fails to check if the push target has been acheived or not. Fix by considering a dirty list with everything past the target as an idle state and set the timeout appropriately. SGI-PV: 977545 SGI-Modid: xfs-linux-melb:xfs-kern:30532a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-03-06 16:38:17 +11:00
Eric Paris	f9c3a38021	NFS: use new LSM interfaces to explicitly set mount options NFS and SELinux worked together previously because SELinux had NFS specific knowledge built in. This design was approved by both groups back in 2004 but the recent NFS changes to use nfs_parsed_mount_data and the usage of nfs_clone_mount_data showed this to be a poor fragile solution. This patch fixes the NFS functionality regression by making use of the new LSM interfaces to allow an FS to explicitly set its own mount options. The explicit setting of mount options is done in the nfs get_sb functions which are called before the generic vfs hooks try to set mount options for filesystems which use text mount data. This does not currently support NFSv4 as that functionality did not exist in previous kernels and thus there is no regression. I will be adding the needed code, which I believe to be the exact same as the v3 code, in nfs4_get_sb for 2.6.26. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-03-06 08:40:59 +11:00
Eric Paris	e000752989	LSM/SELinux: Interfaces to allow FS to control mount options Introduce new LSM interfaces to allow an FS to deal with their own mount options. This includes a new string parsing function exported from the LSM that an FS can use to get a security data blob and a new security data blob. This is particularly useful for an FS which uses binary mount data, like NFS, which does not pass strings into the vfs to be handled by the loaded LSM. Also fix a BUG() in both SELinux and SMACK when dealing with binary mount data. If the binary mount data is less than one page the copy_page() in security_sb_copy_data() can cause an illegal page fault and boom. Remove all NFSisms from the SELinux code since they were broken by past NFS changes. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-03-06 08:40:53 +11:00
Harvey Harrison	15732a1cb5	jfs: replace __inline with inline Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>	2008-03-05 14:38:22 -06:00
Linus Torvalds	2c6f2db13a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: debugfs: fix sparse warnings Driver core: Fix cleanup when failing device_add(). driver core: Remove dpm_sysfs_remove() from error path of device_add() PM: fix new mutex-locking bug in the PM core PM: Do not acquire device semaphores upfront during suspend kobject: properly initialize ksets sysfs: CONFIG_SYSFS_DEPRECATED fix driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATED	2008-03-04 16:37:35 -08:00
Josef Bacik	92587216f8	ext3: fix mount option parsing The "resize" option won't be noticed as it comes after the NULL option, so if you try to mount (or in this case remount) with that option it won't be recognized. Signed-off-by: Josef Bacik <jbacik@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:18 -08:00
Michael Halcrow	e4465fdaeb	eCryptfs: make ecryptfs_prepare_write decrypt the page When the page is not up to date, ecryptfs_prepare_write() should be acting much like ecryptfs_readpage(). This includes the painfully obvious step of actually decrypting the page contents read from the lower encrypted file. Note that this patch resolves a bug in eCryptfs in 2.6.24 that one can produce with these steps: # mount -t ecryptfs /secret /secret # echo "abc" > /secret/file.txt # umount /secret # mount -t ecryptfs /secret /secret # echo "def" >> /secret/file.txt # cat /secret/file.txt Without this patch, the resulting data returned from cat is likely to be something other than "abc\ndef\n". (Thanks to Benedikt Driessen for reporting this.) Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Benedikt Driessen <bdriessen@escrypt.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:16 -08:00
Julia Lawall	acc1f3ede9	fs/reiserfs/super.c: correct use of ! and & In commit `e6bafba5b4` ("wmi: (!x & y) strikes again"), a bug was fixed that involved converting !x & y to !(x & y). The code below shows the same pattern, and thus should perhaps be fixed in the same way. This is not tested and clearly changes the semantics, so it is only something to consider. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ expression E1,E2; @@ ( !E1 & !E2 \| - !E1 & E2 + !(E1 & E2) ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:16 -08:00
Jan Kara	e3892296de	vfs: fix NULL pointer dereference in fsync_buffers_list() Fix NULL pointer dereference in fsync_buffers_list() introduced by recent fix of races in private_list handling. Since bh->b_assoc_map has been cleared in __remove_assoc_queue() we should really use original value stored in the 'mapping' variable. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:10 -08:00
Roland McGrath	d31472b6d4	core dump: user_regset writeback This makes the user_regset-based core dump code call user_regset writeback hooks when available. This is necessary groundwork to allow IA64 to set CORE_DUMP_USE_REGSET. Cc: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Roland McGrath <roland@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:10 -08:00
Harvey Harrison	3634634edd	debugfs: fix sparse warnings extern does not belong in C files, move declaration to linux/debugfs.h fs/debugfs/file.c:42:30: warning: symbol 'debugfs_file_operations' was not declared. Should it be static? fs/debugfs/file.c:54:31: warning: symbol 'debugfs_link_operations' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-03-04 14:47:06 -08:00
Linus Torvalds	c5750ee69f	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: [PATCH] fs/ocfs2/aops.c: Correct use of ! and & [2.6 patch] ocfs2: make dlm_do_assert_master() static [2.6 patch] make ocfs2_downconvert_thread() static [2.6 patch] fs/ocfs2/: possible cleanups [PATCH] ocfs2: le*_add_cpu conversion ocfs2: Fix writeout in ocfs2_data_convert_worker() ocfs2: Enable localalloc for local mounts	2008-03-04 11:27:32 -08:00
Linus Torvalds	ce932967b9	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: fix blkdev_issue_flush() not detecting and passing EOPNOTSUPP back block: fix shadowed variable warning in blk-map.c block: remove extern on function definition cciss: remove READ_AHEAD define and use block layer defaults make cdrom.c:check_for_audio_disc() static block/genhd.c: proper externs unexport blk_rq_map_user_iov unexport blk_{get,put}_queue block/genhd.c: cleanups proper prototype for blk_dev_init() block/blk-tag.c should #include "blk.h" Fix DMA access of block device in 64-bit kernel on some non-x86 systems with 4GB or upper 4GB memory block: separate out padding from alignment block: restore the meaning of rq->data_len to the true data length resubmit: cciss: procfs updates to display info about many splice: only return -EAGAIN if there's hope of more data block: fix kernel-docbook parameters and files	2008-03-04 08:08:05 -08:00
Adrian Bunk	a0db701a6b	block/genhd.c: proper externs This patch adds proper externs for two structs in include/linux/genhd.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-03-04 11:28:36 +01:00
Jens Axboe	02cf01aea5	splice: only return -EAGAIN if there's hope of more data sys_tee() currently is a bit eager in returning -EAGAIN, it may do so even if we don't have a chance of anymore data becoming available. So improve the logic and only return -EAGAIN if we have an attached writer to the input pipe. Reported by Johann Felix Soden <johfel@gmx.de> and Patrick McManus <mcmanus@ducksong.com>. Tested-by: Johann Felix Soden <johfel@users.sourceforge.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-03-04 11:14:39 +01:00
Steve French	966ea8c4b7	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-03-04 04:22:00 +00:00
Julia Lawall	86c838b03d	[PATCH] fs/ocfs2/aops.c: Correct use of ! and & In commit `e6bafba5b4`, a bug was fixed that involved converting !x & y to !(x & y). The code below shows the same pattern, and thus should perhaps be fixed in the same way. This is not tested and clearly changes the semantics, so it is only something to consider. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-03 15:50:21 -08:00
Adrian Bunk	05488bbebe	[2.6 patch] ocfs2: make dlm_do_assert_master() static This patch makes the needlessly global dlm_do_assert_master() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-03 15:50:21 -08:00
Adrian Bunk	200bfae37a	[2.6 patch] make ocfs2_downconvert_thread() static This patch makes the needlessly global ocfs2_downconvert_thread() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-03 15:50:21 -08:00
Adrian Bunk	006000566d	[2.6 patch] fs/ocfs2/: possible cleanups This patch contains the following cleanups that are now possible: - make the following needlessly global functions static: - dlmglue.c:ocfs2_process_blocked_lock() - heartbeat.c:ocfs2_node_map_init() - #if 0 the following unused global function plus support functions: - heartbeat.c:ocfs2_node_map_is_only() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-03 15:50:21 -08:00
Marcin Slusarz	0dd3256e04	[PATCH] ocfs2: le*_add_cpu conversion replace all: little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) + expression_in_cpu_byteorder); with: leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder); generated with semantic patch Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-03 15:50:21 -08:00
Mark Fasheh	1044e401af	ocfs2: Fix writeout in ocfs2_data_convert_worker() Commit `f1f540688e` "optimized" ocfs2_data_convert_worker() to "only do work for regular files". Unfortunately, I left out a '!', which casued it to skip regular files. This was hidden from testing until recently because the default data journaling mode (data=ordered) doesn't exercise this code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2008-03-03 15:50:21 -08:00
Sunil Mushran	7ad8b3d30e	ocfs2: Enable localalloc for local mounts Commit `2fbe8d1ebe` disabled localalloc for local mounts. This caused issues as ocfs2 uses localalloc to provide write locality. This patch enables localalloc for local mounts. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-03 15:50:21 -08:00
Randy Dunlap	78a4a50a86	docbook: fix filesystems.tmpl source files Fix docbook problems in filesystems.tmpl. These cause the generated docbook to be incorrect. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-03 10:47:13 -08:00
Linus Torvalds	a64e715fc7	Allow ARG_MAX execve string space even with a small stack limit The new code that removed the limitation on the execve string size (which was historically 32 pages) replaced it with a much softer limit based on RLIMIT_STACK which is usually much larger than the traditional limit. See commit `b6a2fea393` ("mm: variable length argument support") for details. However, if you have a small stack limit (perhaps because you need lots of stacks in a threaded environment), the new heuristic of allowing up to 1/4th of RLIMIT_STACK to be used for argument and environment strings could actually be smaller than the old limit. So just say that it's ok to have up to ARG_MAX strings regardless of the value of RLIMIT_STACK, and check the rlimit only when going over that traditional limit. (Of course, if you actually have a really small stack limit, the whole stack itself will be limited before you hit ARG_MAX, but that has always been true and is clearly the right behaviour anyway). Acked-by: Carlos O'Donell <carlos@codesourcery.com> Cc: Michael Kerrisk <michael.kerrisk@googlemail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ollie Wild <aaw@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-03 10:12:14 -08:00
Steve French	0dbd888936	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-03-01 18:29:55 +00:00
Josef Jeff Sipek	1bd960ee2b	[XFS] If you mount an XFS filesystem with no mount options at all, then the "ikeep" option is set rather than "noikeep". This regression was introduced in 970451. With no mount options specified, xfs_parseargs() does the following: int ikeep = 0; args->flags \|= XFSMNT_BARRIER; args->flags2 \|= XFSMNT2_COMPAT_IOSIZE; if (!options) goto done; It only sets the above two options by default and before, it also used to set XFSMNT_IDELETE by default. If options are specified, then if (!(args->flags & XFSMNT_DMAPI) && !ikeep) args->flags \|= XFSMNT_IDELETE; is executed later on which is skipped by the "goto done;" above. The solution is to invert the logic. SGI-PV: 977771 SGI-Modid: xfs-linux-melb:xfs-kern:30590a Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-28 20:37:56 -08:00
Linus Torvalds	7704a8b6fc	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac [XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac Remove empty file fs/xfs/Makefile-linux-2.6.	2008-02-26 07:55:29 -08:00
Linus Torvalds	adefe11c53	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: add missing ext4_journal_stop() ext4: ext4_find_next_zero_bit needs an aligned address on some arch ext4: set EXT4_EXTENTS_FL only for directory and regular files ext4: Don't mark filesystem error if fallocate fails ext4: Fix BUG when writing to an unitialized extent ext4: Don't use ext4_dec_count() if not needed ext4: modify block allocation algorithm for the last group ext4: Don't claim block from group which has corrupt bitmap ext4: Get journal write access before modifying the extent tree ext4: Fix memory and buffer head leak in callers to ext4_ext_find_extent() ext4: Don't leave behind a half-created inode if ext4_mkdir() fails ext4: Fix kernel BUG at fs/ext4/mballoc.c:910! ext4: Fix locking hierarchy violation in ext4_fallocate() Remove incorrect BKL comments in ext4	2008-02-26 07:50:16 -08:00
Lachlan McIlroy	ef8ece55d9	[XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac platform. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30559a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-26 17:05:44 +11:00
Lachlan McIlroy	db69c915e6	[XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac platform. SGI-PV: 974005 SGI-Modid: xfs-linux-melb:xfs-kern:30558a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-26 17:05:37 +11:00
Steve French	0b442d2c28	[CIFS] remove unused variable Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-26 03:44:02 +00:00
Akinobu Mita	5606bf5d0c	ext4: add missing ext4_journal_stop() Add missing ext4_journal_stop() in error handling. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Stephen Tweedie <sct@redhat.com> Cc: adilger@clusterfs.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mingming Cao <cmm@us.ibm.com>	2008-02-25 15:37:42 -05:00
Christoph Hellwig	75f12983d9	[CIFS] consolidate duplicate code in posix/unix inode handling Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-25 20:25:21 +00:00
Hiroshi Shimamoto	13d77c37ca	latencytop: change /proc task_struct access method Change getting task_struct by get_proc_task() at read or write time, and returns -ESRCH if get_proc_task() returns NULL. This is same behavior as other /proc files. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-25 16:34:18 +01:00
Hiroshi Shimamoto	d6643d12cb	latencytop: fix memory leak on latency proc file At lstats_open(), calling get_proc_task() gets task struct, but it never put. put_task_struct() should be called when releasing. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-25 16:34:17 +01:00
Hiroshi Shimamoto	ae0027869d	latencytop: fix kernel panic while reading latency proc file Reading /proc/<pid>/latency or /proc/<pid>/task/<tid>/latency could cause NULL pointer dereference. In lstats_open(), get_proc_task() can return NULL, in which case the kernel will oops at lstats_show_proc() because m->private is NULL. When get_proc_task() returns NULL, the kernel should return -ENOENT. This can be reproduced by the following script. while : do date bash -c 'ls > ls.$$' & pid=$! cat /proc/$pid/latency & cat /proc/$pid/latency & cat /proc/$pid/latency & cat /proc/$pid/latency done Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-25 16:34:17 +01:00
Eugene Teo	8808117ca5	proc: add RLIMIT_RTTIME to /proc/<pid>/limits RLIMIT_RTTIME was introduced to allow the user to set a runtime timeout on real-time tasks: http://lkml.org/lkml/2007/12/18/218. This patch updates /proc/<pid>/limits with the new rlimit. Signed-off-by: Eugene Teo <eugeneteo@kernel.sg> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:12:15 -08:00
Christoph Hellwig	45254b4fb2	efs: move headers out of include/linux/ Merge include/linux/efs_fs{_i,_dir}.h into fs/efs/efs.h. efs_vh.h remains there because this is the IRIX volume header and shouldn't really be handled by efs but by the partitioning code. efs_sb.h remains there for now because it's exported to userspace. Of course this wrong and aboot should have a copy of it's own, but I'll leave that to a separate patch to avoid any contention. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:12:15 -08:00
Hans Rosenfeld	745329c4a2	/proc/pid/pagemap: fix PM_SPECIAL macro There seems to be a bug in the PM_SPECIAL macro for /proc/pid/pagemap. I think masking out those other bits makes more sense then setting all those mask bits. Signed-off-by: Hans Rosenfeld <Hans.Rosenfeld@amd.com> Acked-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:12:13 -08:00
Roel Kluin	f81e8a4387	ufs: fix parenthesisation in ufs_set_fs_state() This bug snuck in with commit `252e211e90` Author: Mark Fortescue <mark@mtfhpc.demon.co.uk> Date: Tue Oct 16 23:26:31 2007 -0700 Add in SunOS 4.1.x compatible mode for UFS Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Acked-by: Evgeniy Dushistov <dushistov@mail.ru> Cc: Mark Fortescue <mark@mtfhpc.demon.co.uk> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:12:13 -08:00
Miklos Szeredi	1a823ac9ff	fuse: fix permission checking I added a nasty local variable shadowing bug to fuse in 2.6.24, with the result, that the 'default_permissions' mount option is basically ignored. How did this happen? - old err declaration in inner scope - new err getting declared in outer scope - 'return err' from inner scope getting removed - old declaration not being noticed -Wshadow would have saved us, but it doesn't seem practical for the kernel :( More testing would have also saved us :(( Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-23 17:12:13 -08:00
Aneesh Kumar K.V	ffad0a44b7	ext4: ext4_find_next_zero_bit needs an aligned address on some arch ext4_find_next_zero_bit and ext4_find_next_bit needs a long aligned address on x8_64. Add mb_find_next_zero_bit and mb_find_next_bit and use them in the mballoc. Fix: https://bugzilla.redhat.com/show_bug.cgi?id=433286 Eric Sandeen debugged the problem and suggested the fix. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-23 01:38:34 -05:00
Aneesh Kumar K.V	42bf0383d1	ext4: set EXT4_EXTENTS_FL only for directory and regular files In addition, don't inherit EXT4_EXTENTS_FL from parent directory. If we have a directory with extent flag set and later mount the file system with -o noextents, the files created in that directory will also have extent flag set but we would not have called ext4_ext_tree_init for them. This will cause error later when we are verifying the extent header Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-25 16:38:03 -05:00
Aneesh Kumar K.V	2c98615d3b	ext4: Don't mark filesystem error if fallocate fails If we fail to allocate blocks don't call ext4_error. Also don't hide errors from ext4_get_blocks_wrap Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-25 15:41:35 -05:00
Mingming Cao	f5ab0d1f8f	ext4: Fix BUG when writing to an unitialized extent This patch fixes a bug when writing to preallocated but uninitialized blocks, which resulted in a BUG in fs/buffer.c saying that the buffer is not mapped. When writing to a file, ext4_get_block_wrap() is called with create=1 in order to request that blocks be allocated if necessary. It currently calls ext4_get_blocks() with create=0 in order to do a lookup first. If the inode contains an unitialized data block, the buffer head is left unampped, which ext4_get_blocks_wrap() returns, causing the BUG. We fix this by checking to see if the buffer head is unmapped, and if so, we make sure the the buffer head is mapped by calling ext4_ext_get_blocks with create=1. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-25 15:29:55 -05:00
Linus Torvalds	1a4c6be4ac	Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: Wrap buffers used for rpc debug printks into RPC_IFDEBUG nfs: fix sparse warnings NFS: flush signals before taking down callback thread	2008-02-21 17:19:48 -08:00
Pavel Emelyanov	5216a8e70e	Wrap buffers used for rpc debug printks into RPC_IFDEBUG Sorry for the noise, but here's the v3 of this compilation fix :) There are some places, which declare the char buf[...] on the stack to push it later into dprintk(). Since the dprintk sometimes (if the CONFIG_SYSCTL=n) becomes an empty do { } while (0) stub, these buffers cause gcc to produce appropriate warnings. Wrap these buffers with RPC_IFDEBUG macro, as Trond proposed, to compile them out when not needed. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-02-21 18:42:29 -05:00
David Teigland	599e0f584d	dlm: fix rcom_names message to self The recent patch to validate data lengths in rcom_names messages failed to account for fake messages a node directs to itself before ever sending it. In this case we need to fill in the message length in the header for the validation code to use. Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-21 15:19:54 -06:00
Linus Torvalds	1803f3389b	Remove empty file remnants that were left in the tree by mistake Noted by various people (Sam, Jeff, Roland..) Commit `58b7983d15` intended to remove the xfs "Makefile-linux-2.6" file, but it was mistakenly still left in the tree as a empty file, and would cause git to correctly complain about a tracked file being removed after a "make distclean" (which removes empty files as garbage). And the asm-x86/desc_64.h file was supposed to be removed by commit `c81c6ca45a`, but instead stayed around containing just a single newline. Get rid of them both properly. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-20 19:56:01 -08:00
Harvey Harrison	90dc7d2796	nfs: fix sparse warnings fs/nfs/nfs4state.c:788:34: warning: Using plain integer as NULL pointer fs/nfs/delegation.c:52:34: warning: Using plain integer as NULL pointer fs/nfs/idmap.c:312:12: warning: Using plain integer as NULL pointer fs/nfs/callback_xdr.c:257:6: warning: Using plain integer as NULL pointer fs/nfs/callback_xdr.c:270:6: warning: Using plain integer as NULL pointer fs/nfs/callback_xdr.c:281:6: warning: Using plain integer as NULL pointer Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-02-20 16:15:44 -05:00
Jeff Layton	1227a74e2e	NFS: flush signals before taking down callback thread Now that the reference counting on the callback thread is working as expected, it uncovers another problem. Peter Staubach noticed while testing that patch on an older kernel that he would occasionally see this printk in rpc_register fire: "RPC: failed to contact portmap (errno -512). The NFSv4 callback thread is signaled by nfs_callback_down(), but never flushes that signal. All of the shutdown processing is done with that signal pending. This makes it fail the call to unregister the port with the portmapper. In actuality, this rpc_register call isn't necessary at all since the port isn't actually registered with the portmapper anymore. Regardless, there doesn't seem to be any reason to leave the signal pending while the thread is being shut down and flushing it should generally silence that printk. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-02-20 13:32:43 -05:00
Adrian Bunk	86b6c7a7f7	fs/block_dev.c: remove #if 0'ed code Commit `b2e895dbd8` #if 0'ed this code stating: <-- snip --> [PATCH] revert blockdev direct io back to 2.6.19 version Andrew Vasquez is reporting as-iosched oopses and a 65% throughput slowdown due to the recent special-casing of direct-io against blockdevs. We don't know why either of these things are occurring. The patch minimally reverts us back to the 2.6.19 code for a 2.6.20 release. <-- snip --> It has since been dead code, and unless someone wants to revive it now it's time to remove it. This patch also makes bio_release_pages() static again and removes the ki_bio_count member from struct kiocb, reverting changes that had been done for this dead code. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>	2008-02-19 10:04:00 +01:00
Adrian Bunk	4c54ac62dc	make struct def_blk_aops static This patch makes the needlessly global struct def_blk_aops static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>	2008-02-19 10:04:00 +01:00
Steve French	e086fcea86	[CIFS] fix build break when proc disabled Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-18 04:03:58 +00:00
Lachlan McIlroy	c58310bf49	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into for-linus	2008-02-18 13:51:42 +11:00
Lachlan McIlroy	269cdfaf76	[XFS] Added quota targets and removed dmapi directory Fixes build failures introduced by bad merge to mainline.	2008-02-18 13:06:17 +11:00
Eric Sandeen	794f744b22	[XFS] Fix up xfs out-of-tree builds. (a.k.a. external modules) Change -I include directives to find headers in the out-of-tree spot. This allows a directory containing only xfs files to be built as: SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29878a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-18 12:59:11 +11:00
Andi Kleen	58b7983d15	[XFS] Remove Makefile wrappers in XFS Makefile (and Kbuild) would include Makefile-linux-26 I doubt XFS will really still compile on 2.4; so drop that. This moves Makefile-linux-26 into Makefile and drops Kbuild. Also having wrappers as both Kbuild and Makefile seemed redundant anyways. The patch is relatively large because it renames a file, but no functional changes. SGI-PV: 971050 SGI-Modid: xfs-linux-melb:xfs-kern:29781a Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-18 12:48:03 +11:00
Steve French	0a3abcf75b	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-02-15 21:06:08 +00:00
Christoph Hellwig	70eff55d2d	[CIFS] factoring out common code in get_inode_info functions Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-15 20:55:05 +00:00
Theodore Ts'o	825f1481ea	ext4: Don't use ext4_dec_count() if not needed The ext4_dec_count() function is only needed when dropping the i_nlink count on inodes which are (or which could be) directories. If we know that the inode in question can't possibly be a directory, use drop_nlink or clear_nlink() if we know i_nlink is 1. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-15 15:00:38 -05:00
Steve French	c2d68ea65b	[CIFS] fix prepath conversion when server supports posix paths Jeff Layton that we were converting \ to / in the posix path case which is not always right (depends on what the old delim was). CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-15 19:20:18 +00:00
Igor Mammedov	11b6d6450c	[CIFS] Only convert / when server does not support posix paths Also add warning if posix path setting changes on reconnect Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-15 19:06:04 +00:00
Valerie Clement	74d3487fc8	ext4: modify block allocation algorithm for the last group When a directory inode is allocated in the last group and the last group contains less than s_blocks_per_group blocks, the initial block allocated for the directory is not always allocated in the same group as the directory inode, but in one of the first groups of the filesystem (group 1 for example). Depending on the current process's pid, ext4_find_near() and ext4_ext_find_goal() can return a block number greater than the maximum blocks count in the filesystem and in that case the block will be not allocated in the same group as the inode. The following patch fixes the problem. Should the modification also be done in ext2/3 code? Signed-off-by: Valerie Clement <valerie.clement@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-15 13:43:07 -05:00
Aneesh Kumar K.V	e56eb65906	ext4: Don't claim block from group which has corrupt bitmap In ext4_mb_complex_scan_group, if the extent length of the newly found extentet is greater than than the total free blocks counted in group info, break without claiming the block. Document different ext4_error usage, explaining the state with which we continue if we mount with errors=continue Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-15 13:48:21 -05:00
Aneesh Kumar K.V	9df5643ad1	ext4: Get journal write access before modifying the extent tree When the user was writing into an unitialized extent, ext4_ext_convert_to_initialize() was not requesting journal write access before it started to modify the extent tree. Fix this oversight. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-22 06:17:31 -05:00
Aneesh Kumar K.V	b35905c16a	ext4: Fix memory and buffer head leak in callers to ext4_ext_find_extent() The path variable returned via ext4_ext_find_extent is a kmalloc variable and needs to be freeded. It also contains a reference to buffer_head which needs to be dropped. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-25 16:54:37 -05:00
Aneesh Kumar K.V	4cdeed861b	ext4: Don't leave behind a half-created inode if ext4_mkdir() fails If ext4_mkdir() fails to allocate the initial block for the directory, don't leave behind a half-created directory inode with the link count left at one. This was caused by an inappropriate call to ext4_dec_count(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-22 06:17:31 -05:00
Valerie Clement	b73fce69ec	ext4: Fix kernel BUG at fs/ext4/mballoc.c:910! With the flex_bg feature enabled, a large file creation oopses the kernel. The BUG_ON is: BUG_ON(len >= EXT4_BLOCKS_PER_GROUP(sb)); As the allocation of the bitmaps and the inode table can be done outside the block group with flex_bg, this allows to allocate up to EXT4_BLOCKS_PER_GROUP blocks in a group. This patch fixes the oops. Signed-off-by: Valerie Clement <valerie.clement@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-15 13:48:51 -05:00
Trond Myklebust	52833e897f	Merge branch 'linus_origin' into hotfixes	2008-02-15 13:36:30 -05:00
Igor Mammedov	8aad018b6c	[CIFS] Fix mixed case name in structure dfs_info3_param Signed-off-by: Igor Mammedov <niallain@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-15 18:21:49 +00:00
Aneesh Kumar K.V	55bd725aa3	ext4: Fix locking hierarchy violation in ext4_fallocate() ext4_fallocate() was trying to acquire i_data_sem outside of jbd2_start_transaction/jbd2_journal_stop, which violates ext4's locking hierarchy. So we take i_mutex to prevent writes and truncates during the complete fallocate operation, and use ext4_get_block_wrap() which acquires and releases i_data_sem for each block allocation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-15 12:47:21 -05:00
Andi Kleen	642be6ec21	Remove incorrect BKL comments in ext4 Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-25 17:20:46 -05:00
Christoph Lameter	4a0962abd1	dentries: Extract common code to remove dentry from lru Extract the common code to remove a dentry from the lru into a new function dentry_lru_remove(). Two call sites used list_del() instead of list_del_init(). AFAIK the performance of both is the same. dentry_lru_remove() does a list_del_init(). As a result dentry->d_lru is now always empty when a dentry is freed. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:09 -08:00
Jan Blunck	cf28b4863f	d_path: Make d_path() use a struct path d_path() is used on a <dentry,vfsmount> pair. Lets use a struct path to reflect this. [akpm@linux-foundation.org: fix build in mm/memory.c] Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Bryan Wu <bryan.wu@analog.com> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:09 -08:00
Jan Blunck	c32c2f63a9	d_path: Make seq_path() use a struct path argument seq_path() is always called with a dentry and a vfsmount from a struct path. Make seq_path() take it directly as an argument. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:08 -08:00
Jan Blunck	e83aece3af	Use struct path in struct svc_expkey I'm embedding struct path into struct svc_expkey. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:08 -08:00
Jan Blunck	5477549161	Use struct path in struct svc_export I'm embedding struct path into struct svc_export. [akpm@linux-foundation.org: coding-style fixes] [ezk@cs.sunysb.edu: NFSD: fix wrong mnt_writer count in rename] Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:08 -08:00
Jan Blunck	448678a0f3	d_path: Make get_dcookie() use a struct path argument get_dcookie() is always called with a dentry and a vfsmount from a struct path. Make get_dcookie() take it directly as an argument. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:08 -08:00
Jan Blunck	3dcd25f37c	d_path: Make proc_get_link() use a struct path argument proc_get_link() is always called with a dentry and a vfsmount from a struct path. Make proc_get_link() take it directly as an argument. Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:08 -08:00
Jan Blunck	a03a8a709a	d_path: kerneldoc cleanup Move and update d_path() kernel API documentation. Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:08 -08:00
Jan Blunck	329c97f0af	One less parameter to __d_path All callers to __d_path pass the dentry and vfsmount of a struct path to __d_path. Pass the struct path directly, instead. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:08 -08:00
Jan Blunck	ac748a09fc	Make set_fs_{root,pwd} take a struct path In nearly all cases the set_fs_{root,pwd}() calls work on a struct path. Change the function to reflect this and use path_get() here. Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	6ac08c39a1	Use struct path in fs_struct * Use struct path in fs_struct. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	5dd784d049	Introduce path_get() This introduces the symmetric function to path_put() for getting a reference to the dentry and vfsmount of a struct path in the right order. Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	09da5916ba	Use path_put() in a few places instead of {mnt,d}put() Use path_put() in a few places instead of {mnt,d}put() Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	1d957f9bf8	Introduce path_put() * Add path_put() functions for releasing a reference to the dentry and vfsmount of a struct path in the right order * Switch from path_release(nd) to path_put(&nd->path) * Rename dput_path() to path_put_conditional() [akpm@linux-foundation.org: fix cifs] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: <linux-fsdevel@vger.kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	4ac9137858	Embed a struct path into struct nameidata instead of nd->{dentry,mnt} This is the central patch of a cleanup series. In most cases there is no good reason why someone would want to use a dentry for itself. This series reflects that fact and embeds a struct path into nameidata. Together with the other patches of this series - it enforced the correct order of getting/releasing the reference count on <dentry,vfsmount> pairs - it prepares the VFS for stacking support since it is essential to have a struct path in every place where the stack can be traversed - it reduces the overall code size: without patch series: text data bss dec hex filename 5321639 858418 715768 6895825 6938d1 vmlinux with patch series: text data bss dec hex filename 5320026 858418 715768 `6894212` 693284 vmlinux This patch: Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix cifs] [akpm@linux-foundation.org: fix smack] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	429731b155	Remove path_release_on_umount() path_release_on_umount() should only be called from sys_umount(). I merged the function into sys_umount() instead of having in in namei.c. Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:32 -08:00
Mike Frysinger	e2a366dc5c	FLAT binaries: drop BINFMT_FLAT bad header magic warning The warning issued by fs/binfmt_flat.c when the format handler is given a non-FLAT and non-script executable is annoying to say the least when working with FDPIC ELF objects. If you build a kernel that supports both FLAT and FDPIC ELFs on no-mmu, every time you execute an FDPIC ELF, the kernel spits out this message. While I understand a lot of newcomers to the no-mmu world screw up generation of FLAT binaries, this warning is not usable for systems that support more than just FLAT. Signed-off-by: Jie Zhang <jie.zhang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Bernd Schmidt <bernds_cb1@t-online.de> Acked-by: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 20:58:05 -08:00
Harvey Harrison	3c828e4945	inotify: make variables static in inotify_user.c inotify_max_user_instances, inotify_max_user_watches, inotify_max_queued_events can all be made static. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 20:58:04 -08:00
Steve French	03a143c909	[CIFS] fixup prefixpaths which contain multiple path components Currently, when we get a prefixpath as part of mount, the kernel only changes the first character to be a '/' or '\' depending on whether posix extensions are enabled. This is problematic as it expects mount.cifs to pass in the correct delimiter in the rest of the prefixpath. But, mount.cifs may not know what the correct delimiter is. It's a chicken and egg problem. Note that mount.cifs should not do conversion of the prefixpath - if we want posix behavior then '\' is legal in a path (and we have had bugs in the distant path to prove to me that customers sometimes have apps that require '\'). The kernel code assumes that the path passed in is posix (and current code will handle the first path component fine but was broken for Windows mounts for "deep" prefixpaths unless the user specified a prefixpath with '\' deep in it. So e.g. with current kernel code: 1) mount to //server/share/dir1 will work to all server types 2) mount to //server/share/dir1/subdir1 will work to Samba 3) mount to //server/share/dir1\\subdir1 will work to Windows But case two would fail to Windows without the fix. With the kernel cifs module fix case two now works. First analyzed by Jeff Layton and Simo Sorce CC: Jeff Layton <jlayton@redhat.com> CC: Simo Sorce <simo@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-14 06:38:30 +00:00
Olga Kornievskaia	8d042218b0	NFS: add missing spkm3 strings to mount option parser This patch adds previous missing spkm3 string values that are needed to parse mount options in the kernel.	2008-02-13 23:24:08 -05:00
Jeff Layton	25606656b1	NFS: remove error field from nfs_readdir_descriptor_t The error field in nfs_readdir_descriptor_t is never used outside of the function in which it is set. Remove the field and change the place that does use it to use an existing local variable. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-02-13 23:24:07 -05:00
Dan Muntz	497799e7c0	NFS: missing spaces in KERN_WARNING The warning message for a v4 server returning various bad sequence-ids is missing spaces. Signed-off-by: Dan Muntz <dmuntz@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-02-13 23:24:06 -05:00
Chuck Lever	4267c9561d	NFS: Allow text-based mounts via compat_sys_mount The compat_sys_mount() system call throws EINVAL for text-based NFSv4 mounts. The text-based mount interface assumes that any mount option blob that doesn't set the version field to "1" is a C string (ie not a legacy mount request). The compat_sys_mount() call treats blobs that don't set the version field to "1" as an error. We just relax the check in compat_sys_mount() a bit to allow C strings to be passed down to the NFSv4 client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-02-13 23:24:05 -05:00
Jeff Layton	8e60029f40	NFS: fix reference counting for NFSv4 callback thread The reference counting for the NFSv4 callback thread stays artificially high. When this thread comes down, it doesn't properly tear down the svc_serv, causing a memory leak. In my testing on an older kernel on x86_64, memory would leak out of the 8k kmalloc slab. So, we're leaking at least a page of memory every time the thread comes down. svc_create() creates the svc_serv with a sv_nrthreads count of 1, and then svc_create_thread() increments that count. Whenever the callback thread is started it has a sv_nrthreads count of 2. When coming down, it calls svc_exit_thread() which decrements that count and if it hits 0, it tears everything down. That never happens here since the count is always at 2 when the thread exits. The problem is that nfs_callback_up() should be calling svc_destroy() on the svc_serv on both success and failure. This is how lockd_up_proto() handles the reference counting, and doing that here fixes the leak. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-02-13 23:24:04 -05:00
Marcin Slusarz	cba44359d1	udf: fix udf_add_free_space In commit `742ba02a51` (udf: create common function for changing free space counter) by accident I reversed safety condition which lead to null pointer dereference in case of media error and wrong counting of free space in normal situation Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Jan Kara <jack@suse.cz> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 16:21:20 -08:00
Jan Kara	e28d80f182	udf: fix directory offset handling Patch cleaning up UDF directory offset handling missed modifications in dir.c (because I've submitted an old version :(). Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com> Tested-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 16:21:20 -08:00
Sergio Luis	ed58f80279	fs/smbfs/inode.c: fix warning message deprecating smbfs Fix the warning message regarding smbfs to "smbfs is deprecated and will be removed from the 2.6.27 kernel. Please migrate to cifs" instead of "smbfs is deprecated and will be removedfrom the 2.6.27 kernel. Please migrate to cifs" Signed-off-by: Sergio Luis <sergio@uece.br> Screwed-up-by: Andrew Morton <akpm@linux-foundation.org> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 16:21:19 -08:00
Marcin Slusarz	413d57c990	xfs: convert beX_add to beX_add_cpu (new common API) remove beX_add functions and replace all uses with beX_add_cpu Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Dave Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 16:21:19 -08:00
Randy Dunlap	b51d63c6d3	kernel-doc: fix fs/pipe.c notation Fix several kernel-doc notation errors in fs/pipe.c. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 16:21:19 -08:00
Marcin Slusarz	8914562278	jfs: le*_add_cpu conversion replace all: little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) + expression_in_cpu_byteorder); with: leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder); generated with semantic patch Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: jfs-discussion@lists.sourceforge.net	2008-02-13 15:34:20 -06:00
Steve French	c1ce264470	[CIFS] fix typo Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-13 02:59:36 +00:00
Shirish Pargaonkar	d9f382eff6	[CIFS] patch to fix incorrect encoding of number of aces on set mode This patch fixes an error in the experimental cifs acl code. During chmod, set security descriptor data (num aces) is not sent with little-endian encoding. Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-12 20:46:26 +00:00
Roel Kluin	6f7e8f3763	[CIFS] Fix typo in quota operations Although these experimental operations are not fully implemented, fix the typo in the definition of the quotactl operations for cifs. Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-12 20:38:10 +00:00
Steve French	90c81e0b0e	[CIFS] clean up some hard to read ifdefs Christoph had noticed too many ifdefs in the CIFS code making it hard to read. This patch removes about a quarter of them from the C files in cifs by improving a few key ifdefs in the .h files. Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-12 20:32:36 +00:00
Linus Torvalds	0c0d61ca93	Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux * 'for-linus' of git://linux-nfs.org/~bfields/linux: SUNPRC: Fix printk format warning nfsd: clean up svc_reserve_auth() NLM: don't requeue block if it was invalidated while GRANT_MSG was in flight NLM: don't reattempt GRANT_MSG when there is already an RPC in flight NLM: have server-side RPC clients default to soft RPC tasks NLM: set RPC_CLNT_CREATE_NOPING for NLM RPC clients	2008-02-11 09:19:47 -08:00
Jeff Layton	c64e80d55d	NLM: don't requeue block if it was invalidated while GRANT_MSG was in flight It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback is in flight. If this happens we don't want lockd to insert the block back into the nlm_blocked list. This helps that situation, but there's still a possible race. Fixing that will mean adding real locking for nlm_blocked. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-10 18:09:36 -05:00
Jeff Layton	9706501e43	NLM: don't reattempt GRANT_MSG when there is already an RPC in flight With the current scheme in nlmsvc_grant_blocked, we can end up with more than one GRANT_MSG callback for a block in flight. Right now, we requeue the block unconditionally so that a GRANT_MSG callback is done again in 30s. If the client is unresponsive, it can take more than 30s for the call already in flight to time out. There's no benefit to having more than one GRANT_MSG RPC queued up at a time, so put it on the list with a timeout of NLM_NEVER before doing the RPC call. If the RPC call submission fails, we requeue it with a short timeout. If it works, then nlmsvc_grant_callback will end up requeueing it with a shorter timeout after it completes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-10 18:09:36 -05:00
Jeff Layton	90bd17c878	NLM: have server-side RPC clients default to soft RPC tasks Now that it no longer does an RPC ping, lockd always ends up queueing an RPC task for the GRANT_MSG callback. But, it also requeues the block for later attempts. Since these are hard RPC tasks, if the client we're calling back goes unresponsive the GRANT_MSG callbacks can stack up in the RPC queue. Fix this by making server-side RPC clients default to soft RPC tasks. lockd requeues the block anyway, so this should be OK. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-10 18:09:36 -05:00
Jeff Layton	031fd3aa20	NLM: set RPC_CLNT_CREATE_NOPING for NLM RPC clients It's currently possible for an unresponsive NLM client to completely lock up a server's lockd. The scenario is something like this: 1) client1 (or a process on the server) takes a lock on a file 2) client2 tries to take a blocking lock on the same file and awaits the callback 3) client2 goes unresponsive (plug pulled, network partition, etc) 4) client1 releases the lock ...at that point the server's lockd will try to queue up a GRANT_MSG callback for client2, but first it requeues the block with a timeout of 30s. nlm_async_call will attempt to bind the RPC client to client2 and will call rpc_ping. rpc_ping entails a sync RPC call and if client2 is unresponsive it will take around 60s for that to time out. Once it times out, it's already time to retry the block and the whole process repeats. Once in this situation, nlmsvc_retry_blocked will never return until the host starts responding again. lockd won't service new calls. Fix this by skipping the RPC ping on NLM RPC clients. This makes nlm_async_call return quickly when called. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-10 18:09:36 -05:00
Bastian Blank	712a30e63c	splice: fix user pointer access in get_iovec_page_array() Commit `8811930dc7` ("splice: missing user pointer access verification") added the proper access_ok() calls to copy_from_user_mmap_sem() which ensures we can copy the struct iovecs from userspace to the kernel. But we also must check whether we can access the actual memory region pointed to by the struct iovec to fix the access checks properly. Signed-off-by: Bastian Blank <waldi@debian.org> Acked-by: Oliver Pinter <oliver.pntr@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-10 10:27:21 -08:00
Theodore Tso	469108ff3d	ext4: Add new "development flag" to the ext4 filesystem This flag is simply a generic "this is a crash/burn test filesystem" marker. If it is set, then filesystem code which is "in development" will be allowed to mount the filesystem. Filesystem code which is not considered ready for prime-time will check for this flag, and if it is not set, it will refuse to touch the filesystem. As we start rolling ext4 out to distro's like Fedora, et. al, this makes it less likely that a user might accidentally start using ext4 on a production filesystem; a bad thing, since that will essentially make it be unfsckable until e2fsprogs catches up. Signed-off-by: Theodore Tso <tytso@MIT.EDU> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-02-10 01:11:44 -05:00
Aneesh Kumar K.V	26346ff681	ext4: Don't panic in case of corrupt bitmap Multiblock allocator calls BUG_ON in many case if the free and used blocks count obtained looking at the bitmap is different from what the allocator internally accounted for. Use ext4_error in such case and don't panic the system. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-10 01:10:04 -05:00
Eric Sandeen	256bdb497c	ext4: allocate struct ext4_allocation_context from a kmem cache struct ext4_allocation_context is rather large, and this bloats the stack of many functions which use it. Allocating it from a named slab cache will alleviate this. For example, with this change (on top of the noinline patch sent earlier): -ext4_mb_new_blocks 200 +ext4_mb_new_blocks 40 -ext4_mb_free_blocks 344 +ext4_mb_free_blocks 168 -ext4_mb_release_inode_pa 216 +ext4_mb_release_inode_pa 40 -ext4_mb_release_group_pa 192 +ext4_mb_release_group_pa 24 Most of these stack-allocated structs are actually used only for mballoc history; and in those cases often a smaller struct would do. So changing that may be another way around it, at least for those functions, if preferred. For now, in those cases where the ac is only for history, an allocation failure simply skips the history recording, and does not cause any other failures. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-10 01:13:33 -05:00
Dave Kleikamp	c4e35e07af	JBD2: Clear buffer_ordered flag for barried IO request on success In JBD2 jbd2_journal_write_commit_record(), clear the buffer_ordered flag for the bh after barried IO has succeed. This prevents later, if the same buffer head were submitted to the underlying device, which has been reconfigured to not support barrier request, the JBD2 commit code could treat it as a normal IO (without barrier). This is a port from JBD/ext3 fix from Neil Brown. More details from Neil: Some devices - notably dm and md - can change their behaviour in response to BIO_RW_BARRIER requests. They might start out accepting such requests but on reconfiguration, they find out that they cannot any more. JBD2 deal with this by always testing if BIO_RW_BARRIER requests fail with EOPNOTSUPP, and retrying the write requests without the barrier (probably after waiting for any pending writes to complete). However there is a bug in the handling this in JBD2 for ext4 . When ext4/JBD2 to submit a BIO_RW_BARRIER request, it sets the buffer_ordered flag on the buffer head. If the request completes successfully, the flag STAYS SET. Other code might then write the same buffer_head after the device has been reconfigured to not accept barriers. This write will then fail, but the "other code" is not ready to handle EOPNOTSUPP errors and the error will be treated as fatal. Cc: Neil Brown <neilb@suse.de> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-10 01:09:32 -05:00
Jan Kara	7fb5409df0	ext4: Fix Direct I/O locking We cannot start transaction in ext4_direct_IO() and just let it last during the whole write because dio_get_page() acquires mmap_sem which ranks above transaction start (e.g. because we have dependency chain mmap_sem->PageLock->journal_start, or because we update atime while holding mmap_sem) and thus deadlocks could happen. We solve the problem by starting a transaction separately for each ext4_get_block() call. We could have a problem that we allocate a block and before its data are written out the machine crashes and thus we expose stale data. But that does not happen because for hole-filling generic code falls back to buffered writes and for file extension, we add inode to orphan list and thus in case of crash, journal replay will truncate inode back to the original size. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-10 01:08:38 -05:00
Aneesh Kumar K.V	8009f9fb30	ext4: Fix circular locking dependency with migrate and rm. In order to prevent a circular locking dependency when an unlink operation is racing with an ext4 migration, we delay taking i_data_sem until just before switch the inode format, and use i_mutex to prevent writes and truncates during the first part of the migration operation. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-10 01:20:05 -05:00
Steve French	ad7a2926b9	[CIFS] reduce checkpatch warnings Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-07 23:25:02 +00:00
Lachlan McIlroy	de2eeea609	[XFS] add __init/__exit mark to specific init/cleanup functions SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30459a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Denis Cheng <crquan@gmail.com>	2008-02-07 18:25:19 +11:00
David Chinner	450790a2c5	[XFS] Fix oops in xfs_file_readdir() When xfs_file_readdir() exactly fills a buffer, it can move it's index past the end of the buffer and dereference it even though the result of the dereference is never used. On some platforms this causes an oops. SGI-PV: 976923 SGI-Modid: xfs-linux-melb:xfs-kern:30458a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:24:13 +11:00
Christoph Hellwig	cbc89dcfd2	[XFS] kill xfs_root The only caller (xfs_fs_fill_super) can simplify call igrab on the root inode. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30393a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:24:00 +11:00
Christoph Hellwig	4188c78d95	[XFS] keep i_nlink updated and use proper accessors To get the read-only bind mounts in -mm to work correctly with XFS we need to call the drop_nlink and inc_nlink helpers to monitor the link count. Add calls to these to xfs_bumplink and xfs_droplink and stop copying over di_nlink to i_nlink in xfs_validate_fields and vn_revalidate. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30392a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:23:38 +11:00
Christoph Hellwig	222096ae7f	[XFS] stop updating inode->i_blocks The VFS doesn't use i_blocks, it's only used by generic_fillattr and the generic quota code which XFS doesn't use. In XFS there is one use to check whether we have an inline or out of line sumlink, but we can replace that with a check of the XFS_IFINLINE inode flag. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30391a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:23:15 +11:00
David Chinner	de08dbc197	[XFS] Make xfs_ail_check check less by default Checking the entire AIL on every insert and remove is prohibitively expensive - the sustained sequntial create rate on a single disk drops from about 1800/s to 60/s because of this checking resulting in the xfslogd becoming cpu bound. By default on debug builds, only check the next and previous entries in the list to ensure they are ordered correctly. If you really want, define XFS_TRANS_DEBUG to use the old behaviour. SGI-PV: 972759 SGI-Modid: xfs-linux-melb:xfs-kern:30372a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:23:05 +11:00
David Chinner	249a8c1124	[XFS] Move AIL pushing into it's own thread When many hundreds to thousands of threads all try to do simultaneous transactions and the log is in a tail-pushing situation (i.e. full), we can get multiple threads walking the AIL list and contending on the AIL lock. The AIL push is, in effect, a simple I/O dispatch algorithm complicated by the ordering constraints placed on it by the transaction subsystem. It really does not need multiple threads to push on it - even when only a single CPU is pushing the AIL, it can push the I/O out far faster that pretty much any disk subsystem can handle. So, to avoid contention problems stemming from multiple list walkers, move the list walk off into another thread and simply provide a "target" to push to. When a thread requires a push, it sets the target and wakes the push thread, then goes to sleep waiting for the required amount of space to become available in the log. This mechanism should also be a lot fairer under heavy load as the waiters will queue in arrival order, rather than queuing in "who completed a push first" order. Also, by moving the pushing to a separate thread we can do more effectively overload detection and prevention as we can keep context from loop iteration to loop iteration. That is, we can push only part of the list each loop and not have to loop back to the start of the list every time we run. This should also help by reducing the number of items we try to lock and/or push items that we cannot move. Note that this patch is not intended to solve the inefficiencies in the AIL structure and the associated issues with extremely large list contents. That needs to be addresses separately; parallel access would cause problems to any new structure as well, so I'm only aiming to isolate the structure from unbounded parallelism here. SGI-PV: 972759 SGI-Modid: xfs-linux-melb:xfs-kern:30371a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:22:51 +11:00
Christoph Hellwig	4576758db5	[XFS] use generic_permission Now that all direct caller of xfs_iaccess are gone we can kill xfs_iaccess and xfs_access and just use generic_permission with a check_acl callback. This is required for the per-mount read-only patchset in -mm to work properly with XFS. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30370a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:22:38 +11:00
Christoph Hellwig	f6aa7f2184	[XFS] stop re-checking permissions in xfs_swapext xfs_swapext should simplify check if we have a writeable file descriptor instead of re-checking the permissions using xfs_iaccess. Add an additional check to refuse O_APPEND file descriptors because swapext is not an append-only write operation. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30369a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:22:24 +11:00
Christoph Hellwig	35fec8df65	[XFS] clean up xfs_swapext - stop using vnodes - use proper multiple label goto unwinding - give the struct file * variables saner names SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30366a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:22:02 +11:00
Christoph Hellwig	199037c598	[XFS] remove permission check from xfs_change_file_space Both callers of xfs_change_file_space alreaedy do the file->f_mode & FMODE_WRITE check to ensure we have a file descriptor that has been opened for write mode, so there is no need to re-check that with xfs_iaccess. Especially as the later might wrongly deny it for corner cases like file descriptor passing through unix domain sockets. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30365a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:21:14 +11:00
Lachlan McIlroy	9742bb93da	[XFS] prevent panic during log recovery due to bogus op_hdr length A problem was reported where a system panicked in log recovery due to a corrupt log record. The cause of the corruption is not known but this change will at least prevent a crash for this specific scenario. Log recovery definitely needs some more work in this area. SGI-PV: 974151 SGI-Modid: xfs-linux-melb:xfs-kern:30318a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-02-07 18:20:58 +11:00
Christoph Hellwig	f71354bc3a	[XFS] Cleanup various fid related bits: - merge xfs_fid2 into it's only caller xfs_dm_inode_to_fh. - remove xfs_vget and opencode it in the two callers, simplifying both of them by avoiding the awkward calling convetion. - assign directly to the dm_fid_t members in various places in the dmapi code instead of casting them to xfs_fid_t first (which is identical to dm_fid_t) SGI-PV: 974747 SGI-Modid: xfs-linux-melb:xfs-kern:30258a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:20:11 +11:00
David Chinner	edd319dc52	[XFS] Fix xfs_lowbit64 xfs_lowbit64 was broken on 32 bit platforms in a recent cleanup of the xfs bitops. Fix it back up again. SGI-PV: 974005 SGI-Modid: xfs-linux-melb:xfs-kern:30202a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:19:41 +11:00
Christoph Hellwig	45ba598e56	[XFS] Remove CFORK macros and use code directly in IFORK and DFORK macros. Currently XFS_IFORK_* and XFS_DFORK* are implemented by means of XFS_CFORK* macros. But given that XFS_IFORK_* operates on an xfs_inode that embedds and xfs_icdinode_core and XFS_DFORK_* operates on an xfs_dinode that embedds a xfs_dinode_core one will have to do endian swapping while the other doesn't. Instead of having the current mess with the CFORK macros that have byteswapping and non-byteswapping version (which are inconsistantly named while we're at it) just define each family of the macros to stand by itself and simplify the whole matter. A few direct references to the CFORK variants were cleaned up to use IFORK or DFORK to make this possible. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30163a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:19:24 +11:00
Christoph Hellwig	a9759f2de3	[XFS] kill superflous buffer locking (2nd attempt) There is no need to lock any page in xfs_buf.c because we operate on our own address_space and all locking is covered by the buffer semaphore. If we ever switch back to main blockdeive address_space as suggested e.g. for fsblock with a similar scheme the locking will have to be totally revised anyway because the current scheme is neither correct nor coherent with itself. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30156a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:18:50 +11:00
Robert P. J. Day	40ebd81d1a	[XFS] Use kernel-supplied "roundup_pow_of_two" for simplicity SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30098a Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:18:19 +11:00
Tim Shimmin	e6a4b37f38	[XFS] Remove the BPCSHIFT and NB* based macros from XFS. The BPCSHIFT based macros, btoc, ctob, offtoc* and ctooff are either not used or don't need to be used. The NDPP, NDPP, NBBY macros don't need to be used but instead are replaced directly by PAGE_SIZE and PAGE_CACHE_SIZE where appropriate. Initial patch and motivation from Nicolas Kaiser. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30096a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:17:58 +11:00
Niv Sardi	f7b7c3673e	[XFS] Remove bogus assert This assert is bogus. We can have a forced shutdown occur between the check for the XLOG_FORCED_SHUTDOWN and the ASSERT. Also, the logging system shouldn't care about the state of XFS_FORCED_SHUTDOWN, it should only check XLOG_FORCED_SHUTDOWN. The logging system has it's own forced shutdown flag so, for the case of a forced shutdown that's not due to a logging error, we can flush the log. SGI-PV: 972985 SGI-Modid: xfs-linux-melb:xfs-kern:30029a Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:17:39 +11:00
Eric Sandeen	71ddabb94a	[XFS] optimize XFS_IS_REALTIME_INODE w/o realtime config Use XFS_IS_REALTIME_INODE in more places, and #define it to 0 if CONFIG_XFS_RT is off. This should be safe because mount checks in xfs_rtmount_init: so if we get mounted w/o CONFIG_XFS_RT, no realtime inodes should be encountered after that. Defining XFS_IS_REALTIME_INODE to 0 saves a bit of stack space, presumeably gcc can optimize around the various "if (0)" type checks: xfs_alloc_file_space -8 xfs_bmap_adjacent -16 xfs_bmapi -8 xfs_bmap_rtalloc -16 xfs_bunmapi -28 xfs_free_file_space -64 xfs_imap +8 <-- ? hmm. xfs_iomap_write_direct -12 xfs_qm_dqusage_adjust -4 xfs_qm_vop_chown_reserve -4 SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30014a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:16:43 +11:00
David Chinner	a67d7c5f5d	[XFS] Move platform specific mount option parse out of core XFS code Mount option parsing is platform specific. Move it out of core code into the platform specific superblock operation file. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30012a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:16:30 +11:00
David Chinner	3ed6526441	[XFS] Implement fallocate. Implement the new generic callout for file preallocation. Atomically change the file size if requested. SGI-PV: 972756 SGI-Modid: xfs-linux-melb:xfs-kern:30009a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:16:17 +11:00
David Chinner	5d51eff453	[XFS] Fix inode allocation latency The log force added in xfs_iget_core() has been a performance issue since it was introduced for tight loops that allocate then unlink a single file. under heavy writeback, this can introduce unnecessary latency due tothe log I/o getting stuck behind bulk data writes. Fix this latency problem by avoinding the need for the log force by moving the place we mark linux inode dirty to the transaction commit rather than on transaction completion. This also closes a potential hole in the sync code where a linux inode is not dirty between the time it is modified and the time the log buffer has been written to disk. SGI-PV: 972753 SGI-Modid: xfs-linux-melb:xfs-kern:30007a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:16:07 +11:00
David Chinner	e4143a1cf5	[XFS] Fix transaction overrun during writeback. Prevent transaction overrun in xfs_iomap_write_allocate() if we race with a truncate that overlaps the delalloc range we were planning to allocate. If we race, we may allocate into a hole and that requires block allocation. At this point in time we don't have a reservation for block allocation (apart from metadata blocks) and so allocating into a hole rather than a delalloc region results in overflowing the transaction block reservation. Fix it by only allowing a single extent to be allocated at a time. SGI-PV: 972757 SGI-Modid: xfs-linux-melb:xfs-kern:30005a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:15:55 +11:00
David Chinner	786f486f81	[XFS] Show all mount args in /proc/mounts There are several mount options that don't show up in /proc/mounts. Add them in and clean up the showargs code at the same time. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30004a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:15:43 +11:00
David Chinner	8ae2c0f64a	[XFS] Fix sparse warning in xlog_recover_do_efd_trans. Sparse trips over the locking order in xlog_recover_do_efd_trans() when xfs_trans_delete_ail() drops the ail lock. Because the unlock is conditional, we need to either annotate with a "fake unlock" or change the structure of the code so sparse thinks the function always unlocks. Reordering the code makes it simpler, so do that. SGI-PV: 972755 SGI-Modid: xfs-linux-melb:xfs-kern:30003a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:15:29 +11:00
David Chinner	a8272ce0c1	[XFS] Fix up sparse warnings. These are mostly locking annotations, marking things static, casts where needed and declaring stuff in header files. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30002a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:14:38 +11:00
David Chinner	a69b176df2	[XFS] Use the generic bitops rather than implementing them ourselves. Patch inspired by Andi Kleen. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30000a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:14:22 +11:00
Vlad Apostolov	c319b58b13	[XFS] Make xfs_bulkstat() to report unlinked but referenced inodes We need xfs_bulkstat() to report inode stat for inodes with link count zero but reference count non zero. The fix here: http://oss.sgi.com/archives/xfs/2007-09/msg00266.html changed this behavior and made xfs_bulkstat() to filter all unlinked inodes including those that are not destroyed yet but held by reference. The attached patch returns back to the original behavior by marking the on-disk inode buffer "dirty" when di_mode is cleared (at that time both inode link and reference counter are zero). SGI-PV: 972004 SGI-Modid: xfs-linux-melb:xfs-kern:29914a Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:13:37 +11:00
Lachlan McIlroy	98ce2b5b1b	[XFS] 971186 Undo mod xfs-linux-melb:xfs-kern:29845a due to a regression SGI-PV: 971596 SGI-Modid: xfs-linux-melb:xfs-kern:29902a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:13:27 +11:00
Eric Sandeen	bc58f9bb6b	[XFS] fix 32-bit compat ioctls for GETXFLAGS, SETXFLAGS, GETVERSION XFS_IOC_GETVERSION, XFS_IOC_GETXFLAGS and XFS_IOC_SETXFLAGS all take a "long" which changes size between 32 and 64 bit platforms. So, the ioctl cmds that come in from a 32-bit app aren't as expected, for example on GETXFLAGS, unknown cmd fd(3) cmd(80046601){t:'f';sz:4} due to the size mismatch. So, use instead the 32-bit version of the commands for compat ioctls, and other than that it doesn't take any more manipulation. Also, for both native and compat versions, just define them to the values as defined in fs.h SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29849a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:13:17 +11:00
Eric Sandeen	d4f3cc016f	[XFS] lose xfs_hex_dump in favor of print_hex_dump No need for xfs to have its own hex dumping routine now that the kernel has one. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29847a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:13:05 +11:00
Christoph Hellwig	91906a882a	[XFS] kill XFS_INOBT_IS_FREE_DISK This macro is unused an all other acros in this family operate on native types, so we most likely won't grow a user either. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29846a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:12:41 +11:00
Christoph Hellwig	c40ea74101	[XFS] kill superflous buffer locking There is no need to lock any page in xfs_buf.c because we operate on our own address_space and all locking is covered by the buffer semaphore. If we ever switch back to main blockdeive address_space as suggested e.g. for fsblock with a similar scheme the locking will have to be totally revised anyway because the current scheme is neither correct nor coherent with itself. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29845a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:12:07 +11:00
Eric Sandeen	0771fb4515	[XFS] Refactor xfs_mountfs Refactoring xfs_mountfs() to call sub-functions for logical chunks can help save a bit of stack, and can make it easier to read this long function. The mount path is one of the longest common callchains, easily getting to within a few bytes of the end of a 4k stack when over lvm, quotas are enabled, and quotacheck must be done. With this change on top of the other stack-related changes I've sent, I can get xfs to survive a normal xfsqa run on 4k stacks over lvm. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29834a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:11:56 +11:00
Christoph Hellwig	b53e675dc8	[XFS] xlog_rec_header/xlog_rec_ext_header endianess annotations Mostly trivial conversion with one exceptions: h_num_logops was kept in native endian previously and only converted to big endian in xlog_sync, but we always keep it big endian now. With todays cpus fast byteswap instructions that's not an issue but the new variant keeps the code clean and maintainable. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29821a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:11:47 +11:00
Christoph Hellwig	67fcb7bfb6	[XFS] clean up some xfs_log_priv.h macros - the various assign lsn macros are replaced by a single inline, xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except for a more sane calling convention. ASSIGN_LSN_DISK is replaced by xlog_assign_lsn and a manual bytespap, and ASSIGN_LSN by the same, except we pass the cycle and block arguments explicitly instead of a log paramter. The latter two variants only had 2, respectively one user anyway. - the GET_CYCLE is replaced by a xlog_get_cycle inline with exactly the same calling conventions. - GET_CLIENT_ID is replaced by xlog_get_client_id which leaves away the unused arch argument. Instead of conditional defintions depending on host endianess we now do an unconditional swap and shift then, which generates equal code. - the unused XLOG_SET macro is removed. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29820a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:11:38 +11:00
Christoph Hellwig	03bea6fe6c	[XFS] clean up some xfs_log_priv.h macros - the various assign lsn macros are replaced by a single inline, xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except for a more sane calling convention. ASSIGN_LSN_DISK is replaced by xlog_assign_lsn and a manual bytespap, and ASSIGN_LSN by the same, except we pass the cycle and block arguments explicitly instead of a log paramter. The latter two variants only had 2, respectively one user anyway. - the GET_CYCLE is replaced by a xlog_get_cycle inline with exactly the same calling conventions. - GET_CLIENT_ID is replaced by xlog_get_client_id which leaves away the unused arch argument. Instead of conditional defintions depending on host endianess we now do an unconditional swap and shift then, which generates equal code. - the unused XLOG_SET macro is removed. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29819a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:10:31 +11:00
Christoph Hellwig	9909c4aa1a	[XFS] kill xfs_freeze. No need to have a wrapper just two call two more functions. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29816a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:09:56 +11:00
Christoph Hellwig	10090be25c	[XFS] cleanup vnode useage in xfs_iget.c Get rid of vnode useage in xfs_iget.c and pass Linux inode / xfs_inode where apropinquate. And kill some useless helpers while we're at it. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29808a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:55:46 +11:00
Christoph Hellwig	6e7f75eafb	[XFS] cleanup vnode useage in xfs_ioctl.c xfs_ioctl.c passes around vnode pointers quite a lot, but all places already have the Linux inode which is identical to the vnode these days. Clean the code up to always use the Linux inode. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29807a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:55:35 +11:00
Christoph Hellwig	4ca488eb45	[XFS] Kill off xfs_statvfs. We were already filling the Linux struct statfs anyway, and doing this trivial task directly in xfs_fs_statfs makes the code quite a bit cleaner. While I was at it I also moved copying attributes that don't change over the lifetime of the filesystem outside the superblock lock. xfs_fs_fill_super used to get the magic number and blocksize through xfs_statvfs, but assigning them directly is a lot cleaner and will save some stack space during mount. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29802a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:53:27 +11:00
Christoph Hellwig	c43f408795	[XFS] simplify xfs_vn_getattr Just fill in struct kstat directly from the xfs_inode instead of doing a detour through a bhv_vattr_t and xfs_getattr. SGI-PV: 970980 SGI-Modid: xfs-linux-melb:xfs-kern:29770a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:49:06 +11:00
Christoph Hellwig	613d70436c	[XFS] kill xfs_iocore_t xfs_iocore_t is a structure embedded in xfs_inode. Except for one field it just duplicates fields already in xfs_inode, and there is nothing this abstraction buys us on XFS/Linux. This patch removes it and shrinks source and binary size of xfs aswell as shrinking the size of xfs_inode by 60/44 bytes in debug/non-debug builds. SGI-PV: 970852 SGI-Modid: xfs-linux-melb:xfs-kern:29754a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:48:58 +11:00
Eric Sandeen	007c61c686	[XFS] Remove spin.h remove spinlock init abstraction macro in spin.h, remove the callers, and remove the file. Move no-op spinlock_destroy to xfs_linux.h Cleanup spinlock locals in xfs_mount.c SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29751a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:47:45 +11:00
Eric Sandeen	36e41eebda	[XFS] Cleanup lock goop. Switch last couple lock_t's to spinlock_t's. Remove now-unused spinlock-related macros & types. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29748a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:47:35 +11:00
Eric Sandeen	3a0e487034	[XFS] ktrace kt_lock is unused, remove it. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29747a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:47:25 +11:00
Eric Sandeen	3685c2a1d7	[XFS] Unwrap XFS_SB_LOCK. Un-obfuscate XFS_SB_LOCK, remove XFS_SB_LOCK->mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29746a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:47:15 +11:00
Eric Sandeen	ba74d0cba5	[XFS] Unwrap mru_lock. Un-obfuscate mru_lock, remove mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29745a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:47:01 +11:00
Eric Sandeen	703e1f0fd2	[XFS] Unwrap xfs_dabuf_global_lock Un-obfuscate dabuf_global_lock, remove mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29744a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:46:48 +11:00
Eric Sandeen	64137e56d7	[XFS] Unwrap pagb_lock. Un-obfuscate pagb_lock, remove mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29743a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:46:39 +11:00
Eric Sandeen	869b906078	[XFS] Unwrap XFS_DQ_PINUNLOCK. Un-obfuscate DQ_PINLOCK, remove DQ_PINLOCK->mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29742a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:50 +11:00
Eric Sandeen	c8b5ea289f	[XFS] Unwrap GRANT_LOCK. Un-obfuscate GRANT_LOCK, remove GRANT_LOCK->mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29741a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:41 +11:00
Eric Sandeen	b22cd72c95	[XFS] Unwrap LOG_LOCK. Un-obfuscate LOG_LOCK, remove LOG_LOCK->mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29740a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:32 +11:00
Donald Douwsma	287f3dad14	[XFS] Unwrap AIL_LOCK SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29739a Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:23 +11:00
Lachlan McIlroy	541d7d3c4b	[XFS] kill unnessecary ioops indirection Currently there is an indirection called ioops in the XFS data I/O path. Various functions are called by functions pointers, but there is no coherence in what this is for, and of course for XFS itself it's entirely unused. This patch removes it instead and significantly reduces source and binary size of XFS while making maintaince easier. SGI-PV: 970841 SGI-Modid: xfs-linux-melb:xfs-kern:29737a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:14 +11:00
Christoph Hellwig	21a62542b6	[XFS] simplify vn_revalidate No need to allocate a bhv_vattr_t on stack and call xfs_getattr to update a few fields in the Linux inode from the XFS inode, just do it directly. And yes, this function is in dire need of a better name and prototype, I'll do in a separate patch, though. SGI-PV: 970705 SGI-Modid: xfs-linux-melb:xfs-kern:29713a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:04 +11:00
Lachlan McIlroy	15947f2d4f	[XFS] more vnode/inode tracing fixes SGI-PV: 970335 SGI-Modid: xfs-linux-melb:xfs-kern:29697a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:43:54 +11:00
Christoph Hellwig	7642861b7e	[XFS] kill BMAPI_UNWRITTEN There is no reason to go through xfs_iomap for the BMAPI_UNWRITTEN because it has nothing in common with the other cases. Instead check for the shutdown filesystem in xfs_end_bio_unwritten and perform a direct call to xfs_iomap_write_unwritten (which should be renamed to something more sensible one day) SGI-PV: 970241 SGI-Modid: xfs-linux-melb:xfs-kern:29681a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:43:44 +11:00
Christoph Hellwig	6214ed4461	[XFS] kill BMAPI_DEVICE There is no reason to go into the iomap machinery just to get the right block device for an inode. Instead look at the realtime flag in the inode and grab the right device from the mount structure. I created a new helper, xfs_find_bdev_for_inode instead of opencoding it because I plan to use it in other places in the future. SGI-PV: 970240 SGI-Modid: xfs-linux-melb:xfs-kern:29680a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:43:35 +11:00
Lachlan McIlroy	cf441eeb79	[XFS] clean up vnode/inode tracing Simplify vnode tracing calls by embedding function name & return addr in the calling macro. Also do a lot of vnode->inode renaming for consistency, while we're at it. SGI-PV: 970335 SGI-Modid: xfs-linux-melb:xfs-kern:29650a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:42:19 +11:00
Lachlan McIlroy	44866d3928	[XFS] remove dead SYNC_BDFLUSH case in xfs_sync_inodes A large part of xfs_sync_inodes is conditional on the SYNC_BDFLUSH which is never passed to it. This patch removes it and adds an assert that triggers in case some new code tries to pass SYNC_BDFLUSH to it. SGI-PV: 970242 SGI-Modid: xfs-linux-melb:xfs-kern:29630a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:40:53 +11:00
Steve French	f315ccb3e6	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-02-06 16:04:00 +00:00
Eric Sandeen	0040d9875d	allow in-inode EAs on ext4 root inode The ext3 root inode was treated specially with respect to in-inode extended attributes, for reasons detailed in the removed comment below. The first mkfs-created inodes would not get extra_i_size or the EXT3_STATE_XATTR flag set in ext3_read_inode, which disallowed reading or setting in-inode EAs on the root. However, in ext4, ext4_mark_inode_dirty calls ext4_expand_extra_isize for all inodes; once this is done EAs may be placed in the root ext4 inode body. But for reasons above, it won't be found after a reboot. testcase: setfattr -n user.name -v value mntpt/ setfattr -n user.name2 -v value2 mntpt/ umount mntpt/; remount mntpt/ getfattr -d mntpt/ name2/value2 has gone missing; debugfs shows it in the inode body, but it is not found there by getattr. The following fixes it up; newer mkfs appears to properly zero the inodes, so this workaround isn't needed for ext4. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-02-05 22:36:43 -05:00
Aneesh Kumar K.V	42a10add85	ext4: Fix null bh pointer dereference in mballoc Repoted by Adrian Bunk <bunk@kernel.org>: The Coverity checker spotted the following NULL dereference: static int ext4_mb_mark_diskspace_used { ... if (!bitmap_bh) goto out_err; ... out_err: sb->s_dirt = 1; put_bh(bitmap_bh); ... Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-02-10 01:07:28 -05:00
Andrew Morton	c773633916	deprecate smbfs in favour of cifs smbfs is a bit buggy and has no maintainer. Change it to shout at the user on the first five mount attempts - tell them to switch to CIFS. Come December we'll mark it BROKEN and see what happens. [olecom@flower.upol.cz: documentation update] Cc: Urban Widmark <urban@teststation.com> Acked-by: Steven French <sfrench@us.ibm.com> Signed-off-by: Oleg Verych <olecom@flower.upol.cz> Cc: Jeff Layton <jlayton@redhat.com> Cc: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 14:37:15 -08:00
Dominique Quatravaux	d7b88513c5	uml: fix hostfs tv_usec calculations To convert from tv_nsec to tv_usec, one needs to divide by 1000, not multiply. Signed-off-by: Dominique Quatravaux <dominique@quatravaux.org> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:30 -08:00
Andrew Morgan	e338d263a7	Add 64-bit capability support to the kernel The patch supports legacy (32-bit) capability userspace, and where possible translates 32-bit capabilities to/from userspace and the VFS to 64-bit kernel space capabilities. If a capability set cannot be compressed into 32-bits for consumption by user space, the system call fails, with -ERANGE. FWIW libcap-2.00 supports this change (and earlier capability formats) http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/ [akpm@linux-foundation.org: coding-syle fixes] [akpm@linux-foundation.org: use get_task_comm()] [ezk@cs.sunysb.edu: build fix] [akpm@linux-foundation.org: do not initialise statics to 0 or NULL] [akpm@linux-foundation.org: unused var] [serue@us.ibm.com: export __cap_ symbols] Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:20 -08:00
David P. Quigley	4bea58053f	VFS: Reorder vfs_getxattr to avoid unnecessary calls to the LSM Originally vfs_getxattr would pull the security xattr variable using the inode getxattr handle and then proceed to clobber it with a subsequent call to the LSM. This patch reorders the two operations such that when the xattr requested is in the security namespace it first attempts to grab the value from the LSM directly. If it fails to obtain the value because there is no module present or the module does not support the operation it will fall back to using the inode getxattr operation. In the event that both are inaccessible it returns EOPNOTSUPP. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:20 -08:00
David P. Quigley	4249259404	VFS/Security: Rework inode_getsecurity and callers to return resulting buffer This patch modifies the interface to inode_getsecurity to have the function return a buffer containing the security blob and its length via parameters instead of relying on the calling function to give it an appropriately sized buffer. Security blobs obtained with this function should be freed using the release_secctx LSM hook. This alleviates the problem of the caller having to guess a length and preallocate a buffer for this function allowing it to be used elsewhere for Labeled NFS. The patch also removed the unused err parameter. The conversion is similar to the one performed by Al Viro for the security_getprocattr hook. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:20 -08:00
Fengguang Wu	8bc3be2751	writeback: speed up writeback of big dirty files After making dirty a 100M file, the normal behavior is to start the writeback for all data after 30s delays. But sometimes the following happens instead: - after 30s: ~4M - after 5s: ~4M - after 5s: all remaining 92M Some analyze shows that the internal io dispatch queues goes like this: s_io s_more_io ------------------------- 1) 100M,1K 0 2) 1K 96M 3) 0 96M 1) initial state with a 100M file and a 1K file 2) 4M written, nr_to_write <= 0, so write more 3) 1K written, nr_to_write > 0, no more writes(BUG) nr_to_write > 0 in (3) fools the upper layer to think that data have all been written out. The big dirty file is actually still sitting in s_more_io. We cannot simply splice s_more_io back to s_io as soon as s_io becomes empty, and let the loop in generic_sync_sb_inodes() continue: this may starve newly expired inodes in s_dirty. It is also not an option to draw inodes from both s_more_io and s_dirty, an let the loop go on: this might lead to live locks, and might also starve other superblocks in sync time(well kupdate may still starve some superblocks, that's another bug). We have to return when a full scan of s_io completes. So nr_to_write > 0 does not necessarily mean that "all data are written". This patch introduces a flag writeback_control.more_io to indicate that more io should be done. With it the big dirty file no longer has to wait for the next kupdate invokation 5s later. In sync_sb_inodes() we only set more_io on super_blocks we actually visited. This avoids the interaction between two pdflush deamons. Also in __sync_single_inode() we don't blindly keep requeuing the io if the filesystem cannot progress. Failing to do so may lead to 100% iowait. Tested-by: Mike Snitzer <snitzer@gmail.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Michael Rubin <mrubin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:19 -08:00
Qi Yong	2d544564f9	skip writing data pages when inode is under I_SYNC Since I_SYNC was split out from I_LOCK, the concern in commit `4b89eed93e` ("Write back inode data pages even when the inode itself is locked") is not longer valid. We should revert to the original behavior: in __writeback_single_inode(), when we find an I_SYNC-ed inode and we're not doing a data-integrity sync, skip writing entirely. Otherwise, we are double calling do_writepages() Signed-off-by: Qi Yong <qiyong@fc-cn.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Cc: Joern Engel <joern@wohnheim.fh-wedel.de> Cc: WU Fengguang <wfg@mail.ustc.edu.cn> Cc: Michael Rubin <mrubin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:18 -08:00
Andrea Arcangeli	7766755a2f	Fix /proc dcache deadlock in do_exit This patch fixes a sles9 system hang in start_this_handle from a customer with some heavy workload where all tasks are waiting on kjournald to commit the transaction, but kjournald waits on t_updates to go down to zero (it never does). This was reported as a lowmem shortage deadlock but when checking the debug data I noticed the VM wasn't under pressure at all (well it was really under vm pressure, because lots of tasks hanged in the VM prune_dcache methods trying to flush dirty inodes, but no task was hanging in GFP_NOFS mode, the holder of the journal handle should have if this was a vm issue in the first place). No task was apparently holding the leftover handle in the committing transaction, so I deduced t_updates was stuck to 1 because a journal_stop was never run by some path (this turned out to be correct). With a debug patch adding proper reverse links and stack trace logging in ext3 deployed in production, I found journal_stop is never run because mark_inode_dirty_sync is called inside release_task called by do_exit. (that was quite fun because I would have never thought about this subtleness, I thought a regular path in ext3 had a bug and it forgot to call journal_stop) do_exit->release_task->mark_inode_dirty_sync->schedule() (will never come back to run journal_stop) The reason is that shrink_dcache_parent is racy by design (feature not a bug) and it can do blocking I/O in some case, but the point is that calling shrink_dcache_parent at the last stage of do_exit isn't safe for self-reaping tasks. I guess the memory pressure of the unbalanced highmem system allowed to trigger this more easily. Now mainline doesn't have this line in iput (like sles9 has): if (inode->i_state & I_DIRTY_DELAYED) mark_inode_dirty_sync(inode); so it will probably not crash with ext3, but for example ext2 implements an I/O-blocking ext2_put_inode that will lead to similar screwups with ext2_free_blocks never coming back and it's definitely wrong to call blocking-IO paths inside do_exit. So this should fix a subtle bug in mainline too (not verified in practice though). The equivalent fix for ext3 is also not verified yet to fix the problem in sles9 but I don't have doubt it will (it usually takes days to crash, so it'll take weeks to be sure). An alternate fix would be to offload that work to a kernel thread, but I don't think a reschedule for this is worth it, the vm should be able to collect those entries for the synchronous release_task. Signed-off-by: Andrea Arcangeli <andrea@suse.de> Cc: Jan Kara <jack@ucw.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:18 -08:00
Matt Mackall	1e88328111	maps4: make page monitoring /proc file optional Make /proc/ page monitoring configurable This puts the following files under an embedded config option: /proc/pid/clear_refs /proc/pid/smaps /proc/pid/pagemap /proc/kpagecount /proc/kpageflags [akpm@linux-foundation.org: Kconfig fix] Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:17 -08:00
Matt Mackall	304daa8132	maps4: add /proc/kpageflags interface This makes a subset of physical page flags available to userspace. Together with /proc/pid/kpagemap, this allows tracking of a wide variety of VM behaviors. Exported flags are decoupled from the kernel's internal flags. This allows us to reorder flag bits, and synthesize any bits that get redefined in terms of other bits. [akpm@linux-foundation.org: remove unneeded access_ok()] [akpm@linux-foundation.org: s/0/NULL/] Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:17 -08:00
Matt Mackall	161f47bf41	maps4: add /proc/kpagecount interface This makes physical page map counts available to userspace. Together with /proc/pid/pagemap and /proc/pid/clear_refs, this can be used to monitor memory usage on a per-page basis. [akpm@linux-foundation.org: remove unneeded access_ok()] [bunk@stusta.de: make struct proc_kpagemap static] Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:17 -08:00
Matt Mackall	85863e475e	maps4: add /proc/pid/pagemap interface This interface provides a mapping for each page in an address space to its physical page frame number, allowing precise determination of what pages are mapped and what pages are shared between processes. New in this version: - headers gone again (as recommended by Dave Hansen and Alan Cox) - 64-bit entries (as per discussion with Andi Kleen) - swap pte information exported (from Dave Hansen) - page walker callback for holes (from Dave Hansen) - direct put_user I/O (as suggested by Rusty Russell) This patch folds in cleanups and swap PTE support from Dave Hansen <haveblue@us.ibm.com>. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:16 -08:00
Matt Mackall	a6198797cc	maps4: regroup task_mmu by interface Reorder source so that all the code and data for each interface is together. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:16 -08:00
Matt Mackall	f248dcb34d	maps4: move clear_refs code to task_mmu.c This puts all the clear_refs code where it belongs and probably lets things compile on MMU-less systems as well. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:16 -08:00
Matt Mackall	4752c36978	maps4: simplify interdependence of maps and smaps This pulls the shared map display code out of show_map and puts it in show_smap where it belongs. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:16 -08:00
Matt Mackall	b3ae5acbbb	maps4: use pagewalker in clear_refs and smaps Use the generic pagewalker for smaps and clear_refs Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:16 -08:00
Fengguang Wu	ec4dd3eb35	maps4: add proportional set size accounting in smaps The "proportional set size" (PSS) of a process is the count of pages it has in memory, where each page is divided by the number of processes sharing it. So if a process has 1000 pages all to itself, and 1000 shared with one other process, its PSS will be 1500. - lwn.net: "ELC: How much memory are applications really using?" The PSS proposed by Matt Mackall is a very nice metic for measuring an process's memory footprint. So collect and export it via /proc/<pid>/smaps. Matt Mackall's pagemap/kpagemap and John Berthels's exmap can also do the job. They are comprehensive tools. But for PSS, let's do it in the simple way. Cc: John Berthels <jjberthels@gmail.com> Cc: Bernardo Innocenti <bernie@codewiz.org> Cc: Padraig Brady <P@draigBrady.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Hugh Dickins <hugh@veritas.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:16 -08:00
Ken Chen	75897d60a5	hugetlb: allow sticky directory mount option Allow sticky directory mount option for hugetlbfs. This allows admin to create a shared hugetlbfs mount point for multiple users, while prevent accidental file deletion that users may step on each other. It is similiar to default tmpfs mount option, or typical option used on /tmp. Signed-off-by: Ken Chen <kenchen@google.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <hermes@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:14 -08:00
Christoph Lameter	b98938c373	bufferhead: revert constructor removal The constructor for buffer_head slabs was removed recently. We need the constructor back in slab defrag in order to insure that slab objects always have a definite state even before we allocated them. I think we mistakenly merged the removal of the constuctor into a cleanup patch. You (ie: akpm) had a test that showed that the removal of the constructor led to a small regression. The prior state makes things easier for slab defrag. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:14 -08:00
Christoph Lameter	9e2779fa28	is_vmalloc_addr(): Check if an address is within the vmalloc boundaries Checking if an address is a vmalloc address is done in a couple of places. Define a common version in mm.h and replace the other checks. Again the include structures suck. The definition of VMALLOC_START and VMALLOC_END is not available in vmalloc.h since highmem.c cannot be included there. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:14 -08:00
Christoph Lameter	eebd2aa355	Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user Simplify page cache zeroing of segments of pages through 3 functions zero_user_segments(page, start1, end1, start2, end2) Zeros two segments of the page. It takes the position where to start and end the zeroing which avoids length calculations and makes code clearer. zero_user_segment(page, start, end) Same for a single segment. zero_user(page, start, length) Length variant for the case where we know the length. We remove the zero_user_page macro. Issues: 1. Its a macro. Inline functions are preferable. 2. The KM_USER0 macro is only defined for HIGHMEM. Having to treat this special case everywhere makes the code needlessly complex. The parameter for zeroing is always KM_USER0 except in one single case that we open code. Avoiding KM_USER0 makes a lot of code not having to be dealing with the special casing for HIGHMEM anymore. Dealing with kmap is only necessary for HIGHMEM configurations. In those configurations we use KM_USER0 like we do for a series of other functions defined in highmem.h. Since KM_USER0 is depends on HIGHMEM the existing zero_user_page function could not be a macro. zero_user_* functions introduced here can be be inline because that constant is not used when these functions are called. Also extract the flushing of the caches to be outside of the kmap. [akpm@linux-foundation.org: fix nfs and ntfs build] [akpm@linux-foundation.org: fix ntfs build some more] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:13 -08:00
Davide Libenzi	4d672e7ac7	timerfd: new timerfd API This is the new timerfd API as it is implemented by the following patch: int timerfd_create(int clockid, int flags); int timerfd_settime(int ufd, int flags, const struct itimerspec utmr, struct itimerspec otmr); int timerfd_gettime(int ufd, struct itimerspec *otmr); The timerfd_create() API creates an un-programmed timerfd fd. The "clockid" parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME. The timerfd_settime() API give new settings by the timerfd fd, by optionally retrieving the previous expiration time (in case the "otmr" parameter is not NULL). The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit is set in the "flags" parameter. Otherwise it's a relative time. The timerfd_gettime() API returns the next expiration time of the timer, or {0, 0} if the timerfd has not been set yet. Like the previous timerfd API implementation, read(2) and poll(2) are supported (with the same interface). Here's a simple test program I used to exercise the new timerfd APIs: http://www.xmailserver.org/timerfd-test2.c [akpm@linux-foundation.org: coding-style cleanups] [akpm@linux-foundation.org: fix ia64 build] [akpm@linux-foundation.org: fix m68k build] [akpm@linux-foundation.org: fix mips build] [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds] [heiko.carstens@de.ibm.com: fix s390] [akpm@linux-foundation.org: fix powerpc build] [akpm@linux-foundation.org: fix sparc64 more] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:07 -08:00
Oleg Nesterov	ed5d2cac11	exec: rework the group exit and fix the race with kill As Roland pointed out, we have the very old problem with exec. de_thread() sets SIGNAL_GROUP_EXIT, kills other threads, changes ->group_leader and then clears signal->flags. All signals (even fatal ones) sent in this window (which is not too small) will be lost. With this patch exec doesn't abuse SIGNAL_GROUP_EXIT. signal_group_exit(), the new helper, should be used to detect exit_group() or exec() in progress. It can have more users, but this patch does only strictly necessary changes. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Robin Holt <holt@sgi.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:07 -08:00
Andrew Morton	59714d65df	get_task_comm(): return the result It was dumb to make get_task_comm() return void. Change it to return a pointer to the resulting output for caller convenience. Cc: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:07 -08:00
Peter Zijlstra	0ccf831cbe	lockdep: annotate epoll On Sat, 2008-01-05 at 13:35 -0800, Davide Libenzi wrote: > I remember I talked with Arjan about this time ago. Basically, since 1) > you can drop an epoll fd inside another epoll fd 2) callback-based wakeups > are used, you can see a wake_up() from inside another wake_up(), but they > will never refer to the same lock instance. > Think about: > > dfd = socket(...); > efd1 = epoll_create(); > efd2 = epoll_create(); > epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, ...); > epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...); > > When a packet arrives to the device underneath "dfd", the net code will > issue a wake_up() on its poll wake list. Epoll (efd1) has installed a > callback wakeup entry on that queue, and the wake_up() performed by the > "dfd" net code will end up in ep_poll_callback(). At this point epoll > (efd1) notices that it may have some event ready, so it needs to wake up > the waiters on its poll wait list (efd2). So it calls ep_poll_safewake() > that ends up in another wake_up(), after having checked about the > recursion constraints. That are, no more than EP_MAX_POLLWAKE_NESTS, to > avoid stack blasting. Never hit the same queue, to avoid loops like: > > epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...); > epoll_ctl(efd3, EPOLL_CTL_ADD, efd2, ...); > epoll_ctl(efd4, EPOLL_CTL_ADD, efd3, ...); > epoll_ctl(efd1, EPOLL_CTL_ADD, efd4, ...); > > The code "if (tncur->wq == wq \|\| ..." prevents re-entering the same > queue/lock. Since the epoll code is very careful to not nest same instance locks allow the recursion. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:07 -08:00
Valerie Clement	b8356c465b	ext4: Don't set EXTENTS_FL flag for fast symlinks For fast symbolic links, the file content is stored in the i_block[] array, which is not compatible with the new file extents format. e2fsck reports error on such files because EXTENTS_FL is set. Don't set the EXTENTS_FL flag when creating fast symlinks. In the case of file migration, skip fast symbolic links. Signed-off-by: Valerie Clement <valerie.clement@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-05 10:56:37 -05:00
Aneesh Kumar K.V	4d60517972	JBD2: Use the incompat macro for testing the incompat feature. JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT needs to be checked with JBD2_HAS_INCOMPAT_FEATURE Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-05 10:56:15 -05:00
Aneesh Kumar K.V	c4b8e635f5	jbd2: Fix reference counting on the journal commit block's buffer head With journal checksum patch we added asynchronous commits of journal commit headers, and accidentally dropped taking a reference on the buffer head. (Before the change, sync_dirty_buffer did the get_bh(). The associative put_bh is done by journal_wait_on_commit_record().) Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-05 10:55:26 -05:00
Andrew Morton	ead03e30b0	[CIFS] fix warning in cifs_spnego.c Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-02-05 15:51:24 +00:00
Linus Torvalds	f5bb3a5e9d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (79 commits) Jesper Juhl is the new trivial patches maintainer Documentation: mention email-clients.txt in SubmittingPatches fs/binfmt_elf.c: spello fix do_invalidatepage() comment typo fix Documentation/filesystems/porting fixes typo fixes in net/core/net_namespace.c typo fix in net/rfkill/rfkill.c typo fixes in net/sctp/sm_statefuns.c lib/: Spelling fixes kernel/: Spelling fixes include/scsi/: Spelling fixes include/linux/: Spelling fixes include/asm-m68knommu/: Spelling fixes include/asm-frv/: Spelling fixes fs/: Spelling fixes drivers/watchdog/: Spelling fixes drivers/video/: Spelling fixes drivers/ssb/: Spelling fixes drivers/serial/: Spelling fixes drivers/scsi/: Spelling fixes ...	2008-02-04 07:58:52 -08:00
Linus Torvalds	9853832c49	Merge branch 'locks' of git://linux-nfs.org/~bfields/linux * 'locks' of git://linux-nfs.org/~bfields/linux: pid-namespaces-vs-locks-interaction file locks: Use wait_event_interruptible_timeout() locks: clarify posix_locks_deadlock	2008-02-04 07:58:03 -08:00
Nick Piggin	2f98735c9c	vm audit: add VM_DONTEXPAND to mmap for drivers that need it Drivers that register a ->fault handler, but do not range-check the offset argument, must set VM_DONTEXPAND in the vm_flags in order to prevent an expanding mremap from overflowing the resource. I've audited the tree and attempted to fix these problems (usually by adding VM_DONTEXPAND where it is not obvious). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-04 07:55:38 -08:00
Vitaliy Gusev	ab1f161165	pid-namespaces-vs-locks-interaction fcntl(F_GETLK,..) can return pid of process for not current pid namespace (if process is belonged to the several namespaces). It is true also for pids in /proc/locks. So correct behavior is saving pointer to the struct pid of the process lock owner. Signed-off-by: Vitaliy Gusev <vgusev@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-03 17:51:36 -05:00
Matthew Wilcox	4321e01e7d	file locks: Use wait_event_interruptible_timeout() interruptible_sleep_on_locked() is just an open-coded wait_event_interruptible_timeout(), with the one difference that interruptible_sleep_on_locked() doesn't bother to check the condition on which it is waiting, depending instead on the BKL to avoid the case where it blocks after the wakeup has already been called. locks_block_on_timeout() is only used in one place, so it's actually simpler to inline it into its caller. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-03 17:51:36 -05:00
J. Bruce Fields	b533184fc3	locks: clarify posix_locks_deadlock For such a short function (with such a long comment), posix_locks_deadlock() seems to cause a lot of confusion. Attempt to make it a bit clearer: - Remove the initial posix_same_owner() check, which can never pass (since this is only called in the case that block_fl and caller_fl conflict) - Use an explicit loop (and a helper function) instead of a goto. - Rewrite the comment, attempting a clearer explanation, and removing some uninteresting historical detail. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-03 17:51:36 -05:00
Ohad Ben-Cohen	09c6dd3c9d	fs/binfmt_elf.c: spello fix s/litle/little Signed-off-by: Ohad Ben-Cohen <ohad@bencohen.org> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2008-02-03 18:05:15 +02:00
Joe Perches	c78bad11fb	fs/: Spelling fixes Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2008-02-03 17:33:42 +02:00
Paulius Zaleckas	efad798b9f	Spelling fixes: lenght->length Signed-off-by: Paulius Zaleckas <pauliusz@yahoo.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2008-02-03 15:42:53 +02:00
Robert P. J. Day	e1b8513d21	Typoes: "whith" -> "with" Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2008-02-03 15:14:02 +02:00
Robert P. J. Day	14e4a0f2bb	Fix a small number of "memeber" typoes. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2008-02-03 15:12:15 +02:00
Linus Torvalds	63e9b66e29	Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux * 'for-linus' of git://linux-nfs.org/~bfields/linux: (100 commits) SUNRPC: RPC program information is stored in unsigned integers SUNRPC: Move exported symbol definitions after function declaration part 2 NLM: tear down RPC clients in nlm_shutdown_hosts SUNRPC: spin svc_rqst initialization to its own function nfsd: more careful input validation in nfsctl write methods lockd: minor log message fix knfsd: don't bother mapping putrootfh enoent to eperm rdma: makefile rdma: ONCRPC RDMA protocol marshalling rdma: SVCRDMA sendto rdma: SVCRDMA recvfrom rdma: SVCRDMA Core Transport Services rdma: SVCRDMA Transport Module rdma: SVCRMDA Header File svc: Add svc_xprt_names service to replace svc_sock_names knfsd: Support adding transports by writing portlist file svc: Add svc API that queries for a transport instance svc: Add /proc/sys/sunrpc/transport files svc: Add transport hdr size for defer/revisit svc: Move the xprt independent code to the svc_xprt.c file ...	2008-02-02 14:31:28 +11:00
Jeff Layton	d801b86168	NLM: tear down RPC clients in nlm_shutdown_hosts It's possible for a RPC to outlive the lockd daemon that created it, so we need to make sure that all RPC's are killed when lockd is coming down. When nlm_shutdown_hosts is called, kill off all RPC tasks associated with the host. Since we need to wait until they have all gone away, we might as well just shut down the RPC client altogether. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:15 -05:00
J. Bruce Fields	87d26ea777	nfsd: more careful input validation in nfsctl write methods Neil Brown points out that we're checking buf[size-1] in a couple places without first checking whether size is zero. Actually, given the implementation of simple_transaction_get(), buf[-1] is zero, so in both of these cases the subsequent check of the value of buf[size-1] will catch this case. But it seems fragile to depend on that, so add explicit checks for this case. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: NeilBrown <neilb@suse.de>	2008-02-01 16:42:15 -05:00
J. Bruce Fields	50431d94e7	lockd: minor log message fix Wendy Cheng noticed that function name doesn't agree here. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Wendy Cheng <wcheng@redhat.com>	2008-02-01 16:42:15 -05:00
J. Bruce Fields	f7b8066f9f	knfsd: don't bother mapping putrootfh enoent to eperm Neither EPERM and ENOENT map to valid errors for PUTROOTFH according to rfc 3530, and, if anything, ENOENT is likely to be slightly more informative; so don't bother mapping ENOENT to EPERM. (Probably this was originally done because one likely cause was that there is an fsid=0 export but that it isn't permitted to this particular client. Now that we allow WRONGSEC returns, this is somewhat less likely.) In the long term we should work to make this situation less likely, perhaps by turning off nfsv4 service entirely in the absence of the pseudofs root, or constructing a pseudofilesystem root ourselves in the kernel as necessary. Thanks to Benny Halevy <bhalevy@panasas.com> for pointing out this problem. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Benny Halevy <bhalevy@panasas.com>	2008-02-01 16:42:15 -05:00
Tom Tucker	9571af18fa	svc: Add svc_xprt_names service to replace svc_sock_names Create a transport independent version of the svc_sock_names function. The toclose capability of the svc_sock_names service can be implemented using the svc_xprt_find and svc_xprt_close services. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:14 -05:00
Tom Tucker	a217813f90	knfsd: Support adding transports by writing portlist file Update the write handler for the portlist file to allow creating new listening endpoints on a transport. The general form of the string is: <transport_name><space><port number> For example: echo "tcp 2049" > /proc/fs/nfsd/portlist This is intended to support the creation of a listening endpoint for RDMA transports without adding #ifdef code to the nfssvc.c file. Transports can also be removed as follows: '-'<transport_name><space><port number> For example: echo "-tcp 2049" > /proc/fs/nfsd/portlist Attempting to add a listener with an invalid transport string results in EPROTONOSUPPORT and a perror string of "Protocol not supported". Attempting to remove an non-existent listener (.e.g. bad proto or port) results in ENOTCONN and a perror string of "Transport endpoint is not connected" Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:13 -05:00
Tom Tucker	7fcb98d58c	svc: Add svc API that queries for a transport instance Add a new svc function that allows a service to query whether a transport instance has already been created. This is used in lockd to determine whether or not a transport needs to be created when a lockd instance is brought up. Specifying 0 for the address family or port is effectively a wild-card, and will result in matching the first transport in the service's list that has a matching class name. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:13 -05:00
Tom Tucker	7a18208383	svc: Make close transport independent Move sk_list and sk_ready to svc_xprt. This involves close because these lists are walked by svcs when closing all their transports. So I combined the moving of these lists to svc_xprt with making close transport independent. The svc_force_sock_close has been changed to svc_close_all and takes a list as an argument. This removes some svc internals knowledge from the svcs. This code races with module removal and transport addition. Thanks to Simon Holm Thøgersen for a compile fix. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Simon Holm Thøgersen <odie@cs.aau.dk>	2008-02-01 16:42:11 -05:00
Tom Tucker	d7c9f1ed97	svc: Change services to use new svc_create_xprt service Modify the various kernel RPC svcs to use the svc_create_xprt service. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:09 -05:00
Oleg Drokin	54ca95eb36	Leak in nlmsvc_testlock for async GETFL case Fix nlm_block leak for the case of supplied blocking lock info. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:07 -05:00
J. Bruce Fields	8838dc43d6	nfsd4: clean up access_valid, deny_valid checks. Document these checks a little better and inline, as suggested by Neil Brown (note both functions have two callers). Remove an obviously bogus check while we're there (checking whether unsigned value is negative). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Neil Brown <neilb@suse.de>	2008-02-01 16:42:07 -05:00
J. Bruce Fields	5c002b3bb2	nfsd: allow root to set uid and gid on create The server silently ignores attempts to set the uid and gid on create. Based on the comment, this appears to have been done to prevent some overly-clever IRIX client from causing itself problems. Perhaps we should remove that hack completely. For now, at least, it makes sense to allow root (when no_root_squash is set) to set uid and gid. While we're there, since nfsd_create and nfsd_create_v3 share the same logic, pull that out into a separate function. And spell out the individual modifications of ia_valid instead of doing them both at once inside a conditional. Thanks to Roger Willcocks <roger@filmlight.ltd.uk> for the bug report and original patch on which this is based. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:07 -05:00
Oleg Drokin	29dbf54615	lockd: fix a leak in nlmsvc_testlock asynchronous request handling Without the patch, there is a leakage of nlmblock structure refcount that holds a reference nlmfile structure, that holds a reference to struct file, when async GETFL is used (-EINPROGRESS return from file_ops->lock()), and also in some error cases. Fix up a style nit while we're here. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00
Frank Filz	406a7ea97d	nfsd: Allow AIX client to read dir containing mountpoints This patch addresses a compatibility issue with a Linux NFS server and AIX NFS client. I have exported /export as fsid=0 with sec=krb5:krb5i I have mount --bind /home onto /export/home I have exported /export/home with sec=krb5i The AIX client mounts / -o sec=krb5:krb5i onto /mnt If I do an ls /mnt, the AIX client gets a permission error. Looking at the network traceIwe see a READDIR looking for attributes FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID. The response gives a NFS4ERR_WRONGSEC which the AIX client is not expecting. Since the AIX client is only asking for an attribute that is an attribute of the parent file system (pseudo root in my example), it seems reasonable that there should not be an error. In discussing this issue with Bruce Fields, I initially proposed ignoring the error in nfsd4_encode_dirent_fattr() if all that was being asked for was FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID, however, Bruce suggested that we avoid calling cross_mnt() if only these attributes are requested. The following patch implements bypassing cross_mnt() if only FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID are called. Since there is some complexity in the code in nfsd4_encode_fattr(), I didn't want to duplicate code (and introduce a maintenance nightmare), so I added a parameter to nfsd4_encode_fattr() that indicates whether it should ignore cross mounts and simply fill in the attribute using the passed in dentry as opposed to it's parent. Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00
J. Bruce Fields	39325bd03f	nfsd4: fix bad seqid on lock request incompatible with open mode The failure to return a stateowner from nfs4_preprocess_seqid_op() means in the case where a lock request is of a type incompatible with an open (due to, e.g., an application attempting a write lock on a file open for read), means that fs/nfsd/nfs4xdr.c:ENCODE_SEQID_OP_TAIL() never bumps the seqid as it should. The client, attempting to close the file afterwards, then gets an (incorrect) bad sequence id error. Worse, this prevents the open file from ever being closed, so we leak state. Thanks to Benny Halevy and Trond Myklebust for analysis, and to Steven Wilton for the report and extensive data-gathering. Cc: Benny Halevy <bhalevy@panasas.com> Cc: Steven Wilton <steven.wilton@team.eftel.com.au> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00
Oleg Drokin	b7e6b86948	lockd: fix reference count leaks in async locking case In a number of places where we wish only to translate nlm_drop_reply to rpc_drop_reply errors we instead return early with rpc_drop_reply, skipping some important end-of-function cleanup. This results in reference count leaks when lockd is doing posix locking on GFS2. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00
J. Bruce Fields	404ec117be	nfsd4: recognize callback channel failure earlier When the callback channel fails, we inform the client of that by returning a cb_path_down error the next time it tries to renew its lease. If we wait most of a lease period before deciding that a callback has failed and that the callback channel is down, then we decrease the chances that the client will find out in time to do anything about it. So, mark the channel down as soon as we recognize that an rpc has failed. However, continue trying to recall delegations anyway, in hopes it will come back up. This will prevent more delegations from being given out, and ensure cb_path_down is returned to renew calls earlier, while still making the best effort to deliver recalls of existing delegations. Also fix a couple comments and remove a dprink that doesn't seem likely to be useful. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00

... 4 5 6 7 8 ...

8399 Commits