OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Trond Myklebust	39967ddf19	NFS: Reduce the stack footprint of nfs_rmdir Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:26 -04:00
Trond Myklebust	d346890bea	NFS: Reduce stack footprint of nfs_proc_remove() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:26 -04:00
Trond Myklebust	3b14d6542d	NFS: Reduce stack footprint of nfs3_proc_readlink() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:25 -04:00
Trond Myklebust	136f2627c9	NFS: Reduce the stack footprint of nfs_link() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:25 -04:00
Trond Myklebust	aa49b4cf7d	NFS: Reduce stack footprint of nfs_readdir() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:25 -04:00
Trond Myklebust	011fff7239	NFS: Reduce stack footprint of nfs3_proc_rename() and nfs4_proc_rename() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:25 -04:00
Trond Myklebust	a3cba2aad9	NFS: Reduce stack footprint of nfs_revalidate_inode() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:24 -04:00
Trond Myklebust	c407d41a16	NFSv4: Reduce stack footprint of nfs4_proc_access() and nfs3_proc_access() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:24 -04:00
Trond Myklebust	4f727296d2	NFSv4: Reduce the stack footprint of nfs4_remote_referral_get_sb Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:23 -04:00
Trond Myklebust	8bac9db9cf	NFSv4: Reduce stack footprint of nfs4_get_root() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:23 -04:00
Trond Myklebust	04ffdbe2e6	NFS: Reduce the stack footprint of nfs_follow_remote_path() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:23 -04:00
Trond Myklebust	e1fb4d05d5	NFS: Reduce the stack footprint of nfs_lookup Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:23 -04:00
Trond Myklebust	364d015e52	NFSv4: Reduce the stack footprint of try_location() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:22 -04:00
Trond Myklebust	fbca779a8d	NFS: Reduce the stack footprint of nfs_create_server Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:22 -04:00
Trond Myklebust	a4d7f16806	NFS: Reduce the stack footprint of nfs_follow_mountpoint() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:22 -04:00
Trond Myklebust	815409d22d	NFSv4: Eliminate nfs4_path_walk() All we really want is the ability to retrieve the root file handle. We no longer need the ability to walk down the path, since that is now done in nfs_follow_remote_path(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:21 -04:00
Trond Myklebust	2d36bfde85	NFS: Add helper functions for allocating filehandles and fattr structs NFS Filehandles and struct fattr are really too large to be allocated on the stack. This patch adds in a couple of helper functions to allocate them dynamically instead. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-14 15:09:21 -04:00
Linus Torvalds	4fc4c3ce0d	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify * 'for-linus' of git://git.infradead.org/users/eparis/notify: inotify: don't leak user struct on inotify release inotify: race use after free/double free in inotify inode marks inotify: clean up the inotify_add_watch out path Inotify: undefined reference to `anon_inode_getfd' Manual merge to remove duplicate "select ANON_INODES" from Kconfig file	2010-05-14 11:49:42 -07:00
Pavel Emelyanov	b3b38d842f	inotify: don't leak user struct on inotify release inotify_new_group() receives a get_uid-ed user_struct and saves the reference on group->inotify_data.user. The problem is that free_uid() is never called on it. Issue seem to be introduced by `63c882a0` (inotify: reimplement inotify using fsnotify) after 2.6.30. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Eric Paris <eparis@parisplace.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-05-14 11:53:36 -04:00
Eric Paris	e08733446e	inotify: race use after free/double free in inotify inode marks There is a race in the inotify add/rm watch code. A task can find and remove a mark which doesn't have all of it's references. This can result in a use after free/double free situation. Task A Task B ------------ ----------- inotify_new_watch() allocate a mark (refcnt == 1) add it to the idr inotify_rm_watch() inotify_remove_from_idr() fsnotify_put_mark() refcnt hits 0, free take reference because we are on idr [at this point it is a use after free] [time goes on] refcnt may hit 0 again, double free The fix is to take the reference BEFORE the object can be found in the idr. Signed-off-by: Eric Paris <eparis@redhat.com> Cc: <stable@kernel.org>	2010-05-14 11:52:57 -04:00
Eric Paris	3dbc6fb6a3	inotify: clean up the inotify_add_watch out path inotify_add_watch explictly frees the unused inode mark, but it can just use the generic code. Just do that. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-05-14 11:51:07 -04:00
Steven Whitehouse	6a99be5d7b	GFS2: Fix typo A missing ! in a test. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-05-14 14:05:51 +01:00
Steve French	baa4563317	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: fs/cifs/inode.c	2010-05-13 22:19:32 +00:00
Linus Torvalds	4462dc0284	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: guard against hardlinking directories	2010-05-13 10:36:16 -07:00
J. Bruce Fields	4dc6ec00f6	nfsd4: implement reclaim_complete This is a mandatory operation. Also, here (not in open) is where we should be committing the reboot recovery information. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-05-13 12:03:11 -04:00
Benny Halevy	ab707e1565	nfsd4: nfsd4_destroy_session must set callback client under the state lock nfsd4_set_callback_client must be called under the state lock to atomically set or unset the callback client and shutting down the previous one. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-05-13 11:59:11 -04:00
Benny Halevy	d76829889a	nfsd4: keep a reference count on client while in use Get a refcount on the client on SEQUENCE, Release the refcount and renew the client when all respective compounds completed. Do not expire the client by the laundromat while in use. If the client was expired via another path, free it when the compounds complete and the refcount reaches 0. Note that unhash_client_locked must call list_del_init on cl_lru as it may be called twice for the same client (once from nfs4_laundromat and then from expire_client) Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-05-13 11:58:54 -04:00
Benny Halevy	07cd4909a6	nfsd4: mark_client_expired Mark the client as expired under the client_lock so it won't be renewed when an nfsv4.1 session is done, after it was explicitly expired during processing of the compound. Do not renew a client mark as expired (in particular, it is not on the lru list anymore) Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-05-13 11:47:22 -04:00
Benny Halevy	46583e2597	nfsd4: introduce nfs4_client.cl_refcount Currently just initialize the cl_refcount to 1 and decrement in expire_client(), conditionally freeing the client when the refcount reaches 0. To be used later by nfsv4.1 compounds to keep the client from timing out while in use. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-05-13 11:47:03 -04:00
Jan Kara	002baeecf5	vfs: Fix O_NOFOLLOW behavior for paths with trailing slashes According to specification mkdir d; ln -s d a; open("a/", O_NOFOLLOW \| O_RDONLY) should return success but currently it returns ELOOP. This is a regression caused by path lookup cleanup patch series. Fix the code to ignore O_NOFOLLOW in case the provided path has trailing slashes. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-13 08:46:04 -07:00
Linus Torvalds	cdf5f61ed1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: preserve seq # on requeued messages after transient transport errors ceph: fix cap removal races ceph: zero unused message header, footer fields ceph: fix locking for waking session requests after reconnect ceph: resubmit requests on pg mapping change (not just primary change) ceph: fix open file counting on snapped inodes when mds returns no caps ceph: unregister osd request on failure ceph: don't use writeback_control in writepages completion ceph: unregister bdi before kill_anon_super releases device name	2010-05-12 18:47:29 -07:00
David Howells	7ac512aa82	CacheFiles: Fix error handling in cachefiles_determine_cache_security() cachefiles_determine_cache_security() is expected to return with a security override in place. However, if set_create_files_as() fails, we fail to do this. In this case, we should just reinstate the security override that was set by the caller. Furthermore, if set_create_files_as() fails, we should dispose of the new credentials we were in the process of creating. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-12 18:23:58 -07:00
Russell King	e7b702b1a8	Inotify: undefined reference to `anon_inode_getfd' Fix: fs/built-in.o: In function `sys_inotify_init1': summary.c:(.text+0x347a4): undefined reference to `anon_inode_getfd' found by kautobuild with arms bcmring_defconfig, which ends up with INOTIFY_USER enabled (through the 'default y') but leaves ANON_INODES unset. However, inotify_user.c uses anon_inode_getfd(). Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-05-12 11:03:40 -04:00
Bob Peterson	cc0581bd61	GFS2: stuck in inode wait, no glocks stuck This patch changes the lock ordering when gfs2 reclaims unlinked dinodes, thereby avoiding a livelock. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-05-12 09:55:39 +01:00
Bob Peterson	eaefbf968a	GFS2: Eliminate useless err variable This patch removes an unneeded "err" variable that is always returned as zero. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-05-12 09:52:50 +01:00
David S. Miller	278554bd65	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wireless/ath/ar9170/usb.c drivers/scsi/iscsi_tcp.c net/ipv4/ipmr.c	2010-05-12 00:05:35 -07:00
Sage Weil	e84346b726	ceph: preserve seq # on requeued messages after transient transport errors If the tcp connection drops and we reconnect to reestablish a stateful session (with the mds), we need to resend previously sent (and possibly received) messages with the _same_ seq # so that they can be dropped on the other end if needed. Only assign a new seq once after the message is queued. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 21:20:38 -07:00
Sage Weil	f818a73674	ceph: fix cap removal races The iterate_session_caps helper traverses the session caps list and tries to grab an inode reference. However, the __ceph_remove_cap was clearing the inode backpointer _before_ removing itself from the session list, causing a null pointer dereference. Clear cap->ci under protection of s_cap_lock to avoid the race, and to tightly couple the list and backpointer state. Use a local flag to indicate whether we are releasing the cap, as cap->session may be modified by a racing thread in iterate_session_caps. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 20:56:31 -07:00
Steve French	aa3e5572c5	[CIFS] drop quota operation stubs CIFS has stubs for XFS-style quotas without an actual implementation backing them, hidden behind a config option not visible in Kconfig. Remove these stubs for now as the quota operations will see some major changes and this code simply gets in the way. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-05-12 02:24:12 +00:00
Benny Halevy	84d38ac9ab	nfsd4: refactor expire_client Separate out unhashing of the client and session. To be used later by the laundromat. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-05-11 21:02:02 -04:00
Benny Halevy	36acb66bda	nfsd4: extend the client_lock to cover cl_lru To be used later on to hold a reference count on the client while in use by a nfsv4.1 compound. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-05-11 21:02:02 -04:00
Benny Halevy	328efbab0f	nfsd4: use list_move in move_to_confirmed rather than list_del_init, list_add Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-05-11 21:02:01 -04:00
Benny Halevy	be1fdf6c43	nfsd4: fold release_session into expire_client and grab the client lock once for all the client's sessions. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-05-11 21:02:01 -04:00
Benny Halevy	9089f1b478	nfsd4: rename sessionid_lock to client_lock In preparation to share the lock's scope to both client and session hash tables. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-05-11 21:02:01 -04:00
Robin Holt	34441427aa	revert "procfs: provide stack information for threads" and its fixup commits Originally, commit `d899bf7b` ("procfs: provide stack information for threads") attempted to introduce a new feature for showing where the threadstack was located and how many pages are being utilized by the stack. Commit `c44972f1` ("procfs: disable per-task stack usage on NOMMU") was applied to fix the NO_MMU case. Commit `89240ba0` ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") was applied to fix a bug in ia32 executables being loaded. Commit `9ebd4eba7` ("procfs: fix /proc/<pid>/stat stack pointer for kernel threads") was applied to fix a bug which had kernel threads printing a userland stack address. Commit `1306d603f` ('proc: partially revert "procfs: provide stack information for threads"') was then applied to revert the stack pages being used to solve a significant performance regression. This patch nearly undoes the effect of all these patches. The reason for reverting these is it provides an unusable value in field 28. For x86_64, a fork will result in the task->stack_start value being updated to the current user top of stack and not the stack start address. This unpredictability of the stack_start value makes it worthless. That includes the intended use of showing how much stack space a thread has. Other architectures will get different values. As an example, ia64 gets 0. The do_fork() and copy_process() functions appear to treat the stack_start and stack_size parameters as architecture specific. I only partially reverted `c44972f1` ("procfs: disable per-task stack usage on NOMMU") . If I had completely reverted it, I would have had to change mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is configured. Since I could not test the builds without significant effort, I decided to not change mm/Makefile. I only partially reverted `89240ba0` ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") . I left the KSTK_ESP() change in place as that seemed worthwhile. Signed-off-by: Robin Holt <holt@sgi.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-11 17:33:41 -07:00
Sage Weil	45c6ceb547	ceph: zero unused message header, footer fields We shouldn't leak any prior memory contents to other parties. And random data, particularly in the 'version' field, can cause problems down the line. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 15:17:40 -07:00
Jeff Layton	3d69438031	cifs: guard against hardlinking directories When we made serverino the default, we trusted that the field sent by the server in the "uniqueid" field was actually unique. It turns out that it isn't reliably so. Samba, in particular, will just put the st_ino in the uniqueid field when unix extensions are enabled. When a share spans multiple filesystems, it's quite possible that there will be collisions. This is a server bug, but when the inodes in question are a directory (as is often the case) and there is a collision with the root inode of the mount, the result is a kernel panic on umount. Fix this by checking explicitly for directory inodes with the same uniqueid. If that is the case, then we can assume that using server inode numbers will be a problem and that they should be disabled. Fixes Samba bugzilla 7407 Signed-off-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-05-11 20:57:50 +00:00
David Howells	c61ea31dac	CacheFiles: Fix occasional EIO on call to vfs_unlink() Fix an occasional EIO returned by a call to vfs_unlink(): [ 4868.465413] CacheFiles: I/O Error: Unlink failed [ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error [ 4947.320011] CacheFiles: File cache on md3 unregistering [ 4947.320041] FS-Cache: Withdrawing cache "mycache" [ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles) [ 5127.348716] CacheFiles: File cache on md3 registered [ 7076.871081] CacheFiles: I/O Error: Unlink failed [ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error [ 7116.780891] CacheFiles: File cache on md3 unregistering [ 7116.780937] FS-Cache: Withdrawing cache "mycache" [ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles) [ 7296.813432] CacheFiles: File cache on md3 registered What happens is this: (1) A cached NFS file is seen to have become out of date, so NFS retires the object and immediately acquires a new object with the same key. (2) Retirement of the old object is done asynchronously - so the lookup/create to generate the new object may be done first. This can be a problem as the old object and the new object must exist at the same point in the backing filesystem (i.e. they must have the same pathname). (3) The lookup for the new object sees that a backing file already exists, checks to see whether it is valid and sees that it isn't. It then deletes that file and creates a new one on disk. (4) The retirement phase for the old file is then performed. It tries to delete the dentry it has, but ext4_unlink() returns -EIO because the inode attached to that dentry no longer matches the inode number associated with the filename in the parent directory. The trace below shows this quite well. [md5sum] ==> __fscache_relinquish_cookie(ffff88002d12fb58{NFS.fh,ffff88002ce62100},1) [md5sum] ==> __fscache_acquire_cookie({NFS.server},{NFS.fh},ffff88002ce62100) NFS has retired the old cookie and asked for a new one. [kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_ACTIVE,24}) [kslowd] <== fscache_object_state_machine() [->OBJECT_DYING] [kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_INIT,0}) [kslowd] <== fscache_object_state_machine() [->OBJECT_LOOKING_UP] [kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_DYING,24}) [kslowd] <== fscache_object_state_machine() [->OBJECT_RECYCLING] The old object (OBJ52) is going through the terminal states to get rid of it, whilst the new object - (OBJ53) - is coming into being. [kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_LOOKING_UP,0}) [kslowd] ==> cachefiles_walk_to_object({ffff88003029d8b8},OBJ53,@68,) [kslowd] lookup '@68' [kslowd] next -> ffff88002ce41bd0 positive [kslowd] advance [kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA' [kslowd] next -> ffff8800369faac8 positive The new object has looked up the subdir in which the file would be in (getting dentry ffff88002ce41bd0) and then looked up the file itself (getting dentry ffff8800369faac8). [kslowd] validate 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA' [kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA') [kslowd] remove ffff8800369faac8 from ffff88002ce41bd0 [kslowd] unlink stale object [kslowd] <== cachefiles_bury_object() = 0 It then checks the file's xattrs to see if it's valid. NFS says that the auxiliary data indicate the file is out of date (obvious to us - that's why NFS ditched the old version and got a new one). CacheFiles then deletes the old file (dentry ffff8800369faac8). [kslowd] redo lookup [kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA' [kslowd] next -> ffff88002cd94288 negative [kslowd] create -> ffff88002cd94288{ffff88002cdaf238{ino=148247}} CacheFiles then redoes the lookup and gets a negative result in a new dentry (ffff88002cd94288) which it then creates a file for. [kslowd] ==> cachefiles_mark_object_active(,OBJ53) [kslowd] <== cachefiles_mark_object_active() = 0 [kslowd] === OBTAINED_OBJECT === [kslowd] <== cachefiles_walk_to_object() = 0 [148247] [kslowd] <== fscache_object_state_machine() [->OBJECT_AVAILABLE] The new object is then marked active and the state machine moves to the available state - at which point NFS can start filling the object. [kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_RECYCLING,20}) [kslowd] ==> fscache_release_object() [kslowd] ==> cachefiles_drop_object({OBJ52,2}) [kslowd] ==> cachefiles_delete_object(,OBJ52{ffff8800369faac8}) The old object, meanwhile, goes on with being retired. If allocation occurs first, cachefiles_delete_object() has to wait for dir->d_inode->i_mutex to become available before it can continue. [kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA') [kslowd] remove ffff8800369faac8 from ffff88002ce41bd0 [kslowd] unlink stale object EXT4-fs warning (device sda6): ext4_unlink: Inode number mismatch in unlink (148247!=148193) CacheFiles: I/O Error: Unlink failed FS-Cache: Cache cachefiles stopped due to I/O error CacheFiles then tries to delete the file for the old object, but the dentry it has (ffff8800369faac8) no longer points to a valid inode for that directory entry, and so ext4_unlink() returns -EIO when de->inode does not match i_ino. [kslowd] <== cachefiles_bury_object() = -5 [kslowd] <== cachefiles_delete_object() = -5 [kslowd] <== fscache_object_state_machine() [->OBJECT_DEAD] [kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_AVAILABLE,0}) [kslowd] <== fscache_object_state_machine() [->OBJECT_ACTIVE] (Note that the above trace includes extra information beyond that produced by the upstream code). The fix is to note when an object that is being retired has had its object deleted preemptively by a replacement object that is being created, and to skip the second removal attempt in such a case. Reported-by: Greg M <gregm@servu.net.au> Reported-by: Mark Moseley <moseleymark@gmail.com> Reported-by: Romain DEGEZ <romain.degez@smartjog.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-11 10:07:53 -07:00
Sage Weil	9abf82b8bc	ceph: fix locking for waking session requests after reconnect The session->s_waiting list is protected by mdsc->mutex, not s_mutex. This was causing (rare) s_waiting list corruption. Fix errors paths too, while we're here. A more thorough cleanup of this function is coming soon. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 09:53:57 -07:00
Sage Weil	d85b705663	ceph: resubmit requests on pg mapping change (not just primary change) OSD requests need to be resubmitted on any pg mapping change, not just when the pg primary changes. Resending only when the primary changes results in occasional 'hung' requests during osd cluster recovery or rebalancing. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 09:53:56 -07:00
Sage Weil	04d000eb35	ceph: fix open file counting on snapped inodes when mds returns no caps It's possible the MDS will not issue caps on a snapped inode, in which case an open request may not __ceph_get_fmode(), botching the open file counting. (This is actually a server bug, but the client shouldn't BUG out in this case.) Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 09:53:55 -07:00
Sage Weil	0ceed5db32	ceph: unregister osd request on failure The osd request wasn't being unregistered when the osd returned a failure code, even though the result was returned to the caller. This would cause it to eventually time out, and then crash the kernel when it tried to resend the request using a stale page vector. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 09:53:18 -07:00
Changli Gao	a93d2f1744	sched, wait: Use wrapper functions epoll should not touch flags in wait_queue_t. This patch introduces a new function __add_wait_queue_exclusive(), for the users, who use wait queue as a LIFO queue. __add_wait_queue_tail_exclusive() is introduced too instead of add_wait_queue_exclusive_locked(). remove_wait_queue_locked() is removed, as it is a duplicate of __remove_wait_queue(), disliked by users, and with less users. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: <containers@lists.linux-foundation.org> LKML-Reference: <1273214006-2979-1-git-send-email-xiaosuo@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-11 17:43:58 +02:00
Suresh Jayaraman	fdb3603800	cifs: propagate cifs_new_fileinfo() error back to the caller ..otherwise memory allocation errors go undetected. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-05-11 15:42:21 +00:00
Bill Pemberton	8ac97b74bc	jbd2: use NULL instead of 0 when pointer is needed Fixes sparse warning: fs/jbd2/journal.c:1892:9: warning: Using plain integer as NULL pointer Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> CC: linux-ext4@vger.kernel.org Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-05-11 10:04:36 +02:00
Jiri Slaby	6a727b43be	FS / libfs: Implement simple_write_to_buffer It will be used in suspend code and serves as an easy wrap around copy_from_user. Similar to simple_read_from_buffer, it takes care of transfers with proper lengths depending on available and count parameters and advances ppos appropriately. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2010-05-10 23:08:17 +02:00
Joel Becker	547ba7c8ef	ocfs2: Block signals for mkdir/link/symlink/O_CREAT. Once file or link creation gets going, it can't be interrupted by a signal. They're not idempotent. This blocks signals in ocfs2_mknod(), ocfs2_link(), and ocfs2_symlink() once we start actually changing things. ocfs2_mknod() covers mknod(), creat(), mkdir(), and open(O_CREAT). Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-10 11:56:52 -07:00
Joel Becker	e4b963f10e	ocfs2: Wrap signal blocking in void functions. ocfs2 sometimes needs to block signals around dlm operations, but it currently does it with sigprocmask(). Even worse, it's checking the error code of sigprocmask(). The in-kernel sigprocmask() can only error if you get the SIG_* argument wrong. We don't. Wrap the sigprocmask() calls with ocfs2_[un]block_signals(). These functions are void, but they will BUG() if somehow sigprocmask() returns an error. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-10 11:50:10 -07:00
Suresh Jayaraman	fae683f764	cifs: add comments explaining cifs_new_fileinfo behavior The comments make it clear the otherwise subtle behavior of cifs_new_fileinfo(). Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com> -- fs/cifs/dir.c \| 18 ++++++++++++++++-- 1 files changed, 16 insertions(+), 2 deletions(-) Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-05-10 17:59:51 +00:00
Ian Kent	f7422464b5	autofs4-2.6.34-rc1 - fix link_count usage After commit `1f36f774b2` ("Switch !O_CREAT case to use of do_last()") in 2.6.34-rc1 autofs direct mounts stopped working. This is caused by current->link_count being 0 when ->follow_link() is called from do_filp_open(). I can't work out why this hasn't been seen before Als patch series. This patch removes the autofs dependence on current->link_count. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-10 09:48:10 -07:00
Suresh Jayaraman	51c8176472	cifs: remove unused parameter from cifs_posix_open_inode_helper() ..a left over from the commit `3321b791b2`. Cc: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-05-10 13:52:00 +00:00
David Woodhouse	0ae28a35bc	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/mtd/mtdcore.c Pull in the bdi fixes and ARM platform changes that other outstanding patches depend on.	2010-05-10 14:32:46 +01:00
Abhijith Das	7e619bc3e6	GFS2: Fix writing to non-page aligned gfs2_quota structures This is the upstream fix for this bug. This patch differs from the RHEL5 fix (Red Hat bz #555754) which simply writes to the 8-byte value field of the quota. In upstream quota code, we're required to write the entire quota (88 bytes) which can be split across a page boundary. We check for such quotas, and read/write the two parts from/to the corresponding pages holding these parts. With this patch, I don't see the bug anymore using the reproducer in Red Hat bz 555754. I successfully ran a couple of simple tests/mounts/ umounts and it doesn't seem like this patch breaks anything else. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-05-10 13:07:11 +01:00
Anand Gadiyar	a8cd4561ea	fix "seperate" typos in comments s/seperate/separate Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-05-10 11:56:30 +02:00
Ryusuke Konishi	d240e06713	nilfs2: disallow remount of snapshot from/to a regular mount Snapshots and regular ro/rw mounts are essentially-different within the meaning whether the checkpoint is static or not and is marked with a snapshot flag or not. The current implemenation, however, allows to remount a snapshot to a regular rw-mount if the checkpoint number equals the latest one. This transition is actually impossible since changing a checkpoint to a snapshot makes another checkpoint, thus the condition is never satisfied. This fixes the weird state of affairs, and specifically separates snapshots and regular rw/ro-mounts. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:34 +09:00
Ryusuke Konishi	cdce214e39	nilfs2: use huge_encode_dev/huge_decode_dev This replaces uses of new_encode_dev/new_decode_dev with their 64-bit counterparts, huge_encode_dev/huge_decode_dev respectively. This is just for clarification and has no impact on the disk format. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:34 +09:00
Ryusuke Konishi	b87ca91948	nilfs2: update comment on deactivate_super at nilfs_get_sb deactivate_super was replaced with deactivate_locked_super, but the comment of nilfs_get_sb remain unchanged. This renews the comment. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:34 +09:00
Ryusuke Konishi	e2d1591a13	nilfs2: replace MS_VERBOSE with MS_SILENT MS_VERBOSE is deprecated. This replaces it with MS_SILENT in reference to get_sb_bdev function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:34 +09:00
Ryusuke Konishi	4571b82cdc	nilfs2: add missing initialization of s_mode An fmode_t argument is passed to kill_block_super() through s_mode member of the super_block structure. This is used to release the block device with the same mode, however, nilfs does not set s_mode anywhere. This modifies nilfs_get_sb function to properly initialize the s_mode member. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:33 +09:00
Ryusuke Konishi	13e905592b	nilfs2: fix misuse of open_bdev_exclusive/close_bdev_exclusive The second argument of open_bdev_exclusive/close_bdev_exclusive takes fmode_t flags instead of mount flags. This fixes the misuse. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:33 +09:00
Ryusuke Konishi	25294d8c37	nilfs2: use checkpoint number instead of timestamp to select super block Nilfs maintains two super blocks, and selects the new one on mount time if they both have valid checksums and their timestamps differ. However, this has potential for mis-selection since the system clock may be rewinded and the resolution of the timestamps is not high. Usually this doesn't become an issue because both super blocks are updated at the same time when the file system is unmounted. Even if the file system wasn't unmounted cleanly, the roll-forward recovery will find the proper log which stores the latest super root. Thus, the issue can appear only if update of one super block fails and the clock happens to be rewinded. This fixes the issue by using checkpoint numbers instead of timestamps to pick the super block storing the location of the latest log. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:32 +09:00
Ryusuke Konishi	34cb9b5c97	nilfs2: add missing endian conversion on super block magic number This adds missing endian conversions in comparision of the magic number of super blocks. It was coincidence that prior versions didn't incur problems; the upper byte of the magic number happened to be equal to the lower byte. But, semantically it's wrong to depend on this. This won't change anything else nor suffer any compatibility issues. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:32 +09:00
Ryusuke Konishi	4e819509cb	nilfs2: make nilfs_sc_*_ops static This kills the following sparse warnings: fs/nilfs2/segment.c:567:28: warning: symbol 'nilfs_sc_file_ops' was not declared. Should it be static? fs/nilfs2/segment.c:617:28: warning: symbol 'nilfs_sc_dat_ops' was not declared. Should it be static? fs/nilfs2/segment.c:625:28: warning: symbol 'nilfs_sc_dsync_ops' was not declared. Should it be static? Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:31 +09:00
Ryusuke Konishi	db55d92252	nilfs2: add kernel doc comments to persistent object allocator functions The implementation of persistent object allocator (alloc.c) is poorly documented. This adds kernel doc style comments on that functions. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:31 +09:00
Li Hong	fdce895ea5	nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info In nilfs_segctor_thread(), timer is a local variable allocated on stack. Its address can't be set to sci->sc_timer and passed in several procedures. It works now by chance, just because other procedures are called by nilfs_segctor_thread() directly or indirectly and the stack hasn't been deallocated yet. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:31 +09:00
Li Hong	154ac5a830	nilfs2: remove nilfs_segctor_init() in segment.c There are only two lines of code in nilfs_segctor_init(). From a logic design view, the first line 'sci->sc_seq_done = sci->sc_seq_request;' should be put in nilfs_segctor_new(). Even in nilfs_segctor_new(), this initialization is needless because sci is kzalloc-ed. So nilfs_segctor_init() is only a wrap call to nilfs_segctor_start_thread(). Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:31 +09:00
Ryusuke Konishi	50614bcf29	nilfs2: insert checkpoint number in segment summary header This adds a field to record the latest checkpoint number in the nilfs_segment_summary structure. This will help to recover the latest checkpoint number from logs on disk. This field is intended for crucial cases in which super blocks have lost pointer to the latest log. Even though this will change the disk format, both backward and forward compatibility is preserved by a size field prepared in the segment summary header. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:31 +09:00
Li Hong	9f130263f3	nilfs2: add a print message after loading nilfs2 Printing a message after loading a file system is a practice. Add this to provide a better user-friendly experience. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:30 +09:00
Li Hong	41c88bd74d	nilfs2: cleanup multi kmem_cache_{create,destroy} code This cleanup patch gives several improvements: - Moving all kmem_cache_{create_destroy} calls into one place, which removes some small function calls, cleans up error check code and clarify the logic. - Mark all initial code in __init section. - Remove some very obvious comments. - Adjust some declarations. - Fix some space-tab issues. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:30 +09:00
Ryusuke Konishi	aaed1d5bfa	nilfs2: move out checksum routines to segment buffer code This moves out checksum routines in log writer to segbuf.c for cleanup. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:30 +09:00
Ryusuke Konishi	1e2b68bf28	nilfs2: move pointer to super root block into logs This moves a pointer to buffer storing super root block to each log buffer from nilfs_sc_info struct for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:30 +09:00
Ryusuke Konishi	277a6a3417	nilfs2: change default of 'errors' mount option to 'remount-ro' mode Like ext3, nilfs has 'errors' mount option to allow specifying desired behavior on severe errors. Currently, the default action is 'errors=continue' and has potential to advance filesystem corruption for severe errors. This will change the action to 'errors=remount-ro' to avoid the issue. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:30 +09:00
Li Hong	73bb48869b	nilfs2: Combine nilfs_btree_release_path() and nilfs_btree_free_path() nilfs_btree_release_path() and nilfs_btree_free_path() are bound into each other tightly. Make them into one procedure to clearify the logic and avoid some misusages. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:29 +09:00
Li Hong	f905440f5e	nilfs2: Combine nilfs_btree_alloc_path() and nilfs_btree_init_path() nilfs_btree_alloc_path() and nilfs_btree_init_path() are bound into each other tightly. Make them into one procedure to clearify the logic and avoid some misusages. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:29 +09:00
J. Bruce Fields	5d4cec2f2f	nfsd4: fix bare destroy_session null dereference It's legal to send a DESTROY_SESSION outside any session (as the only operation in a compound), in which case cstate->session will be NULL; check for that case. While we're at it, move these checks into a separate helper function. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-05-07 19:08:47 -04:00
Linus Torvalds	9167746716	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Fix RCU issues in the NFSv4 delegation code NFSv4: Fix the locking in nfs_inode_reclaim_delegation()	2010-05-07 13:59:48 -07:00
Joern Engel	6f485b4187	logfs: handle powerfail on NAND flash The write buffer may not have been written and may no longer be written due to an interrupted write in the affected page. Signed-off-by: Joern Engel <joern@logfs.org>	2010-05-07 19:38:40 +02:00
Dan Carpenter	ccf31c10f1	logfs: handle errors from get_mtd_device() The get_mtd_device() function returns error pointers on failure and if we don't handle it, it leads to a crash. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Joern Engel <joern@logfs.org>	2010-05-07 14:16:09 +02:00
Joern Engel	58e323cf5e	logfs: remove unused variable Signed-off-by: Joern Engel <joern@logfs.org>	2010-05-07 14:15:04 +02:00
Steven Whitehouse	913a71d250	GFS2: Add some useful messages The following patch adds a message to indicate when barriers have been disabled due to a block device which doesn't support them. You could already tell this via the mount options in /proc/mounts, but all the other filesystems also log a message at the same time. Also, the same mechanisms are used to indicate when the lock demote interface has been used (only ever used for debugging) which is a request from our support team. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-05-06 11:03:29 +01:00
Sage Weil	54ad023ba8	ceph: don't use writeback_control in writepages completion The ->writepages writeback_control is not still valid in the writepages completion. We were touching it solely to adjust pages_skipped when there was a writeback error (EIO, ENOSPC, EPERM due to bad osd credentials), causing an oops in the writeback code shortly thereafter. Updating pages_skipped on error isn't correct anyway, so let's just rip out this (clearly broken) code to pass the wbc to the completion. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-05 21:31:40 -07:00
Sunil Mushran	0467ae954d	ocfs2/dlm: Increase o2dlm lockres hash size Lockres hash size of 16KB is far too small for large filesystems (where we have hundreds of thousands of lock resources stored in the table). This patch increases it to 128KB. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:20:01 -07:00
Tao Ma	c901fb0073	ocfs2: Make ocfs2_extend_trans() really extend. In ocfs2, we use ocfs2_extend_trans() to extend a journal handle's blocks. But if jbd2_journal_extend() fails, it will only restart with the the new number of blocks. This tends to be awkward since in most cases we want additional reserved blocks. It makes our code harder to mantain since the caller can't be sure all the original blocks will not be accessed and dirtied again. There are 15 callers of ocfs2_extend_trans() in fs/ocfs2, and 12 of them have to add h_buffer_credits before they call ocfs2_extend_trans(). This makes ocfs2_extend_trans() really extend atop the original block count. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:09 -07:00
Tao Ma	3e4218df31	ocfs2/trivial: Code cleanup for allocation reservation. Two tiny cleanup for allocation reservation. 1. Remove some extra codes in ocfs2_local_alloc_find_clear_bits. 2. Remove an unuseful variables in ocfs2_find_resv_lhs. Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:09 -07:00
Tao Ma	b065556a7d	ocfs2: make ocfs2_adjust_resv_from_alloc simple. When we allocate some bits from the reservation, we always allocate from the r_start(see ocfs2_resmap_resv_bits). So there should be no reason to check between r_start and start. And I don't think we will change this behaviour later by allocating from some bits after r_start. Why not make ocfs2_adjust_resv_from_alloc simple for now? The only chance we have to adjust the reservation is when we haven't reached the end. With this patch, the function is more readable. Note: btw, this patch also fixes an original bug in the function which I haven't found before. if (end < ocfs2_resv_end(resv)) rhs = end - ocfs2_resv_end(resv); This code is of course buggy. ;) Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:09 -07:00
Sunil Mushran	4b37fcb7d4	ocfs2: Make nointr a default mount option OCFS2 has never really supported intr. This patch acknowledges this reality and makes nointr the default mount option. In a later patch, we intend to support intr. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:08 -07:00
Sunil Mushran	5c80d4c9e5	ocfs2/dlm: Make o2dlm domain join/leave messages KERN_NOTICE o2dlm join and leave messages are more than informational as they are required for debugging locking issues. This patch changes them from KERN_INFO to KERN_NOTICE. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:08 -07:00
Srinivas Eeda	23fd9abdc8	o2net: log socket state changes This patch logs socket state changes that lead to socket shutdown. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:08 -07:00
Wengang Wang	a5196ec5ef	ocfs2: print node # when tcp fails Print the node number of a peer node if sending it a message failed. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:08 -07:00
Mark Fasheh	83f92318fa	ocfs2: Add dir_resv_level mount option The default behavior for directory reservations stays the same, but we add a mount option so people can tweak the size of directory reservations according to their workloads. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:07 -07:00
Mark Fasheh	b07f8f24df	ocfs2: change default reservation window sizes The default reservation size of 4 (32-bit windows) is a bit too ambitious. Scale it back to 16 bits (resv_level=2). I have been testing various sizes on a 4-node cluster which runs a mixed workload that is heavily threaded. With a 256MB local alloc, I get roughly the following levels of average file fragmentation: resv_level=0 70% resv_level=1 21% resv_level=2 23% resv_level=3 24% resv_level=4 60% resv_level=5 did not test resv_level=6 60% resv_level=2 seemed like a good compromise between not letting windows be too small, but not so big that heavier workloads will immediately suffer without tuning. This patch also change the behavior of directory reservations - they now track file reservations. The previous compromise of giving directory windows only 8 bits wound up fragmenting more at some window sizes because file allocations had smaller unused windows to poach from. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:07 -07:00
Mark Fasheh	6b82021b9e	ocfs2: increase the default size of local alloc windows I have observed that the current size of 8M gives us pretty poor fragmentation on multi-threaded workloads which do lots of writes. Generally, I can increase the size of local alloc windows and observe a marked decrease in fragmentation, even up and beyond window sizes of 512 megabytes. This makes sense for a couple reasons - larger local alloc means more room for reservation windows. On multi-node workloads the larger local alloc helps as well because we don't have to do window slides as often. Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no longer used and the comment above it was out of date. To test fragmentation, I used a workload which launched 4 threads that did 4k writes into a series of about 140 alternating files. With resv_level=2, and a 4k/4k file system I observed the following average fragmentation for various localalloc= parameters: localalloc= avg. fragmentation 8 48 32 16 64 10 120 7 On larger cluster sizes, the difference is more dramatic. The new default size top out at 256M, which we'll only get for cluster sizes of 32K and above. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:07 -07:00
Mark Fasheh	73c8a80003	ocfs2: clean up localalloc mount option size parsing This patch pulls the local alloc sizing code into localalloc.c and provides a callout to it from ocfs2_fill_super(). Behavior is essentially unchanged except that I correctly calculate the maximum local alloc size. The old code in ocfs2_parse_options() calculated the max size as: ocfs2_local_alloc_size(sb) * 8 which is correct, in bits. Unfortunately though the option passed in is in megabytes. Ultimately, this bug made no real difference - the shrink code would catch a too-large size and bring it down to something reasonable. Still, it's less than efficient as-is. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:06 -07:00
Mark Fasheh	a57c8fd2ad	ocfs2: remove ocfs2_local_alloc_in_range() Inodes are always allocated from the global bitmap now so we don't need this any more. Also, the existing implementation bounces reservations around needlessly. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-05-05 18:17:31 -07:00
Mark Fasheh	33d5d380d6	ocfs2: allocate btree internal block groups from the global bitmap Otherwise, the need for a very large contiguous allocation tends to wreak havoc on many inode allocation reservations on the local alloc, thus ruining any chances for contiguousness. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-05-05 18:17:31 -07:00
Mark Fasheh	e3b4a97dbe	ocfs2: use allocation reservations for directory data Use the reservations system for unindexed dir tree allocations. We don't bother with the indexed tree as reads from it are mostly random anyway. Directory reservations are marked seperately, to allow the reservations code a chance to optimize their window sizes. This patch allocates only 8 bits for directory windows as they generally are not expected to grow as quickly as file data. Future improvements to dir window sizing can trivially be made. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-05-05 18:17:30 -07:00
Mark Fasheh	4fe370afaa	ocfs2: use allocation reservations during file write Add a per-inode reservations structure and pass it through to the reservations code. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-05-05 18:17:30 -07:00
Mark Fasheh	d02f00cc05	ocfs2: allocation reservations This patch improves Ocfs2 allocation policy by allowing an inode to reserve a portion of the local alloc bitmap for itself. The reserved portion (allocation window) is advisory in that other allocation windows might steal it if the local alloc bitmap becomes full. Otherwise, the reservations are honored and guaranteed to be free. When the local alloc window is moved to a different portion of the bitmap, existing reservations are discarded. Reservation windows are represented internally by a red-black tree. Within that tree, each node represents the reservation window of one inode. An LRU of active reservations is also maintained. When new data is written, we allocate it from the inodes window. When all bits in a window are exhausted, we allocate a new one as close to the previous one as possible. Should we not find free space, an existing reservation is pulled off the LRU and cannibalized. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-05-05 18:17:30 -07:00
Joel Becker	ec20cec7a3	ocfs2: Make ocfs2_journal_dirty() void. jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0 since before the kernel moved to git. There is no point in checking this error. ocfs2_journal_dirty() has been faithfully returning the status since the beginning. All over ocfs2, we have blocks of code checking this can't fail status. In the past few years, we've tried to avoid adding these checks, because they are pointless. But anyone who looks at our code assumes they are needed. Finally, ocfs2_journal_dirty() is made a void function. All error checking is removed from other files. We'll BUG_ON() the status of jbd2_journal_dirty_metadata() just in case they change it someday. They won't. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:17:29 -07:00
James Morris	0ffbe2699c	Merge branch 'master' into next	2010-05-06 10:56:07 +10:00
Steve French	bdfae149c5	[CIFS] Remove unused cifs_oplock_cachep CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-05-06 00:38:16 +00:00
Jeff Layton	26efa0bac9	cifs: have decode_negTokenInit set flags in server struct ...rather than the secType. This allows us to get rid of the MSKerberos securityEnum. The client just makes a decision at upcall time. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-05-05 23:24:11 +00:00
Jeff Layton	198b568278	cifs: break negotiate protocol calls out of cifs_setup_session So that we can reasonably set up the secType based on both the NegotiateProtocol response and the parsed mount options. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-05-05 23:18:27 +00:00
Joern Engel	c0c79c31c9	logfs: fix sync Rather self-explanatory. Signed-off-by: Joern Engel <joern@logfs.org>	2010-05-05 22:33:36 +02:00
Joern Engel	bba0b5c2c2	logfs: fix compile failure When CONFIG_BLOCK is not enabled: fs/logfs/super.c:142: error: implicit declaration of function 'bdev_get_queue' fs/logfs/super.c:142: error: invalid type argument of '->' (have 'int') Found by Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Joern Engel <joern@logfs.org>	2010-05-05 22:32:52 +02:00
John Kacur	2f07a88b30	udf: BKL ioctl pushdown Convert udf_ioctl to an unlocked_ioctl and push the BKL down into it. Signed-off-by: John Kacur <jkacur@redhat.com Signed-off-by: Jan Kara <jack@suse.cz>	2010-05-05 16:36:17 +02:00
Christoph Hellwig	ad6bb90f34	GFS2: fix quota state reporting We need to report both the accounting and enforcing flags if we are in enforcing mode. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-05-05 09:39:55 +01:00
Benjamin Marzinski	5e687eac1b	GFS2: Various gfs2_logd improvements This patch contains various tweaks to how log flushes and active item writeback work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reseve now waits for gfs2_logd to do the log flushing. Multiple functions were rewritten to remove the need to call gfs2_log_lock(). Instead of using one test to see if gfs2_logd had work to do, there are now seperate tests to check if there are two many buffers in the incore log or if there are two many items on the active items list. This patch is a port of a patch Steve Whitehouse wrote about a year ago, with some minor changes. Since gfs2_ail1_start always submits all the active items, it no longer needs to keep track of the first ai submitted, so this has been removed. In gfs2_log_reserve(), the order of the calls to prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has been switched. If it called wake_up first there was a small window for a race, where logd could run and return before gfs2_log_reserve was ready to get woken up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve() would be left waiting for gfs2_logd to eventualy run because it timed out. Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times out, and flushes the log, can now be set on mount with ar_commit. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-05-05 09:39:18 +01:00
Linus Torvalds	7572e56314	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Avoid a gcc warning in ocfs2_wipe_inode(). ocfs2: Avoid direct write if we fall back to buffered I/O ocfs2_dlmfs: Fix math error when reading LVB. ocfs2: Update VFS inode's id info after reflink. ocfs2: potential ERR_PTR dereference on error paths ocfs2: Add directory entry later in ocfs2_symlink() and ocfs2_mknod() ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod error path ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_symlink error path ocfs2: add OCFS2_INODE_SKIP_ORPHAN_DIR flag and honor it in the inode wipe code ocfs2: Reset status if we want to restart file extension. ocfs2: Compute metaecc for superblocks during online resize. ocfs2: Check the owner of a lockres inside the spinlock ocfs2: one more warning fix in ocfs2_file_aio_write(), v2 ocfs2_dlmfs: User DLM_* when decoding file open flags.	2010-05-04 16:33:18 -07:00
Sage Weil	5dfc589a84	ceph: unregister bdi before kill_anon_super releases device name Unregister and destroy the bdi in put_super, after mount is r/o, but before put_anon_super releases the device name. For symmetry, bdi_destroy in destroy_client (we bdi_init in create_client). Only set s_bdi if bdi_register succeeds, since we use it to decide whether to bdi_unregister. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-04 16:14:46 -07:00
Prasad Joshi	24797535e1	logfs: initialize li->li_refcount li_refcount was not re-initialized in function logfs_init_inode(), small patch that will fix the problem Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com> Signed-off-by: Joern Engel <joern@logfs.org>	2010-05-04 22:17:08 +02:00
Joern Engel	05ebad8529	logfs: commit reservations under space pressure Ensures we only return -ENOSPC when there really is no space. Signed-off-by: Joern Engel <joern@logfs.org>	2010-05-04 19:41:09 +02:00
Joern Engel	20503664b0	logfs: survive logfs_buf_recover read errors Refusing to mount beats a kernel crash. Signed-off-by: Joern Engel <joern@logfs.org>	2010-05-04 19:37:04 +02:00
J. Bruce Fields	5306293c9c	Merge commit 'v2.6.34-rc6' Conflicts: fs/nfsd/nfs4callback.c	2010-05-04 11:29:05 -04:00
Benny Halevy	dbd65a7e44	nfsd4: use local variable in nfs4svc_encode_compoundres 'cs' is already computed, re-use it. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-05-04 10:10:36 -04:00
Joel Becker	d577632e65	ocfs2: Avoid a gcc warning in ocfs2_wipe_inode(). gcc warns that a variable is uninitialized. It's actually handled, but an early return fools gcc. Let's just initialize the variable to a garbage value that will crash if the usage is ever broken. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-03 19:15:49 -07:00
Linus Torvalds	d93ac51c7a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: remove bad auth_x kmem_cache ceph: fix lockless caps check ceph: clear dir complete, invalidate dentry on replayed rename ceph: fix direct io truncate offset ceph: discard incoming messages with bad seq # ceph: fix seq counting for skipped messages ceph: add missing #includes ceph: fix leaked spinlock during mds reconnect ceph: print more useful version info on module load ceph: fix snap realm splits ceph: clear dir complete on d_move	2010-05-03 16:36:19 -07:00
Sage Weil	b0930f8d38	ceph: remove bad auth_x kmem_cache It's useless, since our allocations are already a power of 2. And it was allocated per-instance (not globally), which caused a name collision when we tried to mount a second file system with auth_x enabled. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-03 10:49:25 -07:00
Sage Weil	7ff899da02	ceph: fix lockless caps check The __ variant requires caller to hold i_lock. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-03 10:49:25 -07:00
Sage Weil	ea1409f961	ceph: clear dir complete, invalidate dentry on replayed rename If a rename operation is resent to the MDS following an MDS restart, the client does not get a full reply (containing the resulting metadata) back. In that case, a ceph_rename() needs to compensate by doing anything useful that fill_inode() would have, like d_move(). It also needs to invalidate the dentry (to workaround the vfs_rename_dir() bug) and clear the dir complete flag, just like fill_trace(). Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-03 10:49:25 -07:00
Sage Weil	5c6a2cdb4f	ceph: fix direct io truncate offset truncate_inode_pages_range wants the end offset to align with the last byte in a page. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-03 10:49:25 -07:00
Sage Weil	ae18756b9f	ceph: discard incoming messages with bad seq # We can get old message seq #'s after a tcp reconnect for stateful sessions (i.e., the MDS). If we get a higher seq #, that is an error, and we shouldn't see any bad seq #'s for stateless (mon, osd) connections. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-03 10:49:24 -07:00
Sage Weil	684be25c52	ceph: fix seq counting for skipped messages Increment in_seq even when the message is skipped for some reason. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-03 10:49:24 -07:00
Sage Weil	d45d0d970f	ceph: add missing #includes Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-03 10:49:24 -07:00
Sage Weil	0b0c06d147	ceph: fix leaked spinlock during mds reconnect Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-03 10:49:23 -07:00
Sage Weil	c8f16584ac	ceph: print more useful version info on module load Decouple the client version from the server side. Print relevant protocol and map version info instead. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-03 10:49:23 -07:00
Sage Weil	91dee39eeb	ceph: fix snap realm splits The snap realm split was checking i_snap_realm, not the list_head, to determine if an inode belonged in the new realm. The check always failed, which meant we always moved the inode, corrupting the old realm's list and causing various crashes. Also wait to release old realm reference to avoid possibility of use after free. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-03 10:49:23 -07:00
Sage Weil	c10f5e12ba	ceph: clear dir complete on d_move d_move() reorders the d_subdirs list, breaking the readdir result caching. Unless/until d_move preserves that ordering, clear CEPH_I_COMPLETE on rename. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-03 10:49:22 -07:00
Ryusuke Konishi	973bec34bf	nilfs2: fix sync silent failure As of `32a88aa1`, __sync_filesystem() will return 0 if s_bdi is not set. And nilfs does not set s_bdi anywhere. I noticed this problem by the warning introduced by the recent commit `5129a469` ("Catch filesystem lacking s_bdi"). WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e() Hardware name: PowerEdge 2850 Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38 Call Trace: [<c1028422>] warn_slowpath_common+0x60/0x90 [<c102845f>] warn_slowpath_null+0xd/0x10 [<c1095936>] vfs_kern_mount+0xc5/0x14e [<c1095a03>] do_kern_mount+0x32/0xbd [<c10a811e>] do_mount+0x671/0x6d0 [<c1073794>] ? __get_free_pages+0x1f/0x21 [<c10a684f>] ? copy_mount_options+0x2b/0xe2 [<c107b634>] ? strndup_user+0x48/0x67 [<c10a81de>] sys_mount+0x61/0x8f [<c100280c>] sysenter_do_call+0x12/0x32 This ensures to set s_bdi for nilfs and fixes the sync silent failure. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-03 07:36:01 -07:00
J. Bruce Fields	26c0c75e69	nfsd4: fix unlikely race in session replay case In the replay case, the renew_client(session->se_client); happens after we've droppped the sessionid_lock, and without holding a reference on the session; so there's nothing preventing the session being freed before we get here. Thanks to Benny Halevy for catching a bug in an earlier version of this patch. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Benny Halevy <bhalevy@panasas.com>	2010-05-03 08:32:31 -04:00
David Howells	17d2c0a0c4	NFS: Fix RCU issues in the NFSv4 delegation code Fix a number of RCU issues in the NFSv4 delegation code. (1) delegation->cred doesn't need to be RCU protected as it's essentially an invariant refcounted structure. By the time we get to nfs_free_delegation(), the delegation is being released, so no one else should be attempting to use the saved credentials, and they can be cleared. However, since the list of delegations could still be under traversal at this point by such as nfs_client_return_marked_delegations(), the cred should be released in nfs_do_free_delegation() rather than in nfs_free_delegation(). Simply using rcu_assign_pointer() to clear it is insufficient as that doesn't stop the cred from being destroyed, and nor does calling put_rpccred() after call_rcu(), given that the latter is asynchronous. (2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use rcu_derefence_protected() because they can only be called if nfs_client::cl_lock is held, and that guards against anyone changing nfsi->delegation under it. Furthermore, the barrier imposed by rcu_dereference() is superfluous, given that the spin_lock() is also a barrier. (3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client struct so that it can issue lockdep advice based on clp->cl_lock for (2). (4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation() should use rcu_access_pointer() outside the spinlocked region as they merely examine the pointer and don't follow it, thus rendering unnecessary the need to impose a partial ordering over the one item of interest. These result in an RCU warning like the following: [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by mount.nfs4/2281: #0: (&type->s_umount_key#34){+.+...}, at: [<ffffffff810b25b4>] deactivate_super+0x60/0x80 #1: (iprune_sem){+.+...}, at: [<ffffffff810c332a>] invalidate_inodes+0x39/0x13a stack backtrace: Pid: 2281, comm: mount.nfs4 Not tainted 2.6.34-rc1-cachefs #110 Call Trace: [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffffa00b4591>] nfs_inode_return_delegation_noreclaim+0x5b/0xa0 [nfs] [<ffffffffa0095d63>] nfs4_clear_inode+0x11/0x1e [nfs] [<ffffffff810c2d92>] clear_inode+0x9e/0xf8 [<ffffffff810c3028>] dispose_list+0x67/0x10e [<ffffffff810c340d>] invalidate_inodes+0x11c/0x13a [<ffffffff810b1dc1>] generic_shutdown_super+0x42/0xf4 [<ffffffff810b1ebe>] kill_anon_super+0x11/0x4f [<ffffffffa009893c>] nfs4_kill_super+0x3f/0x72 [nfs] [<ffffffff810b25bc>] deactivate_super+0x68/0x80 [<ffffffff810c6744>] mntput_no_expire+0xbb/0xf8 [<ffffffff810c681b>] release_mounts+0x9a/0xb0 [<ffffffff810c689b>] put_mnt_ns+0x6a/0x79 [<ffffffffa00983a1>] nfs_follow_remote_path+0x5a/0x146 [nfs] [<ffffffffa0098334>] ? nfs_do_root_mount+0x82/0x95 [nfs] [<ffffffffa00985a9>] nfs4_try_mount+0x75/0xaf [nfs] [<ffffffffa0098874>] nfs4_get_sb+0x291/0x31a [nfs] [<ffffffff810b2059>] vfs_kern_mount+0xb8/0x177 [<ffffffff810b2176>] do_kern_mount+0x48/0xe8 [<ffffffff810c810b>] do_mount+0x782/0x7f9 [<ffffffff810c8205>] sys_mount+0x83/0xbe [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b Also on: fs/nfs/delegation.c:215 invoked rcu_dereference_check() without protection! [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffffa00b4223>] nfs_inode_set_delegation+0xfe/0x219 [nfs] [<ffffffffa00a9c6f>] nfs4_opendata_to_nfs4_state+0x2c2/0x30d [nfs] [<ffffffffa00aa15d>] nfs4_do_open+0x2a6/0x3a6 [nfs] ... And: fs/nfs/delegation.c:40 invoked rcu_dereference_check() without protection! [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffffa00b3bef>] nfs_free_delegation+0x3d/0x6e [nfs] [<ffffffffa00b3e71>] nfs_do_return_delegation+0x26/0x30 [nfs] [<ffffffffa00b406a>] __nfs_inode_return_delegation+0x1ef/0x1fe [nfs] [<ffffffffa00b448a>] nfs_client_return_marked_delegations+0xc9/0x124 [nfs] ... Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-05-01 12:37:18 -04:00
Trond Myklebust	8f649c3762	NFSv4: Fix the locking in nfs_inode_reclaim_delegation() Ensure that we correctly rcu-dereference the delegation itself, and that we protect against removal while we're changing the contents. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2010-05-01 12:36:18 -04:00
Joern Engel	ccc0197b02	logfs: Close i_ino reuse race logfs_seek_hole() may return the same offset it is passed as argument. Found by Prasad Joshi <prasadjoshi124@gmail.com> Signed-off-by: Joern Engel <joern@logfs.org>	2010-05-01 18:02:34 +02:00
Joern Engel	bd2b3f2959	logfs: fix logfs_seek_hole() logfs_seek_hole(inode, 0x200) would crap itself if the inode contained just 0x1ff (or fewer) blocks. Signed-off-by: Joern Engel <joern@logfs.org>	2010-05-01 18:02:30 +02:00
Joern Engel	ad342631f1	logfs: Return -EINVAL if filesystem image doesn't match Signed-off-by: Joern Engel <joern@logfs.org>	2010-05-01 18:02:20 +02:00
Li Dongyang	6b933c8e6f	ocfs2: Avoid direct write if we fall back to buffered I/O when we fall back to buffered write from direct write, we call __generic_file_aio_write() but that will end up doing direct write even we are only prepared to do buffered write because the file has the O_DIRECT flag set. This is a fix for https://bugzilla.novell.com/show_bug.cgi?id=591039 revised with Joel's comments. Signed-off-by: Li Dongyang <lidongyang@novell.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-04-30 13:45:13 -07:00
Joel Becker	f9221fd803	Merge branch 'skip_delete_inode' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2-mark into ocfs2-fixes	2010-04-30 13:37:29 -07:00
David Teigland	89d799d008	dlm: fix ast ordering for user locks Commit `7fe2b3190b` fixed possible misordering of completion asts (casts) and blocking asts (basts) for kernel locks. This patch does the same for locks taken by user space applications. Signed-off-by: David Teigland <teigland@redhat.com>	2010-04-30 14:52:51 -05:00
Dan Carpenter	99fb19d49e	dlm: cleanup remove unused code Smatch complains because "lkb" is never NULL. Looking at it, the original code actually adds the new element to the end of the list fine, so we can just get rid of the if condition. This code is four years old and no one has complained so it must work. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>	2010-04-30 14:52:28 -05:00
Ralf Baechle	12b1b32168	Inotify: Fix build failure in inotify user support CONFIG_INOTIFY_USER defined but CONFIG_ANON_INODES undefined will result in the following build failure: LD vmlinux fs/built-in.o: In function 'sys_inotify_init1': (.text.sys_inotify_init1+0x22c): undefined reference to 'anon_inode_getfd' fs/built-in.o: In function `sys_inotify_init1': (.text.sys_inotify_init1+0x22c): relocation truncated to fit: R_MIPS_26 against 'anon_inode_getfd' make[2]: * [vmlinux] Error 1 make[1]: * [sub-make] Error 2 make: *** [all] Error 2 Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-30 10:14:56 -07:00
Linus Torvalds	e97e7120eb	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: add a shrinker to background inode reclaim	2010-04-29 19:49:34 -07:00
Linus Torvalds	fed0a9c644	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: exofs: Fix "add bdi backing to mount session" fall out fs: fs/super.c needs to include backing-dev.h for !CONFIG_BLOCK	2010-04-29 17:18:07 -07:00
Dave Chinner	9bf729c0af	xfs: add a shrinker to background inode reclaim On low memory boxes or those with highmem, kernel can OOM before the background reclaims inodes via xfssyncd. Add a shrinker to run inode reclaim so that it inode reclaim is expedited when memory is low. This is more complex than it needs to be because the VM folk don't want a context added to the shrinker infrastructure. Hence we need to add a global list of XFS mount structures so the shrinker can traverse them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-04-29 16:22:13 -05:00
Boaz Harrosh	3c2023dd8e	exofs: Fix "add bdi backing to mount session" fall out The patch: add bdi backing to mount session (`b3d0ab7e60`) Has a bug in the placement of the bdi member at struct exofs_sb_info. The layout member must be kept last. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-29 20:35:29 +02:00
Jens Axboe	5477d0face	fs: fs/super.c needs to include backing-dev.h for !CONFIG_BLOCK When CONFIG_BLOCK is set, it ends up getting backing-dev.h included. But for !CONFIG_BLOCK, it isn't so lucky. The proper thing to do is include <linux/backing-dev.h> directly from the file it's used from, so do that. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-29 20:33:35 +02:00
Linus Torvalds	27fb8d7b1f	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4 nfs: fix some issues in nfs41_proc_reclaim_complete() NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear NFS: Fix an unstable write data integrity race nfs: testing for null instead of ERR_PTR() NFS: rsize and wsize settings ignored on v4 mounts NFSv4: Don't attempt an atomic open if the file is a mountpoint SUNRPC: Fix a bug in rpcauth_prune_expired	2010-04-29 10:23:44 -07:00
Arnd Bergmann	f80a0ca6ad	pktcdvd: improve BKL and compat_ioctl.c usage The pktcdvd driver uses proper locking and does not need the BKL in the ioctl and llseek functions of the character device, so kill both. Moving the compat_ioctl handling from common code into the driver itself fixes build problems when CONFIG_BLOCK is disabled. Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-29 08:44:37 -07:00
Boaz Harrosh	a36fed12a4	exofs: Fix "add bdi backing to mount session" fall out Commit `b3d0ab7e60` ("exofs: add bdi backing to mount session") has a bug in the placement of the bdi member at struct exofs_sb_info. The layout member must be kept last. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-29 07:59:16 -07:00
Joern Engel	2e531fa0d0	LogFS: Fix typo in `b6349ac8` Signed-off-by: Joern Engel <joern@logfs.org>	2010-04-29 15:19:28 +02:00
Dan Carpenter	3272c8a57b	logfs: testing the wrong variable There is a typo here. We should test "last" instead of "first". Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Joern Engel <joern@logfs.org>	2010-04-29 15:19:27 +02:00
ZhangJieJing	2fde99cb55	UBIFS: mark VFS SB RO too If some read/write error happens (eg.CRC error), UBIFS swotches to read-only mode, but the VFS infomation still not update. This patch add this also make /proc/mounts update. Signed-off-by: Zhang Jiejing <kzjeef@gmail.com>	2010-04-29 15:12:18 +03:00
Al Viro	d9e80b7de9	nfs d_revalidate() is too trigger-happy with d_drop() If dentry found stale happens to be a root of disconnected tree, we can't d_drop() it; its d_hash is actually part of s_anon and d_drop() would simply hide it from shrink_dcache_for_umount(), leading to all sorts of fun, including busy inodes on umount and oopsen after that. Bug had been there since at least 2006 (commit c636eb already has it), so it's definitely -stable fodder. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-28 20:40:03 -07:00
Xiaotian Feng	9699eda6bc	nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4 With CONFIG_NFS_V4 and data version 4, nfs_get_sb will allocate memory for export_path in nfs4_validate_text_mount_data, so we need to free it then. This is addressed in following kmemleak report: unreferenced object 0xffff88016bf48a50 (size 16): comm "mount.nfs", pid 22567, jiffies 4651574704 (age 175471.200s) hex dump (first 16 bytes): 2f 6f 70 74 2f 77 6f 72 6b 00 6b 6b 6b 6b 6b a5 /opt/work.kkkkk. backtrace: [<ffffffff814b34f9>] kmemleak_alloc+0x60/0xa7 [<ffffffff81102c76>] kmemleak_alloc_recursive.clone.5+0x1b/0x1d [<ffffffff811046b3>] __kmalloc_track_caller+0x18f/0x1b7 [<ffffffff810e1b08>] kstrndup+0x37/0x54 [<ffffffffa0336971>] nfs_parse_devname+0x152/0x204 [nfs] [<ffffffffa0336af3>] nfs4_validate_text_mount_data+0xd0/0xdc [nfs] [<ffffffffa0338deb>] nfs_get_sb+0x325/0x736 [nfs] [<ffffffff81113671>] vfs_kern_mount+0xbd/0x17c [<ffffffff81113798>] do_kern_mount+0x4d/0xed [<ffffffff81129a87>] do_mount+0x787/0x7fe [<ffffffff81129b86>] sys_mount+0x88/0xc2 [<ffffffff81009b42>] system_call_fastpath+0x16/0x1b Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Benny Halevy <bhalevy@panasas.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-28 13:46:28 -04:00
Dan Carpenter	acf82b85a7	nfs: fix some issues in nfs41_proc_reclaim_complete() The original code passed an ERR_PTR() to rpc_put_task() and instead of returning zero on success it returned -ENOMEM. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-28 13:45:12 -04:00
Linus Torvalds	970b06485f	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: coda: move backing-dev.h kernel include inside __KERNEL__ mtd: ensure that bdi entries are properly initialized and registered Move mtd_bdi_*mappable to mtdcore.c btrfs: convert to using bdi_setup_and_register() Catch filesystems lacking s_bdi drbd: Terminate a connection early if sending the protocol fails drbd: fix memory leak Fix JFFS2 sync silent failure smbfs: add bdi backing to mount session ncpfs: add bdi backing to mount session exofs: add bdi backing to mount session ecryptfs: add bdi backing to mount session coda: add bdi backing to mount session cifs: add bdi backing to mount session afs: add bdi backing to mount session. 9p: add bdi backing to mount session bdi: add helper function for doing init and register of a bdi for a file system block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer	2010-04-28 07:56:05 -07:00
Jeff Layton	ebe6aa5ac4	cifs: eliminate "first_time" parm to CIFS_SessSetup We can use the is_first_ses_reconnect() function to determine this. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-04-28 00:36:17 +00:00
Linus Torvalds	11e39d993d	Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux * 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: nfsd4: bug in read_buf	2010-04-27 16:26:21 -07:00
Jerome Marchand	3835541dd4	procfs: fix tid fdinfo Correct the file_operations struct in fdinfo entry of tid_base_stuff[]. Presently /proc//task//fdinfo contains symlinks to opened files like /proc/*/fd/. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-27 16:26:03 -07:00
Trond Myklebust	ba8b06e67e	NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear Neil Brown reports that he is seeing the BUG_ON(ret == 0) trigger in nfs_page_async_flush. According to the trace in https://bugzilla.novell.com/show_bug.cgi?id=599628 the problem appears to be due to nfs_wb_page() not waiting for the PG_writeback flag to clear. There is a ditto problem in nfs_wb_page_cancel() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-27 18:33:54 -04:00
Christoph Egger	16a5b3c414	Remove redundant check for CONFIG_MMU The checks for CONFIG_MMU at this location are duplicated as all the code is located inside a #ifndef CONFIG_MMU block. So the first conditional block will always be included while the second never will. Signed-off-by: Christoph Egger <siccegge@stud.informatik.uni-erlangen.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-27 09:01:26 -07:00
Linus Torvalds	bc113f151a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: squashfs: fix potential buffer over-run on 4K block file systems squashfs: add missing buffer free squashfs: fix warn_on when root inode is corrupted squashfs: fix locking bug in zlib wrapper	2010-04-27 08:59:38 -07:00
Steve French	d54ff73259	[CIFS] Fix lease break for writes On lease break we were breaking to readonly leases always even if write requested. Also removed experimental ifdef around setlease code Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-04-27 04:38:15 +00:00
Jeff Layton	9bf67e516f	cifs: save the dialect chosen by server Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-04-27 02:17:08 +00:00
Neil Brown	2bc3c1179c	nfsd4: bug in read_buf When read_buf is called to move over to the next page in the pagelist of an NFSv4 request, it sets argp->end to essentially a random number, certainly not an address within the page which argp->p now points to. So subsequent calls to READ_BUF will think there is much more than a page of spare space (the cast to u32 ensures an unsigned comparison) so we can expect to fall off the end of the second page. We never encountered thsi in testing because typically the only operations which use more than two pages are write-like operations, which have their own decoding logic. Something like a getattr after a write may cross a page boundary, but it would be very unusual for it to cross another boundary after that. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-04-26 15:39:08 -04:00
Dan Carpenter	ad6cca6d5d	cifs: change && to \|\| This is a typo, if pvolume_info were NULL it would oops. This function is used in clean up and error handling. The current code never passes a NULL pvolume_info, but it could pass a NULL *pvolume_info if the kmalloc() failed. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-04-26 19:08:01 +00:00
Jeff Layton	04912d6a20	cifs: rename "extended_security" to "global_secflags" ...since that more accurately describes what that variable holds. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-04-26 18:55:33 +00:00
Jeff Layton	d00c28de55	cifs: move tcon find/create into separate function ...and out of cifs_mount. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-04-26 18:55:31 +00:00
Jeff Layton	36988c76f0	cifs: move SMB session creation code into separate function ...it's mostly part of cifs_mount. Break it out into a separate function. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-04-26 18:55:28 +00:00
Jeff Layton	a5fc4ce018	cifs: track local_nls in volume info Add a local_nls field to the smb_vol struct and keep a pointer to the local_nls in it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-04-26 18:54:54 +00:00
Dave Chinner	dd77ef924c	xfs: more swap extent fixes for dynamic fork offsets A new xfsqa test (226) with a prototype xfs_fsr change to try to handle dynamic fork offsets better triggers an assertion failure where the inode data fork is in btree format, yet there is room in the inode for it to be in extent format. The two inodes look like: before: ino 0x101 (target), num_extents 11, Max in-fork extents 6, broot size 40, fork offset 96 before: ino 0x115 (temp), num_extents 5, Max in-fork extents 3, broot size 40, fork offset 56 after: ino 0x101 (target), num_extents 5, Max in-fork extents 6, broot size 40, fork offset 96 after: ino 0x115 (temp), num_extents 11, Max in-fork extents 3, broot size 40, fork offset 56 Basically the target inode ends up with 5 extents in btree format, but it had space for 6 extents in extent format, so ends up incorrect. Notably here the broot size is the same, and that is where the kernel code is going wrong - the btree root will fit, so it lets the swap go ahead. The check should not allow the swap to take place if the number of extents while in btree format is less than the number of extents that can fit in the inode in extent format. Adding that check will prevent this swap and corruption from occurring. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-04-26 12:38:51 -05:00
Jens Axboe	e6d086d83c	btrfs: convert to using bdi_setup_and_register() It's now a provided helper, so get rid of the internal setup and btrfs atomic_t bdi enumerator. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-26 10:27:54 +02:00
Linus Torvalds	202f2bb070	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Issue the discard operation before releasing the blocks to be reused ext4: Fix buffer head leaks after calls to ext4_get_inode_loc() ext4: Fix possible lost inode write in no journal mode	2010-04-25 10:01:51 -07:00
Jörn Engel	5129a469a9	Catch filesystems lacking s_bdi noop_backing_dev_info is used only as a flag to mark filesystems that don't have any backing store, like tmpfs, procfs, spufs, etc. Signed-off-by: Joern Engel <joern@logfs.org> Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes to the noop_backing_dev_info is not legal and will not result in them being flushed, but we already catch this condition in __mark_inode_dirty() when checking for a registered bdi. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-25 08:54:42 +02:00
Phillip Lougher	e0d1f70010	squashfs: fix potential buffer over-run on 4K block file systems Sizing the buffer based on block size is incorrect, leading to a potential buffer over-run on 4K block size file systems (because the metadata block size is always 8K). This bug doesn't seem have triggered because 4K block size file systems are not default, and also because metadata blocks after compression tend to be less than 4K. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2010-04-25 02:09:05 +01:00
Phillip Lougher	370ec3d1ed	squashfs: add missing buffer free Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2010-04-25 02:09:05 +01:00
Phillip Lougher	1cb08e9738	squashfs: fix warn_on when root inode is corrupted Fix warn_on triggered by mounting a fsfuzzer corrupted file system, where the root inode has been corrupted. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk> Reported-by: Steve Grubb <sgrubb@redhat.com>	2010-04-25 01:49:17 +01:00
Anton Blanchard	b8af67e268	fs/block_dev.c: fix performance regression in O_DIRECT\|O_SYNC writes to block devices We are seeing a large regression in database performance on recent kernels. The database opens a block device with O_DIRECT\|O_SYNC and a number of threads write to different regions of the file at the same time. A simple test case is below. I haven't defined DEVICE since getting it wrong will destroy your data :) On an 3 disk LVM with a 64k chunk size we see about 17MB/sec and only a few threads in IO wait: procs -----io---- -system-- -----cpu------ r b bi bo in cs us sy id wa st 0 3 0 16170 656 2259 0 0 86 14 0 0 2 0 16704 695 2408 0 0 92 8 0 0 2 0 17308 744 2653 0 0 86 14 0 0 2 0 17933 759 2777 0 0 89 10 0 Most threads are blocking in vfs_fsync_range, which has: mutex_lock(&mapping->host->i_mutex); err = fop->fsync(file, dentry, datasync); if (!ret) ret = err; mutex_unlock(&mapping->host->i_mutex); commit `148f948ba8` (vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode) offers some explanation of what is going on: Use these new helpers for syncing from generic VFS functions. This makes O_SYNC writes to block devices acquire i_mutex for syncing. If we really care about this, we can make block_fsync() drop the i_mutex and reacquire it before it returns. Thanks Jan for such a good commit message! As well as dropping i_mutex, Christoph suggests we should remove the call to sync_blockdev(): > sync_blockdev is an overcomplicated alias for filemap_write_and_wait on > the block device inode, which is exactly what we did just before calling > into ->fsync The patch below incorporates both suggestions. With it the testcase improves from 17MB/s to 68M/sec: procs -----io---- -system-- -----cpu------ r b bi bo in cs us sy id wa st 0 7 0 65536 1000 3878 0 0 70 30 0 0 34 0 69632 1016 3921 0 1 46 53 0 0 57 0 69632 1000 3921 0 0 55 45 0 0 53 0 69640 754 4111 0 0 81 19 0 Testcase: #define _GNU_SOURCE #include <stdio.h> #include <pthread.h> #include <unistd.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #define NR_THREADS 64 #define BUFSIZE (64 * 1024) #define DEVICE "/dev/mapper/XXXXXX" #define ALIGN(VAL, SIZE) (((VAL)+(SIZE)-1) & ~((SIZE)-1)) static int fd; static void doit(void arg) { unsigned long offset = (long)arg; char b, buf; b = malloc(BUFSIZE + 1024); buf = (char )ALIGN((unsigned long)b, 1024); memset(buf, 0, BUFSIZE); while (1) pwrite(fd, buf, BUFSIZE, offset); } int main(int argc, char argv[]) { int flags = O_RDWR\|O_DIRECT; int i; unsigned long offset = 0; if (argc > 1 && !strcmp(argv[1], "O_SYNC")) flags \|= O_SYNC; fd = open(DEVICE, flags); if (fd == -1) { perror("open"); exit(1); } for (i = 0; i < NR_THREADS-1; i++) { pthread_t tid; pthread_create(&tid, NULL, doit, (void )offset); offset += BUFSIZE; } doit((void )offset); return 0; } Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-24 11:31:26 -07:00
Jeff Mahoney	fb2162df74	reiserfs: fix corruption during shrinking of xattrs Commit `48b32a3553` ("reiserfs: use generic xattr handlers") introduced a problem that causes corruption when extended attributes are replaced with a smaller value. The issue is that the reiserfs_setattr to shrink the xattr file was moved from before the write to after the write. The root issue has always been in the reiserfs xattr code, but was papered over by the fact that in the shrink case, the file would just be expanded again while the xattr was written. The end result is that the last 8 bytes of xattr data are lost. This patch fixes it to use new_size. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=14826 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reported-by: Christian Kujau <lists@nerdbynature.de> Tested-by: Christian Kujau <lists@nerdbynature.de> Cc: Edward Shishkin <edward.shishkin@gmail.com> Cc: Jethro Beekman <kernel@jbeekman.nl> Cc: Greg Surbey <gregsurbey@hotmail.com> Cc: Marco Gatti <marco.gatti@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-24 11:31:24 -07:00
Jeff Mahoney	cac36f7071	reiserfs: fix permissions on .reiserfs_priv Commit `677c9b2e39` ("reiserfs: remove privroot hiding in lookup") removed the magic from the lookup code to hide the .reiserfs_priv directory since it was getting loaded at mount-time instead. The intent was that the entry would be hidden from the user via a poisoned d_compare, but this was faulty. This introduced a security issue where unprivileged users could access and modify extended attributes or ACLs belonging to other users, including root. This patch resolves the issue by properly hiding .reiserfs_priv. This was the intent of the xattr poisoning code, but it appears to have never worked as expected. This is fixed by using d_revalidate instead of d_compare. This patch makes -oexpose_privroot a no-op. I'm fine leaving it this way. The effort involved in working out the corner cases wrt permissions and caching outweigh the benefit of the feature. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Edward Shishkin <edward.shishkin@gmail.com> Reported-by: Matt McCutchen <matt@mattmccutchen.net> Tested-by: Matt McCutchen <matt@mattmccutchen.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-24 11:31:24 -07:00
Joel Becker	a36d515c7a	ocfs2_dlmfs: Fix math error when reading LVB. When asked for a partial read of the LVB in a dlmfs file, we can accidentally calculate a negative count. Reported-by: Dan Carpenter <error27@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-04-23 15:24:59 -07:00
Tao Ma	c21a534e2f	ocfs2: Update VFS inode's id info after reflink. In reflink we update the id info on the disk but forgot to update the corresponding information in the VFS inode. Update them accordingly when we want to preserve the attributes. Reported-by: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-04-23 14:43:22 -07:00
Dan Carpenter	0350cb078f	ocfs2: potential ERR_PTR dereference on error paths If "handle" is non null at the end of the function then we assume it's a valid pointer and pass it to ocfs2_commit_trans(); Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-04-23 14:42:06 -07:00
Mark Fasheh	a9743fcdc0	ocfs2: Add directory entry later in ocfs2_symlink() and ocfs2_mknod() If we get a failure during creation of an inode we'll allow the orphan code to remove the inode, which is correct. However, we need to ensure that we don't get any errors after the call to ocfs2_add_entry(), otherwise we could leave a dangling directory reference. The solution is simple - in both cases, all I had to do was move ocfs2_dentry_attach_lock() above the ocfs2_add_entry() call. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-04-23 11:42:22 -07:00
Li Dongyang	062d340384	ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod error path Mark the inode with flag OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod, so we can kill the inode in case of error. [ Fixed up comment style -Mark ] Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-04-23 11:05:00 -07:00
Li Dongyang	ab41fdc8fd	ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_symlink error path Mark the inode with flag OCFS2_INODE_SKIP_ORPHAN_DIR when we get an error after allocating one, so that we can kill the inode. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-04-23 11:05:00 -07:00
Li Dongyang	d4cd1871cf	ocfs2: add OCFS2_INODE_SKIP_ORPHAN_DIR flag and honor it in the inode wipe code Currently in the error path of ocfs2_symlink and ocfs2_mknod, we just call iput with the inode we failed with, but the inode wipe code will complain because we don't add the inode to orphan dir. One solution would be to lock the orphan dir during the entire transaction, but that's too heavy for a rare error path. Instead, we add a flag, OCFS2_INODE_SKIP_ORPHAN_DIR which tells the inode wipe code that it won't find this inode in the orphan dir. [ Merge fixes and comment style cleanups -Mark ] Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-04-23 11:03:49 -07:00
Josef Bacik	3a3076f4d6	Cleanup generic block based fiemap This cleans up a few of the complaints of __generic_block_fiemap. I've fixed all the typing stuff, used inline functions instead of macros, gotten rid of a couple of variables, and made sure the size and block requests are all block aligned. It also fixes a problem where sometimes FIEMAP_EXTENT_LAST wasn't being set properly. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-23 10:39:48 -07:00
Phillip Lougher	792590c723	squashfs: fix locking bug in zlib wrapper Fix locking bug in zlib wrapper introduced by recent decompressor changes. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2010-04-23 02:54:54 +01:00
Jiri Kosina	6c9468e9eb	Merge branch 'master' into for-next	2010-04-23 02:08:44 +02:00
Trond Myklebust	71d0a6112a	NFS: Fix an unstable write data integrity race Commit `2c61be0a94` (NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible) exposed a race on file close. In order to ensure correct close-to-open behaviour, we want to wait for all outstanding background commit operations to complete. This patch adds an inode flag that indicates if a commit operation is under way, and provides a mechanism to allow ->write_inode() to wait for its completion if this is a data integrity flush. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-22 15:35:57 -04:00
Dan Carpenter	cdd29ecfcb	nfs: testing for null instead of ERR_PTR() nfs_path() returns an ERR_PTR(), it doesn't return null. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-22 15:35:56 -04:00
Chuck Lever	356e76b855	NFS: rsize and wsize settings ignored on v4 mounts NFSv4 mounts ignore the rsize and wsize mount options, and always use the default transfer size for both. This seems to be because all NFSv4 mounts are now cloned, and the cloning logic doesn't copy the rsize and wsize settings from the parent nfs_server. I tested Fedora's 2.6.32.11-99 and it seems to have this problem as well, so I'm guessing that .33, .32, and perhaps older kernels have this issue as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Stable <stable@kernel.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-22 15:35:56 -04:00
Trond Myklebust	1f063d2cdf	NFSv4: Don't attempt an atomic open if the file is a mountpoint Fix https://bugzilla.kernel.org/show_bug.cgi?id=15789 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-04-22 15:35:55 -04:00
Steve French	fa588e0c57	[CIFS] Allow null nd (as nfs server uses) on create While creating a file on a server which supports unix extensions such as Samba, if a file is being created which does not supply nameidata (i.e. nd is null), cifs client can oops when calling cifs_posix_open. Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-04-22 19:21:55 +00:00
Dan Carpenter	d03859a4ac	nfsd: potential ERR_PTR dereference on exp_export() error paths. We "goto finish" from several places where "exp" is an ERR_PTR. Also I changed the check for "fsid_key" so that it was consistent with the check I added. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-04-22 12:03:02 -04:00
J. Bruce Fields	5771635592	nfsd4: complete enforcement of 4.1 op ordering Enforce the rules about compound op ordering. Motivated by implementing RECLAIM_COMPLETE, for which the client is implicit in the current session, so it is important to ensure a succesful SEQUENCE proceeds the RECLAIM_COMPLETE. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-04-22 11:35:14 -04:00
J. Bruce Fields	4b21d0defc	nfsd4: allow 4.0 clients to change callback path The rfc allows a client to change the callback parameters, but we didn't previously implement it. Teach the callbacks to rerun themselves (by placing themselves on a workqueue) when they recognize that their rpc task has been killed and that the callback connection has changed. Then we can change the callback connection by setting up a new rpc client, modifying the nfs4 client to point at it, waiting for any work in progress to complete, and then shutting down the old client. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-04-22 11:34:02 -04:00
J. Bruce Fields	2bf23875f5	nfsd4: rearrange cb data structures Mainly I just want to separate the arguments used for setting up the tcp client from the rest. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-04-22 11:34:02 -04:00
J. Bruce Fields	b12a05cbdf	nfsd4: cl_count is unused Now that the shutdown sequence guarantees callbacks are shut down before the client is destroyed, we no longer have a use for cl_count. We'll probably reinstate a reference count on the client some day, but it will be held by users other than callbacks. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-04-22 11:34:02 -04:00
J. Bruce Fields	b5a1a81e5c	nfsd4: don't sleep in lease-break callback The NFSv4 server's fl_break callback can sleep (dropping the BKL), in order to allocate a new rpc task to send a recall to the client. As far as I can tell this doesn't cause any races in the current code, but the analysis is difficult. Also, the sleep here may complicate the move away from the BKL. So, just schedule some work to do the job for us instead. The work will later also prove useful for restarting a call after the callback information is changed. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-04-22 11:34:01 -04:00
Jens Axboe	424264b7b2	smbfs: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 12:37:07 +02:00
Jens Axboe	f1970c73cb	ncpfs: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 12:31:11 +02:00
Jens Axboe	b3d0ab7e60	exofs: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 12:26:04 +02:00
Jens Axboe	9df9c8b930	ecryptfs: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 12:22:04 +02:00
Jens Axboe	5163d90076	coda: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 12:12:40 +02:00
Jens Axboe	8044f7f468	cifs: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 12:09:48 +02:00
Jens Axboe	e1da022275	afs: add bdi backing to mount session. This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 11:58:18 +02:00
Jens Axboe	0ed07ddb56	9p: add bdi backing to mount session This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-22 11:42:00 +02:00
Eric Dumazet	989a297920	fasync: RCU and fine grained locking kill_fasync() uses a central rwlock, candidate for RCU conversion, to avoid cache line ping pongs on SMP. fasync_remove_entry() and fasync_add_entry() can disable IRQS on a short section instead during whole list scan. Use a spinlock per fasync_struct to synchronize kill_fasync_rcu() and fasync_{remove\|add}_entry(). This spinlock is IRQ safe, so sock_fasync() doesnt need its own implementation and can use fasync_helper(), to reduce code size and complexity. We can remove __kill_fasync() direct use in net/socket.c, and rename it to kill_fasync_rcu(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-21 16:19:29 -07:00
Pavel Shilovsky	2c964d1f7c	[CIFS] Fix losing locks during fork() When process does fork() private_data of files with lock list stays the same for file descriptors of the parent and of the child. While finishing the child closes files and deletes locks from the list even if unlocking fails. When the child process finishes the parent doesn't have lock in lock list and can't unlock previously before fork() locked region after the child process finished. This patch provides behaviour to save locks in lock list if unlocking fails. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-04-21 19:44:24 +00:00
Linus Torvalds	1ef6ce7a34	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68knommu: allow 4 coldfire serial ports m68knommu: fix coldfire tcdrain m68knommu: remove a duplicate vector setting line for 68360 Fix m68k-uclinux's rt_sigreturn trampoline m68knommu: correct the CC flags for Coldfire M5272 targets uclinux: error message when FLAT reloc symbol is invalid, v2	2010-04-21 12:33:12 -07:00
Linus Torvalds	255f41c595	Merge git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs * git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs: [LogFS] Split large truncated into smaller chunks [LogFS] Set s_bdi [LogFS] Prevent mempool_destroy NULL pointer dereference [LogFS] Move assertion [LogFS] Plug 8 byte information leak [LogFS] Prevent memory corruption on large deletes [LogFS] Remove unused method Fix trivial conflict with added header includes in fs/logfs/super.c	2010-04-21 12:31:12 -07:00
Linus Torvalds	9befb55ef5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: jfs: add jfs specific ->setattr call jfs: fix diAllocExt error in resizing filesystem jfs_dmap.[ch]: trivial typo fix: s/heigth/height/g	2010-04-21 12:30:07 -07:00
David Howells	083fd8b21a	AFS: Don't pass error value to page_cache_release() in error handling In the error handling in afs_mntpt_do_automount(), we pass an error pointer to page_cache_release() if read_mapping_page() failed. Instead, we should extend the gotos around the error handling we don't need. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-21 12:27:43 -07:00
Steve French	f19159dc5a	[CIFS] Cleanup various minor breakage in previous cFYI cleanup Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-04-21 04:12:10 +00:00
Joe Perches	b6b38f704a	[CIFS] Neaten cERROR and cFYI macros, reduce text space Neaten cERROR and cFYI macros, reduce text space ~2.5K Convert '__FILE__ ": " fmt' to '"%s: " fmt', __FILE__' to save text space Surround macros with do {} while Add parentheses to macros Make statement expression macro from macro with assign Remove now unnecessary parentheses from cFYI and cERROR uses defconfig with CIFS support old $ size fs/cifs/built-in.o text data bss dec hex filename 156012 1760 148 157920 268e0 fs/cifs/built-in.o defconfig with CIFS support old $ size fs/cifs/built-in.o text data bss dec hex filename 153508 1760 148 155416 25f18 fs/cifs/built-in.o allyesconfig old: $ size fs/cifs/built-in.o text data bss dec hex filename 309138 3864 74824 387826 5eaf2 fs/cifs/built-in.o allyesconfig new $ size fs/cifs/built-in.o text data bss dec hex filename 305655 3864 74824 384343 5dd57 fs/cifs/built-in.o Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-04-21 03:50:45 +00:00
Jun Sun	d7dfee3f5d	uclinux: error message when FLAT reloc symbol is invalid, v2 This patch fixes a cosmetic error in printk. Text segment and data/bss segment are allocated from two different areas. It is not meaningful to give the diff between them in the error reporting messages. Signed-off-by: Jun Sun <jsun@junsun.net> Signed-off-by: Greg Ungerer <gerg@uclinux.org>	2010-04-21 13:28:49 +10:00
Nick Piggin	315e995c63	[CIFS] use add_to_page_cache_lru add_to_page_cache_lru is exported, so it should be used. Benefits over using a private pagevec: neater code, 128 bytes fewer stack used, percpu lru ordering is preserved, and finally don't need to flush pagevec before returning so batching may be shared with other LRU insertions. Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-04-21 03:18:28 +00:00
Theodore Ts'o	b90f687018	ext4: Issue the discard operation before releasing the blocks to be reused Otherwise, we can end up having data corruption because the blocks could get reused and then discarded! https://bugzilla.kernel.org/show_bug.cgi?id=15579 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-04-20 16:51:59 -04:00
Joern Engel	b6349ac89e	[LogFS] Split large truncated into smaller chunks Truncate would do an almost limitless amount of work without invoking the garbage collector in between. Split it up into more manageable, though still large, chunks. Signed-off-by: Joern Engel <joern@logfs.org>	2010-04-20 21:44:10 +02:00
Linus Torvalds	05ce7bfe54	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: quota: Convert __DQUOT_PARANOIA symbol to standard config option	2010-04-20 09:39:40 -07:00
Jan Kara	62af9b5205	quota: Convert __DQUOT_PARANOIA symbol to standard config option Make __DQUOT_PARANOIA define from the old days a standard config option and turn it off by default. This gets rid of a quota warning about writes before quota is turned on for systems with ext4 root filesystem. Currently there's no way to legally solve this because /etc/mtab has to be written before quota is turned on on most systems. Signed-off-by: Jan Kara <jack@suse.cz>	2010-04-20 18:25:25 +02:00
Linus Torvalds	9b030e2006	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: eCryptfs: Turn lower lookup error messages into debug messages eCryptfs: Copy lower directory inode times and size on link ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode ecryptfs: fix error code for missing xattrs in lower fs eCryptfs: Decrypt symlink target for stat size eCryptfs: Strip metadata in xattr flag in encrypted view eCryptfs: Clear buffer before reading in metadata xattr eCryptfs: Rename ecryptfs_crypt_stat.num_header_bytes_at_front eCryptfs: Fix metadata in xattr feature regression	2010-04-19 14:20:32 -07:00
Tyler Hicks	9f37622f89	eCryptfs: Turn lower lookup error messages into debug messages Vaugue warnings about ENAMETOOLONG errors when looking up an encrypted file name have caused many users to become concerned about their data. Since this is a rather harmless condition, I'm moving this warning to only be printed when the ecryptfs_verbosity module param is 1. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-04-19 14:42:18 -05:00
Tyler Hicks	3a8380c075	eCryptfs: Copy lower directory inode times and size on link The timestamps and size of a lower inode involved in a link() call was being copied to the upper parent inode. Instead, we should be copying lower parent inode's timestamps and size to the upper parent inode. I discovered this bug using the POSIX test suite at Tuxera. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-04-19 14:42:15 -05:00
Jeff Mahoney	133b8f9d63	ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode Since tmpfs has no persistent storage, it pins all its dentries in memory so they have d_count=1 when other file systems would have d_count=0. ->lookup is only used to create new dentries. If the caller doesn't instantiate it, it's freed immediately at dput(). ->readdir reads directly from the dcache and depends on the dentries being hashed. When an ecryptfs mount is mounted, it associates the lower file and dentry with the ecryptfs files as they're accessed. When it's umounted and destroys all the in-memory ecryptfs inodes, it fput's the lower_files and d_drop's the lower_dentries. Commit `4981e081` added this and a d_delete in 2008 and several months later commit `caeeeecf` removed the d_delete. I believe the d_drop() needs to be removed as well. The d_drop effectively hides any file that has been accessed via ecryptfs from the underlying tmpfs since it depends on it being hashed for it to be accessible. I've removed the d_drop on my development node and see no ill effects with basic testing on both tmpfs and persistent storage. As a side effect, after ecryptfs d_drops the dentries on tmpfs, tmpfs BUGs on umount. This is due to the dentries being unhashed. tmpfs->kill_sb is kill_litter_super which calls d_genocide to drop the reference pinning the dentry. It skips unhashed and negative dentries, but shrink_dcache_for_umount_subtree doesn't. Since those dentries still have an elevated d_count, we get a BUG(). This patch removes the d_drop call and fixes both issues. This issue was reported at: https://bugzilla.novell.com/show_bug.cgi?id=567887 Reported-by: Árpád Bíró <biroa@demasz.hu> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-04-19 14:42:13 -05:00
Christian Pulvermacher	cfce08c6bd	ecryptfs: fix error code for missing xattrs in lower fs If the lower file system driver has extended attributes disabled, ecryptfs' own access functions return -ENOSYS instead of -EOPNOTSUPP. This breaks execution of programs in the ecryptfs mount, since the kernel expects the latter error when checking for security capabilities in xattrs. Signed-off-by: Christian Pulvermacher <pulvermacher@gmx.de> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-04-19 14:42:09 -05:00
Tyler Hicks	3a60a1686f	eCryptfs: Decrypt symlink target for stat size Create a getattr handler for eCryptfs symlinks that is capable of reading the lower target and decrypting its path. Prior to this patch, a stat's st_size field would represent the strlen of the encrypted path, while readlink() would return the strlen of the decrypted path. This could lead to confusion in some userspace applications, since the two values should be equal. https://bugs.launchpad.net/bugs/524919 Reported-by: Loïc Minier <loic.minier@canonical.com> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-04-19 14:41:51 -05:00
J. Bruce Fields	3c4ab2aaa9	nfsd4: indentation cleanup Looks like a put-and-paste mistake. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-04-19 15:12:51 -04:00
Joern Engel	b8639077ab	[LogFS] Set s_bdi Since `32a88aa1` sync() was turned into a NOP for logfs. Worse, sync() would not return an error, giving the illusion that writeout had actually happened. Afaics jffs2 was broken as well. Signed-off-by: Joern Engel <joern@logfs.org>	2010-04-17 19:54:27 +02:00
J. Bruce Fields	408b79bcc3	nfsd4: consistent session flag setting We should clear these flags on any new create_session, not just on the first one. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-04-16 21:47:37 -04:00
Dave Chinner	f1d486a361	xfs: don't warn on EAGAIN in inode reclaim Any inode reclaim flush that returns EAGAIN will result in the inode reclaim being attempted again later. There is no need to issue a warning into the logs about this situation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-04-16 13:51:44 -05:00
Dave Chinner	b6f8dd49db	xfs: ensure that sync updates the log tail correctly Updates to the VFS layer removed an extra ->sync_fs call into the filesystem during the sync process (from the quota code). Unfortunately the sync code was unknowingly relying on this call to make sure metadata buffers were flushed via a xfs_buftarg_flush() call to move the tail of the log forward in memory before the final transactions of the sync process were issued. As a result, the old code would write a very recent log tail value to the log by the end of the sync process, and so a subsequent crash would leave nothing for log recovery to do. Hence in qa test 182, log recovery only replayed a small handle for inode fsync transactions in this case. However, with the removal of the extra ->sync_fs call, the log tail was now not moved forward with the inode fsync transactions near the end of the sync procese the first (and only) buftarg flush occurred after these transactions went to disk. The result is that log recovery now sees a large number of transactions for metadata that is already on disk. This usually isn't a problem, but when the transactions include inode chunk allocation, the inode create transactions and all subsequent changes are replayed as we cannt rely on what is on disk is valid. As a result, if the inode was written and contains unlogged changes, the unlogged changes are lost, thereby violating sync semantics. The fix is to always issue a transaction after the buftarg flush occurs is the log iѕ not idle or covered. This results in a dummy transaction being written that contains the up-to-date log tail value, which will be very recent. Indeed, it will be at least as recent as the old code would have left on disk, so log recovery will behave exactly as it used to in this situation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-04-16 13:51:23 -05:00
Dmitry Monakhov	c7f2e1f0ac	jfs: add jfs specific ->setattr call generic setattr not longer responsible for quota transfer. use jfs_setattr for all jfs's inodes. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>	2010-04-16 08:05:50 -05:00
Bill Pemberton	2b0b39517d	jfs: fix diAllocExt error in resizing filesystem Resizing the filesystem would result in an diAllocExt error in some instances because changes in bmp->db_agsize would not get noticed if goto extendBmap was called. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: jfs-discussion@lists.sourceforge.net Cc: linux-kernel@vger.kernel.org	2010-04-16 08:01:20 -05:00
Tao Ma	79681842e1	ocfs2: Reset status if we want to restart file extension. In __ocfs2_extend_allocation, we will restart our file extension if ((!status) && restart_func). But there is a bug that the status is still left as -EGAIN. This is really an old bug, but it is masked by the return value of ocfs2_journal_dirty. So it show up when we make ocfs2_journal_dirty void. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-04-16 03:10:54 -07:00
Joern Engel	1f1b0008e8	[LogFS] Prevent mempool_destroy NULL pointer dereference It would probably be better to just accept NULL pointers in mempool_destroy(). But for the current -rc series let's keep things simple. This patch was lost in the cracks for a while. Kevin Cernekee <cernekee@gmail.com> had to rediscover the problem and send a similar patch because of it. :( Signed-off-by: Joern Engel <joern@logfs.org>	2010-04-15 08:03:57 +02:00
Linus Torvalds	96e35b40c0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: use separate class for ceph sockets' sk_lock ceph: reserve one more caps space when doing readdir ceph: queue_cap_snap should always queue dirty context ceph: fix dentry reference leak in dcache readdir ceph: decode v5 of osdmap (pool names) [protocol change] ceph: fix ack counter reset on connection reset ceph: fix leaked inode ref due to snap metadata writeback race ceph: fix snap context reference leaks ceph: allow writeback of snapped pages older than 'oldest' snapc ceph: fix dentry rehashing on virtual .snap dir	2010-04-14 18:45:31 -07:00
Bob Peterson	1a0eae8848	GFS2: glock livelock This patch fixes a couple gfs2 problems with the reclaiming of unlinked dinodes. First, there were a couple of livelocks where everything would come to a halt waiting for a glock that was seemingly held by a process that no longer existed. In fact, the process did exist, it just had the wrong pid number in the holder information. Second, there was a lock ordering problem between inode locking and glock locking. Third, glock/inode contention could sometimes cause inodes to be improperly marked invalid by iget_failed. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2010-04-14 16:48:05 +01:00
Linus Torvalds	0fdfe5ad28	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFSv4: fix delegated locking NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible NFS: Fix a race with the new commit code NFS: Ensure that writeback_single_inode() calls write_inode() when syncing NFS: Fix the mode calculation in nfs_find_open_context NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR	2010-04-13 15:10:16 -07:00

... 3 4 5 6 7 ...

18010 Commits