OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Sage Weil	d8de9ab63a	ceph: avoid carrying Fw cap during write into page cache The generic_file_aio_write call may block on balance_dirty_pages while we flush data to the OSDs. If we hold a reference to the FILE_WR cap during that interval revocation by the MDS (e.g., to do a stat(2)) may be very slow. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:27:34 -07:00
Greg Farnum	8f04d42276	ceph: report f_bfree based on kb_avail rather than diffing. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>	2011-07-26 11:27:06 -07:00
Sage Weil	e77dc3e9c0	ceph: only queue capsnap if caps are dirty We used to go into this branch if i_wrbuffer_ref_head was non-zero. This was an ancient check from before we were careful about dealing with all kinds of caps (and not just dirty pages). It is cleaner to only queue a capsnap if there is an actual dirty cap. If we are racing with... something...we will end up here with ci->i_wrbuffer_refs but no dirty caps. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:26:41 -07:00
Sage Weil	af0ed569d7	ceph: fix snap writeback when racing with writes There are two problems that come up when we try to queue a capsnap while a write is in progress: - The FILE_WR cap is held, but not yet dirty, so we may queue a capsnap with dirty == 0. That will crash later in __ceph_flush_snaps(). Or on the FILE_WR cap if a write is in progress. - We may not have i_head_snapc set, which causes problems pretty quickly. Look to the snaprealm in this case. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:26:31 -07:00
Sage Weil	9cfa1098dc	ceph: use flag bit for at_end readdir flag This saves us a word of memory per file. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:26:18 -07:00
Sage Weil	4918b6d140	ceph: add F_SYNC file flag to force sync (non-O_DIRECT) io This allows us to force IO through the sync path which you normally only get when multiple clients are reading/writing to the same file or by mounting with -o sync. Among other things, this lets test programs verify correctness with a single mount. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:26:07 -07:00
Sage Weil	252c6728de	ceph: add flags field to file_info Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:25:27 -07:00
Linus Torvalds	1d87c28e68	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: Cleanup: check return codes of crypto api calls CIFS: Fix oops while mounting with prefixpath [CIFS] Redundant null check after dereference cifs: use cifs_dirent in cifs_save_resume_key cifs: use cifs_dirent to replace cifs_get_name_from_search_buf cifs: introduce cifs_dirent cifs: cleanup cifs_filldir	2011-07-26 11:11:28 -07:00
Jeff Layton	c46c887744	vfs: document locking requirements for d_move, __d_move and d_materialise_unique Adding a comment to d_materialise_unique per Al's request... d_move and __d_move have some pretty substantial locking requirements, but they are not clearly documented. Add some comments spelling them out. Also, document the requirement for the i_mutex of the parent in d_materialise_unique. Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 13:41:14 -04:00
Linus Torvalds	f01ef569cd	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback: (27 commits) mm: properly reflect task dirty limits in dirty_exceeded logic writeback: don't busy retry writeback on new/freeing inodes writeback: scale IO chunk size up to half device bandwidth writeback: trace global_dirty_state writeback: introduce max-pause and pass-good dirty limits writeback: introduce smoothed global dirty limit writeback: consolidate variable names in balance_dirty_pages() writeback: show bdi write bandwidth in debugfs writeback: bdi write bandwidth estimation writeback: account per-bdi accumulated written pages writeback: make writeback_control.nr_to_write straight writeback: skip tmpfs early in balance_dirty_pages_ratelimited_nr() writeback: trace event writeback_queue_io writeback: trace event writeback_single_inode writeback: remove .nonblocking and .encountered_congestion writeback: remove writeback_control.more_io writeback: skip balance_dirty_pages() for in-memory fs writeback: add bdi_dirty_limit() kernel-doc writeback: avoid extra sync work at enqueue time writeback: elevate queue_io() into wb_writeback() ... Fix up trivial conflicts in fs/fs-writeback.c and mm/filemap.c	2011-07-26 10:39:54 -07:00
Al Viro	41c96486f2	omfs: fix (mode & S_IFDIR) abuse granted, on a filesystem that has only regular files and directories it happens to work, but really should be S_ISDIR(mode)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 13:05:28 -04:00
Al Viro	569254b0cc	btrfs: S_ISREG(mode) is not mode & S_IFREG... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 13:05:05 -04:00
Al Viro	61effb519c	jffs2: S_ISLNK(mode & S_IFMT) is pointless it's S_ISLNK(mode), TYVM... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 13:00:35 -04:00
Al Viro	24a01d4ee4	v9fs_iop_get_acl: get rid of unused variable Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 12:57:42 -04:00
Eric Dumazet	a209dfc7b0	vfs: dont chain pipe/anon/socket on superblock s_inodes list Workloads using pipes and sockets hit inode_sb_list_lock contention. superblock s_inodes list is needed for quota, dirty, pagecache and fsnotify management. pipe/anon/socket fs are clearly not candidates for these. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 12:57:09 -04:00
Dan Carpenter	bacb2d816c	fs: add missing unlock in default_llseek() A recent change in linux-next, `982d816581` "fs: add SEEK_HOLE and SEEK_DATA flags" added some direct returns on error, but it should have been a goto out. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 12:57:09 -04:00
Jan Kara	2d859db3e4	ext4: fix data corruption in inodes with journalled data When journalling data for an inode (either because it is a symlink or because the filesystem is mounted in data=journal mode), ext4_evict_inode() can discard unwritten data by calling truncate_inode_pages(). This is because we don't mark the buffer / page dirty when journalling data but only add the buffer to the running transaction and thus mm does not know there are still unwritten data. Fix the problem by carefully tracking transaction containing inode's data, committing this transaction, and writing uncheckpointed buffers when inode should be reaped. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-26 09:07:11 -04:00
Steven Whitehouse	1923703991	GFS2: Fix mount hang caused by certain access pattern to sysfs files Depending upon the order of userspace/kernel during the mount process, this can result in a hang without the _all version of the completion. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-07-26 10:18:37 +01:00
Linus Torvalds	e08dc1325f	p9: avoid unused variable warning Commit `4e34e719e4` ("fs: take the ACL checks to common code") removed the use of the 'acl' variable in v9fs_iop_get_acl(), but left the variable definition around. Remove it to get rid of the warning: fs/9p/acl.c: In function ‘v9fs_iop_get_acl’: fs/9p/acl.c:101:20: warning: unused variable ‘acl’ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 23:43:53 -07:00
Linus Torvalds	91d44d9999	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: Squashfs: Make ZLIB compression support optional Squashfs: Update documentation for XZ and add squashfs-tools devel tree	2011-07-25 22:50:35 -07:00
Linus Torvalds	2dad3206db	Merge branch 'for-3.1' of git://linux-nfs.org/~bfields/linux * 'for-3.1' of git://linux-nfs.org/~bfields/linux: nfsd: don't break lease on CLAIM_DELEGATE_CUR locks: rename lock-manager ops nfsd4: update nfsv4.1 implementation notes nfsd: turn on reply cache for NFSv4 nfsd4: call nfsd4_release_compoundargs from pc_release nfsd41: Deny new lock before RECLAIM_COMPLETE done fs: locks: remove init_once nfsd41: check the size of request nfsd41: error out when client sets maxreq_sz or maxresp_sz too small nfsd4: fix file leak on open_downgrade nfsd4: remember to put RW access on stateid destruction NFSD: Added TEST_STATEID operation NFSD: added FREE_STATEID operation svcrpc: fix list-corrupting race on nfsd shutdown rpc: allow autoloading of gss mechanisms svcauth_unix.c: quiet sparse noise svcsock.c: include sunrpc.h to quiet sparse noise nfsd: Remove deprecated nfsctl system call and related code. NFSD: allow OP_DESTROY_CLIENTID to be only op in COMPOUND Fix up trivial conflicts in Documentation/feature-removal-schedule.txt	2011-07-25 22:49:19 -07:00
Linus Torvalds	84635d68be	vfs: fix check_acl compile error when CONFIG_FS_POSIX_ACL is not set Commit `e77819e57f` ("vfs: move ACL cache lookup into generic code") didn't take the FS_POSIX_ACL config variable into account - when that is not set, ACL's go away, and the cache helper functions do not exist, causing compile errors like fs/namei.c: In function 'check_acl': fs/namei.c:191:10: error: implicit declaration of function 'negative_cached_acl' fs/namei.c:196:2: error: implicit declaration of function 'get_cached_acl' fs/namei.c:196:6: warning: assignment makes pointer from integer without a cast fs/namei.c:212:11: error: implicit declaration of function 'set_cached_acl' Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Acked-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 22:47:03 -07:00
Linus Torvalds	45b583b10a	Merge 'akpm' patch series * Merge akpm patch series: (122 commits) drivers/connector/cn_proc.c: remove unused local Documentation/SubmitChecklist: add RCU debug config options reiserfs: use hweight_long() reiserfs: use proper little-endian bitops pnpacpi: register disabled resources drivers/rtc/rtc-tegra.c: properly initialize spinlock drivers/rtc/rtc-twl.c: check return value of twl_rtc_write_u8() in twl_rtc_set_time() drivers/rtc: add support for Qualcomm PMIC8xxx RTC drivers/rtc/rtc-s3c.c: support clock gating drivers/rtc/rtc-mpc5121.c: add support for RTC on MPC5200 init: skip calibration delay if previously done misc/eeprom: add eeprom access driver for digsy_mtc board misc/eeprom: add driver for microwire 93xx46 EEPROMs checkpatch.pl: update $logFunctions checkpatch: make utf-8 test --strict checkpatch.pl: add ability to ignore various messages checkpatch: add a "prefer __aligned" check checkpatch: validate signature styles and To: and Cc: lines checkpatch: add __rcu as a sparse modifier checkpatch: suggest using min_t or max_t ... Did this as a merge because of (trivial) conflicts in - Documentation/feature-removal-schedule.txt - arch/xtensa/include/asm/uaccess.h that were just easier to fix up in the merge than in the patch series.	2011-07-25 21:00:19 -07:00
Akinobu Mita	9d6bf5aa17	reiserfs: use hweight_long() Use hweight_long() to count free bits in the bitmap. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 20:57:17 -07:00
Akinobu Mita	0c2fd1bfb1	reiserfs: use proper little-endian bitops Using __test_and_{set,clear}_bit_le() with ignoring its return value can be replaced with __{set,clear}_bit_le(). This introduces reiserfs_{set,clear}_le_bit for __{set,clear}_bit_le and does the above change with them. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 20:57:17 -07:00
Hugh Dickins	708e3508c2	tmpfs: clone shmem_file_splice_read() Copy __generic_file_splice_read() and generic_file_splice_read() from fs/splice.c to shmem_file_splice_read() in mm/shmem.c. Make page_cache_pipe_buf_ops and spd_release_page() accessible to it. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 20:57:11 -07:00
David Rientjes	be8f684d73	oom: make deprecated use of oom_adj more verbose /proc/pid/oom_adj is deprecated and scheduled for removal in August 2012 according to Documentation/feature-removal-schedule.txt. This patch makes the warning more verbose by making it appear as a more serious problem (the presence of a stack trace and being multiline should attract more attention) so that applications still using the old interface can get fixed. Very popular users of the old interface have been converted since the oom killer rewrite has been introduced. udevd switched to the /proc/pid/oom_score_adj interface for v162, kde switched in 4.6.1, and opensshd switched in 5.7p1. At the start of 2012, this should be changed into a WARN() to emit all such incidents and then finally remove the tunable in August 2012 as scheduled. Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 20:57:09 -07:00
Becky Bruce	2b37c35e65	fs/hugetlbfs/inode.c: fix pgoff alignment checking on 32-bit This: vma->vm_pgoff & ~(huge_page_mask(h) >> PAGE_SHIFT) is incorrect on 32-bit. It causes us to & the pgoff with something that looks like this (for a 4m hugepage): 0xfff003ff. The mask should be flipped and then shifted, to give you 0x0000_03fff. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 20:57:07 -07:00
Linus Torvalds	14067ff536	vfs: make gcc generate more obvious code for acl permission checking The "fsuid is the inode owner" case is not necessarily always the likely case, but it's the case that doesn't do anything odd and that we want in straight-line code. Make gcc not generate random "jump around for the fun of it" code. This just helps me read profiles. That thing is one of the hottest parts of the whole pathname lookup. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 19:55:52 -07:00
Shirish Pargaonkar	14cae3243b	cifs: Cleanup: check return codes of crypto api calls Check return codes of crypto api calls and either log an error or log an error and return from the calling function with error. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 22:12:10 +00:00
Pavel Shilovsky	f5bc1e755d	CIFS: Fix oops while mounting with prefixpath commit `fec11dd9a0` caused a regression when we have already mounted //server/share/a and want to mount //server/share/a/b. The problem is that lookup_one_len calls __lookup_hash with nd pointer as NULL. Then __lookup_hash calls do_revalidate in the case when dentry exists and we end up with NULL pointer dereference in cifs_d_revalidate: if (nd->flags & LOOKUP_RCU) return -ECHILD; Fix this by checking nd for NULL. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 22:06:40 +00:00
Steve French	e010a5ef95	[CIFS] Redundant null check after dereference Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 22:04:32 +00:00
Christoph Hellwig	eaf35b1ea8	cifs: use cifs_dirent in cifs_save_resume_key Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 21:43:14 +00:00
Christoph Hellwig	f16d59b417	cifs: use cifs_dirent to replace cifs_get_name_from_search_buf This allows us to parse the on the wire structures only once in cifs_filldir. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 21:40:53 +00:00
Christoph Hellwig	cda0ec6a86	cifs: introduce cifs_dirent Introduce a generic directory entry structure, and factor the parsing of the various on the wire structures that can represent one into a common helper. Switch cifs_entry_is_dot over to use it as a start. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 21:36:44 +00:00
Mark Fasheh	38a1a91953	btrfs: don't BUG_ON allocation errors in btrfs_drop_snapshot In addition to properly handling allocation failure from btrfs_alloc_path, I also fixed up the kzalloc error handling code immediately below it. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2011-07-25 14:35:15 -07:00
Mark Fasheh	92b8e897f6	btrfs: Don't BUG_ON alloc_path errors in find_next_chunk I also removed the BUG_ON from error return of find_next_chunk in init_first_rw_device(). It turns out that the only caller of init_first_rw_device() also BUGS on any nonzero return so no actual behavior change has occurred here. do_chunk_alloc() also needed an update since it calls btrfs_alloc_chunk() which can now return -ENOMEM. Instead of setting space_info->full on any error from btrfs_alloc_chunk() I catch and return every error value _except_ -ENOSPC. Thanks goes to Tsutomu Itoh for pointing that issue out. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2011-07-25 14:34:54 -07:00
Christoph Hellwig	9feed6f8fb	cifs: cleanup cifs_filldir Use sensible variable names and formatting and remove some superfluous checks on entry. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 21:05:10 +00:00
Linus Torvalds	d3ec4844d4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) fs: Merge split strings treewide: fix potentially dangerous trailing ';' in #defined values/expressions uwb: Fix misspelling of neighbourhood in comment net, netfilter: Remove redundant goto in ebt_ulog_packet trivial: don't touch files that are removed in the staging tree lib/vsprintf: replace link to Draft by final RFC number doc: Kconfig: `to be' -> `be' doc: Kconfig: Typo: square -> squared doc: Konfig: Documentation/power/{pm => apm-acpi}.txt drivers/net: static should be at beginning of declaration drivers/media: static should be at beginning of declaration drivers/i2c: static should be at beginning of declaration XTENSA: static should be at beginning of declaration SH: static should be at beginning of declaration MIPS: static should be at beginning of declaration ARM: static should be at beginning of declaration rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check Update my e-mail address PCIe ASPM: forcedly -> forcibly gma500: push through device driver tree ... Fix up trivial conflicts: - arch/arm/mach-ep93xx/dma-m2p.c (deleted) - drivers/gpio/gpio-ep93xx.c (renamed and context nearby) - drivers/net/r8169.c (just context changes)	2011-07-25 13:56:39 -07:00
Linus Torvalds	0003230e82	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: fs: take the ACL checks to common code bury posix_acl_..._masq() variants kill boilerplates around posix_acl_create_masq() generic_acl: no need to clone acl just to push it to set_cached_acl() kill boilerplate around posix_acl_chmod_masq() reiserfs: cache negative ACLs for v1 stat format xfs: cache negative ACLs if there is no attribute fork 9p: do no return 0 from ->check_acl without actually checking vfs: move ACL cache lookup into generic code CIFS: Fix oops while mounting with prefixpath xfs: Fix wrong return value of xfs_file_aio_write fix devtmpfs race caam: don't pass bogus S_IFCHR to debugfs_create_...() get rid of create_proc_entry() abuses - proc_mkdir() is there for purpose asus-wmi: ->is_visible() can't return negative fix jffs2 ACLs on big-endian with 16bit mode_t 9p: close ACL leaks ocfs2_init_acl(): fix a leak VFS : mount lock scalability for internal mounts	2011-07-25 12:53:15 -07:00
Trond Myklebust	ed1e6211a0	NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation() nfs_mark_return_delegation() is usually called without any locking, and so it is not safe to dereference delegation->inode. Since the inode is only used to discover the nfs_client anyway, it makes more sense to have the callers pass a valid pointer to the nfs_server as a parameter. Reported-by: Ian Kent <raven@themaw.net> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-25 15:37:29 -04:00
Jeff Layton	73ca1001ed	nfs: don't use d_move in nfs_async_rename_done If the task that initiated the sillyrename ends up being killed by a fatal signal, then it will eventually return back to userspace and end up releasing the i_mutex. d_move however needs to be done while holding the i_mutex. Instead of using d_move here, just unhash the old and new dentries to prevent them from being found by lookups. With this change though, the dentries are now incorrect post-rename and do not reflect the actual name of the file on the server. I'm proceeding under the assumption that since they are unhashed that this isn't really a problem. In order for the sillydelete to still work though, the dname must be copied earlier when setting up the sillydelete info, and the name must be recopied if the sillydelete info has to be moved to a new dentry. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-25 15:00:21 -04:00
Stephen Rothwell	5f00bcb38e	Merge branch 'master' into devel and apply fixup from Stephen Rothwell: vfs/nfs: fixup for nfs_open_context change Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-25 14:53:52 -04:00
Christoph Hellwig	4e34e719e4	fs: take the ACL checks to common code Replace the ->check_acl method with a ->get_acl method that simply reads an ACL from disk after having a cache miss. This means we can replace the ACL checking boilerplate code with a single implementation in namei.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:30:23 -04:00
Al Viro	edde854e8b	bury posix_acl_..._masq() variants made static; no callers left outside of posix_acl.c. posix_acl_clone() also has lost all external callers and became static... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:27:32 -04:00
Al Viro	826cae2f2b	kill boilerplates around posix_acl_create_masq() new helper: posix_acl_create(&acl, gfp, mode_p). Replaces acl with modified clone, on failure releases acl and replaces with NULL. Returns 0 or -ve on error. All callers of posix_acl_create_masq() switched. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:27:32 -04:00
Al Viro	95203befa8	generic_acl: no need to clone acl just to push it to set_cached_acl() In-core acls are copy-on-write, so the reference taken by set_cached_acl() will do just fine. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:27:31 -04:00
Al Viro	bc26ab5f65	kill boilerplate around posix_acl_chmod_masq() new helper: posix_acl_chmod(&acl, gfp, mode). Replaces acl with modified clone or with NULL if that has failed; returns 0 or -ve on error. All callers of posix_acl_chmod_masq() switched to that - they'd been doing exactly the same thing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:27:30 -04:00
Christoph Hellwig	4482a087d4	reiserfs: cache negative ACLs for v1 stat format Always set up a negative ACL cache entry if the inode can't have ACLs. That behaves much better than doing this check inside ->check_acl. Also remove the left over MAY_NOT_BLOCK check. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:25:38 -04:00
Christoph Hellwig	6311b10800	xfs: cache negative ACLs if there is no attribute fork Always set up a negative ACL cache entry if the inode doesn't have an attribute fork. That behaves much better than doing this check inside ->check_acl. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:25:38 -04:00
Christoph Hellwig	ebbb0ef287	9p: do no return 0 from ->check_acl without actually checking If we do not want to use ACLs we at least need to perform normal Unix permission checks. From the comment I'm not quite sure that's what is intended, but if 9p wants to do permission checks entirely on the server it needs to do so in ->permission, not in ->check_acl. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:25:38 -04:00
Linus Torvalds	e77819e57f	vfs: move ACL cache lookup into generic code This moves logic for checking the cached ACL values from low-level filesystems into generic code. The end result is a streamlined ACL check that doesn't need to load the inode->i_op->check_acl pointer at all for the common cached case. The filesystems also don't need to check for a non-blocking RCU walk case in their acl_check() functions, because that is all handled at a VFS layer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:23:39 -04:00
Pavel Shilovsky	3ca30d40a9	CIFS: Fix oops while mounting with prefixpath commit `fec11dd9a0` caused a regression when we have already mounted //server/share/a and want to mount //server/share/a/b. The problem is that lookup_one_len calls __lookup_hash with nd pointer as NULL. Then __lookup_hash calls do_revalidate in the case when dentry exists and we end up with NULL pointer dereference in cifs_d_revalidate: if (nd->flags & LOOKUP_RCU) return -ECHILD; Fix this by checking nd for NULL. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:23:21 -04:00
Markus Trippelsdorf	340a0a01b9	xfs: Fix wrong return value of xfs_file_aio_write The fsync prototype change commit `02c24a8218` accidentally overwrote the ssize_t return value of xfs_file_aio_write with 0 for SYNC type writes. Fix this by checking if an error occurred when calling xfs_file_fsync and only change the return value in this case. In addition xfs_file_fsync actually returns a normal negative error, so fix this, too. Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:23:21 -04:00
Linus Torvalds	096a705bbc	Merge branch 'for-3.1/core' of git://git.kernel.dk/linux-block * 'for-3.1/core' of git://git.kernel.dk/linux-block: (24 commits) block: strict rq_affinity backing-dev: use synchronize_rcu_expedited instead of synchronize_rcu block: fix patch import error in max_discard_sectors check block: reorder request_queue to remove 64 bit alignment padding CFQ: add think time check for group CFQ: add think time check for service tree CFQ: move think time check variables to a separate struct fixlet: Remove fs_excl from struct task. cfq: Remove special treatment for metadata rqs. block: document blk_plug list access block: avoid building too big plug list compat_ioctl: fix make headers_check regression block: eliminate potential for infinite loop in blkdev_issue_discard compat_ioctl: fix warning caused by qemu block: flush MEDIA_CHANGE from drivers on close(2) blk-throttle: Make total_nr_queued unsigned block: Add __attribute__((format(printf...) and fix fallout fs/partitions/check.c: make local symbols static block:remove some spare spaces in genhd.c block:fix the comment error in blkdev.h ...	2011-07-25 10:33:36 -07:00
Dave Kleikamp	3c2c226285	jfs: clean up some compiler warnings jfs has a few variables being set but never used. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2011-07-25 11:01:12 -05:00
Al Viro	963945bf93	fix jffs2 ACLs on big-endian with 16bit mode_t casting int * to mode_t * is not a good thing - on a lot of big-endian architectures mode_t happens to be smaller than int and there it breaks quite spectacularly... Fucked-up-by: commit `cfc8dc6f6f` Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-24 10:12:01 -04:00
Al Viro	1ec95bf34d	9p: close ACL leaks Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-24 10:10:18 -04:00
Al Viro	c0d960f038	ocfs2_init_acl(): fix a leak Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-24 10:10:09 -04:00
Tim Chen	423e0ab086	VFS : mount lock scalability for internal mounts For a number of file systems that don't have a mount point (e.g. sockfs and pipefs), they are not marked as long term. Therefore in mntput_no_expire, all locks in vfs_mount lock are taken instead of just local cpu's lock to aggregate reference counts when we release reference to file objects. In fact, only local lock need to have been taken to update ref counts as these file systems are in no danger of going away until we are ready to unregister them. The attached patch marks file systems using kern_mount without mount point as long term. The contentions of vfs_mount lock is now eliminated. Before un-registering such file system, kern_unmount should be called to remove the long term flag and make the mount point ready to be freed. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-24 10:08:32 -04:00
Wu Fengguang	fcc5c22218	writeback: don't busy retry writeback on new/freeing inodes Fix a system hang bug introduced by commit `b7a2441f99` ("writeback: remove writeback_control.more_io") and `e8dfc3058` ("writeback: elevate queue_io() into wb_writeback()") easily reproducible with high memory pressure and lots of file creation/deletions, for example, a kernel build in limited memory. It hangs when some inode is in the I_NEW, I_FREEING or I_WILL_FREE state, the flusher will get stuck busy retrying that inode, never releasing wb->list_lock. The lock in turn blocks all kinds of other tasks when they are trying to grab it. As put by Jan, it's a safe change regarding data integrity. I_FREEING or I_WILL_FREE inodes are written back by iput_final() and it is reclaim code that is responsible for eventually removing them. So writeback code can safely ignore them. I_NEW inodes should move out of this state when they are fully set up and in the writeback round following that, we will consider them for writeback. So the change makes sense. CC: Jan Kara <jack@suse.cz> Reported-by: Hugh Dickins <hughd@google.com> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-07-24 10:46:51 +08:00
Robin Dong	b7ca1e8ec5	ext4: correct comment for ext4_ext_check_cache The comment for ext4_ext_check_cache has a little mistake. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-23 21:53:25 -04:00
Robin Dong	0737964bc9	ext4: correct the debug message in ext4_ext_insert_extent The debug message in ext4_ext_insert_extent before moving extent is incorrect (the "from xx to xx"). Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-23 21:51:07 -04:00
Robin Dong	5718789da5	ext4: remove unused argument in ext4_ext_next_leaf_block The argument "inode" in function ext4_ext_next_allocated_block looks useless, so clean it. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-23 21:49:07 -04:00
Tao Ma	6a0fe49308	ext4: remove ac_repeats from ext4_allocation_context ac_repeats isn't referenced in the mballoc code. So remove it. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-23 16:18:55 -04:00
Tao Ma	ced156e464	ext4: don't increment s_mb_buddies_generated in ext4_mb_release In ext4_mb_release, we use s_mb_buddies_generated++. Although the output is OK, I don't think we need this extra ++. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-23 16:18:05 -04:00
Tao Ma	529da704ad	ext4: remove unnecessary ext4_get_group_info in ext4_mb_load_buddy ext4_mb_load_buddy() calls ext4_get_group_info() for setting both "grp" and "e4b->bd_info", but it could do "e4b->bd_info = grp". Reported-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-23 16:07:26 -04:00
Casey Bodley	0c12eaffdf	nfsd: don't break lease on CLAIM_DELEGATE_CUR CLAIM_DELEGATE_CUR is used in response to a broken lease; allowing it to break the lease and return EAGAIN leaves the client unable to make progress in returning the delegation. nfs4_get_vfs_file() now takes struct nfsd4_open for access to the claim type, and calls nfsd_open() with NFSD_MAY_NOT_BREAK_LEASE when the claim type is CLAIM_DELEGATE_CUR. Signed-off-by: Casey Bodley <cbodley@citi.umich.edu> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-23 14:58:17 -04:00
Aneesh Kumar K.V	48e370ff93	fs/9p: add 9P2000.L unlinkat operation unlinkat - Remove a directory entry. Request: size[4] Tunlinkat tag[2] dirfid[4] name[s] flag[4]. Reply: size[4] Runlinkat tag[2]. The older Tremove has the following request format: size[4] Tremove tag[2] fid[4]. The remove message is used to remove a directory entry, either a file or a directory. The remove operation is actually a directory operation and should ideally have a dirfid; if not, we cannot represent the fid on the server with anything other than a name. We will have to derive the directory name from the fid in the Tremove request. NOTE: The operation doesn't clunk the unlink fid. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-07-23 09:32:52 -05:00
Aneesh Kumar K.V	9e8fb38e7d	fs/9p: add 9P2000.L renameat operation renameat - Change the name of a file or directory. Request: size[4] Trenameat tag[2] olddirfid[4] oldname[s] newdirfid[4] newname[s]. Reply: size[4] Rrenameat tag[2]. The older Trename has the following request format: size[4] Trename tag[2] fid[4] newdirfid[4] name[s]. The rename message is used to change the name of a file, possibly moving it to a new directory. The rename operation is actually a directory operation and should ideally have an olddirfid; if not, we cannot represent the fid on the server with anything other than a name. We will have to derive the old directory name from the fid in the Trename request. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-07-23 09:32:51 -05:00
Aneesh Kumar K.V	ed80fcfac2	fs/9p: Always ask new inode in create This makes sure we don't end up reusing the unlinked inode object. The ideal way is to use the inode i_generation, but i_generation is not always available in userspace. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-07-23 09:32:50 -05:00
Prem Karat	a2dd43bb0d	fs/9p: Fix invalid mount options/args Without this fix, if any invalid mount options/args are passed while mounting the 9p fs, no error (-EINVAL) is returned and the default arg value is assigned. This fix returns -EINVAL when an invalid argument is found while parsing mount options. Signed-off-by: Prem Karat <prem.karat@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-07-23 09:32:48 -05:00
Aneesh Kumar K.V	fd2421f544	fs/9p: When doing inode lookup compare qid details and inode mode bits. This makes sure we don't use the wrong inode from the inode hash. The inode number of a deleted file is reused by the next file system object created, and if we only use the inode number for the inode hash lookup we could end up with the wrong struct inode. Also compare the inode generation number. Not all Linux file systems provide st_gen in userspace, so it could be 0. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-07-23 09:32:48 -05:00
Aneesh Kumar K.V	2053d67c54	fs/9p: remove rename work around in 9p Now that VFS does the right thing remove the work around. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-07-23 09:32:47 -05:00
Linus Torvalds	bbd9d6f7fb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits) vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp isofs: Remove global fs lock jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory fix IN_DELETE_SELF on overwriting rename() on ramfs et.al. mm/truncate.c: fix build for CONFIG_BLOCK not enabled fs:update the NOTE of the file_operations structure Remove dead code in dget_parent() AFS: Fix silly characters in a comment switch d_add_ci() to d_splice_alias() in "found negative" case as well simplify gfs2_lookup() jfs_lookup(): don't bother with . or .. get rid of useless dget_parent() in btrfs rename() and link() get rid of useless dget_parent() in fs/btrfs/ioctl.c fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers drivers: fix up various ->llseek() implementations fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek Ext4: handle SEEK_HOLE/SEEK_DATA generically Btrfs: implement our own ->llseek fs: add SEEK_HOLE and SEEK_DATA flags reiserfs: make reiserfs default to barrier=flush ... Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new shrinker callout for the inode cache, that clashed with the xfs code to start the periodic workers later.	2011-07-22 19:02:39 -07:00
Jan Kara	b22570d9ab	ext3: Fix data corruption in inodes with journalled data When journalling data for an inode (either because it is a symlink or because the filesystem is mounted in data=journal mode), ext3_evict_inode() can discard unwritten data by calling truncate_inode_pages(). This is because we don't mark the buffer / page dirty when journalling data but only add the buffer to the running transaction and thus mm does not know there are still unwritten data. Fix the problem by carefully tracking transaction containing inode's data, committing this transaction, and writing uncheckpointed buffers when inode should be reaped. Signed-off-by: Jan Kara <jack@suse.cz>	2011-07-23 01:49:00 +02:00
Konstantin Khlebnikov	5a9a43646c	vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp Replace unclear (struct dentry *) to (struct file *) typecast with ERR_CAST() macro. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-22 19:42:13 -04:00
Jan Kara	d769b3c2ab	isofs: Remove global fs lock sbi->s_mutex isn't needed for isofs at all so we can just remove it. Generally, since isofs is always mounted read-only, filesystem structure cannot change under us. So buffer_head contents stays constant after it's filled in. That leaves us with possible changes of global data structures. Superblock changes only during filesystem mount (even remount does not change it), inodes are only filled in during reading from disk. So there are no changes of these structures to bother about. Arguments why sbi->s_mutex can be removed at each place: isofs_readdir: Accesses sb, inode, filp, local variables => s_mutex not needed isofs_lookup: Protected by directory's i_mutex. Accesses sb, inode, dentry, local variables => s_mutex not needed rock_ridge_symlink_readpage: Protected by page lock. Accesses sb, inode, local variables => s_mutex not needed. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-22 19:42:12 -04:00
Al Viro	22ba747f66	jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory We don't generate IN_DELETE_SELF on the victim of an overwriting rename() if it happens to be a directory. Trivially fixed by doing to ->i_nlink what we do to ->pino_nlink a couple of lines later in jffs2_rename(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-22 19:42:11 -04:00
Al Viro	841590ce16	fix IN_DELETE_SELF on overwriting rename() on ramfs et.al. On ramfs and other simple_rename() users IN_DELETE_SELF is not generated for the victim of an overwriting rename() if it is a directory. Works on most of the local filesystems and really trivial to fix... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-22 19:42:11 -04:00
Matthew Garrett	dee28e72b6	pstore: Allow the user to explicitly choose a backend pstore only allows one backend to be registered at present, but the system may provide several. Add a parameter to allow the user to choose which backend will be used rather than just relying on load order. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2011-07-22 16:14:39 -07:00
Matthew Garrett	b94fdd077e	pstore: Make "part" unsigned We'll never have a negative part, so just make this an unsigned int. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2011-07-22 16:14:29 -07:00
Matthew Garrett	56280682ce	pstore: Add extra context for writes and erases EFI only provides small amounts of individual storage, and conventionally puts metadata in the storage variable name. Rather than add a metadata header to the (already limited) variable storage, it's easier for us to modify pstore to pass all the information we need to construct a unique variable name to the appropriate functions. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2011-07-22 16:14:20 -07:00
Matthew Garrett	638c1fd303	pstore: Extend API for more flexibility in new backends Some pstore implementations may not have a static context, so extend the API to pass the pstore_info struct to all calls and allow for a context pointer. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2011-07-22 16:14:06 -07:00
Linus Torvalds	8209f53d79	Merge branch 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc * 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (39 commits) ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever ptrace: fix ptrace_signal() && STOP_DEQUEUED interaction connector: add an event for monitoring process tracers ptrace: dont send SIGSTOP on auto-attach if PT_SEIZED ptrace: mv send-SIGSTOP from do_fork() to ptrace_init_task() ptrace_init_task: initialize child->jobctl explicitly has_stopped_jobs: s/task_is_stopped/SIGNAL_STOP_STOPPED/ ptrace: make former thread ID available via PTRACE_GETEVENTMSG after PTRACE_EVENT_EXEC stop ptrace: wait_consider_task: s/same_thread_group/ptrace_reparented/ ptrace: kill real_parent_is_ptracer() in in favor of ptrace_reparented() ptrace: ptrace_reparented() should check same_thread_group() redefine thread_group_leader() as exit_signal >= 0 do not change dead_task->exit_signal kill task_detached() reparent_leader: check EXIT_DEAD instead of task_detached() make do_notify_parent() __must_check, update the callers __ptrace_detach: avoid task_detached(), check do_notify_parent() kill tracehook_notify_death() make do_notify_parent() return bool ptrace: s/tracehook_tracer_task()/ptrace_parent()/ ...	2011-07-22 15:06:50 -07:00
Linus Torvalds	c1f792a5bf	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: (49 commits) xfs: add size update tracepoint to IO completion xfs: convert AIL cursors to use struct list_head xfs: remove confusing ail cursor wrapper xfs: use a cursor for bulk AIL insertion xfs: failure mapping nfs fh to inode should return ESTALE xfs: Remove the second parameter to xfs_sb_count() xfs: remove the dead XFS_DABUF_DEBUG code xfs: remove leftovers of the old btree tracing code xfs: remove the dead QUOTADEBUG code xfs: remove the unused xfs_buf_delwri_sort function xfs: remove wrappers around b_iodone xfs: remove wrappers around b_fspriv xfs: add a proper transaction pointer to struct xfs_buf xfs: factor out xfs_da_grow_inode_int xfs: factor out xfs_dir2_leaf_find_stale xfs: cleanup struct xfs_dir2_free xfs: reshuffle dir2 headers xfs: start periodic workers later Revert "xfs: fix filesystsem freeze race in xfs_trans_alloc" xfs: remove variables that serve no purpose in xfs_alloc_ag_vextent_exact() ...	2011-07-22 13:16:33 -07:00
Linus Torvalds	6aaf4404ab	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: don't limit active work items dlm: use workqueue for callbacks dlm: remove deadlock debug print dlm: improve rsb searches dlm: keep lkbs in idr dlm: fix kmalloc args dlm: don't do pointless NULL check, use kzalloc and fix order of arguments dlm: dump address of unknown node dlm: use vmalloc for hash tables dlm: show addresses in configfs	2011-07-22 13:16:07 -07:00
Linus Torvalds	ba1f9db908	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus: hfsplus: ensure bio requests are not smaller than the hardware sectors hfsplus: Add additional range check to handle on-disk corruptions hfsplus: Add error propagation for hfsplus_ext_write_extent_locked hfsplus: add error checking for hfs_find_init() hfsplus: lift the 2TB size limit hfsplus: fix overflow in hfsplus_read_wrapper hfsplus: fix overflow in hfsplus_get_block hfsplus: assignments inside `if' condition clean-up	2011-07-22 13:12:17 -07:00
Linus Torvalds	49302baa64	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: combine duplicated block freeing routines GFS2: Add S_NOSEC support GFS2: Automatically adjust glock min hold time GFS2: Cache dir hash table in a contiguous buffer	2011-07-22 13:10:41 -07:00
Linus Torvalds	59a7ac1211	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 * 'linux-next' of git://git.infradead.org/ubifs-2.6: (32 commits) MAINTAINERS: change e-mail of Adrian Hunter UBIFS: fix master node recovery UBIFS: improve power cut emulation testing UBIFS: rename recovery testing variables UBIFS: remove custom list of superblocks UBIFS: stop re-defining UBI operations UBIFS: switch to I/O helpers UBIFS: switch to ubifs_leb_write UBIFS: switch to ubifs_leb_read UBIFS: introduce more I/O helpers UBIFS: always print stacktrace when switching to R/O mode UBIFS: remove unused and unneeded debugging function UBIFS: add global debugfs knobs UBIFS: introduce debugfs helpers UBIFS: re-arrange debugging code a bit UBIFS: be more informative in failure mode UBIFS: switch self-check knobs to debugfs UBIFS: lessen amount of debugging check types UBIFS: introduce helper functions for debugging checks and tests UBIFS: amend debugging inode size check function prototype ...	2011-07-22 13:09:35 -07:00
Wang Sheng-Hui	03b5bb3429	ext2: check xattr name_len before acquiring xattr_sem in ext2_xattr_get In ext2_xattr_get(), the code acquires xattr_sem first and only later checks whether the xattr name_len > 255. This is unnecessarily time consuming, and ext2_xattr_set() checks the length before other checks. So move the check before acquiring xattr_sem to make these two functions consistent. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-07-22 19:41:16 +02:00
Jean Delvare	df2e301fee	fs: Merge split strings No idea why these were split in the first place... Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-07-22 16:47:15 +02:00
Seth Forshee	6596528e39	hfsplus: ensure bio requests are not smaller than the hardware sectors Currently all bio requests are 512 bytes, which may fail for media whose physical sector size is larger than this. Ensure these requests are not smaller than the block device logical block size. BugLink: http://bugs.launchpad.net/bugs/734883 Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-07-22 16:37:44 +02:00
Naohiro Aota	aac4e4198e	hfsplus: Add additional range check to handle on-disk corruptions 'recoff' is read from disk and used as an argument to memcpy, so if the value read from disk is larger than the page size, it results in a "general protection fault". This patch adds an additional range check for the value, so that disk fuzz won't cause such a fault. Signed-off-by: Naohiro Aota <naota@elisp.net> Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-07-22 16:36:56 +02:00
Oleg Nesterov	eac1b5e57d	ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever Test-case: void *tfunc(void *arg) { execvp("true", NULL); return NULL; } int main(void) { int pid; if (fork()) { pthread_t t; kill(getpid(), SIGSTOP); pthread_create(&t, NULL, tfunc, NULL); for (;;) pause(); } pid = getppid(); assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0); while (wait(NULL) > 0) ptrace(PTRACE_CONT, pid, 0,0); return 0; } It is racy, exit_notify() does __wake_up_parent() too. But in the likely case it triggers the problem: de_thread() does release_task() and the old leader goes away without the notification, the tracer sleeps in do_wait() without children/tracees. Change de_thread() to do __wake_up_parent(traced_leader->parent). Since it is already EXIT_DEAD we can do this without ptrace_unlink(), EXIT_DEAD threads do not exist from do_wait's pov. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org>	2011-07-22 15:10:49 +02:00
Phillip Lougher	cc6d349714	Squashfs: Make ZLIB compression support optional Squashfs now supports XZ and LZO compression in addition to ZLIB. As such it no longer makes sense to always include ZLIB support. In particular embedded systems may only use LZO or XZ compression, and the ability to exclude ZLIB support will reduce kernel size. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>	2011-07-22 03:01:28 +01:00
Linus Torvalds	2bafc7a275	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: CIFS: Fix wrong length in cifs_iovec_read	2011-07-21 14:28:01 -07:00
Linus Torvalds	b91da88fed	vfs: drop conditional inode prefetch in __do_lookup_rcu It seems to hurt performance in real life. Yes, the inode will be used later, but the conditional doesn't seem to predict all that well (negative dentries are not uncommon) and it looks like the cost of prefetching is simply higher than depending on the cache doing the right thing. As usual. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-21 11:01:42 -07:00
Jan Beulich	b307d4655a	FS-Cache: Fix __fscache_uncache_all_inode_pages()'s outer loop The compiler, at least for ix86 and m68k, validly warns that the comparison: next <= (loff_t)-1 is always true (and it's always true also for x86-64 and probably all other arches - as long as pgoff_t isn't wider than loff_t). The intention appears to be to avoid wrapping of "next", so rather than eliminating the pointless comparison, fix the loop to indeed get exited when "next" would otherwise wrap. On m68k the following warning is observed: fs/fscache/page.c: In function '__fscache_uncache_all_inode_pages': fs/fscache/page.c:979: warning: comparison is always false due to limited range of data type Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-21 10:59:16 -07:00
Pavel Shilovsky	2cebaa58b7	CIFS: Fix wrong length in cifs_iovec_read Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-21 00:48:05 +00:00
Al Viro	86c98e8cdb	Remove dead code in dget_parent() ->d_parent is never NULL... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:04 -04:00
David Howells	e4b9f00581	AFS: Fix silly characters in a comment Fix silly characters in a comment in AFS code (some weird characters replaced the word 'flag' some point way back). Reported-by: viro@ZenIV.linux.org.uk Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:03 -04:00
Al Viro	4513d899c4	switch d_add_ci() to d_splice_alias() in "found negative" case as well Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:02 -04:00
Al Viro	6c673ab393	simplify gfs2_lookup() d_splice_alias() will DTRT when given NULL or ERR_PTR Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:02 -04:00
Al Viro	79ac5a46c5	jfs_lookup(): don't bother with . or .. they'll never be passed to ->lookup() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:01 -04:00
Al Viro	10d9f309d8	get rid of useless dget_parent() in btrfs rename() and link() ->d_parent is locked and stable there... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:00 -04:00
Al Viro	2fbe8c8ad1	get rid of useless dget_parent() in fs/btrfs/ioctl.c both callers there have dentry->d_parent stabilized by the fact that their caller had obtained dentry from lookup_one_len() and had not dropped ->i_mutex on parent since then. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:00 -04:00
Josef Bacik	02c24a8218	fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push down taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:59 -04:00
Josef Bacik	06222e491e	fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek This converts everybody to handle SEEK_HOLE/SEEK_DATA properly. In some cases we just return -EINVAL, in others we do the normal generic thing, and in others we're simply making sure that the proper due diligence is done. For example in NFS/CIFS we need to make sure the file size is updated properly for the SEEK_HOLE and SEEK_DATA case, but since it calls the generic llseek stuff itself that is all we have to do. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:58 -04:00
Josef Bacik	c334b1138b	Ext4: handle SEEK_HOLE/SEEK_DATA generically Since Ext4 has its own lseek we need to make sure it handles SEEK_HOLE/SEEK_DATA. For now just do the same thing that is done in the generic case, somebody else can come along and make it do fancy things later. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:57 -04:00
Josef Bacik	b26751575a	Btrfs: implement our own ->llseek In order to handle SEEK_HOLE/SEEK_DATA we need to implement our own llseek. Basically for the normal SEEK_*'s we will just defer to the generic helper, and for SEEK_HOLE/SEEK_DATA we will use our fiemap helper to figure out the nearest hole or data. Currently this helper doesn't check for delalloc bytes for prealloc space, so for now treat prealloc as data until that is fixed. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:56 -04:00
Josef Bacik	982d816581	fs: add SEEK_HOLE and SEEK_DATA flags This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags. Turns out using fiemap in things like cp causes more problems than it solves, so let's try and give userspace an interface that doesn't suck. We need to match Solaris here, and the definitions are: - If /whence/ is SEEK_HOLE, the offset of the start of the next hole greater than or equal to the supplied offset is returned. The definition of a hole is provided near the end of the DESCRIPTION. - If /whence/ is SEEK_DATA, the file pointer is set to the start of the next non-hole file region greater than or equal to the supplied offset. So in the generic case the entire file is data and there is a virtual hole at the end. That means we will just return i_size for SEEK_HOLE and will return the same offset for SEEK_DATA. This is how Solaris does it so we have to do it the same way. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:56 -04:00
Christoph Hellwig	b4d5b10fb2	reiserfs: make reiserfs default to barrier=flush Change the default reiserfs mount option to barrier=flush. Based on a patch from Jeff Mahoney in the SuSE tree. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:55 -04:00
Christoph Hellwig	00eacd66cd	ext3: make ext3 mount default to barrier=1 This patch turns on barriers by default for ext3. mount -o barrier=0 will turn them off. Based on a patch from Chris Mason in the SuSE tree. Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Ted Ts'o <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:54 -04:00
Al Viro	b85fd6bdc9	don't open-code parent_ino() in assorted ->readdir() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:54 -04:00
Al Viro	2def9e4ec7	minix_getattr(): don't bother with ->d_parent we can find superblock easier, TYVM... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:53 -04:00
Al Viro	ee60498f3e	coda_venus_readdir(): use offsetof() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:52 -04:00
Kay Sievers	f15146380d	fs: seq_file - add event counter to simplify poll() support Moving the event counter into the dynamically allocated 'struct seq_file' allows poll() support without the need to allocate its own tracking structure. All current users are switched over to use the new counter. Requested-by: Andrew Morton akpm@linux-foundation.org Acked-by: NeilBrown <neilb@suse.de> Tested-by: Lucas De Marchi lucas.demarchi@profusion.mobi Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:50 -04:00
Christoph Hellwig	72c5052ddc	fs: move inode_dio_done to the end_io handler For filesystems that delay their end_io processing we should keep our i_dio_count until the processing is done. Enable this by moving the inode_dio_done call to the end_io handler if one exists. Note that the actual move to the workqueue for ext4 and XFS is not done in this patch yet, but left to the filesystem maintainers. At least for XFS it's not needed yet either as XFS has an internal equivalent to i_dio_count. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:50 -04:00
Christoph Hellwig	aacfc19c62	fs: simplify the blockdev_direct_IO prototype Simple filesystems always pass inode->i_sb_bdev as the block device argument, and never need an end_io handler. Let's simplify things for them and for my grepping activity by dropping these arguments. The only thing not falling into that scheme is ext4, which passes an end_io handler without needing special flags (yet), but given how messy the direct I/O code there is, use of __blockdev_direct_IO in one instead of two out of three cases isn't going to make a large difference anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:49 -04:00
Christoph Hellwig	df2d6f2658	fs: always maintain i_dio_count Maintain i_dio_count for all filesystems, not just those using DIO_LOCKING. This allows these filesystems to also protect truncate against direct I/O requests by using common code. Right now the only non-DIO_LOCKING filesystem that appears to do so is XFS, which uses an opencoded variant of the i_dio_count scheme. Behaviour doesn't change for filesystems never calling inode_dio_wait. For ext4 behaviour changes when using the dioread_nonlock option, which previously was missing any protection between truncate and direct I/O reads. For ocfs2 the handcrafted i_dio_count manipulations are replaced with the common code now enabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:48 -04:00
Christoph Hellwig	562c72aa57	fs: move inode_dio_wait calls into ->setattr Let filesystems handle waiting for direct I/O requests themselves instead of doing it beforehand. This means filesystem-specific locks to prevent new dio references from appearing can be held. This is important to allow generalizing i_dio_count to non-DIO_LOCKING filesystems. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:47 -04:00
Christoph Hellwig	bd5fe6c5eb	fs: kill i_alloc_sem i_alloc_sem is a rather special rw_semaphore. It's the last one that may be released by a non-owner, and its write side is always mirrored by real exclusion. Its intended use is to wait for all pending direct I/O requests to finish before starting a truncate. Replace it with a hand-grown construct: - exclusion for truncates is already guaranteed by i_mutex, so it can simply fall away - the reader side is replaced by an i_dio_count member in struct inode that counts the number of pending direct I/O requests. Truncate can't proceed as long as it's non-zero - when i_dio_count reaches zero we wake up a pending truncate using wake_up_bit on a new bit in i_flags - new references to i_dio_count can't appear while we are waiting for it to read zero because the direct I/O count always needs i_mutex (or an equivalent like XFS's i_iolock) for starting a new operation. This scheme is much simpler, and saves the space of a spinlock_t and a struct list_head in struct inode (typically 160 bits on a non-debug 64-bit system). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:46 -04:00
Christoph Hellwig	f9b5570d7f	fs: simplify handling of zero sized reads in __blockdev_direct_IO Reject zero sized reads as soon as we know our I/O length, and don't bother with locks or allocations that might have to be cleaned up otherwise. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:45 -04:00
Jan Kara	9ea7df534e	ext4: Rewrite ext4_page_mkwrite() to use generic helpers Rewrite ext4_page_mkwrite() to use __block_page_mkwrite() helper. This removes the need of using i_alloc_sem to avoid races with truncate which seems to be the wrong locking order according to lock ordering documented in mm/rmap.c. Also calling ext4_da_write_begin() as used by the old code seems to be problematic because we can decide to flush delay-allocated blocks which will acquire s_umount semaphore - again creating unpleasant lock dependency if not directly a deadlock. Also add a check for frozen filesystem so that we don't busyloop in page fault when the filesystem is frozen. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:45 -04:00
Christoph Hellwig	5826869158	fat: remove i_alloc_sem abuse Add a new rw_semaphore to protect bmap against truncate. Previous i_alloc_sem was abused for this, but it's going away in this series. Note that we can't simply use i_mutex, given that the swapon code calls ->bmap under it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:44 -04:00
Tobias Klauser	8c5dc70aae	VFS: Fixup kerneldoc for generic_permission() The flags parameter went away in d749519b444db985e40b897f73ce1898b11f997e Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:43 -04:00
Dave Chinner	8daaa83145	xfs: make use of new shrinker callout for the inode cache Convert the inode reclaim shrinker to use the new per-sb shrinker operations. This allows much bigger reclaim batches to be used, and allows the XFS inode cache to be shrunk in proportion with the VFS dentry and inode caches. This avoids the problem of the VFS caches being shrunk significantly before the XFS inode cache is shrunk resulting in imbalances in the caches during reclaim. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:42 -04:00
Dave Chinner	8ab47664d5	vfs: increase shrinker batch size Now that the per-sb shrinker is responsible for shrinking 2 or more caches, increase the batch size to keep economies of scale for shrinking each cache. Increase the shrinker batch size to 1024 objects. To allow for a large increase in batch size, add a conditional reschedule to prune_icache_sb() so that we don't hold the LRU spin lock for too long. This mirrors the behaviour of the __shrink_dcache_sb(), and allows us to increase the batch size without needing to worry about problems caused by long lock hold times. To ensure that filesystems using the per-sb shrinker callouts don't cause problems, document that the object freeing method must reschedule appropriately inside loops. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:41 -04:00
Dave Chinner	0e1fdafd93	superblock: add filesystem shrinker operations Now we have a per-superblock shrinker implementation, we can add a filesystem specific callout to it to allow filesystem internal caches to be shrunk by the superblock shrinker. Rather than perpetuate the multipurpose shrinker callback API (i.e. nr_to_scan == 0 meaning "tell me how many objects are freeable in the cache"), two operations will be added. The first will return the number of objects that are freeable, the second is the actual shrinker call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:41 -04:00
Dave Chinner	4f8c19fdf3	inode: remove iprune_sem Now that we have per-sb shrinkers with a lifecycle that is a subset of the superblock lifecycle and can reliably detect a filesystem being unmounted, there is no longer any race condition for the iprune_sem to protect against. Hence we can remove it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:40 -04:00
Dave Chinner	b0d40c92ad	superblock: introduce per-sb cache shrinker infrastructure With context based shrinkers, we can implement a per-superblock shrinker that shrinks the caches attached to the superblock. We currently have global shrinkers for the inode and dentry caches that split up into per-superblock operations via a coarse proportioning method that does not batch very well. The global shrinkers also have a dependency - dentries pin inodes - so we have to be very careful about how we register the global shrinkers so that the implicit call order is always correct. With a per-sb shrinker callout, we can encode this dependency directly into the per-sb shrinker, hence avoiding the need for strictly ordering shrinker registrations. We also have no need for any proportioning code for the shrinker subsystem already provides this functionality across all shrinkers. Allowing the shrinker to operate on a single superblock at a time means that we do less superblock list traversals and locking and reclaim should batch more effectively. This should result in less CPU overhead for reclaim and potentially faster reclaim of items from each filesystem. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:10 -04:00
J. Bruce Fields	8fb47a4fbf	locks: rename lock-manager ops Both the filesystem and the lock manager can associate operations with a lock. Confusingly, one of them (fl_release_private) actually has the same name in both operation structures. It would save some confusion to give the lock-manager ops different names. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-20 20:23:19 -04:00
Dave Chinner	55fb25d5b3	xfs: add size update tracepoint to IO completion For improving insight into IO completion behaviour. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-20 18:38:04 -05:00
Dave Chinner	af3e40228f	xfs: convert AIL cursors to use struct list_head The list of active AIL cursors uses a roll-your-own linked list with special casing for the AIL push cursor. Simplify this code by replacing the list with standard struct list_head lists, and use a separate list_head to track the active cursors. This allows us to treat the AIL push cursor as a generic cursor rather than as a special case, further simplifying the code. Further, fix the duplicate push cursor initialisation that the special case handling was hiding, and clean up all the comments around the active cursor list handling. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-20 18:37:46 -05:00
Dave Chinner	16b5902943	xfs: remove confusing ail cursor wrapper xfs_trans_ail_cursor_set() doesn't set the cursor to the current log item, it sets it to the next item. There is already a function for doing this - xfs_trans_ail_cursor_next() - and the _set function is simply a two line wrapper. Remove it and open code the setting of the cursor in the two locations that call it to remove the confusion. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-20 18:37:37 -05:00
Dave Chinner	1d8c95a363	xfs: use a cursor for bulk AIL insertion Delayed logging can insert tens of thousands of log items into the AIL at the same LSN. When the committing of log commit records occurs, we can get insertions occurring at an LSN that is not at the end of the AIL. If there are thousands of items in the AIL on the tail LSN, each insertion has to walk the AIL to find the correct place to insert the new item into the AIL. This can consume large amounts of CPU time and block other operations from occurring while the traversals are in progress. To avoid this repeated walk, use an AIL cursor to record where we should be inserting the new items into the AIL without having to repeat the walk. The cursor infrastructure already provides this functionality for push walks, so this is a simple extension of existing code. While this will not avoid the initial walk, it will avoid repeating it tens of thousands of times during a single checkpoint commit. This version includes logic improvements from Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-20 18:37:20 -05:00
J. Bruce Fields	ad1a2c878c	xfs: failure mapping nfs fh to inode should return ESTALE On xfs exports, nfsd is incorrectly returning ENOENT instead of ESTALE on attempts to use a filehandle of a deleted file (spotted with pynfs test PUTFH3). The ENOENT was coming from xfs_iget. (It's tempting to wonder whether we should just map all xfs_iget errors to ESTALE, but I don't believe so--xfs_iget can also return ENOMEM at least, which we wouldn't want mapped to ESTALE.) While we're at it, the other return of ENOENT in xfs_nfs_get_inode() also looks wrong. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-20 18:35:21 -05:00
Chandra Seetharaman	adab0f67d1	xfs: Remove the second parameter to xfs_sb_count() Remove the second parameter to xfs_sb_count() since all callers of the function set them. Also, fix the header comment regarding it being called periodically. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-20 18:35:03 -05:00
Bernd Schubert	c878c73f8d	ext3: Fix compilation with -DDX_DEBUG Compilation of ext3/namei.c brought up an error and warning messages when compiled with -DDX_DEBUG. Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: Jan Kara <jack@suse.cz>	2011-07-20 23:16:33 +02:00
Dave Chinner	12ad3ab661	superblock: move pin_sb_for_writeback() to fs/super.c The per-sb shrinker has the same requirement as the writeback threads of ensuring that the superblock is usable and pinned for the time it takes to run the work. Both need to take a passive reference to the sb, take a read lock on the s_umount lock and then only continue if an unmount is not in progress. pin_sb_for_writeback() does this exactly, so move it to fs/super.c and rename it to grab_super_passive() and exporting it via fs/internal.h for all the VFS code to be able to use. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:38 -04:00
Dave Chinner	09cc9fc7a7	inode: move to per-sb LRU locks With the inode LRUs moving to per-sb structures, there is no longer a need for a global inode_lru_lock. The locking can be made more fine-grained by moving to a per-sb LRU lock, isolating the LRU operations of different filesystems completely from each other. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:36 -04:00
Dave Chinner	98b745c647	inode: Make unused inode LRU per superblock The inode unused list is currently a global LRU. This does not match the other global filesystem cache - the dentry cache - which uses per-superblock LRU lists. Hence we have related filesystem object types using different LRU reclamation schemes. To enable a per-superblock filesystem cache shrinker, both of these caches need to have per-sb unused object LRU lists. Hence this patch converts the global inode LRU to per-sb LRUs. The patch only does rudimentary per-sb proportioning in the shrinker infrastructure, as this gets removed when the per-sb shrinker callouts are introduced later on. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:35 -04:00
Dave Chinner	fcb94f72d3	inode: convert inode_stat.nr_unused to per-cpu counters Before we split up the inode_lru_lock, the unused inode counter needs to be made independent of the global inode_lru_lock. Convert it to per-cpu counters to do this. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:34 -04:00
Al Viro	a9049376ee	make d_splice_alias(ERR_PTR(err), dentry) = ERR_PTR(err) ... and simplify the living hell out of callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:26 -04:00
Al Viro	0c1aa9a952	deuglify squashfs_lookup() d_splice_alias(NULL, dentry) is equivalent to d_add(dentry, NULL), NULL so no need for that if (inode) ... in there (or ERR_PTR(0), for that matter) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:24 -04:00
Al Viro	5b4b299cc7	nfsd4_list_rec_dir(): don't bother with reopening rec_file just rewind it to the beginning before vfs_readdir() and be done with that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:23 -04:00
Al Viro	e7f5909707	kill useless checks for sb->s_op == NULL never is... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:21 -04:00
Al Viro	0ee5dc676a	btrfs: kill magical embedded struct superblock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:20 -04:00
Al Viro	fb408e6ccc	get rid of pointless checks for dentry->sb == NULL it never is... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:19 -04:00
Al Viro	a4464dbc0c	Make ->d_sb assign-once and always non-NULL New helper (non-exported, fs/internal.h-only): __d_alloc(sb, name). Allocates dentry, sets its ->d_sb to given superblock and sets ->d_op accordingly. Old d_alloc(NULL, name) callers are converted to that (all of them know what superblock they want). d_alloc() itself is left only for parent != NULL case; uses __d_alloc(), inserts result into the list of parent's children. Note that now ->d_sb is assign-once and never NULL and ->d_parent is never NULL either. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:17 -04:00
Al Viro	e3c3d9c838	unexport kern_path_parent() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:16 -04:00
Al Viro	e0a0124936	switch vfs_path_lookup() to struct path Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:14 -04:00
Al Viro	ed75e95de5	kill lookup_create() folded into the only caller (kern_path_create()) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:12 -04:00
Al Viro	dae6ad8f37	new helpers: kern_path_create/user_path_create combination of kern_path_parent() and lookup_create(). Does not expose struct nameidata to caller. Syscalls converted to that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:05 -04:00
Al Viro	49084c3bb2	kill LOOKUP_CONTINUE LOOKUP_PARENT is equivalent to it now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:03 -04:00
Al Viro	8aeb376ca0	nfs: LOOKUP_{OPEN,CREATE,EXCL} is set only on the last step Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:02 -04:00
Al Viro	4352780386	cifs_lookup(): LOOKUP_OPEN is set only on the last component Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:00 -04:00
Al Viro	a127e0af59	ceph: LOOKUP_OPEN is set only when it's the last component Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:59 -04:00
Al Viro	5c0f360b08	jfs_ci_revalidate() is safe from RCU mode Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:57 -04:00
Al Viro	407938e79e	LOOKUP_CREATE and LOOKUP_RENAME_TARGET can be set only on the last step Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:56 -04:00
Al Viro	dd7dd556e4	no need to check for LOOKUP_OPEN in ->create() instances ... it will be set in nd->flag for all cases with non-NULL nd (i.e. when called from do_last()). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:56 -04:00
Al Viro	bf6c7f6c7b	don't pass nameidata to vfs_create() from ecryptfs_create() Instead of playing with removal of LOOKUP_OPEN, mangling (and restoring) nd->path, just pass NULL to vfs_create(). The whole point of what's being done there is to suppress any attempts to open file by underlying fs, which is what nd == NULL indicates. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:54 -04:00
Al Viro	8a5e929dd2	don't transliterate lower bits of ->intent.open.flags to FMODE_... ->create() instances are much happier that way... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:52 -04:00
Al Viro	554a8b9f54	Don't pass nameidata when calling vfs_create() from mknod() All instances can cope with that now (and ceph one actually starts working properly). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:49 -04:00
Al Viro	f7c85868fc	fix mknod() on nfs4 (hopefully) a) check the right flags in ->create() (LOOKUP_OPEN, not LOOKUP_CREATE) b) default (!LOOKUP_OPEN) open_flags is O_CREAT|O_EXCL|FMODE_READ, not 0 c) lookup_instantiate_filp() should be done only with LOOKUP_OPEN; otherwise we need to issue CLOSE, lest we leak stateid on server. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:46 -04:00
Al Viro	511415980a	nameidata_to_nfs_open_context() doesn't need nameidata, actually... just open flags; switched to passing just those and renamed to create_nfs_open_context() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:45 -04:00
Al Viro	3d4ff43d89	nfs_open_context doesn't need struct path either just dentry, please... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:44 -04:00
Al Viro	82a2c1b77a	nfs4_opendata doesn't need struct path either Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:42 -04:00
Al Viro	643168c2dc	nfs4_closedata doesn't need to mess with struct path instead of path_get()/path_put(), we can just use nfs_sb_{,de}active() to pin the superblock down. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:41 -04:00
Al Viro	7c97c200e2	cifs: fix the type of cifs_demultiplex_thread() ... and get rid of a bogus typecast, while we are at it; it's not just that we want a function returning int and not void, but cast to pointer to function taking void * and returning void would be (void (*)(void *)) and not (void *)(void *), TYVM... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:39 -04:00
Al Viro	beefebf1aa	ecryptfs_inode_permission() doesn't need to bail out on RCU ... now that inode_permission() can take MAY_NOT_BLOCK and handle it properly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:38 -04:00
Al Viro	d2d9e9fbc2	merge do_revalidate() into its only caller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:34 -04:00
Al Viro	4ad5abb3d0	no reason to keep exec_permission() separate now cache footprint alone makes it a bad idea... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:32 -04:00
Al Viro	d594e7ec4d	massage generic_permission() to treat directories on a separate path Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:30 -04:00
Al Viro	eecdd358b4	->permission() sanitizing: don't pass flags to exec_permission() pass mask instead; kill security_inode_exec_permission() since we can use security_inode_permission() instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:29 -04:00
Al Viro	10556cb21a	->permission() sanitizing: don't pass flags to ->permission() not used by the instances anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:24 -04:00
Al Viro	2830ba7f34	->permission() sanitizing: don't pass flags to generic_permission() redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of them removes that bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:22 -04:00
Al Viro	7e40145eb1	->permission() sanitizing: don't pass flags to ->check_acl() not used in the instances anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:21 -04:00
Al Viro	9c2c703929	->permission() sanitizing: pass MAY_NOT_BLOCK to ->check_acl() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:19 -04:00
Al Viro	1fc0f78ca9	->permission() sanitizing: MAY_NOT_BLOCK Duplicate the flags argument into mask bitmap. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:18 -04:00
Al Viro	178ea73521	kill check_acl callback of generic_permission() its value depends only on inode and does not change; we might as well store it in ->i_op->check_acl and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:16 -04:00
Al Viro	07b8ce1ee8	lockless get_write_access/deny_write_access new helpers: atomic_inc_unless_negative()/atomic_dec_unless_positive() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:14 -04:00
Al Viro	f4d6ff89d8	move exec_permission() up to the rest of permission-related functions ... and convert the comment before it into linuxdoc form. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:13 -04:00
Al Viro	3bfa784a65	kill file_permission() completely convert the last remaining caller to inode_permission() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:11 -04:00
Al Viro	1b5d783c94	consolidate BINPRM_FLAGS_ENFORCE_NONDUMP handling new helper: would_dump(bprm, file). Checks if we are allowed to read the file and if we are not - sets ENFORCE_NONDUMP. Exported, used in places that previously open-coded the same logic. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:10 -04:00
Al Viro	78f32a9b47	switch path_init() to exec_permission() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:08 -04:00
Al Viro	6f28610974	switch udf_ioctl() to inode_permission() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:07 -04:00
Al Viro	4cf27141cb	make exec_permission(dir) really equivalent to inode_permission(dir, MAY_EXEC) capability overrides apply only to the default case; if fs has ->permission() that does _not_ call generic_permission(), we have no business doing them. Moreover, if it has ->permission() that does call generic_permission(), we have no need to recheck capabilities. Besides, the capability overrides should apply only if we got EACCES from acl_permission_check(); any other value (-EIO, etc.) should be returned to caller, capabilities or not capabilities. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:05 -04:00
Al Viro	43e15cdbef	new helper: iterate_supers_type() Call the given function for all superblocks of given type. Function gets a superblock (with s_umount locked shared) and (void *) argument supplied by caller of iterator. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:04 -04:00
Josef Bacik	44396f4b5c	fs: add a DCACHE_NEED_LOOKUP flag for d_flags Btrfs (and I'd venture most other fs's) stores its indexes in nice disk order for readdir, but unfortunately in the case of anything that stats the files in order that readdir spits back (like oh say ls) that means we still have to do the normal lookup of the file, which means looking up our other index and then looking up the inode. What I want is a way to create dummy dentries when we find them in readdir so that when ls or anything else subsequently does a stat(), we already have the location information in the dentry and can go straight to the inode itself. The lookup stuff just assumes that if it finds a dentry it is done, it doesn't perform a lookup. So add a DCACHE_NEED_LOOKUP flag so that the lookup code knows it still needs to run i_op->lookup() on the parent to get the inode for the dentry. I have tested this with btrfs and I went from something that looks like this http://people.redhat.com/jwhiter/ls-noreada.png To this http://people.redhat.com/jwhiter/ls-good.png Thats a savings of 1300 seconds, or 22 minutes. That is a significant savings. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:03 -04:00
Akinobu Mita	f7b88631a8	fs/libfs.c: fix simple_attr_write() on 32bit machines Assume that /sys/kernel/debug/dummy64 is debugfs file created by debugfs_create_x64(). # cd /sys/kernel/debug # echo 0x1234567812345678 > dummy64 # cat dummy64 0x0000000012345678 # echo 0x80000000 > dummy64 # cat dummy64 0xffffffff80000000 A value larger than INT_MAX cannot be written to the debugfs file created by debugfs_create_u64 or debugfs_create_x64 on 32bit machine. Because simple_attr_write() uses simple_strtol() for the conversion. To fix this, use simple_strtoll() instead. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-19 22:09:30 -07:00
Linus Torvalds	e501f29c72	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: vfs: fix race in rcu lookup of pruned dentry Fix cifs_get_root() [ Edited the last commit to get rid of a 'unused variable "seq"' warning due to Al editing the patch. - Linus ]	2011-07-19 21:50:21 -07:00
Linus Torvalds	5943026240	vfs: fix race in rcu lookup of pruned dentry Don't update *inode in __follow_mount_rcu() until we'd verified that there is mountpoint there. Kudos to Hugh Dickins for catching that one in the first place and eventually figuring out the solution (and catching a braino in the earlier version of patch). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-19 21:49:01 -07:00
David Teigland	10d1459faf	dlm: don't limit active work items Allow multiple workqueue items (locks with callbacks) to be processed concurrently. There should be no reason not to take advantage of this workqueue feature. Signed-off-by: David Teigland <teigland@redhat.com>	2011-07-19 14:22:32 -05:00
Al Viro	fec11dd9a0	Fix cifs_get_root() Add missing ->i_mutex, convert to lookup_one_len() instead of (broken) open-coded analog, cope with getting something like a//b as relative pathname. Simplify the hell out of it, while we are there... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Jeff Layton <jlayton@redhat.com>	2011-07-18 13:51:58 -04:00
Linus Torvalds	d36c30181c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: hppfs_lookup(): don't open-code lookup_one_len() hppfs: fix dentry leak cramfs: get_cramfs_inode() returns ERR_PTR() on failure ufs should use d_splice_alias() fix exofs ->get_parent() ceph analog of cifs build_path_from_dentry() race fix cifs: build_path_from_dentry() race fix	2011-07-18 09:03:15 -07:00
J. Bruce Fields	1091006c5e	nfsd: turn on reply cache for NFSv4 It's sort of ridiculous that we've never had a working reply cache for NFSv4. On the other hand, we may still not: our current reply cache is likely not very good, especially in the TCP case (which is the only case that matters for v4). What we really need here is some serious testing. Anyway, here's a start. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-18 09:39:01 -04:00
J. Bruce Fields	3e98abffd1	nfsd4: call nfsd4_release_compoundargs from pc_release This simplifies cleanup a bit. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-18 09:38:02 -04:00
Robin Dong	d46203159e	ext4: avoid eh_entries overflow before insert extent_idx If eh_entries is equal to (or greater than) eh_max, the operation of inserting new extent_idx will make number of entries overflow. So check eh_entries before inserting the new extent_idx. Although there is no bug case according to the code (function ext4_ext_insert_index is called by ext4_ext_split and ext4_ext_split is called only if the index block has free space), the right logic should be "lookup the capacity before insertion". Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-17 23:43:42 -04:00
Robin Dong	015861badd	ext4: avoid wasted extent cache lookup if !PUNCH_OUT_EXT This patch avoids an extraneous lookup of the extent cache in ext4_ext_map_blocks() when the flag EXT4_GET_BLOCKS_PUNCH_OUT_EXT is absent. The existing logic was performing the lookup but not making use of the result. The patch simply reverses the order of evaluation in the condition. Since ext4_ext_in_cache() does not initialize newex on misses, bypassing its invocation does not introduce any new issue in this regard. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Eric Gouriou <egouriou@google.com>	2011-07-17 23:27:43 -04:00
Al Viro	0916a5e45f	hppfs_lookup(): don't open-code lookup_one_len() ... and it's getting it wrong, too - missing ->d_revalidate() calls when it's dealing with filesystem (procfs) that has non-trivial ->d_revalidate()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-17 23:22:48 -04:00
Al Viro	3cc0658e35	hppfs: fix dentry leak Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-17 23:22:17 -04:00
Al Viro	0577d1ba41	cramfs: get_cramfs_inode() returns ERR_PTR() on failure ... and we want to report these failures in ->lookup() anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-17 23:22:02 -04:00
Al Viro	642c937b4e	ufs should use d_splice_alias() it's NFS-exportable, so... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-17 23:21:35 -04:00
Allison Henderson	c6a0371cbe	ext4: remove unneeded parameter to ext4_ext_remove_space() This patch removes the extra parameter in ext4_ext_remove_space() which is no longer needed. Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-17 23:21:03 -04:00
Al Viro	a803b8067e	fix exofs ->get_parent() NULL is not a possible return value for that method, TYVM... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-17 23:20:29 -04:00
Allison Henderson	f7d0d3797f	ext4: punch hole optimizations: skip un-needed extent lookup This patch optimizes the punch hole operation by skipping the tree walking code that is used by truncate. Since punch hole is done through map blocks, the path to the extent is already known in this function, so we do not need to look it up again. Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-17 23:17:02 -04:00
Dan Ehrenberg	3eb0865843	ext4: ignore a stripe width of 1 If the stripe width was set to 1, then this patch will ignore that stripe width and ext4 will act as if the stripe width were 0 with respect to optimizing allocations. Signed-off-by: Dan Ehrenberg <dehrenberg@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-17 21:18:51 -04:00
Dan Ehrenberg	d7a1fee135	ext4: make the preallocation size be a multiple of stripe size Previously, if a stripe width was provided, then it would be used as the preallocation granularity, with no sanity checking and no way to override this. Now, mb_prealloc_size defaults to the smallest multiple of stripe size that is greater than or equal to the old default mb_prealloc_size, and this can be overridden with the sysfs interface. Signed-off-by: Dan Ehrenberg <dehrenberg@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-17 21:11:30 -04:00
Linus Torvalds	f560f6697f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] update cifs to version 1.74 [CIFS] update limit for snprintf in cifs_construct_tcon cifs: Fix signing failure when server mandates signing for NTLMSSP	2011-07-17 12:49:55 -07:00
Al Viro	1b71fe2efa	ceph analog of cifs build_path_from_dentry() race fix ... unfortunately, cifs bug got copied. Fix is essentially the same. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-16 23:43:58 -04:00
Al Viro	dc137bf553	cifs: build_path_from_dentry() race fix deal with d_move() races properly; rename_lock read-retry loop, rcu_read_lock() held while walking to root, d_lock held over subtraction from namelen and copying the component to stabilize ->d_name. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-16 23:37:20 -04:00
Bernd Schubert	265c6a0f92	ext4: fix compilation with -DDX_DEBUG Compilation of ext4/namei.c brought up an error and warning messages when compiled with -DDX_DEBUG Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-16 19:41:23 -04:00
J. Bruce Fields	f85ef69ce0	pnfs: simplify pnfs files module autoloading Embed the necessary alias into the module rather than waiting for someone to add it to /etc/modprobe.conf Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-15 19:21:58 -04:00
J. Bruce Fields	674e405b8b	nfs: document nfsv4 sillyrename issues Somebody working on this code asked what the deal was with NFSv4, since this comment notes that it's v2/v3's statelessness that requires sillyrename. Shouldn't hurt to document the answer. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-15 19:14:00 -04:00
Mi Jinlong	ab1350b2b3	nfsd41: Deny new lock before RECLAIM_COMPLETE done Before nfs41 client's RECLAIM_COMPLETE done, nfs server should deny any new locks or opens. rfc5661: " Whenever a client establishes a new client ID and before it does the first non-reclaim operation that obtains a lock, it MUST send a RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no locks to reclaim. If non-reclaim locking operations are done before the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. " Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 19:00:40 -04:00
Miklos Szeredi	ee19cc406d	fs: locks: remove init_once From: Miklos Szeredi <mszeredi@suse.cz> Remove SLAB initialization entirely, as suggested by Bruce and Linus. Allocate with __GFP_ZERO instead and only initialize list heads. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 19:00:39 -04:00
Mi Jinlong	ae82a8d06f	nfsd41: check the size of request Check in SEQUENCE that the request doesn't exceed maxreq_sz for the given session. Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 19:00:00 -04:00
Mi Jinlong	1b74c25bc1	nfsd41: error out when client sets maxreq_sz or maxresp_sz too small According to RFC5661, 18.36.3, "if the client selects a value for ca_maxresponsesize such that a replier on a channel could never send a response,the server SHOULD return NFS4ERR_TOOSMALL in the CREATE_SESSION reply." So, error out when the client sets a maxreq_sz less than the minimum possible SEQUENCE request size, or sets a maxresp_sz less than the minimum possible SEQUENCE reply size. Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:51 -04:00
J. Bruce Fields	f197c27196	nfsd4: fix file leak on open_downgrade Stateid's hold a read reference for a read open, a write reference for a write open, and an additional one of each for each read+write open. The latter wasn't getting put on a downgrade, so something like: open RW open R downgrade to R was resulting in a file leak. Also fix an imbalance in an error path. Regression from `7d94784293` "nfsd4: fix downgrade/lock logic". Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:49 -04:00
J. Bruce Fields	499f3edc23	nfsd4: remember to put RW access on stateid destruction Without this, for example, open read open read+write close will result in a struct file leak. Regression from `7d94784293` "nfsd4: fix downgrade/lock logic". Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:49 -04:00
Bryan Schumaker	1745680454	NFSD: Added TEST_STATEID operation This operation is used by the client to check the validity of a list of stateids. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:48 -04:00
Bryan Schumaker	e1ca12dfb1	NFSD: added FREE_STATEID operation This operation is used by the client to tell the server to free a stateid. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:47 -04:00
NeilBrown	49b28684fd	nfsd: Remove deprecated nfsctl system call and related code. As promised in feature-removal-schedule.txt it is time to remove the nfsctl system call. Userspace has preferred to not use this call throughout 2.6 and it has been excluded in the default configuration since 2.6.36 (9 months ago). So this patch removes all the code that was being compiled out. There are still references to sys_nfsctl in various arch systemcall tables and related code. These should be cleaned out too, probably in the next merge window. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:42 -04:00
Benny Halevy	094b5d74f4	NFSD: allow OP_DESTROY_CLIENTID to be only op in COMPOUND DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as the client ID derived from the session ID of SEQUENCE is not the same as the client ID to be destroyed. If the client IDs are the same, then the server MUST return NFS4ERR_CLIENTID_BUSY. (that's not implemented yet) If DESTROY_CLIENTID is not prefixed by SEQUENCE, it MUST be the only operation in the COMPOUND request (otherwise, the server MUST return NFS4ERR_NOT_ONLY_OP). This fixes the error return; before, we returned NFS4ERR_OP_NOT_IN_SESSION; after this patch, we return NFS4ERR_NOTSUPP. Signed-off-by: Benny Halevy <benny@tonian.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:41 -04:00
David Teigland	23e8e1aaac	dlm: use workqueue for callbacks Instead of creating our own kthread (dlm_astd) to deliver callbacks for all lockspaces, use a per-lockspace workqueue to deliver the callbacks. This eliminates complications and slowdowns from many lockspaces sharing the same thread. Signed-off-by: David Teigland <teigland@redhat.com>	2011-07-15 12:30:43 -05:00
Linus Torvalds	da1b001a2a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: fix loop checks in d_materialise_unique() Fix ->d_lock locking order in unlazy_walk()	2011-07-15 09:55:39 -07:00
Trond Myklebust	94b134ac8e	NFS: Convert nfs4_set_ds_client to EXPORT_SYMBOL_GPL This is not part of an external ABI... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-15 09:12:24 -04:00
Trond Myklebust	9e00abc3c2	SUNRPC: sunrpc should not explicitly depend on NFS config options Change explicit references to CONFIG_NFS_V4_1 to implicit ones Get rid of the unnecessary defines in backchannel_rqst.c and bc_svc.c: the Makefile takes care of those dependencies. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-15 09:12:23 -04:00
Trond Myklebust	1f9453578f	NFS: Clean up - simplify the switch to read/write-through-MDS Use nfs_pageio_reset_read_mds and nfs_pageio_reset_write_mds instead of completely reinitialising the struct nfs_pageio_descriptor. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-15 09:12:22 -04:00
Trond Myklebust	dce81290ee	NFS: Move the pnfs write code into pnfs.c ...and ensure that we recoalesce to take into account differences in block sizes when falling back to write through the MDS. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-15 09:12:22 -04:00
Trond Myklebust	493292ddc7	NFS: Move the pnfs read code into pnfs.c ...and ensure that we recoalesce to take into account differences in block sizes when falling back to read through the MDS. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-15 09:12:21 -04:00
Trond Myklebust	d9156f9f36	NFS: Allow the nfs_pageio_descriptor to signal that a re-coalesce is needed If an attempt to do pNFS fails, and we have to fall back to writing through the MDS, then we may want to re-coalesce the requests that we already have since the block size for the MDS read/writes may be different to that of the DS read/writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-15 09:12:21 -04:00
Trond Myklebust	d097971d8a	NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write request Instead of looking up the rsize and wsize, the routines that generate the RPC requests should really be using the pg_bsize, since that is what we use when deciding whether or not to coalesce write requests... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-15 09:12:20 -04:00
Trond Myklebust	50828d7e67	NFS: Cache rpc_ops in struct nfs_pageio_descriptor Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-15 09:12:20 -04:00
Trond Myklebust	275acaafd4	NFS: Clean up: split out the RPC transmission from nfs_pagein_multi/one ...and do the same for nfs_flush_multi/one. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-15 09:12:17 -04:00
Peng Tao	3b6091846d	NFS: fix return value of nfs_pagein_one/nfs_flush_one Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-15 09:11:28 -04:00
Eric Sandeen	46fcb2ed29	GFS2: combine duplicated block freeing routines __gfs2_free_data and __gfs2_free_meta are almost identical, and can be trivially combined. [This is as per Eric's original patch minus gfs2_free_data() which had no callers left and plus the conversion of the bmap.c calls to these functions. All in all, a nice clean up] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-07-15 09:32:52 +01:00
Steven Whitehouse	9964afbb79	GFS2: Add S_NOSEC support This adds S_NOSEC support to GFS2. We set/reset the flag either when a user calls setattr or when we have just regained the glock from another node. The flag is only set if there are no xattrs on the inode and there is no suid bit set. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@ZenIV.linux.org.uk>	2011-07-15 09:32:35 +01:00
Bob Peterson	7cf8dcd3b6	GFS2: Automatically adjust glock min hold time This patch is a performance improvement for GFS2 in a clustered environment. It makes the glock hold time self-adjusting. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-07-15 09:32:11 +01:00
Steven Whitehouse	17d539f049	GFS2: Cache dir hash table in a contiguous buffer This patch adds a cache for the hash table to the directory code in order to help simplify the way in which the hash table is accessed. This is intended to be a first step towards introducing some performance improvements in the directory code. There are two follow ups that I'm hoping to see fairly shortly. One is to simplify the hash table reading code now that we always read the complete hash table, whether we want one entry or all of them. The other is to introduce readahead on the heads of the hash chains which are referred to from the table. The hash table is a maximum of 128k in size, so it is not worth trying to read it in small chunks. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-07-15 09:31:48 +01:00
Al Viro	1836750115	fix loop checks in d_materialise_unique() Both __d_unalias() and __d_materialise_dentry() need loop prevention. Grab rename_lock in caller, check for loops there... As a side benefit, we have dentry_lock_for_move() called only under rename_lock, which seriously reduces deadlock potential of the execrable "locking order" used for ->d_lock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-14 21:33:41 -04:00
Mark Fasheh	17e9f796bd	btrfs: Don't BUG_ON alloc_path errors in btrfs_balance() Dealing with this seems trivial - the only caller of btrfs_balance() is btrfs_ioctl() which passes the error code directly back to userspace. There also isn't much state to unwind (if I'm wrong about this point, we can always safely move the allocation to the top of btrfs_balance() anyway). Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2011-07-14 14:14:45 -07:00
Mark Fasheh	1748f843a0	btrfs: Don't BUG_ON alloc_path errors in btrfs_read_locked_inode btrfs_iget() also needed an update so that errors from btrfs_locked_inode() are caught and bubbled back up. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2011-07-14 14:14:45 -07:00
Mark Fasheh	0eb0e19cde	btrfs: Don't BUG_ON alloc_path errors in btrfs_truncate_inode_items I moved the path allocation up a few lines to the top of the function so that we couldn't get into the state where we've dropped delayed items and the extent cache but fail due to -ENOMEM. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2011-07-14 14:14:45 -07:00
Mark Fasheh	1e5063d093	btrfs: Don't BUG_ON alloc_path errors in replay_one_buffer() The two ->process_func call sites in tree-log.c which were ignoring a return code have also been updated to gracefully exit as well. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2011-07-14 14:14:44 -07:00
Mark Fasheh	d8926bb3ba	btrfs: don't BUG_ON btrfs_alloc_path() errors This patch fixes many callers of btrfs_alloc_path() which BUG_ON allocation failure. All the sites that are fixed in this patch were checked by me to be fairly trivial to fix because of at least one of two criteria: - Callers of the function catch errors from it already so bubbling the error up will be handled. - Callers of the function might BUG_ON any nonzero return code in which case there is no behavior changed (but we still got to remove a BUG_ON) The following functions were updated: btrfs_lookup_extent, alloc_reserved_tree_block, btrfs_remove_block_group, btrfs_lookup_csums_range, btrfs_csum_file_blocks, btrfs_mark_extent_written, btrfs_inode_by_name, btrfs_new_inode, btrfs_symlink, insert_reserved_file_extent, and run_delalloc_nocow Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2011-07-14 14:14:44 -07:00
David Teigland	883ba74f43	dlm: remove deadlock debug print gfs2 recently began using this feature heavily, creating more debug output than we want to see. Signed-off-by: David Teigland <teigland@redhat.com>	2011-07-14 12:31:49 -05:00
Linus Torvalds	5dcd07b9f3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Resolve inode eviction and ail list interaction bug GFS2: Fix race during filesystem mount GFS2: force a log flush when invalidating the rindex glock	2011-07-14 10:20:42 -07:00

23911 Commits