OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Benny Halevy	0ac68d1799	knfsd: nfsd4: fix enc_stateid_sz for nfsd callbacks enc_stateid_sz should be given in u32 words units, not bytes, so we were overestimating the buffer space needed here. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
J. Bruce Fields	f7fede4b27	knfsd: nfsd4: silence a compiler warning in ACL code Silence a compiler warning in the ACL code, and add a comment making clear the initialization serves no other purpose. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
Marc Eshel	9a8db97e77	knfsd: lockd: nfsd4: use same grace period for lockd and nfsd4 Both lockd and (in the nfsv4 case) nfsd enforce a "grace period" after reboot, during which clients may reclaim locks from the previous server instance, but may not acquire new locks. Currently the lockd and nfsd enforce grace periods of different lengths. This may cause problems when we reboot a server with both v2/v3 and v4 clients. For example, if the lockd grace period is shorter (as is likely the case), then a v3 client might acquire a new lock that conflicts with a lock already held (but not yet reclaimed) by a v4 client. This patch calculates a lease time that lockd and nfsd can both use. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
Andrew Morton	12127498c8	nfsd warning fix gcc-4.3: fs/nfsd/nfsctl.c: In function 'write_getfs': fs/nfsd/nfsctl.c:248: warning: cast from pointer to integer of different size Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:06 -07:00
Christoph Hellwig	019ab801cf	knfsd: exportfs: split out reconnecting a dentry from find_exported_dentry There's a clear subfunctionality of reconnecting a given dentry to the main dentry tree in find_exported_dentry, that can be called both for the dentry we're looking for or it's parent directory. This patch splits the subfunctionality out into a separate helper to make the code more readable and document it's intent. As a nice side-optimization we can avoid getting a superfluous dentry reference count in the case we need to reconnect a directory on it's own. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:06 -07:00
Christoph Hellwig	dd90b50906	knfsd: exportfs: add find_disconnected_root helper Break the loop that finds the root of a disconnected subtree into a helper of its own to make reading easier and document the intent. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:06 -07:00
Christoph Hellwig	fb66a1989c	knfsd: exportfs: move acceptable check into find_acceptable_alias All callers of find_acceptable_alias check if the current dentry is acceptable before looking for other acceptable aliases using find_acceptable_alias. Move the check into find_acceptable_alias to make the code a little more dense and add a comment to find_acceptable_alias that documents its intent. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:06 -07:00
Christoph Hellwig	d7dd618a59	knfsd: exportfs: untangle ISDIR logic in find_exported_dentry Rework some logic in find_exported_dentry so that we only have a single S_ISDIR check and logic that makes clear to the reader what we're really doing here. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:06 -07:00
Christoph Hellwig	10f11c341d	knfsd: exportfs: remove CALL macro Currently exportfs uses a way to call methods very differently from the rest of the kernel. This patch changes it to the standard conventions for method calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:06 -07:00
Christoph Hellwig	d37065cd6d	knfsd: exportfs: add procedural interface for NFSD Currently NFSD calls directly into filesystems through the export_operations structure. I plan to change this interface in various ways in later patches, and want to avoid the export of the default operations to NFSD, so this patch adds two simple exportfs_encode_fh/exportfs_decode_fh helpers for NFSD to call instead of poking into exportfs guts. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:06 -07:00
Christoph Hellwig	5ca2960733	knfsd: exportfs: remove iget abuse When the exportfs interface was added the expectation was that filesystems provide an operation to convert from a file handle to an inode/dentry, but it kept a backwards compat option that still calls into iget. Calling into iget from non-filesystem code is very bad, because it gives too little information to filesystem, and simply crashes if the filesystem doesn't implement the ->read_inode routine. Fortunately there are only two filesystems left using this fallback: efs and jfs. This patch moves a copy of export_iget to each of those to implement the get_dentry method. While this is a temporary increase of lines of code in the kernel it allows for a much cleaner interface and important code restructuring in later patches. [akpm@linux-foundation.org: add jfs_get_inode_flags() declaration] Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:06 -07:00
Christoph Hellwig	a569425512	knfsd: exportfs: add exportfs.h header currently the export_operation structure and helpers related to it are in fs.h. fs.h is already far too large and there are very few places needing the export bits, so split them off into a separate header. [akpm@linux-foundation.org: fix cifs build] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:06 -07:00
Tejun Heo	9281acea6a	kallsyms: make KSYM_NAME_LEN include space for trailing '\0' KSYM_NAME_LEN is peculiar in that it does not include the space for the trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating buffer. This is nonsense and error-prone. Moreover, when the caller forgets that it's very likely to subtly bite back by corrupting the stack because the last position of the buffer is always cleared to zero. This patch increments KSYM_NAME_LEN by one and updates code accordingly. * off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro is fixed. * Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together, MODULE_NAME_LEN was treated as if it didn't include space for the trailing '\0'. Fix it. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Paulo Marques <pmarques@grupopie.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:03 -07:00
Rafael J. Wysocki	8314418629	Freezer: make kernel threads nonfreezable by default Currently, the freezer treats all tasks as freezable, except for the kernel threads that explicitly set the PF_NOFREEZE flag for themselves. This approach is problematic, since it requires every kernel thread to either set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't care for the freezing of tasks at all. It seems better to only require the kernel threads that want to or need to be frozen to use some freezer-related code and to remove any freezer-related code from the other (nonfreezable) kernel threads, which is done in this patch. The patch causes all kernel threads to be nonfreezable by default (ie. to have PF_NOFREEZE set by default) and introduces the set_freezable() function that should be called by the freezable kernel threads in order to unset PF_NOFREEZE. It also makes all of the currently freezable kernel threads call set_freezable(), so it shouldn't cause any (intentional) change of behaviour to appear. Additionally, it updates documentation to describe the freezing of tasks more accurately. [akpm@linux-foundation.org: build fixes] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:02 -07:00
Nick Piggin	787d2214c1	fs: introduce some page/buffer invariants It is a bug to set a page dirty if it is not uptodate unless it has buffers. If the page has buffers, then the page may be dirty (some buffers dirty) but not uptodate (some buffers not uptodate). The exception to this rule is if the set_page_dirty caller is racing with truncate or invalidate. A buffer can not be set dirty if it is not uptodate. If either of these situations occurs, it indicates there could be some data loss problem. Some of these warnings could be a harmless one where the page or buffer is set uptodate immediately after it is dirtied, however we should fix those up, and enforce this ordering. Bring the order of operations for truncate into line with those of invalidate. This will prevent a page from being able to go !uptodate while we're holding the tree_lock, which is probably a good thing anyway. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:02 -07:00
Rusty Russell	8e1f936b73	mm: clean up and kernelify shrinker registration I can never remember what the function to register to receive VM pressure is called. I have to trace down from __alloc_pages() to find it. It's called "set_shrinker()", and it needs Your Help. 1) Don't hide struct shrinker. It contains no magic. 2) Don't allocate "struct shrinker". It's not helpful. 3) Call them "register_shrinker" and "unregister_shrinker". 4) Call the function "shrink" not "shrinker". 5) Reduce the 17 lines of waffly comments to 13, but document it properly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: David Chinner <dgc@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:00 -07:00
Andy Whitcroft	5ad333eb66	Lumpy Reclaim V4 When we are out of memory of a suitable size we enter reclaim. The current reclaim algorithm targets pages in LRU order, which is great for fairness at order-0 but highly unsuitable if you desire pages at higher orders. To get pages of higher order we must shoot down a very high proportion of memory; >95% in a lot of cases. This patch set adds a lumpy reclaim algorithm to the allocator. It targets groups of pages at the specified order anchored at the end of the active and inactive lists. This encourages groups of pages at the requested orders to move from active to inactive, and active to free lists. This behaviour is only triggered out of direct reclaim when higher order pages have been requested. This patch set is particularly effective when utilised with an anti-fragmentation scheme which groups pages of similar reclaimability together. This patch set is based on Peter Zijlstra's lumpy reclaim V2 patch which forms the foundation. Credit to Mel Gorman for sanitity checking. Mel said: The patches have an application with hugepage pool resizing. When lumpy-reclaim is used used with ZONE_MOVABLE, the hugepages pool can be resized with greater reliability. Testing on a desktop machine with 2GB of RAM showed that growing the hugepage pool with ZONE_MOVABLE on it's own was very slow as the success rate was quite low. Without lumpy-reclaim, each attempt to grow the pool by 100 pages would yield 1 or 2 hugepages. With lumpy-reclaim, getting 40 to 70 hugepages on each attempt was typical. [akpm@osdl.org: ia64 pfn_to_nid fixes and loop cleanup] [bunk@stusta.de: static declarations for internal functions] [a.p.zijlstra@chello.nl: initial lumpy V2 implementation] Signed-off-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:22:59 -07:00
Mel Gorman	769848c038	Add __GFP_MOVABLE for callers to flag allocations from high memory that may be migrated It is often known at allocation time whether a page may be migrated or not. This patch adds a flag called __GFP_MOVABLE and a new mask called GFP_HIGH_MOVABLE. Allocations using the __GFP_MOVABLE can be either migrated using the page migration mechanism or reclaimed by syncing with backing storage and discarding. An API function very similar to alloc_zeroed_user_highpage() is added for __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable(). The flags used by alloc_zeroed_user_highpage() are not changed because it would change the semantics of an existing API. After this patch is applied there are no in-kernel users of alloc_zeroed_user_highpage() so it probably should be marked deprecated if this patch is merged. Note that this patch includes a minor cleanup to the use of __GFP_ZERO in shmem.c to keep all flag modifications to inode->mapping in the shmem_dir_alloc() helper function. This clean-up suggestion is courtesy of Hugh Dickens. Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the concept. Credit to Hugh Dickens for catching issues with shmem swap vector and ramfs allocations. [akpm@linux-foundation.org: build fix] [hugh@veritas.com: __GFP_ZERO cleanup] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:22:59 -07:00
Eric Van Hensbergen	10fa16e75c	9p: fix debug compilation error s/9p/v9fs.c: In function 'v9fs_parse_options': fs/9p/v9fs.c:134: error: 'p9_debug_level' undeclared (first use in this function) Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Eric Van Hensbergen <ericvh@gmail.com>	2007-07-16 16:03:25 -05:00
Satyam Sharma	5b37696fda	utime(s): Honour CAP_FOWNER when times==NULL do_utimes() does not honour CAP_FOWNER when times==NULL. Trivial and obvious one-line fix. Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 12:14:08 -07:00
Anton Altaparmakov	959bc220df	Fix LDM for new field in the VOL5 VBLK. Teach LDM about a new field encountered with Windows Vista. This fixes LDM for people using Vista who have disabled drive letter assignment from one or more volumes. Doing this introduces a so far unknown field in the LDM database in the VOL5 VBLK structure which causes the LDM driver to fail to parse the VBLK structure and hence LDM fails to parse the disk altogether. This patch teaches the driver about this field. Thanks got to Ashton Mills <amills@iinet.com.au> for reporting the problem and working with me on getting it fixed. It is now working for him. Signed-off-by: Anton Altaparmakov <aia21@cantab.net> CC: Richard Russon <ldm@flatcap.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 12:01:30 -07:00
Linus Torvalds	10b275ddfd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: [PATCH] sched: fix up fs/proc/array.c whitespace problems [PATCH] sched: prettify prio_to_wmult[] [PATCH] sched: document prio_to_wmult[] [PATCH] sched: improve weight-array comments [PATCH] sched: remove dead code from task_stime() Fixed up trivial conflict in fs/proc/array.c	2007-07-16 11:02:49 -07:00
Linus Torvalds	add096909d	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (32 commits) [PATCH] ocfs2: zero_user_page conversion ocfs2: Support xfs style space reservation ioctls ocfs2: support for removing file regions ocfs2: update truncate handling of partial clusters ocfs2: btree support for removal of arbirtrary extents ocfs2: Support creation of unwritten extents ocfs2: support writing of unwritten extents ocfs2: small cleanup of ocfs2_write_begin_nolock() ocfs2: btree changes for unwritten extents ocfs2: abstract btree growing calls ocfs2: use all extent block suballocators ocfs2: plug truncate into cached dealloc routines ocfs2: simplify deallocation locking ocfs2: harden buffer check during mapping of page blocks ocfs2: shared writeable mmap ocfs2: factor out write aops into nolock variants ocfs2: rework ocfs2_buffered_write_cluster() ocfs2: take ip_alloc_sem during entire truncate ocfs2: Add "preferred slot" mount option [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c ...	2007-07-16 10:52:55 -07:00
Linus Torvalds	14dc524972	Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block * 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block: splice: direct splicing updates ppos twice more ACSI removal umem: Fix match of pci_ids in umem driver umem: Remove references to dead CONFIG_MM_MAP_MEMORY variable remove the documentation for the legacy CDROM drivers	2007-07-16 10:48:20 -07:00
Steve French	7e42ca886b	[CIFS] Typo in previous patch Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-16 17:40:02 +00:00
Linus Torvalds	b91cba52e9	Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (68 commits) sh: sh-rtc support for SH7709. sh: Revert __xdiv64_32 size change. sh: Update r7785rp defconfig. sh: Export div symbols for GCC 4.2 and ST GCC. sh: fix race in parallel out-of-tree build sh: Kill off dead mach.c for hp6xx. sh: hd64461.h cleanup and added comments. sh: Update the alignment when 4K stacks are used. sh: Add a .bss.page_aligned section for 4K stacks. sh: Don't let SH-4A clobber SH-4 CFLAGS. sh: Add parport stub for SuperIO ports. sh: Drop -Wa,-dsp for DSP tuning. sh: Update dreamcast defconfig. fb: pvr2fb: A few more __devinit annotations for PCI. fb: pvr2fb: Fix up section mismatch warnings. sh: Select IPR-IRQ for SH7091. sh: Correct __xdiv64_32/div64_32 return value size. sh: Fix timer-tmu build for SH-3. sh: Add cpu and mach links to CLEAN_FILES. sh: Preliminary support for the SH-X3 CPU. ...	2007-07-16 10:32:02 -07:00
OGAWA Hirofumi	98283bb49c	fat: Fix the race of read/write the FAT12 entry FAT12 entry is 12bits, so it needs 2 phase to update the value. And writer and reader access it without any lock, so reader can get the half updated value. This fixes the long standing race condition by adding a global spinlock to only FAT12 for avoiding any impact against FAT16/32. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 10:31:01 -07:00
Eric	6fa20d4fb5	[CIFS] zero_user_page() conversions Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-16 16:23:19 +00:00
Geert Uytterhoeven	98701dc19e	compat32: ignore the LOOP_CLR_FD ioctl compat32: Ignore the LOOP_CLR_FD ioctl for the loop block device, to kill an annoying kernel message when e.g. busybox umount is used. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:52 -07:00
Robert P. J. Day	29e3f34777	NLS: Remove obsolete Makefile entries Since the corresponding source files no longer exist, remove the irrelevant Makefile entries for them. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:52 -07:00
Badari Pulavarty	5e70030d4c	ext4: statfs speed up This is a patch that speeds up statfs. It is very simple - the "overhead" calculation, which takes a huge amount of time for large filesystems, never changes unless the size of the filesystem itself changes. That means we can store it in memory and only recalculate if the filesystem has been resized (almost never). It also fixes a minor problem that we never update the on-disk superblock free blocks/inodes counts until the filesystem is unmounted. While not fatal, we may as well update that on disk when we have the information, and it makes things like debugfs and dumpe2fs report a bit more accurate info. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:52 -07:00
Badari Pulavarty	a71ce8c6c9	ext3: statfs speed up This is a patch that speeds up statfs. It is very simple - the "overhead" calculation, which takes a huge amount of time for large filesystems, never changes unless the size of the filesystem itself changes. That means we can store it in memory and only recalculate if the filesystem has been resized (almost never). It also fixes a minor problem that we never update the on-disk superblock free blocks/inodes counts until the filesystem is unmounted. While not fatal, we may as well update that on disk when we have the information, and it makes things like debugfs and dumpe2fs report a bit more accurate info. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:52 -07:00
Badari Pulavarty	2235219b77	ext2: statfs speed up This is a patch that speeds up statfs. It is very simple - the "overhead" calculation, which takes a huge amount of time for large filesystems, never changes unless the size of the filesystem itself changes. That means we can store it in memory and only recalculate if the filesystem has been resized (almost never). It also fixes a minor problem that we never update the on-disk superblock free blocks/inodes counts until the filesystem is unmounted. While not fatal, we may as well update that on disk when we have the information, and it makes things like debugfs and dumpe2fs report a bit more accurate info. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:52 -07:00
J. Bruce Fields	4b4e5a1411	Fix trivial typos in anon_inodes.c comments Trivial typo and grammar fixes. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:52 -07:00
Borislav Petkov	6c675bd43c	ext4: fix error handling in ext4_create_journal Fix error handling in ext4_create_journal according to kernel conventions. Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:51 -07:00
Borislav Petkov	952d9de116	ext3: fix error handling in ext3_create_journal() Fix error handling in ext3_create_journal according to kernel conventions. Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:51 -07:00
Cyrill Gorcunov	d375b97037	UDF: fix function name from udf_crc16 to udf_crc We have to change udf_crc16() name to udf_crc() to be able to play with CRC test. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:51 -07:00
vignesh babu	9e8c4273ef	is_power_of_2: ufs/super.c Replace (n & (n-1)) with is_power_of_2 Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Acked-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:50 -07:00
Andrea Arcangeli	1d9d02feee	move seccomp from /proc to a prctl This reduces the memory footprint and it enforces that only the current task can enable seccomp on itself (this is a requirement for a strightforward [modulo preempt ;) ] TIF_NOTSC implementation). Signed-off-by: Andrea Arcangeli <andrea@cpushare.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:50 -07:00
Andrew Morton	4210df283c	bd_claim_by_disk: fix warning Fix this: fs/block_dev.c: In function 'bd_claim_by_disk': fs/block_dev.c:970: warning: 'found' may be used uninitialized in this function and given that free_bd_holder() now needs free(NULL)-is-legal behaviour, we can simplify bd_release_from_kobject(). Cc: Bjorn Steinbrink <B.Steinbrink@gmx.de> Cc: Johannes Weiner <hannes-kernel@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:50 -07:00
Johannes Weiner	4e91672c76	Replace obscure constructs in fs/block_dev.c Replace some funky codepaths in fs/block_dev.c with cleaner versions of the affected places. [akpm@linux-foundation.org: fix return value] Signed-off-by: Johannes Weiner <hannes-kernel@saeurebad.de> Cc: Bjorn Steinbrink <B.Steinbrink@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:50 -07:00
Adrian Bunk	948730b0e3	fs/namespace.c should #include "internal.h" Every file should include the headers containing the prototypes for its global functions. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:50 -07:00
Duane Griffin	d45bce8faf	HFS+: add custom dentry hash and comparison operations Add custom dentry hash and comparison operations for HFS+ filesystems that are case-insensitive and/or do automatic unicode decomposition. The new operations reuse the existing HFS+ ASCII to unicode conversion, unicode decomposition and case folding functionality. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:49 -07:00
Duane Griffin	1e96b7ca1e	HFS+: refactor ASCII to unicode conversion routine for later reuse The HFS+ filesystem is case-insensitive and does automatic unicode decomposition by default, but does not provide custom dentry operations. This can lead to multiple dentries being cached for lookups on a filename with varying case and/or character (de)composition. These patches add custom dentry hash and comparison operations for case-sensitive and/or automatically decomposing HFS+ filesystems. Unicode decomposition and case-folding are performed as required to ensure equivalent filenames are hashed to the same values and compare as equal. This patch: Refactor existing HFS+ ASCII to unicode string conversion routine to split out character conversion functionality. This will be reused by the custom dentry hash and comparison routines. This approach avoids unnecessary memory allocation compared to using the string conversion routine directly in the new functions. [akpm@linux-foundation.org: avoid use-of-uninitialised] Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:49 -07:00
Toshiyuki Okajima	29bc5b4f73	mistaken ext4_inode_bitmap for ext4_block_bitmap In ext4_new_blocks(), one of two ext4_block_bitmap() calls should be ext4_inode_bitmap() call. It is not harmful in normal processing, but it should be fixed. Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:49 -07:00
vignesh babu	f482394ccb	is_power_of_2(): jbd Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2(). Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:48 -07:00
vignesh babu	3fc74269c8	is_power_of_2: ext3/super.c Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2() Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:48 -07:00
Christoph Hellwig	681dcd9543	drop obsolete sys_ioctl export sys_ioctl() was only exported for our first version of compat ioctl handling. Now that the whole compat ioctl handling mess is more or less sorted out there are no more modular users left and we can kill it. There's one exception and that's sparc64's solaris compat module, but sparc64 has it's own export predating the generic one by years for that which this patch leaves untouched. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:48 -07:00
Eric W. Biederman	213dd266d4	namespace: ensure clone_flags are always stored in an unsigned long While working on unshare support for the network namespace I noticed we were putting clone flags in an int. Which is weird because the syscall uses unsigned long and we at least need an unsigned to properly hold all of the unshare flags. So to make the code consistent, this patch updates the code to use unsigned long instead of int for the clone flags in those places where we get it wrong today. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:48 -07:00
Dave Hansen	e3a68e30d2	ext3: remove extra IS_RDONLY() check ext3_change_inode_journal_flag() is only called from one location: ext3_ioctl(EXT3_IOC_SETFLAGS). That ioctl case already has a IS_RDONLY() call in it so this one is superfluous. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:48 -07:00
Vasily Tarasov	b716395e2b	diskquota: 32bit quota tools on 64bit architectures OpenVZ Linux kernel team has discovered the problem with 32bit quota tools working on 64bit architectures. In 2.6.10 kernel sys32_quotactl() function was replaced by sys_quotactl() with the comment "sys_quotactl seems to be 32/64bit clean, enable it for 32bit" However this isn't right. Look at if_dqblk structure: struct if_dqblk { __u64 dqb_bhardlimit; __u64 dqb_bsoftlimit; __u64 dqb_curspace; __u64 dqb_ihardlimit; __u64 dqb_isoftlimit; __u64 dqb_curinodes; __u64 dqb_btime; __u64 dqb_itime; __u32 dqb_valid; }; For 32 bit quota tools sizeof(if_dqblk) == 0x44. But for 64 bit kernel its size is 0x48, 'cause of alignment! Thus we got a problem. Attached patch reintroduce sys32_quotactl() function, that handles this and related situations. [michal.k.k.piotrowski@gmail.com: build fix] [akpm@linux-foundation.org: Make it link with CONFIG_QUOTA=n] Signed-off-by: Vasily Tarasov <vtaras@openvz.org> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Jan Kara <jack@ucw.cz> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:48 -07:00
Jan Kara	32c3773011	ext4: fix deadlock in ext4_remount() and orphan list handling ext4_orphan_add() and ext4_orphan_del() functions lock sb->s_lock with a transaction started with ext4_mark_recovery_complete() waits for a transaction holding sb->s_lock, thus leading to a possible deadlock. At the moment we call ext4_mark_recovery_complete() from ext4_remount() we have done all the work needed for remounting and thus we are safe to drop sb->s_lock before we wait for transactions to commit. Note that at this moment we are still guarded by s_umount lock against other remounts/umounts. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Sandeen <sandeen@sandeen.net> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:48 -07:00
Jan Kara	030703e49d	ext3: fix deadlock in ext3_remount() and orphan list handling ext3_orphan_add() and ext3_orphan_del() functions lock sb->s_lock with a transaction started with ext3_mark_recovery_complete() waits for a transaction holding sb->s_lock, thus leading to a possible deadlock. At the moment we call ext3_mark_recovery_complete() from ext3_remount() we have done all the work needed for remounting and thus we are safe to drop sb->s_lock before we wait for transactions to commit. Note that at this moment we are still guarded by s_umount lock against other remounts/umounts. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Sandeen <sandeen@sandeen.net> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:47 -07:00
Cedric Le Goater	467e9f4b50	fix create_new_namespaces() return value dup_mnt_ns() and clone_uts_ns() return NULL on failure. This is wrong, create_new_namespaces() uses ERR_PTR() to catch an error. This means that the subsequent create_new_namespaces() will hit BUG_ON() in copy_mnt_ns() or copy_utsname(). Modify create_new_namespaces() to also use the errors returned by the copy_*_ns routines and not to systematically return ENOMEM. [oleg@tv-sign.ru: better changelog] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:47 -07:00
Andrew Morton	4d3b573ad9	binfmt_elf warning fix fs/binfmt_elf.c: In function 'load_elf_binary': fs/binfmt_elf.c:1002: warning: 'interp_map_addr' may be used uninitialized in this function The compiler (gcc-4.1.0) is correct, but it failed to notice that we didn't use the resulting value. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:47 -07:00
Andrew Morton	64d67d2177	revert "vanishing ioctl handler debugging" Revert my do_ioctl() debugging patch: Paul fixed the bug. Cc: Paul Fulghum <paulkf@microgate.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:47 -07:00
Lee Schermerhorn	b4c07bce79	hugetlbfs: handle empty options string I was seeing a null pointer deref in fs/super.c:vfs_kern_mount(). Some file system get_sb() handler was returning NULL mnt_sb with a non-negative return value. I also noticed a "hugetlbfs: Bad mount option:" message in the log. Turns out that hugetlbfs_parse_options() was not checking for an empty option string after call to strsep(). On failure, hugetlbfs_parse_options() returns 1. hugetlbfs_fill_super() just passed this return code back up the call stack where vfs_kern_mount() missed the error and proceeded with a NULL mnt_sb. Apparently introduced by patch: hugetlbfs-use-lib-parser-fix-docs.patch The problem was exposed by this line in my fstab: none /huge hugetlbfs defaults 0 0 It can also be demonstrated by invoking mount of hugetlbfs directly with no options or a bogus option. This patch: 1) adds the check for empty option to hugetlbfs_parse_options(), 2) enhances the error message to bracket any unrecognized option with quotes , 3) modifies hugetlbfs_parse_options() to return -EINVAL on any unrecognized option, 4) adds a BUG_ON() to vfs_kern_mount() to catch any get_sb() handler that returns a NULL mnt->mnt_sb with a return value >= 0. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:46 -07:00
Randy Dunlap	e73a75fa7f	hugetlbfs: use lib/parser, fix docs Use lib/parser.c to parse hugetlbfs mount options. Correct docs in hugetlbpage.txt. old size of hugetlbfs_fill_super: 675 bytes new size of hugetlbfs_fill_super: 686 bytes (hugetlbfs_parse_options() is inlined) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Adam Litke <agl@us.ibm.com> Acked-by: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:46 -07:00
Wyatt Banks	a5001a2780	HFSPlus: change kmalloc/memset to kzalloc Removed kmalloc and memset in favor of kzalloc. To explain the HFSPLUS_SB() macro in the removed memset call: hfsplus_fs.h:#define HFSPLUS_SB(super) ((struct hfsplus_sb_info )(super)->s_fs_info) Signed-off-by: Wyatt Banks <wyatt@banksresearch.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:46 -07:00
Maxim Uvarov	b663a79c19	taskstats: add context-switch counters Make available to the user the following task and process performance statistics: * Involuntary Context Switches (task_struct->nivcsw) * Voluntary Context Switches (task_struct->nvcsw) Statistics information is available from: 1. taskstats interface (Documentation/accounting/) 2. /proc/PID/status (task only). This data is useful for detecting hyperactivity patterns between processes. [akpm@linux-foundation.org: cleanup] Signed-off-by: Maxim Uvarov <muvarov@ru.mvista.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Jonathan Lim <jlim@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:46 -07:00
Vasily Averin	a6c15c2b0f	ext3/ext4: orphan list corruption due bad inode After ext3 orphan list check has been added into ext3_destroy_inode() (please see my previous patch) the following situation has been detected: EXT3-fs warning (device sda6): ext3_unlink: Deleting nonexistent file (37901290), 0 Inode 00000101a15b7840: orphan list check failed! 00000773 6f665f00 74616d72 00000573 65725f00 06737270 66000000 616d726f ... Call Trace: [<ffffffff80211ea9>] ext3_destroy_inode+0x79/0x90 [<ffffffff801a2b16>] sys_unlink+0x126/0x1a0 [<ffffffff80111479>] error_exit+0x0/0x81 [<ffffffff80110aba>] system_call+0x7e/0x83 First messages said that unlinked inode has i_nlink=0, then ext3_unlink() adds this inode into orphan list. Second message means that this inode has not been removed from orphan list. Inode dump has showed that i_fop = &bad_file_ops and it can be set in make_bad_inode() only. Then I've found that ext3_read_inode() can call make_bad_inode() without any error/warning messages, for example in the following case: ... if (inode->i_nlink == 0) { if (inode->i_mode == 0 \|\| !(EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ORPHAN_FS)) { /* this inode is deleted */ brelse (bh); goto bad_inode; ... Bad inode can live some time, ext3_unlink can add it to orphan list, but ext3_delete_inode() do not deleted this inode from orphan list. As result we can have orphan list corruption detected in ext3_destroy_inode(). However it is not clear for me how to fix this issue correctly. As far as i see is_bad_inode() is called after iget() in all places excluding ext3_lookup() and ext3_get_parent(). I believe it makes sense to add bad inode check to these functions too and call iput if bad inode detected. Signed-off-by: Vasily Averin <vvs@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:46 -07:00
Vasily Averin	9f7dd93de0	ext3/ext4: orphan list check on destroy_inode Customers claims to ext3-related errors, investigation showed that ext3 orphan list has been corrupted and have the reference to non-ext3 inode. The following debug helps to understand the reasons of this issue. [akpm@linux-foundation.org: update for print_hex_dump() changes] Signed-off-by: Vasily Averin <vvs@sw.ru> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:46 -07:00
Alexey Dobriyan	aa0ac36518	Remove capability.h from mm.h I forgot to remove capability.h from mm.h while removing sched.h! This patch remedies that, because the only inline function which was using CAP_something was made out of line. Cross-compile tested without regressions on: all powerpc defconfigs all mips defconfigs all m68k defconfigs all arm defconfigs all ia64 defconfigs alpha alpha-allnoconfig alpha-defconfig alpha-up arm i386 i386-allnoconfig i386-defconfig i386-up ia64 ia64-allnoconfig ia64-defconfig ia64-up m68k mips parisc parisc-allnoconfig parisc-defconfig parisc-up powerpc powerpc-up s390 s390-allnoconfig s390-defconfig s390-up sparc sparc-allnoconfig sparc-defconfig sparc-up sparc64 sparc64-allnoconfig sparc64-defconfig sparc64-up um-x86_64 x86_64 x86_64-allnoconfig x86_64-defconfig x86_64-up as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:45 -07:00
Alexey Dobriyan	cb510b8172	seq_file: more atomicity in traverse() Original problem: in some circumstances seq_file interface can present infinite proc file to the following script when normally said proc file is finite: while read line; do [do something with $line] done </proc/$FILE bash, to implement such loop does essentially read(0, buf, 128); [find \n] lseek(0, -difference, SEEK_CUR); Consider, proc file prints list of objects each of them consists of many lines, each line is shorter than 128 bytes. Two objects in list, with ->index'es being 0 and 1. Current one is 1, as bash prints second object line by line. Imagine first object being removed right before lseek(). traverse() will be called, because there is negative offset. traverse() will reset ->index to 0 (!). traverse() will call ->next() and get NULL in any usual iterate-over-list code using list_for_each_entry_continue() and such. There is one object in list now after all... traverse() will return 0, lseek() will update file position and pretend everything is OK. So, what we have now: ->f_pos points to place where second object will be printed, but ->index is 0. seq_read() instead of returning EOF, will start printing first line of first object every time it's called, until enough objects are added to ->f_pos return in bounds. Fix is to update ->index only after we're sure we saw enough objects down the road. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:45 -07:00
Ulrich Drepper	4a19542e5f	O_CLOEXEC for SCM_RIGHTS Part two in the O_CLOEXEC saga: adding support for file descriptors received through Unix domain sockets. The patch is once again pretty minimal, it introduces a new flag for recvmsg and passes it just like the existing MSG_CMSG_COMPAT flag. I think this bit is not used otherwise but the networking people will know better. This new flag is not recognized by recvfrom and recv. These functions cannot be used for that purpose and the asymmetry this introduces is not worse than the already existing MSG_CMSG_COMPAT situations. The patch must be applied on the patch which introduced O_CLOEXEC. It has to remove static from the new get_unused_fd_flags function but since scm.c cannot live in a module the function still hasn't to be exported. Here's a test program to make sure the code works. It's so much longer than the actual patch... #include <errno.h> #include <error.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #include <unistd.h> #include <sys/socket.h> #include <sys/un.h> #ifndef O_CLOEXEC # define O_CLOEXEC 02000000 #endif #ifndef MSG_CMSG_CLOEXEC # define MSG_CMSG_CLOEXEC 0x40000000 #endif int main (int argc, char argv[]) { if (argc > 1) { int fd = atol (argv[1]); printf ("child: fd = %d\n", fd); if (fcntl (fd, F_GETFD) == 0 \|\| errno != EBADF) { puts ("file descriptor valid in child"); return 1; } return 0; } struct sockaddr_un sun; strcpy (sun.sun_path, "./testsocket"); sun.sun_family = AF_UNIX; char databuf[] = "hello"; struct iovec iov[1]; iov[0].iov_base = databuf; iov[0].iov_len = sizeof (databuf); union { struct cmsghdr hdr; char bytes[CMSG_SPACE (sizeof (int))]; } buf; struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1, .msg_control = buf.bytes, .msg_controllen = sizeof (buf) }; struct cmsghdr cmsg = CMSG_FIRSTHDR (&msg); cmsg->cmsg_level = SOL_SOCKET; cmsg->cmsg_type = SCM_RIGHTS; cmsg->cmsg_len = CMSG_LEN (sizeof (int)); msg.msg_controllen = cmsg->cmsg_len; pid_t child = fork (); if (child == -1) error (1, errno, "fork"); if (child == 0) { int sock = socket (PF_UNIX, SOCK_STREAM, 0); if (sock < 0) error (1, errno, "socket"); if (bind (sock, (struct sockaddr ) &sun, sizeof (sun)) < 0) error (1, errno, "bind"); if (listen (sock, SOMAXCONN) < 0) error (1, errno, "listen"); int conn = accept (sock, NULL, NULL); if (conn == -1) error (1, errno, "accept"); (int ) CMSG_DATA (cmsg) = sock; if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0) error (1, errno, "sendmsg"); return 0; } / For a test suite this should be more robust like a barrier in shared memory. / sleep (1); int sock = socket (PF_UNIX, SOCK_STREAM, 0); if (sock < 0) error (1, errno, "socket"); if (connect (sock, (struct sockaddr ) &sun, sizeof (sun)) < 0) error (1, errno, "connect"); unlink (sun.sun_path); (int ) CMSG_DATA (cmsg) = -1; if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0) error (1, errno, "recvmsg"); int fd = (int ) CMSG_DATA (cmsg); if (fd == -1) error (1, 0, "no descriptor received"); char fdname[20]; snprintf (fdname, sizeof (fdname), "%d", fd); execl ("/proc/self/exe", argv[0], fdname, NULL); puts ("execl failed"); return 1; } [akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch] [akpm@linux-foundation.org: build fix] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Michael Buesch <mb@bu3sch.de> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:45 -07:00
Ulrich Drepper	f23513e8d9	Introduce O_CLOEXEC The problem is as follows: in multi-threaded code (or more correctly: all code using clone() with CLONE_FILES) we have a race when exec'ing. thread #1 thread #2 fd=open() fork + exec fcntl(fd,F_SETFD,FD_CLOEXEC) In some applications this can happen frequently. Take a web browser. One thread opens a file and another thread starts, say, an external PDF viewer. The result can even be a security issue if that open file descriptor refers to a sensitive file and the external program can somehow be tricked into using that descriptor. Just adding O_CLOEXEC support to open() doesn't solve the whole set of problems. There are other ways to create file descriptors (socket, epoll_create, Unix domain socket transfer, etc). These can and should be addressed separately though. open() is such an easy case that it makes not much sense putting the fix off. The test program: #include <errno.h> #include <fcntl.h> #include <stdio.h> #include <unistd.h> #ifndef O_CLOEXEC # define O_CLOEXEC 02000000 #endif int main (int argc, char *argv[]) { int fd; if (argc > 1) { fd = atol (argv[1]); printf ("child: fd = %d\n", fd); if (fcntl (fd, F_GETFD) == 0 \|\| errno != EBADF) { puts ("file descriptor valid in child"); return 1; } return 0; } fd = open ("/proc/self/exe", O_RDONLY \| O_CLOEXEC); printf ("in parent: new fd = %d\n", fd); char buf[20]; snprintf (buf, sizeof (buf), "%d", fd); execl ("/proc/self/exe", argv[0], buf, NULL); puts ("execl failed"); return 1; } [kyle@parisc-linux.org: parisc fix] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:45 -07:00
Eric W. Biederman	4a2d44590a	buffer: kill old incorrect comment Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:45 -07:00
Matthias Kaehlcke	203a2935c7	fs/block_dev.c: use list_for_each_entry() fs/block_dev.c: Use list_for_each_entry() instead of list_for_each() in nr_blockdev_pages() Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:45 -07:00
Alexey Dobriyan	00c5746da9	mutex_unlock() later in seq_lseek() All manipulations with struct seq_file::version are done under struct seq_file::lock except one introduced in commit d6b7a781c51c91dd054e5c437885205592faac21 aka "[PATCH] Speed up /proc/pid/maps" Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:44 -07:00
Jan Kara	a6739af8b9	ext2: fix a comment when ext2_release_file() is called Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:44 -07:00
Alexey Dobriyan	da58a16173	/proc/*/environ: wrong placing of ptrace_may_attach() check It's a bit dopey-looking and can permit a task to cause a pagefault in an mm which it doesn't have permission to read from. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:44 -07:00
young dave	f17e121fd0	remove useless tolower in isofs Remove useless tolower in isofs Signed-off-by: dave young <hidave.darkstar@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:43 -07:00
Randy Dunlap	03a9c30c23	AFS: drop explicit extern Don't use explicit extern specifier and quieten sparse warning: fs/afs/vnode.c:564:12: warning: function 'afs_vnode_link' with external linkage has definition Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:43 -07:00
David Howells	e8d6c55412	AFS: implement file locking Implement file locking for AFS. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:43 -07:00
Changli Gao	99fc06df72	procfs directory entry cleanup Function proc_register() will assign proc_dir_operations and proc_dir_inode_operations to ent's members proc_fops and proc_iops correctly if ent is a directory. So the early assignment isn't necessary. Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:43 -07:00
Micah Cowan	17973f5af7	Only send SIGXFSZ when exceeding rlimits. Some users have been having problems with utilities like cp or dd dumping core when they try to copy a file that's too large for the destination filesystem (typically, > 4gb). Apparently, some defunct standards required SIGXFSZ to be sent in such circumstances, but SUS only requires/allows it for when a written file exceeds the process's resource limits. I'd like to limit SIGXFSZs to the bare minimum required by SUS. Patch sent per http://lkml.org/lkml/2007/4/10/302 Signed-off-by: Micah Cowan <micahcowan@ubuntu.com> Acked-by: Alan Cox <alan@redhat.com> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:43 -07:00
Jan Kratochvil	60bfba7e85	PIE randomization This patch is using mmap()'s randomization functionality in such a way that it maps the main executable of (specially compiled/linked -pie/-fpie) ET_DYN binaries onto a random address (in cases in which mmap() is allowed to perform a randomization). Origin of this patch is in exec-shield (http://people.redhat.com/mingo/exec-shield/) [jkosina@suse.cz: pie randomization: fix BAD_ADDR macro] Signed-off-by: Jan Kratochvil <honza@jikos.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:42 -07:00
Dave Jones	c3ed85a36f	isofs: fix up CodingStyle fs/isofs/* had a bunch of CodingStyle issues. * Indentation was a mix of spaces and tabs * "int * foo" instead of "int foo" "while ( foo )" instead of "while (foo)" * if (foo) blah; on one line instead of two * Missing printk KERN_ levels * lots of trailing whitespace * lines >80 columns changed to wrap. * Unnecessary prototype removed by shuffling code order in C file. Should be no functional changes other than slight size increase due to printk changes. Further improvement possible, but this is a start.. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:42 -07:00
Denver Gingerich	d05e96fe46	fix compiler warnings in acorn.c warning: 'adfs_partition' defined but not used warning: 'riscix_partition' defined but not used warning: 'linux_partition' defined but not used Signed-off-by: Denver Gingerich <denver@ossguy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:42 -07:00
OGAWA Hirofumi	9aacd59934	fat: gcc 4.3 warning fix This patch fixes the following warnings. fs/fat/dir.c: In function 'fat_parse_long': include/linux/msdos_fs.h:294: warning: array subscript is above array bounds include/linux/msdos_fs.h:295: warning: array subscript is above array bounds include/linux/msdos_fs.h:295: warning: array subscript is above array bounds The ->name is defined as "name[8], ext[3]", but fat_checksum() uses those as name[11]. There is no actual problem, but it's not a good manner. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:42 -07:00
Pavel Emelianov	259902ea95	Make NFS client use seq_list_xxx helpers This includes /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes entries. Both need to show the header and use the list_head. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:42 -07:00
Pavel Emelianov	b0765fb857	Make /proc/self/mounts(tats) use seq_list_xxx helpers One more simple and stupid switching to the new API. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:42 -07:00
Pavel Emelianov	25216b0039	Make /proc/tty/drivers use seq_list_xxx helpers Simple and stupid like some previous ones. Just use new API. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:42 -07:00
Pavel Emelianov	a6a8bd6d28	Make AFS use seq_list_xxx helpers These proc files show some header before dumping the list, so the seq_list_start_head() is used. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:41 -07:00
Andrew Morton	85420ccad1	vxfs warning fixes gcc-4.3: fs/freevxfs/vxfs_lookup.c: In function 'vxfs_find_entry': fs/freevxfs/vxfs_lookup.c:139: warning: cast from pointer to integer of different size fs/freevxfs/vxfs_lookup.c: In function 'vxfs_readdir': fs/freevxfs/vxfs_lookup.c:294: warning: cast from pointer to integer of different size Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:41 -07:00
Cyrill Gorcunov	647bd61a5f	UDF: check for allocated memory for inode data This patch adds checking for granted memory while filling up inode data to prevent possible NULL pointer usage. If there is not enough memory to fill inode data we just mark it as "bad". Also some whitespace cleanup. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Jan Kara <jack@ucw.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:41 -07:00
Cyrill Gorcunov	c9c64155f5	UDF: check for allocated memory for data of new inodes Add checking for granted memory for inode data at the moment of its creation. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Jan Kara <jack@ucw.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:41 -07:00
Tomas Janousek	d62141414a	Use boot based time for uptime in /proc Commit `411187fb05` caused uptime not to increase during suspend. This may cause confusion so I restore the old behaviour by using the boot based time instead of monotonic for uptime. Signed-off-by: Tomas Janousek <tjanouse@redhat.com> Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:41 -07:00
Tomas Janousek	924b42d5a2	Use boot based time for process start time and boot time in /proc Commit `411187fb05` caused boot time to move and process start times to become invalid after suspend. Using boot based time for those restores the old behaviour and fixes the issue. [akpm@linux-foundation.org: little cleanup] Signed-off-by: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:41 -07:00
Alexey Dobriyan	786d7e1612	Fix rmmod/read/write races in /proc entries Fix following races: =========================================== 1. Write via ->write_proc sleeps in copy_from_user(). Module disappears meanwhile. Or, more generically, system call done on /proc file, method supplied by module is called, module dissapeares meanwhile. pde = create_proc_entry() if (!pde) return -ENOMEM; pde->write_proc = ... open write copy_from_user pde = create_proc_entry(); if (!pde) { remove_proc_entry(); return -ENOMEM; /* module unloaded / } boom* ========================================== 2. bogo-revoke aka proc_kill_inodes() remove_proc_entry vfs_read proc_kill_inodes [check ->f_op validness] [check ->f_op->read validness] [verify_area, security permissions checks] ->f_op = NULL; if (file->f_op->read) /* ->f_op dereference, boom */ NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's see how this scheme behaves, then extend if needed for directories. Directories creators in /proc only set ->owner for them, so proxying for directories may be unneeded. NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write, ->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release. If your in-tree module uses something else, yell on me. Full audit pending. [akpm@linux-foundation.org: build fix] Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:39 -07:00
Andrew Morton	fc9a07e7bf	invalidate_mapping_pages(): add cond_resched invalidate_mapping_pages() can sometimes take a long time (millions of pages to free). Long enough for the softlockup detector to trigger. We used to have a cond_resched() in there but I took it out because the drop_caches code calls invalidate_mapping_pages() under inode_lock. The patch adds a nasty flag and puts the cond_resched() back. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:36 -07:00
Jan Kara	f89b779508	jbd2 commit: fix transaction dropping We have to check that also the second checkpoint list is non-empty before dropping the transaction. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Chuck Ebbert <cebbert@redhat.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:34 -07:00
Jan Kara	fe28e42b99	jbd commit: fix transaction dropping We have to check that also the second checkpoint list is non-empty before dropping the transaction. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Chuck Ebbert <cebbert@redhat.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:34 -07:00
Nate	8803863a90	[CIFS] use simple_prepare_write to zero page data It's common for file systems to need to zero data on either side of a write, if a page is not Uptodate during prepare_write. It just so happens that simple_prepare_write() in libfs.c does exactly that, so we can avoid duplication and just call that function to zero page data. Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-16 15:45:13 +00:00
Jens Axboe	bcd4f3acba	splice: direct splicing updates ppos twice OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> reported that he's noticed nfsd read corruption in recent kernels, and did the hard work of discovering that it's due to splice updating the file position twice. This means that the next operation would start further ahead than it should. nfsd_vfs_read() splice_direct_to_actor() while(len) { do_splice_to() [update sd->pos] -> generic_file_splice_read() [read from sd->pos] nfsd_direct_splice_actor() -> __splice_from_pipe() [update sd->pos] There's nothing wrong with the core splice code, but the direct splicing is an addon that calls both input and output paths. So it has to take care in locally caching offset so it remains correct. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-16 15:02:48 +02:00
Avi Kivity	d6d2816849	KVM: Remove kvmfs in favor of the anonymous inodes source kvm uses a pseudo filesystem, kvmfs, to generate inodes, a job that the new anonymous inodes source does much better. Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:49 +03:00
Ingo Molnar	8ea0260668	[PATCH] sched: fix up fs/proc/array.c whitespace problems while changing task_stime() i noticed a whitespace style problem in array.c - fix it. While at it, fix all the other style problems too, most of them in the scheduler-stats related portions of array.c. There is no change in functionality: text data bss dec hex filename 4356 28 0 4384 1120 array.o-before 4356 28 0 4384 1120 array.o-after Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-16 09:46:31 +02:00
Ingo Molnar	5926c50b83	[PATCH] sched: remove dead code from task_stime() Alexey Dobriyan noticed that task_stime() contains a piece of dead code. (which is a remnant of earlier versions of this code) Remove that code. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-16 09:46:30 +02:00
Linus Torvalds	d2a9a8ded4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: fix a race condition bug in umount which caused a segfault 9p: re-enable mount time debug option 9p: cache meta-data when cache=loose net/9p: set error to EREMOTEIO if trans->write returns zero net/9p: change net/9p module name to 9pnet 9p: Reorganization of 9p file system code	2007-07-15 16:44:53 -07:00
Linus Torvalds	2d896c780d	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (37 commits) [XFS] Fix lockdep annotations for xfs_lock_inodes [LIB]: export radix_tree_preload() [XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode [XFS] Compat ioctl handler for handle operations [XFS] Compat ioctl handler for XFS_IOC_FSGEOMETRY_V1. [XFS] Clean up function name handling in tracing code [XFS] Quota inode has no parent. [XFS] Concurrent Multi-File Data Streams [XFS] Use uninitialized_var macro to stop warning about rtx [XFS] XFS should not be looking at filp reference counts [XFS] Use is_power_of_2 instead of open coding checks [XFS] Reduce shouting by removing unnecessary macros from dir2 code. [XFS] Simplify XFS min/max macros. [XFS] Kill off xfs_count_bits [XFS] Cancel transactions on xfs_itruncate_start error. [XFS] Use do_div() on 64 bit types. [XFS] Fix remount,readonly path to flush everything correctly. [XFS] Cleanup inode extent size hint extraction [XFS] Prevent ENOSPC from aborting transactions that need to succeed [XFS] Prevent deadlock when flushing inodes on unmount ...	2007-07-15 16:43:43 -07:00
Al Viro	7c9e3c2e3b	wrong order of arguments of ->readdir() Shows how many people are testing coda - the bug had been there for 5 years and results of stepping on it are not subtle. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-15 16:40:51 -07:00
Steve French	4a379e6657	[CIFS] Fix build break - inet.h not included when experimental ifdef off Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-15 21:58:28 +00:00
Steve French	2d785a50a8	[CIFS] Add support for new POSIX unlink In the cleanup phase of the dbench test, we were noticing sharing violation followed by failed directory removals when dbench did not close the test files before the cleanup phase started. Using the new POSIX unlink, which Samba has supported for a few months, avoids this. Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-15 01:48:57 +00:00
Eric Van Hensbergen	9e2f6688c0	9p: re-enable mount time debug option During reorganization, the mount time debug option was removed in favor of module-load-time parameters. However, the mount time option is still a useful for feature during debug and for user-fault isolation when the module is compiled into the kernel. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2007-07-14 15:14:14 -05:00
Eric Van Hensbergen	9523a841b1	9p: cache meta-data when cache=loose This patch expands the impact of the loose cache mode to allow for cached metadata increasing the performance of directory listings and other metadata read operations. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2007-07-14 15:14:08 -05:00
Latchesar Ionkov	bd238fb431	9p: Reorganization of 9p file system code This patchset moves non-filesystem interfaces of v9fs from fs/9p to net/9p. It moves the transport, packet marshalling and connection layers to net/9p leaving only the VFS related files in fs/9p. This work is being done in preparation for in-kernel 9p servers as well as alternate 9p clients (other than VFS). Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2007-07-14 15:13:40 -05:00
David Chinner	0f1145cc18	[XFS] Fix lockdep annotations for xfs_lock_inodes SGI-PV: 967035 SGI-Modid: xfs-linux-melb:xfs-kern:29026a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 18:09:42 +10:00
Michal Marek	faa63e9584	[XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode * 32bit struct xfs_fsop_bulkreq has different size and layout of members, no matter the alignment. Move the code out of the #else branch (why was it there in the first place?). Define _32 variants of the ioctl constants. * 32bit struct xfs_bstat is different because of time_t and on i386 because of different padding. Make xfs_bulkstat_one() accept a custom "output formatter" in the private_data argument which takes care of the xfs_bulkstat_one_compat() that takes care of the different layout in the compat case. * i386 struct xfs_inogrp has different padding. Add a similar "output formatter" mecanism to xfs_inumbers(). SGI-PV: 967354 SGI-Modid: xfs-linux-melb:xfs-kern:29102a Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:42:50 +10:00
Michal Marek	1fa503df66	[XFS] Compat ioctl handler for handle operations 32bit struct xfs_fsop_handlereq has different size and offsets (due to pointers). TODO: case XFS_IOC_{FSSETDM,ATTRLIST,ATTRMULTI}_BY_HANDLE still not handled. SGI-PV: 967354 SGI-Modid: xfs-linux-melb:xfs-kern:29101a Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:41:49 +10:00
Michal Marek	547e00c3c6	[XFS] Compat ioctl handler for XFS_IOC_FSGEOMETRY_V1. i386 struct xfs_fsop_geom_v1 has no padding after the last member, so the size is different. SGI-PV: 967354 SGI-Modid: xfs-linux-melb:xfs-kern:29100a Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:41:39 +10:00
Eric Sandeen	3a59c94c4b	[XFS] Clean up function name handling in tracing code Remove the hardcoded "fnames" for tracing, and just embed them in tracing macros via __FUNCTION__. Kills a lot of #ifdefs too. SGI-PV: 967353 SGI-Modid: xfs-linux-melb:xfs-kern:29099a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:41:24 +10:00
David Chinner	b11f94d537	[XFS] Quota inode has no parent. Avoid using a special "zero inode" as the parent of the quota inode as this can confuse the filestreams code into thinking the quota inode has a parent. We do not want the quota inode to follow filestreams allocation rules, so pass a NULL as the parent inode and detect this condition when doing stream associations. SGI-PV: 964469 SGI-Modid: xfs-linux-melb:xfs-kern:29098a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:41:12 +10:00
David Chinner	2a82b8be8a	[XFS] Concurrent Multi-File Data Streams In media spaces, video is often stored in a frame-per-file format. When dealing with uncompressed realtime HD video streams in this format, it is crucial that files do not get fragmented and that multiple files a placed contiguously on disk. When multiple streams are being ingested and played out at the same time, it is critical that the filesystem does not cross the streams and interleave them together as this creates seek and readahead cache miss latency and prevents both ingest and playout from meeting frame rate targets. This patch set creates a "stream of files" concept into the allocator to place all the data from a single stream contiguously on disk so that RAID array readahead can be used effectively. Each additional stream gets placed in different allocation groups within the filesystem, thereby ensuring that we don't cross any streams. When an AG fills up, we select a new AG for the stream that is not in use. The core of the functionality is the stream tracking - each inode that we create in a directory needs to be associated with the directories' stream. Hence every time we create a file, we look up the directories' stream object and associate the new file with that object. Once we have a stream object for a file, we use the AG that the stream object point to for allocations. If we can't allocate in that AG (e.g. it is full) we move the entire stream to another AG. Other inodes in the same stream are moved to the new AG on their next allocation (i.e. lazy update). Stream objects are kept in a cache and hold a reference on the inode. Hence the inode cannot be reclaimed while there is an outstanding stream reference. This means that on unlink we need to remove the stream association and we also need to flush all the associations on certain events that want to reclaim all unreferenced inodes (e.g. filesystem freeze). SGI-PV: 964469 SGI-Modid: xfs-linux-melb:xfs-kern:29096a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Vlad Apostolov <vapo@sgi.com>	2007-07-14 15:40:53 +10:00
Andrew Morton	0892ccd6fe	[XFS] Use uninitialized_var macro to stop warning about rtx Appease gcc in regards to "warning: 'rtx' is used uninitialized in this function". SGI-PV: 907752 SGI-Modid: xfs-linux-melb:xfs-kern:29007a Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:40:02 +10:00
Christoph Hellwig	fbf3ce8d8e	[XFS] XFS should not be looking at filp reference counts A check for file_count is always a bad idea. Linux has the ->release method to deal with cleanups on last close and ->flush is only for the very rare case where we want to perform an operation on every drop of a reference to a file struct. This patch gets rid of vop_close and surrounding code in favour of simply doing the page flushing from ->release. SGI-PV: 966562 SGI-Modid: xfs-linux-melb:xfs-kern:28952a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:37:37 +10:00
Vignesh Babu	16a087d8e1	[XFS] Use is_power_of_2 instead of open coding checks SGI-PV: 966576 SGI-Modid: xfs-linux-melb:xfs-kern:28950a Signed-off-by: Vignesh Babu <vignesh.babu@wipro.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:37:12 +10:00
Christoph Hellwig	bbaaf53808	[XFS] Reduce shouting by removing unnecessary macros from dir2 code. SGI-PV: 966505 SGI-Modid: xfs-linux-melb:xfs-kern:28947a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:37:02 +10:00
David Chinner	54aa8e26e9	[XFS] Simplify XFS min/max macros. SGI-PV: 964547 SGI-Modid: xfs-linux-melb:xfs-kern:28945a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Nathan Scott <nscott@aconex.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:36:53 +10:00
Eric Sandeen	24ad33ff71	[XFS] Kill off xfs_count_bits xfs_count_bits is only called once, and is then compared to 0. IOW, what it really wants to know is, is the bitmap empty. This can be done more simply, certainly. SGI-PV: 966503 SGI-Modid: xfs-linux-melb:xfs-kern:28944a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:36:43 +10:00
Jesper Juhl	87ae3c2411	[XFS] Cancel transactions on xfs_itruncate_start error. SGI-PV: 966502 SGI-Modid: xfs-linux-melb:xfs-kern:28943a Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:36:17 +10:00
Christoph Hellwig	39726be2a2	[XFS] Use do_div() on 64 bit types. SGI-PV: 966145 SGI-Modid: xfs-linux-melb:xfs-kern:28889a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:36:08 +10:00
David Chinner	516b2e7c26	[XFS] Fix remount,readonly path to flush everything correctly. The remount readonly path can fail to writeback properly because we still have active transactions after calling xfs_quiesce_fs(). Further investigation shows that this path is broken in the same ways that the xfs freeze path was broken so fix it the same way. SGI-PV: 964464 SGI-Modid: xfs-linux-melb:xfs-kern:28869a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:35:58 +10:00
David Chinner	957d0ebed0	[XFS] Cleanup inode extent size hint extraction SGI-PV: 966004 SGI-Modid: xfs-linux-melb:xfs-kern:28866a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:35:36 +10:00
David Chinner	84e1e99f11	[XFS] Prevent ENOSPC from aborting transactions that need to succeed During delayed allocation extent conversion or unwritten extent conversion, we need to reserve some blocks for transactions reservations. We need to reserve these blocks in case a btree split occurs and we need to allocate some blocks. Unfortunately, we've only ever reserved the number of data blocks we are allocating, so in both the unwritten and delalloc case we can get ENOSPC to the transaction reservation. This is bad because in both cases we cannot report the failure to the writing application. The fix is two-fold: 1 - leverage the reserved block infrastructure XFS already has to reserve a small pool of blocks by default to allow specially marked transactions to dip into when we are at ENOSPC. Default setting is min(5%, 1024 blocks). 2 - convert critical transaction reservations to be allowed to dip into this pool. Spots changed are delalloc conversion, unwritten extent conversion and growing a filesystem at ENOSPC. This also allows growing the filesytsem to succeed at ENOSPC. SGI-PV: 964468 SGI-Modid: xfs-linux-melb:xfs-kern:28865a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:35:19 +10:00
David Chinner	641c56fbfe	[XFS] Prevent deadlock when flushing inodes on unmount When we are unmounting the filesystem, we flush all the inodes to disk. Unfortunately, if we have an inode cluster that has just been freed and marked stale sitting in an incore log buffer (i.e. hasn't been flushed to disk), it will be holding all the flush locks on the inodes in that cluster. xfs_iflush_all() which is called during unmount walks all the inodes trying to reclaim them, and it doing so calls xfs_finish_reclaim() on each inode. If the inode is dirty, if grabs the flush lock and flushes it. Unfortunately, find dirty inodes that already have their flush lock held and so we sleep. At this point in the unmount process, we are running single-threaded. There is nothing more that can push on the log to force the transaction holding the inode flush locks to disk and hence we deadlock. The fix is to issue a log force before flushing the inodes on unmount so that all the flush locks will be released before we start flushing the inodes. SGI-PV: 964538 SGI-Modid: xfs-linux-melb:xfs-kern:28862a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:33:38 +10:00
Tim Shimmin	0164af51ce	[XFS] Log the agf_length change in xfs_growfs_data_private(). SGI-PV: 963528 SGI-Modid: xfs-linux-melb:xfs-kern:28856a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2007-07-14 15:32:59 +10:00
David Chinner	effd120edb	[XFS] Map unwritten extents correctly for I/o completion processing If we have multiple unwritten extents within a single page, we fail to tell the I/o completion construction handlers we need a new handle for the second and subsequent blocks in the page. While we still issue the I/O correctly, we do not have the correct ranges recorded in the ioend structures and hence when we go to convert the unwritten extents we screw it up. Make sure we start a new ioend every time the mapping changes so that we convert the correct ranges on I/O completion. SGI-PV: 964647 SGI-Modid: xfs-linux-melb:xfs-kern:28797a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:32:49 +10:00
David Chinner	45c3414112	[XFS] Apply transaction delta counts atomically to incore counters With the per-cpu superblock counters, batch updates are no longer atomic across the entire batch of changes. This is not an issue if each individual change in the batch is applied atomically. Unfortunately, free block count changes are not applied atomically, and they are applied in a manner guaranteed to cause problems. Essentially, the free block count reservation that the transaction took initially is returned to the in core counters before a second delta takes away what is used. because these two operations are not atomic, we can race with another thread that can use the returned transaction reservation before the transaction takes the space away again and we can then get ENOSPC being reported in a spot where we don't have an ENOSPC condition, nor should we ever see one there. Fix it up by rolling the two deltas into the one so it can be applied safely (i.e. atomically) to the incore counters. SGI-PV: 964465 SGI-Modid: xfs-linux-melb:xfs-kern:28796a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:32:09 +10:00
David Chinner	b2826136a1	[XFS] Handle null returned from xfs_vtoi() in xfs_setfilesize(). SGI-PV: 965636 SGI-Modid: xfs-linux-melb:xfs-kern:28777a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Olaf Weber <olaf@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:31:03 +10:00
David Chinner	e927af90aa	[XFS] Block on unwritten extent conversion during synchronous direct I/O. Currently we do not wait on extent conversion to occur, and hence we can return to userspace from a synchronous direct I/O write without having completed all the actions in the write. Hence a read after the write may see zeroes (unwritten extent) rather than the data that was written. Block the I/O completion by triggering a synchronous workqueue flush to ensure that the conversion has occurred before we return to userspace. SGI-PV: 964092 SGI-Modid: xfs-linux-melb:xfs-kern:28775a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:30:52 +10:00
David Chinner	f4a9f28a90	[XFS] Flush the block device before closing it on unmount. SGI-PV: 965630 SGI-Modid: xfs-linux-melb:xfs-kern:28774a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:30:05 +10:00
David Chinner	4e5ae8386b	[XFS] xfs_bmapi fails to update the previous extent pointer When processing multiple extent maps, xfs_bmapi needs to keep track of the extent behind the one it is currently working on to be able to trim extent ranges correctly. Failing to update the previous pointer can result in corrupted extent lists in memory and this will result in panics or assert failures. Update the previous pointer correctly when we move to the next extent to process. SGI-PV: 965631 SGI-Modid: xfs-linux-melb:xfs-kern:28773a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:29:37 +10:00
David Chinner	210c6f1caa	[XFS] Fix the transaction flags to make lazy superblock counters work. SGI-PV: 964999 SGI-Modid: xfs-linux-melb:xfs-kern:28653a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:29:02 +10:00
David Chinner	92821e2ba4	[XFS] Lazy Superblock Counters When we have a couple of hundred transactions on the fly at once, they all typically modify the on disk superblock in some way. create/unclink/mkdir/rmdir modify inode counts, allocation/freeing modify free block counts. When these counts are modified in a transaction, they must eventually lock the superblock buffer and apply the mods. The buffer then remains locked until the transaction is committed into the incore log buffer. The result of this is that with enough transactions on the fly the incore superblock buffer becomes a bottleneck. The result of contention on the incore superblock buffer is that transaction rates fall - the more pressure that is put on the superblock buffer, the slower things go. The key to removing the contention is to not require the superblock fields in question to be locked. We do that by not marking the superblock dirty in the transaction. IOWs, we modify the incore superblock but do not modify the cached superblock buffer. In short, we do not log superblock modifications to critical fields in the superblock on every transaction. In fact we only do it just before we write the superblock to disk every sync period or just before unmount. This creates an interesting problem - if we don't log or write out the fields in every transaction, then how do the values get recovered after a crash? the answer is simple - we keep enough duplicate, logged information in other structures that we can reconstruct the correct count after log recovery has been performed. It is the AGF and AGI structures that contain the duplicate information; after recovery, we walk every AGI and AGF and sum their individual counters to get the correct value, and we do a transaction into the log to correct them. An optimisation of this is that if we have a clean unmount record, we know the value in the superblock is correct, so we can avoid the summation walk under normal conditions and so mount/recovery times do not change under normal operation. One wrinkle that was discovered during development was that the blocks used in the freespace btrees are never accounted for in the AGF counters. This was once a valid optimisation to make; when the filesystem is full, the free space btrees are empty and consume no space. Hence when it matters, the "accounting" is correct. But that means the when we do the AGF summations, we would not have a correct count and xfs_check would complain. Hence a new counter was added to track the number of blocks used by the free space btrees. This is an on-disk format change. As a result of this, lazy superblock counters are a mkfs option and at the moment on linux there is no way to convert an old filesystem. This is possible - xfs_db can be used to twiddle the right bits and then xfs_repair will do the format conversion for you. Similarly, you can convert backwards as well. At some point we'll add functionality to xfs_admin to do the bit twiddling easily.... SGI-PV: 964999 SGI-Modid: xfs-linux-melb:xfs-kern:28652a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:28:50 +10:00
Andrew Morton	3260f78ad6	[XFS] Use generic shrinker interfaces in XFS. SGI-PV: 964986 SGI-Modid: xfs-linux-melb:xfs-kern:28642a Signed-Off-By: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:23:53 +10:00
David Chinner	92dfe8d266	[XFS] Make hole punching at EOF atomic. If hole punching at EOF is done as two steps (i.e. truncate then extend) the file is in a transient state between the two steps where an application can see the incorrect file size. Punching a hole to EOF needs to be treated in teh same way as all other hole punching cases so that the file size is never seen to change. SGI-PV: 962012 SGI-Modid: xfs-linux-melb:xfs-kern:28641a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:23:40 +10:00
David Chinner	511105b3d7	[XFS] Fix vmalloc leak on mount/unmount. When setting the length of the iclogbuf to write out we should just be changing the desired byte count rather completely reassociating the buffer memory with the buffer. Reassociating the buffer memory changes the apparent length of the buffer and hence when we free the buffer, we don't free all the vmap()d space we originally allocated. SGI-PV: 964983 SGI-Modid: xfs-linux-melb:xfs-kern:28640a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:23:23 +10:00
Christoph Hellwig	ca165b8892	[XFS] Fix double free in xfs_buf_get_noaddr error handling path SGI-PV: 964983 SGI-Modid: xfs-linux-melb:xfs-kern:28639a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:22:50 +10:00
David Chinner	3db296f341	[XFS] Fix use-after-free during log unmount. Don't reference the log buffer after running the callbacks as the callback can trigger the log buffers to be freed during unmount. SGI-PV: 964545 SGI-Modid: xfs-linux-melb:xfs-kern:28567a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:22:34 +10:00
David Chinner	40095b64f5	[XFS] Sleeping with the ilock waiting for I/O completion is Bad. Recent fixes to the filesystem freezing code introduced a vn_iowait call in the middle of the sync code. Unfortunately, at the point where this call was added we are holding the ilock. The ilock is needed by I/O completion for unwritten extent conversion and now updating the file size. Hence I/o cannot complete if we hold the ilock while waiting for I/O completion. Fix up the bug and clean the code up around it. SGI-PV: 963674 SGI-Modid: xfs-linux-melb:xfs-kern:28566a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:22:18 +10:00
Nathan Scott	4cc929ee30	[XFS] Don't grow filesystems past the size they can index. When growing a filesystem we don't check to see if the new size overflows the page cache index range, so we can do silly things like grow a filesystem page 16TB on a 32bit. Check new filesystem sizes against the limits the kernel can support. SGI-PV: 957886 SGI-Modid: xfs-linux-melb:xfs-kern:28563a Signed-Off-By: Nathan Scott <nscott@aconex.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:21:29 +10:00
Christoph Hellwig	1fa40b01ae	[XFS] Only use refcounted pages for I/O Many block drivers (aoe, iscsi) really want refcountable pages in bios, which is what almost everyone send down. XFS unfortunately has a few places where it sends down buffers that may come from kmalloc, which breaks them. Fix the places that use kmalloc()d buffers. SGI-PV: 964546 SGI-Modid: xfs-linux-melb:xfs-kern:28562a Signed-Off-By: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:21:14 +10:00
Linus Torvalds	16cefa8c38	Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 * git://git.linux-nfs.org/pub/linux/nfs-2.6: (122 commits) sunrpc: drop BKL around wrap and unwrap NFSv4: Make sure unlock is really an unlock when cancelling a lock NLM: fix source address of callback to client SUNRPC client: add interface for binding to a local address SUNRPC server: record the destination address of a request SUNRPC: cleanup transport creation argument passing NFSv4: Make the NFS state model work with the nosharedcache mount option NFS: Error when mounting the same filesystem with different options NFS: Add the mount option "nosharecache" NFS: Add support for mounting NFSv4 file systems with string options NFS: Add final pieces to support in-kernel mount option parsing NFS: Introduce generic mount client API NFS: Add enums and match tables for mount option parsing NFS: Improve debugging output in NFS in-kernel mount client NFS: Clean up in-kernel NFS mount NFS: Remake nfsroot_mount as a permanent part of NFS client SUNRPC: Add a convenient default for the hostname when calling rpc_create() SUNRPC: Rename rpcb_getport to be consistent with new rpcb_getport_sync name SUNRPC: Rename rpcb_getport_external routine SUNRPC: Allow rpcbind requests to be interrupted by a signal. ...	2007-07-13 16:46:18 -07:00
Jens Axboe	4fbef206da	nfsd: fix nfsd_vfs_read() splice actor setup When nfsd was transitioned to use splice instead of sendfile() for data transfers, a line setting the page index was lost. Restore it, so that nfsd is functional when that path is used. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-13 16:45:43 -07:00
Jens Axboe	51a92c0f6c	splice: fix offset mangling with direct splicing (sendfile) If the output actor doesn't transfer the full amount of data, we will increment ppos too much. Two related bugs in there: - We need to break out and return actor() retval if it is shorted than what we spliced into the pipe. - Adjust ppos only according to actor() return. Also fix loop problem in generic_file_splice_read(), it should not keep going when data has already been transferred. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-13 14:14:31 +02:00
James Morris	29ce20586b	security: revalidate rw permissions for sys_splice and sys_vmsplice Revalidate read/write permissions for splice(2) and vmslice(2), in case security policy has changed since the files were opened. Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-13 14:14:29 +02:00
Steve French	50c2f75388	[CIFS] whitespace/formatting fixes This should be the last big batch of whitespace/formatting fixes. checkpatch warnings for the cifs directory are down about 90% and many of the remaining ones are harder to remove or make the code harder to read. Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-13 00:33:32 +00:00
Zhang Rui	91a6902958	sysfs: add parameter "struct bin_attribute " in .read/.write methods for sysfs binary attributes Well, first of all, I don't want to change so many files either. What I do: Adding a new parameter "struct bin_attribute " in the .read/.write methods for the sysfs binary attributes. In fact, only the four lines change in fs/sysfs/bin.c and include/linux/sysfs.h do the real work. But I have to update all the files that use binary attributes to make them compatible with the new .read and .write methods. I'm not sure if I missed any. :( Why I do this: For a sysfs attribute, we can get a pointer pointing to the struct attribute in the .show/.store method, while we can't do this for the binary attributes. I don't know why this is different, but this does make it not so handy to use the binary attributes as the regular ones. So I think this patch is reasonable. :) Who benefits from it: The patch that exposes ACPI tables in sysfs requires such an improvement. All the table binary attributes share the same .read method. Parameter "struct bin_attribute *" is used to get the table signature and instance number which are used to distinguish different ACPI table binary attributes. Without this parameter, we need to offer different .read methods for different ACPI table binary attributes. This is impossible as there are various ACPI tables on different platforms, and we don't know what they are until they are loaded. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:09 -07:00
Tejun Heo	51225039f3	sysfs: make directory dentries and inodes reclaimable This patch makes dentries and inodes for sysfs directories reclaimable. * sysfs_notify() is modified to walk sysfs_dirent tree instead of dentry tree. * sysfs_update_file() and sysfs_chmod_file() use sysfs_get_dentry() to grab the victim dentry. * sysfs_rename_dir() and sysfs_move_dir() grab all dentries using sysfs_get_dentry() on startup. * Dentries for all shadowed directories are pinned in memory to serve as lookup start point. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:09 -07:00
Tejun Heo	53e0ae9269	sysfs: implement sysfs_get_dentry() Some sysfs operations require dentry and inode. sysfs_get_dentry() looks up and gets dentry for the specified sysfs_dirent. It finds the first ancestor with dentry attached and starts looking up dentries from there. Looking up from the nearest ancestor is necessary to support shadowed directories because we can't reliably lookup dentry for one of the shadows. Dentries for each shadow will be pinned in memory such that they can serve as the starting point for dentry lookup. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:09 -07:00
Tejun Heo	a0edd7c848	sysfs: move sysfs_drop_dentry() to dir.c and make it static After add/remove path restructuring, the only user of sysfs_drop_dentry() is sysfs_addrm_finish(). Move sysfs_drop_dentry() to dir.c and make it static. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:09 -07:00
Tejun Heo	fb6896da37	sysfs: restructure add/remove paths and fix inode update The original add/remove code had the following problems. * parent's timestamps are updated on dentry instantiation. this is incorrect with reclaimable files. * updating parent's timestamps isn't synchronized. * parent nlink update assumes the inode is accessible which won't be true once directory dentries are made reclaimable. This patch restructures add/remove paths to resolve the above problems. Add/removal are done in the following steps. 1. sysfs_addrm_start() : acquire locks including sysfs_mutex and other resources. 2-a. sysfs_add_one() : add new sd. linking the new sd into the children list is caller's responsibility. 2-b. sysfs_remove_one() : remove a sd. unlinking the sd from the children list is caller's responsibility. 3. sysfs_addrm_finish() : release all resources and clean up. Steps 2-a and/or 2-b can be repeated multiple times. Parent's inode is looked up during sysfs_addrm_start(). If available (always at the moment), it's pinned and nlink is updated as sd's are added and removed. Timestamps are updated during finish if any sd has been added or removed. If parent's inode is not available during start, sysfs_mutex ensures that parent inode is not created till add/remove is complete. All the complexity is contained inside the helper functions. Especially, dentry/inode handling is properly hidden from the rest of sysfs which now mostly operate on sysfs_dirents. As an added bonus, codes which use these helpers to add and remove sysfs_dirents are now more structured and simpler. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:09 -07:00
Tejun Heo	3007e997de	sysfs: use sysfs_mutex to protect the sysfs_dirent tree As kobj sysfs dentries and inodes are gonna be made reclaimable, i_mutex can't be used to protect sysfs_dirent tree. Use sysfs_mutex globally instead. As the whole tree is protected with sysfs_mutex, there is no reason to keep sysfs_rename_sem. Drop it. While at it, add docbook comments to functions which require sysfs_mutex locking. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:08 -07:00
Tejun Heo	5f9953237f	sysfs: consolidate sysfs spinlocks Replace sysfs_lock and kobj_sysfs_assoc_lock with sysfs_assoc_lock. sysfs_lock was originally to be used to protect sysfs_dirent tree but mutex seems better choice, so there is no reason to keep sysfs_lock separate. Merge the two spinlocks into one. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:08 -07:00
Tejun Heo	608e266a2d	sysfs: make kobj point to sysfs_dirent instead of dentry As kobj sysfs dentries and inodes are gonna be made reclaimable, dentry can't be used as naming token for sysfs file/directory, replace kobj->dentry with kobj->sd. The only external interface change is shadow directory handling. All other changes are contained in kobj and sysfs. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:08 -07:00
Tejun Heo	f0b0af4792	sysfs: implement sysfs_find_dirent() and sysfs_get_dirent() Implement sysfs_find_dirent() and sysfs_get_dirent(). sysfs_dirent_exist() is replaced by sysfs_find_dirent(). These will be used to make directory entries reclamiable. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:08 -07:00
Tejun Heo	380e6fbb72	sysfs: implement SYSFS_FLAG_REMOVED flag Implement SYSFS_FLAG_REMOVED flag which currently is used only to improve sanity check in sysfs_deactivate(). The flag will be used to make directory entries reclamiable. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:08 -07:00
Tejun Heo	b402d72cf7	sysfs: rename sysfs_dirent->s_type to s_flags and make room for flags Rename sysfs_dirent->s_type to s_flags, pack type into lower eight bits and reserve the rest for flags. sysfs_type() can used to access the type. All existing sd->s_type accesses are converted to use sysfs_type(). While at it, type test is changed to equality test instead of bit-and test where appropriate. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:08 -07:00
Tejun Heo	d0bcb5689a	sysfs: make sysfs_drop_dentry() access inodes using ilookup() sysfs_drop_dentry() used to go through sd->s_dentry and sd->s_parent->s_dentry to access the inodes. This is incorrect because inode can be cached without dentry. This patch makes sysfs_drop_dentry() access inodes using ilookup() on sd->s_ino. This is both correct and simpler. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:08 -07:00
Rafael J. Wysocki	9d9307dabb	sysfs: Fix oops in sysfs_drop_dentry on x86_64 Fix oops on x86_64 caused by the dereference of dir in sysfs_drop_dentry() made before checking if dir is not NULL (cf. http://marc.info/?l=linux-kernel&m=118151626704924&w=2). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:07 -07:00
Tejun Heo	0c73f18b7d	sysfs: use singly-linked list for sysfs_dirent tree Make sysfs_dirent use singly linked list for its tree structure. sysfs_link_sibling() and sysfs_unlink_sibling() functions are added to handle simpler cases. It adds some complexity and cpu cycle overhead but reduced memory footprint is worthwhile on big machines. This change reduces the sizeof sysfs_dirent from 104 to 88 on 64bit and from 60 to 52 on 32bit. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:07 -07:00
Tejun Heo	8619f97989	sysfs: slim down sysfs_dirent->s_active Make sysfs_dirent->s_active an atomic_t instead of rwsem. This reduces the size of sysfs_dirent from 136 to 104 on 64bit and from 76 to 60 on 32bit with lock debugging turned off. With lock debugging turned on the reduction is much larger. s_active starts at zero and each active reference increments s_active. Putting a reference decrements s_active. Deactivation subtracts SD_DEACTIVATED_BIAS which is currently INT_MIN and assumed to be small enough to make s_active negative. If s_active is negative, sysfs_get() no longer grants new references. Deactivation succeeds immediately if there is no active user; otherwise, it waits using a completion for the last put. Due to the removal of lockdep tricks, this change makes things less trickier in release_sysfs_dirent(). As all the complexity is contained in three s_active functions, I think it's more readable this way. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:07 -07:00
Tejun Heo	b6b4a4399c	sysfs: move s_active functions to fs/sysfs/dir.c These functions are about to receive more complexity and doesn't really need to be inlined in the first place. Move them from fs/sysfs/sysfs.h to fs/sysfs/dir.c. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:07 -07:00
Tejun Heo	0b8ead82f5	sysfs: fix root sysfs_dirent -> root dentry association The root sysfs_dirent didn't point to the root dentry fix it. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:07 -07:00
Tejun Heo	8312a8d7c1	sysfs: use iget_locked() instead of new_inode() After dentry is reclaimed, sysfs always used to allocate new dentry and inode if the file is accessed again. This causes problem with operations which only pin the inode. For example, if inotify watch is added to a sysfs file and the dentry for the file is reclaimed, the next update event creates new dentry and new inode making the inotify watch miss all the events from there on. This patch fixes it by using iget_locked() instead of new_inode(). sysfs_new_inode() is renamed to sysfs_get_inode() and inode is initialized iff the inode is newly allocated. sysfs_instantiate() is responsible for unlocking new inodes. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:07 -07:00
Tejun Heo	fc9f54b998	sysfs: reorganize sysfs_new_indoe() and sysfs_create() Reorganize/clean up sysfs_new_inode() and sysfs_create(). * sysfs_init_inode() is separated out from sysfs_new_inode() and is responsible for basic initialization. * sysfs_instantiate() replaces the last step of sysfs_create() and is responsible for dentry instantitaion. * type-specific initialization is moved out to the callers. * mode is specified only once when creating a sysfs_dirent. * spurious list_del_init(&sd->s_sibling) dropped from create_dir() This change is to * prepare for inode allocation fix. * separate alloc and init code for synchronization update. * make dentry/inode initialization more flexible for later changes. This patch doesn't introduce visible behavior change. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:07 -07:00
Tejun Heo	7f7cfffe60	sysfs: fix parent refcounting during rename and move Parent reference wasn't properly transferred during rename and move. Fix it. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:07 -07:00
Tejun Heo	42b37df6ab	sysfs: make sysfs_alloc_ino() static sysfs_alloc_ino() isn't used out side of fs/sysfs/dir.c. Make it static. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:06 -07:00
Tejun Heo	7b595756ec	sysfs: kill unnecessary attribute->owner sysfs is now completely out of driver/module lifetime game. After deletion, a sysfs node doesn't access anything outside sysfs proper, so there's no reason to hold onto the attribute owners. Note that often the wrong modules were accounted for as owners leading to accessing removed modules. This patch kills now unnecessary attribute->owner. Note that with this change, userland holding a sysfs node does not prevent the backing module from being unloaded. For more info regarding lifetime rule cleanup, please read the following message. http://article.gmane.org/gmane.linux.kernel/510293 (tweaked by Greg to not delete the field just yet, to make it easier to merge things properly.) Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:06 -07:00
Tejun Heo	dbde0fcf9f	sysfs: reimplement sysfs_drop_dentry() This patch reimplements sysfs_drop_dentry() such that remove_dir() can use it to drop dentry instead of using a separate mechanism. With this change, making directories reclaimable is much easier. This patch used to contain fixes for two race conditions around sd->s_dentry but that part has been separated out and included into mainline early as commit `6aa054aadf` and `dd14cbc994`. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:06 -07:00
Tejun Heo	198a2a8470	sysfs: separate out sysfs_attach_dentry() Consolidate sd <-> dentry association into sysfs_attach_dentry() and call it after dentry and inode are properly set up. This is in preparation of sysfs_drop_dentry() updates. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:05 -07:00
Tejun Heo	73107cb3ad	sysfs: kill attribute file orphaning Now that sysfs_dirent can be disconnected from kobject on deletion, there is no need to orphan each attribute files. All [bin_]attribute nodes are automatically orphaned when the parent node is deleted. Kill attribute file orphaning. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:05 -07:00
Tejun Heo	0ab66088c8	sysfs: implement sysfs_dirent active reference and immediate disconnect sysfs: implement sysfs_dirent active reference and immediate disconnect Opening a sysfs node references its associated kobject, so userland can arbitrarily prolong lifetime of a kobject which complicates lifetime rules in drivers. This patch implements active reference and makes the association between kobject and sysfs immediately breakable. Now each sysfs_dirent has two reference counts - s_count and s_active. s_count is a regular reference count which guarantees that the containing sysfs_dirent is accessible. As long as s_count reference is held, all sysfs internal fields in sysfs_dirent are accessible including s_parent and s_name. The newly added s_active is active reference count. This is acquired by invoking sysfs_get_active() and it's the caller's responsibility to ensure sysfs_dirent itself is accessible (should be holding s_count one way or the other). Dereferencing sysfs_dirent to access objects out of sysfs proper requires active reference. This includes access to the associated kobjects, attributes and ops. The active references can be drained and denied by calling sysfs_deactivate(). All active sysfs_dirents must be deactivated after deletion but before the default reference is dropped. This enables immediate disconnect of sysfs nodes. Once a sysfs_dirent is deleted, it won't access any entity external to sysfs proper. Because attr/bin_attr ops access both the node itself and its parent for kobject, they need to hold active references to both. sysfs_get/put_active_two() helpers are provided to help grabbing both references. Parent's is acquired first and released last. Unlike other operations, mmapped area lingers on after mmap() is finished and the module implement implementing it and kobj need to stay referenced till all the mapped pages are gone. This is accomplished by holding one set of active references to the bin_attr and its parent if there have been any mmap during lifetime of an openfile. The references are dropped when the openfile is released. This change makes sysfs lifetime rules independent from both kobject's and module's. It not only fixes several race conditions caused by sysfs not holding onto the proper module when referencing kobject, but also helps fixing and simplifying lifetime management in driver model and drivers by taking sysfs out of the equation. Please read the following message for more info. http://article.gmane.org/gmane.linux.kernel/510293 Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:05 -07:00
Tejun Heo	eb36165353	sysfs: implement bin_buffer Implement bin_buffer which contains a mutex and pointer to PAGE_SIZE buffer to properly synchronize accesses to per-openfile buffer and prepare for immediate-kobj-disconnect. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:04 -07:00
Tejun Heo	2b29ac252a	sysfs: reimplement symlink using sysfs_dirent tree sysfs symlink is implemented by referencing dentry and kobject from sysfs_dirent - symlink entry references kobject, dentry is used to walk the tree. This complicates object lifetimes rules and is dangerous - for example, there is no way to tell to which module the target of a symlink belongs and referencing that kobject can make it linger after the module is gone. This patch reimplements symlink using only sysfs_dirent tree. sd for a symlink points and holds reference to the target sysfs_dirent and all walking is done using sysfs_dirent tree. Simpler and safer. Please read the following message for more info. http://article.gmane.org/gmane.linux.kernel/510293 Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:04 -07:00
Tejun Heo	aecdcedaab	sysfs: implement kobj_sysfs_assoc_lock kobj->dentry can go away anytime unless the user controls when the associated sysfs node is deleted. This patch implements kobj_sysfs_assoc_lock which protects kobj->dentry. This will be used to maintain kobj based API when converting sysfs to use sysfs_dirent tree instead of dentry/kobject. Note that this lock belongs to kobject/driver-model not sysfs. Once sysfs is converted to not use kobject in its interface, this can be removed from sysfs. This is in preparation of object reference simplification. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:04 -07:00
Tejun Heo	3e5190380e	sysfs: make sysfs_dirent->s_element a union Make sd->s_element a union of sysfs_elem_{dir\|symlink\|attr\|bin_attr} and rename it to s_elem. This is to achieve... * some level of type checking : changing symlink to point to sysfs_dirent instead of kobject is much safer and less painful now. * easier / standardized dereferencing * allow sysfs_elem_* to contain more than one entry Where possible, pointer is obtained by directly deferencing from sd instead of going through other entities. This reduces dependencies to dentry, inode and kobject. to_attr() and to_bin_attr() are unused now and removed. This is in preparation of object reference simplification. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:04 -07:00
Tejun Heo	0c096b507f	sysfs: add sysfs_dirent->s_name Add s_name to sysfs_dirent. This is to further reduce dependency to the associated dentry. Name is copied for directories and symlinks but not for attributes. Where possible, name dereferences are converted to use sd->s_name. sysfs_symlink->link_name and sysfs_get_name() are unused now and removed. This change allows symlink to be implemented using sysfs_dirent tree proper, which is the last remaining dentry-dependent sysfs walk. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:04 -07:00
Tejun Heo	13b3086d2e	sysfs: add sysfs_dirent->s_parent Add sysfs_dirent->s_parent. With this patch, each sd points to and holds a reference to its parent. This allows walking sysfs tree without referencing sd->s_dentry which can go away anytime if the user doesn't control when it's deleted. sd->s_parent is initialized and parent is referenced in sysfs_attach_dirent(). Reference to parent is released when the sd is released, so as long as reference to a sd is held, s_parent can be followed. dentry walk in sysfs_readdir() is convereted to s_parent walk. This will be used to reimplement symlink such that it uses only sysfs_dirent tree. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:04 -07:00
Tejun Heo	a26cd7226c	sysfs: consolidate sysfs_dirent creation functions Currently there are four functions to create sysfs_dirent - __sysfs_new_dirent(), sysfs_new_dirent(), __sysfs_make_dirent() and sysfs_make_dirent(). Other than sysfs_make_dirent(), no function has two users if calls to implement other functions are excluded. This patch consolidates sysfs_dirent creation functions into the following two. * sysfs_new_dirent() : allocate and initialize * sysfs_attach_dirent() : attach to sysfs_dirent hierarchy and/or associate with dentry This simplifies interface and gives callers more flexibility. This is in preparation of object reference simplification. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:04 -07:00
Tejun Heo	996b73764e	sysfs: flatten and fix sysfs_rename_dir() error handling Error handling in sysfs_rename_dir() was broken. * When lookup_one_len() fails, 0 is returned. * If parent inode check fails, returns with inode mutex and rename rwsem held. This patch fixes the above bugs and flattens error handling such that it's more readable and easier to modify. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:03 -07:00
Tejun Heo	dfeb9fb034	sysfs: flatten cleanup paths in sysfs_add_link() and create_dir() Flatten cleanup paths in sysfs_add_link() and create_dir() to improve readability and ease further changes to these functions. This is in preparation of object reference simplification. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:03 -07:00
Tejun Heo	93e3cd8270	sysfs: fix error handling in binattr write() Error handling in fs/sysfs/bin.c:write() was wrong because size_t count is used to receive return value from flush_write() which is negative on failure. This patch updates write() such that int variable is used instead. read() is updated the same way for consistency. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:03 -07:00
Tejun Heo	7a23ad4404	sysfs: make sysfs_put() ignore NULL sd Make sysfs_put() ignore NULL sd instead of oopsing. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:03 -07:00
Tejun Heo	2b611bb7ab	sysfs: allocate inode number using ida sysfs used simple incrementing allocator which is not guaranteed to be unique. This patch makes sysfs use ida to give each sd a unique and packed inode number. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:03 -07:00
Tejun Heo	fa7f912ad4	sysfs: move release_sysfs_dirent() to dir.c There is no reason this function should be inlined and soon to follow sysfs object reference simplification will make it heavier. Move it to dir.c. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:03 -07:00
Jan Kara	cfc94cdf8e	debugfs: add rename for debugfs files Implement debugfs_rename() to allow renaming files/directories in debugfs. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-11 16:09:00 -07:00
Steve French	7521a3c566	[CIFS] Fix oops in cifs_create when nfsd server exports cifs mount nfsd is passing null nameidata (probably the only one doing that) on call to create - cifs was missing one check for this. Note that running nfsd over a cifs mount requires specifying fsid on the nfs exports entry and requires mounting cifs with serverino mount option. Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-11 18:30:34 +00:00
Frank Filz	137d6acaa6	NFSv4: Make sure unlock is really an unlock when cancelling a lock I ran into a curious issue when a lock is being canceled. The cancellation results in a lock request to the vfs layer instead of an unlock request. This is particularly insidious when the process that owns the lock is exiting. In that case, sometimes the erroneous lock is applied AFTER the process has entered zombie state, preventing the lock from ever being released. Eventually other processes block on the lock causing a slow degredation of the system. In the 2.6.16 kernel this was investigated on, the problem is compounded by the fact that the cl_sem is held while blocking on the vfs lock, which results in most processes accessing the nfs file system in question hanging. In more detail, here is how the situation occurs: first _nfs4_do_setlk(): static int _nfs4_do_setlk(struct nfs4_state state, int cmd, struct file_lock fl, int reclaim) ... ret = nfs4_wait_for_completion_rpc_task(task); if (ret == 0) { ... } else data->cancelled = 1; then nfs4_lock_release(): static void nfs4_lock_release(void calldata) ... if (data->cancelled != 0) { struct rpc_task task; task = nfs4_do_unlck(&data->fl, data->ctx, data->lsp, data->arg.lock_seqid); The problem is the same file_lock that was passed in to _nfs4_do_setlk() gets passed to nfs4_do_unlck() from nfs4_lock_release(). So the type is still F_RDLCK or FWRLCK, not F_UNLCK. At some point, when cancelling the lock, the type needs to be changed to F_UNLCK. It seemed easiest to do that in nfs4_do_unlck(), but it could be done in nfs4_lock_release(). The concern I had with doing it there was if something still needed the original file_lock, though it turns out the original file_lock still needs to be modified by nfs4_do_unlck() because nfs4_do_unlck() uses the original file_lock to pass to the vfs layer, and a copy of the original file_lock for the RPC request. It seems like the simplest solution is to force all situations where nfs4_do_unlck() is being used to result in an unlock, so with that in mind, I made the following change: Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:49 -04:00
Frank van Maarseveen	c98451bdb2	NLM: fix source address of callback to client Use the destination address of the original NLM request as the source address in callbacks to the client. Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:49 -04:00
Trond Myklebust	6f2e64d3e1	NFSv4: Make the NFS state model work with the nosharedcache mount option Consider the case where the user has mounted the remote filesystem server:/foo on the two local directories /bar and /baz using the nosharedcache mount option. The files /bar/file and /baz/file are represented by different inodes in the local namespace, but refer to the same file /foo/file on the server. Consider the case where a process opens both /bar/file and /baz/file, then closes /bar/file: because the nfs4_state is not shared between /bar/file and /baz/file, the kernel will see that the nfs4_state for /bar/file is no longer referenced, so it will send off a CLOSE rpc call. Unless the open_owners differ, then that CLOSE call will invalidate the open state on /baz/file too. Conclusion: we cannot share open state owners between two different non-shared mount instances of the same filesystem. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:48 -04:00
Trond Myklebust	275a5d24bf	NFS: Error when mounting the same filesystem with different options Unless the user sets the NFS_MOUNT_NOSHAREDCACHE mount flag, we should return EBUSY if the filesystem is already mounted on a superblock that has set conflicting mount options. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:48 -04:00
Trond Myklebust	75180df2ed	NFS: Add the mount option "nosharecache" Prior to David Howell's mount changes in 2.6.18, users who mounted different directories which happened to be from the same filesystem on the server would get different super blocks, and hence could choose different mount options. As long as there were no hard linked files that crossed from one subtree to another, this was quite safe. Post the changes, if the two directories are on the same filesystem (have the same 'fsid'), they will share the same super block, and hence the same mount options. Add a flag to allow users to elect not to share the NFS super block with another mount point, even if the fsids are the same. This will allow users to set different mount options for the two different super blocks, as was previously possible. It is still up to the user to ensure that there are no cache coherency issues when doing this, however the default behaviour will be to share super blocks whenever two paths result in the same fsid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:48 -04:00
Chuck Lever	8007122520	NFS: Add support for mounting NFSv4 file systems with string options Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:48 -04:00
Chuck Lever	136d558ce7	NFS: Add final pieces to support in-kernel mount option parsing Hook in final components required for supporting in-kernel mount option parsing for NFSv2 and NFSv3 mounts. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:48 -04:00
Chuck Lever	0076d7b7ba	NFS: Introduce generic mount client API For NFSv2 and v3 mounts, the first step is to contact the server's MOUNTD and request the file handle for the root of the mounted share. Add a function to the NFS client that handles this operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:47 -04:00
Chuck Lever	bf0fd7680f	NFS: Add enums and match tables for mount option parsing This generic infrastructure works for both NFS and NFSv4 mounts. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:47 -04:00
Chuck Lever	013a8c1ab5	NFS: Improve debugging output in NFS in-kernel mount client Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:47 -04:00
Chuck Lever	19207231c9	NFS: Clean up in-kernel NFS mount Clean up white space and coding conventions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:46 -04:00
Chuck Lever	3ea97309e6	NFS: Remake nfsroot_mount as a permanent part of NFS client In preparation for supporting NFSv2 and NFSv3 mount option handling in the kernel NFS client, convert mount_clnt.c to be a permanent part of the NFS client, instead of built only when CONFIG_ROOT_NFS is enabled. In addition, we also replace the "struct sockaddr_in *" argument with something more generic, to help support IPv6 at some later point. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:46 -04:00
Chuck Lever	43780b87fa	SUNRPC: Add a convenient default for the hostname when calling rpc_create() A couple of callers just use a stringified IP address for the rpc client's hostname. Move the logic for constructing this into rpc_create(), so it can be shared. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:46 -04:00
Chuck Lever	cce63cd637	SUNRPC: Rename rpcb_getport_external routine In preparation for handling NFS mount option parsing in the kernel, rename rpcb_getport_external as rpcb_get_port_sync, and make it available always (instead of only when CONFIG_ROOT_NFS is enabled). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:46 -04:00
Chuck Lever	f0768ebd09	NFS: Introduce nfs4_validate_mount_options Refactor NFSv4 mount processing to break out mount data validation in the same way it's broken out in the NFSv2/v3 mount path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:45 -04:00
Chuck Lever	5df36e78da	NFS: Clean up nfs_validate_mount_data Move error handling code out of the main code path. The switch statement was also improperly indented, according to Documentation/CodingStyle. This prepares nfs_validate_mount_data for the addition of option string parsing. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:45 -04:00
Chuck Lever	fc50d58fd0	NFS: Clean-up: Refactor IP address sanity checks in NFS client NFS and NFSv4 mounts can now share server address sanity checking. And, it provides an easy mechanism for adding IPv6 address checking at some later point. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:44 -04:00
Chuck Lever	4d81cd1611	NFS: Clean-up: fix a compiler warning in fs/nfs/super.c /home/cel/linux/fs/nfs/super.c: In function 'nfs_pseudoflavour_to_name': /home/cel/linux/fs/nfs/super.c:270: warning: comparison between signed and unsigned Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:44 -04:00
Chuck Lever	0655960f76	NFS: Clean up error handling in nfs_get_sb The error return logic in nfs_get_sb now matches nfs4_get_sb, and is more maintainable. A subsequent patch will take advantage of this simplification. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:44 -04:00
Chuck Lever	29eb981a3b	NFS: Clean-up: Replace nfs_copy_user_string with strndup_user The new string utility function strndup_user can be used instead of nfs_copy_user_string, eliminating an unnecessary duplication of function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:44 -04:00
Chuck Lever	5680d48be8	NFS: Clean-up: Define macros for maximum host and export path name lengths Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:44 -04:00
Chuck Lever	9eaa67c6a5	NFS: Clean-up: use correct type when converting NFS blocks to local blocks inode->i_blocks is a blkcnt_t these days, which can be a u64 or unsigned long, depending on the setting of CONFIG_LSF. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:44 -04:00
Trond Myklebust	8bda4e4c98	NFSv4: Fix up stateid locking... We really don't need to grab both the state->so_owner and the inode->i_lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:43 -04:00
Trond Myklebust	1ac7e2fd35	NFSv4: Clean up the callers of nfs4_open_recover_helper() Rely on nfs4_try_open_cached() when appropriate. Also fix an RCU violation in _nfs4_do_open_reclaim() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:43 -04:00
Trond Myklebust	6ee4126890	NFSv4: Don't call OPEN if we already have an open stateid for a file If we already have a stateid with the correct open mode for a given file, then we can reuse that stateid instead of re-issuing an OPEN call without violating the close-to-open caching semantics. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:43 -04:00
Trond Myklebust	aac00a8d0a	NFSv4: Check for the existence of a delegation in nfs4_open_prepare() We should not be calling open() on an inode that has a delegation unless we're doing a reclaim. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:43 -04:00
Trond Myklebust	3e309914a1	NFSv4: Clean up _nfs4_proc_open() Use a flag instead of the 'data->rpc_status = -ENOMEM hack. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:42 -04:00
Trond Myklebust	1b370bc28f	NFSv4: Allow nfs4_opendata_to_nfs4_state to return errors. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:42 -04:00
Trond Myklebust	6f43ddccb3	NFSv4: Improve the debugging of bad sequence id errors... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:42 -04:00
Trond Myklebust	003707c722	NFSv4: Always use the delegation if we have one Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:41 -04:00
Trond Myklebust	0f9f95e0ad	NFSv4: Clean up confirmation of sequence ids... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:41 -04:00
Trond Myklebust	412c77cee6	NFSv4: Defer inode revalidation when setting up a delegation Currently we force a synchronous call to __nfs_revalidate_inode() in nfs_inode_set_delegation(). This not only ensures that we cannot call nfs_inode_set_delegation from an asynchronous context, but it also slows down any call to open(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:41 -04:00
Trond Myklebust	8383e4602c	NFSv4: Use RCU to protect delegations Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:41 -04:00
Trond Myklebust	13437e12fb	NFSv4: Support recalling delegations by stateid part 2 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:41 -04:00
Trond Myklebust	9016302784	NFSv4: Support recalling delegations by stateid There appear to be some rogue servers out there that issue multiple delegations with different stateids for the same file. Ensure that when we return delegations, we do so on a per-stateid basis rather than a per-file basis. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:40 -04:00
Trond Myklebust	2ced46c270	NFSv4: Fix up a bug in nfs4_open_recover() Don't clobber the delegation info... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:40 -04:00
Trond Myklebust	549d6ed5e8	NFSv4: set the delegation in nfs4_opendata_to_nfs4_state This ensures that nfs4_open_release() and nfs4_open_confirm_release() can now handle an eventual delegation that was returned with out open. As such, it fixes a delegation "leak" when the user breaks out of an open call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:40 -04:00
Trond Myklebust	1c816efa24	NFSv4: Fix a bug in __nfs4_find_state_byowner The test for state->state == 0 does not tell you that the stateid is in the process of being freed. It really tells you that the stateid is not yet initialised... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:40 -04:00
Trond Myklebust	1b45c46cf7	NFSv4: Fix atomic open for execute... Currently we do not check for the FMODE_EXEC flag as we should. For that particular case, we need to perform an ACCESS call to the server in order to check that the file is executable. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:40 -04:00
Trond Myklebust	9f958ab885	NFSv4: Reduce the chances of an open_owner identifier collision Currently we just use a 32-bit counter. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:39 -04:00
Trond Myklebust	88d9093997	NFSv4: nfs_increment_open_seqid should not return a value It is a void function... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:39 -04:00
Trond Myklebust	e6889620e8	NFSv4: Fix underestimate of NFSv4 lookup request size Also fix up the underestimate of fs_locations Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:39 -04:00
Trond Myklebust	2cebf82883	NFSv4: Fix the underestimate of NFSv4 open request size The maximum size depends on the filename size and a number of other elements which are currently not being counted. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:39 -04:00
Trond Myklebust	bd625ba80d	NFSv4: Fix the NFSv4 owner and owner_group size estimates Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:39 -04:00
Trond Myklebust	7af654f8d1	NFSv4: Don't reuse expired nfs4_state_owner structs That just confuses certain NFSv4 servers. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:38 -04:00
Trond Myklebust	27b3f949b7	NFSv4: Fix a credential reference leak in nfs4_get_state_owner() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:38 -04:00
Trond Myklebust	587142f85f	NFS: Replace NFS_I(inode)->req_lock with inode->i_lock There is no justification for keeping a special spinlock for the exclusive use of the NFS writeback code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:38 -04:00
Trond Myklebust	4e56e082dd	NFSv4: Clean up _nfs4_proc_lookup() vs _nfs4_proc_lookupfh() They differ only slightly in the arguments they take. Why have they not been merged? Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:38 -04:00
Trond Myklebust	1be27f3660	SUNRPC: Remove the tk_auth macro... We should almost always be deferencing the rpc_auth struct by means of the credential's cr_auth field instead of the rpc_clnt->cl_auth anyway. Fix up that historical mistake, and remove the macro that propagated it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:37 -04:00
Trond Myklebust	f61534dfd3	SUNRPC: Remove redundant calls to rpciod_up()/rpciod_down() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:30 -04:00
Trond Myklebust	90c5755ff5	SUNRPC: Kill rpc_clnt->cl_oneshot Replace it with explicit calls to rpc_shutdown_client() or rpc_destroy_client() (for the case of asynchronous calls). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:29 -04:00
Trond Myklebust	34f52e3591	SUNRPC: Convert rpc_clnt->cl_users to a kref Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:28 -04:00
Trond Myklebust	c6d00e639b	NFSv4: Convert struct nfs4_opendata to use struct kref Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:28 -04:00
Trond Myklebust	3bec63db55	NFS: Convert struct nfs_open_context to use a kref Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:27 -04:00
Trond Myklebust	edc05fc1c2	NFS: reduce latency by using conditional rescheduling in nfs_scan_list Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:27 -04:00
Trond Myklebust	dce34ce298	NFS: Prevent integer overflow in nfs_scan_list() Also ensure that nfs_inode ncommit and npages are large enough to represent all possible values for the number of pages. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:27 -04:00
Trond Myklebust	2aefa10431	NFS: Remove the redundant 'dirty' and 'commit' lists from nfs_inode Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:26 -04:00
Trond Myklebust	5c36968343	NFS cleanup: speed up nfs_scan_commit using radix tree tags Add a tag for requests that are waiting for a COMMIT Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:26 -04:00
Trond Myklebust	9fd367f0f3	NFS cleanup: Rename NFS_PAGE_TAG_WRITEBACK to NFS_PAGE_TAG_LOCKED Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:26 -04:00
Trond Myklebust	c03b402461	NFS: Convert struct nfs_page to use krefs Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:26 -04:00
Trond Myklebust	a50f7951a3	NFS: Fix an Oops in the nfs_access_cache_shrinker() The nfs_access_cache_shrinker may race with nfs_access_zap_cache(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:25 -04:00
Trond Myklebust	e2f032e9ef	NFS: nfs3_proc_create() should use nfs_post_op_update_inode() Also get rid of a redundant call to nfs_setattr_update_inode(). The call to nfs3_proc_setattr() already takes care of that. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:25 -04:00
Jeff Layton	aa53ed541a	NFS4: on a O_EXCL OPEN make sure SETATTR sets the fields holding the verifier The Linux NFS4 client simply skips over the bitmask in an O_EXCL open call and so it doesn't bother to reset any fields that may be holding the verifier. This patch has us save the first two words of the bitmask (which is all the current client has #defines for). The client then later checks this bitmask and turns on the appropriate flags in the sattr->ia_verify field for the following SETATTR call. This patch only currently checks to see if the server used the atime and mtime slots for the verifier (which is what the Linux server uses for this). I'm not sure of what other fields the server could reasonably use, but adding checks for others should be trivial. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:25 -04:00
Trond Myklebust	fc6ae3cf48	NFS: Re-enable forced umounts They disappeared some time around 2.6.18. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:25 -04:00
Jeff Layton	83d93f2229	NFS: Use GFP_HIGHUSER for page allocation in nfs_symlink() nfs_symlink() allocates a GFP_KERNEL page for the pagecache. Most pagecache pages are allocated using GFP_HIGHUSER, and there's no reason not to do that in nfs_symlink() as well. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2007-07-10 23:40:25 -04:00
Trond Myklebust	a0356862bc	NFS: Fix nfs_reval_fsid() We don't need to revalidate the fsid on the root directory. It suffices to revalidate it on the current directory. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:24 -04:00
Trond Myklebust	b39e625b6e	NFSv4: Clean up nfs4_call_async() Use rpc_run_task() instead of doing it ourselves. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:24 -04:00
Trond Myklebust	4a35bd41af	NFSv4: Ensure that nfs4_do_close() doesn't race with umount nfs4_do_close() does not currently have any way to ensure that the user won't attempt to unmount the partition while the asynchronous RPC call is completing. This again may cause Oopses in nfs_update_inode(). Add a vfsmount argument to nfs4_close_state to ensure that the partition remains mounted while we're closing the file. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:24 -04:00
Trond Myklebust	ad389da79f	NFSv4: Ensure asynchronous open() calls always pin the mountpoint A number of race conditions may currently ensue if the user presses ^C and then unmounts the partition while an asynchronous open() is in progress. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:24 -04:00
Trond Myklebust	539cd03a57	NFSv4: Cleanup: pass the nfs_open_context to open recovery code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:24 -04:00
Trond Myklebust	88be9f990f	NFS: Replace vfsmount and dentry in nfs_open_context with struct path Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:23 -04:00
Trond Myklebust	de05a0cc2a	NFS: Minor read optimisation... Since PG_uptodate may now end up getting set during the call to nfs_wb_page(), we can avoid putting a read request on the wire in those situations. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:23 -04:00
Trond Myklebust	44dd151d5c	NFS: Don't mark a written page as uptodate until it is on disk The write may fail, so we should not mark the page as uptodate until we are certain that the data has been accepted and written to disk by the server. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:23 -04:00
Trond Myklebust	d9df8d6b38	NFS: Don't fail an O_DIRECT read/write if get_user_pages() returns pages There is no need to fail the entire O_DIRECT read/write just because get_user_pages() returned fewer pages than we requested. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:23 -04:00
Chuck Lever	070ea60214	NFS: Clean ups in fs/nfs/direct.c Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:23 -04:00
Pavel Emelianov	bcf67e1625	Make common helpers for seq_files that work with list_heads Many places in kernel use seq_file API to iterate over a regular list_head. The code for such iteration is identical in all the places, so it's worth introducing a common helpers. This makes code about 300 lines smaller: The first version of this patch made the helper functions static inline in the seq_file.h header. This patch moves them to the fs/seq_file.c as Andrew proposed. The vmlinux .text section sizes are as follows: 2.6.22-rc1-mm1: 0x001794d5 with the previous version: 0x00179505 with this patch: 0x00179135 The config file used was make allnoconfig with the "y" inclusion of all the possible options to make the files modified by the patch compile plus drivers I have on the test node. This patch: Many places in kernel use seq_file API to iterate over a regular list_head. The code for such iteration is identical in all the places, so it's worth introducing a common helpers. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-10 17:51:13 -07:00
Eric Sandeen	54c57dc3b6	[PATCH] ocfs2: zero_user_page conversion Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:10 -07:00
Mark Fasheh	b25801038d	ocfs2: Support xfs style space reservation ioctls We re-use the RESVSP/UNRESVSP ioctls from xfs which allow the user to allocate and deallocate regions to a file without zeroing data or changing i_size. Though renamed, the structure passed in from user is identical to struct xfs_flock64. The three fields that are actually used right now are l_whence, l_start and l_len. This should get ocfs2 immediate compatibility with userspace software using the pre-existing xfs ioctls. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:09 -07:00
Mark Fasheh	063c4561f5	ocfs2: support for removing file regions Provide an internal interface for the removal of arbitrary file regions. ocfs2_remove_inode_range() takes a byte range within a file and will remove existing extents within that range. Partial clusters will be zeroed so that any read from within the region will return zeros. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:08 -07:00
Mark Fasheh	35edec1d52	ocfs2: update truncate handling of partial clusters The partial cluster zeroing code used during truncate usually assumes that the rightmost byte in the range to be zeroed lies on a cluster boundary. This makes sense for truncate, but punching holes might require zeroing on non-aligned rightmost boundaries. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:07 -07:00
Mark Fasheh	d0c7d7082e	ocfs2: btree support for removal of arbirtrary extents Add code to the btree paths to support the removal of arbitrary regions within an existing extent. With proper higher level support this can be used to "punch holes" in a file. Truncate (a special case of hole punching) could also be converted to use these methods. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:05 -07:00
Mark Fasheh	2ae99a6037	ocfs2: Support creation of unwritten extents This can now be trivially supported with re-use of our existing extend code. ocfs2_allocate_unwritten_extents() takes a start offset and a byte length and iterates over the inode, adding extents (marked as unwritten) until len is reached. Existing extents are skipped over. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:04 -07:00
Mark Fasheh	b27b7cbcf1	ocfs2: support writing of unwritten extents Update the write code to detect when the user is asking to write to an unwritten extent. Like writing to a hole, we must zero the region between the write and the cluster boundaries. Most of the existing cluster zeroing logic can be re-used with some additional checks for the unwritten flag on extent records. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:03 -07:00
Mark Fasheh	0d172baa55	ocfs2: small cleanup of ocfs2_write_begin_nolock() We can easily seperate out the write descriptor setup and manipulation into helper functions. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:01 -07:00
Mark Fasheh	328d5752e1	ocfs2: btree changes for unwritten extents Writes to a region marked as unwritten might result in a record split or merge. We can support splits by making minor changes to the existing insert code. Merges require left rotations which mostly re-use right rotation support functions. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:00 -07:00
Mark Fasheh	c3afcbb344	ocfs2: abstract btree growing calls The top level calls and logic for growing a tree can easily be abstracted out of ocfs2_insert_extent() into a seperate function - ocfs2_grow_tree(). This allows future code to easily grow btrees when needed. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:58 -07:00
Mark Fasheh	1f6697d072	ocfs2: use all extent block suballocators Now that we have a method to deallocate blocks from them, each node should allocate extent blocks from their local suballocator file. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:56 -07:00
Mark Fasheh	59a5e416d1	ocfs2: plug truncate into cached dealloc routines Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:55 -07:00
Mark Fasheh	2b604351bc	ocfs2: simplify deallocation locking Deallocation of suballocator blocks, most notably extent blocks, might involve multiple suballocator inodes. The locking for this can get extremely complicated, especially when the suballocator inodes to delete from aren't known until deep within an unrelated codepath. Implement a simple scheme for recording the blocks to be unlinked so that the actual deallocation can be done in a context which won't deadlock. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:54 -07:00
Mark Fasheh	bce997682f	ocfs2: harden buffer check during mapping of page blocks We don't want to submit buffer_new blocks for read i/o. This actually won't happen right now because those requests during an allocating write are all nicely aligned. It's probably a good idea to provide an explicit check though. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:52 -07:00
Mark Fasheh	7307de8051	ocfs2: shared writeable mmap Implement cluster consistent shared writeable mappings using the ->page_mkwrite() callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:51 -07:00
Mark Fasheh	607d44aa3f	ocfs2: factor out write aops into nolock variants ocfs2_mkwrite() will want this so that it can add some mmap specific checks before asking for a write. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:49 -07:00
Mark Fasheh	3a307ffc27	ocfs2: rework ocfs2_buffered_write_cluster() Use some ideas from the new-aops patch series and turn ocfs2_buffered_write_cluster() into a 2 stage operation with the caller copying data in between. The code now understands multiple cluster writes as a result of having to deal with a full page write for greater than 4k pages. This sets us up to easily call into the write path during ->page_mkwrite(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:46 -07:00
Mark Fasheh	2e89b2e48e	ocfs2: take ip_alloc_sem during entire truncate Use of the alloc sem during truncate was too narrow - we want to protect the i_size change and page truncation against mmap now. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:57 -07:00
Sunil Mushran	baf4661a82	ocfs2: Add "preferred slot" mount option ocfs2 will attempt to assign the node the slot# provided in the mount option. Failure to assign the preferred slot is not an error. This small feature can be useful for automated testing. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:54 -07:00
Shani Moideen	5fb0f7f010	[KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c Signed-off-by: Shani Moideen <shani.moideen@wipro.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:52 -07:00
Christoph Hellwig	800deef3f6	[PATCH] ocfs2: use list_for_each_entry where benefical Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:49 -07:00
Joel Becker	e6df3a663a	ocfs2: Wake up a starting region if it gets killed in the background. Tell o2cb_region_dev_write() to wake up if rmdir(2) happens on the heartbeat region while it is starting up. Then o2hb_region_dev_write() can check to see if it is alive and act accordingly. This prevents a hang (not being woken) and a crash (if it's woken by a signal). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:46 -07:00
Joel Becker	16c6a4f24d	ocfs2: live heartbeat depends on the local node configuration Removing the local node configuration out from underneath a running heartbeat is "bad". Provide an API in the ocfs2 nodemanager to request a configfs dependancy on the local node, then use it in heartbeat. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:43 -07:00
Joel Becker	14829422be	ocfs2: Depend on configfs heartbeat items. ocfs2 mounts require a heartbeat region. Use the new configfs_depend_item() facility to actually depend on them so they can't go away from under us. First, teach cluster/nodemanager.c to depend an item on the o2cb subsystem. Then teach o2hb_register_callbacks to take a UUID and depend on the appropriate region. Finally, teach all users of o2hb to pass a UUID or NULL if they don't require a pin. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:40 -07:00
Joel Becker	631d1febab	configfs: config item dependancies. Sometimes other drivers depend on particular configfs items. For example, ocfs2 mounts depend on a heartbeat region item. If that region item is removed with rmdir(2), the ocfs2 mount must BUG or go readonly. Not happy. This provides two additional API calls: configfs_depend_item() and configfs_undepend_item(). A client driver can call configfs_depend_item() on an existing item to tell configfs that it is depended on. configfs will then return -EBUSY from rmdir(2) for that item. When the item is no longer depended on, the client driver calls configfs_undepend_item() on it. These API cannot be called underneath any configfs callbacks, as they will conflict. They can block and allocate. A client driver probably shouldn't calling them of its own gumption. Rather it should be providing an API that external subsystems call. How does this work? Imagine the ocfs2 mount process. When it mounts, it asks for a heart region item. This is done via a call into the heartbeat code. Inside the heartbeat code, the region item is looked up. Here, the heartbeat code calls configfs_depend_item(). If it succeeds, then heartbeat knows the region is safe to give to ocfs2. If it fails, it was being torn down anyway, and heartbeat can gracefully pass up an error. [ Fixed some bad whitespace in configfs.txt. --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:18:59 -07:00
Joel Becker	299894cc90	configfs: accessing item hierarchy during rmdir(2) Add a notification callback, ops->disconnect_notify(). It has the same prototype as ->drop_item(), but it will be called just before the item linkage is broken. This way, configfs users who want to do work while the object is still in the heirarchy have a chance. Client drivers will still need to config_item_put() in their ->drop_item(), if they implement it. They need do nothing in ->disconnect_notify(). They don't have to provide it if they don't care. But someone who wants to be notified before ci_parent is set to NULL can now be notified. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:11:01 -07:00
Johannes Berg	6d748924b7	[PATCH] configsfs buffer: use mutex Seems copied from sysfs, but I don't see a reason here nor there to use a semaphore instead of a mutex. Convert. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:10:58 -07:00
Joel Becker	e6bd07aee7	configfs: Convert subsystem semaphore to mutex Convert the su_sem member of struct configfs_subsystem to a struct mutex, as that's what it is. Also convert all the users and update Documentation/configfs.txt and Documentation/configfs_example.c accordingly. [ Conflict in fs/dlm/config.c with commit `3168b0780d` manually resolved. --Mark ] Inspired-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:10:56 -07:00
Satyam Sharma	3fe6c5ce11	[PATCH] configfs+dlm: Rename config_group_find_obj and state semantics clearly Configfs being based upon sysfs code, config_group_find_obj() is probably so named because of the similar kset_find_obj() in sysfs. However, "kobject"s in sysfs become "config_item"s in configfs, so let's call it config_group_find_item() instead, for sake of uniformity, and make corresponding change in the users of this function. BTW a crucial difference between kset_find_obj and config_group_find_item is in locking expectations. kset_find_obj does its locking by itself, but config_group_find_item expects the caller to do the locking. The reason for this: kset's have their own locks, config_group's don't but instead rely on the subsystem mutex. And, subsystem needn't necessarily be around when config_group_find_item() is called. So let's state these locking semantics explicitly, and rectify the comment, otherwise bugs could continue to occur in future, as they did in the past (refer commit d82b8191e238 in gfs2-2.6-fixes.git). [ I also took the opportunity to fix some bad whitespace and double-empty lines. --Joel ] [ Conflict in fs/dlm/config.c with commit `3168b0780d` manually resolved. --Mark ] Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:02:31 -07:00
Satyam Sharma	9b1d9aa4e9	[PATCH] configfs+dlm: Separate out __CONFIGFS_ATTR into configfs.h fs/dlm/config.c contains a useful generic macro called __CONFIGFS_ATTR that is similar to sysfs' __ATTR macro that makes defining attributes easy for any user of configfs. Separate it out into configfs.h so that other users (forthcoming in dynamic netconsole patchset) can use it too. Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 16:52:27 -07:00
Satyam Sharma	4c62b53454	configfs: misc cleanups 1. item.c:config_item_cleanup() is a private function (only called by config_item_release() in same file). However, it is spuriously exported in include/linux/configfs.h, so remove that export and make it static in item.c. Also, it is no longer exported / interface function, so no need to give comment for this function (the comment was stating obvious thing, anyway). 2. Kernel-doc comment format does not allow empty line between end of comment and start of function (declaration line). There were several such spurious empty lines in item.c, so fix them. fs/configfs/item.c \| 15 +++------------ include/linux/configfs.h \| 1 - 2 files changed, 3 insertions(+), 13 deletions(-) Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 16:52:25 -07:00
Joel Becker	b23cdde4c6	configfs: consistent attribute size The attribute store/show code currently limits attributes at PAGE_SIZE. This code comes from sysfs, where it still works that way. However, PAGE_SIZE is not constant. A 16k attribute string works on ia64 but not on x86. Really a subsystem shouldn't allow different attribute sizes based on platform. As such, limit all simple attributes to 4k. This works on all platforms, and is consistent with all current code. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 16:52:22 -07:00
Linus Torvalds	9f9d763216	Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6 * 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [S390] vmlogrdr function annotation. [S390] s390: rename CPU_IDLE to S390_CPU_IDLE [S390] cio: Remove prototype for non-existing function cmf_reset(). [S390] zcrypt: fix request timeout handling [S390] system call optimization. [S390] dasd: Avoid compile warnings on !CONFIG_DASD_PROFILE [S390] Remove volatile from atomic_t [S390] Program check in diag 210 under 31 bit [S390] Bogomips calculation for 64 bit. [S390] smp: Merge smp_count_cpus() and smp_get_save_areas(). [S390] zcore: Fix __user annotation. [S390] fixed cdl-format detection. [S390] sclp: Test facility list before executing a service call. [S390] sclp: introduce some new interfaces. [S390] Fixed comment typo. [S390] vmcp cleanup	2007-07-10 14:46:09 -07:00
Linus Torvalds	1b21f458dd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (57 commits) [GFS2] Accept old format NFS filehandles [GFS2] Small fixes to logging code [DLM] dump more lock values [GFS2] Remove i_mode passing from NFS File Handle [GFS2] Obtaining no_formal_ino from directory entry [GFS2] git-gfs2-nmw-build-fix [GFS2] System won't suspend with GFS2 file system mounted [GFS2] remounting w/o acl option leaves acls enabled [GFS2] inode size inconsistency [DLM] Telnet to port 21064 can stop all lockspaces [GFS2] Fix gfs2_block_truncate_page err return [GFS2] Addendum to the journaled file/unmount patch [GFS2] Simplify multiple glock aquisition [GFS2] assertion failure after writing to journaled file, umount [GFS2] Use zero_user_page() in stuffed_readpage() [GFS2] Remove bogus '\0' in rgrp.c [GFS2] Journaled file write/unstuff bug [DLM] don't require FS flag on all nodes [GFS2] Fix deallocation issues [GFS2] return conflicts for GETLK ...	2007-07-10 13:56:13 -07:00
Linus Torvalds	01370f0603	Merge branch 'splice-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-block * 'splice-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-block: pipe: add documentation and comments pipe: change the ->pin() operation to ->confirm() Remove remnants of sendfile() xip sendfile removal splice: completely document external interface with kerneldoc sendfile: remove bad_sendfile() from bad_file_ops shmem: convert to using splice instead of sendfile() relay: use splice_to_pipe() instead of open-coding the pipe loop pipe: allow passing around of ops private pointer splice: divorce the splice structure/function definitions from the pipe header splice: relay support sendfile: convert nfsd to splice_direct_to_actor() sendfile: convert nfs to using splice_read() loop: convert to using splice_direct_to_actor() instead of sendfile() splice: add void cookie to the actor data sendfile: kill generic_file_sendfile() sendfile: remove .sendfile from filesystems that use generic_file_sendfile() sys_sendfile: switch to using ->splice_read, if available vmsplice: add vmsplice-to-user support splice: abstract out actor data	2007-07-10 13:51:06 -07:00
Steven Whitehouse	3ebf44902f	[GFS2] Accept old format NFS filehandles On Tue, 2007-07-10 at 10:06 +0100, Christoph Hellwig wrote: > > -#define GFS2_LARGE_FH_SIZE 10 > > - > > -struct gfs2_fh_obj { > > - struct gfs2_inum_host this; > > - u32 imode; > > -}; > > +#define GFS2_LARGE_FH_SIZE 8 > > Because gfs2_decode_fh only accepts file handles with GFS2_LARGE_FH_SIZE > or GFS2_LARGE_FH_SIZE you don't accept filehandles sent out by and older > gfs version anymore. Stale filehandles because of a new kernel version > are a big no-no, so please add back code to handle the old filehandles > on the decode side. > This should fix that problem I think since its only relating to end of the fh we can just ignore that field in order to accept the older format. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Wendy Cheng <wcheng@redhat.com>	2007-07-10 12:28:27 +01:00
Stefan Haberland	bf1a95a225	[S390] fixed cdl-format detection. CDL formated DASDs are now detected correctly even if no VOL1 label is on the disk. This prevents possible loss of data. Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-07-10 11:24:44 +02:00
Jens Axboe	0845718daf	pipe: add documentation and comments As per Andrew Mortons request, here's a set of documentation for the generic pipe_buf_operations hooks, the pipe, and pipe_buffer structures. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:16 +02:00
Jens Axboe	cac36bb06e	pipe: change the ->pin() operation to ->confirm() The name 'pin' was badly chosen, it doesn't pin a pipe buffer in the most commonly used sense in the kernel. So change the name to 'confirm', after debating this issue with Hugh Dickins a bit. A good return from ->confirm() means that the buffer is really there, and that the contents are good. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:15 +02:00
Jens Axboe	d96e6e7164	Remove remnants of sendfile() There are now zero users of .sendfile() in the kernel, so kill it from the file_operations structure and in do_sendfile(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:15 +02:00
Carsten Otte	d054fe3d10	xip sendfile removal This patch removes xip_file_sendfile, the sendfile implementation for xip without replacement. Those customers that use xip on s390 are not using sendfile() as far as we know, and so far s390 is the only platform this could potentially be used on so far. Having sendfile is not a popular feature for execute in place file systems, however we have a working implementation of splice_read() based on fs/splice.c if anyone asks for it. At this point in time, it does not seem preferable to merge splice_read() for xip because it causes extra maintenence effort due to code duplication and it requires struct page behind the xip memory segment. We'd like to get rid of that in favor of supporting flash based embedded platforms (Monta Vista work) soon. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:15 +02:00
Jens Axboe	932cc6d4f7	splice: completely document external interface with kerneldoc Also add fs/splice.c as a kerneldoc target with a smaller blurb that should be expanded to better explain the overview of splice. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:15 +02:00
Jens Axboe	d6f517568f	sendfile: remove bad_sendfile() from bad_file_ops do_sendfile() prefers splice over sendfile, so it should not trigger (directly, at least). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:15 +02:00
Jens Axboe	497f9625c2	pipe: allow passing around of ops private pointer relay needs this for proper consumption handling, and the network receive support needs it as well to lookup the sk_buff on pipe release. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:14 +02:00
Jens Axboe	d6b29d7cee	splice: divorce the splice structure/function definitions from the pipe header We need to move even more stuff into the header so that folks can use the splice_to_pipe() implementation instead of open-coding a lot of pipe knowledge (see relay implementation), so move to our own header file finally. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:14 +02:00
Jens Axboe	cf8208d0ea	sendfile: convert nfsd to splice_direct_to_actor() Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:14 +02:00
Jens Axboe	f0930fffa9	sendfile: convert nfs to using splice_read() Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:14 +02:00
Jens Axboe	5ffc4ef45b	sendfile: remove .sendfile from filesystems that use generic_file_sendfile() They can use generic_file_splice_read() instead. Since sys_sendfile() now prefers that, there should be no change in behaviour. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:13 +02:00
Jens Axboe	534f2aaa6a	sys_sendfile: switch to using ->splice_read, if available This patch makes sendfile prefer to use ->splice_read(), if it's available in the file_operations structure. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:12 +02:00
Jens Axboe	6a14b90bb6	vmsplice: add vmsplice-to-user support A bit of a cheat, it actually just copies the data to userspace. But this makes the interface nice and symmetric and enables people to build on splice, with room for future improvement in performance. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:12 +02:00
Jens Axboe	c66ab6fa70	splice: abstract out actor data For direct splicing (or private splicing), the output may not be a file. So abstract out the handling into a specified actor function and put the data in the splice_desc structure earlier, so we can build on top of that. This is the first step in better splice handling for drivers, and also for implementing vmsplice _to_ user memory. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:12 +02:00
Adrian Bunk	72d3a38ee0	unexport bio_{,un}map_user bio_{,un}map_user no longer have any modular users. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:03:34 +02:00
Steve French	fb8c4b14d9	[CIFS] whitespace cleanup More than halfway there Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-10 01:16:18 +00:00
Linus Torvalds	71d441ddb5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: JFS: Update print_hex_dump() syntax JFS: use print_hex_dump() rather than private dump_mem() function JFS: Whitespace cleanup and remove some dead code	2007-07-09 13:09:16 -07:00
Ingo Molnar	43ae34cb4c	sched: scheduler debugging, core scheduler debugging core: implement /proc/sched_debug and /proc/<PID>/sched files for scheduler debugging. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-09 18:52:00 +02:00
Balbir Singh	172ba844a8	sched: update delay-accounting to use CFS's precise stats update delay-accounting to use CFS's precise stats. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-09 18:52:00 +02:00
Ingo Molnar	b27f03d4bd	sched: make use of precise accounting for /proc task stats make use of CFS's precise accounting to drive /proc/<pid>/stat statistics. this code was co-authored by: Balbir Singh <balbir@linux.vnet.ibm.com> Dmitry Adamushko <dmitry.adamushko@gmail.com> Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>	2007-07-09 18:51:59 +02:00
Ingo Molnar	62480d13d5	sched: remove the SleepAVG field remove the SleepAVG field from /proc/<pid>/status, as with the removal of the sleep-average code this value no longer makes sense. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-09 18:51:59 +02:00
Steven Whitehouse	a0a24741ca	[GFS2] Small fixes to logging code This reverts part of an earlier patch which tried to reclaim gfs2_bufdata structures too early and resulted in a "use after free" case (this bit from me). Also a change to not write out log headers unless we really need to (in the case of flushing nothing we don't need a header) from Bob. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2007-07-09 15:43:07 +01:00
Steve French	b609f06ac4	[CIFS] Fix packet signatures for NTLMv2 case Signed-off-by: Yehuda Sadeh Weinraub <Yehuda.Sadeh@expand.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-09 07:55:14 +00:00
David Teigland	ac90a25525	[DLM] dump more lock values Add two more output fields (lkb_flags and rsb nodeid) to the new debugfs file that dumps one lock per line. Also, dump all locks instead of just mastered locks. Accordingly, use a suffix of _locks instead of _master. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:24:13 +01:00
Wendy Cheng	35dcc52e3a	[GFS2] Remove i_mode passing from NFS File Handle GFS2 has been passing i_mode within NFS File Handle. Other than the wrong assumption that there is always room for this extra 16 bit value, the current gfs2_get_dentry doesn't really need the i_mode to work correctly. Note that GFS2 NFS code does go thru the same lookup code path as direct file access route (where the mode is obtained from name lookup) but gfs2_get_dentry() is coded for different purpose. It is not used during lookup time. It is part of the file access procedure call. When the call is invoked, if on-disk inode is not in-memory, it has to be read-in. This makes i_mode passing a useless overhead. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:24:11 +01:00
Wendy Cheng	bb9bcf0616	[GFS2] Obtaining no_formal_ino from directory entry GFS2 lookup code doesn't ask for inode shared glock. This implies during in-memory inode creation for existing file, GFS2 will not disk-read in the inode contents. This leaves no_formal_ino un-initialized during lookup time. The un-initialized no_formal_ino is subsequently encoded into file handle. Clients will get ESTALE error whenever it tries to access these files. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:24:08 +01:00
akpm@linux-foundation.org	f4fadb23ca	[GFS2] git-gfs2-nmw-build-fix Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:24:06 +01:00
Abhijith Das	b365762924	[GFS2] System won't suspend with GFS2 file system mounted The kernel threads in gfs2, namely gfs2_scand, gfs2_logd, gfs2_quotad, gfs2_glockd, gfs2_recoverd weren't doing anything when the suspend mechanism was trying to freeze them. I put in calls to refrigerator() in the loops for all the daemons and suspend works as expected. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:24:04 +01:00
Bob Peterson	569a7b6c2e	[GFS2] remounting w/o acl option leaves acls enabled This patch is for bugzilla bug #245663. This crosswrites a fix from gfs1 (bz #210369) so that the mount options are reset properly upon remount. This was tested on system trin-10. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:24:01 +01:00
Wendy Cheng	090ffaa55d	[GFS2] inode size inconsistency This should have been part of the NFS patch #1 but somehow I missed it when packaging the patches. It is not a critical issue as the others (I hope). RHEL 5.1 31.el5 kernel runs fine without this change. Our truncate code is chopped into two parts, one for vfs inode changes (in vmtruncate()) and one of gfs inode (in gfs2_truncatei()). These two operatons are, unfortunately, not atomic. So it could happens that vmtruncate() succeeds (inode->i_size is changed) but gfs2_truncatei fails (say kernel temporarily out of memory). This would leave gfs inode i_di.di_size out of sync with vfs inode i_size. It will later confuse gfs2_commit_write() if a write is issued. Last time I checked, it will cause file corruption. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:59 +01:00
Patrick Caulfield	97d848365e	[DLM] Telnet to port 21064 can stop all lockspaces This patch fixes Red Hat bz#245892 Opening a tcp connection from a cluster member to another cluster member targeting the dlm port it is enough to stop every dlm operation in the cluster. This means that GFS and rgmanager will hang. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:57 +01:00
S. Wendy Cheng	1875f2f31b	[GFS2] Fix gfs2_block_truncate_page err return Code segment inside gfs2_block_truncate_page() doesn't set the return code correctly. This causes NFSD erroneously returns EIO back to client with setattr procedure call (truncate error). Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:54 +01:00
Robert Peterson	773ed1a044	[GFS2] Addendum to the journaled file/unmount patch This patch is an addendum to the previous journaled file/unmount patch. It fixes a problem discovered during testing. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:52 +01:00
Steven Whitehouse	eaf5bd3cac	[GFS2] Simplify multiple glock aquisition There is a bug in the code which acquires multiple glocks where if the initial out-of-order attempt fails part way though we can land up trying to acquire the wrong number of glocks. This is part of the fix for red hat bz #239737. The other part of the bz doesn't apply to upstream kernels since it was fixed by: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d3717bdf8f08a0e1039158c8bab2c24d20f492b6 Since the out-of-order code doesn't appear to add anything to the performance of GFS2, this patch just removed it rather than trying to fix it. It should be much easier to see whats going on here now. In addition, we don't allocate any memory unless we are using a lot of glocks (which is a relatively uncommon case). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:50 +01:00
Robert Peterson	2332c4435b	[GFS2] assertion failure after writing to journaled file, umount This patch passes all my nasty tests that were causing the code to fail under one circumstance or another. Here is a complete summary of all changes from today's git tree, in order of appearance: 1. There are now separate variables for metadata buffer accounting. 2. Variable sd_log_num_hdrs is no longer needed, since the header accounting is taken care of by the reserve/refund sequence. 3. Fixed a tiny grammatical problem in a comment. 4. Added a new function "calc_reserved" to calculate the reserved log space. This isn't entirely necessary, but it has two benefits: First, it simplifies the gfs2_log_refund function greatly. Second, it allows for easier debugging because I could sprinkle the code with calls to this function to make sure the accounting is proper (by adding asserts and printks) at strategic point of the code. 5. In log_pull_tail there apparently was a kludge to fix up the accounting based on a "pull" parameter. The buffer accounting is now done properly, so the kludge was removed. 6. File sync operations were making a call to gfs2_log_flush that writes another journal header. Since that header was unplanned for (reserved) by the reserve/refund sequence, the free space had to be decremented so that when log_pull_tail gets called, the free space is be adjusted properly. (Did I hear you call that a kludge? well, maybe, but a lot more justifiable than the one I removed). 7. In the gfs2_log_shutdown code, it optionally syncs the log by specifying the PULL parameter to log_write_header. I'm not sure this is necessary anymore. It just seems to me there could be cases where shutdown is called while there are outstanding log buffers. 8. In the (data)buf_lo_before_commit functions, I changed some offset values from being calculated on the fly to being constants. That simplified some code and we might as well let the compiler do the calculation once rather than redoing those cycles at run time. 9. This version has my rewritten databuf_lo_add function. This version is much more like its predecessor, buf_lo_add, which makes it easier to understand. Again, this might not be necessary, but it seems as if this one works as well as the previous one, maybe even better, so I decided to leave it in. 10. In databuf_lo_before_commit, a previous data corruption problem was caused by going off the end of the buffer. The proper solution is to have the proper limit in place, rather than stopping earlier. (Thus my previous attempt to fix it is wrong). If you don't wrap the buffer, you're stopping too early and that causes more log buffer accounting problems. 11. In lops.h there are two new (previously mentioned) constants for figuring out the data offset for the journal buffers. 12. There are also two new functions, buf_limit and databuf_limit to calculate how many entries will fit in the buffer. 13. In function gfs2_meta_wipe, it needs to distinguish between pinned metadata buffers and journaled data buffers for proper journal buffer accounting. It can't use the JDATA gfs2_inode flag because it's sometimes passed the "real" inode and sometimes the "metadata inode" and the inode flags will be random bits in a metadata gfs2_inode. It needs to base its decision on which was passed in. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:47 +01:00
Steven Whitehouse	2840501ac8	[GFS2] Use zero_user_page() in stuffed_readpage() As suggested by Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Robert P. J. Day <rpjday@mindspring.com>	2007-07-09 08:23:45 +01:00
Steven Whitehouse	c4201214cb	[GFS2] Remove bogus '\0' in rgrp.c Not sure how it slipped in, but we don't want it anyway. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:43 +01:00
Robert Peterson	8fb68595d5	[GFS2] Journaled file write/unstuff bug This patch is for bugzilla bug 283162, which uncovered a number of bugs pertaining to writing to files that have the journaled bit on. These bugs happen most often when writing to the meta_fs because the files are always journaled. So operations like gfs2_grow were particularly vulnerable, although many of the problems could be recreated with normal files after setting the journaled bit on. The problems fixed are: -GFS2 wasn't ever writing unstuffed journaled data blocks to their in-place location on disk. Now it does. -If you unmounted too quickly after doing IO to a journaled file, GFS2 was crashing because you would discard a buffer whose bufdata was still on the active items list. GFS2 now deals with this gracefully. -GFS2 was losing track of the bufdata for journaled data blocks, and it wasn't getting freed, causing an error when you tried to unmount the module. GFS2 now frees all the bufdata structures. -There was a memory corruption occurring because GFS2 wrote twice as many log entries for journaled buffers. -It was occasionally trying to write journal headers in buffers that weren't currently mapped. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:40 +01:00
David Teigland	fad59c1390	[DLM] don't require FS flag on all nodes Mask off the recently added DLM_LSFL_FS flag when setting the exflags. This way all the nodes in the lockspace aren't required to have the FS flag set, since we later check that exflags matches among all nodes. Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:38 +01:00
Abhijith Das	d93cfa9884	[GFS2] Fix deallocation issues There were two issues during deallocation of unlinked inodes. The first was relating to the use of a "try" lock which in the case of the inode lock wasn't trying hard enough to deallocate in all circumstances (now changed to a normal glock) and in the case of the iopen lock didn't wait for the demotion of the shared lock before attempting to get the exclusive lock, and thereby sometimes (timing dependent) not completing the deallocation when it should have done. The second issue related to the lack of a way to invalidate dcache entries on remote nodes (now fixed by this patch) which meant that unlinks were taking a long time to return disk space to the fs. By adding some code to invalidate the dcache entries across the cluster for unlinked inodes, that is now fixed. This patch was written jointly by Abhijith Das and Steven Whitehouse. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:36 +01:00
David Teigland	a7a2ff8a95	[GFS2] return conflicts for GETLK We weren't returning the correct result when GETLK found a conflict, which is indicated by userspace passing back a 1. Signed-off-by: Abhijith Das <adas redhat com> Signed-off-by: David Teigland <teigland redhat com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:33 +01:00
David Teigland	d88101d4d8	[GFS2] set plock owner in GETLK info Set the owner field in the plock info sent to userspace for GETLK. Without this, gfs_controld won't correctly see when the GETLK from a process matches one of the process's existing locks. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:31 +01:00
akpm@linux-foundation.org	037bcbb756	[GFS2] gfs2_lookupi() uninitialised var fix fs/gfs2/inode.c: In function 'gfs2_lookupi': fs/gfs2/inode.c:392: warning: 'error' may be used uninitialized in this function Looks like a real bug to me. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:29 +01:00
Steven Whitehouse	c8cdf47937	[GFS2] Recovery for lost unlinked inodes Under certain circumstances its possible (though rather unlikely) that inodes which were unlinked by one node while still open on another might get "lost" in the sense that they don't get deallocated if the node which held the inode open crashed before it was unlinked. This patch adds the recovery code which allows automatic deallocation of the inode if its found during block allocation (the sensible time to look for such inodes since we are scanning the rgrp's bitmaps anyway at this time, so it adds no overhead to do this). Since the inode will have had its i_nlink set to zero, all we need to trigger recovery is a lookup and an iput(), and the normal deallocation code takes care of the rest. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:26 +01:00
Robert Peterson	b35997d448	[GFS2] Can't mount GFS2 file system on AoE device This patch fixes bug 243131: Can't mount GFS2 file system on AoE device. When using AoE devices with lock_nolock, there is no locking table, so gfs2 (and gfs1) uses the superblock s_id. This turns out to be the device name in some cases. In the case of AoE, the device contains a slash, (e.g. "etherd/e1.1p2") which is an invalid character when we try to register the table in sysfs. This patch replaces the "/" with underscore. Rather than add a new variable to the stack, I'm just reusing a (char *) variable that's no longer used: table. This code has been tested on the failing system using a RHEL5 patch. The upstream code was tested by using gfs2_tool sb to interject a "/" into the table name of a clustered gfs2 file system. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:24 +01:00
Steven Whitehouse	e1cc86037b	[GFS2] Fix bug in error path of inode This fixes a bug in the ordering of operations in the error path of createi. Its not valid to do an iput() when holding the inode's glock since the iput() will (in this case) result in delete_inode() being called which needs to grab the lock itself. This was causing the recursive lock checking code to trigger. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:22 +01:00
Steven Whitehouse	ffed8ab342	[GFS2] Fix typo in rename of directories A typo caused us to pass a NULL pointer when renaming directories. It was accidentally introduced in: [GFS2] Clean up inode number handling Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:19 +01:00
Patrick Caulfield	44f487a553	[DLM] variable allocation Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace. This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL. (This updated version of the patch uses gfp_t for ls_allocation.) Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-Off-By: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:17 +01:00
Josef Bacik	292e539e93	[DLM] fix reference counting This is a fix for the patch 021d2ff3a08019260a1dc002793c92d6bf18afb6 I left off a dlm_hold_rsb which causes the box to panic if you try to use debugfs. This patch fixes the problem. Sorry about that, Signed-off-by: Josef Bacik <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:15 +01:00

... 5 6 7 8 9 ...

6296 Commits