linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Chris Mason	4184ea7f90	Btrfs: Fix locking around adding new space_info Storage allocated to different raid levels in btrfs is tracked by a btrfs_space_info structure, and all of the current space_infos are collected into a list_head. Most filesystems have 3 or 4 of these structs total, and the list is only changed when new raid levels are added or at unmount time. This commit adds rcu locking on the list head, and properly frees things at unmount time. It also clears the space_info->full flag whenever new space is added to the FS. The locking for the space info list goes like this: reads: protected by rcu_read_lock() writes: protected by the chunk_mutex At unmount time we don't need special locking because all the readers are gone. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-03-10 12:39:20 -04:00
Linus Torvalds	1c91ffc896	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix spinlock assertions on UP systems	2009-03-09 09:13:16 -07:00
Chris Mason	b9447ef80b	Btrfs: fix spinlock assertions on UP systems btrfs_tree_locked was being used to make sure a given extent_buffer was properly locked in a few places. But, it wasn't correct for UP compiled kernels. This switches it to using assert_spin_locked instead, and renames it to btrfs_assert_tree_locked to better reflect how it was really being used. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-03-09 11:45:38 -04:00
Linus Torvalds	5b61f6accf	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix ext4_free_inode() vs. ext4_claim_inode() race	2009-03-08 10:24:57 -07:00
Christoph Hellwig	c141b2928f	xfs: only issues a cache flush on unmount if barriers are enabled Currently we unconditionally issue a flush from xfs_free_buftarg, but since 2.6.29-rc1 this gives a warning in the style of end_request: I/O error, dev vdb, sector 0 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-03-06 17:35:12 -06:00
Christoph Hellwig	7d46be4a25	xfs: prevent lockdep false positive in xfs_iget_cache_miss The inode can't be locked by anyone else as we just created it a few lines above and it's not been added to any lookup data structure yet. So use a trylock that must succeed to get around the lockdep warnings. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-03-06 17:34:59 -06:00
Christoph Hellwig	ff392c497b	xfs: prevent kernel crash due to corrupted inode log format Andras Korn reported an oops on log replay causes by a corrupted xfs_inode_log_format_t passing a 0 size to kmem_zalloc. This patch handles to small or too large numbers of log regions gracefully by rejecting the log replay with a useful error message. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Andras Korn <korn-sgi.com@chardonnay.math.bme.hu> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-03-06 17:34:45 -06:00
Roel Kluin	f4f8056a86	Squashfs: frag_size should be signed, as it can hold an error result Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2009-03-05 00:55:31 +00:00
Phillip Lougher	118e1ef6fa	Squashfs: Fix oops when reading fsfuzzer corrupted filesystems This fixes a code regression caused by the recent mainlining changes. The recent code changes call zlib_inflate repeatedly, decompressing into separate 4K buffers, this code didn't check for the possibility that zlib_inflate might ask for too many buffers when decompressing corrupted data. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2009-03-05 00:31:12 +00:00
Eric Sandeen	7ce9d5d1f3	ext4: fix ext4_free_inode() vs. ext4_claim_inode() race I was seeing fsck errors on inode bitmaps after a 4 thread dbench run on a 4 cpu machine: Inode bitmap differences: -50736 -(50752--50753) etc... I believe that this is because ext4_free_inode() uses atomic bitops, and although ext4_new_inode() used to also use atomic bitops for synchronization, commit `393418676a` changed this to use the sb_bgl_lock, so that we could also synchronize against read_inode_bitmap and initialization of uninit inode tables. However, that change left ext4_free_inode using atomic bitops, which I think leaves no synchronization between setting & unsetting bits in the inode table. The below patch fixes it for me, although I wonder if we're getting at all heavy-handed with this spinlock... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-04 18:38:18 -05:00
Linus Torvalds	36b31106b7	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: don't call jbd2_journal_force_commit_nested without journal ext4: Reorder fs/Makefile so that ext2 root fs's are mounted using ext2 ext4: Remove duplicate call to ext4_commit_super() in ext4_freeze()	2009-03-02 15:42:26 -08:00
Christoph Hellwig	5cf8cf4146	Fix FREEZE/THAW compat_ioctl regression Commit `8e961870bb` removed the FREEZE/THAW handling in xfs_compat_ioctl but never added any compat handler back, so now any freeze/thaw request from a 32-bit binary ond 64-bit userspace will fail. As these ioctls are 32/64-bit compatible two simple COMPATIBLE_IOCTL entries in fs/compat_ioctl.c will do the job. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-27 16:27:45 -08:00
Benny Halevy	adc487204a	EXPORT_SYMBOL(d_obtain_alias) rather than EXPORT_SYMBOL_GPL Commit `4ea3ada295` declares d_obtain_alias() as EXPORT_SYMBOL_GPL where it's supposed to replace d_alloc_anon which was previously declared as EXPORT_SYMBOL and thus available to any loadable module. This patch reverts that. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-27 16:26:20 -08:00
Linus Torvalds	221be177e6	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: [MTD] [MAPS] Remove MODULE_DEVICE_TABLE() from ck804rom driver. [JFFS2] fix mount crash caused by removed nodes [JFFS2] force the jffs2 GC daemon to behave a bit better [MTD] [MAPS] blackfin async requires complex mappings [MTD] [MAPS] blackfin: fix memory leak in error path [MTD] [MAPS] physmap: fix wrong free and del_mtd_{partition,device} [MTD] slram: Handle negative devlength correctly [MTD] map_rom has NULL erase pointer [MTD] [LPDDR] qinfo_probe depends on lpddr	2009-02-26 14:45:57 -08:00
wengang wang	28d57d4377	ocfs2: add IO error check in ocfs2_get_sector() Check for IO error in ocfs2_get_sector(). Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:12 -08:00
Tiger Yang	4442f51826	ocfs2: set gap to seperate entry and value when xattr in bucket This patch set a gap (4 bytes) between xattr entry and name/value when xattr in bucket. This gap use to seperate entry and name/value when a bucket is full. It had already been set when xattr in inode/block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:11 -08:00
Tao Ma	c8b9cf9a7c	ocfs2: lock the metaecc process for xattr bucket For other metadata in ocfs2, metaecc is checked in ocfs2_read_blocks with io_mutex held. While for xattr bucket, it is calculated by the whole buckets. So we have to add a spin_lock to prevent multiple processes calculating metaecc. Signed-off-by: Tao Ma <tao.ma@oracle.com> Tested-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:11 -08:00
Tao Ma	89a907afe0	ocfs2: Use the right access_* method in ctime update of xattr. In ctime updating of xattr, it use the wrong type of access for inode, so use ocfs2_journal_access_di instead. Reported-and-Tested-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:11 -08:00
Sunil Mushran	53ecd25e14	ocfs2/dlm: Make dlm_assert_master_handler() kill itself instead of the asserter In dlm_assert_master_handler(), if we get an incorrect assert master from a node that, we reply with EINVAL asking the asserter to die. The problem is that an assert is sent after so many hoops, it is invariably the node that thinks the asserter is wrong, is actually wrong. So instead of killing the asserter, this patch kills the assertee. This patch papers over a race that is still being addressed. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:11 -08:00
Sunil Mushran	dabc47de7a	ocfs2/dlm: Use ast_lock to protect ast_list The code was using dlm->spinlock instead of dlm->ast_lock to protect the ast_list. This patch fixes the issue. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:09 -08:00
Sunil Mushran	c74ff8bb22	ocfs2: Cleanup the lockname print in dlmglue.c The dentry lock has a different format than other locks. This patch fixes ocfs2_log_dlm_error() macro to make it print the dentry lock correctly. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:09 -08:00
Sunil Mushran	7dc102b737	ocfs2/dlm: Retract fix for race between purge and migrate Mainline commit `d4f7e650e5` attempts to delay the dlm_thread from sending the drop ref message if the lockres is being migrated. The problem is that we make the dlm_thread wait for the migration to complete. This causes a deadlock as dlm_thread also participates in the lockres migration process. A better fix for the original oss bugzilla#1012 is in testing. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:09 -08:00
Tao Ma	47be12e4ee	ocfs2: Access and dirty the buffer_head in mark_written. In __ocfs2_mark_extent_written, when we meet with the situation of c_split_covers_rec, the old solution just replace the extent record and forget to access and dirty the buffer_head. This will cause a problem when the unwritten extent is in an extent block. So access and dirty it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:09 -08:00
Linus Torvalds	64e71303e4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: try committing transaction before returning ENOSPC Btrfs: add better -ENOSPC handling	2009-02-26 10:37:00 -08:00
Jens Axboe	b2bf96833c	block: fix bogus gcc warning for uninitialized var usage Newer gcc throw this warning: fs/bio.c: In function ?bio_alloc_bioset?: fs/bio.c:305: warning: ?p? may be used uninitialized in this function since it cannot figure out that 'p' is only ever used if 'bs' is non-NULL. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-26 10:45:48 +01:00
Eric Sandeen	8f64b32eb7	ext4: don't call jbd2_journal_force_commit_nested without journal Running without a journal, I oopsed when I ran out of space, because we called jbd2_journal_force_commit_nested() from ext4_should_retry_alloc() without a journal. This should take care of it, I think. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-26 00:57:35 -05:00
Theodore Ts'o	d8ae4601a4	ext4: Reorder fs/Makefile so that ext2 root fs's are mounted using ext2 In fs/Makefile, ext3 was placed before ext2 so that a root filesystem that possessed a journal, it would be mounted as ext3 instead of ext2. This was necessary because a cleanly unmounted ext3 filesystem was fully backwards compatible with ext2, and could be mounted by ext2 --- but it was desirable that it be mounted with ext3 so that the journaling would be enabled. The ext4 filesystem supports new incompatible features, so there is no danger of an ext4 filesystem being mistaken for an ext2 filesystem. At that point, the relative ordering of ext4 with respect to ext2 didn't matter until ext4 gained the ability to mount filesystems without a journal starting in 2.6.29-rc1. Now that this is the case, given that ext4 is before ext2, it means that root filesystems that were using the plain-jane ext2 format are getting mounted using the ext4 filesystem driver, which is a change in behavior which could be surprising to users. It's doubtful that there are that many ext2-only root filesystem users that would also have ext4 compiled into the kernel, but to adhere to the principle of least surprise, the correct ordering in fs/Makefile is ext3, followed by ext2, and finally ext4. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-28 09:50:01 -05:00
Theodore Ts'o	8b1a8ff8b3	ext4: Remove duplicate call to ext4_commit_super() in ext4_freeze() Commit `c4be0c1d` added error checking to ext4_freeze() when calling ext4_commit_super(). Unfortunately the patch failed to remove the original call to ext4_commit_super(), with the net result that when freezing the filesystem, the superblock gets written twice, the first time without error checking. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-28 00:08:53 -05:00
Linus Torvalds	694593e337	Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc * 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc: proc: fix PG_locked reporting in /proc/kpageflags	2009-02-24 15:42:08 -08:00
Linus Torvalds	4daa0682af	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Fix deadlock in ext4_write_begin() and ext4_da_write_begin() ext4: Add fallback for find_group_flex	2009-02-24 15:39:34 -08:00
Helge Bahmann	e07a4b9217	proc: fix PG_locked reporting in /proc/kpageflags Expr always evaluates to zero. Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-02-24 21:17:58 +03:00
Krzysztof Sachanowicz	cac711211a	proc: proc_get_inode should de_put when inode already initialized de_get is called before every proc_get_inode, but corresponding de_put is called only when dropping last reference to an inode. This might cause something like remove_proc_entry: /proc/stats busy, count=14496 to be printed to the syslog. The fix is to call de_put in case of an already initialized inode in proc_get_inode. Signed-off-by: Krzysztof Sachanowicz <analyzer1@gmail.com> Tested-by: Marcin Pilipczuk <marcin.pilipczuk@gmail.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-23 18:25:32 -08:00
Jan Kara	ebd3610b11	ext4: Fix deadlock in ext4_write_begin() and ext4_da_write_begin() Functions ext4_write_begin() and ext4_da_write_begin() call grab_cache_page_write_begin() without AOP_FLAG_NOFS. Thus it can happen that page reclaim is triggered in that function and it recurses back into the filesystem (or some other filesystem). But this can lead to various problems as a transaction is already started at that point. Add the necessary flag. http://bugzilla.kernel.org/show_bug.cgi?id=11688 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-22 21:09:59 -05:00
Theodore Ts'o	05bf9e839d	ext4: Add fallback for find_group_flex This is a workaround for find_group_flex() which badly needs to be replaced. One of its problems (besides ignoring the Orlov algorithm) is that it is a bit hyperactive about returning failure under suspicious circumstances. This can lead to spurious ENOSPC failures even when there are inodes still available. Work around this for now by retrying the search using find_group_other() if find_group_flex() returns -1. If find_group_other() succeeds when find_group_flex() has failed, log a warning message. A better block/inode allocator that will fix this problem for real has been queued up for the next merge window. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-21 12:13:24 -05:00
Linus Torvalds	710320d579	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Fix multiuser mounts so server does not invalidate earlier security contexts [CIFS] improve posix semantics of file create [CIFS] Fix oops in cifs_strfromUCS_le mounting to servers which do not specify their OS cifs: posix fill in inode needed by posix open cifs: properly handle case where CIFSGetSrvInodeNumber fails cifs: refactor new_inode() calls and inode initialization [CIFS] Prevent OOPs when mounting with remote prefixpath. [CIFS] ipv6_addr_equal for address comparison	2009-02-21 09:11:28 -08:00
Thomas Gleixner	4c41bd0ec9	[JFFS2] fix mount crash caused by removed nodes At scan time we observed following scenario: node A inserted node B inserted node C inserted -> sets overlapped flag on node B node A is removed due to CRC failure -> overlapped flag on node B remains while (tn->overlapped) tn = tn_prev(tn); ==> crash, when tn_prev(B) is referenced. When the ultimate node is removed at scan time and the overlapped flag is set on the penultimate node, then nothing updates the overlapped flag of that node. The overlapped iterators blindly expect that the ultimate node does not have the overlapped flag set, which causes the scan code to crash. It would be a huge overhead to go through the node chain on node removal and fix up the overlapped flags, so detecting such a case on the fly in the overlapped iterators is a simpler and reliable solution. Cc: stable@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-02-21 11:09:29 +01:00
Steve French	eca6acf915	[CIFS] Fix multiuser mounts so server does not invalidate earlier security contexts When two different users mount the same Windows 2003 Server share using CIFS, the first session mounted can be invalidated. Some servers invalidate the first smb session when a second similar user (e.g. two users who get mapped by server to "guest") authenticates an smb session from the same client. By making sure that we set the 2nd and subsequent vc numbers to nonzero values, this ensures that we will not have this problem. Fixes Samba bug 6004, problem description follows: How to reproduce: - configure an "open share" (full permissions to Guest user) on Windows 2003 Server (I couldn't reproduce the problem with Samba server or Windows older than 2003) - mount the share twice with different users who will be authenticated as guest. noacl,noperm,user=john,dir_mode=0700,domain=DOMAIN,rw noacl,noperm,user=jeff,dir_mode=0700,domain=DOMAIN,rw Result: - just the mount point mounted last is accessible: Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-02-21 03:37:10 +00:00
Steve French	c3b2a0c640	[CIFS] improve posix semantics of file create Samba server added support for a new posix open/create/mkdir operation a year or so ago, and we added support to cifs for mkdir to use it, but had not added the corresponding code to file create. The following patch helps improve the performance of the cifs create path (to Samba and servers which support the cifs posix protocol extensions). Using Connectathon basic test1, with 2000 files, the performance improved about 15%, and also helped reduce network traffic (17% fewer SMBs sent over the wire) due to saving a network round trip for the SetPathInfo on every file create. It should also help the semantics (and probably the performance) of write (e.g. when posix byte range locks are on the file) on file handles opened with posix create, and adds support for a few flags which would have to be ignored otherwise. Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-02-21 03:37:09 +00:00
Steve French	69765529d7	[CIFS] Fix oops in cifs_strfromUCS_le mounting to servers which do not specify their OS Fixes kernel bug #10451 http://bugzilla.kernel.org/show_bug.cgi?id=10451 Certain NAS appliances do not set the operating system or network operating system fields in the session setup response on the wire. cifs was oopsing on the unexpected zero length response fields (when trying to null terminate a zero length field). This fixes the oops. Acked-by: Jeff Layton <jlayton@redhat.com> CC: stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-02-21 03:37:09 +00:00
Jeff Layton	44f68fadd8	cifs: posix fill in inode needed by posix open function needed to prepare for posix open Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-02-21 03:37:08 +00:00
Jeff Layton	950ec52880	cifs: properly handle case where CIFSGetSrvInodeNumber fails ...if it does then we pass a pointer to an unintialized variable for the inode number to cifs_new_inode. Have it pass a NULL pointer instead. Also tweak the function prototypes to reduce the amount of casting. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-02-21 03:37:08 +00:00
Jeff Layton	132ac7b77c	cifs: refactor new_inode() calls and inode initialization Move new inode creation into a separate routine and refactor the callers to take advantage of it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-02-21 03:37:07 +00:00
Igor Mammedov	e4cce94c9c	[CIFS] Prevent OOPs when mounting with remote prefixpath. Fixes OOPs with message 'kernel BUG at fs/cifs/cifs_dfs_ref.c:274!'. Checks if the prefixpath in an accesible while we are still in cifs_mount and fails with reporting a error if we can't access the prefixpath Should fix Samba bugs 6086 and 5861 and kernel bug 12192 Signed-off-by: Igor Mammedov <niallain@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-02-21 03:36:21 +00:00
Linus Torvalds	264b299006	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: check file pointer in btrfs_sync_file	2009-02-20 17:59:14 -08:00
Josef Bacik	4e06bdd6cb	Btrfs: try committing transaction before returning ENOSPC This fixes a problem where we could return -ENOSPC when we may actually have plenty of space, the space is just pinned. Instead of returning -ENOSPC immediately, commit the transaction first and then try and do the allocation again. This patch also does chunk allocation for metadata if we pass the 80% threshold for metadata space. This will help with stack usage since the chunk allocation will happen early on, instead of when the allocation is happening. Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-02-20 10:59:53 -05:00
Josef Bacik	6a63209fc0	Btrfs: add better -ENOSPC handling This is a step in the direction of better -ENOSPC handling. Instead of checking the global bytes counter we check the space_info bytes counters to make sure we have enough space. If we don't we go ahead and try to allocate a new chunk, and then if that fails we return -ENOSPC. This patch adds two counters to btrfs_space_info, bytes_delalloc and bytes_may_use. bytes_delalloc account for extents we've actually setup for delalloc and will be allocated at some point down the line. bytes_may_use is to keep track of how many bytes we may use for delalloc at some point. When we actually set the extent_bit for the delalloc bytes we subtract the reserved bytes from the bytes_may_use counter. This keeps us from not actually being able to allocate space for any delalloc bytes. Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-02-20 11:00:09 -05:00
Chris Mason	2cfbd50b53	Btrfs: check file pointer in btrfs_sync_file fsync can be called by NFS with a null file pointer, and btrfs was oopsing in this case. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-20 10:55:10 -05:00
Linus Torvalds	620565ef5f	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: Revert "[XFS] remove old vmap cache" Revert "[XFS] use scalable vmap API"	2009-02-19 13:09:32 -08:00
Felix Blyakher	27e88bf6af	Revert "[XFS] remove old vmap cache" This reverts commit `d2859751cd`. This commit caused regression. We'll try to fix use of new vmap API for next release. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-02-19 13:15:55 -06:00
Felix Blyakher	7fdf582447	Revert "[XFS] use scalable vmap API" This reverts commit `95f8e302c0`. This commit caused regression. We'll try to fix use of new vmap API for next release. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-02-19 13:15:44 -06:00
Linus Torvalds	ba95fd47d1	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list block: fix booting from partitioned md array block: revert part of `18ce3751cc` cciss: PCI power management reset for kexec paride/pg.c: xs(): &&/\|\| confusion fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free block: fix bad definition of BIO_RW_SYNC bsg: Fix sense buffer bug in SG_IO	2009-02-18 18:33:04 -08:00
Ingo Molnar	f04b30de3c	inotify: fix GFP_KERNEL related deadlock Enhanced lockdep coverage of __GFP_NOFS turned up this new lockdep assert: [ 1093.677775] [ 1093.677781] ================================= [ 1093.680031] [ INFO: inconsistent lock state ] [ 1093.680031] 2.6.29-rc5-tip-01504-gb49eca1-dirty #1 [ 1093.680031] --------------------------------- [ 1093.680031] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. [ 1093.680031] kswapd0/308 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 1093.680031] (&inode->inotify_mutex){+.+.?.}, at: [<c0205942>] inotify_inode_is_dead+0x20/0x80 [ 1093.680031] {RECLAIM_FS-ON-W} state was registered at: [ 1093.680031] [<c01696b9>] mark_held_locks+0x43/0x5b [ 1093.680031] [<c016baa4>] lockdep_trace_alloc+0x6c/0x6e [ 1093.680031] [<c01cf8b0>] kmem_cache_alloc+0x20/0x150 [ 1093.680031] [<c040d0ec>] idr_pre_get+0x27/0x6c [ 1093.680031] [<c02056e3>] inotify_handle_get_wd+0x25/0xad [ 1093.680031] [<c0205f43>] inotify_add_watch+0x7a/0x129 [ 1093.680031] [<c020679e>] sys_inotify_add_watch+0x20f/0x250 [ 1093.680031] [<c010389e>] sysenter_do_call+0x12/0x35 [ 1093.680031] [<ffffffff>] 0xffffffff [ 1093.680031] irq event stamp: 60417 [ 1093.680031] hardirqs last enabled at (60417): [<c018d5f5>] call_rcu+0x53/0x59 [ 1093.680031] hardirqs last disabled at (60416): [<c018d5b9>] call_rcu+0x17/0x59 [ 1093.680031] softirqs last enabled at (59656): [<c0146229>] __do_softirq+0x157/0x16b [ 1093.680031] softirqs last disabled at (59651): [<c0106293>] do_softirq+0x74/0x15d [ 1093.680031] [ 1093.680031] other info that might help us debug this: [ 1093.680031] 2 locks held by kswapd0/308: [ 1093.680031] #0: (shrinker_rwsem){++++..}, at: [<c01b0502>] shrink_slab+0x36/0x189 [ 1093.680031] #1: (&type->s_umount_key#4){+++++.}, at: [<c01e6d77>] shrink_dcache_memory+0x110/0x1fb [ 1093.680031] [ 1093.680031] stack backtrace: [ 1093.680031] Pid: 308, comm: kswapd0 Not tainted 2.6.29-rc5-tip-01504-gb49eca1-dirty #1 [ 1093.680031] Call Trace: [ 1093.680031] [<c016947a>] valid_state+0x12a/0x13d [ 1093.680031] [<c016954e>] mark_lock+0xc1/0x1e9 [ 1093.680031] [<c016a5b4>] ? check_usage_forwards+0x0/0x3f [ 1093.680031] [<c016ab74>] __lock_acquire+0x2c6/0xac8 [ 1093.680031] [<c01688d9>] ? register_lock_class+0x17/0x228 [ 1093.680031] [<c016b3d3>] lock_acquire+0x5d/0x7a [ 1093.680031] [<c0205942>] ? inotify_inode_is_dead+0x20/0x80 [ 1093.680031] [<c08824c4>] __mutex_lock_common+0x3a/0x4cb [ 1093.680031] [<c0205942>] ? inotify_inode_is_dead+0x20/0x80 [ 1093.680031] [<c08829ed>] mutex_lock_nested+0x2e/0x36 [ 1093.680031] [<c0205942>] ? inotify_inode_is_dead+0x20/0x80 [ 1093.680031] [<c0205942>] inotify_inode_is_dead+0x20/0x80 [ 1093.680031] [<c01e6672>] dentry_iput+0x90/0xc2 [ 1093.680031] [<c01e67a3>] d_kill+0x21/0x45 [ 1093.680031] [<c01e6a46>] __shrink_dcache_sb+0x27f/0x355 [ 1093.680031] [<c01e6dc5>] shrink_dcache_memory+0x15e/0x1fb [ 1093.680031] [<c01b05ed>] shrink_slab+0x121/0x189 [ 1093.680031] [<c01b0d12>] kswapd+0x39f/0x561 [ 1093.680031] [<c01ae499>] ? isolate_pages_global+0x0/0x233 [ 1093.680031] [<c0157eae>] ? autoremove_wake_function+0x0/0x43 [ 1093.680031] [<c01b0973>] ? kswapd+0x0/0x561 [ 1093.680031] [<c0157daf>] kthread+0x41/0x82 [ 1093.680031] [<c0157d6e>] ? kthread+0x0/0x82 [ 1093.680031] [<c01043ab>] kernel_thread_helper+0x7/0x10 inotify_handle_get_wd() does idr_pre_get() which does a kmem_cache_alloc() without __GFP_FS - and is hence deadlockable under extreme MM pressure. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: MinChan Kim <minchan.kim@gmail.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-18 15:37:56 -08:00
Bill Nottingham	2db69a9340	vt: Declare PIO_CMAP/GIO_CMAP as compatbile ioctls. Otherwise, these don't work when called from 32-bit userspace on 64-bit kernels. Cc: Jiri Kosina <jkosina@suse.cz> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-18 15:37:56 -08:00
Peter Zijlstra	ada723dcd6	fs/super.c: add lockdep annotation to s_umount Li Zefan said: Thread 1: for ((; ;)) { mount -t cpuset xxx /mnt > /dev/null 2>&1 cat /mnt/cpus > /dev/null 2>&1 umount /mnt > /dev/null 2>&1 } Thread 2: for ((; ;)) { mount -t cpuset xxx /mnt > /dev/null 2>&1 umount /mnt > /dev/null 2>&1 } (Note: It is irrelevant which cgroup subsys is used.) After a while a lockdep warning showed up: ============================================= [ INFO: possible recursive locking detected ] 2.6.28 #479 --------------------------------------------- mount/13554 is trying to acquire lock: (&type->s_umount_key#19){--..}, at: [<c049d888>] sget+0x5e/0x321 but task is already holding lock: (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321 other info that might help us debug this: 1 lock held by mount/13554: #0: (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321 stack backtrace: Pid: 13554, comm: mount Not tainted 2.6.28-mc #479 Call Trace: [<c044ad2e>] validate_chain+0x4c6/0xbbd [<c044ba9b>] __lock_acquire+0x676/0x700 [<c044bb82>] lock_acquire+0x5d/0x7a [<c049d888>] ? sget+0x5e/0x321 [<c061b9b8>] down_write+0x34/0x50 [<c049d888>] ? sget+0x5e/0x321 [<c049d888>] sget+0x5e/0x321 [<c045a2e7>] ? cgroup_set_super+0x0/0x3e [<c045959f>] ? cgroup_test_super+0x0/0x2f [<c045bcea>] cgroup_get_sb+0x98/0x2e7 [<c045cfb6>] cpuset_get_sb+0x4a/0x5f [<c049dfa4>] vfs_kern_mount+0x40/0x7b [<c049e02d>] do_kern_mount+0x37/0xbf [<c04af4a0>] do_mount+0x5c3/0x61a [<c04addd2>] ? copy_mount_options+0x2c/0x111 [<c04af560>] sys_mount+0x69/0xa0 [<c0403251>] sysenter_do_call+0x12/0x31 The cause is after alloc_super() and then retry, an old entry in list fs_supers is found, so grab_super(old) is called, but both functions hold s_umount lock: struct super_block *sget(...) { ... retry: spin_lock(&sb_lock); if (test) { list_for_each_entry(old, &type->fs_supers, s_instances) { if (!test(old, data)) continue; if (!grab_super(old)) <--- 2nd: down_write(&old->s_umount); goto retry; if (s) destroy_super(s); return old; } } if (!s) { spin_unlock(&sb_lock); s = alloc_super(type); <--- 1th: down_write(&s->s_umount) if (!s) return ERR_PTR(-ENOMEM); goto retry; } ... } It seems like a false positive, and seems like VFS but not cgroup needs to be fixed. Peter said: We can simply put the new s_umount instance in a but lockdep doesn't particularly cares about subclass order. If there's any issue with the callers of sget() assuming the s_umount lock being of sublcass 0, then there is another annotation we can use to fix that, but lets not bother with that if this is sufficient. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12673 Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Li Zefan <lizf@cn.fujitsu.com> Reported-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Paul Menage <menage@google.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-18 15:37:55 -08:00
Nick Piggin	1cf6e7d83b	mm: task dirty accounting fix YAMAMOTO-san noticed that task_dirty_inc doesn't seem to be called properly for cases where set_page_dirty is not used to dirty a page (eg. mark_buffer_dirty). Additionally, there is some inconsistency about when task_dirty_inc is called. It is used for dirty balancing, however it even gets called for __set_page_dirty_no_writeback. So rather than increment it in a set_page_dirty wrapper, move it down to exactly where the dirty page accounting stats are incremented. Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-18 15:37:54 -08:00
Davide Libenzi	610d18f412	timerfd: add flags check As requested by Michael, add a missing check for valid flags in timerfd_settime(), and make it return EINVAL in case some extra bits are set. Michael said: If this is to be any use to userland apps that want to check flag support (perhaps it is too late already), then the sooner we get it into the kernel the better: 2.6.29 would be good; earlier stables as well would be even better. [akpm@linux-foundation.org: remove unused TFD_FLAGS_SET] Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: <stable@kernel.org> [2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-18 15:37:53 -08:00
Eric Biederman	8f19d47293	seq_file: properly cope with pread Currently seq_read assumes that the offset passed to it is always the offset it passed to user space. In the case pread this assumption is broken and we do the wrong thing when presented with pread. To solve this I introduce an offset cache inside of struct seq_file so we know where our logical file position is. Then in seq_read if we try to read from another offset we reset our data structures and attempt to go to the offset user space wanted. [akpm@linux-foundation.org: restore FMODE_PWRITE] [pjt@google.com: seq_open needs its fmode opened up to take advantage of this] Signed-off-by: Eric Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Paul Turner <pjt@google.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-18 15:37:53 -08:00
Jens Axboe	78f707bfc7	block: revert part of `18ce3751cc` The above commit added WRITE_SYNC and switched various places to using that for committing writes that will be waited upon immediately after submission. However, this causes a performance regression with AS and CFQ for ext3 at least, since sync_dirty_buffer() will submit some writes with WRITE_SYNC while ext3 has sumitted others dependent writes without the sync flag set. This causes excessive anticipation/idling in the IO scheduler because sync and async writes get interleaved, causing a big performance regression for the below test case (which is meant to simulate sqlite like behaviour). ---- test case ---- int main(int argc, char *argv) { int fdes, i; FILE fp; struct timeval start; struct timeval end; struct timeval res; gettimeofday(&start, NULL); for (i=0; i<ROWS; i++) { fp = fopen("test_file", "a"); fprintf(fp, "Some Text Data\n"); fdes = fileno(fp); fsync(fdes); fclose(fp); } gettimeofday(&end, NULL); timersub(&end, &start, &res); fprintf(stdout, "time to write %d lines is %ld(msec)\n", ROWS, (res.tv_sec*1000000 + res.tv_usec)/1000); return 0; } ------------------- Thanks to Sean.White@APCC.com for tracking down this performance regression and providing a test case. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-18 10:32:01 +01:00
Subhash Peddamallu	a60e78e57a	fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free When freeing from bio pool use right ptr to account for bs->front_pad, instead of bio ptr, Signed-off-by: Subhash Peddamallu <subhash.peddamallu@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-18 10:32:01 +01:00
Linus Torvalds	48c0d9ece3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: hold trans_mutex when using btrfs_record_root_in_trans Btrfs: make a lockdep class for the extent buffer locks Btrfs: fs/btrfs/volumes.c: remove useless kzalloc Btrfs: remove unused code in split_state() Btrfs: remove btrfs_init_path Btrfs: balance_level checks !child after access Btrfs: Avoid using __GFP_HIGHMEM with slab allocator Btrfs: don't clean old snapshots on sync(1) Btrfs: use larger metadata clusters in ssd mode Btrfs: process mount options on mount -o remount, Btrfs: make sure all pending extent operations are complete	2009-02-17 14:19:14 -08:00
Linus Torvalds	3512a79dbc	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Fix NULL dereference in ext4_ext_migrate()'s error handling ext4: Implement range_cyclic in ext4_da_writepages instead of write_cache_pages ext4: Initialize preallocation list_head's properly ext4: Fix lockdep warning ext4: Fix to read empty directory blocks correctly in 64k jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() Revert "ext4: wait on all pending commits in ext4_sync_fs()" jbd2: Fix return value of jbd2_journal_start_commit()	2009-02-17 14:05:05 -08:00
Al Viro	1a88b5364b	Fix incomplete __mntput locking Getting this wrong caused WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() due to optimistically checking cpu_writer->mnt outside the spinlock. Here's what we really want: * we know that nobody will set cpu_writer->mnt to mnt from now on * all changes to that sucker are done under cpu_writer->lock * we want the laziest equivalent of spin_lock(&cpu_writer->lock); if (likely(cpu_writer->mnt != mnt)) { spin_unlock(&cpu_writer->lock); continue; } /* do stuff */ that would make sure we won't miss earlier setting of ->mnt done by another CPU. Anyway, for now we just move the spin_lock() earlier and move the test into the properly locked region. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reported-and-tested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-17 14:02:08 -08:00
Dan Carpenter	090542641d	ext4: Fix NULL dereference in ext4_ext_migrate()'s error handling This was found through a code checker (http://repo.or.cz/w/smatch.git/). It looks like you might be able to trigger the error by trying to migrate a readonly file system. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-15 20:02:19 -05:00
Aneesh Kumar K.V	2acf2c261b	ext4: Implement range_cyclic in ext4_da_writepages instead of write_cache_pages With delayed allocation we lock the page in write_cache_pages() and try to build an in memory extent of contiguous blocks. This is needed so that we can get large contiguous blocks request. If range_cyclic mode is enabled, write_cache_pages() will loop back to the 0 index if no I/O has been done yet, and try to start writing from the beginning of the range. That causes an attempt to take the page lock of lower index page while holding the page lock of higher index page, which can cause a dead lock with another writeback thread. The solution is to implement the range_cyclic behavior in ext4_da_writepages() instead. http://bugzilla.kernel.org/show_bug.cgi?id=12579 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-14 10:42:58 -05:00
Aneesh Kumar K.V	d794bf8e09	ext4: Initialize preallocation list_head's properly When creating a new ext4_prealloc_space structure, we have to initialize its list_head pointers before we add them to any prealloc lists. Otherwise, with list debug enabled, we will get list corruption warnings. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-14 10:31:16 -05:00
Andres Salomon	efab0b5d3e	[JFFS2] force the jffs2 GC daemon to behave a bit better I've noticed some pretty poor behavior on OLPC machines after bootup, when gdm/X are starting. The GCD monopolizes the scheduler (which in turns means it gets to do more nand i/o), which results in processes taking much much longer than they should to start. As an example, on an OLPC machine going from OFW to a usable X (via auto-login gdm) takes 2m 30s. The majority of this time is consumed by the switch into graphical mode. With this patch, we cut a full 60s off of bootup time. After bootup, things are much snappier as well. Note that we have seen a CRC node error with this patch that causes the machine to fail to boot, but we've also seen that problem without this patch. Signed-off-by: Andres Salomon <dilinger@debian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-02-14 08:59:04 +00:00
Yan Zheng	2456242530	Btrfs: hold trans_mutex when using btrfs_record_root_in_trans btrfs_record_root_in_trans needs the trans_mutex held to make sure two callers don't race to setup the root in a given transaction. This adds it to all the places that were missing it. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>	2009-02-12 14:14:53 -05:00
Chris Mason	4008c04a07	Btrfs: make a lockdep class for the extent buffer locks Btrfs is currently using spin_lock_nested with a nested value based on the tree depth of the block. But, this doesn't quite work because the max tree depth is bigger than what spin_lock_nested can deal with, and because locks are sometimes taken before the level field is filled in. The solution here is to use lockdep_set_class_and_name instead, and to set the class before unlocking the pages when the block is read from the disk and just after init of a freshly allocated tree block. btrfs_clear_path_blocking is also changed to take the locks in the proper order, and it also makes sure all the locks currently held are properly set to blocking before it tries to retake the spinlocks. Otherwise, lockdep gets upset about bad lock orderin. The lockdep magic cam from Peter Zijlstra <peterz@infradead.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 14:09:45 -05:00
Julia Lawall	3f3420df50	Btrfs: fs/btrfs/volumes.c: remove useless kzalloc The call to kzalloc is followed by a kmalloc whose result is stored in the same variable. The semantic match that finds the problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r exists@ local idexpression x; statement S; expression E; identifier f,l; position p1,p2; expression ptr != NULL; @@ ( if ((x@p1 = $kmalloc\\|kzalloc\\|kcalloc$(...)) == NULL) S \| x@p1 = $kmalloc\\|kzalloc\\|kcalloc$(...); ... if (x == NULL) S ) <... when != x when != if (...) { <+...x...+> } x->f = E ...> ( return $0\\|<+...x...+>\\|ptr$; \| return@p2 ...; ) @script:python@ p1 << r.p1; p2 << r.p2; @@ print " file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 10:16:03 -05:00
Qinghuang Feng	a48ddf08ba	Btrfs: remove unused code in split_state() These two lines are not used, remove them. Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 14:25:23 -05:00
Jeff Mahoney	e00f730865	Btrfs: remove btrfs_init_path btrfs_init_path was initially used when the path objects were on the stack. Now all the work is done by btrfs_alloc_path and btrfs_init_path isn't required. This patch removes it, and just uses kmem_cache_zalloc to zero out the object. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 14:11:25 -05:00
Jeff Mahoney	7951f3cefb	Btrfs: balance_level checks !child after access The BUG_ON() is in the wrong spot. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 10:06:15 -05:00
Yan Zheng	b335b0034e	Btrfs: Avoid using __GFP_HIGHMEM with slab allocator btrfs_releasepage may call kmem_cache_alloc indirectly, and provide same GFP flags it gets to kmem_cache_alloc. So it's possible to use __GFP_HIGHMEM with the slab allocator. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>	2009-02-12 10:06:04 -05:00
Chris Mason	e1df36d2f1	Btrfs: don't clean old snapshots on sync(1) Cleaning old snapshots can make sync(1) somewhat slow, and some users and applications still use it in a global fsync kind of workload. This patch changes btrfs not to clean old snapshots during sync, which is safe from a FS consistency point of view. The major downside is that it makes it difficult to tell when old snapshots have been reaped and the space they were using has been reclaimed. A new ioctl will be added for this purpose instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 09:45:08 -05:00
Chris Mason	536ac8ae86	Btrfs: use larger metadata clusters in ssd mode Larger metadata clusters can significantly improve writeback performance on ssd drives with large erasure blocks. The larger clusters make it more likely a given IO will completely overwrite the ssd block, so it doesn't have to do an internal rwm cycle. On spinning media, lager metadata clusters end up spreading out the metadata more over time, which makes fsck slower, so we don't want this to be the default. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 09:41:38 -05:00
Chris Mason	b288052e17	Btrfs: process mount options on mount -o remount, Btrfs wasn't parsing any new mount options during remount, making it difficult to set mount options on a root drive. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 09:37:35 -05:00
Josef Bacik	eb09967089	Btrfs: make sure all pending extent operations are complete Theres a slight problem with finish_current_insert, if we set all to 1 and then go through and don't actually skip any of the extents on the pending list, we could exit right after we've added new extents. This is a problem because by inserting the new extents we could have gotten new COW's to happen and such, so we may have some pending updates to do or even more inserts to do after that. So this patch will only exit if we have never skipped any of the extents in the pending list, and we have no extents to insert, this will make sure that all of the pending work is truly done before we return. I've been running with this patch for a few days with all of my other testing and have not seen issues. Thanks, Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-02-12 09:27:38 -05:00
Carsten Otte	0e4a9b5928	ext2/xip: refuse to change xip flag during remount with busy inodes For a reason that I was unable to understand in three months of debugging, mount ext2 -o remount stopped working properly when remounting from regular operation to xip, or the other way around. According to a git bisect search, the problem was introduced with the VM_MIXEDMAP/PTE_SPECIAL rework in the vm: commit `70688e4dd1` Author: Nick Piggin <npiggin@suse.de> Date: Mon Apr 28 02:13:02 2008 -0700 xip: support non-struct page backed memory In the failing scenario, the filesystem is mounted read only via root= kernel parameter on s390x. During remount (in rc.sysinit), the inodes of the bash binary and its libraries are busy and cannot be invalidated (the bash which is running rc.sysinit resides on subject filesystem). Afterwards, another bash process (running ifup-eth) recurses into a subshell, runs dup_mm (via fork). Some of the mappings in this bash process were created from inodes that could not be invalidated during remount. Both parent and child process crash some time later due to inconsistencies in their address spaces. The issue seems to be timing sensitive, various attempts to recreate it have failed. This patch refuses to change the xip flag during remount in case some inodes cannot be invalidated. This patch keeps users from running into that issue. [akpm@linux-foundation.org: cleanup] Signed-off-by: Carsten Otte <cotte@de.ibm.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Jared Hulbert <jaredeh@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:36 -08:00
Jan Kara	02ac597c9b	ext3: revert "ext3: wait on all pending commits in ext3_sync_fs" This reverts commit `c87591b719`. Since journal_start_commit() is now fixed to return 1 when we started a transaction commit, there's some transaction waiting to be committed or there's a transaction already committing, we don't need to call ext3_force_commit() in ext3_sync_fs(). Furthermore ext3_force_commit() can unnecessarily create sync transaction which is expensive so it's worthwhile to remove it when we can. Cc: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:35 -08:00
Jan Kara	8fe4cd0dc5	jbd: fix return value of journal_start_commit() journal_start_commit() returns 1 if either a transaction is committing or the function has queued a transaction commit. But it returns 0 if we raced with somebody queueing the transaction commit as well. This resulted in ext3_sync_fs() not functioning correctly (description from Arthur Jones): In the case of a data=ordered umount with pending long symlinks which are delayed due to a long list of other I/O on the backing block device, this causes the buffer associated with the long symlinks to not be moved to the inode dirty list in the second phase of fsync_super. Then, before they can be dirtied again, kjournald exits, seeing the UMOUNT flag and the dirty pages are never written to the backing block device, causing long symlink corruption and exposing new or previously freed block data to userspace. This can be reproduced with a script created by Eric Sandeen <sandeen@redhat.com>: #!/bin/bash umount /mnt/test2 mount /dev/sdb4 /mnt/test2 rm -f /mnt/test2/* dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512 touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename /mnt/test2/link umount /mnt/test2 mount /dev/sdb4 /mnt/test2 ls /mnt/test2/ This patch fixes journal_start_commit() to always return 1 when there's a transaction committing or queued for commit. Cc: Eric Sandeen <sandeen@redhat.com> Cc: Mike Snitzer <snitzer@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:35 -08:00
Mel Gorman	5a6fe12595	Do not account for the address space used by hugetlbfs using VM_ACCOUNT When overcommit is disabled, the core VM accounts for pages used by anonymous shared, private mappings and special mappings. It keeps track of VMAs that should be accounted for with VM_ACCOUNT and VMAs that never had a reserve with VM_NORESERVE. Overcommit for hugetlbfs is much riskier than overcommit for base pages due to contiguity requirements. It avoids overcommiting on both shared and private mappings using reservation counters that are checked and updated during mmap(). This ensures (within limits) that hugepages exist in the future when faults occurs or it is too easy to applications to be SIGKILLed. As hugetlbfs makes its own reservations of a different unit to the base page size, VM_ACCOUNT should never be set. Even if the units were correct, we would double account for the usage in the core VM and hugetlbfs. VM_NORESERVE may be set because an application can request no reserves be made for hugetlbfs at the risk of getting killed later. With commit `fc8744adc8`, VM_NORESERVE and VM_ACCOUNT are getting unconditionally set for hugetlbfs-backed mappings. This breaks the accounting for both the core VM and hugetlbfs, can trigger an OOM storm when hugepage pools are too small lockups and corrupted counters otherwise are used. This patch brings hugetlbfs more in line with how the core VM treats VM_NORESERVE but prevents VM_ACCOUNT being set. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-10 10:48:42 -08:00
Aneesh Kumar K.V	ba4439165f	ext4: Fix lockdep warning We should not call ext4_mb_add_n_trim while holding alloc_semp. ============================================= [ INFO: possible recursive locking detected ] 2.6.29-rc4-git1-dirty #124 --------------------------------------------- ffsb/3116 is trying to acquire lock: (&meta_group_info[i]->alloc_sem){----}, at: [<ffffffff8035a6e8>] ext4_mb_load_buddy+0xd2/0x343 but task is already holding lock: (&meta_group_info[i]->alloc_sem){----}, at: [<ffffffff8035a6e8>] ext4_mb_load_buddy+0xd2/0x343 http://bugzilla.kernel.org/show_bug.cgi?id=12672 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-10 11:14:34 -05:00
Wei Yongjun	7be2baaa03	ext4: Fix to read empty directory blocks correctly in 64k The rec_len field in the directory entry is 16 bits, so there was a problem representing rec_len for filesystems with a 64k block size in the case where the directory entry takes the entire 64k block. Unfortunately, there were two schemes that were proposed; one where all zeros meant 65536 and one where all ones (65535) meant 65536. E2fsprogs used 0, whereas the kernel used 65535. Oops. Fortunately this case happens extremely rarely, with the most common case being the lost+found directory, created by mke2fs. So we will be liberal in what we accept, and accept both encodings, but we will continue to encode 65536 as 65535. This will require a change in e2fsprogs, but with fortunately ext4 filesystems normally have the dir_index feature enabled, which precludes having a completely empty directory block. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-10 09:53:42 -05:00
Jan Kara	7f5aa21508	jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() If we race with commit code setting i_transaction to NULL, we could possibly dereference it. Proper locking requires the journal pointer (to access journal->j_list_lock), which we don't have. So we have to change the prototype of the function so that filesystem passes us the journal pointer. Also add a more detailed comment about why the function jbd2_journal_begin_ordered_truncate() does what it does and how it should be used. Thanks to Dan Carpenter <error27@gmail.com> for pointing to the suspitious code. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Joel Becker <joel.becker@oracle.com> CC: linux-ext4@vger.kernel.org CC: ocfs2-devel@oss.oracle.com CC: mfasheh@suse.de CC: Dan Carpenter <error27@gmail.com>	2009-02-10 11:15:34 -05:00
Jan Kara	9eddacf9e9	Revert "ext4: wait on all pending commits in ext4_sync_fs()" This undoes commit `14ce0cb411`. Since jbd2_journal_start_commit() is now fixed to return 1 when we started a transaction commit, there's some transaction waiting to be committed or there's a transaction already committing, we don't need to call ext4_force_commit() in ext4_sync_fs(). Furthermore ext4_force_commit() can unnecessarily create sync transaction which is expensive so it's worthwhile to remove it when we can. http://bugzilla.kernel.org/show_bug.cgi?id=12224 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Eric Sandeen <sandeen@redhat.com> Cc: linux-ext4@vger.kernel.org	2009-02-10 06:46:05 -05:00
Jan Kara	c88ccea314	jbd2: Fix return value of jbd2_journal_start_commit() The function jbd2_journal_start_commit() returns 1 if either a transaction is committing or the function has queued a transaction commit. But it returns 0 if we raced with somebody queueing the transaction commit as well. This resulted in ext4_sync_fs() not functioning correctly (description from Arthur Jones): In the case of a data=ordered umount with pending long symlinks which are delayed due to a long list of other I/O on the backing block device, this causes the buffer associated with the long symlinks to not be moved to the inode dirty list in the second phase of fsync_super. Then, before they can be dirtied again, kjournald exits, seeing the UMOUNT flag and the dirty pages are never written to the backing block device, causing long symlink corruption and exposing new or previously freed block data to userspace. This can be reproduced with a script created by Eric Sandeen <sandeen@redhat.com>: #!/bin/bash umount /mnt/test2 mount /dev/sdb4 /mnt/test2 rm -f /mnt/test2/* dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512 touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename /mnt/test2/link umount /mnt/test2 mount /dev/sdb4 /mnt/test2 ls /mnt/test2/ This patch fixes jbd2_journal_start_commit() to always return 1 when there's a transaction committing or queued for commit. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> CC: Eric Sandeen <sandeen@redhat.com> CC: linux-ext4@vger.kernel.org	2009-02-10 11:27:46 -05:00
Linus Torvalds	4c098bcd55	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: don't use spin_is_contended	2009-02-09 14:00:16 -08:00
Chris Mason	284b066af4	Btrfs: don't use spin_is_contended Btrfs was using spin_is_contended to see if it should drop locks before doing extent allocations during btrfs_search_slot. The idea was to avoid expensive searches in the tree unless the lock was actually contended. But, spin_is_contended is specific to the ticket spinlocks on x86, so this is causing compile errors everywhere else. In practice, the contention could easily appear some time after we started doing the extent allocation, and it makes more sense to always drop the lock instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-09 16:22:03 -05:00
Linus Torvalds	896abeb743	Merge branch 'for-2.6.29' of git://linux-nfs.org/~bfields/linux * 'for-2.6.29' of git://linux-nfs.org/~bfields/linux: lockd: fix regression in lockd's handling of blocked locks	2009-02-09 10:30:19 -08:00
J. Bruce Fields	9d9b87c121	lockd: fix regression in lockd's handling of blocked locks If a client requests a blocking lock, is denied, then requests it again, then here in nlmsvc_lock() we will call vfs_lock_file() without FL_SLEEP set, because we've already queued a block and don't need the locks code to do it again. But that means vfs_lock_file() will return -EAGAIN instead of FILE_LOCK_DENIED. So we still need to translate that -EAGAIN return into a nlm_lck_blocked error in this case, and put ourselves back on lockd's block list. The bug was introduced by `bde74e4bc6` "locks: add special return value for asynchronous locks". Thanks to Frank van Maarseveen for the report; his original test case was essentially for i in `seq 30`; do flock /nfsmount/foo sleep 10 & done Tested-by: Frank van Maarseveen <frankvm@frankvm.com> Reported-by: Frank van Maarseveen <frankvm@frankvm.com> Cc: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-02-09 13:19:46 -05:00
Cornelia Huck	766ccb9ed4	async: Rename _special -> _domain for clarity. Rename the async__special() functions to async__domain(), which describes the purpose of these functions much better. [Broke up long lines to silence checkpatch] Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2009-02-08 09:56:11 -08:00
Linus Torvalds	ccfef64621	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: CRED: Fix SUID exec regression	2009-02-06 18:52:55 -08:00
Linus Torvalds	ae1a25da84	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (37 commits) Btrfs: Make sure dir is non-null before doing S_ISGID checks Btrfs: Fix memory leak in cache_drop_leaf_ref Btrfs: don't return congestion in write_cache_pages as often Btrfs: Only prep for btree deletion balances when nodes are mostly empty Btrfs: fix btrfs_unlock_up_safe to walk the entire path Btrfs: change btrfs_del_leaf to drop locks earlier Btrfs: Change btrfs_truncate_inode_items to stop when it hits the inode Btrfs: Don't try to compress pages past i_size Btrfs: join the transaction in __btrfs_setxattr Btrfs: Handle SGID bit when creating inodes Btrfs: Make btrfs_drop_snapshot work in larger and more efficient chunks Btrfs: Change btree locking to use explicit blocking points Btrfs: hash_lock is no longer needed Btrfs: disable leak debugging checks in extent_io.c Btrfs: sort references by byte number during btrfs_inc_ref Btrfs: async threads should try harder to find work Btrfs: selinux support Btrfs: make btrfs acls selectable Btrfs: Catch missed bios in the async bio submission thread Btrfs: fix readdir on 32 bit machines ...	2009-02-06 18:37:22 -08:00
Tyler Hicks	fd9fc842bb	eCryptfs: Regression in unencrypted filename symlinks The addition of filename encryption caused a regression in unencrypted filename symlink support. ecryptfs_copy_filename() is used when dealing with unencrypted filenames and it reported that the new, copied filename was a character longer than it should have been. This caused the return value of readlink() to count the NULL byte of the symlink target. Most applications don't care about the extra NULL byte, but a version control system (bzr) helped in discovering the bug. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-06 18:36:40 -08:00
Linus Torvalds	1d87b0d388	Merge branch 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland * 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland: elf core dump: fix get_user use	2009-02-06 18:10:04 -08:00
Roland McGrath	92dc07b1f9	elf core dump: fix get_user use The elf_core_dump() code does its work with set_fs(KERNEL_DS) in force, so vma_dump_size() needs to switch back with set_fs(USER_DS) to safely use get_user() for a normal user-space address. Checking for VM_READ optimizes out the case where get_user() would fail anyway. The vm_file check here was already superfluous given the control flow earlier in the function, so that is a cleanup/optimization unrelated to other changes but an obvious and trivial one. Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Roland McGrath <roland@redhat.com>	2009-02-06 17:34:07 -08:00
David Howells	0bf2f3aec5	CRED: Fix SUID exec regression The patch: commit `a6f76f23d2` CRED: Make execve() take advantage of copy-on-write credentials moved the place in which the 'safeness' of a SUID/SGID exec was performed to before de_thread() was called. This means that LSM_UNSAFE_SHARE is now calculated incorrectly. This flag is set if any of the usage counts for fs_struct, files_struct and sighand_struct are greater than 1 at the time the determination is made. All of which are true for threads created by the pthread library. However, since we wish to make the security calculation before irrevocably damaging the process so that we can return it an error code in the case where we decide we want to reject the exec request on this basis, we have to make the determination before calling de_thread(). So, instead, we count up the number of threads (CLONE_THREAD) that are sharing our fs_struct (CLONE_FS), files_struct (CLONE_FILES) and sighand_structs (CLONE_SIGHAND/CLONE_THREAD) with us. These will be killed by de_thread() and so can be discounted by check_unsafe_exec(). We do have to be careful because CLONE_THREAD does not imply FS or FILES. We _assume_ that there will be no extra references to these structs held by the threads we're going to kill. This can be tested with the attached pair of programs. Build the two programs using the Makefile supplied, and run ./test1 as a non-root user. If successful, you should see something like: [dhowells@andromeda tmp]$ ./test1 --TEST1-- uid=4043, euid=4043 suid=4043 exec ./test2 --TEST2-- uid=4043, euid=0 suid=0 SUCCESS - Correct effective user ID and if unsuccessful, something like: [dhowells@andromeda tmp]$ ./test1 --TEST1-- uid=4043, euid=4043 suid=4043 exec ./test2 --TEST2-- uid=4043, euid=4043 suid=4043 ERROR - Incorrect effective user ID! The non-root user ID you see will depend on the user you run as. [test1.c] #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <pthread.h> static void thread_func(void arg) { while (1) {} } int main(int argc, char argv) { pthread_t tid; uid_t uid, euid, suid; printf("--TEST1--\n"); getresuid(&uid, &euid, &suid); printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid); if (pthread_create(&tid, NULL, thread_func, NULL) < 0) { perror("pthread_create"); exit(1); } printf("exec ./test2\n"); execlp("./test2", "test2", NULL); perror("./test2"); _exit(1); } [test2.c] #include <stdio.h> #include <stdlib.h> #include <unistd.h> int main(int argc, char argv) { uid_t uid, euid, suid; getresuid(&uid, &euid, &suid); printf("--TEST2--\n"); printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid); if (euid != 0) { fprintf(stderr, "ERROR - Incorrect effective user ID!\n"); exit(1); } printf("SUCCESS - Correct effective user ID\n"); exit(0); } [Makefile] CFLAGS = -D_GNU_SOURCE -Wall -Werror -Wunused all: test1 test2 test1: test1.c gcc $(CFLAGS) -o test1 test1.c -lpthread test2: test2.c gcc $(CFLAGS) -o test2 test2.c sudo chown root.root test2 sudo chmod +s test2 Reported-by: David Smith <dsmith@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: David Smith <dsmith@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-07 08:46:18 +11:00
Dave Kleikamp	d4cf109f05	vfs: Don't call attach_nobh_buffers() with an empty list This is a modification of a patch by Bill Pemberton <wfp5p@virginia.edu> nobh_write_end() could call attach_nobh_buffers() with head == NULL. This would result in a trap when attach_nobh_buffers() attempted to access bh->b_this_page. This can be illustrated by running the writev01 testcase from LTP on jfs. This error was introduced by commit `5b41e74a` "vfs: fix data leak in nobh_write_end()". That patch did not take into account that if PageMappedToDisk() is true upon entry to nobh_write_begin(), then no buffers will be allocated for the page. In that case, we won't have to worry about a failed write leaving unitialized data in the page. Of course, head != NULL implies !page_has_buffers(page), so no need to test both. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Dmitri Monakhov <dmonakhov@openvz.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-06 13:34:22 -08:00
Chris Mason	42f15d77df	Btrfs: Make sure dir is non-null before doing S_ISGID checks The S_ISGID check in btrfs_new_inode caused an oops during subvol creation because sometimes the dir is null. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-06 11:35:57 -05:00
Al Viro	767b5828ad	braino in sg_ioctl_trans() ... and yes, gcc is insane enough to eat that without complaint. We probably want sparse to scream on those... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-05 16:35:52 -08:00

1 2 3 4 5 ...

12851 Commits