Commit Graph

15745 Commits

Author SHA1 Message Date
Linus Torvalds ac50e95078 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: copy li_lsn before dropping AIL lock
  XFS bug in log recover with quota (bugzilla id 855)
2009-11-17 09:42:35 -08:00
Linus Torvalds 8a1eaa6a56 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: clear server inode number flag while autodisabling
2009-11-17 09:20:38 -08:00
Linus Torvalds 82abc2a97a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: deleted inconsistent comment in nilfs_load_inode_block()
  nilfs2: deleted struct nilfs_dat_group_desc
  nilfs2: fix lock order reversal in chcp operation
2009-11-17 09:15:18 -08:00
Nathaniel W. Turner 6c06f072c2 xfs: copy li_lsn before dropping AIL lock
Access to log items on the AIL is generally protected by m_ail_lock;
this is particularly needed when we're getting or setting the 64-bit
li_lsn on a 32-bit platform.  This patch fixes a couple places where we
were accessing the log item after dropping the AIL lock on 32-bit
machines.

This can result in a partially-zeroed log->l_tail_lsn if
xfs_trans_ail_delete is racing with xfs_trans_ail_update, and in at
least some cases, this can leave the l_tail_lsn with a zero cycle
number, which means xlog_space_left will think the log is full (unless
CONFIG_XFS_DEBUG is set, in which case we'll trip an ASSERT), leading to
processes stuck forever in xlog_grant_log_space.

Thanks to Adrian VanderSpek for first spotting the race potential and to
Dave Chinner for debug assistance.

Signed-off-by: Nathaniel W. Turner <nate@houseofnate.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-11-17 10:26:49 -06:00
Jan Rekorajski 8ec6dba258 XFS bug in log recover with quota (bugzilla id 855)
Hi,
I was hit by a bug in linux 2.6.31 when XFS is not able to recover the
log after a crash if fs was mounted with quotas. Gory details in XFS
bugzilla: http://oss.sgi.com/bugzilla/show_bug.cgi?id=855.

It looks like wrong struct is used in buffer length check, and the following
patch should fix the problem.

xfs_dqblk_t has a size of 104+32 bytes, while xfs_disk_dquot_t is 104 bytes
long, and this is exactly what I see in system logs - "XFS: dquot too small
(104) in xlog_recover_do_dquot_trans."

Signed-off-by: Jan Rekorajski <baggins@sith.mimuw.edu.pl>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-11-17 10:26:38 -06:00
Suresh Jayaraman f534dc9943 cifs: clear server inode number flag while autodisabling
Fix the commit ec06aedd44 that intended to turn off querying for server inode
numbers when server doesn't consistently support inode numbers. Presumably
the commit didn't actually clear the CIFS_MOUNT_SERVER_INUM flag, perhaps a
typo.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Stable <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-16 15:24:03 +00:00
Jiro SEKIBA 18dafac1a4 nilfs2: deleted inconsistent comment in nilfs_load_inode_block()
The comment says, "Caller of this function MUST lock s_inode_lock",
however just above the comment, it locks s_inode_lock in the function.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-15 17:17:46 +09:00
Petr Vandrovec 479c2553af Fix memory corruption caused by nfsd readdir+
Commit 8177e6d6df ("nfsd: clean up
readdirplus encoding") introduced single character typo in nfs3 readdir+
implementation.  Unfortunately that typo has quite bad side effects:
random memory corruption, followed (on my box) with immediate
spontaneous box reboot.

Using 'p1' instead of 'p' fixes my Linux box rebooting whenever VMware
ESXi box tries to list contents of my home directory.

Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-14 12:55:55 -08:00
Ryusuke Konishi c1ea985c71 nilfs2: fix lock order reversal in chcp operation
Will fix the following lock order reversal lockdep detected:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-rc6 #7
-------------------------------------------------------
chcp/30157 is trying to acquire lock:
 (&nilfs->ns_mount_mutex){+.+.+.}, at: [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2]

but task is already holding lock:
 (&nilfs->ns_segctor_sem){++++.+}, at: [<fed7ca32>] nilfs_transaction_begin+0xba/0x110 [nilfs2]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&nilfs->ns_segctor_sem){++++.+}:
       [<c105799c>] __lock_acquire+0x109c/0x139d
       [<c1057d26>] lock_acquire+0x89/0xa0
       [<c14151e2>] down_read+0x31/0x45
       [<fed6d77b>] nilfs_attach_checkpoint+0x8f/0x16b [nilfs2]
       [<fed6e393>] nilfs_get_sb+0x3e7/0x653 [nilfs2]
       [<c10c0ccb>] vfs_kern_mount+0x8b/0x124
       [<c10c0db2>] do_kern_mount+0x37/0xc3
       [<c10d7517>] do_mount+0x64d/0x69d
       [<c10d75cd>] sys_mount+0x66/0x95
       [<c1002a14>] sysenter_do_call+0x12/0x32

-> #1 (&type->s_umount_key#31/1){+.+.+.}:
       [<c105799c>] __lock_acquire+0x109c/0x139d
       [<c1057d26>] lock_acquire+0x89/0xa0
       [<c104c0f3>] down_write_nested+0x34/0x52
       [<c10c08fe>] sget+0x22e/0x389
       [<fed6e133>] nilfs_get_sb+0x187/0x653 [nilfs2]
       [<c10c0ccb>] vfs_kern_mount+0x8b/0x124
       [<c10c0db2>] do_kern_mount+0x37/0xc3
       [<c10d7517>] do_mount+0x64d/0x69d
       [<c10d75cd>] sys_mount+0x66/0x95
       [<c1002a14>] sysenter_do_call+0x12/0x32

-> #0 (&nilfs->ns_mount_mutex){+.+.+.}:
       [<c1057727>] __lock_acquire+0xe27/0x139d
       [<c1057d26>] lock_acquire+0x89/0xa0
       [<c1414d63>] mutex_lock_nested+0x41/0x23e
       [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2]
       [<fed801b2>] nilfs_ioctl+0x11a/0x7da [nilfs2]
       [<c10cca12>] vfs_ioctl+0x27/0x6e
       [<c10ccf93>] do_vfs_ioctl+0x491/0x4db
       [<c10cd022>] sys_ioctl+0x45/0x5f
       [<c1002a14>] sysenter_do_call+0x12/0x32

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-13 10:33:24 +09:00
Mike Hommey e04b5ef8b4 __generic_block_fiemap(): fix for files bigger than 4GB
Because of an integer overflow on start_blk, various kind of wrong results
would be returned by the generic_block_fiemap() handler, such as no
extents when there is a 4GB+ hole at the beginning of the file, or wrong
fe_logical when an extent starts after the first 4GB.

Signed-off-by: Mike Hommey <mh@glandium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@sgi.com>
Cc: Josef Bacik <jbacik@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-12 07:26:01 -08:00
Anton Blanchard fc63cf2370 exec: setup_arg_pages() fails to return errors
In setup_arg_pages we work hard to assign a value to ret, but on exit we
always return 0.

Also remove a now duplicated exit path and branch to out_unlock instead.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-12 07:25:58 -08:00
Heiko Carstens 7779d7bed9 fs: add missing compat_ptr handling for FS_IOC_RESVSP ioctl
For FS_IOC_RESVSP and FS_IOC_RESVSP64 compat_sys_ioctl() uses its
arg argument as a pointer to userspace. However it is missing a
a call to compat_ptr() which will do a proper pointer conversion.

This was introduced with 3e63cbb1 "fs: Add new pre-allocation ioctls
to vfs for compatibility with legacy xfs ioctls".

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ankit Jain <me@ankitjain.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Arnd Bergmann <arndbergmann@googlemail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: <stable@kernel.org>		[2.6.31.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-12 07:25:57 -08:00
Sukadev Bhattiprolu 29f12ca321 pidns: fix a leak in /proc dentries and inodes with pid namespaces.
Daniel Lezcano reported a leak in 'struct pid' and 'struct pid_namespace'
that is discussed in:

	http://lkml.org/lkml/2009/10/2/159.

To summarize the thread, when container-init is terminated, it sets the
PF_EXITING flag, zaps other processes in the container and waits to reap
them.  As a part of reaping, the container-init should flush any /proc
dentries associated with the processes.  But because the container-init is
itself exiting and the following PF_EXITING check, the dentries are not
flushed, resulting in leak in /proc inodes and dentries.

This fix reverts the commit 7766755a2f ("Fix /proc dcache deadlock
in do_exit") which introduced the check for PF_EXITING.  At the time of
the commit, shrink_dcache_parent() flushed dentries from other filesystems
also and could have caused a deadlock which the commit fixed.  But as
pointed out by Eric Biederman, after commit 0feae5c47a,
shrink_dcache_parent() no longer affects other filesystems.  So reverting
the commit is now safe.

As pointed out by Jan Kara, the leak is not as critical since the
unclaimed space will be reclaimed under memory pressure or by:

	echo 3 > /proc/sys/vm/drop_caches

But since this check is no longer required, its best to remove it.

Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Reported-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Jan Kara <jack@ucw.cz>
Cc: Andrea Arcangeli <andrea@cpushare.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-12 07:25:57 -08:00
Stefan Schmidt ff5e4b51a3 fs/jbd: Export log_start_commit to fix ext3 build.
This fixes:
ERROR: "log_start_commit" [fs/ext3/ext3.ko] undefined!

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2009-11-12 10:24:12 +01:00
Linus Torvalds aa021baa32 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: fix panic when trying to destroy a newly allocated
  Btrfs: allow more metadata chunk preallocation
  Btrfs: fallback on uncompressed io if compressed io fails
  Btrfs: find ideal block group for caching
  Btrfs: avoid null deref in unpin_extent_cache()
  Btrfs: skip btrfs_release_path in btrfs_update_root and btrfs_del_root
  Btrfs: fix some metadata enospc issues
  Btrfs: fix how we set max_size for free space clusters
  Btrfs: cleanup transaction starting and fix journal_info usage
  Btrfs: fix data allocation hint start
2009-11-11 13:38:59 -08:00
Josef Bacik a6dbd429d8 Btrfs: fix panic when trying to destroy a newly allocated
There is a problem where iget5_locked will look for an inode, not find it, and
then subsequently try to allocate it.  Another CPU will have raced in and
allocated the inode instead, so when iget5_locked gets the inode spin lock again
and does a search, it finds the new inode.  So it goes ahead and calls
destroy_inode on the inode it just allocated.  The problem is we don't set
BTRFS_I(inode)->root until the new inode is completely initialized.  This patch
makes us set root to NULL when alloc'ing a new inode, so when we get to
btrfs_destroy_inode and we see that root is NULL we can just free up the memory
and continue on.  This fixes the panic

http://www.kerneloops.org/submitresult.php?number=812690

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11 15:53:34 -05:00
Linus Torvalds fd801452a3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  JBD/JBD2: free j_wbuf if journal init fails.
  ext3: Wait for proper transaction commit on fsync
  ext3: retry failed direct IO allocations
2009-11-11 11:52:22 -08:00
Linus Torvalds 16fe4101ae Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: partial revert to fix double brelse WARNING()
  ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O
  ext4: code clean up for dio fallocate handling
  ext4: skip conversion of uninit extents after direct IO if there isn't any
  ext4: fix ext4_ext_direct_IO()'s return value after converting uninit extents
  ext4: discard preallocation when restarting a transaction during truncate
2009-11-11 11:28:11 -08:00
Chris Mason 33b2580864 Btrfs: allow more metadata chunk preallocation
On an FS where all of the space has not been allocated into chunks yet,
the enospc can return enospc just because the existing metadata chunks
are full.

We get around this by allowing more metadata chunks to be allocated up
to a certain limit, and finding the right limit is a little fuzzy.  The
problem is the reservations for delalloc would preallocate way too much
of the FS as metadata.  We need to start saying no and just force some
IO to happen.

But we also need to let a reasonable amount of the FS become metadata.
This bumps the hard limit up, later releases will have a better system.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11 14:20:20 -05:00
Josef Bacik f5a84ee3cd Btrfs: fallback on uncompressed io if compressed io fails
Currently compressed IO does not deal with not having its entire extent able to
be allocated.  So if we have enough free space to allocate for the extent, but
its not contiguous, it will fail spectacularly.  This patch fixes this by
falling back on uncompressed IO which lets us spread the delalloc extent across
multiple extents.  I tested this by making us randomly think the reservation had
failed to make it fallback on the uncompressed io way and it seemed to work
fine.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11 14:20:20 -05:00
Josef Bacik ccf0e72537 Btrfs: find ideal block group for caching
This patch changes a few things.  Hopefully the comments are helpfull, but
I'll try and be as verbose here.

Problem:

My fedora box was taking 1 minute and 21 seconds to boot with btrfs as root.
Part of this problem was we pick the first block group we can find and start
caching it, even if it may not have enough free space.  The other problem is
we only search for cached block groups the first time around, which we won't
find any cached block groups because this is a newly mounted fs, so we end up
caching several block groups during bootup, which with alot of fragmentation
takes around 30-45 seconds to complete, which bogs down the system.  So

Solution:

1) Don't cache block groups willy-nilly at first.  Instead try and figure out
which block group has the most free, and therefore will take the least amount
of time to cache.

2) Don't be so picky about cached block groups.  The other problem is once
we've filled up a cluster, if the block group isn't finished caching the next
time we try and do the allocation we'll completely ignore the cluster and
start searching from the beginning of the space, which makes us cache more
block groups, which slows us down even more.  So instead of skipping block
groups that are not finished caching when we have a hint, only skip the block
group if it hasn't started caching yet.

There is one other tweak in here.  Before if we allocated a chunk and still
couldn't find new space, we'd end up switching the space info to force another
chunk allocation.  This could make us end up with way too many chunks, so keep
track of this particular case.

With this patch and my previous cluster fixes my fedora box now boots in 43
seconds, and according to the bootchart is not held up by our block group
caching at all.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11 14:20:19 -05:00
Dan Carpenter 4eb3991c5d Btrfs: avoid null deref in unpin_extent_cache()
I re-orderred the checks to avoid dereferencing "em" if it was null.

Found by smatch static checker.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11 14:20:18 -05:00
Li Dongyang df66916e71 Btrfs: skip btrfs_release_path in btrfs_update_root and btrfs_del_root
We don't need to call btrfs_release_path because btrfs_free_path will do
that for us.

Signed-off-by: Li Dongyang <Jerry87905@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11 14:20:18 -05:00
Josef Bacik 5df6a9f606 Btrfs: fix some metadata enospc issues
We weren't reserving metadata space for rename, rmdir and unlink, which could
cause problems.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11 14:20:17 -05:00
Josef Bacik 01dea1efc2 Btrfs: fix how we set max_size for free space clusters
This patch fixes a problem where max_size can be set to 0 even though we
filled the cluster properly.  We set max_size to 0 if we restart the cluster
window, but if the new start entry is big enough to be our new cluster then we
could return with a max_size set to 0, which will mean the next time we try to
allocate from this cluster it will fail.  So set max_extent to the entry's
size.  Tested this on my box and now we actually allocate from the cluster
after we fill it.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11 14:20:17 -05:00
Josef Bacik 249ac1e55c Btrfs: cleanup transaction starting and fix journal_info usage
We use journal_info to tell if we're in a nested transaction to make sure we
don't commit the transaction within a nested transaction.  We use another
method to see if there are any outstanding ioctl trans handles, so if we're
starting one do not set current->journal_info, since it will screw with other
filesystems.  This patch also cleans up the starting stuff so there aren't any
magic numbers.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11 14:20:16 -05:00
Josef Bacik 6346c93988 Btrfs: fix data allocation hint start
Sometimes our start allocation hint when we cow a file can be either
EXTENT_HOLE or some other such place holder, which is not optimal.  So if we
find that our em->block_start is one of these special values, check to see
where the first block of the inode is stored, and use that as a hint.  If that
block is also a special value, just fallback on a hint of 0 and let the
allocator figure out a good place to put the data.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11 14:20:16 -05:00
Tao Ma 7b02bec07e JBD/JBD2: free j_wbuf if journal init fails.
If journal init fails, we need to free j_wbuf.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-11-11 15:24:14 +01:00
Jan Kara fe8bc91c4c ext3: Wait for proper transaction commit on fsync
We cannot rely on buffer dirty bits during fsync because pdflush can come
before fsync is called and clear dirty bits without forcing a transaction
commit. What we do is that we track which transaction has last changed
the inode and which transaction last changed allocation and force it to
disk on fsync.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2009-11-11 15:22:49 +01:00
Eric Sandeen ea0174a713 ext3: retry failed direct IO allocations
On a 256M 4k block filesystem, doing this in a loop:

    dd if=/dev/zero of=test oflag=direct bs=1M count=64
    rm -f test

eventually leads to spurious ENOSPC:

    dd: writing `test': No space left on device

As with other block allocation callers, it looks like we need to
potentially retry the allocations on the initial ENOSPC.

A similar patch went into ext4 (commit
fbbf694566)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-11-11 15:22:49 +01:00
Linus Torvalds b7b69c7e97 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix missing cleanup of gc cache on error cases
  nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocks
2009-11-09 09:52:55 -08:00
Theodore Ts'o 1e424a3483 ext4: partial revert to fix double brelse WARNING()
This is a partial revert of commit 6487a9d (only the changes made to
fs/ext4/namei.c), since it is causing the following brelse()
double-free warning when running fsstress on a file system with 1k
blocksize and we run into a block allocation failure while converting
a single-block directory to a multi-block hash-tree indexed directory.

WARNING: at fs/buffer.c:1197 __brelse+0x2e/0x33()
Hardware name: 
VFS: brelse: Trying to free free buffer
Modules linked in:
Pid: 2226, comm: jbd2/sdd-8 Not tainted 2.6.32-rc6-00577-g0003f55 #101
Call Trace:
 [<c01587fb>] warn_slowpath_common+0x65/0x95
 [<c0158869>] warn_slowpath_fmt+0x29/0x2c
 [<c021168e>] __brelse+0x2e/0x33
 [<c0288a9f>] jbd2_journal_refile_buffer+0x67/0x6c
 [<c028a9ed>] jbd2_journal_commit_transaction+0x319/0x14d8
 [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60
 [<c0175bcc>] ? sched_clock_cpu+0x12a/0x13e
 [<c017f6b4>] ? trace_hardirqs_off+0xb/0xd
 [<c0175c1f>] ? cpu_clock+0x3f/0x5b
 [<c017f6ec>] ? lock_release_holdtime+0x36/0x137
 [<c0664ad0>] ? _spin_unlock_irqrestore+0x44/0x51
 [<c0180af3>] ? trace_hardirqs_on_caller+0x103/0x124
 [<c0180b1f>] ? trace_hardirqs_on+0xb/0xd
 [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60
 [<c0290d1c>] kjournald2+0x11a/0x310
 [<c017118e>] ? autoremove_wake_function+0x0/0x38
 [<c0290c02>] ? kjournald2+0x0/0x310
 [<c0170ee6>] kthread+0x66/0x6b
 [<c0170e80>] ? kthread+0x0/0x6b
 [<c01251b3>] kernel_thread_helper+0x7/0x10
---[ end trace 5579351b86af61e3 ]---

Commit 6487a9d was an attempt some buffer head leaks in an ENOSPC
error path, but in some cases it actually results in an excess ENOSPC,
as shown above.  Fixing this means cleaning up who is responsible for
releasing the buffer heads from the callee to the caller of
add_dirent_to_buf().

Since that's a relatively complex change, and we're late in the rcX
development cycle, I'm reverting this now, and holding back a more
complete fix until after 2.6.32 ships.  We've lived with this
buffer_head leak on ENOSPC in ext3 and ext4 for a very long time; a
few more months won't kill us.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Curt Wohlgemuth <curtw@google.com>
2009-11-08 15:45:44 -05:00
Ryusuke Konishi c083234f15 nilfs2: fix missing cleanup of gc cache on error cases
This fixes an -rc1 regression brought by the commit:
1cf58fa840 ("nilfs2: shorten freeze
period due to GC in write operation v3").

Although the patch moved out a function call of
nilfs_ioctl_move_blocks() to nilfs_ioctl_clean_segments() from
nilfs_ioctl_prepare_clean_segments(), it didn't move corresponding
cleanup job needed for the error case.

This will move the missing cleanup job to the destination function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Jiro SEKIBA <jir@unicus.jp>
2009-11-08 19:04:25 +09:00
Ryusuke Konishi 5399dd1fc8 nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocks
This fixes a kernel oops reported by Markus Trippelsdorf in the email
titled "[NILFS users] kernel Oops while running nilfs_cleanerd".

The oops was caused by a bug of error path in
nilfs_ioctl_move_blocks() function, which was inlined in
nilfs_ioctl_clean_segments().

nilfs_ioctl_move_blocks checks duplication of blocks which will be
moved in garbage collection.  But, the check should have be done
within nilfs_ioctl_move_inode_block() to prevent list corruption among
buffers storing the target blocks.

To fix the kernel oops, this moves forward the duplication check
before the list insertion.

I also tested this for stable trees [2.6.30, 2.6.31].

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable <stable@kernel.org>
2009-11-08 19:01:35 +09:00
Jeff Layton f475f67754 cifs: don't use CIFSGetSrvInodeNumber in is_path_accessible
Because it's lighter weight, CIFS tries to use CIFSGetSrvInodeNumber to
verify the accessibility of the root inode and then falls back to doing a
full QPathInfo if that fails with -EOPNOTSUPP. I have at least a report
of a server that returns NT_STATUS_INTERNAL_ERROR rather than something
that translates to EOPNOTSUPP.

Rather than trying to be clever with that call, just have
is_path_accessible do a normal QPathInfo. That call is widely
supported and it shouldn't increase the overhead significantly.

Cc: Stable <stable@kernel.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-06 22:06:14 +00:00
Jeff Layton ec06aedd44 cifs: clean up handling when server doesn't consistently support inode numbers
It's possible that a server will return a valid FileID when we query the
FILE_INTERNAL_INFO for the root inode, but then zeroed out inode numbers
when we do a FindFile with an infolevel of
SMB_FIND_FILE_ID_FULL_DIR_INFO.

In this situation turn off querying for server inode numbers, generate a
warning for the user and just generate an inode number using iunique.
Once we generate any inode number with iunique we can no longer use any
server inode numbers or we risk collisions, so ensure that we don't do
that in cifs_get_inode_info either.

Cc: Stable <stable@kernel.org>
Reported-by: Timothy Normand Miller <theosib@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-06 22:04:37 +00:00
Mingming ba230c3f6d ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O
To prepare for a direct I/O write, we need to split the unwritten
extents before submitting the I/O.  When no extents needed to be
split, ext4_split_unwritten_extents() was incorrectly returning 0
instead of the size of uninitialized extents. This bug caused the
wrong return value sent back to VFS code when it gets called from
async IO path, leading to an unnecessary fall back to buffered IO.

This bug also hid the fact that the check to see whether or not a
split would be necessary was incorrect; we can only skip splitting the
extent if the write completely covers the uninitialized extent.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-06 04:01:23 -05:00
Linus Torvalds e5a9236222 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: invalidate target of rename
  fuse: fix kunmap in fuse_ioctl_copy_user
  fuse: prevent fuse_put_request on invalid pointer
2009-11-05 13:23:29 -08:00
Linus Torvalds 1bbc9a66d0 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/kvm: Remove problematic BUILD_BUG_ON statement
  powerpc/pci: Fix regression in powerpc MSI-X
  powerpc: Avoid giving out RTC dates below EPOCH
  powerpc/mm: Remove debug context clamping from nohash code
  powerpc: Cleanup Kconfig selection of hugetlbfs support
2009-11-05 13:22:49 -08:00
Linus Torvalds d4116f8204 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  sysfs: Don't leak secdata when a sysfs_dirent is freed.
2009-11-05 10:57:39 -08:00
Linus Torvalds 411094acb7 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, fs: Fix x86 procfs stack information for threads on 64-bit
  x86: Add reboot quirk for 3 series Mac mini
  x86: Fix printk message typo in mtrr cleanup code
  dma-debug: Fix compile warning with PAE enabled
  x86/amd-iommu: Un__init function required on shutdown
  x86/amd-iommu: Workaround for erratum 63
2009-11-05 10:54:08 -08:00
Eric W. Biederman 4c3da2209b sysfs: Don't leak secdata when a sysfs_dirent is freed.
While refreshing my sysfs patches I noticed a leak in the secdata
implementation.  We don't free the secdata when we free the
sysfs dirent.

This is a bug in 2.6.32-rc5 that we really should close.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-11-05 08:19:18 +11:00
Stefani Seibold 89240ba059 x86, fs: Fix x86 procfs stack information for threads on 64-bit
This patch fixes two issues in the procfs stack information on
x86-64 linux.

The 32 bit loader compat_do_execve did not store stack
start. (this was figured out by Alexey Dobriyan).

The stack information on a x64_64 kernel always shows 0 kbyte
stack usage, because of a missing implementation of the KSTK_ESP
macro which always returned -1.

The new implementation now returns the right value.

Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1257240160.4889.24.camel@wall-e>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-04 13:25:03 +01:00
Miklos Szeredi 5219f346b0 fuse: invalidate target of rename
Invalidate the target's attributes, which may have changed (such as
nlink, change time) so that they are refreshed on the next getattr().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-11-04 10:24:52 +01:00
Jens Axboe 0bd87182d3 fuse: fix kunmap in fuse_ioctl_copy_user
Looks like another victim of the confusing kmap() vs kmap_atomic() API
differences.

Reported-by: Todor Gyumyushev <yodor1@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
2009-11-04 10:24:51 +01:00
Anand V. Avati f60311d5f7 fuse: prevent fuse_put_request on invalid pointer
fuse_direct_io() has a loop where requests are allocated in each
iteration. if allocation fails, the loop is broken out and follows
into an unconditional fuse_put_request() on that invalid pointer.

Signed-off-by: Anand V. Avati <avati@gluster.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@kernel.org
2009-11-04 10:24:50 +01:00
Linus Torvalds 51bb296b09 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  cfq-iosched: limit coop preemption
  cfq-iosched: fix bad return value cfq_should_preempt()
  backing-dev: bdi sb prune should be in the unregister path, not destroy
  Fix bio_alloc() and bio_kmalloc() documentation
  bio_put(): add bio_clone() to the list of functions in the comment
2009-11-03 18:16:21 -08:00
Mingming 4b70df1816 ext4: code clean up for dio fallocate handling
The ext4_debug() call in ext4_end_io_dio() should be moved after the
check to make sure that io_end is non-NULL.

The comment above ext4_get_block_dio_write() ("Maximum number of
blocks...") is a duplicate; the original and correct comment is above
the #define DIO_MAX_BLOCKS up above.

Based on review comments from Curt Wohlgemuth.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-03 14:44:54 -05:00
Mingming 5f5249507e ext4: skip conversion of uninit extents after direct IO if there isn't any
At the end of direct I/O operation, ext4_ext_direct_IO() always called
ext4_convert_unwritten_extents(), regardless of whether there were any
unwritten extents involved in the I/O or not.

This commit adds a state flag so that ext4_ext_direct_IO() only calls
ext4_convert_unwritten_extents() when necessary.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-10 10:48:04 -05:00
Mingming 109f556519 ext4: fix ext4_ext_direct_IO()'s return value after converting uninit extents
After a direct I/O request covering an uninitalized extent (i.e.,
created using the fallocate system call) or a hole in a file, ext4
will convert the uninitialized extent so it is marked as initialized
by calling ext4_convert_unwritten_extents().  This function returns
zero on success.

This return value was getting returned by ext4_direct_IO(); however
the file system's direct_IO function is supposed to return the number
of bytes read or written on a success.  By returning zero, it confused
the direct I/O code into falling back to buffered I/O unnecessarily.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-10 10:48:08 -05:00