Commit Graph

17574 Commits

Author SHA1 Message Date
Jens Axboe c2c4986edd writeback: fix problem with !CONFIG_BLOCK compilation
When CONFIG_BLOCK isn't enabled:

mm/page-writeback.c: In function 'laptop_mode_timer_fn':
mm/page-writeback.c:708: error: dereferencing pointer to incomplete type
mm/page-writeback.c:709: error: dereferencing pointer to incomplete type

Fix this by essentially eliminating the laptop sync handlers when
CONFIG_BLOCK isn't set, as most are only used from the block layer code.
The exception is laptop_sync_completion() which is used from sys_sync(),
make that an empty declaration in that case.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 20:01:03 +02:00
Tejun Heo b403a98e26 block: improve automatic native capacity unlocking
Currently, native capacity unlocking is initiated only when a
recognized partition extends beyond the end of the disk.  However,
there are several other unhandled cases where truncated capacity can
lead to misdetection of partitions.

* Partition table is fully beyond EOD.

* Partition table is partially beyond EOD (daisy chained ones).

* Recognized partition starts beyond EOD.

This patch updates generic partition check code such that all the
above three cases are handled too.  For the first two, @state tracks
whether low level partition check code tried to read beyond EOD during
partition scan and triggers native capacity unlocking accordingly.
The third is now handled similarly to the original unlocking case.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 20:01:03 +02:00
Tejun Heo 1493bf217f block: use struct parsed_partitions *state universally in partition check code
Make the following changes to partition check code.

* Add ->bdev to struct parsed_partitions.

* Introduce read_part_sector() which is a simple wrapper around
  read_dev_sector() which takes struct parsed_partitions *state
  instead of @bdev.

* For functions which used to take @state and @bdev, drop @bdev.  For
  functions which used to take @bdev, replace it with @state.

* While updating, drop superflous checks on NULL state/bdev in ldm.c.

This cleans up the API a bit and enables better handling of IO errors
during partition check as the generic partition check code now has
much better visibility into what went wrong in the low level code
paths.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 20:01:02 +02:00
Tejun Heo c3e33e043f block,ide: simplify bdops->set_capacity() to ->unlock_native_capacity()
bdops->set_capacity() is unnecessarily generic.  All that's required
is a simple one way notification to lower level driver telling it to
try to unlock native capacity.  There's no reason to pass in target
capacity or return the new capacity.  The former is always the
inherent native capacity and the latter can be handled via the usual
device resize / revalidation path.  In fact, the current API is always
used that way.

Replace ->set_capacity() with ->unlock_native_capacity() which take
only @disk and doesn't return anything.  IDE which is the only current
user of the API is converted accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 20:01:02 +02:00
Tejun Heo 56bca01738 block: restart partition scan after resizing a device
Device resize via ->set_capacity() can reveal new partitions (e.g. in
chained partition table formats such as dos extended parts).  Restart
partition scan from the beginning after resizing a device.  This
change also makes libata always revalidate the disk after resize which
makes lower layer native capacity unlocking implementation simpler and
more robust as resize can be handled in the usual path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 20:01:02 +02:00
Tejun Heo fa4b9074cd buffer: make invalidate_bdev() drain all percpu LRU add caches
invalidate_bdev() should release all page cache pages which are clean
and not being used; however, if some pages are still in the percpu LRU
add caches on other cpus, those pages are considered in used and don't
get released.  Fix it by calling lru_add_drain_all() before trying to
invalidate pages.

This problem was discovered while testing block automatic native
capacity unlocking.  Null pages which were read before automatic
unlocking didn't get released by invalidate_bdev() and ended up
interfering with partition scan after unlocking.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 20:01:02 +02:00
Jens Axboe f9eadbbd42 writeback: bdi_writeback_task() must set task state before calling schedule()
Calling schedule without setting the task state to non-running will
return immediately, so ensure that we set it properly and check our
sleep conditions after doing so.

This is a fixup for commit 69b62d01.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 20:00:25 +02:00
Jens Axboe 7c8a3554c6 writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync
Even if the writeout itself isn't a data integrity operation, we need
to ensure that the caller doesn't drop the sb umount sem before we
have actually done the writeback.

This is a fixup for commit e913fc82.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 20:00:25 +02:00
Dmitry Monakhov 5547e8aac6 writeback: Update dirty flags in two steps
Filesystems with delalloc support may dirty inode during writepages.
As result inode will have dirty metadata flags even after write_inode.
In fact we have two dedicated functions for proper data and metadata
writeback. It is reasonable to separate flags updates in two stages.

https://bugzilla.kernel.org/show_bug.cgi?id=15906

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-17 13:00:41 +02:00
Jens Axboe e913fc825d writeback: fix WB_SYNC_NONE writeback from umount
When umount calls sync_filesystem(), we first do a WB_SYNC_NONE
writeback to kick off writeback of pending dirty inodes, then follow
that up with a WB_SYNC_ALL to wait for it. Since umount already holds
the sb s_umount mutex, WB_SYNC_NONE ends up doing nothing and all
writeback happens as WB_SYNC_ALL. This can greatly slow down umount,
since WB_SYNC_ALL writeback is a data integrity operation and thus
a bigger hammer than simple WB_SYNC_NONE. For barrier aware file systems
it's a lot slower.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-17 12:55:07 +02:00
Jens Axboe 69b62d01ec writeback: disable periodic old data writeback for !dirty_writeback_centisecs
Prior to 2.6.32, setting /proc/sys/vm/dirty_writeback_centisecs disabled
periodic dirty writeback from kupdate. This got broken and now causes
excessive sys CPU usage if set to zero, as we'll keep beating on
schedule().

Cc: stable@kernel.org
Reported-by: Justin Maggard <jmaggard10@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-17 12:51:03 +02:00
Jens Axboe 7407cf355f Merge branch 'master' into for-2.6.35
Conflicts:
	fs/block_dev.c

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-29 09:36:24 +02:00
Stephen Rothwell 6a47dc1418 nilfs: fix breakage caused by barrier flag changes
After merging the block tree, today's linux-next build (powerpc ppc64_defconfig)
failed like this:

fs/nilfs2/the_nilfs.c: In function 'nilfs_discard_segments':
fs/nilfs2/the_nilfs.c:673: error: 'DISCARD_FL_BARRIER' undeclared (first use in this function)

Caused by commit fbd9b09a17 ("blkdev:
generalize flags for blkdev_issue_fn functions") interacting with commit
e902ec9906 ("nilfs2: issue discard request
after cleaning segments") (which netered Linus' tree on about March 4 -
before v2.6.34-rc1).

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-29 09:32:00 +02:00
Al Viro d9e80b7de9 nfs d_revalidate() is too trigger-happy with d_drop()
If dentry found stale happens to be a root of disconnected tree, we
can't d_drop() it; its d_hash is actually part of s_anon and d_drop()
would simply hide it from shrink_dcache_for_umount(), leading to
all sorts of fun, including busy inodes on umount and oopsen after
that.

Bug had been there since at least 2006 (commit c636eb already has it),
so it's definitely -stable fodder.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-28 20:40:03 -07:00
Dmitry Monakhov fbd9b09a17 blkdev: generalize flags for blkdev_issue_fn functions
The patch just convert all blkdev_issue_xxx function to common
set of flags. Wait/allocation semantics preserved.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-28 19:47:36 +02:00
Linus Torvalds 970b06485f Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  coda: move backing-dev.h kernel include inside __KERNEL__
  mtd: ensure that bdi entries are properly initialized and registered
  Move mtd_bdi_*mappable to mtdcore.c
  btrfs: convert to using bdi_setup_and_register()
  Catch filesystems lacking s_bdi
  drbd: Terminate a connection early if sending the protocol fails
  drbd: fix memory leak
  Fix JFFS2 sync silent failure
  smbfs: add bdi backing to mount session
  ncpfs: add bdi backing to mount session
  exofs: add bdi backing to mount session
  ecryptfs: add bdi backing to mount session
  coda: add bdi backing to mount session
  cifs: add bdi backing to mount session
  afs: add bdi backing to mount session.
  9p: add bdi backing to mount session
  bdi: add helper function for doing init and register of a bdi for a file system
  block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer
2010-04-28 07:56:05 -07:00
Linus Torvalds 11e39d993d Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux:
  nfsd4: bug in read_buf
2010-04-27 16:26:21 -07:00
Jerome Marchand 3835541dd4 procfs: fix tid fdinfo
Correct the file_operations struct in fdinfo entry of tid_base_stuff[].

Presently /proc/*/task/*/fdinfo contains symlinks to opened files like
/proc/*/fd/.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-27 16:26:03 -07:00
Christoph Egger 16a5b3c414 Remove redundant check for CONFIG_MMU
The checks for CONFIG_MMU at this location are duplicated as all the code is
located inside a #ifndef CONFIG_MMU block. So the first conditional block will
always be included while the second never will.

Signed-off-by: Christoph Egger <siccegge@stud.informatik.uni-erlangen.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-27 09:01:26 -07:00
Linus Torvalds bc113f151a Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  squashfs: fix potential buffer over-run on 4K block file systems
  squashfs: add missing buffer free
  squashfs: fix warn_on when root inode is corrupted
  squashfs: fix locking bug in zlib wrapper
2010-04-27 08:59:38 -07:00
Tejun Heo 6b4517a791 block: implement bd_claiming and claiming block
Currently, device claiming for exclusive open is done after low level
open - disk->fops->open() - has completed successfully.  This means
that exclusive open attempts while a device is already exclusively
open will fail only after disk->fops->open() is called.

cdrom driver issues commands during open() which means that O_EXCL
open attempt can unintentionally inject commands to in-progress
command stream for burning thus disturbing burning process.  In most
cases, this doesn't cause problems because the first command to be
issued is TUR which most devices can process in the middle of burning.
However, depending on how a device replies to TUR during burning,
cdrom driver may end up issuing further commands.

This can't be resolved trivially by moving bd_claim() before doing
actual open() because that means an open attempt which will end up
failing could interfere other legit O_EXCL open attempts.
ie. unconfirmed open attempts can fail others.

This patch resolves the problem by introducing claiming block which is
started by bd_start_claiming() and terminated either by bd_claim() or
bd_abort_claiming().  bd_claim() from inside a claiming block is
guaranteed to succeed and once a claiming block is started, other
bd_start_claiming() or bd_claim() attempts block till the current
claiming block is terminated.

bd_claim() can still be used standalone although now it always
synchronizes against claiming blocks, so the existing users will keep
working without any change.

blkdev_open() and open_bdev_exclusive() are converted to use claiming
blocks so that exclusive open attempts from these functions don't
interfere with the existing exclusive open.

This problem was discovered while investigating bko#15403.

  https://bugzilla.kernel.org/show_bug.cgi?id=15403

The burning problem itself can be resolved by updating userspace
probing tools to always open w/ O_EXCL.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Matthias-Christian Ott <ott@mirix.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-27 10:57:54 +02:00
Tejun Heo 1a3cbbc5a5 block: factor out bd_may_claim()
Factor out bd_may_claim() from bd_claim(), add comments and apply a
couple of cosmetic edits.  This is to prepare for further updates to
claim path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-27 10:57:54 +02:00
Neil Brown 2bc3c1179c nfsd4: bug in read_buf
When read_buf is called to move over to the next page in the pagelist
of an NFSv4 request, it sets argp->end to essentially a random
number, certainly not an address within the page which argp->p now
points to.  So subsequent calls to READ_BUF will think there is much
more than a page of spare space (the cast to u32 ensures an unsigned
comparison) so we can expect to fall off the end of the second
page.

We never encountered thsi in testing because typically the only
operations which use more than two pages are write-like operations,
which have their own decoding logic.  Something like a getattr after a
write may cross a page boundary, but it would be very unusual for it to
cross another boundary after that.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-04-26 15:39:08 -04:00
Dave Chinner dd77ef924c xfs: more swap extent fixes for dynamic fork offsets
A new xfsqa test (226) with a prototype xfs_fsr change to try to
handle dynamic fork offsets better triggers an assertion failure
where the inode data fork is in btree format, yet there is room in
the inode for it to be in extent format. The two inodes look like:

before: ino 0x101 (target), num_extents 11, Max in-fork extents 6, broot size 40, fork offset 96
before: ino 0x115 (temp),  num_extents 5, Max in-fork extents 3, broot size 40, fork offset 56
after: ino 0x101 (target), num_extents 5, Max in-fork extents 6, broot size 40, fork offset 96
after: ino 0x115 (temp), num_extents 11, Max in-fork extents 3, broot size 40, fork offset 56

Basically the target inode ends up with 5 extents in btree format,
but it had space for 6 extents in extent format, so ends up
incorrect. Notably here the broot size is the same, and that is
where the kernel code is going wrong - the btree root will fit, so
it lets the swap go ahead.

The check should not allow the swap to take place if the number of
extents while in btree format is less than the number of extents
that can fit in the inode in extent format. Adding that check will
prevent this swap and corruption from occurring.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-04-26 12:38:51 -05:00
Jens Axboe e6d086d83c btrfs: convert to using bdi_setup_and_register()
It's now a provided helper, so get rid of the internal setup
and btrfs atomic_t bdi enumerator.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-26 10:27:54 +02:00
Linus Torvalds 202f2bb070 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Issue the discard operation *before* releasing the blocks to be reused
  ext4: Fix buffer head leaks after calls to ext4_get_inode_loc()
  ext4: Fix possible lost inode write in no journal mode
2010-04-25 10:01:51 -07:00
Jörn Engel 5129a469a9 Catch filesystems lacking s_bdi
noop_backing_dev_info is used only as a flag to mark filesystems that
don't have any backing store, like tmpfs, procfs, spufs, etc.

Signed-off-by: Joern Engel <joern@logfs.org>

Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes
to the noop_backing_dev_info is not legal and will not result in
them being flushed, but we already catch this condition in
__mark_inode_dirty() when checking for a registered bdi.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-25 08:54:42 +02:00
Phillip Lougher e0d1f70010 squashfs: fix potential buffer over-run on 4K block file systems
Sizing the buffer based on block size is incorrect, leading
to a potential buffer over-run on 4K block size file systems
(because the metadata block size is always 8K).  This bug
doesn't seem have triggered because 4K block size file systems
are not default, and also because metadata blocks after
compression tend to be less than 4K.

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-04-25 02:09:05 +01:00
Phillip Lougher 370ec3d1ed squashfs: add missing buffer free
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-04-25 02:09:05 +01:00
Phillip Lougher 1cb08e9738 squashfs: fix warn_on when root inode is corrupted
Fix warn_on triggered by mounting a fsfuzzer corrupted file system, where
the root inode has been corrupted.

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Reported-by: Steve Grubb <sgrubb@redhat.com>
2010-04-25 01:49:17 +01:00
Anton Blanchard b8af67e268 fs/block_dev.c: fix performance regression in O_DIRECT|O_SYNC writes to block devices
We are seeing a large regression in database performance on recent
kernels.  The database opens a block device with O_DIRECT|O_SYNC and a
number of threads write to different regions of the file at the same time.

A simple test case is below.  I haven't defined DEVICE since getting it
wrong will destroy your data :) On an 3 disk LVM with a 64k chunk size we
see about 17MB/sec and only a few threads in IO wait:

procs  -----io---- -system-- -----cpu------
 r  b     bi    bo   in   cs us sy id wa st
 0  3      0 16170  656 2259  0  0 86 14  0
 0  2      0 16704  695 2408  0  0 92  8  0
 0  2      0 17308  744 2653  0  0 86 14  0
 0  2      0 17933  759 2777  0  0 89 10  0

Most threads are blocking in vfs_fsync_range, which has:

        mutex_lock(&mapping->host->i_mutex);
        err = fop->fsync(file, dentry, datasync);
        if (!ret)
                ret = err;
        mutex_unlock(&mapping->host->i_mutex);

commit 148f948ba8 (vfs: Introduce new
helpers for syncing after writing to O_SYNC file or IS_SYNC inode) offers
some explanation of what is going on:

    Use these new helpers for syncing from generic VFS functions. This makes
    O_SYNC writes to block devices acquire i_mutex for syncing. If we really
    care about this, we can make block_fsync() drop the i_mutex and reacquire
    it before it returns.

Thanks Jan for such a good commit message!  As well as dropping i_mutex,
Christoph suggests we should remove the call to sync_blockdev():

> sync_blockdev is an overcomplicated alias for filemap_write_and_wait on
> the block device inode, which is exactly what we did just before calling
> into ->fsync

The patch below incorporates both suggestions. With it the testcase improves
from 17MB/s to 68M/sec:

procs  -----io---- -system-- -----cpu------
 r  b     bi    bo   in   cs us sy id wa st
 0  7      0 65536 1000 3878  0  0 70 30  0
 0 34      0 69632 1016 3921  0  1 46 53  0
 0 57      0 69632 1000 3921  0  0 55 45  0
 0 53      0 69640  754 4111  0  0 81 19  0

Testcase:

#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define NR_THREADS 64
#define BUFSIZE (64 * 1024)

#define DEVICE "/dev/mapper/XXXXXX"

#define ALIGN(VAL, SIZE) (((VAL)+(SIZE)-1) & ~((SIZE)-1))

static int fd;

static void *doit(void *arg)
{
	unsigned long offset = (long)arg;
	char *b, *buf;

	b = malloc(BUFSIZE + 1024);
	buf = (char *)ALIGN((unsigned long)b, 1024);
	memset(buf, 0, BUFSIZE);

	while (1)
		pwrite(fd, buf, BUFSIZE, offset);
}

int main(int argc, char *argv[])
{
	int flags = O_RDWR|O_DIRECT;
	int i;
	unsigned long offset = 0;

	if (argc > 1 && !strcmp(argv[1], "O_SYNC"))
		flags |= O_SYNC;

	fd = open(DEVICE, flags);
	if (fd == -1) {
		perror("open");
		exit(1);
	}

	for (i = 0; i < NR_THREADS-1; i++) {
		pthread_t tid;
		pthread_create(&tid, NULL, doit, (void *)offset);
		offset += BUFSIZE;
	}
	doit((void *)offset);

	return 0;
}

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-24 11:31:26 -07:00
Jeff Mahoney fb2162df74 reiserfs: fix corruption during shrinking of xattrs
Commit 48b32a3553 ("reiserfs: use generic
xattr handlers") introduced a problem that causes corruption when extended
attributes are replaced with a smaller value.

The issue is that the reiserfs_setattr to shrink the xattr file was moved
from before the write to after the write.

The root issue has always been in the reiserfs xattr code, but was papered
over by the fact that in the shrink case, the file would just be expanded
again while the xattr was written.

The end result is that the last 8 bytes of xattr data are lost.

This patch fixes it to use new_size.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=14826

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Edward Shishkin <edward.shishkin@gmail.com>
Cc: Jethro Beekman <kernel@jbeekman.nl>
Cc: Greg Surbey <gregsurbey@hotmail.com>
Cc: Marco Gatti <marco.gatti@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-24 11:31:24 -07:00
Jeff Mahoney cac36f7071 reiserfs: fix permissions on .reiserfs_priv
Commit 677c9b2e39 ("reiserfs: remove
privroot hiding in lookup") removed the magic from the lookup code to hide
the .reiserfs_priv directory since it was getting loaded at mount-time
instead.  The intent was that the entry would be hidden from the user via
a poisoned d_compare, but this was faulty.

This introduced a security issue where unprivileged users could access and
modify extended attributes or ACLs belonging to other users, including
root.

This patch resolves the issue by properly hiding .reiserfs_priv.  This was
the intent of the xattr poisoning code, but it appears to have never
worked as expected.  This is fixed by using d_revalidate instead of
d_compare.

This patch makes -oexpose_privroot a no-op.  I'm fine leaving it this way.
The effort involved in working out the corner cases wrt permissions and
caching outweigh the benefit of the feature.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
Reported-by: Matt McCutchen <matt@mattmccutchen.net>
Tested-by: Matt McCutchen <matt@mattmccutchen.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-24 11:31:24 -07:00
Josef Bacik 3a3076f4d6 Cleanup generic block based fiemap
This cleans up a few of the complaints of __generic_block_fiemap.  I've
fixed all the typing stuff, used inline functions instead of macros,
gotten rid of a couple of variables, and made sure the size and block
requests are all block aligned.  It also fixes a problem where sometimes
FIEMAP_EXTENT_LAST wasn't being set properly.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-23 10:39:48 -07:00
Phillip Lougher 792590c723 squashfs: fix locking bug in zlib wrapper
Fix locking bug in zlib wrapper introduced by recent decompressor changes.

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-04-23 02:54:54 +01:00
Jens Axboe 424264b7b2 smbfs: add bdi backing to mount session
This ensures that dirty data gets flushed properly.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 12:37:07 +02:00
Jens Axboe f1970c73cb ncpfs: add bdi backing to mount session
This ensures that dirty data gets flushed properly.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 12:31:11 +02:00
Jens Axboe b3d0ab7e60 exofs: add bdi backing to mount session
This ensures that dirty data gets flushed properly.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 12:26:04 +02:00
Jens Axboe 9df9c8b930 ecryptfs: add bdi backing to mount session
This ensures that dirty data gets flushed properly.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 12:22:04 +02:00
Jens Axboe 5163d90076 coda: add bdi backing to mount session
This ensures that dirty data gets flushed properly.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 12:12:40 +02:00
Jens Axboe 8044f7f468 cifs: add bdi backing to mount session
This ensures that dirty data gets flushed properly.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 12:09:48 +02:00
Jens Axboe e1da022275 afs: add bdi backing to mount session.
This ensures that dirty data gets flushed properly.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 11:58:18 +02:00
Jens Axboe 0ed07ddb56 9p: add bdi backing to mount session
This ensures that dirty data gets flushed properly.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 11:42:00 +02:00
Linus Torvalds 1ef6ce7a34 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68knommu: allow 4 coldfire serial ports
  m68knommu: fix coldfire tcdrain
  m68knommu: remove a duplicate vector setting line for 68360
  Fix m68k-uclinux's rt_sigreturn trampoline
  m68knommu: correct the CC flags for Coldfire M5272 targets
  uclinux: error message when FLAT reloc symbol is invalid, v2
2010-04-21 12:33:12 -07:00
Linus Torvalds 255f41c595 Merge git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs
* git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs:
  [LogFS] Split large truncated into smaller chunks
  [LogFS] Set s_bdi
  [LogFS] Prevent mempool_destroy NULL pointer dereference
  [LogFS] Move assertion
  [LogFS] Plug 8 byte information leak
  [LogFS] Prevent memory corruption on large deletes
  [LogFS] Remove unused method

Fix trivial conflict with added header includes in fs/logfs/super.c
2010-04-21 12:31:12 -07:00
Linus Torvalds 9befb55ef5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  jfs: add jfs specific ->setattr call
  jfs: fix diAllocExt error in resizing filesystem
  jfs_dmap.[ch]: trivial typo fix: s/heigth/height/g
2010-04-21 12:30:07 -07:00
David Howells 083fd8b21a AFS: Don't pass error value to page_cache_release() in error handling
In the error handling in afs_mntpt_do_automount(), we pass an error
pointer to page_cache_release() if read_mapping_page() failed.  Instead,
we should extend the gotos around the error handling we don't need.

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-21 12:27:43 -07:00
Jun Sun d7dfee3f5d uclinux: error message when FLAT reloc symbol is invalid, v2
This patch fixes a cosmetic error in printk. Text segment and data/bss
segment are allocated from two different areas. It is not meaningful to
give the diff between them in the error reporting messages.

Signed-off-by: Jun Sun <jsun@junsun.net>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
2010-04-21 13:28:49 +10:00
Theodore Ts'o b90f687018 ext4: Issue the discard operation *before* releasing the blocks to be reused
Otherwise, we can end up having data corruption because the blocks
could get reused and then discarded!

https://bugzilla.kernel.org/show_bug.cgi?id=15579

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-04-20 16:51:59 -04:00
Joern Engel b6349ac89e [LogFS] Split large truncated into smaller chunks
Truncate would do an almost limitless amount of work without invoking
the garbage collector in between.  Split it up into more manageable,
though still large, chunks.

Signed-off-by: Joern Engel <joern@logfs.org>
2010-04-20 21:44:10 +02:00