Commit Graph

28178 Commits

Author SHA1 Message Date
Jeff Layton 5cf02d09b5 nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-07-30 18:55:59 -04:00
Peng Tao 159e0561e3 pnfsblock: bail out partial page IO
Current block layout driver read/write code assumes page
aligned IO in many places. Add a checker to validate the assumption.
Otherwise there would be data corruption like when application does
open(O_WRONLY) and page unaliged write.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 18:52:06 -04:00
Jeff Layton f44106e217 nfs: fix fl_type tests in NFSv4 code
fl_type is not a bitmap.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 18:09:13 -04:00
Fred Isaman c95908e4c5 NFS: fix pnfs regression with directio writes
Commit 57208fa7e5 "NFS: Create an write_pageio_init() function"
did not modify the calls in direct.c, preventing direct io from
using pnfs.  This reintroduces that capability.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 18:09:06 -04:00
Fred Isaman 59948db3be NFS: fix pnfs regression with directio reads
Commit 1abb50886a "NFS: Create an read_pageio_init() function"
did not modify the call in direct.c, preventing direct io from
using pnfs.  This reintroduces that capability.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 18:08:53 -04:00
Randy Dunlap 0add3e8567 nfs: fix stub return type warnings
Fix numerous repeated warnings by making the stub function
void instead of non-void:

fs/nfs/nfs4_fs.h: In function 'nfs4_unregister_sysctl':
fs/nfs/nfs4_fs.h:385:1: warning: no return statement in function returning non-void

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc:	Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 17:30:24 -04:00
Jan Kara 4a55c1017b nfsd: Push mnt_want_write() outside of i_mutex
When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
consistently outside of i_mutex.

CC: linux-nfs@vger.kernel.org
CC: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:51 +04:00
Jan Kara e7848683ae btrfs: Push mnt_want_write() outside of i_mutex
When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
consistently outside of i_mutex.

CC: Chris Mason <chris.mason@oracle.com>
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:51 +04:00
Jan Kara e24f17da35 fat: Push mnt_want_write() outside of i_mutex
When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
outside of i_mutex as in other places.

CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:50 +04:00
Jan Kara c30dabfe5d fs: Push mnt_want_write() outside of i_mutex
Currently, mnt_want_write() is sometimes called with i_mutex held and sometimes
without it. This isn't really a problem because mnt_want_write() is a
non-blocking operation (essentially has a trylock semantics) but when the
function starts to handle also frozen filesystems, it will get a full lock
semantics and thus proper lock ordering has to be established. So move
all mnt_want_write() calls outside of i_mutex.

One non-trivial case needing conversion is kern_path_create() /
user_path_create() which didn't include mnt_want_write() but now needs to
because it acquires i_mutex.  Because there are virtual file systems which
don't bother with freeze / remount-ro protection we actually provide both
versions of the function - one which calls mnt_want_write() and one which does
not.

[AV: scratch the previous, mnt_want_write() has been moved to kern_path_create()
by now]

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:49 +04:00
Jan Kara 14ae417c6f sysfs: Push file_update_time() into bin_page_mkwrite()
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:47 +04:00
Jan Kara a63e9b2e76 gfs2: Push file_update_time() into gfs2_page_mkwrite()
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: cluster-devel@redhat.com
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:46 +04:00
Jan Kara 120c2bcad8 9p: Push file_update_time() into v9fs_vm_page_mkwrite()
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Ron Minnich <rminnich@sandia.gov>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: v9fs-developer@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:46 +04:00
Jan Kara 3ca9c3bd8a ceph: Push file_update_time() into ceph_page_mkwrite()
CC: Sage Weil <sage@newdream.net>
CC: ceph-devel@vger.kernel.org
Acked-by: Sage Weil <sage@newdream.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:45 +04:00
Jan Kara 5e8830dc85 fs: Push file_update_time() into __block_page_mkwrite()
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:44 +04:00
Al Viro 64894cf843 simplify lookup_open()/atomic_open() - do the temporary mnt_want_write() early
The write ref to vfsmount taken in lookup_open()/atomic_open() is going to
be dropped; we take the one to stay in dentry_open().  Just grab the temporary
in caller if it looks like we are going to need it (create/truncate/writable open)
and pass (by value) "has it succeeded" flag.  Instead of doing mnt_want_write()
inside, check that flag and treat "false" as "mnt_want_write() has just failed".
mnt_want_write() is cheap and the things get considerably simpler and more robust
that way - we get it and drop it in the same function, to start with, rather
than passing a "has something in the guts of really scary functions taken it"
back to caller.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 00:53:35 +04:00
Linus Torvalds 37cd9600a9 xfs: update for 3.6-rc1
Numerous cleanups and several bug fixes.  Here are some highlights:
 
 * Discontiguous directory buffer support
 * Inode allocator refactoring
 * Removal of the IO lock in inode reclaim
 * Implementation of .update_time
 * Fix for handling of EOF in xfs_vm_writepage
 * Fix for races in xfsaild, and idle mode is re-enabled
 * Fix for a crash in xfs_buf completion handlers on unmount.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQFtCvAAoJENaLyazVq6ZOIuEQAINJXb4SK9oBrdwGmq+Vsqf2
 Eh4OmzZmdnSPrxfFGmqvyL9DdUBvGBuidwOcVLMAXGtzbxE9USK9NuKC5zN/hJip
 8tIyv/8bqZ0aD4RJlHGN5zKFoQh/9Tag+JsaaqWstO8Ir1tA/5p04hDAz492btfT
 49SvnV64sJ1fi7pmaJblMWMMtlWJjD6iOldaHwnKBQ3LKmcgy9sD9DY5HiGOTr1j
 ecKtucX7B8Q9oFLKHaKEwTYZRRYDNuTbqZmI6hlEcA5hT280jotsGA4q/aXx/gHS
 lZuBaqVtNFT5WCKm+j/et76tmTfIh0CSbo64ZfgSOESy2BkEVXHg5XJ1gDvPdV+L
 6eBlUx3jaiNyFVHxVzFhzwKC/XdaITCd/ixFEogRDmoppDXencTCibLJXHNXxupN
 BCAyTLCxEJIE9WCeOMmwHA0450bMY4or13NGep57pIvG8GomtdG1WncTRIo84KV5
 0W5ocaUTGP7ROsr+KF8U9C7H866OHzVFijA+vvcTy8GtsT/xOCFxuJrqPVb+kgD7
 mIKaoK7iH6Kufu433TzsLEcUkF36gq/7NytPKjQhURLpZhxkHG3rq6LC0HXp6uuZ
 QgX5Y5Gl7SwDovIrndXmQXRnGrzvqHLguZl65+rB1CKggjemkLSdSLhryoNVjLU2
 iB7/hvzOUdYFMRRz2mLc
 =2wkC
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.6-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs update from Ben Myers:
 "Numerous cleanups and several bug fixes.  Here are some highlights:

   - Discontiguous directory buffer support
   - Inode allocator refactoring
   - Removal of the IO lock in inode reclaim
   - Implementation of .update_time
   - Fix for handling of EOF in xfs_vm_writepage
   - Fix for races in xfsaild, and idle mode is re-enabled
   - Fix for a crash in xfs_buf completion handlers on unmount."

Fix up trivial conflicts in fs/xfs/{xfs_buf.c,xfs_log.c,xfs_log_priv.h}
due to duplicate patches that had already been merged for 3.5.

* tag 'for-linus-v3.6-rc1' of git://oss.sgi.com/xfs/xfs: (44 commits)
  xfs: wait for the write the superblock on unmount
  xfs: re-enable xfsaild idle mode and fix associated races
  xfs: remove iolock lock classes
  xfs: avoid the iolock in xfs_free_eofblocks for evicted inodes
  xfs: do not take the iolock in xfs_inactive
  xfs: remove xfs_inactive_attrs
  xfs: clean up xfs_inactive
  xfs: do not read the AGI buffer in xfs_dialloc until nessecary
  xfs: refactor xfs_ialloc_ag_select
  xfs: add a short cut to xfs_dialloc for the non-NULL agbp case
  xfs: remove the alloc_done argument to xfs_dialloc
  xfs: split xfs_dialloc
  xfs: remove xfs_ialloc_find_free
  Prefix IO_XX flags with XFS_IO_XX to avoid namespace colision.
  xfs: remove xfs_inotobp
  xfs: merge xfs_itobp into xfs_imap_to_bp
  xfs: handle EOF correctly in xfs_vm_writepage
  xfs: implement ->update_time
  xfs: fix comment typo of struct xfs_da_blkinfo.
  xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks
  ...
2012-07-30 13:37:53 -07:00
Sage Weil 8842b3be96 ceph: clean up useless d_parent checks
d_parent is never NULL, and IS_ROOT() is the proper way to check for a
(non-self-referential) parent.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-30 09:29:54 -07:00
Al Viro f8310c5920 fix O_EXCL handling for devices
O_EXCL without O_CREAT has different semantics; it's "fail if already opened",
not "fail if already exists".  commit 71574865 broke that...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-30 11:50:30 +04:00
Mark Tinguely 9a57fa8ee7 xfs: wait for the write the superblock on unmount
v2: Add the xfs_buf_lock to xfs_quiesce_attr().
    Add explaination why xfs_buf_lock() is used to wait for write.

xfs_wait_buftarg() does not wait for the completion of the write of the
uncached superblock. This write can race with the shutdown of the log
and causes a panic if the write does not win the race.

During the log write, xfsaild_push() will lock the buffer and set the
XBF_ASYNC flag. Because the XBF_FLAG is set, complete() is not performed
on the buffer's iowait entry, we cannot call xfs_buf_iowait() to wait
for the write to complete. The buffer's lock is held until the write is
complete, so we can block on a xfs_buf_lock() request to be notified
that the write is complete.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:34:19 -05:00
Brian Foster 8375f922aa xfs: re-enable xfsaild idle mode and fix associated races
xfsaild idle mode logic currently leads to a couple hangs:

1.) If xfsaild is rescheduled in during an incremental scan
    (i.e., tout != 0) and the target has been updated since
    the previous run, we can hit the new target and go into
    idle mode with a still populated ail.
2.) A wake up is only issued when the target is pushed forward.
    The wake up can race with xfsaild if it is currently in the
    process of entering idle mode, causing future wake up
    events to be lost.

These hangs have been reproduced and verified as fixed by
running xfstests 273 in a loop on a slightly modified upstream
kernel. The kernel is modified to re-enable idle mode as
previously implemented (when count == 0) and with a revert of
commit 670ce93f, which includes performance improvements that
make this harder to reproduce.

The solution, the algorithm for which has been outlined by
Dave Chinner, is to modify xfsaild to enter idle mode only when
the ail is empty and the push target has not been moved forward
since the last push.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:27:57 -05:00
Christoph Hellwig 4f59af758f xfs: remove iolock lock classes
Content-Disposition: inline; filename=xfs-remove-iolock-classes

Now that we never take the iolock during inode reclaim we don't need
to play games with lock classes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:23:51 -05:00
Christoph Hellwig 5a15322da1 xfs: avoid the iolock in xfs_free_eofblocks for evicted inodes
Same rational as the last patch - these inodes are not reachable, so
don't bother with locking.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:22:20 -05:00
Christoph Hellwig 0b56185b0d xfs: do not take the iolock in xfs_inactive
An inode that enters xfs_inactive has been removed from all global
lists but the inode hash, and can't be recycled in xfs_iget before
it has been marked reclaimable.  Thus taking the iolock in here
is not nessecary at all, and given the amount of lockdep false
positives it has triggered already I'd rather remove the locking.

The only change outside of xfs_inactive is relaxing an assert in
xfs_itruncate_extents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:16:49 -05:00
Christoph Hellwig fe67be036f xfs: remove xfs_inactive_attrs
Remove this helper as the code flow is a lot more obvious when it gets
merged into its only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:15:33 -05:00
Christoph Hellwig b373e98daa xfs: clean up xfs_inactive
The code to reserve log space and join the inode to the transaction is
common for all cases, so don't duplicate it.  Also remove the trivial
xfs_inactive_symlink_local helper which can simply be opencode now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:13:09 -05:00
Christoph Hellwig be60fe54b2 xfs: do not read the AGI buffer in xfs_dialloc until nessecary
Refactor the AG selection loop in xfs_dialloc to operate on the in-memory
perag data as much as possible.  We only read the AGI buffer once we have
selected an AG to allocate inodes now instead of for every AG considered.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:10:54 -05:00
Christoph Hellwig 55d6af64cb xfs: refactor xfs_ialloc_ag_select
Loop over the in-core perag structures and prefer using pagi_freecount over
going out to the AGI buffer where possible.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:08:13 -05:00
Christoph Hellwig 4bb61069d2 xfs: add a short cut to xfs_dialloc for the non-NULL agbp case
In this case we already have selected an AG and know it has free space
beause the buffer lock never got released.  Jump directly into xfs_dialloc_ag
and short cut the AG selection loop.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:03:23 -05:00
Christoph Hellwig 08358906ed xfs: remove the alloc_done argument to xfs_dialloc
We can simplify check the IO_agbp pointer for being non-NULL instead of
passing another argument through two layers of function calls.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:00:31 -05:00
Christoph Hellwig f2ecc5e453 xfs: split xfs_dialloc
Move the actual allocation once we have selected an allocation group into a
separate helper, and make xfs_dialloc a wrapper around it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 15:56:49 -05:00
Al Viro bf8848918d lockd: handle lockowner allocation failure in nlmclnt_proc()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 23:17:39 +04:00
Al Viro 446945ab9a lockd: shift grabbing a reference to nlm_host into nlm_alloc_call()
It's used both for client and server hosts; we can't do nlmclnt_release_host()
on failure exits, since the host might need nlmsvc_release_host(), with BUG_ON()
for calling the wrong one.  Makes life simpler for callers, actually...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 23:09:57 +04:00
Kees Cook a51d9eaa41 fs: add link restriction audit reporting
Adds audit messages for unexpected link restriction violations so that
system owners will have some sort of potentially actionable information
about misbehaving processes.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:43:08 +04:00
Kees Cook 800179c9b8 fs: add link restrictions
This adds symlink and hardlink restrictions to the Linux VFS.

Symlinks:

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

The solution is to permit symlinks to only be followed when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.

Some pointers to the history of earlier discussion that I could find:

 1996 Aug, Zygo Blaxell
  http://marc.info/?l=bugtraq&m=87602167419830&w=2
 1996 Oct, Andrew Tridgell
  http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
 1997 Dec, Albert D Cahalan
  http://lkml.org/lkml/1997/12/16/4
 2005 Feb, Lorenzo Hernández García-Hierro
  http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
 2010 May, Kees Cook
  https://lkml.org/lkml/2010/5/30/144

Past objections and rebuttals could be summarized as:

 - Violates POSIX.
   - POSIX didn't consider this situation and it's not useful to follow
     a broken specification at the cost of security.
 - Might break unknown applications that use this feature.
   - Applications that break because of the change are easy to spot and
     fix. Applications that are vulnerable to symlink ToCToU by not having
     the change aren't. Additionally, no applications have yet been found
     that rely on this behavior.
 - Applications should just use mkstemp() or O_CREATE|O_EXCL.
   - True, but applications are not perfect, and new software is written
     all the time that makes these mistakes; blocking this flaw at the
     kernel is a single solution to the entire class of vulnerability.
 - This should live in the core VFS.
   - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
 - This should live in an LSM.
   - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)

Hardlinks:

On systems that have user-writable directories on the same partition
as system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp. The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e. a root process follows a hardlink created by another
user). Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.

The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.

Many Linux users are surprised when they learn they can link to files
they have no access to, so this change appears to follow the doctrine
of "least surprise". Additionally, this change does not violate POSIX,
which states "the implementation may require that the calling process
has permission to access the existing file"[1].

This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for
a while. Otherwise, the change has been undisruptive while in use in
Ubuntu for the last 1.5 years.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

This patch is based on the patches in Openwall and grsecurity, along with
suggestions from Al Viro. I have added a sysctl to enable the protected
behavior, and documentation.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:37:58 +04:00
Jeff Layton 3134f37e93 vfs: don't let do_last pass negative dentry to audit_inode
I can reliably reproduce the following panic by simply setting an audit
rule on a recent 3.5.0+ kernel:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
 IP: [<ffffffff810d1250>] audit_copy_inode+0x10/0x90
 PGD 7acd9067 PUD 7b8fb067 PMD 0
 Oops: 0000 [#86] SMP
 Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc tpm_bios btrfs zlib_deflate libcrc32c kvm_amd kvm joydev virtio_net pcspkr i2c_piix4 floppy virtio_balloon microcode virtio_blk cirrus drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
 CPU 0
 Pid: 1286, comm: abrt-dump-oops Tainted: G      D      3.5.0+ #1 Bochs Bochs
 RIP: 0010:[<ffffffff810d1250>]  [<ffffffff810d1250>] audit_copy_inode+0x10/0x90
 RSP: 0018:ffff88007aebfc38  EFLAGS: 00010282
 RAX: 0000000000000000 RBX: ffff88003692d860 RCX: 00000000000038c4
 RDX: 0000000000000000 RSI: ffff88006baf5d80 RDI: ffff88003692d860
 RBP: ffff88007aebfc68 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
 R13: ffff880036d30f00 R14: ffff88006baf5d80 R15: ffff88003692d800
 FS:  00007f7562634740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000040 CR3: 000000003643d000 CR4: 00000000000006f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process abrt-dump-oops (pid: 1286, threadinfo ffff88007aebe000, task ffff880079614530)
 Stack:
  ffff88007aebfdf8 ffff88007aebff28 ffff88007aebfc98 ffffffff81211358
  ffff88003692d860 0000000000000000 ffff88007aebfcc8 ffffffff810d4968
  ffff88007aebfcc8 ffff8800000038c4 0000000000000000 0000000000000000
 Call Trace:
  [<ffffffff81211358>] ? ext4_lookup+0xe8/0x160
  [<ffffffff810d4968>] __audit_inode+0x118/0x2d0
  [<ffffffff811955a9>] do_last+0x999/0xe80
  [<ffffffff81191fe8>] ? inode_permission+0x18/0x50
  [<ffffffff81171efa>] ? kmem_cache_alloc_trace+0x11a/0x130
  [<ffffffff81195b4a>] path_openat+0xba/0x420
  [<ffffffff81196111>] do_filp_open+0x41/0xa0
  [<ffffffff811a24bd>] ? alloc_fd+0x4d/0x120
  [<ffffffff811855cd>] do_sys_open+0xed/0x1c0
  [<ffffffff810d40cc>] ? __audit_syscall_entry+0xcc/0x300
  [<ffffffff811856c1>] sys_open+0x21/0x30
  [<ffffffff81611ca9>] system_call_fastpath+0x16/0x1b
  RSP <ffff88007aebfc38>
 CR2: 0000000000000040

The problem is that do_last is passing a negative dentry to audit_inode.
The comments on lookup_open note that it can pass back a negative dentry
if O_CREAT is not set.

This patch fixes the oops, but I'm not clear on whether there's a better
approach.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:27:03 +04:00
Al Viro e4fad8e5d2 consolidate pipe file creation
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:19 +04:00
Al Viro b5bcdda327 take grabbing f->f_path to do_dentry_open()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:18 +04:00
Al Viro 5c33b183a3 uninline file_free_rcu()
What inline?  Its only use is passing its address to call_rcu(), for fuck sake!

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:17 +04:00
Al Viro 0b1d90119a ecryptfs_lookup_interpose(): allocate dentry_info first
less work on failure that way

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:17 +04:00
Al Viro bc65a1215e sanitize ecryptfs_lookup()
* ->lookup() never gets hit with . or ..
* dentry it gets is unhashed, so unless we had gone and hashed it ourselves, there's
no need to d_drop() the sucker.
* wrong name printed in one of the printks (NULL, in fact)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:16 +04:00
Al Viro a8104a9fcd pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp.
One side effect - attempt to create a cross-device link on a read-only fs fails
with EROFS instead of EXDEV now.  Makes more sense, POSIX allows, etc.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:15 +04:00
Al Viro 8e4bfca1d1 mknod: take sanity checks on mode into the very beginning
Note that applying umask can't affect their results.  While
that affects errno in cases like
	mknod("/no_such_directory/a", 030000)
yielding -EINVAL (due to impossible mode_t) instead of
-ENOENT (due to inexistent directory), IMO that makes a lot
more sense, POSIX allows to return either and any software
that relies on getting -ENOENT instead of -EINVAL in that
case deserves everything it gets.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:14 +04:00
Al Viro 921a1650de new helper: done_path_create()
releases what needs to be released after {kern,user}_path_create()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:13 +04:00
Linus Torvalds 173f865474 The usual collection of bug fixes and optimizations. Perhaps of
greatest note is a speed up for parallel, non-allocating DIO writes,
 since we no longer take the i_mutex lock in that case.  For bug fixes,
 we fix an incorrect overhead calculation which caused slightly
 incorrect results for df(1) and statfs(2).  We also fixed bugs in the
 metadata checksum feature.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJQE1SzAAoJENNvdpvBGATwzOMQAKxVkaTqkMc+cUahLLUkFdGQ
 xnmigHufVuOvv+B8l1p6vbYx+qOftztqXF1WC41j8mTfEDs19jKXv3LI57TJ+bIW
 a/Kp1CjMicRs/HGhFPNWp1D+N8J95MTFP6Y9biXSmBBvefg2NSwxpg48yZtjUy1/
 Zl0414AqMvTJyqKKOUa++oyl3XlawnzDZ+6a7QPKsrAaDOU5977W4y2tZkNFk84d
 qRjTfaiX13aVe7XupgQHe/jtk40Sj38pyGVGiAGlHOZhZtYKE6MzB8OreGiMTy8z
 rmJz/CQUsQ8+sbKYhAcDru+bMrKghbO0u78XRz9CY+YpVFF/Xs2QiXoV0ZOGkIm6
 eokq7jEV0K+LtDEmM3PkmUPgXfYw5damTv8qWuBUFd4wtVE5x/DmK8AJVMidCAUN
 GkVR+rEbbEi7RCwsuac/aKB8baVQCTiJ5tfNTgWh9zll+9GZSk+U71Pp0KdcJGiS
 nxitAZ+20hZN2CQctlmaGbCPTPYCWU4hQ3IuMdTlQTQAs8S0y1FtylTRsXcC1eVR
 i1hBS/dVw5PVCaqoX79zYrByUymgX0ZaYY6seRT6+U9xPGDCSNJ9mjXZL3fttnOC
 d5gsx/pbMIAv52G5Hj6DfsXR2JFmmxsaIzsLtRvKi9q89d84XaZfbUsHYjn4Neym
 5lTKaSQHU71cKCxrStHC
 =VAVB
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "The usual collection of bug fixes and optimizations.  Perhaps of
  greatest note is a speed up for parallel, non-allocating DIO writes,
  since we no longer take the i_mutex lock in that case.

  For bug fixes, we fix an incorrect overhead calculation which caused
  slightly incorrect results for df(1) and statfs(2).  We also fixed
  bugs in the metadata checksum feature."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
  ext4: undo ext4_calc_metadata_amount if we fail to claim space
  ext4: don't let i_reserved_meta_blocks go negative
  ext4: fix hole punch failure when depth is greater than 0
  ext4: remove unnecessary argument from __ext4_handle_dirty_metadata()
  ext4: weed out ext4_write_super
  ext4: remove unnecessary superblock dirtying
  ext4: convert last user of ext4_mark_super_dirty() to ext4_handle_dirty_super()
  ext4: remove useless marking of superblock dirty
  ext4: fix ext4 mismerge back in January
  ext4: remove dynamic array size in ext4_chksum()
  ext4: remove unused variable in ext4_update_super()
  ext4: make quota as first class supported feature
  ext4: don't take the i_mutex lock when doing DIO overwrites
  ext4: add a new nolock flag in ext4_map_blocks
  ext4: split ext4_file_write into buffered IO and direct IO
  ext4: remove an unused statement in ext4_mb_get_buddy_page_lock()
  ext4: fix out-of-date comments in extents.c
  ext4: use s_csum_seed instead of i_csum_seed for xattr block
  ext4: use proper csum calculation in ext4_rename
  ext4: fix overhead calculation used by ext4_statfs()
  ...
2012-07-27 20:52:25 -07:00
Stanislav Kinsbursky 2c142baa7b NFSd: make boot_time variable per network namespace
NFSd's boot_time represents grace period start point in time.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky a51c84ed50 NFSd: make grace end flag per network namespace
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky 5630f7fa97 Lockd: move grace period management from lockd() to per-net functions
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky 5ccb0066f2 LockD: pass actual network namespace to grace period management functions
Passed network namespace replaced hard-coded init_net

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky db9c455341 LockD: manage grace list per network namespace
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky 9695c7057f SUNRPC: service request network namespace helper introduced
This is a cleanup patch - makes code looks simplier.
It replaces widely used rqstp->rq_xprt->xpt_net by introduced SVC_NET(rqstp).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:21 -04:00
Stanislav Kinsbursky 5e1533c788 NFSd: make nfsd4_manager allocated per network namespace context.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:21 -04:00
Stanislav Kinsbursky 08d44a35a9 LockD: make lockd manager allocated per network namespace
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky 66547b0251 LockD: manage grace period per network namespace
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky e2edaa98cb Lockd: add more debug to host shutdown functions
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky d5850ff9ea Lockd: host complaining function introduced
Just a small cleanup.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky caa4e76b6f LockD: manage used host count per networks namespace
This patch introduces moves nrhosts in per-net data.
It also adds kernel warning to nlm_shutdown_hosts_net() about remaining hosts
in specified network namespace context.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:48:43 -04:00
Stanislav Kinsbursky 3cf7fb07e0 LockD: manage garbage collection timeout per networks namespace
This patch moves next_gc to per-net data.

Note: passed network can be NULL (when Lockd kthread is exiting of Lockd
module is removing).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:48:43 -04:00
Stanislav Kinsbursky 27adaddc8d LockD: make garbage collector network namespace aware.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:48:43 -04:00
Stanislav Kinsbursky b26411f85d LockD: mark host per network namespace on garbage collect
This is required for per-network NLM shutdown and cleanup.
This patch passes init_net for a while.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:48:43 -04:00
J. Bruce Fields 99dbb8fe09 nfsd4: fix missing fault_inject.h include
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:30:12 -04:00
J. Bruce Fields 96d6d59cea locks: move lease-specific code out of locks_delete_lock
No point putting something only used by one caller into common code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:18:00 -04:00
Pavel Shilovsky 1a500f010f CIFS: Add SMB2 support for rmdir
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-27 15:17:50 -05:00
Pavel Shilovsky f958ca5d88 CIFS: Move rmdir code to ops struct
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-27 15:17:47 -05:00
Pavel Shilovsky a0e731839d CIFS: Add SMB2 support for mkdir operation
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-27 15:17:43 -05:00
Pavel Shilovsky f436720e94 CIFS: Separate protocol specific part from mkdir
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-27 15:17:40 -05:00
Pavel Shilovsky ff691e9694 CIFS: Simplify cifs_mkdir call
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-27 15:17:16 -05:00
Linus Torvalds 84eda28060 Merge branch 'kmap_atomic' of git://github.com/congwang/linux
Pull final kmap_atomic cleanups from Cong Wang:
 "This should be the final round of cleanup, as the definitions of enum
  km_type finally get removed from the whole tree.  The patches have
  been in linux-next for a long time."

* 'kmap_atomic' of git://github.com/congwang/linux:
  pipe: remove KM_USER0 from comments
  vmalloc: remove KM_USER0 from comments
  feature-removal-schedule.txt: remove kmap_atomic(page, km_type)
  tile: remove km_type definitions
  um: remove km_type definitions
  asm-generic: remove km_type definitions
  avr32: remove km_type definitions
  frv: remove km_type definitions
  powerpc: remove km_type definitions
  arm: remove km_type definitions
  highmem: remove the deprecated form of kmap_atomic
  tile: remove usage of enum km_type
  frv: remove the second parameter of kmap_atomic_primary()
  jbd2: remove the second argument of kmap_atomic
2012-07-27 11:26:48 -07:00
Filipe Brandenburger 3b6e2723f3 locks: prevent side-effects of locks_release_private before file_lock is initialized
When calling fcntl(fd, F_SETLEASE, lck) [with lck=F_WRLCK or F_RDLCK],
the custom signal or owner (if any were previously set using F_SETSIG
or F_SETOWN fcntls) would be reset when F_SETLEASE was called for the
second time on the same file descriptor.

This bug is a regression of 2.6.37 and is described here:
https://bugzilla.kernel.org/show_bug.cgi?id=43336

This patch reverts a commit from Oct 2004 (with subject "nfs4 lease:
move the f_delown processing") which originally introduced the
lm_release_private callback.

Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 09:39:55 -04:00
Stephen Rothwell a1857ebe75 Btrfs: using vmalloc and friends needs vmalloc.h
On powerpc, we don't get the implicit vmalloc.h include, and as a result
the build fails noisily:

  fs/btrfs/send.c: In function 'fs_path_free':
  fs/btrfs/send.c:185:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
  fs/btrfs/send.c: In function 'fs_path_ensure_buf':
  fs/btrfs/send.c:215:4: error: implicit declaration of function 'vmalloc' [-Werror=implicit-function-declaration]
  fs/btrfs/send.c:215:12: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c:225:12: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c:233:13: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c: In function 'iterate_dir_item':
  fs/btrfs/send.c:900:10: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c:909:11: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c: In function 'btrfs_ioctl_send':
  fs/btrfs/send.c:4463:17: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c:4469:17: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c:4475:2: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
  fs/btrfs/send.c:4475:20: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c:4483:21: warning: assignment makes pointer from integer without a cast [enabled by default]

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-26 18:08:30 -07:00
Linus Torvalds e2aed8dfa5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull large btrfs update from Chris Mason:
 "This pull request is very large, and the two main features in here
  have been under testing/devel for quite a while.

  We have subvolume quotas from the strato developers.  This enables
  full tracking of how many blocks are allocated to each subvolume (and
  all snapshots) and you can set limits on a per-subvolume basis.  You
  can also create quota groups and toss multiple subvolumes into a big
  group.  It's everything you need to be a web hosting company and give
  each user their own subvolume.

  The userland side of the quotas is being refreshed, they'll send out
  details on where to grab it soon.

  Next is the kernel side of btrfs send/receive from Alexander Block.
  This leverages the same infrastructure as the quota code to figure out
  relationships between blocks and their owners.  It can then compute
  the difference between two snapshots and sends the diffs in a neutral
  format into userland.

  The basic model:

        create a snapshot
        send that snapshot as the initial backup
        make changes
        create a second snapshot
        send the incremental as a backup
        delete the first snapshot
        (use the second snapshot for the next incremental)

  The receive portion is all in userland, and in the 'next' branch of my
  btrfs-progs repo.

  There's still some work to do in terms of optimizing the send side
  from kernel to userland.  The really important part is figuring out
  how two snapshots are different, and this is where we are
  concentrating right now.  The initial send of a dataset is a little
  slower than tar, but the incremental sends are dramatically faster
  than what rsync can do.

  On top of all of that, we have a nice queue of fixes, cleanups and
  optimizations."

Fix up trivial modify/del conflict in fs/btrfs/ioctl.c

Also fix up semantic conflict in fs/btrfs/send.c: the interface to
dentry_open() changed in commit 765927b2d5 ("switch dentry_open() to
struct path, make it grab references itself"), and since it now grabs
whatever references it needs, we should no longer do the mntget() on the
mnt (and we need to dput() the dentry reference we took).

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (65 commits)
  Btrfs: uninit variable fixes in send/receive
  Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive
  Btrfs: add btrfs_compare_trees function
  Btrfs: introduce subvol uuids and times
  Btrfs: make iref_to_path non static
  Btrfs: add a barrier before a waitqueue_active check
  Btrfs: call the ordered free operation without any locks held
  Btrfs: Check INCOMPAT flags on remount and add helper function
  Btrfs: add helper for tree enumeration
  btrfs: allow cross-subvolume file clone
  Btrfs: improve multi-thread buffer read
  Btrfs: make btrfs's allocation smoothly with preallocation
  Btrfs: lock the transition from dirty to writeback for an eb
  Btrfs: fix potential race in extent buffer freeing
  Btrfs: don't return true in releasepage unless we actually freed the eb
  Btrfs: suppress printk() if all device I/O stats are zero
  Btrfs: remove unwanted printk() for btrfs device I/O stats
  Btrfs: rewrite BTRFS_SETGET_FUNCS
  Btrfs: zero unused bytes in inode item
  Btrfs: kill free_space pointer from inode structure
  ...

Conflicts:
	fs/btrfs/ioctl.c
2012-07-26 14:48:55 -07:00
Linus Torvalds 548ed10228 dlm for 3.6
This set includes a major redesign of recording the master node for
 resources.  The old dir hash table, which just held the master node for
 each resource, has been removed.  The rsb hash table has always duplicated
 the master node value from the dir, and is now the single record of it.
 
 Having two full hash tables of all resources has always been a waste,
 especially since one just duplicated a single value from the other.
 Local requests will now often require one instead of two lengthy hash
 table searches.
 
 The other substantial change is made possible by the dirtbl removal, and
 fixes a long standing race between resource removal and lookup by
 reworking how removal is done.  At the same time it improves the
 efficiency of removal by avoiding repeated searches through a hash bucket.
 
 The other commits include minor fixes and changes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQEFuFAAoJEDgbc8f8gGmqklAQAL2gMCp3asEbbUa2FRd5cZQT
 IfELLBS0kqr6ghEDpV1XTVig8vCwhGoa9UC/jYEtqnmo02ImjCOvxM4wcSAjbEhc
 L0J/JwfQNPtAbs8NN7vo7p2n5DPCWcKr3YiksvrPpQdXXAVTAfC4SkK028hF6n8B
 LtMrADqsz+dTqg6UQTCRsMyZlLrFNPZiR5IRy72BQLmg+lcNNtARb0T++W2EfAAV
 03RnEUQPxbrKNZpVAAdncX5NUZz/D+DB5DMLF2q+hfquuGvheizmZHipnhOexeKV
 YZcQA/232BH0kqbKTGoIyqgMlCeKTgL56lBMyax3naEbBu4s80ZPOXjuA4pUpx5p
 24mCtXcozoZ9WCJZwSNUqVGfG22HKSMGXTVoXxteamL/8CQ9XRPAvH/UpHi38LBm
 pzQ+aZGHDXpDlMlQYTnD4n7NEiq8IV5p9sOZnK2EjSshoiWefQomx73wFpwqdRgm
 TYtb583BjdU1QYLYGiJTy9yqg693Ket3pSipOVoqH/qMGM+d7Z3cwoMvcoHTVkOf
 Z2XiFvfTjWcA+4givvUOWTd9FuFJ+KQr+mPo867lIJyuT0KDttVDwZyQPqN7m7ma
 ymS9tVmrmfVmq97XkypVDkZELchqaW41Y8BKQ4c5JxRDLQX6Di7Pq36S6XL7mrOf
 medHFc9e6jhrS45lvpw+
 =1bGM
 -----END PGP SIGNATURE-----

Merge tag 'dlm-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm updatesfrom David Teigland:
 "This set includes a major redesign of recording the master node for
  resources.  The old dir hash table, which just held the master node
  for each resource, has been removed.  The rsb hash table has always
  duplicated the master node value from the dir, and is now the single
  record of it.

  Having two full hash tables of all resources has always been a waste,
  especially since one just duplicated a single value from the other.
  Local requests will now often require one instead of two lengthy hash
  table searches.

  The other substantial change is made possible by the dirtbl removal,
  and fixes a long standing race between resource removal and lookup by
  reworking how removal is done.  At the same time it improves the
  efficiency of removal by avoiding repeated searches through a hash
  bucket.

  The other commits include minor fixes and changes."

* tag 'dlm-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: fix missing dir remove
  dlm: fix conversion deadlock from recovery
  dlm: use wait_event_timeout
  dlm: fix race between remove and lookup
  dlm: use idr instead of list for recovered rsbs
  dlm: use rsbtbl as resource directory
2012-07-26 14:03:42 -07:00
Linus Torvalds 98077a7205 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French.

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (40 commits)
  cifs: ensure that we always do cifsFileInfo_get under the spinlock
  CIFS: Make CAP_* checks protocol independent
  CIFS: Allow SMB2 statistics to be tracked
  CIFS: Move clear/print_stats code to ops struct
  CIFS: Add echo request support for SMB2
  CIFS: Move echo code to osp struct
  CIFS: Add SMB2 support for async requests
  CIFS: Setup async request in ops struct
  CIFS: Add SMB2 support for build_path_to_root
  CIFS: Move building path to root to ops struct
  CIFS: Query SMB2 inode info
  CIFS: Move query inode info code to ops struct
  CIFS: Add SMB2 support for is_path_accessible
  CIFS: Move is_path_accessible to ops struct
  CIFS: Move informational tcon calls to ops struct
  CIFS: Move getting dfs referalls to ops struct
  CIFS: Process reconnects for SMB2 shares
  CIFS: Add tree connect/disconnect capability for SMB2
  CIFS: Add session setup/logoff capability for SMB2
  CIFS: Add capability to send SMB2 negotiate message
  ...
2012-07-26 14:00:52 -07:00
Josh Boyer 8ded2bbc18 posix_types.h: Cleanup stale __NFDBITS and related definitions
Recently, glibc made a change to suppress sign-conversion warnings in
FD_SET (glibc commit ceb9e56b3d1).  This uncovered an issue with the
kernel's definition of __NFDBITS if applications #include
<linux/types.h> after including <sys/select.h>.  A build failure would
be seen when passing the -Werror=sign-compare and -D_FORTIFY_SOURCE=2
flags to gcc.

It was suggested that the kernel should either match the glibc
definition of __NFDBITS or remove that entirely.  The current in-kernel
uses of __NFDBITS can be replaced with BITS_PER_LONG, and there are no
uses of the related __FDELT and __FDMASK defines.  Given that, we'll
continue the cleanup that was started with commit 8b3d1cda4f
("posix_types: Remove fd_set macros") and drop the remaining unused
macros.

Additionally, linux/time.h has similar macros defined that expand to
nothing so we'll remove those at the same time.

Reported-by: Jeff Law <law@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
[ .. and fix up whitespace as per akpm ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-26 13:36:43 -07:00
Linus Torvalds fa93669a19 Driver core merge for 3.6-rc1
Here's the big driver core pull request for 3.6-rc1.
 
 Unlike 3.5, this kernel should be a lot tamer, with the printk changes now
 settled down.  All we have here is some extcon driver updates, w1 driver
 updates, a few printk cleanups that weren't needed for 3.5, but are good to
 have now, and some other minor fixes/changes in the driver core.
 
 All of these have been in the linux-next releases for a while now.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAlARgIUACgkQMUfUDdst+ynDHgCfRNwIB9L+zZvjcKE5e1BhDbUl
 wVUAn398DFgbJ1+PjGkd1EMR2uVTh7Ou
 =MIFu
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core changes from Greg Kroah-Hartman:
 "Here's the big driver core pull request for 3.6-rc1.

  Unlike 3.5, this kernel should be a lot tamer, with the printk changes
  now settled down.  All we have here is some extcon driver updates, w1
  driver updates, a few printk cleanups that weren't needed for 3.5, but
  are good to have now, and some other minor fixes/changes in the driver
  core.

  All of these have been in the linux-next releases for a while now.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'driver-core-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits)
  printk: Export struct log size and member offsets through vmcoreinfo
  Drivers: hv: Change the hex constant to a decimal constant
  driver core: don't trigger uevent after failure
  extcon: MAX77693: Add extcon-max77693 driver to support Maxim MAX77693 MUIC device
  sysfs: fail dentry revalidation after namespace change fix
  sysfs: fail dentry revalidation after namespace change
  extcon: spelling of detach in function doc
  extcon: arizona: Stop microphone detection if we give up on it
  extcon: arizona: Update cable reporting calls and split headset
  PM / Runtime: Do not increment device usage counts before probing
  kmsg - do not flush partial lines when the console is busy
  kmsg - export "continuation record" flag to /dev/kmsg
  kmsg - avoid warning for CONFIG_PRINTK=n compilations
  kmsg - properly print over-long continuation lines
  driver-core: Use kobj_to_dev instead of re-implementing it
  driver-core: Move kobj_to_dev from genhd.h to device.h
  driver core: Move deferred devices to the end of dpm_list before probing
  driver core: move uevent call to driver_register
  driver core: fix shutdown races with probe/remove(v3)
  Extcon: Arizona: Add driver for Wolfson Arizona class devices
  ...
2012-07-26 11:25:33 -07:00
Linus Torvalds b13bc8dda8 Staging tree patches for 3.6-rc1
Here's the big staging tree merge for the 3.6-rc1 merge window.
 
 There are some patches in here outside of drivers/staging/, notibly the iio
 code (which is still stradeling the staging / not staging boundry), the pstore
 code, and the tracing code.  All of these have gotten ackes from the various
 subsystem maintainers to be included in this tree.  The pstore and tracing
 patches are related, and are coming here as they replace one of the android
 staging drivers.
 
 Otherwise, the normal staging mess.  Lots of cleanups and a few new drivers
 (some iio drivers, and the large csr wireless driver abomination.)
 
 Note, you will get a merge issue with the following files:
 	drivers/staging/comedi/drivers/s626.h
 	drivers/staging/gdm72xx/netlink_k.c
 both of which should be trivial for you to handle.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAlAQiD8ACgkQMUfUDdst+ykxhgCeMUjvc+1RTtSprzvkzpejgoUU
 6A4AnAleWMnkaCD8vruGnRdGl/Qtz51+
 =mN6M
 -----END PGP SIGNATURE-----

Merge tag 'staging-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging tree patches from Greg Kroah-Hartman:
 "Here's the big staging tree merge for the 3.6-rc1 merge window.

  There are some patches in here outside of drivers/staging/, notibly
  the iio code (which is still stradeling the staging / not staging
  boundry), the pstore code, and the tracing code.  All of these have
  gotten acks from the various subsystem maintainers to be included in
  this tree.  The pstore and tracing patches are related, and are coming
  here as they replace one of the android staging drivers.

  Otherwise, the normal staging mess.  Lots of cleanups and a few new
  drivers (some iio drivers, and the large csr wireless driver
  abomination.)

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

Fixed up trivial conflicts in drivers/staging/comedi/drivers/s626.h and
drivers/staging/gdm72xx/netlink_k.c

* tag 'staging-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1108 commits)
  staging: csr: delete a bunch of unused library functions
  staging: csr: remove csr_utf16.c
  staging: csr: remove csr_pmem.h
  staging: csr: remove CsrPmemAlloc
  staging: csr: remove CsrPmemFree()
  staging: csr: remove CsrMemAllocDma()
  staging: csr: remove CsrMemCalloc()
  staging: csr: remove CsrMemAlloc()
  staging: csr: remove CsrMemFree() and CsrMemFreeDma()
  staging: csr: remove csr_util.h
  staging: csr: remove CsrOffSetOf()
  stating: csr: remove unneeded #includes in csr_util.c
  staging: csr: make CsrUInt16ToHex static
  staging: csr: remove CsrMemCpy()
  staging: csr: remove CsrStrLen()
  staging: csr: remove CsrVsnprintf()
  staging: csr: remove CsrStrDup
  staging: csr: remove CsrStrChr()
  staging: csr: remove CsrStrNCmp
  staging: csr: remove CsrStrCmp
  ...
2012-07-26 11:14:49 -07:00
Chris Mason b24baf6917 Btrfs: uninit variable fixes in send/receive
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-25 19:21:10 -04:00
Chris Mason 113c1cb530 Merge branch 'send-v2' of git://github.com/ablock84/linux-btrfs into for-linus
This is the kernel portion of btrfs send/receive

Conflicts:
	fs/btrfs/Makefile
	fs/btrfs/backref.h
	fs/btrfs/ctree.c
	fs/btrfs/ioctl.c
	fs/btrfs/ioctl.h

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-25 19:19:10 -04:00
Alexander Block 31db9f7c23 Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive
This patch introduces the BTRFS_IOC_SEND ioctl that is
required for send. It allows btrfs-progs to implement
full and incremental sends. Patches for btrfs-progs will
follow.

Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
2012-07-25 23:30:19 +02:00
Alexander Block 7069830a9e Btrfs: add btrfs_compare_trees function
This function is used to find the differences between
two trees. The tree compare skips whole subtrees if it
detects shared tree blocks and thus is pretty fast.

Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
2012-07-25 23:30:14 +02:00
Alexander Block 8ea05e3a42 Btrfs: introduce subvol uuids and times
This patch introduces uuids for subvolumes. Each
subvolume has it's own uuid. In case it was snapshotted,
it also contains parent_uuid. In case it was received,
it also contains received_uuid.

It also introduces subvolume ctime/otime/stime/rtime. The
first two are comparable to the times found in inodes. otime
is the origin/creation time and ctime is the change time.
stime/rtime are only valid on received subvolumes.
stime is the time of the subvolume when it was
sent. rtime is the time of the subvolume when it was
received.

Additionally to the times, we have a transid for each
time. They are updated at the same place as the times.

btrfs receive uses stransid and rtransid to find out
if a received subvolume changed in the meantime.

If an older kernel mounts a filesystem with the
extented fields, all fields become invalid. The next
mount with a new kernel will detect this and reset the
fields.

Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
2012-07-25 23:28:38 +02:00
Alexander Block 91cb916ca2 Btrfs: make iref_to_path non static
Make iref_to_path non static (needed in send) and rename
it to btrfs_iref_to_path

Signed-off-by: Alexander Block <ablock84@googlemail.com>
2012-07-25 23:25:06 +02:00
Chris Mason cd1cfc4915 Btrfs: add a barrier before a waitqueue_active check
We were missing wakeups on the delayed ref waitqueue due
to races on waitqueue_active.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-25 16:15:08 -04:00
Chris Mason e9fbcb4220 Btrfs: call the ordered free operation without any locks held
Each ordered operation has a free callback, and this was called with the
worker spinlock held.  Josef made the free callback also call iput,
which we can't do with the spinlock.

This drops the spinlock for the free operation and grabs it again before
moving through the rest of the list.  We'll circle back around to this
and find a cleaner way that doesn't bounce the lock around so much.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cc: stable@kernel.org
2012-07-25 16:15:07 -04:00
Mitch Harder 2b0ce2c290 Btrfs: Check INCOMPAT flags on remount and add helper function
In support of the recently added capability to remount with lzo
compression, provide a helper function to check the compression
INCOMPAT flags when remounting with lzo compression, and set
the flags if necessary.

Also, implement the new helper function when defragmenting with
explicit lzo compression and when setting the default subvolume.

Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-25 16:14:31 -04:00
Chris Mason b478b2baa3 Merge branch 'qgroup' of git://git.jan-o-sch.net/btrfs-unstable into for-linus
Conflicts:
	fs/btrfs/ioctl.c
	fs/btrfs/ioctl.h
	fs/btrfs/transaction.c
	fs/btrfs/transaction.h

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-25 16:11:38 -04:00
Jeff Layton 764a1b1ace cifs: ensure that we always do cifsFileInfo_get under the spinlock
The readpages bug is a regression that was introduced in 6993f74a5.
This also fixes a couple of similar bugs in the uncached read and write
codepaths.

Also, prevent this sort of thing in the future by having cifsFileInfo_get
take the spinlock itself, and adding a _locked variant for use in places
that are already holding the lock. The _put code has always done that
so this makes for a less confusing interface.

Cc: <stable@vger.kernel.org> # 3.5.x
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-25 14:51:30 -05:00
Arne Jansen e679376911 Btrfs: add helper for tree enumeration
Often no exact match is wanted but just the next lower or
higher item. There's a lot of duplicated code throughout
btrfs to deal with the corner cases. This patch adds a
helper function that can facilitate searching.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-07-25 17:33:18 +02:00
David Sterba 362a20c5e2 btrfs: allow cross-subvolume file clone
Lift the EXDEV condition and allow different root trees for files being
cloned, then pass source inode's root when searching for extents.
Cloning is not allowed to cross vfsmounts, ie. when two subvolumes from
one filesystem are mounted separately.

Signed-off-by: David Sterba <dsterba@suse.cz>
2012-07-25 17:33:09 +02:00
Stanislav Kinsbursky 57c8b13e3c NFSd: set nfsd_serv to NULL after service destruction
In nfsd_destroy():

	if (destroy)
		svc_shutdown_net(nfsd_serv, net);
	svc_destroy(nfsd_server);

svc_shutdown_net(nfsd_serv, net) calls nfsd_last_thread(), which sets
nfsd_serv to NULL, causing a NULL dereference on the following line.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-25 09:21:31 -04:00
Stanislav Kinsbursky 19f7e2ca44 NFSd: introduce nfsd_destroy() helper
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-25 09:21:30 -04:00
J. Bruce Fields a007c4c3e9 nfsd: add get_uint for u32's
I don't think there's a practical difference for the range of values
these interfaces should see, but it would be safer to be unambiguous.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-25 09:18:27 -04:00
Stanislav Kinsbursky a6d88f293e NFSd: fix locking in nfsd_forget_delegations()
This patch adds recall_lock hold to nfsd_forget_delegations() to protect
nfsd_process_n_delegations() call.
Also, looks like it would be better to collect delegations to some local
on-stack list, and then unhash collected list. This split allows to
simplify locking, because delegation traversing is protected by recall_lock,
when delegation unhash is protected by client_mutex.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-25 09:18:27 -04:00
Vivek Trivedi 5559b50acd nfsd4: fix cr_principal comparison check in same_creds
This fixes a wrong check for same cr_principal in same_creds

Introduced by 8fbba96e5b "nfsd4: stricter
cred comparison for setclientid/exchange_id".

Cc: stable@vger.kernel.org
Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-25 09:05:30 -04:00
Linus Torvalds 801b03653f Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull GFS2 updates from Steven Whitehouse.

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: Eliminate 64-bit divides
  GFS2: Reduce file fragmentation
  GFS2: kernel panic with small gfs2 filesystems - 1 RG
  GFS2: Fixing double brelse'ing bh allocated in gfs2_meta_read when EIO occurs
  GFS2: Combine functions get_local_rgrp and gfs2_inplace_reserve
  GFS2: Add kobject release method
  GFS2: Size seq_file buffer more carefully
  GFS2: Use seq_vprintf for glocks debugfs file
  seq_file: Add seq_vprintf function and export it
  GFS2: Use lvbs for storing rgrp information with mount option
  GFS2: Cache last hash bucket for glock seq_files
  GFS2: Increase buffer size for glocks and glstats debugfs files
  GFS2: Fix error handling when reading an invalid block from the journal
  GFS2: Add "top dir" flag support
  GFS2: Fold quota data into the reservations struct
  GFS2: Extend the life of the reservations
2012-07-24 17:57:05 -07:00
Linus Torvalds 08d9329c29 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull misc udf, ext2, ext3, and isofs fixes from Jan Kara:
 "Assorted, mostly trivial, fixes for udf, ext2, ext3, and isofs.  I'm
  on vacation and scarcely checking email since we are expecting baby
  any day now but these fixes should be safe to go in and I don't want
  to delay them unnecessarily."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: avoid info leak on export
  isofs: avoid info leak on export
  udf: Improve table length check to avoid possible overflow
  ext3: Check return value of blkdev_issue_flush()
  jbd: Check return value of blkdev_issue_flush()
  udf: Do not decrement i_blocks when freeing indirect extent block
  udf: Fix memory leak when mounting
  ext2: cleanup the confused goto label
  UDF: Remove unnecessary variable "offset" from udf_fill_inode
  udf: stop using s_dirt
  ext3: force ro mount if ext3_setup_super() fails
  quota: fix checkpatch.pl warning by replacing <asm/uaccess.h> with <linux/uaccess.h>
2012-07-24 17:40:44 -07:00
Linus Torvalds d14b7a419a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "Trivial updates all over the place as usual."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (29 commits)
  Fix typo in include/linux/clk.h .
  pci: hotplug: Fix typo in pci
  iommu: Fix typo in iommu
  video: Fix typo in drivers/video
  Documentation: Add newline at end-of-file to files lacking one
  arm,unicore32: Remove obsolete "select MISC_DEVICES"
  module.c: spelling s/postition/position/g
  cpufreq: Fix typo in cpufreq driver
  trivial: typo in comment in mksysmap
  mach-omap2: Fix typo in debug message and comment
  scsi: aha152x: Fix sparse warning and make printing pointer address more portable.
  Change email address for Steve Glendinning
  Btrfs: fix typo in convert_extent_bit
  via: Remove bogus if check
  netprio_cgroup.c: fix comment typo
  backlight: fix memory leak on obscure error path
  Documentation: asus-laptop.txt references an obsolete Kconfig item
  Documentation: ManagementStyle: fixed typo
  mm/vmscan: cleanup comment error in balance_pgdat
  mm: cleanup on the comments of zone_reclaim_stat
  ...
2012-07-24 13:34:56 -07:00
Pavel Shilovsky 29e20f9c65 CIFS: Make CAP_* checks protocol independent
Since both CIFS and SMB2 use ses->capabilities (server->capabilities)
field but flags are different we should make such checks protocol
independent.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 14:12:03 -05:00
Pavel Shilovsky d60622eb5a CIFS: Allow SMB2 statistics to be tracked
Since there are only 19 command codes, it also is easier to track by exact
command code than it was for cifs.

Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:20 +04:00
Pavel Shilovsky 44c581866e CIFS: Move clear/print_stats code to ops struct
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:18 +04:00
Pavel Shilovsky 9094fad1ed CIFS: Add echo request support for SMB2
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:17 +04:00
Pavel Shilovsky f6d7617862 CIFS: Move echo code to osp struct
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:15 +04:00
Pavel Shilovsky c95b8eeda3 CIFS: Add SMB2 support for async requests
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:14 +04:00
Pavel Shilovsky 45740847e2 CIFS: Setup async request in ops struct
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:12 +04:00
Pavel Shilovsky 25e266320c CIFS: Add SMB2 support for build_path_to_root
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:11 +04:00
Pavel Shilovsky 9224dfc2f9 CIFS: Move building path to root to ops struct
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:10 +04:00
Pavel Shilovsky be4cb9e3d4 CIFS: Query SMB2 inode info
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:08 +04:00
Pavel Shilovsky 1208ef1f76 CIFS: Move query inode info code to ops struct
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:07 +04:00
Pavel Shilovsky 2503a0dba9 CIFS: Add SMB2 support for is_path_accessible
that needs for a successful mount through SMB2 protocol.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:05 +04:00
Pavel Shilovsky 68889f269b CIFS: Move is_path_accessible to ops struct
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:04 +04:00
Pavel Shilovsky af4281dc22 CIFS: Move informational tcon calls to ops struct
and rename variables in cifs_mount.

Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:02 +04:00
Pavel Shilovsky b669f33ca6 CIFS: Move getting dfs referalls to ops struct
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:55:01 +04:00
Pavel Shilovsky aa24d1e969 CIFS: Process reconnects for SMB2 shares
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:54:59 +04:00
Pavel Shilovsky faaf946a7d CIFS: Add tree connect/disconnect capability for SMB2
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:54:58 +04:00
Pavel Shilovsky 5478f9ba9a CIFS: Add session setup/logoff capability for SMB2
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:54:57 +04:00
Pavel Shilovsky ec2e4523fd CIFS: Add capability to send SMB2 negotiate message
and add negotiate request type to let set_credits know that
we are only on negotiate stage and no need to make a decision
about disabling echos and oplocks.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:54:55 +04:00
Pavel Shilovsky 3792c17328 CIFS: Respect SMB2 header/max header size
Use SMB2 header size values for allocation and memset because they
are bigger and suitable for both CIFS and SMB2.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:54:54 +04:00
Pavel Shilovsky 093b2bdad3 CIFS: Make demultiplex_thread work with SMB2 code
Now we can process SMB2 messages: check message, get message id
and wakeup awaiting routines.

Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:54:52 +04:00
Pavel Shilovsky 4b1241006c CIFS: Fix a wrong pointer in atomic_open
Commit 30d9049474 caused a regression
in cifs open codepath.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 21:54:26 +04:00
Pavel Shilovsky 28ea5290d7 CIFS: Add SMB2 credits support
For SMB2 protocol we can add more than one credit for one received
request: it depends on CreditRequest field in SMB2 response header.
Also we divide all requests by type: echoes, oplocks and others.
Each type uses its own slot pull.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 10:25:23 -05:00
Pavel Shilovsky 2dc7e1c033 CIFS: Make transport routines work with SMB2
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 10:25:20 -05:00
Steve French ddfbefbd39 CIFS: Map SMB2 status codes to POSIX errors
Add mapping table for 32 bit SMB2 status codes to linux errors.
Note that SMB2 does not use DOS/OS2 errors (ever) so mapping to
DOS/OS2 errors as a common network subset (as we do for cifs)
doesn't help. And note that the set of status codes is much more
complete here.

Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 10:25:15 -05:00
Pavel Shilovsky b8030603d9 CIFS: Add SMB2 status codes
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 10:25:13 -05:00
Pavel Shilovsky f7ec0d0bbc CIFS: Rename 7 error codes to NT_ style
and consider such codes as CIFS errors.

Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 10:25:10 -05:00
Pavel Shilovsky 6d5786a34d CIFS: Rename Get/FreeXid and make them work with unsigned int
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 10:25:08 -05:00
Pavel Shilovsky 2e6e02ab6d CIFS: Move protocol specific tcon/tdis code to ops struct
and rename variables around the code changes.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 10:25:06 -05:00
Pavel Shilovsky 58c45c58a1 CIFS: Move protocol specific session setup/logoff code to ops struct
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 10:25:03 -05:00
Cong Wang 2164d33446 pipe: remove KM_USER0 from comments
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-07-24 15:27:34 +08:00
Pavel Shilovsky 286170aa24 CIFS: Move protocol specific negotiate code to ops struct
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 00:33:26 -05:00
Pavel Shilovsky a891f0f895 CIFS: Extend credit mechanism to process request type
Split all requests to echos, oplocks and others - each group uses
its own credit slot. This is indicated by new flags

CIFS_ECHO_OP and CIFS_OBREAK_OP

that are not used now for CIFS. This change is required to support
SMB2 protocol because of different processing of these commands.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 00:32:48 -05:00
Pavel Shilovsky 316cf94a91 CIFS: Move trans2 processing to ops struct
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-24 00:32:25 -05:00
Linus Torvalds 83c7f72259 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Benjamin Herrenschmidt:
 "Notable highlights:

   - iommu improvements from Anton removing the per-iommu global lock in
     favor of dividing the DMA space into pools, each with its own lock,
     and hashed on the CPU number.  Along with making the locking more
     fine grained, this gives significant improvements in multiqueue
     networking scalability.

   - Still from Anton, we know provide a vdso based variant of getcpu
     which makes sched_getcpu with the appropriate glibc patch something
     like 18 times faster.

   - More anton goodness (he's been busy !) in other areas such as a
     faster __clear_user and copy_page on P7, various perf fixes to
     improve sampling quality, etc...

   - One more step toward removing legacy i2c interfaces by using new
     device-tree based probing of platform devices for the AOA audio
     drivers

   - A nice series of patches from Michael Neuling that helps avoiding
     confusion between register numbers and litterals in assembly code,
     trying to enforce the use of "%rN" register names in gas rather
     than plain numbers.

   - A pile of FSL updates

   - The usual bunch of small fixes, cleanups etc...

  You may spot a change to drivers/char/mem.  The patch got no comment
  or ack from outside, it's a trivial patch to allow the architecture to
  skip creating /dev/port, which we use to disable it on ppc64 that
  don't have a legacy brige.  On those, IO ports 0...64K are not mapped
  in kernel space at all, so accesses to /dev/port cause oopses (and
  yes, distros -still- ship userspace that bangs hard coded ports such
  as kbdrate)."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits)
  powerpc/mpic: Create a revmap with enough entries for IPIs and timers
  Remove stale .rej file
  powerpc/iommu: Fix iommu pool initialization
  powerpc/eeh: Check handle_eeh_events() return value
  powerpc/85xx: Add phy nodes in SGMII mode for MPC8536/44/72DS & P2020DS
  powerpc/e500: add paravirt QEMU platform
  powerpc/mpc85xx_ds: convert to unified PCI init
  powerpc/fsl-pci: get PCI init out of board files
  powerpc/85xx: Update corenet64_smp_defconfig
  powerpc/85xx: Update corenet32_smp_defconfig
  powerpc/85xx: Rename P1021RDB-PC device trees to be consistent
  powerpc/watchdog: move booke watchdog param related code to setup-common.c
  sound/aoa: Adapt to new i2c probing scheme
  i2c/powermac: Improve detection of devices from device-tree
  powerpc: Disable /dev/port interface on systems without an ISA bridge
  of: Improve prom_update_property() function
  powerpc: Add "memory" attribute for mfmsr()
  powerpc/ftrace: Fix assembly trampoline register usage
  powerpc/hw_breakpoints: Fix incorrect pointer access
  powerpc: Put the gpr save/restore functions in their own section
  ...
2012-07-23 18:54:23 -07:00
Jeff Layton 7659624ffb cifs: reinstate sec=ntlmv2 mount option
sec=ntlmv2 as a mount option got dropped in the mount option overhaul.

Cc: Sachin Prabhu <sprabhu@redhat.com>
Cc: <stable@vger.kernel.org> # 3.4+
Reported-by: Günter Kukkukk <linux@kukkukk.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-23 20:48:24 -05:00
Linus Torvalds 93912fe69d * Added another debugfs knob for forcing UBIFS R/O mode without flushing caches
or finishing commit or any other I/O operation. I've originally added this
   knob in order to reproduce the free space fixup bug (see c672793) on nandsim.
   Without this knob I would have to do real power-cuts, which would make
   debugging much harder. Then I've decided to keep this knob because it is also
   useful for UBIFS power-cut recovery end error-paths testing.
 * Well-spotted fix from Julia. This bug did not cause real troubles for
   UBIFS, but nevertheless it could cause issues for someone trying to modify
   the orphans handling code. Kudos to coccinelle!
 * Minor cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQDQxSAAoJECmIfjd9wqK0W3QQAKCbDBNMJaj/idXcupatoOI2
 4M3IBJ5mwfjrBeXy1kcsxxSadW7fJTy6FK7jfPpYq2iCLmFkh0Ba6QSgHHgCmb1v
 3ExIR2+2hLl1YhliDU1q+k9H4fGDklq/CNgiTGDti0pdPczZfBKrksXQy8leqT8d
 MxzQDzQsCxVKIizHHMEr373s1y8QHhabFK1LU1wkXJK/p/bYS0mgD0BOjIrZKuCd
 W5L5XzIIbcXs1h91Lgdnr8fj5TmMtM1tQ890U2OaNPt+44Eh6Gjjku3N8RaWvaIe
 vXMpDNb00NwtJp0V2SIsez3VtaA1xJaW7bd4UcyNNvBy5aECNx8K0IbZKTUkiMSj
 3xOiqZr/LWpPkAq3NL6kAZkgiam9Aez5Yc+XTna3HP79L/LYbXsJiOewlDwvGbrh
 89+EatjLUVFh00BvzmmxuZkm2bWm3sPQiAKWqwc7ijMGwnPt+c6Eiepgw3XXVHgI
 qmw1Ddkh4VxdGXi8r3RMxg21MqFttM0astT5cmQeF//H6Wq3S2u8RFgZqaLpGc6s
 vcgx92V6BBrWzCZKxBQs3Bd21uSrVsOa/js5wORpuWZAGk1Yy1BFy1vFC0PuqmcI
 kvZa/bQPP3/PiHAI8qq2Min5baoG9jUl6ui1eM8KXECkzQSQJjxN1bth8kzZHuPY
 HMNoK0RVUt3V5xtYI7i7
 =ecyI
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.6-rc1' of git://git.infradead.org/linux-ubifs

Pull UBIFS updates from Artem Bityutskiy:

 - Added another debugfs knob for forcing UBIFS R/O mode without
   flushing caches or finishing commit or any other I/O operation.  I've
   originally added this knob in order to reproduce the free space fixup
   bug (see commit c6727932cfdb: "UBIFS: fix a bug in empty space
   fix-up") on nandsim.

   Without this knob I would have to do real power-cuts, which would
   make debugging much harder.  Then I've decided to keep this knob
   because it is also useful for UBIFS power-cut recovery end
   error-paths testing.

 - Well-spotted fix from Julia.  This bug did not cause real troubles
   for UBIFS, but nevertheless it could cause issues for someone trying
   to modify the orphans handling code.  Kudos to coccinelle!

 - Minor cleanups.

* tag 'upstream-3.6-rc1' of git://git.infradead.org/linux-ubifs:
  UBIFS: remove invalid reference to list iterator variable
  UBIFS: simplify reply code a bit
  UBIFS: add debugfs knob to switch to R/O mode
  UBIFS: fix compilation warning
2012-07-23 15:50:52 -07:00
Jeff Layton 762a4206a3 cifs: rename cifs_sign_smb2 to cifs_sign_smbv
"smb2" makes me think of the SMB2.x protocol, which isn't at all what
this function is for...

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-23 16:36:34 -05:00
Jeff Layton d971e0656b cifs: remove bogus reset of smb_buf_length in smb_send routines
There's a comment here about how we don't want to modify this length,
but nothing in this function actually does.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-23 16:36:31 -05:00
Jeff Layton c5fd363d77 cifs: move file_lock off stack in cifs_push_posix_locks
struct file_lock is pretty large, so we really don't want that on the
stack in a potentially long call chain. Reorganize the arguments to
CIFSSMBPosixLock to eliminate the need for that.

Eliminate the get_flag and simply use a non-NULL pLockInfo to indicate
that this is a "get" operation. In order to do that, need to add a new
loff_t argument for the start_offset.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-23 16:36:29 -05:00
Jeff Layton ac3aa2f8ae cifs: remove extraneous newlines from cERROR and cFYI calls
Those macros add a newline on their own, so there's not any need to
embed one in the message itself.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-23 16:36:26 -05:00
Jeff Layton 00401ff780 cifs: after upcalling for krb5 creds, invalidate key rather than revoking it
Calling key_revoke here isn't ideal as further requests for the key will
end up returning -EKEYREVOKED until it gets purged from the cache. What we
really intend here is to force a new upcall on the next request_key.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-23 16:36:24 -05:00
Liu Bo 67c9684f48 Btrfs: improve multi-thread buffer read
While testing with my buffer read fio jobs[1], I find that btrfs does not
perform well enough.

Here is a scenario in fio jobs:

We have 4 threads, "t1 t2 t3 t4", starting to buffer read a same file,
and all of them will race on add_to_page_cache_lru(), and if one thread
successfully puts its page into the page cache, it takes the responsibility
to read the page's data.

And what's more, reading a page needs a period of time to finish, in which
other threads can slide in and process rest pages:

     t1          t2          t3          t4
   add Page1
   read Page1  add Page2
     |         read Page2  add Page3
     |            |        read Page3  add Page4
     |            |           |        read Page4
-----|------------|-----------|-----------|--------
     v            v           v           v
    bio          bio         bio         bio

Now we have four bios, each of which holds only one page since we need to
maintain consecutive pages in bio.  Thus, we can end up with far more bios
than we need.

Here we're going to
a) delay the real read-page section and
b) try to put more pages into page cache.

With that said, we can make each bio hold more pages and reduce the number
of bios we need.

Here is some numbers taken from fio results:
         w/o patch                 w patch
       -------------  --------  ---------------
READ:    745MB/s        +25%       934MB/s

[1]:
[global]
group_reporting
thread
numjobs=4
bs=32k
rw=read
ioengine=sync
directory=/mnt/btrfs/

[READ]
filename=foobar
size=2000M
invalidate=1

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-23 16:28:10 -04:00
Liu Bo df57dbe6bf Btrfs: make btrfs's allocation smoothly with preallocation
For backref walking, we've introduce delayed ref's sequence.  However,
it changes our preallocation behavior.

The story is that when we preallocate an extent and then mark it written
piece by piece, the ideal case should be that we don't need to COW the
extent, which is why we use 'preallocate'.

But we may not make use of preallocation, since when we check for cross refs on
the extent, we may have two ref entries which have the same content except
the sequence value, and we recognize them as cross refs and do COW to allocate
another extent.

So we end up with several pieces of space instead of an whole extent.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-23 16:28:10 -04:00
Josef Bacik 51561ffec9 Btrfs: lock the transition from dirty to writeback for an eb
There is a small window where an eb can have no IO bits set on it, which
could potentially result in extent_buffer_under_io() returning false when we
want it to return true, which could result in not fun things happening.  So
in order to protect this case we need to hold the refs_lock when we make
this transition to make sure we get reliable results out of
extent_buffer_udner_io().  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-23 16:28:09 -04:00
Josef Bacik 594831c4b2 Btrfs: fix potential race in extent buffer freeing
This sounds sort of impossible but it is the only thing I can think of and
at the very least it is theoretically possible so here it goes.

If we are in try_release_extent_buffer we will check that the ref count on
the extent buffer is 1 and not under IO, and then go down and clear the tree
ref.  If between this check and clearing the tree ref somebody else comes in
and grabs a ref on the eb and the marks it dirty before
try_release_extent_buffer() does it's tree ref clear we can end up with a
dirty eb that will be freed while it is still dirty which will result in a
panic.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-23 16:28:09 -04:00
Josef Bacik e64860aa05 Btrfs: don't return true in releasepage unless we actually freed the eb
I noticed while looking at an extent_buffer race that we will
unconditionally return 1 if we get down to release_extent_buffer after
clearing the tree ref.  However we can easily race in here and get a ref on
the eb and not actually free the eb.  So make release_extent_buffer return 1
if it free'd the eb and 0 if not so we can be a little kinder to the vm.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-23 16:28:08 -04:00
Stefan Behrens a98cdb85b9 Btrfs: suppress printk() if all device I/O stats are zero
Code is added to suppress the I/O stats printing at mount time if all
statistic values are zero.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-07-23 16:28:07 -04:00
Stefan Behrens 5021976d8d Btrfs: remove unwanted printk() for btrfs device I/O stats
People complained about the annoying kernel log message
"btrfs: no dev_stats entry found ... (OK on first mount after mkfs)"
everytime a filesystem is mounted for the first time after running
mkfs. Since the distribution of the btrfs-progs is not synchronized
to the kernel version, mkfs like it is now will be used also in the
future. Then this message is not useful to find errors, it is just
annoying. This commit removes the printk().

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-07-23 16:28:07 -04:00
Li Zefan 18077bb413 Btrfs: rewrite BTRFS_SETGET_FUNCS
BTRFS_SETGET_FUNCS macro is used to generate btrfs_set_foo() and
btrfs_foo() functions, which read and write specific fields in the
extent buffer.

The total number of set/get functions is ~200, but in fact we only
need 8 functions: 2 for u8 field, 2 for u16, 2 for u32 and 2 for u64.

It results in redunction of ~37K bytes.

   text    data     bss     dec     hex filename
 629661   12489     216  642366   9cd3e fs/btrfs/btrfs.o.orig
 592637   12489     216  605342   93c9e fs/btrfs/btrfs.o

Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-07-23 16:28:06 -04:00
Li Zefan 293f7e0740 Btrfs: zero unused bytes in inode item
The otime field is not zeroed, so users will see random otime in an old
filesystem with a new kernel which has otime support in the future.

The reserved bytes are also not zeroed, and we'll have compatibility
issue if we make use of those bytes.

Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-07-23 16:28:05 -04:00
Li Zefan b4d7c3c945 Btrfs: kill free_space pointer from inode structure
Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type,
which means every inode has the same BTRFS_I(inode)->free_space pointer.

This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64 bits).

Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-07-23 16:28:05 -04:00
Anand Jain d5b025d510 btrfs read error corrected message floods the console during recovery
Changing printk_in_rcu to printk_ratelimited_in_rcu will suffice

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-23 16:28:04 -04:00