Commit Graph

44912 Commits

Author SHA1 Message Date
Andrew Elble ed94164398 nfsd: implement machine credential support for some operations
This addresses the conundrum referenced in RFC5661 18.35.3,
and will allow clients to return state to the server using the
machine credentials.

The biggest part of the problem is that we need to allow the client
to send a compound op with integrity/privacy on mounts that don't
have it enabled.

Add server support for properly decoding and using spo_must_enforce
and spo_must_allow bits. Add support for machine credentials to be
used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN,
and TEST/FREE STATEID.
Implement a check so as to not throw WRONGSEC errors when these
operations are used if integrity/privacy isn't turned on.

Without this, Linux clients with credentials that expired while holding
delegations were getting stuck in an endless loop.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:32:47 -04:00
Andrew Elble dedeb13f9e nfsd: allow mach_creds_match to be used more broadly
Rename mach_creds_match() to nfsd4_mach_creds_match() and un-staticify

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:32:47 -04:00
Miklos Szeredi 6343a21208 locks: use file_inode()
(Another one for the f_path debacle.)

ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask.

The reason is that generic_add_lease() used filp->f_path.dentry->inode
while all the others use file_inode().  This makes a difference for files
opened on overlayfs since the former will point to the overlay inode the
latter to the underlying inode.

So generic_add_lease() added the lease to the overlay inode and
generic_delete_lease() removed it from the underlying inode.  When the file
was released the lease remained on the overlay inode's lock list, resulting
in use after free.

Reported-by: Eryu Guan <eguan@redhat.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-01 10:24:18 -04:00
Scott Mayhew cb7d224f82 lockd: unregister notifier blocks if the service fails to come up completely
If the lockd service fails to start up then we need to be sure that the
notifier blocks are not registered, otherwise a subsequent start of the
service could cause the same notifier to be registered twice, leading to
soft lockups.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 0751ddf77b "lockd: Register callbacks on the inetaddr_chain..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-30 16:35:07 -04:00
Linus Torvalds da2f6aba4a Merge branch 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes part 2 from Chris Mason:
 "This has one patch from Omar to bring iterate_shared back to btrfs.

  We have a tree of work we queue up for directory items and it doesn't
  lend itself well to shared access.  While we're cleaning it up, Omar
  has changed things to use an exclusive lock when there are delayed
  items"

* 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
2016-06-25 08:53:38 -07:00
Linus Torvalds b971712afc Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "I have a two part pull this time because one of the patches Dave
  Sterba collected needed to be against v4.7-rc2 or higher (we used
  rc4).  I try to make my for-linus-xx branch testable on top of the
  last major so we can hand fixes to people on the list more easily, so
  I've split this pull in two.

  This first part has some fixes and two performance improvements that
  we've been testing for some time.

  Josef's two performance fixes are most notable.  The transid tracking
  patch makes a big improvement on pretty much every workload"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: Force stripesize to the value of sectorsize
  btrfs: fix disk_i_size update bug when fallocate() fails
  Btrfs: fix error handling in map_private_extent_buffer
  Btrfs: fix error return code in btrfs_init_test_fs()
  Btrfs: don't do nocow check unless we have to
  btrfs: fix deadlock in delayed_ref_async_start
  Btrfs: track transid for delayed ref flushing
2016-06-25 08:42:31 -07:00
Omar Sandoval 02dbfc99b4 Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
Commit fe742fd4f9 ("Revert "btrfs: switch to ->iterate_shared()"")
backed out the conversion to ->iterate_shared() for Btrfs because the
delayed inode handling in btrfs_real_readdir() is racy. However, we can
still do readdir in parallel if there are no delayed nodes.

This is a temporary fix which upgrades the shared inode lock to an
exclusive lock only when we have delayed items until we come up with a
more complete solution. While we're here, rename the
btrfs_{get,put}_delayed_items functions to make it very clear that
they're just for readdir.

Tested with xfstests and by doing a parallel kernel build:

	while make tinyconfig && make -j4 && git clean dqfx; do
		:
	done

along with a bunch of parallel finds in another shell:

	while true; do
		for ((i=0; i<4; i++)); do
			find . >/dev/null &
		done
		wait
	done

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-25 06:20:10 -07:00
Linus Torvalds 086e3eb65e Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "Two weeks worth of fixes here"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
  init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
  autofs: don't get stuck in a loop if vfs_write() returns an error
  mm/page_owner: avoid null pointer dereference
  tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
  fs/nilfs2: fix potential underflow in call to crc32_le
  oom, suspend: fix oom_reaper vs. oom_killer_disable race
  ocfs2: disable BUG assertions in reading blocks
  mm, compaction: abort free scanner if split fails
  mm: prevent KASAN false positives in kmemleak
  mm/hugetlb: clear compound_mapcount when freeing gigantic pages
  mm/swap.c: flush lru pvecs on compound page arrival
  memcg: css_alloc should return an ERR_PTR value on error
  memcg: mem_cgroup_migrate() may be called with irq disabled
  hugetlb: fix nr_pmds accounting with shared page tables
  Revert "mm: disable fault around on emulated access bit architecture"
  Revert "mm: make faultaround produce old ptes"
  mailmap: add Boris Brezillon's email
  mailmap: add Antoine Tenart's email
  mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
  mm: mempool: kasan: don't poot mempool objects in quarantine
  ...
2016-06-24 19:08:33 -07:00
Andrey Vagin 5a9294e5c5 autofs: don't get stuck in a loop if vfs_write() returns an error
__vfs_write() returns a negative value in a error case.

Link: http://lkml.kernel.org/r/20160616083108.6278.65815.stgit@pluto.themaw.net
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Torsten Hilbrich 63d2f95d63 fs/nilfs2: fix potential underflow in call to crc32_le
The value `bytes' comes from the filesystem which is about to be
mounted.  We cannot trust that the value is always in the range we
expect it to be.

Check its value before using it to calculate the length for the crc32_le
call.  It value must be larger (or equal) sumoff + 4.

This fixes a kernel bug when accidentially mounting an image file which
had the nilfs2 magic value 0x3434 at the right offset 0x406 by chance.
The bytes 0x01 0x00 were stored at 0x408 and were interpreted as a
s_bytes value of 1.  This caused an underflow when substracting sumoff +
4 (20) in the call to crc32_le.

  BUG: unable to handle kernel paging request at ffff88021e600000
  IP:  crc32_le+0x36/0x100
  ...
  Call Trace:
    nilfs_valid_sb.part.5+0x52/0x60 [nilfs2]
    nilfs_load_super_block+0x142/0x300 [nilfs2]
    init_nilfs+0x60/0x390 [nilfs2]
    nilfs_mount+0x302/0x520 [nilfs2]
    mount_fs+0x38/0x160
    vfs_kern_mount+0x67/0x110
    do_mount+0x269/0xe00
    SyS_mount+0x9f/0x100
    entry_SYSCALL_64_fastpath+0x16/0x71

Link: http://lkml.kernel.org/r/1466778587-5184-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Gang He 7186ee06b6 ocfs2: disable BUG assertions in reading blocks
According to some high-load testing, these two BUG assertions were
encountered, this led system panic.  Actually, there were some
discussions about removing these two BUG() assertions, it would not
bring any side effect.

Then, I did the the following changes,

1) use the existing macro CATCH_BH_JBD_RACES to wrap BUG() in the
   ocfs2_read_blocks_sync function like before.

2) disable the macro CATCH_BH_JBD_RACES in Makefile by default.

Link: http://lkml.kernel.org/r/1466574294-26863-1-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko f2db19719a jbd2: get rid of superfluous __GFP_REPEAT
jbd2_alloc is explicit about its allocation preferences wrt.  the
allocation size.  Sub page allocations go to the slab allocator and
larger are using either the page allocator or vmalloc.  This is all good
but the logic is unnecessarily complex.

1) as per Ted, the vmalloc fallback is a left-over:

 : jbd2_alloc is only passed in the bh->b_size, which can't be PAGE_SIZE, so
 : the code path that calls vmalloc() should never get called.  When we
 : conveted jbd2_alloc() to suppor sub-page size allocations in commit
 : d2eecb0393, there was an assumption that it could be called with a size
 : greater than PAGE_SIZE, but that's certaily not true today.

Moreover vmalloc allocation might even lead to a deadlock because the
callers expect GFP_NOFS context while vmalloc is GFP_KERNEL.

2) __GFP_REPEAT for requests <= PAGE_ALLOC_COSTLY_ORDER is ignored
   since the flag was introduced.

Let's simplify the code flow and use the slab allocator for sub-page
requests and the page allocator for others.  Even though order > 0 is
not currently used as per above leave that option open.

Link: http://lkml.kernel.org/r/1464599699-30131-18-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Linus Torvalds 9c46a6df3b Fix missing server-side permission checks on setting NFS ACLs.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXba7TAAoJECebzXlCjuG+j+gP/18y6ot02Y5R2pI/O8nqoY3I
 WeBNOo1yD77wQ1SopiIbPL/ChxOh/OVlUzo9ikNtwm5l6Op8mLMxPYaDjaIpA5Nt
 FC/pAHibdTJA4ZjzenRhnEEFYbOQh0GssF/qMG30ySGPhx0eoonXi5/qYvjFyTBF
 BuDrpC4YHSNvqCZ/r0aD2bw79Skw8cBPdj+SUfK2r37WyuQ4Kade9NCmDYwSNxSx
 6cru5ztRQSE8Ni0le3U2wTlYhq8xrpP0bRdIzc/9EipdKVdsvfukonjnT+dwtDks
 72fwDoALAZq0iiIur7LKaUjkaZcKzHwe6LVsZEoiJ5aeI2a2FodLwoyXl4SntAR7
 027YEqe7Pc+KHGUYACVuNuCcJkEK5B3zRBBSNoskhkPaK/lJ7BMSXNNhIt248YE3
 HAl1vuf4PakCgh7qIsiUHB1EVs6FCcG8aKH1TmumvPD2udwabiYcKqd8soNu5ZWu
 ALi1vtD/8B1LEI8TacP5NIt8Pdr1AQ0kVDFWlZSiK3oE11DrHLiUgfvl2y7cokMa
 xzcNnoyEppaWNFJzYzQes8XO7Ti/DLJoCB8JnxMaWT1BfVhpEAs1LNl4AIHij5fO
 /PKNs4OusntvOmEvgKtxZpvqXaElgvXz7LMgzM2bmMGMVY+mq0+lpDbzAK91ijk0
 di8+ivIMayA60P5xV4dJ
 =TZ/R
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.7-2' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfixes from Bruce Fields:
 "Fix missing server-side permission checks on setting NFS ACLs"

* tag 'nfsd-4.7-2' of git://linux-nfs.org/~bfields/linux:
  nfsd: check permissions when setting ACLs
  posix_acl: Add set_posix_acl
2016-06-24 17:22:27 -07:00
Ben Hutchings 999653786d nfsd: check permissions when setting ACLs
Use set_posix_acl, which includes proper permission checks, instead of
calling ->set_acl directly.  Without this anyone may be able to grant
themselves permissions to a file by setting the ACL.

Lock the inode to make the new checks atomic with respect to set_acl.
(Also, nfsd was the only caller of set_acl not locking the inode, so I
suspect this may fix other races.)

This also simplifies the code, and ensures our ACLs are checked by
posix_acl_valid.

The permission checks and the inode locking were lost with commit
4ac7249e, which changed nfsd to use the set_acl inode operation directly
instead of going through xattr handlers.

Reported-by: David Sinquin <david@sinquin.eu>
[agreunba@redhat.com: use set_posix_acl]
Fixes: 4ac7249e
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-24 12:11:52 -04:00
Andreas Gruenbacher 485e71e8fb posix_acl: Add set_posix_acl
Factor out part of posix_acl_xattr_set into a common function that takes
a posix_acl, which nfsd can also call.

The prototype already exists in include/linux/posix_acl.h.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-24 12:11:34 -04:00
Linus Torvalds 63c04ee7d3 This pull requests contains fixes for two critical bugs in UBI and UBIFS:
1. Fixes the possibility of losing data upon a power cut when UBI tries
    to recover from a write error.
 2. Fixes page migration on UBIFS. It turned out that the default page
    migration function is not suitable for UBIFS.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJXa5tEAAoJEEtJtSqsAOnW090P/RcQjIfVf2g3r8VRp38OQPbb
 MTd4sD/rnyt5Eq0QYUPWG5xcYK2BWI1PwpdB81JvW5hxnXPgG8DpVIxjzt/7Xgnp
 QheYe9tMfgYjDntz1rzGVa/uHSAldP9V4czgczrBW/0lwnRsZ6mLY1RA9Oz0hRdG
 cp53I8CSD0DPyqU0XkgzLkzVUstmySwQ5i46C0kQEnlRcytReOLgcjSrXXn+/Zih
 yZxhtDQSCKmQAfVmERggPXVHo8jFtVfej52ja7RFcMA2uXvXqljOBNCyLUYPdYka
 XdQEKsXRLl69ktFUXwZwPAYAW23I8+PMpsoljHDVc0hF25p8omp3D+7HE18SsMSv
 6RNnUwz+PDbiFApyoTu0SBgHN/OO9o6rjNNoRIInoKpk0NvWmrMQOo6BIFsX4yq1
 0dOVJiKXVoFuo75Yw9mOKdrV/Z5P1TvgdTBj6g03aUM9vcX1Gz6+1xKkvcXGgh02
 8qFDZdZ5L87TlpMkvtWO87Ir0ssrfjxpvxR8pPsxxqvxbfUuVmss4ILuh9AVSVk+
 d1zrz30+JZzTbIrky/7R31i6Bx2+reYdTKiPIkST9sF5WblUPSeyUoKq1OlNRYxj
 n+0Q8S5Tm/6AHXUOQFxurbXU+D7G7TaL/CsBeepvV/AqJb07+vBxUuGFH1rDbmLB
 r5dTfOXn3iNEmmNyrhgN
 =EDeX
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.7-rc5' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS fixes from Richard Weinberger:
 "This contains fixes for two critical bugs in UBI and UBIFS:

   - fix the possibility of losing data upon a power cut when UBI tries
     to recover from a write error

   - fix page migration on UBIFS.  It turned out that the default page
     migration function is not suitable for UBIFS"

* tag 'upstream-4.7-rc5' of git://git.infradead.org/linux-ubifs:
  UBIFS: Implement ->migratepage()
  mm: Export migrate_page_move_mapping and migrate_page_copy
  ubi: Make recover_peb power cut aware
  gpio: make library immune to error pointers
  gpio: make sure gpiod_to_irq() returns negative on NULL desc
  gpio: 104-idi-48: Fix missing spin_lock_init for ack_lock
2016-06-23 22:48:48 -07:00
Chandan Rajendra b7f67055d2 Btrfs: Force stripesize to the value of sectorsize
Btrfs code currently assumes stripesize to be same as
sectorsize. However Btrfs-progs (until commit
df05c7ed455f519e6e15e46196392e4757257305) has been setting
btrfs_super_block->stripesize to a value of 4096.

This commit makes sure that the value of btrfs_super_block->stripesize
is a power of 2. Later, it unconditionally sets btrfs_root->stripesize
to sectorsize.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23 10:44:42 -07:00
Wang Xiaoguang c0d2f6104e btrfs: fix disk_i_size update bug when fallocate() fails
When doing truncate operation, btrfs_setsize() will first call
truncate_setsize() to set new inode->i_size, but if later
btrfs_truncate() fails, btrfs_setsize() will call
"i_size_write(inode, BTRFS_I(inode)->disk_i_size)" to reset the
inmemory inode size, now bug occurs. It's because for truncate
case btrfs_ordered_update_i_size() directly uses inode->i_size
to update BTRFS_I(inode)->disk_i_size, indeed we should use the
"offset" argument to update disk_i_size. Here is the call graph:
==>btrfs_truncate()
====>btrfs_truncate_inode_items()
======>btrfs_ordered_update_i_size(inode, last_size, NULL);
Here btrfs_ordered_update_i_size()'s offset argument is last_size.

And below test case can reveal this bug:

dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=100
dev=$(losetup --show -f fs.img)
mkdir -p /mnt/mntpoint
mkfs.btrfs  -f $dev
mount $dev /mnt/mntpoint
cd /mnt/mntpoint

echo "workdir is: /mnt/mntpoint"
blocksize=$((128 * 1024))
dd if=/dev/zero of=testfile bs=$blocksize count=1
sync
count=$((17*1024*1024*1024/blocksize))
echo "file size is:" $((count*blocksize))
for ((i = 1; i <= $count; i++)); do
	i=$((i + 1))
	dst_offset=$((blocksize * i))
	xfs_io -f -c "reflink testfile 0 $dst_offset $blocksize"\
		testfile > /dev/null
done
sync

truncate --size 0 testfile
ls -l testfile
du -sh testfile
exit

In this case, truncate operation will fail for enospc reason and
"du -sh testfile" returns value greater than 0, but testfile's
size is 0, we need to reflect correct inode->i_size.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23 10:44:41 -07:00
Liu Bo 415b35a55b Btrfs: fix error handling in map_private_extent_buffer
map_private_extent_buffer() can return -EINVAL in two different cases,
1. when the requested contents span two pages if nodesize is larger
   than pagesize,
2. when it detects something insane.

The 2nd one used to be only a WARN_ON(1), and we decided to return a error
to callers, but we didn't fix up all its callers, which will be
addressed by this patch.

Without this, btrfs may end up with 'general protection', ie.
reading invalid memory.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23 10:44:40 -07:00
Wei Yongjun 04e1b65af2 Btrfs: fix error return code in btrfs_init_test_fs()
Fix to return a negative error code from the kern_mount() error handling
case instead of 0(ret is set to 0 by register_filesystem), as done
elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23 10:44:39 -07:00
Josef Bacik c6887cd111 Btrfs: don't do nocow check unless we have to
Before we write into prealloc/nocow space we have to make sure that there are no
references to the extents we are writing into, which means checking the extent
tree and csum tree in the case of nocow.  So we don't want to do the nocow dance
unless we can't reserve data space, since it's a serious drag on performance.
With the following sequence

fallocate -l10737418240 /mnt/btrfs-test/file
cp --reflink /mnt/btrfs-test/file /mnt/btrfs-test/link
fio --name=randwrite --rw=randwrite --bs=4k --filename=/mnt/btrfs-test/file \
	--end_fsync=1

we get the worst case scenario where we have to fall back on to doing the check
anyway.

Without this patch
lat (usec): min=5, max=111598, avg=27.65, stdev=124.51
write: io=10240MB, bw=126876KB/s, iops=31718, runt= 82646msec

With this patch
lat (usec): min=3, max=91210, avg=14.09, stdev=110.62
write: io=10240MB, bw=212753KB/s, iops=53188, runt= 49286msec

We get twice the throughput, half of the runtime, and half of the average
latency.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
[ PAGE_CACHE_ removal related fixups ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-22 17:57:14 -07:00
Chris Mason 0f873eca82 btrfs: fix deadlock in delayed_ref_async_start
"Btrfs: track transid for delayed ref flushing" was deadlocking on
btrfs_attach_transaction because its not safe to call from the async
delayed ref start code.  This commit brings back btrfs_join_transaction
instead and checks for a blocked commit.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-22 17:54:18 -07:00
Josef Bacik 31b9655f43 Btrfs: track transid for delayed ref flushing
Using the offwakecputime bpf script I noticed most of our time was spent waiting
on the delayed ref throttling.  This is what is supposed to happen, but
sometimes the transaction can commit and then we're waiting for throttling that
doesn't matter anymore.  So change this stuff to be a little smarter by tracking
the transid we were in when we initiated the throttling.  If the transaction we
get is different then we can just bail out.  This resulted in a 50% speedup in
my fs_mark test, and reduced the amount of time spent throttling by 60 seconds
over the entire run (which is about 30 minutes).  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-22 17:54:18 -07:00
Kirill A. Shutemov 4ac1c17b20 UBIFS: Implement ->migratepage()
During page migrations UBIFS might get confused
and the following assert triggers:
[  213.480000] UBIFS assert failed in ubifs_set_page_dirty at 1451 (pid 436)
[  213.490000] CPU: 0 PID: 436 Comm: drm-stress-test Not tainted 4.4.4-00176-geaa802524636-dirty #1008
[  213.490000] Hardware name: Allwinner sun4i/sun5i Families
[  213.490000] [<c0015e70>] (unwind_backtrace) from [<c0012cdc>] (show_stack+0x10/0x14)
[  213.490000] [<c0012cdc>] (show_stack) from [<c02ad834>] (dump_stack+0x8c/0xa0)
[  213.490000] [<c02ad834>] (dump_stack) from [<c0236ee8>] (ubifs_set_page_dirty+0x44/0x50)
[  213.490000] [<c0236ee8>] (ubifs_set_page_dirty) from [<c00fa0bc>] (try_to_unmap_one+0x10c/0x3a8)
[  213.490000] [<c00fa0bc>] (try_to_unmap_one) from [<c00fadb4>] (rmap_walk+0xb4/0x290)
[  213.490000] [<c00fadb4>] (rmap_walk) from [<c00fb1bc>] (try_to_unmap+0x64/0x80)
[  213.490000] [<c00fb1bc>] (try_to_unmap) from [<c010dc28>] (migrate_pages+0x328/0x7a0)
[  213.490000] [<c010dc28>] (migrate_pages) from [<c00d0cb0>] (alloc_contig_range+0x168/0x2f4)
[  213.490000] [<c00d0cb0>] (alloc_contig_range) from [<c010ec00>] (cma_alloc+0x170/0x2c0)
[  213.490000] [<c010ec00>] (cma_alloc) from [<c001a958>] (__alloc_from_contiguous+0x38/0xd8)
[  213.490000] [<c001a958>] (__alloc_from_contiguous) from [<c001ad44>] (__dma_alloc+0x23c/0x274)
[  213.490000] [<c001ad44>] (__dma_alloc) from [<c001ae08>] (arm_dma_alloc+0x54/0x5c)
[  213.490000] [<c001ae08>] (arm_dma_alloc) from [<c035cecc>] (drm_gem_cma_create+0xb8/0xf0)
[  213.490000] [<c035cecc>] (drm_gem_cma_create) from [<c035cf20>] (drm_gem_cma_create_with_handle+0x1c/0xe8)
[  213.490000] [<c035cf20>] (drm_gem_cma_create_with_handle) from [<c035d088>] (drm_gem_cma_dumb_create+0x3c/0x48)
[  213.490000] [<c035d088>] (drm_gem_cma_dumb_create) from [<c0341ed8>] (drm_ioctl+0x12c/0x444)
[  213.490000] [<c0341ed8>] (drm_ioctl) from [<c0121adc>] (do_vfs_ioctl+0x3f4/0x614)
[  213.490000] [<c0121adc>] (do_vfs_ioctl) from [<c0121d30>] (SyS_ioctl+0x34/0x5c)
[  213.490000] [<c0121d30>] (SyS_ioctl) from [<c000f2c0>] (ret_fast_syscall+0x0/0x34)

UBIFS is using PagePrivate() which can have different meanings across
filesystems. Therefore the generic page migration code cannot handle this
case correctly.
We have to implement our own migration function which basically does a
plain copy but also duplicates the page private flag.
UBIFS is not a block device filesystem and cannot use buffer_migrate_page().

Cc: stable@vger.kernel.org
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[rw: Massaged changelog, build fixes, etc...]
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Christoph Hellwig <hch@lst.de>
2016-06-23 00:29:53 +02:00
Linus Torvalds f9020d1741 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns fix from Eric Biederman:
 "This contains just a single small patch that fixes a tiny hole in the
  logic of allowing unprivileged mounting of proc and sysfs.

  In practice I don't think anyone is affected because having MNT_RDONLY
  clear in mnt->mnt_flags but MS_RDONLY set in sb->s_flags is very weird
  for a filesystem, and weirder for proc and sysfs.  However if it
  happens let's handle it correctly and then no one has to to worry
  about this crazy case"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  mnt: Account for MS_RDONLY in fs_fully_visible
2016-06-22 14:11:24 -07:00
Linus Torvalds 67016f6cdf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "A couple more of d_walk()/d_subdirs reordering fixes (stable fodder;
  ought to solve that crap for good) and a fix for a brown paperbag bug
  in d_alloc_parallel() (this cycle)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix idiotic braino in d_alloc_parallel()
  autofs races
  much milder d_walk() race
2016-06-20 10:41:51 -07:00
Al Viro e7d6ef9790 fix idiotic braino in d_alloc_parallel()
Check for d_unhashed() while searching in in-lookup hash was absolutely
wrong.  Worse, it masked a deadlock on dget() done under bitlock that
nests inside ->d_lock.  Thanks to J. R. Okajima for spotting it.

Spotted-by: "J. R. Okajima" <hooanon05g@gmail.com>
Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-20 10:07:42 -04:00
Linus Torvalds c3695331f3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes and a reiserfs fix from Jan Kara:
 "A couple of udf fixes (most notably a bug in parsing UDF partitions
  which led to inability to mount recent Windows installation media) and
  a reiserfs fix for handling kstrdup failure"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  reiserfs: check kstrdup failure
  udf: Use correct partition reference number for metadata
  udf: Use IS_ERR when loading metadata mirror file entry
  udf: Don't BUG on missing metadata partition descriptor
2016-06-19 07:05:14 -10:00
Linus Torvalds 607117a153 Driver core fixes for 4.7-rc4
Here are a small number of debugfs, ISA, and one driver core fix for 4.7-rc4.
 
 All of these resolve reported issues.  The ISA ones have spent the least
 amount of time in linux-next, sorry about that, I didn't realize they
 were regressions that needed to get in now (thanks to Thorsten for the
 prodding!) but they do all pass the 0-day bot tests.  The others have
 been in linux-next for a while now.
 
 Full details about them are in the shortlog below.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAldlbeQACgkQMUfUDdst+ymnFACfaWhEKA/84jwNNHiim92diJrY
 zYsAoLOmpBw68yL6qTSZbcWJF4Flb6Xk
 =N8M2
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are a small number of debugfs, ISA, and one driver core fix for
  4.7-rc4.

  All of these resolve reported issues.  The ISA ones have spent the
  least amount of time in linux-next, sorry about that, I didn't realize
  they were regressions that needed to get in now (thanks to Thorsten
  for the prodding!) but they do all pass the 0-day bot tests.  The
  others have been in linux-next for a while now.

  Full details about them are in the shortlog below"

* tag 'driver-core-4.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  isa: Dummy isa_register_driver should return error code
  isa: Call isa_bus_init before dependent ISA bus drivers register
  watchdog: ebc-c384_wdt: Allow build for X86_64
  iio: stx104: Allow build for X86_64
  gpio: Allow PC/104 devices on X86_64
  isa: Allow ISA-style drivers on modern systems
  base: make module_create_drivers_dir race-free
  debugfs: open_proxy_open(): avoid double fops release
  debugfs: full_proxy_open(): free proxy on ->open() failure
  kernel/kcov: unproxify debugfs file's fops
2016-06-18 06:04:01 -10:00
Linus Torvalds 4c6459f945 Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "The most user visible change here is a fix for our recent superblock
  validation checks that were causing problems on non-4k pagesized
  systems"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: btrfs_check_super_valid: Allow 4096 as stripesize
  btrfs: remove build fixup for qgroup_account_snapshot
  btrfs: use new error message helper in qgroup_account_snapshot
  btrfs: avoid blocking open_ctree from cleaner_kthread
  Btrfs: don't BUG_ON() in btrfs_orphan_add
  btrfs: account for non-CoW'd blocks in btrfs_abort_transaction
  Btrfs: check if extent buffer is aligned to sectorsize
  btrfs: Use correct format specifier
2016-06-18 05:57:59 -10:00
Chandan Rajendra dd5c93111d Btrfs: btrfs_check_super_valid: Allow 4096 as stripesize
Older btrfs-progs/mkfs.btrfs sets 4096 as the stripesize. Hence
restricting stripesize to be equal to sectorsize would cause super block
validation to return an error on architectures where PAGE_SIZE is not
equal to 4096.

Hence as a workaround, this commit allows stripesize to be set to 4096
bytes.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:49 +02:00
David Sterba 89c5a5441d btrfs: remove build fixup for qgroup_account_snapshot
Introduced in 2c1984f244 ("btrfs: build fixup for
qgroup_account_snapshot") as temporary bisectability build fixup.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00
David Sterba f7af3934c2 btrfs: use new error message helper in qgroup_account_snapshot
We've renamed btrfs_std_error, this one is left from last merge.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00
Zygo Blaxell 90c711ab38 btrfs: avoid blocking open_ctree from cleaner_kthread
This fixes a problem introduced in commit 2f3165ecf1
"btrfs: don't force mounts to wait for cleaner_kthread to delete one or more subvolumes".

open_ctree eventually calls btrfs_replay_log which in turn calls
btrfs_commit_super which tries to lock the cleaner_mutex, causing a
recursive mutex deadlock during mount.

Instead of playing whack-a-mole trying to keep up with all the
functions that may want to lock cleaner_mutex, put all the cleaner_mutex
lockers back where they were, and attack the problem more directly:
keep cleaner_kthread asleep until the filesystem is mounted.

When filesystems are mounted read-only and later remounted read-write,
open_ctree did not set fs_info->open and neither does anything else.
Set this flag in btrfs_remount so that neither btrfs_delete_unused_bgs
nor cleaner_kthread get confused by the common case of "/" filesystem
read-only mount followed by read-write remount.

Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00
Josef Bacik 3b6571c180 Btrfs: don't BUG_ON() in btrfs_orphan_add
This is just a screwup for developers, so change it to an ASSERT() so developers
notice when things go wrong and deal with the error appropriately if ASSERT()
isn't enabled.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00
Jeff Mahoney 64c12921e1 btrfs: account for non-CoW'd blocks in btrfs_abort_transaction
The test for !trans->blocks_used in btrfs_abort_transaction is
insufficient to determine whether it's safe to drop the transaction
handle on the floor.  btrfs_cow_block, informed by should_cow_block,
can return blocks that have already been CoW'd in the current
transaction.  trans->blocks_used is only incremented for new block
allocations. If an operation overlaps the blocks in the current
transaction entirely and must abort the transaction, we'll happily
let it clean up the trans handle even though it may have modified
the blocks and will commit an incomplete operation.

In the long-term, I'd like to do closer tracking of when the fs
is actually modified so we can still recover as gracefully as possible,
but that approach will need some discussion.  In the short term,
since this is the only code using trans->blocks_used, let's just
switch it to a bool indicating whether any blocks were used and set
it when should_cow_block returns false.

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00
Liu Bo c871b0f2fd Btrfs: check if extent buffer is aligned to sectorsize
Thanks to fuzz testing, we can pass an invalid bytenr to extent buffer
via alloc_extent_buffer().  An unaligned eb can have more pages than it
should have, which ends up extent buffer's leak or some corrupted content
in extent buffer.

This adds a warning to let us quickly know what was happening.

Now that alloc_extent_buffer() no more returns NULL, this changes its
caller and callers of its caller to match with the new error
handling.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00
Heinrich Schuchardt 16ff4b454f btrfs: Use correct format specifier
Component mirror_num of struct btrfsic_block is defined
as unsigned int. Use %u as format specifier.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00
Linus Torvalds 41ef72181a Oleg Drokin found and fixed races in the nfsd4 state code that go back
to the big nfs4_lock_state removal around 3.17 (but that were also
 probably hard to reproduce before client changes in 3.20 allowed the
 client to perform parallel opens).
 
 Also fix a 4.1 backchannel crash due to rpc multipath changes in 4.6.
 Trond acked the client-side rpc fixes going through my tree.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXYsiKAAoJECebzXlCjuG+PK8P/jGBS+h7Zf4GWOOsWN5cbEs/
 8VTn83KXFp/feFhGikUIcAZQJRynDK+tD9Vh2FznC2zKDTLFPS0mAvL3tZyQhUO2
 nEWaCUOFR+sB3aTPlMGwxbGc7NHNQg1hKqKgqcLEEqtozxhFQye3WW0MZNfFCiUZ
 qpq2tK1OGGhJVIp7wWSa8+B2nGFMuasPaGM2OVJrebip49yTG/tT3rwKxKMoB8kS
 i8BwNejoP1KRD6LqvpgdV1ESzkdyokDxKXCrdY/j2lMdp2YRe+cWmX239ojjvm8G
 n9Ow8DYCefuiKiF6iCLZfxpX8dcmVJvT6g+k+9V63A4YCyuGhy/CneA3MO4QyLhq
 yfe2zviJ2kZVz+1Ih3v9kD7ZkyK1hjrxXx/VPrI5CBIXE5eVXin2ZDvTCSoV491h
 g1zscPc9Thgk6gKXsvkaVOXxLHBoUzXeSRbNqVXXZfjl+s4TXLNJ0lcaBYkzh74/
 SypiFeNHjsjNpJYz5GptlbMUpaEoeyH0Y+OiH8d5Jf8hCcQ+CLjKgKSuCH5zrypt
 Lx3U5QWHTT3IXH4QS/njcTSfSDu7BUip4RTLzw6C/ZJf7hd6SS4Xv72J6ZmeDSmg
 146MpAYty8HB04KQWpYx0DGI7UEPlubfRHSF9XzsSitbRtNGr6xvIug8fkKBlXDB
 aHtr+/gI7UvrmnXnlGdD
 =aNkl
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.7-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfixes from Bruce Fields:
 "Oleg Drokin found and fixed races in the nfsd4 state code that go back
  to the big nfs4_lock_state removal around 3.17 (but that were also
  probably hard to reproduce before client changes in 3.20 allowed the
  client to perform parallel opens).

  Also fix a 4.1 backchannel crash due to rpc multipath changes in 4.6.
  Trond acked the client-side rpc fixes going through my tree"

* tag 'nfsd-4.7-1' of git://linux-nfs.org/~bfields/linux:
  nfsd: Make init_open_stateid() a bit more whole
  nfsd: Extend the mutex holding region around in nfsd4_process_open2()
  nfsd: Always lock state exclusively.
  rpc: share one xps between all backchannels
  nfsd4/rpc: move backchannel create logic into rpc code
  SUNRPC: fix xprt leak on xps allocation failure
  nfsd: Fix NFSD_MDS_PR_KEY on 32-bit by adding ULL postfix
2016-06-16 17:25:52 -10:00
Linus Torvalds 9c514bedbe Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
 "This contains two regression fixes: one for the xattr API update and
  one for using the mounter's creds in file creation in overlayfs.

  There's also a fix for a bug in handling hard linked AF_UNIX sockets
  that's been there from day one.  This fix is overlayfs only despite
  the fact that it touches code outside the overlay filesystem: d_real()
  is an identity function for all except overlay dentries"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: fix uid/gid when creating over whiteout
  ovl: xattr filter fix
  af_unix: fix hard linked sockets on overlay
  vfs: add d_real_inode() helper
2016-06-16 17:16:56 -10:00
Oleg Drokin 8c7245abda nfsd: Make init_open_stateid() a bit more whole
Move the state selection logic inside from the caller,
always making it return correct stp to use.

Signed-off-by: J . Bruce Fields <bfields@fieldses.org>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-15 22:03:53 -04:00
Oleg Drokin 5cc1fb2a09 nfsd: Extend the mutex holding region around in nfsd4_process_open2()
To avoid racing entry into nfs4_get_vfs_file().
Make init_open_stateid() return with locked stateid to be unlocked
by the caller.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-15 22:03:41 -04:00
Oleg Drokin feb9dad520 nfsd: Always lock state exclusively.
It used to be the case that state had an rwlock that was locked for write
by downgrades, but for read for upgrades (opens). Well, the problem is
if there are two competing opens for the same state, they step on
each other toes potentially leading to leaking file descriptors
from the state structure, since access mode is a bitmap only set once.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-15 22:03:31 -04:00
J. Bruce Fields d50039ea5e nfsd4/rpc: move backchannel create logic into rpc code
Also simplify the logic a bit.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Trond Myklebust <trondmy@primarydata.com>
2016-06-15 10:32:25 -04:00
Miklos Szeredi d0e13f5bbe ovl: fix uid/gid when creating over whiteout
Fix a regression when creating a file over a whiteout.  The new
file/directory needs to use the current fsuid/fsgid, not the ones from the
mounter's credentials.

The refcounting is a bit tricky: prepare_creds() sets an original refcount,
override_creds() gets one more, which revert_cred() drops.  So

  1) we need to expicitly put the mounter's credentials when overriding
     with the updated one

  2) we need to put the original ref to the updated creds (and this can
     safely be done before revert_creds(), since we'll still have the ref
     from override_creds()).

Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Fixes: 3fe6e52f06 ("ovl: override creds with the ones from the superblock mounter")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-06-15 14:18:59 +02:00
Nicolai Stange 75f0b68b75 debugfs: open_proxy_open(): avoid double fops release
Debugfs' open_proxy_open(), the ->open() installed at all inodes created
through debugfs_create_file_unsafe(),
- grabs a reference to the original file_operations instance passed to
  debugfs_create_file_unsafe() via fops_get(),
- installs it at the file's ->f_op by means of replace_fops()
- and calls fops_put() on it.

Since the semantics of replace_fops() are such that the reference's
ownership is transferred, the subsequent fops_put() will result in a double
release when the file is eventually closed.

Currently, this is not an issue since fops_put() basically does a
module_put() on the file_operations' ->owner only and there don't exist any
modules calling debugfs_create_file_unsafe() yet. This is expected to
change in the future though, c.f. commit c646880814 ("debugfs: add
support for self-protecting attribute file fops").

Remove the call to fops_put() from open_proxy_open().

Fixes: 9fd4dcece4 ("debugfs: prevent access to possibly dead
                      file_operations at file open")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-15 04:56:35 -07:00
Nicolai Stange b10e3e9048 debugfs: full_proxy_open(): free proxy on ->open() failure
Debugfs' full_proxy_open(), the ->open() installed at all inodes created
through debugfs_create_file(),
- grabs a reference to the original struct file_operations instance passed
  to debugfs_create_file(),
- dynamically allocates a proxy struct file_operations instance wrapping
  the original
- and installs this at the file's ->f_op.

Afterwards, it calls the original ->open() and passes its return value back
to the VFS layer.

Now, if that return value indicates failure, the VFS layer won't ever call
->release() and thus, neither the reference to the original file_operations
nor the memory for the proxy file_operations will get released, i.e. both
are leaked.

Upon failure of the original fops' ->open(), undo the proxy installation.
That is:
- Set the struct file ->f_op to what it had been when full_proxy_open()
  was entered.
- Drop the reference to the original file_operations.
- Free the memory holding the proxy file_operations.

Fixes: 49d200deaa ("debugfs: prevent access to removed files' private
                      data")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-15 04:56:35 -07:00
Eric W. Biederman 695e9df010 mnt: Account for MS_RDONLY in fs_fully_visible
In rare cases it is possible for s_flags & MS_RDONLY to be set but
MNT_READONLY to be clear.  This starting combination can cause
fs_fully_visible to fail to ensure that the new mount is readonly.
Therefore force MNT_LOCK_READONLY in the new mount if MS_RDONLY
is set on the source filesystem of the mount.

In general both MS_RDONLY and MNT_READONLY are set at the same for
mounts so I don't expect any programs to care.  Nor do I expect
MS_RDONLY to be set on proc or sysfs in the initial user namespace,
which further decreases the likelyhood of problems.

Which means this change should only affect system configurations by
paranoid sysadmins who should welcome the additional protection
as it keeps people from wriggling out of their policies.

Cc: stable@vger.kernel.org
Fixes: 8c6cf9cc82 ("mnt: Modify fs_fully_visible to deal with locked ro nodev and atime")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-15 06:52:23 -05:00
Geert Uytterhoeven eee930163c nfsd: Fix NFSD_MDS_PR_KEY on 32-bit by adding ULL postfix
On 32-bit:

    fs/nfsd/blocklayout.c: In function ‘nfsd4_block_get_device_info_scsi’:
    fs/nfsd/blocklayout.c:337: warning: integer constant is too large for ‘long’ type
    fs/nfsd/blocklayout.c:344: warning: integer constant is too large for ‘long’ type
    fs/nfsd/blocklayout.c: In function ‘nfsd4_scsi_fence_client’:
    fs/nfsd/blocklayout.c:385: warning: integer constant is too large for ‘long’ type

Add the missing "ULL" postfix to 64-bit constant NFSD_MDS_PR_KEY to fix
this.

Fixes: f99d4fbdae ("nfsd: add SCSI layout support")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-14 11:50:04 -04:00
Al Viro ea01a18494 autofs races
* make autofs4_expire_indirect() skip the dentries being in process of
expiry
* do *not* mess with list_move(); making sure that dentry with
AUTOFS_INF_EXPIRING are not picked for expiry is enough.
* do not remove NO_RCU when we set EXPIRING, don't bother with smp_mb()
there.  Clear it at the same time we clear EXPIRING.  Makes a bunch of
tests simpler.
* rename NO_RCU to WANT_EXPIRE, which is what it really is.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-12 11:24:46 -04:00