Commit Graph

39463 Commits

Author SHA1 Message Date
Christoph Hellwig c5c707f96f nfsd: implement pNFS layout recalls
Add support to issue layout recalls to clients.  For now we only support
full-file recalls to get a simple and stable implementation.  This allows
to embedd a nfsd4_callback structure in the layout_state and thus avoid
any memory allocations under spinlocks during a recall.  For normal
use cases that do not intent to share a single file between multiple
clients this implementation is fully sufficient.

To ensure layouts are recalled on local filesystem access each layout
state registers a new FL_LAYOUT lease with the kernel file locking code,
which filesystems that support pNFS exports that require recalls need
to break on conflicting access patterns.

The XDR code is based on the old pNFS server implementation by
Andy Adamson, Benny Halevy, Boaz Harrosh, Dean Hildebrand, Fred Isaman,
Marc Eshel, Mike Sager and Ricardo Labiaga.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:43 +01:00
Christoph Hellwig 9cf514ccfa nfsd: implement pNFS operations
Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
outstanding layouts and devices.

Layout management is very straight forward, with a nfs4_layout_stateid
structure that extends nfs4_stid to manage layout stateids as the
top-level structure.  It is linked into the nfs4_file and nfs4_client
structures like the other stateids, and contains a linked list of
layouts that hang of the stateid.  The actual layout operations are
implemented in layout drivers that are not part of this commit, but
will be added later.

The worst part of this commit is the management of the pNFS device IDs,
which suffers from a specification that is not sanely implementable due
to the fact that the device-IDs are global and not bound to an export,
and have a small enough size so that we can't store the fsid portion of
a file handle, and must never be reused.  As we still do need perform all
export authentication and validation checks on a device ID passed to
GETDEVICEINFO we are caught between a rock and a hard place.  To work
around this issue we add a new hash that maps from a 64-bit integer to a
fsid so that we can look up the export to authenticate against it,
a 32-bit integer as a generation that we can bump when changing the device,
and a currently unused 32-bit integer that could be used in the future
to handle more than a single device per export.  Entries in this hash
table are never deleted as we can't reuse the ids anyway, and would have
a severe lifetime problem anyway as Linux export structures are temporary
structures that can go away under load.

Parts of the XDR data, structures and marshaling/unmarshaling code, as
well as many concepts are derived from the old pNFS server implementation
from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman,
Mike Sager, Ricardo Labiaga and many others.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:42 +01:00
Christoph Hellwig 4d227fca1b nfsd: make find_any_file available outside nfs4state.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:41 +01:00
Christoph Hellwig e6ba76e194 nfsd: make find/get/put file available outside nfs4state.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:41 +01:00
Christoph Hellwig cd61c52231 nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:40 +01:00
Christoph Hellwig 9558f2500a nfsd: add fh_fsid_match helper
Add a helper to check that the fsid parts of two file handles match.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:39 +01:00
Christoph Hellwig 4d94c2ef20 nfsd: move nfsd_fh_match to nfsfh.h
The pnfs code will need it too.  Also remove the nfsd_ prefix to match the
other filehandle helpers in that file.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:39 +01:00
Christoph Hellwig 11afe9f76e fs: add FL_LAYOUT lease type
This (ab-)uses the file locking code to allow filesystems to recall
outstanding pNFS layouts on a file.  This new lease type is similar but
not quite the same as FL_DELEG.  A FL_LAYOUT lease can always be granted,
an a per-filesystem lock (XFS iolock for the initial implementation)
ensures not FL_LAYOUT leases granted when we would need to recall them.

Also included are changes that allow multiple outstanding read
leases of different types on the same file as long as they have a
differnt owner.  This wasn't a problem until now as nfsd never set
FL_LEASE leases, and no one else used FL_DELEG leases, but given that
nfsd will also issues FL_LAYOUT leases we will have to handle it now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:38 +01:00
Christoph Hellwig 2ab99ee124 fs: track fl_owner for leases
Just like for other lock types we should allow different owners to have
a read lease on a file.  Currently this can't happen, but with the addition
of pNFS layout leases we'll need this feature.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:38 +01:00
Al Viro 15d0f5ea34 Make super_blocks and sb_lock static
The only user outside of fs/super.c is gone now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-02-02 10:07:59 -07:00
J. Bruce Fields a584143b01 Merge branch 'locks-3.20' of git://git.samba.org/jlayton/linux into for-3.20
Christoph's block pnfs patches have some minor dependencies on these
lock patches.
2015-02-02 11:29:29 -05:00
Dave Chinner 179073620d Merge branch 'xfs-ioctl-setattr-cleanup' into for-next 2015-02-02 10:57:30 +11:00
Iustin Pop 9b94fcc398 xfs: fix behaviour of XFS_IOC_FSSETXATTR on directories
Currently, the ioctl handling code for XFS_IOC_FSSETXATTR treats all
targets as regular files: it refuses to change the extent size if
extents are allocated. This is wrong for directories, as there the
extent size is only used as a default for children.

The patch fixes this issue and improves validation of flag
combinations:

- only disallow extent size changes after extents have been allocated
  for regular files
- only allow XFS_XFLAG_EXTSIZE for regular files
- only allow XFS_XFLAG_EXTSZINHERIT for directories
- automatically clear the flags if the extent size is zero

Thanks to Dave Chinner for guidance on the proper fix for this issue.

[dchinner: ported changes onto cleanup series. Makes changes clear
	   and obvious.]
[dchinner: added comments documenting validity checking rules.]

Signed-off-by: Iustin Pop <iustin@k1024.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-02 10:26:26 +11:00
Dave Chinner 23bd0735cf xfs: factor projid hint checking out of xfs_ioctl_setattr
The project ID change checking is one of the few remaining open
coded checks in xfs_ioctl_setattr(). Factor it into a helper
function so that the setattr code mostly becomes a flow of check
and action helpers, making it easier to read and follow.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-02 10:22:53 +11:00
Dave Chinner d4388d3c09 xfs: factor extsize hint checking out of xfs_ioctl_setattr
The extent size hint change checking is fairly complex, so isolate
that into it's own function. This simplifies the logic flow of the
setattr code, making it easier to read.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-02 10:22:20 +11:00
Dave Chinner 41c145271d xfs: XFS_IOCTL_SETXATTR can run in user namespaces
Currently XFS_IOCTL_SETXATTR will fail if run in a user namespace as
it it not allowed to change project IDs. The current code, however,
also prevents any other change being made as well, so things like
extent size hints cannot be set in user namespaces. This is wrong,
so only disallow access to project IDs and related flags from inside
the init namespace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-02 10:17:51 +11:00
Dave Chinner fd179b9c3b xfs: kill xfs_ioctl_setattr behaviour mask
Now there is only one caller to xfs_ioctl_setattr that uses all the
functionality of the function we can kill the behviour mask and
start cleaning up the code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-02 10:16:25 +11:00
Dave Chinner f96291f6a3 xfs: disaggregate xfs_ioctl_setattr
xfs_ioctl_setxflags doesn't need all of the functionailty in
xfs_ioctl_setattr() and now we have separate helper functions that
share the checks and modifications that xfs_ioctl_setxflags
requires. Hence disaggregate it from xfs_ioctl_setattr() to allow
further work to be done on xfs_ioctl_setattr.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-02 10:15:56 +11:00
Dave Chinner 8f3d17ab06 xfs: factor out xfs_ioctl_setattr transaciton preamble
The setup of the transaction is done after a random smattering of
checks and before another bunch of ioperations specific
validity checks. Pull all the preamble out into a helper function
that returns a transaction or error.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-02 10:15:35 +11:00
Dave Chinner 29a17c00d4 xfs: separate xflags from xfs_ioctl_setattr
The setting of the extended flags is down through two separate
interfaces, but they are munged together into xfs_ioctl_setattr
and make that function far more complex than it needs to be.
Separate it out into a helper function along with all the other
common inode changes and transaction manipulations in
xfs_ioctl_setattr().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-02 10:14:25 +11:00
Dave Chinner 817b6c480e xfs: FSX_NONBLOCK is not used
It is set if the filp is set ot non-blocking, but the flag is not
used anywhere. Hence we can kill it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-02 10:14:04 +11:00
Dave Chinner 3fd1b0d158 Merge branch 'xfs-misc-fixes-for-3.20-3' into for-next 2015-02-02 10:03:18 +11:00
Christoph Hellwig 2ba6623702 xfs: don't allocate an ioend for direct I/O completions
Back in the days when the direct I/O ->end_io callback could be called
from interrupt context for AIO we needed a structure to hand off to the
workqueue, and reused the ioend structure for this purpose.  These days
->end_io is always called from user or workqueue context, which allows us
to avoid this memory allocation and simplify the code significantly.

[dchinner: removed now unused xfs_finish_ioend_sync() function after
	   Brian Foster did an initial review. ]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-02 10:02:09 +11:00
Wang, Yalin f3d215526e xfs: change kmem_free to use generic kvfree()
Change kmem_free to use kvfree() generic function, remove the
duplicated code.

Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-02 09:54:18 +11:00
Christoph Hellwig 8add71ca3f xfs: factor out a xfs_update_prealloc_flags() helper
This logic is duplicated in xfs_file_fallocate and xfs_ioc_space, and
we'll need another copy of it for pNFS block support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-02 09:53:56 +11:00
Dan Carpenter c7c545d4a3 NFS: a couple off by ones
These tests are off by one because if len == sizeof(nfs_export_path)
then we have truncated the name.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-30 20:43:30 -05:00
Omar Sandoval 3a7ed3fff3 nfs: prevent truncate on active swapfile
Most filesystems prevent truncation of an active swapfile by way of
inode_newsize_ok, called from inode_change_ok. NFS doesn't call either
from nfs_setattr, presumably because most of these checks are expected
to be done server-side. However, the IS_SWAPFILE check can only be done
client-side, and truncating a swapfile can't possibly be good.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-30 20:43:29 -05:00
Jeff Layton 6ffa30d3f7 nfs: don't call blocking operations while !TASK_RUNNING
Bruce reported seeing this warning pop when mounting using v4.1:

     ------------[ cut here ]------------
     WARNING: CPU: 1 PID: 1121 at kernel/sched/core.c:7300 __might_sleep+0xbd/0xd0()
    do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810ff58f>] prepare_to_wait+0x2f/0x90
    Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer ppdev joydev snd virtio_console virtio_balloon pcspkr serio_raw parport_pc parport pvpanic floppy soundcore i2c_piix4 virtio_blk virtio_net qxl drm_kms_helper ttm drm virtio_pci virtio_ring ata_generic virtio pata_acpi
    CPU: 1 PID: 1121 Comm: nfsv4.1-svc Not tainted 3.19.0-rc4+ #25
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
     0000000000000000 000000004e5e3f73 ffff8800b998fb48 ffffffff8186ac78
     0000000000000000 ffff8800b998fba0 ffff8800b998fb88 ffffffff810ac9da
     ffff8800b998fb68 ffffffff81c923e7 00000000000004d9 0000000000000000
    Call Trace:
     [<ffffffff8186ac78>] dump_stack+0x4c/0x65
     [<ffffffff810ac9da>] warn_slowpath_common+0x8a/0xc0
     [<ffffffff810aca65>] warn_slowpath_fmt+0x55/0x70
     [<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
     [<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
     [<ffffffff810dd2ad>] __might_sleep+0xbd/0xd0
     [<ffffffff8124c973>] kmem_cache_alloc_trace+0x243/0x430
     [<ffffffff810d941e>] ? groups_alloc+0x3e/0x130
     [<ffffffff810d941e>] groups_alloc+0x3e/0x130
     [<ffffffffa0301b1e>] svcauth_unix_accept+0x16e/0x290 [sunrpc]
     [<ffffffffa0300571>] svc_authenticate+0xe1/0xf0 [sunrpc]
     [<ffffffffa02fc564>] svc_process_common+0x244/0x6a0 [sunrpc]
     [<ffffffffa02fd044>] bc_svc_process+0x1c4/0x260 [sunrpc]
     [<ffffffffa03d5478>] nfs41_callback_svc+0x128/0x1f0 [nfsv4]
     [<ffffffff810ff970>] ? wait_woken+0xc0/0xc0
     [<ffffffffa03d5350>] ? nfs4_callback_svc+0x60/0x60 [nfsv4]
     [<ffffffff810d45bf>] kthread+0x11f/0x140
     [<ffffffff810ea815>] ? local_clock+0x15/0x30
     [<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
     [<ffffffff81874bfc>] ret_from_fork+0x7c/0xb0
     [<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
    ---[ end trace 675220a11e30f4f2 ]---

nfs41_callback_svc does most of its work while in TASK_INTERRUPTIBLE,
which is just wrong. Fix that by finishing the wait immediately if we've
found that the list has something on it.

Also, we don't expect this kthread to accept signals, so we should be
using a TASK_UNINTERRUPTIBLE sleep instead. That however, opens us up
hung task warnings from the watchdog, so have the schedule_timeout
wake up every 60s if there's no callback activity.

Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-30 20:39:50 -05:00
Linus Torvalds bc208e0ee0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
 "We have one more fix for btrfs in my for-linus branch - this was a bug
  in the new raid5/6 scrubbing support"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix raid56 scrub failed in xfstests btrfs/072
2015-01-30 14:25:52 -08:00
Linus Torvalds 92ef9ce301 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota and UDF fix from Jan Kara:
 "A fix for UDF to properly free preallocated blocks and a fix for quota
  so that Q_GETQUOTA quotactl reports correct numbers for XFS filesystem
  (and similarly Q_XGETQUOTA quotactl works properly for other
  filesystems)"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units
  udf: Release preallocation on last writeable close
2015-01-30 13:46:04 -08:00
Jan Kara b10a08194c quota: Store maximum space limit in bytes
Currently maximum space limit quota format supports is in blocks however
since we store space limits in bytes, this is somewhat confusing. So
store the maximum limit in bytes as well. Also rename the field to match
the new unit and related inode field to match the new naming scheme.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-30 12:51:21 +01:00
Jan Kara aaa3daed15 quota: Remove quota_on_meta callback
There are no more users for quota_on_meta callback. Just remove it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-30 12:51:10 +01:00
Jan Kara 664dbd5fe5 ocfs2: Use generic helpers for quotaon and quotaoff
Ocfs2 can just use the generic helpers provided by quota code for
turning quotas on and off when quota files are stored as system inodes.
The only difference is the feature test in ocfs2_quota_on() and that is
covered by dquot_quota_enable() checking whether usage tracking is
enabled (which can happen only if the filesystem has the quota feature
set).

Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-30 12:50:58 +01:00
Jan Kara 1fa5efe362 ext4: Use generic helpers for quotaon and quotaoff
Ext4 can just use the generic helpers provided by quota code for turning
quotas on and off when quota files are stored as system inodes. The only
difference is the feature test in ext4_quota_on_sysfile() but the same
is achieved in dquot_quota_enable() by checking whether usage tracking
for the corresponding quota type is enabled (which can happen only if
quota feature is set).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-30 12:50:42 +01:00
Jan Kara 3e2af67e66 quota: Add ->quota_{enable,disable} callbacks for VFS quotas
Add functions which translate ->quota_enable / ->quota_disable calls
into appropriate changes in VFS quota. This will enable filesystems
supporting VFS quota files in system inodes to be controlled via
Q_XQUOTA[ON|OFF] quotactls for better userspace compatibility.

Also provide a vector for quotactl using these functions which can be
used by filesystems with quota files stored in hidden system files.

Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-30 12:50:32 +01:00
Jan Kara d3b8632485 quota: Wire up ->quota_{enable,disable} callbacks into Q_QUOTA{ON,OFF}
Make Q_QUOTAON / Q_QUOTAOFF quotactl call ->quota_enable /
->quota_disable callback when provided. To match current behavior of
ocfs2 & ext4 we make these quotactls turn on / off quota enforcement for
appropriate quota type.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-30 12:50:16 +01:00
Jan Kara 38e478c448 quota: Split ->set_xstate callback into two
Split ->set_xstate callback into two callbacks - one for turning quotas
on (->quota_enable) and one for turning quotas off (->quota_disable). That
way we don't have to pass quotactl command into the callback which seems
cleaner.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-30 12:49:40 +01:00
Jan Kara 1cd6b7be92 Merge branch 'for_linus' into for_next 2015-01-30 10:16:33 +01:00
Linus Torvalds 353a0c6fcc NFS client bugfixes for Linux 3.19
Highlights include:
 - Stable fix for a NFSv4.1 Oops on mount
 - Stable fix for an O_DIRECT deadlock condition
 - Fix an issue with submounted volumes and fake duplicate inode numbers
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUyqe0AAoJEGcL54qWCgDyIDcP/RZmf7SasgP+i1c4oWGMrqsn
 69DaPRjakbCEJcX00DIp3x2Ulq4uUgJ5v9WDEPkQBKe2lAg7uaQBTzY6x1erDs+3
 bBZFXd0SByfRD8M6UXQwxtbxksBfozItLNU7vbln/0BiIoabO8QVxNhX6fMfK7yt
 tl4Ik/Dr/o3Oe3pVR0wW+0eqEYcGGXzTpPdYoN470z1/DjV9w5rkm3HsPzoq3QHI
 CN3acBJXFrWdCMSJcyzjmeaWRO93FwLSwKS4KKB7hcFY1+pUgA9qhnA5W4GY/0/7
 1ovakR778CPD2YAaQuY9iJG561QixiTSRQHRWoI6LgawQ3JyBMt/eqtcfFI6Olxp
 Mjs+EMjf6hlrFUpBNZ/8dAROJYSU35sZ6y9C5Ch7ZGBhcgbiQCOVyKjNiBWRG8KX
 +xnufzs4tnSIScMHuPyGYpkXy/xEdRhbtm+NjVbvel7Hr37q0jlV2M0/4i54HJ8U
 UWMyHfGFnODKkFGByJc9w1XWSGcEZmRsCAsevGI9Yzf+FSTxrdkLWwdVFFOonHQw
 OfAlYmO29v4zdTbl1dCgU1pyqdDb47Js/07TQCLb2YcRiUfAuysUTFtBSkRDbwCf
 eYKnRvXahPqTB84TDM62HUCJ7y4D945wt+/6FvuYnDYgOeTARXGzAZ+jhv1i8UbJ
 8xX2xcdW6fe2DIfZM/9h
 =eGYe
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.19-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - Stable fix for a NFSv4.1 Oops on mount
   - Stable fix for an O_DIRECT deadlock condition
   - Fix an issue with submounted volumes and fake duplicate inode
     numbers"

* tag 'nfs-for-3.19-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix use of nfs_attr_use_mounted_on_fileid()
  NFSv4.1: Fix an Oops in nfs41_walk_client_list
  nfs: fix dio deadlock when O_DIRECT flag is flipped
2015-01-29 15:18:12 -08:00
Ingo Molnar 3c01b74e81 * Move efivarfs from the misc filesystem section to pseudo filesystem,
since that's a more logical and accurate place - Leif Lindholm
 
  * Update efibootmgr URL in Kconfig help - Peter Jones
 
  * Improve accuracy of EFI guid function names - Borislav Petkov
 
  * Expose firmware platform size in sysfs for the benefit of EFI boot
    loader installers and other utilities - Steve McIntyre
 
  * Cleanup __init annotations for arm64/efi code - Ard Biesheuvel
 
  * Mark the UIE as unsupported for rtc-efi - Ard Biesheuvel
 
  * Fix memory leak in error code path of runtime map code - Dan Carpenter
 
  * Improve robustness of get_memory_map() by removing assumptions on the
    size of efi_memory_desc_t (which could change in future spec
    versions) and querying the firmware instead of guessing about the
    memmap size - Ard Biesheuvel
 
  * Remove superfluous guid unparse calls - Ivan Khoronzhuk
 
  * Delete unnecessary chosen@0 DT node FDT code since was duplicated
    from code in drivers/of and is entirely unnecessary - Leif Lindholm
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUv69oAAoJEC84WcCNIz1VEYgP/1b27WRfCXs4q/8FP+UheSDS
 nAFbGe9PjVPnxo5pA9VwPP6eNQ2zYiyNGEK1BlbQlFPZdSD1updIraA78CiF5iys
 iSYyG9xVIcTB23RZI8aJLnBXbosIUKPJZ3FORv1LPhI6Mz1rCpraEaaUlv67rUKr
 FLBG9cR7t9f/f+fJw6LOAAISGIG/4s0wQdA5/noaYkj5R5bICl2UTGtbwa0oNstb
 NUO93aKDgaG/VljpIEeG6XV96Ioz7cHjQsEaX8sTrvT0n7nPNIqSDjFJOqWKJOXl
 RsFrzyl8fFIbMuQatYv1f3efPvyH+iKOfHnHrvcjUNje0xhm7F0Bd86BkOw1a3JQ
 pNb0YUWecI0Z/8GSzN8X0JQ7cowa3wI15Z/Hfs03odTXiM6VqwFAhuz/s5DEUdKS
 U+rOPjU0ezt3G4oBB/VGgF9w5JWKfsMcsHgmLX9P+JYzKFrxggo1SXAtXUeRAqQp
 agKmUB+k6Y1baQO8efkoM7rKL2F0q1SR9QiK+16BHCCkevD23v7IFGrHm2r1xKil
 kvWlY4MkRVa4KGPxEFEDVty0HjXxImwYsxTaYVHTS7SMeoP41f6koHKB19NaB3No
 5fqn/rT1KcJuhQj/I+vAixIX4WMJkX/MQVbtKfqSaKlAiRg3eRY6ONYr0jOglfF6
 gaMuvmDd0HlV6UJvH/9L
 =iPpM
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi

Pull EFI updates from Matt Fleming:

" - Move efivarfs from the misc filesystem section to pseudo filesystem,
    since that's a more logical and accurate place - Leif Lindholm

  - Update efibootmgr URL in Kconfig help - Peter Jones

  - Improve accuracy of EFI guid function names - Borislav Petkov

  - Expose firmware platform size in sysfs for the benefit of EFI boot
    loader installers and other utilities - Steve McIntyre

  - Cleanup __init annotations for arm64/efi code - Ard Biesheuvel

  - Mark the UIE as unsupported for rtc-efi - Ard Biesheuvel

  - Fix memory leak in error code path of runtime map code - Dan Carpenter

  - Improve robustness of get_memory_map() by removing assumptions on the
    size of efi_memory_desc_t (which could change in future spec
    versions) and querying the firmware instead of guessing about the
    memmap size - Ard Biesheuvel

  - Remove superfluous guid unparse calls - Ivan Khoronzhuk

  - Delete unnecessary chosen@0 DT node FDT code since was duplicated
    from code in drivers/of and is entirely unnecessary - Leif Lindholm

   There's nothing super scary, mainly cleanups, and a merge from Ricardo who
   kindly picked up some patches from the linux-efi mailing list while I
   was out on annual leave in December.

   Perhaps the biggest risk is the get_memory_map() change from Ard, which
   changes the way that both the arm64 and x86 EFI boot stub build the
   early memory map. It would be good to have it bake in linux-next for a
   while.
"

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-29 19:16:40 +01:00
Christoph Hellwig dbe4e192a2 fs: add vfs_iter_{read,write} helpers
Simple helpers that pass an arbitrary iov_iter to filesystems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-29 00:13:13 -05:00
Al Viro 05afcb77eb new helper: iov_iter_bvec()
similar to iov_iter_kvec(), for ITER_BVEC ones

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-29 00:13:11 -05:00
Artem Bityutskiy fb4325a3d9 UBIFS: add a couple of extra asserts
... to catch possible memory corruptions.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2015-01-28 16:09:32 +01:00
Subodh Nijsure fee1756d80 UBIFS: add ubifs_err() to print error reason
This patch adds ubifs_err() output to some error paths to tell the user
what's going on.

Artem: improve the messages, rename too long variable

Signed-off-by: Subodh Nijsure <snijsure@grid-net.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Shelton <ben.shelton@ni.com>
Acked-by: Brad Mouring <brad.mouring@ni.com>
Acked-by: Terry Wilcox <terry.wilcox@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2015-01-28 16:09:01 +01:00
Subodh Nijsure d7f0b70d30 UBIFS: Add security.* XATTR support for the UBIFS
Artem: rename static functions so that they do not use the "ubifs_" prefix - we
       only use this prefix for non-static functions.
Artem: remove few junk white-space changes in file.c

Signed-off-by: Subodh Nijsure <snijsure@grid-net.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Shelton <ben.shelton@ni.com>
Acked-by: Brad Mouring <brad.mouring@ni.com>
Acked-by: Terry Wilcox <terry.wilcox@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2015-01-28 16:08:54 +01:00
Subodh Nijsure 895d9db253 UBIFS: Add xattr support for symlinks
Artem: rename the __ubifs_setxattr() functions to just 'setxattr()'.

Signed-off-by: Subodh Nijsure <snijsure@grid-net.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Shelton <ben.shelton@ni.com>
Acked-by: Terry Wilcox <terry.wilcox@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2015-01-28 16:08:46 +01:00
Jan Kara 14bf61ffe6 quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units
Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
tracks space limits and usage in 512-byte blocks. However VFS quotas
track usage in bytes (as some filesystems require that) and we need to
somehow pass this information. Upto now it wasn't a problem because we
didn't do any unit conversion (thus VFS quota routines happily stuck
number of bytes into d_bcount field of struct fd_disk_quota). Only if
you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
/ Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone
tried this but reportedly some Samba users hit the problem in practice.
So when we want interfaces compatible we need to fix this.

We bite the bullet and define another quota structure used for passing
information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have
to have more conversion routines in fs/quota/quota.c and another copying
of quota structure slows down getting of quota information by about 2%
but it seems cleaner than overloading e.g. units of d_bcount to bytes.

CC: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-28 09:01:40 +01:00
Jan Kara b07ef35244 udf: Release preallocation on last writeable close
Commit 6fb1ca92a6 "udf: Fix race between write(2) and close(2)"
changed the condition when preallocation is released. The idea was that
we don't want to release the preallocation for an inode on close when
there are other writeable file descriptors for the inode. However the
condition was written in the opposite way so we released preallocation
only if there were other writeable file descriptors. Fix the problem by
changing the condition properly.

CC: stable@vger.kernel.org
Fixes: 6fb1ca92a6
Reported-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-28 09:00:40 +01:00
David S. Miller 95f873f2ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	arch/arm/boot/dts/imx6sx-sdb.dts
	net/sched/cls_bpf.c

Two simple sets of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 16:59:56 -08:00
Gui Hecheng 063c54dccd btrfs: fix raid56 scrub failed in xfstests btrfs/072
The xfstests btrfs/072 reports uncorrectable read errors in dmesg,
because scrub forgets to use commit_root for parity scrub routine
and scrub attempts to scrub those extents items whose contents are
not fully on disk.

To fix it, we just add the @search_commit_root flag back.

Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-27 15:26:16 -08:00
Niklas Cassel 7a1ceba071 cifs: fix MUST SecurityFlags filtering
If CONFIG_CIFS_WEAK_PW_HASH is not set, CIFSSEC_MUST_LANMAN
and CIFSSEC_MUST_PLNTXT is defined as 0.

When setting new SecurityFlags without any MUST flags,
your flags would be overwritten with CIFSSEC_MUST_LANMAN (0).

Signed-off-by: Niklas Cassel <niklass@axis.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
2015-01-26 19:38:26 -06:00
Bob Peterson 45094a58b1 GFS2: Eliminate a nonsense goto
This patch just removes a goto that did nothing.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2015-01-26 20:58:54 +00:00
Al Viro 87b95ce096 switch the IO-triggering parts of umount to fs_pin
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:17:29 -05:00
Al Viro 59eda0e07f new fs_pin killing logics
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:17:28 -05:00
Al Viro fdab684d72 allow attaching fs_pin to a group not associated with some superblock
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:17:28 -05:00
Linus Torvalds 360f54796e dcache: let the dentry count go down to zero without taking d_lock
We can be more aggressive about this, if we are clever and careful. This is subtle.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:16:29 -05:00
Al Viro 32426f6653 pull bumping refcount into ->kill()
there will be one more change of ->kill() calling conventions; this
isn't final.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:16:29 -05:00
Al Viro 9e251d0204 kill pin_put()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:16:28 -05:00
Al Viro d443b9fd56 gut proc_register() a bit
There are only 3 callers and quite a bit of that thing is executed
exactly in one of those.  Just lift it there...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:16:26 -05:00
Al Viro d6cb125b99 kill d_validate()
no users left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:16:26 -05:00
Al Viro 5e993e2534 ncpfs: get rid of d_validate() nonsense
What we want is to have non-counting references to children in
pagecache of parent directory, and avoid picking them after a child
has been freed.  Fine, so let's just have ->d_prune() clear
parent's inode "has directory contents in page cache" flag.
That way we don't need ->d_fsdata for storing offsets, so we can
use it as a quick and dirty "is it referenced from page cache"
flag.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:16:26 -05:00
Linus Torvalds 80a755545d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "A couple of fixes - deadlock in CIFS and build breakage in cris serial
  driver (resurfaced f_dentry in there)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Convert file->f_dentry->d_inode to file_inode()
  fix deadlock in cifs_ioctl_clone()
2015-01-25 17:27:18 -08:00
Al Viro 77b3da6e32 new primitive: debugfs_create_automount()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 16:52:53 -05:00
Al Viro 5233e31191 debugfs: split end_creating() into success and failure cases
... and don't bother with dput(dentry) in the former and with
dget(dentry) preceding all its calls.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 16:52:53 -05:00
Al Viro edac65eaf8 debugfs: take mode-dependent parts of debugfs_get_inode() into callers
... and trim the arguments list

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 16:52:53 -05:00
Al Viro 680b302409 fold debugfs_mknod() into callers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 16:52:52 -05:00
Al Viro 3473cde565 fold debugfs_create() into caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 16:52:52 -05:00
Al Viro 02538a75ba fold debugfs_mkdir() into caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 16:52:51 -05:00
Al Viro 160f7592f2 debugfs_mknod(): get rid useless arguments
dev is always zero, dir was only used to get its ->i_sb, which is
equal to ->d_sb of dentry...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 16:52:51 -05:00
Al Viro 9b73fab01b fold debugfs_link() into caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 16:52:50 -05:00
Al Viro ad5abd5ba8 debugfs: kill __create_file()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 16:52:31 -05:00
Al Viro 190afd81e4 debugfs: split the beginning and the end of __create_file() off
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 15:23:31 -05:00
Al Viro e09ddf36dd debugfs_{mkdir,create,link}(): get rid of redundant argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 15:23:31 -05:00
Trond Myklebust cf6726e2ee NFSv4: Deal with atomic upgrades of an existing delegation
Ensure that we deal correctly with the case where the server sends us a
newer instance of the same delegation. If the stateids match, but the
sequence numbers differ, then treat the new delegation as if it were
an atomic upgrade.

Signed-off-by: Trond Myklebust <Trond.Myklebust@primarydata.com>
2015-01-24 18:46:51 -05:00
Trond Myklebust 89f0ff386c NFSv4.1: Replace usage of nfs_client->cl_addr in encode_create_session
Replace the current code with something that is a little closer to what
net/sunrpc/auth_unix.c uses.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-24 18:46:50 -05:00
Trond Myklebust 40dd4b7aee NFSv4.1: Optimise layout return-on-close
Optimise the layout return on close code by ensuring that

1) Add a check for whether we hold a layout before taking any spinlocks
2) Only take the spin lock once
3) Use nfs_state->state to speed up open file checks

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-24 18:46:48 -05:00
Trond Myklebust b4019c0e21 NFSv4.1: Allow parallel LOCK/LOCKU calls
Note, however, that we still serialise on the open stateid if the lock
stateid is unconfirmed. Hopefully that will not prove too much of a
burden for first time locks; it should leave the ability to parallelise
OPENs unchanged, since they no longer call the serialisation primitives.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-24 18:46:48 -05:00
Trond Myklebust c69899a17c NFSv4: Update of VFS byte range lock must be atomic with the stateid update
Ensure that we test the lock stateid remained unchanged while we were
updating the VFS tracking of the byte range lock. Have the process
replay the lock to the server if we detect that was not the case.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-24 18:46:47 -05:00
Trond Myklebust 425c1d4e5b NFSv4: Fix lock on-wire reordering issues
This patch ensures that the server cannot reorder our LOCK/LOCKU
requests if they are sent in parallel on the wire.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-24 18:46:47 -05:00
Trond Myklebust 6b447539aa NFSv4: Always do open_to_lock_owner if the lock stateid is uninitialised
The original text in RFC3530 was terribly confusing since it conflated
lockowners and lock stateids. RFC3530bis clarifies that you must use
open_to_lock_owner when there is no lock state for that file+lockowner
combination.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-24 18:46:46 -05:00
Trond Myklebust 39071e6fff NFSv4: Fix atomicity problems with lock stateid updates
When we update the lock stateid, we really do need to ensure that this is
done under the state->state_lock, and that we are indeed only updating
confirmed locks with a newer version of the same stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-24 17:27:41 -05:00
Trond Myklebust 63f5f796af NFSv4.1: Allow parallel OPEN/OPEN_DOWNGRADE/CLOSE
Remove the serialisation of OPEN/OPEN_DOWNGRADE and CLOSE calls for the
case of NFSv4.1 and newer.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-23 23:06:45 -05:00
Trond Myklebust a67964197c NFSv4: Check for NULL argument in nfs_*_seqid() functions
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-23 23:06:45 -05:00
Trond Myklebust badc76dd0d NFSv4: Convert nfs_alloc_seqid() to return an ERR_PTR() if allocation fails
When we relax the sequencing on the NFSv4.1 OPEN/CLOSE code, we will want
to use the value NULL to indicate that no sequencing is needed.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-23 23:06:44 -05:00
Trond Myklebust f95549cf24 NFSv4: More CLOSE/OPEN races
If an OPEN RPC call races with a CLOSE or OPEN_DOWNGRADE so that it
updates the nfs_state structure before the CLOSE/OPEN_DOWNGRADE has
a chance to do so, then we know that the state->flags need to be
recalculated from scratch.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-23 23:06:44 -05:00
Linus Torvalds c4e00f1d31 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "We have a few fixes in my for-linus branch.

  Qu Wenruo's batch fix a regression between some our merge window pull
  and the inode_cache feature.  The rest are smaller bugs"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
  btrfs: Fix the bug that fs_info->pending_changes is never cleared.
  btrfs: fix state->private cast on 32 bit machines
  Btrfs: fix race deleting block group from space_info->ro_bgs list
  Btrfs: fix incorrect freeing in scrub_stripe
  btrfs: sync ioctl, handle errors after transaction start
2015-01-24 14:31:27 +12:00
Trond Myklebust 566fcec60b NFSv4: Fix an atomicity problem in CLOSE
If we are to remove the serialisation of OPEN/CLOSE, then we need to
ensure that the stateid sent as part of a CLOSE operation does not
change after we test the state in nfs4_close_prepare.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-23 19:22:39 -05:00
Christoph Hellwig 4c94e13e9c nfsd: factor out a helper to decode nfstime4 values
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-23 10:29:13 -05:00
Jeff Layton 3c5199143b sunrpc/lockd: fix references to the BKL
The BKL is completely out of the picture in the lockd and sunrpc code
these days. Update the antiquated comments that refer to it.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-23 10:29:12 -05:00
J. Bruce Fields bbc7f33ac6 nfsd: fix year-2038 nfs4 state problem
Someone with a weird time_t happened to notice this, it shouldn't really
manifest till 2038.  It may not be our ownly year-2038 problem.

Reported-by: Aaron Pace <Aaron.Pace@alcatel-lucent.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-23 10:29:11 -05:00
Paul Moore 55422d0bd2 audit: replace getname()/putname() hacks with reference counters
In order to ensure that filenames are not released before the audit
subsystem is done with the strings there are a number of hacks built
into the fs and audit subsystems around getname() and putname().  To
say these hacks are "ugly" would be kind.

This patch removes the filename hackery in favor of a more
conventional reference count based approach.  The diffstat below tells
most of the story; lots of audit/fs specific code is replaced with a
traditional reference count based approach that is easily understood,
even by those not familiar with the audit and/or fs subsystems.

CC: viro@zeniv.linux.org.uk
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-23 00:23:58 -05:00
Paul Moore fd3522fdc8 audit: enable filename recording via getname_kernel()
Enable recording of filenames in getname_kernel() and remove the
kludgy workaround in __audit_inode() now that we have proper filename
logging for kernel users.

CC: viro@zeniv.linux.org.uk
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-23 00:23:52 -05:00
Al Viro cbaab2db91 simpler calling conventions for filename_mountpoint()
a) make it accept ERR_PTR() as filename (and return its PTR_ERR() in that case)
b) make it putname() the sucker in the end otherwise

simplifies life for callers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-23 00:22:21 -05:00
Paul Moore 5168910413 fs: create proper filename objects using getname_kernel()
There are several areas in the kernel that create temporary filename
objects using the following pattern:

	int func(const char *name)
	{
		struct filename *file = { .name = name };
		...
		return 0;
	}

... which for the most part works okay, but it causes havoc within the
audit subsystem as the filename object does not persist beyond the
lifetime of the function.  This patch converts all of these temporary
filename objects into proper filename objects using getname_kernel()
and putname() which ensure that the filename object persists until the
audit subsystem is finished with it.

Also, a special thanks to Al Viro, Guenter Roeck, and Sabrina Dubroca
for helping resolve a difficult kernel panic on boot related to a
use-after-free problem in kern_path_create(); the thread can be seen
at the link below:

 * https://lkml.org/lkml/2015/1/20/710

This patch includes code that was either based on, or directly written
by Al in the above thread.

CC: viro@zeniv.linux.org.uk
CC: linux@roeck-us.net
CC: sd@queasysnail.net
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-23 00:22:20 -05:00
Paul Moore 0851854972 fs: rework getname_kernel to handle up to PATH_MAX sized filenames
In preparation for expanded use in the kernel, make getname_kernel()
more useful by allowing it to handle any legal filename length.

Thanks to Guenter Roeck for his suggestion to substitute memcpy() for
strlcpy().

CC: linux@roeck-us.net
CC: viro@zeniv.linux.org.uk
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-23 00:22:20 -05:00
Al Viro fa14a0b8d2 cut down the number of do_path_lookup() callers
... and don't bother with new struct filename when we already have one

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-23 00:22:19 -05:00
Jens Axboe b520252aa2 fs: make inode_to_bdi() handle NULL inode
Running a heavy fs workload, I ran into a situation where we pass
down a page for writeback/swap that doesn't have an inode mapping:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffff8119589f>] inode_to_bdi+0xf/0x50
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: wl(O) tun cfg80211 btusb joydev hid_apple hid_generic usbhid hid bcm5974 usb_storage nouveau snd_hda_codec_hdmi snd_hda_codec_cirrus snd_hda_codec_generic x86_pkg_temp_thermal snd_hda_intel kvm_intel snd_hda_controller snd_hda_codec kvm snd_hwdep snd_pcm applesmc input_polldev snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_timer snd_seq_device snd xhci_pci xhci_hcd ttm thunderbolt soundcore apple_gmux apple_bl bluetooth binfmt_misc fuse nls_iso8859_1 nls_cp437 vfat fat [last unloaded: wl]
CPU: 4 PID: 50 Comm: kswapd0 Tainted: G     U     O   3.19.0-rc5+ #60
Hardware name: Apple Inc. MacBookPro11,3/Mac-2BD1B31983FE1663, BIOS MBP112.88Z.0138.B02.1310181745 10/18/2013
task: ffff880462e917f0 ti: ffff880462edc000 task.ti: ffff880462edc000
RIP: 0010:[<ffffffff8119589f>]  [<ffffffff8119589f>] inode_to_bdi+0xf/0x50
RSP: 0000:ffff880462edf8e8  EFLAGS: 00010282
RAX: ffffffff81c4cd80 RBX: ffffea0001b3abc0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff880462edf8f8 R08: 00000000001e8500 R09: ffff880460f7cb68
R10: ffff880462edfa00 R11: 0000000000000101 R12: 0000000000000000
R13: ffffffff81c4cd98 R14: 0000000000000000 R15: ffff880460f7c9c0
FS:  0000000000000000(0000) GS:ffff88047f300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 00000002b6341000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffffea0001b3abc0 ffffffff81c4cd80 ffff880462edf948 ffffffff811244aa
 ffffffff811565b0 ffff880460f7c9c0 ffff880462edf948 ffffea0001b3abc0
 0000000000000001 ffff880462edfb40 ffff880008b999c0 ffff880460f7c9c0
Call Trace:
 [<ffffffff811244aa>] __test_set_page_writeback+0x3a/0x170
 [<ffffffff811565b0>] ? SyS_madvise+0x790/0x790
 [<ffffffff81156bb6>] __swap_writepage+0x216/0x280
 [<ffffffff8133d592>] ? radix_tree_insert+0x32/0xe0
 [<ffffffff81157741>] ? swap_info_get+0x61/0xf0
 [<ffffffff81159bfc>] ? page_swapcount+0x4c/0x60
 [<ffffffff81156c4d>] swap_writepage+0x2d/0x50
 [<ffffffff81131658>] shmem_writepage+0x198/0x2c0
 [<ffffffff8112cae4>] shrink_page_list+0x464/0xa00
 [<ffffffff8112d666>] shrink_inactive_list+0x266/0x500
 [<ffffffff8112e215>] shrink_lruvec+0x5d5/0x720
 [<ffffffff8112e3bb>] shrink_zone+0x5b/0x190
 [<ffffffff8112ee3f>] kswapd+0x48f/0x8d0
 [<ffffffff8112e9b0>] ? try_to_free_pages+0x4c0/0x4c0
 [<ffffffff81067be2>] kthread+0xd2/0xf0
 [<ffffffff81060000>] ? workqueue_congested+0x30/0x80
 [<ffffffff81067b10>] ? kthread_create_on_node+0x180/0x180
 [<ffffffff816b556c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81067b10>] ? kthread_create_on_node+0x180/0x180
Code: 00 48 c7 c7 8d 8d a4 81 e8 3f 62 eb ff e9 fc fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 <48> 8b 5f 28 48 89 df e8 15 f8 00 00 85 c0 75 11 48 8b 83 d8 00
RIP  [<ffffffff8119589f>] inode_to_bdi+0xf/0x50
 RSP <ffff880462edf8e8>
CR2: 0000000000000028
---[ end trace eb0e21aa7dad3ddf ]---

Handle this in inode_to_bdi() by punting it to noop_backing_dev_info,
if mapping->host is NULL.

Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-22 08:13:17 -07:00
Jeff Layton 8116bf4cb6 locks: update comments that refer to inode->i_flock
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2015-01-21 20:44:01 -05:00
Brian Foster 4d949021aa xfs: remove incorrect error negation in attr_multi ioctl
xfs_compat_attrmulti_by_handle() calls memdup_user() which returns a
negative error code. The error code is negated by the caller and thus
incorrectly converted to a positive error code.

Remove the error negation such that the negative error is passed
correctly back up to userspace.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-22 10:04:24 +11:00
Dave Chinner 438c3c8d2b Merge branch 'xfs-buf-type-fixes' into for-next 2015-01-22 09:51:30 +11:00
Dave Chinner 3443a3bca5 xfs: set superblock buffer type correctly
When the superblock is modified in a transaction, the commonly
modified fields are not actually copied to the superblock buffer to
avoid the buffer lock becoming a serialisation point. However, there
are some other operations that modify the superblock fields within
the transaction that don't directly log to the superblock but rely
on the changes to be applied during the transaction commit (to
minimise the buffer lock hold time).

When we do this, we fail to mark the buffer log item as being a
superblock buffer and that can lead to the buffer not being marked
with the corect type in the log and hence causing recovery issues.
Fix it by setting the type correctly, similar to xfs_mod_sb()...

cc: <stable@vger.kernel.org> # 3.10 to current
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-22 09:30:23 +11:00
Dave Chinner fe22d552b8 xfs: set buf types when converting extent formats
Conversion from local to extent format does not set the buffer type
correctly on the new extent buffer when a symlink data is moved out
of line.

Fix the symlink code and leave a comment in the generic bmap code
reminding us that the format-specific data copy needs to set the
destination buffer type appropriately.

cc: <stable@vger.kernel.org> # 3.10 to current
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-22 09:30:06 +11:00
Dave Chinner f19b872b08 xfs: inode unlink does not set AGI buffer type
This leads to log recovery throwing errors like:

XFS (md0): Mounting V5 Filesystem
XFS (md0): Starting recovery (logdev: internal)
XFS (md0): Unknown buffer type 0!
XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1
ffff8800ffc53800: 58 41 47 49 .....

Which is the AGI buffer magic number.

Ensure that we set the type appropriately in both unlink list
addition and removal.

cc: <stable@vger.kernel.org> # 3.10 to current
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-22 09:29:40 +11:00
Dave Chinner 0d612fb570 xfs: ensure buffer types are set correctly
Jan Kara reported that log recovery was finding buffers with invalid
types in them. This should not happen, and indicates a bug in the
logging of buffers. To catch this, add asserts to the buffer
formatting code to ensure that the buffer type is in range when the
transaction is committed.

We don't set a type on buffers being marked stale - they are not
going to get replayed, the format item exists only for recovery to
be able to prevent replay of the buffer, so the type does not
matter. Hence that needs special casing here.

cc: <stable@vger.kernel.org> # 3.10 to current
Reported-by: Jan Kara <jack@suse.cz>
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-22 09:29:05 +11:00
Dave Chinner 465e2def7c Merge branch 'xfs-sb-logging-rework' into for-next
Conflicts:
	fs/xfs/xfs_mount.c
2015-01-22 09:20:53 +11:00
Anna Schumaker 2ef47eb1ae NFS: Fix use of nfs_attr_use_mounted_on_fileid()
This function call was being optimized out during nfs_fhget(), leading
to situations where we have a valid fileid but still want to use the
mounted_on_fileid.  For example, imagine we have our server configured
like this:

server % df
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       9.1G  6.5G  1.9G  78% /
/dev/vdb1       487M  2.3M  456M   1% /exports
/dev/vdc1       487M  2.3M  456M   1% /exports/vol1
/dev/vdd1       487M  2.3M  456M   1% /exports/vol2

If our client mounts /exports and tries to do a "chown -R" across the
entire mountpoint, we will get a nasty message warning us about a circular
directory structure.  Running chown with strace tells me that each directory
has the same device and inode number:

newfstatat(AT_FDCWD, "/nfs/", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0
newfstatat(4, "vol1", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0
newfstatat(4, "vol2", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0

With this patch the mounted_on_fileid values are used for st_ino, so the
directory loop warning isn't reported.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-21 17:15:41 -05:00
Dave Chinner 074e427ba7 xfs: sanitise sb_bad_features2 handling
We currently have to ensure that every time we update sb_features2
that we update sb_bad_features2. Now that we log and format the
superblock in it's entirety we actually don't have to care because
we can simply update the sb_bad_features2 when we format it into the
buffer. This removes the need for anything but the mount and
superblock formatting code to care about sb_bad_features2, and
hence removes the possibility that we forget to update bad_features2
when necessary in the future.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-22 09:10:33 +11:00
Dave Chinner 61e63ecb57 xfs: consolidate superblock logging functions
We now have several superblock loggin functions that are identical
except for the transaction reservation and whether it shoul dbe a
synchronous transaction or not. Consolidate these all into a single
function, a single reserveration and a sync flag and call it
xfs_sync_sb().

Also, xfs_mod_sb() is not really a modification function - it's the
operation of logging the superblock buffer. hence change the name of
it to reflect this.

Note that we have to change the mp->m_update_flags that are passed
around at mount time to a boolean simply to indicate a superblock
update is needed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-22 09:10:31 +11:00
Dave Chinner 4d11a40239 xfs: remove bitfield based superblock updates
When we log changes to the superblock, we first have to write them
to the on-disk buffer, and then log that. Right now we have a
complex bitfield based arrangement to only write the modified field
to the buffer before we log it.

This used to be necessary as a performance optimisation because we
logged the superblock buffer in every extent or inode allocation or
freeing, and so performance was extremely important. We haven't done
this for years, however, ever since the lazy superblock counters
pulled the superblock logging out of the transaction commit
fast path.

Hence we have a bunch of complexity that is not necessary that makes
writing the in-core superblock to disk much more complex than it
needs to be. We only need to log the superblock now during
management operations (e.g. during mount, unmount or quota control
operations) so it is not a performance critical path anymore.

As such, remove the complex field based logging mechanism and
replace it with a simple conversion function similar to what we use
for all other on-disk structures.

This means we always log the entirity of the superblock, but again
because we rarely modify the superblock this is not an issue for log
bandwidth or CPU time. Indeed, if we do log the superblock
frequently, delayed logging will minimise the impact of this
overhead.

[Fixed gquota/pquota inode sharing regression noticed by bfoster.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-22 09:10:26 +11:00
Trond Myklebust 3175e1dcec NFSv4.1: Fix an Oops in nfs41_walk_client_list
If we start state recovery on a client that failed to initialise correctly,
then we are very likely to Oops.

Reported-by: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
Link: http://lkml.kernel.org/r/130621862.279655.1421851650684.JavaMail.zimbra@desy.de
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-21 17:10:09 -05:00
Peng Tao ee8a1a8b16 nfs: fix dio deadlock when O_DIRECT flag is flipped
We only support swap file calling nfs_direct_IO. However, application
might be able to get to nfs_direct_IO if it toggles O_DIRECT flag
during IO and it can deadlock because we grab inode->i_mutex in
nfs_file_direct_write(). So return 0 for such case. Then the generic
layer will fall back to buffer IO.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-21 17:10:08 -05:00
Jan Kara a39427007e xfs: Remove some pointless quota checks
xfs_fs_get_xstate() and xfs_fs_get_xstatev() check whether there's quota
running before calling xfs_qm_scall_getqstat() or
xfs_qm_scall_getqstatv(). Thus we are certain that superblock supports
quota and xfs_sb_version_hasquota() check is pointless. Similarly we
know that when quota is running, mp->m_quotainfo will be allocated.

Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-21 19:21:33 +01:00
Jan Kara fbf64b3df3 xfs: Remove some useless flags tests
'flags' have XFS_ALL_QUOTA_ACCT cleared immediately on function entry.
There's no point in checking these bits later in the function. Also
because we check something is going to change, we know some enforcement
bits are being added and thus there's no point in testing that later.

Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-21 19:21:32 +01:00
Jan Kara 8a2fdd4a49 xfs: Remove useless test
Q_XQUOTARM is never passed to xfs_fs_set_xstate() so remove the test.

Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-21 19:21:32 +01:00
Jan Kara ca6cb0918e quota: Verify flags passed to Q_SETINFO
Currently flags passed via Q_SETINFO were just stored. This makes it
hard to add new flags since in theory userspace could be just setting /
clearing random flags. Since currently there is only one userspace
settable flag and that is somewhat obscure flags only for ancient v1
quota format, I'm reasonably sure noone operates these flags and
hopefully we are fine just adding the check that passed flags are sane.
If we indeed find some userspace program that gets broken by the strict
check, we can always remove it again.

Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-21 19:21:31 +01:00
Jan Kara 9c45101e88 quota: Cleanup flags definitions
Currently all quota flags were defined just in kernel-private headers.
Export flags readable / writeable from userspace to userspace via
include/uapi/linux/quota.h.

Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-21 19:21:30 +01:00
Jan Kara 96827adcc2 ocfs2: Move OLQF_CLEAN flag out of generic quota flags
OLQF_CLEAN flag is used by OCFS2 on disk to recognize whether quota
recovery is needed or not. We also somewhat abuse mem_dqinfo->dqi_flags
to pass this flag around. Use private flags for this to avoid clashes
with other quota flags / not pollute generic quota flag namespace.

Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-21 19:21:30 +01:00
Jan Kara c119c5b974 quota: Don't store flags for v2 quota format
Currently, v2 quota format blindly stored flags from in-memory dqinfo on
disk, although there are no flags supported. Since it is stupid to store
flags which have no effect, just store 0 unconditionally and don't
bother loading it from disk.

Note that userspace could have stored some flags there via Q_SETINFO
quotactl and then later read them (although flags have no effect) but
I'm pretty sure noone does that (most definitely quota-tools don't and
quota interface doesn't have too much other users).

Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-21 19:21:07 +01:00
Ingo Molnar f49028292c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

  - Documentation updates.

  - Miscellaneous fixes.

  - Preemptible-RCU fixes, including fixing an old bug in the
    interaction of RCU priority boosting and CPU hotplug.

  - SRCU updates.

  - RCU CPU stall-warning updates.

  - RCU torture-test updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-21 06:12:21 +01:00
Qu Wenruo a53f4f8e9c btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Commit 6b5fe46dfa (btrfs: do commit in sync_fs if there are pending
changes) will call btrfs_start_transaction() in sync_fs(), to handle
some operations needed to be done in next transaction.

However this can cause deadlock if the filesystem is frozen, with the
following sys_r+w output:
[  143.255932] Call Trace:
[  143.255936]  [<ffffffff816c0e09>] schedule+0x29/0x70
[  143.255939]  [<ffffffff811cb7f3>] __sb_start_write+0xb3/0x100
[  143.255971]  [<ffffffffa040ec06>] start_transaction+0x2e6/0x5a0
[btrfs]
[  143.255992]  [<ffffffffa040f1eb>] btrfs_start_transaction+0x1b/0x20
[btrfs]
[  143.256003]  [<ffffffffa03dc0ba>] btrfs_sync_fs+0xca/0xd0 [btrfs]
[  143.256007]  [<ffffffff811f7be0>] sync_fs_one_sb+0x20/0x30
[  143.256011]  [<ffffffff811cbd01>] iterate_supers+0xe1/0xf0
[  143.256014]  [<ffffffff811f7d75>] sys_sync+0x55/0x90
[  143.256017]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
[  143.256111] Call Trace:
[  143.256114]  [<ffffffff816c0e09>] schedule+0x29/0x70
[  143.256119]  [<ffffffff816c3405>] rwsem_down_write_failed+0x1c5/0x2d0
[  143.256123]  [<ffffffff8133f013>] call_rwsem_down_write_failed+0x13/0x20
[  143.256131]  [<ffffffff811caae8>] thaw_super+0x28/0xc0
[  143.256135]  [<ffffffff811db3e5>] do_vfs_ioctl+0x3f5/0x540
[  143.256187]  [<ffffffff811db5c1>] SyS_ioctl+0x91/0xb0
[  143.256213]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17

The reason is like the following:
(Holding s_umount)
VFS sync_fs staff:
|- btrfs_sync_fs()
   |- btrfs_start_transaction()
      |- sb_start_intwrite()
      (Waiting thaw_fs to unfreeze)
					VFS thaw_fs staff:
					thaw_fs()
					(Waiting sync_fs to release
					 s_umount)

So deadlock happens.
This can be easily triggered by fstest/generic/068 with inode_cache
mount option.

The fix is to check if the fs is frozen, if the fs is frozen, just
return and waiting for the next transaction.

Cc: David Sterba <dsterba@suse.cz>
Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[enhanced comment, changed to SB_FREEZE_WRITE]
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-20 17:20:21 -08:00
Qu Wenruo 6c9fe14f9d btrfs: Fix the bug that fs_info->pending_changes is never cleared.
Fs_info->pending_changes is never cleared since the original code uses
cmpxchg(&fs_info->pending_changes, 0, 0), which will only clear it if
pending_changes is already 0.

This will cause a lot of problem when mount it with inode_cache mount
option.
If the btrfs is mounted as inode_cache, pending_changes will always be
1, even when the fs is frozen.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-20 17:19:40 -08:00
Christoph Hellwig df0ce26cb4 fs: remove default_backing_dev_info
Now that default_backing_dev_info is not used for writeback purposes we can
git rid of it easily:

 - instead of using it's name for tracing unregistered bdi we just use
   "unknown"
 - btrfs and ceph can just assign the default read ahead window themselves
   like several other filesystems already do.
 - we can assign noop_backing_dev_info as the default one in alloc_super.
   All filesystems already either assigned their own or
   noop_backing_dev_info.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20 14:05:38 -07:00
Christoph Hellwig 7b14a21389 nfs: don't call bdi_unregister
bdi_destroy already does all the work, and if we delay freeing the
anon bdev we can get away with just that single call.

Addintionally remove the call during mount failure, as
deactivate_super_locked will already call ->kill_sb and clean up
the bdi for us.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20 14:03:09 -07:00
Christoph Hellwig e4d2750909 ceph: remove call to bdi_unregister
bdi_destroy already does all the work, and if we delay freeing the
anon bdev we can get away with just that single call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20 14:03:07 -07:00
Christoph Hellwig b83ae6d421 fs: remove mapping->backing_dev_info
Now that we never use the backing_dev_info pointer in struct address_space
we can simply remove it and save 4 to 8 bytes in every inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20 14:03:05 -07:00
Christoph Hellwig de1414a654 fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info
Now that we got rid of the bdi abuse on character devices we can always use
sb->s_bdi to get at the backing_dev_info for a file, except for the block
device special case.  Export inode_to_bdi and replace uses of
mapping->backing_dev_info with it to prepare for the removal of
mapping->backing_dev_info.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20 14:03:04 -07:00
Christoph Hellwig 26ff13047e nilfs2: set up s_bdi like the generic mount_bdev code
mapping->backing_dev_info will go away, so don't rely on it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20 14:03:02 -07:00
Christoph Hellwig 495a276e1c block_dev: get bdev inode bdi directly from the block device
Directly grab the backing_dev_info from the request_queue instead of
detouring through the address_space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20 14:03:01 -07:00
Christoph Hellwig 564f00f6c0 block_dev: only write bdev inode on close
Since 018a17bdc8 ("bdi: reimplement bdev_inode_switch_bdi()") the
block device code writes out all dirty data whenever switching the
backing_dev_info for a block device inode.  But a block device inode can
only be dirtied when it is in use, which means we only have to write it
out on the final blkdev_put, but not when doing a blkdev_get.

Factoring out the write out from the bdi list switch prepares from
removing the list switch later in the series.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20 14:02:59 -07:00
Christoph Hellwig b4caecd480 fs: introduce f_op->mmap_capabilities for nommu mmap support
Since "BDI: Provide backing device capability information [try #3]" the
backing_dev_info structure also provides flags for the kind of mmap
operation available in a nommu environment, which is entirely unrelated
to it's original purpose.

Introduce a new nommu-only file operation to provide this information to
the nommu mmap code instead.  Splitting this from the backing_dev_info
structure allows to remove lots of backing_dev_info instance that aren't
otherwise needed, and entirely gets rid of the concept of providing a
backing_dev_info for a character device.  It also removes the need for
the mtd_inodefs filesystem.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20 14:02:58 -07:00
Christoph Hellwig a7a2c680a2 fs: deduplicate noop_backing_dev_info
hugetlbfs, kernfs and dlmfs can simply use noop_backing_dev_info instead
of creating a local duplicate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20 14:02:54 -07:00
Sachin Prabhu ca7df8e0bb Complete oplock break jobs before closing file handle
Commit
c11f1df500
requires writers to wait for any pending oplock break handler to
complete before proceeding to write. This is done by waiting on bit
CIFS_INODE_PENDING_OPLOCK_BREAK in cifsFileInfo->flags. This bit is
cleared by the oplock break handler job queued on the workqueue once it
has completed handling the oplock break allowing writers to proceed with
writing to the file.

While testing, it was noticed that the filehandle could be closed while
there is a pending oplock break which results in the oplock break
handler on the cifsiod workqueue being cancelled before it has had a
chance to execute and clear the CIFS_INODE_PENDING_OPLOCK_BREAK bit.
Any subsequent attempt to write to this file hangs waiting for the
CIFS_INODE_PENDING_OPLOCK_BREAK bit to be cleared.

We fix this by ensuring that we also clear the bit
CIFS_INODE_PENDING_OPLOCK_BREAK when we remove the oplock break handler
from the workqueue.

The bug was found by Red Hat QA while testing using ltp's fsstress
command.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <steve.french@primarydata.com>
2015-01-19 20:20:46 -06:00
Giel van Schijndel f99dbfa4b3 cifs: use memzero_explicit to clear stack buffer
When leaving a function use memzero_explicit instead of memset(0) to
clear stack allocated buffers. memset(0) may be optimized away.

This particular buffer is highly likely to contain sensitive data which
we shouldn't leak (it's named 'passwd' after all).

Signed-off-by: Giel van Schijndel <me@mortis.eu>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Reported-at: http://www.viva64.com/en/b/0299/
Reported-by: Andrey Karpov
Reported-by: Svyatoslav Razmyslov
Signed-off-by: Steve French <steve.french@primarydata.com>
2015-01-19 15:32:13 -06:00
Satoru Takeuchi 6e1103a6e9 btrfs: fix state->private cast on 32 bit machines
Suppress the following warning displayed on building 32bit (i686) kernel.

===============================================================================
...
   CC [M]  fs/btrfs/extent_io.o
fs/btrfs/extent_io.c: In function ‘btrfs_free_io_failure_record’:
fs/btrfs/extent_io.c:2193:13: warning: cast to pointer from integer of
different size [-Wint-to-pointer-cast]
    failrec = (struct io_failure_record *)state->private;
...
===============================================================================

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Reported-by: Chris Murphy <chris@colorremedies.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-19 13:06:06 -08:00
Filipe Manana 75c68e9fbb Btrfs: fix race deleting block group from space_info->ro_bgs list
When removing a block group we were deleting it from its space_info's
ro_bgs list without the correct protection - the space info's spinlock.
Fix this by doing the list delete while holding the spinlock of the
corresponding space info, which is the correct lock for any operation
on that list.

This issue was introduced in the 3.19 kernel by the following change:

    Btrfs: move read only block groups onto their own list V2
    commit 633c0aad4c

I ran into a kernel crash while a task was running statfs, which iterates
the space_info->ro_bgs list while holding the space info's spinlock,
and another task was deleting it from the same list, without holding that
spinlock, as part of the block group remove operation (while running the
function btrfs_remove_block_group). This happened often when running the
stress test xfstests/generic/038 I recently made.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-19 13:05:45 -08:00
Tsutomu Itoh 379d6854a2 Btrfs: fix incorrect freeing in scrub_stripe
The address that should be freed is not 'ppath' but 'path'.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Reviewed-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-19 13:05:44 -08:00
David Sterba 98bd5c547e btrfs: sync ioctl, handle errors after transaction start
The version merged to 3.19 did not handle errors from start_trancaction
and could pass an invalid pointer to commit_transaction.

Fixes: 6b5fe46dfa ("btrfs: do commit in sync_fs if there are pending changes")
Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-19 13:05:44 -08:00
Al Viro 378ff1a53b fix deadlock in cifs_ioctl_clone()
It really needs to check that src is non-directory *and* use
{un,}lock_two_nodirectories().  As it is, it's trivial to cause
double-lock (ioctl(fd, CIFS_IOC_COPYCHUNK_FILE, fd)) and if the
last argument is an fd of directory, we are asking for trouble
by violating the locking order - all directories go before all
non-directories.  If the last argument is an fd of parent
directory, it has 50% odds of locking child before parent,
which will cause AB-BA deadlock if we race with unlink().

Cc: stable@vger.kernel.org @ 3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-18 23:49:26 -05:00
Johannes Berg 053c095a82 netlink: make nlmsg_end() and genlmsg_end() void
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

  if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

  return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

  if (my_function(...))
    /* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

-	return nlmsg_end(...);
+	nlmsg_end(...);
+	return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with <= 0 in dump functionality, but that could just
be changed to < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for <0 or <=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 01:03:45 -05:00
alex chen a6b8978c54 pstore: Fix sprintf format specifier in pstore_dump()
We should use sprintf format specifier "%u" instead of "%d" for
argument of type 'unsigned int' in pstore_dump().

Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2015-01-16 16:01:29 -08:00
Mark Salyzyn 9d5438f462 pstore: Add pmsg - user-space accessible pstore object
A secured user-space accessible pstore object. Writes
to /dev/pmsg0 are appended to the buffer, on reboot
the persistent contents are available in
/sys/fs/pstore/pmsg-ramoops-[ID].

One possible use is syslogd, or other daemon, can
write messages, then on reboot provides a means to
triage user-space activities leading up to a panic
as a companion to the pstore dmesg or console logs.

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2015-01-16 16:01:10 -08:00
Mark Salyzyn f44f96528a pstore: Handle zero-sized prz in series
ramoops_pstore_read fails to return the next in a prz
series after first zero-sized entry, not venturing to
the next non-zero entry.

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2015-01-16 15:47:23 -08:00
Mark Salyzyn ff6bf6e802 pstore: Remove superfluous memory size check
All previous checks will fail with error if memory size
is not sufficient to register a zone, so this legacy
check has become redundant.

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2015-01-16 15:45:46 -08:00
Mark Salyzyn dbaffde764 pstore: Use scnprintf() in pstore_mkfile()
No guarantees that the names will not exceed the
name buffer with future adjustments.

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2015-01-16 15:30:50 -08:00
Jeff Layton 3d8e560de4 locks: consolidate NULL i_flctx checks in locks_remove_file
We have each of the locks_remove_* variants doing this individually.
Have the caller do it instead, and have locks_remove_flock and
locks_remove_lease just assume that it's a valid pointer.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2015-01-16 16:08:50 -05:00
Jeff Layton 9bd0f45b70 locks: keep a count of locks on the flctx lists
This makes things a bit more efficient in the cifs and ceph lock
pushing code.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 16:08:50 -05:00
Jeff Layton 7448cc37b1 locks: clean up the lm_change prototype
Now that we use standard list_heads for tracking leases, we can have
lm_change take a pointer to the lease to be modified instead of a
double pointer.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 16:08:50 -05:00
Jeff Layton 6109c85037 locks: add a dedicated spinlock to protect i_flctx lists
We can now add a dedicated spinlock without expanding struct inode.
Change to using that to protect the various i_flctx lists.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 16:08:49 -05:00
Jeff Layton 8634b51f6c locks: convert lease handling to file_lock_context
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 16:08:17 -05:00
Jeff Layton bd61e0a9c8 locks: convert posix locks to file_lock_context
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 16:08:16 -05:00
Jeff Layton 5263e31e45 locks: move flock locks to file_lock_context
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 15:09:25 -05:00
Jeff Layton c362781cad ceph: move spinlocking into ceph_encode_locks_to_buffer and ceph_count_locks
There is only a single call site for each of these functions, and the
caller takes the i_lock prior to calling them and drops it just
afterward. Move the spinlocking into the functions instead.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 15:09:25 -05:00
Jeff Layton 4a075e39c8 locks: add a new struct file_locking_context pointer to struct inode
The current scheme of using the i_flock list is really difficult to
manage. There is also a legitimate desire for a per-inode spinlock to
manage these lists that isn't the i_lock.

Start conversion to a new scheme to eventually replace the old i_flock
list with a new "file_lock_context" object.

We start by adding a new i_flctx to struct inode. For now, it lives in
parallel with i_flock list, but will eventually replace it. The idea is
to allocate a structure to sit in that pointer and act as a locus for
all things file locking.

We allocate a file_lock_context for an inode when the first lock is
added to it, and it's only freed when the inode is freed. We use the
i_lock to protect the assignment, but afterward it should mostly be
accessed locklessly.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 15:05:54 -05:00
Jeff Layton dd459bb197 locks: have locks_release_file use flock_lock_file to release generic flock locks
...instead of open-coding it and removing flock locks directly. This
helps consolidate the flock lock removal logic into a single spot.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2015-01-16 15:05:54 -05:00
Jeff Layton 6dee60f69d locks: add new struct list_head to struct file_lock
...that we can use to queue file_locks to per-ctx list_heads. Go ahead
and convert locks_delete_lock and locks_dispose_list to use it instead
of the fl_block list.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 15:05:54 -05:00
Linus Torvalds 62b1530065 Driver core fixes for 3.19-rc5
Here is one kernfs fix for a reported issue for 3.19-rc5.
 
 It has been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlS5V8QACgkQMUfUDdst+ynQTgCdEOUn6oftKCkErl4WWX9q0+ZT
 4CIAoLuGH9Gdn5tIVlqJ1tVmESnsgn0T
 =P84S
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fix from Greg KH:
 "Here is one kernfs fix for a reported issue for 3.19-rc5.

  It has been in linux-next for a while"

* tag 'driver-core-3.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: Fix kernfs_name_compare
2015-01-17 08:16:52 +13:00
Linus Torvalds a2a32cd1d7 NFS client bugfixes for Linux 3.19
Highlights include:
 
 - Stable fix for a NFSv3/lockd race
 - Fixes for several NFSv4.1 client id trunking bugs
 - Remove an incorrect test when checking for delegated opens
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUuSAcAAoJEGcL54qWCgDyVGkQAJTTzJ8TcuDaXPz7a+8vYR/F
 0Z0iJ7Q2VzfVwLKReQdSMUhX8E6OsVe/HlmFT1paNdIOnF7gQvTclP7/GZ/EaDtA
 csBKkEcxoCRbF6OhCdeSMJC+iXs0cugCgYrkusPSGJj8dG+qa6Hix8rDfZKbM4yp
 sQphMozeI/FLHlckBTsViBCECcyR1FxjOrpXc1RPTz1u2dw9dDgZABdX00ubve2Y
 5dGIGyriFUiZzqFp+7GLXa2GITuLnda+IPPywyzF2MQy4XYWH63bXoWbyXL3kzQ7
 P7hLsGMOfl+pFOBnwYyXhhfqdsS/lOt8V+18Rrr1NhFx7VtQUae/VDPwLxnnoq0P
 ZtYiuJmIihzHfr4N5V2QR9TcuNO7fg6/1hxHZkOYxaYkH2IezFRWbtj9J/UJS+8q
 XDQzEvXMxtXCHKDQvmW653oQIdCFRR9YvgsN/YaxIB3vWPKL2tupeQWe/ZEdVvX3
 WemIxyc6NBS86otauVXOIKZFgiJ4mynpgH22xb43CJ33b049vJVu/NOQWHgrhHqQ
 7vpNDvpjyErjtoMVdN/x2L21uVTucZDXF59hfUD/Q9kE5qXGDRlqiFsDbkVvd+0C
 SXKBngtslBvgU3dYblAuJ9uMkTYkSE/vNkzVibPXJr4/cysVyMeeRDysVPHRZpw2
 8jH+yyHBxvK8XSDTxIkh
 =hz4r
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.19-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - Stable fix for a NFSv3/lockd race
   - Fixes for several NFSv4.1 client id trunking bugs
   - Remove an incorrect test when checking for delegated opens"

* tag 'nfs-for-3.19-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Remove incorrect check in can_open_delegated()
  NFS: Ignore transport protocol when detecting server trunking
  NFSv4/v4.1: Verify the client owner id during trunking detection
  NFSv4: Cache the NFSv4/v4.1 client owner_id in the struct nfs_client
  NFSv4.1: Fix client id trunking on Linux
  LOCKD: Fix a race when initialising nlmsvc_timeout
2015-01-17 07:59:06 +13:00
Linus Torvalds cb59670870 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
 "This fixes a regression in the latest fuse update plus a fix for a
  rather theoretical memory ordering issue"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: add memory barrier to INIT
  fuse: fix LOOKUP vs INIT compat handling
2015-01-16 14:58:16 +13:00
Rickard Strandqvist 917937025a nfsd: nfs4state: Remove unused function
Remove the function renew_client() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 15:01:42 -05:00
Rickard Strandqvist 4cb7208a41 lockd: xdr: Remove unused function
Remove the function nlm_encode_fh() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 13:46:27 -05:00
Matthew Wilcox dd22f551ac block: Change direct_access calling convention
In order to support accesses to larger chunks of memory, pass in a
'size' parameter (counted in bytes), and return the amount available at
that address.

Add a new helper function, bdev_direct_access(), to handle common
functionality including partition handling, checking the length requested
is positive, checking for the sector being page-aligned, and checking
the length of the request does not pass the end of the partition.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-13 21:58:11 -07:00
NeilBrown 52d304eb4e locks: fix NULL-deref in generic_delete_lease
commit 0efaa7e82f
  locks: generic_delete_lease doesn't need a file_lock at all

moves the call to fl->fl_lmops->lm_change() to a place in the
code where fl might be a non-lease lock.
When that happens, fl_lmops is NULL and an Oops ensures.

So add an extra test to restore correct functioning.

Reported-by: Linda Walsh <suse@tlinx.org>
Link: https://bugzilla.suse.com/show_bug.cgi?id=912569
Cc: stable@vger.kernel.org (v3.18)
Fixes: 0efaa7e82f
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2015-01-13 07:00:55 -05:00
alex chen 3566c96476 GFS2: fix sprintf format specifier
Sprintf format specifier "%d" and "%u" are mixed up in
gfs2_recovery_done() and freeze_show(). So correct them.

Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2015-01-13 10:48:57 +00:00
Rickard Strandqvist e4999bbe08 jffs2: compr_rubin: Remove unused function
Remove the function pulledbits() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2015-01-12 20:40:03 -08:00
Fabian Frederick bbe48dd811 udf: destroy sbi mutex in put_super
Call mutex_destroy() on superblock mutex in udf_put_super()
otherwise mutex debugging code isn't able to detect that
mutex is used after being freed.
(thanks to Jan Kara for complete definition).

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-12 10:56:04 +01:00
Matt Fleming 17473c32ce Updates included:
*Rename the function efi_guid_unparse to reflect better its behavior. - Borislav Petkov
 
  *Update the EFI firmware Kconfig help with the URL of efibootmgr. - Peter Jones
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUrfd/AAoJECYUxVC6df4Sd7kP/RP4uzJQnQTteg2ys2Zk2sEe
 rWdjxtcQurnP5P07SqNV/dZIWQx6hG/l61herFzryq9JKNye70fCd8bGl4sOPiHW
 i64FJtmCF+3vgLhJK2s6Ysub/xPdC0zkRXIIHvulX1nwZRWXiPzkJPzFlkCbXaW1
 O3ykSBIT4EwXAe84dN1Oio476Zyf/rS81rKakYXKwYNrfSKTmyH2FUlZODaSL+0a
 oGpae/2kps6rqdlOq19J5VKcYE3CTzy+Pv3QhqVlgywWF0AIUOHrcWpsWTv4IhKr
 v8blQcy7QZ/QWOEg7Nnus48/Mh0sNdAy497ZV4/tgmBfkGUcHD5qX5/Fy6Yp+/0R
 YJfRv5fTDyxYp6krdEeUnOphaXZT2jkG8uRdLJxNBONWBkwf/i9QXcmwuJc0+sgF
 7NKP8/kbqc3A6blQ/RNQRdeLW0hAAXfrVHj/zW8Fbi0PJTn+ALTTyvjDhMwYOr46
 IVbm1CzGQLXGNDFWvxRlMbzQa5PktiiqOxymTNJ6HZwP8TnS5IE5f/iEVpj5Yd+z
 zbYQcpBiN8VnQsKBmq04865/OrNgQRcUC+OU6Mz1iIIEIF8NLBDoRfE/oX0I+dEC
 jc0wTszmr0uXAu6dP4tgQEaQ0oNdOE806+BZ++3bpaZ3GCScjYtDOnSdsFB5ByYp
 dsAqvfPW8uY+I27SXMat
 =aLDw
 -----END PGP SIGNATURE-----

Merge tag 'rneri-efi-next' of https://github.com/ricardon/efi into next

Pull EFI changes for v3.20 from Ricardo Neri, who kindly picked them up
while I was out on annual leave in December.

Updates included:

 *Rename the function efi_guid_unparse to reflect better its behavior. - Borislav Petkov

 *Update the EFI firmware Kconfig help with the URL of efibootmgr. - Peter Jones

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-01-12 09:33:25 +00:00
Linus Torvalds 5ab551d662 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Misc fixes: group scheduling corner case fix, two deadline scheduler
  fixes, effective_load() overflow fix, nested sleep fix, 6144 CPUs
  system fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix RCU stall upon -ENOMEM in sched_create_group()
  sched/deadline: Avoid double-accounting in case of missed deadlines
  sched/deadline: Fix migration of SCHED_DEADLINE tasks
  sched: Fix odd values in effective_load() calculations
  sched, fanotify: Deal with nested sleeps
  sched: Fix KMALLOC_MAX_SIZE overflow during cpumask allocation
2015-01-11 11:51:49 -08:00
Linus Torvalds dc9319f5a3 Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux
Pull two nfsd bugfixes from Bruce Fields.

* 'for-3.19' of git://linux-nfs.org/~bfields/linux:
  rpc: fix xdr_truncate_encode to handle buffer ending on page boundary
  nfsd: fix fi_delegees leak when fi_had_conflict returns true
2015-01-09 18:10:48 -08:00
Linus Torvalds 20ebb34528 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull two Ceph fixes from Sage Weil:
 "These are both pretty trivial: a sparse warning fix and size_t printk
  thing"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: fix sparse endianness warnings
  ceph: use %zu for len in ceph_fill_inline_data()
2015-01-09 17:55:00 -08:00
Linus Torvalds 03c751a5e1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "None of these are huge, but my commit does fix a regression from 3.18
  that could cause lost files during log replay.

  This also adds Dave Sterba to the list of Btrfs maintainers.  It
  doesn't mean we're doing things differently, but Dave has really been
  helping with the maintainer workload for years"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: don't delay inode ref updates during log replay
  Btrfs: correctly get tree level in tree_backref_for_extent
  Btrfs: call inode_dec_link_count() on mkdir error path
  Btrfs: abort transaction if we don't find the block group
  Btrfs, scrub: uninitialized variable in scrub_extent_for_parity()
  Btrfs: add more maintainers
2015-01-09 17:46:07 -08:00
kbuild test robot 08e4126e1e f2fs: pids_lock can be static
fs/f2fs/trace.c:19:12: sparse: symbol 'pids_lock' was not declared. Should it be static?

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:29 -08:00
Jaegeuk Kim 351f4fba84 f2fs: add f2fs_destroy_trace_ios to free radix tree
This patch removes radix tree after finishing tracing IOs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:28 -08:00
Jaegeuk Kim c05086506f f2fs: add spin_lock to cover radix operations in IO tracer
This patch adds spin_lock to cover radix tree operations in IO tracer.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:28 -08:00
Jaegeuk Kim dd4e4b59b1 f2fs: add nat/sit entries into status
This patch adds NAT/SIT entry informations.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:28 -08:00
Jaegeuk Kim 7aed0d4537 f2fs: free radix_tree_nodes used by nat_set entries
In the normal case, the radix_tree_nodes are freed successfully.
But, when cp_error was detected, we should destroy them forcefully.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:28 -08:00
Jaegeuk Kim df1991391f f2fs: fix wrong unlock_page call
This patch removes wrongly called unlock_page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:27 -08:00
Chao Yu 9e5ba77fdb f2fs: get rid of kzalloc in __recover_inline_status
We use kzalloc to allocate memory in __recover_inline_status, and use this
all-zero memory to check the inline date content of inode page by comparing
them. This is low effective and not needed, let's check inline date content
directly.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: make the code more neat]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:27 -08:00
Jaegeuk Kim 38aa0889b2 f2fs: align direct_io'ed data to section
This patch aligns the start block address of a file for direct io to the f2fs's
section size.

Some flash devices manage an over 4KB-sized page as a write unit, and if the
direct_io'ed data are written but not aligned to that unit, the performance can
be degraded due to the partial page copies.

Thus, since f2fs has a section that is well aligned to FTL units, we can align
the block address to the section size so that f2fs avoids this misalignment.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:27 -08:00
Jaegeuk Kim 41ef94b35c f2fs: remove uncovered code path
This patch removes unnecessary function calls.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:27 -08:00
Jaegeuk Kim 3547ea961d f2fs: avoid potential unnecessary codes
This patch relocates some operations to avoid unnecessary execution.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:26 -08:00
Jaegeuk Kim e1509cf294 f2fs: clean up to remove parameter
This patch uses dn->data_blkaddr as a parameter for the destination block
address.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:26 -08:00
Chao Yu 062920734c f2fs: reuse inode_entry_slab in gc procedure for using slab more effectively
There are two slab cache inode_entry_slab and winode_slab using the same
structure as below:

struct dir_inode_entry {
	struct list_head list;	/* list head */
	struct inode *inode;	/* vfs inode pointer */
};

struct inode_entry {
	struct list_head list;
	struct inode *inode;
};

It's a little waste that the two cache can not share their memory space for each
other.
So in this patch we remove one redundant winode_slab slab cache, then use more
universal name struct inode_entry as remaining data structure name of slab,
finally we reuse the inode_entry_slab to store dirty dir item and gc item for
more effective.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:26 -08:00
Chao Yu 2ace38e00e f2fs: cleanup parameters for trace_f2fs_submit_{read_,write_,page_,page_m}bio with fio
Cleanup parameters for trace_f2fs_submit_{read_,write_,page_,page_m}bio with fio
as one parameter.

Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:26 -08:00
Chao Yu 3e1c8f125e f2fs: cleanup trace event of f2fs_submit_page_{m,}bio with DECLARE_EVENT_CLASS
This patch adds missing parameter _type_ for trace_f2fs_submit_page_bio, then
use DECLARE_EVENT_CLASS/DEFINE_EVENT_CONDITION pair to cleanup some trace event
code related to f2fs_submit_page_{m,}bio.

Additionally, after we remove redundant code, size of code can be reduced:
   text    data     bss     dec     hex filename
 176787    8712      56  185555   2d4d3 f2fs.ko.org
 174408    8648      56  183112   2cb48 f2fs.ko

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:25 -08:00
Jaegeuk Kim 09eb483e89 f2fs: fix missing cold bit during recovery
In do_recover_data, we find and update previous node pages after updating
its new block addresses.
After then, we call fill_node_footer without reset field, we erase its
cold bit so that this new cold node block is written to wrong log area.
This patch fixes not to miss its old flag.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:25 -08:00
Changman Lee b9a2c25207 f2fs: add block count by in-place-update in stat info
This patch adds block count by in-place-update in stat.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:25 -08:00
Jaegeuk Kim dd802406e3 f2fs: avoid double lock for cp_rwsem
The __f2fs_add_link is covered by cp_rwsem all the time.
This calls init_inode_metadata, which conducts some acl operations including
memory allocation with GFP_KERNEL previously.
But, under memory pressure, f2fs_write_data_page can be called, which also
grabs cp_rwsem too.

In this case, this incurs a deadlock pointed by Chao.
Thread #1        Thread #2
 down_read
                 down_write
  down_read
 -> here down_read should wait forever.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:25 -08:00
Jaegeuk Kim db9f7c1a95 f2fs: activate f2fs_trace_ios
This patch activates f2fs_trace_ios.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:24 -08:00
Jaegeuk Kim 9e4ded3f30 f2fs: activate f2fs_trace_pid
This patch activates f2fs_trace_pid.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:24 -08:00
Jaegeuk Kim 0e689d0369 f2fs: add key functions for f2fs_io_tracer
This patch adds two key functions to trace process ids and IOs.
The basic idea is to
1. remain process ids, pids, in page->private.
2. show pids in IO traces.

So, later we can retrieve process information according to IO traces.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:24 -08:00
Jaegeuk Kim 63f92ddc8a f2fs: add f2fs_io_tracer support
This patch adds:
 o initial trace.c and trace.h with skeleton functions
 o Kconfig and Makefile to activate this feature

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:24 -08:00
Jaegeuk Kim cf04e8eb55 f2fs: use f2fs_io_info to clean up messy parameters during IO path
This patch cleans up parameters on IO paths.
The key idea is to use f2fs_io_info adding a parameter, block address, and then
use this structure as parameters.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:23 -08:00
Chao Yu 9ecf4b80bd f2fs: use ra_meta_pages to simplify readahead code in restore_node_summary
Use more common function ra_meta_pages() with META_POR to readahead node blocks
in restore_node_summary() instead of ra_sum_pages(), hence we can simplify the
readahead code there, and also we can remove unused function ra_sum_pages().

changes from v2:
 o use invalidate_mapping_pages as before suggested by Changman Lee.
changes from v1:
 o fix one bug when using truncate_inode_pages_range which is pointed out by
   Jaegeuk Kim.

Reviewed-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:23 -08:00
Chao Yu 5c27f4ee44 f2fs: merge two uchar variable in struct node_info to reduce memory cost
This patch moves one member of struct nat_entry: _flag_ to struct node_info,
so _version_ in struct node_info and _flag_ which are unsigned char type will
merge to one 32-bit space in register/memory. So the size of nat_entry will be
reduced from 28 bytes to 24 bytes (for 64-bit machine, reduce its size from 40
bytes to 32 bytes) and then slab memory using by f2fs will be reduced.

changes from v2:
 o update description of memory usage gain for 64-bit machine suggested by
   Changman Lee.
changes from v1:
 o introduce inline copy_node_info() to copy valid data from node info suggested
   by Jaegeuk Kim, it can avoid bug.

Reviewed-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:23 -08:00
Chao Yu 3fa06d7bc9 f2fs: readahead contiguous current summary blocks in checkpoint
Let's add readahead code for reading contiguous compact/normal summary blocks
in checkpoint, then we will gain better performance in mount procedure.

Changes from v1
  o remove inappropriate 'unlikely' in npages_for_summary_flush.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:23 -08:00
Jaegeuk Kim 5df1f1da7a f2fs: use missing the use of f2fs_kunmap_page
This patch calls f2fs_kunmap_page which I missed before.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:22 -08:00
Jaegeuk Kim 042b7816aa f2fs: remove unnecessary call to invalidate inmemory pages
Now we use inmemory pages for atomic write only and provide abort procedure,
we don't need to truncate them explicitly.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:22 -08:00
Jaegeuk Kim d7bc2484b8 f2fs: fix small discards not to issue redundantly
The ckpt_valid_map and cur_valid_map are synced by seg_info_to_raw_sit.

In the case of small discards, the candidates are selected before sync,
while fitrim selects candidates after sync.

So, for small discards, we need to add candidates only just being obsoleted.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:22 -08:00
Jaegeuk Kim 1e84371ffe f2fs: change atomic and volatile write policies
This patch adds two new ioctls to release inmemory pages grabbed by atomic
writes.
 o f2fs_ioc_abort_volatile_write
  - If transaction was failed, all the grabbed pages and data should be written.
 o f2fs_ioc_release_volatile_write
  - This is to enhance the performance of PERSIST mode in sqlite.

In order to avoid huge memory consumption which causes OOM, this patch changes
volatile writes to use normal dirty pages, instead blocked flushing to the disk
as long as system does not suffer from memory pressure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:22 -08:00
Jaegeuk Kim 70c640b1d6 f2fs: don't need to call lock_op and lock_page for abort
We don't need to call lock_op and lock_page at the aborting path.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:21 -08:00
Jaegeuk Kim 88a70a69c0 f2fs: fix wrong condition check to trigger f2fs_sync_fs
If there is not enough available memory, we need to trigger f2fs_sync_fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:21 -08:00
Jaegeuk Kim cd52b6368f f2fs: remove checking dirty_exceed
We don't need to force to write dirty_exceeded for f2fs_balance_fs_bg.
This flag was only meaningful to write bypassing conditions.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:21 -08:00
Rasmus Villemoes 72392ed0eb kernfs: Fix kernfs_name_compare
Returning a difference from a comparison functions is usually wrong
(see acbbe6fbb2 "kcmp: fix standard comparison bug" for the long
story). Here there is the additional twist that if the void pointers
ns and kn->ns happen to differ by a multiple of 2^32,
kernfs_name_compare returns 0, falsely reporting a match to the
caller.

Technically 'hash - kn->hash' is ok since the hashes are restricted to
31 bits, but it's better to avoid that subtlety.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-09 15:51:08 -08:00
Bob Peterson 8f6cb409f0 GFS2: Eliminate __gfs2_glock_remove_from_lru
Since the only caller of function __gfs2_glock_remove_from_lru locks the
same spin_lock as gfs2_glock_remove_from_lru, the functions can be combined.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2015-01-09 11:33:13 +00:00
Peter Zijlstra 536ebe9ca9 sched, fanotify: Deal with nested sleeps
As per e23738a730 ("sched, inotify: Deal with nested sleeps").

fanotify_read is a wait loop with sleeps in. Wait loops rely on
task_struct::state and sleeps do too, since that's the only means of
actually sleeping. Therefore the nested sleeps destroy the wait loop
state and the wait loop breaks the sleep functions that assume
TASK_RUNNING (mutex_lock).

Fix this by using the new woken_wake_function and wait_woken() stuff,
which registers wakeups in wait and thereby allows shrinking the
task_state::state changes to the actual sleep part.

Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Paris <eparis@redhat.com>
Link: http://lkml.kernel.org/r/20141216152838.GZ3337@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:18:12 +01:00
Dave Chinner 6bcf0939ff Merge branch 'xfs-misc-fixes-for-3.20-2' into for-next 2015-01-09 11:06:17 +11:00
Nicholas Mc Guire 43fd1fce96 xfs: fix implicit bool to int conversion
try_wait_for_completion returns bool so the wrapper function
xfs_dqflock_nowait should probably also return bool and not int.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-09 10:48:58 +11:00
Christoph Hellwig d32057fc84 xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten
The code is already ready for it, and the pnfs layout commit code expects
to be able to pass a larger than 32-bit argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-09 10:48:12 +11:00
Dave Chinner 64af7a6ea5 xfs: remove deprecated sysctls
xfsbufd_centisecs and age_buffer_centisecs were due for removal in
3.14. We forgot to do that - it's now well past time to remove these
deprecated, unused sysctls.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-09 10:47:43 +11:00
Dave Chinner aa5d95c1b5 xfs: move xfs_bmap_finish prototype
This function is used libxfs code, but is implemented separately in
userspace. Move the function prototype to xfs_bmap.h so that the
prototype is shared even if the implementations aren't.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-09 10:47:14 +11:00
Dave Chinner 9799b438ce xfs: move struct xfs_bmalloca to libxfs
It no long is used for stack splits, so strip the kernel workqueue
bits from it and push it back into libxfs/xfs_bmap.h so that
it can be shared with the userspace code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-09 10:46:49 +11:00
Dave Chinner 5ebdc213ac xfs: move xfs_types.h to libxfs
The types used by the core XFS code are common between kernel and
userspace. xfs_types.h is duplicated in both kernel and userspace,
so move it to libxfs along with all the other shared code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-09 10:46:31 +11:00
Dave Chinner 2155355fda xfs: move xfs_fs.h to libxfs
Ioctl API definitions are shared with userspace, so move the header
file that defines them all to libxfs along with all the other code
shared with userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-09 10:45:13 +11:00
David Drysdale 75069f2b5b vfs: renumber FMODE_NONOTIFY and add to uniqueness check
Fix clashing values for O_PATH and FMODE_NONOTIFY on sparc.  The
clashing O_PATH value was added in commit 5229645bdc ("vfs: add
nonconflicting values for O_PATH") but this can't be changed as it is
user-visible.

FMODE_NONOTIFY is only used internally in the kernel, but it is in the
same numbering space as the other O_* flags, as indicated by the comment
at the top of include/uapi/asm-generic/fcntl.h (and its use in
fs/notify/fanotify/fanotify_user.c).  So renumber it to avoid the clash.

All of this has happened before (commit 12ed2e36c98a: "fanotify:
FMODE_NONOTIFY and __O_SYNC in sparc conflict"), and all of this will
happen again -- so update the uniqueness check in fcntl_init() to
include __FMODE_NONOTIFY.

Signed-off-by: David Drysdale <drysdale@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:52 -08:00
Xue jiufei 53dc20b9a3 ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file
In ocfs2_link(), the parent directory inode passed to function
ocfs2_lookup_ino_from_name() is wrong.  Parameter dir is the parent of
new_dentry not old_dentry.  We should get old_dir from old_dentry and
lookup old_dentry in old_dir in case another node remove the old dentry.

With this change, hard linking works again, when paths are relative with
at least one subdirectory.  This is how the problem was reproducable:

  # mkdir a
  # mkdir b
  # touch a/test
  # ln a/test b/test
  ln: failed to create hard link `b/test' => `a/test': No such file or  directory

However when creating links in the same dir, it worked well.

Now the link gets created.

Fixes: 0e048316ff ("ocfs2: check existence of old dentry in ocfs2_link()")
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reported-by: Szabo Aron - UBIT <aron@ubit.hu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Tested-by: Aron Szabo <aron@ubit.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:51 -08:00
Joseph Qi eb4f73b4ca ocfs2: remove bogus check in dlm_process_recovery_data
In dlm_process_recovery_data, only when dlm_new_lock failed the ret will
be set to -ENOMEM.  And in this case, newlock is definitely NULL.  So
test newlock is meaningless, remove it.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:51 -08:00
Ilya Dryomov 0668ff52e2 ceph: use %zu for len in ceph_fill_inline_data()
len is size_t, should be printed with %zu.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-01-08 20:36:56 +03:00
Borislav Petkov 26e022727f efi: Rename efi_guid_unparse to efi_guid_to_str
Call it what it does - "unparse" is plain-misleading.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
2015-01-07 19:07:44 -08:00
J. Bruce Fields 0ec016e3e0 nfsd4: tweak rd_dircount accounting
RFC 3530 14.2.24 says

	This value represents the length of the names of the directory
	entries and the cookie value for these entries.  This length
	represents the XDR encoding of the data (names and cookies)...

The "xdr encoding" of the name should probably include the 4 bytes for
the length.

But this is all just a hint so not worth e.g. backporting to stable.

Also reshuffle some lines to more clearly group together the
dircount-related code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-07 14:48:10 -05:00
Jeff Layton 67db103448 nfsd: fi_delegees doesn't need to be an atomic_t
fi_delegees is always handled under the fi_lock, so there's no need to
use an atomic_t for this field.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-07 14:05:35 -05:00
Jeff Layton 94ae1db226 nfsd: fix fi_delegees leak when fi_had_conflict returns true
Currently, nfs4_set_delegation takes a reference to an existing
delegation and then checks to see if there is a conflict. If there is
one, then it doesn't release that reference.

Change the code to take the reference after the check and only if there
is no conflict.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-07 13:38:21 -05:00
Jan Kara 23b133bdc4 udf: Check length of extended attributes and allocation descriptors
Check length of extended attributes and allocation descriptors when
loading inodes from disk. Otherwise corrupted filesystems could confuse
the code and make the kernel oops.

Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-07 13:51:58 +01:00
Jan Kara 7914495427 udf: Remove repeated loads blocksize
Store blocksize in a local variable in udf_fill_inode() since it is used
a lot of times.

Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-07 13:46:16 +01:00
Oscar Forner Martinez e4a93be6cb isofs: Fix bug in the way to check if the year is a leap year
Changed the whole algorithm for a call to mktime64 that takes
care of all that details.

Signed-off-by: Oscar Forner Martinez <oscar.forner.martinez@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-07 09:51:49 +01:00
Linus Torvalds 3b421b80be Revert a potential seek_data/hole regression which shows up when using
ext4 to handle ext3 file systems, plus two minor bug fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUqxTQAAoJENNvdpvBGATwUC8QAIYfP02XYyGrBQIoAoCiRJji
 TmhAe2Amy9xHgBq2I8Yz3bi8j1wGuiqc57fYgwwVScf10tzmO/HuwnAZZDAmg9hK
 ZWp9WyPoyf+U/nbkIfC5mRh3Qz0dt1pt6R3uQDUlcUuAamdMBrdJnhkQC6WMbpU2
 fAqsJT3/yGrLnMF29eVqJzcxb5KORJ8hEcD7kwkvJwe4sGm3C7iDsjS0i63YWDz4
 QNclW6zF4THhmuVNxwRupOgMQNSq8sHg8U23nP4DZLvLE7GlgtwfDvehU7uBfw5n
 WO5UfsEYLoeODNmujUJCtjXNLpzDXmrtByyWbbTK7EX3MmV94ym4uu5lHLfyMiTc
 o2ppxcsKBVcOsPWnFwuhJ5p/Wyy0Uld9Q3P6b5ymhyzDhkuwcTURpeRxBRXHEgcm
 nY5GE1bBdO7OigDz/+DFL/Zgr8EO7hW72hrBaLDWMEbrrl0asZw/ReC/bnreMmm4
 sP87DB+MqRXzRs8aOPWmCofJwGSgCYmOq2nqNCAaxgk/ofvrDURrnZYfLrbzspGa
 hqE1W0X5hvQydcifi4qq2Na76+Js3atSY38EOH/HNknSqlQjysnkW4ajTDWk/GFy
 M/fKUCfIl1tmCMN2myZzl89E7uMSyod75ycd0BQy36iHPE14JVvk/u7GfcKHLs53
 1rAPpW90a72GX2Z9+xxA
 =uPYz
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
 "Revert a potential seek_data/hole regression which shows up when using
  ext4 to handle ext3 file systems, plus two minor bug fixes"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: remove spurious KERN_INFO from ext4_warning call
  Revert "ext4: fix suboptimal seek_{data,hole} extents traversial"
  ext4: prevent online resize with backup superblock
2015-01-06 14:05:40 -08:00
Pranith Kumar 83fe27ea53 rcu: Make SRCU optional by using CONFIG_SRCU
SRCU is not necessary to be compiled by default in all cases. For tinification
efforts not compiling SRCU unless necessary is desirable.

The current patch tries to make compiling SRCU optional by introducing a new
Kconfig option CONFIG_SRCU which is selected when any of the components making
use of SRCU are selected.

If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.

   text    data     bss     dec     hex filename
   2007       0       0    2007     7d7 kernel/rcu/srcu.o

Size of arch/powerpc/boot/zImage changes from

   text    data     bss     dec     hex filename
 831552   64180   23944  919676   e087c arch/powerpc/boot/zImage : before
 829504   64180   23952  917636   e0084 arch/powerpc/boot/zImage : after

so the savings are about ~2000 bytes.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
2015-01-06 11:04:29 -08:00
Miklos Szeredi 9759bd5189 fuse: add memory barrier to INIT
Theoretically we need to order setting of various fields in fc with
fc->initialized.

No known bug reports related to this yet.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-01-06 10:45:35 +01:00
Miklos Szeredi 21f621741a fuse: fix LOOKUP vs INIT compat handling
Analysis from Marc:

 "Commit 7078187a79 ("fuse: introduce fuse_simple_request() helper")
  from the above pull request triggers some EIO errors for me in some tests
  that rely on fuse

  Looking at the code changes and a bit of debugging info I think there's a
  general problem here that fuse_get_req checks and possibly waits for
  fc->initialized, and this was always called first.  But this commit
  changes the ordering and in many places fc->minor is now possibly used
  before fuse_get_req, and we can't be sure that fc has been initialized.
  In my case fuse_lookup_init sets req->out.args[0].size to the wrong size
  because fc->minor at that point is still 0, leading to the EIO error."

Fix by moving the compat adjustments into fuse_simple_request() to after
fuse_get_req().

This is also more readable than the original, since now compatibility is
handled in a single function instead of cluttering each operation.

Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fixes: 7078187a79 ("fuse: introduce fuse_simple_request() helper")
2015-01-06 10:45:35 +01:00
Trond Myklebust 4e379d36c0 NFSv4: Remove incorrect check in can_open_delegated()
Remove an incorrect check for NFS_DELEGATION_NEED_RECLAIM in
can_open_delegated(). We are allowed to cache opens even in
a situation where we're doing reboot recovery.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-05 19:40:54 -08:00
Chuck Lever 7a01edf005 NFS: Ignore transport protocol when detecting server trunking
Detect server trunking across transport protocols. Otherwise, an
RDMA mount and a TCP mount of the same server will end up with
separate nfs_clients using the same clientid4.

Reported-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-05 19:40:54 -08:00
Trond Myklebust 55b9df93dd NFSv4/v4.1: Verify the client owner id during trunking detection
While we normally expect the NFSv4 client to always send the same client
owner to all servers, there are a couple of situations where that is not
the case:
 1) In NFSv4.0, switching between use of '-omigration' and not will cause
    the kernel to switch between using the non-uniform and uniform client
    strings.
 2) In NFSv4.1, or NFSv4.0 when using uniform client strings, if the
    uniquifier string is suddenly changed.

This patch will catch those situations by checking the client owner id
in the trunking detection code, and will do the right thing if it notices
that the strings differ.

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-05 19:40:53 -08:00
Trond Myklebust ceb3a16c07 NFSv4: Cache the NFSv4/v4.1 client owner_id in the struct nfs_client
Ensure that we cache the NFSv4/v4.1 client owner_id so that we can
verify it when we're doing trunking detection.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-05 19:40:53 -08:00
Trond Myklebust 1fc0703af3 NFSv4.1: Fix client id trunking on Linux
Currently, our trunking code will check for session trunking, but will
fail to detect client id trunking. This is a problem, because it means
that the client will fail to recognise that the two connections represent
shared state, even if they do not permit a shared session.
By removing the check for the server minor id, and only checking the
major id, we will end up doing the right thing in both cases: we close
down the new nfs_client and fall back to using the existing one.

Fixes: 05f4c350ee ("NFS: Discover NFSv4 server trunking when mounting")
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # 3.7.x
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-05 19:40:53 -08:00
Trond Myklebust 06bed7d18c LOCKD: Fix a race when initialising nlmsvc_timeout
This commit fixes a race whereby nlmclnt_init() first starts the lockd
daemon, and then calls nlm_bind_host() with the expectation that
nlmsvc_timeout has already been initialised. Unfortunately, there is no
no synchronisation between lockd() and lockd_up() to guarantee that this
is the case.

Fix is to move the initialisation of nlmsvc_timeout into lockd_create_svc

Fixes: 9a1b6bf818 ("LOCKD: Don't call utsname()->nodename...")
Cc: Bruce Fields <bfields@fieldses.org>
Cc: stable@vger.kernel.org # 3.10.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-05 19:40:53 -08:00
Leif Lindholm 62c204ddfe fs: Make efivarfs a pseudo filesystem, built by default with EFI
efivars is currently enabled under MISC_FILESYSTEMS, which is decribed
as "such as filesystems that came from other operating systems".
In reality, it is a pseudo filesystem, providing access to the kernel
UEFI variable interface.

Since this is the preferred interface for accessing UEFI variables, over
the legacy efivars interface, also build it by default as a module if
CONFIG_EFI.

Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-01-05 14:15:58 +00:00
Fabian Frederick 6744e90b0f ext3: destroy sbi mutexes in put_super
Call mutex_destroy() on superblock mutexes in ext3_put_super().
Otherwise mutex debugging code isn't able to detect that mutex is used
after being freed.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-05 11:13:55 +01:00
Jan Kara 2c561bc362 udf: Update Kconfig description
Update description of UDF in Kconfig to mention that UDF is also
suitable for removable USB disks.

Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-05 11:04:37 +01:00
Jakub Wilk 363307e6e5 ext4: remove spurious KERN_INFO from ext4_warning call
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-01-02 15:31:14 -05:00
Theodore Ts'o ad7fefb109 Revert "ext4: fix suboptimal seek_{data,hole} extents traversial"
This reverts commit 14516bb7bb.

This was causing regression test failures with generic/285 with an ext3
filesystem using CONFIG_EXT4_USE_FOR_EXT23.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-01-02 15:16:00 -05:00
Chris Mason 6f8960541b Btrfs: don't delay inode ref updates during log replay
Commit 1d52c78afb (Btrfs: try not to ENOSPC on log replay) added a
check to skip delayed inode updates during log replay because it
confuses the enospc code.  But the delayed processing will end up
ignoring delayed refs from log replay because the inode itself wasn't
put through the delayed code.

This can end up triggering a warning at commit time:

WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()

Which is repeated for each commit because we never process the delayed
inode ref update.

The fix used here is to change btrfs_delayed_delete_inode_ref to return
an error if we're currently in log replay.  The caller will do the ref
deletion immediately and everything will work properly.

Signed-off-by: Chris Mason <clm@fb.com>
cc: stable@vger.kernel.org # v3.18 and any stable series that picked 1d52c78afb
2015-01-02 14:47:56 -05:00
Filipe Manana a1317f455a Btrfs: correctly get tree level in tree_backref_for_extent
If we are using skinny metadata, the block's tree level is in the offset
of the key and not in a btrfs_tree_block_info structure following the
extent item (it doesn't exist). Therefore fix it.

Besides returning the correct level in the tree, this also prevents reading
past the leaf's end in the case where the extent item is the last item in
the leaf (eb) and it has only 1 inline reference - this is because
sizeof(struct btrfs_tree_block_info) is greater than
sizeof(struct btrfs_extent_inline_ref).

Got it while running a scrub which produced the following warning:

    BTRFS: checksum error at logical 42123264 on dev /dev/sde, sector 15840: metadata node (level 24) in tree 5

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-02 14:47:56 -05:00
Wang Shilong c7cfb8a540 Btrfs: call inode_dec_link_count() on mkdir error path
In btrfs_mkdir(), if it fails to create dir, we should
clean up existed items, setting inode's link properly
to make sure it could be cleaned up properly.

Signed-off-by: Wang Shilong <wangshilong1991@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-02 14:47:55 -05:00
Josef Bacik df95e7f0d9 Btrfs: abort transaction if we don't find the block group
We shouldn't BUG_ON() if there is corruption.  I hit this while testing my block
group patch and the abort worked properly.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-02 14:47:55 -05:00
Dan Carpenter 6b6d24b389 Btrfs, scrub: uninitialized variable in scrub_extent_for_parity()
The only way that "ret" is set is when we call scrub_pages_for_parity()
so the skip to "if (ret) " test doesn't make sense and causes a static
checker warning.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-02 14:47:55 -05:00
Linus Torvalds 5faa0154fe Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "A set of three minor cifs fixes"

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: make new inode cache when file type is different
  Fix signed/unsigned pointer warning
  Convert MessageID in smb2_hdr to LE
2014-12-29 21:09:57 -08:00
Linus Torvalds b9d4a35f0a Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF & isofs fixes from Jan Kara:
 "A couple of UDF fixes of handling of corrupted media and one iso9660
  fix of the same"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Reduce repeated dereferences
  udf: Check component length before reading it
  udf: Check path length when reading symlink
  udf: Verify symlink size before loading it
  udf: Verify i_size when loading inode
  isofs: Fix unchecked printing of ER records
2014-12-29 20:43:10 -08:00
Theodore Ts'o 011fa99404 ext4: prevent online resize with backup superblock
Prevent BUG or corrupted file systems after the following:

mkfs.ext4 /dev/vdc 100M
mount -t ext4 -o sb=40961 /dev/vdc /vdc
resize2fs /dev/vdc

We previously prevented online resizing using the old resize ioctl.
Move the code to ext4_resize_begin(), so the check applies for all of
the resize ioctl's.

Reported-by: Maxim Malkov <malkov@ispras.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-12-26 23:58:21 -05:00
Al Viro e1f1fe798d jfs: get rid of homegrown endianness helpers
Get rid of le24 stuff, along with the bitfields use - all that stuff
can be done with standard stuff, in sparse-verifiable manner.  Moreover,
that way (shift-and-mask) often generates better code - gcc optimizer
sucks on bitfields...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
----
2014-12-23 17:01:24 -06:00
Dave Chinner efdca7aa3c Merge branch 'xfs-misc-fixes-for-3.20-1' into for-next 2014-12-24 09:49:53 +11:00
Jan Kara 1a43ec03dd xfs: Keep sb_bad_features2 consistent with sb_features2
Currently when we modify sb_features2, we store the same value also in
sb_bad_features2. However in most places we forget to mark field
sb_bad_features2 for logging and thus it can happen that a change to it
is lost. This results in an inconsistent sb_features2 and
sb_bad_features2 fields e.g. after xfstests test xfs/187.

Fix the problem by changing XFS_SB_FEATURES2 to actually mean both
sb_features2 and sb_bad_features2 fields since this is always what we
want to log. This isn't ideal because the fact that XFS_SB_FEATURES2
means two fields could cause some problem in future however the code is
hopefully less error prone that it is now.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-12-24 09:48:35 +11:00
Eric Sandeen 77af574eef xfs: remove extra newlines from xfs messages
xfs_warn() and friends add a newline by default, but some
messages add another one.

Particularly for the failing write message below, this can
waste a lot of console real estate!

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-12-24 09:47:27 +11:00
Brian Foster 96ab7954bc xfs: initialize log buf I/O completion wq on log alloc
Log buffer I/O completion passes through the high priority
m_log_workqueue rather than the default metadata buffer workqueue. The
log buffer wq is initialized at I/O submission time. The log buffers are
reused once initialized, however, so this is not necessary.

Initialize the log buffer I/O completion workqueue pointers once when
the log is allocated and log buffers initialized rather than on every
log buffer I/O submission.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-12-24 09:46:23 +11:00
Carlos Maiolino d31a182545 xfs: Add support to RENAME_EXCHANGE flag
Adds a new function named xfs_cross_rename(), responsible for
handling requests from sys_renameat2() using RENAME_EXCHANGE flag.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-12-24 08:51:42 +11:00
Carlos Maiolino dbe1b5ca26 xfs: Make xfs_vn_rename compliant with renameat2() syscall
To be able to support RENAME_EXCHANGE flag from renameat2() system
call, XFS must have its inode_operations updated, exporting .rename2
method, instead of .rename.

This patch just replaces the (now old) .rename method by .rename2,
using the same infra-structure, but checking rename flags.  Calls to
.rename2 using RENAME_EXCHANGE flag, although now handled inside
XFS, still return -EINVAL.

RENAME_NOREPLACE is handled via VFS and we don't need to care about
it inside xfs_vn_rename.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-12-24 08:51:38 +11:00
Nakajima Akira 9e6d722f3d cifs: make new inode cache when file type is different
In spite of different file type,
 if file is same name and same inode number, old inode cache is used.
This causes that you can not cd directory, can not cat SymbolicLink.
So this patch is that if file type is different, return error.

Reproducible sample :
1. create file 'a' at cifs client.
2. repeat rm and mkdir 'a' 4 times at server, then direcotry 'a' having same inode number is created.
   (Repeat 4 times, then same inode number is recycled.)
   (When server is under RHEL 6.6, 1 time is O.K.  Always same inode number is recycled.)
3. ls -li at client, then you can not cd directory, can not remove directory.

SymbolicLink has same problem.

Bug link:
https://bugzilla.kernel.org/show_bug.cgi?id=90011

Signed-off-by: Nakajima Akira <nakajima.akira@nttcom.co.jp>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
2014-12-22 14:16:21 -06:00
Jan Kara 3ee3039c5b udf: Reduce repeated dereferences
Replace repeated dereferences like dir->i_sb by storing superblock
pointer in a variable and using that.

Signed-off-by: Jan Kara <jack@suse.cz>
2014-12-21 22:42:37 +01:00
Jan Kara e237ec37ec udf: Check component length before reading it
Check that length specified in a component of a symlink fits in the
input buffer we are reading. Also properly ignore component length for
component types that do not use it. Otherwise we read memory after end
of buffer for corrupted udf image.

Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2014-12-21 22:42:19 +01:00
Linus Torvalds ecb5ec044a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile #3 from Al Viro:
 "Assorted fixes and patches from the last cycle"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  [regression] chunk lost from bd9b51
  vfs: make mounts and mountstats honor root dir like mountinfo does
  vfs: cleanup show_mountinfo
  init: fix read-write root mount
  unfuck binfmt_misc.c (broken by commit e6084d4)
  vm_area_operations: kill ->migrate()
  new helper: iter_is_iovec()
  move_extent_per_page(): get rid of unused w_flags
  lustre: get rid of playing with ->fs
  btrfs: filp_open() returns ERR_PTR() on failure, not NULL...
2014-12-19 18:19:19 -08:00
Linus Torvalds 298647e31a Fixes for filename decryption and encrypted view plus a cleanup
- The filename decryption routines were, at times, writing a zero byte one
   character past the end of the filename buffer
 - The encrypted view feature attempted, and failed, to roll its own form of
   enforcing a read-only mount instead of letting the VFS enforce it
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJUlGlWAAoJENaSAD2qAscKgdwQAJLCgt5xgoij8uH65mYVy6PE
 ++E8KggCHchhUcFh19EuuG++ENzwZedqKqbbRhmZt+svla08uPDO0jY+sYaOZzqB
 HMzZyyL4Fw/DiVsK6LUMGfN2CS7tua9ReK0dj4tUc3jwBplnQnGrQlUPbgdPRJJp
 XJDKHVtHGQPQC1dJAeProCJTg23jv5wXacly2I5VxZuh5DnDWjuo2KkzqGMKfadU
 9bxOf5DjVwQ/X6apgkNxG/8Q6J8a+80K7SGs/aRELIuRL0+mIj5AX9ht+p5Z0XI+
 E8jU4HM2biuqj8PtxK/WbyF1bHiREVyOLdneW+n3pcqGTLMZ3TLpDURkNd0cwZcd
 AGoZtUZZuQ/xvzE9IrEj+KYeINnZzPQpyJu9jXXp1CsQJ8u/wUG9e0kcJrUI1dZ/
 q26zhPaVhJ9x0go/BNOxzTUpOtuMWXttAVZGICthp74I5l+DgfSZ4J4CeQmFMMRs
 kbiTg5h4qsrD4jnEU/BCscpCLoz4qlF6+q/+lspwkg7OwvhHc0qSMJn3UjZ/eYzb
 ncDp6gc4Iju3RhWnVEZphA4ttbZJX7ahER4y0NdIG65Fa37AUEUxy0OmGfoyFOmC
 KVtlHkl2hKoCB5+ujLzL7WimH33ogtVhtOJTZlXzS0lsdSfSxUWU4/OZ22jTy0hL
 Bnx6uoNIPWsr6b5OJiA0
 =jTyq
 -----END PGP SIGNATURE-----

Merge tag 'ecryptfs-3.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull eCryptfs fixes from Tyler Hicks:
 "Fixes for filename decryption and encrypted view plus a cleanup

   - The filename decryption routines were, at times, writing a zero
     byte one character past the end of the filename buffer

   - The encrypted view feature attempted, and failed, to roll its own
     form of enforcing a read-only mount instead of letting the VFS
     enforce it"

* tag 'ecryptfs-3.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: Remove buggy and unnecessary write in file name decode routine
  eCryptfs: Remove unnecessary casts when parsing packet lengths
  eCryptfs: Force RO mount when encrypted view is enabled
2014-12-19 18:15:12 -08:00
Linus Torvalds 5c68eac68b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull more btrfs updates from Chris Mason:
 "This is part two of our merge window patches.

  These are all from Filipe, and fix some really hard to find races that
  can cause corruptions.  Most of them involved block group removal
  (balance) or discard"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: remove non-sense btrfs_error_discard_extent() function
  Btrfs: fix fs corruption on transaction abort if device supports discard
  Btrfs: always clear a block group node when removing it from the tree
  Btrfs: ensure deletion from pinned_chunks list is protected
2014-12-19 18:10:42 -08:00
Linus Torvalds ac88ee3b6c Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq core fix from Thomas Gleixner:
 "A single fix plugging a long standing race between proc/stat and
  proc/interrupts access and freeing of interrupt descriptors"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Prevent proc race against freeing of irq descriptors
2014-12-19 13:26:08 -08:00
Jan Kara 0e5cc9a40a udf: Check path length when reading symlink
Symlink reading code does not check whether the resulting path fits into
the page provided by the generic code. This isn't as easy as just
checking the symlink size because of various encoding conversions we
perform on path. So we have to check whether there is still enough space
in the buffer on the fly.

CC: stable@vger.kernel.org
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
2014-12-19 14:12:08 +01:00
Jan Kara a1d47b2629 udf: Verify symlink size before loading it
UDF specification allows arbitrarily large symlinks. However we support
only symlinks at most one block large. Check the length of the symlink
so that we don't access memory beyond end of the symlink block.

CC: stable@vger.kernel.org
Reported-by: Carl Henrik Lunde <chlunde@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2014-12-19 13:17:10 +01:00
Jan Kara e159332b9a udf: Verify i_size when loading inode
Verify that inode size is sane when loading inode with data stored in
ICB. Otherwise we may get confused later when working with the inode and
inode size is too big.

CC: stable@vger.kernel.org
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
2014-12-19 13:13:05 +01:00
Jan Kara 4e2024624e isofs: Fix unchecked printing of ER records
We didn't check length of rock ridge ER records before printing them.
Thus corrupted isofs image can cause us to access and print some memory
behind the buffer with obvious consequences.

Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2014-12-19 11:29:24 +01:00
Linus Torvalds 018cb13eb3 Merge branch 'akpm' (patches from Andrew)
Merge misc patches from Andrew Morton:
 "A few stragglers"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  tools/testing/selftests/Makefile: alphasort the TARGETS list
  mm/zsmalloc: adjust order of functions
  ocfs2: fix journal commit deadlock
  ocfs2/dlm: fix race between dispatched_work and dlm_lockres_grab_inflight_worker
  ocfs2: reflink: fix slow unlink for refcounted file
  mm/memory.c:do_shared_fault(): add comment
  .mailmap: Santosh Shilimkar has moved
  .mailmap: update akpm@osdl.org
  lib/show_mem.c: add cma reserved information
  fs/proc/meminfo.c: include cma info in proc/meminfo
  mm: cma: split cma-reserved in dmesg log
  hfsplus: fix longname handling
  mm/mempolicy.c: remove unnecessary is_valid_nodemask()
2014-12-18 19:08:25 -08:00
Junxiao Bi 136f49b917 ocfs2: fix journal commit deadlock
For buffer write, page lock will be got in write_begin and released in
write_end, in ocfs2_write_end_nolock(), before it unlock the page in
ocfs2_free_write_ctxt(), it calls ocfs2_run_deallocs(), this will ask
for the read lock of journal->j_trans_barrier.  Holding page lock and
ask for journal->j_trans_barrier breaks the locking order.

This will cause a deadlock with journal commit threads, ocfs2cmt will
get write lock of journal->j_trans_barrier first, then it wakes up
kjournald2 to do the commit work, at last it waits until done.  To
commit journal, kjournald2 needs flushing data first, it needs get the
cache page lock.

Since some ocfs2 cluster locks are holding by write process, this
deadlock may hung the whole cluster.

unlock pages before ocfs2_run_deallocs() can fix the locking order, also
put unlock before ocfs2_commit_trans() to make page lock is unlocked
before j_trans_barrier to preserve unlocking order.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 19:08:11 -08:00
Joseph Qi 1e58958160 ocfs2/dlm: fix race between dispatched_work and dlm_lockres_grab_inflight_worker
Commit ac4fef4d23 ("ocfs2/dlm: do not purge lockres that is queued for
assert master") may have the following possible race case:

  dlm_dispatch_assert_master       dlm_wq
  ========================================================================
  queue_work(dlm->quedlm_worker,
      &dlm->dispatched_work);
                                 dispatch work,
                                 dlm_lockres_drop_inflight_worker
                                 *BUG_ON(res->inflight_assert_workers == 0)*
  dlm_lockres_grab_inflight_worker
  inflight_assert_workers++

So ensure inflight_assert_workers to be increased first.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Xue jiufei <xuejiufei@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 19:08:11 -08:00
Junxiao Bi f62f12b3a4 ocfs2: reflink: fix slow unlink for refcounted file
When running ocfs2 test suite multiple nodes reflink stress test, for a
4 nodes cluster, every unlink() for refcounted file needs about 700s.

The slow unlink is caused by the contention of refcount tree lock since
all nodes are unlink files using the same refcount tree.  When the
unlinking file have many extents(over 1600 in our test), most of the
extents has refcounted flag set.  In ocfs2_commit_truncate(), it will
execute the following call trace for every extents.  This means it needs
get and released refcount tree lock about 1600 times.  And when several
nodes are do this at the same time, the performance will be very low.

  ocfs2_remove_btree_range()
  --  ocfs2_lock_refcount_tree()
  ----  ocfs2_refcount_lock()
  ------  __ocfs2_cluster_lock()

ocfs2_refcount_lock() is costly, move it to ocfs2_commit_truncate() to
do lock/unlock once can improve a lot performance.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Wengang <wen.gang.wang@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 19:08:11 -08:00
Pintu Kumar 47f8f9297d fs/proc/meminfo.c: include cma info in proc/meminfo
This patch include CMA info (CMATotal, CMAFree) in /proc/meminfo.
Currently, in a CMA enabled system, if somebody wants to know the total
CMA size declared, there is no way to tell, other than the dmesg or
/var/log/messages logs.

With this patch we are showing the CMA info as part of meminfo, so that it
can be determined at any point of time.  This will be populated only when
CMA is enabled.

Below is the sample output from a ARM based device with RAM:512MB and CMA:16MB.

  MemTotal:         471172 kB
  MemFree:          111712 kB
  MemAvailable:     271172 kB
  .
  .
  .
  CmaTotal:          16384 kB
  CmaFree:            6144 kB

This patch also fix below checkpatch errors that were found during these changes.

  ERROR: space required after that ',' (ctx:ExV)
  199: FILE: fs/proc/meminfo.c:199:
  +       ,atomic_long_read(&num_poisoned_pages) << (PAGE_SHIFT - 10)
          ^

  ERROR: space required after that ',' (ctx:ExV)
  202: FILE: fs/proc/meminfo.c:202:
  +       ,K(global_page_state(NR_ANON_TRANSPARENT_HUGEPAGES) *
          ^

  ERROR: space required after that ',' (ctx:ExV)
  206: FILE: fs/proc/meminfo.c:206:
  +       ,K(totalcma_pages)
          ^

  total: 3 errors, 0 warnings, 2 checks, 236 lines checked

Signed-off-by: Pintu Kumar <pintu.k@samsung.com>
Signed-off-by: Vishnu Pratap Singh <vishnu.ps@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 19:08:10 -08:00
Sougata Santra 89ac9b4d3d hfsplus: fix longname handling
Longname is not correctly handled by hfsplus driver.  If an attempt to
create a longname(>255) file/directory is made, it succeeds by creating a
file/directory with HFSPLUS_MAX_STRLEN and incorrect catalog key.  Thus
leaving the volume in an inconsistent state.  This patch fixes this issue.

Although lookup is always called first to create a negative entry, so just
doing a check in lookup would probably fix this issue.  I choose to
propagate error to other iops as well.

Please NOTE: I have factored out hfsplus_cat_build_key_with_cnid from
hfsplus_cat_build_key, to avoid unncessary branching.

Thanks a lot.

  TEST:
  ------
  dir="TEST_DIR"
  cdir=`pwd`
  name255="_123456789_123456789_123456789_123456789_123456789_123456789\
  _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
  _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
  _123456789_123456789_123456789_123456789_123456789_1234"
  name256="${name255}5"

  mkdir $dir
  cd $dir
  touch $name255
  rm -f $name255
  touch $name256
  ls -la
  cd $cdir
  rm -rf $dir

  RESULT:
  -------
  [sougata@ultrabook tmp]$ cdir=`pwd`
  [sougata@ultrabook tmp]$
  name255="_123456789_123456789_123456789_123456789_123456789_123456789\
   > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
   > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
   > _123456789_123456789_123456789_123456789_123456789_1234"
  [sougata@ultrabook tmp]$ name256="${name255}5"
  [sougata@ultrabook tmp]$
  [sougata@ultrabook tmp]$ mkdir $dir
  [sougata@ultrabook tmp]$ cd $dir
  [sougata@ultrabook TEST_DIR]$ touch $name255
  [sougata@ultrabook TEST_DIR]$ rm -f $name255
  [sougata@ultrabook TEST_DIR]$ touch $name256
  [sougata@ultrabook TEST_DIR]$ ls -la
  ls: cannot access
  _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234:
  No such file or directory
  total 0
  drwxrwxr-x 1 sougata sougata 3 Feb 20 19:56 .
  drwxrwxrwx 1 root    root    6 Feb 20 19:56 ..
  -????????? ? ?       ?       ?            ?
  _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
  [sougata@ultrabook TEST_DIR]$ cd $cdir
  [sougata@ultrabook tmp]$ rm -rf $dir
  rm: cannot remove `TEST_DIR': Directory not empty

-ENAMETOOLONG returned from hfsplus_asc2uni was not propaged to iops.
This allowed hfsplus to create files/directories with HFSPLUS_MAX_STRLEN
and incorrect keys, leaving the FS in an inconsistent state.  This patch
fixes this issue.

Signed-off-by: Sougata Santra <sougata@tuxera.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 19:08:10 -08:00
Eric W. Biederman c297abfdf1 mnt: Fix a memory stomp in umount
While reviewing the code of umount_tree I realized that when we append
to a preexisting unmounted list we do not change pprev of the former
first item in the list.

Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
the former first item of the list will stomp unmounted.first leaving
it set to some random mount point which we are likely to free soon.

This isn't likely to hit, but if it does I don't know how anyone could
track it down.

[ This happened because we don't have all the same operations for
  hlist's as we do for normal doubly-linked lists. In particular,
  list_splice() is easy on our standard doubly-linked lists, while
  hlist_splice() doesn't exist and needs both start/end entries of the
  hlist.  And commit 38129a13e6 incorrectly open-coded that missing
  hlist_splice().

  We should think about making these kinds of "mindless" conversions
  easier to get right by adding the missing hlist helpers   - Linus ]

Fixes: 38129a13e6 switch mnt_hash to hlist
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 11:22:02 -08:00
Linus Torvalds 44e8967d59 Ceph: remove left-over reject file
Neither Sage nor I noticed that Zheng Yan had mistakenly committed
fs/ceph/super.h.rej as part of commit 31c542a199 ("ceph: add inline
data to pagecache").

Remove it.

Requested-by: Yan, Zheng <ukernel@gmail.com>
Cc: Sage Weil <sweil@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-17 18:47:01 -08:00
Linus Torvalds 57666509b7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "The big item here is support for inline data for CephFS and for
  message signatures from Zheng.  There are also several bug fixes,
  including interrupted flock request handling, 0-length xattrs, mksnap,
  cached readdir results, and a message version compat field.  Finally
  there are several cleanups from Ilya, Dan, and Markus.

  Note that there is another series coming soon that fixes some bugs in
  the RBD 'lingering' requests, but it isn't quite ready yet"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
  ceph: fix setting empty extended attribute
  ceph: fix mksnap crash
  ceph: do_sync is never initialized
  libceph: fixup includes in pagelist.h
  ceph: support inline data feature
  ceph: flush inline version
  ceph: convert inline data to normal data before data write
  ceph: sync read inline data
  ceph: fetch inline data when getting Fcr cap refs
  ceph: use getattr request to fetch inline data
  ceph: add inline data to pagecache
  ceph: parse inline data in MClientReply and MClientCaps
  libceph: specify position of extent operation
  libceph: add CREATE osd operation support
  libceph: add SETXATTR/CMPXATTR osd operations support
  rbd: don't treat CEPH_OSD_OP_DELETE as extent op
  ceph: remove unused stringification macros
  libceph: require cephx message signature by default
  ceph: introduce global empty snap context
  ceph: message versioning fixes
  ...
2014-12-17 16:03:12 -08:00
Linus Torvalds 87c31b39ab Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace related fixes from Eric Biederman:
 "As these are bug fixes almost all of thes changes are marked for
  backporting to stable.

  The first change (implicitly adding MNT_NODEV on remount) addresses a
  regression that was created when security issues with unprivileged
  remount were closed.  I go on to update the remount test to make it
  easy to detect if this issue reoccurs.

  Then there are a handful of mount and umount related fixes.

  Then half of the changes deal with the a recently discovered design
  bug in the permission checks of gid_map.  Unix since the beginning has
  allowed setting group permissions on files to less than the user and
  other permissions (aka ---rwx---rwx).  As the unix permission checks
  stop as soon as a group matches, and setgroups allows setting groups
  that can not later be dropped, results in a situtation where it is
  possible to legitimately use a group to assign fewer privileges to a
  process.  Which means dropping a group can increase a processes
  privileges.

  The fix I have adopted is that gid_map is now no longer writable
  without privilege unless the new file /proc/self/setgroups has been
  set to permanently disable setgroups.

  The bulk of user namespace using applications even the applications
  using applications using user namespaces without privilege remain
  unaffected by this change.  Unfortunately this ix breaks a couple user
  space applications, that were relying on the problematic behavior (one
  of which was tools/selftests/mount/unprivileged-remount-test.c).

  To hopefully prevent needing a regression fix on top of my security
  fix I rounded folks who work with the container implementations mostly
  like to be affected and encouraged them to test the changes.

    > So far nothing broke on my libvirt-lxc test bed. :-)
    > Tested with openSUSE 13.2 and libvirt 1.2.9.
    > Tested-by: Richard Weinberger <richard@nod.at>

    > Tested on Fedora20 with libvirt 1.2.11, works fine.
    > Tested-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>

    > Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
    > Just to be sure I was testing the right thing I also tested using
    > my unprivileged nsexec testcases, and they failed on setgroup/setgid
    > as now expected, and succeeded there without your patches.
    > Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com>

    > I tested this with Sandstorm.  It breaks as is and it works if I add
    > the setgroups thing.
    > Tested-by: Andy Lutomirski <luto@amacapital.net> # breaks things as designed :("

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns: Unbreak the unprivileged remount tests
  userns; Correct the comment in map_write
  userns: Allow setting gid_maps without privilege when setgroups is disabled
  userns: Add a knob to disable setgroups on a per user namespace basis
  userns: Rename id_map_mutex to userns_state_mutex
  userns: Only allow the creator of the userns unprivileged mappings
  userns: Check euid no fsuid when establishing an unprivileged uid mapping
  userns: Don't allow unprivileged creation of gid mappings
  userns: Don't allow setgroups until a gid mapping has been setablished
  userns: Document what the invariant required for safe unprivileged mappings.
  groups: Consolidate the setgroups permission checks
  mnt: Clear mnt_expire during pivot_root
  mnt: Carefully set CL_UNPRIVILEGED in clone_mnt
  mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers.
  umount: Do not allow unmounting rootfs.
  umount: Disallow unprivileged mount force
  mnt: Update unprivileged remount test
  mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
2014-12-17 12:31:40 -08:00
Linus Torvalds d6666be6f0 MTD updates for 3.19:
* Add device tree support for DoC3
 
  * SPI NOR:
 
     Refactoring, for better layering between spi-nor.c and its driver users
     (e.g., m25p80.c)
 
     New flash device support
 
     Support 6-byte ID strings
 
  * NAND
 
     New NAND driver for Allwinner SoC's (sunxi)
 
     GPMI NAND: add support for raw (no ECC) access, for testing purposes
 
     Add ATO manufacturer ID
 
     A few odd driver fixes
 
  * MTD tests:
 
     Allow testers to compensate for OOB bitflips in oobtest
 
     Fix a torturetest regression
 
  * nandsim: Support longer ID byte strings
 
 And more.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUj6PbAAoJEFySrpd9RFgtahcP/RGvknk9lnitaZI7+aZPP8Zs
 AopfiuisLNv3v87EEBAWGYjRYm6vuuhO1z3K55iOIlemBVzoMIP0jf68XGy9uXnL
 Ox6AHqxm55wwmc+CHry5/GssaqE6GzdPm8TBP+AGGNhHrhc+raJL55R0QJaoYVwX
 pUxkhWaa4lZ6CrOIMQ3n+MEnduilHZoFIcXSc1UtI0y9IXsf1m0Qs8M5jGN8BQ16
 HOVNJN9wOXF89swoi7bKsyAn+QFUYgdksAisncb6b9r5Ks5KHmcOuS1LM5X9YoUr
 KfeogHfDM68fcaHsSvMU1xmxjXGtE+HFJE52eYNPB1fNbT3JAC13FFj92GeSsZtE
 ekpCQh4OPLazT/2wCUHTQwC7T1dCilwyW7VJB9MSl7fSBo9P7jIiUHxVUdM43kez
 Di02/XWi4IULTwrgzZiTT8yplFrVdvkmKHAAFEIoaVWiF/l4DeSodLGUw7owmNYn
 rJPBPQulpPHRwKZY7gThJuOUXpgbT715GSbvmPYXimHBqmViiPkrhqQ/b/v4PRRs
 Nlfhwbswr7WBq6vmPkd6eOyfdFANmWcZQMp/++BCdI/7mhfaik72GWMTBSuJ7hN5
 PB+z95soHaKBWlbiDGGGPvuqJmPkOVq1R8itQdIYBWEh7eNSHecwVxyUJJ+V3oPv
 QkD7mEP2ZozZe3Ys2EJQ
 =gDW8
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20141215' of git://git.infradead.org/linux-mtd

Pull MTD updates from Brian Norris:
 "Summary:
   - Add device tree support for DoC3

   - SPI NOR:
        Refactoring, for better layering between spi-nor.c and its
        driver users (e.g., m25p80.c)

        New flash device support

        Support 6-byte ID strings

   - NAND:
        New NAND driver for Allwinner SoC's (sunxi)

        GPMI NAND: add support for raw (no ECC) access, for testing
        purposes

        Add ATO manufacturer ID

        A few odd driver fixes

   - MTD tests:
        Allow testers to compensate for OOB bitflips in oobtest

        Fix a torturetest regression

   - nandsim: Support longer ID byte strings

  And more"

* tag 'for-linus-20141215' of git://git.infradead.org/linux-mtd: (63 commits)
  mtd: tests: abort torturetest on erase errors
  mtd: physmap_of: fix potential NULL dereference
  mtd: spi-nor: allow NULL as chip name and try to auto detect it
  mtd: nand: gpmi: add raw oob access functions
  mtd: nand: gpmi: add proper raw access support
  mtd: nand: gpmi: add gpmi_copy_bits function
  mtd: spi-nor: factor out write_enable() for erase commands
  mtd: spi-nor: add support for s25fl128s
  mtd: spi-nor: remove the jedec_id/ext_id
  mtd: spi-nor: add id/id_len for flash_info{}
  mtd: nand: correct the comment of function nand_block_isreserved()
  jffs2: Drop bogus if in comment
  mtd: atmel_nand: replace memcpy32_toio/memcpy32_fromio with memcpy
  mtd: cafe_nand: drop duplicate .write_page implementation
  mtd: m25p80: Add support for serial flash Spansion S25FL132K
  MTD: m25p80: fix inconsistency in m25p_ids compared to spi_nor_ids
  mtd: spi-nor: improve wait-till-ready timeout loop
  mtd: delete unnecessary checks before two function calls
  mtd: nand: omap: Fix NAND enumeration on 3430 LDP
  mtd: nand: add ATO manufacturer info
  ...
2014-12-17 09:59:26 -08:00
Linus Torvalds c103b21c20 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse update from Miklos Szeredi:
 "The first part makes sure we don't hold up umount with pending async
  requests.  In addition to being a cleanup, this is a small behavioral
  change (for the better) and unlikely to break anything.

  The second part prepares for a cleanup of the fuse device I/O code by
  adding a helper for simple request submission, with some savings in
  line numbers already realized"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: use file_inode() in fuse_file_fallocate()
  fuse: introduce fuse_simple_request() helper
  fuse: reduce max out args
  fuse: hold inode instead of path after release
  fuse: flush requests on umount
  fuse: don't wake up reserved req in fuse_conn_kill()
2014-12-17 09:41:32 -08:00
Yan, Zheng 0aeff37aba ceph: fix setting empty extended attribute
make sure 'value' is not null. otherwise __ceph_setxattr will remove
the extended attribute.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-12-17 20:18:49 +03:00
Yan, Zheng 275dd19ea4 ceph: fix mksnap crash
mksnap reply only contain 'target', does not contain 'dentry'. So
it's wrong to use req->r_reply_info.head->is_dentry to detect traceless
reply.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-12-17 20:09:53 +03:00
Dan Carpenter 021b77bee2 ceph: do_sync is never initialized
Probably this code was syncing a lot more often then intended because
the do_sync variable wasn't set to zero.

Cc: stable@vger.kernel.org # v3.11+
Fixes: c62988ec09 ('ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:53 +03:00
Yan, Zheng 65a22662bf ceph: support inline data feature
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:53 +03:00
Yan, Zheng e20d258d73 ceph: flush inline version
After converting inline data to normal data, client need to flush
the new i_inline_version (CEPH_INLINE_NONE) to MDS. This commit makes
cap messages (sent to MDS) contain inline_version and inline_data.
Client always converts inline data to normal data before data write,
so the inline data length part is always zero.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:53 +03:00
Yan, Zheng 28127bdd2f ceph: convert inline data to normal data before data write
Before any data write, convert inline data to normal data and set
i_inline_version to CEPH_INLINE_NONE. The OSD request that saves
inline data to object contains 3 operations (CMPXATTR, WRITE and
SETXATTR). It compares a xattr named 'inline_version' to prevent
old data overwrites newer data.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng 83701246ae ceph: sync read inline data
we can't use getattr to fetch inline data while holding Fr cap,
because it can cause deadlock. If we need to sync read inline data,
drop cap refs first, then use getattr to fetch inline data.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng 3738daa68a ceph: fetch inline data when getting Fcr cap refs
we can't use getattr to fetch inline data after getting Fcr caps,
because it can cause deadlock. The solution is try bringing inline
data to page cache when not holding any cap, and hope the inline
data page is still there after getting the Fcr caps. If the page
is still there, pin it in page cache for later IO.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng 01deead041 ceph: use getattr request to fetch inline data
Add a new parameter 'locked_page' to ceph_do_getattr(). If inline data
in getattr reply will be copied to the page.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng 31c542a199 ceph: add inline data to pagecache
Request reply and cap message can contain inline data. add inline data
to the page cache if there is Fc cap.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng fb01d1f8b0 ceph: parse inline data in MClientReply and MClientCaps
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng 715e4cd405 libceph: specify position of extent operation
allow specifying position of extent operation in multi-operations
osd request. This is required for cephfs to convert inline data to
normal data (compare xattr, then write object).

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:52 +03:00
Ilya Dryomov ca3995ad13 ceph: remove unused stringification macros
These were used to report git versions a long time ago.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:51 +03:00
Yan, Zheng 97c85a828f ceph: introduce global empty snap context
Current snaphost code does not properly handle moving inode from one
empty snap realm to another empty snap realm. After changing inode's
snap realm, some dirty pages' snap context can be not equal to inode's
i_head_snap. This can trigger BUG() in ceph_put_wrbuffer_cap_refs()

The fix is introduce a global empty snap context for all empty snap
realm. This avoids triggering the BUG() for filesystem with no snapshot.

Fixes: http://tracker.ceph.com/issues/9928

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:51 +03:00
John Spray 7cfa0313d0 ceph: message versioning fixes
There were two places we were assigning version in host byte order
instead of network byte order.

Also in MSG_CLIENT_SESSION we weren't setting compat_version in the
header to reflect continued compatability with older MDSs.

Fixes: http://tracker.ceph.com/issues/9945

Signed-off-by: John Spray <john.spray@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-12-17 20:09:51 +03:00
Yan, Zheng 33d0733796 libceph: message signature support
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:50 +03:00
SF Markus Elfring e96a650a81 ceph, rbd: delete unnecessary checks before two function calls
The functions ceph_put_snap_context() and iput() test whether their
argument is NULL and then return immediately. Thus the test around the
call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[idryomov@redhat.com: squashed rbd.c hunk, changelog]
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:50 +03:00
Yan, Zheng 70db4f3629 ceph: introduce a new inode flag indicating if cached dentries are ordered
After creating/deleting/renaming file, offsets of sibling dentries may
change. So we can not use cached dentries to satisfy readdir. But we can
still use the cached dentries to conclude -ENOENT for lookup.

This patch introduces a new inode flag indicating if child dentries are
ordered. The flag is set at the same time marking a directory complete.
After creating/deleting/renaming file, we clear the flag on directory
inode. This prevents ceph_readdir() from using cached dentries to satisfy
readdir syscall.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:50 +03:00
Yan, Zheng 9280be24dc ceph: fix file lock interruption
When a lock operation is interrupted, current code sends a unlock request to
MDS to undo the lock operation. This method does not work as expected because
the unlock request can drop locks that have already been acquired.

The fix is use the newly introduced CEPH_LOCK_FCNTL_INTR/CEPH_LOCK_FLOCK_INTR
requests to interrupt blocked file lock request. These requests do not drop
locks that have alread been acquired, they only interrupt blocked file lock
request.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:49 +03:00
Dmitry V. Levin 9d4d65748a vfs: make mounts and mountstats honor root dir like mountinfo does
As we already show mountpoints relative to the root directory, thanks
to the change made back in 2000, change show_vfsmnt() and show_vfsstat()
to skip out-of-root mountpoints the same way as show_mountinfo() does.

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-17 08:27:15 -05:00
Dmitry V. Levin 9ad4dc4f73 vfs: cleanup show_mountinfo
Starting with commit v3.2-rc4-1-g02125a8, seq_path_root() no longer
changes the value of its "struct path *root" argument.
Starting with commit v3.2-rc7-104-g8c9379e, the "struct path *root"
argument of seq_path_root() is const.
As result, the temporary variable "root" in show_mountinfo() that
holds a copy of struct path root is no longer needed.

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-17 08:27:15 -05:00
Al Viro 7d65cf10e3 unfuck binfmt_misc.c (broken by commit e6084d4)
scanarg(s, del) never returns s; the empty field results in s + 1.
Restore the correct checks, and move NUL-termination into scanarg(),
while we are at it.

Incidentally, mixing "coding style cleanups" (for small values of cleanup)
with functional changes is a Bad Idea(tm)...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-17 08:27:14 -05:00
Al Viro 50062175ff vm_area_operations: kill ->migrate()
the only instance this method has ever grown was one in kernfs -
one that call ->migrate() of another vm_ops if it exists.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-17 08:26:51 -05:00