Commit Graph

41246 Commits

Author SHA1 Message Date
Steve French 9d1b06602e Add Get/Set Integrity Information structure definitions
Signed-off-by: Steve French <steve.french@primarydata.com>
2015-06-28 21:15:41 -05:00
Steve French 02b1666544 Add reflink copy over SMB3.11 with new FSCTL_DUPLICATE_EXTENTS
Getting fantastic copy performance with cp --reflink over SMB3.11
 using the new FSCTL_DUPLICATE_EXTENTS.

 This FSCTL was added in the SMB3.11 dialect (testing was
 against REFS file system) so have put it as a 3.11 protocol
 specific operation ("vers=3.1.1" on the mount).  Tested at
 the SMB3 plugfest in Redmond.

 It depends on the new FS Attribute (BLOCK_REFCOUNTING) which
 is used to advertise support for the ability to do this ioctl
 (if you can support multiple files pointing to the same block
 than this refcounting ability or equivalent is needed to
 support the new reflink-like duplicate extent SMB3 ioctl.

Signed-off-by: Steve French <steve.french@primarydata.com>
2015-06-28 21:15:38 -05:00
Linus Torvalds 21dc2e6c6d Merge branch 'for-linus-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:

 - remove hppfs ("HonePot ProcFS")

 - initial support for musl libc

 - uaccess cleanup

 - random cleanups and bug fixes all over the place

* 'for-linus-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (21 commits)
  um: Don't pollute kernel namespace with uapi
  um: Include sys/types.h for makedev(), major(), minor()
  um: Do not use stdin and stdout identifiers for struct members
  um: Do not use __ptr_t type for stack_t's .ss pointer
  um: Fix mconsole dependency
  um: Handle tracehook_report_syscall_entry() result
  um: Remove copy&paste code from init.h
  um: Stop abusing __KERNEL__
  um: Catch unprotected user memory access
  um: Fix warning in setup_signal_stack_si()
  um: Rework uaccess code
  um: Add uaccess.h to ldt.c
  um: Add uaccess.h to syscalls_64.c
  um: Add asm/elf.h to vma.c
  um: Cleanup mem_32/64.c headers
  um: Remove hppfs
  um: Move syscall() declaration into os.h
  um: kernel: ksyms: Export symbol syscall() for fixing modpost issue
  um/os-Linux: Use char[] for syscall_stub declarations
  um: Use char[] for linker script address declarations
  ...
2015-06-28 13:55:08 -07:00
Steve French aab1893d5f Add SMB3.11 mount option synonym for new dialect
Most people think of SMB 3.1.1 as SMB version 3.11 so add synonym
for "vers=3.1.1" of "vers=3.11" on mount.

Also make sure that unlike SMB3.0 and 3.02 we don't send
validate negotiate on mount (it is handled by negotiate contexts) -
add list of SMB3.11 specific functions (distinct from 3.0 dialect).

Signed-off-by: Steve French <steve.french@primarydata.com>w
2015-06-27 20:28:11 -07:00
Steve French 80bc83c360 add struct FILE_STANDARD_INFO
Signed-off-by: Gregor Beck <gbeck@sernet.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
2015-06-27 20:25:56 -07:00
Steve French f799d6234b Make dialect negotiation warning message easier to read
Dialect version and minor version are easier to read in hex

Signed-off-by: Steve French <steve.french@primarydata.com>
2015-06-27 20:28:49 -07:00
Steve French eed0e1753c Add defines and structs for smb3.1 dialect
Add new structures and defines for SMB3.11 negotiate, session setup and tcon

See MS-SMB2-diff.pdf section 2.2.3 for additional protocol documentation.

Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
2015-06-27 20:23:59 -07:00
Steve French 5f7fbf733c Allow parsing vers=3.11 on cifs mount
Parses and recognizes "vers=3.1.1" on cifs mount and allows sending
0x0311 as a new CIFS/SMB3 dialect. Subsequent patches will add
the new negotiate contexts and updated session setup

Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
2015-06-27 20:23:32 -07:00
Noel Power f291095f34 client MUST ignore EncryptionKeyLength if CAP_EXTENDED_SECURITY is set
[MS-SMB] 2.2.4.5.2.1 states:

"ChallengeLength (1 byte): When the CAP_EXTENDED_SECURITY bit is set,
 the server MUST set this value to zero and clients MUST ignore this
 value."

Signed-off-by: Noel Power <noel.power@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2015-06-27 20:26:00 -07:00
Linus Torvalds e22619a29f Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "The main change in this kernel is Casey's generalized LSM stacking
  work, which removes the hard-coding of Capabilities and Yama stacking,
  allowing multiple arbitrary "small" LSMs to be stacked with a default
  monolithic module (e.g.  SELinux, Smack, AppArmor).

  See
        https://lwn.net/Articles/636056/

  This will allow smaller, simpler LSMs to be incorporated into the
  mainline kernel and arbitrarily stacked by users.  Also, this is a
  useful cleanup of the LSM code in its own right"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
  tpm, tpm_crb: fix le64_to_cpu conversions in crb_acpi_add()
  vTPM: set virtual device before passing to ibmvtpm_reset_crq
  tpm_ibmvtpm: remove unneccessary message level.
  ima: update builtin policies
  ima: extend "mask" policy matching support
  ima: add support for new "euid" policy condition
  ima: fix ima_show_template_data_ascii()
  Smack: freeing an error pointer in smk_write_revoke_subj()
  selinux: fix setting of security labels on NFS
  selinux: Remove unused permission definitions
  selinux: enable genfscon labeling for sysfs and pstore files
  selinux: enable per-file labeling for debugfs files.
  selinux: update netlink socket classes
  signals: don't abuse __flush_signals() in selinux_bprm_committed_creds()
  selinux: Print 'sclass' as string when unrecognized netlink message occurs
  Smack: allow multiple labels in onlycap
  Smack: fix seq operations in smackfs
  ima: pass iint to ima_add_violation()
  ima: wrap event related data to the new ima_event_data structure
  integrity: add validity checks for 'path' parameter
  ...
2015-06-27 13:26:03 -07:00
Geert Uytterhoeven ff5053f666 bdi: Remove "inline" keyword from exported I_BDEV() implementation
With gcc 3.4.6/4.1.2/4.2.4 (not with 4.4.7/4.6.4/4.8.4):

      CC      fs/block_dev.o
    include/linux/fs.h:804: warning: ‘I_BDEV’ declared inline after being called
    include/linux/fs.h:804: warning: previous declaration of ‘I_BDEV’ was here

Commit a212b105b0 ("bdi: make inode_to_bdi() inline") added a
caller of I_BDEV() in a header file, exposing the bogus "inline" on the
exported implementation.

Drop the "inline" keyword to fix this.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-27 11:44:12 -06:00
Linus Torvalds d2c3ac7e7e Merge branch 'for-4.2' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "A relatively quiet cycle, with a mix of cleanup and smaller bugfixes"

* 'for-4.2' of git://linux-nfs.org/~bfields/linux: (24 commits)
  sunrpc: use sg_init_one() in krb5_rc4_setup_enc/seq_key()
  nfsd: wrap too long lines in nfsd4_encode_read
  nfsd: fput rd_file from XDR encode context
  nfsd: take struct file setup fully into nfs4_preprocess_stateid_op
  nfsd: refactor nfs4_preprocess_stateid_op
  nfsd: clean up raparams handling
  nfsd: use swap() in sort_pacl_range()
  rpcrdma: Merge svcrdma and xprtrdma modules into one
  svcrdma: Add a separate "max data segs macro for svcrdma
  svcrdma: Replace GFP_KERNEL in a loop with GFP_NOFAIL
  svcrdma: Keep rpcrdma_msg fields in network byte-order
  svcrdma: Fix byte-swapping in svc_rdma_sendto.c
  nfsd: Update callback sequnce id only CB_SEQUENCE success
  nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying
  svcrdma: Remove svc_rdma_xdr_decode_deferred_req()
  SUNRPC: Move EXPORT_SYMBOL for svc_process
  uapi/nfs: Add NFSv4.1 ACL definitions
  nfsd: Remove dead declarations
  nfsd: work around a gcc-5.1 warning
  nfsd: Checking for acl support does not require fetching any acls
  ...
2015-06-27 10:14:39 -07:00
Linus Torvalds 546fac6073 GFS2: merge window
Here is a list of patches we've accumulated for GFS2 for the current upstream
 merge window. We have a good mixture this time. Here are some of the features:
 
 1. Fix a problem with RO mounts writing to the journal.
 2. Further improvements to quotas on GFS2.
 3. Added support for rename2 and RENAME_EXCHANGE on GFS2.
 4. Increase performance by making glock lru_list less of a bottleneck.
 5. Increase performance by avoiding unnecessary buffer_head releases.
 6. Increase performance by using average glock round trip time from all CPUs.
 7. Fixes for some compiler warnings and minor white space issues.
 8. Other misc. bug fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEbBAABAgAGBQJVjFEXAAoJENeLYdPf93o7zqoH926XV0oddQCsGCYg6gq7OL+c
 /4q2y9x31hkv5XbPTNcahqR6UsK8JcbcdZD+XAqTftL4Q789FDAdWbYS++45qz8D
 YuUFgZd2bc75Ge2qqacgEdv85YRLtws1fRnI6DUpjfN1qJ9kJXX+gk+BSve6rh6V
 Qyv8a4DRIw1En5fwt0R6yIg0LI/ywPhYeVlxo6WUoK8fiL/i3eNd57Jgv5YQ6ly+
 ZyqH8w1m0kE4IkIiTFgmIpvepiWLBCA3mPOfHfE3QxbDKpXe4uFsMdnxghmP5bFN
 3H9syJQNFqjs+ooKma/fE2VmpxQ5/5lAs0/ms+ECW3GvBseTll8Iln7y4NwgAQ==
 =ITwD
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-merge-window' of git://git.kernel.org:/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "Here are the patches we've accumulated for GFS2 for the current
  upstream merge window.  We have a good mixture this time.  Here are
  some of the features:

   - Fix a problem with RO mounts writing to the journal.

   - Further improvements to quotas on GFS2.

   - Added support for rename2 and RENAME_EXCHANGE on GFS2.

   - Increase performance by making glock lru_list less of a bottleneck.

   - Increase performance by avoiding unnecessary buffer_head releases.

   - Increase performance by using average glock round trip time from all CPUs.

   - Fixes for some compiler warnings and minor white space issues.

   - Other misc bug fixes"

* tag 'gfs2-merge-window' of git://git.kernel.org:/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: Don't brelse rgrp buffer_heads every allocation
  GFS2: Don't add all glocks to the lru
  gfs2: Don't support fallocate on jdata	files
  gfs2: s64 cast for negative quota value
  gfs2: limit quota log messages
  gfs2: fix quota updates on block boundaries
  gfs2: fix shadow warning in gfs2_rbm_find()
  gfs2: kerneldoc warning fixes
  gfs2: convert simple_str to kstr
  GFS2: make sure S_NOSEC flag isn't overwritten
  GFS2: add support for rename2 and RENAME_EXCHANGE
  gfs2: handle NULL rgd in set_rgrp_preferences
  GFS2: inode.c: indent with TABs, not spaces
  GFS2: mark the journal idle to fix ro mounts
  GFS2: Average in only non-zero round-trip times for congestion stats
  GFS2: Use average srttb value in congestion calculations
2015-06-27 09:47:46 -07:00
Linus Torvalds ebeaa8ddb3 Revert "jbd2: speedup jbd2_journal_dirty_metadata()"
This reverts commit 2143c1965a.

This commit seems to be the cause of the following jbd2 assertion
failure:

   ------------[ cut here ]------------
   kernel BUG at fs/jbd2/transaction.c:1325!
   invalid opcode: 0000 [#1] SMP
   Modules linked in: bnep bluetooth fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 ...
   CPU: 7 PID: 5509 Comm: gcc Not tainted 4.1.0-10944-g2a298679b411 #1
   Hardware name:                  /DH87RL, BIOS RLH8710H.86A.0327.2014.0924.1645 09/24/2014
   task: ffff8803bf866040 ti: ffff880308528000 task.ti: ffff880308528000
   RIP: jbd2_journal_dirty_metadata+0x237/0x290
   Call Trace:
     __ext4_handle_dirty_metadata+0x43/0x1f0
     ext4_handle_dirty_dirent_node+0xde/0x160
     ? jbd2_journal_get_write_access+0x36/0x50
     ext4_delete_entry+0x112/0x160
     ? __ext4_journal_start_sb+0x52/0xb0
     ext4_unlink+0xfa/0x260
     vfs_unlink+0xec/0x190
     do_unlinkat+0x24a/0x270
     SyS_unlink+0x11/0x20
     entry_SYSCALL_64_fastpath+0x12/0x6a
   ---[ end trace ae033ebde8d080b4 ]---

which is not easily reproducible (I've seen it just once, and then Ted
was able to reproduce it once).  Revert it while Ted and Jan try to
figure out what is wrong.

Cc: Jan Kara <jack@suse.cz>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-27 09:41:50 -07:00
Trond Myklebust 6c5a0d8915 NFSv4.2: LAYOUTSTATS is optional to implement
Make it so, by checking the return value for NFS4ERR_MOTSUPP and
caching the information as a server capability.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-27 11:48:58 -04:00
Trond Myklebust da2e812751 NFSv4.2: Fix up a decoding error in layoutstats
According to the spec, the server is only returning the status,
which we decode in the op header.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-27 11:30:57 -04:00
Linus Torvalds bbe179f88d Merge branch 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - threadgroup_lock got reorganized so that its users can pick the
   actual locking mechanism to use.  Its only user - cgroups - is
   updated to use a percpu_rwsem instead of per-process rwsem.

   This makes things a bit lighter on hot paths and allows cgroups to
   perform and fail multi-task (a process) migrations atomically.
   Multi-task migrations are used in several places including the
   unified hierarchy.

 - Delegation rule and documentation added to unified hierarchy.  This
   will likely be the last interface update from the cgroup core side
   for unified hierarchy before lifting the devel mask.

 - Some groundwork for the pids controller which is scheduled to be
   merged in the coming devel cycle.

* 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: add delegation section to unified hierarchy documentation
  cgroup: require write perm on common ancestor when moving processes on the default hierarchy
  cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()
  kernfs: make kernfs_get_inode() public
  MAINTAINERS: add a cgroup core co-maintainer
  cgroup: fix uninitialised iterator in for_each_subsys_which
  cgroup: replace explicit ss_mask checking with for_each_subsys_which
  cgroup: use bitmask to filter for_each_subsys
  cgroup: add seq_file forward declaration for struct cftype
  cgroup: simplify threadgroup locking
  sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
  sched, cgroup: reorganize threadgroup locking
  cgroup: switch to unsigned long for bitmasks
  cgroup: reorganize include/linux/cgroup.h
  cgroup: separate out include/linux/cgroup-defs.h
  cgroup: fix some comment typos
2015-06-26 19:50:04 -07:00
Linus Torvalds 8d7804a2f0 Driver core patches for 4.2-rc1
Here is the driver core / firmware changes for 4.2-rc1.
 
 A number of small changes all over the place in the driver core, and in
 the firmware subsystem.  Nothing really major, full details in the
 shortlog.  Some of it is a bit of churn, given that the platform driver
 probing changes was found to not work well, so they were reverted.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlWNoCQACgkQMUfUDdst+ym4JACdFrrXoMt2pb8nl5gMidGyM9/D
 jg8AnRgdW8ArDA/xOarULd/X43eA3J3C
 =Al2B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the driver core / firmware changes for 4.2-rc1.

  A number of small changes all over the place in the driver core, and
  in the firmware subsystem.  Nothing really major, full details in the
  shortlog.  Some of it is a bit of churn, given that the platform
  driver probing changes was found to not work well, so they were
  reverted.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
  Revert "base/platform: Only insert MEM and IO resources"
  Revert "base/platform: Continue on insert_resource() error"
  Revert "of/platform: Use platform_device interface"
  Revert "base/platform: Remove code duplication"
  firmware: add missing kfree for work on async call
  fs: sysfs: don't pass count == 0 to bin file readers
  base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
  base/platform: Remove code duplication
  of/platform: Use platform_device interface
  base/platform: Continue on insert_resource() error
  base/platform: Only insert MEM and IO resources
  firmware: use const for remaining firmware names
  firmware: fix possible use after free on name on asynchronous request
  firmware: check for file truncation on direct firmware loading
  firmware: fix __getname() missing failure check
  drivers: of/base: move of_init to driver_init
  drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
  sysfs: disambiguate between "error code" and "failure" in comments
  driver-core: fix build for !CONFIG_MODULES
  driver-core: make __device_attach() static
  ...
2015-06-26 15:07:37 -07:00
Trond Myklebust d620876990 pNFS/flexfiles: Fix the reset of struct pgio_header when resending
hdr->good_bytes needs to be set to the length of the request, not
zero.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-26 15:39:50 -04:00
Trond Myklebust c0f5f5059f pNFS/flexfiles: Turn off layoutcommit for servers that don't need it
This patch ensures that we record the value of 'ffl_flags' from
the layout, and then checks for the presence of the
FF_FLAGS_NO_LAYOUTCOMMIT flag before deciding whether or not to
call pnfs_set_layoutcommit().

The effect is that servers now can decide whether or not they want
the client to call layoutcommit before returning a writeable layout.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-26 14:51:32 -04:00
Trond Myklebust 122d328d30 Merge branch 'layoutstats'
* layoutstats:
  pnfs/flexfiles: protect ktime manipulation with mirror lock
  nfs: provide pnfs_report_layoutstat when NFS42 is disabled
  pnfs/flexfiles: report layoutstat regularly
  nfs42: serialize LAYOUTSTATS calls of the same file
  pnfs/flexfiles: encode LAYOUTSTATS flexfiles specific data
  pnfs/flexfiles: add ff_layout_prepare_layoutstats
  pNFS/flexfiles: track when layout is first used
  pNFS/flexfiles: add layoutstats tracking
  pNFS/flexfiles: Remove unused struct members user_name, group_name
  pnfs: add pnfs_report_layoutstat helper function
  pNFS: fill in nfs42_layoutstat_ops
  NFSv.2/pnfs Add a LAYOUTSTATS rpc function
2015-06-26 14:01:59 -04:00
Peng Tao 9bbd9bb40c pnfs/flexfiles: protect ktime manipulation with mirror lock
It looks as if xchg() and cmpxchg() are not available for 64-bit integers on sparc32:

> New breakage seen in linux-next today:
>
> ERROR: "__xchg_called_with_bad_pointer" [fs/nfs/flexfilelayout/nfs_layout_flexfiles.ko] undefined!
> ERROR: "__cmpxchg_called_with_bad_pointer" [fs/nfs/flexfilelayout/nfs_layout_flexfiles.ko] undefined!
> make[2]: *** [__modpost] Error 1
> make[1]: *** [modules] Error 2

Given that mirror ktime manipulation is already under mirror->lock, let's make use of the fact.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-26 14:01:37 -04:00
Peng Tao 865a7ecb21 nfs: provide pnfs_report_layoutstat when NFS42 is disabled
kbuild test robot reported:
   fs/built-in.o: In function `pnfs_report_layoutstat':
>> (.text+0x151a1c): undefined reference to `nfs42_proc_layoutstats_generic'

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-26 14:01:37 -04:00
Linus Torvalds 47a469421d Merge branch 'akpm' (patches from Andrew)
Merge second patchbomb from Andrew Morton:

 - most of the rest of MM

 - lots of misc things

 - procfs updates

 - printk feature work

 - updates to get_maintainer, MAINTAINERS, checkpatch

 - lib/ updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits)
  exit,stats: /* obey this comment */
  coredump: add __printf attribute to cn_*printf functions
  coredump: use from_kuid/kgid when formatting corename
  fs/reiserfs: remove unneeded cast
  NILFS2: support NFSv2 export
  fs/befs/btree.c: remove unneeded initializations
  fs/minix: remove unneeded cast
  init/do_mounts.c: add create_dev() failure log
  kasan: remove duplicate definition of the macro KASAN_FREE_PAGE
  fs/efs: femove unneeded cast
  checkpatch: emit "NOTE: <types>" message only once after multiple files
  checkpatch: emit an error when there's a diff in a changelog
  checkpatch: validate MODULE_LICENSE content
  checkpatch: add multi-line handling for PREFER_ETHER_ADDR_COPY
  checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()
  checkpatch: fix processing of MEMSET issues
  checkpatch: suggest using ether_addr_equal*()
  checkpatch: avoid NOT_UNIFIED_DIFF errors on cover-letter.patch files
  checkpatch: remove local from codespell path
  checkpatch: add --showfile to allow input via pipe to show filenames
  ...
2015-06-26 09:52:05 -07:00
Vishal Verma f68eb1e71a fs/block_dev.c: skip rw_page if bdev has integrity
If a block device has bio integrity enabled, rw_page will bypass the
integrity payload, which is undesirable. Skip rw_page if this is the
case.

Currently brd and zram provide rw_page, and the proposed 'nd' drivers
will too.

Cc: Jens Axboe <axboe@fb.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Suggested-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-26 11:23:38 -04:00
Nicolas Iooss b4176b7c13 coredump: add __printf attribute to cn_*printf functions
This allows detecting improper format string at build time, like:

  fs/coredump.c:225:5: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'int' [-Wformat=]
       err = cn_printf(cn, "%ld", cprm->siginfo->si_signo);
       ^

As si_signo is always an int, the format should be %d here.

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:43 -07:00
Nicolas Iooss 5202efe544 coredump: use from_kuid/kgid when formatting corename
When adding __printf attribute to cn_printf, gcc reports some issues:

  fs/coredump.c:213:5: warning: format '%d' expects argument of type
  'int', but argument 3 has type 'kuid_t' [-Wformat=]
       err = cn_printf(cn, "%d", cred->uid);
       ^
  fs/coredump.c:217:5: warning: format '%d' expects argument of type
  'int', but argument 3 has type 'kgid_t' [-Wformat=]
       err = cn_printf(cn, "%d", cred->gid);
       ^

These warnings come from the fact that the value of uid/gid needs to be
extracted from the kuid_t/kgid_t structure before being used as an
integer.  More precisely, cred->uid and cred->gid need to be converted to
either user-namespace uid/gid or to init_user_ns uid/gid.

Use init_user_ns in order not to break existing ABI, and document this in
Documentation/sysctl/kernel.txt.

While at it, format uid and gid values with %u instead of %d because
uid_t/__kernel_uid32_t and gid_t/__kernel_gid32_t are unsigned int.

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:43 -07:00
Firo Yang 89bfae43c8 fs/reiserfs: remove unneeded cast
kmem_cache_alloc() returns void*.

Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:43 -07:00
NeilBrown f73c2f1f83 NILFS2: support NFSv2 export
The "fh_len" passed to ->fh_to_* is not guaranteed to be that same as that
returned by encode_fh - it may be larger.

With NFSv2, the filehandle is fixed length, so it may appear longer than
expected and be zero-padded.

So we must test that fh_len is at least some value, not exactly equal to
it.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:43 -07:00
Fabian Frederick 9f0f564abb fs/befs/btree.c: remove unneeded initializations
bh, od_sup and this_node are unconditionally initialized in
befs_bt_read_super() and befs_btree_find()

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:43 -07:00
Firo Yang 83b08bf7de fs/minix: remove unneeded cast
kmem_cache_alloc() returns void*.

Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:42 -07:00
Firo Yang 35a4c902f6 fs/efs: femove unneeded cast
kmem_cache_alloc() returns void*.

Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:42 -07:00
Rasmus Villemoes ec3904dc65 fs/ext4/super.c: use strreplace() in ext4_fill_super()
This makes a very large function a little smaller.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:41 -07:00
Rasmus Villemoes 81ae394bdc fs/jbd2/journal.c: use strreplace()
In one case, we eliminate a local variable; in the other a strlen()
call and some .text.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:40 -07:00
Iago López Galeiras 2e13ba54a2 fs, proc: introduce CONFIG_PROC_CHILDREN
Commit 818411616b ("fs, proc: introduce /proc/<pid>/task/<tid>/children
entry") introduced the children entry for checkpoint restore and the
file is only available on kernels configured with CONFIG_EXPERT and
CONFIG_CHECKPOINT_RESTORE.

This is available in most distributions (Fedora, Debian, Ubuntu, CoreOS)
because they usually enable CONFIG_EXPERT and CONFIG_CHECKPOINT_RESTORE.
But Arch does not enable CONFIG_EXPERT or CONFIG_CHECKPOINT_RESTORE.

However, the children proc file is useful outside of checkpoint restore.
I would like to use it in rkt.  The rkt process exec() another program
it does not control, and that other program will fork()+exec() a child
process.  I would like to find the pid of the child process from an
external tool without iterating in /proc over all processes to find
which one has a parent pid equal to rkt.

This commit introduces CONFIG_PROC_CHILDREN and makes
CONFIG_CHECKPOINT_RESTORE select it.  This allows enabling
/proc/<pid>/task/<tid>/children without needing to enable
CONFIG_CHECKPOINT_RESTORE and CONFIG_EXPERT.

Alban tested that /proc/<pid>/task/<tid>/children is present when the
kernel is configured with CONFIG_PROC_CHILDREN=y but without
CONFIG_CHECKPOINT_RESTORE

Signed-off-by: Iago López Galeiras <iago@endocode.com>
Tested-by: Alban Crequy <alban@endocode.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Djalal Harouni <djalal@endocode.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:37 -07:00
Alexey Dobriyan c2c0bb4462 proc: fix PAGE_SIZE limit of /proc/$PID/cmdline
/proc/$PID/cmdline truncates output at PAGE_SIZE. It is easy to see with

	$ cat /proc/self/cmdline $(seq 1037) 2>/dev/null

However, command line size was never limited to PAGE_SIZE but to 128 KB
and relatively recently limitation was removed altogether.

People noticed and ask questions:
http://stackoverflow.com/questions/199130/how-do-i-increase-the-proc-pid-cmdline-4096-byte-limit

seq file interface is not OK, because it kmalloc's for whole output and
open + read(, 1) + sleep will pin arbitrary amounts of kernel memory.  To
not do that, limit must be imposed which is incompatible with arbitrary
sized command lines.

I apologize for hairy code, but this it direct consequence of command line
layout in memory and hacks to support things like "init [3]".

The loops are "unrolled" otherwise it is either macros which hide control
flow or functions with 7-8 arguments with equal line count.

There should be real setproctitle(2) or something.

[akpm@linux-foundation.org: fix a billion min() warnings]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Tested-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:37 -07:00
Benjamin Coddington 18a6008972 nfs: verify open flags before allowing open
Commit 9597c13b forbade opens with O_APPEND|O_DIRECT for NFSv4:

    nfs: verify open flags before allowing an atomic open

    Currently, you can open a NFSv4 file with O_APPEND|O_DIRECT, but cannot
    fcntl(F_SETFL,...) with those flags. This flag combination is explicitly
    forbidden on NFSv3 opens, and it seems like it should also be on NFSv4.

However, you can still open a file with O_DIRECT|O_APPEND if there exists a
cached dentry for the file because nfs4_file_open() is used instead of
nfs_atomic_open() and the check is bypassed.  Add the check in
nfs4_file_open() as well.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-25 19:38:00 -04:00
Jeff Layton 0c8315dd56 nfs: always update creds in mirror, even when we have an already connected ds
A ds can be associated with more than one mirror, but we currently skip
setting a mirror's credentials if we find that it's already set up with
a connected client.

The upshot is that we can end up sending DS writes with MDS credentials
instead of properly setting them up. Fix nfs4_ff_layout_prepare_ds to
always verify that the mirror's credentials are set up, even when we
have a DS that's already connected.

Reported-by: Tom Haynes <thomas.haynes@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-25 19:35:21 -04:00
Jeff Layton a24221dca1 nfs: fix potential credential leak in ff_layout_update_mirror_cred
If we have two tasks racing to update a mirror's credentials, then they
can end up leaking one (or more) sets of credentials. The first task
will set mirror->cred and then the second task will just overwrite it.

Use a cmpxchg to ensure that the creds are only set once. If we get to
the point where we would set mirror->cred and find that they're already
set, then we just release the creds that were just found.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-25 19:34:40 -04:00
Linus Torvalds e4bc13adfd Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block
Pull cgroup writeback support from Jens Axboe:
 "This is the big pull request for adding cgroup writeback support.

  This code has been in development for a long time, and it has been
  simmering in for-next for a good chunk of this cycle too.  This is one
  of those problems that has been talked about for at least half a
  decade, finally there's a solution and code to go with it.

  Also see last weeks writeup on LWN:

        http://lwn.net/Articles/648292/"

* 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
  writeback, blkio: add documentation for cgroup writeback support
  vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
  writeback: do foreign inode detection iff cgroup writeback is enabled
  v9fs: fix error handling in v9fs_session_init()
  bdi: fix wrong error return value in cgwb_create()
  buffer: remove unusued 'ret' variable
  writeback: disassociate inodes from dying bdi_writebacks
  writeback: implement foreign cgroup inode bdi_writeback switching
  writeback: add lockdep annotation to inode_to_wb()
  writeback: use unlocked_inode_to_wb transaction in inode_congested()
  writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
  writeback: implement [locked_]inode_to_wb_and_lock_list()
  writeback: implement foreign cgroup inode detection
  writeback: make writeback_control track the inode being written back
  writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
  mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
  writeback: implement memcg writeback domain based throttling
  writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
  writeback: implement memcg wb_domain
  writeback: update wb_over_bg_thresh() to use wb_domain aware operations
  ...
2015-06-25 16:00:17 -07:00
Linus Torvalds bfffa1cc9d Merge branch 'for-4.2/core' of git://git.kernel.dk/linux-block
Pull core block IO update from Jens Axboe:
 "Nothing really major in here, mostly a collection of smaller
  optimizations and cleanups, mixed with various fixes.  In more detail,
  this contains:

   - Addition of policy specific data to blkcg for block cgroups.  From
     Arianna Avanzini.

   - Various cleanups around command types from Christoph.

   - Cleanup of the suspend block I/O path from Christoph.

   - Plugging updates from Shaohua and Jeff Moyer, for blk-mq.

   - Eliminating atomic inc/dec of both remaining IO count and reference
     count in a bio.  From me.

   - Fixes for SG gap and chunk size support for data-less (discards)
     IO, so we can merge these better.  From me.

   - Small restructuring of blk-mq shared tag support, freeing drivers
     from iterating hardware queues.  From Keith Busch.

   - A few cfq-iosched tweaks, from Tahsin Erdogan and me.  Makes the
     IOPS mode the default for non-rotational storage"

* 'for-4.2/core' of git://git.kernel.dk/linux-block: (35 commits)
  cfq-iosched: fix other locations where blkcg_to_cfqgd() can return NULL
  cfq-iosched: fix sysfs oops when attempting to read unconfigured weights
  cfq-iosched: move group scheduling functions under ifdef
  cfq-iosched: fix the setting of IOPS mode on SSDs
  blktrace: Add blktrace.c to BLOCK LAYER in MAINTAINERS file
  block, cgroup: implement policy-specific per-blkcg data
  block: Make CFQ default to IOPS mode on SSDs
  block: add blk_set_queue_dying() to blkdev.h
  blk-mq: Shared tag enhancements
  block: don't honor chunk sizes for data-less IO
  block: only honor SG gap prevention for merges that contain data
  block: fix returnvar.cocci warnings
  block, dm: don't copy bios for request clones
  block: remove management of bi_remaining when restoring original bi_end_io
  block: replace trylock with mutex_lock in blkdev_reread_part()
  block: export blkdev_reread_part() and __blkdev_reread_part()
  suspend: simplify block I/O handling
  block: collapse bio bit space
  block: remove unused BIO_RW_BLOCK and BIO_EOF flags
  block: remove BIO_EOPNOTSUPP
  ...
2015-06-25 14:29:53 -07:00
Linus Torvalds cc8a0a9439 This pull request includes the following UBI/UBIFS changes:
* Minor fixes for UBI and UBIFS
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJVi9DTAAoJEPmeIjfg4UaaOlAP/AtZqcxpCHlKpbZUz7z1MmCb
 MP9AbowP0CSXrMYSA7GrBROw+bfTM7aZrhWqhhoJxAyBbb9LGldjW9xZmUqUvS5e
 LZjAO8sWqxEbfVBqnpTgA2AiSN37I3+epYqgDFNlNCtxPYxkEXpXVoLUSdrAUKw8
 24EjPe0tq1JKLmjh3XJQZJnmCW+7xPKRUj7KiwPZzombIoNXFA9z4PCOLLlnDx1X
 TRasArrncarslQaHQerRoZUKC/hoIGZMipv94pNovgO8HAcZ/N19ef1Fc4iG5zvS
 XUMerUyKMTVfR6LFjueMjRpEMOmDb4AifWeDKY00xqY4RCnkIcAAqVRBGWQep6Y4
 PxCwttNktykWj/LygPCsQk6gd0IxdVGsFra4DpxhCCWXaUw2RJCRg0a4mQFOqojO
 tmCVQVYgd4Ob2DDnxmS7Z8IvjIWzsSkgatF+Dde2feeHUwEuM9NoudHg704+A30x
 3fNwIFhuREhOymxGpBaprfaKh7khF88CcrGjiEOvw/DK0QfwlVoUVLqjkNVaLZXC
 nrwNATdheq2gMV2Mdydy/uo5NpUxeblSrlnuknKQSTmH5m2/U9F9lJ/v3Sv7ShjB
 gWqyNIoxJBKTSbsGkFP5bzWmxwsQ4HlPTcAmJhoUDK7SLeXThhgRwB4o2zlriYfS
 D7ZVxRMY5JCNxomIzD0X
 =xU0H
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.2-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS updates from Richard Weinberger:
 "Minor fixes for UBI and UBIFS"

* tag 'upstream-4.2-rc1' of git://git.infradead.org/linux-ubifs:
  UBI: Remove unnecessary `\'
  UBI: Use static class and attribute groups
  UBI: add a helper function for updatting on-flash layout volumes
  UBI: Fastmap: Do not add vol if it already exists
  UBI: Init vol->reserved_pebs by assignment
  UBI: Fastmap: Rename variables to make them meaningful
  UBI: Fastmap: Remove unnecessary `\'
  UBI: Fastmap: Use max() to get the larger value
  ubifs: fix to check error code of register_shrinker
  UBI: block: Dynamically allocate minor numbers
2015-06-25 14:11:34 -07:00
Linus Torvalds d857da7b70 A very large number of cleanups and bug fixes --- in particular for
the ext4 encryption patches, which is a new feature added in the last
 merge window.  Also fix a number of long-standing xfstest failures.
 (Quota writes failing due to ENOSPC, a race between truncate and
 writepage in data=journalled mode that was causing generic/068 to
 fail, and other corner cases.)
 
 Also add support for FALLOC_FL_INSERT_RANGE, and improve jbd2
 performance eliminating locking when a buffer is modified more than
 once during a transaction (which is very common for allocation
 bitmaps, for example), in which case the state of the journalled
 buffer head doesn't need to change.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJVi3PeAAoJEPL5WVaVDYGj+I0H/jRPexvyvnGfxiqs1sxIlbSk
 cwewFJSsuKsy/pGYdmHvozWZyWGGORc89NrxoNwdbG+axvHbgUWt/3+vF+rzmaek
 vX4v9QvCEo4PfpRgzbnYJFhbxGMJtwci887sq1o/UoNXikFYT2kz8rpdf0++eO5W
 /GJNRA5ZUY0L0eeloUILAMrBr7KjtkI2oXwOZt5q68jh7B3n3XdNQXyEiQS/28aK
 QYcFrqA/e2Fiuk6l5OSGBCP38mySu+x0nBTLT5LFwwrUBnoZvGtdjM6Sj/yADDDn
 uP/Zpq56aLzkFRwwItrDaF26BIf2MhIH/WUYs65CraEGxjMaiPuzAudGA/iUVL8=
 =1BdR
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "A very large number of cleanups and bug fixes --- in particular for
  the ext4 encryption patches, which is a new feature added in the last
  merge window.  Also fix a number of long-standing xfstest failures.
  (Quota writes failing due to ENOSPC, a race between truncate and
  writepage in data=journalled mode that was causing generic/068 to
  fail, and other corner cases.)

  Also add support for FALLOC_FL_INSERT_RANGE, and improve jbd2
  performance eliminating locking when a buffer is modified more than
  once during a transaction (which is very common for allocation
  bitmaps, for example), in which case the state of the journalled
  buffer head doesn't need to change"

[ I renamed "ext4_follow_link()" to "ext4_encrypted_follow_link()" in
  the merge resolution, to make it clear that that function is _only_
  used for encrypted symlinks.  The function doesn't actually work for
  non-encrypted symlinks at all, and they use the generic helpers
                                         - Linus ]

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (52 commits)
  ext4: set lazytime on remount if MS_LAZYTIME is set by mount
  ext4: only call ext4_truncate when size <= isize
  ext4: make online defrag error reporting consistent
  ext4: minor cleanup of ext4_da_reserve_space()
  ext4: don't retry file block mapping on bigalloc fs with non-extent file
  ext4: prevent ext4_quota_write() from failing due to ENOSPC
  ext4: call sync_blockdev() before invalidate_bdev() in put_super()
  jbd2: speedup jbd2_journal_dirty_metadata()
  jbd2: get rid of open coded allocation retry loop
  ext4: improve warning directory handling messages
  jbd2: fix ocfs2 corrupt when updating journal superblock fails
  ext4: mballoc: avoid 20-argument function call
  ext4: wait for existing dio workers in ext4_alloc_file_blocks()
  ext4: recalculate journal credits as inode depth changes
  jbd2: use GFP_NOFS in jbd2_cleanup_journal_tail()
  ext4: use swap() in mext_page_double_lock()
  ext4: use swap() in memswap()
  ext4: fix race between truncate and __ext4_journalled_writepage()
  ext4 crypto: fail the mount if blocksize != pagesize
  ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate
  ...
2015-06-25 14:06:55 -07:00
Yan, Zheng e1966b4944 ceph: fix ceph_writepages_start()
Before a page get locked, someone else can write data to the page
and increase the i_size. So we should re-check the i_size after
pages are locked.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 18:30:53 +03:00
Yan, Zheng fdd4e15838 ceph: rework dcache readdir
Previously our dcache readdir code relies on that child dentries in
directory dentry's d_subdir list are sorted by dentry's offset in
descending order. When adding dentries to the dcache, if a dentry
already exists, our readdir code moves it to head of directory
dentry's d_subdir list. This design relies on dcache internals.
Al Viro suggests using ncpfs's approach: keeping array of pointers
to dentries in page cache of directory inode. the validity of those
pointers are presented by directory inode's complete and ordered
flags. When a dentry gets pruned, we clear directory inode's complete
flag in the d_prune() callback. Before moving a dentry to other
directory, we clear the ordered flag for both old and new directory.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:32 +03:00
Yan, Zheng 687265e5a8 ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL
GFP_NOFS memory allocation is required for page writeback path.
But there is no need to use GFP_NOFS in syscall path and readpage
path

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng f66fd9f095 ceph: pre-allocate data structure that tracks caps flushing
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng e548e9b93d ceph: re-send flushing caps (which are revoked) in reconnect stage
if flushing caps were revoked, we should re-send the cap flush in
client reconnect stage. This guarantees that MDS processes the cap
flush message before issuing the flushing caps to other client.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng a2971c8ccb ceph: send TID of the oldest pending caps flush to MDS
According to this information, MDS can trim its completed caps flush
list (which is used to detect duplicated cap flush).

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng 8310b08913 ceph: track pending caps flushing globally
So we know TID of the oldest pending caps flushing. Later patch will
send this information to MDS, so that MDS can trim its completed caps
flush list.

Tracking pending caps flushing globally also simplifies syncfs code.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng 553adfd941 ceph: track pending caps flushing accurately
Previously we do not trace accurate TID for flushing caps. when
MDS failovers, we have no choice but to re-send all flushing caps
with a new TID. This can cause problem because MDS can has already
flushed some caps and has issued the same caps to other client.
The re-sent cap flush has a new TID, which makes MDS unable to
detect if it has already processed the cap flush.

This patch adds code to track pending caps flushing accurately.
When re-sending cap flush is needed, we use its original flush
TID.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:30 +03:00
Yan, Zheng da819c8150 ceph: fix directory fsync
fsync() on directory should flush dirty caps and wait for any
uncommitted directory opertions to commit. But ceph_dir_fsync()
only waits for uncommitted directory opertions.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:30 +03:00
Yan, Zheng 89b52fe14d ceph: fix flushing caps
Current ceph_fsync() only flushes dirty caps and wait for them to be
flushed. It doesn't wait for caps that has already been flushing.
This patch makes ceph_fsync() wait for pending flushing caps too.
Besides, this patch also makes caps_are_flushed() peroperly handle
tid wrapping.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:30 +03:00
Yan, Zheng 41445999ae ceph: don't include used caps in cap_wanted
when copying files to cephfs, file data may stay in page cache after
corresponding file is closed. Cached data use Fc capability. If we
include Fc capability in cap_wanted, MDS will treat files with cached
data as open files, and journal them in an EOpen event when trimming
log segment.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:30 +03:00
Yan, Zheng 3e0708b990 ceph: ratelimit warn messages for MDS closes session
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:30 +03:00
Ilya Dryomov 5be7303477 ceph: simplify two mount_timeout sites
No need to bifurcate wait now that we've got ceph_timeout_jiffies().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:29 +03:00
Ilya Dryomov a319bf56a6 libceph: store timeouts in jiffies, verify user input
There are currently three libceph-level timeouts that the user can
specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive.  All of
these are in seconds and no checking is done on user input: negative
values are accepted, we multiply them all by HZ which may or may not
overflow, arbitrarily large jiffies then get added together, etc.

There is also a bug in the way mount_timeout=0 is handled.  It's
supposed to mean "infinite timeout", but that's not how wait.h APIs
treat it and so __ceph_open_session() for example will busy loop
without much chance of being interrupted if none of ceph-mons are
there.

Fix all this by verifying user input, storing timeouts capped by
msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
helper for all user-specified waits to handle infinite timeouts
correctly.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25 11:49:29 +03:00
Yan, Zheng e8a7b8b12b ceph: exclude setfilelock requests when calculating oldest tid
setfilelock requests can block for a long time, which can prevent
client from advancing its oldest tid.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:29 +03:00
Yan, Zheng 745a8e3bcc ceph: don't pre-allocate space for cap release messages
Previously we pre-allocate cap release messages for each caps. This
wastes lots of memory when there are large amount of caps. This patch
make the code not pre-allocate the cap release messages. Instead,
we add the corresponding ceph_cap struct to a list when releasing a
cap. Later when flush cap releases is needed, we allocate the cap
release messages dynamically.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:29 +03:00
Yan, Zheng affbc19a68 ceph: make sure syncfs flushes all cap snaps
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:29 +03:00
Yan, Zheng 622f3e250f ceph: don't trim auth cap when there are cap snaps
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Yan, Zheng 604d1b0245 ceph: take snap_rwsem when accessing snap realm's cached_context
When ceph inode's i_head_snapc is NULL, __ceph_mark_dirty_caps()
accesses snap realm's cached_context. So we need take read lock
of snap_rwsem.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Yan, Zheng 8605609049 ceph: avoid sending unnessesary FLUSHSNAP message
when a snap notification contains no new snapshot, we can avoid
sending FLUSHSNAP message to MDS. But we still need to create
cap_snap in some case because it's required by write path and
page writeback path

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Yan, Zheng 5dda377cf0 ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference
In most cases that snap context is needed, we are holding
reference of CEPH_CAP_FILE_WR. So we can set ceph inode's
i_head_snapc when getting the CEPH_CAP_FILE_WR reference,
and make codes get snap context from i_head_snapc. This makes
the code simpler.

Another benefit of this change is that we can handle snap
notification more elegantly. Especially when snap context
is updated while someone else is doing write. The old queue
cap_snap code may set cap_snap's context to ether the old
context or the new snap context, depending on if i_head_snapc
is set. The new queue capp_snap code always set cap_snap's
context to the old snap context.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Yan, Zheng 7b06a826e7 ceph: use empty snap context for uninline_data and get_pool_perm
Cached_context in ceph_snap_realm is directly accessed by
uninline_data() and get_pool_perm(). This is racy in theory.
both uninline_data() and get_pool_perm() do not modify existing
object, they only create new object. So we can pass the empty
snap context to them.  Unlike cached_context in ceph_snap_realm,
we do not need to protect the empty snap context.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Yan, Zheng 10183a6955 ceph: check OSD caps before read/write
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Yan, Zheng 144cba1493 libceph: allow setting osd_req_op's flags
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25 11:49:27 +03:00
Linus Torvalds aefbef10e3 Merge branch 'akpm' (patches from Andrew)
Merge first patchbomb from Andrew Morton:

 - a few misc things

 - ocfs2 udpates

 - kernel/watchdog.c feature work (took ages to get right)

 - most of MM.  A few tricky bits are held up and probably won't make 4.2.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (91 commits)
  mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc()
  mm, thp: respect MPOL_PREFERRED policy with non-local node
  tmpfs: truncate prealloc blocks past i_size
  mm/memory hotplug: print the last vmemmap region at the end of hot add memory
  mm/mmap.c: optimization of do_mmap_pgoff function
  mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan
  mm: kmemleak: avoid deadlock on the kmemleak object insertion error path
  mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup()
  mm: kmemleak: fix delete_object_*() race when called on the same memory block
  mm: kmemleak: allow safe memory scanning during kmemleak disabling
  memcg: convert mem_cgroup->under_oom from atomic_t to int
  memcg: remove unused mem_cgroup->oom_wakeups
  frontswap: allow multiple backends
  x86, mirror: x86 enabling - find mirrored memory ranges
  mm/memblock: allocate boot time data structures from mirrored memory
  mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute
  mm: do not ignore mapping_gfp_mask in page cache allocation paths
  mm/cma.c: fix typos in comments
  mm/oom_kill.c: print points as unsigned int
  mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages
  ...
2015-06-24 20:47:21 -07:00
Linus Torvalds 266da6f142 Miscellaneous pstore improvements
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJViyiCAAoJEKurIx+X31iB3acP/1kPYKClGZLoqH6wqHNM/djq
 ROsPm9PDt9g7WZ/yw5HpKuUzC+XhOg4odYpZIy+6PogW6BUCumxPCLVq/qbVSPhT
 Q7Pv0mlmjyJS+kj6FncWuJrJG0xQfKYE6OeYrnkyHfrHYmJunf1bS6K71CDGHJAa
 O2bQr4E67twuU8yQR8BZ+YlZu4NTzPYZ4JWmb9Wepm3seIM2GEJsNRZ7WJXH7BOv
 BHOPi8FDr8fMkA+2WitE853gYvcTYcuxlsDgRumtGzWDhRIUH8Q5yS9QLAFd0Rly
 BW7YOHYCY1L75RJxnVTWd04GNrepxe4LY1bbtx+mqI6FrdMw0dK0M5BKuohV+BT0
 tBC/anSHBOqua/aDA6m+8c+p8I7qp1wXNHtmm15lqKQg1YHvh0Rs7FlP3HBdLDxQ
 rmUQHTcVQGaf00GCTUgsEn80kW8FYYtOnh4FJcbSkLdU8/mkr1+1rE/3i8ob/AL9
 EsJgzqptT9/VBX3j6tZgk8tt6xstLMGVw/DmScjxeLqA2WoaINh5XRPgGCGWQW6T
 wa9nFGBuJFtJv9NfVlMEe8xekxyDa6xPIgoCheWBcV9NFRmfBrUmbG4yF127nqTG
 TXYBzFPSxdBn0FqGIaz+6RPbcGd7tZuz317sIYHTgHBWQHeoWG9TJGHeGt8L5uql
 eKMoHAvVRZIyBuZruUeB
 =248K
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull pstore updates from Tony Luck:
 "Miscellaneous pstore improvements"

* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  ramoops: make it possible to change mem_type param.
  pstore/ram: verify ramoops header before saving record
  fs/pstore: Optimization function ramoops_init_przs
  fs/pstore: update the backend parameter in pstore module
  pstore: do not use message compression without lock
2015-06-24 20:42:21 -07:00
Linus Torvalds cfcc0ad47f Merge tag 'for-f2fs-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "New features:
   - per-file encryption (e.g., ext4)
   - FALLOC_FL_ZERO_RANGE
   - FALLOC_FL_COLLAPSE_RANGE
   - RENAME_WHITEOUT

  Major enhancement/fixes:
   - recovery broken superblocks
   - enhance f2fs_trim_fs with a discard_map
   - fix a race condition on dentry block allocation
   - fix a deadlock during summary operation
   - fix a missing fiemap result

  .. and many minor bug fixes and clean-ups were done"

* tag 'for-f2fs-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (83 commits)
  f2fs: do not trim preallocated blocks when truncating after i_size
  f2fs crypto: add alloc_bounce_page
  f2fs crypto: fix to handle errors likewise ext4
  f2fs: drop the volatile_write flag only
  f2fs: skip committing valid superblock
  f2fs: setting discard option in parse_options()
  f2fs: fix to return exact trimmed size
  f2fs: support FALLOC_FL_INSERT_RANGE
  f2fs: hide common code in f2fs_replace_block
  f2fs: disable the discard option when device doesn't support
  f2fs crypto: remove alloc_page for bounce_page
  f2fs: fix a deadlock for summary page lock vs. sentry_lock
  f2fs crypto: clean up error handling in f2fs_fname_setup_filename
  f2fs crypto: avoid f2fs_inherit_context for symlink
  f2fs crypto: do not set encryption policy for non-directory by ioctl
  f2fs crypto: allow setting encryption policy once
  f2fs crypto: check context consistent for rename2
  f2fs: avoid duplicated code by reusing f2fs_read_end_io
  f2fs crypto: use per-inode tfm structure
  f2fs: recovering broken superblock during mount
  ...
2015-06-24 20:38:29 -07:00
Linus Torvalds a7296b49fb Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes and cleanups from Jan Kara:
 "The contains some small fixes and improvements in error handling for
  UDF.

  Bundled is also one ext3 coding style fix and a fix in quota
  documentation"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: fix udf_load_pvoldesc()
  udf: remove double err declaration in udf_file_write_iter()
  UDF: support NFSv2 export
  fs: ext3: super: fixed a space coding style issue
  quota: Update documentation
  udf: Return error from udf_find_entry()
  udf: Make udf_get_filename() return error instead of 0 length file name
  udf: bug on exotic flag in udf_get_filename()
  udf: improve error management in udf_CS0toNLS()
  udf: improve error management in udf_CS0toUTF8()
  udf: unicode: update function name in comments
  udf: remove unnecessary test in udf_build_ustr_exact()
  udf: Return -ENOMEM when allocation fails in udf_get_filename()
2015-06-24 20:07:10 -07:00
Michal Hocko 6afdb859b7 mm: do not ignore mapping_gfp_mask in page cache allocation paths
page_cache_read, do_generic_file_read, __generic_file_splice_read and
__ntfs_grab_cache_pages currently ignore mapping_gfp_mask when calling
add_to_page_cache_lru which might cause recursion into fs down in the
direct reclaim path if the mapping really relies on GFP_NOFS semantic.

This doesn't seem to be the case now because page_cache_read (page fault
path) doesn't seem to suffer from the reclaim recursion issues and
do_generic_file_read and __generic_file_splice_read also shouldn't be
called under fs locks which would deadlock in the reclaim path.  Anyway it
is better to obey mapping gfp mask and prevent from later breakage.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:44 -07:00
Zhang Zhen a67a31fa30 mm/hugetlb: reduce arch dependent code about hugetlb_prefault_arch_hook
Currently we have many duplicates in definitions of
hugetlb_prefault_arch_hook.  In all architectures this function is empty.

Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:41 -07:00
Chris Metcalf f51c0eaee3 procfs: treat parked tasks as sleeping for task state
Allowing watchdog threads to be parked means that we now have the
opportunity of actually seeing persistent parked threads in the output
of /proc/<pid>/stat and /proc/<pid>/status.  The existing code reported
such threads as "Running", which is kind-of true if you think of the
case where we park them as part of taking cpus offline.  But if we allow
parking them indefinitely, "Running" is pretty misleading, so we report
them as "Sleeping" instead.

We could simply report them with a new string, "Parked", but it feels
like it's a bit risky for userspace to see unexpected new values; the
output is already documented in Documentation/filesystems/proc.txt, and
it seems like a mistake to change that lightly.

The scheduler does report parked tasks with a "P" in debugging output
from sched_show_task() or dump_cpu_task(), but that's a different API.
Similarly, the trace_ctxwake_* routines report a "P" for parked tasks,
but again, different API.

This change seemed slightly cleaner than updating the task_state_array
to have additional rows.  TASK_DEAD should be subsumed by the exit_state
bits; TASK_WAKEKILL is just a modifier; and TASK_WAKING can very
reasonably be reported as "Running" (as it is now).  Only TASK_PARKED
shows up with unreasonable output here.

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Joseph Qi b519ea6d9a ocfs2: mark local functions as static
Some functions are only used locally, so mark them as static.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Fabian Frederick ab1ba02181 ocfs2: use swap() in ocfs2_double_lock()
Use kernel.h macro definition.

Thanks to Julia Lawall for Coccinelle scripting support.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Fabian Frederick a612543fd1 ocfs2: use swap() in swap_refcount_rec()
Use kernel.h macro definition.

Thanks to Julia Lawall for Coccinelle scripting support.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Fabian Frederick 2a28f98c49 ocfs2: use swap() in dx_leaf_sort_swap()
Use kernel.h macro definition.

Thanks to Julia Lawall for Coccinelle scripting support.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Joseph Qi ae1f081467 ocfs2: fix wrong check in ocfs2_direct_IO_get_blocks
contig_blocks gotten from ocfs2_extent_map_get_blocks cannot be compared
with clusters_to_alloc. So convert it to clusters first.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Weiwei Wang <wangww631@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Xue jiufei 74e364ad1b ocfs2: fix NULL pointer dereference in function ocfs2_abort_trigger()
ocfs2_abort_trigger() use bh->b_assoc_map to get sb.  But there's no
function to set bh->b_assoc_map in ocfs2, it will trigger NULL pointer
dereference while calling this function.  We can get sb from
bh->b_bdev->bd_super instead of b_assoc_map.

[akpm@linux-foundation.org: update comment, per Joseph]
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
alex chen fce56d841e ocfs2: o2net: should remove debugfs in o2net_init() out branch
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
WeiWei Wang fa5a0eb3b0 ocfs2: remove OCFS2_IOCB_SEM lock type in direct io
In ocfs2 direct read/write, OCFS2_IOCB_SEM lock type is used to protect
inode->i_alloc_sem rw semaphore lock in the earlier kernel version.
However, in the latest kernel, inode->i_alloc_sem rw semaphore lock is not
used at all, so OCFS2_IOCB_SEM lock type needs to be removed.

Signed-off-by: Weiwei Wang <wangww631@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Joseph Qi e272e7f0fb ocfs2: do not BUG if jbd2_journal_dirty_metadata fails
jbd2_journal_dirty_metadata may fail.  Currently it cannot take care of
non zero return value and just BUG in ocfs2_journal_dirty.  This patch is
aborting the handle and journal instead of BUG.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Xue jiufei 099768b0c6 ocfs2: remove BUG_ON(!empty_extent) in __ocfs2_rotate_tree_left()
ocfs2_rotate_tree_left() calls __ocfs2_rotate_tree_left() for left
rotation while non-rightmost path containing an empty extent in the leaf
block.  __ocfs2_rotate_tree_left() returns -EAGAIN if right subtree having an
empty extent and pass the empty_extent_path to caller.  The caller
ocfs2_rotate_tree_left() will restart rotation from the returned path.

It will trigger the BUG_ON(!ocfs2_is_empty_extent) when the et on disk
is as follows:

eb0 is the leaf block of path(say path_a) passed to
ocfs2_rotate_tree_left, which has an empty rec[0].

eb1 is the leaf block of path(say path_b) that just right to path_a, which
has no empty record.

eb2 is the leaf block of path(say path_c) that just right to path_b, which
has an empty rec[0].  And path_c is also the rightmost path.

Now we want to remove the empty rec[0] in eb0:

ocfs2_rotate_tree_left:
  -> call __ocfs2_rotate_tree_left with path_a as its input *path*
    -> call ocfs2_rotate_subtree_left with path_a as its input
       *left_path* and path_b as its input *right_path*. it will move
       rec[0] in eb1 to eb0, and rec[0] in eb0 is not empty now.
    -> continue to call ocfs2_rotate_subtree_left with path_b as its
       input *left_path* and path_c as its input *right_path*, and
       return -EAGAIN because eb2 has an empty rec[0]
  -> call __ocfs2_rotate_tree_left with path_c as it input, rotate all
     records in eb2 to left and return 0.
  -> call __ocfs2_rotate_tree_left with path_a as its input, and triggers
     the BUG_ON(!ocfs2_is_empty_extent) as the rec[0] in eb0 is not empty.

So the BUG_ON() should be removed and return 0 if rec[0] is no longer an
empty extent.

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Xue jiufei 9f99ad0861 ocfs2: return error when ocfs2_figure_merge_contig_type() fails
ocfs2_figure_merge_contig_type() still returns CONTIG_NONE when some error
occurs which will cause an unpredictable error.  So return a proper errno
when ocfs2_figure_merge_contig_type() fails.

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Joseph Qi 345dc681bd ocfs2/dlm: cleanup unused function __dlm_wait_on_lockres_flags_set
__dlm_wait_on_lockres_flags_set() is declared but not implemented and
used.  So remove it.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Daeseok Youn 2e17315242 ocfs2: use retval instead of status for checking error
The use of 'status' in __ocfs2_add_entry() can return wrong value.

Some functions' return value in __ocfs2_add_entry(), i.e
ocfs2_journal_access_di() is saved to 'status'.  But 'status' is not
used in 'bail' label for returning result of __ocfs2_add_entry().

So use retval instead of status.

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Joseph Qi cf1776a9e8 ocfs2: fix a tiny race when truncate dio orohaned entry
Once dio crashed it will leave an entry in orphan dir.  And orphan scan
will take care of the clean up.  There is a tiny race case that the same
entry will be truncated twice and then trigger the BUG in
ocfs2_del_inode_from_orphan.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Andrew Morton e327284abb ocfs2: remove __mlog_cpu_guess
raw_smp_processor_id() is the means of avoiding the runtime preemptibility
check.

[akpm@linux-foundation.org: fix printk warning]
Cc: Joe Perches <joe@perches.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Joe Perches 7c2bd2f930 ocfs2: reduce object size of mlog uses
Using a function for __mlog_printk instead of a macro reduces the object
size of built-in.o by about 190KB, or ~18% overall (x86-64 defconfig
with all ocfs2 options)

  $ size fs/ocfs2/built-in.o*
     text    data     bss     dec     hex filename
   870954  118471  134408 1123833  1125f9 fs/ocfs2/built-in.o,new
  1064081  118071  134408 1316560  1416d0 fs/ocfs2/built-in.o.old

Miscellanea:

 - Move the used-once __mlog_cpu_guess statement expression macro to the
   masklog.c file above the use in __mlog_printk function

 - Simplify the mlog macro moving the and/or logic and level code into
   __mlog_printk

[akpm@linux-foundation.org: export __mlog_printk() to other ocfs2 modules]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Fabian Frederick 5286d20c4e configfs: unexport/make static config_item_init()
config_item_init() is only used in item.c

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:39 -07:00
Pekka Enberg b0cbeee72f NTFS: use kvfree() in ntfs_free()
Use kvfree() instead of open-coding it.

Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:38 -07:00
Linus Torvalds e0456717e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add TX fast path in mac80211, from Johannes Berg.

 2) Add TSO/GRO support to ibmveth, from Thomas Falcon

 3) Move away from cached routes in ipv6, just like ipv4, from Martin
    KaFai Lau.

 4) Lots of new rhashtable tests, from Thomas Graf.

 5) Run ingress qdisc lockless, from Alexei Starovoitov.

 6) Allow servers to fetch TCP packet headers for SYN packets of new
    connections, for fingerprinting.  From Eric Dumazet.

 7) Add mode parameter to pktgen, for testing receive.  From Alexei
    Starovoitov.

 8) Cache access optimizations via simplifications of build_skb(), from
    Alexander Duyck.

 9) Move page frag allocator under mm/, also from Alexander.

10) Add xmit_more support to hv_netvsc, from KY Srinivasan.

11) Add a counter guard in case we try to perform endless reclassify
    loops in the packet scheduler.

12) Extern flow dissector to be programmable and use it in new "Flower"
    classifier.  From Jiri Pirko.

13) AF_PACKET fanout rollover fixes, performance improvements, and new
    statistics.  From Willem de Bruijn.

14) Add netdev driver for GENEVE tunnels, from John W Linville.

15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.

16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.

17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
    Borkmann.

18) Add tail call support to BPF, from Alexei Starovoitov.

19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.

20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.

21) Favor even port numbers for allocation to connect() requests, and
    odd port numbers for bind(0), in an effort to help avoid
    ip_local_port_range exhaustion.  From Eric Dumazet.

22) Add Cavium ThunderX driver, from Sunil Goutham.

23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
    from Alexei Starovoitov.

24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.

25) Double TCP Small Queues default to 256K to accomodate situations
    like the XEN driver and wireless aggregation.  From Wei Liu.

26) Add more entropy inputs to flow dissector, from Tom Herbert.

27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
    Jonassen.

28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.

29) Track and act upon link status of ipv4 route nexthops, from Andy
    Gospodarek.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
  bridge: vlan: flush the dynamically learned entries on port vlan delete
  bridge: multicast: add a comment to br_port_state_selection about blocking state
  net: inet_diag: export IPV6_V6ONLY sockopt
  stmmac: troubleshoot unexpected bits in des0 & des1
  net: ipv4 sysctl option to ignore routes when nexthop link is down
  net: track link-status of ipv4 nexthops
  net: switchdev: ignore unsupported bridge flags
  net: Cavium: Fix MAC address setting in shutdown state
  drivers: net: xgene: fix for ACPI support without ACPI
  ip: report the original address of ICMP messages
  net/mlx5e: Prefetch skb data on RX
  net/mlx5e: Pop cq outside mlx5e_get_cqe
  net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
  net/mlx5e: Remove extra spaces
  net/mlx5e: Avoid TX CQE generation if more xmit packets expected
  net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
  net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
  net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
  net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
  net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
  ...
2015-06-24 16:49:49 -07:00
Dan Carpenter 5a5003df98 btrfs: delayed-ref: double free in btrfs_add_delayed_tree_ref()
There is a cut and paste error so instead of freeing "head_ref", we free
"ref" twice.

Fixes: 3368d001ba ('btrfs: qgroup: Record possible quota-related extent for qgroup.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-24 12:28:03 -07:00
Peng Tao 97ba375b5d pnfs/flexfiles: report layoutstat regularly
As a simple scheme, report every minute if IO is still going on.

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:54:23 -04:00
Peng Tao 1bfe3b259f nfs42: serialize LAYOUTSTATS calls of the same file
There is no need to report concurrently.

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:53:11 -04:00
Peng Tao 27c4306443 pnfs/flexfiles: encode LAYOUTSTATS flexfiles specific data
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:53:11 -04:00
Peng Tao ad4dc53e64 pnfs/flexfiles: add ff_layout_prepare_layoutstats
It fills in the generic part of LAYOUTSTATS call. One thing to note
is that we don't really track if IO is continuous or not. So just fake
to use the completed bytes for it.

Still missing flexfiles specific part, which will be included in the next patch.

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:53:11 -04:00
Peng Tao d983803d38 pNFS/flexfiles: track when layout is first used
So that we can report cumulative time since the beginning
of statistics collection of the layout.

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:53:10 -04:00
Trond Myklebust abcb7bfc9f pNFS/flexfiles: add layoutstats tracking
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:53:10 -04:00
Trond Myklebust 27797d1bb3 pNFS/flexfiles: Remove unused struct members user_name, group_name
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:17:37 -04:00
Peng Tao 8733408d6e pnfs: add pnfs_report_layoutstat helper function
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:17:37 -04:00
Peng Tao 1b4a4bd82c pNFS: fill in nfs42_layoutstat_ops
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:17:37 -04:00
Trond Myklebust be3a5d2339 NFSv.2/pnfs Add a LAYOUTSTATS rpc function
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:17:37 -04:00
Linus Torvalds 54245ed870 MTD fixes for 4.2
JFFS2
 
  * fix a theoretical unbalanced locking issue; the lock handling was a bit
    unclean, but AFAICT, it didn't actually lead to real deadlocks
 
 NAND
 
  * brcmnand driver: new driver supporting NAND controller found originally on
    Broadcom STB SoCs (BCM7xxx), but now also found on BCM63xxx, iProc (e.g.,
    Cygnus, BCM5301x), BCM3xxx, and more
 
  * Begin factoring out BBT code so it can be shared between traditional
    (parallel) NAND drivers and upcoming SPI NAND drivers (WIP)
 
  * Add common DT-based init support, so nand_base can pick up some flash
    properties automatically, using established common NAND DT properties
 
  * mxc_nand: support 8-bit ECC
 
  * pxa3xx_nand:
    - fix build for ARM64
    - use a jiffies-based timeout
 
 SPI NOR
 
  * Add a few new IDs
 
  * Clear out some unnecessary entries
 
  * Make sure SECT_4K flags are correct for all (?) entries
 
 Core
 
  * Fix mtd->usecount race conditions (BUG_ON())
 
  * Switch to modern PM ops
 
 Other
 
  * CFI: save code space by de-inlining large functions
 
  * Clean up some partition parser selection code across several drivers
 
  * Various miscellaneous changes, mostly minor
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJViZdKAAoJEFySrpd9RFgtaqoQAINkYUdxa8mOlmXQSIhWz19K
 5FJqJN+/lgrYmIGI1SzVy/TTB/V4hWH+9h1snU98ToaHFACqbKKeo6FLs6GRd+WJ
 adR9H4PUjtAZOt8trKzzygJnqvRPDWnn6H+0urxFv7kpyPTEorifiiI6+o3AM995
 vh+YsLJNFYHucIzi4TVkgwssLYi8I9TOb35pDnW2uT/ikDi+4Uu+Ocq6u7SuMPXV
 Zch6DcmBhtTQ9ghBwF/dMMAhyyMLyY4q0+CEzmJGxNU+g2o1njOTUDBv+lLqxxMB
 fUXMAPXIT3ndhWjMmzklG7AjOorbg44aG2UlqJWXx00VYJKNf+pljpgdHOXKiuWo
 xYkqOYnEM8gZ4qYNInKbbv34Q2+EjCxg+aAWxh+yRCJrfnF3p5/QSMVL2P2JNrc3
 Kg8W7pW42xOeGxTYpydBtvpq+41qRtdWMgNp47PHvKF0mhpeSCCvRf/YfU4cNcfv
 WfQzBLy/d7RMVFyvACdvzerF/fh9YX3yxV0B0LstytCSZO13617F6HHpm1pYfP0v
 d9GpLpdiFktoG/AXpPzlSl992ALf0SQaedamuxApfvLVymDkG9xoO0P5p6SbOiJ0
 QnBvEMkswEf7ExPzr5SYufSE93wikAEsAfjsLOHo+FQQ57eaRgpCOZHHgfuwcTra
 xUr6u2Rq9iDenhVYDqJU
 =JN5N
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20150623' of git://git.infradead.org/linux-mtd

Pull MTD updates from Brian Norris:
 "JFFS2:
   - fix a theoretical unbalanced locking issue; the lock handling was a
     bit unclean, but AFAICT, it didn't actually lead to real deadlocks

  NAND:
   - brcmnand driver: new driver supporting NAND controller found
     originally on Broadcom STB SoCs (BCM7xxx), but now also found on
     BCM63xxx, iProc (e.g., Cygnus, BCM5301x), BCM3xxx, and more

   - begin factoring out BBT code so it can be shared between
     traditional (parallel) NAND drivers and upcoming SPI NAND drivers
     (WIP)

   - add common DT-based init support, so nand_base can pick up some
     flash properties automatically, using established common NAND DT
     properties

   - mxc_nand: support 8-bit ECC

   - pxa3xx_nand:
     * fix build for ARM64
     * use a jiffies-based timeout

  SPI NOR:
   - add a few new IDs

   - clear out some unnecessary entries

   - make sure SECT_4K flags are correct for all (?) entries

  Core:
   - fix mtd->usecount race conditions (BUG_ON())

   - switch to modern PM ops

  Other:
   - CFI: save code space by de-inlining large functions

   - clean up some partition parser selection code across several
     drivers

   - various miscellaneous changes, mostly minor"

* tag 'for-linus-20150623' of git://git.infradead.org/linux-mtd: (57 commits)
  mtd: docg3: Fix kasprintf() usage
  mtd: docg3: Don't leak docg3->bbt in error path
  mtd: nandsim: Fix kasprintf() usage
  mtd: cs553x_nand: Fix kasprintf() usage
  mtd: r852: Fix device_create_file() usage
  mtd: brcmnand: drop unnecessary initialization
  mtd: propagate error codes from add_mtd_device()
  mtd: diskonchip: remove two-phase partitioning / registration
  mtd: dc21285: use raw spinlock functions for nw_gpio_lock
  mtd: chips: fixup dependencies, to prevent build error
  mtd: cfi_cmdset_0002: Initialize datum before calling map_word_load_partial
  mtd: cfi: deinline large functions
  mtd: lantiq-flash: use default partition parsers
  mtd: plat_nand: use default partition probe
  mtd: nand: correct indentation within conditional
  mtd: remove incorrect file name
  mtd: blktrans: use better error code for unimplemented ioctl()
  mtd: maps: Spelling s/reseved/reserved/
  mtd: blktrans: change blktrans_getgeo return value
  mtd: mxc_nand: generate nand_ecclayout for 8 bit ECC
  ...
2015-06-23 17:38:39 -07:00
Al Viro dc3f4198ea make simple_positive() public
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:02:01 -04:00
Fabian Frederick 5d754ced15 ufs: use dir_pages instead of ufs_dir_pages()
dir_pages was declared in a lot of filesystems.
Use newly dir_pages() from pagemap.h

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:02:01 -04:00
Fabian Frederick b57c2cb9ea pagemap.h: move dir_pages() over there
That function was declared in a lot of filesystems to calculate
directory pages.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:02:00 -04:00
Al Viro e5e6e97fe0 remove the pointless include of lglock.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:02:00 -04:00
Rasmus Villemoes db6172c411 fs: cleanup slight list_entry abuse
list_entry is just a wrapper for container_of, but it is arguably
wrong (and slightly confusing) to use it when the pointed-to struct
member is not a struct list_head. Use container_of directly instead.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:01:59 -04:00
Al Viro 8ea3a7c0df Merge branch 'fscache-fixes' into for-next 2015-06-23 18:01:30 -04:00
Jan Kara a6de82cab1 xfs: Correctly lock inode when removing suid and file capabilities
Currently XFS calls file_remove_privs() without holding i_mutex. This is
wrong because that function can end up messing with file permissions and
file capabilities stored in xattrs for which we need i_mutex held.

Fix the problem by grabbing iolock exclusively when we will need to
change anything in permissions / xattrs.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:01:10 -04:00
Jan Kara 45f147a1bc fs: Call security_ops->inode_killpriv on truncate
Comment in include/linux/security.h says that ->inode_killpriv() should
be called when setuid bit is being removed and that similar security
labels (in fact this applies only to file capabilities) should be
removed at this time as well. However we don't call ->inode_killpriv()
when we remove suid bit on truncate.

We fix the problem by calling ->inode_need_killpriv() and subsequently
->inode_killpriv() on truncate the same way as we do it on file write.

After this patch there's only one user of should_remove_suid() - ocfs2 -
and indeed it's buggy because it doesn't call ->inode_killpriv() on
write. However fixing it is difficult because of special locking
constraints.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:01:09 -04:00
Jan Kara dbfae0cdcd fs: Provide function telling whether file_remove_privs() will do anything
Provide function telling whether file_remove_privs() will do anything.
Currently we only have should_remove_suid() and that does something
slightly different.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:01:09 -04:00
Jan Kara 5fa8e0a1c6 fs: Rename file_remove_suid() to file_remove_privs()
file_remove_suid() is a misnomer since it removes also file capabilities
stored in xattrs and sets S_NOSEC flag. Also should_remove_suid() tells
something else than whether file_remove_suid() call is necessary which
leads to bugs.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:01:08 -04:00
Jan Kara 2426f39100 fs: Fix S_NOSEC handling
file_remove_suid() could mistakenly set S_NOSEC inode bit when root was
modifying the file. As a result following writes to the file by ordinary
user would avoid clearing suid or sgid bits.

Fix the bug by checking actual mode bits before setting S_NOSEC.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:01:08 -04:00
Dan Carpenter c0c3a718e3 fs/posix_acl.c: make posix_acl_create() safer and cleaner
If posix_acl_create() returns an error code then "*acl" and "*default_acl"
can be uninitialized or point to freed memory.  This is a dangerous thing
to do.  For example, it causes a problem in ocfs2_reflink():

	fs/ocfs2/refcounttree.c:4327 ocfs2_reflink()
	error: potentially using uninitialized 'default_acl'.

I've re-written this so we set the pointers to NULL at the start.  I've
added a temporary "clone" variable to hold the value of "*acl" until end.
Setting them to NULL means means we don't need the "no_acl" label.  We may
as well remove the "apply_umask" stuff forward and remove that label as
well.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-06-23 18:01:07 -04:00
Al Viro 6b6dabc8dc nilfs2_direct_IO(): remove dead code
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:01:07 -04:00
Miklos Szeredi 2726d56620 vfs: add seq_file_path() helper
Turn
	seq_path(..., &file->f_path, ...);
into
	seq_file_path(..., file, ...);

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:01:07 -04:00
Miklos Szeredi 9bf39ab2ad vfs: add file_path() helper
Turn
	d_path(&file->f_path, ...);
into
	file_path(file, ...);

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:00:05 -04:00
Theodore Ts'o a2fd66d069 ext4: set lazytime on remount if MS_LAZYTIME is set by mount
Newer versions of mount parse the lazytime feature and pass it to the
mount system call via the flags field in the mount system call,
removing the lazytime string from the mount options list.  So we need
to check for the presence of MS_LAZYTIME and set it in sb->s_flags in
order for this flag to be set on a remount.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2015-06-23 11:03:54 -04:00
Chris Mason c40b7b064f Merge branch 'sysfs-fsdevices-4.2-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into anand 2015-06-23 05:34:39 -07:00
Linus Torvalds 43224b96af Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "A rather largish update for everything time and timer related:

   - Cache footprint optimizations for both hrtimers and timer wheel

   - Lower the NOHZ impact on systems which have NOHZ or timer migration
     disabled at runtime.

   - Optimize run time overhead of hrtimer interrupt by making the clock
     offset updates smarter

   - hrtimer cleanups and removal of restrictions to tackle some
     problems in sched/perf

   - Some more leap second tweaks

   - Another round of changes addressing the 2038 problem

   - First step to change the internals of clock event devices by
     introducing the necessary infrastructure

   - Allow constant folding for usecs/msecs_to_jiffies()

   - The usual pile of clockevent/clocksource driver updates

  The hrtimer changes contain updates to sched, perf and x86 as they
  depend on them plus changes all over the tree to cleanup API changes
  and redundant code, which got copied all over the place.  The y2038
  changes touch s390 to remove the last non 2038 safe code related to
  boot/persistant clock"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  clocksource: Increase dependencies of timer-stm32 to limit build wreckage
  timer: Minimize nohz off overhead
  timer: Reduce timer migration overhead if disabled
  timer: Stats: Simplify the flags handling
  timer: Replace timer base by a cpu index
  timer: Use hlist for the timer wheel hash buckets
  timer: Remove FIFO "guarantee"
  timers: Sanitize catchup_timer_jiffies() usage
  hrtimer: Allow hrtimer::function() to free the timer
  seqcount: Introduce raw_write_seqcount_barrier()
  seqcount: Rename write_seqcount_barrier()
  hrtimer: Fix hrtimer_is_queued() hole
  hrtimer: Remove HRTIMER_STATE_MIGRATE
  selftest: Timers: Avoid signal deadlock in leap-a-day
  timekeeping: Copy the shadow-timekeeper over the real timekeeper last
  clockevents: Check state instead of mode in suspend/resume path
  selftests: timers: Add leap-second timer edge testing to leap-a-day.c
  ntp: Do leapsecond adjustment in adjtimex read path
  time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
  ntp: Introduce and use SECS_PER_DAY macro instead of 86400
  ...
2015-06-22 18:57:44 -07:00
Dave Chinner de50e16ffa Merge branch 'xfs-misc-fixes-for-4.2-3' into for-next 2015-06-23 08:49:01 +10:00
Dave Chinner 3d238b7e0e Merge branch 'xfs-freelist-cleanup' into for-next 2015-06-23 08:48:43 +10:00
Brian Foster f66bf04269 xfs: don't truncate attribute extents if no extents exist
The xfs_attr3_root_inactive() call from xfs_attr_inactive() assumes that
attribute blocks exist to invalidate. It is possible to have an
attribute fork without extents, however. Consider the case where the
attribute fork is created towards the beginning of xfs_attr_set() but
some part of the subsequent attribute set fails.

If an inode in such a state hits xfs_attr_inactive(), it eventually
calls xfs_dabuf_map() and possibly xfs_bmapi_read(). The former emits a
filesystem corruption warning, returns an error that bubbles back up to
xfs_attr_inactive(), and leads to destruction of the in-core attribute
fork without an on-disk reset. If the inode happens to make it back
through xfs_inactive() in this state (e.g., via a concurrent bulkstat
that cycles the inode from the reclaim state and releases it), i_afp
might not exist when xfs_bmapi_read() is called and causes a NULL
dereference panic.

A '-p 2' fsstress run to ENOSPC on a relatively small fs (1GB)
reproduces these problems. The behavior is a regression caused by:

6dfe5a0 xfs: xfs_attr_inactive leaves inconsistent attr fork state behind

... which removed logic that avoided the attribute extent truncate when
no extents exist. Restore this logic to ensure the attribute fork is
destroyed and reset correctly if it exists without any allocated
extents.

cc: stable@vger.kernel.org # 3.12 to 4.0.x
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-23 08:47:20 +10:00
Linus Torvalds 1bf7067c6e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes are:

   - 'qspinlock' support, enabled on x86: queued spinlocks - these are
     now the spinlock variant used by x86 as they outperform ticket
     spinlocks in every category.  (Waiman Long)

   - 'pvqspinlock' support on x86: paravirtualized variant of queued
     spinlocks.  (Waiman Long, Peter Zijlstra)

   - 'qrwlock' support, enabled on x86: queued rwlocks.  Similar to
     queued spinlocks, they are now the variant used by x86:

       CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
       CONFIG_QUEUED_SPINLOCKS=y
       CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
       CONFIG_QUEUED_RWLOCKS=y

   - various lockdep fixlets

   - various locking primitives cleanups, further WRITE_ONCE()
     propagation"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  locking/lockdep: Remove hard coded array size dependency
  locking/qrwlock: Don't contend with readers when setting _QW_WAITING
  lockdep: Do not break user-visible string
  locking/arch: Rename set_mb() to smp_store_mb()
  locking/arch: Add WRITE_ONCE() to set_mb()
  rtmutex: Warn if trylock is called from hard/softirq context
  arch: Remove __ARCH_HAVE_CMPXCHG
  locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG
  locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS
  locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS
  locking/pvqspinlock: Replace xchg() by the more descriptive set_mb()
  locking/pvqspinlock, x86: Enable PV qspinlock for Xen
  locking/pvqspinlock, x86: Enable PV qspinlock for KVM
  locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching
  locking/pvqspinlock: Implement simple paravirt support for the qspinlock
  locking/qspinlock: Revert to test-and-set on hypervisors
  locking/qspinlock: Use a simple write to grab the lock
  locking/qspinlock: Optimize for smaller NR_CPUS
  locking/qspinlock: Extract out code snippets for the next patch
  locking/qspinlock: Add pending bit
  ...
2015-06-22 14:54:22 -07:00
Linus Torvalds 052b398a43 Merge branch 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "In this pile: pathname resolution rewrite.

   - recursion in link_path_walk() is gone.

   - nesting limits on symlinks are gone (the only limit remaining is
     that the total amount of symlinks is no more than 40, no matter how
     nested).

   - "fast" (inline) symlinks are handled without leaving rcuwalk mode.

   - stack footprint (independent of the nesting) is below kilobyte now,
     about on par with what it used to be with one level of nested
     symlinks and ~2.8 times lower than it used to be in the worst case.

   - struct nameidata is entirely private to fs/namei.c now (not even
     opaque pointers are being passed around).

   - ->follow_link() and ->put_link() calling conventions had been
     changed; all in-tree filesystems converted, out-of-tree should be
     able to follow reasonably easily.

     For out-of-tree conversions, see Documentation/filesystems/porting
     for details (and in-tree filesystems for examples of conversion).

  That has sat in -next since mid-May, seems to survive all testing
  without regressions and merges clean with v4.1"

* 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (131 commits)
  turn user_{path_at,path,lpath,path_dir}() into static inlines
  namei: move saved_nd pointer into struct nameidata
  inline user_path_create()
  inline user_path_parent()
  namei: trim do_last() arguments
  namei: stash dfd and name into nameidata
  namei: fold path_cleanup() into terminate_walk()
  namei: saner calling conventions for filename_parentat()
  namei: saner calling conventions for filename_create()
  namei: shift nameidata down into filename_parentat()
  namei: make filename_lookup() reject ERR_PTR() passed as name
  namei: shift nameidata inside filename_lookup()
  namei: move putname() call into filename_lookup()
  namei: pass the struct path to store the result down into path_lookupat()
  namei: uninline set_root{,_rcu}()
  namei: be careful with mountpoint crossings in follow_dotdot_rcu()
  Documentation: remove outdated information from automount-support.txt
  get rid of assorted nameidata-related debris
  lustre: kill unused helper
  lustre: kill unused macro (LOOKUP_CONTINUE)
  ...
2015-06-22 12:51:21 -07:00
Christoph Hellwig 68e8bb0334 nfsd: wrap too long lines in nfsd4_encode_read
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-22 14:15:05 -04:00
Christoph Hellwig 96bcad5064 nfsd: fput rd_file from XDR encode context
Remove the hack where we fput the read-specific file in generic code.
Instead we can do it in nfsd4_encode_read as that gets called for all
error cases as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-22 14:15:04 -04:00
Christoph Hellwig af90f707fa nfsd: take struct file setup fully into nfs4_preprocess_stateid_op
This patch changes nfs4_preprocess_stateid_op so it always returns
a valid struct file if it has been asked for that.  For that we
now allocate a temporary struct file for special stateids, and check
permissions if we got the file structure from the stateid.  This
ensures that all callers will get their handling of special stateids
right, and avoids code duplication.

There is a little wart in here because the read code needs to know
if we allocated a file structure so that it can copy around the
read-ahead parameters.  In the long run we should probably aim to
cache full file structures used with special stateids instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-22 14:15:03 -04:00
Trond Myklebust 1372a3130a Merge branch 'bugfixes'
* bugfixes:
  NFS: Ensure we set NFS_CONTEXT_RESEND_WRITES when requeuing writes
  pNFS: Fix a memory leak when attempted pnfs fails
  NFS: Ensure that we update the sequence id under the slot table lock
  nfs: Initialize cb_sequenceres information before validate_seqid()
  nfs: Only update callback sequnce id when CB_SEQUENCE success
  NFSv4: nfs4_handle_delegation_recall_error should ignore EAGAIN
2015-06-22 09:55:08 -04:00
Anand Jain f90fc54728 Btrfs: Check if kobject is initialized before put
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Tested-by: David Sterba <dsterba@suse.cz>
Signed-off-by: David Sterba <dsterba@suse.cz>
2015-06-22 14:43:31 +02:00
Miklos Szeredi cdb6727958 ovl: lookup whiteouts outside iterate_dir()
If jffs2 can deadlock on overlayfs readdir because it takes the same lock
on ->iterate() as in ->lookup().

Fix by moving whiteout checking outside iterate_dir().  Optimized by
collecting potential whiteouts (DT_CHR) in a temporary list and if
non-empty iterating throug these and checking for a 0/0 chardev.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fixes: 49c21e1cac ("ovl: check whiteout while reading directory")
Reported-by: Roman Yeryomin <leroi.lists@gmail.com>
2015-06-22 13:53:48 +02:00
Miklos Szeredi 7c03b5d45b ovl: allow distributed fs as lower layer
Allow filesystems with .d_revalidate as lower layer(s), but not as upper
layer.

For local filesystems the rule was that modifications on the layers
directly while being part of the overlay results in undefined behavior.

This can easily be extended to distributed filesystems: we assume the tree
used as lower layer is static, which means ->d_revalidate() should always
return "1".  If that is not the case, return -ESTALE, don't try to work
around the modification.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-06-22 13:53:48 +02:00
Miklos Szeredi a6f15d9a75 ovl: don't traverse automount points
NFS and other distributed filesystems may place automount points in the
tree.  Previoulsy overlayfs refused to mount such filesystems types (based
on the existence of the .d_automount callback), even if the actual export
didn't have any automount points.

It cannot be determined in advance whether the filesystem has automount
points or not.  The solution is to allow fs with .d_automount but refuse to
traverse any automount points encountered.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-06-22 13:53:48 +02:00
Josef Bacik 3da40c7b08 ext4: only call ext4_truncate when size <= isize
At LSF we decided that if we truncate up from isize we shouldn't trim
fallocated blocks that were fallocated with KEEP_SIZE and are past the
new i_size.  This patch fixes ext4 to do this.

[ Completely reworked patch so that i_disksize would actually get set
  when truncating up.  Also reworked the code for handling truncate so
  that it's easier to handle. -- tytso ]

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
2015-06-22 00:31:26 -04:00
Eric Whitney 04e22412f4 ext4: make online defrag error reporting consistent
Make the error reporting behavior resulting from the unsupported use
of online defrag on files with data journaling enabled consistent with
that implemented for bigalloc file systems. Difference found with
ext4/308.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2015-06-21 21:38:03 -04:00
Eric Whitney c27e43a10c ext4: minor cleanup of ext4_da_reserve_space()
Remove outdated comments and dead code from ext4_da_reserve_space.
Clean up its trace point, and relocate it to make it more useful.

While we're at it, fix a nearby conditional used to determine if
we have a non-bigalloc file system.  It doesn't match usage elsewhere
in the code, and misleadingly suggests that an s_cluster_ratio value
of 0 would be legal.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-06-21 21:37:05 -04:00
Darrick J. Wong 292db1bc6c ext4: don't retry file block mapping on bigalloc fs with non-extent file
ext4 isn't willing to map clusters to a non-extent file.  Don't signal
this with an out of space error, since the FS will retry the
allocation (which didn't fail) forever.  Instead, return EUCLEAN so
that the operation will fail immediately all the way back to userspace.

(The fix is either to run e2fsck -E bmap2extent, or to chattr +e the file.)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2015-06-21 21:10:51 -04:00
Dave Chinner 496817b4be xfs: clean up XFS_MIN_FREELIST macros
We no longer calculate the minimum freelist size from the on-disk
AGF, so we don't need the macros used for this. That means the
nested macros can be cleaned up, and turn this into an actual
function so the logic is clear and concise. This will make it much
easier to add support for the rmap btree when the time comes.

This also gets rid of the XFS_AG_MAXLEVELS macro used by these
freelist macros as it is simply a wrapper around a single variable.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-22 10:13:30 +10:00
Dave Chinner 396503fc83 xfs: sanitise error handling in xfs_alloc_fix_freelist
The error handling is currently an inconsistent mess as every error
condition handles return values and releasing buffers individually.
Clean this up by using gotos and a sane error label stack.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-22 10:13:19 +10:00
Dave Chinner 72d552854b xfs: factor out free space extent length check
The longest extent length checks in xfs_alloc_fix_freelist() are now
essentially identical. Factor them out into a helper function, so we
know they are checking exactly the same thing before and after we
lock the AGF.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-22 10:04:42 +10:00
Dave Chinner 50adbcb4c4 xfs: xfs_alloc_fix_freelist() can use incore perag structures
At the moment, xfs_alloc_fix_freelist() uses a mix of per-ag based
access and agf buffer  based access to freelist and space usage
information. However, once the AGF buffer is locked inside this
function, it is guaranteed that both the in-memory and on-disk
values are identical. xfs_alloc_fix_freelist() doesn't modify the
values in the structures directly, so it is a read-only user of the
infomration, and hence can use the per-ag structure exclusively for
determining what it should do.

This opens up an avenue for cleaning up a lot of duplicated logic
whose only difference is the structure it gets the data from, and in
doing so removes a lot of needless byte swapping overhead when
fixing up the free list.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-22 10:04:31 +10:00
Christoph Hellwig b2a922cd6c xfs: remove xfs_caddr_t
Just use char pointers directly instead of the confusing typedef to a
pointer type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-22 09:45:10 +10:00
Christoph Hellwig 5809d5e083 xfs: use void pointers in log validation helpers
Compared to char pointers this saves us a lot of casting effort.  Also
add another local variable to make the code easier to read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-22 09:44:47 +10:00
Christoph Hellwig 88ee2df7f2 xfs: return a void pointer from xfs_buf_offset
This avoids all kinds of unessecary casts in an envrionment like Linux where
we can assume that pointer arithmetics are support on void pointers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-22 09:44:29 +10:00
Christoph Hellwig fc51c2b5f8 xfs: remove inst_t
We can simply use a void pointer to pass a long return addresses in the
debugging helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-22 09:44:02 +10:00
Christoph Hellwig db9d67d6b0 xfs: remove __psint_t and __psunsigned_t
Replace uses of __psint_t with the proper uintptr_t and ptrdiff_t types,
and remove the defintions of __psint_t and __psunsigned_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-22 09:43:32 +10:00
Eric Sandeen 2ac56d3d4b xfs: fix remote symlinks on V5/CRC filesystems
If we create a CRC filesystem, mount it, and create a symlink with
a path long enough that it can't live in the inode, we get a very
strange result upon remount:

# ls -l mnt
total 4
lrwxrwxrwx. 1 root root 929 Jun 15 16:58 link -> XSLM

XSLM is the V5 symlink block header magic (which happens to be
followed by a NUL, so the string looks terminated).

xfs_readlink_bmap() advanced cur_chunk by the size of the header
for CRC filesystems, but never actually used that pointer; it
kept reading from bp->b_addr, which is the start of the block,
rather than the start of the symlink data after the header.

Looks like this problem goes back to v3.10.

Fixing this gets us reading the proper link target, again.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-22 09:42:48 +10:00