Commit Graph

48478 Commits

Author SHA1 Message Date
Goldwyn Rodrigues 4d339d0106 btrfs: No need to check !(flags & MS_RDONLY) twice
Code cleanup.
The code block is for !(*flags & MS_RDONLY). We don't need
to check it again.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:25 +02:00
Liu Bo 1a79c1f246 Btrfs: update comments in cache_save_setup
We also don't bother to flush free space cache while with free space
tree.

Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:25 +02:00
Liu Bo 539b50d2f6 Btrfs: convert BUG_ON to WARN_ON
These two BUG_ON()s would never be true, ensured by callers' logic.

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:25 +02:00
Liu Bo 2b19a1fef7 Btrfs: helper for ops that requires full stripe
This adds a helper to show directly whether ops require full stripe.

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:25 +02:00
Liu Bo 6fad823f49 Btrfs: do not add extra mirror when dev_replace target dev is not available
With this, we can avoid allocating memory for dev replace copies if the
target dev is not available.

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:25 +02:00
Liu Bo 73c0f22825 Btrfs: handle operations for device replace separately
Since this part is mostly independent, this moves it to a separate
function.

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Liu Bo 5ab56090b8 Btrfs: introduce a function to get extra mirror from replace
As the part of getting extra mirror in __btrfs_map_block is
independent, this puts it into a separate function.

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Liu Bo 0b3d4cd371 Btrfs: separate DISCARD from __btrfs_map_block
Since DISCARD is not as important as an operation like write, we don't
copy it to target device during replace, and it makes __btrfs_map_block
less complex.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Liu Bo 592d92eeab Btrfs: create a helper for getting chunk map
We have similar code here and there, this merges them into a helper.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Liu Bo 09ed2f165c Btrfs: add file item tracepoints
While debugging truncate problems, I found that these tracepoints could
help us quickly know what went wrong.

Two sets of tracepoints are created to track regular/prealloc file item
and inline file item respectively, I put inline as a separate one since
what inline file items cares about are way less than the regular one.

This adds four tracepoints:
- btrfs_get_extent_show_fi_regular
- btrfs_get_extent_show_fi_inline
- btrfs_truncate_show_fi_regular
- btrfs_truncate_show_fi_inline

Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ formatting adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Elena Reshetova dec95574f4 btrfs: convert btrfs_raid_bio.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Elena Reshetova 99f4cdb16f btrfs: convert scrub_ctx.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Elena Reshetova 78a764504d btrfs: convert scrub_parity.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Elena Reshetova 186debd6ed btrfs: convert scrub_block.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Elena Reshetova 6f615018b3 btrfs: convert scrub_recover.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Elena Reshetova a50299ae7c btrfs: convert compressed_bio.pending_bios from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Elena Reshetova b7ac31b7b2 btrfs: convert extent_state.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Elena Reshetova 0700cea7c8 btrfs: convert btrfs_root.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Elena Reshetova 089e77e10d btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Elena Reshetova 6de5f18e7b btrfs: convert btrfs_delayed_node.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Elena Reshetova 6df8cdf5bd btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Elena Reshetova 1e4f4714d5 btrfs: convert btrfs_caching_control.count from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Elena Reshetova e76edab7f0 btrfs: convert btrfs_ordered_extent.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Elena Reshetova 490b54d6fb btrfs: convert extent_map.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Elena Reshetova 9b64f57ddf btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Elena Reshetova 140475ae4a btrfs: convert btrfs_bio.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Liu Bo f95fda8751 Btrfs: remove ASSERT in btrfs_truncate_inode_items
After 76b42abbf7 ("Btrfs: fix data loss after truncate when using the
no-holes feature"),

For either NO_HOLES or inline extents, we've set last_size to newsize to
avoid data loss after remount or inode got evicted and read again, thus,
we don't need this check anymore.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Adam Borowski 1450612797 btrfs: fix a bogus warning when converting only data or metadata
If your filesystem has, eg, data:raid0 metadata:raid1, and you run "btrfs
balance -dconvert=raid1", the meta.target field will be uninitialized.
That's otherwise ok, as it's unused except for this warning.

Thus, let's use the existing set of raid levels for the comparison.

As a side effect, non-convert balances will now nag about data>metadata.

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Martin Brandenburg 1ec1688c53 orangefs: free superblock when mount fails
Otherwise lockdep says:

[ 1337.483798] ================================================
[ 1337.483999] [ BUG: lock held when returning to user space! ]
[ 1337.484252] 4.11.0-rc6 #19 Not tainted
[ 1337.484423] ------------------------------------------------
[ 1337.484626] mount/14766 is leaving the kernel with locks still held!
[ 1337.484841] 1 lock held by mount/14766:
[ 1337.485017]  #0:  (&type->s_umount_key#33/1){+.+.+.}, at: [<ffffffff8124171f>] sget_userns+0x2af/0x520

Caught by xfstests generic/413 which tried to mount with the unsupported
mount option dax.  Then xfstests generic/422 ran sync which deadlocks.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Acked-by: Mike Marshall <hubcap@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-15 09:39:31 -07:00
Linus Torvalds c0eb027e5a vfs: don't do RCU lookup of empty pathnames
Normal pathname lookup doesn't allow empty pathnames, but using
AT_EMPTY_PATH (with name_to_handle_at() or fstatat(), for example) you
can trigger an empty pathname lookup.

And not only is the RCU lookup in that case entirely unnecessary
(because we'll obviously immediately finalize the end result), it is
actively wrong.

Why? An empth path is a special case that will return the original
'dirfd' dentry - and that dentry may not actually be RCU-free'd,
resulting in a potential use-after-free if we were to initialize the
path lazily under the RCU read lock and depend on complete_walk()
finalizing the dentry.

Found by syzkaller and KASAN.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-15 09:34:52 -07:00
Linus Torvalds 4b31ac485d Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "Dave Sterba collected a few more fixes for the last rc.

  These aren't marked for stable, but I'm putting them in with a batch
  were testing/sending by hand for this release"

* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix potential use-after-free for cloned bio
  Btrfs: fix segmentation fault when doing dio read
  Btrfs: fix invalid dereference in btrfs_retry_endio
  btrfs: drop the nossd flag when remounting with -o ssd
2017-04-14 16:53:45 -07:00
Linus Torvalds 5466f4dfce Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull more CIFS fixes from Steve French:
 "As promised, here is the remaining set of cifs/smb3 fixes for stable
  (and a fix for one regression) now that they have had additional
  review and testing"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix SMB3 mount without specifying a security mechanism
  CIFS: store results of cifs_reopen_file to avoid infinite wait
  CIFS: remove bad_network_name flag
  CIFS: reconnect thread reschedule itself
  CIFS: handle guest access errors to Windows shares
  CIFS: Fix null pointer deref during read resp processing
2017-04-14 16:51:29 -07:00
Mike Kravetz 045c7a3f53 hugetlbfs: fix offset overflow in hugetlbfs mmap
If mmap() maps a file, it can be passed an offset into the file at which
the mapping is to start.  Offset could be a negative value when
represented as a loff_t.  The offset plus length will be used to update
the file size (i_size) which is also a loff_t.

Validate the value of offset and offset + length to make sure they do
not overflow and appear as negative.

Found by syzcaller with commit ff8c0c53c4 ("mm/hugetlb.c: don't call
region_abort if region_chg fails") applied.  Prior to this commit, the
overflow would still occur but we would luckily return ENOMEM.

To reproduce:

   mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL);

Resulted in,

  kernel BUG at mm/hugetlb.c:742!
  Call Trace:
   hugetlbfs_evict_inode+0x80/0xa0
   evict+0x24a/0x620
   iput+0x48f/0x8c0
   dentry_unlink_inode+0x31f/0x4d0
   __dentry_kill+0x292/0x5e0
   dput+0x730/0x830
   __fput+0x438/0x720
   ____fput+0x1a/0x20
   task_work_run+0xfe/0x180
   exit_to_usermode_loop+0x133/0x150
   syscall_return_slowpath+0x184/0x1c0
   entry_SYSCALL_64_fastpath+0xab/0xad

Fixes: ff8c0c53c4 ("mm/hugetlb.c: don't call region_abort if region_chg fails")
Link: http://lkml.kernel.org/r/1491951118-30678-1-git-send-email-mike.kravetz@oracle.com
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13 18:24:21 -07:00
Kirill A. Shutemov 5b7abeae3a thp: fix MADV_DONTNEED vs clear soft dirty race
Yet another instance of the same race.

Fix is identical to change_huge_pmd().

See "thp: fix MADV_DONTNEED vs.  numa balancing race" for more details.

Link: http://lkml.kernel.org/r/20170302151034.27829-5-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13 18:24:21 -07:00
Pavel Shilovsky 67dbea2ce6 CIFS: Fix SMB3 mount without specifying a security mechanism
Commit ef65aaede2 ("smb2: Enforce sec= mount option") changed the
behavior of a mount command to enforce a specified security mechanism
during mounting. On another hand according to the spec if SMB3 server
doesn't respond with a security context it implies that it supports
NTLMSSP. The current code doesn't keep it in mind and fails a mount
for such servers if no security mechanism is specified. Fix this by
indicating that a server supports NTLMSSP if a security context isn't
returned during negotiate phase. This allows the code to use NTLMSSP
by default for SMB3 mounts.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-13 10:03:26 -05:00
Liu Bo a967efb30b Btrfs: fix potential use-after-free for cloned bio
KASAN reports that there is a use-after-free case of bio in btrfs_map_bio.

If we need to submit IOs to several disks at a time, the original bio
would get cloned and mapped to the destination disk, but we really should
use the original bio instead of a cloned bio to do the sanity check
because cloned bios are likely to be freed by its endio.

Reported-by: Diego <diegocg@gmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-11 18:49:56 +02:00
Liu Bo 97bf5a5589 Btrfs: fix segmentation fault when doing dio read
Commit 2dabb32484 ("Btrfs: Direct I/O read: Work on sectorsized blocks")
introduced this bug during iterating bio pages in dio read's endio hook,
and it could end up with segment fault of the dio reading task.

So the reason is 'if (nr_sectors--)', and it makes the code assume that
there is one more block in the same page, so page offset is increased and
the bio which is created to repair the bad block then has an incorrect
bvec.bv_offset, and a later access of the page content would throw a
segmentation fault.

This also adds ASSERT to check page offset against page size.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-11 18:49:29 +02:00
Liu Bo 2e949b0a55 Btrfs: fix invalid dereference in btrfs_retry_endio
When doing directIO repair, we have this oops:

[ 1458.532816] general protection fault: 0000 [#1] SMP
...
[ 1458.536291] Workqueue: btrfs-endio-repair btrfs_endio_repair_helper [btrfs]
[ 1458.536893] task: ffff88082a42d100 task.stack: ffffc90002b3c000
[ 1458.537499] RIP: 0010:btrfs_retry_endio+0x7e/0x1a0 [btrfs]
...
[ 1458.543261] Call Trace:
[ 1458.543958]  ? rcu_read_lock_sched_held+0xc4/0xd0
[ 1458.544374]  bio_endio+0xed/0x100
[ 1458.544750]  end_workqueue_fn+0x3c/0x40 [btrfs]
[ 1458.545257]  normal_work_helper+0x9f/0x900 [btrfs]
[ 1458.545762]  btrfs_endio_repair_helper+0x12/0x20 [btrfs]
[ 1458.546224]  process_one_work+0x34d/0xb70
[ 1458.546570]  ? process_one_work+0x29e/0xb70
[ 1458.546938]  worker_thread+0x1cf/0x960
[ 1458.547263]  ? process_one_work+0xb70/0xb70
[ 1458.547624]  kthread+0x17d/0x180
[ 1458.547909]  ? kthread_create_on_node+0x70/0x70
[ 1458.548300]  ret_from_fork+0x31/0x40

It turns out that btrfs_retry_endio is trying to get inode from a directIO
page.

This fixes the problem by using the saved inode pointer, done->inode.
btrfs_retry_endio_nocsum has the same problem, and it's fixed as well.

Also cleanup unused @start (which is too trivial for a separate patch).

Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-11 18:49:08 +02:00
Adam Borowski 951e796639 btrfs: drop the nossd flag when remounting with -o ssd
The opposite case was already handled right in the very next switch entry.
And also when turning on nossd, drop ssd_spread.

Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-11 18:48:59 +02:00
Germano Percossi 1fa839b498 CIFS: store results of cifs_reopen_file to avoid infinite wait
This fixes Continuous Availability when errors during
file reopen are encountered.

cifs_user_readv and cifs_user_writev would wait for ever if
results of cifs_reopen_file are not stored and for later inspection.

In fact, results are checked and, in case of errors, a chain
of function calls leading to reads and writes to be scheduled in
a separate thread is skipped.
These threads will wake up the corresponding waiters once reads
and writes are done.

However, given the return value is not stored, when rc is checked
for errors a previous one (always zero) is inspected instead.
This leads to pending reads/writes added to the list, making
cifs_user_readv and cifs_user_writev wait for ever.

Signed-off-by: Germano Percossi <germano.percossi@citrix.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-10 23:36:39 -05:00
Germano Percossi a0918f1ce6 CIFS: remove bad_network_name flag
STATUS_BAD_NETWORK_NAME can be received during node failover,
causing the flag to be set and making the reconnect thread
always unsuccessful, thereafter.

Once the only place where it is set is removed, the remaining
bits are rendered moot.

Removing it does not prevent "mount" from failing when a non
existent share is passed.

What happens when the share really ceases to exist while the
share is mounted is undefined now as much as it was before.

Signed-off-by: Germano Percossi <germano.percossi@citrix.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-10 23:36:39 -05:00
Germano Percossi 18ea43113f CIFS: reconnect thread reschedule itself
In case of error, smb2_reconnect_server reschedule itself
with a delay, to avoid being too aggressive.

Signed-off-by: Germano Percossi <germano.percossi@citrix.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-10 23:36:39 -05:00
Mark Syms 40920c2bb1 CIFS: handle guest access errors to Windows shares
Commit 1a967d6c9b ("correctly to
anonymous authentication for the NTLM(v2) authentication") introduces
a regression in handling errors related to attempting a guest
connection to a Windows share which requires authentication. This
should result in a permission denied error but actually causes the
kernel module to enter a never-ending loop trying to follow a DFS
referal which doesn't exist.

The base cause of this is the failure now occurs later in the process
during tree connect and not at the session setup setup and all errors
in tree connect are interpreted as needing to follow the DFS paths
which isn't in this case correct. So, check the returned error against
EACCES and fail if this is returned error.

Feedback from Aurelien:

  PS> net user guest /activate:no
    PS> mkdir C:\guestshare
      PS> icacls C:\guestshare /grant 'Everyone:(OI)(CI)F'
        PS> new-smbshare -name guestshare -path C:\guestshare -fullaccess Everyone

        I've tested v3.10, v4.4, master, master+your patch using default options
        (empty or no user "NU") and user=abc (U).

        NT_LOGON_FAILURE in session setup: LF
        This is what you seem to have in 3.10.

        NT_ACCESS_DENIED in tree connect to the share: AD
        This is what you get before your infinite loop.

                     |   NU       U
                     --------------------------------
                     3.10         |   LF       LF
                     4.4          |   LF       LF
                     master       |   AD       LF
                     master+patch |   AD       LF

                     No infinite DFS loop :(
                     All these issues result in mount failing very fast with permission denied.

                     I guess it could be from either the Windows version or the share/folder
                     ACL. A deeper analysis of the packets might reveal more.

                     In any case I did not notice any issues for on a basic DFS setup with
                     the patch so I don't think it introduced any regressions, which is
                     probably all that matters. It still bothers me a little I couldn't hit
                     the bug.

                     I've included kernel output w/ debugging output and network capture of
                     my tests if anyone want to have a look at it. (master+patch = ml-guestfix).

Signed-off-by: Mark Syms <mark.syms@citrix.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Tested-by: Aurelien Aptel <aaptel@suse.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-10 23:36:38 -05:00
Pavel Shilovsky 350be257ea CIFS: Fix null pointer deref during read resp processing
Currently during receiving a read response mid->resp_buf can be
NULL when it is being passed to cifs_discard_remaining_data() from
cifs_readv_discard(). Fix it by always passing server->smallbuf
instead and initializing mid->resp_buf at the end of read response
processing.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-10 23:36:38 -05:00
Linus Torvalds 84ced7fd06 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "This is a set of CIFS/SMB3 fixes for stable.

  There is another set of four SMB3 reconnect fixes for stable in
  progress but they are still being reviewed/tested, so didn't want to
  wait any longer to send these five below"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  Reset TreeId to zero on SMB2 TREE_CONNECT
  CIFS: Fix build failure with smb2
  Introduce cifs_copy_file_range()
  SMB3: Rename clone_range to copychunk_range
  Handle mismatched open calls
2017-04-09 09:10:02 -07:00
Linus Torvalds 5b50be743f Driver core fixes for 4.11-rc6
Here are 3 small fixes for 4.11-rc6.  One resolves a reported issue with
 sysfs files that NeilBrown found, one is a documenatation fix for the
 stable kernel rules, and the last is a small MAINTAINERS file update for
 kernfs.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWOnrMw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yk/JQCfQKjOpGDAR9Hs6u4YQ4hJrAHFneYAn1F4MLDW
 3b0ZMnlZHkDq834UwKnB
 =iiei
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are 3 small fixes for 4.11-rc6.

  One resolves a reported issue with sysfs files that NeilBrown found,
  one is a documenatation fix for the stable kernel rules, and the last
  is a small MAINTAINERS file update for kernfs"

* tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  MAINTAINERS: separate out kernfs maintainership
  sysfs: be careful of error returns from ops->show()
  Documentation: stable-kernel-rules: fix stable-tag format
2017-04-09 09:03:51 -07:00
Linus Torvalds 2a610b8aa8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS fixes from Al Viro:
 "statx followup fixes and a fix for stack-smashing on alpha"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  alpha: fix stack smashing in old_adjtimex(2)
  statx: Include a mask for stx_attributes in struct statx
  statx: Reserve the top bit of the mask for future struct expansion
  xfs: report crtime and attribute flags to statx
  ext4: Add statx support
  statx: optimize copy of struct statx to userspace
  statx: remove incorrect part of vfs_statx() comment
  statx: reject unknown flags when using NULL path
  Documentation/filesystems: fix documentation for ->getattr()
2017-04-09 08:26:21 -07:00
NeilBrown c8a139d001 sysfs: be careful of error returns from ops->show()
ops->show() can return a negative error code.
Commit 65da3484d9 ("sysfs: correctly handle short reads on PREALLOC attrs.")
(in v4.4) caused this to be stored in an unsigned 'size_t' variable, so errors
would look like large numbers.
As a result, if an error is returned, sysfs_kf_read() will return the
value of 'count', typically 4096.

Commit 17d0774f80 ("sysfs: correctly handle read offset on PREALLOC attrs")
(in v4.8) extended this error to use the unsigned large 'len' as a size for
memmove().
Consequently, if ->show returns an error, then the first read() on the
sysfs file will return 4096 and could return uninitialized memory to
user-space.
If the application performs a subsequent read, this will trigger a memmove()
with extremely large count, and is likely to crash the machine is bizarre ways.

This bug can currently only be triggered by reading from an md
sysfs attribute declared with __ATTR_PREALLOC() during the
brief period between when mddev_put() deletes an mddev from
the ->all_mddevs list, and when mddev_delayed_delete() - which is
scheduled on a workqueue - completes.
Before this, an error won't be returned by the ->show()
After this, the ->show() won't be called.

I can reproduce it reliably only by putting delay like
	usleep_range(500000,700000);
early in mddev_delayed_delete(). Then after creating an
md device md0 run
  echo clear > /sys/block/md0/md/array_state; cat /sys/block/md0/md/array_state

The bug can be triggered without the usleep.

Fixes: 65da3484d9 ("sysfs: correctly handle short reads on PREALLOC attrs.")
Fixes: 17d0774f80 ("sysfs: correctly handle read offset on PREALLOC attrs")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08 17:33:32 +02:00
Linus Torvalds 56c2997965 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: move pcp and lru-pcp draining into single wq
  mailmap: update Yakir Yang email address
  mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
  dax: fix radix tree insertion race
  mm, thp: fix setting of defer+madvise thp defrag mode
  ptrace: fix PTRACE_LISTEN race corrupting task->state
  vmlinux.lds: add missing VMLINUX_SYMBOL macros
  mm/page_alloc.c: fix print order in show_free_areas()
  userfaultfd: report actual registered features in fdinfo
  mm: fix page_vma_mapped_walk() for ksm pages
2017-04-08 01:35:32 -07:00
Ross Zwisler e11f8b7b6c dax: fix radix tree insertion race
While running generic/340 in my test setup I hit the following race.  It
can happen with kernels that support FS DAX PMDs, so v4.10 thru
v4.11-rc5.

Thread 1				Thread 2
--------				--------
dax_iomap_pmd_fault()
  grab_mapping_entry()
    spin_lock_irq()
    get_unlocked_mapping_entry()
    'entry' is NULL, can't call lock_slot()
    spin_unlock_irq()
    radix_tree_preload()
					dax_iomap_pmd_fault()
					  grab_mapping_entry()
					    spin_lock_irq()
					    get_unlocked_mapping_entry()
					    ...
					    lock_slot()
					    spin_unlock_irq()
					  dax_pmd_insert_mapping()
					    <inserts a PMD mapping>
    spin_lock_irq()
    __radix_tree_insert() fails with -EEXIST
    <fall back to 4k fault, and die horribly
     when inserting a 4k entry where a PMD exists>

The issue is that we have to drop mapping->tree_lock while calling
radix_tree_preload(), but since we didn't have a radix tree entry to
lock (unlike in the pmd_downgrade case) we have no protection against
Thread 2 coming along and inserting a PMD at the same index.  For 4k
entries we handled this with a special-case response to -EEXIST coming
from the __radix_tree_insert(), but this doesn't save us for PMDs
because the -EEXIST case can also mean that we collided with a 4k entry
in the radix tree at a different index, but one that is covered by our
PMD range.

So, correctly handle both the 4k and 2M collision cases by explicitly
re-checking the radix tree for an entry at our index once we reacquire
mapping->tree_lock.

This patch has made it through a clean xfstests run with the current
v4.11-rc5 based linux/master, and it also ran generic/340 500 times in a
loop.  It used to fail within the first 10 iterations.

Link: http://lkml.kernel.org/r/20170406212944.2866-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org>    [4.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08 00:47:49 -07:00