Commit Graph

55740 Commits

Author SHA1 Message Date
Nikolay Borisov 6c05040702 btrfs: Make btrfs_find_device_missing_or_by_path return directly a device
This function returns a numeric error value and additionally the
device found in one of its input parameters. Simplify this by making
the function directly return a pointer to btrfs_device. Additionally
adjust the caller to handle the case when we want to remove the
'missing' device and ENOENT is returned to return the expected
positive error value, parsed by progs. Finally, unexport the function
since it's not called outside of volume.c. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:29 +02:00
Nikolay Borisov b444ad46b2 btrfs: Make btrfs_find_device_by_path return struct btrfs_device
Currently this function returns an error code as well as uses one of
its arguments as a return value for struct btrfs_device. Change the
function so that it returns btrfs_device directly and use the usual
"encode error in pointer" mechanics if something goes wrong. No
functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:29 +02:00
Jeff Mahoney 374b0e2d6b btrfs: fix error handling in free_log_tree
When we hit an I/O error in free_log_tree->walk_log_tree during file system
shutdown we can crash due to there not being a valid transaction handle.

Use btrfs_handle_fs_error when there's no transaction handle to use.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
  IP: free_log_tree+0xd2/0x140 [btrfs]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
  Modules linked in: <modules>
  CPU: 2 PID: 23544 Comm: umount Tainted: G        W        4.12.14-kvmsmall #9 SLE15 (unreleased)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
  task: ffff96bfd3478880 task.stack: ffffa7cf40d78000
  RIP: 0010:free_log_tree+0xd2/0x140 [btrfs]
  RSP: 0018:ffffa7cf40d7bd10 EFLAGS: 00010282
  RAX: 00000000fffffffb RBX: 00000000fffffffb RCX: 0000000000000002
  RDX: 0000000000000000 RSI: ffff96c02f07d4c8 RDI: 0000000000000282
  RBP: ffff96c013cf1000 R08: ffff96c02f07d4c8 R09: ffff96c02f07d4d0
  R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000
  R13: ffff96c005e800c0 R14: ffffa7cf40d7bdb8 R15: 0000000000000000
  FS:  00007f17856bcfc0(0000) GS:ffff96c03f600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000060 CR3: 0000000045ed6002 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? wait_for_writer+0xb0/0xb0 [btrfs]
   btrfs_free_log+0x17/0x30 [btrfs]
   btrfs_drop_and_free_fs_root+0x9a/0xe0 [btrfs]
   btrfs_free_fs_roots+0xc0/0x130 [btrfs]
   ? wait_for_completion+0xf2/0x100
   close_ctree+0xea/0x2e0 [btrfs]
   ? kthread_stop+0x161/0x260
   generic_shutdown_super+0x6c/0x120
   kill_anon_super+0xe/0x20
   btrfs_kill_super+0x13/0x100 [btrfs]
   deactivate_locked_super+0x3f/0x70
   cleanup_mnt+0x3b/0x70
   task_work_run+0x78/0x90
   exit_to_usermode_loop+0x77/0xa6
   do_syscall_64+0x1c5/0x1e0
   entry_SYSCALL_64_after_hwframe+0x42/0xb7
  RIP: 0033:0x7f1784f90827
  RSP: 002b:00007ffdeeb03118 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  RAX: 0000000000000000 RBX: 0000556a60c62970 RCX: 00007f1784f90827
  RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000556a60c62b50
  RBP: 0000000000000000 R08: 0000000000000005 R09: 00000000ffffffff
  R10: 0000556a60c63900 R11: 0000000000000246 R12: 0000556a60c62b50
  R13: 00007f17854a81c4 R14: 0000000000000000 R15: 0000000000000000
  RIP: free_log_tree+0xd2/0x140 [btrfs] RSP: ffffa7cf40d7bd10
  CR2: 0000000000000060

Fixes: 681ae50917 ("Btrfs: cleanup reserved space when freeing tree log on error")
CC: <stable@vger.kernel.org> # v3.13
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:29 +02:00
Misono Tomohiro 380fd06640 btrfs: remove redundant variable from btrfs_cross_ref_exist
Since commit d7df2c796d ("Btrfs attach delayed ref updates to delayed
ref heads"), check_delayed_ref() won't return -ENOENT.

In btrfs_cross_ref_exist(), two variables 'ret' and 'ret2' are
originally used to handle -ENOENT error case.  Since the code is not
needed anymore, let's just remove 'ret2'.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:29 +02:00
Liu Bo e49aabd973 Btrfs: set leave_spinning in btrfs_get_extent
Unless it's going to read inline extents from btree leaf to page,
btrfs_get_extent won't sleep during the period of holding path lock.

This sets leave_spinning at first and sets path to blocking mode right
before reading inline extent if that's the case.  The benefit is that a
path in spinning mode typically has lower impact (faster) on waiters
rather than that in the blocking mode.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:29 +02:00
Liu Bo de2c6615dc Btrfs: fix alignment in declaration and prototype of btrfs_get_extent
This fixes btrfs_get_extent to be consistent with our existing
declaration style.

Note: For the record, indentation styles that are accepted are both,
aligning under the opening ( and tab or double tab indentation on the
next line.  Preferrably not spliting the type or long expressions in the
argument lists.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
[ add note ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:29 +02:00
Colin Ian King 29c5e5d496 btrfs: remove unused pointer 'tree' in btrfs_submit_compressed_read
Pointer 'tree' is being assigned but is never used hence it is redundant
and can be removed. This is a leftover from cleanup patch
00032d38ea ("btrfs: drop extent_io_ops::merge_bio_hook
callback").

Cleans up clang warning:

  warning: variable 'tree' set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:28 +02:00
Colin Ian King d005dbeca0 btrfs: remove unused pointer inode in relink_file_extents
Pointer inode is being assigned but is never used hence it is redundant
and can be removed. It's been unused since the introduction in
38c227d87c ("Btrfs: snapshot-aware defrag").

Cleans up clang warning:

  variable ‘inode’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:28 +02:00
Su Yue 28c4a3e21a btrfs: defrag: use btrfs_mod_outstanding_extents in cluster_pages_for_defrag
Since commit 8b62f87bad ("Btrfs: rework outstanding_extents"),
manual operations of outstanding_extent in btrfs_inode are replaced by
btrfs_mod_outstanding_extents().
The one in cluster_pages_for_defrag seems to be lost, so replace it
of btrfs_mod_outstanding_extents().

Fixes: 8b62f87bad ("Btrfs: rework outstanding_extents")
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:28 +02:00
Liu Bo 6aadd9eb74 Btrfs: remove confusing tracepoint in btrfs_add_reserved_bytes
Here we're not releasing any space, but transferring bytes from
->bytes_may_use to ->bytes_reserved. The last change to the code in
commit 18513091af ("btrfs: update btrfs_space_info's
bytes_may_use timely") removed a conditional tracepoint and the logic
changed too but the tracepiont remained.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:28 +02:00
Liu Bo c64142807f btrfs: free path at an earlier point in btrfs_get_extent
trace_btrfs_get_extent() has nothing to do with path, place
btrfs_free_path ahead so that we can unlock path on error.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:28 +02:00
Liu Bo 9688e9a99e Btrfs: use next_state in find_first_extent_bit
As next_state() is already defined to get the next state, use it in
find_first_extent_bit. No functional changes.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:28 +02:00
Qu Wenruo b72c3aba09 btrfs: locking: Add extra check in btrfs_init_new_buffer() to avoid deadlock
[BUG]
For certain crafted image, whose csum root leaf has missing backref, if
we try to trigger write with data csum, it could cause deadlock with the
following kernel WARN_ON():

  WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400
  CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8
  Workqueue: btrfs-endio-write btrfs_endio_write_helper
  RIP: 0010:btrfs_tree_lock+0x3e2/0x400
  Call Trace:
   btrfs_alloc_tree_block+0x39f/0x770
   __btrfs_cow_block+0x285/0x9e0
   btrfs_cow_block+0x191/0x2e0
   btrfs_search_slot+0x492/0x1160
   btrfs_lookup_csum+0xec/0x280
   btrfs_csum_file_blocks+0x2be/0xa60
   add_pending_csums+0xaf/0xf0
   btrfs_finish_ordered_io+0x74b/0xc90
   finish_ordered_fn+0x15/0x20
   normal_work_helper+0xf6/0x500
   btrfs_endio_write_helper+0x12/0x20
   process_one_work+0x302/0x770
   worker_thread+0x81/0x6d0
   kthread+0x180/0x1d0
   ret_from_fork+0x35/0x40

[CAUSE]
That crafted image has missing backref for csum tree root leaf.  And
when we try to allocate new tree block, since there is no
EXTENT/METADATA_ITEM for csum tree root, btrfs consider it's free slot
and use it.

The extent tree of the image looks like:

  Normal image                      |       This fuzzed image
  ----------------------------------+--------------------------------
  BG 29360128                       | BG 29360128
   One empty slot                   |  One empty slot
  29364224: backref to UUID tree    | 29364224: backref to UUID tree
   Two empty slots                  |  Two empty slots
  29376512: backref to CSUM tree    |  One empty slot (bad type) <<<
  29380608: backref to D_RELOC tree | 29380608: backref to D_RELOC tree
  ...                               | ...

Since bytenr 29376512 has no METADATA/EXTENT_ITEM, when btrfs try to
alloc tree block, it's an valid slot for btrfs.

And for finish_ordered_write, when we need to insert csum, we try to CoW
csum tree root.

By accident, empty slots at bytenr BG_OFFSET, BG_OFFSET + 8K,
BG_OFFSET + 12K is already used by tree block COW for other trees, the
next empty slot is BG_OFFSET + 16K, which should be the backref for CSUM
tree.

But due to the bad type, btrfs can recognize it and still consider it as
an empty slot, and will try to use it for csum tree CoW.

Then in the following call trace, we will try to lock the new tree
block, which turns out to be the old csum tree root which is already
locked:

btrfs_search_slot() called on csum tree root, which is at 29376512
|- btrfs_cow_block()
   |- btrfs_set_lock_block()
   |  |- Now locks tree block 29376512 (old csum tree root)
   |- __btrfs_cow_block()
      |- btrfs_alloc_tree_block()
         |- btrfs_reserve_extent()
            | Now it returns tree block 29376512, which extent tree
            | shows its empty slot, but it's already hold by csum tree
            |- btrfs_init_new_buffer()
               |- btrfs_tree_lock()
                  | Triggers WARN_ON(eb->lock_owner == current->pid)
                  |- wait_event()
                     Wait lock owner to release the lock, but it's
                     locked by ourself, so it will deadlock

[FIX]
This patch will do the lock_owner and current->pid check at
btrfs_init_new_buffer().
So above deadlock can be avoided.

Since such problem can only happen in crafted image, we will still
trigger kernel warning for later aborted transaction, but with a little
more meaningful warning message.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405
Reported-by: Xu Wen <wen.xu@gatech.edu>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:27 +02:00
Qu Wenruo 65c6e82bec btrfs: Handle owner mismatch gracefully when walking up tree
[BUG]
When mounting certain crafted image, btrfs will trigger kernel BUG_ON()
when trying to recover balance:

  kernel BUG at fs/btrfs/extent-tree.c:8956!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 1 PID: 662 Comm: mount Not tainted 4.18.0-rc1-custom+ #10
  RIP: 0010:walk_up_proc+0x336/0x480 [btrfs]
  RSP: 0018:ffffb53540c9b890 EFLAGS: 00010202
  Call Trace:
   walk_up_tree+0x172/0x1f0 [btrfs]
   btrfs_drop_snapshot+0x3a4/0x830 [btrfs]
   merge_reloc_roots+0xe1/0x1d0 [btrfs]
   btrfs_recover_relocation+0x3ea/0x420 [btrfs]
   open_ctree+0x1af3/0x1dd0 [btrfs]
   btrfs_mount_root+0x66b/0x740 [btrfs]
   mount_fs+0x3b/0x16a
   vfs_kern_mount.part.9+0x54/0x140
   btrfs_mount+0x16d/0x890 [btrfs]
   mount_fs+0x3b/0x16a
   vfs_kern_mount.part.9+0x54/0x140
   do_mount+0x1fd/0xda0
   ksys_mount+0xba/0xd0
   __x64_sys_mount+0x21/0x30
   do_syscall_64+0x60/0x210
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

[CAUSE]
Extent tree corruption.  In this particular case, reloc tree root's
owner is DATA_RELOC_TREE (should be TREE_RELOC), thus its backref is
corrupted and we failed the owner check in walk_up_tree().

[FIX]
It's pretty hard to take care of every extent tree corruption, but at
least we can remove such BUG_ON() and exit more gracefully.

And since in this particular image, DATA_RELOC_TREE and TREE_RELOC share
the same root (which is obviously invalid), we needs to make
__del_reloc_root() more robust to detect such invalid sharing to avoid
possible NULL dereference as root->node can be NULL in this case.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200411
Reported-by: Xu Wen <wen.xu@gatech.edu>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:27 +02:00
zhong jiang 45128b08f7 btrfs: change btrfs_pin_log_trans to return void
btrfs_pin_log_trans defines the variable "ret" for return value, but it
is not modified after initialization. Further, I find that none of the
callers do handles the return value, so it is safe to drop the unneeded
"ret" and make it return void.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:27 +02:00
zhong jiang 556f3ca88e btrfs: change btrfs_free_reserved_bytes to return void
btrfs_free_reserved_bytes uses the variable "ret" for return value,
but it is not modified after initialzation. Further, I find that any of
the callers do not handle the return value, so it is safe to drop the
unneeded "ret" and return void. There are no callees that would need the
function to handle or pass the value either.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:27 +02:00
Liu Bo bee6ec822a Btrfs: remove always true if branch in btrfs_get_extent
@path is always NULL when it comes to the if branch.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:27 +02:00
Qu Wenruo 9c7b0c2e8d btrfs: qgroup: Dirty all qgroups before rescan
[BUG]
In the following case, rescan won't zero out the number of qgroup 1/0:

  $ mkfs.btrfs -fq $DEV
  $ mount $DEV /mnt

  $ btrfs quota enable /mnt
  $ btrfs qgroup create 1/0 /mnt
  $ btrfs sub create /mnt/sub
  $ btrfs qgroup assign 0/257 1/0 /mnt

  $ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
  $ btrfs sub snap /mnt/sub /mnt/snap
  $ btrfs quota rescan -w /mnt
  $ btrfs qgroup show -pcre /mnt
  qgroupid         rfer         excl     max_rfer     max_excl parent  child
  --------         ----         ----     --------     -------- ------  -----
  0/5          16.00KiB     16.00KiB         none         none ---     ---
  0/257      1016.00KiB     16.00KiB         none         none 1/0     ---
  0/258      1016.00KiB     16.00KiB         none         none ---     ---
  1/0        1016.00KiB     16.00KiB         none         none ---     0/257

So far so good, but:

  $ btrfs qgroup remove 0/257 1/0 /mnt
  WARNING: quotas may be inconsistent, rescan needed
  $ btrfs quota rescan -w /mnt
  $ btrfs qgroup show -pcre  /mnt
  qgoupid         rfer         excl     max_rfer     max_excl parent  child
  --------         ----         ----     --------     -------- ------  -----
  0/5          16.00KiB     16.00KiB         none         none ---     ---
  0/257      1016.00KiB     16.00KiB         none         none ---     ---
  0/258      1016.00KiB     16.00KiB         none         none ---     ---
  1/0        1016.00KiB     16.00KiB         none         none ---     ---
	     ^^^^^^^^^^     ^^^^^^^^ not cleared

[CAUSE]
Before rescan we call qgroup_rescan_zero_tracking() to zero out all
qgroups' accounting numbers.

However we don't mark all qgroups dirty, but rely on rescan to do so.

If we have any high level qgroup without children, it won't be marked
dirty during rescan, since we cannot reach that qgroup.

This will cause QGROUP_INFO items of childless qgroups never get updated
in the quota tree, thus their numbers will stay the same in "btrfs
qgroup show" output.

[FIX]
Just mark all qgroups dirty in qgroup_rescan_zero_tracking(), so even if
we have childless qgroups, their QGROUP_INFO items will still get
updated during rescan.

Reported-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Tested-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:26 +02:00
Omar Sandoval 3293428096 Btrfs: clean up scrub is_dev_replace parameter
struct scrub_ctx has an ->is_dev_replace member, so there's no point in
passing around is_dev_replace where sctx is available.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:26 +02:00
Anand Jain 1da739678e btrfs: add helper to obtain number of devices with ongoing dev-replace
When the replace is running the fs_devices::num_devices also includes
the replaced device, however in some operations like device delete and
balance it needs the actual num_devices without the repalced devices.
The function btrfs_num_devices() just provides that.

And here is a scenario how balance and repalce items could co-exist:

Consider balance is started and paused, now start the replace followed
by a unmount or system power-cycle. During following mount, the
open_ctree() first restarts the balance so it must check for the device
replace otherwise our num_devices calculation will be wrong.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:26 +02:00
Anand Jain 16220c467a btrfs: add assertions where number of devices could go below 0
In preparation to add helper function to deduce the num_devices with
replace running, use assert instead of BUG_ON or WARN_ON. The number of
devices would not normally drop to 0 due to other checks so the assert
is sufficient.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog, adjust the assert condition ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:26 +02:00
zhong jiang f8b00e0f06 btrfs: remove unneeded NULL checks before kfree
Kfree has taken the NULL pointer into account. So remove the check
before kfree.

The issue is detected with the help of Coccinelle.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:26 +02:00
Liu Bo 4b6f8e9695 Btrfs: do not unnecessarily pass write_lock_level when processing leaf
As we're going to return right after the call, it's not necessary to get
update the new write_lock_level from unlock_up.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:26 +02:00
Misono Tomohiro 4fd786e6c3 btrfs: Remove 'objectid' member from struct btrfs_root
There are two members in struct btrfs_root which indicate root's
objectid: objectid and root_key.objectid.

They are both set to the same value in __setup_root():

  static void __setup_root(struct btrfs_root *root,
                           struct btrfs_fs_info *fs_info,
                           u64 objectid)
  {
    ...
    root->objectid = objectid;
    ...
    root->root_key.objectid = objecitd;
    ...
  }

and not changed to other value after initialization.

grep in btrfs directory shows both are used in many places:
  $ grep -rI "root->root_key.objectid" | wc -l
  133
  $ grep -rI "root->objectid" | wc -l
  55
 (4.17, inc. some noise)

It is confusing to have two similar variable names and it seems
that there is no rule about which should be used in a certain case.

Since ->root_key itself is needed for tree reloc tree, let's remove
'objecitd' member and unify code to use ->root_key.objectid in all places.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:25 +02:00
Lu Fengqi 5a2cb25ab9 btrfs: remove a useless return statement in btrfs_block_rsv_add
Since ret must be 0 here, don't have to return.  No functional change
and code readability is not hurt.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:25 +02:00
Lu Fengqi 684572df94 btrfs: Remove root parameter from btrfs_insert_dir_item
All callers pass the root tree of dir, we can push that down to the
function itself.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:25 +02:00
Lu Fengqi 3a58417486 btrfs: switch update_size to bool in btrfs_block_rsv_migrate and btrfs_rsv_add_bytes
Using true and false here is closer to the expected semantic than using
0 and 1.  No functional change.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:25 +02:00
Lu Fengqi a7176f74fa btrfs: simplify the send_in_progress check in btrfs_delete_subvolume
Only when send_in_progress, we have to do something different such as
btrfs_warn() and return -EPERM. Therefore, we could check
send_in_progress first and process error handling, after the
root_item_lock has been got.

Just for better readability. No functional change.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:25 +02:00
Greg Kroah-Hartman 3a27203102 libnvdimm/dax 4.19-rc8
* Fix a livelock in dax_layout_busy_page() present since v4.18. The
   lockup triggers when truncating an actively mapped huge page out of a
   mapping pinned for direct-I/O.
 
 * Fix mprotect() clobbers of _PAGE_DEVMAP. Broken since v4.5 mprotect()
   clears this flag that is needed to communicate the liveness of device
   pages to the get_user_pages() path.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbwiZhAAoJEB7SkWpmfYgCYFoQAL8ED6c1bfGUPRsWSrTRChU0
 ungVZ/Vf1+2ERd3ivUXPQzahNtqH5EWvEVp0aboVpyJUoVllrztInVS2hxaGJE+e
 w7WnzaXh36MY0kvLpK+Ny1Cxk7qg2rXnmzOAPRVdSUoSvh0TXOn5HFX1i/OdI7WK
 wgJwXraCoyKP9aTItw7oHQy9S36bi1RJVUakOAoEpEx4Vn+fwFxLNIt34G5CRJ+k
 iflicM7CPngxlFzwfoiX9v3DhV7toexk1A4LAzzwypG0Aiqd5tW2FG1lwLMPncNk
 8FezBm9VjkMwzv6hj7nD9UfU2lbh3GqqGDW0cPX1DPSgDxr/4pOLtKcbYWHh6yas
 NtCXk37q90ey3GtD2wYBRkBNly6UWvHJ0d3srtO6ZSl1VN6JQu8rhVhQ6KnON24B
 NcWlEVf2brqf0uaW4byCVbdVfIDp96/qgEvCo1pq3olXwCdDyOBJjYxaBcnu5JDV
 YsItMCJ49AxS/qoCt3vam7vC5TGhfYHL5xJPaF06cdjYvgfqOIV3VQT1ujBx4cvh
 MBFRBKDc6oDiJFgkrdYqHwJfn5fCQVS180Oy5S0AFGsVAzsJalKBZBLx2f2RQn8c
 r+kczvvPjpczEeEqzaqsxTgjowo/75Q8PRXc2PbwQzNxfkHuKf+xxQpnUg0mN6Hf
 w8zPSaCcCs2Wo21Kd/ua
 =VXnU
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-fixes-4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Dan writes:
  "libnvdimm/dax 4.19-rc8

   * Fix a livelock in dax_layout_busy_page() present since v4.18. The
     lockup triggers when truncating an actively mapped huge page out of
     a mapping pinned for direct-I/O.

   * Fix mprotect() clobbers of _PAGE_DEVMAP. Broken since v4.5
     mprotect() clears this flag that is needed to communicate the
     liveness of device pages to the get_user_pages() path."

* tag 'libnvdimm-fixes-4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  mm: Preserve _PAGE_DEVMAP across mprotect() calls
  filesystem-dax: Fix dax_layout_busy_page() livelock
2018-10-14 08:34:31 +02:00
Richard Weinberger f8ccb14fd6 ubifs: Fix WARN_ON logic in exit path
ubifs_assert() is not WARN_ON(), so we have to invert
the checks.
Randy faced this warning with UBIFS being a module, since
most users use UBIFS as builtin because UBIFS is the rootfs
nobody noticed so far. :-(
Including me.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: 54169ddd38 ("ubifs: Turn two ubifs_assert() into a WARN_ON()")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-13 11:05:02 +02:00
Greg Kroah-Hartman 79fc170b1f Merge branch 'akpm'
Fixes from Andrew:

* akpm:
  fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters()
  mm/thp: fix call to mmu_notifier in set_pmd_migration_entry() v2
  mm/mmap.c: don't clobber partially overlapping VMA with MAP_FIXED_NOREPLACE
  ocfs2: fix a GCC warning
2018-10-13 09:31:19 +02:00
Khazhismel Kumykov ac081c3be3 fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters()
On non-preempt kernels this loop can take a long time (more than 50 ticks)
processing through entries.

Link: http://lkml.kernel.org/r/20181010172623.57033-1-khazhy@google.com
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-13 09:31:03 +02:00
zhong jiang 1cff514a51 ocfs2: fix a GCC warning
Fix the following compile warning:

fs/ocfs2/dlmglue.c:99:30: warning: ‘lockdep_keys’ defined but not used [-Wunused-variable]
 static struct lock_class_key lockdep_keys[OCFS2_NUM_LOCK_TYPES];

Link: http://lkml.kernel.org/r/1536938148-32110-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-13 09:31:02 +02:00
Greg Kroah-Hartman ed66c252d9 gfs2 4.19 fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbwLzVAAoJENW/n+sDE2U6ZQUP/3mcvCqDjU+YBC/RCuA0Hvxd
 4rtzRqY5jTt4HZCTJ9Qj8aARZ4vN/LjKQuQU8IYhzJQcXzeyDi1V2zH4xGmEDomZ
 o9LqcVUUJO1bS/aykLrbnSEEfjU+zeA3z60Jbj3OHkpL6H9u6q+AwqCZnsIcKBnI
 Z8lgUda6oC3L60xzxpdr6BoYLIMqucuJ0YMyVo6YsKT1PEoP30EQhAoqnnKVfxx7
 +l66OUK4Qw3z45auen9MIOXjn+y7BC0RjFANetsBlAS29PE5X9rXWQ5xnv+9uYCh
 DDXPiMjdkLm2Xv8d5U80QhVuO8vxkgn3lnLvE99cmMmQrU/6pS91q+azgc/98RDN
 ZYCReIihyPbug3+uRziYcBHnI+4c8LrM/Za0TsHDG6gA6ddiiNHvdnJE9Nbk85Le
 dS+jQwA5eiYtGc6styoyk52v6huGTI3oPbffyYTTbOLnT6bSaEv+k0zWWCgTqMv3
 sb0imETfsdd1O/7MxR2kAVtQPdSzhD5GMIbkmlRb8nwP+QXb+FBVtQFPsZaPZklR
 qF//eFXGcoWfM1XerOYgl6VVolvntSc+ivycSEsDGHNSU5zUy2JYNTxmkypBB6pb
 wslVyIvjW2bcnUhjGr9PDSg+yQjaAk+5EyiJmxtJaqCiSfjgTpI4YwNrkfQo5csI
 H/Sq1WeTLjo2KeRb7LpU
 =gGo4
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.19.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Andreas writes:
  "gfs2 4.19 fixes

   Fix iomap buffered write support for journaled files"

* tag 'gfs2-4.19.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix iomap buffered write support for journaled files (2)
2018-10-13 09:06:27 +02:00
David Howells f014ffb025 afs: Fix afs_server struct leak
Fix a leak of afs_server structs.  The routine that installs them in the
various lookup lists and trees gets a ref on leaving the function, whether
it added the server or a server already exists.  It shouldn't increment
the refcount if it added the server.

The effect of this that "rmmod kafs" will hang waiting for the leaked
server to become unused.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-12 17:36:40 +02:00
Andreas Gruenbacher fee5150c48 gfs2: Fix iomap buffered write support for journaled files (2)
It turns out that the fix in commit 6636c3cc56 is bad; the assertion
that the iomap code no longer creates buffer heads is incorrect for
filesystems that set the IOMAP_F_BUFFER_HEAD flag.

Instead, what's happening is that gfs2_iomap_begin_write treats all
files that have the jdata flag set as journaled files, which is
incorrect as long as those files are inline ("stuffed").  We're handling
stuffed files directly via the page cache, which is why we ended up with
pages without buffer heads in gfs2_page_add_databufs.

Fix this by handling stuffed journaled files correctly in
gfs2_iomap_begin_write.

This reverts commit 6636c3cc5690c11631e6366cf9a28fb99c8b25bb.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-10-12 17:14:42 +02:00
David Howells 6b3944e42e afs: Fix cell proc list
Access to the list of cells by /proc/net/afs/cells has a couple of
problems:

 (1) It should be checking against SEQ_START_TOKEN for the keying the
     header line.

 (2) It's only holding the RCU read lock, so it can't just walk over the
     list without following the proper RCU methods.

Fix these by using an hlist instead of an ordinary list and using the
appropriate accessor functions to follow it with RCU.

Since the code that adds a cell to the list must also necessarily change,
sort the list on insertion whilst we're at it.

Fixes: 989782dcdc ("afs: Overhaul cell database management")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-12 13:18:57 +02:00
Greg Kroah-Hartman 4718dcad7d xfs: fixes for 4.19-rc7
Update for 4.19-rc7 to fix numerous file clone and deduplication issues.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbvo1jAAoJEK3oKUf0dfodurwP/3OGvJo11XjE5Wuh5tbcSOUm
 duJdJvN+b0gx8Lb1IL2lHaBEt/Pw3PKi+KPx6d+pd47aflCYNMiU7I1kzejxvq9I
 TBVdhxffAmNBU02VU/qmhucMnKfpVr/UX19f0sHZcfXbZkpIuHrSpI25MNSYqbEs
 bRoMDkgbuu377hCJu6tnBAUm38z4iZdfJaGPVmJmuVMn8JJE8KA3Une/daxg0/HY
 zbCMWP3SGJwqx7SyyeurQSkRS/8GG3LgbnYOz0FlaHfBd4JVJscHOZU08nrYmMAN
 9SOowvahaPkDcyO8c6gRkyE6NIbOsb79718g+HTm4eqzyChnh+jnFTzitcTMuK2c
 vjblXZSDKnN/P4UaEEzIAnQ1Eew4WhLrKuBr+3fCr2bKTGfj0iiX36CjEgk1k+Df
 t1DV/Hj6me6crZpWO7GQZVLnDsiJBzaOsIgpYnB3vcJyof/cgm+5bfGFedDZFdpV
 XTh0oBN8B7oQztd4MvcfQOGXSktrMtq+6RomO8kv77mVtHFnjnroKHq/wIft2nyI
 rs7u7DFmCLyB9Lbm8aoeMPo4+0zxTp9385HSJ+XznHcjGXxkxIIGhhdU3omKxANa
 j3UQB1+VRC4lHSluUXOW7vH0j1tX3gaS3z/yJ+gfcAdizUmdp0h0rMx8c9VR8SEJ
 xv/fpbrcKHyBwuNvKtBm
 =yj1P
 -----END PGP SIGNATURE-----

Merge tag 'xfs-fixes-for-4.19-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Dave writes:
  "xfs: fixes for 4.19-rc7

   Update for 4.19-rc7 to fix numerous file clone and deduplication issues."

* tag 'xfs-fixes-for-4.19-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix data corruption w/ unaligned reflink ranges
  xfs: fix data corruption w/ unaligned dedupe ranges
  xfs: update ctime and remove suid before cloning files
  xfs: zero posteof blocks when cloning above eof
  xfs: refactor clonerange preparation into a separate helper
2018-10-11 07:17:42 +02:00
Andreas Gruenbacher dc480feb45 gfs2: Fix iomap buffered write support for journaled files
Commit 64bc06bb32 broke buffered writes to journaled files (chattr
+j): we'll try to journal the buffer heads of the page being written to
in gfs2_iomap_journaled_page_done.  However, the iomap code no longer
creates buffer heads, so we'll BUG() in gfs2_page_add_databufs.  Fix
that by creating buffer heads ourself when needed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-10-09 18:20:13 +02:00
Dan Williams d7782145e1 filesystem-dax: Fix dax_layout_busy_page() livelock
In the presence of multi-order entries the typical
pagevec_lookup_entries() pattern may loop forever:

	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
				min(end - index, (pgoff_t)PAGEVEC_SIZE),
				indices)) {
		...
		for (i = 0; i < pagevec_count(&pvec); i++) {
			index = indices[i];
			...
		}
		index++; /* BUG */
	}

The loop updates 'index' for each index found and then increments to the
next possible page to continue the lookup. However, if the last entry in
the pagevec is multi-order then the next possible page index is more
than 1 page away. Fix this locally for the filesystem-dax case by
checking for dax-multi-order entries. Going forward new users of
multi-order entries need to be similarly careful, or we need a generic
way to report the page increment in the radix iterator.

Fixes: 5fac7408d8 ("mm, fs, dax: handle layout changes to pinned dax...")
Cc: <stable@vger.kernel.org>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-08 11:38:44 -07:00
Dave Chinner b39989009b xfs: fix data corruption w/ unaligned reflink ranges
When reflinking sub-file ranges, a data corruption can occur when
the source file range includes a partial EOF block. This shares the
unknown data beyond EOF into the second file at a position inside
EOF, exposing stale data in the second file.

XFS only supports whole block sharing, but we still need to
support whole file reflink correctly.  Hence if the reflink
request includes the last block of the souce file, only proceed with
the reflink operation if it lands at or past the destination file's
current EOF. If it lands within the destination file EOF, reject the
entire request with -EINVAL and make the caller go the hard way.

This avoids the data corruption vector, but also avoids disruption
of returning EINVAL to userspace for the common case of whole file
cloning.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-06 11:44:39 +10:00
Dave Chinner dceeb47b0e xfs: fix data corruption w/ unaligned dedupe ranges
A deduplication data corruption is Exposed by fstests generic/505 on
XFS. It is caused by extending the block match range to include the
partial EOF block, but then allowing unknown data beyond EOF to be
considered a "match" to data in the destination file because the
comparison is only made to the end of the source file. This corrupts
the destination file when the source extent is shared with it.

XFS only supports whole block dedupe, but we still need to appear to
support whole file dedupe correctly.  Hence if the dedupe request
includes the last block of the souce file, don't include it in the
actual XFS dedupe operation. If the rest of the range dedupes
successfully, then report the partial last block as deduped, too, so
that userspace sees it as a successful dedupe rather than return
EINVAL because we can't dedupe unaligned blocks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-06 11:44:19 +10:00
Ashish Samant cbe355f57c ocfs2: fix locking for res->tracking and dlm->tracking_list
In dlm_init_lockres() we access and modify res->tracking and
dlm->tracking_list without holding dlm->track_lock.  This can cause list
corruptions and can end up in kernel panic.

Fix this by locking res->tracking and dlm->tracking_list with
dlm->track_lock instead of dlm->spinlock.

Link: http://lkml.kernel.org/r/1529951192-4686-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05 16:32:05 -07:00
Jann Horn f8a00cef17 proc: restrict kernel stack dumps to root
Currently, you can use /proc/self/task/*/stack to cause a stack walk on
a task you control while it is running on another CPU.  That means that
the stack can change under the stack walker.  The stack walker does
have guards against going completely off the rails and into random
kernel memory, but it can interpret random data from your kernel stack
as instruction pointers and stack pointers.  This can cause exposure of
kernel stack contents to userspace.

Restrict the ability to inspect kernel stacks of arbitrary tasks to root
in order to prevent a local attacker from exploiting racy stack unwinding
to leak kernel task stack contents.  See the added comment for a longer
rationale.

There don't seem to be any users of this userspace API that can't
gracefully bail out if reading from the file fails.  Therefore, I believe
that this change is unlikely to break things.  In the case that this patch
does end up needing a revert, the next-best solution might be to fake a
single-entry stack based on wchan.

Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com
Fixes: 2ec220e27f ("proc: add /proc/*/stack")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05 16:32:05 -07:00
Larry Chen 69eb7765b9 ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()
ocfs2_duplicate_clusters_by_page() may crash if one of the extent's pages
is dirty.  When a page has not been written back, it is still in dirty
state.  If ocfs2_duplicate_clusters_by_page() is called against the dirty
page, the crash happens.

To fix this bug, we can just unlock the page and wait until the page until
its not dirty.

The following is the backtrace:

kernel BUG at /root/code/ocfs2/refcounttree.c:2961!
[exception RIP: ocfs2_duplicate_clusters_by_page+822]
__ocfs2_move_extent+0x80/0x450 [ocfs2]
? __ocfs2_claim_clusters+0x130/0x250 [ocfs2]
ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2]
__ocfs2_move_extents_range+0x2a4/0x470 [ocfs2]
ocfs2_move_extents+0x180/0x3b0 [ocfs2]
? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2]
ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2]
ocfs2_ioctl+0x253/0x640 [ocfs2]
do_vfs_ioctl+0x90/0x5f0
SyS_ioctl+0x74/0x80
do_syscall_64+0x74/0x140
entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Once we find the page is dirty, we do not wait until it's clean, rather we
use write_one_page() to write it back

Link: http://lkml.kernel.org/r/20180829074740.9438-1-lchen@suse.com
[lchen@suse.com: update comments]
  Link: http://lkml.kernel.org/r/20180830075041.14879-1-lchen@suse.com
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Larry Chen <lchen@suse.com>
Acked-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05 16:32:04 -07:00
Greg Kroah-Hartman 087f759a41 four small SMB3 fixes: one for stable, the others to address a more recent regression
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlu2jLMACgkQiiy9cAdy
 T1GLMQwAiUaUTtR0MN2ws+IHNnwFX5YnWFa1Cr/6Tgz9WBo5TrrvQcBGhTIUQok2
 7w2+I0rXPO7P7Tq2odWn66BT1/l0o2U10aB4OHYlS7zqfCAp6GuwKGtATivRqjsg
 ENn2mmBRuTmz+XtGmlVklmQpHjn4+MTR2GSLA57oo1fqX5YBeSxcq2UEciCGttRQ
 nAvn923NrPSbGRvL/eDDWGXas8gMATXnrNlN8st2AsxnoFC6CrzUuRM0s/IFgp6r
 Orh0jIlovlT2Q6LdND7n7xXnEVoyUTETG89IjRVwcBJ9LtXrJXpuWo7/sOr11cMy
 sGeUkILNlmkIdLtIlUaZcGaclq/yfeKdMxziBT8tWP6wXIH4LrGCzvZ9jDtrzygU
 EWL2nLsqMJNh0+wsYSZM/m9VFC1ReeB1AgIC2EHu/xzqWoTeL40i2OhKi/732FEX
 H+FlTZUoorhFoI96AMJMrqXjXyqZUZSQ1b5iJ7/jq888Vdey+m7HiMUtxv0+5yzC
 +lQksh8X
 =4PHD
 -----END PGP SIGNATURE-----

Merge tag '4.19-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Steve writes:
  "SMB3 fixes

   four small SMB3 fixes: one for stable, the others to address a more
   recent regression"

* tag '4.19-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix lease break problem introduced by compounding
  cifs: only wake the thread for the very last PDU in a compound
  cifs: add a warning if we try to to dequeue a deleted mid
  smb2: fix missing files in root share directory listing
2018-10-05 08:27:47 -07:00
Darrick J. Wong 7debbf015f xfs: update ctime and remove suid before cloning files
Before cloning into a file, update the ctime and remove sensitive
attributes like suid, just like we'd do for a regular file write.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-05 19:05:41 +10:00
Darrick J. Wong 410fdc72b0 xfs: zero posteof blocks when cloning above eof
When we're reflinking between two files and the destination file range
is well beyond the destination file's EOF marker, zero any posteof
speculative preallocations in the destination file so that we don't
expose stale disk contents.  The previous strategy of trying to clear
the preallocations does not work if the destination file has the
PREALLOC flag set.

Uncovered by shared/010.

Reported-by: Zorro Lang <zlang@redhat.com>
Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=201259
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-05 19:04:27 +10:00
Darrick J. Wong 0d41e1d28c xfs: refactor clonerange preparation into a separate helper
Refactor all the reflink preparation steps into a separate helper
that we'll use to land all the upcoming fixes for insufficient input
checks.

This rework also moves the invalidation of the destination range to
the prep function so that it is done before the range is remapped.
This ensures that nobody can access the data in range being remapped
until the remap is complete.

[dgc: fix xfs_reflink_remap_prep() return value and caller check to
handle vfs_clone_file_prep_inodes() returning 0 to mean "nothing to
do". ]

[dgc: make sure length changed by vfs_clone_file_prep_inodes() gets
propagated back to XFS code that does the remapping. ]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-05 19:04:22 +10:00
Greg Kroah-Hartman 010bd965f9 overlayfs fixes for 4.19-rc7
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCW7YOCQAKCRDh3BK/laaZ
 PAmGAQC1/3+mkfHXqyDx+zv4f7A348ZULW26EWFWwMwlEljWowD+PlRFBgUu5v5c
 194mCApZU7hl6o0u07aijjz6m81e6QE=
 =dM89
 -----END PGP SIGNATURE-----

Merge tag 'ovl-fixes-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Miklos writes:
  "overlayfs fixes for 4.19-rc7

   This update fixes a couple of regressions in the stacked file update
   added in this cycle, as well as some older bugs uncovered by
   syzkaller.

   There's also one trivial naming change that touches other parts of
   the fs subsystem."

* tag 'ovl-fixes-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: fix format of setxattr debug
  ovl: fix access beyond unterminated strings
  ovl: make symbol 'ovl_aops' static
  vfs: swap names of {do,vfs}_clone_file_range()
  ovl: fix freeze protection bypass in ovl_clone_file_range()
  ovl: fix freeze protection bypass in ovl_write_iter()
  ovl: fix memory leak on unlink of indexed file
2018-10-04 13:24:38 -07:00