Commit Graph

709 Commits

Author SHA1 Message Date
Andy Shevchenko 807fc790aa btrfs: switch to use new generic UUID API
There are new types and helpers that are supposed to be used in new code.

As a preparation to get rid of legacy types and API functions do
the conversion here.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:47 +01:00
David Sterba 56e9f6ea32 btrfs: merge unlocking to common exit block in btrfs_commit_transaction
The tree_log_mutex and reloc_mutex locks are properly nested so we can
simplify error handling and add labels for them. This reduces line count
of the function.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:46 +01:00
Nikolay Borisov fe119a6eeb btrfs: switch to per-transaction pinned extents
This commit flips the switch to start tracking/processing pinned extents
on a per-transaction basis. It mostly replaces all references from
btrfs_fs_info::(pinned_extents|freed_extents[]) to
btrfs_transaction::pinned_extents.

Two notable modifications that warrant explicit mention are changing
clean_pinned_extents to get a reference to the previously running
transaction. The other one is removal of call to
btrfs_destroy_pinned_extent since transactions are going to be cleaned
in btrfs_cleanup_one_transaction.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:38 +01:00
David Sterba bf31f87f71 btrfs: add wrapper for transaction abort predicate
The status of aborted transaction can change between calls and it needs
to be accessed by READ_ONCE. Add a helper that also wraps the unlikely
hint.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:34 +01:00
Josef Bacik bc44d7c4b2 btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root
Now that all callers of btrfs_get_fs_root are subsequently calling
btrfs_grab_fs_root and handling dropping the ref when they are done
appropriately, go ahead and push btrfs_grab_fs_root up into
btrfs_get_fs_root.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:32 +01:00
Josef Bacik 5119cfc36f btrfs: hold a ref on the root in create_pending_snapshot
We create the snapshot and then use it for a bunch of things, we need to
hold a ref on it while we're messing with it.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:31 +01:00
Josef Bacik 3619c94f07 btrfs: open code btrfs_read_fs_root_no_name
All this does is call btrfs_get_fs_root() with check_ref == true.  Just
use btrfs_get_fs_root() so we don't have a bunch of different helpers
that do the same thing.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:26 +01:00
Jeff Mahoney 81f7eb00ff btrfs: destroy qgroup extent records on transaction abort
We clean up the delayed references when we abort a transaction but we
leave the pending qgroup extent records behind, leaking memory.

This patch destroys the extent records when we destroy the delayed refs
and makes sure ensure they're gone before releasing the transaction.

Fixes: 3368d001ba ("btrfs: qgroup: Record possible quota-related extent for qgroup.")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[ Rebased to latest upstream, remove to_qgroup() helper, use
  rbtree_postorder_for_each_entry_safe() wrapper ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-02-19 00:35:54 +01:00
Josef Bacik d62b23c949 btrfs: set trans->drity in btrfs_commit_transaction
If we abort a transaction we have the following sequence

if (!trans->dirty && list_empty(&trans->new_bgs))
	return;
WRITE_ONCE(trans->transaction->aborted, err);

The idea being if we didn't modify anything with our trans handle then
we don't really need to abort the whole transaction, maybe the other
trans handles are fine and we can carry on.

However in the case of create_snapshot we add a pending_snapshot object
to our transaction and then commit the transaction.  We don't actually
modify anything.  sync() behaves the same way, attach to an existing
transaction and commit it.  This means that if we have an IO error in
the right places we could abort the committing transaction with our
trans->dirty being not set and thus not set transaction->aborted.

This is a problem because in the create_snapshot() case we depend on
pending->error being set to something, or btrfs_commit_transaction
returning an error.

If we are not the trans handle that gets to commit the transaction, and
we're waiting on the commit to happen we get our return value from
cur_trans->aborted.  If this was not set to anything because sync() hit
an error in the transaction commit before it could modify anything then
cur_trans->aborted would be 0.  Thus we'd return 0 from
btrfs_commit_transaction() in create_snapshot.

This is a problem because we then try to do things with
pending_snapshot->snap, which will be NULL because we didn't create the
snapshot, and then we'll get a NULL pointer dereference like the
following

"BUG: kernel NULL pointer dereference, address: 00000000000001f0"
RIP: 0010:btrfs_orphan_cleanup+0x2d/0x330
Call Trace:
 ? btrfs_mksubvol.isra.31+0x3f2/0x510
 btrfs_mksubvol.isra.31+0x4bc/0x510
 ? __sb_start_write+0xfa/0x200
 ? mnt_want_write_file+0x24/0x50
 btrfs_ioctl_snap_create_transid+0x16c/0x1a0
 btrfs_ioctl_snap_create_v2+0x11e/0x1a0
 btrfs_ioctl+0x1534/0x2c10
 ? free_debug_processing+0x262/0x2a3
 do_vfs_ioctl+0xa6/0x6b0
 ? do_sys_open+0x188/0x220
 ? syscall_trace_enter+0x1f8/0x330
 ksys_ioctl+0x60/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x4a/0x1b0

In order to fix this we need to make sure anybody who calls
commit_transaction has trans->dirty set so that they properly set the
trans->transaction->aborted value properly so any waiters know bad
things happened.

This was found while I was running generic/475 with my modified
fsstress, it reproduced within a few runs.  I ran with this patch all
night and didn't see the problem again.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-23 17:24:37 +01:00
Josef Bacik 889bfa3908 btrfs: drop log root for dropped roots
If we fsync on a subvolume and create a log root for that volume, and
then later delete that subvolume we'll never clean up its log root.  Fix
this by making switch_commit_roots free the log for any dropped roots we
encounter.  The extra churn is because we need a btrfs_trans_handle, not
the btrfs_transaction.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-23 17:24:36 +01:00
David Sterba 32da5386d9 btrfs: rename btrfs_block_group_cache
The type name is misleading, a single entry is named 'cache' while this
normally means a collection of objects. Rename that everywhere. Also the
identifier was quite long, making function prototypes harder to format.

Suggested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 17:51:51 +01:00
Nikolay Borisov 8d510121bf btrfs: Rename btrfs_join_transaction_nolock
This function is used only during the final phase of freespace cache
writeout. This is necessary since using the plain btrfs_join_transaction
api is deadlock prone. The deadlock looks like:

T1:
btrfs_commit_transaction
  commit_cowonly_roots
    btrfs_write_dirty_block_groups
      btrfs_wait_cache_io
        __btrfs_wait_cache_io
       btrfs_wait_ordered_range <-- Triggers ordered IO for freespace
                                    inode and blocks transaction commit
				    until freespace cache writeout

T2: <-- after T1 has triggered the writeout
finish_ordered_fn
  btrfs_finish_ordered_io
    btrfs_join_transaction <--- this would block waiting for current
                                transaction to commit, but since trans
				commit is waiting for this writeout to
				finish

The special purpose functions prevents it by simply skipping the "wait
for writeout" since it's guaranteed the transaction won't proceed until
we are done.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 12:46:54 +01:00
Qu Wenruo 3296bf5624 btrfs: transaction: Cleanup unused TRANS_STATE_BLOCKED
The state was introduced in commit 4a9d8bdee3 ("Btrfs: make the state
of the transaction more readable"), then in commit 302167c50b
("btrfs: don't end the transaction for delayed refs in throttle") the
state is completely removed.

So we can just clean up the state since it's only compared but never
set.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 12:46:50 +01:00
Qu Wenruo 61c047b541 btrfs: transaction: describe transaction states and transitions
Add an overview of the basic btrfs transaction transitions, including
the following states:

- No transaction states
- Transaction N [[TRANS_STATE_RUNNING]]
- Transaction N [[TRANS_STATE_COMMIT_START]]
- Transaction N [[TRANS_STATE_COMMIT_DOING]]
- Transaction N [[TRANS_STATE_UNBLOCKED]]
- Transaction N [[TRANS_STATE_COMPLETED]]

For each state, the comment will include:

- Basic explaination about current state
- How to go next stage
- What will happen if we call various start_transaction() functions
- Relationship to transaction N+1

This doesn't provide tech details, but serves as a cheat sheet for
reader to get into the code a little easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 12:46:50 +01:00
Filipe Manana b9fae2ebee Btrfs: make btrfs_wait_extents() static
It's not used ouside of transaction.c

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 12:46:45 +01:00
Nikolay Borisov 35b814f3c5 btrfs: Add assert to catch nested transaction commit
A recent patch to btrfs showed that there was at least 1 case where a
nested transaction was committed. Nested transaction in this case means
a code which has a transaction handle calls some function which in turn
obtains a copy of the same transaction handle. In such cases the correct
thing to do is for the lower callee to call btrfs_end_transaction which
contains appropriate checks so as to not commit the transaction which
will result in stale trans handler for the caller.

To catch such cases add an assert in btrfs_commit_transaction ensuring
btrfs_trans_handle::use_count is always 1.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 12:46:45 +01:00
David Sterba 602cbe91fb btrfs: move cond_wake_up functions out of ctree
The file ctree.h serves as a header for everything and has become quite
bloated. Split some helpers that are generic and create a new file that
should be the catch-all for code that's not btrfs-specific.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09 14:59:15 +02:00
Josef Bacik 2bd36e7b4f btrfs: rename the btrfs_calc_*_metadata_size helpers
btrfs_calc_trunc_metadata_size differs from trans_metadata_size in that
it doesn't take into account any splitting at the levels, because
truncate will never split nodes.  However truncate _and_ changing will
never split nodes, so rename btrfs_calc_trunc_metadata_size to
btrfs_calc_metadata_size.  Also btrfs_calc_trans_metadata_size is purely
for inserting items, so rename this to btrfs_calc_insert_metadata_size.
Making these clearer will help when I start using them differently in
upcoming patches.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09 14:59:13 +02:00
Josef Bacik aac0023c21 btrfs: move basic block_group definitions to their own header
This is prep work for moving all of the block group cache code into its
own file.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor comment updates ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09 14:59:03 +02:00
Filipe Manana a6d155d2e3 Btrfs: fix deadlock between fiemap and transaction commits
The fiemap handler locks a file range that can have unflushed delalloc,
and after locking the range, it tries to attach to a running transaction.
If the running transaction started its commit, that is, it is in state
TRANS_STATE_COMMIT_START, and either the filesystem was mounted with the
flushoncommit option or the transaction is creating a snapshot for the
subvolume that contains the file that fiemap is operating on, we end up
deadlocking. This happens because fiemap is blocked on the transaction,
waiting for it to complete, and the transaction is waiting for the flushed
dealloc to complete, which requires locking the file range that the fiemap
task already locked. The following stack traces serve as an example of
when this deadlock happens:

  (...)
  [404571.515510] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
  [404571.515956] Call Trace:
  [404571.516360]  ? __schedule+0x3ae/0x7b0
  [404571.516730]  schedule+0x3a/0xb0
  [404571.517104]  lock_extent_bits+0x1ec/0x2a0 [btrfs]
  [404571.517465]  ? remove_wait_queue+0x60/0x60
  [404571.517832]  btrfs_finish_ordered_io+0x292/0x800 [btrfs]
  [404571.518202]  normal_work_helper+0xea/0x530 [btrfs]
  [404571.518566]  process_one_work+0x21e/0x5c0
  [404571.518990]  worker_thread+0x4f/0x3b0
  [404571.519413]  ? process_one_work+0x5c0/0x5c0
  [404571.519829]  kthread+0x103/0x140
  [404571.520191]  ? kthread_create_worker_on_cpu+0x70/0x70
  [404571.520565]  ret_from_fork+0x3a/0x50
  [404571.520915] kworker/u8:6    D    0 31651      2 0x80004000
  [404571.521290] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs]
  (...)
  [404571.537000] fsstress        D    0 13117  13115 0x00004000
  [404571.537263] Call Trace:
  [404571.537524]  ? __schedule+0x3ae/0x7b0
  [404571.537788]  schedule+0x3a/0xb0
  [404571.538066]  wait_current_trans+0xc8/0x100 [btrfs]
  [404571.538349]  ? remove_wait_queue+0x60/0x60
  [404571.538680]  start_transaction+0x33c/0x500 [btrfs]
  [404571.539076]  btrfs_check_shared+0xa3/0x1f0 [btrfs]
  [404571.539513]  ? extent_fiemap+0x2ce/0x650 [btrfs]
  [404571.539866]  extent_fiemap+0x2ce/0x650 [btrfs]
  [404571.540170]  do_vfs_ioctl+0x526/0x6f0
  [404571.540436]  ksys_ioctl+0x70/0x80
  [404571.540734]  __x64_sys_ioctl+0x16/0x20
  [404571.540997]  do_syscall_64+0x60/0x1d0
  [404571.541279]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  (...)
  [404571.543729] btrfs           D    0 14210  14208 0x00004000
  [404571.544023] Call Trace:
  [404571.544275]  ? __schedule+0x3ae/0x7b0
  [404571.544526]  ? wait_for_completion+0x112/0x1a0
  [404571.544795]  schedule+0x3a/0xb0
  [404571.545064]  schedule_timeout+0x1ff/0x390
  [404571.545351]  ? lock_acquire+0xa6/0x190
  [404571.545638]  ? wait_for_completion+0x49/0x1a0
  [404571.545890]  ? wait_for_completion+0x112/0x1a0
  [404571.546228]  wait_for_completion+0x131/0x1a0
  [404571.546503]  ? wake_up_q+0x70/0x70
  [404571.546775]  btrfs_wait_ordered_extents+0x27c/0x400 [btrfs]
  [404571.547159]  btrfs_commit_transaction+0x3b0/0xae0 [btrfs]
  [404571.547449]  ? btrfs_mksubvol+0x4a4/0x640 [btrfs]
  [404571.547703]  ? remove_wait_queue+0x60/0x60
  [404571.547969]  btrfs_mksubvol+0x605/0x640 [btrfs]
  [404571.548226]  ? __sb_start_write+0xd4/0x1c0
  [404571.548512]  ? mnt_want_write_file+0x24/0x50
  [404571.548789]  btrfs_ioctl_snap_create_transid+0x169/0x1a0 [btrfs]
  [404571.549048]  btrfs_ioctl_snap_create_v2+0x11d/0x170 [btrfs]
  [404571.549307]  btrfs_ioctl+0x133f/0x3150 [btrfs]
  [404571.549549]  ? mem_cgroup_charge_statistics+0x4c/0xd0
  [404571.549792]  ? mem_cgroup_commit_charge+0x84/0x4b0
  [404571.550064]  ? __handle_mm_fault+0xe3e/0x11f0
  [404571.550306]  ? do_raw_spin_unlock+0x49/0xc0
  [404571.550608]  ? _raw_spin_unlock+0x24/0x30
  [404571.550976]  ? __handle_mm_fault+0xedf/0x11f0
  [404571.551319]  ? do_vfs_ioctl+0xa2/0x6f0
  [404571.551659]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
  [404571.552087]  do_vfs_ioctl+0xa2/0x6f0
  [404571.552355]  ksys_ioctl+0x70/0x80
  [404571.552621]  __x64_sys_ioctl+0x16/0x20
  [404571.552864]  do_syscall_64+0x60/0x1d0
  [404571.553104]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  (...)

If we were joining the transaction instead of attaching to it, we would
not risk a deadlock because a join only blocks if the transaction is in a
state greater then or equals to TRANS_STATE_COMMIT_DOING, and the delalloc
flush performed by a transaction is done before it reaches that state,
when it is in the state TRANS_STATE_COMMIT_START. However a transaction
join is intended for use cases where we do modify the filesystem, and
fiemap only needs to peek at delayed references from the current
transaction in order to determine if extents are shared, and, besides
that, when there is no current transaction or when it blocks to wait for
a current committing transaction to complete, it creates a new transaction
without reserving any space. Such unnecessary transactions, besides doing
unnecessary IO, can cause transaction aborts (-ENOSPC) and unnecessary
rotation of the precious backup roots.

So fix this by adding a new transaction join variant, named join_nostart,
which behaves like the regular join, but it does not create a transaction
when none currently exists or after waiting for a committing transaction
to complete.

Fixes: 03628cdbc6 ("Btrfs: do not start a transaction during fiemap")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-30 18:25:12 +02:00
Filipe Manana cb2d3daddb Btrfs: fix race leading to fs corruption after transaction abort
When one transaction is finishing its commit, it is possible for another
transaction to start and enter its initial commit phase as well. If the
first ends up getting aborted, we have a small time window where the second
transaction commit does not notice that the previous transaction aborted
and ends up committing, writing a superblock that points to btrees that
reference extent buffers (nodes and leafs) that were not persisted to disk.
The consequence is that after mounting the filesystem again, we will be
unable to load some btree nodes/leafs, either because the content on disk
is either garbage (or just zeroes) or corresponds to the old content of a
previouly COWed or deleted node/leaf, resulting in the well known error
messages "parent transid verify failed on ...".
The following sequence diagram illustrates how this can happen.

        CPU 1                                           CPU 2

 <at transaction N>

 btrfs_commit_transaction()
   (...)
   --> sets transaction state to
       TRANS_STATE_UNBLOCKED
   --> sets fs_info->running_transaction
       to NULL

                                                    (...)
                                                    btrfs_start_transaction()
                                                      start_transaction()
                                                        wait_current_trans()
                                                          --> returns immediately
                                                              because
                                                              fs_info->running_transaction
                                                              is NULL
                                                        join_transaction()
                                                          --> creates transaction N + 1
                                                          --> sets
                                                              fs_info->running_transaction
                                                              to transaction N + 1
                                                          --> adds transaction N + 1 to
                                                              the fs_info->trans_list list
                                                        --> returns transaction handle
                                                            pointing to the new
                                                            transaction N + 1
                                                    (...)

                                                    btrfs_sync_file()
                                                      btrfs_start_transaction()
                                                        --> returns handle to
                                                            transaction N + 1
                                                      (...)

   btrfs_write_and_wait_transaction()
     --> writeback of some extent
         buffer fails, returns an
	 error
   btrfs_handle_fs_error()
     --> sets BTRFS_FS_STATE_ERROR in
         fs_info->fs_state
   --> jumps to label "scrub_continue"
   cleanup_transaction()
     btrfs_abort_transaction(N)
       --> sets BTRFS_FS_STATE_TRANS_ABORTED
           flag in fs_info->fs_state
       --> sets aborted field in the
           transaction and transaction
	   handle structures, for
           transaction N only
     --> removes transaction from the
         list fs_info->trans_list
                                                      btrfs_commit_transaction(N + 1)
                                                        --> transaction N + 1 was not
							    aborted, so it proceeds
                                                        (...)
                                                        --> sets the transaction's state
                                                            to TRANS_STATE_COMMIT_START
                                                        --> does not find the previous
                                                            transaction (N) in the
                                                            fs_info->trans_list, so it
                                                            doesn't know that transaction
                                                            was aborted, and the commit
                                                            of transaction N + 1 proceeds
                                                        (...)
                                                        --> sets transaction N + 1 state
                                                            to TRANS_STATE_UNBLOCKED
                                                        btrfs_write_and_wait_transaction()
                                                          --> succeeds writing all extent
                                                              buffers created in the
                                                              transaction N + 1
                                                        write_all_supers()
                                                           --> succeeds
                                                           --> we now have a superblock on
                                                               disk that points to trees
                                                               that refer to at least one
                                                               extent buffer that was
                                                               never persisted

So fix this by updating the transaction commit path to check if the flag
BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state if after setting
the transaction to the TRANS_STATE_COMMIT_START we do not find any previous
transaction in the fs_info->trans_list. If the flag is set, just fail the
transaction commit with -EROFS, as we do in other places. The exact error
code for the previous transaction abort was already logged and reported.

Fixes: 49b25e0540 ("btrfs: enhance transaction abort infrastructure")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-30 18:25:12 +02:00
Josef Bacik fb6dea2660 btrfs: migrate btrfs_trans_release_chunk_metadata
Move this into transaction.c with the rest of the transaction related
code.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-04 17:26:17 +02:00
Filipe Manana 74f657d89c Btrfs: remove no longer used member num_dirty_bgs from transaction
The member num_dirty_bgs of struct btrfs_transaction is not used anymore,
it is set and incremented but nothing reads its value anymore. Its last
read use was removed by commit 64403612b7 ("btrfs: rework
btrfs_check_space_for_delayed_refs"). So just remove that member.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:43 +02:00
David Sterba 2b584c688b btrfs: get fs_info from trans in btrfs_run_dev_replace
We can read fs_info from the transaction and can drop it from the
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:43 +02:00
David Sterba 196c9d8de8 btrfs: get fs_info from trans in btrfs_run_dev_stats
We can read fs_info from the transaction and can drop it from the
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:43 +02:00
David Sterba 5742d15fa7 btrfs: get fs_info from trans in btrfs_write_dirty_block_groups
We can read fs_info from the transaction and can drop it from the
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:40 +02:00
David Sterba bbebb3e0ba btrfs: get fs_info from trans in btrfs_setup_space_cache
We can read fs_info from the transaction and can drop it from the
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:40 +02:00
Jeff Mahoney 1c11b63eff btrfs: replace pending/pinned chunks lists with io tree
The pending chunks list contains chunks that are allocated in the
current transaction but haven't been created yet. The pinned chunks
list contains chunks that are being released in the current transaction.
Both describe chunks that are not reflected on disk as in use but are
unavailable just the same.

The pending chunks list is anchored by the transaction handle, which
means that we need to hold a reference to a transaction when working
with the list.

The way we use them is by iterating over both lists to perform
comparisons on the stripes they describe for each device. This is
backwards and requires that we keep a transaction handle open while
we're trimming.

This patchset adds an extent_io_tree to btrfs_device that maintains
the allocation state of the device.  Extents are set dirty when
chunks are first allocated -- when the extent maps are added to the
mapping tree. They're cleared when last removed -- when the extent
maps are removed from the mapping tree. This matches the lifespan
of the pending and pinned chunks list and allows us to do trims
on unallocated space safely without pinning the transaction for what
may be a lengthy operation. We can also use this io tree to mark
which chunks have already been trimmed so we don't repeat the operation.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:37 +02:00
Nikolay Borisov 41e7acd38c btrfs: Rename and export clear_btree_io_tree
This function is going to be used to clear out the device extent
allocation information. Give it a more generic name and export it. This
is in preparation to replacing the pending/pinned chunk lists with an
extent tree. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:36 +02:00
Nikolay Borisov bbbf7243d6 btrfs: combine device update operations during transaction commit
We currently overload the pending_chunks list to handle updating
btrfs_device->commit_bytes used.  We don't actually care about the
extent mapping or even the device mapping for the chunk - we just need
the device, and we can end up processing it multiple times.  The
fs_devices->resized_list does more or less the same thing, but with the
disk size.  They are called consecutively during commit and have more or
less the same purpose.

We can combine the two lists into a single list that attaches to the
transaction and contains a list of devices that need updating.  Since we
always add the device to a list when we change bytes_used or
disk_total_size, there's no harm in copying both values at once.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:36 +02:00
Qu Wenruo 43eb5f2975 btrfs: Introduce extent_io_tree::owner to distinguish different io_trees
Btrfs has the following different extent_io_trees used:

- fs_info::free_extents[2]
- btrfs_inode::io_tree - for both normal inodes and the btree inode
- btrfs_inode::io_failure_tree
- btrfs_transaction::dirty_pages
- btrfs_root::dirty_log_pages

If we want to trace changes in those trees, it will be pretty hard to
distinguish them.

Instead of using hard-to-read pointer address, this patch will introduce
a new member extent_io_tree::owner to track the owner.

This modification needs all the callers of extent_io_tree_init() to
accept a new parameter @owner.

This patch provides the basis for later trace events.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:18 +02:00
Qu Wenruo c258d6e364 btrfs: Introduce fs_info to extent_io_tree
This patch will add a new member fs_info to extent_io_tree.

This provides the basis for later trace events to distinguish the output
between different btrfs filesystems. While this increases the size of
the structure, we want to know the source of the trace events and
passing the fs_info as an argument to all contexts is not possible.

The selftests are now allowed to set it to NULL as they don't use the
tracepoints.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:18 +02:00
Filipe Manana 609e804d77 Btrfs: fix file corruption after snapshotting due to mix of buffered/DIO writes
When we are mixing buffered writes with direct IO writes against the same
file and snapshotting is happening concurrently, we can end up with a
corrupt file content in the snapshot. Example:

1) Inode/file is empty.

2) Snapshotting starts.

2) Buffered write at offset 0 length 256Kb. This updates the i_size of the
   inode to 256Kb, disk_i_size remains zero. This happens after the task
   doing the snapshot flushes all existing delalloc.

3) DIO write at offset 256Kb length 768Kb. Once the ordered extent
   completes it sets the inode's disk_i_size to 1Mb (256Kb + 768Kb) and
   updates the inode item in the fs tree with a size of 1Mb (which is
   the value of disk_i_size).

4) The dealloc for the range [0, 256Kb[ did not start yet.

5) The transaction used in the DIO ordered extent completion, which updated
   the inode item, is committed by the snapshotting task.

6) Snapshot creation completes.

7) Dealloc for the range [0, 256Kb[ is flushed.

After that when reading the file from the snapshot we always get zeroes for
the range [0, 256Kb[, the file has a size of 1Mb and the data written by
the direct IO write is found. From an application's point of view this is
a corruption, since in the source subvolume it could never read a version
of the file that included the data from the direct IO write without the
data from the buffered write included as well. In the snapshot's tree,
file extent items are missing for the range [0, 256Kb[.

The issue, obviously, does not happen when using the -o flushoncommit
mount option.

Fix this by flushing delalloc for all the roots that are about to be
snapshotted when committing a transaction. This guarantees total ordering
when updating the disk_i_size of an inode since the flush for dealloc is
done when a transaction is in the TRANS_STATE_COMMIT_START state and wait
is done once no more external writers exist. This is similar to what we
do when using the flushoncommit mount option, but we do it only if the
transaction has snapshots to create and only for the roots of the
subvolumes to be snapshotted. The bulk of the dealloc is flushed in the
snapshot creation ioctl, so the flush work we do inside the transaction
is minimized.

This issue, involving buffered and direct IO writes with snapshotting, is
often triggered by fstest btrfs/078, and got reported by fsck when not
using the NO_HOLES features, for example:

  $ cat results/btrfs/078.full
  (...)
  _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
  *** fsck.btrfs output ***
  [1/7] checking root items
  [2/7] checking extents
  [3/7] checking free space cache
  [4/7] checking fs roots
  root 258 inode 264 errors 100, file extent discount
  Found file extent holes:
        start: 524288, len: 65536
  ERROR: errors found in fs roots

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-13 17:13:48 +01:00
David Sterba 8bead25820 btrfs: open code now trivial btrfs_set_lock_blocking
btrfs_set_lock_blocking is now only a simple wrapper around
btrfs_set_lock_blocking_write. The name does not bring any semantic
value that could not be inferred from the new function so there's no
point keeping it.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:27 +01:00
Qu Wenruo 370a11b811 btrfs: qgroup: Introduce per-root swapped blocks infrastructure
To allow delayed subtree swap rescan, btrfs needs to record per-root
information about which tree blocks get swapped.  This patch introduces
the required infrastructure.

The designed workflow will be:

1) Record the subtree root block that gets swapped.

   During subtree swap:
   O = Old tree blocks
   N = New tree blocks
         reloc tree                         subvolume tree X
            Root                               Root
           /    \                             /    \
         NA     OB                          OA      OB
       /  |     |  \                      /  |      |  \
     NC  ND     OE  OF                   OC  OD     OE  OF

  In this case, NA and OA are going to be swapped, record (NA, OA) into
  subvolume tree X.

2) After subtree swap.
         reloc tree                         subvolume tree X
            Root                               Root
           /    \                             /    \
         OA     OB                          NA      OB
       /  |     |  \                      /  |      |  \
     OC  OD     OE  OF                   NC  ND     OE  OF

3a) COW happens for OB
    If we are going to COW tree block OB, we check OB's bytenr against
    tree X's swapped_blocks structure.
    If it doesn't fit any, nothing will happen.

3b) COW happens for NA
    Check NA's bytenr against tree X's swapped_blocks, and get a hit.
    Then we do subtree scan on both subtrees OA and NA.
    Resulting 6 tree blocks to be scanned (OA, OC, OD, NA, NC, ND).

    Then no matter what we do to subvolume tree X, qgroup numbers will
    still be correct.
    Then NA's record gets removed from X's swapped_blocks.

4)  Transaction commit
    Any record in X's swapped_blocks gets removed, since there is no
    modification to swapped subtrees, no need to trigger heavy qgroup
    subtree rescan for them.

This will introduce 128 bytes overhead for each btrfs_root even qgroup
is not enabled. This is to reduce memory allocations and potential
failures.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:26 +01:00
Josef Bacik 119e80df7d btrfs: call btrfs_create_pending_block_groups unconditionally
The first thing we do is loop through the list, this

if (!list_empty())
	btrfs_create_pending_block_groups();

thing is just wasted space.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:25 +01:00
David Sterba c7cc64a985 btrfs: clean up pending block groups when transaction commit aborts
The fstests generic/475 stresses transaction aborts and can reveal
space accounting or use-after-free bugs regarding block goups.

In this case the pending block groups that remain linked to the
structures after transaction commit aborts in the middle.

The corrupted slabs lead to failures in following tests, eg. generic/476

  [ 8172.752887] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
  [ 8172.755799] #PF error: [normal kernel read fault]
  [ 8172.757571] PGD 661ae067 P4D 661ae067 PUD 3db8e067 PMD 0
  [ 8172.759000] Oops: 0000 [#1] PREEMPT SMP
  [ 8172.760209] CPU: 0 PID: 39 Comm: kswapd0 Tainted: G        W         5.0.0-rc2-default #408
  [ 8172.762495] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
  [ 8172.765772] RIP: 0010:shrink_page_list+0x2f9/0xe90
  [ 8172.770453] RSP: 0018:ffff967f00663b18 EFLAGS: 00010287
  [ 8172.771184] RAX: 0000000000000000 RBX: ffff967f00663c20 RCX: 0000000000000000
  [ 8172.772850] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8c0620ab20e0
  [ 8172.774629] RBP: ffff967f00663dd8 R08: 0000000000000000 R09: 0000000000000000
  [ 8172.776094] R10: ffff8c0620ab22f8 R11: ffff8c063f772688 R12: ffff967f00663b78
  [ 8172.777533] R13: ffff8c063f625600 R14: ffff8c063f625608 R15: dead000000000200
  [ 8172.778886] FS:  0000000000000000(0000) GS:ffff8c063d400000(0000) knlGS:0000000000000000
  [ 8172.780545] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 8172.781787] CR2: 0000000000000058 CR3: 000000004e962000 CR4: 00000000000006f0
  [ 8172.783547] Call Trace:
  [ 8172.784112]  shrink_inactive_list+0x194/0x410
  [ 8172.784747]  shrink_node_memcg.constprop.85+0x3a5/0x6a0
  [ 8172.785472]  shrink_node+0x62/0x1e0
  [ 8172.786011]  balance_pgdat+0x216/0x460
  [ 8172.786577]  kswapd+0xe3/0x4a0
  [ 8172.787085]  ? finish_wait+0x80/0x80
  [ 8172.787795]  ? balance_pgdat+0x460/0x460
  [ 8172.788799]  kthread+0x116/0x130
  [ 8172.789640]  ? kthread_create_on_node+0x60/0x60
  [ 8172.790323]  ret_from_fork+0x24/0x30
  [ 8172.794253] CR2: 0000000000000058

or accounting errors at umount time:

  [ 8159.537251] WARNING: CPU: 2 PID: 19031 at fs/btrfs/extent-tree.c:5987 btrfs_free_block_groups+0x3d5/0x410 [btrfs]
  [ 8159.543325] CPU: 2 PID: 19031 Comm: umount Tainted: G        W         5.0.0-rc2-default #408
  [ 8159.545472] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
  [ 8159.548155] RIP: 0010:btrfs_free_block_groups+0x3d5/0x410 [btrfs]
  [ 8159.554030] RSP: 0018:ffff967f079cbde8 EFLAGS: 00010206
  [ 8159.555144] RAX: 0000000001000000 RBX: ffff8c06366cf800 RCX: 0000000000000000
  [ 8159.556730] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8c06255ad800
  [ 8159.558279] RBP: ffff8c0637ac0000 R08: 0000000000000001 R09: 0000000000000000
  [ 8159.559797] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8c0637ac0108
  [ 8159.561296] R13: ffff8c0637ac0158 R14: 0000000000000000 R15: dead000000000100
  [ 8159.562852] FS:  00007f7f693b9fc0(0000) GS:ffff8c063d800000(0000) knlGS:0000000000000000
  [ 8159.564839] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 8159.566160] CR2: 00007f7f68fab7b0 CR3: 000000000aec7000 CR4: 00000000000006e0
  [ 8159.567898] Call Trace:
  [ 8159.568597]  close_ctree+0x17f/0x350 [btrfs]
  [ 8159.569628]  generic_shutdown_super+0x64/0x100
  [ 8159.570808]  kill_anon_super+0x14/0x30
  [ 8159.571857]  btrfs_kill_super+0x12/0xa0 [btrfs]
  [ 8159.573063]  deactivate_locked_super+0x29/0x60
  [ 8159.574234]  cleanup_mnt+0x3b/0x70
  [ 8159.575176]  task_work_run+0x98/0xc0
  [ 8159.576177]  exit_to_usermode_loop+0x83/0x90
  [ 8159.577315]  do_syscall_64+0x15b/0x180
  [ 8159.578339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

This fix is based on 2 Josef's patches that used sideefects of
btrfs_create_pending_block_groups, this fix introduces the helper that
does what we need.

CC: stable@vger.kernel.org # 4.4+
CC: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-30 18:16:47 +01:00
Josef Bacik 302167c50b btrfs: don't end the transaction for delayed refs in throttle
Previously callers to btrfs_end_transaction_throttle() would commit the
transaction if there wasn't enough delayed refs space.  This happens in
relocation, and if the fs is relatively empty we'll run out of delayed
refs space basically immediately, so we'll just be stuck in this loop of
committing the transaction over and over again.

This code existed because we didn't have a good feedback mechanism for
running delayed refs, but with the delayed refs rsv we do now.  Delete
this throttling code and let the btrfs_start_transaction() in relocation
deal with putting pressure on the delayed refs infrastructure.  With
this patch we no longer take 5 minutes to balance a metadata only fs.

Qu has submitted a fstest to catch slow balance or excessive transaction
commits. Steps to reproduce:

* create subvolume
* create many (eg. 16000) inlined files, of size 2KiB
* iteratively snapshot and touch several files to trigger metadata
  updates
* start balance -m

Reported-by: Qu Wenruo <wqu@suse.com>
Fixes: 64403612b7 ("btrfs: rework btrfs_check_space_for_delayed_refs")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add tags and steps to reproduce ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-28 15:41:11 +01:00
Andrea Gelmini 52042d8e82 btrfs: Fix typos in comments and strings
The typos accumulate over time so once in a while time they get fixed in
a large patch.

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:50 +01:00
Josef Bacik db2462a6ad btrfs: don't run delayed refs in the end transaction logic
Over the years we have built up a lot of infrastructure to keep delayed
refs in check, mostly by running them at btrfs_end_transaction() time.
We have a lot of different maths we do to figure out how much, if we
should do it inline or async, etc.  This existed because we had no
feedback mechanism to force the flushing of delayed refs when they
became a problem.  However with the enospc flushing infrastructure in
place for flushing delayed refs when they put too much pressure on the
enospc system we have this problem solved.  Rip out all of this code as
it is no longer needed.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:47 +01:00
Josef Bacik 64403612b7 btrfs: rework btrfs_check_space_for_delayed_refs
Now with the delayed_refs_rsv we can now know exactly how much pending
delayed refs space we need.  This means we can drastically simplify
btrfs_check_space_for_delayed_refs by simply checking how much space we
have reserved for the global rsv (which acts as a spill over buffer) and
the delayed refs rsv.  If our total size is beyond that amount then we
know it's time to commit the transaction and stop any more delayed refs
from being generated.

With the introduction of dealyed_refs_rsv infrastructure, namely
btrfs_update_delayed_refs_rsv we now know exactly how much pending
delayed refs space is required.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:47 +01:00
Josef Bacik ba2c4d4e3b btrfs: introduce delayed_refs_rsv
Traditionally we've had voodoo in btrfs to account for the space that
delayed refs may take up by having a global_block_rsv.  This works most
of the time, except when it doesn't.  We've had issues reported and seen
in production where sometimes the global reserve is exhausted during
transaction commit before we can run all of our delayed refs, resulting
in an aborted transaction.  Because of this voodoo we have equally
dubious flushing semantics around throttling delayed refs which we often
get wrong.

So instead give them their own block_rsv.  This way we can always know
exactly how much outstanding space we need for delayed refs.  This
allows us to make sure we are constantly filling that reservation up
with space, and allows us to put more precise pressure on the enospc
system.  Instead of doing math to see if its a good time to throttle,
the normal enospc code will be invoked if we have a lot of delayed refs
pending, and they will be run via the normal flushing mechanism.

For now the delayed_refs_rsv will hold the reservations for the delayed
refs, the block group updates, and deleting csums.  We could have a
separate rsv for the block group updates, but the csum deletion stuff is
still handled via the delayed_refs so that will stay there.

Historical background:

The global reserve has grown to cover everything we don't reserve space
explicitly for, and we've grown a lot of weird ad-hoc heuristics to know
if we're running short on space and when it's time to force a commit.  A
failure rate of 20-40 file systems when we run hundreds of thousands of
them isn't super high, but cleaning up this code will make things less
ugly and more predictible.

Thus the delayed refs rsv.  We always know how many delayed refs we have
outstanding, and although running them generates more we can use the
global reserve for that spill over, which fits better into it's desired
use than a full blown reservation.  This first approach is to simply
take how many times we're reserving space for and multiply that by 2 in
order to save enough space for the delayed refs that could be generated.
This is a niave approach and will probably evolve, but for now it works.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com> # high-level review
[ added background notes from the cover letter ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:46 +01:00
Lu Fengqi 2ab4fd3135 btrfs: cleanup the useless DEFINE_WAIT in cleanup_transaction
When it was introduced in commit f094ac32ab ("Btrfs: fix NULL pointer
after aborting a transaction"), it was not used.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:42 +01:00
Filipe Manana 85dd506c8e Btrfs: remove no longer used stuff for tracking pending ordered extents
Tracking pending ordered extents per transaction was introduced in commit
50d9aa99bd ("Btrfs: make sure logged extents complete in the current
transaction V3") and later updated in commit 161c3549b4 ("Btrfs: change
how we wait for pending ordered extents").

However now that on fsync we always wait for ordered extents to complete
before logging, done in commit 5636cf7d6d ("btrfs: remove the logged
extents infrastructure"), we no longer need the stuff to track for pending
ordered extents, which was not completely removed in the mentioned commit.
So remove the remaining of the pending ordered extents infrastructure.

Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:25 +01:00
Josef Bacik 30928e9baa btrfs: don't run delayed_iputs in commit
This could result in a really bad case where we do something like

evict
  evict_refill_and_join
    btrfs_commit_transaction
      btrfs_run_delayed_iputs
        evict
          evict_refill_and_join
            btrfs_commit_transaction
... forever

We have plenty of other places where we run delayed iputs that are much
safer, let those do the work.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-19 12:20:03 +02:00
Lu Fengqi 7c8616278b btrfs: remove fs_info from btrfs_should_throttle_delayed_refs
The avg_delayed_ref_runtime can be referenced from the transaction
handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:41 +02:00
Lu Fengqi af9b8a0e20 btrfs: remove fs_info from btrfs_check_space_for_delayed_refs
It can be referenced from the transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:41 +02:00
Josef Bacik f45c752b65 btrfs: release metadata before running delayed refs
We want to release the unused reservation we have since it refills the
delayed refs reserve, which will make everything go smoother when
running the delayed refs if we're short on our reservation.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:38 +02:00
David Sterba 9f6cbcbb09 btrfs: open code btrfs_after_dev_replace_commit
Too trivial, the purpose can be simply documented in a comment.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:37 +02:00
Liu Bo 5c9d028b3b Btrfs: delayed-refs: use rb_first_cached for href_root
rb_first_cached() trades an extra pointer "leftmost" for doing the same
job as rb_first() but in O(1).

Functions manipulating href_root need to get the first entry, this
converts href_root to use rb_first_cached().

This patch is first in the sequenct of similar updates to other rbtrees
and this is analysis of the expected behaviour and improvements.

There's a common pattern:

while (node = rb_first) {
        entry = rb_entry(node)
        next = rb_next(node)
        rb_erase(node)
        cleanup(entry)
}

rb_first needs to traverse the tree up to logN depth, rb_erase can
completely reshuffle the tree. With the caching we'll skip the traversal
in rb_first.  That's a cached memory access vs looped pointer
dereference trade-off that IMHO has a clear winner.

Measurements show there's not much difference in a sample tree with
10000 nodes: 4.5s / rb_first and 4.8s / rb_first_cached. Real effects of
caching and pointer chasing are unpredictable though.

Further optimzations can be done to avoid the expensive rb_erase step.
In some cases it's ok to process the nodes in any order, so the tree can
be traversed in post-order, not rebalancing the children nodes and just
calling free. Care must be taken regarding the next node.

Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog from mail discussions ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:33 +02:00
Misono Tomohiro 4fd786e6c3 btrfs: Remove 'objectid' member from struct btrfs_root
There are two members in struct btrfs_root which indicate root's
objectid: objectid and root_key.objectid.

They are both set to the same value in __setup_root():

  static void __setup_root(struct btrfs_root *root,
                           struct btrfs_fs_info *fs_info,
                           u64 objectid)
  {
    ...
    root->objectid = objectid;
    ...
    root->root_key.objectid = objecitd;
    ...
  }

and not changed to other value after initialization.

grep in btrfs directory shows both are used in many places:
  $ grep -rI "root->root_key.objectid" | wc -l
  133
  $ grep -rI "root->objectid" | wc -l
  55
 (4.17, inc. some noise)

It is confusing to have two similar variable names and it seems
that there is no rule about which should be used in a certain case.

Since ->root_key itself is needed for tree reloc tree, let's remove
'objecitd' member and unify code to use ->root_key.objectid in all places.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:25 +02:00
Lu Fengqi 684572df94 btrfs: Remove root parameter from btrfs_insert_dir_item
All callers pass the root tree of dir, we can push that down to the
function itself.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:25 +02:00
Al Viro 8d9e220ca0 btrfs: simplify IS_ERR/PTR_ERR checks
IS_ERR(p) && PTR_ERR(p) == n is a weird way to spell p == ERR_PTR(n).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:02 +02:00
Lu Fengqi 6025c19fb2 btrfs: Remove fs_info from btrfs_add_root_ref
It can be referenced from the passed transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:00 +02:00
Lu Fengqi a937742250 btrfs: qgroup: Drop fs_info parameter from btrfs_qgroup_inherit
It can be fetched from the transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:53 +02:00
Lu Fengqi 280f8bd2cb btrfs: qgroup: Drop fs_info parameter from btrfs_run_qgroups
It can be fetched from the transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:52 +02:00
Arnd Bergmann afd48513f0 btrfs: use monotonic time for transaction handling
The transaction times were changed to ktime_get_real_seconds to avoid
the y2038 overflow, but they still have a minor problem when they go
backwards or jump due to settimeofday() or leap seconds.

This changes the transaction handling to instead use ktime_get_seconds(),
which returns a CLOCK_MONOTONIC timestamp that has neither of those
problems.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:38 +02:00
Allen Pais a944442c2b btrfs: replace get_seconds with new 64bit time API
The get_seconds() function is deprecated as it truncates the timestamp
to 32 bits. Change it to or ktime_get_real_seconds().

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:29 +02:00
Linus Torvalds 7a932516f5 vfs/y2038: inode timestamps conversion to timespec64
This is a late set of changes from Deepa Dinamani doing an automated
 treewide conversion of the inode and iattr structures from 'timespec'
 to 'timespec64', to push the conversion from the VFS layer into the
 individual file systems.
 
 There were no conflicts between this and the contents of linux-next
 until just before the merge window, when we saw multiple problems:
 
 - A minor conflict with my own y2038 fixes, which I could address
   by adding another patch on top here.
 - One semantic conflict with late changes to the NFS tree. I addressed
   this by merging Deepa's original branch on top of the changes that
   now got merged into mainline and making sure the merge commit includes
   the necessary changes as produced by coccinelle.
 - A trivial conflict against the removal of staging/lustre.
 - Multiple conflicts against the VFS changes in the overlayfs tree.
   These are still part of linux-next, but apparently this is no longer
   intended for 4.18 [1], so I am ignoring that part.
 
 As Deepa writes:
 
   The series aims to switch vfs timestamps to use struct timespec64.
   Currently vfs uses struct timespec, which is not y2038 safe.
 
   The series involves the following:
   1. Add vfs helper functions for supporting struct timepec64 timestamps.
   2. Cast prints of vfs timestamps to avoid warnings after the switch.
   3. Simplify code using vfs timestamps so that the actual
      replacement becomes easy.
   4. Convert vfs timestamps to use struct timespec64 using a script.
      This is a flag day patch.
 
   Next steps:
   1. Convert APIs that can handle timespec64, instead of converting
      timestamps at the boundaries.
   2. Update internal data structures to avoid timestamp conversions.
 
 Thomas Gleixner adds:
 
   I think there is no point to drag that out for the next merge window.
   The whole thing needs to be done in one go for the core changes which
   means that you're going to play that catchup game forever. Let's get
   over with it towards the end of the merge window.
 
 [1] https://www.spinics.net/lists/linux-fsdevel/msg128294.html
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbInZAAAoJEGCrR//JCVInReoQAIlVIIMt5ZX6wmaKbrjy9Itf
 MfgbFihQ/djLnuSPVQ3nztcxF0d66BKHZ9puVjz6+mIHqfDvJTRwZs9nU+sOF/T1
 g78fRkM1cxq6ZCkGYAbzyjyo5aC4PnSMP/NQLmwqvi0MXqqrbDoq5ZdP9DHJw39h
 L9lD8FM/P7T29Fgp9tq/pT5l9X8VU8+s5KQG1uhB5hii4VL6pD6JyLElDita7rg+
 Z7/V7jkxIGEUWF7vGaiR1QTFzEtpUA/exDf9cnsf51OGtK/LJfQ0oiZPPuq3oA/E
 LSbt8YQQObc+dvfnGxwgxEg1k5WP5ekj/Wdibv/+rQKgGyLOTz6Q4xK6r8F2ahxs
 nyZQBdXqHhJYyKr1H1reUH3mrSgQbE5U5R1i3My0xV2dSn+vtK5vgF21v2Ku3A1G
 wJratdtF/kVBzSEQUhsYTw14Un+xhBLRWzcq0cELonqxaKvRQK9r92KHLIWNE7/v
 c0TmhFbkZA+zR8HdsaL3iYf1+0W/eYy8PcvepyldKNeW2pVk3CyvdTfY2Z87G2XK
 tIkK+BUWbG3drEGG3hxZ3757Ln3a9qWyC5ruD3mBVkuug/wekbI8PykYJS7Mx4s/
 WNXl0dAL0Eeu1M8uEJejRAe1Q3eXoMWZbvCYZc+wAm92pATfHVcKwPOh8P7NHlfy
 A3HkjIBrKW5AgQDxfgvm
 =CZX2
 -----END PGP SIGNATURE-----

Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground

Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
 "This is a late set of changes from Deepa Dinamani doing an automated
  treewide conversion of the inode and iattr structures from 'timespec'
  to 'timespec64', to push the conversion from the VFS layer into the
  individual file systems.

  As Deepa writes:

   'The series aims to switch vfs timestamps to use struct timespec64.
    Currently vfs uses struct timespec, which is not y2038 safe.

    The series involves the following:
    1. Add vfs helper functions for supporting struct timepec64
       timestamps.
    2. Cast prints of vfs timestamps to avoid warnings after the switch.
    3. Simplify code using vfs timestamps so that the actual replacement
       becomes easy.
    4. Convert vfs timestamps to use struct timespec64 using a script.
       This is a flag day patch.

    Next steps:
    1. Convert APIs that can handle timespec64, instead of converting
       timestamps at the boundaries.
    2. Update internal data structures to avoid timestamp conversions'

  Thomas Gleixner adds:

   'I think there is no point to drag that out for the next merge
    window. The whole thing needs to be done in one go for the core
    changes which means that you're going to play that catchup game
    forever. Let's get over with it towards the end of the merge window'"

* tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
  pstore: Remove bogus format string definition
  vfs: change inode times to use struct timespec64
  pstore: Convert internal records to timespec64
  udf: Simplify calls to udf_disk_stamp_to_time
  fs: nfs: get rid of memcpys for inode times
  ceph: make inode time prints to be long long
  lustre: Use long long type to print inode time
  fs: add timespec64_truncate()
2018-06-15 07:31:07 +09:00
Deepa Dinamani 95582b0083 vfs: change inode times to use struct timespec64
struct timespec is not y2038 safe. Transition vfs to use
y2038 safe struct timespec64 instead.

The change was made with the help of the following cocinelle
script. This catches about 80% of the changes.
All the header file and logic changes are included in the
first 5 rules. The rest are trivial substitutions.
I avoid changing any of the function signatures or any other
filesystem specific data structures to keep the patch simple
for review.

The script can be a little shorter by combining different cases.
But, this version was sufficient for my usecase.

virtual patch

@ depends on patch @
identifier now;
@@
- struct timespec
+ struct timespec64
  current_time ( ... )
  {
- struct timespec now = current_kernel_time();
+ struct timespec64 now = current_kernel_time64();
  ...
- return timespec_trunc(
+ return timespec64_trunc(
  ... );
  }

@ depends on patch @
identifier xtime;
@@
 struct \( iattr \| inode \| kstat \) {
 ...
-       struct timespec xtime;
+       struct timespec64 xtime;
 ...
 }

@ depends on patch @
identifier t;
@@
 struct inode_operations {
 ...
int (*update_time) (...,
-       struct timespec t,
+       struct timespec64 t,
...);
 ...
 }

@ depends on patch @
identifier t;
identifier fn_update_time =~ "update_time$";
@@
 fn_update_time (...,
- struct timespec *t,
+ struct timespec64 *t,
 ...) { ... }

@ depends on patch @
identifier t;
@@
lease_get_mtime( ... ,
- struct timespec *t
+ struct timespec64 *t
  ) { ... }

@te depends on patch forall@
identifier ts;
local idexpression struct inode *inode_node;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn_update_time =~ "update_time$";
identifier fn;
expression e, E3;
local idexpression struct inode *node1;
local idexpression struct inode *node2;
local idexpression struct iattr *attr1;
local idexpression struct iattr *attr2;
local idexpression struct iattr attr;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
@@
(
(
- struct timespec ts;
+ struct timespec64 ts;
|
- struct timespec ts = current_time(inode_node);
+ struct timespec64 ts = current_time(inode_node);
)

<+... when != ts
(
- timespec_equal(&inode_node->i_xtime, &ts)
+ timespec64_equal(&inode_node->i_xtime, &ts)
|
- timespec_equal(&ts, &inode_node->i_xtime)
+ timespec64_equal(&ts, &inode_node->i_xtime)
|
- timespec_compare(&inode_node->i_xtime, &ts)
+ timespec64_compare(&inode_node->i_xtime, &ts)
|
- timespec_compare(&ts, &inode_node->i_xtime)
+ timespec64_compare(&ts, &inode_node->i_xtime)
|
ts = current_time(e)
|
fn_update_time(..., &ts,...)
|
inode_node->i_xtime = ts
|
node1->i_xtime = ts
|
ts = inode_node->i_xtime
|
<+... attr1->ia_xtime ...+> = ts
|
ts = attr1->ia_xtime
|
ts.tv_sec
|
ts.tv_nsec
|
btrfs_set_stack_timespec_sec(..., ts.tv_sec)
|
btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
|
- ts = timespec64_to_timespec(
+ ts =
...
-)
|
- ts = ktime_to_timespec(
+ ts = ktime_to_timespec64(
...)
|
- ts = E3
+ ts = timespec_to_timespec64(E3)
|
- ktime_get_real_ts(&ts)
+ ktime_get_real_ts64(&ts)
|
fn(...,
- ts
+ timespec64_to_timespec(ts)
,...)
)
...+>
(
<... when != ts
- return ts;
+ return timespec64_to_timespec(ts);
...>
)
|
- timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
|
- timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
+ timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
|
- timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
|
node1->i_xtime1 =
- timespec_trunc(attr1->ia_xtime1,
+ timespec64_trunc(attr1->ia_xtime1,
...)
|
- attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
+ attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
...)
|
- ktime_get_real_ts(&attr1->ia_xtime1)
+ ktime_get_real_ts64(&attr1->ia_xtime1)
|
- ktime_get_real_ts(&attr.ia_xtime1)
+ ktime_get_real_ts64(&attr.ia_xtime1)
)

@ depends on patch @
struct inode *node;
struct iattr *attr;
identifier fn;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
expression e;
@@
(
- fn(node->i_xtime);
+ fn(timespec64_to_timespec(node->i_xtime));
|
 fn(...,
- node->i_xtime);
+ timespec64_to_timespec(node->i_xtime));
|
- e = fn(attr->ia_xtime);
+ e = fn(timespec64_to_timespec(attr->ia_xtime));
)

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
fn (...,
- &attr->ia_xtime,
+ &ts,
...);
)
...+>
}

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
struct kstat *stat;
identifier ia_xtime =~ "^ia_[acm]time$";
identifier i_xtime =~ "^i_[acm]time$";
identifier xtime =~ "^[acm]time$";
identifier fn, ret;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(stat->xtime);
ret = fn (...,
- &stat->xtime);
+ &ts);
)
...+>
}

@ depends on patch @
struct inode *node;
struct inode *node2;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier i_xtime3 =~ "^i_[acm]time$";
struct iattr *attrp;
struct iattr *attrp2;
struct iattr attr ;
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
struct kstat *stat;
struct kstat stat1;
struct timespec64 ts;
identifier xtime =~ "^[acmb]time$";
expression e;
@@
(
( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
|
 node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
|
 node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
 node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
 stat->xtime = node2->i_xtime1;
|
 stat1.xtime = node2->i_xtime1;
|
( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
|
( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
|
- e = node->i_xtime1;
+ e = timespec64_to_timespec( node->i_xtime1 );
|
- e = attrp->ia_xtime1;
+ e = timespec64_to_timespec( attrp->ia_xtime1 );
|
node->i_xtime1 = current_time(...);
|
 node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
 node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
- node->i_xtime1 = e;
+ node->i_xtime1 = timespec_to_timespec64(e);
)

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: <anton@tuxera.com>
Cc: <balbi@kernel.org>
Cc: <bfields@fieldses.org>
Cc: <darrick.wong@oracle.com>
Cc: <dhowells@redhat.com>
Cc: <dsterba@suse.com>
Cc: <dwmw2@infradead.org>
Cc: <hch@lst.de>
Cc: <hirofumi@mail.parknet.co.jp>
Cc: <hubcap@omnibond.com>
Cc: <jack@suse.com>
Cc: <jaegeuk@kernel.org>
Cc: <jaharkes@cs.cmu.edu>
Cc: <jslaby@suse.com>
Cc: <keescook@chromium.org>
Cc: <mark@fasheh.com>
Cc: <miklos@szeredi.hu>
Cc: <nico@linaro.org>
Cc: <reiserfs-devel@vger.kernel.org>
Cc: <richard@nod.at>
Cc: <sage@redhat.com>
Cc: <sfrench@samba.org>
Cc: <swhiteho@redhat.com>
Cc: <tj@kernel.org>
Cc: <trond.myklebust@primarydata.com>
Cc: <tytso@mit.edu>
Cc: <viro@zeniv.linux.org.uk>
2018-06-05 16:57:31 -07:00
Lu Fengqi cdb345a877 btrfs: Remove fs_info argument from btrfs_uuid_tree_add
This function always takes a transaction handle which contains a
reference to the fs_info. Use that and remove the extra argument.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30 16:46:52 +02:00
Omar Sandoval a575ceeb13 Btrfs: get rid of unused orphan infrastructure
Now that we don't keep long-standing reservations for orphan items,
root->orphan_block_rsv isn't used. We can git rid of it, along with:

- root->orphan_lock, which was used to protect root->orphan_block_rsv
- root->orphan_inodes, which was used as a refcount for root->orphan_block_rsv
- BTRFS_INODE_ORPHAN_META_RESERVED, which was used to track reservations
  in root->orphan_block_rsv
- btrfs_orphan_commit_root(), which was the last user of any of these
  and does nothing else

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:23:57 +02:00
David Sterba 093258e6eb btrfs: replace waitqueue_actvie with cond_wake_up
Use the wrappers and reduce the amount of low-level details about the
waitqueue management.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:23:09 +02:00
Qu Wenruo a514d63882 btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT
Unlike previous method that tries to commit transaction inside
qgroup_reserve(), this time we will try to commit transaction using
fs_info->transaction_kthread to avoid nested transaction and no need to
worry about locking context.

Since it's an asynchronous function call and we won't wait for
transaction commit, unlike previous method, we must call it before we
hit the qgroup limit.

So this patch will use the ratio and size of qgroup meta_pertrans
reservation as indicator to check if we should trigger a transaction
commit.  (meta_prealloc won't be cleaned in transaction committ, it's
useless anyway)

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-18 16:46:47 +02:00
David Sterba c1d7c514f7 btrfs: replace GPL boilerplate by SPDX -- sources
Remove GPL boilerplate text (long, short, one-line) and keep the rest,
ie. personal, company or original source copyright statements. Add the
SPDX header.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12 16:29:51 +02:00
Qu Wenruo 733e03a0b2 btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans
Btrfs uses 2 different methods to reseve metadata qgroup space.

1) Reserve at btrfs_start_transaction() time
   This is quite straightforward, caller will use the trans handler
   allocated to modify b-trees.

   In this case, reserved metadata should be kept until qgroup numbers
   are updated.

2) Reserve by using block_rsv first, and later btrfs_join_transaction()
   This is more complicated, caller will reserve space using block_rsv
   first, and then later call btrfs_join_transaction() to get a trans
   handle.

   In this case, before we modify trees, the reserved space can be
   modified on demand, and after btrfs_join_transaction(), such reserved
   space should also be kept until qgroup numbers are updated.

Since these two types behave differently, split the original "META"
reservation type into 2 sub-types:

  META_PERTRANS:
    For above case 1)

  META_PREALLOC:
    For reservations that happened before btrfs_join_transaction() of
    case 2)

NOTE: This patch will only convert existing qgroup meta reservation
callers according to its situation, not ensuring all callers are at
correct timing.
Such fix will be added in later patches.

Signed-off-by: Qu Wenruo <wqu@suse.com>
[ update comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:14 +02:00
Nikolay Borisov 5ead2dd02c btrfs: Drop fs_info parameter from btrfs_finish_extent_commit
It's provided by the transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:11 +02:00
Nikolay Borisov 460fb20a4b btrfs: Drop fs_info parameter from btrfs_qgroup_account_extents
It's provided by the transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:10 +02:00
Nikolay Borisov c79a70b133 btrfs: drop fs_info parameter from btrfs_run_delayed_refs
It's provided by the transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:10 +02:00
Nikolay Borisov 92e2f7e370 btrfs: Remove btrfs_fs_info::open_ioctl_trans
Since userspace transaction have been removed we no longer have use
for this field so delete it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:51 +02:00
Nikolay Borisov bcf3a3e7fb btrfs: Remove code referencing unused TRANS_USERSPACE
Now that the userspace transaction ioctls have been removed,
TRANS_USERSPACE is no longer used hence we can remove it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:51 +02:00
Qu Wenruo 4d31778aa2 btrfs: qgroup: Fix root item corruption when multiple same source snapshots are created with quota enabled
When multiple pending snapshots referring to the same source subvolume
are executed, enabled quota will cause root item corruption, where root
items are using old bytenr (no backref in extent tree).

This can be triggered by fstests btrfs/152.

The cause is when source subvolume is still dirty, extra commit
(simplied transaction commit) of qgroup_account_snapshot() can skip
dirty roots not recorded in current transaction, making root item of
source subvolume not updated.

Fix it by forcing recording source subvolume in current transaction
before qgroup sub-transaction commit.

Reported-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:51 +02:00
Nikolay Borisov f9cacae314 btrfs: Move error handling of btrfs_start_dirty_block_groups closer to call site
Even though btrfs_start_dirty_block_groups is fairly in the beginning of
btrfs_commit_transaction outside of the critical section defined by the
transaction states it can only be run by a single comitter. In other
words it defines its own critical section thanks to the
BTRFS_TRANS_DIRTY_BG run flag and ro_block_group_mutex. However, its
error handling is outside of this critical section which is a bit
counter-intuitive. So move the error handling righ after the function is
executed and let the sole runner of dirty block groups handle the return
value. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:36 +02:00
Nikolay Borisov 70458a5819 btrfs: Remove fs_info argument of btrfs_write_and_wait_transaction
We already pass btrfs_trans_handle which contains a reference to the
fs_info so use that. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:33 +02:00
Nikolay Borisov e9b919b1f7 btrfs: Remove fs_info argument from btrfs_update_commit_device_bytes_used
We already pass the btrfs_transaction which references fs_info so no
need to pass the later as an argument. Also use the opportunity to
shorten transaction->trans. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:32 +02:00
Nikolay Borisov 08d50ca32c btrfs: Remove fs_info argument from create_pending_snapshots/create_pending_snapshot
We already pass the trans handle which has a reference to fs_info to
create_pending_snapshot so we can refer to it directly. Doing this
obviates the need to pass the fs_info to create_pending_snapshots as
well. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:32 +02:00
Nikolay Borisov 16916a88d4 btrfs: Remove fs_info argument from switch_commit_roots
We already have the fs_info from the passed transaction so use it
directly. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:32 +02:00
Nikolay Borisov 97cb39bb91 btrfs: Remove root argument of cleanup_transaction
The only thing the passed root is used for is:
1. get a reference to the fs_info and to
2. call trace_btrfs_transaction_commit.

We can achieve 1) by simply referring to the fs_info from passed trans
object. As far as 2) is concerned cleanup_transaction is called from
only one place and the 'root' argument passed is the one from the trans
handle. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:32 +02:00
Nikolay Borisov 9386d8bc58 btrfs: Don't pass fs_info to commit_cowonly_roots
We already pass a transaction handle which refrences the fs_info so
we can grab it from there. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:32 +02:00
Nikolay Borisov 7e4443d9eb btrfs: Don't pass fs_info to commit_fs_roots
We already pass the transaction handle which has a reference to the
fs_info. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:32 +02:00
Nikolay Borisov e5c304e651 btrfs: Don't pass fs_info to btrfs_run_delayed_items/_nr
We already pass the transaction which has a reference to the fs_info,
so use that. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:31 +02:00
Nikolay Borisov 2121705420 btrfs: Don't pass fs_info arg to btrfs_start_dirty_block_groups
It can be referenced from the passed transaction so no point in passing
it as a function argument. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:31 +02:00
Nikolay Borisov 6c686b359a btrfs: Remove fs_info argument from btrfs_create_pending_block_groups
It can be referenced from the passed transaciton so no point in
passing it as function argument. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:31 +02:00
Nikolay Borisov dc60c525cf btrfs: Remove fs_info argument from btrfs_trans_release_metadata
All current callers of this function just get a reference to the
trans->fs_info member and pass it as the second argument. Collapse this
into the function itself. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:31 +02:00
Nikolay Borisov c9b577c01a btrfs: Open code btrfs_write_and_wait_marked_extents
btrfs_write_and_wait_transaction is essentially a wrapper of
btrfs_write_and_wait_marked_extents with the addition of calling
clear_btree_io_tree. Having the code split doesn't really bring any
benefit. Open code the later into the former and add proper
documentation header.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ reformat comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:30 +02:00
Nikolay Borisov 0e34693f7b btrfs: Make btrfs_trans_release_metadata private to transaction.c
This function is only ever used in __btrfs_end_transaction and
btrfs_commit_transaction so there is no need to export it via header.
Let's move it closer to where it's used, make it static and remove it
from the header. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:30 +02:00
Nikolay Borisov 7806c6eb15 btrfs: Remove unused btrfs_start_transaction_lflush function
Commit 0e8c36a9fd ("Btrfs: fix lots of orphan inodes when the space
is not enough") changed the way transaction reservation is made in
btrfs_evict_node and as a result this function became unused. This has
been the status quo for 5 years in which time no one noticed, so I'd
say it's safe to assume it's unlikely it will ever be used again.

Historical note: there were more attempts to remove the function, the
reasoning was missing and only based on some static analysis tool
reports. Other reason for rejection was that there seemed to be
connection to BTRFS_RESERVE_FLUSH_LIMIT and that would need to be
removeed to. This was not correct so removing the function is all we can
do.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
[ add the note ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-26 15:09:29 +02:00
David Sterba 093e037ca8 Revert "btrfs: use proper endianness accessors for super_copy"
This reverts commit 3c181c12c4.

The offending patch was merged in 4.16-rc4 and was promptly applied to
stable kernels 4.14.25 and 4.15.8.

The patch causes a corruption in several superblock items on big-endian
machines because of messed up endianity conversions. The damage is
manually repairable. A filesystem cannot be mounted again after it has
been unmounted once.

We do a full revert and not a fixup so stable can pick that patch ASAP.

Fixes: 3c181c12c4 ("btrfs: use proper endianness accessors for super_copy")
Link: https://lkml.kernel.org/r/1521139304@msgid.manchmal.in-ulm.de
CC: stable@vger.kernel.org # 4.14+
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-16 14:49:44 +01:00
Anand Jain 3c181c12c4 btrfs: use proper endianness accessors for super_copy
The fs_info::super_copy is a byte copy of the on-disk structure and all
members must use the accessor macros/functions to obtain the right
value.  This was missing in update_super_roots and in sysfs readers.

Moving between opposite endianness hosts will report bogus numbers in
sysfs, and mount may fail as the root will not be restored correctly. If
the filesystem is always used on a same endian host, this will not be a
problem.

Fix this by using the btrfs_set_super...() functions to set
fs_info::super_copy values, and for the sysfs, use the cached
fs_info::nodesize/sectorsize values.

CC: stable@vger.kernel.org
Fixes: df93589a17 ("btrfs: export more from FS_INFO to sysfs")
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-01 16:17:27 +01:00
Anand Jain c1f32b7c1f btrfs: simplify mutex unlocking code in btrfs_commit_transaction
No functional change rearrange the mutex_unlock.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
[ edit subject ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22 16:08:20 +01:00
David Sterba b50fff816c btrfs: switch to refcount_t type for btrfs_trans_handle::use_count
The use_count is a reference counter, we can use the refcount_t type,
though we don't use the atomicity. This is not a performance critical
code and we could catch the underflows. The type is changed from long,
but the number of references will fit an int.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22 16:08:14 +01:00
David Sterba ae0f162534 btrfs: sink gfp parameter to clear_extent_bit
All callers use GFP_NOFS, we don't have to pass it as an argument. The
built-in tests pass GFP_KERNEL, but they run only at module load time
and NOFS works there as well.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22 16:08:12 +01:00
Josef Bacik ce8ea7cc6e btrfs: don't call btrfs_start_delalloc_roots in flushoncommit
We're holding the sb_start_intwrite lock at this point, and doing async
filemap_flush of the inodes will result in a deadlock if we freeze the
fs during this operation.  This is because we could do a
btrfs_join_transaction() in the thread we are waiting on which would
block at sb_start_intwrite, and thus deadlock.  Using
writeback_inodes_sb() side steps the problem by not introducing all of
these extra locking dependencies.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-11-01 20:45:35 +01:00
Josef Bacik 7c777430e8 Btrfs: only check delayed ref usage in should_end_transaction
We were only doing btrfs_check_space_for_delayed_refs() if the metadata
space was full, ie we couldn't allocate chunks.  This assumes we'll be
able to allocate chunks during transaction commit, but since nothing
does a LIMIT flush during the transaction commit this won't actually
happen unless we happen to run shy of actual space.  We already take
into account a full fs in btrfs_check_space_for_delayed_refs() so just
kill this extra check to make sure we're ending the transaction when we
need to.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30 12:28:00 +01:00
Liu Bo 6300463b14 Btrfs: make plug in writing meta blocks really work
We have started plug in btrfs_write_and_wait_marked_extents() but the
generated IOs actually go to device's schedule IO list where the work
is doing in another task, thus the started plug doesn't make any
sense.

And since we wait for IOs immediately after writing meta blocks, it's
the same case as writing log tree, doing sync submit can merge more
IOs.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30 12:27:56 +01:00
Chris Mason 6374e57ad8 btrfs: fix integer overflow in calc_reclaim_items_nr
Dave Jones hit a WARN_ON(nr < 0) in btrfs_wait_ordered_roots() with
v4.12-rc6.  This was because commit 70e7af244 made it possible for
calc_reclaim_items_nr() to return a negative number.  It's not really a
bug in that commit, it just didn't go far enough down the stack to find
all the possible 64->32 bit overflows.

This switches calc_reclaim_items_nr() to return a u64 and changes everyone
that uses the results of that math to u64 as well.

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Fixes: 70e7af2 ("Btrfs: fix delalloc accounting leak caused by u32 overflow")
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29 20:17:02 +02:00
Qu Wenruo d1b8b94a2b btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function
Quite a lot of qgroup corruption happens due to wrong time of calling
btrfs_qgroup_prepare_account_extents().

Since the safest time is to call it just before
btrfs_qgroup_account_extents(), there is no need to separate these 2
functions.

Merging them will make code cleaner and less bug prone.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ changelog and comment adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-29 20:17:02 +02:00
David Sterba fac03c8dae btrfs: move fs_info::fs_frozen to the flags
We can keep the state among the other fs_info flags, there's no reason
why fs_frozen would need to be separate.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-20 14:22:42 +02:00
David Sterba 4b5faeac46 btrfs: use generic slab for for btrfs_transaction
Observing the number of slab objects of btrfs_transaction, there's just
one active on an almost quiescent filesystem, and the number of objects
goes to about ten when sync is in progress. Then the nubmer goes down to
1.  This matches the expectations of the transaction lifetime.

For such use the separate slab cache is not justified, as we do not
reuse objects frequently. For the shortlived transaction, the generic
slab (size 512) should be ok. We can optimistically expect that the 512
slabs are not all used (fragmentation) and there are free slots to take
when we do the allocation, compared to potentially allocating a whole new
page for the separate slab.

We'll lose the stats about the object use, which could be added later if
we really need them.

Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-19 18:26:01 +02:00
Josef Bacik c6100a4b4e Btrfs: replace tree->mapping with tree->private_data
For extent_io tree's we have carried the address_mapping of the inode
around in the io tree in order to pull the inode back out for calling
into various tree ops hooks.  This works fine when everything that has
an extent_io_tree has an inode.  But we are going to remove the
btree_inode, so we need to change this.  Instead just have a generic
void * for private data that we can initialize with, and have all the
tree ops use that instead.  This had a lot of cascading changes but
should be relatively straightforward.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor reordering of the callback prototypes ]
Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-19 18:25:58 +02:00
Qu Wenruo 82bafb38c2 btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option
[BUG]
The easist way to reproduce the bug is:
------
 # mkfs.btrfs -f $dev -n 16K
 # mount $dev $mnt -o inode_cache
 # btrfs quota enable $mnt
 # btrfs quota rescan -w $mnt
 # btrfs qgroup show $mnt
qgroupid         rfer         excl
--------         ----         ----
0/5          32.00KiB     32.00KiB
             ^^ Twice the correct value
------

And fstests/btrfs qgroup test group can easily detect them with
inode_cache mount option.
Although some of them are false alerts since old test cases are using
fixed golden output.
While new test cases will use "btrfs check" to detect qgroup mismatch.

[CAUSE]
Inode_cache mount option will make commit_fs_roots() to call
btrfs_save_ino_cache() to update fs/subvol trees, and generate new
delayed refs.

However we call btrfs_qgroup_prepare_account_extents() too early, before
commit_fs_roots().
This makes the "old_roots" for newly generated extents are always NULL.
For freeing extent case, this makes both new_roots and old_roots to be
empty, while correct old_roots should not be empty.
This causing qgroup numbers not decreased correctly.

[FIX]
Modify the timing of calling btrfs_qgroup_prepare_account_extents() to
just before btrfs_qgroup_account_extents(), and add needed delayed_refs
handler.
So qgroup can handle inode_map mount options correctly.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:26 +02:00
David Sterba f486135eba btrfs: remove unused qgroup members from btrfs_trans_handle
The members have been effectively unused since "Btrfs: rework qgroup
accounting" (fcebe4562d), there's no substitute for
assert_qgroups_uptodate so it's removed as well.

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:25 +02:00
Elena Reshetova 9b64f57ddf btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Nikolay Borisov 6ef06d2790 btrfs: Make btrfs_i_size_write take btrfs_inode
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-28 11:30:06 +01:00
Nikolay Borisov 877574e254 btrfs: Make btrfs_set_inode_index take btrfs_inode
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-28 11:30:06 +01:00
Nikolay Borisov 8e7611cf38 btrfs: Make btrfs_insert_dir_item take btrfs_inode
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-28 11:30:06 +01:00
David Sterba 8b74c03e3c btrfs: remove unused parameter from btrfs_prepare_extent_commit
Added but never used.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-17 12:03:52 +01:00
David Sterba eece6a9cf6 btrfs: merge two superblock writing helpers
write_all_supers and write_ctree_super are almost equal, the parameter
'trans' is unused so we can drop it and have just one helper.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-17 12:03:51 +01:00
David Sterba 9ea6e2b548 btrfs: remove unnecessary mutex lock in qgroup_account_snapshot
The quota status used to be tracked as a variable, so the mutex was
needed (until "Btrfs: add a flags field to btrfs_fs_info" afcdd129e0).
Since the status is a bit modified atomically and we don't hold the
mutex beyond the check, we can drop it.

Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-17 12:03:50 +01:00
Jeff Mahoney 003d7c59e8 btrfs: allow unlink to exceed subvolume quota
Once a qgroup limit is exceeded, it's impossible to restore normal
operation to the subvolume without modifying the limit or removing
the subvolume.  This is a surprising situation for many users used
to the typical workflow with quotas on other file systems where it's
possible to remove files until the used space is back under the limit.

When we go to unlink a file and start the transaction, we'll hit
the qgroup limit while trying to reserve space for the items we'll
modify while removing the file.  We discussed last month how best
to handle this situation and agreed that there is no perfect solution.
The best principle-of-least-surprise solution is to handle it similarly
to how we already handle ENOSPC when unlinking, which is to allow
the operation to succeed with the expectation that it will ultimately
release space under most circumstances.

This patch modifies the transaction start path to select whether to
honor the qgroups limits.  btrfs_start_transaction_fallback_global_rsv
is the only caller that skips enforcement.  The reservation and tracking
still happens normally -- it just skips the enforcement step.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:59 +01:00
Nikolay Borisov 4a0cc7ca6c btrfs: Make btrfs_ino take a struct btrfs_inode
Currently btrfs_ino takes a struct inode and this causes a lot of
internal btrfs functions which consume this ino to take a VFS inode,
rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
of VFS structs into the internals of btrfs first it's necessary to
eliminate all uses of struct inode for the purpose of inode. This patch
does that by using BTRFS_I to convert an inode to btrfs_inode. With
this problem eliminated subsequent patches will start eliminating the
passing of struct inode altogether, eventually resulting in a lot cleaner
code.

Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
[ fix btrfs_get_extent tracepoint prototype ]
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:51 +01:00
Seraphime Kirkovski 20c7bcec6f Btrfs: ACCESS_ONCE cleanup
This replaces ACCESS_ONCE macro with the corresponding
READ|WRITE macros

Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:50 +01:00
Jeff Mahoney 3a45bb207e btrfs: remove root parameter from transaction commit/end routines
Now we only use the root parameter to print the root objectid in
a tracepoint.  We can use the root parameter from the transaction
handle for that.  It's also used to join the transaction with
async commits, so we remove the comment that it's just for checking.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:07:00 +01:00
Jeff Mahoney bf89d38feb btrfs: split btrfs_wait_marked_extents into normal and tree log functions
btrfs_write_and_wait_marked_extents and btrfs_sync_log both call
btrfs_wait_marked_extents, which provides a core loop and then handles
errors differently based on whether it's it's a log root or not.

This means that btrfs_write_and_wait_marked_extents needs to take a root
because btrfs_wait_marked_extents requires one, even though it's only
used to determine whether the root is a log root.  The log root code
won't ever call into the transaction commit code using a log root, so we
can factor out the core loop and provide the error handling appropriate
to each waiter in new routines.  This allows us to eventually remove
the root argument from btrfs_commit_transaction, and as a result,
btrfs_end_transaction.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:07:00 +01:00
Jeff Mahoney 2ff7e61e0d btrfs: take an fs_info directly when the root is not used otherwise
There are loads of functions in btrfs that accept a root parameter
but only use it to obtain an fs_info pointer.  Let's convert those to
just accept an fs_info pointer directly.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney ccdf9b305a btrfs: root->fs_info cleanup, access fs_info->delayed_root directly
This results in btrfs_assert_delayed_root_empty and
btrfs_destroy_delayed_inode taking an fs_info instead of a root.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney 0b246afa62 btrfs: root->fs_info cleanup, add fs_info convenience variables
In routines where someptr->fs_info is referenced multiple times, we
introduce a convenience variable.  This makes the code considerably
more readable.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney 27965b6c2c btrfs: root->fs_info cleanup, btrfs_calc_{trans,trunc}_metadata_size
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:58 +01:00
Jeff Mahoney da17066c40 btrfs: pull node/sector/stripe sizes out of root and into fs_info
We track the node sizes per-root, but they never vary from the values
in the superblock.  This patch messes with the 80-column style a bit,
but subsequent patches to factor out root->fs_info into a convenience
variable fix it up again.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:58 +01:00
Jeff Mahoney 6bccf3ab1e btrfs: call functions that always use the same root with fs_info instead
There are many functions that are always called with the same root
argument.  Rather than passing the same root every time, we can
pass an fs_info pointer instead and have the function get the root
pointer itself.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:57 +01:00
Jeff Mahoney 5b4aacefb8 btrfs: call functions that overwrite their root parameter with fs_info
There are 11 functions that accept a root parameter and immediately
overwrite it.  We can pass those an fs_info pointer instead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:57 +01:00
Linus Torvalds f29135b54b Merge branch 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This is a big variety of fixes and cleanups.

  Liu Bo continues to fixup fuzzer related problems, and some of Josef's
  cleanups are prep for his bigger extent buffer changes (slated for
  v4.10)"

* 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (39 commits)
  Revert "btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs"
  Btrfs: remove unnecessary btrfs_mark_buffer_dirty in split_leaf
  Btrfs: don't BUG() during drop snapshot
  btrfs: fix btrfs_no_printk stub helper
  Btrfs: memset to avoid stale content in btree leaf
  btrfs: parent_start initialization cleanup
  btrfs: Remove already completed TODO comment
  btrfs: Do not reassign count in btrfs_run_delayed_refs
  btrfs: fix a possible umount deadlock
  Btrfs: fix memory leak in do_walk_down
  btrfs: btrfs_debug should consume fs_info when DEBUG is not defined
  btrfs: convert send's verbose_printk to btrfs_debug
  btrfs: convert pr_* to btrfs_* where possible
  btrfs: convert printk(KERN_* to use pr_* calls
  btrfs: unsplit printed strings
  btrfs: clean the old superblocks before freeing the device
  Btrfs: kill BUG_ON in run_delayed_tree_ref
  Btrfs: don't leak reloc root nodes on error
  btrfs: squash lines for simple wrapper functions
  Btrfs: improve check_node to avoid reading corrupted nodes
  ...
2016-10-11 11:23:06 -07:00
Deepa Dinamani c2050a454c fs: Replace current_fs_time() with current_time()
current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.

Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27 21:06:22 -04:00
Jeff Mahoney ab8d0fc48d btrfs: convert pr_* to btrfs_* where possible
For many printks, we want to know which file system issued the message.

This patch converts most pr_* calls to use the btrfs_* versions instead.
In some cases, this means adding plumbing to allow call sites access to
an fs_info pointer.

fs/btrfs/check-integrity.c is left alone for another day.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 19:37:04 +02:00
Jeff Mahoney 62e855771d btrfs: convert printk(KERN_* to use pr_* calls
This patch converts printk(KERN_* style messages to use the pr_* versions.

One side effect is that anything that was KERN_DEBUG is now automatically
a dynamic debug message.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 18:08:44 +02:00
Jeff Mahoney 5d163e0e68 btrfs: unsplit printed strings
CodingStyle chapter 2:
"[...] never break user-visible strings such as printk messages,
because that breaks the ability to grep for them."

This patch unsplits user-visible strings.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 18:08:44 +02:00
Liu Bo a43f7f8206 Btrfs: remove BUG_ON in start_transaction
Since we could get errors from the concurrent aborted transaction,
the check of this BUG_ON in start_transaction is not true any more.

Say, while flushing free space cache inode's dirty pages,
btrfs_finish_ordered_io
 -> btrfs_join_transaction_nolock
      (the transaction has been aborted.)
      -> BUG_ON(type == TRANS_JOIN_NOLOCK);

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 18:04:01 +02:00
Josef Bacik afcdd129e0 Btrfs: add a flags field to btrfs_fs_info
We have a lot of random ints in btrfs_fs_info that can be put into flags.  This
is mostly equivalent with the exception of how we deal with quota going on or
off, now instead we set a flag when we are turning it on or off and deal with
that appropriately, rather than just having a pending state that the current
quota_enabled gets set to.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 17:59:49 +02:00
Wang Xiaoguang 9e7cc91a6d btrfs: fix fsfreeze hang caused by delayed iputs deal
When running fstests generic/068, sometimes we got below deadlock:
  xfs_io          D ffff8800331dbb20     0  6697   6693 0x00000080
  ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000
  ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001
  ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8
  Call Trace:
  [<ffffffff816a9045>] schedule+0x35/0x80
  [<ffffffff816abab2>] rwsem_down_read_failed+0xf2/0x140
  [<ffffffff8118f5e1>] ? __filemap_fdatawrite_range+0xd1/0x100
  [<ffffffff8134f978>] call_rwsem_down_read_failed+0x18/0x30
  [<ffffffffa06631fc>] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs]
  [<ffffffff810d32b5>] percpu_down_read+0x35/0x50
  [<ffffffff81217dfc>] __sb_start_write+0x2c/0x40
  [<ffffffffa067f5d5>] start_transaction+0x2a5/0x4d0 [btrfs]
  [<ffffffffa067f857>] btrfs_join_transaction+0x17/0x20 [btrfs]
  [<ffffffffa068ba34>] btrfs_evict_inode+0x3c4/0x5d0 [btrfs]
  [<ffffffff81230a1a>] evict+0xba/0x1a0
  [<ffffffff812316b6>] iput+0x196/0x200
  [<ffffffffa06851d0>] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs]
  [<ffffffffa067f1d8>] btrfs_commit_transaction+0x928/0xa80 [btrfs]
  [<ffffffffa0646df0>] btrfs_freeze+0x30/0x40 [btrfs]
  [<ffffffff81218040>] freeze_super+0xf0/0x190
  [<ffffffff81229275>] do_vfs_ioctl+0x4a5/0x5c0
  [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
  [<ffffffff810038cf>] ? syscall_trace_enter_phase1+0x11f/0x140
  [<ffffffff81229409>] SyS_ioctl+0x79/0x90
  [<ffffffff81003c12>] do_syscall_64+0x62/0x110
  [<ffffffff816acbe1>] entry_SYSCALL64_slow_path+0x25/0x25

>From this warning, freeze_super() already holds SB_FREEZE_FS, but
btrfs_freeze() will call btrfs_commit_transaction() again, if
btrfs_commit_transaction() finds that it has delayed iputs to handle,
it'll start_transaction(), which will try to get SB_FREEZE_FS lock
again, then deadlock occurs.

The root cause is that in btrfs, sync_filesystem(sb) does not make
sure all metadata is updated. There still maybe some codes adding
delayed iputs, see below sample race window:

         CPU1                                  |         CPU2
|-> freeze_super()                             |
    |-> sync_filesystem(sb);                   |
    |                                          |-> cleaner_kthread()
    |                                          |   |-> btrfs_delete_unused_bgs()
    |                                          |       |-> btrfs_remove_chunk()
    |                                          |           |-> btrfs_remove_block_group()
    |                                          |               |-> btrfs_add_delayed_iput()
    |                                          |
    |-> sb->s_writers.frozen = SB_FREEZE_FS;   |
    |-> sb_wait_write(sb, SB_FREEZE_FS);       |
    |   acquire SB_FREEZE_FS lock.             |
    |                                          |
    |-> btrfs_freeze()                         |
        |-> btrfs_commit_transaction()         |
            |-> btrfs_run_delayed_iputs()      |
            |   will handle delayed iputs,     |
            |   that means start_transaction() |
            |   will be called, which will try |
            |   to get SB_FREEZE_FS lock.      |

To fix this issue, introduce a "int fs_frozen" to record internally whether
fs has been frozen. If fs has been frozen, we can not handle delayed iputs.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment to btrfs_freeze ]
Signed-off-by: David Sterba <dsterba@suse.com>

Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:26 -07:00
Jeff Mahoney 66642832f0 btrfs: btrfs_abort_transaction, drop root parameter
__btrfs_abort_transaction doesn't use its root parameter except to
obtain an fs_info pointer.  We can obtain that from trans->root->fs_info
for now and from trans->fs_info in a later patch.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:54:26 +02:00
Jeff Mahoney 64b6358072 btrfs: add btrfs_trans_handle->fs_info pointer
btrfs_trans_handle->root is documented as for use for confirming
that the root passed in to start the transaction is the same as the
one ending it.  It's used in several places when an fs_info pointer
is needed, so let's just add an fs_info pointer directly.  Eventually,
the root pointer can be removed.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:54:26 +02:00
Jeff Mahoney 3cdde2240d btrfs: btrfs_test_opt and friends should take a btrfs_fs_info
btrfs_test_opt and friends only use the root pointer to access
the fs_info.  Let's pass the fs_info directly in preparation to
eliminate similar patterns all over btrfs.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:53:16 +02:00
Josef Bacik 31b9655f43 Btrfs: track transid for delayed ref flushing
Using the offwakecputime bpf script I noticed most of our time was spent waiting
on the delayed ref throttling.  This is what is supposed to happen, but
sometimes the transaction can commit and then we're waiting for throttling that
doesn't matter anymore.  So change this stuff to be a little smarter by tracking
the transid we were in when we initiated the throttling.  If the transaction we
get is different then we can just bail out.  This resulted in a 50% speedup in
my fs_mark test, and reduced the amount of time spent throttling by 60 seconds
over the entire run (which is about 30 minutes).  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-22 17:54:18 -07:00
David Sterba 89c5a5441d btrfs: remove build fixup for qgroup_account_snapshot
Introduced in 2c1984f244 ("btrfs: build fixup for
qgroup_account_snapshot") as temporary bisectability build fixup.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00
David Sterba f7af3934c2 btrfs: use new error message helper in qgroup_account_snapshot
We've renamed btrfs_std_error, this one is left from last merge.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00
David Sterba 42f31734eb Merge branch 'cleanups-4.7' into for-chris-4.7-20160525 2016-05-25 22:51:03 +02:00
Chris Mason c315ef8d9d Merge branch 'for-chris-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.7
Signed-off-by: Chris Mason <clm@fb.com>
2016-05-17 14:43:19 -07:00
David Sterba 5ef64a3e75 Merge branch 'cleanups-4.7' into for-chris-4.7-20160516 2016-05-16 15:46:24 +02:00
Filipe Manana 578def7c50 Btrfs: don't wait for unrelated IO to finish before relocation
Before the relocation process of a block group starts, it sets the block
group to readonly mode, then flushes all delalloc writes and then finally
it waits for all ordered extents to complete. This last step includes
waiting for ordered extents destinated at extents allocated in other block
groups, making us waste unecessary time.

So improve this by waiting only for ordered extents that fall into the
block group's range.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
2016-05-13 01:59:14 +01:00
David Sterba 2c1984f244 btrfs: build fixup for qgroup_account_snapshot
The macro btrfs_std_error got renamed to btrfs_handle_fs_error in an
independent branch for the same merge target (4.7). To make the code
compilable for bisectability reasons, add a temporary stub.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-12 11:05:03 +02:00
Qu Wenruo 6426c7ad69 btrfs: qgroup: Fix qgroup accounting when creating snapshot
Current btrfs qgroup design implies a requirement that after calling
btrfs_qgroup_account_extents() there must be a commit root switch.

Normally this is OK, as btrfs_qgroup_accounting_extents() is only called
inside btrfs_commit_transaction() just be commit_cowonly_roots().

However there is a exception at create_pending_snapshot(), which will
call btrfs_qgroup_account_extents() but no any commit root switch.

In case of creating a snapshot whose parent root is itself (create a
snapshot of fs tree), it will corrupt qgroup by the following trace:
(skipped unrelated data)
======
btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 1
qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 0, excl = 0
qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 16384, excl = 16384
btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0
======

The problem here is in first qgroup_account_extent(), the
nr_new_roots of the extent is 1, which means its reference got
increased, and qgroup increased its rfer and excl.

But at second qgroup_account_extent(), its reference got decreased, but
between these two qgroup_account_extent(), there is no switch roots.
This leads to the same nr_old_roots, and this extent just got ignored by
qgroup, which means this extent is wrongly accounted.

Fix it by call commit_cowonly_roots() after qgroup_account_extent() in
create_pending_snapshot(), with needed preparation.

Mark: I added a check at the top of qgroup_account_snapshot() to skip this
code if qgroups are turned off. xfstest btrfs/122 exposes this problem.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-12 10:47:31 +02:00
David Sterba 210aa27768 btrfs: sink gfp parameter to convert_extent_bit
Single caller passes GFP_NOFS. We can get rid of the
gfpflags_allow_blocking checks as NOFS can block but does not recurse to
filesystem through reclaim.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-29 13:48:14 +02:00
Anand Jain 34d9700702 btrfs: rename btrfs_std_error to btrfs_handle_fs_error
btrfs_std_error() handles errors, puts FS into readonly mode
(as of now). So its good idea to rename it to btrfs_handle_fs_error().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ edit changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:36:54 +02:00
David Sterba f004fae0cf Merge branch 'cleanups-4.6' into for-chris-4.6 2016-02-26 15:38:33 +01:00
Deepa Dinamani 04b285f35e btrfs: Replace CURRENT_TIME by current_fs_time()
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_fs_time() instead.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: linux-btrfs@vger.kernel.org
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18 11:46:03 +01:00
Josef Bacik 88d3a5aaf6 Btrfs: add transaction space reservation tracepoints
There are a few places where we add to trans->bytes_reserved but don't have the
corresponding trace point.  With these added my tool no longer sees transaction
leaks.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18 11:22:41 +01:00
Chris Mason b28cf57246 Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5
Signed-off-by: Chris Mason <clm@fb.com>
2016-01-11 06:08:37 -08:00
Chris Mason a3058101c1 Merge branch 'misc-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 2016-01-11 05:59:32 -08:00
David Sterba 8546b57051 btrfs: preallocate path for snapshot creation at ioctl time
We can also preallocate btrfs_path that's used during pending snapshot
creation and avoid another late ENOMEM failure.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-07 15:20:55 +01:00
David Sterba b0c0ea6338 btrfs: allocate root item at snapshot ioctl time
The actual snapshot creation is delayed until transaction commit. If we
cannot get enough memory for the root item there, we have to fail the
whole transaction commit which is bad. So we'll allocate the memory at
the ioctl call and pass it along with the pending_snapshot struct. The
potential ENOMEM will be returned to the caller of snapshot ioctl.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-07 15:20:54 +01:00