Commit Graph

1138863 Commits

Author SHA1 Message Date
Josef Bacik 637e3b48c2 btrfs: add helpers for manipulating leaf items and data
We have some gnarly memmove and copy_extent_buffer calls for leaf
manipulation.  This is because our item offsets aren't absolute, they're
based on 0 being where the items start in the leaf, which is after the
btrfs_header.  This means any manipulation of the data requires adding
sizeof(struct btrfs_header) to the offsets we pull from the items.
Moving the items themselves is easier as the helpers are absolute
offsets, however we of course have to call the helpers to get the
offsets for the item numbers.  This makes for
copy_extent_buffer/memmove_extent_buffer calls that are kind of hard to
reason about what's happening.

Fix this by pushing this logic into helpers.  For data we'll only use
the item provided offsets, and the helpers will use the
BTRFS_LEAF_DATA_OFFSET addition for the offsets.  Additionally for the
item manipulation simply pass in the item numbers, and then the helpers
will call the offset helper to get the actual offset into the leaf.

The diffstat makes this look like more code, but that's simply because I
added comments for the helpers, it's net negative for the amount of
code, and is easier to reason.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:58 +01:00
Josef Bacik e23efd8e87 btrfs: add eb to btrfs_node_key_ptr_offset
This is a change needed for extent tree v2, as we will be growing the
header size.  This exists in btrfs-progs currently, and not having it
makes syncing accessors.[ch] more problematic.  So make this change to
set us up for extent tree v2 and match what btrfs-progs does to make
syncing easier.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:58 +01:00
Josef Bacik 42c9419a4c btrfs: pass the extent buffer for the btrfs_item_nr helpers
This is actually a change for extent tree v2, but it exists in
btrfs-progs but not in the kernel.  This makes it annoying to sync
accessors.h with btrfs-progs, and since this is the way I need it for
extent-tree v2 simply update these helpers to take the extent buffer in
order to make syncing possible now, and make the extent tree v2 stuff
easier moving forward.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:58 +01:00
Josef Bacik 0e6c40ebbb btrfs: move the csum helpers into ctree.h
These got moved because of copy+paste, but this code exists in ctree.c,
so move the declarations back into ctree.h.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:58 +01:00
Josef Bacik 9b48addac4 btrfs: move eb offset helpers into extent_io.h
These are very specific to how the extent buffer is defined, so this
differs between btrfs-progs and the kernel.  Make things easier by
moving these helpers into extent_io.h so we don't have to worry about
this when syncing ctree.h.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:58 +01:00
Josef Bacik 6bfd0ffa6f btrfs: move file_extent_item helpers into file-item.h
These helpers use functions that are in multiple places, which makes it
tricky to sync them into btrfs-progs.  Move them to file-item.h and then
include file-item.h in places that use these helpers.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:58 +01:00
Josef Bacik 3a3178c7f7 btrfs: move leaf_data_end into ctree.c
This is only used in ctree.c, with the exception of zero'ing out extent
buffers we're getting ready to write out.  In theory we shouldn't have
an extent buffer with 0 items that we're writing out, however I'd rather
be safe than sorry so open code it in extent_io.c, and then copy the
helper into ctree.c.  This will make it easier to sync accessors.[ch]
into btrfs-progs, as this requires a helper that isn't defined in
accessors.h.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:58 +01:00
Josef Bacik 1fe5ebc4e1 btrfs: move root helpers back into ctree.h
These accidentally got brought into accessors.h, but belong with the
btrfs_root definitions which are currently in ctree.h.  Move these to
make it easier to sync accessors.[ch] into btrfs-progs.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:58 +01:00
Christoph Hellwig bacf60e515 btrfs: move repair_io_failure to bio.c
repair_io_failure ties directly into all the glory low-level details of
mapping a bio with a logic address to the actual physical location.
Move it right below btrfs_submit_bio to keep all the related logic
together.

Also move btrfs_repair_eb_io_failure to its caller in disk-io.c now that
repair_io_failure is available in a header.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:57 +01:00
Christoph Hellwig 103c19723c btrfs: split the bio submission path into a separate file
The code used by btrfs_submit_bio only interacts with the rest of
volumes.c through __btrfs_map_block (which itself is a more generic
version of two exported helpers) and does not really have anything
to do with volumes.c.  Create a new bio.c file and a bio.h header
going along with it for the btrfs_bio-based storage layer, which
will grow even more going forward.

Also update the file with my copyright notice given that a large
part of the moved code was written or rewritten by me.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:57 +01:00
Christoph Hellwig 27137fac4c btrfs: move struct btrfs_tree_parent_check out of disk-io.h
Move struct btrfs_tree_parent_check out of disk-io.h so that volumes.h
an various .c files don't have to include disk-io.h just for it.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
[ use tree-checker.h for the structure ]
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:57 +01:00
Qu Wenruo 7a31507230 btrfs: raid56: do data csum verification during RMW cycle
[BUG]
For the following small script, btrfs will be unable to recover the
content of file1:

  mkfs.btrfs -f -m raid1 -d raid5 -b 1G $dev1 $dev2 $dev3

  mount $dev1 $mnt
  xfs_io -f -c "pwrite -S 0xff 0 64k" -c sync $mnt/file1
  md5sum $mnt/file1
  umount $mnt

  # Corrupt the above 64K data stripe.
  xfs_io -f -c "pwrite -S 0x00 323026944 64K" -c sync $dev3
  mount $dev1 $mnt

  # Write a new 64K, which should be in the other data stripe
  # And this is a sub-stripe write, which will cause RMW
  xfs_io -f -c "pwrite 0 64k" -c sync $mnt/file2
  md5sum $mnt/file1
  umount $mnt

Above md5sum would fail.

[CAUSE]
There is a long existing problem for raid56 (not limited to btrfs
raid56) that, if we already have some corrupted on-disk data, and then
trigger a sub-stripe write (which needs RMW cycle), it can cause further
damage into P/Q stripe.

  Disk 1: data 1 |0x000000000000| <- Corrupted
  Disk 2: data 2 |0x000000000000|
  Disk 2: parity |0xffffffffffff|

In above case, data 1 is already corrupted, the original data should be
64KiB of 0xff.

At this stage, if we read data 1, and it has data checksum, we can still
recovery going via the regular RAID56 recovery path.

But if now we decide to write some data into data 2, then we need to go
RMW.
Let's say we want to write 64KiB of '0x00' into data 2, then we read the
on-disk data of data 1, calculate the new parity, resulting the
following layout:

  Disk 1: data 1 |0x000000000000| <- Corrupted
  Disk 2: data 2 |0x000000000000| <- New '0x00' writes
  Disk 2: parity |0x000000000000| <- New Parity.

But the new parity is calculated using the *corrupted* data 1, we can
no longer recover the correct data of data1.  Thus the corruption is
forever there.

[FIX]
To solve above problem, this patch will do a full stripe data checksum
verification at RMW time.

This involves the following changes:

- Always read the full stripe (including data/P/Q) when doing RMW
  Before we only read the missing data sectors, but since we may do a
  data csum verification and recovery, we need to read everything out.

  Please note that, if we have a cached rbio, we don't need to read
  anything, and can treat it the same as full stripe write.

  As only stripe with all its csum matches can be cached.

- Verify the data csum during read.
  The goal is only the rbio stripe sectors, and only if the rbio
  already has csum_buf/csum_bitmap filled.

  And sectors which cannot pass csum verification will have their bit
  set in error_bitmap.

- Always call recovery_sectors() after we read out all the sectors
  Since error_bitmap will be updated during read, recover_sectors()
  can easily find out all the bad sectors and try to recover (if still
  under tolerance).

  And since recovery_sectors() is already migrated to use error_bitmap,
  it can skip vertical stripes which don't have any error.

- Verify the repaired sectors against its csum in recover_vertical()

- Rename rmw_read_and_wait() to rmw_read_wait_recover()
  Since we will always recover the sectors, the old name is no longer
  accurate.

  Furthermore since recovery is already done in rmw_read_wait_recover(),
  we no longer need to call recovery_sectors() inside rmw_rbio().

Obviously this will have a performance impact, as we are doing more
work during RMW cycle:

- Fetch the data checksums
- Do checksum verification for all data stripes
- Do checksum verification again after repair

But for full stripe write or cached rbio we won't have the overhead all,
thus for fully optimized RAID56 workload (always full stripe write),
there should be no extra overhead.

To me, the extra overhead looks reasonable, as data consistency is way
more important than performance.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:57 +01:00
Qu Wenruo c5a415627b btrfs: raid56: prepare data checksums for later RMW verification
This is for later data checksum verification at RMW time.

This patch will try to allocate the needed memory for a locked rbio if
the rbio is for data exclusively (we don't want to handle mixed bg yet).
The memory will be released when the rbio is finished.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:57 +01:00
Qu Wenruo 97e3823933 btrfs: introduce a bitmap based csum range search function
Although we have an existing function, btrfs_lookup_csums_range(), to
find all data checksums for a range, it's based on a btrfs_ordered_sum
list.

For the incoming RAID56 data checksum verification at RMW time, we don't
want to waste time by allocating temporary memory.

So this patch will introduce a new helper, btrfs_lookup_csums_bitmap().
It will use bitmap based result, which will be a perfect fit for later
RAID56 usage.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:57 +01:00
Qu Wenruo cb649e81da btrfs: refactor checksum calculations in btrfs_lookup_csums_range()
The refactoring involves the following parts:

- Introduce bytes_to_csum_size() and csum_size_to_bytes() helpers
  As we have quite some open-coded calculations, some of them are even
  split into two assignments just to fit 80 chars limit.

- Remove the @csum_size parameter from max_ordered_sum_bytes()
  Csum size can be fetched from @fs_info.
  And we will use the csum_size_to_bytes() helper anyway.

- Add a comment explaining how we handle the first search result

- Use newly introduced helpers to cleanup btrfs_lookup_csums_range()

- Move variables declaration to the minimal scope

- Never mix number of sectors with bytes
  There are several locations doing things like:

 			size = min_t(size_t, csum_end - start,
				     max_ordered_sum_bytes(fs_info));
			...
			size >>= fs_info->sectorsize_bits

  Or

			offset = (start - key.offset) >> fs_info->sectorsize_bits;
			offset *= csum_size;

  Make sure these variables can only represent BYTES inside the
  function, by using the above bytes_to_csum_size() helpers.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:57 +01:00
Li zeming 9f0eac070d btrfs: allocate btrfs_io_context without GFP_NOFAIL
The __GFP_NOFAIL flag could loop indefinitely when allocation memory in
alloc_btrfs_io_context. The callers starting from __btrfs_map_block
already handle errors so it's safe to drop the flag.

Signed-off-by: Li zeming <zeming@nfschina.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:57 +01:00
Qu Wenruo cb3e217bdb btrfs: use btrfs_dev_name() helper to handle missing devices better
[BUG]
If dev-replace failed to re-construct its data/metadata, the kernel
message would be incorrect for the missing device:

 BTRFS info (device dm-1): dev_replace from <missing disk> (devid 2) to /dev/mapper/test-scratch2 started
 BTRFS error (device dm-1): failed to rebuild valid logical 38862848 for dev (efault)

Note the above "dev (efault)" of the second line.
While the first line is properly reporting "<missing disk>".

[CAUSE]
Although dev-replace is using btrfs_dev_name(), the heavy lifting work
is still done by scrub (scrub is reused by both dev-replace and regular
scrub).

Unfortunately scrub code never uses btrfs_dev_name() helper, as it's
only declared locally inside dev-replace.c.

[FIX]
Fix the output by:

- Move the btrfs_dev_name() helper to volumes.h

- Use btrfs_dev_name() to replace open-coded rcu_str_deref() calls
  Only zoned code is not touched, as I'm not familiar with degraded
  zoned code.

- Constify return value and parameter

Now the output looks pretty sane:

 BTRFS info (device dm-1): dev_replace from <missing disk> (devid 2) to /dev/mapper/test-scratch2 started
 BTRFS error (device dm-1): failed to rebuild valid logical 38862848 for dev <missing disk>

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:57 +01:00
Filipe Manana 3c32c7212f btrfs: use cached state when looking for delalloc ranges with lseek
During lseek (SEEK_HOLE/DATA), whenever we find a hole or prealloc extent,
we will look for delalloc in that range, and one of the things we do for
that is to find out ranges in the inode's io_tree marked with
EXTENT_DELALLOC, using calls to count_range_bits().

Typically there's a single, or few, searches in the io_tree for delalloc
per lseek call. However it's common for applications to keep calling
lseek with SEEK_HOLE and SEEK_DATA to find where extents and holes are in
a file, read the extents and skip holes in order to avoid unnecessary IO
and save disk space by preserving holes.

One popular user is the cp utility from coreutils. Starting with coreutils
9.0, cp uses SEEK_HOLE and SEEK_DATA to iterate over the extents of a
file. Before 9.0, it used fiemap to figure out where holes and extents are
in the source file. Another popular user is the tar utility when used with
the --sparse / -S option to detect and preserve holes.

Given that the pattern is to keep calling lseek with a start offset that
matches the returned offset from the previous lseek call, we can benefit
from caching the last extent state visited in count_range_bits() and use
it for the next count_range_bits() from the next lseek call. Example,
the following strace excerpt from running tar:

   $ strace tar cJSvf foo.tar.xz qemu_disk_file.raw
   (...)
   lseek(5, 125019574272, SEEK_HOLE)       = 125024989184
   lseek(5, 125024989184, SEEK_DATA)       = 125024993280
   lseek(5, 125024993280, SEEK_HOLE)       = 125025239040
   lseek(5, 125025239040, SEEK_DATA)       = 125025255424
   lseek(5, 125025255424, SEEK_HOLE)       = 125025353728
   lseek(5, 125025353728, SEEK_DATA)       = 125025357824
   lseek(5, 125025357824, SEEK_HOLE)       = 125026766848
   lseek(5, 125026766848, SEEK_DATA)       = 125026770944
   lseek(5, 125026770944, SEEK_HOLE)       = 125027053568
   (...)

Shows that pattern, which is the same as with cp from coreutils 9.0+.

So start using a cached state for the delalloc searches in lseek, and
store it in struct file's private data so that it can be reused across
lseek calls.

This change is part of a patchset that is comprised of the following
patches:

  1/9 btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree
  2/9 btrfs: add an early exit when searching for delalloc range for lseek/fiemap
  3/9 btrfs: skip unnecessary delalloc searches during lseek/fiemap
  4/9 btrfs: search for delalloc more efficiently during lseek/fiemap
  5/9 btrfs: remove no longer used btrfs_next_extent_map()
  6/9 btrfs: allow passing a cached state record to count_range_bits()
  7/9 btrfs: update stale comment for count_range_bits()
  8/9 btrfs: use cached state when looking for delalloc ranges with fiemap
  9/9 btrfs: use cached state when looking for delalloc ranges with lseek

The following test was run before and after applying the whole patchset:

   $ cat test-cp.sh
   #!/bin/bash

   DEV=/dev/sdh
   MNT=/mnt/sdh

   # coreutils 8.32, cp uses fiemap to detect holes and extents
   #CP_PROG=/usr/bin/cp
   # coreutils 9.1, cp uses SEEK_HOLE/DATA to detect holes and extents
   CP_PROG=/home/fdmanana/git/hub/coreutils/src/cp

   umount $DEV &> /dev/null
   mkfs.btrfs -f $DEV
   mount $DEV $MNT

   FILE_SIZE=$((1024 * 1024 * 1024))
   echo "Creating file with a size of $((FILE_SIZE / 1024 / 1024))M"
   # Create a very sparse file, where each extent has a length of 4K and
   # is preceded by a 4K hole and followed by another 4K hole.
   start=$(date +%s%N)
   echo -n > $MNT/foobar
   for ((off = 0; off < $FILE_SIZE; off += 8192)); do
           xfs_io -c "pwrite -S 0xab $off 4K" $MNT/foobar > /dev/null
           echo -ne "\r$off / $FILE_SIZE ..."
   done
   end=$(date +%s%N)
   echo -e "\nFile created ($(( (end - start) / 1000000 )) milliseconds)"

   start=$(date +%s%N)
   $CP_PROG $MNT/foobar /dev/null
   end=$(date +%s%N)
   dur=$(( (end - start) / 1000000 ))
   echo "cp took $dur milliseconds with data/metadata cached and delalloc"

   # Flush all delalloc.
   sync

   start=$(date +%s%N)
   $CP_PROG $MNT/foobar /dev/null
   end=$(date +%s%N)
   dur=$(( (end - start) / 1000000 ))
   echo "cp took $dur milliseconds with data/metadata cached and no delalloc"

   # Unmount and mount again to test the case without any metadata
   # loaded in memory.
   umount $MNT
   mount $DEV $MNT

   start=$(date +%s%N)
   $CP_PROG $MNT/foobar /dev/null
   end=$(date +%s%N)
   dur=$(( (end - start) / 1000000 ))
   echo "cp took $dur milliseconds without data/metadata cached and no delalloc"

   umount $MNT

The results, running on a box with a non-debug kernel (Debian's default
kernel config), were the following:

128M file, before patchset:

   cp took 16574 milliseconds with data/metadata cached and delalloc
   cp took 122 milliseconds with data/metadata cached and no delalloc
   cp took 20144 milliseconds without data/metadata cached and no delalloc

128M file, after patchset:

   cp took 6277 milliseconds with data/metadata cached and delalloc
   cp took 109 milliseconds with data/metadata cached and no delalloc
   cp took 210 milliseconds without data/metadata cached and no delalloc

512M file, before patchset:

   cp took 14369 milliseconds with data/metadata cached and delalloc
   cp took 429 milliseconds with data/metadata cached and no delalloc
   cp took 88034 milliseconds without data/metadata cached and no delalloc

512M file, after patchset:

   cp took 12106 milliseconds with data/metadata cached and delalloc
   cp took 427 milliseconds with data/metadata cached and no delalloc
   cp took 824 milliseconds without data/metadata cached and no delalloc

1G file, before patchset:

   cp took 10074 milliseconds with data/metadata cached and delalloc
   cp took 886 milliseconds with data/metadata cached and no delalloc
   cp took 181261 milliseconds without data/metadata cached and no delalloc

1G file, after patchset:

   cp took 3320 milliseconds with data/metadata cached and delalloc
   cp took 880 milliseconds with data/metadata cached and no delalloc
   cp took 1801 milliseconds without data/metadata cached and no delalloc

Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://lore.kernel.org/linux-btrfs/20221106073028.71F9.409509F4@e16-tech.com/
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:57 +01:00
Filipe Manana b3e744fe6d btrfs: use cached state when looking for delalloc ranges with fiemap
During fiemap, whenever we find a hole or prealloc extent, we will look
for delalloc in that range, and one of the things we do for that is to
find out ranges in the inode's io_tree marked with EXTENT_DELALLOC, using
calls to count_range_bits().

Since we process file extents from left to right, if we have a file with
several holes or prealloc extents, we benefit from keeping a cached extent
state record for calls to count_range_bits(). Most of the time the last
extent state record we visited in one call to count_range_bits() matches
the first extent state record we will use in the next call to
count_range_bits(), so there's a benefit here. So use an extent state
record to cache results from count_range_bits() calls during fiemap.

This change is part of a patchset that has the goal to make performance
better for applications that use lseek's SEEK_HOLE and SEEK_DATA modes to
iterate over the extents of a file. Two examples are the cp program from
coreutils 9.0+ and the tar program (when using its --sparse / -S option).
A sample test and results are listed in the changelog of the last patch
in the series:

  1/9 btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree
  2/9 btrfs: add an early exit when searching for delalloc range for lseek/fiemap
  3/9 btrfs: skip unnecessary delalloc searches during lseek/fiemap
  4/9 btrfs: search for delalloc more efficiently during lseek/fiemap
  5/9 btrfs: remove no longer used btrfs_next_extent_map()
  6/9 btrfs: allow passing a cached state record to count_range_bits()
  7/9 btrfs: update stale comment for count_range_bits()
  8/9 btrfs: use cached state when looking for delalloc ranges with fiemap
  9/9 btrfs: use cached state when looking for delalloc ranges with lseek

Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://lore.kernel.org/linux-btrfs/20221106073028.71F9.409509F4@e16-tech.com/
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:56 +01:00
Filipe Manana 1ee51a0625 btrfs: update stale comment for count_range_bits()
The comment for count_range_bits() mentions that the search is fast if we
are asking for a range with the EXTENT_DIRTY bit set. However that is no
longer true since we don't use that bit and the optimization for that was
removed in:

  commit 71528e9e16 ("btrfs: get rid of extent_io_tree::dirty_bytes")

So remove that part of the comment mentioning the no longer existing
optimized case, and, while at it, add proper documentation describing the
purpose, arguments and return value of the function.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:56 +01:00
Filipe Manana 8c6e53a79d btrfs: allow passing a cached state record to count_range_bits()
An inode's io_tree can be quite large and there are cases where due to
delalloc it can have thousands of extent state records, which makes the
red black tree have a depth of 10 or more, making the operation of
count_range_bits() slow if we repeatedly call it for a range that starts
where, or after, the previous one we called it for. Such use cases are
when searching for delalloc in a file range that corresponds to a hole or
a prealloc extent, which is done during lseek SEEK_HOLE/DATA and fiemap.

So introduce a cached state parameter to count_range_bits() which we use
to store the last extent state record we visited, and then allow the
caller to pass it again on its next call to count_range_bits(). The next
patches in the series will make fiemap and lseek use the new parameter.

This change is part of a patchset that has the goal to make performance
better for applications that use lseek's SEEK_HOLE and SEEK_DATA modes to
iterate over the extents of a file. Two examples are the cp program from
coreutils 9.0+ and the tar program (when using its --sparse / -S option).
A sample test and results are listed in the changelog of the last patch
in the series:

  1/9 btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree
  2/9 btrfs: add an early exit when searching for delalloc range for lseek/fiemap
  3/9 btrfs: skip unnecessary delalloc searches during lseek/fiemap
  4/9 btrfs: search for delalloc more efficiently during lseek/fiemap
  5/9 btrfs: remove no longer used btrfs_next_extent_map()
  6/9 btrfs: allow passing a cached state record to count_range_bits()
  7/9 btrfs: update stale comment for count_range_bits()
  8/9 btrfs: use cached state when looking for delalloc ranges with fiemap
  9/9 btrfs: use cached state when looking for delalloc ranges with lseek

Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://lore.kernel.org/linux-btrfs/20221106073028.71F9.409509F4@e16-tech.com/
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:56 +01:00
Filipe Manana cfd7a17d9b btrfs: remove no longer used btrfs_next_extent_map()
There are no more users of btrfs_next_extent_map(), the previous patch
in the series ("btrfs: search for delalloc more efficiently during
lseek/fiemap") removed the last usage of the function, so delete it.

This change is part of a patchset that has the goal to make performance
better for applications that use lseek's SEEK_HOLE and SEEK_DATA modes to
iterate over the extents of a file. Two examples are the cp program from
coreutils 9.0+ and the tar program (when using its --sparse / -S option).
A sample test and results are listed in the changelog of the last patch
in the series:

  1/9 btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree
  2/9 btrfs: add an early exit when searching for delalloc range for lseek/fiemap
  3/9 btrfs: skip unnecessary delalloc searches during lseek/fiemap
  4/9 btrfs: search for delalloc more efficiently during lseek/fiemap
  5/9 btrfs: remove no longer used btrfs_next_extent_map()
  6/9 btrfs: allow passing a cached state record to count_range_bits()
  7/9 btrfs: update stale comment for count_range_bits()
  8/9 btrfs: use cached state when looking for delalloc ranges with fiemap
  9/9 btrfs: use cached state when looking for delalloc ranges with lseek

Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://lore.kernel.org/linux-btrfs/20221106073028.71F9.409509F4@e16-tech.com/
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:56 +01:00
Filipe Manana 8ddc8274e4 btrfs: search for delalloc more efficiently during lseek/fiemap
During lseek (SEEK_HOLE/DATA) and fiemap, when processing a file range
that corresponds to a hole or a prealloc extent, we have to check if
there's any delalloc in the range. We do it by searching for delalloc
ranges in the inode's io_tree (for unflushed delalloc) and in the inode's
extent map tree (for delalloc that is flushing).

We avoid searching the extent map tree if the number of outstanding
extents is 0, as in that case we can't have extent maps for our search
range in the tree that correspond to delalloc that is flushing. However
if we have any unflushed delalloc, due to buffered writes or mmap writes,
then the outstanding extents counter is not 0 and we'll search the extent
map tree. The tree may be large because it can have lots of extent maps
that were loaded by reads or created by previous writes, therefore taking
a significant time to search the tree, specially if have a file with a
lot of holes and/or prealloc extents.

We can improve on this by instead of searching the extent map tree,
searching the ordered extents tree of the inode, since when delalloc is
flushing we create an ordered extent along with the new extent map, while
holding the respective file range locked in the inode's io_tree. The
ordered extents tree is typically much smaller, since ordered extents have
a short life and get removed from the tree once they are completed, while
extent maps can stay for a very long time in the extent map tree, either
created by previous writes or loaded by read operations.

So use the ordered extents tree instead of the extent maps tree.

This change is part of a patchset that has the goal to make performance
better for applications that use lseek's SEEK_HOLE and SEEK_DATA modes to
iterate over the extents of a file. Two examples are the cp program from
coreutils 9.0+ and the tar program (when using its --sparse / -S option).
A sample test and results are listed in the changelog of the last patch
in the series:

  1/9 btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree
  2/9 btrfs: add an early exit when searching for delalloc range for lseek/fiemap
  3/9 btrfs: skip unnecessary delalloc searches during lseek/fiemap
  4/9 btrfs: search for delalloc more efficiently during lseek/fiemap
  5/9 btrfs: remove no longer used btrfs_next_extent_map()
  6/9 btrfs: allow passing a cached state record to count_range_bits()
  7/9 btrfs: update stale comment for count_range_bits()
  8/9 btrfs: use cached state when looking for delalloc ranges with fiemap
  9/9 btrfs: use cached state when looking for delalloc ranges with lseek

Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://lore.kernel.org/linux-btrfs/20221106073028.71F9.409509F4@e16-tech.com/
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:56 +01:00
Filipe Manana af979fd618 btrfs: skip unnecessary delalloc searches during lseek/fiemap
During lseek (SEEK_HOLE/DATA) and fiemap, when processing a file range
that corresponds to a hole or a prealloc extent, if we find that there is
no delalloc marked in the inode's io_tree but there is delalloc due to
an extent map in the io tree, then on the next iteration that calls
find_delalloc_subrange() we can skip searching the io tree again, since
on the first call we had no delalloc in the io tree for the whole range.

This change is part of a patchset that has the goal to make performance
better for applications that use lseek's SEEK_HOLE and SEEK_DATA modes to
iterate over the extents of a file. Two examples are the cp program from
coreutils 9.0+ and the tar program (when using its --sparse / -S option).
A sample test and results are listed in the changelog of the last patch
in the series:

  1/9 btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree
  2/9 btrfs: add an early exit when searching for delalloc range for lseek/fiemap
  3/9 btrfs: skip unnecessary delalloc searches during lseek/fiemap
  4/9 btrfs: search for delalloc more efficiently during lseek/fiemap
  5/9 btrfs: remove no longer used btrfs_next_extent_map()
  6/9 btrfs: allow passing a cached state record to count_range_bits()
  7/9 btrfs: update stale comment for count_range_bits()
  8/9 btrfs: use cached state when looking for delalloc ranges with fiemap
  9/9 btrfs: use cached state when looking for delalloc ranges with lseek

Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://lore.kernel.org/linux-btrfs/20221106073028.71F9.409509F4@e16-tech.com/
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:56 +01:00
Filipe Manana 40daf3e095 btrfs: add an early exit when searching for delalloc range for lseek/fiemap
During fiemap and lseek (SEEK_HOLE/DATA), when looking for delalloc in a
range corresponding to a hole or a prealloc extent, if we found the whole
range marked as delalloc in the inode's io_tree, then we can terminate
immediately and avoid searching the extent map tree. If not, and if the
found delalloc starts at the same offset of our search start but ends
before our search range's end, then we can adjust the search range for
the search in the extent map tree. So implement those changes.

This change is part of a patchset that has the goal to make performance
better for applications that use lseek's SEEK_HOLE and SEEK_DATA modes to
iterate over the extents of a file. Two examples are the cp program from
coreutils 9.0+ and the tar program (when using its --sparse / -S option).
A sample test and results are listed in the changelog of the last patch
in the series:

  1/9 btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree
  2/9 btrfs: add an early exit when searching for delalloc range for lseek/fiemap
  3/9 btrfs: skip unnecessary delalloc searches during lseek/fiemap
  4/9 btrfs: search for delalloc more efficiently during lseek/fiemap
  5/9 btrfs: remove no longer used btrfs_next_extent_map()
  6/9 btrfs: allow passing a cached state record to count_range_bits()
  7/9 btrfs: update stale comment for count_range_bits()
  8/9 btrfs: use cached state when looking for delalloc ranges with fiemap
  9/9 btrfs: use cached state when looking for delalloc ranges with lseek

Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://lore.kernel.org/linux-btrfs/20221106073028.71F9.409509F4@e16-tech.com/
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:56 +01:00
Filipe Manana 2c8f5e8cdf btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree
We don't need to set the EXTENT_UPDATE bit in an inode's io_tree to mark a
range as uptodate, we rely on the pages themselves being uptodate - page
reading is not triggered for already uptodate pages. Recently we removed
most use of the EXTENT_UPTODATE for buffered IO with commit 52b029f427
("btrfs: remove unnecessary EXTENT_UPTODATE state in buffered I/O path"),
but there were a few leftovers, namely when reading from holes and
successfully finishing read repair.

These leftovers are unnecessarily making an inode's tree larger and deeper,
slowing down searches on it. So remove all the leftovers.

This change is part of a patchset that has the goal to make performance
better for applications that use lseek's SEEK_HOLE and SEEK_DATA modes to
iterate over the extents of a file. Two examples are the cp program from
coreutils 9.0+ and the tar program (when using its --sparse / -S option).
A sample test and results are listed in the changelog of the last patch
in the series:

  1/9 btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree
  2/9 btrfs: add an early exit when searching for delalloc range for lseek/fiemap
  3/9 btrfs: skip unnecessary delalloc searches during lseek/fiemap
  4/9 btrfs: search for delalloc more efficiently during lseek/fiemap
  5/9 btrfs: remove no longer used btrfs_next_extent_map()
  6/9 btrfs: allow passing a cached state record to count_range_bits()
  7/9 btrfs: update stale comment for count_range_bits()
  8/9 btrfs: use cached state when looking for delalloc ranges with fiemap
  9/9 btrfs: use cached state when looking for delalloc ranges with lseek

Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://lore.kernel.org/linux-btrfs/20221106073028.71F9.409509F4@e16-tech.com/
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:56 +01:00
Qu Wenruo 947a629988 btrfs: move tree block parentness check into validate_extent_buffer()
[BACKGROUND]
Although both btrfs metadata and data has their read time verification
done at endio time (btrfs_validate_metadata_buffer() and
btrfs_verify_data_csum()), metadata has extra verification, mostly
parentness check including first key/transid/owner_root/level, done at
read_tree_block() and btrfs_read_extent_buffer().

On the other hand, all the data verification is done at endio context.

[ENHANCEMENT]
This patch will make a new union in btrfs_bio, taking the space of the
old data checksums, thus it will not increase the memory usage.

With that extra btrfs_tree_parent_check inside btrfs_bio, we can just
pass the check parameter into read_extent_buffer_pages(), and before
submitting the bio, we can copy the check structure into btrfs_bio.

And finally at endio time, we can grab btrfs_bio::parent_check and pass
it to validate_extent_buffer(), to move the remaining checks into it.

This brings the following benefits:

- Much simpler btrfs_read_extent_buffer()
  Now it only needs to iterate through all mirrors.

- Simpler read-time transid check
  Previously we go verify_parent_transid() after reading out the extent
  buffer.
  Now the transid check is done inside the endio function, no other
  code can modify the content.
  Thus no need to use the extent lock anymore.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:56 +01:00
Qu Wenruo 789d6a3a87 btrfs: concentrate all tree block parentness check parameters into one structure
There are several different tree block parentness check parameters used
across several helpers:

- level
  Mandatory

- transid
  Under most cases it's mandatory, but there are several backref cases
  which skips this check.

- owner_root
- first_key
  Utilized by most top-down tree search routine. Otherwise can be
  skipped.

Those four members are not always mandatory checks, and some of them are
the same u64, which means if some arguments got swapped compiler will
not catch it.

Furthermore if we're going to further expand the parentness check, we
need to modify quite some helpers just to add one more parameter.

This patch will concentrate all these members into a structure called
btrfs_tree_parent_check, and pass that structure for the following
helpers:

- btrfs_read_extent_buffer()
- read_tree_block()

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:56 +01:00
Anand Jain bb21e30260 btrfs: move device->name RCU allocation and assign to btrfs_alloc_device()
There is a repeating code section in the parent function after calling
btrfs_alloc_device(), as below:

      name = rcu_string_strdup(path, GFP_...);
      if (!name) {
              btrfs_free_device(device);
              return ERR_PTR(-ENOMEM);
      }
      rcu_assign_pointer(device->name, name);

Except in add_missing_dev() for obvious reasons.

This patch consolidates that repeating code into the btrfs_alloc_device()
itself so that the parent function doesn't have to duplicate code.
This consolidation also helps to review issues regarding RCU lock
violation with device->name.

Parent function device_list_add() and add_missing_dev() use GFP_NOFS for
the allocation, whereas the rest of the parent functions use GFP_KERNEL,
so bring the NOFS allocation context using memalloc_nofs_save() in the
function device_list_add() and add_missing_dev() is already doing it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:55 +01:00
David Sterba 3e09b5b229 btrfs: constify input buffer parameter in compression code
The input buffers passed down to compression must never be changed,
switch type to u8 as it's a raw byte buffer and use const.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:55 +01:00
Qu Wenruo ad3daf1c3f btrfs: raid56: remove the old error tracking system
Since all the recovery paths have been migrated to the new error bitmap
based system, we can remove the old stripe number based system.

This cleanup involves one behavior change:

- Rebuild rbio can no longer be merged
  Previously a rebuild rbio (caused by retry after data csum mismatch)
  can be merged, if the error happens in the same stripe.

  But with the new error bitmap based solution, it's much harder to
  compare error bitmaps.

  So here we just don't merge rebuild rbio at all.
  This may introduce some performance impact at extreme corner cases,
  but we're willing to take it.

Other than that, this patch will cleanup the following members:

- rbio::faila
- rbio::failb
  They will be replaced by per-vertical stripe check, which is more
  accurate.

- rbio::error
  It will be replace by per-vertical stripe error bitmap check.

- Allow get_rbio_vertical_errors() to accept NULL pointers for
  @faila and @failb
  Some call sites only want to check if we have errors beyond the
  tolerance.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:55 +01:00
Qu Wenruo 75b4703329 btrfs: raid56: migrate recovery and scrub recovery path to use error_bitmap
Since we have rbio::error_bitmap to indicate exactly where the errors
are (including read error and csum mismatch error), we can make recovery
path more accurate.

For example:

             0        32K       64K
     Data 1  |XXXXXXXX|         |
     Data 2  |        |XXXXXXXXX|
     Parity  |        |         |

1) Get csum mismatch when reading data 1 [0, 32K)

2) Mark corresponding range error
   The old code will mark the whole data 1 stripe as error.
   While the new code will only mark data 1 [0, 32K) as error.

3) Recovery path
   The old code will recover data 1 [0, 64K), all using Data 2 and
   parity.

   This means, Data 1 [32K, 64K) will be corrupted data, as data 2
   [32K, 64K) is already corrupted.

   While the new code will only recover data 1 [0, 32K), as only
   that range has error so far.

This new behavior can avoid populating rbio cache with incorrect data.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:55 +01:00
Qu Wenruo 2942a50dea btrfs: raid56: introduce btrfs_raid_bio::error_bitmap
Currently btrfs raid56 uses btrfs_raid_bio::faila and failb to indicate
which stripe(s) had IO errors.

But that has some problems:

- If one sector failed csum check, the whole stripe where the corruption
  is will be marked error.
  This can reduce the chance we do recover, like this:

          0  4K 8K
  Data 1  |XX|  |
  Data 2  |  |XX|
  Parity  |  |  |

  In above case, 0~4K in data 1 should be recovered using data 2 and
  parity, while 4K~8K in data 2 should be recovered using data 1 and
  parity.

  Currently if we trigger read on 0~4K of data 1, we will also recover
  4K~8K of data 1 using corrupted data 2 and parity, causing wrong
  result in rbio cache.

- Harder to expand for future M-N scheme
  As we're limited to just faila/b, two corruptions.

- Harder to expand to handle extra csum errors
  This can be problematic if we start to do csum verification.

This patch will introduce an extra @error_bitmap, where one bit
represents error that happened for that sector.

The choice to introduce a new error bitmap other than reusing
sector_ptr, is to avoid extra search between rbio::stripe_sectors[] and
rbio::bio_sectors[].

Since we can submit bio using sectors from both sectors, doing proper
search on both array will more complex.

Although the new bitmap will take extra memory, later we can remove
things like @error and faila/b to save some memory.

Currently the new error bitmap and failab mechanism coexists, the error
bitmap is only updated at endio time and recover entrance.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:55 +01:00
David Sterba e55cf7ca85 btrfs: pass btrfs_inode to btrfs_add_delayed_iput
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:55 +01:00
David Sterba 5fc24314c8 btrfs: use btrfs_inode inside btrfs_verify_data_csum
The function is mostly using internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:55 +01:00
David Sterba 99a01bd638 btrfs: use btrfs_inode inside compress_file_range
The function is mostly using internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:55 +01:00
David Sterba 99a81a4444 btrfs: switch async_chunk::inode to btrfs_inode
The async_chunk::inode structure is for internal interfaces so we should
use the btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:55 +01:00
David Sterba 7a0443f031 btrfs: pass btrfs_inode to btrfs_inherit_iflags
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba 4c45a4f4de btrfs: pass btrfs_inode to inode_tree_add
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba 3c1b1c4c0e btrfs: pass btrfs_inode to fixup_tree_root_location
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba d1de429bce btrfs: pass btrfs_inode to btrfs_inode_by_name
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba 5b7544cb06 btrfs: pass btrfs_inode to btrfs_unlink_subvol
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba bd54766e40 btrfs: pass btrfs_inode to btrfs_clear_delalloc_extent
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba 62798a4915 btrfs: pass btrfs_inode to btrfs_split_delalloc_extent
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba 4c5d166f6b btrfs: pass btrfs_inode to btrfs_set_delalloc_extent
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba 2454151cde btrfs: pass btrfs_inode to btrfs_merge_delalloc_extent
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba 0988fc7bda btrfs: switch extent_io_tree::private_data to btrfs_inode and rename
The extent_io_tree::private_data was meant to be a preparatory work for
the metadata inode rework but that never materialized. Now it's used
only for an inode so it's better to change the appropriate type and
rename it.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba 35da5a7ede btrfs: drop private_data parameter from extent_io_tree_init
All callers except one pass NULL, so the parameter can be dropped and
the inode::io_tree initialization can be open coded.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba 3c4f91e23a btrfs: pass btrfs_inode to btrfs_delete_subvolume
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:53 +01:00
David Sterba e569b1d545 btrfs: pass btrfs_inode to __unlink_start_trans
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:53 +01:00