Move the logic from the snapshot creation ioctl into send. This avoids
doing the transaction commit if send isn't used, and ensures that if
a crash/reboot happens after the transaction commit that created the
snapshot and before the transaction commit that switched the commit
root, send will not get a commit root that differs from the main root
(that has orphan items).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
If between two snapshots we rename an existing directory named X to Y and
make it a child (direct or not) of a new inode named X, we were delaying
the move/rename of the former directory unnecessarily, which would result
in attempting to rename the new directory from its orphan name to name X
prematurely.
Minimal reproducer:
$ mkfs.btrfs -f /dev/vdd
$ mount /dev/vdd /mnt
$ mkdir -p /mnt/merlin/RC/OSD/Source
$ btrfs subvolume snapshot -r /mnt /mnt/mysnap1
$ mkdir /mnt/OSD
$ mv /mnt/merlin/RC/OSD /mnt/OSD/OSD-Plane_788
$ mv /mnt/OSD /mnt/merlin/RC
$ btrfs subvolume snapshot -r /mnt /mnt/mysnap2
$ btrfs send /mnt/mysnap1 -f /tmp/1.snap
$ btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/2.snap
$ mkfs.btrfs -f /dev/vdc
$ mount /dev/vdc /mnt2
$ btrfs receive /mnt2 -f /tmp/1.snap
$ btrfs receive /mnt2 -f /tmp/2.snap
The second receive (from an incremental send) failed with the following
error message: "rename o261-7-0 -> merlin/RC/OSD failed".
This is a regression introduced in the 3.16 kernel.
A test case for xfstests follows.
Reported-by: Marc Merlin <marc@merlins.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Maximum xattr size can be up to nearly the leaf size. For an fs with a
leaf size larger than the page size, using kmalloc requires allocating
multiple pages that are contiguous, which might not be possible if
there's heavy memory fragmentation. Therefore fallback to vmalloc if
we fail to allocate with kmalloc. Also start with a smaller buffer size,
since xattr values typically are smaller than a page.
Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Fix the following sparse warning:
fs/btrfs/send.c:518:51: warning: incorrect type in argument 2 (different address spaces)
fs/btrfs/send.c:518:51: expected char const [noderef] <asn:1>*<noident>
fs/btrfs/send.c:518:51: got char *
We can safely use (const char __user *) with set_fs(KERNEL_DS)
__force added to avoid sparse-all warning:
fs/btrfs/send.c:518:40: warning: cast adds address space to expression (<asn:1>)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Zach Brown <zab@zabbo.net>
Signed-off-by: Chris Mason <clm@fb.com>
We were limiting the sum of the xattr name and value lengths to PATH_MAX,
which is not correct, specially on filesystems created with btrfs-progs
v3.12 or higher, where the default leaf size is max(16384, PAGE_SIZE), or
systems with page sizes larger than 4096 bytes.
Xattrs have their own specific maximum name and value lengths, which depend
on the leaf size, therefore use these limits to be able to send xattrs with
sizes larger than PATH_MAX.
A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
If we are doing an incremental send and the base snapshot has a
directory with name X that doesn't exist anymore in the second
snapshot and a new subvolume/snapshot exists in the second snapshot
that has the same name as the directory (name X), the incremental
send would fail with -ENOENT error. This is because it attempts
to lookup for an inode with a number matching the objectid of a
root, which doesn't exist.
Steps to reproduce:
mkfs.btrfs -f /dev/sdd
mount /dev/sdd /mnt
mkdir /mnt/testdir
btrfs subvolume snapshot -r /mnt /mnt/mysnap1
rmdir /mnt/testdir
btrfs subvolume create /mnt/testdir
btrfs subvolume snapshot -r /mnt /mnt/mysnap2
btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/send.data
A test case for xfstests follows.
Reported-by: Robert White <rwhite@pobox.com>
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
I've noticed an extra line after "use no compression", but search
revealed much more in messages of more critical levels and rare errors.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
This is a continuation of the previous changes titled:
Btrfs: fix incremental send's decision to delay a dir move/rename
Btrfs: part 2, fix incremental send's decision to delay a dir move/rename
There's a few more cases where a directory rename/move must be delayed which was
previously overlooked. If our immediate ancestor has a lower inode number than
ours and it doesn't have a delayed rename/move operation associated to it, it
doesn't mean there isn't any non-direct ancestor of our current inode that needs
to be renamed/moved before our current inode (i.e. with a higher inode number
than ours).
So we can't stop the search if our immediate ancestor has a lower inode number than
ours, we need to navigate the directory hierarchy upwards until we hit the root or:
1) find an ancestor with an higher inode number that was renamed/moved in the send
root too (or already has a pending rename/move registered);
2) find an ancestor that is a new directory (higher inode number than ours and
exists only in the send root).
Reproducer for case 1)
$ mkfs.btrfs -f /dev/sdd
$ mount /dev/sdd /mnt
$ mkdir -p /mnt/a/b
$ mkdir -p /mnt/a/c/d
$ mkdir /mnt/a/b/e
$ mkdir /mnt/a/c/d/f
$ mv /mnt/a/b /mnt/a/c/d/2b
$ mkdir /mnt/a/x
$ mkdir /mnt/a/y
$ btrfs subvolume snapshot -r /mnt /mnt/snap1
$ btrfs send /mnt/snap1 -f /tmp/base.send
$ mv /mnt/a/x /mnt/a/y
$ mv /mnt/a/c/d/2b/e /mnt/a/c/d/2b/2e
$ mv /mnt/a/c/d /mnt/a/h/2d
$ mv /mnt/a/c /mnt/a/h/2d/2b/2c
$ btrfs subvolume snapshot -r /mnt /mnt/snap2
$ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send
Simple reproducer for case 2)
$ mkfs.btrfs -f /dev/sdd
$ mount /dev/sdd /mnt
$ mkdir -p /mnt/a/b
$ mkdir /mnt/a/c
$ mv /mnt/a/b /mnt/a/c/b2
$ mkdir /mnt/a/e
$ btrfs subvolume snapshot -r /mnt /mnt/snap1
$ btrfs send /mnt/snap1 -f /tmp/base.send
$ mv /mnt/a/c/b2 /mnt/a/e/b3
$ mkdir /mnt/a/e/b3/f
$ mkdir /mnt/a/h
$ mv /mnt/a/c /mnt/a/e/b3/f/c2
$ mv /mnt/a/e /mnt/a/h/e2
$ btrfs subvolume snapshot -r /mnt /mnt/snap2
$ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send
Another simple reproducer for case 2)
$ mkfs.btrfs -f /dev/sdd
$ mount /dev/sdd /mnt
$ mkdir -p /mnt/a/b
$ mkdir /mnt/a/c
$ mkdir /mnt/a/b/d
$ mkdir /mnt/a/c/e
$ btrfs subvolume snapshot -r /mnt /mnt/snap1
$ btrfs send /mnt/snap1 -f /tmp/base.send
$ mkdir /mnt/a/b/d/f
$ mkdir /mnt/a/b/g
$ mv /mnt/a/c/e /mnt/a/b/g/e2
$ mv /mnt/a/c /mnt/a/b/d/f/c2
$ mv /mnt/a/b/d/f /mnt/a/b/g/e2/f2
$ btrfs subvolume snapshot -r /mnt /mnt/snap2
$ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send
More complex reproducer for case 2)
$ mkfs.btrfs -f /dev/sdd
$ mount /dev/sdd /mnt
$ mkdir -p /mnt/a/b
$ mkdir -p /mnt/a/c/d
$ mkdir /mnt/a/b/e
$ mkdir /mnt/a/c/d/f
$ mv /mnt/a/b /mnt/a/c/d/2b
$ mkdir /mnt/a/x
$ mkdir /mnt/a/y
$ btrfs subvolume snapshot -r /mnt /mnt/snap1
$ btrfs send /mnt/snap1 -f /tmp/base.send
$ mv /mnt/a/x /mnt/a/y
$ mv /mnt/a/c/d/2b/e /mnt/a/c/d/2b/2e
$ mv /mnt/a/c/d /mnt/a/h/2d
$ mv /mnt/a/c /mnt/a/h/2d/2b/2c
$ btrfs subvolume snapshot -r /mnt /mnt/snap2
$ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send
For both cases the incremental send would enter an infinite loop when building
path strings.
While solving these cases, this change also re-implements the code to detect
when directory moves/renames should be delayed. Instead of dealing with several
specific cases separately, it's now more generic handling all cases with a simple
detection algorithm and if when applying a delayed move/rename there's a path loop
detected, it further delays the move/rename registering a new ancestor inode as
the dependency inode (so our rename happens after that ancestor is renamed).
Tests for these cases is being added to xfstests too.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
If we have directories with a pending move/rename operation, we must take into
account any orphan directories that got created before executing the pending
move/rename. Those orphan directories are directories with an inode number higher
then the current send progress and that don't exist in the parent snapshot, they
are created before current progress reaches their inode number, with a generated
name of the form oN-M-I and at the root of the filesystem tree, and later when
progress matches their inode number, moved/renamed to their final location.
Reproducer:
$ mkfs.btrfs -f /dev/sdd
$ mount /dev/sdd /mnt
$ mkdir -p /mnt/a/b/c/d
$ mkdir /mnt/a/b/e
$ mv /mnt/a/b/c /mnt/a/b/e/CC
$ mkdir /mnt/a/b/e/CC/d/f
$ mkdir /mnt/a/g
$ btrfs subvolume snapshot -r /mnt /mnt/snap1
$ btrfs send /mnt/snap1 -f /tmp/base.send
$ mkdir /mnt/a/g/h
$ mv /mnt/a/b/e /mnt/a/g/h/EE
$ mv /mnt/a/g/h/EE/CC/d /mnt/a/g/h/EE/DD
$ btrfs subvolume snapshot -r /mnt /mnt/snap2
$ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send
The second receive command failed with the following error:
ERROR: rename a/b/e/CC/d -> o264-7-0/EE/DD failed. No such file or directory
A test case for xfstests follows soon.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Regardless of whether the caller is interested or not in knowing the inode's
generation (dir_gen != NULL), get_first_ref always does a btree lookup to get
the inode item. Avoid this useless lookup if dir_gen parameter is NULL (which
is in some cases).
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
The patch "Btrfs: fix protection between send and root deletion"
(18f687d538) does not actually prevent to delete the snapshot
and just takes care during background cleaning, but this seems rather
user unfriendly, this patch implements the idea presented in
http://www.spinics.net/lists/linux-btrfs/msg30813.html
- add an internal root_item flag to denote a dead root
- check if the send_in_progress is set and refuse to delete, otherwise
set the flag and proceed
- check the flag in send similar to the btrfs_root_readonly checks, for
all involved roots
The root lookup in send via btrfs_read_fs_root_no_name will check if the
root is really dead or not. If it is, ENOENT, aborted send. If it's
alive, it's protected by send_in_progress, send can continue.
CC: Miao Xie <miaox@cn.fujitsu.com>
CC: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
If a path has more than 230 characters, we allocate a new buffer to
use for the path, but we were forgotting to copy the contents of the
previous buffer into the new one, which has random content from the
kmalloc call.
Test:
mkfs.btrfs -f /dev/sdd
mount /dev/sdd /mnt
TEST_PATH="/mnt/fdmanana/.config/google-chrome-mysetup/Default/Pepper_Data/Shockwave_Flash/WritableRoot/#SharedObjects/JSHJ4ZKN/s.wsj.net/[[IMPORT]]/players.edgesuite.net/flash/plugins/osmf/advanced-streaming-plugin/v2.7/osmf1.6/Ak#"
mkdir -p $TEST_PATH
echo "hello world" > $TEST_PATH/amaiAdvancedStreamingPlugin.txt
btrfs subvolume snapshot -r /mnt /mnt/mysnap1
btrfs send /mnt/mysnap1 -f /tmp/1.snap
A test for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Cc: Marc Merlin <marc@merlins.org>
Tested-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Chris Mason <clm@fb.com>
fs_path_ensure_buf is used to make sure our path buffers for
send are big enough for the path names as we construct them.
The buffer size is limited to 32K by the length field in
the struct.
But bugs in the path construction can end up trying to build
a huge buffer, and we'll do invalid memmmoves when the
buffer length field wraps.
This patch is step one, preventing the overflows.
Signed-off-by: Chris Mason <clm@fb.com>
There's no point building the path string in each iteration of the
send_hole loop, as it produces always the same string.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
We currently rely too heavily on roots being read-only to save us from just
accessing root->commit_root. We can easily balance blocks out from underneath a
read only root, so to save us from getting screwed make sure we only access
root->commit_root under the commit root sem. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Lets try this again. We can deadlock the box if we send on a box and try to
write onto the same fs with the app that is trying to listen to the send pipe.
This is because the writer could get stuck waiting for a transaction commit
which is being blocked by the send. So fix this by making sure looking at the
commit roots is always going to be consistent. We do this by keeping track of
which roots need to have their commit roots swapped during commit, and then
taking the commit_root_sem and swapping them all at once. Then make sure we
take a read lock on the commit_root_sem in cases where we search the commit root
to make sure we're always looking at a consistent view of the commit roots.
Previously we had problems with this because we would swap a fs tree commit root
and then swap the extent tree commit root independently which would cause the
backref walking code to screw up sometimes. With this patch we no longer
deadlock and pass all the weird send/receive corner cases. Thanks,
Reportedy-by: Hugo Mills <hugo@carfax.org.uk>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
So I have an awful exercise script that will run snapshot, balance and
send/receive in parallel. This sometimes would crash spectacularly and when it
came back up the fs would be completely hosed. Turns out this is because of a
bad interaction of balance and send/receive. Send will hold onto its entire
path for the whole send, but its blocks could get relocated out from underneath
it, and because it doesn't old tree locks theres nothing to keep this from
happening. So it will go to read in a slot with an old transid, and we could
have re-allocated this block for something else and it could have a completely
different transid. But because we think it is invalid we clear uptodate and
re-read in the block. If we do this before we actually write out the new block
we could write back stale data to the fs, and boom we're screwed.
Now we definitely need to fix this disconnect between send and balance, but we
really really need to not allow ourselves to accidently read in stale data over
new data. So make sure we check if the extent buffer is not under io before
clearing uptodate, this will kick back EIO to the caller instead of reading in
stale data and keep us from corrupting the fs. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
For an incremental send, fix the process of determining whether the directory
inode we're currently processing needs to have its move/rename operation delayed.
We were ignoring the fact that if the inode's new immediate ancestor has a higher
inode number than ours but wasn't renamed/moved, we might still need to delay our
move/rename, because some other ancestor directory higher in the hierarchy might
have an inode number higher than ours *and* was renamed/moved too - in this case
we have to wait for rename/move of that ancestor to happen before our current
directory's rename/move operation.
Simple steps to reproduce this issue:
$ mkfs.btrfs -f /dev/sdd
$ mount /dev/sdd /mnt
$ mkdir -p /mnt/a/x1/x2
$ mkdir /mnt/a/Z
$ mkdir -p /mnt/a/x1/x2/x3/x4/x5
$ btrfs subvolume snapshot -r /mnt /mnt/snap1
$ btrfs send /mnt/snap1 -f /tmp/base.send
$ mv /mnt/a/x1/x2/x3 /mnt/a/Z/X33
$ mv /mnt/a/x1/x2 /mnt/a/Z/X33/x4/x5/X22
$ btrfs subvolume snapshot -r /mnt /mnt/snap2
$ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send
The incremental send caused the kernel code to enter an infinite loop when
building the path string for directory Z after its references are processed.
A more complex scenario:
$ mkfs.btrfs -f /dev/sdd
$ mount /dev/sdd /mnt
$ mkdir -p /mnt/a/b/c/d
$ mkdir /mnt/a/b/c/d/e
$ mkdir /mnt/a/b/c/d/f
$ mv /mnt/a/b/c/d/e /mnt/a/b/c/d/f/E2
$ mkdir /mmt/a/b/c/g
$ mv /mnt/a/b/c/d /mnt/a/b/D2
$ btrfs subvolume snapshot -r /mnt /mnt/snap1
$ btrfs send /mnt/snap1 -f /tmp/base.send
$ mkdir /mnt/a/o
$ mv /mnt/a/b/c/g /mnt/a/b/D2/f/G2
$ mv /mnt/a/b/D2 /mnt/a/b/dd
$ mv /mnt/a/b/c /mnt/a/C2
$ mv /mnt/a/b/dd/f /mnt/a/o/FF
$ mv /mnt/a/b /mnt/a/o/FF/E2/BB
$ btrfs subvolume snapshot -r /mnt /mnt/snap2
$ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send
A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
It's possible to change the parent/child relationship between directories
in such a way that if a child directory has a higher inode number than
its parent, it doesn't necessarily means the child rename/move operation
can be performed immediately. The parent migth have its own rename/move
operation delayed, therefore in this case the child needs to have its
rename/move operation delayed too, and be performed after its new parent's
rename/move.
Steps to reproduce the issue:
$ umount /mnt
$ mkfs.btrfs -f /dev/sdd
$ mount /dev/sdd /mnt
$ mkdir /mnt/A
$ mkdir /mnt/B
$ mkdir /mnt/C
$ mv /mnt/C /mnt/A
$ mv /mnt/B /mnt/A/C
$ mkdir /mnt/A/C/D
$ btrfs subvolume snapshot -r /mnt /mnt/snap1
$ btrfs send /mnt/snap1 -f /tmp/base.send
$ mv /mnt/A/C/D /mnt/A/D2
$ mv /mnt/A/C/B /mnt/A/D2/B2
$ mv /mnt/A/C /mnt/A/D2/B2/C2
$ btrfs subvolume snapshot -r /mnt /mnt/snap2
$ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send
The incremental send caused the kernel code to enter an infinite loop when
building the path string for directory C after its references are processed.
The necessary conditions here are that C has an inode number higher than both
A and B, and B as an higher inode number higher than A, and D has the highest
inode number, that is:
inode_number(A) < inode_number(B) < inode_number(C) < inode_number(D)
The same issue could happen if after the first snapshot there's any number
of intermediary parent directories between A2 and B2, and between B2 and C2.
A test case for xfstests follows, covering this simple case and more advanced
ones, with files and hard links created inside the directories.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
No need to search in the send tree for the generation number of the inode,
we already have it in the recorded_ref structure passed to us.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Btrfs send reads data from disk and then writes to a stream via pipe or
a file via flush.
Currently we're going to read each page a time, so every page results
in a disk read, which is not friendly to disks, esp. HDD. Given that,
the performance can be gained by adding readahead for those pages.
Here is a quick test:
$ btrfs subvolume create send
$ xfs_io -f -c "pwrite 0 1G" send/foobar
$ btrfs subvolume snap -r send ro
$ time "btrfs send ro -f /dev/null"
w/o w
real 1m37.527s 0m9.097s
user 0m0.122s 0m0.086s
sys 0m53.191s 0m12.857s
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
This has no functional change, only picks out the same part of two functions,
and makes it shared.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
When we're finishing processing of an inode, if we're dealing with a
directory inode that has a pending move/rename operation, we don't
need to send a utimes update instruction to the send stream, as we'll
do it later after doing the move/rename operation. Therefore we save
some time here building paths and doing btree lookups.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
It is really unnecessary to search tree again for @gen, @mode and @rdev
in the case of REG inodes' creation, as we've got btrfs_inode_item in sctx,
and @gen, @mode and @rdev can easily be fetched.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
In "btrfs: send: lower memory requirements in common case" the code to
save the old_buf_len was incorrectly moved to a wrong place and broke
the original logic.
Reported-by: Filipe David Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Filipe David Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
When doing an incremental send, if we had a directory pending a move/rename
operation and none of its parents, except for the immediate parent, were
pending a move/rename, after processing the directory's references, we would
be issuing utimes, chown and chmod intructions against am outdated path - a
path which matched the one in the parent root.
This change also simplifies a bit the code that deals with building a path
for a directory which has a move/rename operation delayed.
Steps to reproduce:
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ mkdir -p /mnt/btrfs/a/b/c/d/e
$ mkdir /mnt/btrfs/a/b/c/f
$ chmod 0777 /mnt/btrfs/a/b/c/d/e
$ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap1
$ btrfs send /mnt/btrfs/snap1 -f /tmp/base.send
$ mv /mnt/btrfs/a/b/c/f /mnt/btrfs/a/b/f2
$ mv /mnt/btrfs/a/b/c/d/e /mnt/btrfs/a/b/f2/e2
$ mv /mnt/btrfs/a/b/c /mnt/btrfs/a/b/c2
$ mv /mnt/btrfs/a/b/c2/d /mnt/btrfs/a/b/c2/d2
$ chmod 0700 /mnt/btrfs/a/b/f2/e2
$ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap2
$ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 -f /tmp/incremental.send
$ umount /mnt/btrfs
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ btrfs receive /mnt/btrfs -f /tmp/base.send
$ btrfs receive /mnt/btrfs -f /tmp/incremental.send
The second btrfs receive command failed with:
ERROR: chmod a/b/c/d/e failed. No such file or directory
A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
The incremental send algorithm assumed that it was possible to issue
a directory remove (rmdir) if the the inode number it was currently
processing was greater than (or equal) to any inode that referenced
the directory's inode. This wasn't a valid assumption because any such
inode might be a child directory that is pending a move/rename operation,
because it was moved into a directory that has a higher inode number and
was moved/renamed too - in other words, the case the following commit
addressed:
9f03740a95
(Btrfs: fix infinite path build loops in incremental send)
This made an incremental send issue an rmdir operation before the
target directory was actually empty, which made btrfs receive fail.
Therefore it needs to wait for all pending child directory inodes to
be moved/renamed before sending an rmdir operation.
Simple steps to reproduce this issue:
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ mkdir -p /mnt/btrfs/a/b/c/x
$ mkdir /mnt/btrfs/a/b/y
$ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap1
$ btrfs send /mnt/btrfs/snap1 -f /tmp/base.send
$ mv /mnt/btrfs/a/b/y /mnt/btrfs/a/b/YY
$ mv /mnt/btrfs/a/b/c/x /mnt/btrfs/a/b/YY
$ rmdir /mnt/btrfs/a/b/c
$ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap2
$ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 -f /tmp/incremental.send
$ umount /mnt/btrfs
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ btrfs receive /mnt/btrfs -f /tmp/base.send
$ btrfs receive /mnt/btrfs -f /tmp/incremental.send
The second btrfs receive command failed with:
ERROR: rmdir o259-6-0 failed. Directory not empty
A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
When doing an incremental send, if we delete a directory that has N > 1
hardlinks for the same file and that file has the highest inode number
inside the directory contents, an incremental send would send N times an
rmdir operation against the directory. This made the btrfs receive command
fail on the second rmdir instruction, as the target directory didn't exist
anymore.
Steps to reproduce the issue:
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ mkdir -p /mnt/btrfs/a/b/c
$ echo 'ola mundo' > /mnt/btrfs/a/b/c/foo.txt
$ ln /mnt/btrfs/a/b/c/foo.txt /mnt/btrfs/a/b/c/bar.txt
$ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap1
$ btrfs send /mnt/btrfs/snap1 -f /tmp/base.send
$ rm -f /mnt/btrfs/a/b/c/foo.txt
$ rm -f /mnt/btrfs/a/b/c/bar.txt
$ rmdir /mnt/btrfs/a/b/c
$ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap2
$ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 -f /tmp/incremental.send
$ umount /mnt/btrfs
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ btrfs receive /mnt/btrfs -f /tmp/base.send
$ btrfs receive /mnt/btrfs -f /tmp/incremental.send
The second btrfs receive command failed with:
ERROR: rmdir o259-6-0 failed. No such file or directory
A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
This fixes yet one more case not caught by the commit titled:
Btrfs: fix infinite path build loops in incremental send
In this case, even before the initial full send, we have a directory
which is a child of a directory with a higher inode number. Then we
perform the initial send, and after we rename both the child and the
parent, without moving them around. After doing these 2 renames, an
incremental send sent a rename instruction for the child directory
which contained an invalid "from" path (referenced the parent's old
name, not the new one), which made the btrfs receive command fail.
Steps to reproduce:
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ mkdir -p /mnt/btrfs/a/b
$ mkdir /mnt/btrfs/d
$ mkdir /mnt/btrfs/a/b/c
$ mv /mnt/btrfs/d /mnt/btrfs/a/b/c
$ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap1
$ btrfs send /mnt/btrfs/snap1 -f /tmp/base.send
$ mv /mnt/btrfs/a/b/c /mnt/btrfs/a/b/x
$ mv /mnt/btrfs/a/b/x/d /mnt/btrfs/a/b/x/y
$ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap2
$ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 -f /tmp/incremental.send
$ umout /mnt/btrfs
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ btrfs receive /mnt/btrfs -f /tmp/base.send
$ btrfs receive /mnt/btrfs -f /tmp/incremental.send
The second btrfs receive command failed with:
"ERROR: rename a/b/c/d -> a/b/x/y failed. No such file or directory"
A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
This reverts commit 41ce9970a8.
Previously i was thinking we can use readonly root's commit root
safely while it is not true, readonly root may be cowed with the
following cases.
1.snapshot send root will cow source root.
2.balance,device operations will also cow readonly send root
to relocate.
So i have two ideas to make us safe to use commit root.
-->approach 1:
make it protected by transaction and end transaction properly and we research
next item from root node(see btrfs_search_slot_for_read()).
-->approach 2:
add another counter to local root structure to sync snapshot with send.
and add a global counter to sync send with exclusive device operations.
So with approach 2, send can use commit root safely, because we make sure
send root can not be cowed during send. Unfortunately, it make codes *ugly*
and more complex to maintain.
To make snapshot and send exclusively, device operations and send operation
exclusively with each other is a little confusing for common users.
So why not drop into previous way.
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
The fs_path structure uses an inline buffer and falls back to a chain of
allocations, but vmalloc is not necessary because PATH_MAX fits into
PAGE_SIZE.
The size of fs_path has been reduced to 256 bytes from PAGE_SIZE,
usually 4k. Experimental measurements show that most paths on a single
filesystem do not exceed 200 bytes, and these get stored into the inline
buffer directly, which is now 230 bytes. Longer paths are kmalloced when
needed.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
We have this pattern where we do search for a contiguous group of
items in a tree and everytime we find an item, we process it, then
we release our path, increment the offset of the search key, do
another full tree search and repeat these steps until a tree search
can't find more items we're interested in.
Instead of doing these full tree searches after processing each item,
just process the next item/slot in our leaf and don't release the path.
Since all these trees are read only and we always use the commit root
for a search and skip node/leaf locks, we're not affecting concurrency
on the trees.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
This was a leftover from the commit:
74dd17fbe3
(Btrfs: fix btrfs send for inline items and compression)
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
If cleaning the name cache fails, we could try to proceed at the cost of
some memory leak. This is not expected to happen often.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
There are only 2 static callers, the BUG would normally be never
reached, but let's be nice.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
We know that buf_len is at most PATH_MAX, 4k, and can merge it with the
reversed member. This saves 3 bytes in favor of inline_buf.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
We don't need to keep track of that, it's available via is_vmalloc_addr.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
The member is used only to return value back from
fs_path_prepare_for_add, we can do it locally and save 8 bytes for the
inline_buf path.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
The buffer passed to snprintf can hold the fully expanded format string,
64 = 3x largest ULL + 3x char + trailing null. I don't think that removing the
check entirely is a good idea, hence the ASSERT.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
The commit titled "Btrfs: fix infinite path build loops in incremental send"
didn't cover a particular case where the parent-child relationship inversion
of directories doesn't imply a rename of the new parent directory. This was
due to a simple logic mistake, a logical and instead of a logical or.
Steps to reproduce:
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ mkdir -p /mnt/btrfs/a/b/bar1/bar2/bar3/bar4
$ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap1
$ mv /mnt/btrfs/a/b/bar1/bar2/bar3/bar4 /mnt/btrfs/a/b/k44
$ mv /mnt/btrfs/a/b/bar1/bar2/bar3 /mnt/btrfs/a/b/k44
$ mv /mnt/btrfs/a/b/bar1/bar2 /mnt/btrfs/a/b/k44/bar3
$ mv /mnt/btrfs/a/b/bar1 /mnt/btrfs/a/b/k44/bar3/bar2/k11
$ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap2
$ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 > /tmp/incremental.send
A patch to update the test btrfs/030 from xfstests, so that it covers
this case, will be submitted soon.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
This fixes a case that the commit titled:
Btrfs: fix infinite path build loops in incremental send
didn't cover. If the parent-child relationship between 2 directories
is inverted, both get renamed, and the former parent has a file that
got renamed too (but remains a child of that directory), the incremental
send operation would use the file's old path after sending an unlink
operation for that old path, causing receive to fail on future operations
like changing owner, permissions or utimes of the corresponding inode.
This is not a regression from the commit mentioned before, as without
that commit we would fall into the issues that commit fixed, so it's
just one case that wasn't covered before.
Simple steps to reproduce this issue are:
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ mkdir -p /mnt/btrfs/a/b/c/d
$ touch /mnt/btrfs/a/b/c/d/file
$ mkdir -p /mnt/btrfs/a/b/x
$ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap1
$ mv /mnt/btrfs/a/b/x /mnt/btrfs/a/b/c/x2
$ mv /mnt/btrfs/a/b/c/d /mnt/btrfs/a/b/c/x2/d2
$ mv /mnt/btrfs/a/b/c/x2/d2/file /mnt/btrfs/a/b/c/x2/d2/file2
$ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap2
$ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 > /tmp/incremental.send
A patch to update the test btrfs/030 from xfstests, so that it covers
this case, will be submitted soon.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Function wait_for_parent_move() returns negative value if an error
happened, 0 if we don't need to wait for the parent's move, and
1 if the wait is needed.
Before this change an error return value was being treated like the
return value 1, which was not correct.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
For non compressed extents, iterate_extent_inodes() gives us offsets
that take into account the data offset from the file extent items, while
for compressed extents it doesn't. Therefore we have to adjust them before
placing them in a send clone instruction. Not doing this adjustment leads to
the receiving end requesting for a wrong a file range to the clone ioctl,
which results in different file content from the one in the original send
root.
Issue reproducible with the following excerpt from the test I made for
xfstests:
_scratch_mkfs
_scratch_mount "-o compress-force=lzo"
$XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
$XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo
$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
$XFS_IO_PROG -c "pwrite -S 0x3e -b 80000 200000 80000" $SCRATCH_MNT/foo
$BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
$XFS_IO_PROG -c "pwrite -S 0xdc -b 10000 250000 10000" $SCRATCH_MNT/foo
$XFS_IO_PROG -c "pwrite -S 0xff -b 10000 300000 10000" $SCRATCH_MNT/foo
# will be used for incremental send to be able to issue clone operations
$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/clones_snap
$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
$FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
$FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
-x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2
$FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \
-x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2
$BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
$BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap
$BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \
-c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap
_scratch_unmount
_scratch_mkfs
_scratch_mount
$BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
$FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full
$BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap
$FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full
$BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
$FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>