btrfs: save i_size to avoid double evaluation of i_size_read in compress_file_range
We hit a regression while rolling out 5.2 internally where we were
hitting the following panic:

  kernel BUG at mm/page-writeback.c:2659!
  RIP: 0010:clear_page_dirty_for_io+0xe6/0x1f0
  Call Trace:
   __process_pages_contig+0x25a/0x350
   ? extent_clear_unlock_delalloc+0x43/0x70
   submit_compressed_extents+0x359/0x4d0
   normal_work_helper+0x15a/0x330
   process_one_work+0x1f5/0x3f0
   worker_thread+0x2d/0x3d0
   ? rescuer_thread+0x340/0x340
   kthread+0x111/0x130
   ? kthread_create_on_node+0x60/0x60
   ret_from_fork+0x1f/0x30

This is happening because the page is not locked when doing
clear_page_dirty_for_io.  Looking at the core dump, it was because our
async_extent had a ram_size of 24576 but our async_chunk range only
spanned 20480, so we had a whole extra page in our ram_size for our
async_extent.

This happened because we try not to compress pages outside of our
i_size; however, a cleanup patch changed us to do

  actual_end = min_t(u64, i_size_read(inode), end + 1);

which is problematic because i_size_read() can evaluate to different
values in between checking and assigning.  So either an expanding
truncate or a fallocate could increase our i_size while we're doing
writeout, and actual_end would end up being past the range we have
locked.

I confirmed this was what was happening by installing a debug kernel
that had

  actual_end = min_t(u64, i_size_read(inode), end + 1);
  if (actual_end > end + 1) {
          printk(KERN_ERR "KABOOM\n");
          actual_end = end + 1;
  }

and installing it onto 500 boxes of the tier that had been seeing the
problem regularly.  Last night I got my debug message and no panic,
confirming what I expected.

[ dsterba: the assembly confirms a tiny race window:

    mov    0x20(%rsp),%rax
    cmp    %rax,0x48(%r15)           # read
    movl   $0x0,0x18(%rsp)
    mov    %rax,%r12
    mov    %r14,%rax
    cmovbe 0x48(%r15),%r12           # eval

  Where r15 is the inode and 0x48 is the offset of i_size.

  The original fix was to revert 62b3762271, which would do an
  intermediate assignment and thereby also avoid the double evaluation,
  but that is not future-proof should the compiler merge the stores and
  call i_size_read anyway.

  There's a patch adding READ_ONCE to i_size_read, but that's not being
  applied at the moment and we need to fix the bug.  Instead, emulate
  READ_ONCE with two barrier()s, which is what effectively happens.  The
  assembly confirms single evaluation:

    mov    0x48(%rbp),%rax           # read once
    mov    0x20(%rsp),%rcx
    mov    $0x20,%edx
    cmp    %rax,%rcx
    cmovbe %rcx,%rax
    mov    %rax,(%rsp)
    mov    %rax,%rcx
    mov    %r14,%rax

  Where 0x48(%rbp) is inode->i_size stored to %rax. ]

Fixes: 62b3762271 ("btrfs: Remove isize local variable in compress_file_range")
CC: stable@vger.kernel.org # v5.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ changelog updated ]
Signed-off-by: David Sterba <dsterba@suse.com>
parent 0cab7acc4a
commit d98da49977
@@ -474,6 +474,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 	u64 start = async_chunk->start;
 	u64 end = async_chunk->end;
 	u64 actual_end;
+	u64 i_size;
 	int ret = 0;
 	struct page **pages = NULL;
 	unsigned long nr_pages;
@@ -488,7 +489,19 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 	inode_should_defrag(BTRFS_I(inode), start, end, end - start + 1,
 			SZ_16K);
 
-	actual_end = min_t(u64, i_size_read(inode), end + 1);
+	/*
+	 * We need to save i_size before now because it could change in between
+	 * us evaluating the size and assigning it. This is because we lock and
+	 * unlock the page in truncate and fallocate, and then modify the i_size
+	 * later on.
+	 *
+	 * The barriers are to emulate READ_ONCE, remove that once i_size_read
+	 * does that for us.
+	 */
+	barrier();
+	i_size = i_size_read(inode);
+	barrier();
+	actual_end = min_t(u64, i_size, end + 1);
 again:
 	will_compress = 0;
 	nr_pages = (end >> PAGE_SHIFT) - (start >> PAGE_SHIFT) + 1;
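A note on the barrier() pair added above: barrier() is the usual GCC compiler barrier (essentially how the kernel defines it), so bracketing a single plain load with two of them keeps the compiler from reusing a value cached earlier or reloading the variable afterwards, which is effectively what READ_ONCE() provides for an aligned word. A standalone sketch of the idiom, assuming a GCC-compatible compiler; the function name and pointer parameter are made up for illustration:

  #include <stdint.h>

  /* Pure compiler barrier: no code emitted, but memory may not be
   * cached or reordered across it by the compiler. */
  #define barrier()	__asm__ __volatile__("" ::: "memory")

  uint64_t snapshot_size(const uint64_t *i_size_ptr)
  {
  	uint64_t i_size;

  	barrier();		/* don't reuse a value read before this point */
  	i_size = *i_size_ptr;	/* the single load we want */
  	barrier();		/* don't fold later uses back into a reload */

  	return i_size;
  }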