OpenCloudOS-Kernel/fs/btrfs
Filipe Manana 61391d5622 Btrfs: fix hang on error (such as ENOSPC) when writing extent pages
When running low on available disk space and having several processes
doing buffered file IO, I got the following trace in dmesg:

[ 4202.720152] INFO: task kworker/u8:1:5450 blocked for more than 120 seconds.
[ 4202.720401]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
[ 4202.720596] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4202.720874] kworker/u8:1    D 0000000000000001     0  5450      2 0x00000000
[ 4202.720904] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
[ 4202.720908]  ffff8801f62ddc38 0000000000000082 ffff880203ac2490 00000000001d3f40
[ 4202.720913]  ffff8801f62ddfd8 00000000001d3f40 ffff8800c4f0c920 ffff880203ac2490
[ 4202.720918]  00000000001d4a40 ffff88020fe85a40 ffff88020fe85ab8 0000000000000001
[ 4202.720922] Call Trace:
[ 4202.720931]  [<ffffffff816a3cb9>] schedule+0x29/0x70
[ 4202.720950]  [<ffffffffa01ec48d>] btrfs_start_ordered_extent+0x6d/0x110 [btrfs]
[ 4202.720956]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
[ 4202.720972]  [<ffffffffa01ec559>] btrfs_run_ordered_extent_work+0x29/0x40 [btrfs]
[ 4202.720988]  [<ffffffffa0201987>] normal_work_helper+0x137/0x2c0 [btrfs]
[ 4202.720994]  [<ffffffff810680e5>] process_one_work+0x1f5/0x530
(...)
[ 4202.721027] 2 locks held by kworker/u8:1/5450:
[ 4202.721028]  #0:  (%s-%s){++++..}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
[ 4202.721037]  #1:  ((&work->normal_work)){+.+...}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
[ 4202.721054] INFO: task btrfs:7891 blocked for more than 120 seconds.
[ 4202.721258]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
[ 4202.721444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4202.721699] btrfs           D 0000000000000001     0  7891   7890 0x00000001
[ 4202.721704]  ffff88018c2119e8 0000000000000086 ffff8800a33d2490 00000000001d3f40
[ 4202.721710]  ffff88018c211fd8 00000000001d3f40 ffff8802144b0000 ffff8800a33d2490
[ 4202.721714]  ffff8800d8576640 ffff88020fe85bc0 ffff88020fe85bc8 7fffffffffffffff
[ 4202.721718] Call Trace:
[ 4202.721723]  [<ffffffff816a3cb9>] schedule+0x29/0x70
[ 4202.721727]  [<ffffffff816a2ebc>] schedule_timeout+0x1dc/0x270
[ 4202.721732]  [<ffffffff8109bd79>] ? mark_held_locks+0xb9/0x140
[ 4202.721736]  [<ffffffff816a90c0>] ? _raw_spin_unlock_irq+0x30/0x40
[ 4202.721740]  [<ffffffff8109bf0d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
[ 4202.721744]  [<ffffffff816a488f>] wait_for_completion+0xdf/0x120
[ 4202.721749]  [<ffffffff8107fa90>] ? try_to_wake_up+0x310/0x310
[ 4202.721765]  [<ffffffffa01ebee4>] btrfs_wait_ordered_extents+0x1f4/0x280 [btrfs]
[ 4202.721781]  [<ffffffffa020526e>] btrfs_mksubvol.isra.62+0x30e/0x5a0 [btrfs]
[ 4202.721786]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
[ 4202.721799]  [<ffffffffa02056a9>] btrfs_ioctl_snap_create_transid+0x1a9/0x1b0 [btrfs]
[ 4202.721813]  [<ffffffffa020583a>] btrfs_ioctl_snap_create_v2+0x10a/0x170 [btrfs]
(...)

It turns out that extent_io.c:__extent_writepage(), which ends up being called
through filemap_fdatawrite_range() in btrfs_start_ordered_extent(), was getting
-ENOSPC when calling the fill_delalloc callback. In this situation, it returned
without the writepage_end_io_hook callback (inode.c:btrfs_writepage_end_io_hook)
ever being called for the respective page, which prevents the ordered extent's
bytes_left count from ever reaching 0, and therefore a finish_ordered_fn work
is never queued into the endio_write_workers queue. This makes the task that
called btrfs_start_ordered_extent() hang forever on the wait queue of the ordered
extent.

This is fairly easy to reproduce using a small filesystem and fsstress on
a quad core vm:

    mkfs.btrfs -f -b `expr 2100 \* 1024 \* 1024` /dev/sdd
    mount /dev/sdd /mnt

    fsstress -p 6 -d /mnt -n 100000 -x \
        "btrfs subvolume snapshot -r /mnt /mnt/mysnap" \
	    -f allocsp=0 \
	    -f bulkstat=0 \
	    -f bulkstat1=0 \
	    -f chown=0 \
	    -f creat=1 \
	    -f dread=0 \
	    -f dwrite=0 \
	    -f fallocate=1 \
	    -f fdatasync=0 \
	    -f fiemap=0 \
	    -f freesp=0 \
	    -f fsync=0 \
	    -f getattr=0 \
	    -f getdents=0 \
	    -f link=0 \
	    -f mkdir=0 \
	    -f mknod=0 \
	    -f punch=1 \
	    -f read=0 \
	    -f readlink=0 \
	    -f rename=0 \
	    -f resvsp=0 \
	    -f rmdir=0 \
	    -f setxattr=0 \
	    -f stat=0 \
	    -f symlink=0 \
	    -f sync=0 \
	    -f truncate=1 \
	    -f unlink=0 \
	    -f unresvsp=0 \
	    -f write=4

So just ensure that if an error happens while writing the extent page
we call the writepage_end_io_hook callback. Also make it return the
error code and ensure the caller (extent_write_cache_pages) processes
all pages in the page vector even if an error happens only for some
of them, so that ordered extents end up released.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:20 -07:00
..
tests Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-01-30 20:08:20 -08:00
Kconfig Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
Makefile Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
acl.c btrfs: remove dead code 2014-01-28 13:19:50 -08:00
async-thread.c btrfs: fix crash in remount(thread_pool=) case 2014-04-07 10:41:52 -07:00
async-thread.h btrfs: Add trace for btrfs_workqueue alloc/destroy 2014-03-20 17:15:28 -07:00
backref.c Btrfs: remove transaction from send 2014-04-06 17:39:30 -07:00
backref.h Btrfs: allocate prelim_ref with a slab allocater 2013-09-01 08:16:27 -04:00
btrfs_inode.h Btrfs: use signed integer instead of unsigned long integer for log transid 2014-03-10 15:16:42 -04:00
check-integrity.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-02-04 12:26:56 -08:00
check-integrity.h block: submit_bio_wait() conversions 2013-11-24 16:33:41 -07:00
compression.c mm + fs: prepare for non-page entries in page cache radix trees 2014-04-03 16:21:00 -07:00
compression.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
ctree.c Btrfs: hold the commit_root_sem when getting the commit root during send 2014-04-07 09:08:39 -07:00
ctree.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-04-27 13:26:28 -07:00
delayed-inode.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
delayed-inode.h Btrfs: introduce the delayed inode ref deletion for the single link inode 2014-01-28 13:20:09 -08:00
delayed-ref.c Btrfs: fix race when updating existing ref head 2014-03-20 17:15:28 -07:00
delayed-ref.h Btrfs: attach delayed ref updates to delayed ref heads 2014-01-28 13:20:25 -08:00
dev-replace.c Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock 2014-03-10 15:17:27 -04:00
dev-replace.h Btrfs: add new sources for device replace code 2012-12-12 17:15:41 -05:00
dir-item.c Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00
disk-io.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-04-27 13:26:28 -07:00
disk-io.h Btrfs: add a sanity test for btrfs_split_item 2013-11-11 21:51:02 -05:00
export.c btrfs: remove fs/btrfs/compat.h 2013-11-11 22:03:19 -05:00
export.h
extent-tree.c Btrfs: correctly set profile flags on seqlock retry 2014-04-24 16:43:33 -07:00
extent_io.c Btrfs: fix hang on error (such as ENOSPC) when writing extent pages 2014-06-09 17:20:20 -07:00
extent_io.h Btrfs: don't clear uptodate if the eb is under IO 2014-04-06 17:34:37 -07:00
extent_map.c Btrfs: more efficient btrfs_drop_extent_cache 2014-03-10 15:16:57 -04:00
extent_map.h Btrfs: more efficient btrfs_drop_extent_cache 2014-03-10 15:16:57 -04:00
file-item.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-01-30 20:08:20 -08:00
file.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-04-27 13:26:28 -07:00
free-space-cache.c Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00
free-space-cache.h Btrfs: remove path arg from btrfs_truncate_free_space_cache 2013-11-11 21:51:33 -05:00
hash.c Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
hash.h Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
inode-item.c btrfs: cleanup: removed unused 'btrfs_get_inode_ref_index' 2014-01-28 13:19:39 -08:00
inode-map.c Btrfs: fix inode caching vs tree log 2014-04-24 16:43:33 -07:00
inode-map.h Btrfs: Support reading/writing on disk free ino cache 2011-04-25 16:46:11 +08:00
inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-04-11 14:16:53 -07:00
ioctl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-05-22 05:40:13 +09:00
locking.c btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
locking.h Btrfs: remove btrfs_try_spin_lock 2013-03-14 14:57:10 -04:00
lzo.c Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00
math.h Btrfs: cleanup duplicated division functions 2012-12-11 13:31:30 -05:00
ordered-data.c Btrfs: split the global ordered extents mutex 2014-03-10 15:17:28 -04:00
ordered-data.h btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
orphan.c btrfs: expand btrfs_find_item() to include find_orphan_item functionality 2014-01-28 13:19:37 -08:00
print-tree.c Btrfs: don't use ram_bytes for uncompressed inline items 2014-01-29 07:06:29 -08:00
print-tree.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
props.c Btrfs: add support for inode properties 2014-01-28 13:20:24 -08:00
props.h Btrfs: add support for inode properties 2014-01-28 13:20:24 -08:00
qgroup.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
raid56.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-04-04 15:31:36 -07:00
raid56.h Btrfs: RAID5 and RAID6 2013-02-01 14:24:23 -05:00
rcu-string.h Btrfs: use rcu to protect device->name 2012-06-14 21:29:16 -04:00
reada.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
relocation.c Btrfs: do not reset last_snapshot after relocation 2014-04-06 17:34:35 -07:00
root-tree.c btrfs: Use PTR_ERR_OR_ZERO 2014-03-10 15:16:51 -04:00
scrub.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-04-11 14:16:53 -07:00
send.c Btrfs: send, fix corrupted path strings for long paths 2014-06-06 12:00:46 -07:00
send.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
struct-funcs.c Btrfs: rewrite BTRFS_SETGET_FUNCS 2012-07-23 16:28:06 -04:00
super.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-04-27 13:26:28 -07:00
sysfs.c btrfs: add simple debugfs interface 2014-03-10 15:15:51 -04:00
sysfs.h btrfs: add simple debugfs interface 2014-03-10 15:15:51 -04:00
transaction.c Btrfs: remove transaction from send 2014-04-06 17:39:30 -07:00
transaction.h Btrfs: remove transaction from send 2014-04-06 17:39:30 -07:00
tree-defrag.c Btrfs: cleanup dead code of defragment 2013-11-11 21:59:45 -05:00
tree-log.c Btrfs: stop joining the log transaction if sync log fails 2014-03-10 15:16:44 -04:00
tree-log.h Btrfs: just wait or commit our own log sub-transaction 2014-03-10 15:16:43 -04:00
ulist.c Btrfs: do not export ulist functions 2014-01-29 07:06:27 -08:00
ulist.h Btrfs: do not export ulist functions 2014-01-29 07:06:27 -08:00
uuid-tree.c Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00
volumes.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-04-11 14:16:53 -07:00
volumes.h btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
xattr.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-01-30 20:08:20 -08:00
xattr.h btrfs: use generic posix ACL infrastructure 2014-01-25 23:58:18 -05:00
zlib.c Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00