OpenCloudOS-Kernel/fs/btrfs
Liu Bo 9e0af23764 Btrfs: fix task hang under heavy compressed write
This has been reported and discussed for a long time, and this hang occurs in
both 3.15 and 3.16.

Btrfs now migrates to use kernel workqueue, but it introduces this hang problem.

Btrfs has a kind of work queued as an ordered way, which means that its
ordered_func() must be processed in the way of FIFO, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);

    work->func() <---- (we name it work X)
    for ordered_work in wq->ordered_list
            ordered_work->ordered_func()
            ordered_work->ordered_free()

The hang is a rare case, first when we find free space, we get an uncached block
group, then we go to read its free space cache inode for free space information,
so it will

file a readahead request
    btrfs_readpages()
         for page that is not in page cache
                __do_readpage()
                     submit_extent_page()
                           btrfs_submit_bio_hook()
                                 btrfs_bio_wq_end_io()
                                 submit_bio()
                                 end_workqueue_bio() <--(ret by the 1st endio)
                                      queue a work(named work Y) for the 2nd
                                      also the real endio()

So the hang occurs when work Y's work_struct and work X's work_struct happens
to share the same address.

A bit more explanation,

A,B,C -- struct btrfs_work
arg   -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
	worker->current_work = arg;  <-- arg is A->normal_work
	worker->current_func(arg)
		normal_work_helper(arg)
		     A = container_of(arg, struct btrfs_work, normal_work);

		     A->func()
		     A->ordered_func()
		     A->ordered_free()  <-- A gets freed

		     B->ordered_func()
			  submit_compressed_extents()
			      find_free_extent()
				  load_free_space_inode()
				      ...   <-- (the above readhead stack)
				      end_workqueue_bio()
					   btrfs_queue_work(work C)
		     B->ordered_free()

As if work A has a high priority in wq->ordered_list and there are more ordered
works queued after it, such as B->ordered_func(), its memory could have been
freed before normal_work_helper() returns, which means that kernel workqueue
code worker_thread() still has worker->current_work pointer to be work
A->normal_work's, ie. arg's address.

Meanwhile, work C is allocated after work A is freed, work C->normal_work
and work A->normal_work are likely to share the same address(I confirmed this
with ftrace output, so I'm not just guessing, it's rare though).

When another kthread picks up work C->normal_work to process, and finds our
kthread is processing it(see find_worker_executing_work()), it'll think
work C as a collision and skip then, which ends up nobody processing work C.

So the situation is that our kthread is waiting forever on work C.

Besides, there're other cases that can lead to deadlock, but the real problem
is that all btrfs workqueue shares one work->func, -- normal_work_helper,
so this makes each workqueue to have its own helper function, but only a
wraper pf normal_work_helper.

With this patch, I no long hit the above hang.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-08-24 07:17:02 -07:00
..
tests Btrfs: fix qgroups sanity test crash or hang 2014-06-13 09:52:24 -07:00
Kconfig Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
Makefile Btrfs: add sanity tests for new qgroup accounting code 2014-06-09 17:20:49 -07:00
acl.c btrfs: remove useless ACL check 2014-06-09 17:20:42 -07:00
async-thread.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
async-thread.h Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
backref.c Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch 2014-08-15 07:43:19 -07:00
backref.h Btrfs: fix scrub_print_warning to handle skinny metadata extents 2014-06-09 17:21:17 -07:00
btrfs_inode.h btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
check-integrity.c btrfs: check_int: propagate out-of-memory error upwards 2014-06-09 17:20:21 -07:00
check-integrity.h block: submit_bio_wait() conversions 2013-11-24 16:33:41 -07:00
compression.c btrfs compression: reuse recently used workspace 2014-06-28 13:48:46 -07:00
compression.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
ctree.c Btrfs: __btrfs_mod_ref should always use no_quota 2014-08-15 07:43:11 -07:00
ctree.h Btrfs: __btrfs_mod_ref should always use no_quota 2014-08-15 07:43:11 -07:00
delayed-inode.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
delayed-inode.h Btrfs: introduce the delayed inode ref deletion for the single link inode 2014-01-28 13:20:09 -08:00
delayed-ref.c Btrfs: rework qgroup accounting 2014-06-09 17:20:48 -07:00
delayed-ref.h Btrfs: rework qgroup accounting 2014-06-09 17:20:48 -07:00
dev-replace.c btrfs: dev replace should replace the sysfs entry 2014-06-28 13:48:44 -07:00
dev-replace.h Btrfs: add new sources for device replace code 2012-12-12 17:15:41 -05:00
dir-item.c Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00
disk-io.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
disk-io.h Btrfs: add sanity tests for new qgroup accounting code 2014-06-09 17:20:49 -07:00
export.c btrfs: remove fs/btrfs/compat.h 2013-11-11 22:03:19 -05:00
export.h
extent-tree.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
extent_io.c Btrfs: fix crash on endio of reading corrupted block 2014-08-21 07:55:30 -07:00
extent_io.h Btrfs: remove unused wait queue in struct extent_buffer 2014-06-19 14:20:28 -07:00
extent_map.c Btrfs: fix NULL pointer crash when running balance and scrub concurrently 2014-06-19 14:20:55 -07:00
extent_map.h Btrfs: fix NULL pointer crash when running balance and scrub concurrently 2014-06-19 14:20:55 -07:00
file-item.c Btrfs: fix csum tree corruption, duplicate and outdated checksums 2014-08-15 07:43:40 -07:00
file.c Btrfs: fix filemap_flush call in btrfs_file_release 2014-08-21 07:55:31 -07:00
free-space-cache.c Btrfs: fix broken free space cache after the system crashed 2014-06-19 14:20:54 -07:00
free-space-cache.h Btrfs: remove path arg from btrfs_truncate_free_space_cache 2013-11-11 21:51:33 -05:00
hash.c Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
hash.h Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
inode-item.c btrfs: cleanup: removed unused 'btrfs_get_inode_ref_index' 2014-01-28 13:19:39 -08:00
inode-map.c btrfs: remove newline from inode cache kthread name 2014-06-09 17:20:53 -07:00
inode-map.h
inode.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
ioctl.c Btrfs: clone, don't create invalid hole extent map 2014-08-21 07:55:26 -07:00
locking.c Btrfs: fix deadlocks with trylock on tree nodes 2014-06-19 14:19:55 -07:00
locking.h Btrfs: remove btrfs_try_spin_lock 2013-03-14 14:57:10 -04:00
lzo.c btrfs: return errno instead of -1 from compression 2014-06-09 17:20:21 -07:00
math.h Btrfs: cleanup duplicated division functions 2012-12-11 13:31:30 -05:00
ordered-data.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
ordered-data.h btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
orphan.c btrfs: expand btrfs_find_item() to include find_orphan_item functionality 2014-01-28 13:19:37 -08:00
print-tree.c Btrfs: fix btrfs_print_leaf for skinny metadata 2014-07-03 07:04:16 -07:00
print-tree.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
props.c Btrfs: add support for inode properties 2014-01-28 13:20:24 -08:00
props.h Btrfs: add support for inode properties 2014-01-28 13:20:24 -08:00
qgroup.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
qgroup.h btrfs: qgroup: account shared subtrees during snapshot delete 2014-08-15 07:43:14 -07:00
raid56.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
raid56.h Btrfs: RAID5 and RAID6 2013-02-01 14:24:23 -05:00
rcu-string.h Btrfs: use rcu to protect device->name 2012-06-14 21:29:16 -04:00
reada.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
relocation.c btrfs: remove stale newlines from log messages 2014-06-09 17:20:53 -07:00
root-tree.c Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root 2014-06-09 17:20:40 -07:00
scrub.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
send.c Btrfs: send, use the right limits for xattr names and values 2014-06-09 17:21:00 -07:00
send.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
struct-funcs.c Btrfs: rewrite BTRFS_SETGET_FUNCS 2012-07-23 16:28:06 -04:00
super.c btrfs: adjust statfs calculations according to raid profiles 2014-08-15 07:43:10 -07:00
sysfs.c Btrfs: fix regression of btrfs device replace 2014-08-21 07:55:20 -07:00
sysfs.h btrfs: dev add should add its sysfs entry 2014-06-28 13:48:43 -07:00
transaction.c btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
transaction.h btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
tree-defrag.c Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root 2014-06-09 17:20:40 -07:00
tree-log.c Btrfs: fix hole detection during file fsync 2014-08-21 07:55:24 -07:00
tree-log.h Btrfs: use helpers for last_trans_log_full_commit instead of opencode 2014-06-09 17:20:45 -07:00
ulist.c Btrfs: do not export ulist functions 2014-01-29 07:06:27 -08:00
ulist.h Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch 2014-08-15 07:43:19 -07:00
uuid-tree.c Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00
volumes.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
volumes.h Btrfs: fix deadlock when mounting a degraded fs 2014-06-19 14:20:56 -07:00
xattr.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2014-01-30 20:08:20 -08:00
xattr.h btrfs: use generic posix ACL infrastructure 2014-01-25 23:58:18 -05:00
zlib.c btrfs: use E2BIG instead of EIO if compression does not help 2014-07-03 07:04:13 -07:00