OpenCloudOS-Kernel/drivers/md
colyli@suse.de 03a9e24ef2 md linear: fix a race between linear_add() and linear_congested()
Recently I receive a bug report that on Linux v3.0 based kerenl, hot add
disk to a md linear device causes kernel crash at linear_congested(). From
the crash image analysis, I find in linear_congested(), mddev->raid_disks
contains value N, but conf->disks[] only has N-1 pointers available. Then
a NULL pointer deference crashes the kernel.

There is a race between linear_add() and linear_congested(), RCU stuffs
used in these two functions cannot avoid the race. Since Linuv v4.0
RCU code is replaced by introducing mddev_suspend().  After checking the
upstream code, it seems linear_congested() is not called in
generic_make_request() code patch, so mddev_suspend() cannot provent it
from being called. The possible race still exists.

Here I explain how the race still exists in current code.  For a machine
has many CPUs, on one CPU, linear_add() is called to add a hard disk to a
md linear device; at the same time on other CPU, linear_congested() is
called to detect whether this md linear device is congested before issuing
an I/O request onto it.

Now I use a possible code execution time sequence to demo how the possible
race happens,

seq    linear_add()                linear_congested()
 0                                 conf=mddev->private
 1   oldconf=mddev->private
 2   mddev->raid_disks++
 3                              for (i=0; i<mddev->raid_disks;i++)
 4                                bdev_get_queue(conf->disks[i].rdev->bdev)
 5   mddev->private=newconf

In linear_add() mddev->raid_disks is increased in time seq 2, and on
another CPU in linear_congested() the for-loop iterates conf->disks[i] by
the increased mddev->raid_disks in time seq 3,4. But conf with one more
element (which is a pointer to struct dev_info type) to conf->disks[] is
not updated yet, accessing its structure member in time seq 4 will cause a
NULL pointer deference fault.

To fix this race, there are 2 parts of modification in the patch,
 1) Add 'int raid_disks' in struct linear_conf, as a copy of
    mddev->raid_disks. It is initialized in linear_conf(), always being
    consistent with pointers number of 'struct dev_info disks[]'. When
    iterating conf->disks[] in linear_congested(), use conf->raid_disks to
    replace mddev->raid_disks in the for-loop, then NULL pointer deference
    will not happen again.
 2) RCU stuffs are back again, and use kfree_rcu() in linear_add() to
    free oldconf memory. Because oldconf may be referenced as mddev->private
    in linear_congested(), kfree_rcu() makes sure that its memory will not
    be released until no one uses it any more.
Also some code comments are added in this patch, to make this modification
to be easier understandable.

This patch can be applied for kernels since v4.0 after commit:
3be260cc18 ("md/linear: remove rcu protections in favour of
suspend/resume"). But this bug is reported on Linux v3.0 based kernel, for
people who maintain kernels before Linux v4.0, they need to do some back
back port to this patch.

Changelog:
 - V3: add 'int raid_disks' in struct linear_conf, and use kfree_rcu() to
       replace rcu_call() in linear_add().
 - v2: add RCU stuffs by suggestion from Shaohua and Neil.
 - v1: initial effort.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Neil Brown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Shaohua Li <shli@fb.com>
2017-02-13 09:17:50 -08:00
..
bcache bcache: partition support: add 16 minors per bcacheN device 2016-12-17 13:02:00 -07:00
persistent-data dm space map: always set ev if sm_ll_mutate() succeeds 2016-12-08 14:13:15 -05:00
Kconfig dm block manager: make block locking optional 2016-11-14 15:17:47 -05:00
Makefile dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
bitmap.c md: separate flags for superblock changes 2016-12-08 22:01:47 -08:00
bitmap.h md-cluster: sync bitmap when node received RESYNCING msg 2016-05-04 12:39:35 -07:00
dm-bio-prison.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm-bio-prison.h dm bio prison: add dm_cell_promote_or_release() 2015-05-29 14:19:06 -04:00
dm-bio-record.h
dm-bufio.c . various fixes and improvements to request-based DM and DM multipath 2016-12-14 11:01:00 -08:00
dm-bufio.h dm snapshot: use dm-bufio prefetch 2014-01-14 23:23:03 -05:00
dm-builtin.c dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-cache-block-types.h linux: drop __bitwise__ everywhere 2016-12-16 00:13:41 +02:00
dm-cache-metadata.c dm cache metadata: remove an extra newline in DMERR and code 2016-11-21 09:52:02 -05:00
dm-cache-metadata.h dm cache: make sure every metadata function checks fail_io 2016-03-10 17:12:12 -05:00
dm-cache-policy-cleaner.c dm cache: speed up writing of the hint array 2016-09-22 11:15:02 -04:00
dm-cache-policy-internal.h dm cache: speed up writing of the hint array 2016-09-22 11:15:02 -04:00
dm-cache-policy-smq.c dm cache policy smq: use hash_32() instead of hash_32_generic() 2016-12-08 19:42:37 -05:00
dm-cache-policy.c dm cache: add policy name to status output 2014-01-16 13:44:11 -05:00
dm-cache-policy.h dm cache: speed up writing of the hint array 2016-09-22 11:15:02 -04:00
dm-cache-target.c dm cache: add missing cache device name to DMERR in set_cache_mode() 2016-11-21 09:52:03 -05:00
dm-core.h dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-crypt.c dm crypt: replace RCU read-side section with rwsem 2017-02-03 10:26:14 -05:00
dm-delay.c dm: rename target's per_bio_data_size to per_io_data_size 2016-02-22 22:34:37 -05:00
dm-era-target.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-exception-store.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-exception-store.h dm snapshot: fix hung bios when copy error occurs 2016-01-08 20:03:05 -05:00
dm-flakey.c dm flakey: introduce "error_writes" feature 2016-12-13 15:01:31 -05:00
dm-io.c dm io: use bvec iterator helpers to implement .get_page and .next_page 2016-11-21 09:51:57 -05:00
dm-ioctl.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
dm-kcopyd.c dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-linear.c libnvdimm for 4.8 2016-07-28 17:38:16 -07:00
dm-log-userspace-base.c dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() 2015-10-31 19:06:00 -04:00
dm-log-userspace-transfer.c dm log userspace transfer: match wait_for_completion_timeout return type 2015-04-15 12:10:20 -04:00
dm-log-userspace-transfer.h
dm-log-writes.c Merge branch 'for-4.9/block' of git://git.kernel.dk/linux-block 2016-10-07 14:42:05 -07:00
dm-log.c block,fs: use REQ_* flags directly 2016-11-01 09:43:26 -06:00
dm-mpath.c dm mpath: cleanup -Wbool-operation warning in choose_pgpath() 2017-02-03 10:18:37 -05:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h dm path selector: remove 'repeat_count' return from .select_path hook 2016-02-22 22:34:42 -05:00
dm-queue-length.c dm path selector: remove 'repeat_count' return from .select_path hook 2016-02-22 22:34:42 -05:00
dm-raid.c . various fixes and improvements to request-based DM and DM multipath 2016-12-14 11:01:00 -08:00
dm-raid1.c Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block 2016-12-13 10:19:16 -08:00
dm-region-hash.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-round-robin.c dm round robin: do not use this_cpu_ptr() without having preemption disabled 2016-08-15 09:23:14 -04:00
dm-rq.c dm rq: cope with DM device destruction while in dm_old_request_fn() 2017-02-03 10:18:43 -05:00
dm-rq.h dm rq: introduce dm_mq_kick_requeue_list() 2016-09-15 11:16:05 -04:00
dm-service-time.c dm path selector: remove 'repeat_count' return from .select_path hook 2016-02-22 22:34:42 -05:00
dm-snap-persistent.c block,fs: use REQ_* flags directly 2016-11-01 09:43:26 -06:00
dm-snap-transient.c dm snapshot: fix hung bios when copy error occurs 2016-01-08 20:03:05 -05:00
dm-snap.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-stats.c dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-stats.h dm stats: support precise timestamps 2015-06-17 12:40:40 -04:00
dm-stripe.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-switch.c dm switch: simplify conditional in alloc_region_table() 2015-10-31 19:06:06 -04:00
dm-sysfs.c dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-table.c dm table: simplify dm_table_determine_type() 2016-12-08 14:13:03 -05:00
dm-target.c libnvdimm for 4.8 2016-07-28 17:38:16 -07:00
dm-thin-metadata.c dm thin: fix a race condition between discarding and provisioning a block 2016-07-20 12:43:35 -04:00
dm-thin-metadata.h dm thin: fix a race condition between discarding and provisioning a block 2016-07-20 12:43:35 -04:00
dm-thin.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-uevent.c
dm-uevent.h
dm-verity-fec.c dm verity fec: fix block calculation 2016-07-01 23:29:08 -04:00
dm-verity-fec.h dm verity: add support for forward error correction 2015-12-10 10:39:03 -05:00
dm-verity-target.c dm verity: fix incorrect error message 2016-11-21 09:52:01 -05:00
dm-verity.h dm verity: add ignore_zero_blocks feature 2015-12-10 10:39:03 -05:00
dm-zero.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm.c . various fixes and improvements to request-based DM and DM multipath 2016-12-14 11:01:00 -08:00
dm.h dm: add infrastructure for DAX support 2016-07-20 23:49:49 -04:00
faulty.c MD: rename some functions 2016-01-20 13:52:20 -08:00
linear.c md linear: fix a race between linear_add() and linear_congested() 2017-02-13 09:17:50 -08:00
linear.h md linear: fix a race between linear_add() and linear_congested() 2017-02-13 09:17:50 -08:00
md-cluster.c md-cluster: make resync lock also could be interruptted 2016-09-21 09:09:44 -07:00
md-cluster.h md-cluster: gather resync infos and enable recv_thread after bitmap is ready 2016-05-09 09:24:03 -07:00
md.c md/r5cache: flush data only stripes in r5l_recovery_log() 2017-01-24 11:20:15 -08:00
md.h md: cleanup mddev flag clear for takeover 2017-01-05 11:45:18 -08:00
multipath.c Merge branch 'md-next' into md-linus 2016-12-13 12:40:15 -08:00
multipath.h
raid0.c md: cleanup mddev flag clear for takeover 2017-01-05 11:45:18 -08:00
raid0.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
raid1.c md: cleanup mddev flag clear for takeover 2017-01-05 11:45:18 -08:00
raid1.h md/raid1: add failfast handling for reads. 2016-11-22 09:13:18 -08:00
raid5-cache.c md/r5cache: disable write back for degraded array 2017-01-24 11:26:06 -08:00
raid5.c md/r5cache: disable write back for degraded array 2017-01-24 11:26:06 -08:00
raid5.h md/r5cache: disable write back for degraded array 2017-01-24 11:26:06 -08:00
raid10.c md/raid10: Refactor raid10_make_request 2017-01-03 08:56:52 -08:00
raid10.h md/raid10: add failfast handling for reads. 2016-11-22 09:14:28 -08:00