OpenCloudOS-Kernel/drivers
colyli@suse.de 03a9e24ef2 md linear: fix a race between linear_add() and linear_congested()
Recently I receive a bug report that on Linux v3.0 based kerenl, hot add
disk to a md linear device causes kernel crash at linear_congested(). From
the crash image analysis, I find in linear_congested(), mddev->raid_disks
contains value N, but conf->disks[] only has N-1 pointers available. Then
a NULL pointer deference crashes the kernel.

There is a race between linear_add() and linear_congested(), RCU stuffs
used in these two functions cannot avoid the race. Since Linuv v4.0
RCU code is replaced by introducing mddev_suspend().  After checking the
upstream code, it seems linear_congested() is not called in
generic_make_request() code patch, so mddev_suspend() cannot provent it
from being called. The possible race still exists.

Here I explain how the race still exists in current code.  For a machine
has many CPUs, on one CPU, linear_add() is called to add a hard disk to a
md linear device; at the same time on other CPU, linear_congested() is
called to detect whether this md linear device is congested before issuing
an I/O request onto it.

Now I use a possible code execution time sequence to demo how the possible
race happens,

seq    linear_add()                linear_congested()
 0                                 conf=mddev->private
 1   oldconf=mddev->private
 2   mddev->raid_disks++
 3                              for (i=0; i<mddev->raid_disks;i++)
 4                                bdev_get_queue(conf->disks[i].rdev->bdev)
 5   mddev->private=newconf

In linear_add() mddev->raid_disks is increased in time seq 2, and on
another CPU in linear_congested() the for-loop iterates conf->disks[i] by
the increased mddev->raid_disks in time seq 3,4. But conf with one more
element (which is a pointer to struct dev_info type) to conf->disks[] is
not updated yet, accessing its structure member in time seq 4 will cause a
NULL pointer deference fault.

To fix this race, there are 2 parts of modification in the patch,
 1) Add 'int raid_disks' in struct linear_conf, as a copy of
    mddev->raid_disks. It is initialized in linear_conf(), always being
    consistent with pointers number of 'struct dev_info disks[]'. When
    iterating conf->disks[] in linear_congested(), use conf->raid_disks to
    replace mddev->raid_disks in the for-loop, then NULL pointer deference
    will not happen again.
 2) RCU stuffs are back again, and use kfree_rcu() in linear_add() to
    free oldconf memory. Because oldconf may be referenced as mddev->private
    in linear_congested(), kfree_rcu() makes sure that its memory will not
    be released until no one uses it any more.
Also some code comments are added in this patch, to make this modification
to be easier understandable.

This patch can be applied for kernels since v4.0 after commit:
3be260cc18 ("md/linear: remove rcu protections in favour of
suspend/resume"). But this bug is reported on Linux v3.0 based kernel, for
people who maintain kernels before Linux v4.0, they need to do some back
back port to this patch.

Changelog:
 - V3: add 'int raid_disks' in struct linear_conf, and use kfree_rcu() to
       replace rcu_call() in linear_add().
 - v2: add RCU stuffs by suggestion from Shaohua and Neil.
 - v1: initial effort.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Neil Brown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Shaohua Li <shli@fb.com>
2017-02-13 09:17:50 -08:00
..
accessibility
acpi Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm 2017-02-06 19:36:04 -08:00
amba
android
ata ata: sata_mv:- Handle return value of devm_ioremap. 2017-01-06 15:45:32 -05:00
atm Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
auxdisplay auxdisplay: fix new ht16k33 build errors 2017-01-11 09:27:30 +01:00
base Merge branches 'pm-core-fixes' and 'pm-cpufreq-fixes' 2017-02-06 14:52:10 +01:00
bcma Revert "bcma: init serial console directly from ChipCommon code" 2017-01-17 14:23:44 +02:00
block xen-blkfront: correct maximum segment accounting 2017-01-23 13:27:42 -05:00
bluetooth Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-12-16 10:24:44 -08:00
bus cpu/hotplug: Cleanup state names 2016-12-25 10:47:44 +01:00
cdrom Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
char Revert "hwrng: core - zeroize buffers with random data" 2017-02-08 18:06:03 -08:00
clk One fix for Samsung Exynos524x SoCs where recent IOMMU patches have 2017-01-21 18:46:45 -08:00
clocksource clocksource/exynos_mct: Clear interrupt when cpu is shut down 2017-01-17 10:08:38 +01:00
connector
cpufreq Merge branches 'pm-core-fixes' and 'pm-cpufreq-fixes' 2017-02-06 14:52:10 +01:00
cpuidle Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2017-02-06 14:16:23 -08:00
dax libnvdimm for 4.10 2016-12-18 15:49:10 -08:00
dca
devfreq PM / devfreq: exynos-bus: Fix the wrong return value 2017-01-03 00:21:45 +01:00
dio Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
dma dmaengine: pl330: fix double lock 2017-01-25 15:35:11 +05:30
dma-buf
edac Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
eisa
extcon extcon: return error code on failure 2017-01-11 09:11:39 +01:00
firewire
firmware efi/fdt: Avoid FDT manipulation after ExitBootServices() 2017-02-01 21:17:49 +01:00
fmc
fpga fpga: Clarify how write_init works streaming modes 2016-11-29 15:51:49 -06:00
gpio gpio: provide lockdep keys for nested/unnested irqchips 2017-01-19 09:57:20 +01:00
gpu Merge tag 'drm-intel-fixes-2017-02-09' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes 2017-02-10 10:14:24 +10:00
hid HID: cp2112: fix gpio-callback error handling 2017-01-31 12:59:33 +01:00
hsi
hv Drivers: hv: vmbus: finally fix hv_need_to_signal_on_read() 2017-01-31 10:59:48 +01:00
hwmon hwmon: (lm90) fix temp1_max_alarm attribute 2017-01-02 10:15:28 -08:00
hwspinlock
hwtracing coresight/etm3/4x: Consolidate hotplug state space 2016-12-25 10:47:44 +01:00
i2c i2c: piix4: Request the SMBUS semaphore inside the mutex 2017-02-09 17:13:01 +01:00
ide Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
idle Power management material for v4.10-rc1 2016-12-13 10:41:53 -08:00
iio iio: dht11: Use usleep_range instead of msleep for start signal 2017-01-22 13:35:40 +00:00
infiniband IB/rxe: Fix mem_check_range integer overflow 2017-02-08 12:28:30 -05:00
input Input: synaptics-rmi4 - select 'SERIO' when needed 2017-02-07 14:31:45 -08:00
iommu IOMMU Fixes for Linux v4.10-rc2 2017-01-06 10:49:36 -08:00
ipack
irqchip irqchip/mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND 2016-12-31 19:06:44 +00:00
isdn ISDN: eicon: silence misleading array-bounds warning 2017-01-27 11:27:34 -05:00
leds cpu/hotplug: Cleanup state names 2016-12-25 10:47:44 +01:00
lguest Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
lightnvm Char/Misc driver patches for 4.10-rc1 2016-12-13 12:11:01 -08:00
macintosh Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
mailbox ktime: Cleanup ktime_set() usage 2016-12-25 17:21:22 +01:00
mcb
md md linear: fix a race between linear_add() and linear_congested() 2017-02-13 09:17:50 -08:00
media media fixes for v4.10-rc8 2017-02-06 14:37:55 -08:00
memory Fixes for drivers already queued to prevent 2016-11-30 14:58:00 +01:00
memstick drivers/memstick/core/memstick.c: avoid -Wnonnull warning 2017-01-24 16:26:14 -08:00
message Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
mfd - New Device Support 2016-12-19 08:16:26 -08:00
misc mei: bus: enable OS version only for SPT and newer 2017-01-11 07:43:57 +01:00
mmc mmc: mmci: avoid clearing ST Micro busy end interrupt mistakenly 2017-02-08 12:22:27 +01:00
mtd mtd: nand: lpc32xx: fix invalid error handling of a requested irq 2017-01-04 20:50:18 +01:00
net xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend() 2017-02-10 13:44:49 -05:00
nfc
ntb ntb_transport: Remove unnecessary call to ntb_peer_spad_read 2016-12-23 16:11:07 -05:00
nubus Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
nvdimm libnvdimm, pfn: fix memmap reservation size versus 4K alignment 2017-02-04 14:47:31 -08:00
nvme nvme-fc: use blk_rq_nr_phys_segments 2017-01-26 17:49:14 +02:00
nvmem nvmem: fix nvmem_cell_read() return type doc 2017-01-04 18:22:47 +01:00
of pci-v4.10-changes 2016-12-15 12:46:48 -08:00
oprofile Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
parisc Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
parport parisc, parport_gsc: Fixes for printk continuation lines 2017-01-28 21:54:21 +01:00
pci Revert "PCI: pciehp: Add runtime PM support for PCIe hotplug ports" 2017-02-03 08:53:51 -06:00
pcmcia drivers/pcmcia/m32r_pcc.c: check return from add_pcc_socket 2016-12-12 18:55:06 -08:00
perf cpu/hotplug: Cleanup state names 2016-12-25 10:47:44 +01:00
phy SCSI misc on 20161213 2016-12-14 10:49:33 -08:00
pinctrl pinctrl: baytrail: Add missing spinlock usage in byt_gpio_irq_handler 2017-01-30 15:53:57 +01:00
platform platform/x86: ideapad-laptop: handle ACPI event 1 2017-01-22 12:47:06 +02:00
pnp Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
power ktime: Cleanup ktime_set() usage 2016-12-25 17:21:22 +01:00
powercap powercap / RAPL: Add Knights Mill CPUID 2016-11-30 23:41:33 +01:00
pps
ps3
ptp Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 19:56:15 -08:00
pwm pwm: Changes for v4.10-rc1 2016-12-15 11:45:13 -08:00
rapidio
ras
regulator regulator: Fixes for v4.10 2017-02-03 13:46:38 -08:00
remoteproc Revert "remoteproc: Merge table_ptr and cached_table pointers" 2016-12-30 03:26:31 -08:00
reset ARM: SoC driver updates for v4.10 2016-12-15 16:03:25 -08:00
rpmsg rpmsg: virtio_rpmsg_bus: fix channel creation 2016-12-30 03:12:11 -08:00
rtc rtc: jz4740: make the driver buildable as a module again 2017-01-26 23:03:21 +01:00
s390 SCSI fixes on 20170210 2017-02-11 09:01:03 -08:00
sbus Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
scsi SCSI fixes on 20170210 2017-02-11 09:01:03 -08:00
sfi
sh lib: radix-tree: check accounting of existing slot replacement users 2016-12-12 18:55:08 -08:00
sn
soc soc: ti: wkup_m3_ipc: Fix error return code in wkup_m3_ipc_probe() 2017-01-12 13:37:49 -08:00
spi spi: Fixes for v4.10 2017-01-20 12:25:11 -08:00
spmi
ssb
staging mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers 2017-02-08 15:41:43 -08:00
target target: Fix COMPARE_AND_WRITE ref leak for non GOOD status 2017-02-08 08:25:39 -08:00
tc
thermal Revert "thermal: thermal_hwmon: Convert to hwmon_device_register_with_info()" 2017-01-25 09:51:08 +08:00
thunderbolt Char/Misc driver patches for 4.10-rc1 2016-12-13 12:11:01 -08:00
tty sysrq: attach sysrq handler correctly for 32-bit kernel 2017-01-11 09:22:54 +01:00
uio uio-hv-generic: store physical addresses instead of virtual 2016-12-10 14:57:58 +01:00
usb USB-serial fixes for v4.10-rc7 2017-02-03 22:19:15 +01:00
uwb
vfio vfio/spapr_tce: Set window when adding additional groups to container 2017-02-07 11:48:16 -07:00
vhost vhost: fix initialization for vq->is_le 2017-02-03 23:38:57 +02:00
video fbdev: color map copying bounds checking 2017-01-24 16:26:14 -08:00
virt
virtio Revert "vring: Force use of DMA API for ARM-based systems with legacy devices" 2017-02-03 23:38:50 +02:00
vlynq
vme vme: Fix wrong pointer utilization in ca91cx42_slave_get 2017-01-11 10:42:16 +01:00
w1
watchdog Watchdog updates for v4.10 2016-12-24 11:27:45 -08:00
xen Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb 2017-01-27 12:17:07 -08:00
zorro Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
Kconfig
Makefile