OpenCloudOS-Kernel/drivers
Sreekanth Reddy e70183143c scsi: mpt3sas: Fix calltrace observed while running IO & reset
The kernel BUG below was observed while running I/O with a host reset
(issued from an application):

mpt3sas_cm0: diag reset: SUCCESS
------------[ cut here ]------------
WARNING: CPU: 12 PID: 4336 at drivers/scsi/mpt3sas/mpt3sas_base.c:3282 mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support
 dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G        W      ------------   3.10.0-875.el7.brdc.x86_64 
Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
Call Trace:
 [<ffffffff9cf16583>] dump_stack+0x19/0x1b
 [<ffffffff9c891698>] __warn+0xd8/0x100
 [<ffffffff9c8917dd>] warn_slowpath_null+0x1d/0x20
 [<ffffffffc04f3f4d>] mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
 [<ffffffffc05047d2>] _scsih_flush_running_cmds+0x92/0xe0 [mpt3sas]
 [<ffffffffc05095db>] mpt3sas_scsih_reset_handler+0x43b/0xaf0 [mpt3sas]
 [<ffffffff9c894829>] ? vprintk_default+0x29/0x40
 [<ffffffff9cf10531>] ? printk+0x60/0x77
 [<ffffffffc04f06c8>] ? _base_diag_reset+0x238/0x340 [mpt3sas]
 [<ffffffffc04f794d>] mpt3sas_base_hard_reset_handler+0x1ad/0x420 [mpt3sas]
 [<ffffffffc05132b9>] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas]
 [<ffffffffc068d585>] ? xfs_file_aio_write+0x155/0x1b0 [xfs]
 [<ffffffff9ca1a4e3>] ? do_sync_write+0x93/0xe0
 [<ffffffffc051337a>] _ctl_ioctl+0x1a/0x20 [mpt3sas]
 [<ffffffff9ca2fe90>] do_vfs_ioctl+0x350/0x560
 [<ffffffff9ca1dec1>] ? __sb_end_write+0x31/0x60
 [<ffffffff9ca30141>] SyS_ioctl+0xa1/0xc0
 [<ffffffff9cf28715>] ? system_call_after_swapgs+0xa2/0x146
 [<ffffffff9cf287d5>] system_call_fastpath+0x1c/0x21
 [<ffffffff9cf28721>] ? system_call_after_swapgs+0xae/0x146
---[ end trace 5dac5b98d89aaa3c ]---
------------[ cut here ]------------
kernel BUG at block/blk-core.c:1476!
invalid opcode: 0000 [] SMP
Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support
 dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G        W      ------------   3.10.0-875.el7.brdc.x86_64 
Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
task: ffff903fc96e0fd0 ti: ffff903fb1eec000 task.ti: ffff903fb1eec000
RIP: 0010:[<ffffffff9cb19ec0>]  [<ffffffff9cb19ec0>] blk_requeue_request+0x90/0xa0
RSP: 0018:ffff903c6b783dc0  EFLAGS: 00010087
RAX: ffff903bb67026d0 RBX: ffff903b7d6a6140 RCX: dead000000000200
RDX: ffff903bb67026d0 RSI: ffff903bb6702580 RDI: ffff903bb67026d0
RBP: ffff903c6b783dd8 R08: ffff903bb67026d0 R09: ffffd97e80000000
R10: ffff903c658bac00 R11: 0000000000000000 R12: ffff903bb6702580
R13: ffff903fa9a292f0 R14: 0000000000000246 R15: 0000000000001057
FS:  00007f7026f5b740(0000) GS:ffff903c6b780000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f298877c004 CR3: 00000000caf36000 CR4: 00000000000607e0
Call Trace:
 <IRQ>
 [<ffffffff9cca68ff>] __scsi_queue_insert+0xbf/0x110
 [<ffffffff9cca79ca>] scsi_io_completion+0x5da/0x6a0
 [<ffffffff9cc9ca3c>] scsi_finish_command+0xdc/0x140
 [<ffffffff9cca6aa2>] scsi_softirq_done+0x132/0x160
 [<ffffffff9cb240c6>] blk_done_softirq+0x96/0xc0
 [<ffffffff9c89a905>] __do_softirq+0xf5/0x280
 [<ffffffff9cf2bd2c>] call_softirq+0x1c/0x30
 [<ffffffff9c82d625>] do_softirq+0x65/0xa0
 [<ffffffff9c89ac85>] irq_exit+0x105/0x110
 [<ffffffff9cf2d0a8>] smp_apic_timer_interrupt+0x48/0x60
 [<ffffffff9cf297f2>] apic_timer_interrupt+0x162/0x170
 <EOI>
 [<ffffffff9cca5f41>] ? scsi_done+0x21/0x60
 [<ffffffff9cb5ac18>] ? delay_tsc+0x38/0x60
 [<ffffffff9cb5ab5d>] __const_udelay+0x2d/0x30
 [<ffffffffc04effde>] _base_handshake_req_reply_wait+0x8e/0x4a0 [mpt3sas]
 [<ffffffffc04f0b13>] _base_get_ioc_facts+0x123/0x590 [mpt3sas]
 [<ffffffffc04f06c8>] ? _base_diag_reset+0x238/0x340 [mpt3sas]
 [<ffffffffc04f7993>] mpt3sas_base_hard_reset_handler+0x1f3/0x420 [mpt3sas]
 [<ffffffffc05132b9>] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas]
 [<ffffffffc068d585>] ? xfs_file_aio_write+0x155/0x1b0 [xfs]
 [<ffffffff9ca1a4e3>] ? do_sync_write+0x93/0xe0
 [<ffffffffc051337a>] _ctl_ioctl+0x1a/0x20 [mpt3sas]
 [<ffffffff9ca2fe90>] do_vfs_ioctl+0x350/0x560
 [<ffffffff9ca1dec1>] ? __sb_end_write+0x31/0x60
 [<ffffffff9ca30141>] SyS_ioctl+0xa1/0xc0
 [<ffffffff9cf28715>] ? system_call_after_swapgs+0xa2/0x146
 [<ffffffff9cf287d5>] system_call_fastpath+0x1c/0x21
 [<ffffffff9cf28721>] ? system_call_after_swapgs+0xae/0x146
Code: 83 c3 10 4c 89 e2 4c 89 ee e8 8d 21 04 00 48 8b 03 48 85 c0 75 e5 41 f6 44 24 4a 10 74 ad 4c 89 e6 4c 89 ef e8 b2 42 00 00 eb a0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
RIP  [<ffffffff9cb19ec0>] blk_requeue_request+0x90/0xa0
 RSP <ffff903c6b783dc0>

As part of the host reset operation, the driver flushes out all I/Os
outstanding at the driver level with a "DID_RESET" result. To find the
commands outstanding at the driver level, the driver loops over smid
values from one to the HBA queue depth and calls
mpt3sas_scsih_scsi_lookup_get() to get the scmd, as shown below:

    for (smid = 1; smid <= ioc->scsiio_depth; smid++) {
            scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
            if (!scmd)
                    continue;

However, mpt3sas_scsih_scsi_lookup_get() also returns some SCSI commands
that are not outstanding at the driver level. This is possibly because
requests are still being constructed at the block layer, since
QUEUE_FLAG_QUIESCED is not set. Using scsi_block_requests() and
scsi_unblock_requests() does not help, as they only block further I/O from
the SCSI layer and not from the block layer. These commands are then
flushed with the DID_RESET host byte, which results in the kernel BUG
above.

This issue was introduced by commit dbec4c9040 ("scsi: mpt3sas: lockless
command submission").

To fix this issue, mpt3sas_scsih_scsi_lookup_get() now checks whether the
command's smid is zero before it returns the scmd pointer to the caller.
(Whenever a SCSI command is being processed at the driver level, its smid
is non-zero; smid values always start from one.) If the smid is zero, the
function returns NULL, so the driver does not flush those SCSI commands
with the DID_RESET host byte and the issue is no longer observed.
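
The check described above can be illustrated with a small user-space
model. This is only a sketch, not the driver code: struct tracker,
struct cmd, struct ioc, lookup_get() and flush_running_cmds() are
hypothetical stand-ins for the driver's tracker, struct scsi_cmnd, the
adapter structure, mpt3sas_scsih_scsi_lookup_get() and the reset-time
flush loop, and the queue depth of 4 is an arbitrary assumption. Only the
DID_RESET value (0x08) and the host byte position (bits 16..23) are taken
from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCSIIO_DEPTH 4      /* hypothetical queue depth for this model */
#define DID_RESET    0x08   /* same value as the kernel's DID_RESET host byte */

/* Hypothetical stand-in for the driver's per-command tracker. */
struct tracker {
    uint16_t smid;          /* 0 means the driver does not own the command */
};

/* Hypothetical stand-in for struct scsi_cmnd. */
struct cmd {
    int allocated;          /* a request exists for this tag */
    int result;             /* host byte lives in bits 16..23 */
    struct tracker st;
};

/* Hypothetical stand-in for the adapter structure. */
struct ioc {
    uint16_t scsiio_depth;
    struct cmd cmds[SCSIIO_DEPTH];  /* slot (smid - 1) holds tag (smid - 1) */
};

/* Model of the fixed lookup: NULL unless the driver owns the command. */
static struct cmd *lookup_get(struct ioc *ioc, uint16_t smid)
{
    struct cmd *scmd;

    if (smid == 0 || smid > ioc->scsiio_depth)
        return NULL;
    scmd = &ioc->cmds[smid - 1];
    if (!scmd->allocated)
        return NULL;
    if (scmd->st.smid == 0) /* built above the driver, never issued to the HBA */
        return NULL;
    return scmd;
}

/* Model of the reset-time flush loop; returns how many commands it flushed. */
static int flush_running_cmds(struct ioc *ioc)
{
    int count = 0;
    uint16_t smid;

    assert(ioc->scsiio_depth <= SCSIIO_DEPTH);
    for (smid = 1; smid <= ioc->scsiio_depth; smid++) {
        struct cmd *scmd = lookup_get(ioc, smid);

        if (!scmd)
            continue;
        scmd->result = DID_RESET << 16; /* complete with DID_RESET */
        scmd->st.smid = 0;              /* release the tracker */
        count++;
    }
    return count;
}
```

In this model, a command whose tracker smid is zero is skipped by the
flush loop and keeps its original result, while a command with a non-zero
smid is completed with DID_RESET exactly once.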

[mkp: amended with updated fix from Sreekanth]

Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Fixes: dbec4c9040 ("scsi: mpt3sas: lockless command submission")
Cc: stable@vger.kernel.org # v4.16+
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-08 21:26:47 -04:00
..
accessibility
acpi pwm: Changes for v4.18-rc1 2018-06-14 16:25:43 +09:00
amba Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm 2018-06-06 13:49:25 -07:00
android treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
ata scsi: libsas: dynamically allocate and free ata host 2018-06-19 22:02:25 -04:00
atm treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
auxdisplay treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
base Additional power management updates for 4.18-rc1 2018-06-13 07:24:18 -07:00
bcma dma-mapping updates for 4.18: 2018-06-04 10:58:12 -07:00
block The main piece is a set of libceph changes that revamps how OSD 2018-06-15 07:24:58 +09:00
bluetooth Bluetooth: btusb: Add additional device ID for RTL8822BE 2018-05-30 15:45:01 +02:00
bus - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
cdrom treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
char docs: Fix some broken references 2018-06-15 18:10:01 -03:00
clk docs: Fix some broken references 2018-06-15 18:10:01 -03:00
clocksource treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
connector Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
cpufreq Additional power management updates for 4.18-rc1 2018-06-13 07:24:18 -07:00
cpuidle powerpc updates for 4.18 2018-06-07 10:23:33 -07:00
crypto treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
dax libnvdimm for 4.18 2018-06-08 17:21:52 -07:00
dca
devfreq treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
dio
dma fix a series of Documentation/ broken file name references 2018-06-15 18:10:01 -03:00
dma-buf
edac treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
eisa
extcon treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
firewire treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
firmware Merge branch 'akpm' (patches from Andrew) 2018-06-15 08:51:42 +09:00
fmc treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
fpga fpga: clarify that unregister functions also free 2018-05-25 18:23:56 +02:00
fsi
gpio treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
gpu Solve a series of broken links for files under Documentation: 2018-06-17 05:25:18 +09:00
hid docs: fix broken references with multiple hints 2018-06-15 18:10:01 -03:00
hsi
hv treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
hwmon treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
hwspinlock hwspinlock updates for v4.18 2018-06-11 12:09:19 -07:00
hwtracing treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
i2c Merge branch 'i2c/for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2018-06-14 16:21:46 +09:00
ide treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
idle
iio treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
infiniband scsi: target: srp, vscsi, sbp, qla: use target_remove_session 2018-08-02 15:29:31 -04:00
input docs: Fix some broken references 2018-06-15 18:10:01 -03:00
iommu - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
ipack treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
irqchip treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
isdn treewide: Use array_size() in vmalloc() 2018-06-12 16:19:22 -07:00
leds treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
lightnvm docs: Fix some broken references 2018-06-15 18:10:01 -03:00
macintosh powerpc updates for 4.18 2018-06-07 10:23:33 -07:00
mailbox treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
mcb
md docs: Fix some broken references 2018-06-15 18:10:01 -03:00
media Solve a series of broken links for files under Documentation: 2018-06-17 05:25:18 +09:00
memory - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
memstick treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
message scsi: message: fusion: Replace GFP_ATOMIC with GFP_KERNEL 2018-07-30 23:17:53 -04:00
mfd Merge branch 'i2c/for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2018-06-14 16:21:46 +09:00
misc Merge branch 'i2c/for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2018-06-14 16:21:46 +09:00
mmc treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
mtd - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
mux
net Solve a series of broken links for files under Documentation: 2018-06-17 05:25:18 +09:00
nfc treewide: devm_kmalloc() -> devm_kmalloc_array() 2018-06-12 16:19:22 -07:00
ntb - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
nubus Char/Misc driver patches for 4.18-rc1 2018-06-05 16:20:22 -07:00
nvdimm Merge branch 'for-4.18/mcsafe' into libnvdimm-for-next 2018-06-08 15:16:44 -07:00
nvme Merge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-linus 2018-06-15 08:11:05 -06:00
nvmem treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
of - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
opp treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
oprofile treewide: Use array_size() in vmalloc() 2018-06-12 16:19:22 -07:00
parisc dma-mapping updates for 4.18: 2018-06-04 10:58:12 -07:00
parport docs: Fix some broken references 2018-06-15 18:10:01 -03:00
pci - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
pcmcia treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
perf drivers/bus: arm-cci: fix build warnings 2018-05-29 16:38:16 +01:00
phy Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
pinctrl treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
platform fix a series of Documentation/ broken file name references 2018-06-15 18:10:01 -03:00
pnp media updates for v4.18-rc1 2018-06-07 12:34:37 -07:00
power treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
powercap treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
pps
ps3
ptp ptp_qoriq: move some definitions to header file 2018-05-28 23:05:11 -04:00
pwm pwm: Changes for v4.18-rc1 2018-06-14 16:25:43 +09:00
rapidio treewide: Use array_size() in vmalloc() 2018-06-12 16:19:22 -07:00
ras
regulator treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
remoteproc treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX 2018-06-15 07:55:25 +09:00
reset - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
rpmsg rpmsg: smd: do not use mananged resources for endpoints and channels 2018-06-04 12:35:03 -07:00
rtc - New Device Support 2018-06-11 07:20:17 -07:00
s390 treewide: Use array_size() in vzalloc() 2018-06-12 16:19:22 -07:00
sbus fix a series of Documentation/ broken file name references 2018-06-15 18:10:01 -03:00
scsi scsi: mpt3sas: Fix calltrace observed while running IO & reset 2018-08-08 21:26:47 -04:00
sfi
sh treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
siox
slimbus treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
sn
soc treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX 2018-06-15 07:55:25 +09:00
soundwire docs: Fix more broken references 2018-06-15 18:11:26 -03:00
spi treewide: devm_kzalloc() -> devm_kcalloc() 2018-06-12 16:19:22 -07:00
spmi
ssb
staging media: v4l: fix broken video4linux docs locations 2018-06-15 18:10:01 -03:00
target scsi: target: loop, usb, vhost, xen: use target_remove_session 2018-08-02 15:29:31 -04:00
tc
tee
thermal - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
thunderbolt
tty vfs/y2038: inode timestamps conversion to timespec64 2018-06-15 07:31:07 +09:00
uio treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
usb scsi: target: loop, usb, vhost, xen: use target_remove_session 2018-08-02 15:29:31 -04:00
uwb treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
vfio VFIO updates for v4.18 2018-06-12 13:11:26 -07:00
vhost scsi: target: loop, usb, vhost, xen: use target_remove_session 2018-08-02 15:29:31 -04:00
video Solve a series of broken links for files under Documentation: 2018-06-17 05:25:18 +09:00
virt treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
virtio virtio, vhost: features, fixes 2018-06-16 06:35:02 +09:00
visorbus
vlynq
vme
w1 Char/Misc driver patches for 4.18-rc1 2018-06-05 16:20:22 -07:00
watchdog MIPS changes for 4.18 2018-06-12 12:56:02 -07:00
xen scsi: target: loop, usb, vhost, xen: use target_remove_session 2018-08-02 15:29:31 -04:00
zorro - Introduce arithmetic overflow test helper functions (Rasmus) 2018-06-06 17:27:14 -07:00
Kconfig
Makefile