From 4e6c82b3614a18740ef63109d58743a359266daf Mon Sep 17 00:00:00 2001 From: James Bottomley Date: Mon, 7 Nov 2011 08:51:24 -0600 Subject: [PATCH 1/5] [SCSI] fix WARNING: at drivers/scsi/scsi_lib.c:1704 On Mon, 2011-11-07 at 17:24 +1100, Stephen Rothwell wrote: > Hi all, > > Starting some time last week I am getting the following during boot on > our PPC970 blade: > > calling .ipr_init+0x0/0x68 @ 1 > ipr: IBM Power RAID SCSI Device Driver version: 2.5.2 (April 27, 2011) > ipr 0000:01:01.0: Found IOA with IRQ: 26 > ipr 0000:01:01.0: Starting IOA initialization sequence. > ipr 0000:01:01.0: Adapter firmware version: 06160039 > ipr 0000:01:01.0: IOA initialized. > scsi0 : IBM 572E Storage Adapter > ------------[ cut here ]------------ > WARNING: at drivers/scsi/scsi_lib.c:1704 > Modules linked in: > NIP: c00000000053b3d4 LR: c00000000053e5b0 CTR: c000000000541d70 > REGS: c0000000783c2f60 TRAP: 0700 Not tainted (3.1.0-autokern1) > MSR: 8000000000029032 CR: 24002024 XER: 20000002 > TASK = c0000000783b8000[1] 'swapper' THREAD: c0000000783c0000 CPU: 0 > GPR00: 0000000000000001 c0000000783c31e0 c000000000cf38b0 c00000000239a9d0 > GPR04: c000000000cbe8f8 0000000000000000 c0000000783c3040 0000000000000000 > GPR08: c000000075daf488 c000000078a3b7ff c000000000bcacc8 0000000000000000 > GPR12: 0000000044002028 c000000007ffb000 0000000002e40000 000000000099b800 > GPR16: 0000000000000000 c000000000bba5fc c000000000a61db8 0000000000000000 > GPR20: 0000000001b77200 0000000000000000 c000000078990000 0000000000000001 > GPR24: c000000002396828 0000000000000000 0000000000000000 c000000078a3b938 > GPR28: fffffffffffffffa c0000000008ad2c0 c000000000c7faa8 c00000000239a9d0 > NIP [c00000000053b3d4] .scsi_free_queue+0x24/0x90 > LR [c00000000053e5b0] .scsi_alloc_sdev+0x280/0x2e0 > Call Trace: > [c0000000783c31e0] [c000000000c7faa8] wireless_seq_fops+0x278d0/0x2eb88 (unreliable) > [c0000000783c3270] [c00000000053e5b0] .scsi_alloc_sdev+0x280/0x2e0 > [c0000000783c3330] [c00000000053eba0] .scsi_probe_and_add_lun+0x390/0xb40 > [c0000000783c34a0] [c00000000053f7ec] .__scsi_scan_target+0x16c/0x650 > [c0000000783c35f0] [c00000000053fd90] .scsi_scan_channel+0xc0/0x100 > [c0000000783c36a0] [c00000000053fefc] .scsi_scan_host_selected+0x12c/0x1c0 > [c0000000783c3750] [c00000000083dcb4] .ipr_probe+0x2c0/0x390 > [c0000000783c3830] [c0000000003f50b4] .local_pci_probe+0x34/0x50 > [c0000000783c38a0] [c0000000003f5f78] .pci_device_probe+0x148/0x150 > [c0000000783c3950] [c0000000004e1e8c] .driver_probe_device+0xdc/0x210 > [c0000000783c39f0] [c0000000004e20cc] .__driver_attach+0x10c/0x110 > [c0000000783c3a80] [c0000000004e1228] .bus_for_each_dev+0x98/0xf0 > [c0000000783c3b30] [c0000000004e1bf8] .driver_attach+0x28/0x40 > [c0000000783c3bb0] [c0000000004e07d8] .bus_add_driver+0x218/0x340 > [c0000000783c3c60] [c0000000004e2a2c] .driver_register+0x9c/0x1b0 > [c0000000783c3d00] [c0000000003f62d4] .__pci_register_driver+0x64/0x140 > [c0000000783c3da0] [c000000000b99f88] .ipr_init+0x4c/0x68 > [c0000000783c3e20] [c00000000000ad24] .do_one_initcall+0x1a4/0x1e0 > [c0000000783c3ee0] [c000000000b512d0] .kernel_init+0x14c/0x1fc > [c0000000783c3f90] [c000000000022468] .kernel_thread+0x54/0x70 > Instruction dump: > ebe1fff8 7c0803a6 4e800020 7c0802a6 fba1ffe8 fbe1fff8 7c7f1b78 f8010010 > f821ff71 e8030398 3120ffff 7c090110 <0b000000> e86303b0 482de065 60000000 > ---[ end trace 759bed76a85e8dec ]--- > scsi 0:0:1:0: Direct-Access IBM-ESXS MAY2036RC T106 PQ: 0 ANSI: 5 > ------------[ cut here ]------------ > > I get lots more of these. The obvious commit to point the finger at > is 3308511c93e6 ("[SCSI] Make scsi_free_queue() kill pending SCSI > commands") but the root cause may be something different. Caused by commit f7c9c6bb14f3104608a3a83cadea10a6943d2804 Author: Anton Blanchard Date: Thu Nov 3 08:56:22 2011 +1100 [SCSI] Fix block queue and elevator memory leak in scsi_alloc_sdev Doesn't completely do the teardown. The true fix is to do a proper teardown instead of hand rolling it Reported-by: Stephen Rothwell Tested-by: Stephen Rothwell Cc: stable@kernel.org #2.6.38+ Signed-off-by: James Bottomley --- drivers/scsi/scsi_scan.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 72273a0e5666..b3c6d957fbd8 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -319,11 +319,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget, return sdev; out_device_destroy: - scsi_device_set_state(sdev, SDEV_DEL); - transport_destroy_device(&sdev->sdev_gendev); - put_device(&sdev->sdev_dev); - scsi_free_queue(sdev->request_queue); - put_device(&sdev->sdev_gendev); + __scsi_remove_device(sdev); out: if (display_failure_msg) printk(ALLOC_FAILURE_MSG, __func__); From 745718132c3c7cac98a622b610e239dcd5217f71 Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Wed, 9 Nov 2011 08:39:24 +0100 Subject: [PATCH 2/5] [SCSI] Silencing 'killing requests for dead queue' When we tear down a device we try to flush all outstanding commands in scsi_free_queue(). However the check in scsi_request_fn() is imperfect as it only signals that we _might start_ aborting commands, not that we've actually aborted some. So move the printk inside the scsi_kill_request function, this will also give us a hint about which commands are aborted. Signed-off-by: Hannes Reinecke Signed-off-by: James Bottomley --- drivers/scsi/scsi_lib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 06bc26554a67..f85cfa6c47b5 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1409,6 +1409,8 @@ static void scsi_kill_request(struct request *req, struct request_queue *q) blk_start_request(req); + scmd_printk(KERN_INFO, cmd, "killing request\n"); + sdev = cmd->device; starget = scsi_target(sdev); shost = sdev->host; @@ -1490,7 +1492,6 @@ static void scsi_request_fn(struct request_queue *q) struct request *req; if (!sdev) { - printk("scsi: killing requests for dead queue\n"); while ((req = blk_peek_request(q)) != NULL) scsi_kill_request(req, q); return; From 181a9d795ab763b03b15f73402691abde98d1803 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 4 Nov 2011 21:25:01 +0300 Subject: [PATCH 3/5] [SCSI] mpt2sas: add missing allocation. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There was supposed to be a kzalloc() here and the compiler complained about it. mpt2sas_scsih.c: In function ‘mpt2sas_scsih_reset_handler’: mpt2sas_scsih.c:2807:21: warning: ‘fw_event’ may be used uninitialized in this function [-Wuninitialized] Signed-off-by: Dan Carpenter Acked-by: "Nandigama, Nagalakshmi" Signed-off-by: James Bottomley --- drivers/scsi/mpt2sas/mpt2sas_scsih.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c index 8889b1babcac..4e041f6d808c 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c @@ -2802,6 +2802,11 @@ _scsih_error_recovery_delete_devices(struct MPT2SAS_ADAPTER *ioc) if (ioc->is_driver_loading) return; + + fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC); + if (!fw_event) + return; + fw_event->event = MPT2SAS_REMOVE_UNRESPONDING_DEVICES; fw_event->ioc = ioc; _scsih_fw_event_add(ioc, fw_event); From cf16123c9c8e346ed1dd171295a678d77648d7f8 Mon Sep 17 00:00:00 2001 From: Vasily Averin Date: Fri, 11 Nov 2011 13:42:16 +0400 Subject: [PATCH 4/5] [SCSI] aacraid: controller hangs if kernel uses non-default ASPM policy Aacraid controller can hang on some nodes if kernel uses non-default (powersave) ASPM policy. Controller hangs shortly after successful load and hardware detection. Scsi error handler detects this hang and tries to restart hardware but it does not help. Initially it was noticed on RHEL6-based openVZ kernel after backporting aacraid driver from mainline (RHEL6 kernel with original driver works well) http://bugzilla.openvz.org/show_bug.cgi?id=2043 This issue happens because default ASPM policy was changed in Red Hat kernels. Therefore guys from Red Hat have noticed this problem long time ago: on Fedora 12 https://bugzilla.redhat.com/show_bug.cgi?id=540478 on Fedora 14 https://bugzilla.redhat.com/show_bug.cgi?id=679385 In RHEL6 kernel this issue was fixed, ASPM was disabled in aacraid driver. In kernel changelog I've found that seems it was done by Matthew Garrett: - [scsi] aacraid: Disable ASPM by default (Matthew Garrett) [599735] However seems this patch was not submitted to mainline. I've reproduced this issue on vanilla 3.1.0 kernel booted with "pcie_aspm.policy=powersave" option, So I believe it makes sense to do it now. Signed-off-by: Vasily Averin [mjg: Checking the Windows drivers indicates that they disable ASPM under all circumstances, so:] Acked-by: Matthew Garrett Acked-by: Achim Leubner Cc: stable@kernel.org Signed-off-by: James Bottomley --- drivers/scsi/aacraid/linit.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c index 4aa76d6f11df..705e13e470af 100644 --- a/drivers/scsi/aacraid/linit.c +++ b/drivers/scsi/aacraid/linit.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -1109,6 +1110,9 @@ static int __devinit aac_probe_one(struct pci_dev *pdev, unique_id++; } + pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1 | + PCIE_LINK_STATE_CLKPM); + error = pci_enable_device(pdev); if (error) goto out; From e5a44df85e8d78e5c2d3d2e4f59b460905691e2f Mon Sep 17 00:00:00 2001 From: Matthew Garrett Date: Fri, 11 Nov 2011 11:14:23 -0500 Subject: [PATCH 5/5] [SCSI] hpsa: Disable ASPM The Windows driver .inf disables ASPM on hpsa devices. Do the same because the selection of a non default ASPM policy can cause the device to hang. Signed-off-by: Matthew Garrett Cc: stable@kernel.org Acked-by: Mike Miller Signed-off-by: James Bottomley --- drivers/scsi/hpsa.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c index e76107b2ade3..865d452542be 100644 --- a/drivers/scsi/hpsa.c +++ b/drivers/scsi/hpsa.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -3922,6 +3923,10 @@ static int __devinit hpsa_pci_init(struct ctlr_info *h) dev_warn(&h->pdev->dev, "controller appears to be disabled\n"); return -ENODEV; } + + pci_disable_link_state(h->pdev, PCIE_LINK_STATE_L0S | + PCIE_LINK_STATE_L1 | PCIE_LINK_STATE_CLKPM); + err = pci_enable_device(h->pdev); if (err) { dev_warn(&h->pdev->dev, "unable to enable PCI device\n");