Commit Graph

302 Commits

Author SHA1 Message Date
Luo Jiaxing f4df167ad5 scsi: hisi_sas: Print SATA device SAS address for soft reset failure
Add (pseudo) SAS address for ATA software reset failure log to assist in
debugging.

Link: https://lore.kernel.org/r/1617709711-195853-7-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-12 23:21:26 -04:00
Jianqin Xie 2c74cb1f92 scsi: hisi_sas: Directly snapshot registers when executing a reset
The debugfs snapshot should be executed before the reset occurs to ensure
that the register contents are saved properly.

As such, it is incorrect to queue the debugfs dump when running a reset as
the reset will occur prior to the snapshot work item is handler.

Therefore, directly snapshot registers in the reset work handler.

Link: https://lore.kernel.org/r/1617709711-195853-5-git-send-email-john.garry@huawei.com
Signed-off-by: Jianqin Xie <xiejianqin@hisilicon.com>
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-12 23:21:26 -04:00
Xiang Chen f467666504 scsi: hisi_sas: Call sas_unregister_ha() to roll back if .hw_init() fails
Function sas_unregister_ha() needs to be called to roll back if
hisi_hba->hw->hw_init() fails in function hisi_sas_probe() or
hisi_sas_v3_probe(). Make that change.

Link: https://lore.kernel.org/r/1617709711-195853-4-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-12 23:21:26 -04:00
Luo Jiaxing 1dbe61bf7d scsi: hisi_sas: Enable debugfs support by default
Add a config option to enable debugfs support by default. And if debugfs
support is enabled by default, dump count default value is increased to 50
as generally users want something bigger than the current default in that
situation.

Link: https://lore.kernel.org/r/1611659068-131975-4-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-26 23:02:11 -05:00
John Garry 69bfa5fd7b scsi: hisi_sas: Don't check .nr_hw_queues in hisi_sas_task_prep()
Now that v2 and v3 hw expose their HW queues (and so shost.nr_hw_queues is
set), remove the conditional checks in hisi_sas_task_prep().

This change would affect v1 HW performance (as it does not expose HW
queues), but nobody uses it and support may be dropped soon.

Link: https://lore.kernel.org/r/1611659068-131975-3-git-send-email-john.garry@huawei.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-26 23:02:11 -05:00
Ahmed S. Darwish 872a90b5b4 scsi: hisi_sas: Switch back to original libsas event notifiers
libsas event notifiers required an extension where gfp_t flags must be
explicitly passed. For bisectability, a temporary _gfp() variant of such
functions were added. All call sites then got converted use the _gfp()
variants and explicitly pass GFP context. Having no callers left, the
original libsas notifiers were then modified to accept gfp_t flags by
default.

Switch back to the original libas API, while still passing GFP context.
The libsas _gfp() variants will be removed afterwards.

Link: https://lore.kernel.org/r/20210118100955.1761652-14-a.darwish@linutronix.de
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-22 21:31:09 -05:00
Ahmed S. Darwish 26c7efc3f9 scsi: hisi_sas: Pass gfp_t flags to libsas event notifiers
Use the new libsas event notifiers API, which requires callers to
explicitly pass the gfp_t memory allocation flags.

Below are the context analysis for modified functions:

=> hisi_sas_bytes_dmaed():

Since it is invoked from both process and atomic contexts, let its callers
pass the gfp_t flags:

  * hisi_sas_main.c:
  ------------------

    hisi_sas_phyup_work(): workqueue context
      -> hisi_sas_bytes_dmaed(..., GFP_KERNEL)

    hisi_sas_controller_reset_done(): has an msleep()
      -> hisi_sas_rescan_topology()
        -> hisi_sas_phy_down()
          -> hisi_sas_bytes_dmaed(..., GFP_KERNEL)

    hisi_sas_debug_I_T_nexus_reset(): calls wait_for_completion_timeout()
      -> hisi_sas_phy_down()
        -> hisi_sas_bytes_dmaed(..., GFP_KERNEL)

  * hisi_sas_v1_hw.c:
  -------------------

    int_abnormal_v1_hw(): irq handler
      -> hisi_sas_phy_down()
        -> hisi_sas_bytes_dmaed(..., GFP_ATOMIC)

  * hisi_sas_v[23]_hw.c:
  ----------------------

    int_phy_updown_v[23]_hw(): irq handler
      -> phy_down_v[23]_hw()
        -> hisi_sas_phy_down()
          -> hisi_sas_bytes_dmaed(..., GFP_ATOMIC)

=> int_bcast_v1_hw() and phy_bcast_v3_hw():

Both are invoked exclusively from irq handlers. Pass GFP_ATOMIC.

Link: https://lore.kernel.org/r/20210118100955.1761652-12-a.darwish@linutronix.de
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-22 21:31:08 -05:00
John Garry 121181f3f8 scsi: libsas: Remove notifier indirection
LLDDs report events to libsas with .notify_port_event and .notify_phy_event
callbacks.

These callbacks are fixed and so there is no reason why the functions
cannot be called directly, so do that.

This neatens the code slightly, makes it more obvious, and reduces function
pointer usage, which is generally a good thing. Downside is that there are
2x more symbol exports.

[a.darwish@linutronix.de: Remove the now unused "sas_ha" local variables]

Link: https://lore.kernel.org/r/20210118100955.1761652-3-a.darwish@linutronix.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-22 21:31:07 -05:00
Martin K. Petersen a8f808839a Merge branch '5.11/scsi-postmerge' into 5.11/scsi-fixes
Merge two commits that had dependencies on other 5.11 trees (the block
and the irq trees respectively).

 - We reverted a megaraid_sas change in 5.10 due to missing block
   layer plumbing. Now that this is in place, reinstate the change.

 - The hisi_sas driver had a dependency on a driver core irq change
   that went in through Thomas' tree.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-04 13:27:39 -05:00
John Garry 74a2921948 scsi: hisi_sas: Expose HW queues for v2 hw
As a performance enhancement, make the completion queue interrupts managed.

In addition, in commit bf0beec060 ("blk-mq: drain I/O when all CPUs in a
hctx are offline"), CPU hotplug for MQ devices using managed interrupts is
made safe. So expose HW queues to blk-mq to take advantage of this.

Flag Scsi_host.host_tagset is also set to ensure that the HBA is not sent
more commands than it can handle. However the driver still does not use
request tag for IPTT as there are many HW bugs means that special rules
apply for IPTT allocation.

Link: https://lore.kernel.org/r/1606905417-183214-6-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-21 22:21:05 -05:00
Linus Torvalds 60f7c503d9 SCSI misc on 20201216
This series consists of the usual driver updates (ufs, qla2xxx,
 smartpqi, target, zfcp, fnic, mpt3sas, ibmvfc) plus a load of
 cleanups, a major power management rework and a load of assorted minor
 updates.  There are a few core updates (formatting fixes being the big
 one) but nothing major this cycle.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCX9o0KSYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishbOZAP9D5NTN
 J7dJUo2MIMy84YBu+d9ag7yLlNiRWVY2yw5vHwD/Z7JjAVLwz/tzmyjU9//o2J6w
 hwhOv6Uto89gLCWSEz8=
 =KUPT
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This consists of the usual driver updates (ufs, qla2xxx, smartpqi,
  target, zfcp, fnic, mpt3sas, ibmvfc) plus a load of cleanups, a major
  power management rework and a load of assorted minor updates.

  There are a few core updates (formatting fixes being the big one) but
  nothing major this cycle"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits)
  scsi: mpt3sas: Update driver version to 36.100.00.00
  scsi: mpt3sas: Handle trigger page after firmware update
  scsi: mpt3sas: Add persistent MPI trigger page
  scsi: mpt3sas: Add persistent SCSI sense trigger page
  scsi: mpt3sas: Add persistent Event trigger page
  scsi: mpt3sas: Add persistent Master trigger page
  scsi: mpt3sas: Add persistent trigger pages support
  scsi: mpt3sas: Sync time periodically between driver and firmware
  scsi: qla2xxx: Update version to 10.02.00.104-k
  scsi: qla2xxx: Fix device loss on 4G and older HBAs
  scsi: qla2xxx: If fcport is undergoing deletion complete I/O with retry
  scsi: qla2xxx: Fix the call trace for flush workqueue
  scsi: qla2xxx: Fix flash update in 28XX adapters on big endian machines
  scsi: qla2xxx: Handle aborts correctly for port undergoing deletion
  scsi: qla2xxx: Fix N2N and NVMe connect retry failure
  scsi: qla2xxx: Fix FW initialization error on big endian machines
  scsi: qla2xxx: Fix crash during driver load on big endian machines
  scsi: qla2xxx: Fix compilation issue in PPC systems
  scsi: qla2xxx: Don't check for fw_started while posting NVMe command
  scsi: qla2xxx: Tear down session if FW say it is down
  ...
2020-12-16 13:34:31 -08:00
Xiang Chen 359db63378 scsi: hisi_sas: Select a suitable queue for internal I/Os
For when managed interrupts are used (and shost->nr_hw_queues is set), a
fixed queue - set per-device - is still used for internal I/Os.

If all the CPUs mapped to that queue are offlined, then the completions for
that queue are not serviced and any internal I/Os will time out.

Fix by selecting a queue for internal I/Os from the queue mapped from the
current CPU in this scenario.

This is still not ideal as it does not deal with CPU hotplug for inflight
internal I/Os, and needs proper support from [0].

[0] https://lore.kernel.org/linux-scsi/20200703130122.111448-1-hare@suse.de/T/#m7d77d049b18f33a24ef206af69ebb66d07440556

Link: https://lore.kernel.org/r/1607347855-59091-1-git-send-email-john.garry@huawei.com
Fixes: 8d98416a55 ("scsi: hisi_sas: Switch v3 hw to MQ")
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-07 21:23:51 -05:00
Ahmed S. Darwish 18577cdcae scsi: hisi_sas: Remove preemptible()
hisi_sas_task_exec() uses preemptible() to see if it's safe to block.  This
does not work for CONFIG_PREEMPT_COUNT=n kernels in which preemptible()
always returns 0.

The problem is masked when enabling some of the common Kconfig.debug
options (like CONFIG_DEBUG_ATOMIC_SLEEP), as they implicitly enable the
preemption counter.

In general, driver leaf functions should not make logic decisions based on
the context they're called from. The caller should be the entity
responsible for explicitly indicating context.

Since hisi_sas_task_exec() already has a gfp_t flags parameter, use it as
the explicit context marker.

Link: https://lore.kernel.org/r/20201126132952.2287996-3-bigeasy@linutronix.de
Fixes: 214e702d4b ("scsi: hisi_sas: Adjust task reject period during host reset")
Fixes: 550c0d89d5 ("scsi: hisi_sas: Replace in_softirq() check in hisi_sas_task_exec()")
Cc: Xiaofei Tan <tanxiaofei@huawei.com>
Cc: Xiang Chen <chenxiang66@hisilicon.com>
Cc: John Garry <john.garry@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-01 00:03:52 -05:00
Luo Jiaxing 623a4b6d5c scsi: hisi_sas: Move debugfs code to v3 hw driver
Relocate all the debugfs code for DFX to v3 hw since no other versions
support it.

Link: https://lore.kernel.org/r/1606207594-196362-4-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-30 23:42:16 -05:00
John Garry fab09aaee8 scsi: hisi_sas: Stop using queue #0 always for v2 hw
In commit 8d98416a55 ("scsi: hisi_sas: Switch v3 hw to MQ"), the dispatch
function was changed to choose the delivery queue based on the request tag
HW queue index.

This heavily degrades performance for v2 hw, since the HW queues are not
exposed there, and, as such, HW queue #0 is used for every command.

Revert to previous behaviour for when nr_hw_queues is not set, that being
to choose the HW queue based on target device index.

Link: https://lore.kernel.org/r/1602750425-240341-1-git-send-email-john.garry@huawei.com
Fixes: 8d98416a55 ("scsi: hisi_sas: Switch v3 hw to MQ")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-10-26 17:34:27 -04:00
Linus Torvalds 55e0500eb5 SCSI misc on 20201013
This series consists of the usual driver updates (ufs, qla2xxx, tcmu,
 ibmvfc, lpfc, smartpqi, hisi_sas, qedi, qedf, mpt3sas) and minor bug
 fixes.  There are only three core changes: adding sense codes,
 cleaning up noretry and adding an option for limitless retries.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCX4YulyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishaZDAQCT7rwG
 UEZYHgYkU9EX9ERVBQM0SW4mLrxf3g3P5ioJsAEAtkclCM4QsIOP+MIPjIa0EyUY
 khu0kcrmeFR2YwA8zhw=
 =4w4S
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "The usual driver updates (ufs, qla2xxx, tcmu, ibmvfc, lpfc, smartpqi,
  hisi_sas, qedi, qedf, mpt3sas) and minor bug fixes.

  There are only three core changes: adding sense codes, cleaning up
  noretry and adding an option for limitless retries"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (226 commits)
  scsi: hisi_sas: Recover PHY state according to the status before reset
  scsi: hisi_sas: Filter out new PHY up events during suspend
  scsi: hisi_sas: Add device link between SCSI devices and hisi_hba
  scsi: hisi_sas: Add check for methods _PS0 and _PR0
  scsi: hisi_sas: Add controller runtime PM support for v3 hw
  scsi: hisi_sas: Switch to new framework to support suspend and resume
  scsi: hisi_sas: Use hisi_hba->cq_nvecs for calling calling synchronize_irq()
  scsi: qedf: Remove redundant assignment to variable 'rc'
  scsi: lpfc: Remove unneeded variable 'status' in lpfc_fcp_cpu_map_store()
  scsi: snic: Convert to use DEFINE_SEQ_ATTRIBUTE macro
  scsi: qla4xxx: Delete unneeded variable 'status' in qla4xxx_process_ddb_changed
  scsi: sun_esp: Use module_platform_driver to simplify the code
  scsi: sun3x_esp: Use module_platform_driver to simplify the code
  scsi: sni_53c710: Use module_platform_driver to simplify the code
  scsi: qlogicpti: Use module_platform_driver to simplify the code
  scsi: mac_esp: Use module_platform_driver to simplify the code
  scsi: jazz_esp: Use module_platform_driver to simplify the code
  scsi: mvumi: Fix error return in mvumi_io_attach()
  scsi: lpfc: Drop nodelist reference on error in lpfc_gen_req()
  scsi: be2iscsi: Fix a theoretical leak in beiscsi_create_eqs()
  ...
2020-10-14 15:15:35 -07:00
Xiang Chen 69f4ec1edb scsi: hisi_sas: Recover PHY state according to the status before reset
Currently the PHY state is set according to the state of the PHYs after
reset. This is invalid as the PHYs are already re-initialized.

Set PHY state according to the state before the reset instead of after.

Link: https://lore.kernel.org/r/1601649038-25534-8-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-10-06 20:47:06 -04:00
Xiang Chen b14a37e011 scsi: hisi_sas: Filter out new PHY up events during suspend
Currently sas_resume_ha() is called while resuming the controller to wait
for all suspended PHYs to come up and all the libsas events to be
completed.

There is a scenario which will cause task hung: For direct attach with two
disks connected with two PHYs, disable phy0 before suspending the disk on
phy1 and the controller, then enable phy0 and resume the controller, and
task hung occurs as follows:

[  591.901463] hisi_sas_v3_hw 0000:b4:02.0: resuming from operating state [D0]
[  593.113525] hisi_sas_v3_hw 0000:b4:02.0: neither _PS0 nor _PR0 is defined
[  593.120301] hisi_sas_v3_hw 0000:b4:02.0: waiting up to 25 seconds for 1 phy to resume
[  593.120836] hisi_sas_v3_hw 0000:b4:02.0: phyup: phy0 link_rate=10(sata)
[  593.134680] hisi_sas_v3_hw 0000:b4:02.0: phyup: phy1 link_rate=10(sata)
[  593.134733] sas: phy-2:0 added to port-2:0, phy_mask:0x1 (5000000000000200)
[  593.148350] sas: DOING DISCOVERY on port 0, pid:948
[  593.153227] hisi_sas_v3_hw 0000:b4:02.0: dev[3:5] found
[  593.159840] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[  593.165663] sas: ata7: end_device-2:0: dev error handler
[  593.165730] sas: ata2: end_device-2:1: dev error handler
[  593.172532] hisi_sas_v3_hw 0000:b4:02.0: phydown: phy0 phy_state=0x2
[  593.182570] hisi_sas_v3_hw 0000:b4:02.0: ignore flutter phy0 down
[  593.331277] hisi_sas_v3_hw 0000:b4:02.0: phyup: phy0 link_rate=10(sata)
[  593.498956] ata7.00: ATA-11: SAMSUNG MZ7LH960HAJR-00005, HXT7404Q, max UDMA/133
[  593.506235] ata7.00: 1875385008 sectors, multi 16: LBA48 NCQ (depth 32)
[  593.514295] ata7.00: configured for UDMA/133
[  593.518557] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[  593.528613] sas: ata7: end_device-2:0: model:SAMSUNG MZ7LH960HAJR-00005
serial:S45NNA0M712225
[  593.537520] device_link_add 316: dev=2:0:2:0 supplier:2 consumer:0
[  593.543674] device_link_add 324
[  593.546801] device_link_add 352
[  593.549930] device_link_add 406
[  593.553058] device_link_add 440: dev=2:0:2:0 supplier:2 consumer:0
[  593.559208] device_link_add 444
[  593.562335] device_link_add 455
[  593.565517] scsi 2:0:2:0: Direct-Access     ATA      SAMSUNG MZ7LH960 404Q PQ: 0
ANSI: 5
[  620.057464]  phy-2:1: resume timeout
[  738.841445] INFO: task kworker/u256:0:8 blocked for more than 120 seconds.
[  738.848295]       Not tainted 5.8.0-rc1-76154-g0d52b59-dirty #744
[  738.854361] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  738.862155] kworker/u256:0  D    0     8      2 0x00000028
[  738.867626] Workqueue: 0000:b4:02.0_event_q sas_port_event_worker
[  738.873693] Call trace:
[  738.876133]  __switch_to+0xf4/0x148
[  738.879613]  __schedule+0x270/0x5d8
[  738.883091]  schedule+0x78/0x110
[  738.886307]  schedule_timeout+0x1ac/0x280
[  738.890299]  wait_for_completion+0x94/0x138
[  738.894472]  flush_workqueue+0x114/0x438
[  738.898377]  sas_porte_bytes_dmaed+0x400/0x500
[  738.902801]  sas_port_event_worker+0x28/0x40
[  738.907053]  process_one_work+0x1e8/0x360
[  738.911046]  worker_thread+0x44/0x478
[  738.914698]  kthread+0x150/0x158
[  738.917915]  ret_from_fork+0x10/0x1c
[  738.921534] INFO: task kworker/u256:1:948 blocked for more than 120 seconds.
[  738.928550]       Not tainted 5.8.0-rc1-76154-g0d52b59-dirty #744
[  738.934614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  738.942408] kworker/u256:1  D    0   948      2 0x00000028
[  738.947873] Workqueue: 0000:b4:02.0_disco_q sas_discover_domain
[  738.953766] Call trace:
[  738.956203]  __switch_to+0xf4/0x148
[  738.959678]  __schedule+0x270/0x5d8
[  738.963152]  schedule+0x78/0x110
[  738.966368]  rpm_resume+0xcc/0x550
[  738.969757]  __pm_runtime_resume+0x3c/0x88
[  738.973836]  rpm_get_suppliers+0x50/0x148
[  738.977829]  __pm_runtime_set_status+0x124/0x2f0
[  738.982427]  scsi_sysfs_add_sdev+0x1a0/0x2a8
[  738.986679]  scsi_probe_and_add_lun+0x888/0xab0
[  738.991190]  __scsi_scan_target+0xec/0x520
[  738.995268]  scsi_scan_target+0x11c/0x128
[  738.999261]  sas_rphy_add+0x15c/0x1e8
[  739.002907]  sas_probe_devices+0xe4/0x150
[  739.006899]  sas_discover_domain+0x33c/0x588
[  739.011150]  process_one_work+0x1e8/0x360
[  739.015143]  worker_thread+0x44/0x478
[  739.018789]  kthread+0x150/0x158
[  739.022003]  ret_from_fork+0x10/0x1c
...

If an extra phy0 up happens during resume of the SAS controller, it will
emit a new libsas event (event PORTE_BYTES_DMAED and event
DISCE_DISCOVER_DOMAIN). We will call function scsi_sysfs_add_sdev() in
event DISCE_DISCOVER_DOMAIN, which will call __pm_runtime_set_status() to
resume supplier (host controller). For runtime PM core, if device is in the
resuming state, the later resume request of the device will wait for
previous resume request to complete synchronously. At that point in time
the state of the controller is still resuming as it waits for all libsas
events to be completed, while libsas event DISCE_DISCOVER_DOMAIN is blocked
as the state of the controller is resuming which causes a deadlock.

To avoid the issue, filter out new PHY up events while the controller is
suspended.

Link: https://lore.kernel.org/r/1601649038-25534-7-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-10-06 20:47:06 -04:00
John Garry 8d98416a55 scsi: hisi_sas: Switch v3 hw to MQ
Now that the block layer provides a shared tag, we can switch the driver
to expose all HW queues.

Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06 08:33:44 -06:00
Luo Jiaxing 26f84f9bc3 scsi: hisi_sas: Code style cleanup
Remove extra blank lines and add spaces around operators.

Link: https://lore.kernel.org/r/1598958790-232272-9-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-09-02 22:49:08 -04:00
Xiang Chen b601577df6 scsi: hisi_sas: Add missing newlines
Newline is missing from some printk() statements. Add them.

Link: https://lore.kernel.org/r/1598958790-232272-8-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-09-02 22:49:08 -04:00
Luo Jiaxing 981cc23e74 scsi: hisi_sas: Add BIST support for fixed code pattern
Through the new debugfs interface the user can select fixed code
patterns. Add two new interfaces fixed_code and fixed_code1.

Link: https://lore.kernel.org/r/1598958790-232272-7-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-09-02 22:49:08 -04:00
Luo Jiaxing 2c4d582322 scsi: hisi_sas: Add BIST support for phy FFE
Add BIST support for phy FFE (Feed forward equalizer) setting. The user can
configure FFE through the new debugfs interface.

FFE is a parameter used for link layer control. It will affect the link
quality between the SAS controller and the backplane. In the BIST test, the
FFE interface is provided to assist board testers in optimizing link
parameters.

The modification of the FFE parameter will affect the test after BIST or
the normal running of the board. The user should save the initial FFE
values and restore them after BIST test is complete.

Link: https://lore.kernel.org/r/1598958790-232272-6-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-09-02 22:49:08 -04:00
Xiang Chen 847e835529 scsi: hisi_sas: Avoid accessing to SSP task for SMP I/Os
hisi_sas_slot_task_free() attempts to dereference SSP task for non-ATA
tasks. If the task is SMP, the code may reference the wrong structure
although this may not cause any problems.

To avoid this, only access to SSP task when slot->n_elem_dif is not 0 which
indicates this is an SSP task.

Link: https://lore.kernel.org/r/1598958790-232272-2-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-09-02 22:49:07 -04:00
Gustavo A. R. Silva df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Luo Jiaxing e16b9ed61e scsi: hisi_sas: Do not reset phy timer to wait for stray phy up
We found out that after phy up, the hardware reports another oob interrupt
but did not follow a phy up interrupt:

oob ready -> phy up -> DEV found -> oob read -> wait phy up -> timeout

We run link reset when wait phy up timeout, and it send a normal disk into
reset processing. So we made some circumvention action in the code, so that
this abnormal oob interrupt will not start the timer to wait for phy up.

Link: https://lore.kernel.org/r/1589552025-165012-2-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-19 21:23:57 -04:00
John Garry 11e673206f scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask
In future we will want to use hisi_sas_cq.pci_irq_mask for non-pci
interrupt masks, so rename to be more general.

Link: https://lore.kernel.org/r/1579522957-4393-7-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-01-20 19:31:14 -05:00
Luo Jiaxing 3cd2f3c35d scsi: hisi_sas: Modify the file permissions of trigger_dump to write only
The trigger_dump file is only used to manually trigger the dump, and did
not provide a read callback function for it, so its file permission
setting to 600 is wrong,and should be changed to 200.

Link: https://lore.kernel.org/r/1579522957-4393-5-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-01-20 19:31:14 -05:00
Xiang Chen e9dc5e11c9 scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock
After changing tasklet to workqueue or threaded irq, some critical
resources are only used on threads (not in interrupt or bottom half of
interrupt), so replace spin_lock_irqsave/spin_unlock_restore with
spin_lock/spin_unlock to protect those critical resources.

Link: https://lore.kernel.org/r/1579522957-4393-3-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-01-20 19:31:13 -05:00
Xiang Chen 81f338e970 scsi: hisi_sas: use threaded irq to process CQ interrupts
Currently IRQ_EFFECTIVE_AFF_MASK is enabled for ARM_GIC and ARM_GIC3, so it
only allows a single target CPU in the affinity mask to process interrupts
and also interrupt thread, and the performance of using threaded irq is
almost the same as tasklet. But if the config is not enabled, the interrupt
thread will be allowed all the CPUs in the affinity mask. At that situation
it improves the performance (about 20%).

Note: IRQ_EFFECTIVE_AFF_MASK is configured differently for different
architecture chip, and it seems to be better to make it be configured
easily.

Link: https://lore.kernel.org/r/1579522957-4393-2-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-01-20 19:31:13 -05:00
John Garry 964231aa0c scsi: hisi_sas: Stop converting a bool into a bool
The !! operator on a bool is pointless, so remove an example in
hisi_sas_rescan_topology().

Link: https://lore.kernel.org/r/1573551059-107873-5-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-12 22:21:34 -05:00
Xiang Chen 8c39673d54 scsi: hisi_sas: Check sas_port before using it
Need to check the structure sas_port before using it.

Link: https://lore.kernel.org/r/1573551059-107873-2-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-12 22:21:33 -05:00
Luo Jiaxing f873b66119 scsi: hisi_sas: Record the phy down event in debugfs
The number of phy down reflects the quality of the link between SAS
controller and disk. In order to allow the user to confirm the link quality
of the system, we record the number of phy down for each phy.

The user can check the current phy down count by reading the debugfs file
corresponding to the specific phy, or clear the phy down count by writing 0
to the debugfs file.

Link: https://lore.kernel.org/r/1571926105-74636-19-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:15 -04:00
Luo Jiaxing cabe7c10c9 scsi: hisi_sas: Delete the debugfs folder of hisi_sas when the probe fails
Although if the debugfs initialization fails, we will delete the debugfs
folder of hisi_sas, but we did not consider the scenario where debugfs was
successfully initialized, but the probe failed for other reasons. We found
out that hisi_sas folder is still remain after the probe failed.

When probe fail, we should delete debugfs folder to avoid the above issue.

Link: https://lore.kernel.org/r/1571926105-74636-18-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:15 -04:00
Luo Jiaxing 8f6432986e scsi: hisi_sas: Add ability to have multiple debugfs dumps
We use the module parameter debugfs_dump_count to manage the upper limit of
the memory block for multiple dumps.

Link: https://lore.kernel.org/r/1571926105-74636-17-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:15 -04:00
Luo Jiaxing 905ab01faf scsi: hisi_sas: Add module parameter for debugfs dump count
We still only use dump index #0 however.

Link: https://lore.kernel.org/r/1571926105-74636-16-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:15 -04:00
Luo Jiaxing a70e33eae3 scsi: hisi_sas: Allocate memory for multiple dumps of debugfs
We add multiple dumps for debugfs, but only allocate memory this time and
only dump #0.

Link: https://lore.kernel.org/r/1571926105-74636-15-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:15 -04:00
Luo Jiaxing 357e4fc7a9 scsi: hisi_sas: Add debugfs file structure for ITCT cache
Create a file structure which was used to save the memory address for
ITCT cache at debugfs. This structure is bound to the corresponding debugfs
file, it can help callback function of debugfs file to get what it needs.

Link: https://lore.kernel.org/r/1571926105-74636-14-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:14 -04:00
Luo Jiaxing b714dd8f36 scsi: hisi_sas: Add debugfs file structure for IOST cache
Create a file structure which was used to save the memory address for IOST
cache at debugfs. This structure is bound to the corresponding debugfs
file, it can help callback function of debugfs file to get what it needs.

Link: https://lore.kernel.org/r/1571926105-74636-13-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:14 -04:00
Luo Jiaxing 0161d55f23 scsi: hisi_sas: Add debugfs file structure for ITCT
Create a file structure which was used to save the memory address for ITCT
at debugfs. This structure is bound to the corresponding debugfs file, it
can help callback function of debugfs file to get what it needs.

Link: https://lore.kernel.org/r/1571926105-74636-12-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:14 -04:00
Luo Jiaxing e15f2e2dff scsi: hisi_sas: Add debugfs file structure for IOST
Create a file structure which was used to save the memory address for IOST
at debugfs. This structure is bound to the corresponding debugfs file, it
can help callback function of debugfs file to get what it needs.

Link: https://lore.kernel.org/r/1571926105-74636-11-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:14 -04:00
Luo Jiaxing 1f66e1fd26 scsi: hisi_sas: Add debugfs file structure for port
Create a file structure which was used to save the memory address and phy
pointer for port at debugfs. This structure is bound to the corresponding
debugfs file, it can help callback function of debugfs file to get what it
need.

Link: https://lore.kernel.org/r/1571926105-74636-10-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:14 -04:00
Luo Jiaxing c611639810 scsi: hisi_sas: Add debugfs file structure for registers
Create a file structure which was used to save the memory address and
hisi_hba pointer for REGS at debugfs. This structure is bound to the
corresponding debugfs file, it can help callback function of debugfs file
to get what it need.

Link: https://lore.kernel.org/r/1571926105-74636-9-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:14 -04:00
Luo Jiaxing 1b54c4db72 scsi: hisi_sas: Add debugfs file structure for DQ
Create a file structure which was used to save the memory address and DQ
pointer for DQ at debugfs. This structure is bound to the corresponding
debugfs file, it can help callback function of debugfs file to get what it
need.

Link: https://lore.kernel.org/r/1571926105-74636-8-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:14 -04:00
Luo Jiaxing 35ea630b2b scsi: hisi_sas: Add debugfs file structure for CQ
Create a file structure which was used to save the memory address and CQ
pointer for CQ at debugfs. This structure is bound to the corresponding
debugfs file, it can help callback function of debugfs file to get what it
need.

Link: https://lore.kernel.org/r/1571926105-74636-7-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:13 -04:00
Luo Jiaxing d28ed83b76 scsi: hisi_sas: Add timestamp for a debugfs dump
It's useful to know when the dump occurred, so add a timestamp file for
this.

Link: https://lore.kernel.org/r/1571926105-74636-6-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:13 -04:00
Xiang Chen 550c0d89d5 scsi: hisi_sas: Replace in_softirq() check in hisi_sas_task_exec()
For IOs from upper layer, preemption may be disabled as it may be called by
function __blk_mq_delay_run_hw_queue which will call get_cpu() (it disables
preemption). So if flags HISI_SAS_REJECT_CMD_BIT is set in function
hisi_sas_task_exec(), it may disable preempt twice after down() and up()
which will cause following call trace:

BUG: scheduling while atomic: fio/60373/0x00000002
Call trace:
dump_backtrace+0x0/0x150
show_stack+0x24/0x30
dump_stack+0xa0/0xc4
__schedule_bug+0x68/0x88
__schedule+0x4b8/0x548
schedule+0x40/0xd0
schedule_timeout+0x200/0x378
__down+0x78/0xc8
down+0x54/0x70
hisi_sas_task_exec.isra.10+0x598/0x8d8 [hisi_sas_main]
hisi_sas_queue_command+0x28/0x38 [hisi_sas_main]
sas_queuecommand+0x168/0x1b0 [libsas]
scsi_queue_rq+0x2ac/0x980
blk_mq_dispatch_rq_list+0xb0/0x550
blk_mq_do_dispatch_sched+0x6c/0x110
blk_mq_sched_dispatch_requests+0x114/0x1d8
__blk_mq_run_hw_queue+0xb8/0x130
__blk_mq_delay_run_hw_queue+0x1c0/0x220
blk_mq_run_hw_queue+0xb0/0x128
blk_mq_sched_insert_requests+0xdc/0x208
blk_mq_flush_plug_list+0x1b4/0x3a0
blk_flush_plug_list+0xdc/0x110
blk_finish_plug+0x3c/0x50
blkdev_direct_IO+0x404/0x550
generic_file_read_iter+0x9c/0x848
blkdev_read_iter+0x50/0x78
aio_read+0xc8/0x170
io_submit_one+0x1fc/0x8d8
__arm64_sys_io_submit+0xdc/0x280
el0_svc_common.constprop.0+0xe0/0x1e0
el0_svc_handler+0x34/0x90
el0_svc+0x10/0x14
...

To solve the issue, check preemptible() to avoid disabling preempt multiple
when flag HISI_SAS_REJECT_CMD_BIT is set.

Link: https://lore.kernel.org/r/1571926105-74636-5-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:13 -04:00
Xiang Chen 8fa9a7bd30 scsi: hisi_sas: use wait_for_completion_timeout() when clearing ITCT
When injecting 2bit ecc errors, it will cause confusion inside SAS
controller which needs host reset to recover it. If a device is gone at the
same times inject 2bit ecc errors, we may not receive the ITCT interrupt so
it will wait for completion in clear_itct_v3_hw() all the time. And host
reset will also not occur because it can't require hisi_hba->sem, so the
system will be suspended.

To solve the issue, use wait_for_completion_timeout() instead of
wait_for_completion(), and also don't mark the gone device as
SAS_PHY_UNUSED when device gone.

Link: https://lore.kernel.org/r/1571926105-74636-4-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:13 -04:00
Xiang Chen 35160421b6 scsi: hisi_sas: Don't create debugfs dump folder twice
Due to a merge error, we attempt to create 2x debugfs dump folders, which
fails:
[  861.101914] debugfs: Directory 'dump' with parent '0000:74:02.0'
already present!

This breaks the dump function.

To fix, remove the superfluous attempt to create the folder.

Fixes: 7ec7082c57 ("scsi: hisi_sas: Add hisi_sas_debugfs_alloc() to centralise allocation")
Link: https://lore.kernel.org/r/1571926105-74636-2-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:31:13 -04:00
Martin K. Petersen a3a8d13f62 Merge branch '5.4/scsi-fixes' into 5.5/scsi-queue
The qla2xxx driver updates for 5.5 depend on the fixes queued for
5.4.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-09 21:54:04 -04:00