Arguments should be validated before calling ufshcd_hold(). This prevents
ufshcd_hold()/ufshcd_release() from being invoked unnecessarily.
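The ordering can be sketched in plain C as below. This is only an illustration of "validate first, hold second": struct fake_host, fake_hold(), fake_release() and do_request() are hypothetical names and not part of the driver.

```c
#include <stddef.h>

/* Hypothetical host state: counts how often the hold/release pair ran. */
struct fake_host {
	int holds;
	int releases;
};

static void fake_hold(struct fake_host *h)
{
	h->holds++;
}

static void fake_release(struct fake_host *h)
{
	h->releases++;
}

/*
 * Reject bad arguments first, so the hold/release pair only brackets
 * requests that can actually proceed. Returns 0 on success, -1 otherwise.
 */
static int do_request(struct fake_host *h, const void *buf, size_t len)
{
	if (!buf || len == 0)
		return -1;

	fake_hold(h);
	/* ... the real work would happen here ... */
	fake_release(h);
	return 0;
}
```

With this ordering an invalid call leaves both counters untouched.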
[mkp: removed unused out: labels]
Link: https://lore.kernel.org/r/1606973132-5937-1-git-send-email-user@jang-Samsung-DeskTop-System
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: jintae jang <jt77.jang@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dump registers and states prior to leaving IRQ handler when an AH8 error
occurs.
Link: https://lore.kernel.org/r/1606910644-21185-4-git-send-email-cang@codeaurora.org
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bao D. Nguyen <nguyenb@codeaurora.org>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In current task abort routine, if task abort happens to the device W-LUN,
the code directly jumps to ufshcd_eh_host_reset_handler() to perform a full
reset and restore then returns FAIL or SUCCESS. Commands sent to the device
W-LUN are most likely the SSU cmds sent during UFS PM operations. If such
SSU cmd enters task abort routine when ufshcd_eh_host_reset_handler()
flushes eh_work, it will get stuck there since err_handler is serialized
with PM operations.
In order to unblock above call path, we merely clean up the lrb taken by
this cmd, queue the eh_work and return SUCCESS. Once the cmd is aborted,
the PM operation which sends out the cmd just errors out, then err_handler
shall be able to proceed with the full reset and restore.
In this scenario, the cmd is aborted even before it is actually cleared by
HW, so set the lrb->in_use flag to prevent subsequent cmds, including SCSI
cmds and dev cmds, from taking the lrb released from abort. The flag shall
eventually be cleared in __ufshcd_transfer_req_compl() invoked by the full
reset and restore from err_handler.
[mkp: conflict with event logging series]
Link: https://lore.kernel.org/r/1606910644-21185-3-git-send-email-cang@codeaurora.org
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Serialize eh_work with system PM events and async scan to make sure eh_work
does not run in parallel with them.
Link: https://lore.kernel.org/r/1606910644-21185-2-git-send-email-cang@codeaurora.org
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since setup_regulators variant function is not used by any vendors, simply
remove it.
Link: https://lore.kernel.org/r/20201205120041.26869-2-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Introduce event_notify variant function to allow vendor to get notification
of important events and connect to any proprietary debugging facilities.
Link: https://lore.kernel.org/r/20201205115901.26815-4-stanley.chu@mediatek.com
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The UFS error history does not only have "history of errors" but also a
log of some other events which are not defined as errors.
This patch fixes the confused naming of related functions and changes the
approach for updating and printing history in preparation of next patch.
This patch does not change any functionality.
Link: https://lore.kernel.org/r/20201205115901.26815-3-stanley.chu@mediatek.com
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add error history for abort event in UFS Device W-LUN.
Use specified value as parameter of ufshcd_update_reg_hist() to identify
the aborted tag or LUNs.
Link: https://lore.kernel.org/r/20201205115901.26815-2-stanley.chu@mediatek.com
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of hardcoding the scale down gear, make it a member of
the ufs_clk_scaling struct.
Link: https://lore.kernel.org/r/1606442334-22641-1-git-send-email-cang@codeaurora.org
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove the param skip_ref_clk from __ufshcd_setup_clocks(), but keep a flag
in struct ufs_clk_info to tell whether a clock can be disabled or not while
the link is active.
Link: https://lore.kernel.org/r/1606356063-38380-2-git-send-email-cang@codeaurora.org
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the case that auto_bkops_enable is false, which means auto bkops has
been disabled, there is no need to call ufshcd_disable_auto_bkops().
Link: https://lore.kernel.org/r/20201125185300.3394-1-huobean@gmail.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The introduction of ufshcd_dme_configure_adapt() refactored out duplication
from the Mediatek and Qualcomm drivers.
Both these implementations had the logic of:
gear_tx == UFS_HS_G4 => PA_INITIAL_ADAPT
gear_tx != UFS_HS_G4 => PA_NO_ADAPT
but now both implementations pass PA_INITIAL_ADAPT as "adapt_val" and if
gear_tx is not UFS_HS_G4 that is replaced with PA_INITIAL_ADAPT. In other
words, it's PA_INITIAL_ADAPT in both above cases.
The result is that e.g. Qualcomm SM8150 no longer has functional UFS, so
adjust the logic to match the previous implementation.
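The intended selection can be sketched as follows. This is a minimal illustration rather than the driver code: select_adapt() is a hypothetical helper and the enum values are placeholders, not the real UniPro attribute values.

```c
/* Placeholder values for illustration only. */
enum gear { UFS_HS_G1 = 1, UFS_HS_G2, UFS_HS_G3, UFS_HS_G4 };
enum adapt { PA_NO_ADAPT, PA_INITIAL_ADAPT };

/*
 * Hypothetical helper: keep the caller's ADAPT value only for HS-G4 and
 * fall back to PA_NO_ADAPT for every other gear.
 */
static int select_adapt(int gear_tx, int adapt_val)
{
	if (gear_tx != UFS_HS_G4)
		adapt_val = PA_NO_ADAPT;
	return adapt_val;
}
```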
Link: https://lore.kernel.org/r/20201121044810.507288-1-bjorn.andersson@linaro.org
Fixes: fc85a74e28 ("scsi: ufs: Refactor ADAPT configuration function")
Reviewed-by: Can Guo <cang@codeaurora.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We have LBA and length for unmap commands.
Link: https://lore.kernel.org/r/20201117165839.1643377-8-jaegeuk@kernel.org
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Leo Liou <leoliou@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following call stack prevents clk_gating at every I/O completion. We
can remove the condition, ufshcd_any_tag_in_use(), since clkgating_work
will check it again.
  ufshcd_complete_requests(struct ufs_hba *hba)
    ufshcd_transfer_req_compl()
      __ufshcd_transfer_req_compl()
        __ufshcd_release(hba)
          if (ufshcd_any_tag_in_use() == 1)
            return;
    ufshcd_tmc_handler(hba);
      blk_mq_tagset_busy_iter();
Note that this still requires work to deal with a potential race condition
when the user sets clkgating.delay_ms to a very small value. In that case
the ufshcd_any_tag_in_use() check in gate_work can prevent clock gating.
Link: https://lore.kernel.org/r/20201117165839.1643377-7-jaegeuk@kernel.org
Fixes: 7252a36030 ("scsi: ufs: Avoid busy-waiting by eliminating tag conflicts")
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This adds user-friendly tracepoints with group id.
Link: https://lore.kernel.org/r/20201117165839.1643377-6-jaegeuk@kernel.org
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The workqueue must have WQ_MEM_RECLAIM set. From the workqueue
documentation:
  ``WQ_MEM_RECLAIM``
    All workqueues which might be used in the memory reclaim paths **MUST**
    have this flag set. The wq is guaranteed to have at least one execution
    context regardless of memory pressure.
Link: https://lore.kernel.org/r/20201117165839.1643377-5-jaegeuk@kernel.org
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In order to conduct FFU or RPMB operations, UFS needs to clear UNIT
ATTENTION condition. Clear it explicitly so that we get no failures during
initialization.
Link: https://lore.kernel.org/r/20201117165839.1643377-4-jaegeuk@kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While running a stress test which enables/disables clkgating, we
occasionally hit device timeout. This patch avoids a subtle race condition
to address it.
Link: https://lore.kernel.org/r/20201117165839.1643377-3-jaegeuk@kernel.org
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Once UFS is gated with CLKS_OFF, it should not call REQ_CLKS_OFF
again. This can lead to hibern8_enter failure.
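As a rough sketch of the guard (the state names follow the text above; request_clks_off() is a hypothetical helper, and the real driver checks further conditions before gating):

```c
/* State names mirror the commit text; the helper itself is hypothetical. */
enum clk_gating_state { CLKS_OFF, CLKS_ON, REQ_CLKS_OFF, REQ_CLKS_ON };

/*
 * A host whose clocks are already off keeps its state; any other state
 * moves to REQ_CLKS_OFF so that the gating work can run.
 */
static enum clk_gating_state request_clks_off(enum clk_gating_state state)
{
	if (state == CLKS_OFF)
		return state;
	return REQ_CLKS_OFF;
}
```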
Link: https://lore.kernel.org/r/20201117165839.1643377-2-jaegeuk@kernel.org
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Several vendors are using the same code to configure ADAPT settings for
HS-G4. Refactor it into a common function.
Link: https://lore.kernel.org/r/20201116065054.7658-8-stanley.chu@mediatek.com
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If enabling the HBA fails, retry, and allow vendors to apply specific
tweaks before the next attempt. For example, vendors can do a
vendor-specific host reset flow in the variant function
"ufshcd_vops_hce_enable_notify()".
Link: https://lore.kernel.org/r/20201112054537.22494-1-stanley.chu@mediatek.com
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes the following W=1 kernel build warning(s):
drivers/scsi/ufs/ufshcd.c:6603: warning: Function parameter or member 'hba' not described in 'ufshcd_try_to_abort_task'
drivers/scsi/ufs/ufshcd.c:6603: warning: Function parameter or member 'tag' not described in 'ufshcd_try_to_abort_task'
drivers/scsi/ufs/ufshcd.c:6603: warning: Excess function parameter 'cmd' description in 'ufshcd_try_to_abort_task'
Link: https://lore.kernel.org/r/20201102142359.561122-12-lee.jones@linaro.org
Cc: Alim Akhtar <alim.akhtar@samsung.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Santosh Yaraganavi <santosh.sy@samsung.com>
Cc: Vinayak Holikatti <h.vinayak@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
DeepSleep is a UFS v3.1 feature that achieves the lowest power consumption
of the device, apart from power off.
In DeepSleep mode, no commands are accepted, and the only way to exit is
using a hardware reset or power cycle.
This patch assumes that if a power cycle was an option, then power off
would be preferable, so only exit via a hardware reset is supported.
Drivers that wish to support DeepSleep need to set a new capability flag
UFSHCD_CAP_DEEPSLEEP and provide a hardware reset via the existing
->device_reset() callback.
It is assumed that UFS devices with wspecversion >= 0x310 support
DeepSleep.
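The support check described above could look roughly like this. It is a sketch under stated assumptions: deepsleep_supported() is a hypothetical helper and CAP_DEEPSLEEP uses an illustrative bit position, not necessarily the value of UFSHCD_CAP_DEEPSLEEP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit position; the real capability flag may use another bit. */
#define CAP_DEEPSLEEP (1u << 0)

/*
 * DeepSleep is usable only if the host driver opted in via the capability
 * flag and the device reports spec version 3.1 (0x310) or later.
 */
static bool deepsleep_supported(uint32_t host_caps, uint16_t wspecversion)
{
	return (host_caps & CAP_DEEPSLEEP) && wspecversion >= 0x310;
}
```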
[mkp: dropped sysfs ABI doc due to conflicts]
Link: https://lore.kernel.org/r/20201103141403.2142-2-adrian.hunter@intel.com
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Can Guo <cang@codeaurora.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During clock gating, after clocks are disabled, put HBA into LPM to save
more power.
Link: https://lore.kernel.org/r/52198e70bff750632740d78678a815256d697e43.1603825776.git.asutoshd@codeaurora.org
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Acked-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This series consists of the usual driver updates (ufs, qla2xxx, tcmu,
ibmvfc, lpfc, smartpqi, hisi_sas, qedi, qedf, mpt3sas) and minor bug
fixes. There are only three core changes: adding sense codes,
cleaning up noretry and adding an option for limitless retries.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCX4YulyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishaZDAQCT7rwG
UEZYHgYkU9EX9ERVBQM0SW4mLrxf3g3P5ioJsAEAtkclCM4QsIOP+MIPjIa0EyUY
khu0kcrmeFR2YwA8zhw=
=4w4S
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"The usual driver updates (ufs, qla2xxx, tcmu, ibmvfc, lpfc, smartpqi,
hisi_sas, qedi, qedf, mpt3sas) and minor bug fixes.
There are only three core changes: adding sense codes, cleaning up
noretry and adding an option for limitless retries"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (226 commits)
scsi: hisi_sas: Recover PHY state according to the status before reset
scsi: hisi_sas: Filter out new PHY up events during suspend
scsi: hisi_sas: Add device link between SCSI devices and hisi_hba
scsi: hisi_sas: Add check for methods _PS0 and _PR0
scsi: hisi_sas: Add controller runtime PM support for v3 hw
scsi: hisi_sas: Switch to new framework to support suspend and resume
scsi: hisi_sas: Use hisi_hba->cq_nvecs for calling calling synchronize_irq()
scsi: qedf: Remove redundant assignment to variable 'rc'
scsi: lpfc: Remove unneeded variable 'status' in lpfc_fcp_cpu_map_store()
scsi: snic: Convert to use DEFINE_SEQ_ATTRIBUTE macro
scsi: qla4xxx: Delete unneeded variable 'status' in qla4xxx_process_ddb_changed
scsi: sun_esp: Use module_platform_driver to simplify the code
scsi: sun3x_esp: Use module_platform_driver to simplify the code
scsi: sni_53c710: Use module_platform_driver to simplify the code
scsi: qlogicpti: Use module_platform_driver to simplify the code
scsi: mac_esp: Use module_platform_driver to simplify the code
scsi: jazz_esp: Use module_platform_driver to simplify the code
scsi: mvumi: Fix error return in mvumi_io_attach()
scsi: lpfc: Drop nodelist reference on error in lpfc_gen_req()
scsi: be2iscsi: Fix a theoretical leak in beiscsi_create_eqs()
...
Boot occasionally fails with some Samsung low-power UFS devices. The reason
is that these devices have a little bit higher latency for NOP OUT
responses. This causes boot to fail because the NOP OUT command is issued
during initialization to check whether the device transport protocol is
ready or not. Increase NOP_OUT_TIMEOUT value from 30 to 50ms.
Link: https://lore.kernel.org/r/231786897.01599016081767.JavaMail.epsvc@epcpadp2
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Setting the Auto-Hibernate Timer to zero is a valid setting which indicates
the Auto-Hibernate feature being disabled. Correctly support this setting.
In addition, when the timer value is queried from sysfs, read from the host
controller's register and return that value instead of using the RAM value.
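A minimal sketch of treating a zero timer as "disabled" follows. The register layout used here (timer value in bits 9:0, scale in the bits above) and the names AHIT_TIMER_MASK and auto_hibern8_enabled() are assumptions of this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: timer value in bits 9:0, scale in the bits above it. */
#define AHIT_TIMER_MASK 0x3ffu

/* The feature counts as enabled only when the timer field is non-zero. */
static bool auto_hibern8_enabled(uint32_t ahit)
{
	return (ahit & AHIT_TIMER_MASK) != 0;
}
```

Note that a register value with only the scale bits set still reads as disabled, because the timer field itself is zero.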
Link: https://lore.kernel.org/r/b141cfcd7998b8933635828b56fbb64f8ad4d175.1598661071.git.nguyenb@codeaurora.org
Acked-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
PA Layer issues a LINERESET to the PHY at the recovery step in the Power
Mode change operation. If it happens during auto or manual hibern8 enter,
even if hibern8 enter succeeds, UFS power mode shall be set to PWM-G1 mode
and kept in that mode after exit from hibern8, leading to bad performance.
Handle the LINERESET in the eh_work by restoring power mode to HS mode
after all pending reqs and tasks are cleared from doorbell.
Link: https://lore.kernel.org/r/1598321228-21093-3-git-send-email-cang@codeaurora.org
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To recover non-fatal errors, no full reset is required, err_handler only
clears those pending TRs/TMRs so that SCSI layer can re-issue them. In
current err_handler, TRs are directly cleared from UFS host's doorbell but
not aborted from device side. However, according to the UFSHCI JEDEC spec,
the host software shall use UTP Transfer Request List Clear Register to
clear a task from UFS host's doorbell only when a UTP Transfer Request is
expected to not be completed, e.g. when the host software receives a
“FUNCTION COMPLETE” Task Management response which means a Transfer Request
was aborted. To follow the UFSHCI JEDEC spec, in err_handler, abort one TR
before clearing it from doorbell.
Link: https://lore.kernel.org/r/1598321228-21093-2-git-send-email-cang@codeaurora.org
Acked-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix ufshcd_print_trs() to consider UFSHCD_QUIRK_PRDT_BYTE_GRAN when using
utp_transfer_req_desc::prd_table_length, so that it doesn't treat the
number of bytes as the number of entries.
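The conversion can be illustrated as below. This is a sketch, not the driver code: struct sg_entry is a stand-in whose 16-byte size is an assumption of this example, and prdt_entries() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one PRDT entry; the 16-byte size is assumed for this sketch. */
struct sg_entry {
	uint32_t base_addr;
	uint32_t upper_addr;
	uint32_t reserved;
	uint32_t size;
};

/*
 * With byte granularity the length field holds a byte count, so divide by
 * the entry size to get the number of entries. Otherwise it already is
 * the entry count.
 */
static size_t prdt_entries(uint16_t prd_table_length, int byte_granularity)
{
	if (byte_granularity)
		return prd_table_length / sizeof(struct sg_entry);
	return prd_table_length;
}
```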
Originally from Kiwoong Kim
(https://lkml.kernel.org/r/20200218233115.8185-1-kwmad.kim@samsung.com).
Link: https://lore.kernel.org/r/20200826021040.152148-1-ebiggers@kernel.org
Fixes: 26f968d7de ("scsi: ufs: Introduce UFSHCD_QUIRK_PRDT_BYTE_GRAN quirk")
Cc: Alim Akhtar <alim.akhtar@samsung.com>
Cc: Kiwoong Kim <kwmad.kim@samsung.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We have two knobs to control flush for write booster,
fWriteBoosterBufferFlushDuringHibernate and fWriteBoosterBufferFlushEn.
Some vendors use only fWriteBoosterBufferFlushDuringHibernate because this
can reportedly cover most scenarios. Also, there have been some reports
that flush by fWriteBoosterBufferFlushEn could lead to increased power
consumption due to unexpected internal operations. Consequently, we need
a way to enable or disable fWriteBoosterEn operations. Add a quirk to
bypass manual flush.
Link: https://lore.kernel.org/r/ffdb0eda30515809f0ad9ee936b26917ee9b4593.1598319701.git.kwmad.kim@samsung.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 5586dd8ea2 ("scsi: ufs: Fix a race condition between error handler
and runtime PM ops") moves the ufshcd_scsi_block_requests() inside
err_handler() but forgets to remove the ufshcd_scsi_unblock_requests() in
the early return path. Correct the mistake.
Link: https://lore.kernel.org/r/1597798958-24322-1-git-send-email-cang@codeaurora.org
Fixes: 5586dd8ea2 ("scsi: ufs: Fix a race condition between error handler and runtime PM ops")
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, the UFS driver busy waits for fDeviceInit to be cleared. Provide
an upper bound and sleep between attempts instead of busy waiting.
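Bounded polling with a sleep between reads can be sketched like this. All names here (wait_flag_cleared(), struct fake_dev and the two callbacks) are hypothetical, and the fake device uses a simulated clock so the example does not depend on real timing.

```c
#include <stdbool.h>

/* Hypothetical hooks: read the flag, and sleep for a number of milliseconds. */
typedef bool (*read_flag_fn)(void *ctx);
typedef void (*sleep_ms_fn)(void *ctx, unsigned int ms);

/*
 * Poll until the flag clears or timeout_ms has elapsed, sleeping
 * interval_ms between reads (interval_ms must be non-zero).
 * Returns 0 if the flag cleared, -1 on timeout.
 */
static int wait_flag_cleared(read_flag_fn read_flag, sleep_ms_fn sleep_ms,
			     void *ctx, unsigned int timeout_ms,
			     unsigned int interval_ms)
{
	unsigned int waited = 0;

	for (;;) {
		if (!read_flag(ctx))
			return 0;
		if (waited >= timeout_ms)
			return -1;
		sleep_ms(ctx, interval_ms);
		waited += interval_ms;
	}
}

/* Fake device for demonstration: the flag clears after clear_after_ms. */
struct fake_dev {
	unsigned int now_ms;
	unsigned int clear_after_ms;
};

static bool fake_read(void *ctx)
{
	struct fake_dev *d = ctx;

	return d->now_ms < d->clear_after_ms;
}

static void fake_sleep(void *ctx, unsigned int ms)
{
	struct fake_dev *d = ctx;

	d->now_ms += ms;
}
```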
Link: https://lore.kernel.org/r/1597053747-75171-1-git-send-email-kwmad.kim@samsung.com
Tested-by: Kiwoong Kim <kwmad.kim@samsung.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20200814095034.20709-3-huobean@gmail.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
ufshcd_comp_devman_upiu() was poorly named leading people to think it was a
completion function. Rename it to ufshcd_compose_devman_upiu().
Link: https://lore.kernel.org/r/20200814095034.20709-2-huobean@gmail.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In current UFS task abort hook, namely ufshcd_abort(), if one task is
aborted successfully, clk_gating.active_reqs held by this task is not
decreased, which makes clk_gating.active_reqs stay above zero forever, thus
clock gating would never happen. Instead of releasing resources of one task
"manually", use the existing func __ufshcd_transfer_req_compl(). This
change also eliminates a possible race of scsi_dma_unmap() from the real
completion in IRQ handler path.
Link: https://lore.kernel.org/r/1596975355-39813-10-git-send-email-cang@codeaurora.org
Fixes: 1ab27c9cf8 ("ufs: Add support for clock gating")
CC: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the bit corresponding to a task in the Doorbell register has been
cleared, no need to poll the status of the task on the device side and to
send an Abort Task TM. Instead, let it directly goto cleanup.
In addition, to keep original debug output, move the goto below the debug
print.
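The decision can be sketched as follows, assuming a 32-bit doorbell with one bit per tag. The enum and both helpers are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the doorbell bit for `tag` (0..31) is still set. */
static bool doorbell_bit_set(uint32_t doorbell, unsigned int tag)
{
	return (doorbell >> tag) & 1u;
}

enum abort_step { ABORT_CLEANUP_ONLY, ABORT_QUERY_AND_SEND_TM };

/*
 * If the bit is already clear there is nothing left to abort on the
 * device side, so go straight to cleanup.
 */
static enum abort_step next_abort_step(uint32_t doorbell, unsigned int tag)
{
	if (!doorbell_bit_set(doorbell, tag))
		return ABORT_CLEANUP_ONLY;
	return ABORT_QUERY_AND_SEND_TM;
}
```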
Link: https://lore.kernel.org/r/20200811141859.27399-3-huobean@gmail.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If somehow no interrupt notification is raised for a completed request and
its doorbell bit is cleared by the host, the UFS driver needs to clean up
its outstanding bit in ufshcd_abort(). Otherwise, the system may behave
abnormally in the following scenario:
After ufshcd_abort() returns, this request will be requeued by SCSI layer
with its outstanding bit set. Any future completed request will trigger
ufshcd_transfer_req_compl() to handle all "completed outstanding bits". At
this time the "abnormal outstanding bit" will be detected and the "requeued
request" will be chosen to execute request post-processing flow. This is
wrong because this request is still "alive".
Link: https://lore.kernel.org/r/20200811141859.27399-2-huobean@gmail.com
Reviewed-by: Can Guo <cang@codeaurora.org>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For shared interrupts, the interrupt status might be zero, so check that
first.
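A sketch of the early check follows. The enum mirrors the kernel's naming but is defined locally, and handle_irq() is a hypothetical handler, not the driver's.

```c
#include <stdint.h>

/* Local stand-ins for the handler's return values. */
enum irq_result { IRQ_NONE, IRQ_HANDLED };

/*
 * On a shared line the handler may run for another device's interrupt.
 * Check for enabled, pending status bits first and report IRQ_NONE when
 * there are none. Otherwise record the bits that were handled.
 */
static enum irq_result handle_irq(uint32_t status, uint32_t enabled_mask,
				  uint32_t *handled)
{
	uint32_t pending = status & enabled_mask;

	if (!pending)
		return IRQ_NONE;

	*handled = pending;
	return IRQ_HANDLED;
}
```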
Link: https://lore.kernel.org/r/20200811133936.19171-2-adrian.hunter@intel.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The interrupt might be shared, in which case it is not an error for the
interrupt handler to be called when the interrupt status is zero, so don't
print the message unless an enabled interrupt status bit was set.
Link: https://lore.kernel.org/r/20200811133936.19171-1-adrian.hunter@intel.com
Fixes: 9333d77573 ("scsi: ufs: Fix irq return code")
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In ufshcd_suspend(), after clk-gating is suspended and link is set
as Hibern8 state, ufshcd_hold() is still possibly invoked before
ufshcd_suspend() returns. For example, MediaTek's suspend vops may
issue UIC commands which would call ufshcd_hold() during the command
issuing flow.
Now if UFSHCD_CAP_HIBERN8_WITH_CLK_GATING capability is enabled,
then ufshcd_hold() may enter infinite loops because there is no
clk-ungating work scheduled or pending. In this case, ufshcd_hold()
shall just bypass, and keep the link as Hibern8 state.
Link: https://lore.kernel.org/r/20200809050734.18740-1-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Co-developed-by: Andy Teng <andy.teng@mediatek.com>
Signed-off-by: Andy Teng <andy.teng@mediatek.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current IRQ handler blocks SCSI requests before scheduling eh_work.
When the error handler then calls pm_runtime_get_sync(),
ufshcd_suspend/resume may send a SCSI cmd, most likely the SSU cmd. Since
SCSI requests are blocked, that cmd cannot proceed, ufshcd_suspend/resume
stalls on it, and pm_runtime_get_sync() never returns.
Resolve this as follows:
- In queuecommand path, hba->ufshcd_state check and ufshcd_send_command
should stay under the same spin lock. This is to make sure that no more
commands leak into doorbell after hba->ufshcd_state is changed.
- Don't block SCSI requests before error handler starts to run, let error
handler block SCSI requests when it is ready to start error recovery.
- Don't let SCSI layer keep requeuing the SCSI cmds sent from HBA runtime
PM ops, let them pass or fail them. Let them pass if eh_work is
scheduled due to non-fatal errors. Fail them if eh_work is scheduled due
to fatal errors, otherwise the cmds may eventually time out since UFS is
in bad state, which gets error handler blocked for too long. If we fail
the SCSI cmds sent from HBA runtime PM ops, HBA runtime PM ops fails
too, but it does not hurt since error handler can recover HBA runtime PM
error.
Link: https://lore.kernel.org/r/1596975355-39813-9-git-send-email-cang@codeaurora.org
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Performing dumps in the IRQ handler causes system stability issues. Move
dumps to the error handler and only print basic host registers here.
Link: https://lore.kernel.org/r/1596975355-39813-8-git-send-email-cang@codeaurora.org
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current error handler cannot recover from an HBA runtime PM error if
ufshcd_suspend/resume has failed due to UFS errors, e.g. hibern8 enter/exit
error or SSU cmd error. When this happens, error handler may fail
performing a full reset and restore because error handler always assumes
that power, IRQs and clocks are ready after pm_runtime_get_sync returns,
but actually they are not if ufshcd_resume fails[1].
If ufshcd_suspend/resume fails due to UFS errors, runtime PM framework
saves the error value to dev.power.runtime_error. After that, HBA dev
runtime suspend/resume would not be invoked anymore unless runtime_error is
cleared[2].
If ufshcd_suspend/resume fails due to UFS errors, then for scenario [1] the
error handler cannot assume anything about pm_runtime_get_sync, meaning it
should explicitly turn ON powers, IRQs and clocks again. To get HBA runtime
PM working again in scenario [2], the error handler can clear the
runtime_error by calling pm_runtime_set_active() if full reset and restore
succeeds. More importantly, if pm_runtime_set_active() returns no error,
which means runtime_error has been cleared, we also need to resume those
SCSI devices under the HBA in case any of them has failed to be resumed due
to HBA runtime resume failure. This is to unblock blk_queue_enter in case
there are bios waiting inside it.
Link: https://lore.kernel.org/r/1596975355-39813-7-git-send-email-cang@codeaurora.org
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Error recovery can be invoked from multiple code paths, including hibern8
enter/exit (from ufshcd_link_recovery), ufshcd_eh_host_reset_handler() and
eh_work scheduled from IRQ context. Ultimately, these paths are all trying
to invoke ufshcd_reset_and_restore() in either a synchronous or
asynchronous manner. This causes problems:
- If link recovery happens during ungate work, ufshcd_hold() would be
called recursively. Although commit 53c12d0ef6 ("scsi: ufs: fix error
recovery after the hibern8 exit failure") fixed a deadlock due to
recursive calls of ufshcd_hold() by adding a check of eh_in_progress
into ufshcd_hold, this check allows eh_work to run in parallel while
link recovery is running.
- Similar concurrency can also happen when error recovery is invoked from
ufshcd_eh_host_reset_handler and ufshcd_link_recovery.
- Concurrency can even happen between eh_works. eh_work, currently queued
on system_wq, is allowed to have multiple instances running in parallel,
but we don't have proper protection for that.
If any of the above concurrency scenarios happens, error recovery would fail
and lead the UFS device and host into bad states. To fix the concurrency problem,
this change queues eh_work on a single threaded workqueue and removes link
recovery calls from the hibern8 enter/exit path. In addition, make use of
eh_work in eh_host_reset_handler instead of calling
ufshcd_reset_and_restore. This unifies the UFS error recovery mechanism.
According to the UFSHCI JEDEC spec, hibern8 enter/exit error occurs when
the link is broken. This essentially applies to any power mode change
operations (since they all use PACP_PWR cmds in UniPro layer). So, if a
power mode change operation (including AH8 enter/exit) fails, mark link
state as UIC_LINK_BROKEN_STATE and schedule the eh_work. In this case,
error handler needs to do a full reset and restore to recover the link back
to active. Before the link state is recovered to active,
ufshcd_uic_pwr_ctrl simply returns -ENOLINK to avoid more errors.
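The link-state handling in the last paragraph can be sketched as below. This is an illustration under assumptions: struct fake_hba, the enum values and pwr_mode_change() are hypothetical, and hw_result merely stands in for the outcome of the UIC command.

```c
#include <errno.h>

enum link_state { LINK_ACTIVE, LINK_HIBERN8, LINK_BROKEN };

struct fake_hba {
	enum link_state link;
	int eh_scheduled;
};

/*
 * Hypothetical power mode change: hw_result is 0 on success and
 * non-zero on failure.
 */
static int pwr_mode_change(struct fake_hba *hba, int hw_result)
{
	/* Refuse further attempts until recovery has restored the link. */
	if (hba->link == LINK_BROKEN)
		return -ENOLINK;

	if (hw_result) {
		hba->link = LINK_BROKEN;
		hba->eh_scheduled = 1;	/* error handler does the full reset */
	}
	return hw_result;
}
```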
Link: https://lore.kernel.org/r/1596975355-39813-6-git-send-email-cang@codeaurora.org
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Information about the last interrupt status and timestamp is helpful when
debugging system stability issues (IRQ starvation, for instance). Add this
information to ufshcd_print_host_state() output.
In addition, UFS device information such as model name and firmware version
also comes in handy during debugging. This is printed as well.
Link: https://lore.kernel.org/r/1596975355-39813-5-git-send-email-cang@codeaurora.org
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Clock gating can be turned on/off selectively which means the associated
state information is only correct if the feature is enabled. This change
makes sure that we only look at state of clk-gating if it is enabled.
Link: https://lore.kernel.org/r/1596975355-39813-2-git-send-email-cang@codeaurora.org
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some UFS devices require delay after VCC power rail is turned off.
Introduce a device quirk "DELAY_AFTER_LPM" to add 5ms delay after VCC
power-off during suspend flow.
Link: https://lore.kernel.org/r/20200729051840.31318-2-stanley.chu@mediatek.com
Reviewed-by: Andy Teng <andy.teng@mediatek.com>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Can Guo <cang@codeaurora.org>
Acked-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>