OpenCloudOS-Kernel/drivers/infiniband/hw
Mike Marciniszyn 38fd98afee IB/hfi1: Add atomic triggered sleep/wakeup
When running iperf in a two host configuration the following trace can
occur:

[  319.728730] NETDEV WATCHDOG: ib0 (hfi1): transmit queue 0 timed out

The issue happens because the current implementation relies on the netif
txq being stopped to control the flushing of the tx list.

There are two resources that the transmit logic can wait on and stop the
txq:
- SDMA descriptors
- Ring space to hold completions

The ring space is tested on the sending side and relieved when the ring is
consumed in the napi tx reaping.

Unfortunately, that reaping can run concurrently with the workqueue
flushing of the txlist.  If the txq is started just before the workitem
executes, the txlist will never be flushed, leading to the txq being
stuck.

Fix by:
- Adding sleep/wakeup wrappers
  * Use an atomic to control the call to the netif routines inside the
    wrappers

- Using another atomic to record ring space exhaustion
  * Only wake up when a ring space exhaustion has happened and has been
    relieved

Add additional wrappers to clarify the ring space resource handling.
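
The scheme described above can be sketched as a small userspace model. This
is an illustration only, not the driver code: every name here (model_txq,
model_stop, model_wake, model_on_send, model_on_reap, the low_water
threshold) is hypothetical, C11 atomics stand in for the kernel atomic_t
helpers, and two counters stand in for the netif stop/wake routines. The
model is single-threaded; it shows the counting logic, not the locking.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model; none of these names come from the hfi1 driver. */
struct model_txq {
	atomic_int stops;        /* outstanding reasons the queue must stay stopped */
	atomic_int ring_full;    /* 1 while a ring space exhaustion is recorded */
	unsigned int sent;       /* requests placed on the ring */
	unsigned int completed;  /* requests reaped from the ring */
	unsigned int ring_size;
	unsigned int low_water;  /* assumed wakeup threshold, not from the source */
	int stop_calls;          /* stands in for the netif stop routine */
	int wake_calls;          /* stands in for the netif wake routine */
};

static void model_init(struct model_txq *q, unsigned int ring_size)
{
	atomic_init(&q->stops, 0);
	atomic_init(&q->ring_full, 0);
	q->sent = 0;
	q->completed = 0;
	q->ring_size = ring_size;
	q->low_water = ring_size / 2;
	q->stop_calls = 0;
	q->wake_calls = 0;
}

/* Sleep wrapper: only the first outstanding stop reaches the netif layer. */
static void model_stop(struct model_txq *q)
{
	if (atomic_fetch_add(&q->stops, 1) == 0)
		q->stop_calls++;
}

/* Wakeup wrapper: only the last matching wake reaches the netif layer. */
static void model_wake(struct model_txq *q)
{
	int prev = atomic_fetch_sub(&q->stops, 1);

	assert(prev > 0); /* every wake must pair with an earlier stop */
	if (prev == 1)
		q->wake_calls++;
}

static unsigned int model_used(const struct model_txq *q)
{
	return q->sent - q->completed;
}

/* Sending side: record ring exhaustion once, then stop. */
static void model_on_send(struct model_txq *q)
{
	q->sent++;
	if (model_used(q) >= q->ring_size &&
	    !atomic_exchange(&q->ring_full, 1))
		model_stop(q);
}

/* Reaping side: wake only if an exhaustion was recorded and is now relieved. */
static void model_on_reap(struct model_txq *q, unsigned int n)
{
	q->completed += n;
	if (model_used(q) < q->low_water &&
	    atomic_exchange(&q->ring_full, 0))
		model_wake(q);
}
```

Because each resource contributes at most one count to stops, a reap that
relieves the ring cannot restart a queue that is still waiting on the other
resource, and a reap with no recorded exhaustion does nothing.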

Fixes: d99dc602e2 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20200623204327.108092.4024.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 16:13:38 -03:00
bnxt_re     treewide: replace '---help---' in Kconfig files with 'help'  2020-06-14 01:57:21 +09:00
cxgb4       treewide: replace '---help---' in Kconfig files with 'help'  2020-06-14 01:57:21 +09:00
efa         RDMA/efa: Set maximum pkeys device attribute  2020-06-18 09:41:07 -03:00
hfi1        IB/hfi1: Add atomic triggered sleep/wakeup  2020-06-24 16:13:38 -03:00
hns         RDMA/hns: Fix an cmd queue issue when resetting  2020-06-18 10:48:39 -03:00
i40iw       treewide: replace '---help---' in Kconfig files with 'help'  2020-06-14 01:57:21 +09:00
mlx4        treewide: replace '---help---' in Kconfig files with 'help'  2020-06-14 01:57:21 +09:00
mlx5        RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata  2020-06-22 14:40:53 -03:00
mthca       treewide: replace '---help---' in Kconfig files with 'help'  2020-06-14 01:57:21 +09:00
ocrdma      treewide: replace '---help---' in Kconfig files with 'help'  2020-06-14 01:57:21 +09:00
qedr        RDMA/qedr: Fix KASAN: use-after-free in ucma_event_handler+0x532  2020-06-18 09:44:45 -03:00
qib         treewide: replace '---help---' in Kconfig files with 'help'  2020-06-14 01:57:21 +09:00
usnic       treewide: replace '---help---' in Kconfig files with 'help'  2020-06-14 01:57:21 +09:00
vmw_pvrdma  treewide: replace '---help---' in Kconfig files with 'help'  2020-06-14 01:57:21 +09:00
Makefile    RDMA/iw_cxgb3: Remove the iw_cxgb3 module from kernel  2019-10-04 15:08:59 -03:00