OpenCloudOS-Kernel/drivers/infiniband/hw/mlx5
Parav Pandit fbdd0049d9 RDMA/mlx5: Fix devlink deadlock on net namespace deletion
When an mlx5 core devlink instance is reloaded in a different net namespace,
its associated IB device is deleted and recreated.

An example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo

The mlx5 IB device needs to attach and detach its associated netdevice
through the netdev notifier chain during the load and unload sequences.
Below is the call graph of the unload flow.

cleanup_net()
   down_read(&pernet_ops_rwsem); <- first sem acquired
     ops_pre_exit_list()
       pre_exit()
         devlink_pernet_pre_exit()
           devlink_reload()
             mlx5_devlink_reload_down()
               mlx5_unload_one()
               [...]
                 mlx5_ib_remove()
                   mlx5_ib_unbind_slave_port()
                     mlx5_remove_netdev_notifier()
                       unregister_netdevice_notifier()
                         down_write(&pernet_ops_rwsem); <- recursive lock

Hence, when the net namespace is deleted, the mlx5 reload results in a deadlock.

When the deadlock occurs, the devlink mutex is also held. As a result, not
only is the mlx5 device under reload deadlocked, but so is every process
that attempts to access an unrelated devlink device.

Hence, fix this by making the mlx5 IB driver register a per-net netdev
notifier instead of the global one; the per-net notifier operates on the
net namespace without holding pernet_ops_rwsem.

Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:18:19 -03:00
Kconfig treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
Makefile RDMA/mlx5: Separate flow steering logic from main.c 2020-07-07 14:05:51 -03:00
ah.c RDMA: Restore ability to fail on AH destroy 2020-09-09 13:57:22 -03:00
cmd.c RDMA: Convert RWQ table logic to ib_core allocation scheme 2020-09-17 14:04:33 -03:00
cmd.h RDMA: Convert RWQ table logic to ib_core allocation scheme 2020-09-17 14:04:33 -03:00
cong.c RDMA/mlx5: Update mlx5_ib to use new cmd interface 2020-05-06 17:42:45 -03:00
counters.c RDMA/mlx5: Fix type warning of sizeof in __mlx5_ib_alloc_counters() 2020-09-25 09:17:42 -03:00
counters.h RDMA/mlx5: Separate counters from main.c 2020-07-07 14:05:51 -03:00
cq.c RDMA/core: Modify enum ib_gid_type and enum rdma_network_type 2020-10-01 21:20:11 -03:00
devx.c RDMA 5.9 merge window pull request 2020-08-06 16:43:36 -07:00
devx.h RDMA/mlx5: Cleanup DEVX initialization flow 2020-07-07 14:05:51 -03:00
doorbell.c IB: Allow calls to ib_umem_get from kernel ULPs 2020-01-16 16:14:28 +02:00
fs.c RDMA/mlx5: Enable sniffer when device is in switchdev mode 2020-08-18 15:03:32 -03:00
fs.h RDMA/mlx5: Separate flow steering logic from main.c 2020-07-07 14:05:51 -03:00
gsi.c RDMA/mlx5: Delete not needed GSI QP signal QP type 2020-09-29 13:09:49 -03:00
ib_rep.c IB/mlx5: Rename profile and init methods 2019-11-11 12:15:29 -08:00
ib_rep.h RDMA/mlx5: Assign profile before calling stages 2020-05-06 17:52:01 -03:00
ib_virt.c net/mlx5: Update vport.c to new cmd interface 2020-04-23 21:42:02 +03:00
mad.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
main.c RDMA/mlx5: Fix devlink deadlock on net namespace deletion 2020-10-26 19:18:19 -03:00
mem.c RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks() 2020-09-11 10:24:53 -03:00
mlx5_ib.h RDMA/mlx5: Sync device with CPU pages upon ODP MR registration 2020-10-01 16:44:44 -03:00
mr.c RDMA/mlx5: Sync device with CPU pages upon ODP MR registration 2020-10-01 16:44:44 -03:00
odp.c RDMA/mlx5: Sync device with CPU pages upon ODP MR registration 2020-10-01 16:44:44 -03:00
qos.c RDMA/core: Allow the ioctl layer to abort a fully created uobject 2020-05-21 20:10:46 -03:00
qp.c RDMA/drivers: Remove udata check from special QP 2020-09-29 13:11:06 -03:00
qp.h RDMA: Restore ability to return error for destroy WQ 2020-09-09 14:14:29 -03:00
qpc.c RDMA: Restore ability to return error for destroy WQ 2020-09-09 14:14:29 -03:00
restrack.c RDMA/mlx5: Separate restrack callbacks initialization from main.c 2020-07-07 14:05:51 -03:00
restrack.h RDMA/mlx5: Separate restrack callbacks initialization from main.c 2020-07-07 14:05:51 -03:00
srq.c RDMA: Restore ability to fail on SRQ destroy 2020-09-09 14:14:24 -03:00
srq.h RDMA: Restore ability to fail on SRQ destroy 2020-09-09 14:14:24 -03:00
srq_cmd.c RDMA: Restore ability to fail on SRQ destroy 2020-09-09 14:14:24 -03:00
std_types.c RDMA/mlx5: Introduce UAPI to query PD attributes 2020-07-06 19:50:34 -03:00
wr.c RDMA/mlx5: Clarify what the UMR is for when creating MRs 2020-09-18 13:02:43 -03:00
wr.h RDMA/mlx5: Move all WR logic from qp.c to separate file 2020-05-06 17:42:45 -03:00