Commit Graph

136 Commits

Author SHA1 Message Date
Linus Torvalds e4f1b8202f virtio: features
a small pull request this time around, mostly because the
 vduse network got postponed to next relase so we can be sure
 we got the security store right.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmT1BMAPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpYJUH+QHNhfn0JC/yE1IySwDwpmdgr73aaGik1LgV
 ObHi48ucRMtxB+QpXLjPWAlQhVVzZv1wBK+Up9QxW8e9USJrSeI/MWfoHtXOFnGe
 1JdmNr+XQM/uDngZ+mjI4ZUwRkA61iOcTR7gEDdfBUOr+Yl6R7Na/+kKtTDiDMfy
 O8bOCLYVyJNiny2eSMmXH0mb4oPplkne4PzW4i/+ssKNoHlBmUIcx0jqj/qUVpSR
 ozr0SpyhlXKSEQGAtNxwR4PONeMDOOdkRBhxHW5N5QgnP9P7HQ57Ar39Vz7+Kc0i
 6vO2g1gpYV1naQr9BCg8hIF9r68rjgi4IOSghmfpWWUL0yNURtU=
 =z/Df
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "A small pull request this time around, mostly because the vduse
  network got postponed to next relase so we can be sure we got the
  security store right"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_ring: fix avail_wrap_counter in virtqueue_add_packed
  virtio_vdpa: build affinity masks conditionally
  virtio_net: merge dma operations when filling mergeable buffers
  virtio_ring: introduce dma sync api for virtqueue
  virtio_ring: introduce dma map api for virtqueue
  virtio_ring: introduce virtqueue_reset()
  virtio_ring: separate the logic of reset/enable from virtqueue_resize
  virtio_ring: correct the expression of the description of virtqueue_resize()
  virtio_ring: skip unmap for premapped
  virtio_ring: introduce virtqueue_dma_dev()
  virtio_ring: support add premapped buf
  virtio_ring: introduce virtqueue_set_dma_premapped()
  virtio_ring: put mapping error check in vring_map_one_sg
  virtio_ring: check use_dma_api before unmap desc for indirect
  vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
  vdpa: add get_backend_features vdpa operation
  vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature
  vdpa: add VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK flag
  vdpa/mlx5: Remove unused function declarations
2023-09-04 10:43:44 -07:00
Yue Haibing c1081002bf vdpa/mlx5: Remove unused function declarations
Commit 29064bfdab ("vdpa/mlx5: Add support library for mlx5 VDPA implementation")
declared but never implemented these.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Message-Id: <20230803143041.23388-1-yuehaibing@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:22 -04:00
Jakub Kicinski 7ff57803d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/sfc/tc.c
  fa165e1949 ("sfc: don't unregister flow_indr if it was never registered")
  3bf969e88a ("sfc: add MAE table machinery for conntrack table")
https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 12:44:56 -07:00
Dragos Tatulea 810b0cc1c2 vdpa/mlx5: Fix crash on shutdown for when no ndev exists
The ndev was accessed on shutdown without a check if it actually exists.
This triggered the crash pasted below.

Instead of doing the ndev check, delete the shutdown handler altogether.
The irqs will be released at the parent VF level (mlx5_core).

 BUG: kernel NULL pointer dereference, address: 0000000000000300
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-rc2_for_upstream_min_debug_2023_07_17_15_05 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
 RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286
 RAX: ffff888103befba0 RBX: ffff888109d28008 RCX: 0000000000000017
 RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffff888109d28000
 RBP: 0000000000000000 R08: 0000000d3a3a3882 R09: 0000000000000001
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff888109d28000
 R13: ffff888109d28080 R14: 00000000fee1dead R15: 0000000000000000
 FS:  00007f4969e0be40(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000300 CR3: 00000001051cd006 CR4: 0000000000370eb0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  ? __die+0x20/0x60
  ? page_fault_oops+0x14c/0x3c0
  ? exc_page_fault+0x75/0x140
  ? asm_exc_page_fault+0x22/0x30
  ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
  device_shutdown+0x13e/0x1e0
  kernel_restart+0x36/0x90
  __do_sys_reboot+0x141/0x210
  ? vfs_writev+0xcd/0x140
  ? handle_mm_fault+0x161/0x260
  ? do_writev+0x6b/0x110
  do_syscall_64+0x3d/0x90
  entry_SYSCALL_64_after_hwframe+0x46/0xb0
 RIP: 0033:0x7f496990fb56
 RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9
 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f496990fb56
 RDX: 0000000001234567 RSI: 0000000028121969 RDI: fffffffffee1dead
 RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
 R13: 00007fffc7bddf10 R14: 0000000000000000 R15: 00007fffc7bde2b8
  </TASK>
 CR2: 0000000000000300
 ---[ end trace 0000000000000000 ]---

Fixes: bc9a2b3e68 ("vdpa/mlx5: Support interrupt bypassing")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20230803152648.199297-1-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10 15:24:29 -04:00
Eugenio Pérez ad03a0f44c vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessary
mlx5_vdpa_destroy_mr can be called from .set_map with data ASID after
the control virtqueue ASID iotlb has been populated. The control vq
iotlb must not be cleared, since it will not be populated again.

So call the ASID aware destroy function which makes sure that the
right vq resource is destroyed.

Fixes: 8fcd20c307 ("vdpa/mlx5: Support different address spaces for control and data")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Message-Id: <20230802171231.11001-5-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10 15:24:29 -04:00
Dragos Tatulea 9ee811009a vdpa/mlx5: Fix mr->initialized semantics
The mr->initialized flag is shared between the control vq and data vq
part of the mr init/uninit. But if the control vq and data vq get placed
in different ASIDs, it can happen that initializing the control vq will
prevent the data vq mr from being initialized.

This patch consolidates the control and data vq init parts into their
own init functions. The mr->initialized will now be used for the data vq
only. The control vq currently doesn't need a flag.

The uninitializing part is also taken care of: mlx5_vdpa_destroy_mr got
split into data and control vq functions which are now also ASID aware.

Fixes: 8fcd20c307 ("vdpa/mlx5: Support different address spaces for control and data")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Message-Id: <20230802171231.11001-3-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10 15:24:28 -04:00
Dragos Tatulea 3fe0241933 vdpa/mlx5: Correct default number of queues when MQ is on
The standard specifies that the initial number of queues is the
default, which is 1 (1 tx, 1 rx).

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230727172354.68243-2-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
2023-08-10 15:24:28 -04:00
Maher Sanalla f14c1a14e6 net/mlx5: Allocate completion EQs dynamically
This commit enables the dynamic allocation of EQs at runtime, allowing
for more flexibility in managing completion EQs and reducing the memory
overhead of driver load. Whenever a CQ is created for a given vector
index, the driver will lookup to see if there is an already mapped
completion EQ for that vector, if so, utilize it. Otherwise, allocate a
new EQ on demand and then utilize it for the CQ completion events.

Add a protection lock to the EQ table to protect from concurrent EQ
creation attempts.

While at it, replace mlx5_vector2irqn()/mlx5_vector2eqn() with
mlx5_comp_eqn_get() and mlx5_comp_irqn_get() which will allocate an
EQ on demand if no EQ is found for the given vector.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07 10:53:52 -07:00
Eli Cohen bc9a2b3e68 vdpa/mlx5: Support interrupt bypassing
Add support for generation of interrupts from the device directly to the
VM to the VCPU thus avoiding the overhead on the host CPU.

When supported, the driver will attempt to allocate vectors for each
data virtqueue. If a vector for a virtqueue cannot be provided it will
use the QP mode where notifications go through the driver.

In addition, we add a shutdown callback to make sure allocated
interrupts are released in case of shutdown to allow clean shutdown.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Message-Id: <20230607190007.290505-1-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27 10:47:09 -04:00
Dragos Tatulea 73790bdfba vdpa/mlx5: Fix hang when cvq commands are triggered during device unregister
Currently the vdpa device is unregistered after the workqueue that
processes vq commands is disabled. However, the device unregister
process can still send commands to the cvq (a vlan delete for example)
which leads to a hang because the handing workqueue has been disabled
and the command never finishes:

 [ 2263.095764] rcu: INFO: rcu_sched self-detected stall on CPU
 [ 2263.096307] rcu:        9-....: (5250 ticks this GP) idle=dac4/1/0x4000000000000000 softirq=111009/111009 fqs=2544
 [ 2263.097154] rcu:        (t=5251 jiffies g=393549 q=347 ncpus=10)
 [ 2263.097648] CPU: 9 PID: 94300 Comm: kworker/u20:2 Not tainted 6.3.0-rc6_for_upstream_min_debug_2023_04_14_00_02 #1
 [ 2263.098535] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 [ 2263.099481] Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core]
 [ 2263.100143] RIP: 0010:virtnet_send_command+0x109/0x170
 [ 2263.100621] Code: 1d df f5 ff 85 c0 78 5c 48 8b 7b 08 e8 d0 c5 f5 ff 84 c0 75 11 eb 22 48 8b 7b 08 e8 01 b7 f5 ff 84 c0 75 15 f3 90 48 8b 7b 08 <48> 8d 74 24 04 e8 8d c5 f5 ff 48 85 c0 74 de 48 8b 83 f8 00 00 00
 [ 2263.102148] RSP: 0018:ffff888139cf36e8 EFLAGS: 00000246
 [ 2263.102624] RAX: 0000000000000000 RBX: ffff888166bea940 RCX: 0000000000000001
 [ 2263.103244] RDX: 0000000000000000 RSI: ffff888139cf36ec RDI: ffff888146763800
 [ 2263.103864] RBP: ffff888139cf3710 R08: ffff88810d201000 R09: 0000000000000000
 [ 2263.104473] R10: 0000000000000002 R11: 0000000000000003 R12: 0000000000000002
 [ 2263.105082] R13: 0000000000000002 R14: ffff888114528400 R15: ffff888166bea000
 [ 2263.105689] FS:  0000000000000000(0000) GS:ffff88852cc80000(0000) knlGS:0000000000000000
 [ 2263.106404] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [ 2263.106925] CR2: 00007f31f394b000 CR3: 000000010615b006 CR4: 0000000000370ea0
 [ 2263.107542] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [ 2263.108163] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [ 2263.108769] Call Trace:
 [ 2263.109059]  <TASK>
 [ 2263.109320]  ? check_preempt_wakeup+0x11f/0x230
 [ 2263.109750]  virtnet_vlan_rx_kill_vid+0x5a/0xa0
 [ 2263.110180]  vlan_vid_del+0x9c/0x170
 [ 2263.110546]  vlan_device_event+0x351/0x760 [8021q]
 [ 2263.111004]  raw_notifier_call_chain+0x41/0x60
 [ 2263.111426]  dev_close_many+0xcb/0x120
 [ 2263.111808]  unregister_netdevice_many_notify+0x130/0x770
 [ 2263.112297]  ? wq_worker_running+0xa/0x30
 [ 2263.112688]  unregister_netdevice_queue+0x89/0xc0
 [ 2263.113128]  unregister_netdev+0x18/0x20
 [ 2263.113512]  virtnet_remove+0x4f/0x230
 [ 2263.113885]  virtio_dev_remove+0x31/0x70
 [ 2263.114273]  device_release_driver_internal+0x18f/0x1f0
 [ 2263.114746]  bus_remove_device+0xc6/0x130
 [ 2263.115146]  device_del+0x173/0x3c0
 [ 2263.115502]  ? kernfs_find_ns+0x35/0xd0
 [ 2263.115895]  device_unregister+0x1a/0x60
 [ 2263.116279]  unregister_virtio_device+0x11/0x20
 [ 2263.116706]  device_release_driver_internal+0x18f/0x1f0
 [ 2263.117182]  bus_remove_device+0xc6/0x130
 [ 2263.117576]  device_del+0x173/0x3c0
 [ 2263.117929]  ? vdpa_dev_remove+0x20/0x20 [vdpa]
 [ 2263.118364]  device_unregister+0x1a/0x60
 [ 2263.118752]  mlx5_vdpa_dev_del+0x4c/0x80 [mlx5_vdpa]
 [ 2263.119232]  vdpa_match_remove+0x21/0x30 [vdpa]
 [ 2263.119663]  bus_for_each_dev+0x71/0xc0
 [ 2263.120054]  vdpa_mgmtdev_unregister+0x57/0x70 [vdpa]
 [ 2263.120520]  mlx5v_remove+0x12/0x20 [mlx5_vdpa]
 [ 2263.120953]  auxiliary_bus_remove+0x18/0x30
 [ 2263.121356]  device_release_driver_internal+0x18f/0x1f0
 [ 2263.121830]  bus_remove_device+0xc6/0x130
 [ 2263.122223]  device_del+0x173/0x3c0
 [ 2263.122581]  ? devl_param_driverinit_value_get+0x29/0x90
 [ 2263.123070]  mlx5_rescan_drivers_locked+0xc4/0x2d0 [mlx5_core]
 [ 2263.123633]  mlx5_unregister_device+0x54/0x80 [mlx5_core]
 [ 2263.124169]  mlx5_uninit_one+0x54/0x150 [mlx5_core]
 [ 2263.124656]  mlx5_sf_dev_remove+0x45/0x90 [mlx5_core]
 [ 2263.125153]  auxiliary_bus_remove+0x18/0x30
 [ 2263.125560]  device_release_driver_internal+0x18f/0x1f0
 [ 2263.126052]  bus_remove_device+0xc6/0x130
 [ 2263.126451]  device_del+0x173/0x3c0
 [ 2263.126815]  mlx5_sf_dev_remove+0x39/0xf0 [mlx5_core]
 [ 2263.127318]  mlx5_sf_dev_state_change_handler+0x178/0x270 [mlx5_core]
 [ 2263.127920]  blocking_notifier_call_chain+0x5a/0x80
 [ 2263.128379]  mlx5_vhca_state_work_handler+0x151/0x200 [mlx5_core]
 [ 2263.128951]  process_one_work+0x1bb/0x3c0
 [ 2263.129355]  ? process_one_work+0x3c0/0x3c0
 [ 2263.129766]  worker_thread+0x4d/0x3c0
 [ 2263.130140]  ? process_one_work+0x3c0/0x3c0
 [ 2263.130548]  kthread+0xb9/0xe0
 [ 2263.130895]  ? kthread_complete_and_exit+0x20/0x20
 [ 2263.131349]  ret_from_fork+0x1f/0x30
 [ 2263.131717]  </TASK>

The fix is to disable and destroy the workqueue after the device
unregister. It is expected that vhost will not trigger kicks after
the unregister. But even if it would, the wq is disabled already by
setting the pointer to NULL (done so in the referenced commit).

Fixes: ad6dc1daaf ("vdpa/mlx5: Avoid processing works if workqueue was destroyed")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20230516095800.3549932-1-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-06-08 15:43:08 -04:00
Eli Cohen e9d67e59f1 vdpa/mlx5: Extend driver support for new features
Extend the possible list for features that can be supported by firmware.
Note that different versions of firmware may or may not support these
features. The driver is made aware of them by querying the firmware.

While doing this, improve the code so we use enum names instead of hard
coded numerical values.

The new features supported by the driver are the following:

VIRTIO_NET_F_MRG_RXBUF
VIRTIO_NET_F_HOST_ECN
VIRTIO_NET_F_GUEST_ECN
VIRTIO_NET_F_GUEST_TSO6
VIRTIO_NET_F_GUEST_TSO4

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230321112809.221432-3-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eugenio Pérez Martin <eperezma@redhat.com>
2023-04-21 03:02:31 -04:00
Eli Cohen 791a1cb7b8 vdpa/mlx5: Make VIRTIO_NET_F_MRG_RXBUF off by default
Following patch adds driver support for VIRTIO_NET_F_MRG_RXBUF.

Current firmware versions show degradation in packet rate when using
MRG_RXBUF. Users who favor memory saving over packet rate could enable
this feature but we want to keep it off by default.

One can still enable it when creating the vdpa device using vdpa tool by
providing features that include it.

For example:
$ vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 device_features 0x300cb982b

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230321112809.221432-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
2023-04-21 03:02:31 -04:00
Eli Cohen c384c2401e vdpa/mlx5: Avoid losing link state updates
Current code ignores link state updates if VIRTIO_NET_F_STATUS was not
negotiated. However, link state updates could be received before feature
negotiation was completed , therefore causing link state events to be
lost, possibly leaving the link state down.

Modify the code so link state notifier is registered after DRIVER_OK was
negotiated and carry the registration only if
VIRTIO_NET_F_STATUS was negotiated.  Unregister the notifier when the
device is reset.

Fixes: 033779a708 ("vdpa/mlx5: make MTU/STATUS presence conditional on feature bits")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230417110343.138319-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21 03:02:29 -04:00
Eli Cohen f0417e72ad vdpa/mlx5: Add and remove debugfs in setup/teardown driver
The right place to add the debugfs create is in
setup_driver() and remove it in teardown_driver().

Current code adds the debugfs when creating the device but resetting a
device will remove the debugfs subtree and subsequent set_driver will
not be able to create the files since the debugfs pointer is NULL.

Fixes: 2942210043 ("vdpa/mlx5: Add debugfs subtree")
Signed-off-by: Eli Cohen <elic@nvidia.com>

v3 -> v4:
Fix error flow in setup_driver()
Message-Id: <20230403114039.11102-1-elic@nvidia.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-04 11:08:30 -04:00
Si-Wei Liu 09e65ee905 vdpa/mlx5: should not activate virtq object when suspended
Otherwise the virtqueue object to instate could point to invalid address
that was unmapped from the MTT:

  mlx5_core 0000:41:04.2: mlx5_cmd_out_err:782:(pid 8321):
  CREATE_GENERAL_OBJECT(0xa00) op_mod(0xd) failed, status
  bad parameter(0x3), syndrome (0x5fa1c), err(-22)

Fixes: cae15c2ed8 ("vdpa/mlx5: Implement susupend virtqueue callback")
Cc: Eli Cohen <elic@nvidia.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>

Message-Id: <1676424640-11673-1-git-send-email-si-wei.liu@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-03-10 04:02:31 -05:00
Si-Wei Liu deeacf35c9 vdpa/mlx5: support device features provisioning
This patch implements features provisioning for mlx5_vdpa.

1) Validate the provisioned features are a subset of the parent
    features.
2) Clearing features that are not wanted by userspace.

For example:

    # vdpa mgmtdev show
    pci/0000:41:04.2:
      supported_classes net
      max_supported_vqs 65
      dev_features CSUM GUEST_CSUM MTU MAC HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ CTRL_VLAN MQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM

1) Provision vDPA device with all features derived from the parent

    # vdpa dev add name vdpa1 mgmtdev pci/0000:41:04.2
    # vdpa dev config show
    vdpa1: mac e4:11:c6:d3:45:f0 link up link_announce false max_vq_pairs 1 mtu 1500
      negotiated_features CSUM GUEST_CSUM MTU HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ CTRL_VLAN MQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM

2) Provision vDPA device with a subset of parent features

    # vdpa dev add name vdpa1 mgmtdev pci/0000:41:04.2 device_features 0x300020000
    # vdpa dev config show
    vdpa1:
      negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Message-Id: <1675725124-7375-7-git-send-email-si-wei.liu@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:27:00 -05:00
Si-Wei Liu 033779a708 vdpa/mlx5: make MTU/STATUS presence conditional on feature bits
The spec says:
    mtu only exists if VIRTIO_NET_F_MTU is set
    status only exists if VIRTIO_NET_F_STATUS is set

We should only present MTU and STATUS conditionally depending on
the feature bits.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Message-Id: <1675725124-7375-6-git-send-email-si-wei.liu@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:27:00 -05:00
Eli Cohen c04e2145b8 vdpa/mlx5: Initialize CVQ iotlb spinlock
Initialize itolb spinlock.

Fixes: 5262912ef3 ("vdpa/mlx5: Add support for control VQ and MAC setting")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230206122016.1149373-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:27:00 -05:00
Eli Cohen aef24311bd vdpa/mlx5: Don't clear mr struct on destroy MR
Clearing the mr struct erases the lock owner and causes warnings to be
emitted. It is not required to clear the mr so remove the memset call.

Fixes: 94abbccdf2 ("vdpa/mlx5: Add shared memory registration code")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230206121956.1149356-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:59 -05:00
Eli Cohen 446062e6ad vdpa/mlx5: Directly assign memory key
When creating a memory key, the key value should be assigned to the
passed pointer and not or'ed to.

No functional issue was observed due to this bug.

Fixes: 29064bfdab ("vdpa/mlx5: Add support library for mlx5 VDPA implementation")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230205072906.1108194-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:59 -05:00
Jason Wang 36871fb92b vdpa: mlx5: support per virtqueue dma device
This patch implements per virtqueue dma device for mlx5_vdpa. This is
needed for virtio_vdpa to work for CVQ which is backed by vringh but
not DMA. We simply advertise the vDPA device itself as the DMA device
for CVQ then DMA API can simply use PA so the identical mapping for
CVQ can still be used. Otherwise the identical (1:1) mapping won't
work when platform IOMMU is enabled since the IOVA is allocated on
demand which is not necessarily the PA.

This fixes the following crash when mlx5 vDPA device is bound to
virtio-vdpa with platform IOMMU enabled but not in passthrough mode:

BUG: unable to handle page fault for address: ff2fb3063deb1002
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1393001067 P4D 1393002067 PUD 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 55 PID: 8923 Comm: kworker/u112:3 Kdump: loaded Not tainted 6.1.0+ #7
Hardware name: Dell Inc. PowerEdge R750/0PJ80M, BIOS 1.5.4 12/17/2021
Workqueue: mlx5_vdpa_wq mlx5_cvq_kick_handler [mlx5_vdpa]
RIP: 0010:vringh_getdesc_iotlb+0x93/0x1d0 [vringh]
Code: 14 25 40 ef 01 00 83 82 c0 0a 00 00 01 48 2b 05 93 5a 1b ea 8b 4c 24 14 48 c1 f8 06 48 c1 e0 0c 48 03 05 90 5a 1b ea 48 01 c8 <0f> b7 00 83 aa c0 0a 00 00 01 65 ff 0d bc e4 41 3f 0f 84 05 01 00
RSP: 0018:ff46821ba664fdf8 EFLAGS: 00010282
RAX: ff2fb3063deb1002 RBX: 0000000000000a20 RCX: 0000000000000002
RDX: ff2fb318d2f94380 RSI: 0000000000000002 RDI: 0000000000000001
RBP: ff2fb3065e832410 R08: ff46821ba664fe00 R09: 0000000000000001
R10: 0000000000000000 R11: 000000000000000d R12: ff2fb3065e832488
R13: ff2fb3065e8324a8 R14: ff2fb3065e8324c8 R15: ff2fb3065e8324a8
FS:  0000000000000000(0000) GS:ff2fb3257fac0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ff2fb3063deb1002 CR3: 0000001392010006 CR4: 0000000000771ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
  mlx5_cvq_kick_handler+0x89/0x2b0 [mlx5_vdpa]
  process_one_work+0x1e2/0x3b0
  ? rescuer_thread+0x390/0x390
  worker_thread+0x50/0x3a0
  ? rescuer_thread+0x390/0x390
  kthread+0xd6/0x100
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>

Reviewed-by: Eli Cohen <elic@nvidia.com>
Tested-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230119061525.75068-6-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:58 -05:00
Eli Cohen 0a59975088 vdpa/mlx5: Add RX counters to debugfs
For each interface, either VLAN tagged or untagged, add two hardware
counters: one for unicast and another for multicast. The counters count
RX packets and bytes and can be read through debugfs:

$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/mcast/packets
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/ucast/bytes

This feature is controlled via the config option
MLX5_VDPA_STEERING_DEBUG. It is off by default as it may have some
impact on performance.

includes a fixup By Yang Yingliang <yangyingliang@huawei.com>:

vdpa/mlx5: fix check wrong pointer in mlx5_vdpa_add_mac_vlan_rules()

The local variable 'rule' is not used anymore, fix return value
check after calling mlx5_add_flow_rules().

Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-9-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Message-Id: <20230104074418.1737510-1-yangyingliang@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-20 19:26:54 -05:00
Eli Cohen 2942210043 vdpa/mlx5: Add debugfs subtree
Add debugfs subtree and expose flow table ID and TIR number. This
information can be used by external tools to do extended
troubleshooting.

The information can be retrieved like so:
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/table_id
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/tirn

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-8-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Eli Cohen 72c67e9b90 vdpa/mlx5: Move some definitions to a new header file
Move some definitions from mlx5_vnet.c to newly added header file
mlx5_vnet.h. We need these definitions for the following patches that
add debugfs tree to expose information vital for debug.

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-7-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Colin Ian King a9f0a19ff7 RDMA/mlx5: remove variable i
Variable i is just being incremented and it's never used
anywhere else. The variable and the increment are redundant so
remove it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Message-Id: <20221024133756.2158497-1-colin.i.king@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-12-28 05:28:09 -05:00
Eli Cohen 38fc462f57 vdpa/mlx5: Avoid overwriting CVQ iotlb
When qemu uses different address spaces for data and control virtqueues,
the current code would overwrite the control virtqueue iotlb through the
dup_iotlb call. Fix this by referring to the address space identifier
and the group to asid mapping to determine which mapping needs to be
updated. We also move the address space logic from mlx5 net to core
directory.

Reported-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-6-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
2022-12-28 05:28:09 -05:00
Eli Cohen 0dbc1b4ae0 vdpa/mlx5: Avoid using reslock in event_handler
event_handler runs under atomic context and may not acquire reslock. We
can still guarantee that the handler won't be called after suspend by
clearing nb_registered, unregistering the handler and flushing the
workqueue.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-5-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-12-28 05:28:09 -05:00
Eli Cohen 1ab53760d3 vdpa/mlx5: Fix wrong mac address deletion
Delete the old MAC from the table and not the new one which is not there
yet.

Fixes: baf2ad3f6a ("vdpa/mlx5: Add RX MAC VLAN filter support")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-4-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-12-28 05:28:09 -05:00
Eli Cohen 5aec804936 vdpa/mlx5: Return error on vlan ctrl commands if not supported
Check if VIRTIO_NET_F_CTRL_VLAN is negotiated and return error if
control VQ command is received.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-3-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
2022-12-28 05:28:09 -05:00
Eli Cohen a6ce72c0fb vdpa/mlx5: Fix rule forwarding VLAN to TIR
Set the VLAN id to the header values field instead of overwriting the
headers criteria field.

Before this fix, VLAN filtering would not really work and tagged packets
would be forwarded unfiltered to the TIR.

Fixes: baf2ad3f6a ("vdpa/mlx5: Add RX MAC VLAN filter support")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-12-28 05:28:09 -05:00
Eli Cohen a43ae8057c vdpa/mlx5: Fix MQ to support non power of two num queues
RQT objects require that a power of two value be configured for both
rqt_max_size and rqt_actual size.

For create_rqt, make sure to round up to the power of two the value of
given by the user who created the vdpa device and given by
ndev->rqt_size. The actual size is also rounded up to the power of two
using the current number of VQs given by ndev->cur_num_vqs.

Same goes with modify_rqt where we need to make sure act size is power
of two based on the new number of QPs.

Without this patch, attempt to create a device with non power of two QPs
would result in error from firmware.

Fixes: 52893733f2 ("vdpa/mlx5: Add multiqueue support")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220912125019.833708-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-09-27 18:32:45 -04:00
Eli Cohen 93e530d2a1 vdpa/mlx5: Fix possible uninitialized return value
Initialize err local variable to return -EAGAIN if the asid cannot be
found thus avoiding returning uninitialized value.

Fixes: 8fcd20c307 ("vdpa/mlx5: Support different address spaces for control and data")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220811134010.952291-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 10:00:36 -04:00
Eli Cohen 8fcd20c307 vdpa/mlx5: Support different address spaces for control and data
Partition virtqueues to two different address spaces: one for control
virtqueue which is implemented in software, and one for data virtqueues.

Based-on: <20220526124338.36247-1-eperezma@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220714113927.85729-3-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:26:08 -04:00
Eli Cohen cae15c2ed8 vdpa/mlx5: Implement susupend virtqueue callback
Implement the suspend callback allowing to suspend the virtqueues so
they stop processing descriptors. This is required to allow to query a
consistent state of the virtqueue while live migration is taking place.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220714113927.85729-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:26:08 -04:00
Xu Qiang 71aa95a69c vdpa/mlx5: Use eth_broadcast_addr() to assign broadcast address
Using eth_broadcast_addr() to assign broadcast address instead
of memset().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xu Qiang <xuqiang36@huawei.com>
Message-Id: <20220704021405.64545-1-xuqiang36@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11 04:26:07 -04:00
Eli Cohen ace9252446 vdpa/mlx5: Initialize CVQ vringh only once
Currently, CVQ vringh is initialized inside setup_virtqueues() which is
called every time a memory update is done. This is undesirable since it
resets all the context of the vring, including the available and used
indices.

Move the initialization to mlx5_vdpa_set_status() when
VIRTIO_CONFIG_S_DRIVER_OK is set.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220613075958.511064-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
2022-06-24 02:49:47 -04:00
Eli Cohen 40f2f3e941 vdpa/mlx5: Update Control VQ callback information
The control VQ specific information is stored in the dedicated struct
mlx5_control_vq. When the callback is updated through
mlx5_vdpa_set_vq_cb(), make sure to update the control VQ struct.

Fixes: 5262912ef3 ("vdpa/mlx5: Add support for control VQ and MAC setting")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220613075958.511064-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com)
2022-06-24 02:49:47 -04:00
Dan Carpenter f38b3c6a78 vdpa/mlx5: clean up indenting in handle_ctrl_vlan()
These lines were supposed to be indented.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Message-Id: <Yp71IYMP+QfuCJ8t@kili>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
Acked-by: Si-Wei Liu <si-wei.liu@oracle.com>
2022-06-08 08:56:03 -04:00
Dan Carpenter f766c409fc vdpa/mlx5: fix error code for deleting vlan
Return success if we were able to delete a vlan.  The current code
always returns failure.

Fixes: baf2ad3f6a ("vdpa/mlx5: Add RX MAC VLAN filter support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Message-Id: <Yp709f1g9NcMBCHg@kili>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
Acked-by: Si-Wei Liu <si-wei.liu@oracle.com>
2022-06-08 08:56:03 -04:00
Xiang wangx 2f72b2262d vdpa/mlx5: Fix syntax errors in comments
Delete the redundant word 'is'.

Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Message-Id: <20220604143858.16073-1-wangxiang@cdjrlc.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-06-08 08:56:03 -04:00
Eli Cohen baf2ad3f6a vdpa/mlx5: Add RX MAC VLAN filter support
Support HW offloaded filtering of MAC/VLAN packets.
To allow that, we add a handler to handle VLAN configurations coming
through the control VQ. Two operations are supported.

1. Adding VLAN - in this case, an entry will be added to the RX flow
   table that will allow the combination of the MAC/VLAN to be
   forwarded to the TIR.
2. Removing VLAN - will remove the entry from the flow table,
   effectively blocking such packets from going through.

Currently the control VQ does not propagate changes to the MAC of the
VLAN device so we always use the MAC of the parent device.

Examples:
1. Create vlan device:
$ ip link add link ens1 name ens1.8 type vlan id 8

Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220411122942.225717-4-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-06-01 02:16:38 -04:00
Eli Cohen 7becdd13b6 vdpa/mlx5: Remove flow counter from steering
The flow counter has been introduced in early versions of the driver to
aid in debugging. It is no longer needed and can harm performance.

Remove it.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220411122942.225717-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-06-01 02:16:38 -04:00
Gautam Dawar db9adcbf42 vdpa: multiple address spaces support
This patches introduces the multiple address spaces support for vDPA
device. This idea is to identify a specific address space via an
dedicated identifier - ASID.

During vDPA device allocation, vDPA device driver needs to report the
number of address spaces supported by the device then the DMA mapping
ops of the vDPA device needs to be extended to support ASID.

This helps to isolate the environments for the virtqueue that will not
be assigned directly. E.g in the case of virtio-net, the control
virtqueue will not be assigned directly to guest.

As a start, simply claim 1 virtqueue groups and 1 address spaces for
all vDPA devices. And vhost-vDPA will simply reject the device with
more than 1 virtqueue groups or address spaces.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Gautam Dawar <gdawar@xilinx.com>
Message-Id: <20220330180436.24644-7-gdawar@xilinx.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31 12:44:27 -04:00
Gautam Dawar d4821902e4 vdpa: introduce virtqueue groups
This patch introduces virtqueue groups to vDPA device. The virtqueue
group is the minimal set of virtqueues that must share an address
space. And the address space identifier could only be attached to
a specific virtqueue group.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Gautam Dawar <gdawar@xilinx.com>
Message-Id: <20220330180436.24644-6-gdawar@xilinx.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31 12:44:27 -04:00
Eli Cohen 759ae7f9bf vdpa/mlx5: Use readers/writers semaphore instead of mutex
Reading statistics could be done intensively and by several processes
concurrently. Reader's lock is sufficient in this case.

Change reslock from mutex to a rwsem.

Suggested-by: Si-Wei Liu <si-wei.liu@oracle.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220518133804.1075129-7-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31 12:44:23 -04:00
Eli Cohen 1892a3d425 vdpa/mlx5: Add support for reading descriptor statistics
Implement the get_vq_stats calback of vdpa_config_ops to return the
statistics for a virtqueue.

The statistics are provided as vendor specific statistics where the
driver provides a pair of attribute name and attribute value.

Currently supported are received descriptors and completed descriptors.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220518133804.1075129-6-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31 12:44:22 -04:00
Eli Cohen acde392949 vdpa/mlx5: Use consistent RQT size
The current code evaluates RQT size based on the configured number of
virtqueues. This can raise an issue in the following scenario:

Assume MQ was negotiated.
1. mlx5_vdpa_set_map() gets called.
2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower
   than the configured max VQs.
3. A second set_map gets called, but now a smaller number of VQs is used
   to evaluate the size of the RQT.
4. handle_ctrl_mq() is called with a value larger than what the RQT can
   hold. This will emit errors and the driver state is compromised.

To fix this, we use a new field in struct mlx5_vdpa_net to hold the
required number of entries in the RQT. This value is evaluated in
mlx5_vdpa_set_driver_features() where we have the negotiated features
all set up.

In addition to that, we take into consideration the max capability of RQT
entries early when the device is added so we don't need to take consider
it when creating the RQT.

Last, we remove the use of mlx5_vdpa_max_qps() which just returns the
max_vas / 2 and make the code clearer.

Fixes: 52893733f2 ("vdpa/mlx5: Add multiqueue support")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-18 12:31:31 -04:00
Jason Wang 1c80cf031e vdpa: mlx5: synchronize driver status with CVQ
Currently, CVQ doesn't have any synchronization with the driver
status. Then CVQ emulation code run in the middle of:

1) device reset
2) device status changed
3) map updating

The will lead several unexpected issue like trying to execute CVQ
command after the driver has been teared down.

Fixing this by using reslock to synchronize CVQ emulation code with
the driver status changing:

- protect the whole device reset, status changing and set_map()
  updating with reslock
- protect the CVQ handler with the reslock and check
  VIRTIO_CONFIG_S_DRIVER_OK in the CVQ handler

This will guarantee that:

1) CVQ handler won't work if VIRTIO_CONFIG_S_DRIVER_OK is not set
2) CVQ handler will see a consistent state of the driver instead of
   the partial one when it is running in the middle of the
   teardown_driver() or setup_driver().

Cc: 5262912ef3 ("vdpa/mlx5: Add support for control VQ and MAC setting")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220329042109.4029-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
2022-03-30 04:18:14 -04:00
Jason Wang 55ebf0d60e vdpa: mlx5: prevent cvq work from hogging CPU
A userspace triggerable infinite loop could happen in
mlx5_cvq_kick_handler() if userspace keeps sending a huge amount of
cvq requests.

Fixing this by introducing a quota and re-queue the work if we're out
of the budget (currently the implicit budget is one) . While at it,
using a per device work struct to avoid on demand memory allocation
for cvq.

Fixes: 5262912ef3 ("vdpa/mlx5: Add support for control VQ and MAC setting")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220329042109.4029-1-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
2022-03-30 04:18:14 -04:00
Eli Cohen ad6dc1daaf vdpa/mlx5: Avoid processing works if workqueue was destroyed
If mlx5_vdpa gets unloaded while a VM is running, the workqueue will be
destroyed. However, vhost might still have reference to the kick
function and might attempt to push new works. This could lead to null
pointer dereference.

To fix this, set mvdev->wq to NULL just before destroying and verify
that the workqueue is not NULL in mlx5_vdpa_kick_vq before attempting to
push a new work.

Fixes: 5262912ef3 ("vdpa/mlx5: Add support for control VQ and MAC setting")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220321141303.9586-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28 16:54:30 -04:00