Commit Graph

791 Commits

Author SHA1 Message Date
Eli Cohen 680ab9d69a vdpa: Protect vdpa reset with cf_mutex
Call reset using the wrapper function vdpa_reset() to make sure the
operation is serialized with cf_mutex.

This comes to protect from the following possible scenario:

vhost_vdpa_set_status() could call the reset op. Since the call is not
protected by cf_mutex, a netlink thread calling vdpa_dev_config_fill
could get passed the VIRTIO_CONFIG_S_FEATURES_OK check in
vdpa_dev_config_fill() and end up reporting wrong features.

Fixes: 5f6e85953d8f ("vdpa: Read device configuration only if FEATURES_OK")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220111183400.38418-3-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu<si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14 18:50:54 -05:00
Eli Cohen f6d955d808 vdpa: Avoid taking cf_mutex lock on get status
Avoid the wrapper holding cf_mutex since it is not protecting anything.
To avoid confusion and unnecessary overhead incurred by it, remove.

Fixes: f489f27bc0ab ("vdpa: Sync calls set/get config/status with cf_mutex")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220111183400.38418-2-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu<si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14 18:50:54 -05:00
Eli Cohen aba21aff77 vdpa: Allow to configure max data virtqueues
Add netlink support to configure the max virtqueue pairs for a device.
At least one pair is required. The maximum is dictated by the device.

Example:
$ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 4

Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220105114646.577224-6-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14 18:50:53 -05:00
Eli Cohen 73bc0dbb59 vdpa: Sync calls set/get config/status with cf_mutex
Add wrappers to get/set status and protect these operations with
cf_mutex to serialize these operations with respect to get/set config
operations.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220105114646.577224-4-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14 18:50:53 -05:00
Eli Cohen a64917bc2e vdpa: Provide interface to read driver features
Provide an interface to read the negotiated features. This is needed
when building the netlink message in vdpa_dev_net_config_fill().

Also fix the implementation of vdpa_dev_net_config_fill() to use the
negotiated features instead of the device features.

To make APIs clearer, make the following name changes to struct
vdpa_config_ops so they better describe their operations:

get_features -> get_device_features
set_features -> set_driver_features

Finally, add get_driver_features to return the negotiated features and
add implementation to all the upstream drivers.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220105114646.577224-2-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14 18:50:53 -05:00
Laura Abbott 870aaff92e vdpa: clean up get_config_size ret value handling
The return type of get_config_size is size_t so it makes
sense to change the type of the variable holding its result.

That said, this already got taken care of (differently, and arguably
not as well) by commit 3ed21c1451 ("vdpa: check that offsets are
within bounds").

The added 'c->off > size' test in that commit will be done as an
unsigned comparison on 32-bit (safe due to not being signed).

On a 64-bit platform, it will be done as a signed comparison, but in
that case the comparison will be done in 64-bit, and 'c->off' being an
u32 it will be valid thanks to the extended range (ie both values will
be positive in 64 bits).

So this was a real bug, but it was already addressed and marked for stable.

Signed-off-by: Laura Abbott <labbott@kernel.org>
Reported-by: Luo Likang <luolikang@nsfocus.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14 18:50:53 -05:00
Xianting Tian 0800639207 vhost/test: fix memory leak of vhost virtqueues
We need free the vqs in .release(), which are allocated in .open().

Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20211228030924.3468439-1-xianting.tian@linux.alibaba.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14 18:50:53 -05:00
Eugenio Pérez 23118b09e6 vdpa: Avoid duplicate call to vp_vdpa get_status
It has no sense to call get_status twice, since we already have a
variable for that.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Link: https://lore.kernel.org/r/20211104195833.2089796-1-eperezma@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-01-14 18:50:52 -05:00
Dan Carpenter 3ed21c1451 vdpa: check that offsets are within bounds
In this function "c->off" is a u32 and "size" is a long.  On 64bit systems
if "c->off" is greater than "size" then "size - c->off" is a negative and
we always return -E2BIG.  But on 32bit systems the subtraction is type
promoted to a high positive u32 value and basically any "c->len" is
accepted.

Fixes: 4c8cf31885 ("vhost: introduce vDPA-based backend")
Reported-by: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20211208103337.GA4047@kili
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@vger.kernel.org
2021-12-08 14:53:15 -05:00
Wu Zongyong ea8f17e44f vhost-vdpa: clean irqs before reseting vdpa device
Vdpa devices should be reset after unseting irqs of virtqueues, or we
will get errors when killing qemu process:

>> pi_update_irte: failed to update PI IRTE
>> irq bypass consumer (token 0000000065102a43) unregistration fails: -22

Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
Link: https://lore.kernel.org/r/a2cb60cf73be9da5c4e6399242117d8818f975ae.1636946171.git.wuzongyong@linux.alibaba.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-11-24 19:00:28 -05:00
Stefano Garzarella 11708ff92c vhost/vsock: cleanup removing `len` variable
We can increment `total_len` directly and remove `len` since it
is no longer used for vhost_add_used().

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20211122163525.294024-3-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-11-24 19:00:28 -05:00
Stefano Garzarella 49d8c5ffad vhost/vsock: fix incorrect used length reported to the guest
The "used length" reported by calling vhost_add_used() must be the
number of bytes written by the device (using "in" buffers).

In vhost_vsock_handle_tx_kick() the device only reads the guest
buffers (they are all "out" buffers), without writing anything,
so we must pass 0 as "used length" to comply virtio spec.

Fixes: 433fc58e6b ("VSOCK: Introduce vhost_vsock.ko")
Cc: stable@vger.kernel.org
Reported-by: Halil Pasic <pasic@linux.ibm.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20211122163525.294024-2-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
2021-11-24 19:00:28 -05:00
Parav Pandit 6dbb1f1687 vdpa: Introduce and use vdpa device get, set config helpers
Subsequent patches enable get and set configuration either
via management device or via vdpa device' config ops.

This requires synchronization between multiple callers to get and set
config callbacks. Features setting also influence the layout of the
configuration fields endianness.

To avoid exposing synchronization primitives to callers, introduce
helper for setting the configuration and use it.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20211026175519.87795-2-parav@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01 05:26:49 -04:00
Linus Torvalds 3bb50f8530 virtio,vdpa: fixes
Fixes up some issues in rc5.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmFm1OcPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpp0AIAL5lvwd/Dzj0trDEOm93yTiKigkllJ9if8pF
 G3V6IgyyeNujmEJLD2Fz5GPUVXg5G2+Yh2xgf8OP4syS/pZoCGkGcLIoo7lJRSvz
 D3KpNYrZ2O1kFw2XWP3p7O/H8pxWAMPjykRaVoniCd+rIIpRzWdYBDDWConUGqC1
 IlB5BS6cv3vRIoJ4Ac3YVOEUM4WOw/2fwzxejVjxQdNjgbWL0JBY1IBiJrfp7iEo
 L3KRmN25JWCwE0x+Ehy6/uSalVzPgjESBYzrBFihnFXDS/LIIIXuRb9Q3HzEFFHT
 UhccuExc8Nq8JDKWrZkYP2s950Gnv249bC9tqlHHTnoE+pxG3Ms=
 =i68W
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
 "Fixes up some issues in rc5"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost-vdpa: Fix the wrong input in config_cb
  VDUSE: fix documentation underline warning
  Revert "virtio-blk: Add validation for block size in config space"
  vhost_vdpa: unset vq irq before freeing irq
  virtio: write back F_VERSION_1 before validate
2021-10-17 18:17:19 -10:00
Cindy Lu bcef9356fc vhost-vdpa: Fix the wrong input in config_cb
Fix the wrong input in for config_cb. In function vhost_vdpa_config_cb,
the input cb.private was used as struct vhost_vdpa, so the input was
wrong here, fix this issue

Fixes: 776f395004 ("vhost_vdpa: Support config interrupt in vdpa")
Signed-off-by: Cindy Lu <lulu@redhat.com>
Link: https://lore.kernel.org/r/20210929090933.20465-1-lulu@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-10-13 08:42:07 -04:00
Wu Zongyong 97f854be20 vhost_vdpa: unset vq irq before freeing irq
Currently we unset vq irq after freeing irq and that will result in
error messages:

  pi_update_irte: failed to update PI IRTE
  irq bypass consumer (token 000000005a07a12b) unregistration fails: -22

This patch solves this.

Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
Link: https://lore.kernel.org/r/02637d38dcf4e4b836c5b3a65055fe92bf812b3b.1631687872.git.wuzongyong@linux.alibaba.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-10-13 08:35:36 -04:00
Linus Torvalds d33bec7b3d virtio,vdpa: fixes
Fixes up some issues in rc1.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmFSRZgPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpSnYIAMNstllnJgyDR0GUGO850AKv0x2acncf66wc
 J5vjWFWh5rtmdZSMhvA5mo3J8h/7s6Mn67fCwKt0Ii6fi1f6eIl4OBDYBfV8wXkN
 +e8eQAUboi3HLqsiuFSNpJTHnD70xbU4inxiTjaBndXaxk20nkWJsd1Mvfmxh+mE
 uMRnumAwrdL3c0n0Vrcq8j+zxLhlDXSFjICd6l+xRwPigsX/5gY+V5tPQg4lhnk+
 VuC2Q0eJHmCEjrVi4Tx7dkoDu9U4Go5CnVF0MF9AzVf7JYBPJPks3r17unMe+ZAI
 Nvwa/39no2APa9wZPQnGk5V9rOPtYFa6XXsCufN4BbtlqVX5AlQ=
 =2oo6
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio/vdpa fixes from Michael Tsirkin:
 "Fixes up some issues in rc1"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa: potential uninitialized return in vhost_vdpa_va_map()
  vdpa/mlx5: Avoid executing set_vq_ready() if device is reset
  vdpa/mlx5: Clear ready indication for control VQ
  vduse: Cleanup the old kernel states after reset failure
  vduse: missing error code in vduse_init()
  virtio: don't fail on !of_device_is_compatible
2021-09-28 07:27:29 -07:00
Linus Torvalds fc0c0548c1 Networking fixes for 5.15-rc2, including fixes from bpf.
Current release - regressions:
 
  - vhost_net: fix OoB on sendmsg() failure
 
  - mlx5: bridge, fix uninitialized variable usage
 
  - bnxt_en: fix error recovery regression
 
 Current release - new code bugs:
 
  - bpf, mm: fix lockdep warning triggered by stack_map_get_build_id_offset()
 
 Previous releases - regressions:
 
  - r6040: restore MDIO clock frequency after MAC reset
 
  - tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()
 
  - dsa: flush switchdev workqueue before tearing down CPU/DSA ports
 
 Previous releases - always broken:
 
  - ptp: dp83640: don't define PAGE0, avoid compiler warning
 
  - igc: fix tunnel segmentation offloads
 
  - phylink: update SFP selected interface on advertising changes
 
  - stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume
 
  - mlx5e: fix mutual exclusion between CQE compression and HW TS
 
 Misc:
 
  - bpf, cgroups: fix cgroup v2 fallback on v1/v2 mixed mode
 
  - sfc: fallback for lack of xdp tx queues
 
  - hns3: add option to turn off page pool feature
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmFDnXIACgkQMUZtbf5S
 Irs+pA//UvT0UqL+G/9He9eVMaX/7mnvoShbU/hHaymLuvJE4dx6B3qw8im5fQ0i
 b7ovy1dxjMjSQI4E+2FleajRBgsdXvWgMCopFZr7h35+qPXziLxTTztwsaOIjD8U
 42HBpdwFrVtdrOhSsiKsGKe8KXqIehHhFAp4vQrRlsvsz1Yk/93o6wYaXaXqR6pQ
 qlTVgI7nAuLAw4VfFZ3k7iAiQsyrXrn5kUXq5/Wy6kK7iFnIN7X9j+epTJP/EXHn
 +17SW/iGnjUfwXsRSC2ysgICtOPatu48Cke5hq5XOGqXAtRVhhLfMDiZlr8a6+z4
 gS+QETakBXLrAfvYK6neLjlMhg1FX+Y952xd3XIU4jz+DgIh03r2Nd1xBsp17Yhu
 RRBc96w2KxuHQ6USEXbi5dY1K6o9DK52m/WRmnqXnaNX4ADeesgcrxXHqZ6O6I5R
 +ddvVWylaHxvngCQSL4DvZdYNJVoKjdlo6fHrwD1qu0Uv0rGz6O+AEI9o8DkK5Af
 ZhFBETvjTKzjCykd3UlaRo/ibYChIPYwpGTK+TTI9rUm8OWNw1+sC/Z14bx0/3Vl
 RB53rkJ93/qi6Wb9ljUxM8CPGViuPvMkQJjpJOaktrj0GOgQz3EBdDkIQMaplZx5
 r2AeHW7X5VGxCWMRg4NS/x7Do1jh8m/ch8/EulOAB5fG7KQPfYs=
 =xd6p
 -----END PGP SIGNATURE-----

Merge tag 'net-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf.

  Current release - regressions:

   - vhost_net: fix OoB on sendmsg() failure

   - mlx5: bridge, fix uninitialized variable usage

   - bnxt_en: fix error recovery regression

  Current release - new code bugs:

   - bpf, mm: fix lockdep warning triggered by stack_map_get_build_id_offset()

  Previous releases - regressions:

   - r6040: restore MDIO clock frequency after MAC reset

   - tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()

   - dsa: flush switchdev workqueue before tearing down CPU/DSA ports

  Previous releases - always broken:

   - ptp: dp83640: don't define PAGE0, avoid compiler warning

   - igc: fix tunnel segmentation offloads

   - phylink: update SFP selected interface on advertising changes

   - stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume

   - mlx5e: fix mutual exclusion between CQE compression and HW TS

  Misc:

   - bpf, cgroups: fix cgroup v2 fallback on v1/v2 mixed mode

   - sfc: fallback for lack of xdp tx queues

   - hns3: add option to turn off page pool feature"

* tag 'net-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits)
  mlxbf_gige: clear valid_polarity upon open
  igc: fix tunnel offloading
  net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert
  net: wan: wanxl: define CROSS_COMPILE_M68K
  selftests: nci: replace unsigned int with int
  net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports
  Revert "net: phy: Uniform PHY driver access"
  net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup
  ptp: dp83640: don't define PAGE0
  bnx2x: Fix enabling network interfaces without VFs
  Revert "Revert "ipv4: fix memory leaks in ip_cmsg_send() callers""
  tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()
  net-caif: avoid user-triggerable WARN_ON(1)
  bpf, selftests: Add test case for mixed cgroup v1/v2
  bpf, selftests: Add cgroup v1 net_cls classid helpers
  bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode
  bpf: Add oversize check before call kvcalloc()
  net: hns3: fix the timing issue of VF clearing interrupt sources
  net: hns3: fix the exception when query imp info
  net: hns3: disable mac in flr process
  ...
2021-09-16 13:05:42 -07:00
Dan Carpenter be9c6bad9b vdpa: potential uninitialized return in vhost_vdpa_va_map()
The concern here is that "ret" can be uninitialized if we hit the
"goto next" condition on every iteration through the loop.

Fixes: 41ba1b5f9d4b ("vdpa: Support transferring virtual addressing during DMA mapping")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20210907073253.GB18254@kili
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2021-09-14 18:10:43 -04:00
Linus Torvalds 78e709522d virtio,vdpa,vhost: features, fixes
vduse driver supporting blk
 virtio-vsock support for end of record with SEQPACKET
 vdpa: mac and mq support for ifcvf and mlx5
 vdpa: management netlink for ifcvf
 virtio-i2c, gpio dt bindings
 
 misc fixes, cleanups
 
 NB: when merging this with
 b542e383d8 ("eventfd: Make signal recursion protection a task bit")
 from Linus' tree, replace eventfd_signal_count with
 eventfd_signal_allowed, and drop the export of eventfd_wake_count from
 ("eventfd: Export eventfd_wake_count to modules").
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmE1+awPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpt6EIAJy0qrc62lktNA0IiIVJSLbUbTMmFj8MzkGR
 8UxZdhpjWqBPJPyaOuNeksAqTGm/UAPEYx3C2c95Jhej7anFpy7dbCtIXcPHLJME
 DjcJg+EDrlNCj8m0FcsHpHWsFzPMERJpyEZNxgB5WazQbv+yWhGrg2FN5DCnF0Ro
 ZFYeKSVty148pQ0nHl8X0JM2XMtqit+O+LvKN2HQZ+fubh7BCzMxzkHY0QLHIzUS
 UeZqd3Qm8YcbqnlX38P5D6k+NPiTEgknmxaBLkPxg6H3XxDAmaIRFb8Ldd1rsgy1
 zTLGDiSGpVDIpawRnuEAzqJThV3Y5/MVJ1WD+mDYQ96tmhfp+KY=
 =DBH/
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - vduse driver ("vDPA Device in Userspace") supporting emulated virtio
   block devices

 - virtio-vsock support for end of record with SEQPACKET

 - vdpa: mac and mq support for ifcvf and mlx5

 - vdpa: management netlink for ifcvf

 - virtio-i2c, gpio dt bindings

 - misc fixes and cleanups

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (39 commits)
  Documentation: Add documentation for VDUSE
  vduse: Introduce VDUSE - vDPA Device in Userspace
  vduse: Implement an MMU-based software IOTLB
  vdpa: Support transferring virtual addressing during DMA mapping
  vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()
  vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()
  vhost-iotlb: Add an opaque pointer for vhost IOTLB
  vhost-vdpa: Handle the failure of vdpa_reset()
  vdpa: Add reset callback in vdpa_config_ops
  vdpa: Fix some coding style issues
  file: Export receive_fd() to modules
  eventfd: Export eventfd_wake_count to modules
  iova: Export alloc_iova_fast() and free_iova_fast()
  virtio-blk: remove unneeded "likely" statements
  virtio-balloon: Use virtio_find_vqs() helper
  vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro
  vsock_test: update message bounds test for MSG_EOR
  af_vsock: rename variables in receive loop
  virtio/vsock: support MSG_EOR bit processing
  vhost/vsock: support MSG_EOR bit processing
  ...
2021-09-11 14:48:42 -07:00
Paolo Abeni 3c4cea8fa7 vhost_net: fix OoB on sendmsg() failure.
If the sendmsg() call in vhost_tx_batch() fails, both the 'batched_xdp'
and 'done_idx' indexes are left unchanged. If such failure happens
when batched_xdp == VHOST_NET_BATCH, the next call to
vhost_net_build_xdp() will access and write memory outside the xdp
buffers area.

Since sendmsg() can only error with EBADFD, this change addresses the
issue explicitly freeing the XDP buffers batch on error.

Fixes: 0a0be13b8f ("vhost_net: batch submitting XDP buffers to underlayer sockets")
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-09 10:52:12 +01:00
Xie Yongji d8945ec411 vdpa: Support transferring virtual addressing during DMA mapping
This patch introduces an attribute for vDPA device to indicate
whether virtual address can be used. If vDPA device driver set
it, vhost-vdpa bus driver will not pin user page and transfer
userspace virtual address instead of physical address during
DMA mapping. And corresponding vma->vm_file and offset will be
also passed as an opaque pointer.

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-11-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06 07:20:57 -04:00
Xie Yongji 22af48cf91 vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()
The upcoming patch is going to support VA mapping/unmapping.
So let's factor out the logic of PA mapping/unmapping firstly
to make the code more readable.

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-10-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06 07:20:57 -04:00
Xie Yongji c10fb9454a vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()
Add an opaque pointer for DMA mapping.

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-9-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06 07:20:57 -04:00
Xie Yongji 59dfe4f1e8 vhost-iotlb: Add an opaque pointer for vhost IOTLB
Add an opaque pointer for vhost IOTLB. And introduce
vhost_iotlb_add_range_ctx() to accept it.

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-8-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06 07:20:57 -04:00
Xie Yongji 7f05630dc6 vhost-vdpa: Handle the failure of vdpa_reset()
The vdpa_reset() may fail now. This adds check to its return
value and fail the vhost_vdpa_open().

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-7-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06 07:20:57 -04:00
Xie Yongji 0686082dbf vdpa: Add reset callback in vdpa_config_ops
This adds a new callback to support device specific reset
behavior. The vdpa bus driver will call the reset function
instead of setting status to zero during resetting.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210831103634.33-6-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06 07:20:57 -04:00
Cai Huoqing 729ce5a5bd vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro
it's a nice refactor to make use of
PFN_PHYS/PFN_UP/PFN_DOWN helper macro

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Link: https://lore.kernel.org/r/20210802013717.851-1-caihuoqing@baidu.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06 02:25:16 -04:00
Arseny Krasnov 1af7e55511 vhost/vsock: support MSG_EOR bit processing
'MSG_EOR' handling has similar logic as 'MSG_EOM' - if bit present
in packet's header, reset it to 0. Then restore it back if packet
processing wasn't completed. Instead of bool variable for each
flag, bit mask variable was added: it has logical OR of 'MSG_EOR'
and 'MSG_EOM' if needed, to restore flags, this variable is ORed
with flags field of packet.

Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
Link: https://lore.kernel.org/r/20210903123238.3273526-1-arseny.krasnov@kaspersky.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2021-09-05 16:23:09 -04:00
Arseny Krasnov 9af8f10616 virtio/vsock: rename 'EOR' to 'EOM' bit.
This current implemented bit is used to mark end of messages
('EOM' - end of message), not records('EOR' - end of record).
Also rename 'record' to 'message' in implementation as it is
different things.

Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210903123109.3273053-1-arseny.krasnov@kaspersky.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-05 16:23:08 -04:00
Cai Huoqing 23b228cb89 vhost scsi: Convert to SPDX identifier
use SPDX-License-Identifier instead of a verbose license text

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Link: https://lore.kernel.org/r/20210821123320.734-1-caihuoqing@baidu.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-09-05 16:23:08 -04:00
Yunsheng Lin 723783d077 sock: remove one redundant SKB_FRAG_PAGE_ORDER macro
Both SKB_FRAG_PAGE_ORDER are defined to the same value in
net/core/sock.c and drivers/vhost/net.c.

Move the SKB_FRAG_PAGE_ORDER definition to net/core/sock.h,
as both net/core/sock.c and drivers/vhost/net.c include it,
and it seems a reasonable file to put the macro.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:46:20 +01:00
Neeraj Upadhyay e74cfa91f4 vringh: Use wiov->used to check for read/write desc order
As __vringh_iov() traverses a descriptor chain, it populates
each descriptor entry into either read or write vring iov
and increments that iov's ->used member. So, as we iterate
over a descriptor chain, at any point, (riov/wriov)->used
value gives the number of descriptor enteries available,
which are to be read or written by the device. As all read
iovs must precede the write iovs, wiov->used should be zero
when we are traversing a read descriptor. Current code checks
for wiov->i, to figure out whether any previous entry in the
current descriptor chain was a write descriptor. However,
iov->i is only incremented, when these vring iovs are consumed,
at a later point, and remain 0 in __vringh_iov(). So, correct
the check for read and write descriptor order, to use
wiov->used.

Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Link: https://lore.kernel.org/r/1624591502-4827-1-git-send-email-neeraju@codeaurora.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-08-11 06:44:24 -04:00
Xie Yongji f7ad318ea0 vhost: Fix the calculation in vhost_overflow()
This fixes the incorrect calculation for integer overflow
when the last address of iova range is 0xffffffff.

Fixes: ec33d031a1 ("vhost: detect 32 bit integer wrap around")
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210728130756.97-2-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-08-11 06:44:15 -04:00
Xie Yongji 0e398290cf vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update()
The "msg->iova + msg->size" addition can have an integer overflow
if the iotlb message is from a malicious user space application.
So let's fix it.

Fixes: 1b48dc03e5 ("vhost: vdpa: report iova range")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210728130756.97-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-08-10 11:52:23 -04:00
Jason Wang 530a5678bc vdpa: support packed virtqueue for set/get_vq_state()
This patch extends the vdpa_vq_state to support packed virtqueue
state which is basically the device/driver ring wrap counters and the
avail and used index. This will be used for the virito-vdpa support
for the packed virtqueue and the future vhost/vhost-vdpa support for
the packed virtqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210602021536.39525-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
2021-07-08 07:49:01 -04:00
Mike Christie d8f35f41e2 vhost: fix up vhost_work coding style
Switch from a mix of tabs and spaces to just tabs.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20210525174733.6212-6-michael.christie@oracle.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:55 -04:00
Mike Christie efb18e1e50 vhost: fix poll coding style
We use 3 coding styles in this struct. Switch to just tabs.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210525174733.6212-5-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:55 -04:00
Mike Christie d60146c161 vhost-scsi: reduce flushes during endpoint clearing
vhost_scsi_flush will flush everything, so we can clear the backends then
flush, then destroy. We don't need to flush before each vq destruction
because after the flush we will have made sure there can be no new cmds
started and there are no running cmds.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20210525174733.6212-4-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:54 -04:00
Mike Christie 31fbea3ab9 vhost-scsi: remove extra flushes
The vhost work flush function was flushing the entire work queue, so
there is no need for the double vhost_work_dev_flush calls in
vhost_scsi_flush.

And we do not need to call vhost_poll_flush for each poller because
that call also ends up flushing the same work queue thread the
vhost_work_dev_flush call flushed.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210525174733.6212-3-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:54 -04:00
Mike Christie 1465cb6117 vhost: remove work arg from vhost_work_flush
vhost_work_flush doesn't do anything with the work arg. This patch drops
it and then renames vhost_work_flush to vhost_work_dev_flush to reflect
that the function flushes all the works in the dev and not just a
specific queue or work item.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210525174733.6212-2-michael.christie@oracle.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:54 -04:00
Shaokun Zhang 7a43ce37cd vhost: Remove the repeated declaration
Function 'vhost_vring_ioctl' is declared twice, remove the repeated
declaration.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/1621857884-19964-1-git-send-email-zhangshaokun@hisilicon.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:53 -04:00
Stefano Garzarella 8693059284 vhost-iotlb: fix vhost_iotlb_del_range() documentation
Trivial change for the vhost_iotlb_del_range() documentation,
fixing the function name in the comment block.

Discovered with `make C=2 M=drivers/vhost`:
../drivers/vhost/iotlb.c:92: warning: expecting prototype for vring_iotlb_del_range(). Prototype was for vhost_iotlb_del_range() instead

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210504135444.158716-1-sgarzare@redhat.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:50 -04:00
Alexander Aring e3ae2365ef net: sock: introduce sk_error_report
This patch introduces a function wrapper to call the sk_error_report
callback. That will prepare to add additional handling whenever
sk_error_report is called, for example to trace socket errors.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29 11:28:21 -07:00
Arseny Krasnov ced7b71371 vhost/vsock: support SEQPACKET for transport
When received packet is copied to guests's rx queue, data buffers
of rx queue could be smaller that data buffer of input packet, so
data of input packet is copied to each rx buffer, thus each rx
buffer will be a packet with dynamically created header. Fields
of such header are initialized from header of input packet(except
length field which value is depends on number of bytes copied to
rx buffer). But in SEQPACKET case, we also need to take care of
record delimeter bit: if input packet has this bit set, we don't
copy it to header of packet in rx buffer, except case when such
rx buffer is last part of input packet. Otherwise, we will get
sequence of packets with delimeter bit set, thus braking record
bounds.
Also remove ignore of non-stream type of packets, handle SEQPACKET
feature bit.

Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11 13:32:47 -07:00
Matteo Croce 224bf7db55 vhost_net: use XDP helpers
Make use of the xdp_{init,prepare}_buff() helpers instead of
an open-coded version.

Also, the field xdp->rxq was never set, so pass NULL to xdp_init_buff()
to clear it.

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 15:20:10 -07:00
Linus Torvalds 16bb86b556 virtio,vhost,vdpa: features, fixes
A bunch of new drivers including vdpa support for block
 and virtio-vdpa. Beginning of vq kick (aka doorbell) mapping support.
 Misc fixes.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmCRBBEPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpiCIH/iNNTeyl4hZJ8IOTlqTagjZgUBYslpda66pU
 XfGKmXWpCGHYSw0XgbfHDyTZTCmdyq/b4FrxPgYrrEsQqztLIaGHyapHPcXEAThb
 +pHtcxqsQ8DGucJZpNU44M3kB13u07gauR540HyXzEqLXd5vEhG7dkClBjm67TWN
 SbJoEP3eNJMUezYuGsmUAGoi/M9NyCx+RiLd7roIlTxhIDW17PFNY0sIgG/sX6/s
 1MXng0l00EjawIu4OnWfjg6kZoa6se41Rpcwd7XluTZncYKnMTJGoxDwv0xoJl4I
 pI5OS+Ea6ENuuygmYMEl294I5E0QeaMGFpEYyO9sm764K5bLjVw=
 =x0Ot
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "A bunch of new drivers including vdpa support for block and
  virtio-vdpa.

  Beginning of vq kick (aka doorbell) mapping support.

  Misc fixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (40 commits)
  virtio_pci_modern: correct sparse tags for notify
  virtio_pci_modern: __force cast the notify mapping
  vDPA/ifcvf: get_config_size should return dev specific config size
  vDPA/ifcvf: enable Intel C5000X-PL virtio-block for vDPA
  vDPA/ifcvf: deduce VIRTIO device ID when probe
  vdpa_sim_blk: add support for vdpa management tool
  vdpa_sim_blk: handle VIRTIO_BLK_T_GET_ID
  vdpa_sim_blk: implement ramdisk behaviour
  vdpa: add vdpa simulator for block device
  vhost/vdpa: Remove the restriction that only supports virtio-net devices
  vhost/vdpa: use get_config_size callback in vhost_vdpa_config_validate()
  vdpa: add get_config_size callback in vdpa_config_ops
  vdpa_sim: cleanup kiovs in vdpasim_free()
  vringh: add vringh_kiov_length() helper
  vringh: implement vringh_kiov_advance()
  vringh: explain more about cleaning riov and wiov
  vringh: reset kiov 'consumed' field in __vringh_iov()
  vringh: add 'iotlb_lock' to synchronize iotlb accesses
  vdpa_sim: use iova module to allocate IOVA addresses
  vDPA/ifcvf: deduce VIRTIO device ID from pdev ids
  ...
2021-05-05 13:31:39 -07:00
Xie Yongji 9d6d97bff7 vhost/vdpa: Remove the restriction that only supports virtio-net devices
Since the config checks are done by the vDPA drivers, we can remove the
virtio-net restriction and we should be able to support all kinds of
virtio devices.

<linux/virtio_net.h> is not needed anymore, but we need to include
<linux/slab.h> to avoid compilation failures.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210315163450.254396-11-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-05-03 04:55:53 -04:00
Stefano Garzarella d6d8bb92fd vhost/vdpa: use get_config_size callback in vhost_vdpa_config_validate()
Let's use the new 'get_config_size()' callback available instead of
using the 'virtio_id' to get the size of the device config space.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210315163450.254396-10-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-05-03 04:55:53 -04:00
Stefano Garzarella b8c06ad4d6 vringh: implement vringh_kiov_advance()
In some cases, it may be useful to provide a way to skip a number
of bytes in a vringh_kiov.

Let's implement vringh_kiov_advance() for this purpose, reusing the
code from vringh_iov_xfer().
We replace that code calling the new vringh_kiov_advance().

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210315163450.254396-6-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-05-03 04:55:53 -04:00