It is possible that a guest can send a packet that contains a head + 18
slots and yet has a len <= XEN_NETBACK_TX_COPY_LEN. This causes nr_slots
to underflow in xenvif_get_requests() which then causes the subsequent
loop's termination condition to be wrong, causing a buffer overrun of
queue->tx_map_ops.
Rework the code to account for the extra frag_overflow slots.
This is CVE-2023-34319 / XSA-432.
Fixes: ad7f402ae4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
These are cleanups for architecture specific header files:
- the comments in include/linux/syscalls.h have gone out of sync
and are really pointless, so these get removed
- The asm/bitsperlong.h header no longer needs to be architecture
specific on modern compilers, so use a generic version for newer
architectures that use new enough userspace compilers
- A cleanup for virt_to_pfn/virt_to_bus to have proper type
checking, forcing the use of pointers
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmSl138ACgkQYKtH/8kJ
UieqWxAA2WjNVfyuieYckglOVE0PZPs2fzCwyzTY5iUTH3gE5cBFWJDWcg2EnouG
v3X3htEQcowYWaCF9+rypQXaGiSx4WXi2Bjxnz3D/BcreqWPI4eSQ0fpGG5SURTY
2zYF72GTt4JGR++l+7/R9MZwPbwYDT9BsD5tkel8PxnyVLM6/c5xFvbjzRSKFE8x
SMN1jGZ62ITLNf/8coAOEPNxBYtDT6yQyu7P2sx5cd65LAQq9yLKjFklnBBovgWT
OoCIZAdGkhcNwOh1LjyHcdNdpfNJGceKyqKPqty07IhCQuF2jxiyFYFzuBbeyQfE
S0itN8o/MIfUmxaQl3e8dPAVb1RlNVr1zfQ6y4tUtWNdkNL2WwSnSQSRHrBfHxCQ
QCF++PMeFcLhGwMYtqdNJ7XGLQ0PsjD74pRf0vo+vjmqDk2BJsJBP57VU+8MJn5r
SoxqnJ0WxLvm1TfrNKusV7zMNWquc2duJDW40zsOssP4itjYELSI6qa56qmzlqmX
zKmRx6mxAlx9RRK8FHXFYHbz3p93vv8z9vTOZV3AjIjjED960CLknUAwCC8FoJyz
9b5wyMXsLQHQjGt8luAvPc6OiU0EiU9a4SPK+feWcv27serFvnjJlRTS/yG2Z3zd
BYsUgsXHypsdoud+aE7MeCy7fE8n3mhoyMQQRBkOMFJ7RsG6wAE=
=S/he
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"These are cleanups for architecture specific header files:
- the comments in include/linux/syscalls.h have gone out of sync and
are really pointless, so these get removed
- The asm/bitsperlong.h header no longer needs to be architecture
specific on modern compilers, so use a generic version for newer
architectures that use new enough userspace compilers
- A cleanup for virt_to_pfn/virt_to_bus to have proper type checking,
forcing the use of pointers"
* tag 'asm-generic-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
syscalls: Remove file path comments from headers
tools arch: Remove uapi bitsperlong.h of hexagon and microblaze
asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch
m68k/mm: Make pfn accessors static inlines
arm64: memory: Make virt_to_pfn() a static inline
ARM: mm: Make virt_to_pfn() a static inline
asm-generic/page.h: Make pfn accessors static inlines
xen/netback: Pass (void *) to virt_to_page()
netfs: Pass a pointer to virt_to_page()
cifs: Pass a pointer to virt_to_page() in cifsglob
cifs: Pass a pointer to virt_to_page()
riscv: mm: init: Pass a pointer to virt_to_page()
ARC: init: Pass a pointer to virt_to_pfn() in init
m68k: Pass a pointer to virt_to_pfn() virt_to_page()
fs/proc/kcore.c: Pass a pointer to virt_addr_valid()
virt_to_page() takes a virtual address as argument but
the driver passes an unsigned long, which works because
the target platform(s) uses polymorphic macros to calculate
the page.
Since many architectures implement virt_to_pfn() as
a macro, this function becomes polymorphic and accepts both a
(unsigned long) and a (void *).
Fix this up by an explicit (void *) cast.
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Paul Durrant <paul@xen.org>
Cc: xen-devel@lists.xenproject.org
Cc: netdev@vger.kernel.org
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Most users use __skb_frag_set_page()/skb_frag_off_set()/
skb_frag_size_set() to fill the page desc for a skb frag.
Introduce skb_frag_fill_page_desc() to do that.
net/bpf/test_run.c does not call skb_frag_off_set() to
set the offset, "copy_from_user(page_address(page), ...)"
and 'shinfo' being part of the 'data' kzalloced in
bpf_test_init() suggest that it is assuming offset to be
initialized as zero, so call skb_frag_fill_page_desc()
with offset being zero for this case.
Also, skb_frag_set_page() is not used anymore, so remove
it.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Issue the same error message in case an illegal page boundary crossing
has been detected in both cases where this is tested.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20230329080259.14823-1-jgross@suse.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The tests for the number of grant mapping or copy operations reaching
the array size of the operations buffer at the end of the main loop in
xenvif_tx_build_gops() isn't needed.
The loop can handle at maximum MAX_PENDING_REQS transfer requests, as
XEN_RING_NR_UNCONSUMED_REQUESTS() is taking unsent responses into
consideration, too.
Remove the tests.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fix xenvif_get_requests() not to do grant copy operations across local
page boundaries. This requires to double the maximum number of copy
operations per queue, as each copy could now be split into 2.
Make sure that struct xenvif_tx_cb doesn't grow too large.
Cc: stable@vger.kernel.org
Fixes: ad7f402ae4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Current release - regressions:
- phy: multiple fixes for EEE rework
- wifi: wext: warn about usage only once
- wifi: ath11k: allow system suspend to survive ath11k
Current release - new code bugs:
- mlx5: Fix memory leak in IPsec RoCE creation
- ibmvnic: assign XPS map to correct queue index
Previous releases - regressions:
- netfilter: ip6t_rpfilter: Fix regression with VRF interfaces
- netfilter: ctnetlink: make event listener tracking global
- nf_tables: allow to fetch set elements when table has an owner
- mlx5:
- fix skb leak while fifo resync and push
- fix possible ptp queue fifo use-after-free
Previous releases - always broken:
- sched: fix action bind logic
- ptp: vclock: use mutex to fix "sleep on atomic" bug if driver
also uses a mutex
- netfilter: conntrack: fix rmmod double-free race
- netfilter: xt_length: use skb len to match in length_mt6,
avoid issues with BIG TCP
Misc:
- ice: remove unnecessary CONFIG_ICE_GNSS
- mlx5e: remove hairpin write debugfs files
- sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmP9JgYACgkQMUZtbf5S
IrsIRRAApy4Hjb8z5z3k4HOM2lA3b/3OWD301I5YtoU3FC4L938yETAPFYUGbWrX
rKN4YOTNh2Fvkgbni7vz9hbC84C6i86Q9+u7dT1U+kCk3kbyQPFZlEDj5fY0I8zK
1xweCRrC1CcG74S2M5UO3UnWz1ypWQpTnHfWZqq0Duh1j9Xc+MHjHC2IKrGnzM6U
1/ODk9FrtsWC+KGJlWwiV+yJMYUA4nCKIS/NrmdRlBa7eoP0oC1xkA8g0kz3/P3S
O+xMyhExcZbMYY5VMkiGBZ5l8Ve3t6lHcMXq7jWlSCOeXd4Ut6zzojHlGZjzlCy9
RQQJzva2wlltqB9rECUQixpZbVS6ubf5++zvACOKONhSIEdpWjZW9K/qsV8igbfM
Xx0hsG1jCBt/xssRw2UBsq73vjNf1AkdksvqJgcggAvBJU8cV3MxRRB4/9lyPdmB
NNFqehwCeE3aU0FSBKoxZVYpfg+8J/XhwKT63Cc2d1ENetsWk/LxvkYm24aokpW+
nn+jUH9AYk3rFlBVQG1xsCwU4VlGk/yZgRwRMYFBqPkAGcXLZOnqdoSviBPN3yN0
Habs1hxToMt3QBgLJcMVn8CYdWCJgnZpxs8Mfo+PGoWKHzQ9kXBdyYyIZm1GyesD
BN/2QN38yMGXRALd2NXS2Va4ygX7KptB7+HsitdkzKCqcp1Ao+I=
=Ko4p
-----END PGP SIGNATURE-----
Merge tag 'net-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and netfilter.
The notable fixes here are the EEE fix which restores boot for many
embedded platforms (real and QEMU); WiFi warning suppression and the
ICE Kconfig cleanup.
Current release - regressions:
- phy: multiple fixes for EEE rework
- wifi: wext: warn about usage only once
- wifi: ath11k: allow system suspend to survive ath11k
Current release - new code bugs:
- mlx5: Fix memory leak in IPsec RoCE creation
- ibmvnic: assign XPS map to correct queue index
Previous releases - regressions:
- netfilter: ip6t_rpfilter: Fix regression with VRF interfaces
- netfilter: ctnetlink: make event listener tracking global
- nf_tables: allow to fetch set elements when table has an owner
- mlx5:
- fix skb leak while fifo resync and push
- fix possible ptp queue fifo use-after-free
Previous releases - always broken:
- sched: fix action bind logic
- ptp: vclock: use mutex to fix "sleep on atomic" bug if driver also
uses a mutex
- netfilter: conntrack: fix rmmod double-free race
- netfilter: xt_length: use skb len to match in length_mt6, avoid
issues with BIG TCP
Misc:
- ice: remove unnecessary CONFIG_ICE_GNSS
- mlx5e: remove hairpin write debugfs files
- sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy"
* tag 'net-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
tcp: tcp_check_req() can be called from process context
net: phy: c45: fix network interface initialization failures on xtensa, arm:cubieboard
xen-netback: remove unused variables pending_idx and index
net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy
net: dsa: ocelot_ext: remove unnecessary phylink.h include
net: mscc: ocelot: fix duplicate driver name error
net: dsa: felix: fix internal MDIO controller resource length
net: dsa: seville: ignore mscc-miim read errors from Lynx PCS
net/sched: act_sample: fix action bind logic
net/sched: act_mpls: fix action bind logic
net/sched: act_pedit: fix action bind logic
wifi: wext: warn about usage only once
wifi: mt76: usb: fix use-after-free in mt76u_free_rx_queue
qede: avoid uninitialized entries in coal_entry array
nfc: fix memory leak of se_io context in nfc_genl_se_io
ice: remove unnecessary CONFIG_ICE_GNSS
net/sched: cls_api: Move call to tcf_exts_miss_cookie_base_destroy()
ibmvnic: Assign XPS map to correct queue index
docs: net: fix inaccuracies in msg_zerocopy.rst
tools: net: add __pycache__ to gitignore
...
building with gcc and W=1 reports
drivers/net/xen-netback/netback.c:886:21: error: variable
‘pending_idx’ set but not used [-Werror=unused-but-set-variable]
886 | u16 pending_idx;
| ^~~~~~~~~~~
pending_idx is not used so remove it. Since index was only
used to set pending_idx, remove index as well.
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230226163429.2351600-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The uevent() callback in struct bus_type should not be modifying the
device that is passed into it, so mark it as a const * and propagate the
function signature changes out into all relevant subsystems that use
this callback.
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230111113018.459199-16-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCY76ohgAKCRCAXGG7T9hj
vo8fAP0XJ94B7asqcN4W3EyeyfqxUf1eZvmWRhrbKqpLnmHLaQEA/uJBkXL49Zj7
TTcbxR1coJ/hPwhtmONU4TNtCZ+RXw0=
=2Ib5
-----END PGP SIGNATURE-----
Merge tag 'for-linus-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- two cleanup patches
- a fix of a memory leak in the Xen pvfront driver
- a fix of a locking issue in the Xen hypervisor console driver
* tag 'for-linus-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/pvcalls: free active map buffer on pvcalls_front_free_map
hvc/xen: lock console list traversal
x86/xen: Remove the unused function p2m_index()
xen: make remove callback of xen driver void returned
Since commit fc7a6209d5 ("bus: Make remove callback return void")
forces bus_type::remove be void-returned, it doesn't make much sense for
any bus based driver implementing remove callbalk to return non-void to
its caller.
This change is for xen bus based drivers.
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dawei Li <set_pte_at@outlook.com>
Link: https://lore.kernel.org/r/TYCP286MB23238119AB4DF190997075C9CAE39@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM
Signed-off-by: Juergen Gross <jgross@suse.com>
Commit ad7f402ae4 ("xen/netback: Ensure protocol headers don't fall in
the non-linear area") introduced a (valid) build warning. There have
even been reports of this problem breaking networking of Xen guests.
Fixes: ad7f402ae4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So remove kfree_skb()
from the spin_lock_irqsave() section and use the already existing
"drop" label in xenvif_start_xmit() for dropping the SKB. At the
same time replace the dev_kfree_skb() call there with a call of
dev_kfree_skb_any(), as xenvif_start_xmit() can be called with
disabled interrupts.
This is XSA-424 / CVE-2022-42328 / CVE-2022-42329.
Fixes: be81992f90 ("xen/netback: don't queue unlimited number of packages")
Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
In some cases, the frontend may send a packet where the protocol headers
are spread across multiple slots. This would result in netback creating
an skb where the protocol headers spill over into the non-linear area.
Some drivers and NICs don't handle this properly resulting in an
interface reset or worse.
This issue was introduced by the removal of an unconditional skb pull in
the tx path to improve performance. Fix this without reintroducing the
pull by setting up grant copy ops for as many slots as needed to reach
the XEN_NETBACK_TX_COPY_LEN size. Adjust the rest of the code to handle
multiple copy operations per skb.
This is XSA-423 / CVE-2022-3643.
Fixes: 7e5d775395 ("xen-netback: remove unconditional __pskb_pull_tail() in guest Tx path")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
We tell driver developers to always pass NAPI_POLL_WEIGHT
as the weight to netif_napi_add(). This may be confusing
to newcomers, drop the weight argument, those who really
need to tweak the weight can use netif_napi_add_weight().
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN
Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
struct ubuf_info will be changed, use ubuf_info_msgzc instead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
use kstrdup instead of open-coding it.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removing 'hotplug-status' in backend_disconnected() means that it will be
removed even in the case that the frontend unilaterally disconnects (which
it is free to do at any time). The consequence of this is that, when the
frontend attempts to re-connect, the backend gets stuck in 'InitWait'
rather than moving straight to 'Connected' (which it can do because the
hotplug script has already run).
Instead, the 'hotplug-status' mode should be removed in netback_remove()
i.e. when the vif really is going away.
Fixes: 0f4558ae91 ("Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"")
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most drivers use "skb_transport_offset(skb) + tcp_hdrlen(skb)"
to compute headers length for a TCP packet, but others
use more convoluted (but equivalent) ways.
Add skb_tcp_all_headers() and skb_inner_tcp_all_headers()
helpers to harmonize this a bit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove some unused macros and functions, make local functions static.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20220608043726.9380-1-jgross@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 6fac592cca ("xen: update ring.h") missed to fix one use case
of RING_HAS_UNCONSUMED_REQUESTS().
Reported-by: Jan Beulich <jbeulich@suse.com>
Fixes: 6fac592cca ("xen: update ring.h")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20220530113459.20124-1-jgross@suse.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Drop the special defines in a bunch of drivers where the
removal is relatively simple so grouping into one patch
does not impact reviewability.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 2afeec08ab.
The reasoning in the commit was wrong - the code expected to setup the
watch even if 'hotplug-status' didn't exist. In fact, it relied on the
watch being fired the first time - to check if maybe 'hotplug-status' is
already set to 'connected'. Not registering a watch for non-existing
path (which is the case if hotplug script hasn't been executed yet),
made the backend not waiting for the hotplug script to execute. This in
turns, made the netfront think the interface is fully operational, while
in fact it was not (the vif interface on xen-netback side might not be
configured yet).
This was a workaround for 'hotplug-status' erroneously being removed.
But since that is reverted now, the workaround is not necessary either.
More discussion at
https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Michael Brown <mbrown@fensystems.co.uk>
Link: https://lore.kernel.org/r/20220222001817.2264967-2-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 1f2565780e.
The 'hotplug-status' node should not be removed as long as the vif
device remains configured. Otherwise the xen-netback would wait for
re-running the network script even if it was already called (in case of
the frontent re-connecting). But also, it _should_ be removed when the
vif device is destroyed (for example when unbinding the driver) -
otherwise hotplug script would not configure the device whenever it
re-appear.
Moving removal of the 'hotplug-status' node was a workaround for nothing
calling network script after xen-netback module is reloaded. But when
vif interface is re-created (on xen-netback unbind/bind for example),
the script should be called, regardless of who does that - currently
this case is not handled by the toolstack, and requires manual
script call. Keeping hotplug-status=connected to skip the call is wrong
and leads to not configured interface.
More discussion at
https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20220222001817.2264967-1-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case a guest isn't consuming incoming network traffic as fast as it
is coming in, xen-netback is buffering network packages in unlimited
numbers today. This can result in host OOM situations.
Commit f48da8b14d ("xen-netback: fix unlimited guest Rx internal
queue and carrier flapping") meant to introduce a mechanism to limit
the amount of buffered data by stopping the Tx queue when reaching the
data limit, but this doesn't work for cases like UDP.
When hitting the limit don't queue further SKBs, but drop them instead.
In order to be able to tell Rx packages have been dropped increment the
rx_dropped statistics counter in this case.
It should be noted that the old solution to continue queueing SKBs had
the additional problem of an overflow of the 32-bit rx_queue_len value
would result in intermittent Tx queue enabling.
This is part of XSA-392
Fixes: f48da8b14d ("xen-netback: fix unlimited guest Rx internal queue and carrier flapping")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Commit 1d5d485239 ("xen-netback: require fewer guest Rx slots when
not using GSO") introduced a security problem in netback, as an
interface would only be regarded to be stalled if no slot is available
in the rx queue ring page. In case the SKB at the head of the queued
requests will need more than one rx slot and only one slot is free the
stall detection logic will never trigger, as the test for that is only
looking for at least one slot to be free.
Fix that by testing for the needed number of slots instead of only one
slot being available.
In order to not have to take the rx queue lock that often, store the
number of needed slots in the queue data. As all SKB dequeue operations
happen in the rx queue kernel thread this is safe, as long as the
number of needed slots is accessed via READ/WRITE_ONCE() only and
updates are always done with the rx queue lock held.
Add a small helper for obtaining the number of free slots.
This is part of XSA-392
Fixes: 1d5d485239 ("xen-netback: require fewer guest Rx slots when not using GSO")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The variable err is being initialized with a value that is never read, it
is being updated immediately afterwards. The assignment is redundant and
can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When re-entering the main loop of xenvif_tx_check_gop() a 2nd time, the
special considerations for the head of the SKB no longer apply. Don't
mistakenly report ERROR to the frontend for the first entry in the list,
even if - from all I can tell - this shouldn't matter much as the overall
transmit will need to be considered failed anyway.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do this in order to prevent the task from being freed if the thread
returns (which can be triggered by the frontend) before the call to
kthread_stop done as part of the backend tear down. Not taking the
reference will lead to a use-after-free in that scenario. Such
reference was taken before but dropped as part of the rework done in
2ac061ce97.
Reintroduce the reference taking and add a comment this time
explaining why it's needed.
This is XSA-374 / CVE-2021-28691.
Fixes: 2ac061ce97 ('xen/netback: cleanup init and deinit code')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
The logic in connect() is currently written with the assumption that
xenbus_watch_pathfmt() will return an error for a node that does not
exist. This assumption is incorrect: xenstore does allow a watch to
be registered for a nonexistent node (and will send notifications
should the node be subsequently created).
As of commit 1f2565780 ("xen-netback: remove 'hotplug-status' once it
has served its purpose"), this leads to a failure when a domU
transitions into XenbusStateConnected more than once. On the first
domU transition into Connected state, the "hotplug-status" node will
be deleted by the hotplug_status_changed() callback in dom0. On the
second or subsequent domU transition into Connected state, the
hotplug_status_changed() callback will therefore never be invoked, and
so the backend will remain stuck in InitWait.
This failure prevents scenarios such as reloading the xen-netfront
module within a domU, or booting a domU via iPXE. There is
unfortunately no way for the domU to work around this dom0 bug.
Fix by explicitly checking for existence of the "hotplug-status" node,
thereby creating the behaviour that was previously assumed to exist.
Signed-off-by: Michael Brown <mbrown@fensystems.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau.
2) TX skb error handling fix in mt76 driver, also from Felix.
3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman.
4) Avoid double free of percpu pointers when freeing a cloned bpf prog.
From Cong Wang.
5) Use correct printf format for dma_addr_t in ath11k, from Geert
Uytterhoeven.
6) Fix resolve_btfids build with older toolchains, from Kun-Chuan
Hsieh.
7) Don't report truncated frames to mac80211 in mt76 driver, from
Lorenzop Bianconi.
8) Fix watcdog timeout on suspend/resume of stmmac, from Joakim Zhang.
9) mscc ocelot needs NET_DEVLINK selct in Kconfig, from Arnd Bergmann.
10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from
Arjun Roy.
11) Ignore routes with deleted nexthop object in mlxsw, from Ido
Schimmel.
12) Need to undo tcp early demux lookup sometimes in nf_nat, from
Florian Westphal.
13) Fix gro aggregation for udp encaps with zero csum, from Daniel
Borkmann.
14) Make sure to always use imp*_ndo_send when necessaey, from Jason A.
Donenfeld.
15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov.
16) prevent overly huge skb allocationsd in qrtr, from Pavel Skripkin.
17) Prevent rx ring copnsumer index loss of sync in enetc, from Vladimir
Oltean.
18) Make sure textsearch copntrol block is large enough, from Wilem de
Bruijn.
19) Revert MAC changes to r8152 leading to instability, from Hates Wang.
20) Advance iov in 9p even for empty reads, from Jissheng Zhang.
21) Double hook unregister in nftables, from PabloNeira Ayuso.
22) Fix memleak in ixgbe, fropm Dinghao Liu.
23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne.
24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang
Tang.
25) Fix DOI refcount bugs in cipso, from Paul Moore.
26) One too many irqsave in ibmvnic, from Junlin Yang.
27) Fix infinite loop with MPLS gso segmenting via virtio_net, from
Balazs Nemeth.
* git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits)
s390/qeth: fix notification for pending buffers during teardown
s390/qeth: schedule TX NAPI on QAOB completion
s390/qeth: improve completion of pending TX buffers
s390/qeth: fix memory leak after failed TX Buffer allocation
net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
net: check if protocol extracted by virtio_net_hdr_set_proto is correct
net: dsa: xrs700x: check if partner is same as port in hsr join
net: lapbether: Remove netif_start_queue / netif_stop_queue
atm: idt77252: fix null-ptr-dereference
atm: uPD98402: fix incorrect allocation
atm: fix a typo in the struct description
net: qrtr: fix error return code of qrtr_sendmsg()
mptcp: fix length of ADD_ADDR with port sub-option
net: bonding: fix error return code of bond_neigh_init()
net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
net: enetc: set MAC RX FIFO to recommended value
net: davicom: Use platform_get_irq_optional()
net: davicom: Fix regulator not turned off on driver removal
net: davicom: Fix regulator not turned off on failed probe
net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
...
Commit 3194a1746e ("xen-netback: don't "handle" error by BUG()")
dropped respective a BUG_ON() without noticing that with this the
variable's value wouldn't be consumed anymore. With gnttab_set_map_op()
setting all status fields to a non-zero value, in case of an error no
slot should have a status of GNTST_okay (zero).
This is part of XSA-367.
Cc: <stable@vger.kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/d933f495-619a-0086-5fb4-1ec3cf81a8fc@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
shinfo already holds the result of skb_shinfo(skb) at this point - no
need to re-invoke the construct even twice.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYCu8dgAKCRCAXGG7T9hj
vuxTAP0S1iJ6DR5Y2pdSy2dfxn/gItNqUlR7vbFdxgf/mBSNxAD/fxbtVWM1GuTs
3Fwz0T60BcxsHZXhDcPAA2cjoqORbQs=
=2b0M
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
"A series of Xen related security fixes, all related to limited error
handling in Xen backend drivers"
* tag 'for-linus-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen-blkback: fix error handling in xen_blkbk_map()
xen-scsiback: don't "handle" error by BUG()
xen-netback: don't "handle" error by BUG()
xen-blkback: don't "handle" error by BUG()
xen/arm: don't ignore return errors from set_phys_to_machine
Xen/gntdev: correct error checking in gntdev_map_grant_pages()
Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()
Xen/x86: also check kernel mapping in set_foreign_p2m_mapping()
Xen/x86: don't bail early from clear_foreign_p2m_mapping()
In particular -ENOMEM may come back here, from set_foreign_p2m_mapping().
Don't make problems worse, the more that handling elsewhere (together
with map's status fields now indicating whether a mapping wasn't even
attempted, and hence has to be considered failed) doesn't require this
odd way of dealing with errors.
This is part of XSA-362.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
In order to support the possibility of per-device event channel
settings (e.g. lateeoi spurious event thresholds) add a xenbus device
pointer to struct irq_info() and modify the related event channel
binding interfaces to take the pointer to the xenbus device as a
parameter instead of the domain id of the other side.
While at it remove the stale prototype of bind_evtchn_to_irq_lateeoi().
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of a common event for rx and tx queue the event should be
regarded to be spurious if no rx and no tx requests are pending.
Unfortunately the condition for testing that is wrong causing to
decide a event being spurious if no rx OR no tx requests are
pending.
Fix that plus using local variables for rx/tx pending indicators in
order to split function calls and if condition.
Fixes: 23025393db ("xen/netback: use lateeoi irq binding")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 23025393db ("xen/netback: use lateeoi irq binding")
xenvif_rx_ring_slots_available() is no longer called only from the rx
queue kernel thread, so it needs to access the rx queue with the
associated queue held.
Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Fixes: 23025393db ("xen/netback: use lateeoi irq binding")
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Link: https://lore.kernel.org/r/20210202070938.7863-1-jgross@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fixes the following W=1 kernel build warning(s):
drivers/net/xen-netback/xenbus.c:419: warning: Function parameter or member 'dev' not described in 'frontend_changed'
drivers/net/xen-netback/xenbus.c:419: warning: Function parameter or member 'frontend_state' not described in 'frontend_changed'
drivers/net/xen-netback/xenbus.c:1001: warning: Function parameter or member 'dev' not described in 'netback_probe'
drivers/net/xen-netback/xenbus.c:1001: warning: Function parameter or member 'id' not described in 'netback_probe'
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for expanded zerocopy (TX and RX), move
the zerocopy related bits out of tx_flags into their own
flag word.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add an optional skb parameter to the zerocopy callback parameter,
which is passed down from skb_zcopy_clear(). This gives access
to the original skb, which is needed for upcoming RX zero-copy
error handling.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some code does not directly make 'xenbus_watch' object and call
'register_xenbus_watch()' but use 'xenbus_watch_path()' instead. This
commit adds support of 'will_handle' callback in the
'xenbus_watch_path()' and it's wrapper, 'xenbus_watch_pathfmt()'.
This is part of XSA-349
Cc: stable@vger.kernel.org
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reported-by: Michael Kurth <mku@amazon.de>
Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
If handling logics of watch events are slower than the events enqueue
logic and the events can be created from the guests, the guests could
trigger memory pressure by intensively inducing the events, because it
will create a huge number of pending events that exhausting the memory.
Fortunately, some watch events could be ignored, depending on its
handler callback. For example, if the callback has interest in only one
single path, the watch wouldn't want multiple pending events. Or, some
watches could ignore events to same path.
To let such watches to volutarily help avoiding the memory pressure
situation, this commit introduces new watch callback, 'will_handle'. If
it is not NULL, it will be called for each new event just before
enqueuing it. Then, if the callback returns false, the event will be
discarded. No watch is using the callback for now, though.
This is part of XSA-349
Cc: stable@vger.kernel.org
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reported-by: Michael Kurth <mku@amazon.de>
Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving netfront use the lateeoi
irq binding for netback and unmask the event channel only just before
going to sleep waiting for new events.
Make sure not to issue an EOI when none is pending by introducing an
eoi_pending element to struct xenvif_queue.
When no request has been consumed set the spurious flag when sending
the EOI for an interrupt.
This is part of XSA-332.
Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>