The bluetooth code uses our bitmap infrastructure for the two bits (!)
of connection setup flags, and in the process causes odd problems when
it converts between a bitmap and just the regular values of said bits.
It's completely pointless to do things like bitmap_to_arr32() to convert
a bitmap into a u32. It shoudln't have been a bitmap in the first
place. The reason to use bitmaps is if you have arbitrary number of
bits you want to manage (not two!), or if you rely on the atomicity
guarantees of the bitmap setting and clearing.
The code could use an "atomic_t" and use "atomic_or/andnot()" to set and
clear the bit values, but considering that it then copies the bitmaps
around with "bitmap_to_arr32()" and friends, there clearly cannot be a
lot of atomicity requirements.
So just use a regular integer.
In the process, this avoids the warnings about erroneous use of
bitmap_from_u64() which were triggered on 32-bit architectures when
conversion from a u64 would access two words (and, surprise, surprise,
only one word is needed - and indeed overkill - for a 2-bit bitmap).
That was always problematic, but the compiler seems to notice it and
warn about the invalid pattern only after commit 0a97953fd2 ("lib: add
bitmap_{from,to}_arr64") changed the exact implementation details of
'bitmap_from_u64()', as reported by Sudip Mukherjee and Stephen Rothwell.
Fixes: fe92ee6425 ("Bluetooth: hci_core: Rework hci_conn_params flags")
Link: https://lore.kernel.org/all/YpyJ9qTNHJzz0FHY@debian/
Link: https://lore.kernel.org/all/20220606080631.0c3014f2@canb.auug.org.au/
Link: https://lore.kernel.org/all/20220605162537.1604762-1-yury.norov@gmail.com/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYprzPAAKCRCAXGG7T9hj
vuTzAQC4GiDXcD/cfLVcEqdyw1diCWZjuOfuznUqy5ZUBAZjvAD/draFHTeO96+k
qyZyzFggPIziaAOIUZ2DkJ/NqSAmbA8=
=dl1E
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.19-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull more xen updates from Juergen Gross:
"Two cleanup patches for Xen related code and (more important) an
update of MAINTAINERS for Xen, as Boris Ostrovsky decided to step
down"
* tag 'for-linus-5.19-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: replace xen_remap() with memremap()
MAINTAINERS: Update Xen maintainership
xen: switch gnttab_end_foreign_access() to take a struct page pointer
Current release - new code bugs:
- af_packet: make sure to pull the MAC header, avoid skb panic in GSO
- ptp_clockmatrix: fix inverted logic in is_single_shot()
- netfilter: flowtable: fix missing FLOWI_FLAG_ANYSRC flag
- dt-bindings: net: adin: fix adi,phy-output-clock description syntax
- wifi: iwlwifi: pcie: rename CAUSE macro, avoid MIPS build warning
Previous releases - regressions:
- Revert "net: af_key: add check for pfkey_broadcast in function
pfkey_process"
- tcp: fix tcp_mtup_probe_success vs wrong snd_cwnd
- nf_tables: disallow non-stateful expression in sets earlier
- nft_limit: clone packet limits' cost value
- nf_tables: double hook unregistration in netns path
- ping6: fix ping -6 with interface name
Previous releases - always broken:
- sched: fix memory barriers to prevent skbs from getting stuck
in lockless qdiscs
- neigh: set lower cap for neigh_managed_work rearming, avoid
constantly scheduling the probe work
- bpf: fix probe read error on big endian in ___bpf_prog_run()
- amt: memory leak and error handling fixes
Misc:
- ipv6: expand & rename accept_unsolicited_na to accept_untracked_na
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmKY9lMACgkQMUZtbf5S
IrtNvA//WCpG53NwSy8aV2X/0vkprVEuO8EQeIYhaw1R4KlVcqrITQcaLqkq/xL/
RUq6F/plftMSiuGRhTp/Sgbl0o0XgJkf769m4zQxz9NqWqgcw5kwJPu4Xq1nSM9t
/2qAFNDnXShxRiSYrI0qxQrmd0OUjtsibxsKRTSrrlvcd6zYfrx/7+QK5qpLMF9E
zJpBSYQm2R0RLGRith99G8w3WauhlprPaxyQ71ogQtBhTF+Eg7K+xEm2D5DKtyvj
7CLyrQtR0jyDBAt2ZPCh5D/yVPkNI1rigQ8m4uiW9DE6mk1DsxxY+DIOt5vQPBdR
x9Pq0qG54KS5sP18ABeNRQn4NWdkhVf/CcPkaRxHJdRs13mpQUATJRpZ3Ytd9Nt0
HW6Kby+zY6bdpUX8+UYdhcG6wbt0Lw8B+bSCjiqfE/CBbfUFA3L9/q/5Hk8Xbnxn
lCIk4asxQgpNhcZ+PAkZfFgE0GNDKnXDu1thO+q7/N9srZrrh9WQW5qoq5lexo8V
c01jRbPTKa64Gbvm+xDDGEwSl2uIRITtea284bL3q6lnI50n50dlLOAW0z5tmbEg
X9OHae5bMAdtvS5A1ForJaWA/Mj35ZqtGG5oj0WcGcLupVyec3rgaYaJtNvwgoDx
ptCQVIMLTAHXtZMohm0YrBizg0qbqmCd2c0/LB+3odX328YStJU=
=bWkn
-----END PGP SIGNATURE-----
Merge tag 'net-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and netfilter.
Current release - new code bugs:
- af_packet: make sure to pull the MAC header, avoid skb panic in GSO
- ptp_clockmatrix: fix inverted logic in is_single_shot()
- netfilter: flowtable: fix missing FLOWI_FLAG_ANYSRC flag
- dt-bindings: net: adin: fix adi,phy-output-clock description syntax
- wifi: iwlwifi: pcie: rename CAUSE macro, avoid MIPS build warning
Previous releases - regressions:
- Revert "net: af_key: add check for pfkey_broadcast in function
pfkey_process"
- tcp: fix tcp_mtup_probe_success vs wrong snd_cwnd
- nf_tables: disallow non-stateful expression in sets earlier
- nft_limit: clone packet limits' cost value
- nf_tables: double hook unregistration in netns path
- ping6: fix ping -6 with interface name
Previous releases - always broken:
- sched: fix memory barriers to prevent skbs from getting stuck in
lockless qdiscs
- neigh: set lower cap for neigh_managed_work rearming, avoid
constantly scheduling the probe work
- bpf: fix probe read error on big endian in ___bpf_prog_run()
- amt: memory leak and error handling fixes
Misc:
- ipv6: expand & rename accept_unsolicited_na to accept_untracked_na"
* tag 'net-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (80 commits)
net/af_packet: make sure to pull mac header
net: add debug info to __skb_pull()
net: CONFIG_DEBUG_NET depends on CONFIG_NET
stmmac: intel: Add RPL-P PCI ID
net: stmmac: use dev_err_probe() for reporting mdio bus registration failure
tipc: check attribute length for bearer name
ice: fix access-beyond-end in the switch code
nfp: remove padding in nfp_nfdk_tx_desc
ax25: Fix ax25 session cleanup problems
net: usb: qmi_wwan: Add support for Cinterion MV31 with new baseline
sfc/siena: fix wrong tx channel offset with efx_separate_tx_channels
sfc/siena: fix considering that all channels have TX queues
socket: Don't use u8 type in uapi socket.h
net/sched: act_api: fix error code in tcf_ct_flow_table_fill_tuple_ipv6()
net: ping6: Fix ping -6 with interface name
macsec: fix UAF bug for real_dev
octeontx2-af: fix error code in is_valid_offset()
wifi: mac80211: fix use-after-free in chanctx code
bonding: guard ns_targets by CONFIG_IPV6
tcp: tcp_rtx_synack() can be called from process context
...
It makes little sense to debug networking stacks
if networking is not compiled in.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nothing in particular standing out, except perhaps that the fact that
the MDS never really maintained atime was made official and thus it's
no longer updated on the client either.
We also have a MAINTAINERS update: Jeff is transitioning his filesystem
maintainership duties to Xiubo.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmKYs1wTHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi+PvCACIj47W4FapO672xcIkQ4920ZT1Jw/o
2BfKXUtNyVLpGgBlweJWSTd1tfXp0tl9MFg00t/zbVarHH0SGAgF1z6e/tM7rjA/
vyCkFQXJDuwB0kCbCtZ9xt5XIQkkvPPJOmyLSKYl7RqImch7pTRd5IwxgKGWqXDx
FraVXqFqvr8L+szV/JCopdxdMVTFixWRD48z5pPlOReaOXiGjtTMoFIBIPp7GqVL
UB7wyOtDmyzcGnUsRNqMQFrkUBsBW1IEDKf/yVtQNDjUxmr3uXm8vugeISpMOGBO
cCkZACDeO0lpgHrXSo4UCf46bg3/HujxZu0nTc9HqPDiFdOmKmf58N4n
=MAi2
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-5.19-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"A big pile of assorted fixes and improvements for the filesystem with
nothing in particular standing out, except perhaps that the fact that
the MDS never really maintained atime was made official and thus it's
no longer updated on the client either.
We also have a MAINTAINERS update: Jeff is transitioning his
filesystem maintainership duties to Xiubo"
* tag 'ceph-for-5.19-rc1' of https://github.com/ceph/ceph-client: (23 commits)
MAINTAINERS: move myself from ceph "Maintainer" to "Reviewer"
ceph: fix decoding of client session messages flags
ceph: switch TASK_INTERRUPTIBLE to TASK_KILLABLE
ceph: remove redundant variable ino
ceph: try to queue a writeback if revoking fails
ceph: fix statfs for subdir mounts
ceph: fix possible deadlock when holding Fwb to get inline_data
ceph: redirty the page for writepage on failure
ceph: try to choose the auth MDS if possible for getattr
ceph: disable updating the atime since cephfs won't maintain it
ceph: flush the mdlog for filesystem sync
ceph: rename unsafe_request_wait()
libceph: use swap() macro instead of taking tmp variable
ceph: fix statx AT_STATX_DONT_SYNC vs AT_STATX_FORCE_SYNC check
ceph: no need to invalidate the fscache twice
ceph: replace usage of found with dedicated list iterator variable
ceph: use dedicated list iterator variable
ceph: update the dlease for the hashed dentry when removing
ceph: stop retrying the request when exceeding 256 times
ceph: stop forwarding the request when exceeding 256 times
...
There are session cleanup problems in ax25_release() and
ax25_disconnect(). If we setup a session and then disconnect,
the disconnected session is still in "LISTENING" state that
is shown below.
Active AX.25 sockets
Dest Source Device State Vr/Vs Send-Q Recv-Q
DL9SAU-4 DL9SAU-3 ??? LISTENING 000/000 0 0
DL9SAU-3 DL9SAU-4 ??? LISTENING 000/000 0 0
The first reason is caused by del_timer_sync() in ax25_release().
The timers of ax25 are used for correct session cleanup. If we use
ax25_release() to close ax25 sessions and ax25_dev is not null,
the del_timer_sync() functions in ax25_release() will execute.
As a result, the sessions could not be cleaned up correctly,
because the timers have stopped.
In order to solve this problem, this patch adds a device_up flag
in ax25_dev in order to judge whether the device is up. If there
are sessions to be cleaned up, the del_timer_sync() in
ax25_release() will not execute. What's more, we add ax25_cb_del()
in ax25_kill_by_device(), because the timers have been stopped
and there are no functions that could delete ax25_cb if we do not
call ax25_release(). Finally, we reorder the position of
ax25_list_lock in ax25_cb_del() in order to synchronize among
different functions that call ax25_cb_del().
The second reason is caused by improper check in ax25_disconnect().
The incoming ax25 sessions which ax25->sk is null will close
heartbeat timer, because the check "if(!ax25->sk || ..)" is
satisfied. As a result, the session could not be cleaned up properly.
In order to solve this problem, this patch changes the improper
check to "if(ax25->sk && ..)" in ax25_disconnect().
What`s more, the ax25_disconnect() may be called twice, which is
not necessary. For example, ax25_kill_by_device() calls
ax25_disconnect() and sets ax25->state to AX25_STATE_0, but
ax25_release() calls ax25_disconnect() again.
In order to solve this problem, this patch add a check in
ax25_release(). If the flag of ax25->sk equals to SOCK_DEAD,
the ax25_disconnect() in ax25_release() should not be executed.
Fixes: 82e31755e5 ("ax25: Fix UAF bugs in ax25 timers")
Fixes: 8a367e74c0 ("ax25: Fix segfault after sock connection timeout")
Reported-and-tested-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20220530152158.108619-1-duoming@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Steffen Klassert says:
====================
ipsec 2022-06-01
1) Revert "net: af_key: add check for pfkey_broadcast in function pfkey_process"
From Michal Kubecek.
2) Don't set IPv4 DF bit when encapsulating IPv6 frames below 1280 bytes.
From Maciej Żenczykowski.
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: do not set IPv4 DF flag when encapsulating IPv6 frames <= 1280 bytes.
Revert "net: af_key: add check for pfkey_broadcast in function pfkey_process"
====================
Link: https://lore.kernel.org/r/20220601103349.2297361-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
First set of fixes for v5.19. Build fixes for iwlwifi and libertas, a
scheduling while atomic fix for rtw88 and use-after-free fix for
mac80211.
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmKXSCgRHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZvWpAf9G7EudAQVOCfP4ogJ89m60U5S3ui4HbSG
5hyTwxUnzdrngYmykSEOsChX4inhZOPGiGBD1652sy82NQpD6e8lxRsBQUQeb8lm
RTn4PBY9sWVJo7Qe1oLXLv6yKApAdiUlVtWJNZDjotOZUxd7KubCQWcTz4lNQ/Mu
AG5zvWuCDP+McY9r6rqwhI3+ql/gsewbcONHd9nZufQMydDtxE39lk5GKFvPJgg2
F6sAYPP/O243nVDUOmg73NrnlU+2tWwamCUp8v4dSlQpfXRd+G65k5AKruz5AXmJ
zQ9rQkCe1aCvyPOtRa6Q2uAsi553NaCDudHHRgxV8+qP7phXWOpV/g==
=Cly8
-----END PGP SIGNATURE-----
Merge tag 'wireless-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v5.19
First set of fixes for v5.19. Build fixes for iwlwifi and libertas, a
scheduling while atomic fix for rtw88 and use-after-free fix for
mac80211.
* tag 'wireless-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: fix use-after-free in chanctx code
wifi: rtw88: add a work to correct atomic scheduling warning of ::set_tim
wifi: iwlwifi: pcie: rename CAUSE macro
wifi: libertas: use variable-size data in assoc req/resp cmd
====================
Link: https://lore.kernel.org/r/20220601110741.90B28C385A5@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The tcf_ct_flow_table_fill_tuple_ipv6() function is supposed to return
false on failure. It should not return negatives because that means
succes/true.
Fixes: fcb6aa8653 ("act_ct: Support GRE offload")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Link: https://lore.kernel.org/r/YpYFnbDxFl6tQ3Bn@kili
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When passing interface parameter to ping -6:
$ ping -6 ::11:141:84:9 -I eth2
Results in:
PING ::11:141:84:10(::11:141:84:10) from ::11:141:84:9 eth2: 56 data bytes
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
Initialize the fl6's outgoing interface (OIF) before triggering
ip6_datagram_send_ctl. Don't wipe fl6 after ip6_datagram_send_ctl() as
changes in fl6 that may happen in the function are overwritten explicitly.
Update comment accordingly.
Fixes: 13651224c0 ("net: ping6: support setting basic SOL_IPV6 options via cmsg")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220531084544.15126-1-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In ieee80211_vif_use_reserved_context(), when we have an
old context and the new context's replace_state is set to
IEEE80211_CHANCTX_REPLACE_NONE, we free the old context
in ieee80211_vif_use_reserved_reassign(). Therefore, we
cannot check the old_ctx anymore, so we should set it to
NULL after this point.
However, since the new_ctx replace state is clearly not
IEEE80211_CHANCTX_REPLACES_OTHER, we're not going to do
anything else in this function and can just return to
avoid accessing the freed old_ctx.
Cc: stable@vger.kernel.org
Fixes: 5bcae31d9c ("mac80211: implement multi-vif in-place reservations")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220601091926.df419d91b165.I17a9b3894ff0b8323ce2afdb153b101124c821e5@changeid
Laurent reported the enclosed report [1]
This bug triggers with following coditions:
0) Kernel built with CONFIG_DEBUG_PREEMPT=y
1) A new passive FastOpen TCP socket is created.
This FO socket waits for an ACK coming from client to be a complete
ESTABLISHED one.
2) A socket operation on this socket goes through lock_sock()
release_sock() dance.
3) While the socket is owned by the user in step 2),
a retransmit of the SYN is received and stored in socket backlog.
4) At release_sock() time, the socket backlog is processed while
in process context.
5) A SYNACK packet is cooked in response of the SYN retransmit.
6) -> tcp_rtx_synack() is called in process context.
Before blamed commit, tcp_rtx_synack() was always called from BH handler,
from a timer handler.
Fix this by using TCP_INC_STATS() & NET_INC_STATS()
which do not assume caller is in non preemptible context.
[1]
BUG: using __this_cpu_add() in preemptible [00000000] code: epollpep/2180
caller is tcp_rtx_synack.part.0+0x36/0xc0
CPU: 10 PID: 2180 Comm: epollpep Tainted: G OE 5.16.0-0.bpo.4-amd64 #1 Debian 5.16.12-1~bpo11+1
Hardware name: Supermicro SYS-5039MC-H8TRF/X11SCD-F, BIOS 1.7 11/23/2021
Call Trace:
<TASK>
dump_stack_lvl+0x48/0x5e
check_preemption_disabled+0xde/0xe0
tcp_rtx_synack.part.0+0x36/0xc0
tcp_rtx_synack+0x8d/0xa0
? kmem_cache_alloc+0x2e0/0x3e0
? apparmor_file_alloc_security+0x3b/0x1f0
inet_rtx_syn_ack+0x16/0x30
tcp_check_req+0x367/0x610
tcp_rcv_state_process+0x91/0xf60
? get_nohz_timer_target+0x18/0x1a0
? lock_timer_base+0x61/0x80
? preempt_count_add+0x68/0xa0
tcp_v4_do_rcv+0xbd/0x270
__release_sock+0x6d/0xb0
release_sock+0x2b/0x90
sock_setsockopt+0x138/0x1140
? __sys_getsockname+0x7e/0xc0
? aa_sk_perm+0x3e/0x1a0
__sys_setsockopt+0x198/0x1e0
__x64_sys_setsockopt+0x21/0x30
do_syscall_64+0x38/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 168a8f5805 ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Laurent Fasnacht <laurent.fasnacht@proton.ch>
Acked-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20220530213713.601888-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- New Features:
- Add support for 'dacl' and 'sacl' attributes
- Bugfixes and Cleanups:
- Fixes for reporting mapping errors
- Fixes for memory allocation errors
- Improve warning message when locks are lost
- Update documentation for the nfs4_unique_id parameter
- Add an explanation of NFSv4 client identifiers
- Ensure the i_size attribute is written to the fscache storage
- Fix freeing uninitialized nfs4_labels
- Better handling when xprtrdma bc_serv is NULL
- Marke qualified async operations as MOVEABLE tasks
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmKWhFQACgkQ18tUv7Cl
QOszjRAAllmtKLbzOkwQwcT3e5ljh9NEJ8NL+Nv1SXjozpFY+1fuXc0ivT4rniU6
68ZHz+faK2UtLytwOO94M0jo2RCAlYS5rfnts89CpdfP3bqmGPAj0Ytw/c/vg+Qf
4eQbAzz++T35DgU7cdeKKZKg9Wtwbq7g0kYv1W8QCiCbxakSjnc/V9Ll5XhS/CAC
1WqKD90TRKUkX0Y1NNsNdXB1dJn/6QAq9B6JTjan+2Rhn7/NCTU8p98mEZGcVD7r
cPHyXTqkPF4IH7lgjEMIRf6eXEzDDZNIs98QLdHJ2Gk0LxW7p7IL7VW8TKzYunvl
coA1bZfYhUZBUJ+eDrrKZ5hHMSn/+eNR5iiIcfqtyU8o3J0NXAXGlLh/iJSGsxIH
PjyjWSfpCgoZVPc4dG3lxR9Iu7UZeAuuB2ZoiNakUkd+UNKK5U5PpaPnYT6adaIp
TegivZclCmgyLQiAdPRifDzhaL5J2pp6kVb5iMY6oX+ObyclW/UcqzKMqIKSt3R8
6+JAmZ6633ojS4r3xFsw/dlEUWuuVq7kYwXK209LqiBn5vvjWNa/WgH4MaSfnJe9
rlw+fs8Aky0w59IhzRJMMVCJ/Q2EYDKmtQLQgYVw80RBFiFgBpMW0wDqMGiddTcu
1IZ2c5+t1GxfASpyu8miexQjRJW6A2MTp0gfHGiHarxdCpaAycA=
=0ccI
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Add support for 'dacl' and 'sacl' attributes
Bugfixes and Cleanups:
- Fixes for reporting mapping errors
- Fixes for memory allocation errors
- Improve warning message when locks are lost
- Update documentation for the nfs4_unique_id parameter
- Add an explanation of NFSv4 client identifiers
- Ensure the i_size attribute is written to the fscache storage
- Fix freeing uninitialized nfs4_labels
- Better handling when xprtrdma bc_serv is NULL
- Mark qualified async operations as MOVEABLE tasks"
* tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4.1 mark qualified async operations as MOVEABLE tasks
xprtrdma: treat all calls not a bcall when bc_serv is NULL
NFSv4: Fix free of uninitialized nfs4_label on referral lookup.
NFS: Pass i_size to fscache_unuse_cookie() when a file is released
Documentation: Add an explanation of NFSv4 client identifiers
NFS: update documentation for the nfs4_unique_id parameter
NFS: Improve warning message when locks are lost.
NFSv4.1: Enable access to the NFSv4.1 'dacl' and 'sacl' attributes
NFSv4: Add encoders/decoders for the NFSv4.1 dacl and sacl attributes
NFSv4: Specify the type of ACL to cache
NFSv4: Don't hold the layoutget locks across multiple RPC calls
pNFS/files: Fall back to I/O through the MDS on non-fatal layout errors
NFS: Further fixes to the writeback error handling
NFSv4/pNFS: Do not fail I/O when we fail to allocate the pNFS layout
NFS: Memory allocation failures are not server fatal errors
NFS: Don't report errors from nfs_pageio_complete() more than once
NFS: Do not report flush errors in nfs_write_end()
NFS: Don't report ENOSPC write errors twice
NFS: fsync() should report filesystem errors over EINTR/ERESTARTSYS
NFS: Do not report EINTR/ERESTARTSYS as mapping errors
For snat and dnat cases, the saddr should be taken from reverse tuple.
Fixes: 3412e16418 (netfilter: flowtable: nft_flow_route use more data for reverse route)
Signed-off-by: wenxu <wenxu@chinatelecom.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The nf_flow_table gets route through ip_route_output_key. If the saddr
is not local one, then FLOWI_FLAG_ANYSRC flag should be set. Without
this flag, the route lookup for other_dst will fail.
Fixes: 3412e16418 (netfilter: flowtable: nft_flow_route use more data for reverse route)
Signed-off-by: wenxu <wenxu@chinatelecom.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
__nft_release_hooks() is called from pre_netns exit path which
unregisters the hooks, then the NETDEV_UNREGISTER event is triggered
which unregisters the hooks again.
[ 565.221461] WARNING: CPU: 18 PID: 193 at net/netfilter/core.c:495 __nf_unregister_net_hook+0x247/0x270
[...]
[ 565.246890] CPU: 18 PID: 193 Comm: kworker/u64:1 Tainted: G E 5.18.0-rc7+ #27
[ 565.253682] Workqueue: netns cleanup_net
[ 565.257059] RIP: 0010:__nf_unregister_net_hook+0x247/0x270
[...]
[ 565.297120] Call Trace:
[ 565.300900] <TASK>
[ 565.304683] nf_tables_flowtable_event+0x16a/0x220 [nf_tables]
[ 565.308518] raw_notifier_call_chain+0x63/0x80
[ 565.312386] unregister_netdevice_many+0x54f/0xb50
Unregister and destroy netdev hook from netns pre_exit via kfree_rcu
so the NETDEV_UNREGISTER path see unregistered hooks.
Fixes: 767d1216bf ("netfilter: nftables: fix possible UAF over chains from packet path in netns")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
clean_net() runs in workqueue while walking over the lists, grab mutex.
Fixes: 767d1216bf ("netfilter: nftables: fix possible UAF over chains from packet path in netns")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add several sanity checks for nft_set_desc_concat_parse():
- validate desc->field_count not larger than desc->field_len array.
- field length cannot be larger than desc->field_len (ie. U8_MAX)
- total length of the concatenation cannot be larger than register array.
Joint work with Florian Westphal.
Fixes: f3a2181e16 ("netfilter: nf_tables: Support for sets with multiple ranged fields")
Reported-by: <zhangziming.zzm@antgroup.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
RFC 9131 changes default behaviour of handling RX of NA messages when the
corresponding entry is absent in the neighbour cache. The current
implementation is limited to accept just unsolicited NAs. However, the
RFC is more generic where it also accepts solicited NAs. Both types
should result in adding a STALE entry for this case.
Expand accept_untracked_na behaviour to also accept solicited NAs to
be compliant with the RFC and rename the sysctl knob to
accept_untracked_na.
Fixes: f9a2fb7331 ("net/ipv6: Introduce accept_unsolicited_na knob to implement router-side changes for RFC9131")
Signed-off-by: Arun Ajith S <aajith@arista.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220530101414.65439-1-aajith@arista.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fix the following build warning when CONFIG_IPV6 is not set:
In function ‘fortify_memcpy_chk’,
inlined from ‘tcp_md5_do_add’ at net/ipv4/tcp_ipv4.c:1210:2:
./include/linux/fortify-string.h:328:4: error: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
328 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: huhai <huhai@kylinos.cn>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220526101213.2392980-1-zhanggenjian@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmKSSbcTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXgJyCACeyMOcFws5lyqqdk0R0zGr2KFfKsJn
YQR9nvldT2p/1y0ykvU208UIq0HHmXOb9pD8gOUzGYGp4XlEaC1f4V37mmzgLcRu
vL/HcFqBl2cQEfaQxiXZrmsIIVszwbc57EGqpl93cS2er4hp/NXmredKCId7Mpt8
FjxjgVGzdhEUKbJZYjkDM5pYAnJ9QVwuK3MaarKMK86Oj1P5YtKgIb4ZSt/NHvsC
Mukx3nivSH29XfK3fRsFDJUQr9WNYh1cmTtyhB0tWVXQCYFc4angZRtCJwyXzkp2
P5GBIQoMZcXX2XWkUBTtA1w5g/aZZsBExb3YGhQjsQP+jb6MtDnvOEo9
=Z2E+
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20220528' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Harden hv_sock driver (Andrea Parri)
- Harden Hyper-V PCI driver (Andrea Parri)
- Fix multi-MSI for Hyper-V PCI driver (Jeffrey Hugo)
- Fix Hyper-V PCI to reduce boot time (Dexuan Cui)
- Remove code for long EOL'ed Hyper-V versions (Michael Kelley, Saurabh
Sengar)
- Fix balloon driver error handling (Shradha Gupta)
- Fix a typo in vmbus driver (Julia Lawall)
- Ignore vmbus IMC device (Michael Kelley)
- Add a new error message to Hyper-V DRM driver (Saurabh Sengar)
* tag 'hyperv-next-signed-20220528' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (28 commits)
hv_balloon: Fix balloon_probe() and balloon_remove() error handling
scsi: storvsc: Removing Pre Win8 related logic
Drivers: hv: vmbus: fix typo in comment
PCI: hv: Fix synchronization between channel callback and hv_pci_bus_exit()
PCI: hv: Add validation for untrusted Hyper-V values
PCI: hv: Fix interrupt mapping for multi-MSI
PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()
drm/hyperv: Remove support for Hyper-V 2008 and 2008R2/Win7
video: hyperv_fb: Remove support for Hyper-V 2008 and 2008R2/Win7
scsi: storvsc: Remove support for Hyper-V 2008 and 2008R2/Win7
Drivers: hv: vmbus: Remove support for Hyper-V 2008 and Hyper-V 2008R2/Win7
x86/hyperv: Disable hardlockup detector by default in Hyper-V guests
drm/hyperv: Add error message for fb size greater than allocated
PCI: hv: Do not set PCI_COMMAND_MEMORY to reduce VM boot time
PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI
Drivers: hv: vmbus: Refactor the ring-buffer iterator functions
Drivers: hv: vmbus: Accept hv_sock offers in isolated guests
hv_sock: Add validation for untrusted Hyper-V values
hv_sock: Copy packets sent by Hyper-V out of the ring buffer
hv_sock: Check hv_pkt_iter_first_raw()'s return value
...
Use ida_alloc()/ida_free() instead of deprecated
ida_simple_get()/ida_simple_remove() .
Signed-off-by: keliu <liuke94@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"struct smc_cdc_tx_pend **" can not directly convert
to "struct smc_wr_tx_pend_priv *".
Fixes: 2bced6aefa ("net/smc: put slot when connection is killed")
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reports:
BUG: KASAN: slab-out-of-bounds in __list_del_entry_valid+0xcc/0xf0 lib/list_debug.c:42
[..]
list_del include/linux/list.h:148 [inline]
cttimeout_net_exit+0x211/0x540 net/netfilter/nfnetlink_cttimeout.c:617
No reproducer so far. Looking at recent changes in this area
its clear that the free_head must not be at the end of the
structure because nf_ct_timeout structure has variable size.
Reported-by: <syzbot+92968395eedbdbd3617d@syzkaller.appspotmail.com>
Fixes: 78222bacfc ("netfilter: cttimeout: decouple unlink and free on netns destruction")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
syzbot reports following warn:
WARNING: CPU: 0 PID: 3600 at net/netfilter/nfnetlink.c:703 nfnetlink_unbind+0x357/0x3b0 net/netfilter/nfnetlink.c:694
The syzbot generated program does this:
socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER) = 3
setsockopt(3, SOL_NETLINK, NETLINK_DROP_MEMBERSHIP, [1], 4) = 0
... which triggers 'WARN_ON_ONCE(nfnlnet->ctnetlink_listeners == 0)' check.
Instead of counting, just enable reporting for every bind request
and check if we still have listeners on unbind.
While at it, also add the needed bounds check on nfnl_group2type[]
access.
Reported-by: <syzbot+4903218f7fba0a2d6226@syzkaller.appspotmail.com>
Reported-by: <syzbot+afd2d80e495f96049571@syzkaller.appspotmail.com>
Fixes: 2794cdb0b9 ("netfilter: nfnetlink: allow to detect if ctnetlink listeners exist")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Instead of a virtual kernel address use a pointer of the associated
struct page as second parameter of gnttab_end_foreign_access().
Most users have that pointer available already and are creating the
virtual address from it, risking problems in case the memory is
located in highmem.
gnttab_end_foreign_access() itself won't need to get the struct page
from the address again.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Small collection of incremental improvement patches:
- Minor code cleanup patches, comment improvements, etc from static tools
- Clean the some of the kernel caps, reducing the historical stealth uAPI
leftovers
- Bug fixes and minor changes for rdmavt, hns, rxe, irdma
- Remove unimplemented cruft from rxe
- Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer
- flush_workqueue(system_unbound_wq) removal
- Ensure rxe waits for objects to be unused before allowing the core to
free them
- Several rc quality bug fixes for hfi1
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCYo+NxgAKCRCFwuHvBreF
YbqSAQDJ+QolaATUvOQUPLbuLopUCJLe95VS15Kl3SNXiVUUFAEA8DLL1s6+WShd
AgypUxGHipx5BAytrn45/WiwuDeEbQ8=
=jgTl
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Small collection of incremental improvement patches:
- Minor code cleanup patches, comment improvements, etc from static
tools
- Clean the some of the kernel caps, reducing the historical stealth
uAPI leftovers
- Bug fixes and minor changes for rdmavt, hns, rxe, irdma
- Remove unimplemented cruft from rxe
- Reorganize UMR QP code in mlx5 to avoid going through the IB verbs
layer
- flush_workqueue(system_unbound_wq) removal
- Ensure rxe waits for objects to be unused before allowing the core
to free them
- Several rc quality bug fixes for hfi1"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits)
RDMA/rtrs-clt: Fix one kernel-doc comment
RDMA/hfi1: Remove all traces of diagpkt support
RDMA/hfi1: Consolidate software versions
RDMA/hfi1: Remove pointless driver version
RDMA/hfi1: Fix potential integer multiplication overflow errors
RDMA/hfi1: Prevent panic when SDMA is disabled
RDMA/hfi1: Prevent use of lock before it is initialized
RDMA/rxe: Fix an error handling path in rxe_get_mcg()
IB/core: Fix typo in comment
RDMA/core: Fix typo in comment
IB/hf1: Fix typo in comment
IB/qib: Fix typo in comment
IB/iser: Fix typo in comment
RDMA/mlx4: Avoid flush_scheduled_work() usage
IB/isert: Avoid flush_scheduled_work() usage
RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr()
RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq()
RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx()
RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx()
RDMA/irdma: Add SW mechanism to generate completions on error
...
We introduce "courteous server" in this release. Previously NFSD
would purge open and lock state for an unresponsive client after
one lease period (typically 90 seconds). Now, after one lease
period, another client can open and lock those files and the
unresponsive client's lease is purged; otherwise if the unrespon-
sive client's open and lock state is uncontended, the server retains
that open and lock state for up to 24 hours, allowing the client's
workload to resume after a lengthy network partition.
A longstanding issue with NFSv4 file creation is also addressed.
Previously a file creation can fail internally, returning an error
to the client, but leave the newly created file in place as an
artifact. The file creation code path has been reorganized so that
internal failures and race conditions are less likely to result in
an unwanted file creation.
A fault injector has been added to help exercise paths that are run
during kernel metadata cache invalidation. These caches contain
information maintained by user space about exported filesystems.
Many of our test workloads do not trigger cache invalidation.
There is one patch that is needed to support PREEMPT_RT and a fix
for an ancient "sleep while spin-locked" splat that seems to have
become easier to hit since v5.18-rc3.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmKPliAACgkQM2qzM29m
f5dB3BAAorPa2L8xu5P1Ge1oTNogNSOVRkLPDzEkfEwK07ZM2qvz78eMZGkMziJ/
strorvBWl3SWBlVtTePgNpJUjgYQ75MRRwaX7Qh2WuHeRKm1JlZm0/NId3+zKgbh
N40QI20jdswWcNDuhidxVFFWurd09GlcM4z1cu8gZLbfthkiUOjZoPiLkXeNcvhk
7wC9GiueWxHefYQQDAKh1nQS/L0GG1EkzJdJo7WUVAldZ9qVY9LpmJVMRqrBBbta
XrFYfpeY1zFFDY4Qolyz5PUJSeQuDj9PctlhoZ6B1hp56PD/6yaqVhYXiPxtlALj
tITtktfiekULZkgfvfvyzssCv+wkbYiaEBZcSSCauR7dkGOmBmajO+cf7vpsERgE
fbCU8DWGk78SMeehdCrO+26cV37VP+8c2t2Txq/rG5Eq4ZoCi++Hj5poRboFLqb+
oom+0Ee0LfcAKXkxH5gWTPTblHo49GzGitPZtRzTgZ9uFnVwvEaJ4+t0ij0J8JpL
HuVtWrg5/REhqpEvOSwF0sRmkYWLTu7KdueGn/iZ8xUi7GHEue01NsVkClohKJcR
WOjWrbNCNF/LJaG88MX0z5u7IO7s9bOHphd7PJ92vR+4YsehW3uRhk+rNi2ZBqQz
hzULfu8BiaicV9fdB/hDcMmKQD6U6due2AVVPtxTf5XY+CHQNRY=
=phE1
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"We introduce 'courteous server' in this release. Previously NFSD would
purge open and lock state for an unresponsive client after one lease
period (typically 90 seconds). Now, after one lease period, another
client can open and lock those files and the unresponsive client's
lease is purged; otherwise if the unresponsive client's open and lock
state is uncontended, the server retains that open and lock state for
up to 24 hours, allowing the client's workload to resume after a
lengthy network partition.
A longstanding issue with NFSv4 file creation is also addressed.
Previously a file creation can fail internally, returning an error to
the client, but leave the newly created file in place as an artifact.
The file creation code path has been reorganized so that internal
failures and race conditions are less likely to result in an unwanted
file creation.
A fault injector has been added to help exercise paths that are run
during kernel metadata cache invalidation. These caches contain
information maintained by user space about exported filesystems. Many
of our test workloads do not trigger cache invalidation.
There is one patch that is needed to support PREEMPT_RT and a fix for
an ancient 'sleep while spin-locked' splat that seems to have become
easier to hit since v5.18-rc3"
* tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (36 commits)
NFSD: nfsd_file_put() can sleep
NFSD: Add documenting comment for nfsd4_release_lockowner()
NFSD: Modernize nfsd4_release_lockowner()
NFSD: Fix possible sleep during nfsd4_release_lockowner()
nfsd: destroy percpu stats counters after reply cache shutdown
nfsd: Fix null-ptr-deref in nfsd_fill_super()
nfsd: Unregister the cld notifier when laundry_wq create failed
SUNRPC: Use RMW bitops in single-threaded hot paths
NFSD: Clean up the show_nf_flags() macro
NFSD: Trace filecache opens
NFSD: Move documenting comment for nfsd4_process_open2()
NFSD: Fix whitespace
NFSD: Remove dprintk call sites from tail of nfsd4_open()
NFSD: Instantiate a struct file when creating a regular NFSv4 file
NFSD: Clean up nfsd_open_verified()
NFSD: Remove do_nfsd_create()
NFSD: Refactor NFSv4 OPEN(CREATE)
NFSD: Refactor NFSv3 CREATE
NFSD: Refactor nfsd_create_setattr()
NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
...
When cloning a packet-based limit expression, copy the cost value as
well. Otherwise the new limit is not functional anymore.
Fixes: 3b9e2ea6c1 ("netfilter: nft_limit: move stateful fields out of expression data")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since 3e135cd499 ("netfilter: nft_dynset: dynamic stateful expression
instantiation"), it is possible to attach stateful expressions to set
elements.
cd5125d8f5 ("netfilter: nf_tables: split set destruction in deactivate
and destroy phase") introduces conditional destruction on the object to
accomodate transaction semantics.
nft_expr_init() calls expr->ops->init() first, then check for
NFT_STATEFUL_EXPR, this stills allows to initialize a non-stateful
lookup expressions which points to a set, which might lead to UAF since
the set is not properly detached from the set->binding for this case.
Anyway, this combination is non-sense from nf_tables perspective.
This patch fixes this problem by checking for NFT_STATEFUL_EXPR before
expr->ops->init() is called.
The reporter provides a KASAN splat and a poc reproducer (similar to
those autogenerated by syzbot to report use-after-free errors). It is
unknown to me if they are using syzbot or if they use similar automated
tool to locate the bug that they are reporting.
For the record, this is the KASAN splat.
[ 85.431824] ==================================================================
[ 85.432901] BUG: KASAN: use-after-free in nf_tables_bind_set+0x81b/0xa20
[ 85.433825] Write of size 8 at addr ffff8880286f0e98 by task poc/776
[ 85.434756]
[ 85.434999] CPU: 1 PID: 776 Comm: poc Tainted: G W 5.18.0+ #2
[ 85.436023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Fixes: 0b2d8a7b63 ("netfilter: nf_tables: add helper functions for expression handling")
Reported-and-tested-by: Aaron Adams <edg-e@nccgroup.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Yuwei reported that plain reuse of DELAY_PROBE_TIME to rearm work queue
in neigh_managed_work is problematic if user explicitly configures the
DELAY_PROBE_TIME to 0 for a neighbor table. Such misconfig can then hog
CPU to 100% processing the system work queue. Instead, set lower interval
bound to HZ which is totally sufficient. Yuwei is additionally looking
into making the interval separately configurable from DELAY_PROBE_TIME.
Reported-by: Yuwei Wang <wangyuweihx@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/netdev/797c3c53-ce1b-9f60-e253-cda615788f4a@iogearbox.net
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/3b8c5aa906c52c3a8c995d1b2e8ccf650ea7c716.1653432794.git.daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the process of checking whether RDMAv2 is available, the current
implementation first sets ini->smcrv2.ib_dev_v2, and then allocates
smc buf desc and register rmb, but the latter may fail. In this case,
the pointer should be reset.
Fixes: e49300a6bf ("net/smc: add listen processing for SMC-Rv2")
Signed-off-by: liuyacan <liuyacan@corp.netease.com>
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20220525085408.812273-1-liuyacan@corp.netease.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Core
----
- Support TCPv6 segmentation offload with super-segments larger than
64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).
- Generalize skb freeing deferral to per-cpu lists, instead of
per-socket lists.
- Add a netdev statistic for packets dropped due to L2 address
mismatch (rx_otherhost_dropped).
- Continue work annotating skb drop reasons.
- Accept alternative netdev names (ALT_IFNAME) in more netlink
requests.
- Add VLAN support for AF_PACKET SOCK_RAW GSO.
- Allow receiving skb mark from the socket as a cmsg.
- Enable memcg accounting for veth queues, sysctl tables and IPv6.
BPF
---
- Add libbpf support for User Statically-Defined Tracing (USDTs).
- Speed up symbol resolution for kprobes multi-link attachments.
- Support storing typed pointers to referenced and unreferenced
objects in BPF maps.
- Add support for BPF link iterator.
- Introduce access to remote CPU map elements in BPF per-cpu map.
- Allow middle-of-the-road settings for the
kernel.unprivileged_bpf_disabled sysctl.
- Implement basic types of dynamic pointers e.g. to allow for
dynamically sized ringbuf reservations without extra memory copies.
Protocols
---------
- Retire port only listening_hash table, add a second bind table
hashed by port and address. Avoid linear list walk when binding
to very popular ports (e.g. 443).
- Add bridge FDB bulk flush filtering support allowing user space
to remove all FDB entries matching a condition.
- Introduce accept_unsolicited_na sysctl for IPv6 to implement
router-side changes for RFC9131.
- Support for MPTCP path manager in user space.
- Add MPTCP support for fallback to regular TCP for connections
that have never connected additional subflows or transmitted
out-of-sequence data (partial support for RFC8684 fallback).
- Avoid races in MPTCP-level window tracking, stabilize and improve
throughput.
- Support lockless operation of GRE tunnels with seq numbers enabled.
- WiFi support for host based BSS color collision detection.
- Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.
- Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).
- Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).
- Allow matching on the number of VLAN tags via tc-flower.
- Add tracepoint for tcp_set_ca_state().
Driver API
----------
- Improve error reporting from classifier and action offload.
- Add support for listing line cards in switches (devlink).
- Add helpers for reporting page pool statistics with ethtool -S.
- Add support for reading clock cycles when using PTP virtual clocks,
instead of having the driver convert to time before reporting.
This makes it possible to report time from different vclocks.
- Support configuring low-latency Tx descriptor push via ethtool.
- Separate Clause 22 and Clause 45 MDIO accesses more explicitly.
New hardware / drivers
----------------------
- Ethernet:
- Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
- Sunplus SP7021 SoC (sp7021_emac)
- Add support for Renesas RZ/V2M (in ravb)
- Add support for MediaTek mt7986 switches (in mtk_eth_soc)
- Ethernet PHYs:
- ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
- TI DP83TD510 PHY
- Microchip LAN8742/LAN88xx PHYs
- WiFi:
- Driver for pureLiFi X, XL, XC devices (plfxlc)
- Driver for Silicon Labs devices (wfx)
- Support for WCN6750 (in ath11k)
- Support Realtek 8852ce devices (in rtw89)
- Mobile:
- MediaTek T700 modems (Intel 5G 5000 M.2 cards)
- CAN:
- ctucanfd: add support for CTU CAN FD open-source IP core
from Czech Technical University in Prague
Drivers
-------
- Delete a number of old drivers still using virt_to_bus().
- Ethernet NICs:
- intel: support TSO on tunnels MPLS
- broadcom: support multi-buffer XDP
- nfp: support VF rate limiting
- sfc: use hardware tx timestamps for more than PTP
- mlx5: multi-port eswitch support
- hyper-v: add support for XDP_REDIRECT
- atlantic: XDP support (including multi-buffer)
- macb: improve real-time perf by deferring Tx processing to NAPI
- High-speed Ethernet switches:
- mlxsw: implement basic line card information querying
- prestera: add support for traffic policing on ingress and egress
- Embedded Ethernet switches:
- lan966x: add support for packet DMA (FDMA)
- lan966x: add support for PTP programmable pins
- ti: cpsw_new: enable bc/mc storm prevention
- Qualcomm 802.11ax WiFi (ath11k):
- Wake-on-WLAN support for QCA6390 and WCN6855
- device recovery (firmware restart) support
- support setting Specific Absorption Rate (SAR) for WCN6855
- read country code from SMBIOS for WCN6855/QCA6390
- enable keep-alive during WoWLAN suspend
- implement remain-on-channel support
- MediaTek WiFi (mt76):
- support Wireless Ethernet Dispatch offloading packet movement
between the Ethernet switch and WiFi interfaces
- non-standard VHT MCS10-11 support
- mt7921 AP mode support
- mt7921 IPv6 NS offload support
- Ethernet PHYs:
- micrel: ksz9031/ksz9131: cabletest support
- lan87xx: SQI support for T1 PHYs
- lan937x: add interrupt support for link detection
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmKNMPQACgkQMUZtbf5S
IrsRARAAuDyYs6jFYB3p+xazZdOnbF4iAgVv71+DQGvmsCl6CB9OrsNZMlvE85OL
Q3gjcRbgjrkN4lhgI8DmiGYbsUJnAvVjFdNjccz1Z/vTLYvuIM0ol54MUp5S+9WY
StncOJkOGJxxR/Gi5gzVmejPDsysU3Jik+hm/fpIcz8pybXxAsFKU5waY5qfl+/T
TZepfV0VCfqRDjqcF1qA5+jJZNU8pdodQlZ1+mh8bwu6Jk1ZkWkj6Ov8MWdwQldr
LnPeK/9hIGzkdJYHZfajxA3t8D0K5CHzSuih2bJ9ry8ZXgVBkXEThew778/R5izW
uB0YZs9COFlrIP7XHjtRTy/2xHOdYIPlj2nWhVdfuQDX8Crvt4VRN6EZ1rjko1ZJ
WanfG6WHF8NH5pXBRQbh3kIMKBnYn6OIzuCfCQSqd+niHcxFIM4vRiggeXI5C5TW
vJgEWfK6X+NfDiFVa3xyCrEmp5ieA/pNecpwd8rVkql+MtFAAw4vfsotLKOJEAru
J/XL6UE+YuLqIJV9ACZ9x1AFXXAo661jOxBunOo4VXhXVzWS9lYYz5r5ryIkgT/8
/Fr0zjANJWgfIuNdIBtYfQ4qG+LozGq038VA06RhFUAZ5tF9DzhqJs2Q2AFuWWBC
ewCePJVqo1j2Ceq2mGonXRt47OEnlePoOxTk9W+cKZb7ZWE+zEo=
=Wjii
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core
----
- Support TCPv6 segmentation offload with super-segments larger than
64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).
- Generalize skb freeing deferral to per-cpu lists, instead of
per-socket lists.
- Add a netdev statistic for packets dropped due to L2 address
mismatch (rx_otherhost_dropped).
- Continue work annotating skb drop reasons.
- Accept alternative netdev names (ALT_IFNAME) in more netlink
requests.
- Add VLAN support for AF_PACKET SOCK_RAW GSO.
- Allow receiving skb mark from the socket as a cmsg.
- Enable memcg accounting for veth queues, sysctl tables and IPv6.
BPF
---
- Add libbpf support for User Statically-Defined Tracing (USDTs).
- Speed up symbol resolution for kprobes multi-link attachments.
- Support storing typed pointers to referenced and unreferenced
objects in BPF maps.
- Add support for BPF link iterator.
- Introduce access to remote CPU map elements in BPF per-cpu map.
- Allow middle-of-the-road settings for the
kernel.unprivileged_bpf_disabled sysctl.
- Implement basic types of dynamic pointers e.g. to allow for
dynamically sized ringbuf reservations without extra memory copies.
Protocols
---------
- Retire port only listening_hash table, add a second bind table
hashed by port and address. Avoid linear list walk when binding to
very popular ports (e.g. 443).
- Add bridge FDB bulk flush filtering support allowing user space to
remove all FDB entries matching a condition.
- Introduce accept_unsolicited_na sysctl for IPv6 to implement
router-side changes for RFC9131.
- Support for MPTCP path manager in user space.
- Add MPTCP support for fallback to regular TCP for connections that
have never connected additional subflows or transmitted
out-of-sequence data (partial support for RFC8684 fallback).
- Avoid races in MPTCP-level window tracking, stabilize and improve
throughput.
- Support lockless operation of GRE tunnels with seq numbers enabled.
- WiFi support for host based BSS color collision detection.
- Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.
- Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).
- Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).
- Allow matching on the number of VLAN tags via tc-flower.
- Add tracepoint for tcp_set_ca_state().
Driver API
----------
- Improve error reporting from classifier and action offload.
- Add support for listing line cards in switches (devlink).
- Add helpers for reporting page pool statistics with ethtool -S.
- Add support for reading clock cycles when using PTP virtual clocks,
instead of having the driver convert to time before reporting. This
makes it possible to report time from different vclocks.
- Support configuring low-latency Tx descriptor push via ethtool.
- Separate Clause 22 and Clause 45 MDIO accesses more explicitly.
New hardware / drivers
----------------------
- Ethernet:
- Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
- Sunplus SP7021 SoC (sp7021_emac)
- Add support for Renesas RZ/V2M (in ravb)
- Add support for MediaTek mt7986 switches (in mtk_eth_soc)
- Ethernet PHYs:
- ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
- TI DP83TD510 PHY
- Microchip LAN8742/LAN88xx PHYs
- WiFi:
- Driver for pureLiFi X, XL, XC devices (plfxlc)
- Driver for Silicon Labs devices (wfx)
- Support for WCN6750 (in ath11k)
- Support Realtek 8852ce devices (in rtw89)
- Mobile:
- MediaTek T700 modems (Intel 5G 5000 M.2 cards)
- CAN:
- ctucanfd: add support for CTU CAN FD open-source IP core from
Czech Technical University in Prague
Drivers
-------
- Delete a number of old drivers still using virt_to_bus().
- Ethernet NICs:
- intel: support TSO on tunnels MPLS
- broadcom: support multi-buffer XDP
- nfp: support VF rate limiting
- sfc: use hardware tx timestamps for more than PTP
- mlx5: multi-port eswitch support
- hyper-v: add support for XDP_REDIRECT
- atlantic: XDP support (including multi-buffer)
- macb: improve real-time perf by deferring Tx processing to NAPI
- High-speed Ethernet switches:
- mlxsw: implement basic line card information querying
- prestera: add support for traffic policing on ingress and egress
- Embedded Ethernet switches:
- lan966x: add support for packet DMA (FDMA)
- lan966x: add support for PTP programmable pins
- ti: cpsw_new: enable bc/mc storm prevention
- Qualcomm 802.11ax WiFi (ath11k):
- Wake-on-WLAN support for QCA6390 and WCN6855
- device recovery (firmware restart) support
- support setting Specific Absorption Rate (SAR) for WCN6855
- read country code from SMBIOS for WCN6855/QCA6390
- enable keep-alive during WoWLAN suspend
- implement remain-on-channel support
- MediaTek WiFi (mt76):
- support Wireless Ethernet Dispatch offloading packet movement
between the Ethernet switch and WiFi interfaces
- non-standard VHT MCS10-11 support
- mt7921 AP mode support
- mt7921 IPv6 NS offload support
- Ethernet PHYs:
- micrel: ksz9031/ksz9131: cabletest support
- lan87xx: SQI support for T1 PHYs
- lan937x: add interrupt support for link detection"
* tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1809 commits)
ptp: ocp: Add firmware header checks
ptp: ocp: fix PPS source selector debugfs reporting
ptp: ocp: add .init function for sma_op vector
ptp: ocp: vectorize the sma accessor functions
ptp: ocp: constify selectors
ptp: ocp: parameterize input/output sma selectors
ptp: ocp: revise firmware display
ptp: ocp: add Celestica timecard PCI ids
ptp: ocp: Remove #ifdefs around PCI IDs
ptp: ocp: 32-bit fixups for pci start address
Revert "net/smc: fix listen processing for SMC-Rv2"
ath6kl: Use cc-disable-warning to disable -Wdangling-pointer
selftests/bpf: Dynptr tests
bpf: Add dynptr data slices
bpf: Add bpf_dynptr_read and bpf_dynptr_write
bpf: Dynptr support for ring buffers
bpf: Add bpf_dynptr_from_mem for local dynptrs
bpf: Add verifier support for dynptrs
bpf: Suppress 'passing zero to PTR_ERR' warning
bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack
...
Fix the following coccicheck warning:
net/ceph/crush/mapper.c:1077:8-9: WARNING opportunity for swap()
by using swap() for the swapping of variable values and drop
the tmp variable that is not needed any more.
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This KUnit update for Linux 5.19-rc1 consists of several fixes, cleanups,
and enhancements to tests and framework:
- introduces _NULL and _NOT_NULL macros to pointer error checks
- reworks kunit_resource allocation policy to fix memory leaks when
caller doesn't specify free() function to be used when allocating
memory using kunit_add_resource() and kunit_alloc_resource() funcs.
- adds ability to specify suite-level init and exit functions
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmKLw4QACgkQCwJExA0N
Qxz9wRAA3PonJESDAFF2sXTDzQurEXdWoJHqNvO0JCObku8SDODEI7nozXOD0MBC
ASAXiX3HuNI0yESF27xECqu3xbe8KsYOtCN8vco/sYUroVGmzgAt/atsvrSUv2Oh
sEQbjrTMwkMUjL5ECvjR2dArd6bQew7PPBkl3HqOpyysL3b/EAMEAY0DmDXrrrwB
+oNvXGVAR1Tczg4ahcSSwDdZl1C41kREj5f8S/4+kohMdIjCUPWOAYnaWHpVdAOJ
C+LWkPSJ5IpgjU2urDX2kNfg32UxIJpFI009ovytBmwCbd+GEs24u7gtgtksPM2s
YypoPEqC40gxkbY99omojtADiDdZlKqlIipCTWYe/CpzgBD+WQ4PVqMGM4ZprP9w
Hrc6ulVmd8hZ4F9QQ3oN6W9L6pBCgdXtPPCsQtGoUTbw7r79BP67PjJ6Ko+usn3s
Jy0FR5LvzYBjykoJzKSIaJ8ONaX34DB6w5rB+q5mBGwPKPHWo3eAZVZDPEMVo3Z7
D9TW5UliGBt2y5YJZbPbSnhdJPMPHSK5ef9hIy0wYjVJFafirdgrQhgbWbVxalRT
eZz1edcs1sdU7GAzfMA/v+NqAAA3bFIUVr2b+GTc+4zzWhq+cwI2SNikgyhETv/f
xKq8Xek8EkOIdaa2lu9chTPT4sG7A6991EkRqfc7rL1IptkPiS8=
=DzVQ
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
"Several fixes, cleanups, and enhancements to tests and framework:
- introduce _NULL and _NOT_NULL macros to pointer error checks
- rework kunit_resource allocation policy to fix memory leaks when
caller doesn't specify free() function to be used when allocating
memory using kunit_add_resource() and kunit_alloc_resource() funcs.
- add ability to specify suite-level init and exit functions"
* tag 'linux-kselftest-kunit-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (41 commits)
kunit: tool: Use qemu-system-i386 for i386 runs
kunit: fix executor OOM error handling logic on non-UML
kunit: tool: update riscv QEMU config with new serial dependency
kcsan: test: use new suite_{init,exit} support
kunit: tool: Add list of all valid test configs on UML
kunit: take `kunit_assert` as `const`
kunit: tool: misc cleanups
kunit: tool: minor cosmetic cleanups in kunit_parser.py
kunit: tool: make parser stop overwriting status of suites w/ no_tests
kunit: tool: remove dead parse_crash_in_log() logic
kunit: tool: print clearer error message when there's no TAP output
kunit: tool: stop using a shell to run kernel under QEMU
kunit: tool: update test counts summary line format
kunit: bail out of test filtering logic quicker if OOM
lib/Kconfig.debug: change KUnit tests to default to KUNIT_ALL_TESTS
kunit: Rework kunit_resource allocation policy
kunit: fix debugfs code to use enum kunit_status, not bool
kfence: test: use new suite_{init/exit} support, add .kunitconfig
kunit: add ability to specify suite-level init and exit functions
kunit: rename print_subtest_{start,end} for clarity (s/subtest/suite)
...
One may want to have DF set on large packets to support discovering
path mtu and limiting the size of generated packets (hence not
setting the XFRM_STATE_NOPMTUDISC tunnel flag), while still
supporting networks that are incapable of carrying even minimal
sized IPv6 frames (post encapsulation).
Having IPv4 Don't Frag bit set on encapsulated IPv6 frames that
are not larger than the minimum IPv6 mtu of 1280 isn't useful,
because the resulting ICMP Fragmentation Required error isn't
actionable (even assuming you receive it) because IPv6 will not
drop it's path mtu below 1280 anyway. While the IPv4 stack
could prefrag the packets post encap, this requires the ICMP
error to be successfully delivered and causes a loss of the
original IPv6 frame (thus requiring a retransmit and latency
hit). Luckily with IPv4 if we simply don't set the DF flag,
we'll just make further fragmenting the packets some other
router's problems.
We'll still learn the correct IPv4 path mtu through encapsulation
of larger IPv6 frames.
I'm still not convinced this patch is entirely sufficient to make
everything happy... but I don't see how it could possibly
make things worse.
See also recent:
4ff2980b6b 'xfrm: fix tunnel model fragmentation behavior'
and friends
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lina Wang <lina.wang@mediatek.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Maciej Zenczykowski <maze@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This reverts commit 4dc2a5a8f6.
A non-zero return value from pfkey_broadcast() does not necessarily mean
an error occurred as this function returns -ESRCH when no registered
listener received the message. In particular, a call with
BROADCAST_PROMISC_ONLY flag and null one_sk argument can never return
zero so that this commit in fact prevents processing any PF_KEY message.
One visible effect is that racoon daemon fails to find encryption
algorithms like aes and refuses to start.
Excluding -ESRCH return value would fix this but it's not obvious that
we really want to bail out here and most other callers of
pfkey_broadcast() also ignore the return value. Also, as pointed out by
Steffen Klassert, PF_KEY is kind of deprecated and newer userspace code
should use netlink instead so that we should only disturb the code for
really important fixes.
v2: add a comment explaining why is the return value ignored
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmKKpM8ACgkQSfxwEqXe
A6726w/+OJimGd4arvpSmdn+vxepSyDLgKfwM0x5zprRVd16xg8CjJr4eMonTesq
YvtJRqpetb53MB+sMhutlvQqQzrjtf2MBkgPwF4I2gUrk7vLD45Q+AGdGhi/rUwz
wHGA7xg1FHLHia2M/9idSqi8QlZmUP4u4l5ZnMyTUHiwvRD6XOrWKfqvUSawNzyh
hCWlTUxDrjizsW5YpsJX/MkRadSC8loJEk5ByZebow6nRPfurJvqfrcOMgHyNrbY
pOZ/CGPxcetMqotL2TuuJt5wKmenqYhIWGAp3YM2SWWgU2ueBZekW8AYeMfgUcvh
LWV93RpSuAnE5wsdjIULvjFnEDJBf8ihfMnMrd9G5QjQu44tuKWfY2MghLSpYzaR
V6UFbRmhrqhqiStHQXOvk1oqxtpbHlc9zzJLmvPmDJcbvzXQ9Opk5GVXAmdtnHnj
M/ty3wGWxucY6mHqT8MkCShSSslbgEtc1pEIWHdrUgnaiSVoCVBEO+9LqLbjvOTm
XA/6YtoiCE5FasK51pir1zVb2GORQn0v8HnuAOsusD/iPAlRQ/G5jZkaXbwRQI6j
atYL1svqvSKn5POnzqAlMUXfMUr19K5xqJdp7i6qmlO1Vq6Z+tWbCQgD1JV+Wjkb
CMyvXomFCFu4aYKGRE2SBRnWLRghG3kYHqEQ15yTPMQerxbUDNg=
=SUr3
-----END PGP SIGNATURE-----
Merge tag 'random-5.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"These updates continue to refine the work began in 5.17 and 5.18 of
modernizing the RNG's crypto and streamlining and documenting its
code.
New for 5.19, the updates aim to improve entropy collection methods
and make some initial decisions regarding the "premature next" problem
and our threat model. The cloc utility now reports that random.c is
931 lines of code and 466 lines of comments, not that basic metrics
like that mean all that much, but at the very least it tells you that
this is very much a manageable driver now.
Here's a summary of the various updates:
- The random_get_entropy() function now always returns something at
least minimally useful. This is the primary entropy source in most
collectors, which in the best case expands to something like RDTSC,
but prior to this change, in the worst case it would just return 0,
contributing nothing. For 5.19, additional architectures are wired
up, and architectures that are entirely missing a cycle counter now
have a generic fallback path, which uses the highest resolution
clock available from the timekeeping subsystem.
Some of those clocks can actually be quite good, despite the CPU
not having a cycle counter of its own, and going off-core for a
stamp is generally thought to increase jitter, something positive
from the perspective of entropy gathering. Done very early on in
the development cycle, this has been sitting in next getting some
testing for a while now and has relevant acks from the archs, so it
should be pretty well tested and fine, but is nonetheless the thing
I'll be keeping my eye on most closely.
- Of particular note with the random_get_entropy() improvements is
MIPS, which, on CPUs that lack the c0 count register, will now
combine the high-speed but short-cycle c0 random register with the
lower-speed but long-cycle generic fallback path.
- With random_get_entropy() now always returning something useful,
the interrupt handler now collects entropy in a consistent
construction.
- Rather than comparing two samples of random_get_entropy() for the
jitter dance, the algorithm now tests many samples, and uses the
amount of differing ones to determine whether or not jitter entropy
is usable and how laborious it must be. The problem with comparing
only two samples was that if the cycle counter was extremely slow,
but just so happened to be on the cusp of a change, the slowness
wouldn't be detected. Taking many samples fixes that to some
degree.
This, combined with the other improvements to random_get_entropy(),
should make future unification of /dev/random and /dev/urandom
maybe more possible. At the very least, were we to attempt it again
today (we're not), it wouldn't break any of Guenter's test rigs
that broke when we tried it with 5.18. So, not today, but perhaps
down the road, that's something we can revisit.
- We attempt to reseed the RNG immediately upon waking up from system
suspend or hibernation, making use of the various timestamps about
suspend time and such available, as well as the usual inputs such
as RDRAND when available.
- Batched randomness now falls back to ordinary randomness before the
RNG is initialized. This provides more consistent guarantees to the
types of random numbers being returned by the various accessors.
- The "pre-init injection" code is now gone for good. I suspect you
in particular will be happy to read that, as I recall you
expressing your distaste for it a few months ago. Instead, to avoid
a "premature first" issue, while still allowing for maximal amount
of entropy availability during system boot, the first 128 bits of
estimated entropy are used immediately as it arrives, with the next
128 bits being buffered. And, as before, after the RNG has been
fully initialized, it winds up reseeding anyway a few seconds later
in most cases. This resulted in a pretty big simplification of the
initialization code and let us remove various ad-hoc mechanisms
like the ugly crng_pre_init_inject().
- The RNG no longer pretends to handle the "premature next" security
model, something that various academics and other RNG designs have
tried to care about in the past. After an interesting mailing list
thread, these issues are thought to be a) mainly academic and not
practical at all, and b) actively harming the real security of the
RNG by delaying new entropy additions after a potential compromise,
making a potentially bad situation even worse. As well, in the
first place, our RNG never even properly handled the premature next
issue, so removing an incomplete solution to a fake problem was
particularly nice.
This allowed for numerous other simplifications in the code, which
is a lot cleaner as a consequence. If you didn't see it before,
https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/ may be a
thread worth skimming through.
- While the interrupt handler received a separate code path years ago
that avoids locks by using per-cpu data structures and a faster
mixing algorithm, in order to reduce interrupt latency, input and
disk events that are triggered in hardirq handlers were still
hitting locks and more expensive algorithms. Those are now
redirected to use the faster per-cpu data structures.
- Rather than having the fake-crypto almost-siphash-based random32
implementation be used right and left, and in many places where
cryptographically secure randomness is desirable, the batched
entropy code is now fast enough to replace that.
- As usual, numerous code quality and documentation cleanups. For
example, the initialization state machine now uses enum symbolic
constants instead of just hard coding numbers everywhere.
- Since the RNG initializes once, and then is always initialized
thereafter, a pretty heavy amount of code used during that
initialization is never used again. It is now completely cordoned
off using static branches and it winds up in the .text.unlikely
section so that it doesn't reduce cache compactness after the RNG
is ready.
- A variety of functions meant for waiting on the RNG to be
initialized were only used by vsprintf, and in not a particularly
optimal way. Replacing that usage with a more ordinary setup made
it possible to remove those functions.
- A cleanup of how we warn userspace about the use of uninitialized
/dev/urandom and uninitialized get_random_bytes() usage.
Interestingly, with the change you merged for 5.18 that attempts to
use jitter (but does not block if it can't), the majority of users
should never see those warnings for /dev/urandom at all now, and
the one for in-kernel usage is mainly a debug thing.
- The file_operations struct for /dev/[u]random now implements
.read_iter and .write_iter instead of .read and .write, allowing it
to also implement .splice_read and .splice_write, which makes
splice(2) work again after it was broken here (and in many other
places in the tree) during the set_fs() removal. This was a bit of
a last minute arrival from Jens that hasn't had as much time to
bake, so I'll be keeping my eye on this as well, but it seems
fairly ordinary. Unfortunately, read_iter() is around 3% slower
than read() in my tests, which I'm not thrilled about. But Jens and
Al, spurred by this observation, seem to be making progress in
removing the bottlenecks on the iter paths in the VFS layer in
general, which should remove the performance gap for all drivers.
- Assorted other bug fixes, cleanups, and optimizations.
- A small SipHash cleanup"
* tag 'random-5.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (49 commits)
random: check for signals after page of pool writes
random: wire up fops->splice_{read,write}_iter()
random: convert to using fops->write_iter()
random: convert to using fops->read_iter()
random: unify batched entropy implementations
random: move randomize_page() into mm where it belongs
random: remove mostly unused async readiness notifier
random: remove get_random_bytes_arch() and add rng_has_arch_random()
random: move initialization functions out of hot pages
random: make consistent use of buf and len
random: use proper return types on get_random_{int,long}_wait()
random: remove extern from functions in header
random: use static branch for crng_ready()
random: credit architectural init the exact amount
random: handle latent entropy and command line from random_init()
random: use proper jiffies comparison macro
random: remove ratelimiting for in-kernel unseeded randomness
random: move initialization out of reseeding hot path
random: avoid initializing twice in credit race
random: use symbolic constants for crng_init states
...
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmKKlIAeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGC3oH/iPm/fLG2sJut8My
sU0RC9K+6ESV5h2Qy6k00/lqKstlu4EvBjw4V8vYpx3Q2+hbSFMn2SeWqqqT3Lkk
Zb8KINCFuuyMtdCBb42PV0zhUf5pCQF7ocm/Ae4jllDHtPmqk3WJ6IGtZBK5JBlw
z6RR/wKt0y0MRj9eZyPyYjOee2L2vuVh4tgnexK/4L8g2ZtMMRThhvUzSMWG4zxR
STYYNp0uFcfT1Vt85+ODevFH4TvdECAj+SqAegN+seHLM17YY7M0/WiIYpxGRv8P
lIpDQl4PBU8EBkpI5hkpJ/3qPincbuVOMLsYfxFtpcjjG12vGjFp2krGpS3TedZQ
3mvaJ7c=
=vLke
-----END PGP SIGNATURE-----
Merge tag 'v5.18' into rdma.git for-next
Following patches have dependencies.
Resolve the merge conflict in
drivers/net/ethernet/mellanox/mlx5/core/main.c by keeping the new names
for the fs functions following linux-next:
https://lore.kernel.org/r/20220519113529.226bc3e2@canb.auug.org.au/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-05-23
We've added 113 non-merge commits during the last 26 day(s) which contain
a total of 121 files changed, 7425 insertions(+), 1586 deletions(-).
The main changes are:
1) Speed up symbol resolution for kprobes multi-link attachments, from Jiri Olsa.
2) Add BPF dynamic pointer infrastructure e.g. to allow for dynamically sized ringbuf
reservations without extra memory copies, from Joanne Koong.
3) Big batch of libbpf improvements towards libbpf 1.0 release, from Andrii Nakryiko.
4) Add BPF link iterator to traverse links via seq_file ops, from Dmitrii Dolgov.
5) Add source IP address to BPF tunnel key infrastructure, from Kaixi Fan.
6) Refine unprivileged BPF to disable only object-creating commands, from Alan Maguire.
7) Fix JIT blinding of ld_imm64 when they point to subprogs, from Alexei Starovoitov.
8) Add BPF access to mptcp_sock structures and their meta data, from Geliang Tang.
9) Add new BPF helper for access to remote CPU's BPF map elements, from Feng Zhou.
10) Allow attaching 64-bit cookie to BPF link of fentry/fexit/fmod_ret, from Kui-Feng Lee.
11) Follow-ups to typed pointer support in BPF maps, from Kumar Kartikeya Dwivedi.
12) Add busy-poll test cases to the XSK selftest suite, from Magnus Karlsson.
13) Improvements in BPF selftest test_progs subtest output, from Mykola Lysenko.
14) Fill bpf_prog_pack allocator areas with illegal instructions, from Song Liu.
15) Add generic batch operations for BPF map-in-map cases, from Takshak Chahande.
16) Make bpf_jit_enable more user friendly when permanently on 1, from Tiezhu Yang.
17) Fix an array overflow in bpf_trampoline_get_progs(), from Yuntao Wang.
====================
Link: https://lore.kernel.org/r/20220523223805.27931-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Add support for Realtek 8761BUV
- Add HCI_QUIRK_BROKEN_ENHANCED_SETUP_SYNC_CONN quirk
- Add support for RTL8852C
- Add a new PID/VID 0489/e0c8 for MT7921
- Add support for Qualcomm WCN785x
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmKL8N4ZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKfy6EACPSdWECxGv+20lpBmgRbfF
1Ahu2rFfEtK5iDaTiFLqBOczGkaYnhjq9/Er4VsYHMH1DVedOKoTskzfa2V68j0L
xyCU44E0x314YI/B2sWoMAYejkPX265r0292KzZUpXM2s+S0Doh3Z52glMrjh13C
GbaTfukocBZ6HNIiXp8xQLIaD6MbKGhs2VSWuoXLQqASujrixm1ucglsHNbjfL6e
huSL/SoxKukX8qhlO19pJniROLKq0aC5tQSk66ihvhHWlNBTNJYkg6HoJ8vrB81d
lBjVjPjElbTljQIC+PeOSCSeOn+ELizsUdGgWxiZ2fgzgQvko0fDSnGSnpRSCnNH
zxSBs/pn3cgVS6rSIIzFXsnhdBgAoiwqqhwy+bKsi3nnIXGriZMn82Dpr7z/+QWB
di2wKlZSXw8LEhDaBxi8espe1NkdhgSYWI8c2hh40PLkydBpGXwag9FBkMM4cgsU
4p/riOd776EEVtwXI+lDcAdMFcbMLTf094qZWi1of93VUgDPoy8TSy3ZdshupZah
8UQyeKKYb3lxXIXd/i6C/jhYgPvo/8ZWv+OwysPpAzvvyhAagEKS1czSGJ01yjsy
1k+ebPY4CIHBBT23UFYKxD6Tgn9SAiqKtf0XtbYzCV2UD+QtUadc8MeGFtYb+0j1
ybtkiTT++lU3M9KfAMtH8A==
=XFUz
-----END PGP SIGNATURE-----
Merge tag 'for-net-next-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- Add support for Realtek 8761BUV
- Add HCI_QUIRK_BROKEN_ENHANCED_SETUP_SYNC_CONN quirk
- Add support for RTL8852C
- Add a new PID/VID 0489/e0c8 for MT7921
- Add support for Qualcomm WCN785x
* tag 'for-net-next-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (26 commits)
Bluetooth: hci_sync: use hci_skb_event() helper
Bluetooth: eir: Add helpers for managing service data
Bluetooth: hci_sync: Fix attempting to suspend with unfiltered passive scan
Bluetooth: MGMT: Add conditions for setting HCI_CONN_FLAG_REMOTE_WAKEUP
Bluetooth: btmtksdio: fix the reset takes too long
Bluetooth: btmtksdio: fix possible FW initialization failure
Bluetooth: btmtksdio: fix use-after-free at btmtksdio_recv_event
Bluetooth: btbcm: Add entry for BCM4373A0 UART Bluetooth
Bluetooth: btusb: Add a new PID/VID 0489/e0c8 for MT7921
Bluetooth: btusb: Add 0x0bda:0x8771 Realtek 8761BUV devices
Bluetooth: btusb: Set HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for QCA
Bluetooth: core: Fix missing power_on work cancel on HCI close
Bluetooth: btusb: add support for Qualcomm WCN785x
Bluetooth: protect le accept and resolv lists with hdev->lock
Bluetooth: use hdev lock for accept_list and reject_list in conn req
Bluetooth: use hdev lock in activate_scan for hci_is_adv_monitoring
Bluetooth: btrtl: Add support for RTL8852C
Bluetooth: btusb: Set HCI_QUIRK_BROKEN_ENHANCED_SETUP_SYNC_CONN for QCA
Bluetooth: Print broken quirks
Bluetooth: HCI: Add HCI_QUIRK_BROKEN_ENHANCED_SETUP_SYNC_CONN quirk
...
====================
Link: https://lore.kernel.org/r/20220523204151.3327345-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The handling of connection failures shall be handled by the request
completion callback as already done by hci_cs_le_create_conn, also make
sure to use hci_conn_failed instead of hci_le_conn_failed as the later
don't actually call hci_conn_del to cleanup.
Link: https://github.com/bluez/bluez/issues/340
Fixes: 8e8b92ee60 ("Bluetooth: hci_sync: Add hci_le_create_conn_sync")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>