For ENETC hardware, the TCs are numbered from 0 to N-1, where N
is the number of TCs. Numerically higher TC has higher priority.
It's obvious that the highest priority TC index should be N-1 and
the 2nd highest priority TC index should be N-2.
However, the previous logic uses netdev_get_prio_tc_map() to get
the indexes of the highest and 2nd highest priority TCs. This does
not make sense: netdev_get_prio_tc_map() maps a priority to a TC, so
it is incorrect to give it a "tc" argument. As a result, the driver
may get the wrong indexes for the two highest priority TCs, which
causes setting the CBS for those TCs to fail.
e.g.
$ tc qdisc add dev eno0 parent root handle 100: mqprio num_tc 6 \
map 0 0 1 1 2 3 4 5 queues 1@0 1@1 1@2 1@3 2@4 2@6 hw 1
$ tc qdisc replace dev eno0 parent 100:6 cbs idleslope 100000 \
sendslope -900000 hicredit 12 locredit -113 offload 1
$ Error: Specified device failed to setup cbs hardware offload.
^^^^^
In this example, the previous logic deems the indexes of the two
highest priority TCs to be 3 and 2. Actually, the indexes are
5 and 4, because the number of TCs is 6. So configuring the CBS for
the two highest priority TCs fails.
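The index mix-up can be reproduced with a small user-space sketch. It is an
illustration only, not the driver code: the helper names are hypothetical,
and the array stands in for the priority-to-TC map given in the mqprio
command above.

```c
#include <assert.h>

/* Old logic (modelled): a TC index is used as if it were a priority,
 * so the result is whatever TC that priority happens to map to. */
static int old_top_tc(const int *prio_tc_map, int num_tc, int rank)
{
	assert(rank >= 1 && rank <= num_tc);
	return prio_tc_map[num_tc - rank];
}

/* Fixed logic (modelled): TCs are numbered 0..num_tc-1 and a higher
 * number means higher priority, so no map lookup is needed. */
static int new_top_tc(int num_tc, int rank)
{
	assert(rank >= 1 && rank <= num_tc);
	return num_tc - rank;
}
```

With the map 0 0 1 1 2 3 4 5 and 6 TCs, the modelled old lookup yields 3 and
2 for ranks 1 and 2, while the fixed computation yields 5 and 4.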
Fixes: c431047c4e ("enetc: add support Credit Based Shaper(CBS) for hardware offload")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This update contains:
- Propagate unlinked inode list corruption back up to log recovery (regression
fix).
- improve corruption detection for AGFL entries, AGFL indexes and XEFI extents
(syzkaller fuzzer oops report).
- Avoid double perag reference release (regression fix).
- Improve extent merging detection in scrub (regression fix).
- Fix a new undefined high bit shift (regression fix).
- Fix for AGF vs inode cluster buffer deadlock (regression fix).
Merge tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Dave Chinner:
"These are a set of regression fixes discovered on recent kernels. I
was hoping to send this to you a week and half ago, but events out of
my control delayed finalising the changes until early this week.
Whilst the diffstat looks large for this stage of the merge window, a
large chunk of it comes from moving the guts of one function from one
file to another i.e. it's the same code, it is just run in a different
context where it is safe to hold a specific lock. Otherwise the
individual changes are relatively small and straightforward.
Summary:
- Propagate unlinked inode list corruption back up to log recovery
(regression fix)
- improve corruption detection for AGFL entries, AGFL indexes and
XEFI extents (syzkaller fuzzer oops report)
- Avoid double perag reference release (regression fix)
- Improve extent merging detection in scrub (regression fix)
- Fix a new undefined high bit shift (regression fix)
- Fix for AGF vs inode cluster buffer deadlock (regression fix)"
* tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: collect errors from inodegc for unlinked inode recovery
xfs: validate block number being freed before adding to xefi
xfs: validity check agbnos on the AGFL
xfs: fix agf/agfl verification on v4 filesystems
xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag()
xfs: fix broken logic when detecting mergeable bmap records
xfs: Fix undefined behavior of shift into sign bit
xfs: fix AGF vs inode cluster buffer deadlock
xfs: defered work could create precommits
xfs: restore allocation trylock iteration
xfs: buffer pins need to hold a buffer reference
Michael Chan says:
====================
bnxt_en: Bug fixes
This patchset has the following fixes for bnxt_en:
1. Add missing VNIC ID parameter in the FW message when getting an
updated RSS configuration from the FW.
2. Fix a warning when doing ethtool reset on newer chips.
3. Fix VLAN issue on a VF when a default VLAN is assigned.
4. Fix a problem during DPC (Downstream Port containment) scenario.
5. Fix a NULL pointer dereference when receiving a PTP event from FW.
6. Fix VXLAN/Geneve UDP port delete/add with newer FW.
====================
Link: https://lore.kernel.org/r/20230607075409.228450-1-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
As per the new udp tunnel framework, drivers which need to know the
details of a port entry (i.e. port type) when it gets deleted should
use the .set_port / .unset_port callbacks.
Implementing the current .udp_tunnel_sync callback would mean that the
deleted tunnel port entry would be all zeros. This used to work on
older firmware because it would not check the input when deleting a
tunnel port. With newer firmware, the delete will now fail and
subsequent tunnel port allocation will fail as a result.
Fixes: 442a35a5a7 ("bnxt: convert to new udp_tunnel_nic infra")
Reviewed-by: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The firmware can send a PHC_RTC_UPDATE async event on a PF that may not
have PTP registered. In such a case, there will be a NULL pointer
dereference of bp->ptp_cfg when we try to handle the event.
Fix it by not registering for this event with the firmware if !bp->ptp_cfg.
Also, check that bp->ptp_cfg is valid before proceeding when we receive
the event.
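The two guards can be sketched in user-space C as follows. This is a model
under stated assumptions, not the bnxt_en code: the struct and function names
are hypothetical stand-ins, and only the NULL checks mirror the description
above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct ptp_cfg_model {
	unsigned long long rtc_ns;
};

struct nic_model {
	struct ptp_cfg_model *ptp_cfg;	/* NULL when PTP is not registered */
};

/* Guard 1: only ask the firmware for the event when PTP state exists. */
static bool want_phc_update_event(const struct nic_model *bp)
{
	return bp->ptp_cfg != NULL;
}

/* Guard 2: returns true if the event was applied, false if ignored. */
static bool handle_phc_update_event(struct nic_model *bp,
				    unsigned long long ns)
{
	if (!bp->ptp_cfg)
		return false;	/* event arrived without PTP: ignore it */
	bp->ptp_cfg->rtc_ns = ns;
	return true;
}
```

Keeping both guards means a stray event from the firmware is harmless even if
the registration-time check is bypassed.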
Fixes: 8bcf6f04d4 ("bnxt_en: Handle async event when the PHC is updated in RTC mode")
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Driver starts firmware fatal error recovery by detecting
heartbeat failure or fw reset count register changing. But
these checks are not reliable if the device is not accessible.
This can happen while DPC (Downstream Port containment) is in
progress. Skip firmware fatal recovery if pci_device_is_present()
returns false.
Fixes: acfb50e4e7 ("bnxt_en: Add FW fatal devlink_health_reporter.")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to call bnxt_hwrm_func_qcfg() on a VF to query the default
VLAN that may be set up by the PF. If a default VLAN is enabled,
the VF cannot support VLAN acceleration on the receive side and
the VNIC must be set up to strip out the default VLAN tag. If a
default VLAN is not enabled, the VF can support VLAN acceleration
on the receive side. The VNIC should be set up to strip or not
strip the VLAN based on the RX VLAN acceleration setting.
Without this call to determine the default VLAN before calling
bnxt_setup_vnic(), the VNIC may not be set up correctly. For
example, bnxt_setup_vnic() may set up to strip the VLAN tag based
on stale default VLAN information. If RX VLAN acceleration is
not enabled, the VLAN tag will be incorrectly stripped and the
RX data path will not work correctly.
Fixes: cf6645f8eb ("bnxt_en: Add function for VF driver to query default VLAN.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Only older NIC controller's firmware uses the PROC AP reset type.
Firmware on 5731X/5741X and newer chips does not support this reset
type. When bnxt_reset() issues a series of resets, this PROC AP
reset may actually fail on these newer chips because the firmware
is not ready to accept this unsupported command yet. Avoid this
unnecessary error by skipping this reset type on chips that don't
support it.
Fixes: 7a13240e37 ("bnxt_en: fix ethtool_reset_flags ABI violations")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We must specify the vnic id of the vnic in the input structure of this
firmware message. Otherwise we will get an error from the firmware.
Fixes: 98a4322b70 ("bnxt_en: update RSS config using difference algorithm")
Reviewed-by: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
- fix a broken sync while rescheduling delayed work, by
Vladislav Efanov
Merge tag 'batadv-net-pullrequest-20230607' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here is a batman-adv bugfix:
- fix a broken sync while rescheduling delayed work,
by Vladislav Efanov
* tag 'batadv-net-pullrequest-20230607' of git://git.open-mesh.org/linux-merge:
batman-adv: Broken sync while rescheduling delayed work
====================
Link: https://lore.kernel.org/r/20230607155515.548120-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We had a number of shortcomings:
- EEE must be re-evaluated whenever the state machine detects a link
change, as we might be switching from a link partner with EEE
enabled/disabled
- tx_lpi_enabled controls whether EEE should be enabled/disabled for the
transmit path, which applies to the TBUF block
- We do not need to forcibly enable EEE upon system resume, as the PHY
state machine will trigger a link event that will do that, too
Fixes: 6ef398ea60 ("net: bcmgenet: add EEE support")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230606214348.2408018-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Flip the netif_carrier_ok() condition in queue wake logic.
When I moved it to inside __netif_txq_completed_wake()
I missed negating it.
This made the condition ineffective and could probably
lead to crashes.
Fixes: 301f227fc8 ("net: piggy back on the memory barrier in bql when waking queues")
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230607010826.960226-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The down condition should be the negation of the wake condition,
IOW when I moved it from:
    if (cond && wake())
to
    if (__netif_txq_completed_wake(cond))
cond should have been negated. Flip it now.
This bug leads to occasional crashes with netconsole.
It may also lead to the queue never waking up in case BQL is not enabled.
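The relationship between the two forms can be checked with a small model.
This is a sketch only: both functions are hypothetical simplifications that
reduce the queue state to two booleans, and they are not the kernel helpers.

```c
#include <stdbool.h>

/* Modelled new-style helper: takes a "down" condition and refuses to
 * wake the queue while that condition holds. */
static bool completed_wake_model(bool has_room, bool down_cond)
{
	if (down_cond)
		return false;
	return has_room;
}

/* Modelled old-style call site: if (cond && wake()), where cond means
 * "it is OK to wake". */
static bool old_style_wake(bool has_room, bool ok_to_wake)
{
	return ok_to_wake && has_room;
}
```

The two agree only when the old condition is negated on the way in, i.e.
completed_wake_model(room, !ok) matches old_style_wake(room, ok). Passing the
old condition unchanged blocks the wake exactly when it should happen.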
Reported-by: David Wei <davidhwei@meta.com>
Fixes: 08a096780d ("bnxt: use new queue try_stop/try_wake macros")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230607010826.960226-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-06-07
We've added 7 non-merge commits during the last 7 day(s) which contain
a total of 12 files changed, 112 insertions(+), 7 deletions(-).
The main changes are:
1) Fix a use-after-free in BPF's task local storage, from KP Singh.
2) Make struct path handling more robust in bpf_d_path, from Jiri Olsa.
3) Fix a syzbot NULL-pointer dereference in sockmap, from Eric Dumazet.
4) UAPI fix for BPF_NETFILTER before final kernel ships,
from Florian Westphal.
5) Fix map-in-map array_map_gen_lookup code generation where elem_size was
not being set for inner maps, from Rhys Rustad-Elliott.
6) Fix sockopt_sk selftest's NETLINK_LIST_MEMBERSHIPS assertion,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add extra path pointer check to d_path helper
selftests/bpf: Fix sockopt_sk selftest
bpf: netfilter: Add BPF_NETFILTER bpf_attach_type
selftests/bpf: Add access_inner_map selftest
bpf: Fix elem_size not being set for inner maps
bpf: Fix UAF in task local storage
bpf, sockmap: Avoid potential NULL dereference in sk_psock_verdict_data_ready()
====================
Link: https://lore.kernel.org/r/20230607220514.29698-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
irq_cpu_rmap_release() calls cpu_rmap_put(), which may free the rmap.
So we need to clear the pointer to our glue structure in rmap before
doing that, not after.
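The ordering can be illustrated with a user-space reference-count model. It
is a sketch under assumptions: the types and helpers below are hypothetical
stand-ins, with malloc()/free() in place of the kernel allocators.

```c
#include <stdlib.h>

struct rmap_model {
	int refcount;
	void *obj[4];
};

struct glue_model {
	struct rmap_model *rmap;
	int index;
};

/* Drops one reference and frees the map when the last one goes away.
 * Returns 1 if the map was freed. */
static int rmap_put(struct rmap_model *rmap)
{
	if (--rmap->refcount == 0) {
		free(rmap);
		return 1;
	}
	return 0;
}

static int glue_release(struct glue_model *glue)
{
	struct rmap_model *rmap = glue->rmap;
	int freed;

	rmap->obj[glue->index] = NULL;	/* touch rmap while a ref is held */
	freed = rmap_put(rmap);		/* rmap may be gone after this line */
	free(glue);
	return freed;
}
```

Swapping the first two statements of glue_release() would write into the map
after the put, which is a use-after-free whenever that put drops the last
reference.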
Fixes: 4e0473f106 ("lib: cpu_rmap: Avoid use after free on rmap->obj array entries")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZHo0vwquhOy3FaXc@decadent.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- a fix for unbalanced open count for inhibited input devices
- fixups in Elantech PS/2 and Cypress TTSP v5 drivers
- a quirk to soc_button_array driver to make it work with
Lenovo Yoga Book X90F / X90L
- a removal of erroneous entry from xpad driver.
Merge tag 'input-for-v6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a fix for unbalanced open count for inhibited input devices
- fixups in Elantech PS/2 and Cypress TTSP v5 drivers
- a quirk to soc_button_array driver to make it work with Lenovo
Yoga Book X90F / X90L
- a removal of erroneous entry from xpad driver
* tag 'input-for-v6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xpad - delete a Razer DeathAdder mouse VID/PID entry
Input: psmouse - fix OOB access in Elantech protocol
Input: soc_button_array - add invalid acpi_index DMI quirk handling
Input: fix open count when closing inhibited device
Input: cyttsp5 - fix array length
This is overdue and an oversight.
Add myself to this file despite the fact that I'm trying to reduce the
number of entries in this file which have my name attached, but in the
hope that patches won't get picked up elsewhere completely unreviewed and
unnoticed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kafs incorrectly passes a zero mtime (ie. 1st Jan 1970) to the server when
creating a file, dir or symlink because the mtime recorded in the
afs_operation struct gets passed to the server by the marshalling routines,
but the afs_mkdir(), afs_create() and afs_symlink() functions don't set it.
This gets masked if a file or directory is subsequently modified.
Fix this by filling in op->mtime before calling the create op.
Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anastasios reported a crash on the stable 5.15 kernel with the following
BPF program attached to an LSM hook:
    SEC("lsm.s/bprm_creds_for_exec")
    int BPF_PROG(bprm_creds_for_exec, struct linux_binprm *bprm)
    {
            struct path *path = &bprm->executable->f_path;
            char p[128] = { 0 };

            bpf_d_path(path, p, 128);
            return 0;
    }
But bprm->executable can be NULL, so the bpf_d_path call will crash:
BUG: kernel NULL pointer dereference, address: 0000000000000018
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
...
RIP: 0010:d_path+0x22/0x280
...
Call Trace:
<TASK>
bpf_d_path+0x21/0x60
bpf_prog_db9cf176e84498d9_bprm_creds_for_exec+0x94/0x99
bpf_trampoline_6442506293_0+0x55/0x1000
bpf_lsm_bprm_creds_for_exec+0x5/0x10
security_bprm_creds_for_exec+0x29/0x40
bprm_execve+0x1c1/0x900
do_execveat_common.isra.0+0x1af/0x260
__x64_sys_execve+0x32/0x40
This is a problem for all stable trees with the bpf_d_path helper, which
was added in 5.9.
This issue is fixed in current bpf code, where we identify and mark
trusted pointers, so the above code would fail even to load.
For the sake of the stable trees and to work around a potentially broken
verifier in the future, add code that reads the path object from
the passed pointer and verifies it's valid in kernel space.
Fixes: 6e22ab9da7 ("bpf: Add d_path helper")
Reported-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230606181714.532998-1-jolsa@kernel.org
try_module_get will be called in tcf_proto_lookup_ops. So module_put needs
to be called to drop the refcount if ops don't implement the required
function.
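The balanced error path can be sketched with a plain counter standing in for
the module refcount. This is a model only: the types, the single callback and
the helper names are hypothetical, and the success path simply keeps the
reference, as a caller that stores the ops would.

```c
#include <errno.h>
#include <stddef.h>

struct module_model {
	int refcnt;
};

struct ops_model {
	struct module_model *owner;
	int (*tmplt_create)(void);	/* optional callback */
};

/* Models the lookup: it always takes a reference on the owner. */
static struct ops_model *lookup_ops(struct ops_model *ops)
{
	ops->owner->refcnt++;
	return ops;
}

static void put_module(struct module_model *mod)
{
	mod->refcnt--;
}

static int dummy_create(void)
{
	return 0;
}

static int chain_tmplt_add(struct ops_model *candidate)
{
	struct ops_model *ops = lookup_ops(candidate);

	if (!ops->tmplt_create) {
		put_module(ops->owner);	/* balance the lookup's reference */
		return -EOPNOTSUPP;
	}
	return ops->tmplt_create();	/* reference is kept on success */
}
```

Without the put_module() call, every failed attempt would leave the counter
one higher, which in the kernel would pin the module in memory.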
Fixes: 9f407f1768 ("net: sched: introduce chain templates")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following sparse warnings:
net/sched/act_police.c:360:28: warning: dereference of noderef expression
net/sched/act_police.c:362:45: warning: dereference of noderef expression
net/sched/act_police.c:362:45: warning: dereference of noderef expression
net/sched/act_police.c:368:28: warning: dereference of noderef expression
net/sched/act_police.c:370:45: warning: dereference of noderef expression
net/sched/act_police.c:370:45: warning: dereference of noderef expression
net/sched/act_police.c:376:45: warning: dereference of noderef expression
net/sched/act_police.c:376:45: warning: dereference of noderef expression
Fixes: d1967e495a ("net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the per cpu upcall counters are allocated after the vport is
created and inserted into the system. This could lead to the datapath
accessing the counters before they are allocated resulting in a kernel
Oops.
Here is an example:
PID: 59693 TASK: ffff0005f4f51500 CPU: 0 COMMAND: "ovs-vswitchd"
#0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4
#1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc
#2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60
#3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58
#4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388
#5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c
#6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68
#7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch]
...
PID: 58682 TASK: ffff0005b2f0bf00 CPU: 0 COMMAND: "kworker/0:3"
#0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758
#1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994
#2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8
#3 [ffff80000a5d3120] die at ffffb70f0628234c
#4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8
#5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4
#6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4
#7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710
#8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74
#9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac
#10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24
#11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc
#12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch]
#13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch]
#14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch]
#15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch]
#16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch]
#17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90
We moved the per cpu upcall counter allocation to the existing vport
alloc and free functions to solve this.
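The idea of allocating the counters before the object becomes reachable can
be sketched as below. It is a simplified user-space model: the names are
hypothetical, and a single calloc() stands in for the per-CPU allocation the
kernel uses.

```c
#include <stdlib.h>

struct upcall_stats_model {
	unsigned long long n_success;
	unsigned long long n_fail;
};

struct vport_model {
	struct upcall_stats_model *upcall_stats;
};

/* Allocate the counters as part of creating the vport, so a caller can
 * never observe a vport whose counters are missing. */
static struct vport_model *vport_alloc_model(void)
{
	struct vport_model *vport = calloc(1, sizeof(*vport));

	if (!vport)
		return NULL;
	vport->upcall_stats = calloc(1, sizeof(*vport->upcall_stats));
	if (!vport->upcall_stats) {
		free(vport);
		return NULL;
	}
	return vport;	/* safe to publish: counters already exist */
}

/* Free the counters together with the vport. */
static void vport_free_model(struct vport_model *vport)
{
	free(vport->upcall_stats);
	free(vport);
}
```

Pairing the allocation with the existing alloc and free paths also means the
error handling cannot forget the counters.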
Fixes: 95637d91fe ("net: openvswitch: release vport resources on failure")
Fixes: 1933ea365a ("net: openvswitch: Add support to count upcall packets")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtm_tca_policy is used from net/sched/sch_api.c and net/sched/cls_api.c,
thus should be declared in an include file.
This fixes the following sparse warning:
net/sched/sch_api.c:1434:25: warning: symbol 'rtm_tca_policy' was not declared. Should it be static?
Fixes: e331473fee ("net/sched: cls_api: add missing validation of netlink attributes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current ice driver's GNSS write implementation buffers writes and
works through them asynchronously in a kthread. That's bad because:
- The GNSS write_raw operation is supposed to be synchronous[1][2].
- There is no upper bound on the number of pending writes.
Userspace can submit writes much faster than the driver can process,
consuming unlimited amounts of kernel memory.
A patch that's currently on review[3] ("[v3,net] ice: Write all GNSS
buffers instead of first one") would add one more problem:
- The possibility of waiting for a very long time to flush the write
work when doing rmmod, softlockups.
To fix these issues, simplify the implementation: Drop the buffering,
the write_work, and make the writes synchronous.
I tested this with gpsd and ubxtool.
[1] https://events19.linuxfoundation.org/wp-content/uploads/2017/12/The-GNSS-Subsystem-Johan-Hovold-Hovold-Consulting-AB.pdf
"User interface" slide.
[2] A comment in drivers/gnss/core.c:gnss_write():
/* Ignoring O_NONBLOCK, write_raw() is synchronous. */
[3] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20230217120541.16745-1-karol.kolacinski@intel.com/
Fixes: d6b98c8d24 ("ice: add write functionality for GNSS TTY")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
rfs: annotate lockless accesses
rfs runs without locks held, so we should annotate
reads and writes to shared variables.
It should prevent compilers from forcing writes
in the following situation:
    if (var != val)
            var = val;
A compiler could indeed simply avoid the conditional:
    var = val;
This matters if var is shared between many cpus.
v2: aligns one closing bracket (Simon)
adds Fixes: tags (Jakub)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add READ_ONCE()/WRITE_ONCE() on accesses to the sock flow table.
This also prevents a (smart ?) compiler from removing the condition in:
    if (table->ents[index] != newval)
            table->ents[index] = newval;
We need the condition to avoid dirtying a shared cache line.
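A user-space sketch of the pattern is shown below. The two macros are
simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE() (they rely on
the GNU C __typeof__ extension and omit the kernel's extra type checks), and
update_entry() is a hypothetical helper.

```c
#include <stdint.h>

/* Simplified stand-ins: force exactly one volatile access each. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Returns 1 if the slot was written, 0 if it already held newval. */
static int update_entry(uint32_t *slot, uint32_t newval)
{
	if (READ_ONCE(*slot) != newval) {
		WRITE_ONCE(*slot, newval);
		return 1;
	}
	return 0;	/* no store issued, cache line left clean */
}
```

Because both accesses are volatile, the compiler has to keep the comparison
and cannot turn the sequence into an unconditional store.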
Fixes: fec5e652e5 ("rfs: Receive Flow Steering")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add READ_ONCE()/WRITE_ONCE() on accesses to sk->sk_rxhash.
This also prevents a (smart ?) compiler from removing the condition in:
    if (sk->sk_rxhash != newval)
            sk->sk_rxhash = newval;
We need the condition to avoid dirtying a shared cache line.
Fixes: fec5e652e5 ("rfs: Receive Flow Steering")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fixes to debugfs registration
- Fix use-after-free in hci_remove_ltk/hci_remove_irk
- Fixes to ISO channel support
- Fix missing checks for invalid L2CAP DCID
- Fix l2cap_disconnect_req deadlock
- Add lock to protect HCI_UNREGISTER
Merge tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fixes to debugfs registration
- Fix use-after-free in hci_remove_ltk/hci_remove_irk
- Fixes to ISO channel support
- Fix missing checks for invalid L2CAP DCID
- Fix l2cap_disconnect_req deadlock
- Add lock to protect HCI_UNREGISTER
* tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: Add missing checks for invalid DCID
Bluetooth: ISO: use correct CIS order in Set CIG Parameters event
Bluetooth: ISO: don't try to remove CIG if there are bound CIS left
Bluetooth: Fix l2cap_disconnect_req deadlock
Bluetooth: hci_qca: fix debugfs registration
Bluetooth: fix debugfs registration
Bluetooth: hci_sync: add lock to protect HCI_UNREGISTER
Bluetooth: Fix use-after-free in hci_remove_ltk/hci_remove_irk
Bluetooth: ISO: Fix CIG auto-allocation to select configurable CIG
Bluetooth: ISO: consider right CIS when removing CIG at cleanup
====================
Link: https://lore.kernel.org/r/20230606003454.2392552-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'nf-23-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Missing null check in basechain hook netlink dump path, from Gavrilov Ilia.
2) Fix bitwise register tracking, from Jeremy Sowden.
3) Null pointer dereference when accessing conntrack helper,
from Tijs Van Buggenhout.
4) Add schedule point to ipset's call_ad, from Kuniyuki Iwashima.
5) Incorrect boundary check when building chain blob.
* tag 'nf-23-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: out-of-bound check in chain blob
netfilter: ipset: Add schedule point in call_ad().
netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper
netfilter: nft_bitwise: fix register tracking
netfilter: nf_tables: Add null check for nla_nest_start_noflag() in nft_dump_basechain_hook()
====================
Link: https://lore.kernel.org/r/20230606225851.67394-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Both rtw88 and rtw89 have an 802.11 powersave fix for a regression
introduced in v6.0. mt76 fixes a race and a null pointer dereference.
iwlwifi fixes an issue where not enough memory was allocated for a
firmware event. And finally the stack has several smaller fixes all
over.
Merge tag 'wireless-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.4
Both rtw88 and rtw89 have an 802.11 powersave fix for a regression
introduced in v6.0. mt76 fixes a race and a null pointer dereference.
iwlwifi fixes an issue where not enough memory was allocated for a
firmware event. And finally the stack has several smaller fixes all
over.
* tag 'wireless-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: fix locking in regulatory disconnect
wifi: cfg80211: fix locking in sched scan stop work
wifi: iwlwifi: mvm: Fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()
wifi: mac80211: fix switch count in EMA beacons
wifi: mac80211: don't translate beacon/presp addrs
wifi: mac80211: mlme: fix non-inheritence element
wifi: cfg80211: reject bad AP MLD address
wifi: mac80211: use correct iftype HE cap
wifi: mt76: mt7996: fix possible NULL pointer dereference in mt7996_mac_write_txwi()
wifi: rtw89: remove redundant check of entering LPS
wifi: rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS
wifi: rtw88: correct PS calculation for SUPPORTS_DYNAMIC_PS
wifi: mt76: mt7615: fix possible race in mt7615_mac_sta_poll
====================
Link: https://lore.kernel.org/r/20230606150817.EC133C433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 699b045a8e ("net: virtio_net: notifications coalescing
support") added coalescing command support for virtio_net. However,
the coalesce commands are using buffers on the stack, which is causing
the device to see DMA errors. There should also be a complaint from
check_for_stack() in debug_dma_map_xyz(). Fix this by adding and using
coalesce params from the control_buf struct, which aligns with other
commands.
Cc: stable@vger.kernel.org
Fixes: 699b045a8e ("net: virtio_net: notifications coalescing support")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20230605195925.51625-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 523847df1b ("pds_core: add devcmd device interfaces") included
initial support for FW recovery detection. Unfortunately, the ordering
in pdsc_is_fw_good() was incorrect, which caused FW recovery to go
undetected by the driver. Fix this by making sure to update the cached
fw_status by calling pdsc_is_fw_running() before setting the local FW
gen.
Fixes: 523847df1b ("pds_core: add devcmd device interfaces")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230605195116.49653-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We missed that tcp_gso_segment() was assuming skb->len was smaller than 65535:
oldlen = (u16)~skb->len;
This part came with commit 0718bcc09b ("[NET]: Fix CHECKSUM_HW GSO problems.")
This leads to a wrong TCP checksum.
Adapt the code to accept arbitrary packet length.
v2:
- use two csum_add() instead of csum_fold() (Alexander Duyck)
- Change delta type to __wsum to reduce casts (Alexander Duyck)
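The effect of the 16-bit truncation can be illustrated outside the kernel. The following is a minimal Python sketch, not the kernel code: it models `(u16)~skb->len` as a complement followed by a 16-bit mask and compares it with the one's-complement negation of the length after folding it to 16 bits with end-around carry. All helper names are hypothetical and exist only for this illustration.

```python
def u16_not(length: int) -> int:
    """Model C's (u16)~length: bitwise complement truncated to 16 bits."""
    return ~length & 0xFFFF


def fold16(value: int) -> int:
    """Fold a wider value into 16 bits using end-around carry."""
    while value >> 16:
        value = (value & 0xFFFF) + (value >> 16)
    return value


def folded_not(length: int) -> int:
    """One's-complement negation of the length after folding it to 16 bits."""
    return ~fold16(length) & 0xFFFF


if __name__ == "__main__":
    # Lengths that fit in 16 bits agree; a larger length does not.
    for length in (1500, 65535, 70000):
        print(length, hex(u16_not(length)), hex(folded_not(length)))
```

For lengths that fit in 16 bits the two helpers return the same value. For a length such as 70000 the truncated complement drops the carry from the upper bits, so a checksum adjustment built on it would be off.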
Fixes: 09f3d1a3a5 ("ipv6/gso: remove temporary HBH/jumbo header")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230605161647.3624428-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A remote DoS vulnerability in RPL Source Routing has been assigned CVE-2023-2156.
The Source Routing Header (SRH) has the following format:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next Header  |  Hdr Ext Len  | Routing Type  | Segments Left |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CmprI | CmprE |  Pad  |               Reserved                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
.                                                               .
.                        Addresses[1..n]                        .
.                                                               .
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The originator of an SRH places the first hop's IPv6 address in the IPv6
header's IPv6 Destination Address and the second hop's IPv6 address as
the first address in Addresses[1..n].
The CmprI and CmprE fields indicate the number of prefix octets that are
shared with the IPv6 Destination Address. When CmprI or CmprE is not 0,
Addresses[1..n] are compressed as follows:
1..n-1 : (16 - CmprI) bytes
n : (16 - CmprE) bytes
Segments Left indicates the number of route segments remaining. When the
value is not zero, the SRH is forwarded to the next hop. Its address
is extracted from Addresses[n - Segments Left + 1] and swapped with the IPv6
Destination Address.
When Segments Left is greater than or equal to 2, the size of the SRH does not
change because Addresses[1..n-1] are decompressed and recompressed with
CmprI.
OTOH, when Segments Left changes from 1 to 0, the new SRH could have a
different size because Addresses[1..n-1] are decompressed with CmprI and
recompressed with CmprE.
Let's say CmprI is 15 and CmprE is 0. When we receive an SRH with Segments
Left >= 2, Addresses[1..n-1] have 1 byte each, and Addresses[n] has
16 bytes. When Segments Left is 1, Addresses[1..n-1] are decompressed to
16 bytes and not recompressed. As a result, the new SRH needs more room
in the header: an additional (16 - 1) * (n - 1) bytes.
Here the max value of n is 255 as Segments Left is u8, so in the worst case,
we have to allocate 3825 bytes in the skb headroom. However, we currently only
allocate a small fixed buffer of IPV6_RPL_SRH_WORST_SWAP_SIZE (16 + 7
bytes). If the decompressed size overflows that room, skb_push() hits the BUG()
below [0].
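The size arithmetic above can be sketched in a few lines of Python. This is an illustration only, not the kernel implementation. The function names are hypothetical, and the general growth formula (n - 1) * (CmprI - CmprE) is an assumption that generalises the CmprI = 15, CmprE = 0 example in the text.

```python
IPV6_ADDR_LEN = 16      # bytes in an uncompressed IPv6 address
FIXED_SWAP_ROOM = 16 + 7  # the fixed buffer size quoted in the text


def srh_addresses_len(n: int, cmpr_i: int, cmpr_e: int) -> int:
    """Bytes occupied by Addresses[1..n] for the given compression fields."""
    return (n - 1) * (IPV6_ADDR_LEN - cmpr_i) + (IPV6_ADDR_LEN - cmpr_e)


def srh_growth(n: int, cmpr_i: int, cmpr_e: int) -> int:
    """Extra bytes needed when Addresses[1..n-1] go from CmprI to CmprE.

    Assumption: generalises the (16 - 1) * (n - 1) example in the text.
    """
    return (n - 1) * (cmpr_i - cmpr_e)


if __name__ == "__main__":
    for n in (2, 3, 10):
        print(n, srh_addresses_len(n, 15, 0), srh_growth(n, 15, 0),
              FIXED_SWAP_ROOM)
```

With CmprI = 15 and CmprE = 0, the growth for n = 3 is already 30 bytes, which is more than the 16 + 7 bytes of fixed room.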
Instead of allocating the fixed buffer for every packet, let's allocate
enough headroom only when we receive an SRH with Segments Left 1.
[0]:
skbuff: skb_under_panic: text:ffffffff81c9f6e2 len:576 put:576 head:ffff8880070b5180 data:ffff8880070b4fb0 tail:0x70 end:0x140 dev:lo
kernel BUG at net/core/skbuff.c:200!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 154 Comm: python3 Not tainted 6.4.0-rc4-00190-gc308e9ec0047 #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:skb_panic (net/core/skbuff.c:200)
Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 80 6e 77 82 e8 ad 8b 60 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffc90000003da0 EFLAGS: 00000246
RAX: 0000000000000085 RBX: ffff8880058a6600 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88807dc1c540 RDI: ffff88807dc1c540
RBP: ffffc90000003e48 R08: ffffffff82b392c8 R09: 00000000ffffdfff
R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888005b1c800
R13: ffff8880070b51b8 R14: ffff888005b1ca18 R15: ffff8880070b5190
FS: 00007f4539f0b740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055670baf3000 CR3: 0000000005b0e000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<IRQ>
skb_push (net/core/skbuff.c:210)
ipv6_rthdr_rcv (./include/linux/skbuff.h:2880 net/ipv6/exthdrs.c:634 net/ipv6/exthdrs.c:718)
ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5))
ip6_input_finish (./include/linux/rcupdate.h:805 net/ipv6/ip6_input.c:483)
__netif_receive_skb_one_core (net/core/dev.c:5494)
process_backlog (./include/linux/rcupdate.h:805 net/core/dev.c:5934)
__napi_poll (net/core/dev.c:6496)
net_rx_action (net/core/dev.c:6565 net/core/dev.c:6696)
__do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572)
do_softirq (kernel/softirq.c:472 kernel/softirq.c:459)
</IRQ>
<TASK>
__local_bh_enable_ip (kernel/softirq.c:396)
__dev_queue_xmit (net/core/dev.c:4272)
ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:134)
rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914)
sock_sendmsg (net/socket.c:724 net/socket.c:747)
__sys_sendto (net/socket.c:2144)
__x64_sys_sendto (net/socket.c:2156 net/socket.c:2152 net/socket.c:2152)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f453a138aea
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffcc212a1c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007ffcc212a288 RCX: 00007f453a138aea
RDX: 0000000000000060 RSI: 00007f4539084c20 RDI: 0000000000000003
RBP: 00007f4538308e80 R08: 00007ffcc212a300 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f4539712d1b
</TASK>
Modules linked in:
Fixes: 8610c7c6e3 ("net: ipv6: add support for rpl sr exthdr")
Reported-by: Max VA
Closes: https://www.interruptlabs.co.uk/articles/linux-ipv6-route-of-death
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230605180617.67284-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Working on the code gen for C reveals typos in the ethtool spec
as the compiler tries to find the names in the existing uAPI
header. Fix the mistakes.
Fixes: a353318ebf ("tools: ynl: populate most of the ethtool spec")
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230605233257.843977-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add current size of rule expressions to the boundary check.
Fixes: 2c865a8a28 ("netfilter: nf_tables: add rule blob layout")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
syzkaller found a repro that causes a hung task [0] with ipset. The repro
first creates an ipset and then tries to delete a large number of IPs
from the ipset concurrently:
IPSET_ATTR_IPADDR_IPV4 : 172.20.20.187
IPSET_ATTR_CIDR : 2
The first deleting thread hogs a CPU with nfnl_lock(NFNL_SUBSYS_IPSET)
held, and other threads wait for it to be released.
Previously, the same issue existed in set->variant->uadt(), which could run
for a long time under ip_set_lock(set). Commit 5e29dc36bd ("netfilter: ipset:
Rework long task execution when adding/deleting entries") tried to fix it,
but the issue still exists in the caller with another mutex.
While adding/deleting many IPs, we should release the CPU periodically to
prevent someone from abusing ipset to hang the system.
Note we need to increment the ipset's refcnt to prevent the ipset from
being destroyed while rescheduling.
[0]:
INFO: task syz-executor174:268 blocked for more than 143 seconds.
Not tainted 6.4.0-rc1-00145-gba79e9a73284 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor174 state:D stack:0 pid:268 ppid:260 flags:0x0000000d
Call trace:
__switch_to+0x308/0x714 arch/arm64/kernel/process.c:556
context_switch kernel/sched/core.c:5343 [inline]
__schedule+0xd84/0x1648 kernel/sched/core.c:6669
schedule+0xf0/0x214 kernel/sched/core.c:6745
schedule_preempt_disabled+0x58/0xf0 kernel/sched/core.c:6804
__mutex_lock_common kernel/locking/mutex.c:679 [inline]
__mutex_lock+0x6fc/0xdb0 kernel/locking/mutex.c:747
__mutex_lock_slowpath+0x14/0x20 kernel/locking/mutex.c:1035
mutex_lock+0x98/0xf0 kernel/locking/mutex.c:286
nfnl_lock net/netfilter/nfnetlink.c:98 [inline]
nfnetlink_rcv_msg+0x480/0x70c net/netfilter/nfnetlink.c:295
netlink_rcv_skb+0x1c0/0x350 net/netlink/af_netlink.c:2546
nfnetlink_rcv+0x18c/0x199c net/netfilter/nfnetlink.c:658
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x664/0x8cc net/netlink/af_netlink.c:1365
netlink_sendmsg+0x6d0/0xa4c net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x4b8/0x810 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1f8/0x2a4 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x84/0x270 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x134/0x24c arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
el0_svc+0x2c/0x7c arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
Reported-by: syzkaller <syzkaller@googlegroups.com>
Fixes: a7b4f989a6 ("netfilter: ipset: IP set core support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
An nf_conntrack_helper from nf_conn_help may become NULL after DNAT.
Observed when TCP port 1720 (Q931_PORT), associated with h323 conntrack
helper, is DNAT'ed to another destination port (e.g. 1730), while
nfqueue is being used for final acceptance (e.g. snort).
This happened after a transition from kernel 4.14 to 5.10.161.
Workarounds:
* keep the same port (1720) in DNAT
* disable nfqueue
* disable/unload h323 NAT helper
$ linux-5.10/scripts/decode_stacktrace.sh vmlinux < /tmp/kernel.log
BUG: kernel NULL pointer dereference, address: 0000000000000084
[..]
RIP: 0010:nf_conntrack_update (net/netfilter/nf_conntrack_core.c:2080 net/netfilter/nf_conntrack_core.c:2134) nf_conntrack
[..]
nfqnl_reinject (net/netfilter/nfnetlink_queue.c:237) nfnetlink_queue
nfqnl_recv_verdict (net/netfilter/nfnetlink_queue.c:1230) nfnetlink_queue
nfnetlink_rcv_msg (net/netfilter/nfnetlink.c:241) nfnetlink
[..]
Fixes: ee04805ff5 ("netfilter: conntrack: make conntrack userspace helpers work again")
Signed-off-by: Tijs Van Buggenhout <tijs.van.buggenhout@axsguard.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
At the end of `nft_bitwise_reduce`, there is a loop which is intended to
update the bitwise expression associated with each tracked destination
register. However, currently, it just updates the first register
repeatedly. Fix it.
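The class of bug described here, a loop whose body ignores its own index, can be sketched as follows. This is a hypothetical Python illustration, not the nf_tables code: the list of register slots, the function names and the parameters are all assumptions made for the example.

```python
def update_tracked_buggy(regs: list, dreg: int, count: int, expr: object) -> None:
    """Buggy pattern: every iteration writes the same (first) slot."""
    for _ in range(count):
        regs[dreg] = expr


def update_tracked_fixed(regs: list, dreg: int, count: int, expr: object) -> None:
    """Fixed pattern: the loop index selects each destination slot in turn."""
    for i in range(count):
        regs[dreg + i] = expr


if __name__ == "__main__":
    buggy = [None] * 4
    fixed = [None] * 4
    update_tracked_buggy(buggy, 1, 2, "bitwise")
    update_tracked_fixed(fixed, 1, 2, "bitwise")
    print(buggy)
    print(fixed)
```

In the buggy variant only the first destination slot is ever updated, so the remaining slots keep stale state.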
Fixes: 34cc9e5288 ("netfilter: nf_tables: cancel tracking for clobbered destination registers")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The nla_nest_start_noflag() function may fail and return NULL;
the return value needs to be checked.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: d54725cd11 ("netfilter: nf_tables: support for multiple devices per netdev hook")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit f4e4534850 ("net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report")
fixed the NETLINK_LIST_MEMBERSHIPS length report, which caused the
sockopt_sk selftest to fail. The failure log looks like:
test_sockopt_sk:PASS:join_cgroup /sockopt_sk 0 nsec
run_test:PASS:skel_load 0 nsec
run_test:PASS:setsockopt_link 0 nsec
run_test:PASS:getsockopt_link 0 nsec
getsetsockopt:FAIL:Unexpected NETLINK_LIST_MEMBERSHIPS value unexpected Unexpected NETLINK_LIST_MEMBERSHIPS value: actual 8 != expected 4
run_test:PASS:getsetsockopt 0 nsec
#201 sockopt_sk:FAIL
In net/netlink/af_netlink.c, function netlink_getsockopt(), for NETLINK_LIST_MEMBERSHIPS,
nlk->ngroups equals 36. Before commit f4e4534850, the optlen was calculated as
ALIGN(nlk->ngroups / 8, sizeof(u32)) = 4
After that commit, the optlen is
ALIGN(BITS_TO_BYTES(nlk->ngroups), sizeof(u32)) = 8
Fix the test by setting the expected optlen to be 8.
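The two optlen values can be reproduced with a short Python sketch. The helpers below are stand-ins written for this illustration that mirror the usual meaning of the kernel's ALIGN() and BITS_TO_BYTES() macros. They are not the kernel definitions themselves.

```python
SIZEOF_U32 = 4


def align(x: int, a: int) -> int:
    """Round x up to a multiple of a (a must be a power of two)."""
    return (x + a - 1) & ~(a - 1)


def bits_to_bytes(nbits: int) -> int:
    """Number of bytes needed to hold nbits bits, rounding up."""
    return (nbits + 7) // 8


if __name__ == "__main__":
    ngroups = 36
    old_optlen = align(ngroups // 8, SIZEOF_U32)            # truncating division
    new_optlen = align(bits_to_bytes(ngroups), SIZEOF_U32)  # rounding up
    print(old_optlen, new_optlen)
```

With 36 groups, truncating division yields 4 bytes, which loses the last 4 bits. Rounding up yields 5 bytes, which aligns to 8.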
Fixes: f4e4534850 ("net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230606172202.1606249-1-yhs@fb.com
A small collection of driver specific fixes, none of them particularly
remarkable or severe.
Merge tag 'spi-fix-v6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A small collection of driver specific fixes, none of them particularly
remarkable or severe"
* tag 'spi-fix-v6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: qup: Request DMA before enabling clocks
spi: mt65xx: make sure operations completed before unloading
spi: lpspi: disable lpspi module irq in DMA mode
This should use wiphy_lock() now instead of requiring the
RTNL, since __cfg80211_leave() via cfg80211_leave() now
requires that lock to be held.
Fixes: a05829a722 ("cfg80211: avoid holding the RTNL when calling the driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This should use wiphy_lock() now instead of acquiring the
RTNL, since cfg80211_stop_sched_scan_req() now needs that.
Fixes: a05829a722 ("cfg80211: avoid holding the RTNL when calling the driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
- Don't get stuck writing page onto itself under direct I/O.
Merge tag 'gfs2-v6.4-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Andreas Gruenbacher:
- Don't get stuck writing page onto itself under direct I/O
* tag 'gfs2-v6.4-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Don't get stuck writing page onto itself under direct I/O
Highlights:
- Various Microsoft Surface support fixes
- 1 fix for the INT3472 driver
The following is an automated git shortlog grouped by driver:
int3472:
- Avoid crash in unregistering regulator gpio
platform/surface:
- aggregator_tabletsw: Add support for book mode in POS subsystem
- aggregator_tabletsw: Add support for book mode in KIP subsystem
- aggregator: Allow completion work-items to be executed in parallel
- aggregator: Make to_ssam_device_driver() respect constness
Merge tag 'platform-drivers-x86-v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
- various Microsoft Surface support fixes
- one fix for the INT3472 driver
* tag 'platform-drivers-x86-v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: int3472: Avoid crash in unregistering regulator gpio
platform/surface: aggregator_tabletsw: Add support for book mode in POS subsystem
platform/surface: aggregator_tabletsw: Add support for book mode in KIP subsystem
platform/surface: aggregator: Allow completion work-items to be executed in parallel
platform/surface: aggregator: Make to_ssam_device_driver() respect constness