The value in idx and the number of rules handled in that particular
__nf_tables_dump_rules() call are not identical. The former is a cursor
to pick up from if multiple netlink messages are needed, so its value is
ever increasing. Fixing this is not just a matter of subtracting s_idx
from it, though: When resetting rules in multiple chains,
__nf_tables_dump_rules() is called for each and cb->args[0] is not
adjusted in between. Introduce a dedicated counter to record the number
of rules reset in this call in a less confusing way.
While at it, prevent the direct return upon buffer exhaustion: Any
rules previously dumped into that skb would evade audit logging
otherwise.
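The difference between the dump cursor and a per-call counter can be
illustrated with a small user-space model. This is only a sketch in
Python, not the kernel code: dump_chain(), the rule lists and the
budget parameter are hypothetical stand-ins for
__nf_tables_dump_rules(), the chains and the remaining skb capacity.

```python
def dump_chain(rules, idx, s_idx, budget):
    """Model one per-chain dump call.

    idx    -- running cursor across all chains (only ever increases)
    s_idx  -- position to resume from; earlier rules were sent already
    budget -- rules that still fit into the message (hypothetical)
    Returns (new_idx, handled, full).
    """
    handled = 0              # dedicated counter: rules handled in this call
    full = False
    for _rule in rules:
        if idx < s_idx:      # dumped in a previous message, skip it
            idx += 1
            continue
        if handled >= budget:
            full = True      # buffer exhausted: stop, but no early return
            break
        handled += 1
        idx += 1
    # An audit record would be emitted here in both cases, so rules that
    # already went into the buffer are always accounted for.
    return idx, handled, full


chain_a = ["r1", "r2", "r3"]
chain_b = ["r4", "r5"]

# Two chains handled back to back without adjusting the cursor in between.
idx, handled_a, _ = dump_chain(chain_a, idx=0, s_idx=0, budget=10)
idx, handled_b, _ = dump_chain(chain_b, idx=idx, s_idx=0, budget=10)
print(idx, handled_a, handled_b)
```

In this model the cursor keeps growing across chains while the second
call handled only the rules of chain_b, which is why neither idx nor
idx - s_idx can stand in for the per-call count.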
Fixes: 9b5ba5c9c5 ("netfilter: nf_tables: Unbreak audit log reset")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The size table is incorrect due to a copy-paste error;
this reserves more space than needed.
TSTAMP reserved 32 instead of 16 bytes.
TIMEOUT reserved 16 instead of 8 bytes.
Fixes: 5f31edc067 ("netfilter: conntrack: move extension sizes into core")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We need to deny the attach_override test for arm64, denying the
whole kprobe_multi_test suite. Also make attach_override static.
Fixes: 7182e56411 ("selftests/bpf: Add kprobe_multi override test")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230913114711.499829-1-jolsa@kernel.org
This pull request contains a critical fix for my previous pull request.
BR, Jarkko
Merge tag 'tpmdd-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fix from Jarkko Sakkinen.
* tag 'tpmdd-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: Fix typo in tpmrm class definition
* fix reference to exported symbols for parisc64 [Masahiro Yamada]
* Block-TLB (BTLB) support on 32-bit CPUs
* sparse and build-warning fixes
Merge tag 'parisc-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
- fix reference to exported symbols for parisc64 [Masahiro Yamada]
- Block-TLB (BTLB) support on 32-bit CPUs
- sparse and build-warning fixes
* tag 'parisc-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
linux/export: fix reference to exported functions for parisc64
parisc: BTLB: Initialize BTLB tables at CPU startup
parisc: firmware: Simplify calling non-PA20 functions
parisc: BTLB: _edata symbol has to be page aligned for BTLB support
parisc: BTLB: Add BTLB insert and purge firmware function wrappers
parisc: BTLB: Clear possibly existing BTLB entries
parisc: Prepare for Block-TLB support on 32-bit kernel
parisc: shmparam.h: Document aliasing requirements of PA-RISC
parisc: irq: Make irq_stack_union static to avoid sparse warning
parisc: drivers: Fix sparse warning
parisc: iosapic.c: Fix sparse warnings
parisc: ccio-dma: Fix sparse warnings
parisc: sba-iommu: Fix sparse warnigs
parisc: sba: Fix compile warning wrt list of SBA devices
parisc: sba_iommu: Fix build warning if procfs if disabled
- Add missing LOCKDOWN checks for eventfs callers
When LOCKDOWN is active for tracing, it causes inconsistent state
when some functions succeed and others fail.
- Use dput() to free the top level eventfs descriptor
There was a race between accesses and freeing it.
- Fix a long-standing bug that eventfs exposed due to changing timings
by dynamically creating files. That is, if an event file is opened
for an instance, there's nothing preventing the instance from being
removed which will make accessing the files cause use-after-free bugs.
- Fix a ring buffer race that happens when iterating over the ring
buffer while writers are active. Check to make sure not to read
the event meta data if it's beyond the end of the ring buffer sub buffer.
- Fix the print trigger that disappeared because the test to create it
was looking for the event dir field being filled, but now it has the
"ef" field filled for the eventfs structure.
- Remove the unused "dir" field from the event structure.
- Fix the order of the trace_dynamic_info as it had it backwards for the
offset and len fields for which one was for which endianness.
- Fix NULL pointer dereference with eventfs_remove_rec()
If an allocation fails in one of the eventfs_add_*() functions,
the caller of it in event_subsystem_dir() or event_create_dir()
assigns the result to the structure. But it's assigning the ERR_PTR
and not NULL. This was passed to eventfs_remove_rec() which expects
either a good pointer or a NULL, not ERR_PTR. The fix is to not
assign the ERR_PTR to the structure, but to keep it NULL on error.
- Fix list_for_each_rcu() to use list_for_each_srcu() in
dcache_dir_open_wrapper(). One iteration of the code used RCU
but because it had to call sleepable code, it had to be changed
to use SRCU, but one of the iterations was missed.
- Fix synthetic event print function to use "as_u64" instead of
passing in a pointer to the union. To fix big/little endian issues,
the u64 that represented several types was turned into a union to
define the types properly.
Merge tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Add missing LOCKDOWN checks for eventfs callers
When LOCKDOWN is active for tracing, it causes inconsistent state
when some functions succeed and others fail.
- Use dput() to free the top level eventfs descriptor
There was a race between accesses and freeing it.
- Fix a long-standing bug that eventfs exposed due to changing timings
by dynamically creating files. That is, if an event file is opened for
an instance, there's nothing preventing the instance from being
removed which will make accessing the files cause use-after-free
bugs.
- Fix a ring buffer race that happens when iterating over the ring
buffer while writers are active. Check to make sure not to read the
event meta data if it's beyond the end of the ring buffer sub buffer.
- Fix the print trigger that disappeared because the test to create it
was looking for the event dir field being filled, but now it has the
"ef" field filled for the eventfs structure.
- Remove the unused "dir" field from the event structure.
- Fix the order of the trace_dynamic_info as it had it backwards for
the offset and len fields for which one was for which endianness.
- Fix NULL pointer dereference with eventfs_remove_rec()
If an allocation fails in one of the eventfs_add_*() functions, the
caller of it in event_subsystem_dir() or event_create_dir() assigns
the result to the structure. But it's assigning the ERR_PTR and not
NULL. This was passed to eventfs_remove_rec() which expects either a
good pointer or a NULL, not ERR_PTR. The fix is to not assign the
ERR_PTR to the structure, but to keep it NULL on error.
- Fix list_for_each_rcu() to use list_for_each_srcu() in
dcache_dir_open_wrapper(). One iteration of the code used RCU but
because it had to call sleepable code, it had to be changed to use
SRCU, but one of the iterations was missed.
- Fix synthetic event print function to use "as_u64" instead of passing
in a pointer to the union. To fix big/little endian issues, the u64
that represented several types was turned into a union to define the
types properly.
* tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec()
tracefs/eventfs: Use list_for_each_srcu() in dcache_dir_open_wrapper()
tracing/synthetic: Print out u64 values properly
tracing/synthetic: Fix order of struct trace_dynamic_info
selftests/ftrace: Fix dependencies for some of the synthetic event tests
tracing: Remove unused trace_event_file dir field
tracing: Use the new eventfs descriptor for print trigger
ring-buffer: Do not attempt to read past "commit"
tracefs/eventfs: Free top level files on removal
ring-buffer: Avoid softlockup in ring_buffer_resize()
tracing: Have event inject files inc the trace array ref count
tracing: Have option files inc the trace array ref count
tracing: Have current_trace inc the trace array ref count
tracing: Have tracing_max_latency inc the trace array ref count
tracing: Increase trace array ref count on enable and filter files
tracefs/eventfs: Use dput to free the toplevel events directory
tracefs/eventfs: Add missing lockdown checks
tracefs: Add missing lockdown check to tracefs_create_dir()
After commit 50f303496d ("igb: Enable SR-IOV after reinit"), removing
the igb module could hang or crash (depending on the machine) when the
module has been loaded with the max_vfs parameter set to some value != 0.
In case of one test machine with a dual port 82580, this hang occurred:
[ 232.480687] igb 0000:41:00.1: removed PHC on enp65s0f1
[ 233.093257] igb 0000:41:00.1: IOV Disabled
[ 233.329969] pcieport 0000:40:01.0: AER: Multiple Uncorrected (Non-Fatal) err0
[ 233.340302] igb 0000:41:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[ 233.352248] igb 0000:41:00.0: device [8086:1516] error status/mask=00100000
[ 233.361088] igb 0000:41:00.0: [20] UnsupReq (First)
[ 233.368183] igb 0000:41:00.0: AER: TLP Header: 40000001 0000040f cdbfc00c c
[ 233.376846] igb 0000:41:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[ 233.388779] igb 0000:41:00.1: device [8086:1516] error status/mask=00100000
[ 233.397629] igb 0000:41:00.1: [20] UnsupReq (First)
[ 233.404736] igb 0000:41:00.1: AER: TLP Header: 40000001 0000040f cdbfc00c c
[ 233.538214] pci 0000:41:00.1: AER: can't recover (no error_detected callback)
[ 233.538401] igb 0000:41:00.0: removed PHC on enp65s0f0
[ 233.546197] pcieport 0000:40:01.0: AER: device recovery failed
[ 234.157244] igb 0000:41:00.0: IOV Disabled
[ 371.619705] INFO: task irq/35-aerdrv:257 blocked for more than 122 seconds.
[ 371.627489] Not tainted 6.4.0-dirty #2
[ 371.632257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
[ 371.641000] task:irq/35-aerdrv state:D stack:0 pid:257 ppid:2 f0
[ 371.650330] Call Trace:
[ 371.653061] <TASK>
[ 371.655407] __schedule+0x20e/0x660
[ 371.659313] schedule+0x5a/0xd0
[ 371.662824] schedule_preempt_disabled+0x11/0x20
[ 371.667983] __mutex_lock.constprop.0+0x372/0x6c0
[ 371.673237] ? __pfx_aer_root_reset+0x10/0x10
[ 371.678105] report_error_detected+0x25/0x1c0
[ 371.682974] ? __pfx_report_normal_detected+0x10/0x10
[ 371.688618] pci_walk_bus+0x72/0x90
[ 371.692519] pcie_do_recovery+0xb2/0x330
[ 371.696899] aer_process_err_devices+0x117/0x170
[ 371.702055] aer_isr+0x1c0/0x1e0
[ 371.705661] ? __set_cpus_allowed_ptr+0x54/0xa0
[ 371.710723] ? __pfx_irq_thread_fn+0x10/0x10
[ 371.715496] irq_thread_fn+0x20/0x60
[ 371.719491] irq_thread+0xe6/0x1b0
[ 371.723291] ? __pfx_irq_thread_dtor+0x10/0x10
[ 371.728255] ? __pfx_irq_thread+0x10/0x10
[ 371.732731] kthread+0xe2/0x110
[ 371.736243] ? __pfx_kthread+0x10/0x10
[ 371.740430] ret_from_fork+0x2c/0x50
[ 371.744428] </TASK>
The reproducer was a simple script:
#!/bin/sh
for i in `seq 1 5`; do
modprobe -rv igb
modprobe -v igb max_vfs=1
sleep 1
modprobe -rv igb
done
It turned out that this could only be reproduced on 82580 (quad and
dual-port), but not on 82576, i350 and i210. Further debugging showed
that igb_enable_sriov()'s call to pci_enable_sriov() is failing, because
dev->is_physfn is 0 on 82580.
Prior to commit 50f303496d ("igb: Enable SR-IOV after reinit"),
igb_enable_sriov() jumped into the "err_out" cleanup branch. After this
commit it only returned the error code.
So the cleanup didn't take place, and the incorrect VF setup in the
igb_adapter structure fooled the igb driver into assuming that VFs have
been set up where no VF actually existed.
Fix this problem by cleaning up again if pci_enable_sriov() fails.
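The shape of the fix can be sketched in a user-space model. This is an
illustration in Python under stated assumptions, not the driver code:
the Adapter class, its fields and pci_enable_sriov_model() are
hypothetical stand-ins for the igb_adapter structure and the PCI core
call.

```python
class Adapter:
    """Hypothetical stand-in for the driver's per-device state."""

    def __init__(self):
        self.vfs_allocated_count = 0
        self.vf_data = None


def pci_enable_sriov_model(is_physfn, num_vfs):
    """Model of the PCI core call: fails if the device is not a PF."""
    return 0 if is_physfn else -1


def enable_sriov(adapter, is_physfn, num_vfs):
    # VF bookkeeping is set up before SR-IOV is actually enabled.
    adapter.vfs_allocated_count = num_vfs
    adapter.vf_data = [{} for _ in range(num_vfs)]

    err = pci_enable_sriov_model(is_physfn, num_vfs)
    if err:
        # Cleanup path: undo the bookkeeping so that later code does not
        # assume VFs exist when none were created.
        adapter.vf_data = None
        adapter.vfs_allocated_count = 0
    return err


adapter = Adapter()
err = enable_sriov(adapter, is_physfn=False, num_vfs=1)
print(err, adapter.vfs_allocated_count)
```

Returning the error without the cleanup branch would leave
vfs_allocated_count at 1 in this model, which mirrors the stale VF
setup described above.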
Fixes: 50f303496d ("igb: Enable SR-IOV after reinit")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit in fixes introduced flags to control the status of hardware
configuration while processing packets. At the same time another structure
is used to provide configuration of timestamper to user-space applications.
The way it was coded makes these structures go out of sync easily. The
repro is easy for 82599 chips:
[root@hostname ~]# hwstamp_ctl -i eth0 -r 12 -t 1
current settings:
tx_type 0
rx_filter 0
new settings:
tx_type 1
rx_filter 12
The eth0 device is properly configured to timestamp any PTPv2 events.
[root@hostname ~]# hwstamp_ctl -i eth0 -r 1 -t 1
current settings:
tx_type 1
rx_filter 12
SIOCSHWTSTAMP failed: Numerical result out of range
The requested time stamping mode is not supported by the hardware.
The error is properly returned because the HW doesn't support
timestamping of all packets. But adapter->flags is cleared of the
timestamp flags even though no HW configuration was done. From that point
on, no RX timestamps are received by the user-space application, yet the
configuration still shows good values:
[root@hostname ~]# hwstamp_ctl -i eth0
current settings:
tx_type 1
rx_filter 12
Fix the issue by applying new flags only when the HW was actually
configured.
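The idea of committing flags only after a successful configuration can
be sketched as follows. This is a simplified Python model, not the
ixgbe code: the flag value, the Adapter class and the set of supported
filters are assumptions made for illustration.

```python
import errno

FLAG_RX_HWTSTAMP_ENABLED = 0x1     # hypothetical flag bit
SUPPORTED_RX_FILTERS = {0, 12}     # assumption: "none" and PTPv2 event only


class Adapter:
    """Hypothetical stand-in for the driver's adapter state."""

    def __init__(self):
        self.flags = 0
        self.rx_filter = 0


def set_hwtstamp(adapter, rx_filter):
    # Work on a local copy so that a failure leaves adapter.flags alone.
    new_flags = adapter.flags & ~FLAG_RX_HWTSTAMP_ENABLED

    if rx_filter not in SUPPORTED_RX_FILTERS:
        return -errno.ERANGE       # nothing was committed

    if rx_filter != 0:
        new_flags |= FLAG_RX_HWTSTAMP_ENABLED

    # Hardware programming would happen here; only afterwards are the
    # flags and the user-visible configuration updated together.
    adapter.flags = new_flags
    adapter.rx_filter = rx_filter
    return 0


adapter = Adapter()
first = set_hwtstamp(adapter, 12)   # accepted
second = set_hwtstamp(adapter, 1)   # rejected, state must be unchanged
print(first, second, adapter.flags, adapter.rx_filter)
```

After the rejected request both the flags and the reported filter still
describe the first, accepted configuration, so the two views cannot
drift apart.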
Fixes: a9763f3cb5 ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It has been pointed out that naming a subsystem "genpd" isn't very
self-explanatory, and the acronym itself, which means Generic PM Domain,
is known only by a limited group of people.
To improve the situation, let's rename the subsystem to pmdomain, which
ideally should indicate that this is about so-called Power Domains, or
"PM domains" as we often call them in Linux kernel terminology.
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230912221127.487327-1-ulf.hansson@linaro.org
Kuniyuki Iwashima says:
====================
tcp: Fix bind() regression for v4-mapped-v6 address
Since bhash2 was introduced, bind() is broken in two cases related
to v4-mapped-v6 address.
This series fixes the regression and adds tests to cover the cases.
Changes:
v2:
* Added patch 1 to factorise duplicated comparison (Eric Dumazet)
v1: https://lore.kernel.org/netdev/20230911165106.39384-1-kuniyu@amazon.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We add these 8 test cases in bind_wildcard.c to check bind() conflicts.
1st bind()          2nd bind()
---------           ---------
0.0.0.0             ::FFFF:0.0.0.0
::FFFF:0.0.0.0      0.0.0.0
0.0.0.0             ::FFFF:127.0.0.1
::FFFF:127.0.0.1    0.0.0.0
127.0.0.1           ::FFFF:0.0.0.0
::FFFF:0.0.0.0      127.0.0.1
127.0.0.1           ::FFFF:127.0.0.1
::FFFF:127.0.0.1    127.0.0.1
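The eight pairs above can be checked against a small user-space model
of the expected outcome. This is a sketch in Python, not the kernel
logic: as_v4() and should_conflict() are hypothetical helpers, and the
model assumes that both sockets use the same port, that IPV6_V6ONLY is
not set, and that every pair in the table is expected to conflict.

```python
import ipaddress


def as_v4(addr):
    """Return the IPv4 view of addr, or None for a non-mapped IPv6 address."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        return ip.ipv4_mapped
    return ip


def should_conflict(first, second):
    """Model: binds on the same port clash if their IPv4 views overlap."""
    a, b = as_v4(first), as_v4(second)
    if a is None or b is None:
        return False             # pure IPv6 addresses are not modelled
    # A wildcard on either side overlaps with every IPv4 address.
    return a.is_unspecified or b.is_unspecified or a == b


CASES = [
    ("0.0.0.0", "::FFFF:0.0.0.0"),
    ("::FFFF:0.0.0.0", "0.0.0.0"),
    ("0.0.0.0", "::FFFF:127.0.0.1"),
    ("::FFFF:127.0.0.1", "0.0.0.0"),
    ("127.0.0.1", "::FFFF:0.0.0.0"),
    ("::FFFF:0.0.0.0", "127.0.0.1"),
    ("127.0.0.1", "::FFFF:127.0.0.1"),
    ("::FFFF:127.0.0.1", "127.0.0.1"),
]

results = [should_conflict(first, second) for first, second in CASES]
for (first, second), hit in zip(CASES, results):
    print(f"{first:18} {second:18} conflict={hit}")
```

The model treats a v4-mapped-v6 address as its embedded IPv4 address,
which is the comparison the two regressions failed to make.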
All tests passed without bhash2, and with bhash2 plus this series.
Before bhash2:
$ uname -r
6.0.0-rc1-00393-g0bf73255d3a3
$ ./bind_wildcard
...
# PASSED: 16 / 16 tests passed.
Just after bhash2:
$ uname -r
6.0.0-rc1-00394-g28044fc1d495
$ ./bind_wildcard
...
ok 15 bind_wildcard.v4_local_v6_v4mapped_local.v4_v6
not ok 16 bind_wildcard.v4_local_v6_v4mapped_local.v6_v4
# FAILED: 15 / 16 tests passed.
On net.git:
$ ./bind_wildcard
...
not ok 14 bind_wildcard.v4_local_v6_v4mapped_any.v6_v4
not ok 16 bind_wildcard.v4_local_v6_v4mapped_local.v6_v4
# FAILED: 13 / 16 tests passed.
With this series:
$ ./bind_wildcard
...
# PASSED: 16 / 16 tests passed.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a preparation patch for the following patch.
Let's define expected_errno in each test case so that we can add other test
cases easily.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The selftest passes the IPv6 address length for an IPv4 address.
We should pass the correct length.
Note inet_bind_sk() does not check if the size is larger than
sizeof(struct sockaddr_in), so there is no real bug in this
selftest.
Fixes: 13715acf8a ("selftest: Add test for bind() conflicts.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since bhash2 was introduced, the example below does not work as expected.
These two bind() should conflict, but the 2nd bind() now succeeds.
from socket import *
s1 = socket(AF_INET6, SOCK_STREAM)
s1.bind(('::ffff:127.0.0.1', 0))
s2 = socket(AF_INET, SOCK_STREAM)
s2.bind(('127.0.0.1', s1.getsockname()[1]))
During the 2nd bind() in inet_csk_get_port(), inet_bind2_bucket_find()
fails to find the 1st socket's tb2, so inet_bind2_bucket_create() allocates
a new tb2 for the 2nd socket. Then, we call inet_csk_bind_conflict() that
checks conflicts in the new tb2 by inet_bhash2_conflict(). However, the
new tb2 does not include the 1st socket, thus the bind() finally succeeds.
In this case, inet_bind2_bucket_match() must check if AF_INET6 tb2 has
the conflicting v4-mapped-v6 address so that inet_bind2_bucket_find()
returns the 1st socket's tb2.
Note that if we bind two sockets to 127.0.0.1 and then ::FFFF:127.0.0.1,
the 2nd bind() fails properly for the same reason mentioned in the previous
commit.
Fixes: 28044fc1d4 ("net: Add a bhash2 table hashed by port and address")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrei Vagin reported bind() regression with strace logs.
If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4
socket to 127.0.0.1, the 2nd bind() should fail but now succeeds.
from socket import *
s1 = socket(AF_INET6, SOCK_STREAM)
s1.bind(('::ffff:0.0.0.0', 0))
s2 = socket(AF_INET, SOCK_STREAM)
s2.bind(('127.0.0.1', s1.getsockname()[1]))
During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is
AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check
if tb has the v4-mapped-v6 wildcard address.
The example above does not work after commit 5456262d2b ("net: Fix
incorrect address comparison when searching for a bind2 bucket"), but
that commit is not the one to blame.
Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated
as 0.0.0.0, and the sequence above worked by chance. Technically, this
case has been broken since bhash2 was introduced.
Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0,
the 2nd bind() fails properly because we fall back to using bhash to
detect conflicts for the v4-mapped-v6 address.
Fixes: 28044fc1d4 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Andrei Vagin <avagin@google.com>
Closes: https://lore.kernel.org/netdev/ZPuYBOFC8zsK6r9T@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a prep patch to make the following patches, which touch
inet_bind2_bucket_match() and inet_bind2_bucket_match_addr_any(), cleaner.
Both functions have duplicated comparison for netns, port, and l3mdev.
Let's factorise them.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix missing or extra function parameter kernel-doc warnings
in cgroup.c:
kernel/bpf/cgroup.c:1359: warning: Excess function parameter 'type' description in '__cgroup_bpf_run_filter_skb'
kernel/bpf/cgroup.c:1359: warning: Function parameter or member 'atype' not described in '__cgroup_bpf_run_filter_skb'
kernel/bpf/cgroup.c:1439: warning: Excess function parameter 'type' description in '__cgroup_bpf_run_filter_sk'
kernel/bpf/cgroup.c:1439: warning: Function parameter or member 'atype' not described in '__cgroup_bpf_run_filter_sk'
kernel/bpf/cgroup.c:1467: warning: Excess function parameter 'type' description in '__cgroup_bpf_run_filter_sock_addr'
kernel/bpf/cgroup.c:1467: warning: Function parameter or member 'atype' not described in '__cgroup_bpf_run_filter_sock_addr'
kernel/bpf/cgroup.c:1512: warning: Excess function parameter 'type' description in '__cgroup_bpf_run_filter_sock_ops'
kernel/bpf/cgroup.c:1512: warning: Function parameter or member 'atype' not described in '__cgroup_bpf_run_filter_sock_ops'
kernel/bpf/cgroup.c:1685: warning: Excess function parameter 'type' description in '__cgroup_bpf_run_filter_sysctl'
kernel/bpf/cgroup.c:1685: warning: Function parameter or member 'atype' not described in '__cgroup_bpf_run_filter_sysctl'
kernel/bpf/cgroup.c:795: warning: Excess function parameter 'type' description in '__cgroup_bpf_replace'
kernel/bpf/cgroup.c:795: warning: Function parameter or member 'new_prog' not described in '__cgroup_bpf_replace'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230912060812.1715-1-rdunlap@infradead.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit 1d56ade032 changed the function get_unpriv_disabled() to
return its results as a bool instead of updating a global variable, but
test_verifier was not updated to keep in line with these changes. Thus
unpriv_disabled is always false in test_verifier and unprivileged tests
are not properly skipped on systems with unprivileged bpf disabled.
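The failure mode can be reproduced in a few lines of Python. This is a
model only: the state dictionary and the sysctl_value parameter are
hypothetical, and the real get_unpriv_disabled() takes no such
argument.

```python
# State consulted by the test runner when deciding what to skip.
state = {"unpriv_disabled": False}


def get_unpriv_disabled(sysctl_value):
    """New style: report the result instead of updating shared state."""
    return sysctl_value != 0


def setup_stale_caller(sysctl_value):
    # The caller was not updated: the return value is dropped and the
    # shared state keeps its initial value.
    get_unpriv_disabled(sysctl_value)


def setup_updated_caller(sysctl_value):
    # The caller stores the returned value itself.
    state["unpriv_disabled"] = get_unpriv_disabled(sysctl_value)


setup_stale_caller(1)
stale = state["unpriv_disabled"]

setup_updated_caller(1)
updated = state["unpriv_disabled"]
print(stale, updated)
```

With the stale caller the state stays False even though the modelled
sysctl says unprivileged bpf is disabled, so nothing would be skipped.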
Fixes: 1d56ade032 ("selftests/bpf: Unprivileged tests for test_loader.c")
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230912120631.213139-1-asavkov@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit d2e8071bed ("tpm: make all 'class' structures const")
unfortunately had a typo for the name on tpmrm.
Fixes: d2e8071bed ("tpm: make all 'class' structures const")
Signed-off-by: Justin M. Forbes <jforbes@fedoraproject.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Merge tag 'for-6.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- several fixes for handling directory item (inserting, removing,
iteration, error handling)
- fix transaction commit stalls when auto relocation is running and
blocks other tasks that want to commit
- fix a build error when DEBUG is enabled
- fix lockdep warning in inode number lookup ioctl
- fix race when finishing block group creation
- remove link to obsolete wiki in several files
* tag 'for-6.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
MAINTAINERS: remove links to obsolete btrfs.wiki.kernel.org
btrfs: assert delayed node locked when removing delayed item
btrfs: remove BUG() after failure to insert delayed dir index item
btrfs: improve error message after failure to add delayed dir index item
btrfs: fix a compilation error if DEBUG is defined in btree_dirty_folio
btrfs: check for BTRFS_FS_ERROR in pending ordered assert
btrfs: fix lockdep splat and potential deadlock after failure running delayed items
btrfs: do not block starts waiting on previous transaction commit
btrfs: release path before inode lookup during the ino lookup ioctl
btrfs: fix race between finishing block group creation and its item update
Highlights
- Various platform/mellanox fixes
- 1 new DMI quirk for asus-wmi
The following is an automated git shortlog grouped by driver:
asus-wmi:
- Support 2023 ROG X16 tablet mode
platform/mellanox:
- NVSW_SN2201 should depend on ACPI
- mlxbf-bootctl: add NET dependency into Kconfig
- mlxbf-pmc: Fix reading of unprogrammed events
- mlxbf-pmc: Fix potential buffer overflows
- mlxbf-tmfifo: Drop jumbo frames
- mlxbf-tmfifo: Drop the Rx packet if no more descriptors
Merge tag 'platform-drivers-x86-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
- various platform/mellanox fixes
- one new DMI quirk for asus-wmi
* tag 'platform-drivers-x86-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: asus-wmi: Support 2023 ROG X16 tablet mode
platform/mellanox: NVSW_SN2201 should depend on ACPI
platform/mellanox: mlxbf-bootctl: add NET dependency into Kconfig
platform/mellanox: mlxbf-pmc: Fix reading of unprogrammed events
platform/mellanox: mlxbf-pmc: Fix potential buffer overflows
platform/mellanox: mlxbf-tmfifo: Drop jumbo frames
platform/mellanox: mlxbf-tmfifo: Drop the Rx packet if no more descriptors
The second argument of ip6_sock_set_addr_preferences() should be an
integer. SUNRPC attempts to set IPV6_PREFER_SRC_PUBLIC were
translated to IPV6_PREFER_SRC_TMP.
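The translation can be shown with plain arithmetic. The sketch below is
a Python model, assuming the parameter was previously declared as a C
bool (which the text above implies but does not state). The two
constants use the values from the uapi header linux/in6.h; the helper
names are hypothetical.

```python
IPV6_PREFER_SRC_TMP = 0x0001
IPV6_PREFER_SRC_PUBLIC = 0x0002


def pass_as_bool(val):
    """Model of a C bool parameter: any non-zero value becomes 1."""
    return int(bool(val))


def pass_as_int(val):
    """Model of an int parameter: the value arrives unchanged."""
    return int(val)


as_bool = pass_as_bool(IPV6_PREFER_SRC_PUBLIC)
as_int = pass_as_int(IPV6_PREFER_SRC_PUBLIC)
print(hex(as_bool), hex(as_int))
```

Under that assumption the conversion collapses 0x0002 to 1, which is
the value of IPV6_PREFER_SRC_TMP, while an integer parameter keeps the
requested preference intact.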
Fixes: 18d5ad6232 ("ipv6: add ip6_sock_set_addr_preferences")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230911154213.713941-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This kselftest fixes update for Linux 6.6-rc2 consists of fixes to
-- the kselftest runner script, to propagate SIGTERM to the runner child
and avoid a kselftest hang.
-- install symlinks required for test execution, to avoid test
failures.
-- the kselftest dependency checker script's argument parsing.
Merge tag 'linux-kselftest-next-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
- kselftest runner script to propagate SIGTERM to runner child
to avoid kselftest hang
- install symlinks required for test execution to avoid test
failures
- kselftest dependency checker script argument parsing
* tag 'linux-kselftest-next-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: Keep symlinks, when possible
selftests: fix dependency checker script
kselftest/runner.sh: Propagate SIGTERM to runner child
selftests/ftrace: Correctly enable event in instance-event.tc
This kunit update for Linux 6.6-rc2 consists of important fixes to
possible memory leak, null-ptr-deref, wild-memory-access, and error
path bugs.
Merge tag 'linux-kselftest-kunit-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fixes from Shuah Khan:
"Fixes to possible memory leak, null-ptr-deref, wild-memory-access, and
error path bugs"
* tag 'linux-kselftest-kunit-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Fix possible memory leak in kunit_filter_suites()
kunit: Fix possible null-ptr-deref in kunit_parse_glob_filter()
kunit: Fix the wrong err path and add goto labels in kunit_filter_suites()
kunit: Fix wild-memory-access bug in kunit_free_suite_set()
kunit: test: Make filter strings in executor_test writable
Merge tag 'ovl-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs fixes from Amir Goldstein:
"Two fixes for pretty old regressions"
* tag 'ovl-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: fix incorrect fdput() on aio completion
ovl: fix failed copyup of fileattr on a symlink
John David Anglin reported parisc has been broken since commit
ddb5cdbafa ("kbuild: generate KSYMTAB entries by modpost").
Like ia64, parisc64 uses a function descriptor. The function
references must be prefixed with P%.
Also, symbols prefixed $$ from the library have the symbol type
STT_LOPROC instead of STT_FUNC. They should be handled as functions
too.
Fixes: ddb5cdbafa ("kbuild: generate KSYMTAB entries by modpost")
Reported-by: John David Anglin <dave.anglin@bell.net>
Tested-by: John David Anglin <dave.anglin@bell.net>
Tested-by: Helge Deller <deller@gmx.de>
Closes: https://lore.kernel.org/linux-parisc/1901598a-e11d-f7dd-a5d9-9a69d06e6b6e@bell.net/T/#u
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
There's an early return in veth_set_features() if the device is in a down
state, which leads to the XDP feature flags not being updated when enabling
GRO while the device is down. This in turn leads to XDP_REDIRECT not
working, because the redirect code now checks the flags.
Fix this by updating the feature flags after bringing the device up.
Before this patch:
NETDEV_XDP_ACT_BASIC: yes
NETDEV_XDP_ACT_REDIRECT: yes
NETDEV_XDP_ACT_NDO_XMIT: no
NETDEV_XDP_ACT_XSK_ZEROCOPY: no
NETDEV_XDP_ACT_HW_OFFLOAD: no
NETDEV_XDP_ACT_RX_SG: yes
NETDEV_XDP_ACT_NDO_XMIT_SG: no
After this patch:
NETDEV_XDP_ACT_BASIC: yes
NETDEV_XDP_ACT_REDIRECT: yes
NETDEV_XDP_ACT_NDO_XMIT: yes
NETDEV_XDP_ACT_XSK_ZEROCOPY: no
NETDEV_XDP_ACT_HW_OFFLOAD: no
NETDEV_XDP_ACT_RX_SG: yes
NETDEV_XDP_ACT_NDO_XMIT_SG: yes
Fixes: fccca038f3 ("veth: take into account device reconfiguration for xdp_features flag")
Fixes: 66c0e13ad2 ("drivers: net: turn on XDP features")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20230911135826.722295-1-toke@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
macb_set_tx_clk() is called under a spinlock but itself calls clk_set_rate()
which can sleep. This results in:
| BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
| pps pps1: new PPS source ptp1
| in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 40, name: kworker/u4:3
| preempt_count: 1, expected: 0
| RCU nest depth: 0, expected: 0
| 4 locks held by kworker/u4:3/40:
| #0: ffff000003409148
| macb ff0c0000.ethernet: gem-ptp-timer ptp clock registered.
| ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x14c/0x51c
| #1: ffff8000833cbdd8 ((work_completion)(&pl->resolve)){+.+.}-{0:0}, at: process_one_work+0x14c/0x51c
| #2: ffff000004f01578 (&pl->state_mutex){+.+.}-{4:4}, at: phylink_resolve+0x44/0x4e8
| #3: ffff000004f06f50 (&bp->lock){....}-{3:3}, at: macb_mac_link_up+0x40/0x2ac
| irq event stamp: 113998
| hardirqs last enabled at (113997): [<ffff800080e8503c>] _raw_spin_unlock_irq+0x30/0x64
| hardirqs last disabled at (113998): [<ffff800080e84478>] _raw_spin_lock_irqsave+0xac/0xc8
| softirqs last enabled at (113608): [<ffff800080010630>] __do_softirq+0x430/0x4e4
| softirqs last disabled at (113597): [<ffff80008001614c>] ____do_softirq+0x10/0x1c
| CPU: 0 PID: 40 Comm: kworker/u4:3 Not tainted 6.5.0-11717-g9355ce8b2f50-dirty #368
| Hardware name: ... ZynqMP ... (DT)
| Workqueue: events_power_efficient phylink_resolve
| Call trace:
| dump_backtrace+0x98/0xf0
| show_stack+0x18/0x24
| dump_stack_lvl+0x60/0xac
| dump_stack+0x18/0x24
| __might_resched+0x144/0x24c
| __might_sleep+0x48/0x98
| __mutex_lock+0x58/0x7b0
| mutex_lock_nested+0x24/0x30
| clk_prepare_lock+0x4c/0xa8
| clk_set_rate+0x24/0x8c
| macb_mac_link_up+0x25c/0x2ac
| phylink_resolve+0x178/0x4e8
| process_one_work+0x1ec/0x51c
| worker_thread+0x1ec/0x3e4
| kthread+0x120/0x124
| ret_from_fork+0x10/0x20
The obvious fix is to move the call to macb_set_tx_clk() out of the
protected area. This seems safe as rx and tx are both disabled anyway at
this point.
It is however not entirely clear what the spinlock is meant to protect. It
could be the read-modify-write access to the NCFGR register, but this
is accessed in macb_set_rx_mode() and macb_set_rxcsum_feature() as well
without holding the spinlock. It could also be the register accesses
done in mog_init_rings() or macb_init_buffers(), but again these
functions are called without holding the spinlock in macb_hresp_error_task().
The locking seems fishy in this driver and it might deserve another look
before this patch is applied.
Fixes: 633e98a711 ("net: macb: use resolved link config in mac_link_up()")
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Link: https://lore.kernel.org/r/20230908112913.1701766-1-s.hauer@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
I got the below warning when doing fuzz testing:
BUG: KASAN: null-ptr-deref in scatterwalk_copychunks+0x320/0x470
Read of size 4 at addr 0000000000000008 by task kworker/u8:1/9
CPU: 0 PID: 9 Comm: kworker/u8:1 Tainted: G OE
Hardware name: linux,dummy-virt (DT)
Workqueue: pencrypt_parallel padata_parallel_worker
Call trace:
dump_backtrace+0x0/0x420
show_stack+0x34/0x44
dump_stack+0x1d0/0x248
__kasan_report+0x138/0x140
kasan_report+0x44/0x6c
__asan_load4+0x94/0xd0
scatterwalk_copychunks+0x320/0x470
skcipher_next_slow+0x14c/0x290
skcipher_walk_next+0x2fc/0x480
skcipher_walk_first+0x9c/0x110
skcipher_walk_aead_common+0x380/0x440
skcipher_walk_aead_encrypt+0x54/0x70
ccm_encrypt+0x13c/0x4d0
crypto_aead_encrypt+0x7c/0xfc
pcrypt_aead_enc+0x28/0x84
padata_parallel_worker+0xd0/0x2dc
process_one_work+0x49c/0xbdc
worker_thread+0x124/0x880
kthread+0x210/0x260
ret_from_fork+0x10/0x18
This is because the value of rec_seq in the tls_crypto_info configured by
the user program is too large, for example, 0xffffffffffffff. In addition,
TLS is asynchronously accelerated. When tls_do_encryption() returns
-EINPROGRESS and sk->sk_err is set to EBADMSG due to rec_seq overflow,
skmsg is released before the asynchronous encryption process ends. As a
result, a use-after-free (UAF) occurs during the asynchronous processing
in the encryption module.
If the operation is asynchronous and the encryption module returns
-EINPROGRESS, do not free the record information.
Fixes: 635d939817 ("net/tls: free record only on encryption error")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20230909081434.2324940-1-liujian56@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eduard Zingerman says:
====================
For a device bound BPF program with flag BPF_F_XDP_DEV_BOUND_ONLY,
in case the device does not support offload, __bpf_prog_dev_bound_init()
creates a dummy bpf_offload_netdev struct with .offdev field set to NULL.
This dummy struct might be reused for programs without this flag
bound to the same device. However, bpf_prog_offload_verifier_prep()
that uses bpf_offload_netdev assumes that .offdev field cannot be NULL.
This bug was reported by syzbot in [1].
[1] https://lore.kernel.org/bpf/000000000000d97f3c060479c4f8@google.com/
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Fix for a bug observable under the following sequence of events:
1. Create a network device that does not support XDP offload.
2. Load a device bound XDP program with BPF_F_XDP_DEV_BOUND_ONLY flag
(such programs are not offloaded).
3. Load a device bound XDP program with zero flags
(such programs are offloaded).
At step (2) __bpf_prog_dev_bound_init() associates with device (1)
a dummy bpf_offload_netdev struct with .offdev field set to NULL.
At step (3) __bpf_prog_dev_bound_init() would reuse dummy struct
allocated at step (2).
However, downstream usage of the bpf_offload_netdev assumes that
.offdev field can't be NULL, e.g. in bpf_prog_offload_verifier_prep().
Adjust __bpf_prog_dev_bound_init() to require bpf_offload_netdev
with non-NULL .offdev for offloaded BPF programs.
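The check described above can be modelled in userspace roughly as follows.
This is only a sketch: the model_* types and model_dev_bound_init() are
hypothetical stand-ins rather than the kernel structures, and returning
-EINVAL is an assumption made for the illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the offload device a netdev may be tied to. */
struct model_offload_dev {
	int id;
};

/* Hypothetical stand-in for bpf_offload_netdev: offdev is NULL for the
 * dummy entry created for BPF_F_XDP_DEV_BOUND_ONLY programs. */
struct model_offload_netdev {
	struct model_offload_dev *offdev;
};

/*
 * Model of the adjusted init path: an offloaded program may only bind to
 * an entry whose offdev is non-NULL. The error code is an assumption.
 */
static int model_dev_bound_init(const struct model_offload_netdev *ondev,
				bool prog_is_offloaded)
{
	if (prog_is_offloaded && ondev->offdev == NULL)
		return -EINVAL;
	return 0;
}
```

With this shape, the dummy entry from step (2) can still be reused by other
dev-bound-only programs, while the offloaded program from step (3) is
rejected before any code dereferences .offdev.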
Fixes: 2b3486bc2d ("bpf: Introduce device-bound XDP programs")
Reported-by: syzbot+291100dcb32190ec02a8@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/bpf/000000000000d97f3c060479c4f8@google.com/
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230912005539.2248244-2-eddyz87@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The eventfs files list is protected by SRCU. In earlier iterations it was
protected with just RCU, but because it also needed to call sleepable
code, it had to be switched to SRCU. The list_for_each_rcu() in
dcache_dir_open_wrapper() was missed and did not get converted over to
list_for_each_srcu(). That needs to be fixed.
Link: https://lore.kernel.org/linux-trace-kernel/20230911120053.ca82f545e7f46ea753deda18@kernel.org/
Link: https://lore.kernel.org/linux-trace-kernel/20230911200654.71ce927c@gandalf.local.home
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ajay Kaher <akaher@vmware.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 6394044955 ("eventfs: Implement eventfs lookup, read, open functions")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Syzbot discovered that the queue and stack maps can deadlock if they are
being used from a BPF program that can be called from NMI context (such as
one that is attached to a perf HW counter event). To fix this, add an
in_nmi() check and use raw_spin_trylock() in NMI context, erroring out if
grabbing the lock fails.
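The locking pattern can be sketched in userspace as below. This is a model
under stated assumptions, not the kernel code: the model_* names are
hypothetical, a C11 atomic_flag stands in for the raw spinlock, the NMI
state is passed in as a parameter instead of calling in_nmi(), and -EBUSY
is assumed as the error returned when the trylock fails.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a raw spinlock (not the kernel's raw_spinlock_t). */
struct model_lock {
	atomic_flag held;
};

/* Hypothetical queue: just a lock and an item counter. */
struct model_queue {
	struct model_lock lock;
	int items;
};

static void model_queue_init(struct model_queue *q)
{
	atomic_flag_clear(&q->lock.held);
	q->items = 0;
}

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(struct model_lock *l)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void model_lock_acquire(struct model_lock *l)
{
	while (!model_trylock(l))
		; /* spin until the holder releases the lock */
}

static void model_unlock(struct model_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * In "NMI context" never spin: the interrupted code on this CPU may be
 * the lock holder, so waiting would deadlock. Try once and error out.
 */
static int model_queue_push(struct model_queue *q, bool in_nmi)
{
	if (in_nmi) {
		if (!model_trylock(&q->lock))
			return -EBUSY; /* assumed error code */
	} else {
		model_lock_acquire(&q->lock);
	}

	q->items++;
	model_unlock(&q->lock);
	return 0;
}
```

The cost of this design is that an update from NMI context can fail
spuriously under contention, which the caller has to tolerate.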
Fixes: f1a2e44a3a ("bpf: add queue and stack maps")
Reported-by: Hsin-Wei Hung <hsinweih@uci.edu>
Tested-by: Hsin-Wei Hung <hsinweih@uci.edu>
Co-developed-by: Hsin-Wei Hung <hsinweih@uci.edu>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20230911132815.717240-1-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
To make handling BIG and LITTLE endian better the offset/len of dynamic
fields of the synthetic events was changed into a structure of:
struct trace_dynamic_info {
#ifdef CONFIG_CPU_BIG_ENDIAN
u16 offset;
u16 len;
#else
u16 len;
u16 offset;
#endif
};
to replace the manual changes of:
data_offset = offset & 0xffff;
data_offset |= len << 16;
But if you look closely, the above is:
<len> << 16 | offset
Which in little endian would be in memory:
offset_lo offset_hi len_lo len_hi
and in big endian:
len_hi len_lo offset_hi offset_lo
Which if broken into a structure would be:
struct trace_dynamic_info {
#ifdef CONFIG_CPU_BIG_ENDIAN
u16 len;
u16 offset;
#else
u16 offset;
u16 len;
#endif
};
Which is the opposite of what was defined.
Fix this and just to be safe also add "__packed".
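The layout argument can be checked with a small userspace program along
these lines. It is a sketch only: the struct is a copy of the corrected
little-endian branch, __packed is spelled as the GCC/Clang attribute, and
pack_dynamic(), unpack_dynamic() and host_is_little_endian() are
hypothetical helpers added for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Corrected layout for a little-endian host, as derived above. */
struct trace_dynamic_info_le {
	uint16_t offset;
	uint16_t len;
} __attribute__((packed));

static_assert(sizeof(struct trace_dynamic_info_le) == sizeof(uint32_t),
	      "struct must overlay the 32-bit word exactly");

/* The old manual encoding: len in the upper 16 bits, offset in the lower. */
static uint32_t pack_dynamic(uint16_t offset, uint16_t len)
{
	return ((uint32_t)len << 16) | offset;
}

/* View the packed word through the struct; memcpy avoids aliasing issues. */
static struct trace_dynamic_info_le unpack_dynamic(uint32_t word)
{
	struct trace_dynamic_info_le info;

	memcpy(&info, &word, sizeof(info));
	return info;
}

/* Returns 1 when the lowest-addressed byte holds the low-order bits. */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 1;
}
```

On a little-endian host, unpack_dynamic(pack_dynamic(offset, len)) is
expected to give back the same offset and len, which is exactly what the
previously defined field order failed to do.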
Link: https://lore.kernel.org/all/20230908154417.5172e343@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20230908163929.2c25f3dc@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: ddeea494a1 ("tracing/synthetic: Use union instead of casts")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Commit 151e887d8f ("veth: Fixing transmit return status for dropped
packets") started propagating the proper NET_XMIT_DROP error to the caller,
which means it's now possible to get a positive error code when calling
bpf_clone_redirect() in this particular test. Update the test to reflect
that.
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230911194731.286342-2-sdf@google.com
Commit 151e887d8f ("veth: Fixing transmit return status for dropped
packets") exposed the fact that bpf_clone_redirect is capable of
returning raw NET_XMIT_XXX return codes.
This conflicts with its UAPI doc, which says the following:
"0 on success, or a negative error in case of failure."
Update the UAPI to reflect the fact that bpf_clone_redirect can
return positive error numbers, but don't explicitly define
their meaning.
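For callers, the practical consequence is that a "ret < 0" test no longer
catches every failure. A minimal userspace sketch of the three-way split
follows; the enum, the MODEL_* constants and both helpers are hypothetical
names introduced here, with the constants copying the kernel's values for
NET_XMIT_SUCCESS and NET_XMIT_DROP.

```c
#include <stdbool.h>

/* Userspace copies of two kernel NET_XMIT_* values, for illustration. */
#define MODEL_NET_XMIT_SUCCESS 0
#define MODEL_NET_XMIT_DROP    1

enum redirect_outcome {
	REDIRECT_OK,       /* 0: success */
	REDIRECT_ERRNO,    /* negative: a regular error code */
	REDIRECT_POSITIVE, /* positive: raw code, meaning left undefined */
};

/* Classify a bpf_clone_redirect()-style return value. */
static enum redirect_outcome classify_redirect_ret(long ret)
{
	if (ret == 0)
		return REDIRECT_OK;
	return ret < 0 ? REDIRECT_ERRNO : REDIRECT_POSITIVE;
}

/* Anything other than 0 is treated as "did not succeed". */
static bool redirect_failed(long ret)
{
	return ret != 0;
}
```

Since the meaning of the positive values is deliberately left undefined,
the sketch only distinguishes their sign and does not interpret them.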
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230911194731.286342-1-sdf@google.com
Hou Tao says:
====================
Fix the unmatched unit_size of bpf_mem_cache
From: Hou Tao <houtao1@huawei.com>
Hi,
The patchset aims to fix the reported warning [0] when the unit_size of
bpf_mem_cache does not match the object size of the underlying slab cache.
Patch #1 fixes the warning by adjusting size_index according to the
value of KMALLOC_MIN_SIZE, so bpf_mem_cache with unit_size which is
smaller than KMALLOC_MIN_SIZE or is not aligned with KMALLOC_MIN_SIZE
will be redirected to bpf_mem_cache with bigger unit_size. Patch #2
doesn't do prefill for these redirected bpf_mem_cache to save memory.
Patch #3 adds further error check in bpf_mem_alloc_init() to ensure the
unit_size and object_size are always matched and to prevent potential
issues due to the mismatch.
Please see individual patches for more details. And comments are always
welcome.
[0]: https://lore.kernel.org/bpf/87jztjmmy4.fsf@all.your.base.are.belong.to.us
====================
Link: https://lore.kernel.org/r/20230908133923.2675053-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test to test all possible and valid allocation sizes for the bpf
memory allocator. For each possible allocation size, the test uses
the following two steps to test the alloc and free path:
1) allocate N (N > high_watermark) objects to trigger the refill
executed in irq_work.
2) free N objects to trigger the freeing executed in irq_work.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230908133923.2675053-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add an extra check in bpf_mem_alloc_init() to ensure the unit_size of
bpf_mem_cache matches the object_size of the underlying slab cache.
If these two sizes do not match, print a warning once and return
-EINVAL in bpf_mem_alloc_init(), so the mismatch can be found early and
the potential issue can be prevented.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230908133923.2675053-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When the unit_size of a bpf_mem_cache does not match the object_size
of the underlying slab cache, the bpf_mem_cache will not be used, and
the allocation will be redirected to a bpf_mem_cache with a bigger
unit_size instead, so there is no need to prefill these
unused bpf_mem_caches.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230908133923.2675053-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The following warning was reported when running "./test_progs -a
link_api -a linked_list" on a RISC-V QEMU VM:
------------[ cut here ]------------
WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342 bpf_mem_refill
Modules linked in: bpf_testmod(OE)
CPU: 3 PID: 261 Comm: test_progs- ... 6.5.0-rc5-01743-gdcb152bb8328 #2
Hardware name: riscv-virtio,qemu (DT)
epc : bpf_mem_refill+0x1fc/0x206
ra : irq_work_single+0x68/0x70
epc : ffffffff801b1bc4 ra : ffffffff8015fe84 sp : ff2000000001be20
gp : ffffffff82d26138 tp : ff6000008477a800 t0 : 0000000000046600
t1 : ffffffff812b6ddc t2 : 0000000000000000 s0 : ff2000000001be70
s1 : ff5ffffffffe8998 a0 : ff5ffffffffe8998 a1 : ff600003fef4b000
a2 : 000000000000003f a3 : ffffffff80008250 a4 : 0000000000000060
a5 : 0000000000000080 a6 : 0000000000000000 a7 : 0000000000735049
s2 : ff5ffffffffe8998 s3 : 0000000000000022 s4 : 0000000000001000
s5 : 0000000000000007 s6 : ff5ffffffffe8570 s7 : ffffffff82d6bd30
s8 : 000000000000003f s9 : ffffffff82d2c5e8 s10: 000000000000ffff
s11: ffffffff82d2c5d8 t3 : ffffffff81ea8f28 t4 : 0000000000000000
t5 : ff6000008fd28278 t6 : 0000000000040000
[<ffffffff801b1bc4>] bpf_mem_refill+0x1fc/0x206
[<ffffffff8015fe84>] irq_work_single+0x68/0x70
[<ffffffff8015feb4>] irq_work_run_list+0x28/0x36
[<ffffffff8015fefa>] irq_work_run+0x38/0x66
[<ffffffff8000828a>] handle_IPI+0x3a/0xb4
[<ffffffff800a5c3a>] handle_percpu_devid_irq+0xa4/0x1f8
[<ffffffff8009fafa>] generic_handle_domain_irq+0x28/0x36
[<ffffffff800ae570>] ipi_mux_process+0xac/0xfa
[<ffffffff8000a8ea>] sbi_ipi_handle+0x2e/0x88
[<ffffffff8009fafa>] generic_handle_domain_irq+0x28/0x36
[<ffffffff807ee70e>] riscv_intc_irq+0x36/0x4e
[<ffffffff812b5d3a>] handle_riscv_irq+0x54/0x86
[<ffffffff812b6904>] do_irq+0x66/0x98
---[ end trace 0000000000000000 ]---
The warning is due to WARN_ON_ONCE(tgt->unit_size != c->unit_size) in
free_bulk(). The direct reason is that an object is allocated and
freed by bpf_mem_caches with different unit_size values.
The root cause is that KMALLOC_MIN_SIZE is 64 and there is no 96-byte
slab cache in the specific VM. When the linked_list test allocates a
72-byte object through bpf_obj_new(), bpf_global_ma will allocate it
from a bpf_mem_cache with a 96-byte unit_size, but this bpf_mem_cache is
backed by a 128-byte slab cache. When the object is freed, bpf_mem_free()
uses ksize() to choose the corresponding bpf_mem_cache. Because the
object is allocated from the 128-byte slab cache, ksize() returns 128,
so bpf_mem_free() chooses a 128-byte bpf_mem_cache to free the object and
triggers the warning.
A similar warning will also be reported when using CONFIG_SLAB instead
of CONFIG_SLUB in an x86-64 kernel, because CONFIG_SLUB defines
KMALLOC_MIN_SIZE as 8 but CONFIG_SLAB defines KMALLOC_MIN_SIZE as 32.
An alternative fix is to use kmalloc_size_roundup() in bpf_mem_alloc() to
choose a bpf_mem_cache which has the same unit_size as the backing
slab cache, but it may introduce performance degradation, so fix the
warning by adjusting the indexes in size_index according to the value of
KMALLOC_MIN_SIZE just like setup_kmalloc_cache_index_table() does.
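The mismatch and the adjusted lookup can be reproduced with a simplified
userspace model such as the one below. All model_* names are hypothetical,
and the size-class rules are a rough approximation of the kmalloc caches
(powers of two plus 96 and 192, with small classes folded upward according
to KMALLOC_MIN_SIZE), not the kernel's actual tables.

```c
#include <stddef.h>

/* Smallest unit_size the modelled bpf allocator offers (assumption). */
#define MODEL_BPF_MIN_UNIT 16

/* Round up to the next power of two; size must be at least 1. */
static size_t model_roundup_pow2(size_t size)
{
	size_t p = 1;

	while (p < size)
		p <<= 1;
	return p;
}

/* Approximate object size of the slab cache backing a kmalloc() request. */
static size_t model_slab_object_size(size_t size, size_t kmalloc_min_size)
{
	if (size <= kmalloc_min_size)
		return kmalloc_min_size;
	/* The 96-byte cache only exists when KMALLOC_MIN_SIZE < 64. */
	if (kmalloc_min_size < 64 && size > 64 && size <= 96)
		return 96;
	/* The 192-byte cache only exists when KMALLOC_MIN_SIZE < 128. */
	if (kmalloc_min_size < 128 && size > 128 && size <= 192)
		return 192;
	return model_roundup_pow2(size);
}

/* unit_size picked before the fix: KMALLOC_MIN_SIZE is ignored. */
static size_t model_unit_size_old(size_t size)
{
	return model_slab_object_size(size, MODEL_BPF_MIN_UNIT);
}

/* unit_size picked after the fix: follows the existing slab caches. */
static size_t model_unit_size_fixed(size_t size, size_t kmalloc_min_size)
{
	size_t min = kmalloc_min_size < MODEL_BPF_MIN_UNIT ?
		     MODEL_BPF_MIN_UNIT : kmalloc_min_size;

	return model_slab_object_size(size, min);
}

/*
 * Non-zero when an object would be allocated from one unit_size and
 * freed to another, i.e. the condition the WARN_ON_ONCE() catches.
 */
static int model_alloc_free_mismatch(size_t size, size_t kmalloc_min_size)
{
	size_t alloc_unit = model_unit_size_old(size);
	size_t ksize_result = model_slab_object_size(alloc_unit,
						     kmalloc_min_size);

	return alloc_unit != ksize_result;
}
```

In this model a 72-byte request with KMALLOC_MIN_SIZE of 64 is allocated
from the 96-byte class but freed to the 128-byte class, while the adjusted
lookup picks the 128-byte class on both paths.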
Fixes: 822fb26bdb ("bpf: Add a hint to allocated objects.")
Reported-by: Björn Töpel <bjorn@kernel.org>
Closes: https://lore.kernel.org/bpf/87jztjmmy4.fsf@all.your.base.are.belong.to.us
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230908133923.2675053-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add quirk for ASUS ROG X16 (GV601V, 2023 versions) Flow 2-in-1
to enable tablet mode with lid flip (all screen rotations).
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230905082813.13470-1-luke@ljones.dev
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The only probing method supported by the Nvidia SN2201 platform driver
is probing through an ACPI match table. Hence add a dependency on
ACPI, to prevent asking the user about this driver when configuring a
kernel without ACPI support.
Fixes: 662f24826f ("platform/mellanox: Add support for new SN2201 system")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Vadim Pasternak <vadimp@nvidia.com>
Acked-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/ec5a4071691ab08d58771b7732a9988e89779268.1693828363.git.geert+renesas@glider.be
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The latest version of the mlxbf_bootctl driver utilizes
"sysfs_format_mac", and this API is only available if
NET is defined in the kernel configuration. This patch
changes the mlxbf_bootctl Kconfig to depend on NET.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309031058.JvwNDBKt-lkp@intel.com/
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/20230905133243.31550-1-davthompson@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
This fix involves 2 changes:
- All event regs have a reset value of 0, which is not a valid
event_number as per the event_list for most blocks and hence seen
as an error. Add a "disable" event with event_number 0 for all blocks.
- The enable bit for each counter need not be checked before
reading the event info, so that check is removed.
Fixes: 1a218d312e ("platform/mellanox: mlxbf-pmc: Add Mellanox BlueField PMC driver")
Signed-off-by: Shravan Kumar Ramani <shravankr@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/04d0213932d32681de1c716b54320ed894e52425.1693917738.git.shravankr@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>