- swiotlb area sizing fixes (Petr Tesarik)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmSq57sLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYOh0xAAkwklIxxzXvNlgjvy2hdgPWImPS8tGPDSIsqA9TDD
WDZq89nz/ndnchdPObDvyJXmfBgqa0qCHqopBVPqMKv5a1pKZhrRXYlbajGQQwji
MIqefTLZ/VGw7bDpEivOt+yadwQ3KxuxWWs7/JPKLLReSJ22H8P+jkrK7P7kBL5X
YaMtZG9d86fvFHnQHKdAOlF1iCvnoZDHPcvaZbI6m5mMSZ+HIYqK5pP1MTUEAbIU
MX4ZSI7/mL0q67+kZuM/NCSeq1pH0Cd0D2DGm+8k/y87G81GS6E5Wgg+xO7vYiXf
5ChuwlAO9K9KhH7NIRkKhkad/Ii89ENXSyv5gmPRoKYK5FXajnHSlJTUrZojV6XC
Pbsd9ATVzV0rY61EPyh6G1a+Ttp/pwMp+W0I2fi032GVAePQ/vhB9x9O+2O/3QiC
v80nUSatkciZncWqkguhp3NONsRmLKep3CCQnEAA/gLs27B0avdQeslnqbOOUQKd
Si+djIoel8ONjQ+mW8eFCsVYMH1xFSo0aGsgGe0y2cyBE3DN1AW9eRnOXWT4C1PR
UyDlx8ACs87ojec+YRQFYk2/PbsU7CQiH1pteXvBHcbhiVUAvrtXtg6ANQ+7066P
IIduAZmlHcHb1BhyrSQbAtRllVLIp/l9IAkCSY9SvL0tjh/B5CaRBD5m2Taow5I/
KUI=
=4Lfc
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- swiotlb area sizing fixes (Petr Tesarik)
* tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: reduce the number of areas to match actual memory pool size
swiotlb: always set the number of areas before allocating the pool
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().
We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable. For example, the anon_vma_fork() expects the parents
vma->anon_vma to not change during the vma copy.
A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.
Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults. But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.
This fix can potentially regress some fork-heavy workloads. Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea0 ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix bad git merge of #endif in arm64 code
A merge of the arm64 tree caused #endif to go into the wrong place
- Fix crash on lseek of write access to tracefs/error_log
Opening error_log as write only, and then doing an lseek() causes
a kernel panic, because the lseek() handle expects a "seq_file"
to exist (which is not done on write only opens). Use tracing_lseek()
that tests for this instead of calling the default seq lseek handler.
- Check for negative instead of -E2BIG for error on strscpy() returns
Instead of testing for -E2BIG from strscpy(), to be more robust,
check for less than zero, which will make sure it catches any error
that strscpy() may someday return.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZKbQUhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qpWaAP9zQ1eLQSfMt0dHH01OBSJvc2mMd4QJ
VZtWZ+xTSvk+4gD/axDzDS7Qisfrrli+1oQSPwVik2SXiz0SPJqJ25m9zw4=
=xMlg
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix bad git merge of #endif in arm64 code
A merge of the arm64 tree caused #endif to go into the wrong place
- Fix crash on lseek of write access to tracefs/error_log
Opening error_log as write only, and then doing an lseek() causes a
kernel panic, because the lseek() handle expects a "seq_file" to
exist (which is not done on write only opens). Use tracing_lseek()
that tests for this instead of calling the default seq lseek handler.
- Check for negative instead of -E2BIG for error on strscpy() returns
Instead of testing for -E2BIG from strscpy(), to be more robust,
check for less than zero, which will make sure it catches any error
that strscpy() may someday return.
* tag 'trace-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/boot: Test strscpy() against less than zero for error
arm64: ftrace: fix build error with CONFIG_FUNCTION_GRAPH_TRACER=n
tracing: Fix null pointer dereference in tracing_err_log_open()
These are cleanups for architecture specific header files:
- the comments in include/linux/syscalls.h have gone out of sync
and are really pointless, so these get removed
- The asm/bitsperlong.h header no longer needs to be architecture
specific on modern compilers, so use a generic version for newer
architectures that use new enough userspace compilers
- A cleanup for virt_to_pfn/virt_to_bus to have proper type
checking, forcing the use of pointers
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmSl138ACgkQYKtH/8kJ
UieqWxAA2WjNVfyuieYckglOVE0PZPs2fzCwyzTY5iUTH3gE5cBFWJDWcg2EnouG
v3X3htEQcowYWaCF9+rypQXaGiSx4WXi2Bjxnz3D/BcreqWPI4eSQ0fpGG5SURTY
2zYF72GTt4JGR++l+7/R9MZwPbwYDT9BsD5tkel8PxnyVLM6/c5xFvbjzRSKFE8x
SMN1jGZ62ITLNf/8coAOEPNxBYtDT6yQyu7P2sx5cd65LAQq9yLKjFklnBBovgWT
OoCIZAdGkhcNwOh1LjyHcdNdpfNJGceKyqKPqty07IhCQuF2jxiyFYFzuBbeyQfE
S0itN8o/MIfUmxaQl3e8dPAVb1RlNVr1zfQ6y4tUtWNdkNL2WwSnSQSRHrBfHxCQ
QCF++PMeFcLhGwMYtqdNJ7XGLQ0PsjD74pRf0vo+vjmqDk2BJsJBP57VU+8MJn5r
SoxqnJ0WxLvm1TfrNKusV7zMNWquc2duJDW40zsOssP4itjYELSI6qa56qmzlqmX
zKmRx6mxAlx9RRK8FHXFYHbz3p93vv8z9vTOZV3AjIjjED960CLknUAwCC8FoJyz
9b5wyMXsLQHQjGt8luAvPc6OiU0EiU9a4SPK+feWcv27serFvnjJlRTS/yG2Z3zd
BYsUgsXHypsdoud+aE7MeCy7fE8n3mhoyMQQRBkOMFJ7RsG6wAE=
=S/he
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"These are cleanups for architecture specific header files:
- the comments in include/linux/syscalls.h have gone out of sync and
are really pointless, so these get removed
- The asm/bitsperlong.h header no longer needs to be architecture
specific on modern compilers, so use a generic version for newer
architectures that use new enough userspace compilers
- A cleanup for virt_to_pfn/virt_to_bus to have proper type checking,
forcing the use of pointers"
* tag 'asm-generic-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
syscalls: Remove file path comments from headers
tools arch: Remove uapi bitsperlong.h of hexagon and microblaze
asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch
m68k/mm: Make pfn accessors static inlines
arm64: memory: Make virt_to_pfn() a static inline
ARM: mm: Make virt_to_pfn() a static inline
asm-generic/page.h: Make pfn accessors static inlines
xen/netback: Pass (void *) to virt_to_page()
netfs: Pass a pointer to virt_to_page()
cifs: Pass a pointer to virt_to_page() in cifsglob
cifs: Pass a pointer to virt_to_page()
riscv: mm: init: Pass a pointer to virt_to_page()
ARC: init: Pass a pointer to virt_to_pfn() in init
m68k: Pass a pointer to virt_to_pfn() virt_to_page()
fs/proc/kcore.c: Pass a pointer to virt_addr_valid()
Current release - regressions:
- nvme-tcp: fix comma-related oops after sendpage changes
Current release - new code bugs:
- ptp: make max_phase_adjustment sysfs device attribute invisible
when not supported
Previous releases - regressions:
- sctp: fix potential deadlock on &net->sctp.addr_wq_lock
- mptcp:
- ensure subflow is unhashed before cleaning the backlog
- do not rely on implicit state check in mptcp_listen()
Previous releases - always broken:
- net: fix net_dev_start_xmit trace event vs skb_transport_offset()
- Bluetooth:
- fix use-bdaddr-property quirk
- L2CAP: fix multiple UaFs
- ISO: use hci_sync for setting CIG parameters
- hci_event: fix Set CIG Parameters error status handling
- hci_event: fix parsing of CIS Established Event
- MGMT: fix marking SCAN_RSP as not connectable
- wireguard: queuing: use saner cpu selection wrapping
- sched: act_ipt: various bug fixes for iptables <> TC interactions
- sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX
- dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging
- eth: sfc: fix null-deref in devlink port without MAE access
- eth: ibmvnic: do not reset dql stats on NON_FATAL err
Misc:
- xsk: honor SO_BINDTODEVICE on bind
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSlu+MACgkQMUZtbf5S
Irslgw//S7jf/GL8V6y8VL3te+/OPOZnLDTzFFOdy64/y97FE6XIacJUpyWRhtmz
oSzcSNHETPW9U+xSGa2ZQlKhAXt6n9iRNvUegql+VBb13Iz+l7AdTeoxRv/YuwDo
5lTOIB6cBw+ATd0oxS6wr8SyUlcvktUKBfTAItjbVM55aXfIUpXIa84+F7avJgIA
XP1u/3PHhwItmwo/hXhHH0+P0QA8ix1q2SvRB7DAlQLBsTuQhaKjXWQkYYTKw/Nt
dtvh8iQSs/YXaHMjTa5CK28HOD8+ywIizr+uJ9VaNqIzV0W5JE9IE8P4NFpBcY7t
kGjTYODOph7dkNmZ5RLj3N+B6CyC57OXDzoo/tr8940UytCLVj9EVyduarLGLx57
edqK9cUz5kWejyGoyZ4Pvlo/SKvCQ2HKMeiAJ0/nNpTJMFuygMoqGsaD6ttzkXMj
fZLPjRUK3axd+15ZzhLEf8HyL5Qh+qPqqX9p7NljfMKwhxMWJ5fuICJfdGOSdMJR
ndL+wPfRPFQwszZ4pbTY2Ivn29mo8ScBOSOEgQs2mOny+zFzTzmqNWz/jcFfQnjS
cylxBEHrgudT2FuCImZ/v66TM5yakHXqIdpTGG+zsvJWQqjM96Z3I7WRvi0g9d75
n84il+j34mnzl90j2xEutqUiK7BQ9ZpZBsutPVTKBIHKWWiortI=
=9yzk
-----END PGP SIGNATURE-----
Merge tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf and wireguard.
Current release - regressions:
- nvme-tcp: fix comma-related oops after sendpage changes
Current release - new code bugs:
- ptp: make max_phase_adjustment sysfs device attribute invisible
when not supported
Previous releases - regressions:
- sctp: fix potential deadlock on &net->sctp.addr_wq_lock
- mptcp:
- ensure subflow is unhashed before cleaning the backlog
- do not rely on implicit state check in mptcp_listen()
Previous releases - always broken:
- net: fix net_dev_start_xmit trace event vs skb_transport_offset()
- Bluetooth:
- fix use-bdaddr-property quirk
- L2CAP: fix multiple UaFs
- ISO: use hci_sync for setting CIG parameters
- hci_event: fix Set CIG Parameters error status handling
- hci_event: fix parsing of CIS Established Event
- MGMT: fix marking SCAN_RSP as not connectable
- wireguard: queuing: use saner cpu selection wrapping
- sched: act_ipt: various bug fixes for iptables <> TC interactions
- sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX
- dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging
- eth: sfc: fix null-deref in devlink port without MAE access
- eth: ibmvnic: do not reset dql stats on NON_FATAL err
Misc:
- xsk: honor SO_BINDTODEVICE on bind"
* tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
nfp: clean mc addresses in application firmware when closing port
selftests: mptcp: pm_nl_ctl: fix 32-bit support
selftests: mptcp: depend on SYN_COOKIES
selftests: mptcp: userspace_pm: report errors with 'remove' tests
selftests: mptcp: userspace_pm: use correct server port
selftests: mptcp: sockopt: return error if wrong mark
selftests: mptcp: sockopt: use 'iptables-legacy' if available
selftests: mptcp: connect: fail if nft supposed to work
mptcp: do not rely on implicit state check in mptcp_listen()
mptcp: ensure subflow is unhashed before cleaning the backlog
s390/qeth: Fix vipa deletion
octeontx-af: fix hardware timestamp configuration
net: dsa: sja1105: always enable the send_meta options
net: dsa: tag_sja1105: fix MAC DA patching from meta frames
net: Replace strlcpy with strscpy
pptp: Fix fib lookup calls.
mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check
net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
xsk: Honor SO_BINDTODEVICE on bind
ptp: Make max_phase_adjustment sysfs device attribute invisible when not supported
...
Instead of checking for -E2BIG, it is better to just check for less than
zero of strscpy() for error. Testing for -E2BIG is not very robust, and
the calling code does not really care about the error code, just that
there was an error.
One of the updates to convert strlcpy() to strscpy() had a v2 version
that changed the test from testing against -E2BIG to less than zero, but I
took the v1 version that still tested for -E2BIG.
Link: https://lore.kernel.org/linux-trace-kernel/20230615180420.400769-1-azeemshaikh38@gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230704100807.707d1605@rorschach.local.home
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Azeem Shaikh <azeemshaikh38@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Fix an issue in function 'tracing_err_log_open'.
The function doesn't call 'seq_open' if the file is opened only with
write permissions, which results in 'file->private_data' being left as null.
If we then use 'lseek' on that opened file, 'seq_lseek' dereferences
'file->private_data' in 'mutex_lock(&m->lock)', resulting in a kernel panic.
Writing to this node requires root privileges, therefore this bug
has very little security impact.
Tracefs node: /sys/kernel/tracing/error_log
Example Kernel panic:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
Call trace:
mutex_lock+0x30/0x110
seq_lseek+0x34/0xb8
__arm64_sys_lseek+0x6c/0xb8
invoke_syscall+0x58/0x13c
el0_svc_common+0xc4/0x10c
do_el0_svc+0x24/0x98
el0_svc+0x24/0x88
el0t_64_sync_handler+0x84/0xe4
el0t_64_sync+0x1b4/0x1b8
Code: d503201f aa0803e0 aa1f03e1 aa0103e9 (c8e97d02)
---[ end trace 561d1b49c12cf8a5 ]---
Kernel panic - not syncing: Oops: Fatal exception
Link: https://lore.kernel.org/linux-trace-kernel/20230703155237eucms1p4dfb6a19caa14c79eb6c823d127b39024@eucms1p4
Link: https://lore.kernel.org/linux-trace-kernel/20230704102706eucms1p30d7ecdcc287f46ad67679fc8491b2e0f@eucms1p3
Cc: stable@vger.kernel.org
Fixes: 8a062902be ("tracing: Add tracing error log")
Signed-off-by: Mateusz Stachyra <m.stachyra@samsung.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Vegard Nossum pointed out two different problems with the error handling
in init_module_from_file():
(a) the idempotent loading code didn't clean up properly in some error
cases, leaving the on-stack 'struct idempotent' element still in
the hash table
(b) failure to read the module file would nonsensically update the
'invalid_kread_bytes' stat counter with the error value
The first error is quite nasty, in that it can then cause subsequent
idempotent loads of that same file to access stale stack contents of the
previous failure. The case may not happen in any normal situation
(explaining all the "Tested-by's on the original change), and requires
admin privileges, but syzkaller triggers random bad behavior as a
result:
BUG: soft lockup in sys_finit_module
BUG: unable to handle kernel paging request in init_module_from_file
general protection fault in init_module_from_file
INFO: task hung in init_module_from_file
KASAN: out-of-bounds Read in init_module_from_file
KASAN: slab-out-of-bounds Read in init_module_from_file
...
The second error is fairly benign and just leads to nonsensical stats
(and has been around since the debug stats were added).
Vegard also provided a patch for the idempotent loading issue, but I'd
rather re-organize the code and make it more legible using another level
of helper functions than add the usual "goto out" error handling.
Link: https://lore.kernel.org/lkml/20230704100852.23452-1-vegard.nossum@oracle.com/
Fixes: 9b9879fc03 ("modules: catch concurrent module loads, treat them as idempotent")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reported-by: syzbot+9c2bdc9d24e4a7abe741@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fairly small changes this cycle.
* An additional static inline function when kgdb is not enabled to
reduce boilerplate in arch files.
* kdb will now handle input with linefeeds more like carriage return.
This will make little difference for interactive use but can make
it script to use expect-like interaction with kdb.
* A couple of warning fixes.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAmSiiJwACgkQfOMlXTn3
iKFAyQ//U+/08u01ikY0o8h//l7ELX4L7SUEJim1xE9Qk6DVF6R5c8+G6E0kWS5M
ClKBf8J3YdjM2UvysNmVmh3/Sv1JP2GLT7YUoKNBYv+O8kveGy2N/vE6cYfoHK3n
1uOXtIxQ3IafjlgNF+FTTIYm2PDqDhYZTTYIKGlEDnwL3mjEP3iV+hNP2debcLvT
3AWeLcrvzni5KQ301kXqw4SmrgNAeLDKwKKQvDnho8EdAuMDmHTbmUt6Zgvvl6TN
rH7nlxcUoDW+M1nMNxQ4Ko7xgEbqhpNQnkOIDv/j4j+KObQqvAVOF4dj9thBZUje
IEPQD0O/rAeYDpnKTIx1bmaVkYIBIpiSokRKlHWpz3iHxU52M+cOXe6u2vMHj97t
dHTc42TwcBQHSxCQ+R9i1pv6G7i2BfwUSju7opVqKISolwphp9Z4oIZnHwLtWpba
P5Tv7Azv6nw1HoYl2tAP/2xNO2u7w1nfqUzfr+/XXFTcZGRt65084HDbRQSzjmiJ
6F5Qw7fwOTNUHEObybjL/A4AilIRK8sBl/vU3TXJUEf3CEryH3VK4lE7qlqKn75G
nPplXMiBePbSTFE4aIJDdQFA4l3AOffg4w2Ce8v06eytr9EaNh8vS9W8Pv9dxwAz
ZQrItOPO4i+1Sm9n/CvsPkUDVt3wYbb/QnPdQHsbGOoDN9H3JeI=
=wAuD
-----END PGP SIGNATURE-----
Merge tag 'kgdb-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
"Fairly small changes this cycle:
- An additional static inline function when kgdb is not enabled to
reduce boilerplate in arch files
- kdb will now handle input with linefeeds more like carriage return.
This will make little difference for interactive use but can make
it script to use expect-like interaction with kdb
- A couple of warning fixes"
* tag 'kgdb-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kdb: move kdb_send_sig() declaration to a better header file
kdb: Handle LF in the command parser
kdb: include kdb_private.h for function prototypes
kgdb: Provide a stub kgdb_nmicallback() if !CONFIG_KGDB
__register_btf_kfunc_id_set() assumes .BTF to be part of the module's .ko
file if CONFIG_DEBUG_INFO_BTF is enabled. If that's not the case, the
function prints an error message and return an error. As a result, such
modules cannot be loaded.
However, the section could be stripped out during a build process. It would
be better to let the modules loaded, because their basic functionalities
have no problem [0], though the BTF functionalities will not be supported.
Make the function to lower the level of the message from error to warn, and
return no error.
[0] https://lore.kernel.org/bpf/20220219082037.ow2kbq5brktf4f2u@apollo.legion
Fixes: c446fdacb1 ("bpf: fix register_btf_kfunc_id_set for !CONFIG_DEBUG_INFO_BTF")
Reported-by: Alexander Egorenkov <Alexander.Egorenkov@ibm.com>
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/87y228q66f.fsf@oc8242746057.ibm.com
Link: https://lore.kernel.org/bpf/20220219082037.ow2kbq5brktf4f2u@apollo.legion
Link: https://lore.kernel.org/bpf/20230701171447.56464-1-sj@kernel.org
- Remove the deprecated rule to build *.dtbo from *.dts
- Refactor section mismatch detection in modpost
- Fix bogus ARM section mismatch detections
- Fix error of 'make gtags' with O= option
- Add Clang's target triple to KBUILD_CPPFLAGS to fix a build error with
the latest LLVM version
- Rebuild the built-in initrd when KBUILD_BUILD_TIMESTAMP is changed
- Ignore more compiler-generated symbols for kallsyms
- Fix 'make local*config' to handle the ${CONFIG_FOO} form in Makefiles
- Enable more kernel-doc warnings with W=2
- Refactor <linux/export.h> by generating KSYMTAB data by modpost
- Deprecate <asm/export.h> and <asm-generic/export.h>
- Remove the EXPORT_DATA_SYMBOL macro
- Move the check for static EXPORT_SYMBOL back to modpost, which makes
the build faster
- Re-implement CONFIG_TRIM_UNUSED_KSYMS with one-pass algorithm
- Warn missing MODULE_DESCRIPTION when building modules with W=1
- Make 'make clean' robust against too long argument error
- Exclude more objects from GCOV to fix CFI failures with GCOV
- Allow 'make modules_install' to install modules.builtin and
modules.builtin.modinfo even when CONFIG_MODULES is disabled
- Include modules.builtin and modules.builtin.modinfo in the linux-image
Debian package even when CONFIG_MODULES is disabled
- Revive "Entering directory" logging for the latest Make version
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmSf6B0VHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGS2wP/1izzNJ/64XmQoyBDhZCbuOl7ODF
n4wgVJnsJmRnD/RxXR/AZ0JZwQHhzpGISWQM61rVIf/RVFOB7Apx1HpmomKUUjrL
Yc53wLfhTEizGgwttP6tusLM3RO6jkuMKhjC4rllc0tDLJ3zCcwAjSyiOQQ9PBcH
txwAb8r4/TZUzDDCJ0d98WdhIsNDca/ISeRXKHMiIkfvHe+6yizDKu25Y4B6BL5g
0VPJ9nVJZ+XVwRqdVR+UQoPYGZzZ/O2NqAtU7n4PpBKvFfLACILJW+aBDAz9SqN7
RSxn1ahxwq0vrhlB9bSrQRj3N0g8zsi7/xShEZSnGLCbyxYilr5Gq8C59+QxOIJf
5lGBwZlEgn5aWH+D9abwjEI/QOQbTI9kX09sVzweulGCN9iJlJqyIGsB0Ri0/S2R
c/n7c8nLwnWnGF/+LXYvkrak8L9YRKori//YYf9zdvh4h1c2/0SS0nDoC29DhDru
Am7YmhBAkJXXX3NUB2gLvtdp94GSumqefHeSJ5Sp9v/+f2Ft7ruY2ouJC81xDa4p
nNpvolAq2txlZ9t5OU7x7DQiuCWYSws0W7PJ9FBhyHJchf21UHbcm97/HfDoU8rN
ioLQGm+h+g6oZt8pArk45wccjkR3ydpEFDWenYbTEr2o3zLfeKigZps5uhCK3DW2
gnVk50VNagkzrzvA
=Rc1z
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Remove the deprecated rule to build *.dtbo from *.dts
- Refactor section mismatch detection in modpost
- Fix bogus ARM section mismatch detections
- Fix error of 'make gtags' with O= option
- Add Clang's target triple to KBUILD_CPPFLAGS to fix a build error
with the latest LLVM version
- Rebuild the built-in initrd when KBUILD_BUILD_TIMESTAMP is changed
- Ignore more compiler-generated symbols for kallsyms
- Fix 'make local*config' to handle the ${CONFIG_FOO} form in Makefiles
- Enable more kernel-doc warnings with W=2
- Refactor <linux/export.h> by generating KSYMTAB data by modpost
- Deprecate <asm/export.h> and <asm-generic/export.h>
- Remove the EXPORT_DATA_SYMBOL macro
- Move the check for static EXPORT_SYMBOL back to modpost, which makes
the build faster
- Re-implement CONFIG_TRIM_UNUSED_KSYMS with one-pass algorithm
- Warn missing MODULE_DESCRIPTION when building modules with W=1
- Make 'make clean' robust against too long argument error
- Exclude more objects from GCOV to fix CFI failures with GCOV
- Allow 'make modules_install' to install modules.builtin and
modules.builtin.modinfo even when CONFIG_MODULES is disabled
- Include modules.builtin and modules.builtin.modinfo in the
linux-image Debian package even when CONFIG_MODULES is disabled
- Revive "Entering directory" logging for the latest Make version
* tag 'kbuild-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (72 commits)
modpost: define more R_ARM_* for old distributions
kbuild: revive "Entering directory" for Make >= 4.4.1
kbuild: set correct abs_srctree and abs_objtree for package builds
scripts/mksysmap: Ignore prefixed KCFI symbols
kbuild: deb-pkg: remove the CONFIG_MODULES check in buildeb
kbuild: builddeb: always make modules_install, to install modules.builtin*
modpost: continue even with unknown relocation type
modpost: factor out Elf_Sym pointer calculation to section_rel()
modpost: factor out inst location calculation to section_rel()
kbuild: Disable GCOV for *.mod.o
kbuild: Fix CFI failures with GCOV
kbuild: make clean rule robust against too long argument error
script: modpost: emit a warning when the description is missing
kbuild: make modules_install copy modules.builtin(.modinfo)
linux/export.h: rename 'sec' argument to 'license'
modpost: show offset from symbol for section mismatch warnings
modpost: merge two similar section mismatch warnings
kbuild: implement CONFIG_TRIM_UNUSED_KSYMS without recursion
modpost: use null string instead of NULL pointer for default namespace
modpost: squash sym_update_namespace() into sym_add_exported()
...
- Add infrastructure for supporting background commands along with
support for device sanitization and firmware update
- Introduce a CXL performance monitoring unit driver based on the common
definition in the specification.
- Land some preparatory cleanup and refactoring for the anticipated
arrival of CXL type-2 (accelerator devices) and CXL RCH (CXL-v1.1
topology) error handling.
- Rework CPU cache management with respect to region configuration
(device hotplug or other dynamic changes to memory interleaving)
- Fix region reconfiguration vs CXL decoder ordering rules.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZJ9fkQAKCRDfioYZHlFs
ZyWcAP9THJ6ZzX1mbAfHhPz9r+oxsrE3l1jQpNjNbh7MNW29MAEA36dmTE62JaHK
lTPDgHxqBt1vrHPktYWOM9ZPHE2tLwA=
=3fFL
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dan Williams:
"The highlights in terms of new functionality are support for the
standard CXL Performance Monitor definition that appeared in CXL 3.0,
support for device sanitization (wiping all data from a device),
secure-erase (re-keying encryption of user data), and support for
firmware update. The firmware update support is notable as it reuses
the simple sysfs_upload interface to just cat(1) a blob to a sysfs
file and pipe that to the device.
Additionally there are a substantial number of cleanups and
reorganizations to get ready for RCH error handling (RCH == Restricted
CXL Host == current shipping hardware generation / pre CXL-2.0
topologies) and type-2 (accelerator / vendor specific) devices.
For vendor specific devices they implement a subset of what the
generic type-3 (generic memory expander) driver expects. As a result
the rework decouples optional infrastructure from the core driver
context.
For RCH topologies, where the specification working group did not want
to confuse pre-CXL-aware operating systems, many of the standard
registers are hidden which makes support standard bus features like
AER (PCIe Advanced Error Reporting) difficult. The rework arranges for
the driver to help the PCI-AER core. Bjorn is on board with this
direction but a late regression disocvery means the completion of this
functionality needs to cook a bit longer, so it is code
reorganizations only for now.
Summary:
- Add infrastructure for supporting background commands along with
support for device sanitization and firmware update
- Introduce a CXL performance monitoring unit driver based on the
common definition in the specification.
- Land some preparatory cleanup and refactoring for the anticipated
arrival of CXL type-2 (accelerator devices) and CXL RCH (CXL-v1.1
topology) error handling.
- Rework CPU cache management with respect to region configuration
(device hotplug or other dynamic changes to memory interleaving)
- Fix region reconfiguration vs CXL decoder ordering rules"
* tag 'cxl-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (51 commits)
cxl: Fix one kernel-doc comment
cxl/pci: Use correct flag for sanitize polling
docs: perf: Minimal introduction the the CXL PMU device and driver
perf: CXL Performance Monitoring Unit driver
tools/testing/cxl: add firmware update emulation to CXL memdevs
tools/testing/cxl: Use named effects for the Command Effect Log
tools/testing/cxl: Fix command effects for inject/clear poison
cxl: add a firmware update mechanism using the sysfs firmware loader
cxl/test: Add Secure Erase opcode support
cxl/mem: Support Secure Erase
cxl/test: Add Sanitize opcode support
cxl/mem: Wire up Sanitization support
cxl/mbox: Add sanitization handling machinery
cxl/mem: Introduce security state sysfs file
cxl/mbox: Allow for IRQ_NONE case in the isr
Revert "cxl/port: Enable the HDM decoder capability for switch ports"
cxl/memdev: Formalize endpoint port linkage
cxl/pci: Unconditionally unmask 256B Flit errors
cxl/region: Manage decoder target_type at decoder-attach time
cxl/hdm: Default CXL_DEVTYPE_DEVMEM decoders to CXL_DECODER_DEVMEM
...
Before commit d67790ddf0 ("overflow: Add struct_size_t() helper") only
struct_size() existed, which expects a valid pointer instance containing
the flexible array.
However, when we determine the default struct pid allocation size for
the associated kmem cache of a pid namespace we need to take the nesting
depth of the pid namespace into account without an variable instance
necessarily being available.
In commit b69f0aeb06 ("pid: Replace struct pid 1-element array with
flex-array") we used to handle this the old fashioned way and cast NULL
to a struct pid pointer type. However, we do apparently have a dedicated
struct_size_t() helper for exactly this case. So switch to that.
Suggested-by: Kees Cook <keescook@chromium.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmSeigUACgkQUqAMR0iA
lPIMbxAAqnbFKTDZ0alQibPlPulEvDsdVnh0FYLpWRW6sZDG96Lr/s3i1ghWxIUI
cJrePc7i8FJBz6cDMvT6e9TOqoP3iYU9AuQdWr1aoN37O3gJO/3AvHN89CFy6tE3
scV/Uo2sM5RCQWptutpd2nu1pFLKYWAKegwX0g0+KkyDINDzDA54iSd6HAwfDUeu
DEO2cp+kUgBKDOcGWEHQHQX5DUs1P7lJDBt5aTBq2KzAYSaUls/HBt1L2T9JYV/D
yHE3vjmvniug3sYDzXd5aXA+yvIjIsdjqqxMpN7uz8e/q+Mx3045WM1aPCevPq9x
V7jkc61CQcDURjUdHaZw8jMCX7kDjNkTzkt0OvATqqoL4wqfcohVExI5KUtf6mOq
NAhsaPL4E+pGOa71uXenhbLClYjfpLfpHRxNHT8kh7O+DTMprt7srp0174vnrZTQ
Gsfh2BfCoojZxt1SPJWDl1rBaAtZ7EZTmt0oV1opbTgZS1drDKrlbmHQs44oBDeg
Uqy0uAB6kI92BzPbS66ZwHup3Tg2uz98YwCTMQ1dTuLSFzQNOju6h38BKRzMKkkz
nWF/AQA89TKOgaOZG3HGVadru7zk1P5EvnOD2+iMI3Arw9ClHMu9wAMoNxNcuYTO
i+h/hrtenmeH3owA+HVsR/lMnJhnMlxRRfan0pIzUfICIiYlZaw=
=macg
-----END PGP SIGNATURE-----
Merge tag 'livepatching-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching update from Petr Mladek:
- Make a variable static to fix a sparse warning
* tag 'livepatching-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
livepatch: Make 'klp_stack_entries' static
- fprobe: Pass return address to the fprobe entry/exit callbacks so that
the callbacks don't need to analyze pt_regs/stack to find the function
return address.
- kprobe events: cleanup usage of TPARG_FL_FENTRY and TPARG_FL_RETURN
flags so that those are not set at once.
- fprobe events:
. Add a new fprobe events for tracing arbitrary function entry and
exit as a trace event.
. Add a new tracepoint events for tracing raw tracepoint as a trace
event. This allows user to trace non user-exposed tracepoints.
. Move eprobe's event parser code into probe event common file.
. Introduce BTF (BPF type format) support to kernel probe (kprobe,
fprobe and tracepoint probe) events so that user can specify traced
function arguments by name. This also applies the type of argument
when fetching the argument.
. Introduce '$arg*' wildcard support if BTF is available. This expands
the '$arg*' meta argument to all function argument automatically.
. Check the return value types by BTF. If the function returns 'void',
'$retval' is rejected.
. Add some selftest script for fprobe events, tracepoint events and
BTF support.
. Update documentation about the fprobe events.
. Some fixes for above features, document and selftests.
- selftests for ftrace (except for new fprobe events):
. Add a test case for multiple consecutive probes in a function which
checks if ftrace based kprobe, optimized kprobe and normal kprobe
can be defined in the same target function.
. Add a test case for optimized probe, which checks whether kprobe
can be optimized or not.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmSa+9MACgkQ2/sHvwUr
PxsmOAgAmUOIWtvH5py7AZpIRhCj8B18F6KnT7w2hByCsRxf7SaCqMhpBCk9VnYv
9fJFBHpvYRJEmpHoH3o2ET5AGfKVNac9z96AGI2qJ4ECWITd6I5+WfTdZ5ueVn2d
f6DQ10mHXDHSMFbuqfYWSHtkeivJpWpUNHhwzPb4doNOe06bZNfVuSgnksFg1at5
kq16HbvGnhPzdO4YHmvqwjmRHr5/nCI1KDE9xIBcqNtWFbiRigC11zaZEUkLX+vT
F63ShyfCK718AiwDfnjXpGkXAiVOZuAIR8RELaSqQ92YHCFKq5k9K4++WllPR5f9
AxjVultFDiCd4oSPgYpQkjuZdFq9NA==
=IhmY
-----END PGP SIGNATURE-----
Merge tag 'probes-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
- fprobe: Pass return address to the fprobe entry/exit callbacks so
that the callbacks don't need to analyze pt_regs/stack to find the
function return address.
- kprobe events: cleanup usage of TPARG_FL_FENTRY and TPARG_FL_RETURN
flags so that those are not set at once.
- fprobe events:
- Add a new fprobe events for tracing arbitrary function entry and
exit as a trace event.
- Add a new tracepoint events for tracing raw tracepoint as a
trace event. This allows user to trace non user-exposed
tracepoints.
- Move eprobe's event parser code into probe event common file.
- Introduce BTF (BPF type format) support to kernel probe (kprobe,
fprobe and tracepoint probe) events so that user can specify
traced function arguments by name. This also applies the type of
argument when fetching the argument.
- Introduce '$arg*' wildcard support if BTF is available. This
expands the '$arg*' meta argument to all function argument
automatically.
- Check the return value types by BTF. If the function returns
'void', '$retval' is rejected.
- Add some selftest script for fprobe events, tracepoint events
and BTF support.
- Update documentation about the fprobe events.
- Some fixes for above features, document and selftests.
- selftests for ftrace (in addition to the new fprobe events):
- Add a test case for multiple consecutive probes in a function
which checks if ftrace based kprobe, optimized kprobe and normal
kprobe can be defined in the same target function.
- Add a test case for optimized probe, which checks whether kprobe
can be optimized or not.
* tag 'probes-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Fix tracepoint event with $arg* to fetch correct argument
Documentation: Fix typo of reference file name
tracing/probes: Fix to return NULL and keep using current argc
selftests/ftrace: Add new test case which checks for optimized probes
selftests/ftrace: Add new test case which adds multiple consecutive probes in a function
Documentation: tracing/probes: Add fprobe event tracing document
selftests/ftrace: Add BTF arguments test cases
selftests/ftrace: Add tracepoint probe test case
tracing/probes: Add BTF retval type support
tracing/probes: Add $arg* meta argument for all function args
tracing/probes: Support function parameters if BTF is available
tracing/probes: Move event parameter fetching code to common parser
tracing/probes: Add tracepoint support on fprobe_events
selftests/ftrace: Add fprobe related testcases
tracing/probes: Add fprobe events for tracing function entry and exit.
tracing/probes: Avoid setting TPARG_FL_FENTRY and TPARG_FL_RETURN
fprobe: Pass return address to the handlers
- Add new feature to have function graph tracer record the return value.
Adds a new option: funcgraph-retval ; when set, will show the return
value of a function in the function graph tracer.
- Also add the option: funcgraph-retval-hex where if it is not set, and
the return value is an error code, then it will return the decimal of
the error code, otherwise it still reports the hex value.
- Add the file /sys/kernel/tracing/osnoise/per_cpu/cpu<cpu>/timerlat_fd
That when a application opens it, it becomes the task that the timer lat
tracer traces. The application can also read this file to find out how
it's being interrupted.
- Add the file /sys/kernel/tracing/available_filter_functions_addrs
that works just the same as available_filter_functions but also shows
the addresses of the functions like kallsyms, except that it gives the
address of where the fentry/mcount jump/nop is. This is used by BPF to
make it easier to attach BPF programs to ftrace hooks.
- Replace strlcpy with strscpy in the tracing boot code.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZJy6ixQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qnzRAPsEI2YgjaJSHnuPoGRHbrNil6pq66wY
LYaLizGI4Jv9BwEAqdSdcYcMiWo1SFBAO8QxEDM++BX3zrRyVgW8ahaTNgs=
=TF0C
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Add new feature to have function graph tracer record the return
value. Adds a new option: funcgraph-retval ; when set, will show the
return value of a function in the function graph tracer.
- Also add the option: funcgraph-retval-hex where if it is not set, and
the return value is an error code, then it will return the decimal of
the error code, otherwise it still reports the hex value.
- Add the file /sys/kernel/tracing/osnoise/per_cpu/cpu<cpu>/timerlat_fd
That when a application opens it, it becomes the task that the timer
lat tracer traces. The application can also read this file to find
out how it's being interrupted.
- Add the file /sys/kernel/tracing/available_filter_functions_addrs
that works just the same as available_filter_functions but also shows
the addresses of the functions like kallsyms, except that it gives
the address of where the fentry/mcount jump/nop is. This is used by
BPF to make it easier to attach BPF programs to ftrace hooks.
- Replace strlcpy with strscpy in the tracing boot code.
* tag 'trace-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix warnings when building htmldocs for function graph retval
riscv: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
tracing/boot: Replace strlcpy with strscpy
tracing/timerlat: Add user-space interface
tracing/osnoise: Skip running osnoise if all instances are off
tracing/osnoise: Switch from PF_NO_SETAFFINITY to migrate_disable
ftrace: Show all functions with addresses in available_filter_functions_addrs
selftests/ftrace: Add funcgraph-retval test case
LoongArch: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
x86/ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
tracing: Add documentation for funcgraph-retval and funcgraph-retval-hex
function_graph: Support recording and printing the return value of function
fgraph: Add declaration of "struct fgraph_ret_regs"
* Support for ACPI.
* Various cleanups to the ISA string parsing, including making them
case-insensitive
* Support for the vector extension.
* Support for independent irq/softirq stacks.
* Our CPU DT binding now has "unevaluatedProperties: false"
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmSe70ATHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiWNPD/0ZfSdQ0A/gMVOzAD4zFKPEqQ6ffW2V
Zy6Jo7UDNqKsiai7QA4XB1uyYIv/y1yUKJ0oeBVcA9Nzyq+TW9QDcApDBTabxAUI
agY19YKw6VVZ+p7I9sMsf6EbdJdkNfSAzcQACPxb4ScEoaf9X+oAK5qgXuRuWluh
qQuVkkJlgWc/t1cuUkrRdJmHQYvjP3zL7z4o344q2IVpXJkNNu0GeP+HbF8BYKcA
+I/TTA5JY3kCIaxkpF2rU6pE6T5T9xrPmRYZ7bZoPUPnbL+M8As/jx3ym52Y4WGp
kf8pgkxixOjU64kVJOH66CA8GaOiaAH/ptjQb0ZmCaGrHhr7aOT9HrkX4rU1lS8T
stPphfM4gGPcCoPgRqSl+mEhBzjII8maOBLtbricAoQi6efRq8fzoOGaif/QpCbc
6n0LGS4nQPGVyD3rAPfHxxfrlGJR+SsgyDvjZoDhqauFglims14GnK+eBeO8zrui
Aj/uuAS63VIYprJWC1NOBJlU2WKZiOGhCANpZ6W6SH21PYn2WjsVILqaGh+WN8ZO
KOHxZNaN8fQag0Yg7oNAUb7l6S0DHYtJIksFnFW2Rf2+VT58RAMYRQbpbhr7Tqr+
jLgIR8PkFrBERHE49IqLGhAxGDnNzAUysMRw9pIk7WIre2Jt4wPqUdl+ee+5ErIX
jiYfSFZw9q28UA==
=Fpq8
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for ACPI
- Various cleanups to the ISA string parsing, including making them
case-insensitive
- Support for the vector extension
- Support for independent irq/softirq stacks
- Our CPU DT binding now has "unevaluatedProperties: false"
* tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (78 commits)
riscv: hibernate: remove WARN_ON in save_processor_state
dt-bindings: riscv: cpus: switch to unevaluatedProperties: false
dt-bindings: riscv: cpus: add a ref the common cpu schema
riscv: stack: Add config of thread stack size
riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK
riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK
RISC-V: always report presence of extensions formerly part of the base ISA
dt-bindings: riscv: explicitly mention assumption of Zicntr & Zihpm support
RISC-V: remove decrement/increment dance in ISA string parser
RISC-V: rework comments in ISA string parser
RISC-V: validate riscv,isa at boot, not during ISA string parsing
RISC-V: split early & late of_node to hartid mapping
RISC-V: simplify register width check in ISA string parsing
perf: RISC-V: Limit the number of counters returned from SBI
riscv: replace deprecated scall with ecall
riscv: uprobes: Restore thread.bad_cause
riscv: mm: try VMA lock-based page fault handling first
riscv: mm: Pre-allocate PGD entries for vmalloc/modules area
RISC-V: hwprobe: Expose Zba, Zbb, and Zbs
RISC-V: Track ISA extensions per hart
...
- Extend KCSAN support to 32-bit and BookE. Add some KCSAN annotations.
- Make ELFv2 ABI the default for 64-bit big-endian kernel builds, and use
the -mprofile-kernel option (kernel specific ftrace ABI) for big endian
ELFv2 kernels.
- Add initial Dynamic Execution Control Register (DEXCR) support, and allow
the ROP protection instructions to be used on Power 10.
- Various other small features and fixes.
Thanks to: Aditya Gupta, Aneesh Kumar K.V, Benjamin Gray, Brian King,
Christophe Leroy, Colin Ian King, Dmitry Torokhov, Gaurav Batra, Jean Delvare,
Joel Stanley, Marco Elver, Masahiro Yamada, Nageswara R Sastry, Nathan
Chancellor, Naveen N Rao, Nayna Jain, Nicholas Piggin, Paul Gortmaker, Randy
Dunlap, Rob Herring, Rohan McLure, Russell Currey, Sachin Sant, Timothy
Pearson, Tom Rix, Uwe Kleine-König.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmSeqQMTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgKukD/sGUceX6gIc7UcjWhL1ZCVco0bsgLjT
JrY1NenisGKjwwRd/o+2+h3ziJDoO5AsQfT72EiNLhaYJhnlb1d0vXzsvN0THc+2
W5RrxAZUNhBy+c7gSSEJjy8+vBIwSQAliQLChHGOSejGCj94SxF5+zjUFvSX458I
z0+ZQK+Fiw5NcpzEnBT0XPnLzap74a7TL0JcG1MLbj2QtHXhbfjIlkkPDX3kK0Gw
xbelFy38X7KKbQsXXYSTCGqwRdJ3yqu21nEsjRuo2yT5H5rQbjCNggkMOL1DecDd
ULGxit/z13Pt1Ad3oe+6FF17ggOiCG0F75DONASjFDthFYx6NQffkJS1n1VZauQj
jU6LtWeD3HkgIYm6Udjq+LaSmkAmn5a+9tsElE/K+V1WG4rKyMeVmE3z/tCJG0l2
yhLKyFs+glXN/LiWHyX0mrQIIVZdRK237X1qXJuIvAuB7Drm5duXFAHR8pdJD0dg
H23OhoO2FvLxb9GvnzGxqjdazzattctz31wU/1RgnPxumYkJ9PlBcXn9h1uXa8/K
rDZFJADsQhEfRCjmLG3GIaFWqZdc4Cn+ZUk4iHkjPDFDL05Fq7JYHIuwteiN6/wP
NHRvtKdJisu583NI9RN9300JykrEqjSRbMOWlc3vuFwbRLGioXvWhWlIZ3/t58jG
R8s+f0nKSPr+fg==
=ssit
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Extend KCSAN support to 32-bit and BookE. Add some KCSAN annotations
- Make ELFv2 ABI the default for 64-bit big-endian kernel builds, and
use the -mprofile-kernel option (kernel specific ftrace ABI) for big
endian ELFv2 kernels
- Add initial Dynamic Execution Control Register (DEXCR) support, and
allow the ROP protection instructions to be used on Power 10
- Various other small features and fixes
Thanks to Aditya Gupta, Aneesh Kumar K.V, Benjamin Gray, Brian King,
Christophe Leroy, Colin Ian King, Dmitry Torokhov, Gaurav Batra, Jean
Delvare, Joel Stanley, Marco Elver, Masahiro Yamada, Nageswara R Sastry,
Nathan Chancellor, Naveen N Rao, Nayna Jain, Nicholas Piggin, Paul
Gortmaker, Randy Dunlap, Rob Herring, Rohan McLure, Russell Currey,
Sachin Sant, Timothy Pearson, Tom Rix, and Uwe Kleine-König.
* tag 'powerpc-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (76 commits)
powerpc: remove checks for binutils older than 2.25
powerpc: Fail build if using recordmcount with binutils v2.37
powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory
powerpc/iommu: Only build sPAPR access functions on pSeries
powerpc: powernv: Annotate data races in opal events
powerpc: Mark writes registering ipi to host cpu through kvm and polling
powerpc: Annotate accesses to ipi message flags
powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention
powerpc: Mark [h]ssr_valid accesses in check_return_regs_valid
powerpc: qspinlock: Enforce qnode writes prior to publishing to queue
powerpc: qspinlock: Mark accesses to qnode lock checks
powerpc/powernv/pci: Remove last IODA1 defines
powerpc/powernv/pci: Remove MVE code
powerpc/powernv/pci: Remove ioda1 support
powerpc: 52xx: Make immr_id DT match tables static
powerpc: mpc512x: Remove open coded "ranges" parsing
powerpc: fsl_soc: Use of_range_to_resource() for "ranges" parsing
powerpc: fsl: Use of_property_read_reg() to parse "reg"
powerpc: fsl_rio: Use of_range_to_resource() for "ranges" parsing
macintosh: Use of_property_read_reg() to parse "reg"
...
For pid namespaces, struct pid uses a dynamically sized array member,
"numbers". This was implemented using the ancient 1-element fake
flexible array, which has been deprecated for decades.
Replace it with a C99 flexible array, refactor the array size
calculations to use struct_size(), and address elements via indexes.
Note that the static initializer (which defines a single element) works
as-is, and requires no special handling.
Without this, CONFIG_UBSAN_BOUNDS (and potentially
CONFIG_FORTIFY_SOURCE) will trigger bounds checks:
https://lore.kernel.org/lkml/20230517-bushaltestelle-super-e223978c1ba6@brauner
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Reported-by: syzbot+ac3b41786a2d0565b6d5@syzkaller.appspotmail.com
[brauner: dropped unrelated changes and remove 0 with NULL cast]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The main kdb command parser only handles CR (ASCII 13 AKA '\r') today,
but not LF (ASCII 10 AKA '\n'). That means that the kdb command parser
can handle terminals that send just CR or that send CR+LF but can't
handle terminals that send just LF.
The fact that kdb didn't handle LF in the command parser tripped up a
tool I tried to use with it. Specifically, I was trying to send a
command to my device to resume it from kdb using a ChromeOS tool like:
dut-control cpu_uart_cmd:"g"
That tool only terminates lines with LF, not CR+LF.
Arguably the ChromeOS tool should be fixed. After all, officially kdb
seems to be designed such that CR+LF is the official line ending
transmitted over the wire and that internally a line ending is just
'\n' (LF). Some evidence:
* uart_poll_put_char(), which is used by kdb, notices a '\n' and
converts it to '\r\n'.
* kdb functions specifically use '\r' to get a carriage return without
a newline. You can see this in the pager where kdb will write a '\r'
and then write over the pager prompt.
However, all that being said there's no real harm in accepting LF as a
command terminator in the kdb parser and doing so seems like it would
improve compatibility. After this, I'd expect that things would work
OK-ish with a remote terminal that used any of CR, CR+LF, or LF as a
line ending. Someone using CR as a line ending might get some ugliness
where kdb wasn't able to overwrite the last line, but basic commands
would work. Someone using just LF as a line ending would probably also
work OK.
A few other notes:
- It can be noted that "bash" running on an "agetty" handles LF as a
line termination with no complaints.
- Historically, kdb's "pager" actually handled either CR or LF fine. A
very quick inspection would make one think that kdb's pager actually
could have paged down two lines instead of one for anyone using
CR+LF, but this is generally avoided because of kdb_input_flush().
- Conceivably one could argue that some of this special case logic
belongs in uart_poll_get_char() since uart_poll_put_char() handles
the '\n' => '\r\n' conversion. I would argue that perhaps we should
eventually do the opposite and move the '\n' => '\r\n' out of
uart_poll_put_char(). Having that conversion at such a low level
could interfere if we ever want to transfer binary data. In
addition, if we truly made uart_poll_get_char() the inverse of
uart_poll_put_char() it would convert back to '\n' and (ironically)
kdb's parser currently only looks for '\r' to find the end of a
command.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230628125612.1.I5cc6c3d916195f5bcfdf5b75d823f2037707f5dc@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
- swiotlb cleanups (Petr Tesarik)
- use kvmalloc_array (gaoxu)
- a small step towards removing is_swiotlb_active (Christoph Hellwig)
- fix a Kconfig typo Sui Jingfeng)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmScHTELHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYOoGhAAuKdcWAumhEuKXu/vuA4/xH5Kvn2uhrAVs/RSKKh7
Wbh2WXh+9GOQs4dwpYI5/chRmCJgyC2ics/E7dny2ovJumM/RGOR9zff/+zJk5yh
3tpiZ4VW/2x/9qmoRpdQVTaj1uB2Ug8wpztUohyeEYPpjLX0/Vq/zTItAyvQs4kO
vskg2u9EGwzgV4qmHWq2fjowljHS1sMZq2URKGX8JDHrwOh0+uCke626ze/WznSQ
5quWW4mE0IYq/nldIMTvJxUx6273zBMm+yoZSfFY6K0VrP3WDE5endP5oYR7d251
zNRPyWqW78HQT0z/gntaPOKDB0t7YxsWVLwam98Yv8wl1wLJQYe5sRoL7qXL0yBz
cu4uTOGU6Kvw3ZHWQOk2qkGxYAxDLnaFejz8ZDCycWQA7T2+TNdaVXNnl/vE+aRN
25mnT0JtClfZOvHihhHKaiRXMKE0WgPy4vBZBT0SAUVuUR7JwtYDv/5giGf6Bi3x
QwNeXAmFD75VELJoqX1KL/UiJh8FzyWZMI7l7XmDVy9lJsA1P3PO+npI8SOmA+e8
q4GSD36BxDoDE06Hg/BQcRqPohDCEsrRgfvOaa7xld9SCwe1lGaN3kDsyv6HoBzX
NRS3aishdqHf21dNctjX9f+II+30NFkPbAa64B8L4tP5aQLy3bgjNLweLP9fhKgl
KQs=
=yYCS
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-6.5-2023-06-28' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- swiotlb cleanups (Petr Tesarik)
- use kvmalloc_array (gaoxu)
- a small step towards removing is_swiotlb_active (Christoph Hellwig)
- fix a Kconfig typo Sui Jingfeng)
* tag 'dma-mapping-6.5-2023-06-28' of git://git.infradead.org/users/hch/dma-mapping:
drm/nouveau: stop using is_swiotlb_active
swiotlb: use the atomic counter of total used slabs if available
swiotlb: remove unused field "used" from struct io_tlb_mem
dma-remap: use kvmalloc_array/kvfree for larger dma memory remap
dma-mapping: fix a Kconfig typo
Linus, included in this pull request is just a minor fix which
Matthieu Baerts noted I had not picked up. I adjusted the Fixes
commit ID to reflect the latest tree.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmSeA/ISHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinUq4QAIGJi5oRY+tAqzJ1VigkUgwheDeND3KL
ixwn/9RoZ6Sk9hpXbFVdS4pUGueYmyCnbeKF48aTYKMSRV8Vm3E61RevYnpjSXBA
D+zhJ80NS95MacrHM0cK77UD/A8T4Y57e/rUJfL/3RqWMPpJbPciqO/Y+9MZJRFT
RCC8jbPUjdmvbccEMQhALSZb2nOUtHouUWgG4SgyiELjY5djNojINEOQ1xIh9W1G
LWWCM/99NeDZfQtF+36k6F1uc568xrSunELXBKcQBk6PlETt4dtQta/PqEAQ+tHT
qZDnwepw9RCNPxuGv0oialYe3weFB4zC2kr3ZYePs0qLfM4GGqXjDy7bIT5qa59V
9AsAhSEuLu/ZAnYgQNMcJZss/PGrxSDJbKOUEW3pJ00nlj9zwWXpGX4jJsBNKdoe
smalelXUOR3Kr5BZm7cm0125kiQrvnMx8L/5waG81sCKsW+LcHQgTJu8RVD49f5H
k/LCUlxk3jyXO3xlyYvQKzpA70P4vromIhWjBxcK2cAbPJL5N6ux4WX7L+m/niID
cD0/74XJ+es1qAUHkl1on5s7U1DOryY4xJA+liQK/pKFIMyuKaUYlLmgD8lworI4
jqJ4BgtSASc/W81TX7x32CAbKWwjxKDQK3mLBUdVAJPtcr5vFOnmHh2vgZ0/A/yI
eMgoAPB5E+UD
=Tyyn
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl fix from Luis Chamberlain:
"A missed minor fix which Matthieu Baerts noted I had not picked up"
* tag 'sysctl-6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
sysctl: fix unused proc_cap_handler() function warning
The user events write currently returns the size of what was suppose to be
written when tracing is disabled and nothing was written. Instead, behave like
trace_marker and return -EBADF, as that is what is returned if a file is opened
for read only, and a write is performed on it. Writing to the buffer
that is disabled is like trying to write to a file opened for read
only, as the buffer still can be read, but just not written to.
This also includes test cases for this use case
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZJxLzRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qs9eAP0ZRDempFNyMhi+pfENtwv65CHxRX/3
3s1Lmt04oqwXfgD/dCfojrVAd++kpq3p9cGxJYWuNiM4s47mlD8VLgQ7AAY=
=aZ8/
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.4-rc7-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix user event write on buffer disabled.
The user events write currently returns the size of what was supposed
to be written when tracing is disabled and nothing was written.
Instead, behave like trace_marker and return -EBADF, as that is what
is returned if a file is opened for read only, and a write is
performed on it. Writing to the buffer that is disabled is like trying
to write to a file opened for read only, as the buffer still can be
read, but just not written to.
This also includes test cases for this use case"
* tag 'trace-v6.4-rc7-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
selftests/user_events: Add test cases when event is disabled
selftests/user_events: Enable the event before write_fault test in ftrace self-test
tracing/user_events: Fix incorrect return value for writing operation when events are disabled
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmSZtjsACgkQu+CwddJF
iJqCTwf/XVhmAD7zMOj6g1aak5oHNZDRG5jufM5UNXmiWjCWT3w4DpltrJkz0PPm
mg3Ac5fjNUqesZ1SGtUbvoc363smroBrRudGEFrsUhqBcpR+S4fSneoDk+xqMypf
VLXP/8kJlFEBGMiR7ouAWnR4+u6JgY4E8E8JIPNzao5KE/L1lD83nY+Usjc/01ek
oqMyYVFRfncsGjGJXc5fOOTTCj768mRroF0sLmEegIonnwQkSHE7HWJ/nyaVraDV
bomnTIgMdVIDqharin08ZPIM7qBIWM09Uifaf0lIs6fIA94pQP+5Ko3mum2P/S+U
ON/qviSrlNgRXoHPJ3hvPHdfEU9cSg==
=1d0v
-----END PGP SIGNATURE-----
Merge tag 'slab-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- SLAB deprecation:
Following the discussion at LSF/MM 2023 [1] and no objections, the
SLAB allocator is deprecated by renaming the config option (to make
its users notice) to CONFIG_SLAB_DEPRECATED with updated help text.
SLUB should be used instead. Existing defconfigs with CONFIG_SLAB are
also updated.
- SLAB_NO_MERGE kmem_cache flag (Jesper Dangaard Brouer):
There are (very limited) cases where kmem_cache merging is
undesirable, and existing ways to prevent it are hacky. Introduce a
new flag to do that cleanly and convert the existing hacky users.
Btrfs plans to use this for debug kernel builds (that use case is
always fine), networking for performance reasons (that should be very
rare).
- Replace the usage of weak PRNGs (David Keisar Schmidt):
In addition to using stronger RNGs for the security related features,
the code is a bit cleaner.
- Misc code cleanups (SeongJae Parki, Xiongwei Song, Zhen Lei, and
zhaoxinchao)
Link: https://lwn.net/Articles/932201/ [1]
* tag 'slab-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab_common: use SLAB_NO_MERGE instead of negative refcount
mm/slab: break up RCU readers on SLAB_TYPESAFE_BY_RCU example code
mm/slab: add a missing semicolon on SLAB_TYPESAFE_BY_RCU example code
mm/slab_common: reduce an if statement in create_cache()
mm/slab: introduce kmem_cache flag SLAB_NO_MERGE
mm/slab: rename CONFIG_SLAB to CONFIG_SLAB_DEPRECATED
mm/slab: remove HAVE_HARDENED_USERCOPY_ALLOCATOR
mm/slab_common: Replace invocation of weak PRNG
mm/slab: Replace invocation of weak PRNG
slub: Don't read nr_slabs and total_objects directly
slub: Remove slabs_node() function
slub: Remove CONFIG_SMP defined check
slub: Put objects_show() into CONFIG_SLUB_DEBUG enabled block
slub: Correct the error code when slab_kset is NULL
mm/slab: correct return values in comment for _kmem_cache_create()
Since usermodehelper_table() is marked static now, we get a
warning about it being unused when SYSCTL is disabled:
kernel/umh.c:497:12: error: 'proc_cap_handler' defined but not used [-Werror=unused-function]
Just move it inside of the same #ifdef.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Fixes: 861dc0b464 ("sysctl: move umh sysctl registration to its own file")
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
[mcgrof: adjust new commit ID for Fixes tag]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
The kdb_kbd_cleanup_state() is called from another file through
the kdb_private.h file, but that is not included before the
definition, causing a W=1 warning:
kernel/debug/kdb/kdb_keyboard.c:198:6: error: no previous prototype for 'kdb_kbd_cleanup_state' [-Werror=missing-prototypes]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230517124802.929751-1-arnd@kernel.org
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Although the desired size of the SWIOTLB memory pool is increased in
swiotlb_adjust_nareas() to match the number of areas, the actual allocation
may be smaller, which may require reducing the number of areas.
For example, Xen uses swiotlb_init_late(), which in turn uses the page
allocator. On x86, page size is 4 KiB and MAX_ORDER is 10 (1024 pages),
resulting in a maximum memory pool size of 4 MiB. This corresponds to 2048
slots of 2 KiB each. The minimum area size is 128 (IO_TLB_SEGSIZE),
allowing at most 2048 / 128 = 16 areas.
If num_possible_cpus() is greater than the maximum number of areas, areas
are smaller than IO_TLB_SEGSIZE and contiguous groups of free slots will
span multiple areas. When allocating and freeing slots, only one area will
be properly locked, causing race conditions on the unlocked slots and
ultimately data corruption, kernel hangs and crashes.
Fixes: 20347fca71 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The number of areas defaults to the number of possible CPUs. However, the
total number of slots may have to be increased after adjusting the number
of areas. Consequently, the number of areas must be determined before
allocating the memory pool. This is even explained with a comment in
swiotlb_init_remap(), but swiotlb_init_late() adjusts the number of areas
after slots are already allocated. The areas may end up being smaller than
IO_TLB_SEGSIZE, which breaks per-area locking.
While fixing swiotlb_init_late(), move all relevant comments before the
definition of swiotlb_adjust_nareas() and convert them to kernel-doc.
Fixes: 20347fca71 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Core
----
- Rework the sendpage & splice implementations. Instead of feeding
data into sockets page by page extend sendmsg handlers to support
taking a reference on the data, controlled by a new flag called
MSG_SPLICE_PAGES. Rework the handling of unexpected-end-of-file
to invoke an additional callback instead of trying to predict what
the right combination of MORE/NOTLAST flags is.
Remove the MSG_SENDPAGE_NOTLAST flag completely.
- Implement SCM_PIDFD, a new type of CMSG type analogous to
SCM_CREDENTIALS, but it contains pidfd instead of plain pid.
- Enable socket busy polling with CONFIG_RT.
- Improve reliability and efficiency of reporting for ref_tracker.
- Auto-generate a user space C library for various Netlink families.
Protocols
---------
- Allow TCP to shrink the advertised window when necessary, prevent
sk_rcvbuf auto-tuning from growing the window all the way up to
tcp_rmem[2].
- Use per-VMA locking for "page-flipping" TCP receive zerocopy.
- Prepare TCP for device-to-device data transfers, by making sure
that payloads are always attached to skbs as page frags.
- Make the backoff time for the first N TCP SYN retransmissions
linear. Exponential backoff is unnecessarily conservative.
- Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO).
- Avoid waking up applications using TLS sockets until we have
a full record.
- Allow using kernel memory for protocol ioctl callbacks, paving
the way to issuing ioctls over io_uring.
- Add nolocalbypass option to VxLAN, forcing packets to be fully
encapsulated even if they are destined for a local IP address.
- Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
in-kernel ECMP implementation (e.g. Open vSwitch) select the same
link for all packets. Support L4 symmetric hashing in Open vSwitch.
- PPPoE: make number of hash bits configurable.
- Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
(ipconfig).
- Add layer 2 miss indication and filtering, allowing higher layers
(e.g. ACL filters) to make forwarding decisions based on whether
packet matched forwarding state in lower devices (bridge).
- Support matching on Connectivity Fault Management (CFM) packets.
- Hide the "link becomes ready" IPv6 messages by demoting their
printk level to debug.
- HSR: don't enable promiscuous mode if device offloads the proto.
- Support active scanning in IEEE 802.15.4.
- Continue work on Multi-Link Operation for WiFi 7.
BPF
---
- Add precision propagation for subprogs and callbacks. This allows
maintaining verification efficiency when subprograms are used,
or in fact passing the verifier at all for complex programs,
especially those using open-coded iterators.
- Improve BPF's {g,s}setsockopt() length handling. Previously BPF
assumed the length is always equal to the amount of written data.
But some protos allow passing a NULL buffer to discover what
the output buffer *should* be, without writing anything.
- Accept dynptr memory as memory arguments passed to helpers.
- Add routing table ID to bpf_fib_lookup BPF helper.
- Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands.
- Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
maps as read-only).
- Show target_{obj,btf}_id in tracing link fdinfo.
- Addition of several new kfuncs (most of the names are self-explanatory):
- Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
and bpf_dynptr_clone().
- bpf_task_under_cgroup()
- bpf_sock_destroy() - force closing sockets
- bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
Netfilter
---------
- Relax set/map validation checks in nf_tables. Allow checking
presence of an entry in a map without using the value.
- Increase ip_vs_conn_tab_bits range for 64BIT builds.
- Allow updating size of a set.
- Improve NAT tuple selection when connection is closing.
Driver API
----------
- Integrate netdev with LED subsystem, to allow configuring HW
"offloaded" blinking of LEDs based on link state and activity
(i.e. packets coming in and out).
- Support configuring rate selection pins of SFP modules.
- Factor Clause 73 auto-negotiation code out of the drivers, provide
common helper routines.
- Add more fool-proof helpers for managing lifetime of MDIO devices
associated with the PCS layer.
- Allow drivers to report advanced statistics related to Time Aware
scheduler offload (taprio).
- Allow opting out of VF statistics in link dump, to allow more VFs
to fit into the message.
- Split devlink instance and devlink port operations.
New hardware / drivers
----------------------
- Ethernet:
- Synopsys EMAC4 IP support (stmmac)
- Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
- Marvell 88E6250 7 port switches
- Microchip LAN8650/1 Rev.B0 PHYs
- MediaTek MT7981/MT7988 built-in 1GE PHY driver
- WiFi:
- Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
- Realtek RTL8723DS (SDIO variant)
- Realtek RTL8851BE
- CAN:
- Fintek F81604
Drivers
-------
- Ethernet NICs:
- Intel (100G, ice):
- support dynamic interrupt allocation
- use meta data match instead of VF MAC addr on slow-path
- nVidia/Mellanox:
- extend link aggregation to handle 4, rather than just 2 ports
- spawn sub-functions without any features by default
- OcteonTX2:
- support HTB (Tx scheduling/QoS) offload
- make RSS hash generation configurable
- support selecting Rx queue using TC filters
- Wangxun (ngbe/txgbe):
- add basic Tx/Rx packet offloads
- add phylink support (SFP/PCS control)
- Freescale/NXP (enetc):
- report TAPRIO packet statistics
- Solarflare/AMD:
- support matching on IP ToS and UDP source port of outer header
- VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
- add devlink dev info support for EF10
- Virtual NICs:
- Microsoft vNIC:
- size the Rx indirection table based on requested configuration
- support VLAN tagging
- Amazon vNIC:
- try to reuse Rx buffers if not fully consumed, useful for ARM
servers running with 16kB pages
- Google vNIC:
- support TCP segmentation of >64kB frames
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- enable USXGMII (88E6191X)
- Microchip:
- lan966x: add support for Egress Stage 0 ACL engine
- lan966x: support mapping packet priority to internal switch
priority (based on PCP or DSCP)
- Ethernet PHYs:
- Broadcom PHYs:
- support for Wake-on-LAN for BCM54210E/B50212E
- report LPI counter
- Microsemi PHYs: support RGMII delay configuration (VSC85xx)
- Micrel PHYs: receive timestamp in the frame (LAN8841)
- Realtek PHYs: support optional external PHY clock
- Altera TSE PCS: merge the driver into Lynx PCS which it is
a variant of
- CAN: Kvaser PCIEcan:
- support packet timestamping
- WiFi:
- Intel (iwlwifi):
- major update for new firmware and Multi-Link Operation (MLO)
- configuration rework to drop test devices and split
the different families
- support for segmented PNVM images and power tables
- new vendor entries for PPAG (platform antenna gain) feature
- Qualcomm 802.11ax (ath11k):
- Multiple Basic Service Set Identifier (MBSSID) and
Enhanced MBSSID Advertisement (EMA) support in AP mode
- support factory test mode
- RealTek (rtw89):
- add RSSI based antenna diversity
- support U-NII-4 channels on 5 GHz band
- RealTek (rtl8xxxu):
- AP mode support for 8188f
- support USB RX aggregation for the newer chips
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSbJM4ACgkQMUZtbf5S
IrtoDhAAhEim1+LBIKf4lhPcVdZ2p/TkpnwTz5jsTwSeRBAxTwuNJ2fQhFXg13E3
MnRq6QaEp8G4/tA/gynLvQop+FEZEnv+horP0zf/XLcC8euU7UrKdrpt/4xxdP07
IL/fFWsoUGNO+L9LNaHwBo8g7nHvOkPscHEBHc2Xrvzab56TJk6vPySfLqcpKlNZ
CHWDwTpgRqNZzSKiSpoMVd9OVMKUXcPYHpDmfEJ5l+e8vTXmZzOLHrSELHU5nP5f
mHV7gxkDCTshoGcaed7UTiOvgu1p6E5EchDJxiLaSUbgsd8SZ3u4oXwRxgj33RK/
fB2+UaLrRt/DdlHvT/Ph8e8Ygu77yIXMjT49jsfur/zVA0HEA2dFb7V6QlsYRmQp
J25pnrdXmE15llgqsC0/UOW5J1laTjII+T2T70UOAqQl4LWYAQDG4WwsAqTzU0KY
dueydDouTp9XC2WYrRUEQxJUzxaOaazskDUHc5c8oHp/zVBT+djdgtvVR9+gi6+7
yy4elI77FlEEqL0ItdU/lSWINayAlPLsIHkMyhSGKX0XDpKjeycPqkNx4UterXB/
JKIR5RBWllRft+igIngIkKX0tJGMU0whngiw7d1WLw25wgu4sB53hiWWoSba14hv
tXMxwZs5iGaPcT38oRVMZz8I1kJM4Dz3SyI7twVvi4RUut64EG4=
=9i4I
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Jakub Kicinski:
"WiFi 7 and sendpage changes are the biggest pieces of work for this
release. The latter will definitely require fixes but I think that we
got it to a reasonable point.
Core:
- Rework the sendpage & splice implementations
Instead of feeding data into sockets page by page extend sendmsg
handlers to support taking a reference on the data, controlled by a
new flag called MSG_SPLICE_PAGES
Rework the handling of unexpected-end-of-file to invoke an
additional callback instead of trying to predict what the right
combination of MORE/NOTLAST flags is
Remove the MSG_SENDPAGE_NOTLAST flag completely
- Implement SCM_PIDFD, a new type of CMSG type analogous to
SCM_CREDENTIALS, but it contains pidfd instead of plain pid
- Enable socket busy polling with CONFIG_RT
- Improve reliability and efficiency of reporting for ref_tracker
- Auto-generate a user space C library for various Netlink families
Protocols:
- Allow TCP to shrink the advertised window when necessary, prevent
sk_rcvbuf auto-tuning from growing the window all the way up to
tcp_rmem[2]
- Use per-VMA locking for "page-flipping" TCP receive zerocopy
- Prepare TCP for device-to-device data transfers, by making sure
that payloads are always attached to skbs as page frags
- Make the backoff time for the first N TCP SYN retransmissions
linear. Exponential backoff is unnecessarily conservative
- Create a new MPTCP getsockopt to retrieve all info
(MPTCP_FULL_INFO)
- Avoid waking up applications using TLS sockets until we have a full
record
- Allow using kernel memory for protocol ioctl callbacks, paving the
way to issuing ioctls over io_uring
- Add nolocalbypass option to VxLAN, forcing packets to be fully
encapsulated even if they are destined for a local IP address
- Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
in-kernel ECMP implementation (e.g. Open vSwitch) select the same
link for all packets. Support L4 symmetric hashing in Open vSwitch
- PPPoE: make number of hash bits configurable
- Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
(ipconfig)
- Add layer 2 miss indication and filtering, allowing higher layers
(e.g. ACL filters) to make forwarding decisions based on whether
packet matched forwarding state in lower devices (bridge)
- Support matching on Connectivity Fault Management (CFM) packets
- Hide the "link becomes ready" IPv6 messages by demoting their
printk level to debug
- HSR: don't enable promiscuous mode if device offloads the proto
- Support active scanning in IEEE 802.15.4
- Continue work on Multi-Link Operation for WiFi 7
BPF:
- Add precision propagation for subprogs and callbacks. This allows
maintaining verification efficiency when subprograms are used, or
in fact passing the verifier at all for complex programs,
especially those using open-coded iterators
- Improve BPF's {g,s}setsockopt() length handling. Previously BPF
assumed the length is always equal to the amount of written data.
But some protos allow passing a NULL buffer to discover what the
output buffer *should* be, without writing anything
- Accept dynptr memory as memory arguments passed to helpers
- Add routing table ID to bpf_fib_lookup BPF helper
- Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
- Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
maps as read-only)
- Show target_{obj,btf}_id in tracing link fdinfo
- Addition of several new kfuncs (most of the names are
self-explanatory):
- Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
and bpf_dynptr_clone().
- bpf_task_under_cgroup()
- bpf_sock_destroy() - force closing sockets
- bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
Netfilter:
- Relax set/map validation checks in nf_tables. Allow checking
presence of an entry in a map without using the value
- Increase ip_vs_conn_tab_bits range for 64BIT builds
- Allow updating size of a set
- Improve NAT tuple selection when connection is closing
Driver API:
- Integrate netdev with LED subsystem, to allow configuring HW
"offloaded" blinking of LEDs based on link state and activity
(i.e. packets coming in and out)
- Support configuring rate selection pins of SFP modules
- Factor Clause 73 auto-negotiation code out of the drivers, provide
common helper routines
- Add more fool-proof helpers for managing lifetime of MDIO devices
associated with the PCS layer
- Allow drivers to report advanced statistics related to Time Aware
scheduler offload (taprio)
- Allow opting out of VF statistics in link dump, to allow more VFs
to fit into the message
- Split devlink instance and devlink port operations
New hardware / drivers:
- Ethernet:
- Synopsys EMAC4 IP support (stmmac)
- Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
- Marvell 88E6250 7 port switches
- Microchip LAN8650/1 Rev.B0 PHYs
- MediaTek MT7981/MT7988 built-in 1GE PHY driver
- WiFi:
- Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
- Realtek RTL8723DS (SDIO variant)
- Realtek RTL8851BE
- CAN:
- Fintek F81604
Drivers:
- Ethernet NICs:
- Intel (100G, ice):
- support dynamic interrupt allocation
- use meta data match instead of VF MAC addr on slow-path
- nVidia/Mellanox:
- extend link aggregation to handle 4, rather than just 2 ports
- spawn sub-functions without any features by default
- OcteonTX2:
- support HTB (Tx scheduling/QoS) offload
- make RSS hash generation configurable
- support selecting Rx queue using TC filters
- Wangxun (ngbe/txgbe):
- add basic Tx/Rx packet offloads
- add phylink support (SFP/PCS control)
- Freescale/NXP (enetc):
- report TAPRIO packet statistics
- Solarflare/AMD:
- support matching on IP ToS and UDP source port of outer
header
- VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
- add devlink dev info support for EF10
- Virtual NICs:
- Microsoft vNIC:
- size the Rx indirection table based on requested
configuration
- support VLAN tagging
- Amazon vNIC:
- try to reuse Rx buffers if not fully consumed, useful for ARM
servers running with 16kB pages
- Google vNIC:
- support TCP segmentation of >64kB frames
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- enable USXGMII (88E6191X)
- Microchip:
- lan966x: add support for Egress Stage 0 ACL engine
- lan966x: support mapping packet priority to internal switch
priority (based on PCP or DSCP)
- Ethernet PHYs:
- Broadcom PHYs:
- support for Wake-on-LAN for BCM54210E/B50212E
- report LPI counter
- Microsemi PHYs: support RGMII delay configuration (VSC85xx)
- Micrel PHYs: receive timestamp in the frame (LAN8841)
- Realtek PHYs: support optional external PHY clock
- Altera TSE PCS: merge the driver into Lynx PCS which it is a
variant of
- CAN: Kvaser PCIEcan:
- support packet timestamping
- WiFi:
- Intel (iwlwifi):
- major update for new firmware and Multi-Link Operation (MLO)
- configuration rework to drop test devices and split the
different families
- support for segmented PNVM images and power tables
- new vendor entries for PPAG (platform antenna gain) feature
- Qualcomm 802.11ax (ath11k):
- Multiple Basic Service Set Identifier (MBSSID) and Enhanced
MBSSID Advertisement (EMA) support in AP mode
- support factory test mode
- RealTek (rtw89):
- add RSSI based antenna diversity
- support U-NII-4 channels on 5 GHz band
- RealTek (rtl8xxxu):
- AP mode support for 8188f
- support USB RX aggregation for the newer chips"
* tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits)
net: scm: introduce and use scm_recv_unix helper
af_unix: Skip SCM_PIDFD if scm->pid is NULL.
net: lan743x: Simplify comparison
netlink: Add __sock_i_ino() for __netlink_diag_dump().
net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
Revert "af_unix: Call scm_recv() only after scm_set_cred()."
phylink: ReST-ify the phylink_pcs_neg_mode() kdoc
libceph: Partially revert changes to support MSG_SPLICE_PAGES
net: phy: mscc: fix packet loss due to RGMII delays
net: mana: use vmalloc_array and vcalloc
net: enetc: use vmalloc_array and vcalloc
ionic: use vmalloc_array and vcalloc
pds_core: use vmalloc_array and vcalloc
gve: use vmalloc_array and vcalloc
octeon_ep: use vmalloc_array and vcalloc
net: usb: qmi_wwan: add u-blox 0x1312 composition
perf trace: fix MSG_SPLICE_PAGES build error
ipvlan: Fix return value of ipvlan_queue_xmit()
netfilter: nf_tables: fix underflow in chain reference counter
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
...
The changes queued up for v6.5-rc1 for sysctl are in line with
prior efforts to stop usage of deprecated routines which incur
recursion and also make it hard to remove the empty array element
in each sysctl array declaration. The most difficult user to modify
was parport which required a bit of re-thinking of how to declare shared
sysctls there, Joel Granados has stepped up to the plate to do most of
this work and eventual removal of register_sysctl_table(). That work
ended up saving us about 1465 bytes according to bloat-o-meter. Since
we gained a few bloat-o-meter karma points I moved two rather small
sysctl arrays from kernel/sysctl.c leaving us only two more sysctl
arrays to move left.
Most changes have been tested on linux-next for about a month. The last
straggler patches are a minor parport fix, changes to the sysctl
kernel selftest so to verify correctness and prevent regressions for
the future change he made to provide an alternative solution for the
special sysctl mount point target which was using the now deprecated
sysctl child element.
This is all prep work to now finally be able to remove the empty
array element in all sysctl declarations / registrations which is
expected to save us a bit of bytes all over the kernel. That work
will be tested early after v6.5-rc1 is out.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmSceh0SHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinBFQQAK9WdpcU8ODoDzoSls4jsCQpZUCfZ+ED
pbCgQqUqu9VPs6bnJ+aXVa6Fh3uCr6+TIfNFM55qI/Sbo2issZ7bm0nvKmGgc6/m
giqDP7btvHqiAsEootci8DVdbBXKkdH4dx3pSwleyN8pdinewH0hrKImaPpahyo6
1mB1du0iI89yjsZmheHVVSyfXXYAnP0PqRVy5Y+qxY7yYlIegQ5uAZmwRE62lfTf
TuiV7OFuDZ2DBYOmqIhfGKGRnfOL5ZVF3iHCrfUpX3p+fEFzDmwvm3vr73PTSrFw
/aRRLa/hOWr5ilw1bvnMcazgQzFEOlQb3DMhBKH7gLl3XHVrM+TaaqYHjUia1+6Y
e2axz/duA2q9uLMW81daRApvHMCgy0exkpC7prfOxF5bgTe4TjA7ZWvGpqG1kPKT
PPSxw80XvG5hLZm4tB0ZWJ5rOfFpiUGGneSeRQwyuClBt73SIO+F03jyGpt83slU
jFE50ac14Zwh1oxpCQtYoR1+bXWdq1QwM5vQBNEuaoTSnJfVjrXqBz/BnqJChtjr
m1vA27+4/dfki2P3gVWF1lGx43ir3uJvqk+BjWXm2CDDJqpRi3N0qcUwZwLuqAAz
/LEgFqK61bpHi/C8c2NWAxIoeWRU4NUOaoiKmZwyt0sKAWU1Yzg70xssYeg7VYqZ
3pvFNVBqkV+F
=sXUU
-----END PGP SIGNATURE-----
Merge tag 'v6.5-rc1-sysctl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
"The changes for sysctl are in line with prior efforts to stop usage of
deprecated routines which incur recursion and also make it hard to
remove the empty array element in each sysctl array declaration.
The most difficult user to modify was parport which required a bit of
re-thinking of how to declare shared sysctls there, Joel Granados has
stepped up to the plate to do most of this work and eventual removal
of register_sysctl_table(). That work ended up saving us about 1465
bytes according to bloat-o-meter. Since we gained a few bloat-o-meter
karma points I moved two rather small sysctl arrays from
kernel/sysctl.c leaving us only two more sysctl arrays to move left.
Most changes have been tested on linux-next for about a month. The
last straggler patches are a minor parport fix, changes to the sysctl
kernel selftest so to verify correctness and prevent regressions for
the future change he made to provide an alternative solution for the
special sysctl mount point target which was using the now deprecated
sysctl child element.
This is all prep work to now finally be able to remove the empty array
element in all sysctl declarations / registrations which is expected
to save us a bit of bytes all over the kernel. That work will be
tested early after v6.5-rc1 is out"
* tag 'v6.5-rc1-sysctl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
sysctl: replace child with an enumeration
sysctl: Remove debugging dump_stack
test_sysclt: Test for registering a mount point
test_sysctl: Add an option to prevent test skip
test_sysctl: Add an unregister sysctl test
test_sysctl: Group node sysctl test under one func
test_sysctl: Fix test metadata getters
parport: plug a sysctl register leak
sysctl: move security keys sysctl registration to its own file
sysctl: move umh sysctl registration to its own file
signal: move show_unhandled_signals sysctl to its own file
sysctl: remove empty dev table
sysctl: Remove register_sysctl_table
sysctl: Refactor base paths registrations
sysctl: stop exporting register_sysctl_table
parport: Removed sysctl related defines
parport: Remove register_sysctl_table from parport_default_proc_register
parport: Remove register_sysctl_table from parport_device_proc_register
parport: Remove register_sysctl_table from parport_proc_register
parport: Move magic number "15" to a define
The changes queued up for v6.5-rc1 for modules are pretty tame, mostly
code removal of moving of code. Only two minor functional changes are
made, the only one which stands out is Sebastian Andrzej Siewior's
simplification of module reference counting by removing preempt_disable()
and that has been tested on linux-next for well over a month without
no regressions. I'm now, I guess, also a kitchen sink for some kallsyms
changes.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmScfl0SHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoin8oMP/0DQK+r3BZimknzz6rF0EBStNZ/dIK2W
1Q/r/ER4VKQKYxklc1M74K+7IX8ZDCYxqlaDS9lvAkRDWNC+t69aNZEib2odJleC
p6WB30P0JIwfZZC0DS/ct3vrWZTyUhw7aOtvABRmjBfiJ3lFlU092Glvk1w1aFbD
UrNRomPu4CujzfmnGj3VGc+HVSOEK0F1/GLm9ClrsR8SzKEpQmH4ALI/ON69B0ea
PmL+d1Wyt6WEoH0hlV1TOXNdHUb3ZO1riSSfDYQ7TiG2AM5w1t4n26YRusc16hYU
6Bx4OGt52ZJYR3btsRQlcylF4R5DUo+boDkM0NqEDU/3ciGMg6DgKdHnYCBN1w+X
ZO8aXK1MIgF7W6CqSz+8HCsu5CuCos55FgM22dPbpZr3OEFCWemqnV+cYCu1DA+M
Gbnn883ZLtt+R+qikD3135s+LxYIvxSuQrj+B3ZoQeIKEtAlyxuhrUJbU0tOns0j
05PrkI8J1FtIysdlNZeIFg752IPtjp/0QNB4R46m40mT16L0TSjEP7c+zcPDryMb
84SdLqh1gis0QZRkoH6JbMBDeT2dtuxqtQ5dTPka4s1mtg3SvRYr53sCJg+gQ8e2
CBW6jgrIf3F4RIMMiSfXpSf4yVVxXxJAEFnGLRXhQ2HkUnk3mdGEfsZc7ucrsnlK
f/KwaEzmLD9c
=gjKD
-----END PGP SIGNATURE-----
Merge tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module updates from Luis Chamberlain:
"The changes queued up for modules are pretty tame, mostly code removal
of moving of code.
Only two minor functional changes are made, the only one which stands
out is Sebastian Andrzej Siewior's simplification of module reference
counting by removing preempt_disable() and that has been tested on
linux-next for well over a month without no regressions.
I'm now, I guess, also a kitchen sink for some kallsyms changes"
[ There was a mis-communication about the concurrent module load changes
that I had expected to come through Luis despite me authoring the
patch. So some of the module updates were left hanging in the email
ether, and I just committed them separately.
It's my bad - I should have made it more clear that I expected my
own patches to come through the module tree too. Now they missed
linux-next, but hopefully that won't cause any issues - Linus ]
* tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
kallsyms: make kallsyms_show_value() as generic function
kallsyms: move kallsyms_show_value() out of kallsyms.c
kallsyms: remove unsed API lookup_symbol_attrs
kallsyms: remove unused arch_get_kallsym() helper
module: Remove preempt_disable() from module reference counting.
This is the new-and-improved attempt at avoiding huge memory load spikes
when the user space boot sequence tries to load hundreds (or even
thousands) of redundant duplicate modules in parallel.
See commit 9828ed3f69 ("module: error out early on concurrent load of
the same module file") for background and an earlier failed attempt that
was reverted.
That earlier attempt just said "concurrently loading the same module is
silly, just open the module file exclusively and return -ETXTBSY if
somebody else is already loading it".
While it is true that concurrent module loads of the same module is
silly, the reason that earlier attempt then failed was that the
concurrently loaded module would often be a prerequisite for another
module.
Thus failing to load the prerequisite would then cause cascading
failures of the other modules, rather than just short-circuiting that
one unnecessary module load.
At the same time, we still really don't want to load the contents of the
same module file hundreds of times, only to then wait for an eventually
successful load, and have everybody else return -EEXIST.
As a result, this takes another approach, and treats concurrent module
loads from the same file as "idempotent" in the inode. So if one module
load is ongoing, we don't start a new one, but instead just wait for the
first one to complete and return the same return value as it did.
So unlike the first attempt, this does not return early: the intent is
not to speed up the boot, but to avoid a thundering herd problem in
allocating memory (both physical and virtual) for a module more than
once.
Also note that this does change behavior: it used to be that when you
had concurrent loads, you'd have one "winner" that would return success,
and everybody else would return -EEXIST.
In contrast, this idempotent logic goes all Oprah on the problem, and
says "You are a winner! And you are a winner! We are ALL winners". But
since there's no possible actual real semantic difference between "you
loaded the module" and "somebody else already loaded the module", this
is more of a feel-good change than an actual honest-to-goodness semantic
change.
Of course, any true Johnny-come-latelies that don't get caught in the
concurrency filter will still return -EEXIST. It's no different from
not even getting a seat at an Oprah taping. That's life.
See the long thread on the kernel mailing list about this all, which
includes some numbers for memory use before and after the patch.
Link: https://lore.kernel.org/lkml/20230524213620.3509138-1-mcgrof@kernel.org/
Reviewed-by: Johan Hovold <johan@kernel.org>
Tested-by: Johan Hovold <johan@kernel.org>
Tested-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Rudi Heitbaum <rudi@heitbaum..com>
Tested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This will simplify the next step, where we can then key off the inode to
do one idempotent module load.
Let's do the obvious re-organization in one step, and then the new code
in another.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
top-level directories.
- Douglas Anderson has added a new "buddy" mode to the hardlockup
detector. It permits the detector to work on architectures which
cannot provide the required interrupts, by having CPUs periodically
perform checks on other CPUs.
- Zhen Lei has enhanced kexec's ability to support two crash regions.
- Petr Mladek has done a lot of cleanup on the hard lockup detector's
Kconfig entries.
- And the usual bunch of singleton patches in various places.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJelTAAKCRDdBJ7gKXxA
juDkAP0VXWynzkXoojdS/8e/hhi+htedmQ3v2dLZD+vBrctLhAEA7rcH58zAVoWa
2ejqO6wDrRGUC7JQcO9VEjT0nv73UwU=
=F293
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-mm updates from Andrew Morton:
- Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in top-level
directories
- Douglas Anderson has added a new "buddy" mode to the hardlockup
detector. It permits the detector to work on architectures which
cannot provide the required interrupts, by having CPUs periodically
perform checks on other CPUs
- Zhen Lei has enhanced kexec's ability to support two crash regions
- Petr Mladek has done a lot of cleanup on the hard lockup detector's
Kconfig entries
- And the usual bunch of singleton patches in various places
* tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
kernel/time/posix-stubs.c: remove duplicated include
ocfs2: remove redundant assignment to variable bit_off
watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY
powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h
devres: show which resource was invalid in __devm_ioremap_resource()
watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH
watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64
watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h
watchdog/hardlockup: make the config checks more straightforward
watchdog/hardlockup: sort hardlockup detector related config values a logical way
watchdog/hardlockup: move SMP barriers from common code to buddy code
watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
watchdog/buddy: don't copy the cpumask in watchdog_next_cpu()
watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called
watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy()
watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails
...
- Yosry has also eliminated cgroup's atomic rstat flushing.
- Nhat Pham adds the new cachestat() syscall. It provides userspace
with the ability to query pagecache status - a similar concept to
mincore() but more powerful and with improved usability.
- Mel Gorman provides more optimizations for compaction, reducing the
prevalence of page rescanning.
- Lorenzo Stoakes has done some maintanance work on the get_user_pages()
interface.
- Liam Howlett continues with cleanups and maintenance work to the maple
tree code. Peng Zhang also does some work on maple tree.
- Johannes Weiner has done some cleanup work on the compaction code.
- David Hildenbrand has contributed additional selftests for
get_user_pages().
- Thomas Gleixner has contributed some maintenance and optimization work
for the vmalloc code.
- Baolin Wang has provided some compaction cleanups,
- SeongJae Park continues maintenance work on the DAMON code.
- Huang Ying has done some maintenance on the swap code's usage of
device refcounting.
- Christoph Hellwig has some cleanups for the filemap/directio code.
- Ryan Roberts provides two patch series which yield some
rationalization of the kernel's access to pte entries - use the provided
APIs rather than open-coding accesses.
- Lorenzo Stoakes has some fixes to the interaction between pagecache
and directio access to file mappings.
- John Hubbard has a series of fixes to the MM selftesting code.
- ZhangPeng continues the folio conversion campaign.
- Hugh Dickins has been working on the pagetable handling code, mainly
with a view to reducing the load on the mmap_lock.
- Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from
128 to 8.
- Domenico Cerasuolo has improved the zswap reclaim mechanism by
reorganizing the LRU management.
- Matthew Wilcox provides some fixups to make gfs2 work better with the
buffer_head code.
- Vishal Moola also has done some folio conversion work.
- Matthew Wilcox has removed the remnants of the pagevec code - their
functionality is migrated over to struct folio_batch.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJejewAKCRDdBJ7gKXxA
joggAPwKMfT9lvDBEUnJagY7dbDPky1cSYZdJKxxM2cApGa42gEA6Cl8HRAWqSOh
J0qXCzqaaN8+BuEyLGDVPaXur9KirwY=
=B7yQ
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
- Yosry Ahmed brought back some cgroup v1 stats in OOM logs
- Yosry has also eliminated cgroup's atomic rstat flushing
- Nhat Pham adds the new cachestat() syscall. It provides userspace
with the ability to query pagecache status - a similar concept to
mincore() but more powerful and with improved usability
- Mel Gorman provides more optimizations for compaction, reducing the
prevalence of page rescanning
- Lorenzo Stoakes has done some maintanance work on the
get_user_pages() interface
- Liam Howlett continues with cleanups and maintenance work to the
maple tree code. Peng Zhang also does some work on maple tree
- Johannes Weiner has done some cleanup work on the compaction code
- David Hildenbrand has contributed additional selftests for
get_user_pages()
- Thomas Gleixner has contributed some maintenance and optimization
work for the vmalloc code
- Baolin Wang has provided some compaction cleanups,
- SeongJae Park continues maintenance work on the DAMON code
- Huang Ying has done some maintenance on the swap code's usage of
device refcounting
- Christoph Hellwig has some cleanups for the filemap/directio code
- Ryan Roberts provides two patch series which yield some
rationalization of the kernel's access to pte entries - use the
provided APIs rather than open-coding accesses
- Lorenzo Stoakes has some fixes to the interaction between pagecache
and directio access to file mappings
- John Hubbard has a series of fixes to the MM selftesting code
- ZhangPeng continues the folio conversion campaign
- Hugh Dickins has been working on the pagetable handling code, mainly
with a view to reducing the load on the mmap_lock
- Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
from 128 to 8
- Domenico Cerasuolo has improved the zswap reclaim mechanism by
reorganizing the LRU management
- Matthew Wilcox provides some fixups to make gfs2 work better with the
buffer_head code
- Vishal Moola also has done some folio conversion work
- Matthew Wilcox has removed the remnants of the pagevec code - their
functionality is migrated over to struct folio_batch
* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
mm/hugetlb: remove hugetlb_set_page_subpool()
mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
hugetlb: revert use of page_cache_next_miss()
Revert "page cache: fix page_cache_next/prev_miss off by one"
mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
mm: memcg: rename and document global_reclaim()
mm: kill [add|del]_page_to_lru_list()
mm: compaction: convert to use a folio in isolate_migratepages_block()
mm: zswap: fix double invalidate with exclusive loads
mm: remove unnecessary pagevec includes
mm: remove references to pagevec
mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
mm: remove struct pagevec
net: convert sunrpc from pagevec to folio_batch
i915: convert i915_gpu_error to use a folio_batch
pagevec: rename fbatch_count()
mm: remove check_move_unevictable_pages()
drm: convert drm_gem_put_pages() to use a folio_batch
i915: convert shmem_sg_free_table() to use a folio_batch
scatterlist: add sg_set_folio()
...
The writing operation return the count of writes regardless of whether events
are enabled or disabled. Switch it to return -EBADF to indicates that the event
is disabled.
Link: https://lkml.kernel.org/r/20230626111344.19136-2-sunliming@kylinos.cn
Cc: stable@vger.kernel.org7f5a08c79d ("user_events: Add minimal support for trace_event into ftrace")
Acked-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
- Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)
- Convert strreplace() to return string start (Andy Shevchenko)
- Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)
- Add missing function prototypes seen with W=1 (Arnd Bergmann)
- Fix strscpy() kerndoc typo (Arne Welzel)
- Replace strlcpy() with strscpy() across many subsystems which were
either Acked by respective maintainers or were trivial changes that
went ignored for multiple weeks (Azeem Shaikh)
- Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)
- Add KUnit tests for strcat()-family
- Enable KUnit tests of FORTIFY wrappers under UML
- Add more complete FORTIFY protections for strlcat()
- Add missed disabling of FORTIFY for all arch purgatories.
- Enable -fstrict-flex-arrays=3 globally
- Tightening UBSAN_BOUNDS when using GCC
- Improve checkpatch to check for strcpy, strncpy, and fake flex arrays
- Improve use of const variables in FORTIFY
- Add requested struct_size_t() helper for types not pointers
- Add __counted_by macro for annotating flexible array size members
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmSbftQWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJj0MD/9X9jzJzCmsAU+yNldeoAzC84Sk
GVU3RBxGcTNysL1gZXynkIgigw7DWc4htMGeSABHHwQRVP65JCH1Kw/VqIkyumbx
9LdX6IklMJb4pRT4PVU3azebV4eNmSjlur2UxMeW54Czm91/6I8RHbJOyAPnOUmo
2oomGdP/hpEHtKR7hgy8Axc6w5ySwQixh2V5sVZG3VbvCS5WKTmTXbs6puuRT5hz
iHt7v+7VtEg/Qf1W7J2oxfoghvVBsaRrSLrExWT/oZYh1ZxM7DsCAAoG/IsDgHGA
9LBXiRECgAFThbHVxLvvKZQMXdVk0i8iXLX43XMKC0wTA+NTyH7wlcQQ4RWNMuo8
sfA9Qm9gMArXaf64aymr3Uwn20Zan0391HdlbhOJZAE6v3PPJbleUnM58AzD2d3r
5Lz6AIFBxDImy+3f9iDWgacCT5/PkeiXTHzk9QnKhJyKKtRA58XJxj4q2+rPnGJP
n4haXqoxD5FJbxdXiGKk31RS0U5HBug7wkOcUrTqDHUbc/QNU2b7dxTKUx+zYtCU
uV5emPzpF4H4z+91WpO47n9gkMAfwV0lt9S2dwS8pxsgqctbmIan+Jgip7rsqZ2G
OgLXBsb43eEs+6WgO8tVt/ZHYj9ivGMdrcNcsIfikzNs/xweUJ53k2xSEn2xEa5J
cwANDmkL6QQK7yfeeg==
=s0j1
-----END PGP SIGNATURE-----
Merge tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"There are three areas of note:
A bunch of strlcpy()->strscpy() conversions ended up living in my tree
since they were either Acked by maintainers for me to carry, or got
ignored for multiple weeks (and were trivial changes).
The compiler option '-fstrict-flex-arrays=3' has been enabled
globally, and has been in -next for the entire devel cycle. This
changes compiler diagnostics (though mainly just -Warray-bounds which
is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_
coverage. In other words, there are no new restrictions, just
potentially new warnings. Any new FORTIFY warnings we've seen have
been fixed (usually in their respective subsystem trees). For more
details, see commit df8fc4e934.
The under-development compiler attribute __counted_by has been added
so that we can start annotating flexible array members with their
associated structure member that tracks the count of flexible array
elements at run-time. It is possible (likely?) that the exact syntax
of the attribute will change before it is finalized, but GCC and Clang
are working together to sort it out. Any changes can be made to the
macro while we continue to add annotations.
As an example of that last case, I have a treewide commit waiting with
such annotations found via Coccinelle:
https://git.kernel.org/linus/adc5b3cb48a049563dc673f348eab7b6beba8a9b
Also see commit dd06e72e68 for more details.
Summary:
- Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)
- Convert strreplace() to return string start (Andy Shevchenko)
- Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)
- Add missing function prototypes seen with W=1 (Arnd Bergmann)
- Fix strscpy() kerndoc typo (Arne Welzel)
- Replace strlcpy() with strscpy() across many subsystems which were
either Acked by respective maintainers or were trivial changes that
went ignored for multiple weeks (Azeem Shaikh)
- Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)
- Add KUnit tests for strcat()-family
- Enable KUnit tests of FORTIFY wrappers under UML
- Add more complete FORTIFY protections for strlcat()
- Add missed disabling of FORTIFY for all arch purgatories.
- Enable -fstrict-flex-arrays=3 globally
- Tightening UBSAN_BOUNDS when using GCC
- Improve checkpatch to check for strcpy, strncpy, and fake flex
arrays
- Improve use of const variables in FORTIFY
- Add requested struct_size_t() helper for types not pointers
- Add __counted_by macro for annotating flexible array size members"
* tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (54 commits)
netfilter: ipset: Replace strlcpy with strscpy
uml: Replace strlcpy with strscpy
um: Use HOST_DIR for mrproper
kallsyms: Replace all non-returning strlcpy with strscpy
sh: Replace all non-returning strlcpy with strscpy
of/flattree: Replace all non-returning strlcpy with strscpy
sparc64: Replace all non-returning strlcpy with strscpy
Hexagon: Replace all non-returning strlcpy with strscpy
kobject: Use return value of strreplace()
lib/string_helpers: Change returned value of the strreplace()
jbd2: Avoid printing outside the boundary of the buffer
checkpatch: Check for 0-length and 1-element arrays
riscv/purgatory: Do not use fortified string functions
s390/purgatory: Do not use fortified string functions
x86/purgatory: Do not use fortified string functions
acpi: Replace struct acpi_table_slit 1-element array with flex-array
clocksource: Replace all non-returning strlcpy with strscpy
string: use __builtin_memcpy() in strlcpy/strlcat
staging: most: Replace all non-returning strlcpy with strscpy
drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmSZuh0UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNpuxAAxChGqME9nE7iITx1TaFRrbK49mDF
1RZh/5cwzde72lLLFkTFKB6ErMSQkrrtA+jFH7vKsrOslBel1+yO80vkXmhYCeZU
P3m0FeREUpuU4QV0tbQamPeR+SWohmKi2dYWd+VdpLA+1aTK3KNYsi2NFkDIreap
BqeRq4S0Rqc4u3/5juk6JCGFhTRWaH16YJQrzIKHF/K3DK+gMhAY5sjuAWzFc6ma
/5bbD55kdVVDfnsxNSe+lzJ7zEf7TYedLG6BN+R9cVrU+El12a38M29kASaAof5w
vpb92a27hA9Q5EyQ2O9QXnr2L5CShT4bvAZCGkK4cmZerGNTdM0iojhYj1s7FAV/
USkWgkDmEuSatp0+DdXlfQyUmZZWlw1W0oiEfZwR8w7TY7q9CU7aD8K7+GDSIazB
g89nYznVjlaC/oA4/owMraoWP3eiDiAcsQdO052Vv63TVyJtTiRiKyBq5EFLrX8L
iaUCa4cBaYFc94kN1PZeNXZKwqRc2F6oAFT1YuXnFWBGmixN0kUL023C0xjl/J7P
02jYYSVzLm22aU39GU0DSnaLfAwl3muazOB3XuyGOhUWHFYzjkc9UhmGp0W50DkK
qigW3ONA8s8CKUS/q7QSGq+Vf+CVZA5f+daDDPGYstPfCTk61eu0wjwfwek3W0o+
xKzBr2Od3vTOzAs=
=3nWy
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- A SafeSetID patch to correct what appears to be a cut-n-paste typo in
the code causing a UID to be printed where a GID was desired.
This is coming via the LSM tree because we haven't been able to get a
response from the SafeSetID maintainer (Micah Morton) in several
months. Hopefully we are able to get in touch with Micah, but until
we do I'm going to pick them up in the LSM tree.
- A small fix to the reiserfs LSM xattr code.
We're continuing to work through some issues with the reiserfs code
as we try to fixup the LSM xattr handling, but in the process we're
uncovering some ugly problems in reiserfs and we may just end up
removing the LSM xattr support in reiserfs prior to reiserfs'
removal.
For better or worse, this shouldn't impact any of the reiserfs users,
as we discovered that LSM xattrs on reiserfs were completely broken,
meaning no one is currently using the combo of reiserfs and a file
labeling LSM.
- A tweak to how the cap_user_data_t struct/typedef is declared in the
header file to appease the Sparse gods.
- In the process of trying to sort out the SafeSetID lost-maintainer
problem I realized that I needed to update the labeled networking
entry to "Supported".
- Minor comment/documentation and spelling fixes.
* tag 'lsm-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
device_cgroup: Fix kernel-doc warnings in device_cgroup
SafeSetID: fix UID printed instead of GID
MAINTAINERS: move labeled networking to "supported"
capability: erase checker warnings about struct __user_cap_data_struct
lsm: fix a number of misspellings
reiserfs: Initialize sec->length in reiserfs_security_init().
capability: fix kernel-doc warnings in capability.c
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmSZuaAUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMd1Q//euuQA+nZbQN5II/Ht7bJPNtjxjKj
OqZer5dXuwzfjpo2YXxmUrm7NPlyfa4hsyv0q70fV2bUbTy3j+ukAnxDdYv8DAcH
QICdQiRWPkmOZBRq5TsMMhf/pet7ZNuKTZ8x8rHisZ2Oj7AO9I2eiQw5ihu8xND8
NK1WRNEpZVKOVzWzjdUHLSArlQggNKAxsYGyiyeKROWtklAK+WAJfnVXgHRgcKq8
KRi7kdDSN5ewTZIcT38ghGx3stBu/RJTICE5HE1MkQf+ZTLsNKum+bAhrzG2ZsWP
zzo8UWdELdixO7VHFqnN/TTqk9aItjS4AHxoR8Vkt6VXTdjijaEvFDIEeuGSxjMW
qszP7D4kjtsosLGMDCsR35yrG0yrISU9a5xqwZMbNcOCpbMLXg/FVs2WhtJ2MxDH
4AYpbEQdkoSsfAL2F5c9+iKJ/ZFLPjdHM0yPsKfMPWWj8xuhZIwhmPOqh9Ce1vEF
GpS7KGA/35wNpFPDu1KyQML1VanwKKSoerd4HU2SphyBwu8MYNRnxjClNCJkPgcc
XWAxHOIFETUxxBEtVfTsHBytft0go7LpQkAXQ/4olcZBznQoHCr6F5qPhWWsaUDz
WxIqZaiY4WnRER4M1lTgBqdLg2sGdsbRIMzMFhKkdLVFEaPLbZ+7HP4e/SnfzvoA
Y+qTHowFNyOIEd4=
=+w0u
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit update from Paul Moore:
"A single audit patch that resolves two compiler warnings regarding
missing function prototypes"
* tag 'audit-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: avoid missing-prototype warnings
* Whenever cpuset needs to rebuild sched_domain, it walked all tasks looking
for DEADLINE tasks as they need to be accounted on the new domain. Walking
all tasks can be expensive and there may not be any DEADLINE tasks at all.
Task iteration is now omitted if there are no DEADLINE tasks.
* Fixes DEADLINE bandwidth misaccounting after task migration failures.
* When no controller is enabled, -Wstringop-overflow warning is triggered.
The fix patch added an early exit which is too eager and got reverted for
now. Will fix later.
* Everything else are minor cleanups.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZJoRHw4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGZatAQCKTv8pb5HEgochph4n26laSdVZs6ce3Y+s7V1T
rum+3QD/TyJFmCkZSMscolZGFuafpg41sjPbmc4SexeuAMYCMgY=
=nioD
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Whenever cpuset needs to rebuild sched_domain, it walked all tasks
looking for DEADLINE tasks as they need to be accounted on the new
domain. Walking all tasks can be expensive and there may not be any
DEADLINE tasks at all. Task iteration is now omitted if there are no
DEADLINE tasks
- Fixes DEADLINE bandwidth misaccounting after task migration failures
- When no controller is enabled, -Wstringop-overflow warning is
triggered. The fix patch added an early exit which is too eager and
got reverted for now. Will fix later
- Everything else is minor cleanups
* tag 'cgroup-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Revert "cgroup: Avoid -Wstringop-overflow warnings"
cgroup/misc: Expose misc.current on cgroup v2 root
cgroup: Avoid -Wstringop-overflow warnings
cgroup: remove obsolete comment on cgroup_on_dfl()
cgroup: remove unused task_cgroup_path()
cgroup/cpuset: remove unneeded header files
cgroup: make cgroup_is_threaded() and cgroup_is_thread_root() static
rdmacg: fix kernel-doc warnings in rdmacg
cgroup: Replace the css_set call with cgroup_get
cgroup: remove unused macro for_each_e_css()
cgroup: Update out-of-date comment in cgroup_migrate()
cgroup: Replace all non-returning strlcpy with strscpy
cgroup/cpuset: remove unneeded header files
cgroup/cpuset: Free DL BW in case can_attach() fails
sched/deadline: Create DL BW alloc, free & check overflow interface
cgroup/cpuset: Iterate only if DEADLINE tasks are present
sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
sched/cpuset: Bring back cpuset_mutex
cgroup/cpuset: Rename functions dealing with DEADLINE accounting
* Concurrency-managed per-cpu work items that hog CPUs and delay the
execution of other work items are now automatically detected and excluded
from concurrency management. Reporting on such work items can also be
enabled through a config option.
* Added tools/workqueue/wq_monitor.py which improves visibility into
workqueue usages and behaviors.
* Includes Arnd's minimal fix for gcc-13 enum warning on 32bit compiles.
This conflicts with afa4bb778e ("workqueue: clean up WORK_* constant
types, clarify masking") in master. Can be resolved by picking the master
version.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZJoGvw4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGZu0AP9IGK2opAzO9i3i1/Ys81b3sHi9PwrYWH3g252T
Oe3O6QD/Wh0wYBVl0o7IdW6BGdd5iNwIEs420G53UmmPrATqsgQ=
=TffY
-----END PGP SIGNATURE-----
Merge tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
- Concurrency-managed per-cpu work items that hog CPUs and delay the
execution of other work items are now automatically detected and
excluded from concurrency management. Reporting on such work items
can also be enabled through a config option.
- Added tools/workqueue/wq_monitor.py which improves visibility into
workqueue usages and behaviors.
- Arnd's minimal fix for gcc-13 enum warning on 32bit compiles,
superseded by commit afa4bb778e in mainline.
* tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0
workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
workqueue: fix enum type for gcc-13
workqueue: Track and monitor per-workqueue CPU time usage
workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism
workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE
workqueue: Improve locking rule description for worker fields
workqueue: Move worker_set/clr_flags() upwards
workqueue: Re-order struct worker fields
workqueue: Add pwq->stats[] and a monitoring script
Further upgrade queue_work_on() comment
- Rework & fix the event forwarding logic by extending the
core interface. This fixes AMD PMU events that have to
be forwarded from the core PMU to the IBS PMU.
- Add self-tests to test AMD IBS invocation via core PMU events
- Clean up Intel FixCntrCtl MSR encoding & handling
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmSayC0RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1jlWxAAqUPtfst1p6H5sSyCBPYo5Y/Rh0SyyqJj
w0YZ8p2nbB/+EmIN3WS1uYhx1/AalTP254q2BgVF4DlDFQA1MlJCmSNJ9HhtzOgt
mbpNKzy50cQCR/iH+s3ldcFsLGhSG07j6w8xeb6BGiABm2JoiZeg6iVU76zRe5A1
iPnjC7qoqjKH+sq8pu32fBClMjzf05/LGMd0MqFuYfl5950xRW61olstjo93XWgK
O5z+5wm5H3MhJ2mzU6x+0C/xurIEQ0zRf6AqLbFp41BbJJJORgTCK746flghiqd5
DiADc7oj9eOqL1X9jFPHgE07T/6QPrMC8BoH64pOcM3PoZ6Iq3zTkUHxAw3qK5j+
kqduxzlVaFLFnf7R/vxUvjMg1PM+qP3pqgCrT+NFUdqsdLgSPxRzt5pAM6aAUwmU
1lhuapESH44RUFZGWrfOwzQE5q/FDmUc2yGyGW2aYDmwkclNjVpnvHEJrQMugI3M
M3/y9a+ErcPDUJfHcodutBDGw9l7VhsxJFMt4ydOTkNbEfZLbi2TzNapui6SKFja
G2efrB/HhrV9nE+21Wfa3uxoKMuJ/UPiGrVr2qyGOnShQpK7sdyGDshO1s6TTPye
OoVf9I0LhewMPap52SU/KDP7GJVPW1BhL/C7w6OSnXxlS5k4lOji7z4Dj2hqXHib
19Jm7BhqZwE=
=xn05
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Rework & fix the event forwarding logic by extending the core
interface.
This fixes AMD PMU events that have to be forwarded from the
core PMU to the IBS PMU.
- Add self-tests to test AMD IBS invocation via core PMU events
- Clean up Intel FixCntrCtl MSR encoding & handling
* tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Re-instate the linear PMU search
perf/x86/intel: Define bit macros for FixCntrCtl MSR
perf test: Add selftest to test IBS invocation via core pmu events
perf/core: Remove pmu linear searching code
perf/ibs: Fix interface via core pmu events
perf/core: Rework forwarding of {task|cpu}-clock events
- Introduce cmpxchg128() -- aka. the demise of cmpxchg_double().
The cmpxchg128() family of functions is basically & functionally
the same as cmpxchg_double(), but with a saner interface: instead
of a 6-parameter horror that forced u128 - u64/u64-halves layout
details on the interface and exposed users to complexity,
fragility & bugs, use a natural 3-parameter interface with u128 types.
- Restructure the generated atomic headers, and add
kerneldoc comments for all of the generic atomic{,64,_long}_t
operations. Generated definitions are much cleaner now,
and come with documentation.
- Implement lock_set_cmp_fn() on lockdep, for defining an ordering
when taking multiple locks of the same type. This gets rid of
one use of lockdep_set_novalidate_class() in the bcache code.
- Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended
variable shadowing generating garbage code on Clang on certain
ARM builds.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmSav3wRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gDyxAAjCHQjpolrre7fRpyiTDwqzIKT27H04vQ
zrQVlVc42WBnn9pe8LthGy43/RvYvqlZvLoLONA4fMkuYriM6nSMsoZjeUmE+6Rs
QAElQC74P5YvEBOa67VNY3/M7sj22ftDe7ODtVV8OrnPjMk1sQNRvaK025Cs3yig
8MAI//hHGNmyVAp1dPYZMJNqxGCvluReLZ4SaUJFCMrg7YgUXgCBj/5Gi07TlKxn
sT8BFCssoEW/B9FXkh59B1t6FBCZoSy4XSZfsZe0uVAUJ4XDEOO+zBgaWFCedNQT
wP323ryBgMrkzUKA8j2/o5d3QnMA1GcBfHNNlvAl/fOfrxWXzDZnOEY26YcaLMa0
YIuRF/JNbPZlt6DCUVBUEvMPpfNYi18dFN0rat1a6xL2L4w+tm55y3mFtSsg76Ka
r7L2nWlRrAGXnuA+VEPqkqbSWRUSWOv5hT2Mcyb5BqqZRsxBETn6G8GVAzIO6j6v
giyfUdA8Z9wmMZ7NtB6usxe3p1lXtnZ/shCE7ZHXm6xstyZrSXaHgOSgAnB9DcuJ
7KpGIhhSODQSwC/h/J0KEpb9Pr/5jCWmXAQ2DWnZK6ndt1jUfFi8pfK58wm0AuAM
o9t8Mx3o8wZjbMdt6up9OIM1HyFiMx2BSaZK+8f/bWemHQ0xwez5g4k5O5AwVOaC
x9Nt+Tp0Ze4=
=DsYj
-----END PGP SIGNATURE-----
Merge tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()
The cmpxchg128() family of functions is basically & functionally the
same as cmpxchg_double(), but with a saner interface.
Instead of a 6-parameter horror that forced u128 - u64/u64-halves
layout details on the interface and exposed users to complexity,
fragility & bugs, use a natural 3-parameter interface with u128
types.
- Restructure the generated atomic headers, and add kerneldoc comments
for all of the generic atomic{,64,_long}_t operations.
The generated definitions are much cleaner now, and come with
documentation.
- Implement lock_set_cmp_fn() on lockdep, for defining an ordering when
taking multiple locks of the same type.
This gets rid of one use of lockdep_set_novalidate_class() in the
bcache code.
- Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable
shadowing generating garbage code on Clang on certain ARM builds.
* tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc
percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg()
locking/atomic: treewide: delete arch_atomic_*() kerneldoc
locking/atomic: docs: Add atomic operations to the driver basic API documentation
locking/atomic: scripts: generate kerneldoc comments
docs: scripts: kernel-doc: accept bitwise negation like ~@var
locking/atomic: scripts: simplify raw_atomic*() definitions
locking/atomic: scripts: simplify raw_atomic_long*() definitions
locking/atomic: scripts: split pfx/name/sfx/order
locking/atomic: scripts: restructure fallback ifdeffery
locking/atomic: scripts: build raw_atomic_long*() directly
locking/atomic: treewide: use raw_atomic*_<op>()
locking/atomic: scripts: add trivial raw_atomic*_<op>()
locking/atomic: scripts: factor out order template generation
locking/atomic: scripts: remove leftover "${mult}"
locking/atomic: scripts: remove bogus order parameter
locking/atomic: xtensa: add preprocessor symbols
locking/atomic: x86: add preprocessor symbols
locking/atomic: sparc: add preprocessor symbols
locking/atomic: sh: add preprocessor symbols
...
- Scheduler SMP load-balancer improvements:
- Avoid unnecessary migrations within SMT domains on hybrid systems.
Problem:
On hybrid CPU systems, (processors with a mixture of higher-frequency
SMT cores and lower-frequency non-SMT cores), under the old code
lower-priority CPUs pulled tasks from the higher-priority cores if
more than one SMT sibling was busy - resulting in many unnecessary
task migrations.
Solution:
The new code improves the load balancer to recognize SMT cores with more
than one busy sibling and allows lower-priority CPUs to pull tasks, which
avoids superfluous migrations and lets lower-priority cores inspect all SMT
siblings for the busiest queue.
- Implement the 'runnable boosting' feature in the EAS balancer: consider CPU
contention in frequency, EAS max util & load-balance busiest CPU selection.
This improves CPU utilization for certain workloads, while leaves other key
workloads unchanged.
- Scheduler infrastructure improvements:
- Rewrite the scheduler topology setup code by consolidating it
into the build_sched_topology() helper function and building
it dynamically on the fly.
- Resolve the local_clock() vs. noinstr complications by rewriting
the code: provide separate sched_clock_noinstr() and
local_clock_noinstr() functions to be used in instrumentation code,
and make sure it is all instrumentation-safe.
- Fixes:
- Fix a kthread_park() race with wait_woken()
- Fix misc wait_task_inactive() bugs unearthed by the -rt merge:
- Fix UP PREEMPT bug by unifying the SMP and UP implementations.
- Fix task_struct::saved_state handling.
- Fix various rq clock update bugs, unearthed by turning on the rq clock
debugging code.
- Fix the PSI WINDOW_MIN_US trigger limit, which was easy to trigger by
creating enough cgroups, by removing the warnign and restricting
window size triggers to PSI file write-permission or CAP_SYS_RESOURCE.
- Propagate SMT flags in the topology when removing degenerate domain
- Fix grub_reclaim() calculation bug in the deadline scheduler code
- Avoid resetting the min update period when it is unnecessary, in
psi_trigger_destroy().
- Don't balance a task to its current running CPU in load_balance(),
which was possible on certain NUMA topologies with overlapping
groups.
- Fix the sched-debug printing of rq->nr_uninterruptible
- Cleanups:
- Address various -Wmissing-prototype warnings, as a preparation
to (maybe) enable this warning in the future.
- Remove unused code
- Mark more functions __init
- Fix shadow-variable warnings
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmSatWQRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1j62xAAuGOx1LcDfRGC6WGQzp1zOdlsVQtnDvlS
qL58zYSHgizprpVQ3j87SBaG4CHCdvd2Bo36yW0lNZS4nd203qdq7fkrMb3hPP/w
egUQUzMegf5fF6BWldKeMjuHSt+twFQz/ZAKK8iSbAir6CHNAqbNst1oL0i/+Tyk
o33hBs1hT5tnbFb1NSVZkX4k+qT3LzTW4K2QgjjGtkScr6yHh2BdEVefyigWOjdo
9s02d00ll9a2r+F5txlN7Dnw6TN7rmTXGMOJU5bZvBE90/anNiAorMXHJdEKCyUR
u9+JtBdJWiCplGa/tSRcxT16ZW1VdtTnd9q66TDhXREd2UNDFqBEyg5Wl77K4Tlf
vKFajmj/to+cTbuv6m6TVR+zyXpdEpdL6F04P44U3qiJvDobBqeDNKHHIqpmbHXl
AXUXcPWTVAzXX1Ce5M+BeAgTBQ1T7C5tELILrTNQHJvO1s9VVBRFZ/l65Ps4vu7T
wIZ781IFuopk0zWqHovNvgKrJ7oFmOQQZFttQEe8n6nafkjI7u+IZ8FayiGaUMRr
4GawFGUCEdYh8z9qyslGKe8Q/Rphfk6hxMFRYUJpDmubQ0PkMeDjDGq77jDGl1PF
VqwSDEyOaBJs7Gqf/mem00JtzBmXhkhm1SEjggHMI2IQbr/eeBXoLQOn3CDapO/N
PiDbtX760ic=
=EWQA
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Scheduler SMP load-balancer improvements:
- Avoid unnecessary migrations within SMT domains on hybrid systems.
Problem:
On hybrid CPU systems, (processors with a mixture of
higher-frequency SMT cores and lower-frequency non-SMT cores),
under the old code lower-priority CPUs pulled tasks from the
higher-priority cores if more than one SMT sibling was busy -
resulting in many unnecessary task migrations.
Solution:
The new code improves the load balancer to recognize SMT cores
with more than one busy sibling and allows lower-priority CPUs
to pull tasks, which avoids superfluous migrations and lets
lower-priority cores inspect all SMT siblings for the busiest
queue.
- Implement the 'runnable boosting' feature in the EAS balancer:
consider CPU contention in frequency, EAS max util & load-balance
busiest CPU selection.
This improves CPU utilization for certain workloads, while leaves
other key workloads unchanged.
Scheduler infrastructure improvements:
- Rewrite the scheduler topology setup code by consolidating it into
the build_sched_topology() helper function and building it
dynamically on the fly.
- Resolve the local_clock() vs. noinstr complications by rewriting
the code: provide separate sched_clock_noinstr() and
local_clock_noinstr() functions to be used in instrumentation code,
and make sure it is all instrumentation-safe.
Fixes:
- Fix a kthread_park() race with wait_woken()
- Fix misc wait_task_inactive() bugs unearthed by the -rt merge:
- Fix UP PREEMPT bug by unifying the SMP and UP implementations
- Fix task_struct::saved_state handling
- Fix various rq clock update bugs, unearthed by turning on the rq
clock debugging code.
- Fix the PSI WINDOW_MIN_US trigger limit, which was easy to trigger
by creating enough cgroups, by removing the warnign and restricting
window size triggers to PSI file write-permission or
CAP_SYS_RESOURCE.
- Propagate SMT flags in the topology when removing degenerate domain
- Fix grub_reclaim() calculation bug in the deadline scheduler code
- Avoid resetting the min update period when it is unnecessary, in
psi_trigger_destroy().
- Don't balance a task to its current running CPU in load_balance(),
which was possible on certain NUMA topologies with overlapping
groups.
- Fix the sched-debug printing of rq->nr_uninterruptible
Cleanups:
- Address various -Wmissing-prototype warnings, as a preparation to
(maybe) enable this warning in the future.
- Remove unused code
- Mark more functions __init
- Fix shadow-variable warnings"
* tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()
sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()
sched/core: Fixed missing rq clock update before calling set_rq_offline()
sched/deadline: Update GRUB description in the documentation
sched/deadline: Fix bandwidth reclaim equation in GRUB
sched/wait: Fix a kthread_park race with wait_woken()
sched/topology: Mark set_sched_topology() __init
sched/fair: Rename variable cpu_util eff_util
arm64/arch_timer: Fix MMIO byteswap
sched/fair, cpufreq: Introduce 'runnable boosting'
sched/fair: Refactor CPU utilization functions
cpuidle: Use local_clock_noinstr()
sched/clock: Provide local_clock_noinstr()
x86/tsc: Provide sched_clock_noinstr()
clocksource: hyper-v: Provide noinstr sched_clock()
clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX
x86/vdso: Fix gettimeofday masking
math64: Always inline u128 version of mul_u64_u64_shr()
s390/time: Provide sched_clock_noinstr()
loongarch: Provide noinstr sched_clock_read()
...
This pull contains the following branches:
doc.2023.05.10a: Documentation updates
fixes.2023.05.11a: Miscellaneous fixes, perhaps most notably:
o Remove RCU_NONIDLE(). The new visibility of most of the idle
loop to RCU has obsoleted this API.
o Make the RCU_SOFTIRQ callback-invocation time limit also apply
to the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT.
o Add a jiffies-based callback-invocation time limit to handle
long-running callbacks. (The local_clock() function is only
invoked once per 32 callbacks due to its high overhead.)
o Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs,
which fixes a bug that can occur on systems with non-contiguous
CPU numbering.
kvfree.2023.05.10a: kvfree_rcu updates
o Eliminate the single-argument variant of k[v]free_rcu() now
that all uses have been converted to k[v]free_rcu_mightsleep().
o Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks
too soon. Yes, this is closing the barn door after the horse
has escaped, but Murphy says that there will be more horses.
nocb.2023.05.11a: Callback-offloading updates
o Fix a number of bugs involving the shrinker and lazy callbacks.
rcu-tasks.2023.05.10a: Tasks RCU updates
torture.2023.05.15a: Torture-test updates
rcu-urgent.2023.06.06a: Urgent SRCU fix (already pulled)
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmSUuukTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jLB5EACWArBYSbXh9kx6RP3LRkOd//fQWuqx
z/RmHjMx3a2uIQpsbeAj+jrgHYzSOi7Afdnx2s0gUIWGjpF4d+e31eco9xTQtWIs
A3/pXUlcTyaPXEZh5ro763UyBF/K003TAdo7EZAScTfDNp2knqGdEOyXTOXiAULX
GH922kIqg0chbYaWocLY3g5mXeEm+kGY8GrDAB7/B3jHgoyylXzmSULDP4GQV7hw
DkM0GOlc3TSzHonnNS6j1xboqY4HhWIDkBrD4Oh5P//ttMpb1b6gs1zEyjCQcNBe
a6fnNF+0dUwANIZKroPn/L1uTGsEUhmLFkVK+XIuAit97yWI6t+aRH6TzHHYmkpu
wVmLxv/FbJohP7ArWaI8l0gNl0vkli3ZgQXnRvSpCqIFR93AWVMeZsDTGOcLUdry
AZEnuGXHnc9UB0KGOIras0o/EQezKq57JUV2bBZjl/GIDc3qiaJKnBhHysPc1iuE
UfP052vCaoZxO3U/FrObQhjLZnstKBYHj8WolxMjIyNMlRIvDro6O1WG4+mjeLDP
xdrjKGstsJh80CYDei+vJBXsbszhxv8yV4hCQX9JcDl3RjEqOOxgKUnAaP2mm02O
MX33P3MZvSsHGoxkJpXDSlkQlbNqDBMIjZXbZLRF4o8fPhVmQU/4QlJN0iFOoXaQ
1qqGrerEzfn0Jw==
=3LCd
-----END PGP SIGNATURE-----
Merge tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
"Documentation updates
Miscellaneous fixes, perhaps most notably:
- Remove RCU_NONIDLE(). The new visibility of most of the idle loop
to RCU has obsoleted this API.
- Make the RCU_SOFTIRQ callback-invocation time limit also apply to
the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT.
- Add a jiffies-based callback-invocation time limit to handle
long-running callbacks. (The local_clock() function is only invoked
once per 32 callbacks due to its high overhead.)
- Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs, which
fixes a bug that can occur on systems with non-contiguous CPU
numbering.
kvfree_rcu updates:
- Eliminate the single-argument variant of k[v]free_rcu() now that
all uses have been converted to k[v]free_rcu_mightsleep().
- Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks too
soon. Yes, this is closing the barn door after the horse has
escaped, but Murphy says that there will be more horses.
Callback-offloading updates:
- Fix a number of bugs involving the shrinker and lazy callbacks.
Tasks RCU updates
Torture-test updates"
* tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits)
torture: Remove duplicated argument -enable-kvm for ppc64
doc/rcutorture: Add description of rcutorture.stall_cpu_block
rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale
rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup()
rcutorture: Correct name of use_softirq module parameter
locktorture: Add long_hold to adjust lock-hold delays
rcu/nocb: Make shrinker iterate only over NOCB CPUs
rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs
rcu: Make rcu_cpu_starting() rely on interrupts being disabled
rcu: Mark rcu_cpu_kthread() accesses to ->rcu_cpu_has_work
rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
rcu: Employ jiffies-based backstop to callback time limit
rcu: Check callback-invocation time limit for rcuc kthreads
rcu: Remove RCU_NONIDLE()
rcu: Add more RCU files to kernel-api.rst
rcu-tasks: Clarify the cblist_init_generic() function's pr_info() output
rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic()
rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker
rcu/nocb: Fix shrinker race against callback enqueuer
rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading
...
- Introduce power capping core support for Intel TPMI (Topology Aware
Register and PM Capsule Interface) and a TPMI interface driver for
Intel RAPL (Zhang Rui, Dan Carpenter).
- Fix CONFIG_IOSF_MBI dependency in the Intel RAPL power capping
driver (Zhang Rui).
- Fix invalid initialization for pl4_supported field in the Intel RAPL
power capping driver (Sumeet Pawnikar).
- Clean up the intel_idle driver, make it work with VM guests that
cannot use the MWAIT instruction and address the case in which the
host may enter a deep idle state when the guest is idle (Arjan van
de Ven).
- Prevent cpufreq drivers that provide the ->adjust_perf() callback
without a ->fast_switch() one which is used as a fallback from the
former in some cases (Wyes Karny).
- Fix some issues related to the AMD P-state cpufreq driver (Mario
Limonciello, Wyes Karny).
- Fix the energy_performance_preference attribute handling in the
intel_pstate driver in passive mode (Tero Kristo).
- Fix the handling of pm_suspend_target_state when CONFIG_PM is unset
(Kai-Heng Feng).
- Correct spelling mistake in a comment in the hibernation code (Wang
Honghui).
- Add arch_resume_nosmt() prototype to avoid a "missing prototypes"
build warning (Arnd Bergmann).
- Restrict pm_pr_dbg() to system-wide power transitions and use it in
a few additional places (Mario Limonciello).
- Drop verification of in-params from genpd_add_device() and ensure
that all of its callers will do it (Ulf Hansson).
- Prevent possible integer overflows from occurring in
genpd_parse_state() (Nikita Zhandarovich).
- Reorder fieldls in 'struct devfreq_dev_status' to reduce its size
somewhat (Christophe JAILLET).
- Ensure that the Exynos PPMU driver is already loaded before the
Exynos Bus driver starts probing so as to avoid a possible freeze
loading of the kernel modules (Marek Szyprowski).
- Fix variable deferencing before NULL check in the mtk-cci devfreq
driver (Sukrut Bellary).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmSZwPASHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxu44P/AouvVFDMt+eE76nPfNc10X1lswS0vrT
X7LmylSDPyWiuz6p8MWGXB6T2nQ+3DbvSfVyBJ960ymkBnE/F9me8o3wB8eGbd6z
ZvOD8+wVPXS4Cq8gUxy2zV1ul+o5IwwT20cYC6mWjasvByl13vTevN5d6ZQ9o6hS
1hAQQDd6JjsdLIUyU0EbE4aD+l4h96o45IFxbV86qVH77ywa6VMNdulRKmDcONj3
kM7jHFYL4xl0TfMjHp4IhGWXK32qGYgX1zYTOU5kSc11IExJfVzQcL2uQ9A0KSLp
RJ0c93loUsHdMhenNkN4nSBFWBIaftKDLbS+5Ubt0DBuNN7kxWivEVts4DM/wxuB
72PNl5h8YglcW7LHH2IXb/6HEerzbj42+6y459o+M0DcNTq18gu19OQTK5IGtRrQ
Yf6+5BhgLR3R1REg0eaBg6njtGq0f5fmW7Iqo52eA8cXhHU0MTDJE1p6ytfN40gH
ViA+T8HB6Mh91lWHVftbwo3wONHGcfJy+S2hGM45V5LKEGeKHILgmw0nUzO7epWE
7VIPKGzkVd7h/Drk7X3nQR3DJFA/x5eNhjxt5LZD83cVVg34SS3ST5oH13FI9S7Q
8zwG5KoHTDrmYug3sxQ+Q9pq8MnOl0ZbgqlVfwyiWjKYmNMbg4elsQG/2zD3t0kv
y8zXbr7Kr4QB
=SZV7
-----END PGP SIGNATURE-----
Merge tag 'pm-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add Intel TPMI (Topology Aware Register and PM Capsule
Interface) support to the power capping subsystem, extend the
intel_idle driver to work in VM guests where MWAIT is not available,
extend the system-wide power management diagnostics, fix bugs and
clean up code.
Specifics:
- Introduce power capping core support for Intel TPMI (Topology Aware
Register and PM Capsule Interface) and a TPMI interface driver for
Intel RAPL (Zhang Rui, Dan Carpenter)
- Fix CONFIG_IOSF_MBI dependency in the Intel RAPL power capping
driver (Zhang Rui)
- Fix invalid initialization for pl4_supported field in the Intel
RAPL power capping driver (Sumeet Pawnikar)
- Clean up the intel_idle driver, make it work with VM guests that
cannot use the MWAIT instruction and address the case in which the
host may enter a deep idle state when the guest is idle (Arjan van
de Ven)
- Prevent cpufreq drivers that provide the ->adjust_perf() callback
without a ->fast_switch() one which is used as a fallback from the
former in some cases (Wyes Karny)
- Fix some issues related to the AMD P-state cpufreq driver (Mario
Limonciello, Wyes Karny)
- Fix the energy_performance_preference attribute handling in the
intel_pstate driver in passive mode (Tero Kristo)
- Fix the handling of pm_suspend_target_state when CONFIG_PM is unset
(Kai-Heng Feng)
- Correct spelling mistake in a comment in the hibernation code (Wang
Honghui)
- Add arch_resume_nosmt() prototype to avoid a "missing prototypes"
build warning (Arnd Bergmann)
- Restrict pm_pr_dbg() to system-wide power transitions and use it in
a few additional places (Mario Limonciello)
- Drop verification of in-params from genpd_add_device() and ensure
that all of its callers will do it (Ulf Hansson)
- Prevent possible integer overflows from occurring in
genpd_parse_state() (Nikita Zhandarovich)
- Reorder fieldls in 'struct devfreq_dev_status' to reduce its size
somewhat (Christophe JAILLET)
- Ensure that the Exynos PPMU driver is already loaded before the
Exynos Bus driver starts probing so as to avoid a possible freeze
loading of the kernel modules (Marek Szyprowski)
- Fix variable deferencing before NULL check in the mtk-cci devfreq
driver (Sukrut Bellary)"
* tag 'pm-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (42 commits)
intel_idle: Add a "Long HLT" C1 state for the VM guest mode
cpufreq: intel_pstate: Fix energy_performance_preference for passive
cpufreq: amd-pstate: Add a kernel config option to set default mode
cpufreq: amd-pstate: Set a fallback policy based on preferred_profile
ACPI: CPPC: Add definition for undefined FADT preferred PM profile value
cpufreq: amd-pstate: Set default governor to schedutil
PM: domains: Move the verification of in-params from genpd_add_device()
cpufreq: amd-pstate: Make amd-pstate EPP driver name hyphenated
cpufreq: amd-pstate: Write CPPC enable bit per-socket
intel_idle: Add support for using intel_idle in a VM guest using just hlt
cpufreq: Fail driver register if it has adjust_perf without fast_switch
intel_idle: clean up the (new) state_update_enter_method function
intel_idle: refactor state->enter manipulation into its own function
platform/x86/amd: pmc: Use pm_pr_dbg() for suspend related messages
pinctrl: amd: Use pm_pr_dbg to show debugging messages
ACPI: x86: Add pm_debug_messages for LPS0 _DSM state tracking
include/linux/suspend.h: Only show pm_pr_dbg messages at suspend/resume
powercap: RAPL: Fix a NULL vs IS_ERR() bug
powercap: RAPL: Fix CONFIG_IOSF_MBI dependency
powercap: RAPL: fix invalid initialization for pl4_supported field
...
- Remove repeated 'the' in comments
- Remove unused current_untag_mask()
- Document urgent tip branch timing
- Clean up MSR kernel-doc notation
- Clean up paravirt_ops doc
- Update Srivatsa S. Bhat's maintained areas
- Remove unused extern declaration acpi_copy_wakeup_routine()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmSZ6xIACgkQaDWVMHDJ
krB9aQ/+NjB4CiWLbrnOYj9QYG6p1GE7lfu2dzIDdmcNuiai8htopXys54Igy3Rq
BbIoW4E0SGK5E2OD7nLe4fBA/LpsYZTwDhGUu3SiovxLOoC5qkF0Q+6aVypPJE5o
q7kn0Eo9IDL1dO0EbJptFDJRjk3K5caEoyXJRelarjIfPRbDEhUFaybVRykMZN9I
4AOxrlb9WFggT4gUE4+N0kWyEqdgI9/aguavmasaG4lBHZ5JAHNQPNIa8bkVSAPL
wULAzsrGp96V3tVxdjDCzD9aumk4xlJq7gk+v7mfx013dg7Cjs074Xoi2Y+TmaC7
fdIZiGPJIkNToW+nENVO7BYtACSQhXeVTGxLQO/HNTDc//ZWiIUoJT2U4qu/6e6F
aAIGoLwv68H4BghS2qx6Gz+BTIfl35mcPUb75MQhu+D84QZoZWrdamCYhsvHeZzc
uC3nojrb6PBOth9nJsRae+j1zpRe/DT2LvHSWPJgK6EygOAi05ZfYUll/6sb0vze
IXkUrVV1BvDDVpY9/HnE8RpDCDolP0/ezK9zsw48arZtkc+Qmw2WlD/2D98E+pSb
MJPelbVmpzWTaoR4jDzXJCXkWe7CQJ5uPQj5azAE9l7YvnxgCQP5xnm5sLU9eyLu
RsOwRzss0+3z44x5rJi9nSxQJ0LHfTAzW8/ZmNSZGHzi0ClszK0=
=N82i
-----END PGP SIGNATURE-----
Merge tag 'x86_cleanups_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Dave Hansen:
"As usual, these are all over the map. The biggest cluster is work from
Arnd to eliminate -Wmissing-prototype warnings:
- Address -Wmissing-prototype warnings
- Remove repeated 'the' in comments
- Remove unused current_untag_mask()
- Document urgent tip branch timing
- Clean up MSR kernel-doc notation
- Clean up paravirt_ops doc
- Update Srivatsa S. Bhat's maintained areas
- Remove unused extern declaration acpi_copy_wakeup_routine()"
* tag 'x86_cleanups_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/acpi: Remove unused extern declaration acpi_copy_wakeup_routine()
Documentation: virt: Clean up paravirt_ops doc
x86/mm: Remove unused current_untag_mask()
x86/mm: Remove repeated word in comments
x86/lib/msr: Clean up kernel-doc notation
x86/platform: Avoid missing-prototype warnings for OLPC
x86/mm: Add early_memremap_pgprot_adjust() prototype
x86/usercopy: Include arch_wb_cache_pmem() declaration
x86/vdso: Include vdso/processor.h
x86/mce: Add copy_mc_fragile_handle_tail() prototype
x86/fbdev: Include asm/fb.h as needed
x86/hibernate: Declare global functions in suspend.h
x86/entry: Add do_SYSENTER_32() prototype
x86/quirks: Include linux/pnp.h for arch_pnpbios_disabled()
x86/mm: Include asm/numa.h for set_highmem_pages_init()
x86: Avoid missing-prototype warnings for doublefault code
x86/fpu: Include asm/fpu/regset.h
x86: Add dummy prototype for mk_early_pgtbl_32()
x86/pci: Mark local functions as 'static'
x86/ftrace: Move prepare_ftrace_return prototype to header
...