kexec allows replacing the current kernel with a different one. This is
usually a source of concerns for sysadmins that want to harden a system.
Linux already provides a way to disable loading new kexec kernel via
kexec_load_disabled, but that control is very coard, it is all or nothing
and does not make distinction between a panic kexec and a normal kexec.
This patch introduces new sysctl parameters, with finer tuning to specify
how many times a kexec kernel can be loaded. The sysadmin can set
different limits for kexec panic and kexec reboot kernels. The value can
be modified at runtime via sysctl, but only with a stricter value.
With these new parameters on place, a system with loadpin and verity
enabled, using the following kernel parameters:
sysctl.kexec_load_limit_reboot=0 sysct.kexec_load_limit_panic=1 can have a
good warranty that if initrd tries to load a panic kernel, a malitious
user will have small chances to replace that kernel with a different one,
even if they can trigger timeouts on the disk where the panic kernel
lives.
Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-3-6a8531a09b9a@chromium.org
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Both syscalls (kexec and kexec_file) do the same check, let's factor it
out.
Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-2-6a8531a09b9a@chromium.org
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use the 'struct' keyword for a struct's kernel-doc notation to avoid a
kernel-doc warning:
kernel/user_namespace.c:232: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* idmap_key struct holds the information necessary to find an idmapping in a
Link: https://lkml.kernel.org/r/20230108021243.16683-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When destroying a kthread worker warn if there are still some pending
delayed works. This indicates that the caller should clear all pending
delayed works before destroying the kthread worker.
Link: https://lkml.kernel.org/r/20230104144230.938521-1-qiang1.zhang@intel.com
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Just one fix for modules by Nick.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmPB5S4SHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoin0OUQALx+uch7cZvl3aznH4CAkF1U2mKuESLT
unQsnSlxIF0BETTAdPbhcCVEtKi102Xq4R6ffCSilB33b0IGM6qY6ZUQb/GMGhg5
qIur7EDiFksjCBmBuE1iNaAknkiKeirkJTEC4lEOl8OLXprmnuMeeR+E0BDi5sH3
YIxexLuhyKeF9Ke4bjAJ7A4WKvh2R7yamISCUcP6GL3w7U9urSjo9FHKcUmFH9nr
qCX/1zE0fz7iJzb9YtBNVhdgKNOYNOxa7TDNXQCVuLcZQfmQqDMBVgKdOkj6TAWX
6L2CHT4N9IjPt9+DrlEUf6bSSwP4N4aFdyMAo6UlVXcgvEbTS/kdoMSqFOAQSAdb
G/lzsvS3fD76VcCdZkwAFYnEhUTJ4xWTS0oaI++tu0EFX5lvRuQv2DRt0WULlD/u
L0paUwmjtVajcSIATxRZkjoMiVD4btDRz30kaIUU/xoc1Gg/EADrSLHESaZ9eZVL
EJ40aqLLIRBXGZrVEzvf97HIzuQiKfaPzywNvbMpxG3m0tV2pn3Z4ts/A8aO7c+O
mBDnTURiZN6pT+xsnJBvqWrlXwPRUGwI+NjRcdPZhUyfgj5MHpEI8PAcUWy6TTUn
H2P6x2iC3/nypqhnwjoixSptjaUWcak3R6UgwVS2YqfjePCqaq0wg9DAFDQH0Yx3
awAOoum0Ubin
=aa3U
-----END PGP SIGNATURE-----
Merge tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module fix from Luis Chamberlain:
"Just one fix for modules by Nick"
* tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
kallsyms: Fix scheduling with interrupts disabled in self-test
kallsyms_on_each* may schedule so must not be called with interrupts
disabled. The iteration function could disable interrupts, but this
also changes lookup_symbol() to match the change to the other timing
code.
Reported-by: Erhard F. <erhard_f@mailbox.org>
Link: https://lore.kernel.org/all/bug-216902-206035@https.bugzilla.kernel.org%2F/
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202212251728.8d0872ff-oliver.sang@intel.com
Fixes: 30f3bb0977 ("kallsyms: Add self-test facility")
Tested-by: "Erhard F." <erhard_f@mailbox.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
triggering an integer overflow and disabling the feature.
- Fix use-after-free bug in dup_user_cpus_ptr().
- Fix do_set_cpus_allowed() deadlock scenarios related to calling kfree() with
the pi_lock held. NOTE: the rcu_free() is the 'lazy' solution here - we looked
at patches to free the structure after the pi_lock got dropped, but that looked
quite a bit messier - and none of this is truly performance critical. We can
revisit this if it's too lazy of a solution ...
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPAF7MRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gsaA/+Ic2IfCw5F836xBqEP/CI6kKHw13rtd3e
zE41Jz6gpLQ6FcfiQpma8ZEhgEbW5zlvv5IXvDMVy5joHtWBIuxLDT5qXZSzgKHC
Trf+kGcBQFo19axInDV0kI5EY9zr6msEa3jvk34WJeGrHQZlIJCvXamy5FryWqu+
yL1IQjQiHnjPfL5Ez3XWkjP+ijboy/gpgzOT4lqrHyl4U+y24Iuq/CpitDVcE3yf
ExI1k9s6WsjLGL24hob+/jH3sWK57GWwIR1yDN1sQ8soX5V9oIjtHXRCaxqD8N+v
UcDNZBpfH+7Mmsg7EKU/nYgtB7kbfbEjJbpxph9grsxNHEkbPTc0zLlERJa7VuIx
HHABEdAstEq7V7WNPNEZpcgeoLyHCbLnnaxbNSXtj4nGyzgn8cEoDzeyZUw8mZFb
1ZYhGwba5BojKTvQmxVcWyqCi0Z32ERBYibGULnRbjX5TgOIEllhZ5VjtpI1b593
M4o1VUsKYeLl6QbdXYNlIiFUOmsW4eJBDy3kDxh4l2xH4LgKNaoh8zoAYtUQgRJU
JOkP03PWvOoZEgcqur6ECcHolHaR65aFs8fkuZ+jMSUXOsIqSji6Cc1RPCQs3vpy
0tYFcnon9ZntPvecxQpbyretbRaehAFdZntKjfsS8RQzraFiPu3yfdHR57D97ioo
u877g9M/kZo=
=gCfL
-----END PGP SIGNATURE-----
Merge tag 'sched-urgent-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix scheduler frequency invariance bug related to overly long
tickless periods triggering an integer overflow and disabling the
feature.
- Fix use-after-free bug in dup_user_cpus_ptr().
- Fix do_set_cpus_allowed() deadlock scenarios related to calling
kfree() with the pi_lock held. NOTE: the rcu_free() is the 'lazy'
solution here - we looked at patches to free the structure after the
pi_lock got dropped, but that looked quite a bit messier - and none
of this is truly performance critical. We can revisit this if it's
too lazy of a solution ...
* tag 'sched-urgent-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Use kfree_rcu() in do_set_cpus_allowed()
sched/core: Fix use-after-free bug in dup_user_cpus_ptr()
sched/core: Fix arch_scale_freq_tick() on tickless systems
- fix xtensa allmodconfig build broken by the kcsan test
- drop unused members of struct thread_struct
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEK2eFS5jlMn3N6xfYUfnMkfg/oEQFAmO9qEMTHGpjbXZia2Jj
QGdtYWlsLmNvbQAKCRBR+cyR+D+gRCsAD/9d3kNFdjDkSWBLaPOxENJzLBsrd6uK
ONEMekNGXWcy0sqbCi7keqqYSCD9BwKbup1XNW9fQ949OuvXfBYtBvJZmhQa50cZ
9FHVJkeuLTDnBXCKj4QyqNQ6bflviKV7CPbndsWA1bbP8WRKSW47bvr+8dCLvVNm
LaOEz2V7XDmFMDBLhIHYk/VCjOAoMmyXCEgpcbyJxLN/Mv09S/M3ZVYQbAtfwdu1
MUtVDuEqp1hP+6IdPmb0wWMe08AFLRd+5Ney/WoM2Fy8uXrthdfugtqUhuw+YHuV
EHxqcc342Pe1rNZ2o+NXLe6EmZiBC7pzo6RZK/LpvHWWSluoWQz1wbOehaS8Td5D
O5CFegEp0xR3ESCIyzFcAY32xWrPpv8ie32aDxkW3kkgZt450brTG3Okp9qfNkkQ
vhr0flO61W2WBpfrdT0RqvH+x+60NLSrqU0wRc/a9quTQ5ov5SPW9akGtavRMTtO
1t/L/BWXbQ50AsZejNisuYlVXICBnQRDbBJoENclqN6u1uO62cxJR7thuT7ckQhh
fuMlo3KTC4dcKyS7QwyjayFmfMWvhZD18AtG1Xg1QB58tAzjXemZPdlG+GRXoDmx
kViz8ACsRunPIWhfo/EDPZPYFLItIGa1JtTwnFGGVxw/yIH9vlmFrf+c1gmk+I0q
PXPr99tabm2hHQ==
=3G83
-----END PGP SIGNATURE-----
Merge tag 'xtensa-20230110' of https://github.com/jcmvbkbc/linux-xtensa
Pull xtensa fixes from Max Filippov:
- fix xtensa allmodconfig build broken by the kcsan test
- drop unused members of struct thread_struct
* tag 'xtensa-20230110' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: drop unused members of struct thread_struct
kcsan: test: don't put the expect array on the stack
Commit 851a723e45 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()") may call kfree() if user_cpus_ptr was previously
set. Unfortunately, some of the callers of do_set_cpus_allowed()
may have pi_lock held when calling it. So the following splats may be
printed especially when running with a PREEMPT_RT kernel:
WARNING: possible circular locking dependency detected
BUG: sleeping function called from invalid context
To avoid these problems, kfree_rcu() is used instead. An internal
cpumask_rcuhead union is created for the sole purpose of facilitating
the use of kfree_rcu() to free the cpumask.
Since user_cpus_ptr is not being used in non-SMP configs, the newly
introduced alloc_user_cpus_ptr() helper will return NULL in this case
and sched_setaffinity() is modified to handle this special case.
Fixes: 851a723e45 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221231041120.440785-3-longman@redhat.com
Since commit 07ec77a1d4 ("sched: Allow task CPU affinity to be
restricted on asymmetric systems"), the setting and clearing of
user_cpus_ptr are done under pi_lock for arm64 architecture. However,
dup_user_cpus_ptr() accesses user_cpus_ptr without any lock
protection. Since sched_setaffinity() can be invoked from another
process, the process being modified may be undergoing fork() at
the same time. When racing with the clearing of user_cpus_ptr in
__set_cpus_allowed_ptr_locked(), it can lead to user-after-free and
possibly double-free in arm64 kernel.
Commit 8f9ea86fdf ("sched: Always preserve the user requested
cpumask") fixes this problem as user_cpus_ptr, once set, will never
be cleared in a task's lifetime. However, this bug was re-introduced
in commit 851a723e45 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()") which allows the clearing of user_cpus_ptr in
do_set_cpus_allowed(). This time, it will affect all arches.
Fix this bug by always clearing the user_cpus_ptr of the newly
cloned/forked task before the copying process starts and check the
user_cpus_ptr state of the source task under pi_lock.
Note to stable, this patch won't be applicable to stable releases.
Just copy the new dup_user_cpus_ptr() function over.
Fixes: 07ec77a1d4 ("sched: Allow task CPU affinity to be restricted on asymmetric systems")
Fixes: 851a723e45 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
Reported-by: David Wang 王标 <wangbiao3@xiaomi.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221231041120.440785-2-longman@redhat.com
In order for the scheduler to be frequency invariant we measure the
ratio between the maximum CPU frequency and the actual CPU frequency.
During long tickless periods of time the calculations that keep track
of that might overflow, in the function scale_freq_tick():
if (check_shl_overflow(acnt, 2*SCHED_CAPACITY_SHIFT, &acnt))
goto error;
eventually forcing the kernel to disable the feature for all CPUs,
and show the warning message:
"Scheduler frequency invariance went wobbly, disabling!".
Let's avoid that by limiting the frequency invariant calculations
to CPUs with regular tick.
Fixes: e2b0d619b4 ("x86, sched: check for counters overflow in frequency invariant accounting")
Suggested-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Yair Podemsky <ypodemsk@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Acked-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Link: https://lore.kernel.org/r/20221130125121.34407-1-ypodemsk@redhat.com
Current release - regressions:
- bpf: fix nullness propagation for reg to reg comparisons,
avoid null-deref
- inet: control sockets should not use current thread task_frag
- bpf: always use maximal size for copy_array()
- eth: bnxt_en: don't link netdev to a devlink port for VFs
Current release - new code bugs:
- rxrpc: fix a couple of potential use-after-frees
- netfilter: conntrack: fix IPv6 exthdr error check
- wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes
- eth: dsa: qca8k: various fixes for the in-band register access
- eth: nfp: fix schedule in atomic context when sync mc address
- eth: renesas: rswitch: fix getting mac address from device tree
- mobile: ipa: use proper endpoint mask for suspend
Previous releases - regressions:
- tcp: add TIME_WAIT sockets in bhash2, fix regression caught
by Jiri / python tests
- net: tc: don't intepret cls results when asked to drop, fix
oob-access
- vrf: determine the dst using the original ifindex for multicast
- eth: bnxt_en:
- fix XDP RX path if BPF adjusted packet length
- fix HDS (header placement) and jumbo thresholds for RX packets
- eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
avoid memory corruptions
Previous releases - always broken:
- ulp: prevent ULP without clone op from entering the LISTEN status
- veth: fix race with AF_XDP exposing old or uninitialized descriptors
- bpf:
- pull before calling skb_postpull_rcsum() (fix checksum support
and avoid a WARN())
- fix panic due to wrong pageattr of im->image (when livepatch
and kretfunc coexist)
- keep a reference to the mm, in case the task is dead
- mptcp: fix deadlock in fastopen error path
- netfilter:
- nf_tables: perform type checking for existing sets
- nf_tables: honor set timeout and garbage collection updates
- ipset: fix hash:net,port,net hang with /0 subnet
- ipset: avoid hung task warning when adding/deleting entries
- selftests: net:
- fix cmsg_so_mark.sh test hang on non-x86 systems
- fix the arp_ndisc_evict_nocarrier test for IPv6
- usb: rndis_host: secure rndis_query check against int overflow
- eth: r8169: fix dmar pte write access during suspend/resume with WOL
- eth: lan966x: fix configuration of the PCS
- eth: sparx5: fix reading of the MAC address
- eth: qed: allow sleep in qed_mcp_trace_dump()
- eth: hns3:
- fix interrupts re-initialization after VF FLR
- fix handling of promisc when MAC addr table gets full
- refine the handling for VF heartbeat
- eth: mlx5:
- properly handle ingress QinQ-tagged packets on VST
- fix io_eq_size and event_eq_size params validation on big endian
- fix RoCE setting at HCA level if not supported at all
- don't turn CQE compression on by default for IPoIB
- eth: ena:
- fix toeplitz initial hash key value
- account for the number of XDP-processed bytes in interface stats
- fix rx_copybreak value update
Misc:
- ethtool: harden phy stat handling against buggy drivers
- docs: netdev: convert maintainer's doc from FAQ to a normal document
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmO3MLcACgkQMUZtbf5S
IrsEQBAAijPrpxsGMfX+VMqZ8RPKA3Qg8XF3ji2fSp4c0kiKv6lYI7PzPTR3u/fj
CAlhQMHv7z53uM6Zd7FdUVl23paaEycu8YnlwSubg9z+wSeh/RQ6iq94mSk1PV+K
LLVR/yop2N35Yp/oc5KZMb9fMLkxRG9Ci73QUVVYgvIrSd4Zdm13FjfVjL2C1MZH
Yp003wigMs9IkIHOpHjNqwn/5s//0yXsb1PgKxCsaMdMQsG0yC+7eyDmxshCqsji
xQm15mkGMjvWEYJaa4Tj4L3JW6lWbQzCu9nqPUX16KpmrnScr8S8Is+aifFZIBeW
GZeDYgvjSxNWodeOrJnD3X+fnbrR9+qfx7T9y7XighfytAz5DNm1LwVOvZKDgPFA
s+LlxOhzkDNEqbIsusK/LW+04EFc5gJyTI2iR6s4SSqmH3c3coJZQJeyRFWDZy/x
1oqzcCcq8SwGUTJ9g6HAmDQoVkhDWDT/ZcRKhpWG0nJub972lB2iwM7LrAu+HoHI
r8hyCkHpOi5S3WZKI9gPiGD+yOlpVAuG2wHg2IpjhKQvtd9DFUChGDhFeoB2rqJf
9uI3RJBBYTDkeNu3kpfy5uMh2XhvbIZntK5kwpJ4VettZWFMaOAzn7KNqk8iT4gJ
ASMrUrX59X0TAN0MgpJJm7uGtKbKZOu4lHNm74TUxH7V7bYn7dk=
=TlcN
-----END PGP SIGNATURE-----
Merge tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, wifi, and netfilter.
Current release - regressions:
- bpf: fix nullness propagation for reg to reg comparisons, avoid
null-deref
- inet: control sockets should not use current thread task_frag
- bpf: always use maximal size for copy_array()
- eth: bnxt_en: don't link netdev to a devlink port for VFs
Current release - new code bugs:
- rxrpc: fix a couple of potential use-after-frees
- netfilter: conntrack: fix IPv6 exthdr error check
- wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes
- eth: dsa: qca8k: various fixes for the in-band register access
- eth: nfp: fix schedule in atomic context when sync mc address
- eth: renesas: rswitch: fix getting mac address from device tree
- mobile: ipa: use proper endpoint mask for suspend
Previous releases - regressions:
- tcp: add TIME_WAIT sockets in bhash2, fix regression caught by
Jiri / python tests
- net: tc: don't intepret cls results when asked to drop, fix
oob-access
- vrf: determine the dst using the original ifindex for multicast
- eth: bnxt_en:
- fix XDP RX path if BPF adjusted packet length
- fix HDS (header placement) and jumbo thresholds for RX packets
- eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
avoid memory corruptions
Previous releases - always broken:
- ulp: prevent ULP without clone op from entering the LISTEN status
- veth: fix race with AF_XDP exposing old or uninitialized
descriptors
- bpf:
- pull before calling skb_postpull_rcsum() (fix checksum support
and avoid a WARN())
- fix panic due to wrong pageattr of im->image (when livepatch and
kretfunc coexist)
- keep a reference to the mm, in case the task is dead
- mptcp: fix deadlock in fastopen error path
- netfilter:
- nf_tables: perform type checking for existing sets
- nf_tables: honor set timeout and garbage collection updates
- ipset: fix hash:net,port,net hang with /0 subnet
- ipset: avoid hung task warning when adding/deleting entries
- selftests: net:
- fix cmsg_so_mark.sh test hang on non-x86 systems
- fix the arp_ndisc_evict_nocarrier test for IPv6
- usb: rndis_host: secure rndis_query check against int overflow
- eth: r8169: fix dmar pte write access during suspend/resume with
WOL
- eth: lan966x: fix configuration of the PCS
- eth: sparx5: fix reading of the MAC address
- eth: qed: allow sleep in qed_mcp_trace_dump()
- eth: hns3:
- fix interrupts re-initialization after VF FLR
- fix handling of promisc when MAC addr table gets full
- refine the handling for VF heartbeat
- eth: mlx5:
- properly handle ingress QinQ-tagged packets on VST
- fix io_eq_size and event_eq_size params validation on big endian
- fix RoCE setting at HCA level if not supported at all
- don't turn CQE compression on by default for IPoIB
- eth: ena:
- fix toeplitz initial hash key value
- account for the number of XDP-processed bytes in interface stats
- fix rx_copybreak value update
Misc:
- ethtool: harden phy stat handling against buggy drivers
- docs: netdev: convert maintainer's doc from FAQ to a normal
document"
* tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
caif: fix memory leak in cfctrl_linkup_request()
inet: control sockets should not use current thread task_frag
net/ulp: prevent ULP without clone op from entering the LISTEN status
qed: allow sleep in qed_mcp_trace_dump()
MAINTAINERS: Update maintainers for ptp_vmw driver
usb: rndis_host: Secure rndis_query check against int overflow
net: dpaa: Fix dtsec check for PCS availability
octeontx2-pf: Fix lmtst ID used in aura free
drivers/net/bonding/bond_3ad: return when there's no aggregator
netfilter: ipset: Rework long task execution when adding/deleting entries
netfilter: ipset: fix hash:net,port,net hang with /0 subnet
net: sparx5: Fix reading of the MAC address
vxlan: Fix memory leaks in error path
net: sched: htb: fix htb_classify() kernel-doc
net: sched: cbq: dont intepret cls results when asked to drop
net: sched: atm: dont intepret cls results when asked to drop
dt-bindings: net: marvell,orion-mdio: Fix examples
dt-bindings: net: sun8i-emac: Add phy-supply property
net: ipa: use proper endpoint mask for suspend
selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier
...
Clean up kernel-doc complaints about function names and non-kernel-doc
comments in kernel/time/. Fixes these warnings:
kernel/time/time.c:479: warning: expecting prototype for set_normalized_timespec(). Prototype was for set_normalized_timespec64() instead
kernel/time/time.c:553: warning: expecting prototype for msecs_to_jiffies(). Prototype was for __msecs_to_jiffies() instead
kernel/time/timekeeping.c:1595: warning: contents before sections
kernel/time/timekeeping.c:1705: warning: This comment starts with '/**', but isn't a kernel-doc comment.
* We have three kinds of time sources to use for sleep time
kernel/time/timekeeping.c:1726: warning: This comment starts with '/**', but isn't a kernel-doc comment.
* 1) can be determined whether to use or not only when doing
kernel/time/tick-oneshot.c:21: warning: missing initial short description on line:
* tick_program_event
kernel/time/tick-oneshot.c:107: warning: expecting prototype for tick_check_oneshot_mode(). Prototype was for tick_oneshot_mode_active() instead
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230103032849.12723-1-rdunlap@infradead.org
Size of the 'expect' array in the __report_matches is 1536 bytes, which
is exactly the default frame size warning limit of the xtensa
architecture.
As a result allmodconfig xtensa kernel builds with the gcc that does not
support the compiler plugins (which otherwise would push the said
warning limit to 2K) fail with the following message:
kernel/kcsan/kcsan_test.c:257:1: error: the frame size of 1680 bytes
is larger than 1536 bytes
Fix it by dynamically allocating the 'expect' array.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Marco Elver <elver@google.com>
- Fix a use-after-free on the perf syscall's error path
- A potential integer overflow fix in amd_core_pmu_init()
- Fix the cgroup events tracking after the context handling rewrite
- Return the proper value from the inherit_event() function on error
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOxotcACgkQEsHwGGHe
VUqN0g//fCMUbsO/TRfDavqFMdAYi93EHWwd2I8KUDd2TPKY6KA2gQndw8aEvAF8
iO/zO0osD0+23ANmpLBIGuyrM41UqiG+U/Q70++t/0yaD4nYegcpIamg518MEAK9
HxP4Gl3A9Yy4QbcOgLGq+ouOvUljJAB3jhPc+KsWmgxsLrua4JZeEgIwNm/4yLOG
cgetqX7l1A5ASPxU3MO+wSyjEHVIq6rA6y3HllpIHLYy5/TyvkhNZiykAxRn8Exy
yznyYkifNUfCh0TOusMLmK5qR/UFPbGbZfxBlw8ni9sIocsz02U3N4XOwBPVgi8x
2OVyH1j+cZUuBuk4AH39FmaFDe/PZxFiYvInH/Y1vmS5uqi5v7hn23VLxqykPbdw
Drz/vo7YiuPvM3R1ibf0yXN53mn+zMzKFRYRQwPWN6c1ocxEltaeaDcRsLhh8ql+
RM2mKVvWjELRuFWQpV8KCKuEpCz/niQ/8wvwXCxtPyJt9wiTPu4+j7K+p27T7FQR
33zPSGgN/nCuVEBOIsyi3B7zzIoRy0l3nM6tOTZXM/TVFhLyda0fJUV6Ydi39CGy
Mf1FnLd+9nrR+Oqinh/DzJ5nqoSrSxPV9FXRu+0bMsYD+lbiMtOZFw8VaQzxR1gD
dUOve8zUgQAusdvGeJU1+iOmdAxbiSS9s3pqw7G+d+O4a/x7PBE=
=M9iU
-----END PGP SIGNATURE-----
Merge tag 'perf_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Pass only an initialized perf event attribute to the LSM hook
- Fix a use-after-free on the perf syscall's error path
- A potential integer overflow fix in amd_core_pmu_init()
- Fix the cgroup events tracking after the context handling rewrite
- Return the proper value from the inherit_event() function on error
* tag 'perf_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Call LSM hook after copying perf_event_attr
perf: Fix use-after-free in error path
perf/x86/amd: fix potential integer overflow on shift of a int
perf/core: Fix cgroup events tracking
perf core: Return error pointer if inherit_event() fails to find pmu_ctx
Instead of counting on prior allocations to have sized allocations to
the next kmalloc bucket size, always perform a krealloc that is at least
ksize(dst) in size (which is a no-op), so the size can be correctly
tracked by all the various allocation size trackers (KASAN,
__alloc_size, etc).
Reported-by: Hyunwoo Kim <v4bel@theori.io>
Link: https://lore.kernel.org/bpf/20221223094551.GA1439509@ubuntu
Fixes: ceb35b666d ("bpf/verifier: Use kmalloc_size_roundup() to match ksize() usage")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: bpf@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221223182836.never.866-kees@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fix the system crash that happens when a task iterator travel through
vma of tasks.
In task iterators, we used to access mm by following the pointer on
the task_struct; however, the death of a task will clear the pointer,
even though we still hold the task_struct. That can cause an
unexpected crash for a null pointer when an iterator is visiting a
task that dies during the visit. Keeping a reference of mm on the
iterator ensures we always have a valid pointer to mm.
Co-developed-by: Song Liu <song@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Reported-by: Nathan Slingerland <slinger@meta.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221216221855.4122288-2-kuifeng@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In the scenario where livepatch and kretfunc coexist, the pageattr of
im->image is rox after arch_prepare_bpf_trampoline in
bpf_trampoline_update, and then modify_fentry or register_fentry returns
-EAGAIN from bpf_tramp_ftrace_ops_func, the BPF_TRAMP_F_ORIG_STACK flag
will be configured, and arch_prepare_bpf_trampoline will be re-executed.
At this time, because the pageattr of im->image is rox,
arch_prepare_bpf_trampoline will read and write im->image, which causes
a fault. as follows:
insmod livepatch-sample.ko # samples/livepatch/livepatch-sample.c
bpftrace -e 'kretfunc:cmdline_proc_show {}'
BUG: unable to handle page fault for address: ffffffffa0206000
PGD 322d067 P4D 322d067 PUD 322e063 PMD 1297e067 PTE d428061
Oops: 0003 [#1] PREEMPT SMP PTI
CPU: 2 PID: 270 Comm: bpftrace Tainted: G E K 6.1.0 #5
RIP: 0010:arch_prepare_bpf_trampoline+0xed/0x8c0
RSP: 0018:ffffc90001083ad8 EFLAGS: 00010202
RAX: ffffffffa0206000 RBX: 0000000000000020 RCX: 0000000000000000
RDX: ffffffffa0206001 RSI: ffffffffa0206000 RDI: 0000000000000030
RBP: ffffc90001083b70 R08: 0000000000000066 R09: ffff88800f51b400
R10: 000000002e72c6e5 R11: 00000000d0a15080 R12: ffff8880110a68c8
R13: 0000000000000000 R14: ffff88800f51b400 R15: ffffffff814fec10
FS: 00007f87bc0dc780(0000) GS:ffff88803e600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0206000 CR3: 0000000010b70000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
bpf_trampoline_update+0x25a/0x6b0
__bpf_trampoline_link_prog+0x101/0x240
bpf_trampoline_link_prog+0x2d/0x50
bpf_tracing_prog_attach+0x24c/0x530
bpf_raw_tp_link_attach+0x73/0x1d0
__sys_bpf+0x100e/0x2570
__x64_sys_bpf+0x1c/0x30
do_syscall_64+0x5b/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
With this patch, when modify_fentry or register_fentry returns -EAGAIN
from bpf_tramp_ftrace_ops_func, the pageattr of im->image will be reset
to nx+rw.
Cc: stable@vger.kernel.org
Fixes: 00963a2e75 ("bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)")
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20221224133146.780578-1-nashuiliang@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In a scenario where kcalloc() fails to allocate memory, the futex_waitv
system call immediately returns -ENOMEM without invoking
destroy_hrtimer_on_stack(). When CONFIG_DEBUG_OBJECTS_TIMERS=y, this
results in leaking a timer debug object.
Fixes: bf69bad38c ("futex: Implement sys_futex_waitv()")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: stable@vger.kernel.org
Cc: stable@vger.kernel.org # v5.16+
Link: https://lore.kernel.org/r/20221214222008.200393-1-mathieu.desnoyers@efficios.com
It passes the attr struct to the security_perf_event_open() but it's
not initialized yet.
Fixes: da97e18458 ("perf_event: Add support for LSM and SELinux checks")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221220223140.4020470-1-namhyung@kernel.org
We encounter perf warnings when using cgroup events like:
cd /sys/fs/cgroup
mkdir test
perf stat -e cycles -a -G test
Which then triggers:
WARNING: CPU: 0 PID: 690 at kernel/events/core.c:849 perf_cgroup_switch+0xb2/0xc0
Call Trace:
<TASK>
__schedule+0x4ae/0x9f0
? _raw_spin_unlock_irqrestore+0x23/0x40
? __cond_resched+0x18/0x20
preempt_schedule_common+0x2d/0x70
__cond_resched+0x18/0x20
wait_for_completion+0x2f/0x160
? cpu_stop_queue_work+0x9e/0x130
affine_move_task+0x18a/0x4f0
WARNING: CPU: 0 PID: 690 at kernel/events/core.c:829 ctx_sched_in+0x1cf/0x1e0
Call Trace:
<TASK>
? ctx_sched_out+0xb7/0x1b0
perf_cgroup_switch+0x88/0xc0
__schedule+0x4ae/0x9f0
? _raw_spin_unlock_irqrestore+0x23/0x40
? __cond_resched+0x18/0x20
preempt_schedule_common+0x2d/0x70
__cond_resched+0x18/0x20
wait_for_completion+0x2f/0x160
? cpu_stop_queue_work+0x9e/0x130
affine_move_task+0x18a/0x4f0
The above two warnings are not complete here since I remove other
unimportant information. The problem is caused by the perf cgroup
events tracking:
CPU0 CPU1
perf_event_open()
perf_event_alloc()
account_event()
account_event_cpu()
atomic_inc(perf_cgroup_events)
__perf_event_task_sched_out()
if (atomic_read(perf_cgroup_events))
perf_cgroup_switch()
// kernel/events/core.c:849
WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0)
if (READ_ONCE(cpuctx->cgrp) == cgrp) // false
return
perf_ctx_lock()
ctx_sched_out()
cpuctx->cgrp = cgrp
ctx_sched_in()
perf_cgroup_set_timestamp()
// kernel/events/core.c:829
WARN_ON_ONCE(!ctx->nr_cgroups)
perf_ctx_unlock()
perf_install_in_context()
cpu_function_call()
__perf_install_in_context()
add_event_to_ctx()
list_add_event()
perf_cgroup_event_enable()
ctx->nr_cgroups++
cpuctx->cgrp = X
We can see from above that we wrongly use percpu atomic perf_cgroup_events
to check if we need to perf_cgroup_switch(), which should only be used
when we know this CPU has cgroup events enabled.
The commit bd27568117 ("perf: Rewrite core context handling") change
to have only one context per-CPU, so we can just use cpuctx->cgrp to
check if this CPU has cgroup events enabled.
So percpu atomic perf_cgroup_events is not needed.
Fixes: bd27568117 ("perf: Rewrite core context handling")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lkml.kernel.org/r/20221207124023.66252-1-zhouchengming@bytedance.com
inherit_event() returns NULL only when it finds orphaned events
otherwise it returns either valid child_event pointer or an error
pointer. Follow the same when it fails to find pmu_ctx.
Fixes: bd27568117 ("perf: Rewrite core context handling")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221118051539.820-1-ravi.bangoria@amd.com
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY6YkXgAKCRDbK58LschI
g25kAP4jYi+YomSlmGUzN/fUbEIHkXXyh85Yh2/yHGYdVuIuvwEA0uXeC7JHQTca
dkcyYvgY6zJwFBV0lAVnhTRzFirFkQk=
=THs1
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
The following pull-request contains BPF updates for your *net* tree.
We've added 7 non-merge commits during the last 5 day(s) which contain
a total of 11 files changed, 231 insertions(+), 3 deletions(-).
The main changes are:
1) Fix a splat in bpf_skb_generic_pop() under CHECKSUM_PARTIAL due to
misuse of skb_postpull_rcsum(), from Jakub Kicinski with test case
from Martin Lau.
2) Fix BPF verifier's nullness propagation when registers are of
type PTR_TO_BTF_ID, from Hao Sun.
3) Fix bpftool build for JIT disassembler under statically built
libllvm, from Anton Protopopov.
4) Fix warnings reported by resolve_btfids when building vmlinux
with CONFIG_SECURITY_NETWORK disabled, from Hou Tao.
5) Minor fix up for BPF selftest gitignore, from Stanislav Fomichev.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix up the sound code to not pass __GFP_COMP to the non-coherent DMA
allocator, as it copes with that just as badly as the coherent allocator,
and then add a check to make sure no one passes the flag ever again.
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmOlvh8LHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYN/Ig//SHLj6xlAPHiBSMY44j0qQj4w63G4WivPs4B0Nr7j
Wne6yh0x8pg2L/BDY1KWoMqo8XhNIXVYKx6jahdqrik9IKJinIZPKT9nYkK5TbO+
RQSu4HKqAsrMO7u1+LzVyhNZqOtavRXj4EJkkienHDTh7Rqry5o8hBAe40G2BLqe
TMrnPLAJ0VZ/I45LakzCSCR2WC8GA+33849zJO1Vd4glnxL+VdM5WigYi1N2j6Ng
d4ymVbprnJljXjIDK2aMy1AGw5ctlQE1hslD2OoUr/onJ6PPU1rmYmf/B1fm8Cqh
ahxv2jJJsZ0BvwO7uouU6rsJfVGaZx7dMkABmX+A9b7RtP9mC19qc76VleYUouKB
v28SJD+jFUoQIO5ylly2LEvSgiEPspymv1VJtYdX/y6/YAtm2XFgmIEJdv6dGQu0
ZMpAs2nCfxzsGy4mQNpXEnBZwrP7GnEtm1ynBLQ1/SJ7toqBmX9/4JzZuRZ0TPEh
gw+Tkl3fC/jfLUDaLSjCnJw4OpH4+ai9tJuhXtPZk6UN5VPrQbz5xu9bh6wiyPqX
Pm6ubCDv0Y3I1rwoXwVIcrWfOMTAm8OwfMHCBRoRVp1vfuFSihptpWjIIx6rQCrI
sdZtVla+FblL9zO5WvGStlx9FiZa39HJm2d6rQIazI5of6iNf9/xSyoWNcNWSrcu
EFY=
=q0m6
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-2022-12-23' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
"Fix up the sound code to not pass __GFP_COMP to the non-coherent DMA
allocator, as it copes with that just as badly as the coherent
allocator, and then add a check to make sure no one passes the flag
ever again"
* tag 'dma-mapping-2022-12-23' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: reject GFP_COMP for noncoherent allocations
ALSA: memalloc: don't use GFP_COMP for non-coherent dma allocations
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY6TcIgAKCRDdBJ7gKXxA
ji/zAQDucpSw+HKksgDpO385EAdF4gQgYDi06zu/vjpF7Hd4KAEAoIX1ygHqHy3u
z9xuulA9q84COV48Is9cU7eiijd0aQo=
=QlXq
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2022-12-22-14-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"Eight fixes, all cc:stable. One is for gcov and the remainder are MM"
* tag 'mm-hotfixes-stable-2022-12-22-14-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
gcov: add support for checksum field
test_maple_tree: add test for mas_spanning_rebalance() on insufficient data
maple_tree: fix mas_spanning_rebalance() on insufficient data
hugetlb: really allocate vma lock for all sharable vmas
kmsan: export kmsan_handle_urb
kmsan: include linux/vmalloc.h
mm/mempolicy: fix memory leak in set_mempolicy_home_node system call
mm, mremap: fix mremap() expanding vma with addr inside vma
When CFI_CLANG and KASAN are both enabled, LLVM doesn't generate a
CFI type hash for asan.module_ctor functions in translation units
where CFI is disabled, which leads to a CFI failure during boot when
do_ctors calls the affected constructors:
CFI failure at do_basic_setup+0x64/0x90 (target:
asan.module_ctor+0x0/0x28; expected type: 0xa540670c)
Specifically, this happens because CFI is disabled for
kernel/cfi.c. There's no reason to keep CFI disabled here anymore, so
fix the failure by not filtering out CC_FLAGS_CFI for the file.
Note that https://reviews.llvm.org/rG3b14862f0a96 fixed the issue
where LLVM didn't emit CFI type hashes for any sanitizer constructors,
but now type hashes are emitted correctly for TUs that use CFI.
Link: https://github.com/ClangBuiltLinux/linux/issues/1742
Fixes: 8924560094 ("cfi: Switch to -fsanitize=kcfi")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221222225747.3538676-1-samitolvanen@google.com
After befae75856, the verifier would propagate null information after
JEQ/JNE, e.g., if two pointers, one is maybe_null and the other is not,
the former would be marked as non-null in eq path. However, as comment
"PTR_TO_BTF_ID points to a kernel struct that does not need to be null
checked by the BPF program ... The verifier must keep this in mind and
can make no assumptions about null or non-null when doing branch ...".
If one pointer is maybe_null and the other is PTR_TO_BTF, the former is
incorrectly marked non-null. The following BPF prog can trigger a
null-ptr-deref, also see this report for more details[1]:
0: (18) r1 = map_fd ; R1_w=map_ptr(ks=4, vs=4)
2: (79) r6 = *(u64 *)(r1 +8) ; R6_w=bpf_map->inner_map_data
; R6 is PTR_TO_BTF_ID
; equals to null at runtime
3: (bf) r2 = r10
4: (07) r2 += -4
5: (62) *(u32 *)(r2 +0) = 0
6: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null
7: (1d) if r6 == r0 goto pc+1
8: (95) exit
; from 7 to 9: R0=map_value R6=ptr_bpf_map
9: (61) r0 = *(u32 *)(r0 +0) ; null-ptr-deref
10: (95) exit
So, make the verifier propagate nullness information for reg to reg
comparisons only if neither reg is PTR_TO_BTF_ID.
[1] https://lore.kernel.org/bpf/CACkBjsaFJwjC5oiw-1KXvcazywodwXo4zGYsRHwbr2gSG9WcSw@mail.gmail.com/T/#u
Fixes: befae75856 ("bpf: propagate nullness information for reg to reg comparisons")
Signed-off-by: Hao Sun <sunhao.th@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221222024414.29539-1-sunhao.th@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
- Make monitor structures read only
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY6J+vxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qohJAP9Yx3A4xmopkMjpfK1HBzuB7j4U7blN
2NhqKM626unbeQEAi3FhPRc5N/sGBdsUClYZIKau0p3ip1TVfYbhk8vSgwg=
=VcGm
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
"I missed this minor hardening of the kernel in the first pull.
- Make monitor structures read only"
* tag 'trace-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv/monitors: Move monitor structure in rodata
- New "symstr" type for dynamic events that writes the name of the
function+offset into the ring buffer and not just the address
- Prevent kernel symbol processing on addresses in user space probes
(uprobes).
- And minor fixes and clean ups
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY5yAHxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qoWoAP9ZLmqgIqlH3Zcms31SR250kLXxsxT3
JHe82hiuI1I3fAD/Z93QLHw9wngLqIMx/wXsdFjTNOGGWdxfclSWI2qI6Q0=
=KaJg
-----END PGP SIGNATURE-----
Merge tag 'trace-probes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull trace probes updates from Steven Rostedt:
- New "symstr" type for dynamic events that writes the name of the
function+offset into the ring buffer and not just the address
- Prevent kernel symbol processing on addresses in user space probes
(uprobes).
- And minor fixes and clean ups
* tag 'trace-probes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Reject symbol/symstr type for uprobe
tracing/probes: Add symstr type for dynamic events
kprobes: kretprobe events missing on 2-core KVM guest
kprobes: Fix check for probe enabled in kill_kprobe()
test_kprobes: Fix implicit declaration error of test_kprobes
tracing: Fix race where eprobes can be called before the event
In GCC version 12.1 a checksum field was added.
This patch fixes a kernel crash occurring during boot when using
gcov-kernel with GCC version 12.2. The crash occurred on a system running
on i.MX6SX.
Link: https://lkml.kernel.org/r/20221220102318.3418501-1-rickaran@axis.com
Fixes: 977ef30a7d ("gcov: support GCC 12.1 and newer compilers")
Signed-off-by: Rickard x Andersson <rickaran@axis.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Martin Liska <mliska@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This commit fixes a lockdep false positive in synchronize_rcu() that
can otherwise occur during early boot. Theis fix simply avoids invoking
lockdep if the scheduler has not yet been initialized, that is, during
that portion of boot when interrupts are disabled.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmOeXj8THHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jPmZEACaI5JqO6Dr2U4HojJJBYEfLVaSYxDp
JrUi5D5WzzZidyjM2fyyZZkdRVQ24i1aV2H/fbLoIIH/smYjE/KLEFHQmclpphw5
BSOyapotjdt5YhIavvAeOjdUd7jPyMqhbDVnwzjnblhUD1ObLVlhIs8Pjn7/03sF
gzlIhYgp3EL7GenT9j9kud2FwWP+wrVQ7SdJ+Ni/WAHYO8860xQAmFXH/07bYzx7
fbp5iPkCOSSUoRMw/qQ8s7CE3XhBNKufv1BtcvV/uxEtutfV1qvEQBv/l2RBd0Vg
wOVBZnWXze+7IUx13M90R/d04Nn7RaGwon6xBMlvIwL3qzEj8x/r1FYz7zZhQPkv
wwChAxFHQACnLCZSu48WBtVrawNdZHM57KHUK4rloAbrK92FpVznhQU+5pBDy4c6
rfY2my+SNO4kWvePEg/2fd8aQycrZr99fK/ojCIerEn8MNboxuVOYTjzy0qtUcVT
yJ/80O8ADI3QL/NRhjMFWgEnBDbHN1PcGhiRoutApdLQkg/UPTJjCRZ7ibmIFYY2
ViW3cSndr/f0I7sOex2EILHwiZ2bUKiwyeTW6vWuFl/7MEWsvpJaWoUxXgQj99Bt
ncAOaxtmmuhbwrOCt2kab90A0c/thNx9kNYYIkG3vUNcSRzyHQtg3ydEljBpaTFR
OzhrqdUA7W9Sfg==
=UKUo
-----END PGP SIGNATURE-----
Merge tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fix from Paul McKenney:
"This fixes a lockdep false positive in synchronize_rcu() that can
otherwise occur during early boot.
The fix simply avoids invoking lockdep if the scheduler has not yet
been initialized, that is, during that portion of boot when interrupts
are disabled"
* tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
rcu: Don't assert interrupts enabled too early in boot
While not quite as bogus as for the dma-coherent allocations that were
fixed earlier, GFP_COMP for these allocations has no benefits for
the dma-direct case, and can't be supported at all by dma dma-iommu
backend which splits up allocations into smaller orders. Due to an
oversight in ffcb754584 that flag stopped being cleared for all
dma allocations, but only got rejected for coherent ones, so fix up
these callers to not allow __GFP_COMP as well after the sound code
has been fixed to not ask for it.
Fixes: ffcb754584 ("dma-mapping: reject __GFP_COMP in dma_alloc_attrs")
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reported-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
It makes sense to move the important monitor structure into rodata to
prevent accidental structure modification.
Link: https://lkml.kernel.org/r/20221122173648.4732-1-acarmina@redhat.com
Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
There are warnings reported from resolve_btfids when building vmlinux
with CONFIG_SECURITY_NETWORK disabled:
WARN: resolve_btfids: unresolved symbol bpf_lsm_sk_free_security
WARN: resolve_btfids: unresolved symbol bpf_lsm_sk_alloc_security
So only define BTF IDs for these LSM hooks when CONFIG_SECURITY_NETWORK
is enabled.
Fixes: c0c852dd18 ("bpf: Do not mark certain LSM hook arguments as trusted")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221217062144.2507222-1-houtao@huaweicloud.com
- Support zstd-compressed debug info
- Allow W=1 builds to detect objects shared among multiple modules
- Add srcrpm-pkg target to generate a source RPM package
- Make the -s option detection work for future GNU Make versions
- Add -Werror to KBUILD_CPPFLAGS when CONFIG_WERROR=y
- Allow W=1 builds to detect -Wundef warnings in any preprocessed files
- Raise the minimum supported version of binutils to 2.25
- Use $(intcmp ...) to compare integers if GNU Make >= 4.4 is used
- Use $(file ...) to read a file if GNU Make >= 4.2 is used
- Print error if GNU Make older than 3.82 is used
- Allow modpost to detect section mismatches with Clang LTO
- Include vmlinuz.efi into kernel tarballs for arm64 CONFIG_EFI_ZBOOT=y
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmOeImsVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsG06IP/iVjuWFvnjDZT4X8X6zN8aKp1vtR
EMkmoRtt5cD4CLb1MG4N7irYHgedQSx4rYceP45MyW1I3egl6Ct14RDyeQ1xSIZb
XFTLDCZvfl/up3MdiqNAqKRS7x5lk9++7F0t+2SoQxKQyJvm735XreX+VhZ1FeLB
qcHrmzJ5veky5Ry/3OkNUgKFBjKEAL+qKMc55uvkXqfTb3KoBa2r4VC1OaoYGRru
R8oF9qQRnGVQAl/LbBVchmgSjxryxPrCvBGiKlK03VkXdzEMHMimEJh3BQ6e0PGo
gajdk+4liy7z+jQnI7jFhvJjGKzkEP/Bc99M/uS92QX5MgpH6mqpHMoqqPiqW87K
RmZH37FqRu1Vo8dpibmH6r2K6YD/HHRjaDHk1VuuCQYEn0dsNmokPXOqd/1v0I1i
TXPjWOw1AID5vMJWllqxFhpeVvf0vx5BT/UNrh68MLqlJZzv2eMVJb4fNy6640ml
U0NclMnOa3eOmf5z1T7/LqDRTa63Q0kpanRrBpcmVOaqW+ZpQ3SQjh4uBN1PyJHL
cX3Skc341DyRlFiT54QhGKlm57MEb2gjhBZ3Z4J+b7sEFgvjXH/W8vcOGIKlppmA
CfYMyres4OV+fJc89ONkWsvLiOP1OeUGPvytm33J5QMKXc8SzOLP0D/F8kjrDflm
EROKuZ4EA5ej/rOy
=Ig/Y
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Support zstd-compressed debug info
- Allow W=1 builds to detect objects shared among multiple modules
- Add srcrpm-pkg target to generate a source RPM package
- Make the -s option detection work for future GNU Make versions
- Add -Werror to KBUILD_CPPFLAGS when CONFIG_WERROR=y
- Allow W=1 builds to detect -Wundef warnings in any preprocessed files
- Raise the minimum supported version of binutils to 2.25
- Use $(intcmp ...) to compare integers if GNU Make >= 4.4 is used
- Use $(file ...) to read a file if GNU Make >= 4.2 is used
- Print error if GNU Make older than 3.82 is used
- Allow modpost to detect section mismatches with Clang LTO
- Include vmlinuz.efi into kernel tarballs for arm64 CONFIG_EFI_ZBOOT=y
* tag 'kbuild-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits)
buildtar: fix tarballs with EFI_ZBOOT enabled
modpost: Include '.text.*' in TEXT_SECTIONS
padata: Mark padata_work_init() as __ref
kbuild: ensure Make >= 3.82 is used
kbuild: refactor the prerequisites of the modpost rule
kbuild: change module.order to list *.o instead of *.ko
kbuild: use .NOTINTERMEDIATE for future GNU Make versions
kconfig: refactor Makefile to reduce process forks
kbuild: add read-file macro
kbuild: do not sort after reading modules.order
kbuild: add test-{ge,gt,le,lt} macros
Documentation: raise minimum supported version of binutils to 2.25
kbuild: add -Wundef to KBUILD_CPPFLAGS for W=1 builds
kbuild: move -Werror from KBUILD_CFLAGS to KBUILD_CPPFLAGS
kbuild: Port silent mode detection to future gnu make.
init/version.c: remove #include <generated/utsrelease.h>
firmware_loader: remove #include <generated/utsrelease.h>
modpost: Mark uuid_le type to be suitable only for MEI
kbuild: add ability to make source rpm buildable using koji
kbuild: warn objects shared among multiple modules
...
- Add powerpc qspinlock implementation optimised for large system scalability and
paravirt. See the merge message for more details.
- Enable objtool to be built on powerpc to generate mcount locations.
- Use a temporary mm for code patching with the Radix MMU, so the writable mapping is
restricted to the patching CPU.
- Add an option to build the 64-bit big-endian kernel with the ELFv2 ABI.
- Sanitise user registers on interrupt entry on 64-bit Book3S.
- Many other small features and fixes.
Thanks to: Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn Helgaas, Bo Liu, Chen
Lifu, Christoph Hellwig, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Colin
Ian King, Deming Wang, Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven,
Gustavo A. R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol Jain,
Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan Lynch, Naveen N. Rao,
Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Randy Dunlap, Rohan McLure,
Russell Currey, Sathvika Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng, XueBing Chen, Yang
Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu, Wolfram Sang.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmOfrj8THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgIWtD/9mGF/ze2k+qFTo+30fb7bO8WJIDgsR
dIASnZjXV7q/45elvymhUdkQv4R7xL3pzC40P1+ZKtWzGTNe+zWUQLoALNwRK85j
8CsxZbqefGNKE5Z6ZHo9s37wsu3+jJu9yEQpGFo1LINyzeclCn5St5oqfRam+Hd/
cPF+VfvREwZ0+YOKGBhJ2EgC+Gc9xsFY7DLQsoYlu71iZZr6Z6rgZW/EY5h3RMGS
YKBoVwDsWaU0FpFWrr/rYTI6DqSr3AHr1+ftDg7ncCZMD6vQva6aMCCt94aLB1aE
vC+DNdhZlA558bXGa5yA7Wr//7aUBUIwyC60DogOeZ6vw3kD9tdEd1fbH5hmqNKY
K5bfqm28XU2959CTE8RDgsYYZvwDcfrjBIML14WZGdCQOTcGKpgOGp22o6yNb1Pq
JKpHHnVpvu2PZ/p2XdKSm9+etr2yI6lXZAEVTS7ehdtMukButjSHEVbSCEZ8tlWz
KokQt2J23BMHuSrXK6+67wWQBtdsLEk+LBOQmweiwarMocqvL/Zjz/5J7DR2DtH8
wlY3wOtB1+E5j7xZ+RgK3c3jNg5dH39ZwvFsSATWTI3P+iq6OK/bbk4q4LmZt2l9
ZIfH/CXPf9BvGCHzHa3AAd3UBbJLFwj17btMEv1wFVPS0T4LPUzkgTNTNUYeP6zL
h1e5QfgUxvKPuQ==
=7k3p
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add powerpc qspinlock implementation optimised for large system
scalability and paravirt. See the merge message for more details
- Enable objtool to be built on powerpc to generate mcount locations
- Use a temporary mm for code patching with the Radix MMU, so the
writable mapping is restricted to the patching CPU
- Add an option to build the 64-bit big-endian kernel with the ELFv2
ABI
- Sanitise user registers on interrupt entry on 64-bit Book3S
- Many other small features and fixes
Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn
Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET,
Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang,
Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A.
R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol
Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin,
Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika
Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng,
XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu,
and Wolfram Sang.
* tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits)
powerpc/code-patching: Fix oops with DEBUG_VM enabled
powerpc/qspinlock: Fix 32-bit build
powerpc/prom: Fix 32-bit build
powerpc/rtas: mandate RTAS syscall filtering
powerpc/rtas: define pr_fmt and convert printk call sites
powerpc/rtas: clean up includes
powerpc/rtas: clean up rtas_error_log_max initialization
powerpc/pseries/eeh: use correct API for error log size
powerpc/rtas: avoid scheduling in rtas_os_term()
powerpc/rtas: avoid device tree lookups in rtas_os_term()
powerpc/rtasd: use correct OF API for event scan rate
powerpc/rtas: document rtas_call()
powerpc/pseries: unregister VPA when hot unplugging a CPU
powerpc/pseries: reset the RCU watchdogs after a LPM
powerpc: Take in account addition CPU node when building kexec FDT
powerpc: export the CPU node count
powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state
powerpc/dts/fsl: Fix pca954x i2c-mux node names
cxl: Remove unnecessary cxl_pci_window_alignment()
selftests/powerpc: Fix resource leaks
...
The rcu_poll_gp_seq_end() and rcu_poll_gp_seq_end_unlocked() both check
that interrupts are enabled, as they normally should be when waiting for
an RCU grace period. Except that it is legal to wait for grace periods
during early boot, before interrupts have been enabled for the first time,
and polling for grace periods is required to work during this time.
This can result in false-positive lockdep splats in the presence of
boot-time-initiated tracing.
This commit therefore conditions those interrupts-enabled checks on
rcu_scheduler_active having advanced past RCU_SCHEDULER_INACTIVE, by
which time interrupts have been enabled.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
* Randomize the per-cpu entry areas
Cleanups:
* Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open
coding it
* Move to "native" set_memory_rox() helper
* Clean up pmd_get_atomic() and i386-PAE
* Remove some unused page table size macros
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOc53UACgkQaDWVMHDJ
krCUHw//SGZ+La0hLZLAiAiZTXLZZHpYkOmg1Oj1+11qSU11uZzTFqDpauhaKpRS
cJCSh+D+RXe5e2ipgt0+Zl0hESLt7pJf8258OE4ra0DL/IlyO9uqruAs9Kn3eRS/
Fk76nG8gdEU+JKJqpG02GqOLslYQuIy96n9hpuj1x25b614+uezPfC7S4XEat0NT
MbJQ+jnVDf16aJIJkzT+iSwhubDVeh+bSHeO0SSCzX23WLUqDeg5NvlyxoCHGbBh
UpUTWggV/0pYAkBKRHToeJs8qTWREwuuH/8JGewpe9A0tjdB5wyZfNL2PuracweN
9MauXC3T5f0+Ca4yIIaPq1fF7Ny/PR2dBFihk27rOD0N7tjaZxNwal2pB1sZcmvZ
+PAokjyTPVH5ZXjkMYGGAUe1jyjwr2+TgFSZxhTnDuGtyVQiY4pihGKOifLCX6tv
x6khvYeTBw7wfaDRtKEAf+2kLHYn+71HszHP/8bNKX9T03h+Zf0i1wdZu5xbM5Gc
VK2wR7bCC+UftJJYG0pldcHg2qaF19RBHK2tLwp7zngUv7lTbkKfkgKjre73KV2a
D4b76lrqdUMo6UYwYdw7WtDyarZS4OVLq2DcNhwwMddBCaX8kyN5a4AqwQlZYJ0u
dM+kuMofE8U3yMxmMhJimkZUsj09yLHIqfynY0jbAcU3nhKZZNY=
=wwVF
-----END PGP SIGNATURE-----
Merge tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Dave Hansen:
"New Feature:
- Randomize the per-cpu entry areas
Cleanups:
- Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it
- Move to "native" set_memory_rox() helper
- Clean up pmd_get_atomic() and i386-PAE
- Remove some unused page table size macros"
* tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
x86/mm: Ensure forced page table splitting
x86/kasan: Populate shadow for shared chunk of the CPU entry area
x86/kasan: Add helpers to align shadow addresses up and down
x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area
x86/mm: Recompute physical address for every page of per-CPU CEA mapping
x86/mm: Rename __change_page_attr_set_clr(.checkalias)
x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()
x86/mm: Untangle __change_page_attr_set_clr(.checkalias)
x86/mm: Add a few comments
x86/mm: Fix CR3_ADDR_MASK
x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
mm: Convert __HAVE_ARCH_P..P_GET to the new style
mm: Remove pointless barrier() after pmdp_get_lockless()
x86/mm/pae: Get rid of set_64bit()
x86_64: Remove pointless set_64bit() usage
x86/mm/pae: Be consistent with pXXp_get_and_clear()
x86/mm/pae: Use WRITE_ONCE()
x86/mm/pae: Don't (ab)use atomic64
mm/gup: Fix the lockless PMD access
...
- Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when the architecture
does not make use of irq domains instead of returning 0, which is pretty
limiting.
- Check for the presence of an irq domain when validating the MSI iterator,
as s390/powerpc won't have one.
- Fix powerpc's MSI backends which fail to clear the descriptor's IRQ field
on teardown, leading to a splat and leaked descriptors.
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmOdpC8PHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDqHEP+wYuEti5Pib6PIxDgmm9eT7uvCPvOw31LojJ
n+MMbOht5qx0vQ/9w/e5N5gwkao0i870s3vCA0vDh9JurHTS/NDo/PWsf2eyVY/0
OOVEYFJS87wewXENK3mLabP1AD5+lv8RWe7HKEhrwl741K5NRs4xpKyOxOlgo4CO
LXDG/kXN25tQ4HTKgbPXxU8P16PqsJI3H/2NKgZW0ntggldhhgO+Lb+TDgloAPo9
dk9gTG5EEhJZu1em3gDAuX71Tjyr8/OcyNa5WEec78lBqlgyt9gMqEffhoBigM7d
tVLsi667J94qdCJtG7Zeeo3996HQ4YqdqmO2csPzJq+d0TCKrTTwyvkyAmSJ51nV
pUywGkhLRYsvA4PX/ZFcrT/GfJLIGhXmzqV5fWpAWfoDPAw/s3PfrTzugQ6cPpYE
Ox8pcfo6xEhVPfCzeIlYShPEz746Kyje8vUuHbXKjekZolW0FE7qOQaVxGEY7qal
IvI6MotDjqbJV0ancGgZPIU2r7zYmx8fXiEnqMJEg3xNf93O867u4GqnvoqKoQAP
hU7lPFKyX9UYkZUyTR8Vp6v+tAObXbzkR+22nf0ktFT8toi7Ujq27r5zwdfr2Jjs
Rg1X5wSigy7RVDVjVc8oXlYhNMrQXLWQUhh/OwtUBI4mhCEpOL2EWr5O2A3dRbXT
ZL+syvmh
=yuHx
-----END PGP SIGNATURE-----
Merge tag 'msi-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms
Pull MSI fixes from Marc Zyngier:
"Thomas tasked me with sending out a few urgent fixes after the giant
MSI rework that landed in 6.2, as both s390 and powerpc ended-up
suffering from it (they do not use the full core code infrastructure,
leading to these previously undetected issues):
- Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when the
architecture does not make use of irq domains instead of returning
0, which is pretty limiting.
- Check for the presence of an irq domain when validating the MSI
iterator, as s390/powerpc won't have one.
- Fix powerpc's MSI backends which fail to clear the descriptor's IRQ
field on teardown, leading to a splat and leaked descriptors"
* tag 'msi-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms:
powerpc/msi: Fix deassociation of MSI descriptors
genirq/msi: Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when no domain is present
genirq/msi: Check for the presence of an irq domain when validating msi_ctrl
Use a temporary variable to take full advantage of READ_ONCE() behavior.
Without this, the report (and even the test) might be out of sync with
the initial test.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y5x7GXeluFmZ8E0E@hirez.programming.kicks-ass.net
Fixes: 9fc9e278a5 ("panic: Introduce warn_limit")
Fixes: d4ccd54d28 ("exit: Put an upper limit on how often we can oops")
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: tangmeng <tangmeng@uniontech.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Kees Cook <keescook@chromium.org>
On architectures such as s390 that do not use irq domains for MSI,
returning 0 as the maximum MSI index is a bit counter-productive,
as it indicates that no MSI can be allocated. Bad idea.
Instead, return the maximum we're willing to support in the MSI
backing store (MSI_XA_DOMAIN_SIZE), and let the arch code do its
usual thing.
Thanks to Matthew Rosato for fixing the fix.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[maz: commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/87fsdgzpqs.ffs@tglx
For architectures such as s390 and powerpc that do not use
irq domains for MSIs, dev->msi.domain is always NULL, so
the per-device, per-bus MSI domain is also guaranteed to
be NULL.
So checking one without checking the other is bound to result
in a splat, followed by a memory leak as we don't free the MSI
descriptors.
Add the missing check.
Reported-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/e570e70d-19bc-101b-0481-ff9a3cab3504@linux.ibm.com
Here is the set of driver core and kernfs changes for 6.2-rc1.
The "big" change in here is the addition of a new macro,
container_of_const() that will preserve the "const-ness" of a pointer
passed into it.
The "problem" of the current container_of() macro is that if you pass in
a "const *", out of it can comes a non-const pointer unless you
specifically ask for it. For many usages, we want to preserve the
"const" attribute by using the same call. For a specific example, this
series changes the kobj_to_dev() macro to use it, allowing it to be used
no matter what the const value is. This prevents every subsystem from
having to declare 2 different individual macros (i.e.
kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce
the const value at build time, which having 2 macros would not do
either.
The driver for all of this have been discussions with the Rust kernel
developers as to how to properly mark driver core, and kobject, objects
as being "non-mutable". The changes to the kobject and driver core in
this pull request are the result of that, as there are lots of paths
where kobjects and device pointers are not modified at all, so marking
them as "const" allows the compiler to enforce this.
So, a nice side affect of the Rust development effort has been already
to clean up the driver core code to be more obvious about object rules.
All of this has been bike-shedded in quite a lot of detail on lkml with
different names and implementations resulting in the tiny version we
have in here, much better than my original proposal. Lots of subsystem
maintainers have acked the changes as well.
Other than this change, included in here are smaller stuff like:
- kernfs fixes and updates to handle lock contention better
- vmlinux.lds.h fixes and updates
- sysfs and debugfs documentation updates
- device property updates
All of these have been in the linux-next tree for quite a while with no
problems, OTHER than some merge issues with other trees that should be
obvious when you hit them (block tree deletes a driver that this tree
modifies, iommufd tree modifies code that this tree also touches). If
there are merge problems with these trees, please let me know.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY5wz3A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yks0ACeKYUlVgCsER8eYW+x18szFa2QTXgAn2h/VhZe
1Fp53boFaQkGBjl8mGF8
=v+FB
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of driver core and kernfs changes for 6.2-rc1.
The "big" change in here is the addition of a new macro,
container_of_const() that will preserve the "const-ness" of a pointer
passed into it.
The "problem" of the current container_of() macro is that if you pass
in a "const *", out of it can comes a non-const pointer unless you
specifically ask for it. For many usages, we want to preserve the
"const" attribute by using the same call. For a specific example, this
series changes the kobj_to_dev() macro to use it, allowing it to be
used no matter what the const value is. This prevents every subsystem
from having to declare 2 different individual macros (i.e.
kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce
the const value at build time, which having 2 macros would not do
either.
The driver for all of this have been discussions with the Rust kernel
developers as to how to properly mark driver core, and kobject,
objects as being "non-mutable". The changes to the kobject and driver
core in this pull request are the result of that, as there are lots of
paths where kobjects and device pointers are not modified at all, so
marking them as "const" allows the compiler to enforce this.
So, a nice side affect of the Rust development effort has been already
to clean up the driver core code to be more obvious about object
rules.
All of this has been bike-shedded in quite a lot of detail on lkml
with different names and implementations resulting in the tiny version
we have in here, much better than my original proposal. Lots of
subsystem maintainers have acked the changes as well.
Other than this change, included in here are smaller stuff like:
- kernfs fixes and updates to handle lock contention better
- vmlinux.lds.h fixes and updates
- sysfs and debugfs documentation updates
- device property updates
All of these have been in the linux-next tree for quite a while with
no problems"
* tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (58 commits)
device property: Fix documentation for fwnode_get_next_parent()
firmware_loader: fix up to_fw_sysfs() to preserve const
usb.h: take advantage of container_of_const()
device.h: move kobj_to_dev() to use container_of_const()
container_of: add container_of_const() that preserves const-ness of the pointer
driver core: fix up missed drivers/s390/char/hmcdrv_dev.c class.devnode() conversion.
driver core: fix up missed scsi/cxlflash class.devnode() conversion.
driver core: fix up some missing class.devnode() conversions.
driver core: make struct class.devnode() take a const *
driver core: make struct class.dev_uevent() take a const *
cacheinfo: Remove of_node_put() for fw_token
device property: Add a blank line in Kconfig of tests
device property: Rename goto label to be more precise
device property: Move PROPERTY_ENTRY_BOOL() a bit down
device property: Get rid of __PROPERTY_ENTRY_ARRAY_EL*SIZE*()
kernfs: fix all kernel-doc warnings and multiple typos
driver core: pass a const * into of_device_uevent()
kobject: kset_uevent_ops: make name() callback take a const *
kobject: kset_uevent_ops: make filter() callback take a const *
kobject: make kobject_namespace take a const *
...
- Add options to the osnoise tracer
o panic_on_stop option that panics the kernel if osnoise is greater than some
user defined threshold.
o preempt option, to test noise while preemption is disabled
o irq option, to test noise when interrupts are disabled
- Add .percent and .graph suffix to histograms to give different outputs
- Add nohitcount to disable showing hitcount in histogram output
- Add new __cpumask() to trace event fields to annotate that a unsigned long
array is a cpumask to user space and should be treated as one.
- Add trace_trigger kernel command line parameter to enable trace event
triggers at boot up. Useful to trace stack traces, disable tracing and take
snapshots.
- Fix x86/kmmio mmio tracer to work with the updates to lockdep
- Unify the panic and die notifiers
- Add back ftrace_expect reference that is used to extract more information in
the ftrace_bug() code.
- Have trigger filter parsing errors show up in the tracing error log.
- Updated MAINTAINERS file to add kernel tracing mailing list and patchwork
info
- Use IDA to keep track of event type numbers.
- And minor fixes and clean ups
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY5vIcxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qlO7AQCmtZbriadAR6N7Llj092YXmYfzrxyi
1WS35vhpZsBJ8gEA8j68l+LrgNt51N2gXlTXEHgXzdBgL/TKAPSX4D99GQY=
=z1pe
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Add options to the osnoise tracer:
- 'panic_on_stop' option that panics the kernel if osnoise is
greater than some user defined threshold.
- 'preempt' option, to test noise while preemption is disabled
- 'irq' option, to test noise when interrupts are disabled
- Add .percent and .graph suffix to histograms to give different
outputs
- Add nohitcount to disable showing hitcount in histogram output
- Add new __cpumask() to trace event fields to annotate that a unsigned
long array is a cpumask to user space and should be treated as one.
- Add trace_trigger kernel command line parameter to enable trace event
triggers at boot up. Useful to trace stack traces, disable tracing
and take snapshots.
- Fix x86/kmmio mmio tracer to work with the updates to lockdep
- Unify the panic and die notifiers
- Add back ftrace_expect reference that is used to extract more
information in the ftrace_bug() code.
- Have trigger filter parsing errors show up in the tracing error log.
- Updated MAINTAINERS file to add kernel tracing mailing list and
patchwork info
- Use IDA to keep track of event type numbers.
- And minor fixes and clean ups
* tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
tracing: Fix cpumask() example typo
tracing: Improve panic/die notifiers
ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY kernels
tracing: Do not synchronize freeing of trigger filter on boot up
tracing: Remove pointer (asterisk) and brackets from cpumask_t field
tracing: Have trigger filter parsing errors show up in error_log
x86/mm/kmmio: Remove redundant preempt_disable()
tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_line
Documentation/osnoise: Add osnoise/options documentation
tracing/osnoise: Add preempt and/or irq disabled options
tracing/osnoise: Add PANIC_ON_STOP option
Documentation/osnoise: Escape underscore of NO_ prefix
tracing: Fix some checker warnings
tracing/osnoise: Make osnoise_options static
tracing: remove unnecessary trace_trigger ifdef
ring-buffer: Handle resize in early boot up
tracing/hist: Fix issue of losting command info in error_log
tracing: Fix issue of missing one synthetic field
tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'
tracing/hist: Fix wrong return value in parse_action_params()
...
On architectures where the PTE/PMD is larger than the native word size
(i386-PAE for example), READ_ONCE() can do the wrong thing. Use
pmdp_get_lockless() just like we use ptep_get_lockless().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114424.906110403%40infradead.org