This reverts commit 1f86123b97 ("net: align SO_RCVMARK required
privileges with SO_MARK") because the reasoning in the commit message
is not really correct:
SO_RCVMARK is used for 'reading' incoming skb mark (via cmsg), as such
it is more equivalent to 'getsockopt(SO_MARK)' which has no priv check
and retrieves the socket mark, rather than 'setsockopt(SO_MARK) which
sets the socket mark and does require privs.
Additionally incoming skb->mark may already be visible if
sysctl_fwmark_reflect and/or sysctl_tcp_fwmark_accept are enabled.
Furthermore, it is easier to block the getsockopt via bpf
(either cgroup setsockopt hook, or via syscall filters)
then to unblock it if it requires CAP_NET_RAW/ADMIN.
On Android the socket mark is (among other things) used to store
the network identifier a socket is bound to. Setting it is privileged,
but retrieving it is not. We'd like unprivileged userspace to be able
to read the network id of incoming packets (where mark is set via
iptables [to be moved to bpf])...
An alternative would be to add another sysctl to control whether
setting SO_RCVMARK is privilged or not.
(or even a MASK of which bits in the mark can be exposed)
But this seems like over-engineering...
Note: This is a non-trivial revert, due to later merged commit e42c7beee7
("bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()")
which changed both 'ns_capable' into 'sockopt_ns_capable' calls.
Fixes: 1f86123b97 ("net: align SO_RCVMARK required privileges with SO_MARK")
Cc: Larysa Zaremba <larysa.zaremba@intel.com>
Cc: Simon Horman <simon.horman@corigine.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Patrick Rohr <prohr@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230618103130.51628-1-maze@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
struct mux_adth actually ends with multiple struct mux_adth_dg members.
This is seen both in the comments about the member:
/**
* struct mux_adth - Structure of the Aggregated Datagram Table Header.
...
* @dg: datagramm table with variable length
*/
and in the preparation for populating it:
adth_dg_size = offsetof(struct mux_adth, dg) +
ul_adb->dg_count[i] * sizeof(*dg);
...
adth_dg_size -= offsetof(struct mux_adth, dg);
memcpy(&adth->dg, ul_adb->dg[i], adth_dg_size);
This was reported as a run-time false positive warning:
memcpy: detected field-spanning write (size 16) of single field "&adth->dg" at drivers/net/wwan/iosm/iosm_ipc_mux_codec.c:852 (size 8)
Adjust the struct mux_adth definition and associated sizeof() math; no binary
output differences are observed in the resulting object file.
Reported-by: Florian Klink <flokli@flokli.de>
Closes: https://lore.kernel.org/lkml/dbfa25f5-64c8-5574-4f5d-0151ba95d232@gmail.com/
Fixes: 1f52d7b622 ("net: wwan: iosm: Enable M.2 7360 WWAN card support")
Cc: M Chetan Kumar <m.chetan.kumar@intel.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Intel Corporation <linuxwwan@intel.com>
Cc: Loic Poulain <loic.poulain@linaro.org>
Cc: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230620194234.never.023-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Sparse warns about lock imbalance vs. the hrtimer_base lock due to missing
sparse annotations:
kernel/time/hrtimer.c:175:33: warning: context imbalance in 'lock_hrtimer_base' - wrong count at exit
kernel/time/hrtimer.c:1301:28: warning: context imbalance in 'hrtimer_start_range_ns' - unexpected unlock
kernel/time/hrtimer.c:1336:28: warning: context imbalance in 'hrtimer_try_to_cancel' - unexpected unlock
kernel/time/hrtimer.c:1457:9: warning: context imbalance in '__hrtimer_get_remaining' - unexpected unlock
Add the annotations to the relevant functions.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230621075928.394481-1-ben.dooks@codethink.co.uk
When mirroring to a gretap in hardware the device expects to be
programmed with the egress port and all the encapsulating headers. This
requires the driver to resolve the path the packet will take in the
software data path and program the device accordingly.
If the path cannot be resolved (in this case because of an unresolved
neighbor), then mirror installation fails until the path is resolved.
This results in a race that causes the test to sometimes fail.
Fix this by setting the neighbor's state to permanent in a couple of
tests, so that it is always valid.
Fixes: 35c31d5c32 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1d")
Fixes: 239e754af8 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/268816ac729cb6028c7a34d4dda6f4ec7af55333.1687264607.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Properly check for RX_DROP_UNUSABLE now that the new drop reason
infrastructure is used. Without this change, the comparison will always
be false as a more specific reason is given in the lower bits of result.
Fixes: baa951a1c1 ("mac80211: use the new drop reasons infrastructure")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20230621120543.412920-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
mptcp: fixes for 6.4
Patch 1 correctly handles disconnect() failures that can happen in some
specific cases: now the socket state is set as unconnected as expected.
That fixes an issue introduced in v6.2.
Patch 2 fixes a divide by zero bug in mptcp_recvmsg() with a fix similar
to a recent one from Eric Dumazet for TCP introducing sk_wait_pending
flag. It should address an issue present in MPTCP from almost the
beginning, from v5.9.
Patch 3 fixes a possible list corruption on passive MPJ even if the race
seems very unlikely, better be safe than sorry. The possible issue is
present from v5.17.
Patch 4 consolidates fallback and non fallback state machines to avoid
leaking some MPTCP sockets. The fix is likely needed for versions from
v5.11.
Patch 5 drops code that is no longer used after the introduction of
patch 4/6. This is not really a fix but this patch can probably land in
the -net tree as well not to leave unused code.
Patch 6 ensures listeners are unhashed before updating their sk status
to avoid possible deadlocks when diag info are going to be retrieved
with a lock. Even if it should not be visible with the way we are
currently getting diag info, the issue is present from v5.17.
====================
Link: https://lore.kernel.org/r/20230620-upstream-net-20230620-misc-fixes-for-v6-4-v1-0-f36aa5eae8b9@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MPTCP protocol access the listener subflow in a lockless
manner in a couple of places (poll, diag). That works only if
the msk itself leaves the listener status only after that the
subflow itself has been closed/disconnected. Otherwise we risk
deadlock in diag, as reported by Christoph.
Address the issue ensuring that the first subflow (the listener
one) is always disconnected before updating the msk socket status.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/407
Fixes: b29fcfb54c ("mptcp: full disconnect implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thanks to the previous patch -- "mptcp: consolidate fallback and non
fallback state machine" -- we can finally drop the "temporary hack"
used to detect rx eof.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
An orphaned msk releases the used resources via the worker,
when the latter first see the msk in CLOSED status.
If the msk status transitions to TCP_CLOSE in the release callback
invoked by the worker's final release_sock(), such instance of the
workqueue will not take any action.
Additionally the MPTCP code prevents scheduling the worker once the
socket reaches the CLOSE status: such msk resources will be leaked.
The only code path that can trigger the above scenario is the
__mptcp_check_send_data_fin() in fallback mode.
Address the issue removing the special handling of fallback socket
in __mptcp_check_send_data_fin(), consolidating the state machine
for fallback and non fallback socket.
Since non-fallback sockets do not send and do not receive data_fin,
the mptcp code can update the msk internal status to match the next
step in the SM every time data fin (ack) should be generated or
received.
As a consequence we can remove a bunch of checks for fallback from
the fastpath.
Fixes: 6e628cd3a8 ("mptcp: use mptcp release_cb for delayed tasks")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At passive MPJ time, if the msk socket lock is held by the user,
the new subflow is appended to the msk->join_list under the msk
data lock.
In mptcp_release_cb()/__mptcp_flush_join_list(), the subflows in
that list are moved from the join_list into the conn_list under the
msk socket lock.
Append and removal could race, possibly corrupting such list.
Address the issue splicing the join list into a temporary one while
still under the msk data lock.
Found by code inspection, the race itself should be almost impossible
to trigger in practice.
Fixes: 3e5014909b ("mptcp: cleanup MPJ subflow list handling")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the mptcp code has assumes that disconnect() can fail only
at mptcp_sendmsg_fastopen() time - to avoid a deadlock scenario - and
don't even bother returning an error code.
Soon mptcp_disconnect() will handle more error conditions: let's track
them explicitly.
As a bonus, explicitly annotate TCP-level disconnect as not failing:
the mptcp code never blocks for event on the subflows.
Fixes: 7d803344fd ("mptcp: fix deadlock in fastopen error path")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZJK5DwAKCRDbK58LschI
gyUtAQD4gT4BEVHRqvniw9yyqYo0BvElAznutDq7o9kFHFep2gEAoksEWS84OdZj
0L5mSKjXrpHKzmY/jlMrVIcTb3VzOw0=
=gAYE
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-06-21
We've added 7 non-merge commits during the last 14 day(s) which contain
a total of 7 files changed, 181 insertions(+), 15 deletions(-).
The main changes are:
1) Fix a verifier id tracking issue with scalars upon spill,
from Maxim Mikityanskiy.
2) Fix NULL dereference if an exception is generated while a BPF
subprogram is running, from Krister Johansen.
3) Fix a BTF verification failure when compiling kernel with LLVM_IAS=0,
from Florent Revest.
4) Fix expected_attach_type enforcement for kprobe_multi link,
from Jiri Olsa.
5) Fix a bpf_jit_dump issue for x86_64 to pick the correct JITed image,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Force kprobe multi expected_attach_type for kprobe_multi link
bpf/btf: Accept function names that contain dots
selftests/bpf: add a test for subprogram extables
bpf: ensure main program has an extable
bpf: Fix a bpf_jit_dump issue for x86_64 with sysctl bpf_jit_enable.
selftests/bpf: Add test cases to assert proper ID tracking on spill
bpf: Fix verifier id tracking of scalars on spill
====================
Link: https://lore.kernel.org/r/20230621101116.16122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For a long time the tick was aligned to clock MONOTONIC so that the tick
event happened at a multiple of nanoseconds per tick starting from clock
MONOTONIC = 0.
At some point this changed as the refined jiffies clocksource which is
used during boot before the TSC or other clocksources becomes usable, was
adjusted with a boot offset, so that time 0 is closer to the point where
the kernel starts.
This broke the assumption in the tick code that when the tick setup
happens early on ktime_get() will return a multiple of nanoseconds per
tick. As a consequence applications which aligned their periodic
execution so that it does not collide with the tick were not longer
guaranteed that the tick period starts from time 0.
The fix for this regression was to realign the tick when it is initially
set up to a multiple of tick periods. That works as long as the
underlying tick device supports periodic mode, but breaks under certain
conditions when the tick device supports only one shot mode.
Depending on the offset, the alignment delta to clock MONOTONIC can get
in a range where the minimal programming delta of the underlying clock
event device is larger than the calculated delta to the next tick. This
results in a boot hang as the tick code tries to play catch up, but as
the tick never fires jiffies are not advanced so it keeps trying for
ever.
Solve this by moving the tick alignement into the NOHZ / HIGHRES
enablement code because at that point it is guaranteed that the
underlying clocksource is high resolution capable and not longer
depending on the tick.
This is far before user space starts, so at the point where applications
try to align their timers, the old behaviour of the tick happening at a
multiple of nanoseconds per tick starting from clock MONOTONIC = 0 is
restored.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmSTUNQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoQQjD/4+zHK7IxRK+1k1lIStxksEudVeNuqK
7RZKNTawZFSChrxGmv5uhmqYDK9/tudnZot+uyAXI81JHnnGSXbco+VrAA2DQ2v2
w/oxhAHIAobJWDzUuCqT2YsseDVNoTGMrTTPrL4klXqhVtShVOLTI/603+macEiw
Olqmd7XxlyhCloNqoMIh18EwSNIs78kP3YAQHUmq3NoRkJT5c3wWKvZy38WOxSJ9
Jjpw75tePfcwWte9fGVq+oJ8hOWkPAKN5hFL8420RrEgdIkxKlhVmuyG48ieZKH1
LE1brB5iaU8+M91PevZMRlD7oio8u/d6xX1vT3Cqz9UBe1NLj1I0ToX6COhwSNxm
50U0HJ5wGlpodu4IyBHXTG45UAVwpOg/EgDlFigeP3Nxl6h45ugrA010r6FqE1zV
KltjNGsk/OD5d8ywW+yManU2mySm9znqmWN2zbjRGm138Tuz7vglcBhFB6jEJFUF
V8vYPLwI4J2gqifx8NoFBIfLuZp/+D4LOQiJ7XRc9CVm3JLK8Xi8fxnHGTBwJHIn
45WU4hzmY7fjR/gGhAyG7lH6tlo7VTC9/lrO+YtAy2hL2sK14+m/ZH7836I5bAhw
EgSyVlcfR6tgO55I3JgkbNIyO5TUAZPZIAY1TlyR53xXvMkb9aZTh9UMm1xMJ3uX
+sQaLAuDFNUQSg==
=/uvx
-----END PGP SIGNATURE-----
Merge tag 'timers-urgent-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single regression fix for a regression fix:
For a long time the tick was aligned to clock MONOTONIC so that the
tick event happened at a multiple of nanoseconds per tick starting
from clock MONOTONIC = 0.
At some point this changed as the refined jiffies clocksource which is
used during boot before the TSC or other clocksources becomes usable,
was adjusted with a boot offset, so that time 0 is closer to the point
where the kernel starts.
This broke the assumption in the tick code that when the tick setup
happens early on ktime_get() will return a multiple of nanoseconds per
tick. As a consequence applications which aligned their periodic
execution so that it does not collide with the tick were not longer
guaranteed that the tick period starts from time 0.
The fix for this regression was to realign the tick when it is
initially set up to a multiple of tick periods. That works as long as
the underlying tick device supports periodic mode, but breaks under
certain conditions when the tick device supports only one shot mode.
Depending on the offset, the alignment delta to clock MONOTONIC can
get in a range where the minimal programming delta of the underlying
clock event device is larger than the calculated delta to the next
tick. This results in a boot hang as the tick code tries to play catch
up, but as the tick never fires jiffies are not advanced so it keeps
trying for ever.
Solve this by moving the tick alignement into the NOHZ / HIGHRES
enablement code because at that point it is guaranteed that the
underlying clocksource is high resolution capable and not longer
depending on the tick.
This is far before user space starts, so at the point where
applications try to align their timers, the old behaviour of the tick
happening at a multiple of nanoseconds per tick starting from clock
MONOTONIC = 0 is restored"
* tag 'timers-urgent-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/common: Align tick period during sched_timer setup
Fix the documentation of the devt_from_partuuid() return value.
Fix the following two recently introduced kernel-doc warnings:
block/bdev.c:570: warning: Function parameter or member 'hops' not described in 'bd_finish_claiming'
block/early-lookup.c:46: warning: Function parameter or member 'devt' not described in 'devt_from_partuuid'
Cc: Christoph Hellwig <hch@lst.de>
Fixes: 0718afd47f ("block: introduce holder ops")
Fixes: cf056a4312 ("init: improve the name_to_dev_t interface")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230621165054.743815-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There are a few assignments to variable len where the value is not
being read and so the assignments are redundant and can be removed.
In one case, the variable len can be removed completely. Cleans up
4 clang scan warnings of the form:
fs/nfsd/export.c💯7: warning: Although the value stored to 'len'
is used in the enclosing expression, the value is never actually
read from 'len' [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
A last minute revert to fix a regression.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmSSsb0PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpteMH/RRAiEYfWD+/DqFvxeBdFQsistVphMj7/3hh
9tKbVHXIJGl9SVodfjL/vvgk4QwZg3CgmKGDAWGpPO1SjFHCg7JgII7AE6Wu5Ff+
an4FguPA5Wc+7dmzsVH3yPg9N5Im9XtR5Y9avlhDmDEmLcKSzBEEtWrfQtKLLuyi
/dZZTEkzDgAP/wkrOEp90U9ION/bZ/hxFDbGWMr2XAg9ZR90GivCOleG63qPyLcq
k9asAF55tjNhNmqH6OpR5BcfFZhJli3IJa56Bypr//dJkzjQSrRhkz2UcwXNwu/b
GxAmBo/ObgnuSOUsDve9jZowGLa/MgocuCMcf05P9D3CtYsIqIs=
=gmrq
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fix from Michael Tsirkin:
"A last minute revert to fix a regression"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
Revert "virtio-blk: support completion batching for the IRQ path"
intel_idle will, for the bare metal case, usually have one or more deep
power states that have the CPUIDLE_FLAG_TLB_FLUSHED flag set. When
a state with this flag is selected by the cpuidle framework, it will also
flush the TLBs as part of entering this state. The benefit of doing this is
that the kernel does not need to wake the cpu out of this deep power state
just to flush the TLBs... for which the latency can be very high due to
the exit latency of deep power states.
In a VM guest currently, this benefit of avoiding the wakeup does not exist,
while the problem (long exit latency) is even more severe. Linux will need
to wake up a vCPU (causing the host to either come out of a deep C state,
or the VMM to have to deschedule something else to schedule the vCPU) which
can take a very long time.. adding a lot of latency to tlb flush operations
(including munmap and others).
To solve this, add a "Long HLT" C state to the state table for the VM guest
case that has the CPUIDLE_FLAG_TLB_FLUSHED flag set. The result of that is
that for long idle periods (where the VMM is likely to do things that cause
large latency) the cpuidle framework will flush the TLBs (and avoid the
wakeups), while for short/quick idle durations, the existing behavior is
retained.
Now, there is still only "hlt" available in the guest, but for long idle,
the host can go to a deeper state (say C6). There is a reasonable debate
one can have to what to set for the exit_latency and break even point for
this "Long HLT" state. The good news is that intel_idle has these values
available for the underlying CPU (even when mwait is not exposed). The
solution thus is to just use the latency and break even of the deepest state
from the bare metal CPU. This is under the assumption that this is a pretty
reasonable estimate of what the VMM would do to cause latency.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If the intel_pstate driver is set to passive mode, then writing the
same value to the energy_performance_preference sysfs twice will fail.
This is caused by the wrong return value used (index of the matched
energy_perf_string), instead of the length of the passed in parameter.
Fix by forcing the internal return value to zero when the same
preference is passed in by user. This same issue is not present when
active mode is used for the driver.
Fixes: f6ebbcf08f ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Reported-by: Niklas Neronin <niklas.neronin@intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
One last fix for SPI, just a simple fix for incorrect handling of probe
deferral for DMA in the Qualcomm GENI driver.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmSS8esACgkQJNaLcl1U
h9Cm1Qf+PD5lKhoc4Grecs5EbrI/Fzg0y3y6aLP2dghc+XbGnNcCtBYExavRWpJa
AsmqgH+Q7lARZ6T7vrsVyyV3Pn6mLem4aNWo1omKTbmfOD4zf9IT3pnoMQSBo8VC
/7f5nlP+0aVtHXwhZeGP3biZKZ25c2tkDsFb5yKBKFSohSi23RJCErHLhnVZv8Ne
2OymxblxbcGV3Mu63EvZbnP0q8r0toK4VgAKn86onV2D/jrH/EjNZDTSXWLHhCkk
IptQ8vCv9suIF1aZApvBlOWqE8xs1dp/OVNuD2MtVFKWCm/9DS65QPXVxz5VUQVx
J0GF1m8uvLWdWfZwqS75orCpUYpuGQ==
=/Lry
-----END PGP SIGNATURE-----
Merge tag 'spi-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One last fix for SPI, just a simple fix for incorrect handling of
probe deferral for DMA in the Qualcomm GENI driver"
* tag 'spi-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-geni-qcom: correctly handle -EPROBE_DEFER from dma_request_chan()
One simple fix for v6.4, some incorrectly specified bitfield masks in
the PCA9450 driver.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmSS8qMACgkQJNaLcl1U
h9AuQwf7BY16m/8M/3aHCzkj6d/NzzDOmsvgLuelAp7WAhoCd/+7jrp1rgt+M/CS
hiZZlhIaQVhEOjBzfDIEAY5aTW7rpZB/PsA0cVOSffVASAFNMkfAKo2zLPQ+IN/V
kcFHYRDtctQnENZZ3a4L9gxU0NuKwjvdX1tBhDk7fGzZLc2S+J3G/ea9z5NUNj97
ycUwbmOyLMmqh+Yb+nUrMBwvMpm8HsjzS3Ln3Xlz4rZUCKIfzOfewjmLO3ZbqwQZ
bjeiFB4XoyfwacQBPgeR1vCNZRXPw21Th2CYE7p4nqTUdpqHFU5k43ILl1nrpcBE
jS9INrLBU88EWvkE6Mh67qIyFey/FQ==
=naq9
-----END PGP SIGNATURE-----
Merge tag 'regulator-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One simple fix for v6.4, some incorrectly specified bitfield masks in
the PCA9450 driver"
* tag 'regulator-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: pca9450: Fix LDO3OUT and LDO4OUT MASK
The earlier fix to take account of the register data size when limiting
raw register writes exposed the fact that the Intel AVMM bus was
incorrectly specifying too low a limit on the maximum data transfer, it
is only capable of transmitting one register so had set a transfer size
limit that couldn't fit both the value and the the register address into
a single message.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmSS8bEACgkQJNaLcl1U
h9B1zwf/emwrogeLq+eQ03Q9O4oE7OmbYmbTKKhZHtdwtUxZ4qsZ9aNekcgPw1Kv
riZHiL2EC7EKFAqTl1KiGsoJTWQONt32DFcc+27fW6IyXAFc4AMDf2JPnKMtZC83
0R+H0vC/FDVO6lzxNerPuC62ydaBr38XdimyCwYgLtzNWwsZUNh6leXgjFbF3sh7
MEi/SaLH9hcBr0suERnXh3DZUAT7d2kvwL1krSDHF7pvhQ+Z2pn96B1NVnaVVhM1
OlyfQLieRZz1Q2yF2LASI3I9n2IWeToR9i2iaWn2RlGiBzXpo5E5YpEFtH7umCH0
AjoI5Cf44xFXh5asJskRydgI7iKwPQ==
=yQ+K
-----END PGP SIGNATURE-----
Merge tag 'regmap-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"One more fix for v6.4
The earlier fix to take account of the register data size when
limiting raw register writes exposed the fact that the Intel AVMM bus
was incorrectly specifying too low a limit on the maximum data
transfer, it is only capable of transmitting one register so had set a
transfer size limit that couldn't fit both the value and the the
register address into a single message"
* tag 'regmap-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: spi-avmm: Fix regmap_bus max_raw_write
Users are having more success with amd-pstate since the introduction
of EPP and Guided modes. To expose the driver to more users by default
introduce a kernel configuration option for setting the default mode.
Users can use an integer to map out which default mode they want to use
in lieu of a kernel command line option.
This will default to EPP, but only if:
1) The CPU supports an MSR.
2) The system profile is identified
3) The system profile is identified as a non-server by the FADT.
Link: https://gitlab.freedesktop.org/hadess/power-profiles-daemon/-/merge_requests/121
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Co-developed-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If a user's configuration doesn't explicitly specify the cpufreq
scaling governor then the code currently explicitly falls back to
'powersave'. This default is fine for notebooks and desktops, but
servers and undefined machines should default to 'performance'.
Look at the 'preferred_profile' field from the FADT to set this
policy accordingly.
Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html#fixed-acpi-description-table-fadt
Acked-by: Huang Rui <ray.huang@amd.com>
Suggested-by: Wyes Karny <Wyes.Karny@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the event a new preferred PM profile value is introduced it's best for
code to be able to defensively guard against it so that the wrong settings
don't get applied on a new system that uses this profile but ancient
kernels.
Acked-by: Huang Rui <ray.huang@amd.com>
Suggested-by: Gautham Ranjal Shenoy <gautham.shenoy@amd.com>
Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html#fixed-acpi-description-table-fadt
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When patching kernel alternatives, we need to be careful not to execute
kernel code which is itself subject to patching. In general, if code is
executed after the instructions in memory have been patched but prior to
the cache maintenance and barriers completing, it could lead to
UNPREDICTABLE results.
As our regular cache maintenance routines are patched with alternatives,
we have a clean_dcache_range_nopatch() function which is *intended* to
avoid patchable code and therefore supposed to be safe in the middle of
patching alternatives. Unfortunately, it's not marked as 'noinstr', and
so can be instrumented with patchable code.
Additionally, it calls read_sanitised_ftr_reg() (which may be
instrumented with patchable code) to find the sanitized value of
CTR_EL0.DminLine, and is therefore not safe to call during patching.
Luckily, since commit:
675b0563d6 ("arm64: cpufeature: expose arm64_ftr_reg struct for CTR_EL0")
... we can read the sanitised CTR_EL0 value directly, and avoid the call
to read_sanitised_ftr_reg().
This patch marks clean_dcache_range_nopatch() as noinstr, and has it
read the sanitized CTR_EL0 value directly, avoiding the issues above.
As a bonus, this is also an optimization. As read_sanitised_ftr_reg()
performs a binary search to find the CTR_EL0 value, reading the value
directly avoids this binary search per applied alternative, avoiding
some unnecessary work.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230616103150.1238132-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In case of real io scheduler, q->elevator is set, so blk_mq_run_hw_queue()
may just check if scheduler queue has request to dispatch, see
__blk_mq_sched_dispatch_requests(). Then IO hang may be caused because
all passthorugh requests may stay in sw queue.
And any passthrough request should have been inserted to hctx->dispatch
always.
Reported-by: Guangwu Zhang <guazhang@redhat.com>
Fixes: d97217e7f0 ("blk-mq: don't queue plugged passthrough requests into scheduler")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230621132208.1142318-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that the driver core allows for struct class to be in read-only
memory, move the bsg_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-scsi@vger.kernel.org
Cc: linux-block@vger.kernel.org
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20230620180129.645646-8-gregkh@linuxfoundation.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that the driver core allows for struct class to be in read-only
memory, move the ublk_chr_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20230620180129.645646-7-gregkh@linuxfoundation.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that the driver core allows for struct class to be in read-only
memory, move the aoe_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Cc: Justin Sanders <justin@coraid.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20230620180129.645646-6-gregkh@linuxfoundation.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that the driver core allows for struct class to be in read-only
memory, making all 'class' structures to be declared at build time
placing them into read-only memory, instead of having to be dynamically
allocated at load time.
Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com>
Cc: Jack Wang <jinpu.wang@ionos.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20230620180129.645646-5-gregkh@linuxfoundation.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
FMODE_EXEC has nothing to do with exclusive opens, and even is of
the wrong type. We need to check for BLK_OPEN_EXCL here.
Fixes: 985958b858 ("block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230621124914.185992-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rather than assign the user pointer to msghdr->msg_control, assign it
to msghdr->msg_control_user to make sparse happy. They are in a union
so the end result is the same, but let's avoid new sparse warnings and
squash this one.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306210654.mDMcyMuB-lkp@intel.com/
Fixes: cac9e4418f ("io_uring/net: save msghdr->msg_control for retries")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We cannot sanely handle partial retries for recvmsg if we have cmsg
attached. If we don't, then we'd just be overwriting the initial cmsg
header on retries. Alternatively we could increment and handle this
appropriately, but it doesn't seem worth the complication.
Move the MSG_WAITALL check into the non-multishot case while at it,
since MSG_WAITALL is explicitly disabled for multishot anyway.
Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk/
Cc: stable@vger.kernel.org # 5.10+
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we have cmsg attached AND we transferred partial data at least, clear
msg_controllen on retry so we don't attempt to send that again.
Cc: stable@vger.kernel.org # 5.10+
Fixes: cac9e4418f ("io_uring/net: save msghdr->msg_control for retries")
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A fix for a typoed iterator in the Intel Soundwire driver, fairly simple
on inspection though not reviewed by Intel.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmSS9eUACgkQJNaLcl1U
h9AUrgf8DzxOtaTaxzWGlVhoky6KxmYXXLANFOM1iDvGghoxEiYRCjn79El5+MuL
YfmgOzUX1ls1T9eZFzUqhAmavjfYbQw7bK90nuK2iNklxswhFbvwJdb0X854eFza
wLZEXhDnw9uyuCeenhC/JICxz5JvRnscmQ+rMy0SZQhKWeQ5QUdy2oa1o6dEpLd7
uaqZirgaaa2jWikj/6KwO7/sEaE7UznZfwdMbijwwwURXMty1+9Zo5V8+sctYbjI
oMHhiTbgAPbgfKN6k7MntGwWXBM7CXpAuzz6SOwdrOQmKaE5ODAAG1PnAWMkR2DQ
T0rlCos56Xrom2lseAQoX/97OKCJWA==
=dj3R
-----END PGP SIGNATURE-----
Merge tag 'asoc-fix-v6.4-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fix for v6.4
A fix for a typoed iterator in the Intel Soundwire driver, fairly simple
on inspection though not reviewed by Intel.
* irq/misc-6.5:
: .
: Misc cleanups:
:
: - Add a number of missing prototypes
: - Mark global symbol as static where needed
: - Drop some now useless non-DT code paths
: - Add a missing interrupt mapping to the STM32 irqchip
: - Silence another STM32 warning when building with W=1
: - Fix the jcore-aic driver that actually never worked...
: .
Revert "irqchip/mxs: Include linux/irqchip/mxs.h"
irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
irqchip/stm32-exti: Fix warning on initialized field overwritten
irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC map
irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototype
irqchip/mxs: Include linux/irqchip/mxs.h
irqchip/clps711x: Remove unused clps711x_intc_init() function
irqchip/mmp: Remove non-DT codepath
irqchip/ftintc010: Mark all function static
irqdomain: Include internals.h for function prototypes
Signed-off-by: Marc Zyngier <maz@kernel.org>
This reverts commit 5b7e567620.
Although including linux/irqchip/mxs.h is technically correct,
this clashes with the parallel removal of this include file
with 32bit ARM modernizing the low level irq handling as part of
5bb578a0c1 ("ARM: 9298/1: Drop custom mdesc->handle_irq()").
As such, this patch is not only unnecessary, it also breaks
compilation in -next. Revert it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Shawn Guo <shawnguo@kernel.org>
During hibernation or restoration, freeze_secondary_cpus
checks num_online_cpus via BUG_ON, and the subsequent
save_processor_state also does the checking with WARN_ON.
In the case of CONFIG_PM_SLEEP_SMP=n, freeze_secondary_cpus
is not defined, but the sole possible condition to disable
CONFIG_PM_SLEEP_SMP is !SMP where num_online_cpus is always 1.
We also don't have to check it in save_processor_state.
So remove the unnecessary checking in save_processor_state.
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230609075049.2651723-3-songshuaishuai@tinylab.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Adds the required quirk to enable the Cirrus amp and correct pins
on the ASUS ROG GV601V series.
While this works if the related _DSD properties are made available, these
aren't included in the ACPI of these laptops (yet).
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230621085715.5382-1-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
We currently allow to create perf link for program with
expected_attach_type == BPF_TRACE_KPROBE_MULTI.
This will cause crash when we call helpers like get_attach_cookie or
get_func_ip in such program, because it will call the kprobe_multi's
version (current->bpf_ctx context setup) of those helpers while it
expects perf_link's current->bpf_ctx context setup.
Making sure that we use BPF_TRACE_KPROBE_MULTI expected_attach_type
only for programs attaching through kprobe_multi link.
Fixes: ca74823c6e ("bpf: Add cookie support to programs attached with kprobe multi link")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230618131414.75649-1-jolsa@kernel.org
When building a kernel with LLVM=1, LLVM_IAS=0 and CONFIG_KASAN=y, LLVM
leaves DWARF tags for the "asan.module_ctor" & co symbols. In turn,
pahole creates BTF_KIND_FUNC entries for these and this makes the BTF
metadata validation fail because they contain a dot.
In a dramatic turn of event, this BTF verification failure can cause
the netfilter_bpf initialization to fail, causing netfilter_core to
free the netfilter_helper hashmap and netfilter_ftp to trigger a
use-after-free. The risk of u-a-f in netfilter will be addressed
separately but the existence of "asan.module_ctor" debug info under some
build conditions sounds like a good enough reason to accept functions
that contain dots in BTF.
Although using only LLVM=1 is the recommended way to compile clang-based
kernels, users can certainly do LLVM=1, LLVM_IAS=0 as well and we still
try to support that combination according to Nick. To clarify:
- > v5.10 kernel, LLVM=1 (LLVM_IAS=0 is not the default) is recommended,
but user can still have LLVM=1, LLVM_IAS=0 to trigger the issue
- <= 5.10 kernel, LLVM=1 (LLVM_IAS=0 is the default) is recommended in
which case GNU as will be used
Fixes: 1dc9285184 ("bpf: kernel side support for BTF Var and DataSec")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Yonghong Song <yhs@meta.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/bpf/20230615145607.3469985-1-revest@chromium.org