The RSS hash type specifies what portion of packet data NIC hardware used
when calculating RSS hash value. The RSS types are focused on Internet
traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
primarily TCP vs UDP, but some hardware supports SCTP.
Hardware RSS types are differently encoded for each hardware NIC. Most
hardware represent RSS hash type as a number. Determining L3 vs L4 often
requires a mapping table as there often isn't a pattern or sorting
according to ISO layer.
The patch introduce a XDP RSS hash type (enum xdp_rss_hash_type) that
contains both BITs for the L3/L4 types, and combinations to be used by
drivers for their mapping tables. The enum xdp_rss_type_bits get exposed
to BPF via BTF, and it is up to the BPF-programmer to match using these
defines.
This proposal change the kfunc API bpf_xdp_metadata_rx_hash() adding
a pointer value argument for provide the RSS hash type.
Change signature for all xmo_rx_hash calls in drivers to make it compile.
The RSS type implementations for each driver comes as separate patches.
Fixes: 3d76a4d3d4 ("bpf: XDP metadata RX kfuncs")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132892042.340624.582563003880565460.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The tool xdp_hw_metadata can be used by driver developers
implementing XDP-hints metadata kfuncs.
Remove all bpf_printk calls, as the tool already transfers all the
XDP-hints related information via metadata area to AF_XDP
userspace process.
Add counters for providing remaining information about failure and
skipped packet events.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132891533.340624.7313781245316405141.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
I got really badly confused in d443d93864 ("fbcon: move more common
code into fb_open()") because we set the con2fb_map before the failure
points, which didn't look good.
But in trying to fix that I moved the assignment into the wrong path -
we need to do it for _all_ vc we take over, not just the first one
(which additionally requires the call to con2fb_acquire_newinfo).
I've figured this out because of a KASAN bug report, where the
fbcon_registered_fb and fbcon_display arrays went out of sync in
fbcon_mode_deleted() because the con2fb_map pointed at the old
fb_info, but the modes and everything was updated for the new one.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Tested-by: Xingyuan Mo <hdthky0@gmail.com>
Fixes: d443d93864 ("fbcon: move more common code into fb_open()")
Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Xingyuan Mo <hdthky0@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v5.19+
This is a regressoin introduced in b07db39584 ("fbcon: Ditch error
handling for con2fb_release_oldinfo"). I failed to realize what the if
(!err) checks. The mentioned commit was dropping the
con2fb_release_oldinfo() return value but the if (!err) was also
checking whether the con2fb_acquire_newinfo() function call above
failed or not.
Fix this with an early return statement.
Note that there's still a difference compared to the orginal state of
the code, the below lines are now also skipped on error:
if (!search_fb_in_map(info_idx))
info_idx = newidx;
These are only needed when we've actually thrown out an old fb_info
from the console mappings, which only happens later on.
Also move the fbcon_add_cursor_work() call into the same if block,
it's all protected by console_lock so doesn't matter when we set up
the blinking cursor delayed work anyway. This further simplifies the
control flow and allows us to ditch the found local variable.
v2: Clarify commit message (Javier)
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Tested-by: Xingyuan Mo <hdthky0@gmail.com>
Fixes: b07db39584 ("fbcon: Ditch error handling for con2fb_release_oldinfo")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Xingyuan Mo <hdthky0@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v5.19+
Commit 1effe8ca4e ("skbuff: fix coalescing for page_pool fragment
recycling") allowed coalescing to proceed with non page pool page and page
pool page when @from is cloned, i.e.
to->pp_recycle --> false
from->pp_recycle --> true
skb_cloned(from) --> true
However, it actually requires skb_cloned(@from) to hold true until
coalescing finishes in this situation. If the other cloned SKB is
released while the merging is in process, from_shinfo->nr_frags will be
set to 0 toward the end of the function, causing the increment of frag
page _refcount to be unexpectedly skipped resulting in inconsistent
reference counts. Later when SKB(@to) is released, it frees the page
directly even though the page pool page is still in use, leading to
use-after-free or double-free errors. So it should be prohibited.
The double-free error message below prompted us to investigate:
BUG: Bad page state in process swapper/1 pfn:0e0d1
page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
index:0x2 pfn:0xe0d1
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
page dumped because: nonzero _refcount
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G E 6.2.0+
Call Trace:
<IRQ>
dump_stack_lvl+0x32/0x50
bad_page+0x69/0xf0
free_pcp_prepare+0x260/0x2f0
free_unref_page+0x20/0x1c0
skb_release_data+0x10b/0x1a0
napi_consume_skb+0x56/0x150
net_rx_action+0xf0/0x350
? __napi_schedule+0x79/0x90
__do_softirq+0xc8/0x2b1
__irq_exit_rcu+0xb9/0xf0
common_interrupt+0x82/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
RIP: 0010:default_idle+0xb/0x20
Fixes: 53e0961da1 ("page_pool: add frag page recycling support in page pool")
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230413090353.14448-1-liangchen.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The selftest sctp_vrf needs CONFIG_IP_SCTP set in config
when building the kernel, so add it.
Fixes: a61bd7b9fe ("selftests: add a selftest for sctp vrf")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/61dddebc4d2dd98fe7fb145e24d4b2430e42b572.1681312386.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
lena wang reported an issue caused by udpv6_sendmsg()
mangling msg->msg_name and msg->msg_namelen, which
are later read from ____sys_sendmsg() :
/*
* If this is sendmmsg() and sending to current destination address was
* successful, remember it.
*/
if (used_address && err >= 0) {
used_address->name_len = msg_sys->msg_namelen;
if (msg_sys->msg_name)
memcpy(&used_address->name, msg_sys->msg_name,
used_address->name_len);
}
udpv6_sendmsg() wants to pretend the remote address family
is AF_INET in order to call udp_sendmsg().
A fix would be to modify the address in-place, instead
of using a local variable, but this could have other side effects.
Instead, restore initial values before we return from udpv6_sendmsg().
Fixes: c71d8ebe7a ("net: Fix security_socket_sendmsg() bypass problem.")
Reported-by: lena wang <lena.wang@mediatek.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20230412130308.1202254-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The netlink message for creating a new datapath takes an array
of ports for the PID creation. This shouldn't cause much issue
but correct it for future cases where we need to do decode of
datapath information that could include the per-cpu PID map.
Fixes: 25f16c873f ("selftests: add openvswitch selftest suite")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20230412115828.3991806-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
mptcp: more fixes for 6.3
Patch 1 avoids scheduling the MPTCP worker on a closed socket on some
edge cases. It fixes issues that can be visible from v5.11.
Patch 2 makes sure the MPTCP worker doesn't try to manipulate
disconnected sockets. This is also a fix for an issue that can be
visible from v5.11.
Patch 3 fixes a NULL pointer dereference when MPTCP FastOpen is used
and an early fallback is done. A fix for v6.2.
Patch 4 improves the stability of the userspace PM selftest for a
subtest added in v6.2.
====================
Link: https://lore.kernel.org/r/20230411-upstream-net-20230411-mptcp-fixes-v1-0-ca540f3ef986@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simply adding a "sleep" before checking something is usually not a good
idea because the time that has been picked can not be enough or too
much. The best is to wait for events with a timeout.
In this selftest, 'sleep 0.5' is used more than 40 times. It is always
used before calling a 'verify_*' function except for this
verify_listener_events which has been added later.
At the end, using all these 'sleep 0.5' seems to work: the slow CIs
don't complain so far. Also because it doesn't take too much time, we
can just add two more 'sleep 0.5' to uniform what is done before calling
a 'verify_*' function. For the same reasons, we can also delay a bigger
refactoring to replace all these 'sleep 0.5' by functions waiting for
events instead of waiting for a fix time and hope for the best.
Fixes: 6c73008aa3 ("selftests: mptcp: listener test for userspace PM")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case of early fallback to TCP, subflow_syn_recv_sock() deletes
the subflow context before returning the newly allocated sock to
the caller.
The fastopen path does not cope with the above unconditionally
dereferencing the subflow context.
Fixes: 36b122baf6 ("mptcp: add subflow_v(4,6)_send_synack()")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Beyond reducing code duplication this also avoids scheduling
the mptcp_worker on a closed socket on some edge scenarios.
The addressed issue is actually older than the blamed commit
below, but this fix needs it as a pre-requisite.
Fixes: ba8f48f7a4 ("mptcp: introduce mptcp_schedule_work")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In polling mode, no stop condition is generated after a timeout. This
causes SCL to remain low and thereby block the bus. If this happens
during a transfer it can cause slaves to misinterpret the subsequent
transfer and return wrong values.
To solve this, pass the ETIMEDOUT error up from ocores_process_polling()
instead of setting STATE_ERROR directly. The caller is adjusted to call
ocores_process_timeout() on error both in polling and in IRQ mode, which
will set STATE_ERROR and generate a stop condition.
Fixes: 69c8c0c0ef ("i2c: ocores: add polling interface")
Signed-off-by: Gregor Herburger <gregor.herburger@tq-group.com>
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Acked-by: Peter Korsgaard <peter@korsgaard.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Federico Vaga <federico.vaga@cern.ch>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
set_rtc_noop(), get_rtc_noop() are after booting, therefore their __init
annotation is wrong.
A crash was observed on an x86 platform where CMOS RTC is unused and
disabled via device tree. set_rtc_noop() was invoked from ntp:
sync_hw_clock(), although CONFIG_RTC_SYSTOHC=n, however sync_cmos_clock()
doesn't honour that.
Workqueue: events_power_efficient sync_hw_clock
RIP: 0010:set_rtc_noop
Call Trace:
update_persistent_clock64
sync_hw_clock
Fix this by dropping the __init annotation from set/get_rtc_noop().
Fixes: c311ed6183 ("x86/init: Allow DT configured systems to disable RTC at boot time")
Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/59f7ceb1-446b-1d3d-0bc8-1f0ee94b1e18@nokia.com
I have observed an issue where the RX direction of the LS1028A ENETC pMAC
seems unresponsive. The minimal procedure to reproduce the issue is:
1. Connect ENETC port 0 with a loopback RJ45 cable to one of the Felix
switch ports (0).
2. Bring the ports up (MAC Merge layer is not enabled on either end).
3. Send a large quantity of unidirectional (express) traffic from Felix
to ENETC. I tried altering frame size and frame count, and it doesn't
appear to be specific to either of them, but rather, to the quantity
of octets received. Lowering the frame count, the minimum quantity of
packets to reproduce relatively consistently seems to be around 37000
frames at 1514 octets (w/o FCS) each.
4. Using ethtool --set-mm, enable the pMAC in the Felix and in the ENETC
ports, in both RX and TX directions, and with verification on both
ends.
5. Wait for verification to complete on both sides.
6. Configure a traffic class as preemptible on both ends.
7. Send some packets again.
The issue is at step 5, where the verification process of ENETC ends
(meaning that Felix responds with an SMD-R and ENETC sees the response),
but the verification process of Felix never ends (it remains VERIFYING).
If step 3 is skipped or if ENETC receives less traffic than
approximately that threshold, the test runs all the way through
(verification succeeds on both ends, preemptible traffic passes fine).
If, between step 4 and 5, the step below is also introduced:
4.1. Disable and re-enable PM0_COMMAND_CONFIG bit RX_EN
then again, the sequence of steps runs all the way through, and
verification succeeds, even if there was the previous RX traffic
injected into ENETC.
Traffic sent *by* the ENETC port prior to enabling the MAC Merge layer
does not seem to influence the verification result, only received
traffic does.
The LS1028A manual does not mention any relationship between
PM0_COMMAND_CONFIG and MMCSR, and the hardware people don't seem to
know for now either.
The bit that is toggled to work around the issue is also toggled
by enetc_mac_enable(), called from phylink's mac_link_down() and
mac_link_up() methods - which is how the workaround was found:
verification would work after a link down/up.
Fixes: c7b9e80869 ("net: enetc: add support for MAC Merge layer")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230411192645.1896048-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, when traversing ifwdtsn skips with _sctp_walk_ifwdtsn, it only
checks the pos against the end of the chunk. However, the data left for
the last pos may be < sizeof(struct sctp_ifwdtsn_skip), and dereference
it as struct sctp_ifwdtsn_skip may cause coverflow.
This patch fixes it by checking the pos against "the end of the chunk -
sizeof(struct sctp_ifwdtsn_skip)" in sctp_ifwdtsn_skip, similar to
sctp_fwdtsn_skip.
Fixes: 0fc2ea922c ("sctp: implement validate_ftsn for sctp_stream_interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/2a71bffcd80b4f2c61fac6d344bb2f11c8fd74f7.1681155810.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
These Lenovo laptops use Realtek HDA codec combined with
2xCS35L41 Amplifiers using I2C with External Boost.
Signed-off-by: Stefan Binding <sbinding@opensource.cirrus.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230412160531.182007-1-sbinding@opensource.cirrus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The commits referenced below allows userspace to use the NLM_F_ECHO flag
for RTM_NEW/DELLINK operations to receive unicast notifications for the
affected link. Prior to these changes, applications may have relied on
multicast notifications to learn the same information without specifying
the NLM_F_ECHO flag.
For such applications, the mentioned commits changed the behavior for
requests not using NLM_F_ECHO. Multicast notifications are still received,
but now use the portid of the requester and the sequence number of the
request instead of zero values used previously. For the application, this
message may be unexpected and likely handled as a response to the
NLM_F_ACKed request, especially if it uses the same socket to handle
requests and notifications.
To fix existing applications relying on the old notification behavior,
set the portid and sequence number in the notification only if the
request included the NLM_F_ECHO flag. This restores the old behavior
for applications not using it, but allows unicasted notifications for
others.
Fixes: f3a63cce1b ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_link")
Fixes: d88e136cab ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_newlink_create")
Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230411074319.24133-1-martin@strongswan.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- apple admac driver fixes for current_tx, src_addr_widths and global'
interrupt flags handling
- xdma kerneldoc fix
- core fix for use of devm_add_action_or_reset
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmQ24cwACgkQfBQHDyUj
g0d/BBAAogctaal5eawTXUetsRx2PSOfgtTnG09Q+HMG3FIkoRplITngipHNG4eG
1ynM750+RsupMsEY8oxWGZIZ8r3ex12hgaQgwEbCF7eB0RxqQzCyjeubwIihlD7S
QpjOdqliuf0R7WJvBAv33emnmY2kQQaJYUrR/nmFNOSQ5bnYL9nrd2Wdwv69JmjA
Sdy1lWvqVVZJ5OUFrT/B/uDgVMxI9UN97nD3X5+s3HgP4A7SoLBgGM0k2xQYnF6i
k+Ru+xauvoRVcW+2KW6JlSwhLhS543CFg/SRxTEbSk/joU8Kn63Kkbh55CeJJa6h
vGL3IW30I4/1+Wr7E4rk0UoNflHPO3Wti6KS6y7P98d8rImvT88/wnjHL4rMDpom
1cDMYZU4VsddvVSHkb7jL7B4+tWTkucZYFWke2r+d0Fc4jWYMJ4+vSwBk7jp3pGl
aBhl/dnUq6vIo8hHemiRg3l0Wlal/tXe4FiMgik0ELjT7sjlKSlr/q66Zb7r03o4
Lwk8CRRqjZ03eROFbEKPFseJUkXGnLnGFRpLKcM6PAjYmP6KaEJyiwO5d5Dii8t2
jUlVRXeqE8u7Udsh1XAxCy6+WrWhDox/CujBUUWq2xCuc8rrMufI7Y0GNrZx+NRH
oBZudqEe3q00CDAqKVmxnaRe9omLi5Ji3BmQkfcy9FYuZi/NTe4=
=3UUN
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-fix-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
"A couple of fixes in apple driver, core and kernedoc fix for dmaengine
subsystem:
- apple admac driver fixes for current_tx, src_addr_widths and
global' interrupt flags handling
- xdma kerneldoc fix
- core fix for use of devm_add_action_or_reset"
* tag 'dmaengine-fix-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: apple-admac: Fix 'current_tx' not getting freed
dmaengine: apple-admac: Set src_addr_widths capability
dmaengine: apple-admac: Handle 'global' interrupt flags
dmaengine: xilinx: xdma: Fix some kernel-doc comments
dmaengine: Actually use devm_add_action_or_reset()
Update the driver implementations to fit those data exposed
by PMFW.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
[Why & How]
drm_dp_remove_payload() interface was changed. Correct amdgpu dm code
to pass the right parameter to the drm helper function.
Reviewed-by: Jerry Zuo <Jerry.Zuo@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It is found that attaching a task to the top_cpuset does not currently
ignore CPUs allocated to subpartitions in cpuset_attach_task(). So the
code is changed to fix that.
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
In the case of CLONE_INTO_CGROUP, not all cpusets are ready to accept
new tasks. It is too late to check that in cpuset_fork(). So we need
to add the cpuset_can_fork() and cpuset_cancel_fork() methods to
pre-check it before we can allow attachment to a different cpuset.
We also need to set the attach_in_progress flag to alert other code
that a new task is going to be added to the cpuset.
Fixes: ef2c41cf38 ("clone3: allow spawning processes into cgroups")
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Tejun Heo <tj@kernel.org>
By default, the clone(2) syscall spawn a child process into the same
cgroup as its parent. With the use of the CLONE_INTO_CGROUP flag
introduced by commit ef2c41cf38 ("clone3: allow spawning processes
into cgroups"), the child will be spawned into a different cgroup which
is somewhat similar to writing the child's tid into "cgroup.threads".
The current cpuset_fork() method does not properly handle the
CLONE_INTO_CGROUP case where the cpuset of the child may be different
from that of its parent. Update the cpuset_fork() method to treat the
CLONE_INTO_CGROUP case similar to cpuset_attach().
Since the newly cloned task has not been running yet, its actual
memory usage isn't known. So it is not necessary to make change to mm
in cpuset_fork().
Fixes: ef2c41cf38 ("clone3: allow spawning processes into cgroups")
Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Tejun Heo <tj@kernel.org>
After a successful cpuset_can_attach() call which increments the
attach_in_progress flag, either cpuset_cancel_attach() or cpuset_attach()
will be called later. In cpuset_attach(), tasks in cpuset_attach_wq,
if present, will be woken up at the end. That is not the case in
cpuset_cancel_attach(). So missed wakeup is possible if the attach
operation is somehow cancelled. Fix that by doing the wakeup in
cpuset_cancel_attach() as well.
Fixes: e44193d39e ("cpuset: let hotplug propagation work wait for task attaching")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Tejun Heo <tj@kernel.org>
Fix netfs_extract_iter_to_sg() for ITER_UBUF and ITER_IOVEC to set the
size of the page to the part of the page extracted, not the remaining
amount of data in the extracted page array at that point.
This doesn't yet affect anything as cifs, the only current user, only
passes in non-user-backed iterators.
Fixes: 0185846975 ("netfs: Add a function to extract an iterator into a scatterlist")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: Steve French <sfrench@samba.org>
Cc: Shyam Prasad N <nspmangalore@gmail.com>
Cc: Rohith Surabattula <rohiths.msft@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When local group is fully busy but its average load is above system load,
computing the imbalance will overflow and local group is not the best
target for pulling this load.
Fixes: 0b0695f2b3 ("sched/fair: Rework load_balance()")
Reported-by: Tingjia Cao <tjcao980311@gmail.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tingjia Cao <tjcao980311@gmail.com>
Link: https://lore.kernel.org/lkml/CABcWv9_DAhVBOq2=W=2ypKE9dKM5s2DvoV8-U0+GDwwuKZ89jQ@mail.gmail.com/T/
TI CPSW uses of_platform_* functions which are declared in of_platform.h.
of_platform.h gets implicitly included by of_device.h, but that is going
to be removed soon. Nothing else depends on of_device.h so it can be
dropped. of_platform.h also implicitly includes platform_device.h, so
add an explicit include for it, too.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch reports:
drivers/net/wwan/iosm/iosm_ipc_pcie.c:298 ipc_pcie_probe()
warn: missing unwind goto?
When dma_set_mask fails it directly returns without disabling pci
device and freeing ipc_pcie. Fix this my calling a correct goto label
As dma_set_mask returns either 0 or -EIO, we can use a goto label, as
it finally returns -EIO.
Add a set_mask_fail goto label which stands consistent with other goto
labels in this function..
Fixes: 035e3befc1 ("net: wwan: iosm: fix driver not working with INTEL_IOMMU disabled")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With Eric's ref tracker, syzbot finally found a repro for
use-after-free in tcp_write_timer_handler() by kernel TCP
sockets. [0]
If SMC creates a kernel socket in __smc_create(), the kernel
socket is supposed to be freed in smc_clcsock_release() by
calling sock_release() when we close() the parent SMC socket.
However, at the end of smc_clcsock_release(), the kernel
socket's sk_state might not be TCP_CLOSE. This means that
we have not called inet_csk_destroy_sock() in __tcp_close()
and have not stopped the TCP timers.
The kernel socket's TCP timers can be fired later, so we
need to hold a refcnt for net as we do for MPTCP subflows
in mptcp_subflow_create_socket().
[0]:
leaked reference.
sk_alloc (./include/net/net_namespace.h:335 net/core/sock.c:2108)
inet_create (net/ipv4/af_inet.c:319 net/ipv4/af_inet.c:244)
__sock_create (net/socket.c:1546)
smc_create (net/smc/af_smc.c:3269 net/smc/af_smc.c:3284)
__sock_create (net/socket.c:1546)
__sys_socket (net/socket.c:1634 net/socket.c:1618 net/socket.c:1661)
__x64_sys_socket (net/socket.c:1672)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
==================================================================
BUG: KASAN: slab-use-after-free in tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594)
Read of size 1 at addr ffff888052b65e0d by task syzrepro/18091
CPU: 0 PID: 18091 Comm: syzrepro Tainted: G W 6.3.0-rc4-01174-gb5d54eb5899a #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:107)
print_report (mm/kasan/report.c:320 mm/kasan/report.c:430)
kasan_report (mm/kasan/report.c:538)
tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594)
tcp_write_timer (./include/linux/spinlock.h:390 net/ipv4/tcp_timer.c:643)
call_timer_fn (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/timer.h:127 kernel/time/timer.c:1701)
__run_timers.part.0 (kernel/time/timer.c:1752 kernel/time/timer.c:2022)
run_timer_softirq (kernel/time/timer.c:2037)
__do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572)
__irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650)
irq_exit_rcu (kernel/softirq.c:664)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1107 (discriminator 14))
</IRQ>
Fixes: ac7138746e ("smc: establish new socket family")
Reported-by: syzbot+7e1e1bdb852961150198@syzkaller.appspotmail.com
Link: https://lore.kernel.org/netdev/000000000000a3f51805f8bcc43a@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Static code analyzer complains to unchecked return value.
The result of pci_reset_function() is unchecked.
Despite, the issue is on the FLR supported code path and in that
case reset can be done with pcie_flr(), the patch uses less invasive
approach by adding the result check of pci_reset_function().
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 7e2cf4feba ("qlcnic: change driver hardware interface mechanism")
Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
iavf: fix racing in VLANs
Ahmed Zaki says:
This patchset mainly fixes a racing issue in the iavf where the number of
VLANs in the vlan_filter_list might be more than the PF limit. To fix that,
we get rid of the cvlans and svlans bitmaps and keep all the required info
in the list.
The second patch adds two new states that are needed so that we keep the
VLAN info while the interface goes DOWN:
-- DISABLE (notify PF, but keep the filter in the list)
-- INACTIVE (dev is DOWN, filter is removed from PF)
Finally, the current code keeps each state in a separate bit field, which
is error prone. The first patch refactors that by replacing all bits with
a single enum. The changes are minimal where each bit change is replaced
with the new state value.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: remove active_cvlans and active_svlans bitmaps
iavf: refactor VLAN filter states
====================
Link: https://lore.kernel.org/r/20230407210730.3046149-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Fix not setting Dath Path for broadcast sink
- Fix not cleaning up on LE Connection failure
- SCO: Fix possible circular locking dependency
- L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
- Fix race condition in hidp_session_thread
- btbcm: Fix logic error in forming the board name
- btbcm: Fix use after free in btsdio_remove
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmQ0RqEZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKQrlEACfcXOFd5OtB2uLT+TTXkWk
QS8D4qfQfKkwVxNSBrvg/pSm1PXvmed6OdboQgcrosKppbOydYmXbPy+4k3jzQIG
2fCkgdUd8TUCGp4CiIlOHIZ3iPANa78It1c986L0of3uJ9lZaoZDNioodpFO+xT9
6yrjS7blIU8GpkAGPKLSj7Im59CpkPE2tbh6HGmUAkslngsQb/GicqL1do1XWMX+
8gtc4J1kHDearzm8ecHwX6csdocFuBfSeFCPIshwPwqeNzGkt43gQeBuylTKa1Ep
HbyUekazaLJq2dlVcjMVDQ/ISlYSXannBF2v+Wx7ETdF+DsmidApZxaLDEdwVjok
NGxsChfVum5C1bmeiltnc93UW/l8GZYjlf3fEb0xirMmSbZGZ6oR6ul0Q0y6VGa8
S1nw42K+p7Gys/s0fejkmXZAZCTvfA0TJErocKlVPDwigCzUGUIaMNGIdRQT47u5
h3f0aW4qvPkdszmlWvuknXWSqLoOVB97L+fNUCA31sSH/dG83KwnBzQAMcn6ZcWC
EPO4WDVToZWxaMdZ0MSBaGXu4j/dD3KT7wz39FTUBNyUiW0bM3DxOPSI4vncyrkl
2uaThul45R+iD3n0Q2eFh5T1fcoSZ/GBMc4w2P+y7V7E2w/wxCyFH7+fnEsbrD0T
7JUlngpbtwbaQu4i4XLRGQ==
=Ag/P
-----END PGP SIGNATURE-----
Merge tag 'for-net-2023-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix not setting Dath Path for broadcast sink
- Fix not cleaning up on LE Connection failure
- SCO: Fix possible circular locking dependency
- L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
- Fix race condition in hidp_session_thread
- btbcm: Fix logic error in forming the board name
- btbcm: Fix use after free in btsdio_remove
* tag 'for-net-2023-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
Bluetooth: Set ISO Data Path on broadcast sink
Bluetooth: hci_conn: Fix possible UAF
Bluetooth: SCO: Fix possible circular locking dependency sco_sock_getsockopt
Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm
bluetooth: btbcm: Fix logic error in forming the board name.
Bluetooth: btsdio: fix use after free bug in btsdio_remove due to race condition
Bluetooth: Fix race condition in hidp_session_thread
Bluetooth: Fix printing errors if LE Connection times out
Bluetooth: hci_conn: Fix not cleaning up on LE Connection failure
====================
Link: https://lore.kernel.org/r/20230410172718.4067798-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 3fe97ff3d9 ("scsi: ses: Don't attach if enclosure
has no components") and introduces proper handling of case where there are
no detected secondary components, but primary component (enumerated in
num_enclosures) does exist. That fix was originally proposed by Ding Hui
<dinghui@sangfor.com.cn>.
Completely ignoring devices that have one primary enclosure and no
secondary one results in ses_intf_add() bailing completely
scsi 2:0:0:254: enclosure has no enumerated components
scsi 2:0:0:254: Failed to bind enclosure -12ven in valid configurations such
even on valid configurations with 1 primary and 0 secondary enclosures as
below:
# sg_ses /dev/sg0
3PARdata SES 3321
Supported diagnostic pages:
Supported Diagnostic Pages [sdp] [0x0]
Configuration (SES) [cf] [0x1]
Short Enclosure Status (SES) [ses] [0x8]
# sg_ses -p cf /dev/sg0
3PARdata SES 3321
Configuration diagnostic page:
number of secondary subenclosures: 0
generation code: 0x0
enclosure descriptor list
Subenclosure identifier: 0 [primary]
relative ES process id: 0, number of ES processes: 1
number of type descriptor headers: 1
enclosure logical identifier (hex): 20000002ac02068d
enclosure vendor: 3PARdata product: VV rev: 3321
type descriptor header and text list
Element type: Unspecified, subenclosure id: 0
number of possible elements: 1
The changelog for the original fix follows
=====
We can get a crash when disconnecting the iSCSI session,
the call trace like this:
[ffff00002a00fb70] kfree at ffff00000830e224
[ffff00002a00fba0] ses_intf_remove at ffff000001f200e4
[ffff00002a00fbd0] device_del at ffff0000086b6a98
[ffff00002a00fc50] device_unregister at ffff0000086b6d58
[ffff00002a00fc70] __scsi_remove_device at ffff00000870608c
[ffff00002a00fca0] scsi_remove_device at ffff000008706134
[ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4
[ffff00002a00fd10] scsi_remove_target at ffff0000087064c0
[ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4
[ffff00002a00fdb0] process_one_work at ffff00000810f35c
[ffff00002a00fe00] worker_thread at ffff00000810f648
[ffff00002a00fe70] kthread at ffff000008116e98
In ses_intf_add, components count could be 0, and kcalloc 0 size scomp,
but not saved in edev->component[i].scratch
In this situation, edev->component[0].scratch is an invalid pointer,
when kfree it in ses_intf_remove_enclosure, a crash like above would happen
The call trace also could be other random cases when kfree cannot catch
the invalid pointer
We should not use edev->component[] array when the components count is 0
We also need check index when use edev->component[] array in
ses_enclosure_data_process
=====
Reported-by: Michal Kolar <mich.k@seznam.cz>
Originally-by: Ding Hui <dinghui@sangfor.com.cn>
Cc: stable@vger.kernel.org
Fixes: 3fe97ff3d9 ("scsi: ses: Don't attach if enclosure has no components")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2304042122270.29760@cbobk.fhfr.pm
Tested-by: Michal Kolar <mich.k@seznam.cz>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This reverts commit b26cd9325b.
This patch introduces a regression on Lenovo Z13, which can't wake
from the lid with it applied; and some unspecified AMD based Dell
platforms are unable to wake from hitting the power button
Signed-off-by: Kornel Dulęba <korneld@chromium.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230411134932.292287-1-korneld@chromium.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
In a NOMMU kernel, sigreturn trampolines are generated on the user
stack by setup_rt_frame. Currently, these trampolines are not instruction
fenced, thus their visibility to ifetch is not guaranteed.
This patch adds a flush_icache_range in setup_rt_frame to fix this
problem.
Signed-off-by: Mathis Salmen <mathis.salmen@matsal.de>
Fixes: 6bd33e1ece ("riscv: add nommu support")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230406101130.82304-1-mathis.salmen@matsal.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When loading a DT overlay that creates a device, the device is not
probed, unless the DT overlay is unloaded and reloaded again.
After the recent refactoring to improve fw_devlink, it no longer depends
on the "compatible" property to identify which device tree nodes will
become struct devices. fw_devlink now picks up dangling consumers
(consumers pointing to descendent device tree nodes of a device that
aren't converted to child devices) when a device is successfully bound
to a driver. See __fw_devlink_pickup_dangling_consumers().
However, during DT overlay, a device's device tree node can have
sub-nodes added/removed without unbinding/rebinding the driver. This
difference in behavior between the normal device instantiation and
probing flow vs. the DT overlay flow has a bunch of implications that
are pointed out elsewhere[1]. One of them is that the fw_devlink logic
to pick up dangling consumers is never exercised.
This patch solves the fw_devlink issue by marking all DT nodes added by
DT overlays with FWNODE_FLAG_NOT_DEVICE (fwnode that won't become
device), and by clearing the flag when a struct device is actually
created for the DT node. This way, fw_devlink knows not to have
consumers waiting on these newly added DT nodes, and to propagate the
dependency to an ancestor DT node that has the corresponding struct
device.
Based on a patch by Saravana Kannan, which covered only platform and spi
devices.
[1] https://lore.kernel.org/r/CAGETcx_bkuFaLCiPrAWCPQz+w79ccDp6=9e881qmK=vx3hBMyg@mail.gmail.com
Fixes: 4a032827da ("of: property: Simplify of_link_to_phandle()")
Link: https://lore.kernel.org/r/CAGETcx_+rhHvaC_HJXGrr5_WAd2+k5f=rWYnkCZ6z5bGX-wj4w@mail.gmail.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C
Acked-by: Shawn Guo <shawnguo@kernel.org>
Acked-by: Saravana Kannan <saravanak@google.com>
Tested-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Link: https://lore.kernel.org/r/e1fa546682ea4c8474ff997ab6244c5e11b6f8bc.1680182615.git.geert+renesas@glider.be
Signed-off-by: Rob Herring <robh@kernel.org>
The "compatible" doesn't match what the kernel is using. Fix it as
kernel using.
Fixes: 6b2748ada2 ("dt-bindings: interrupt-controller: add yaml for LoongArch CPU interrupt controller")
Reported-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/all/20221208020954.GA3368836-robh@kernel.org/
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Liu Peibao <liupeibao@loongson.cn>
Link: https://lore.kernel.org/r/20230401091304.12633-1-liupeibao@loongson.cn
[robh: Rename file to match compatible, fix subject typo]
Signed-off-by: Rob Herring <robh@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmQ1qFYUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxRJA/+JnDrZ8Ag8sIGgI7meuqWcLe3dHXQ
7KaOGM/MP0ir8qJwiobuon2ml1lEU+iWZEcpFH9b9b5iurAfbvZkPhFAWZzJG3GW
B6qEX5zaK4tNJHC8byK4pSat2pJLg8Q6xwA6c3H9264tj4Dc83dg2Y8hWWozbU2l
xFaawRexCus7Xzd2VEfskwOpHQpSYkWCDtTFclIt1iZeelj6an/0FVdvYqvvIXi+
QZjS80tMeZIcKJq06023KJX36lZHf0RGzYzQTyr7asiNnXBnQ1DfYIASwx1Nf6Y3
9PpvoDVbUGnLDAB8THcOM9j9UOrhlbOJ+4b60J0b4nsT0ORn4rCrN05K0DIIXZi0
CsP3qR2l04xYZFElWrvnA30KS76G7vq0OeAGpQsE0nB1PgydXuiXk5lniQhsrOuj
xRiY7YeDFwiNzpjWsVtqkANvl6tA53ZUn+AMEPfIpdYbjnbLcSSF/qY3WCeKoftx
KUg6fVOoXbbSctK5rXQnj2yhwQZSkm3DrG9Ol2i3E524vcMKs28QBBuUN+gtL3TP
aD6nYP2Q0gItYAQCMtJHCBzfstrO5l9JSLM6xUHR/GwiaLCC9+QRdLJFcAxZvZJi
buGs09B2Xz8AbgQ3AejEw7UXk+4JP19ZLyznBhLaaQQO9YOvDYlcn3Xw1QGvJGY/
qsYKElgrLWSz+50=
=2gjK
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Provide pci_msix_can_alloc_dyn() stub when CONFIG_PCI_MSI unset to
avoid build errors (Reinette Chatre)
- Quirk AMD XHCI controller that loses MSI-X state in D3hot to avoid
broken USB after hotplug or suspend/resume (Basavaraj Natikar)
- Fix use-after-free in pci_bus_release_domain_nr() (Rob Herring)
* tag 'pci-v6.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Fix use-after-free in pci_bus_release_domain_nr()
x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()