Commit Graph

1186480 Commits

Author SHA1 Message Date
Paolo Abeni 111d467485 Merge branch 'two-fixes-for-smcrv2'
Wen Gu says:

====================
Two fixes for SMCRv2

This patch set includes two bugfix for SMCRv2.
====================

Link: https://lore.kernel.org/r/1685101741-74826-1-git-send-email-guwen@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30 11:26:34 +02:00
Wen Gu 71c6aa0305 net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK
We encountered a crash when using SMCRv2. It is caused by a logical
error in smc_llc_fill_ext_v2().

 BUG: kernel NULL pointer dereference, address: 0000000000000014
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 7 PID: 453 Comm: kworker/7:4 Kdump: loaded Tainted: G        W   E      6.4.0-rc3+ #44
 Workqueue: events smc_llc_add_link_work [smc]
 RIP: 0010:smc_llc_fill_ext_v2+0x117/0x280 [smc]
 RSP: 0018:ffffacb5c064bd88 EFLAGS: 00010282
 RAX: ffff9a6bc1c3c02c RBX: ffff9a6be3558000 RCX: 0000000000000000
 RDX: 0000000000000002 RSI: 0000000000000002 RDI: 000000000000000a
 RBP: ffffacb5c064bdb8 R08: 0000000000000040 R09: 000000000000000c
 R10: ffff9a6bc0910300 R11: 0000000000000002 R12: 0000000000000000
 R13: 0000000000000002 R14: ffff9a6bc1c3c02c R15: ffff9a6be3558250
 FS:  0000000000000000(0000) GS:ffff9a6eefdc0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000014 CR3: 000000010b078003 CR4: 00000000003706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  smc_llc_send_add_link+0x1ae/0x2f0 [smc]
  smc_llc_srv_add_link+0x2c9/0x5a0 [smc]
  ? cc_mkenc+0x40/0x60
  smc_llc_add_link_work+0xb8/0x140 [smc]
  process_one_work+0x1e5/0x3f0
  worker_thread+0x4d/0x2f0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xe5/0x120
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x2c/0x50
  </TASK>

When an alernate RNIC is available in system, SMC will try to add a new
link based on the RNIC for resilience. All the RMBs in use will be mapped
to the new link. Then the RMBs' MRs corresponding to the new link will be
filled into SMCRv2 LLC ADD LINK messages.

However, smc_llc_fill_ext_v2() mistakenly accesses to unused RMBs which
haven't been mapped to the new link and have no valid MRs, thus causing
a crash. So this patch fixes the logic.

Fixes: b4ba4652b3 ("net/smc: extend LLC layer for SMC-Rv2")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30 11:26:32 +02:00
Wen Gu b24aa141c2 net/smc: Scan from current RMB list when no position specified
When finding the first RMB of link group, it should start from the
current RMB list whose index is 0. So fix it.

Fixes: b4ba4652b3 ("net/smc: extend LLC layer for SMC-Rv2")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30 11:26:32 +02:00
Maximilian Luz 061c228967 platform/surface: aggregator_tabletsw: Add support for book mode in POS subsystem
Devices with a type-cover have an additional "book" mode, deactivating
type-cover input and turning off its backlight. This is currently
unsupported, leading to the warning

  surface_aggregator_tablet_mode_switch 01:26:01:00:01: unknown device posture for type-cover: 6

Therefore, add support for this state and map it to enable tablet-mode.

Fixes: 37ff64cd81 ("platform/surface: aggregator_tabletsw: Add support for Type-Cover posture source")
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20230525213218.2797480-3-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-05-30 11:20:30 +02:00
Maximilian Luz 9bed667033 platform/surface: aggregator_tabletsw: Add support for book mode in KIP subsystem
Devices with a type-cover have an additional "book" mode, deactivating
type-cover input and turning off its backlight. This is currently
unsupported, leading to the warning

  surface_aggregator_tablet_mode_switch 01:0e:01:00:01: unknown KIP cover state: 6

Therefore, add support for this state and map it to enable tablet-mode.

Fixes: 9f794056db ("platform/surface: Add KIP/POS tablet-mode switch driver")
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20230525213218.2797480-2-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-05-30 11:20:26 +02:00
Maximilian Luz 539e0a7f91 platform/surface: aggregator: Allow completion work-items to be executed in parallel
Currently, event completion work-items are restricted to be run strictly
in non-parallel fashion by the respective workqueue. However, this has
lead to some problems:

In some instances, the event notifier function called inside this
completion workqueue takes a non-negligible amount of time to execute.
One such example is the battery event handling code (surface_battery.c),
which can result in a full battery information refresh, involving
further synchronous communication with the EC inside the event handler.
This is made worse if the communication fails spuriously, generally
incurring a multi-second timeout.

Since the event completions are run strictly non-parallel, this blocks
other events from being propagated to the respective subsystems. This
becomes especially noticeable for keyboard and touchpad input, which
also funnel their events through this system. Here, users have reported
occasional multi-second "freezes".

Note, however, that the event handling system was never intended to run
purely sequentially. Instead, we have one work struct per EC/SAM
subsystem, processing the event queue for that subsystem. These work
structs were intended to run in parallel, allowing sequential processing
of work items for each subsystem but parallel processing of work items
across subsystems.

The only restriction to this is the way the workqueue is created.
Therefore, replace create_workqueue() with alloc_workqueue() and do not
restrict the maximum number of parallel work items to be executed on
that queue, resolving any cross-subsystem blockage.

Fixes: c167b9c7e3 ("platform/surface: Add Surface Aggregator subsystem")
Link: https://github.com/linux-surface/linux-surface/issues/1026
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20230525210110.2785470-1-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-05-30 11:20:16 +02:00
Maximilian Luz ed08d937ea platform/surface: aggregator: Make to_ssam_device_driver() respect constness
Make to_ssam_device_driver() a bit safer by replacing container_of()
with container_of_const() to respect the constness of the passed in
pointer, instead of silently discarding any const specifications. This
change also makes it more similar to to_ssam_device(), which already
uses container_of_const().

Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20230525205041.2774947-1-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-05-30 11:20:02 +02:00
David Howells 020c69c1a7 rxrpc: Truncate UTS_RELEASE for rxrpc version
UTS_RELEASE has a maximum length of 64 which can cause rxrpc_version to
exceed the 65 byte message limit.

Per the rx spec[1]: "If a server receives a packet with a type value of 13,
and the client-initiated flag set, it should respond with a 65-byte payload
containing a string that identifies the version of AFS software it is
running."

The current implementation causes a compile error when WERROR is turned on
and/or UTS_RELEASE exceeds the length of 49 (making the version string more
than 64 characters).

Fix this by generating the string during module initialisation and limiting
the UTS_RELEASE segment of the string does not exceed 49 chars.  We need to
make sure that the 64 bytes includes "linux-" at the front and " AF_RXRPC"
at the back as this may be used in pattern matching.

Fixes: 44ba06987c ("RxRPC: Handle VERSION Rx protocol packets")
Reported-by: Kenny Ho <Kenny.Ho@amd.com>
Link: https://lore.kernel.org/r/20230523223944.691076-1-Kenny.Ho@amd.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Kenny Ho <Kenny.Ho@amd.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Andrew Lunn <andrew@lunn.ch>
cc: David Laight <David.Laight@ACULAB.COM>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Link: https://web.mit.edu/kolya/afs/rx/rx-spec [1]
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Link: https://lore.kernel.org/r/654974.1685100894@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30 10:01:06 +02:00
Akihiko Odaki 811154e234 KVM: arm64: Populate fault info for watchpoint
When handling ESR_ELx_EC_WATCHPT_LOW, far_el2 member of struct
kvm_vcpu_fault_info will be copied to far member of struct
kvm_debug_exit_arch and exposed to the userspace. The userspace will
see stale values from older faults if the fault info does not get
populated.

Fixes: 8fb2046180 ("KVM: arm64: Move early handlers to per-EC handlers")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230530024651.10014-1-akihiko.odaki@daynix.com
Cc: stable@vger.kernel.org
2023-05-30 08:39:07 +01:00
Maninder Singh 719dfd5925 powerpc/xmon: Use KSYM_NAME_LEN in array size
kallsyms_lookup() which in turn calls kallsyms_lookup_buildid() writes
to index "KSYM_NAME_LEN - 1".

Thus the array passed as namebuf to kallsyms_lookup() should be
KSYM_NAME_LEN in size.

In xmon.c the array was defined to be "128" bytes directly, without
using KSYM_NAME_LEN. Commit b8a94bfb33 ("kallsyms: increase maximum
kernel symbol length to 512") changed the value to 512, but missed
updating the xmon code.

Fixes: b8a94bfb33 ("kallsyms: increase maximum kernel symbol length to 512")
Cc: stable@vger.kernel.org # v6.1+
Co-developed-by: Onkarnath <onkarnath.1@samsung.com>
Signed-off-by: Onkarnath <onkarnath.1@samsung.com>
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
[mpe: Tweak change log wording and fix commit reference]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230529111337.352990-2-maninder1.s@samsung.com
2023-05-30 16:46:56 +10:00
Gaurav Batra 9d2ccf00bd powerpc/iommu: Limit number of TCEs to 512 for H_STUFF_TCE hcall
Currently in tce_freemulti_pSeriesLP() there is no limit on how many
TCEs are passed to the H_STUFF_TCE hcall. This has not caused an issue
until now, but newer firmware releases have started enforcing a limit of
512 TCEs per call.

The limit is correct per the specification (PAPR v2.12 § 14.5.4.2.3).

The code has been in it's current form since it was initially merged.

Cc: stable@vger.kernel.org
Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
[mpe: Tweak change log wording & add PAPR reference]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230525143454.56878-1-gbatra@linux.vnet.ibm.com
2023-05-30 16:33:10 +10:00
Michael Ellerman 81d358b118 powerpc/crypto: Fix aes-gcm-p10 link errors
The recently added P10 AES/GCM code added some files containing
CRYPTOGAMS perl-asm code which are near duplicates of the p8 files
found in drivers/crypto/vmx.

In particular the newly added files produce functions with identical
names to the existing code.

When the kernel is built with CONFIG_CRYPTO_AES_GCM_P10=y and
CONFIG_CRYPTO_DEV_VMX_ENCRYPT=y that leads to link errors, eg:

  ld: drivers/crypto/vmx/aesp8-ppc.o: in function `aes_p8_set_encrypt_key':
  (.text+0xa0): multiple definition of `aes_p8_set_encrypt_key'; arch/powerpc/crypto/aesp8-ppc.o:(.text+0xa0): first defined here
  ...
  ld: drivers/crypto/vmx/ghashp8-ppc.o: in function `gcm_ghash_p8':
  (.text+0x140): multiple definition of `gcm_ghash_p8'; arch/powerpc/crypto/ghashp8-ppc.o:(.text+0x2e4): first defined here

Fix it for now by renaming the newly added files and functions to use
"p10" instead of "p8" in the names.

Fixes: 45a4672b9a ("crypto: p10-aes-gcm - Update Kconfig and Makefile")
Tested-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230525150501.37081-1-mpe@ellerman.id.au
2023-05-30 15:50:32 +10:00
Cambda Zhu 34dfde4ad8 tcp: Return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if user_mss set
This patch replaces the tp->mss_cache check in getting TCP_MAXSEG
with tp->rx_opt.user_mss check for CLOSE/LISTEN sock. Since
tp->mss_cache is initialized with TCP_MSS_DEFAULT, checking if
it's zero is probably a bug.

With this change, getting TCP_MAXSEG before connecting will return
default MSS normally, and return user_mss if user_mss is set.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Jack Yang <mingliang@linux.alibaba.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/netdev/CANn89i+3kL9pYtkxkwxwNMzvC_w3LNUum_2=3u+UyLBmGmifHA@mail.gmail.com/#t
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Link: https://lore.kernel.org/netdev/14D45862-36EA-4076-974C-EA67513C92F6@linux.alibaba.com/
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230527040317.68247-1-cambda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-29 22:03:48 -07:00
Eric Dumazet 4faeee0cf8 tcp: deny tcp_disconnect() when threads are waiting
Historically connect(AF_UNSPEC) has been abused by syzkaller
and other fuzzers to trigger various bugs.

A recent one triggers a divide-by-zero [1], and Paolo Abeni
was able to diagnose the issue.

tcp_recvmsg_locked() has tests about sk_state being not TCP_LISTEN
and TCP REPAIR mode being not used.

Then later if socket lock is released in sk_wait_data(),
another thread can call connect(AF_UNSPEC), then make this
socket a TCP listener.

When recvmsg() is resumed, it can eventually call tcp_cleanup_rbuf()
and attempt a divide by 0 in tcp_rcv_space_adjust() [1]

This patch adds a new socket field, counting number of threads
blocked in sk_wait_event() and inet_wait_for_connect().

If this counter is not zero, tcp_disconnect() returns an error.

This patch adds code in blocking socket system calls, thus should
not hurt performance of non blocking ones.

Note that we probably could revert commit 499350a5a6 ("tcp:
initialize rcv_mss to TCP_MIN_MSS instead of 0") to restore
original tcpi_rcv_mss meaning (was 0 if no payload was ever
received on a socket)

[1]
divide error: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 13832 Comm: syz-executor.5 Not tainted 6.3.0-rc4-syzkaller-00224-g00c7b5f4ddc5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
RIP: 0010:tcp_rcv_space_adjust+0x36e/0x9d0 net/ipv4/tcp_input.c:740
Code: 00 00 00 00 fc ff df 4c 89 64 24 48 8b 44 24 04 44 89 f9 41 81 c7 80 03 00 00 c1 e1 04 44 29 f0 48 63 c9 48 01 e9 48 0f af c1 <49> f7 f6 48 8d 04 41 48 89 44 24 40 48 8b 44 24 30 48 c1 e8 03 48
RSP: 0018:ffffc900033af660 EFLAGS: 00010206
RAX: 4a66b76cbade2c48 RBX: ffff888076640cc0 RCX: 00000000c334e4ac
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000001
RBP: 00000000c324e86c R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880766417f8
R13: ffff888028fbb980 R14: 0000000000000000 R15: 0000000000010344
FS: 00007f5bffbfe700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32f25000 CR3: 000000007ced0000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tcp_recvmsg_locked+0x100e/0x22e0 net/ipv4/tcp.c:2616
tcp_recvmsg+0x117/0x620 net/ipv4/tcp.c:2681
inet6_recvmsg+0x114/0x640 net/ipv6/af_inet6.c:670
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg+0xe2/0x160 net/socket.c:1038
____sys_recvmsg+0x210/0x5a0 net/socket.c:2720
___sys_recvmsg+0xf2/0x180 net/socket.c:2762
do_recvmmsg+0x25e/0x6e0 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0x20f/0x260 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5c0108c0f9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f5bffbfe168 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 00007f5c011ac050 RCX: 00007f5c0108c0f9
RDX: 0000000000000001 RSI: 0000000020000bc0 RDI: 0000000000000003
RBP: 00007f5c010e7b39 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000122 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5c012cfb1f R14: 00007f5bffbfe300 R15: 0000000000022000
</TASK>

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Paolo Abeni <pabeni@redhat.com>
Diagnosed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230526163458.2880232-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-29 22:03:48 -07:00
Eric Dumazet 6ffc57ea00 af_packet: do not use READ_ONCE() in packet_bind()
A recent patch added READ_ONCE() in packet_bind() and packet_bind_spkt()

This is better handled by reading pkt_sk(sk)->num later
in packet_do_bind() while appropriate lock is held.

READ_ONCE() in writers are often an evidence of something being wrong.

Fixes: 822b5a1c17 ("af_packet: Fix data-races of pkt_sk(sk)->num.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230526154342.2533026-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-29 22:03:48 -07:00
Jakub Kicinski 0684f29a89 netlink: specs: correct types of legacy arrays
ethtool has some attrs which dump multiple scalars into
an attribute. The spec currently expects one attr per entry.

Fixes: a353318ebf ("tools: ynl: populate most of the ethtool spec")
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230526220653.65538-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-29 22:03:48 -07:00
Sebastian Krzyszkowiak 36936a56e1 net: usb: qmi_wwan: Set DTR quirk for BroadMobi BM818
BM818 is based on Qualcomm MDM9607 chipset.

Fixes: 9a07406b00 ("net: usb: qmi_wwan: Add the BroadMobi BM818 card")
Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Krzyszkowiak <sebastian.krzyszkowiak@puri.sm>
Acked-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20230526-bm818-dtr-v1-1-64bbfa6ba8af@puri.sm
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-29 22:03:48 -07:00
Damien Le Moal 7f875850f2 ata: libata-scsi: Use correct device no in ata_find_dev()
For devices not attached to a port multiplier and managed directly by
libata, the device number passed to ata_find_dev() must always be lower
than the maximum number of devices returned by ata_link_max_devices().
That is 1 for SATA devices or 2 for an IDE link with master+slave
devices. This device number is the SCSI device ID which matches these
constraints as the IDs are generated per port and so never exceed the
maximum number of devices for the link being used.

However, for libsas managed devices, SCSI device IDs are assigned per
struct scsi_host, leading to device IDs for SATA devices that can be
well in excess of libata per-link maximum number of devices. This
results in ata_find_dev() to always return NULL for libsas managed
devices except for the first device of the target scsi_host with ID
(device number) equal to 0. This issue is visible by executing the
hdparm utility, which fails. E.g.:

hdparm -i /dev/sdX
/dev/sdX:
  HDIO_GET_IDENTITY failed: No message of desired type

Fix this by rewriting ata_find_dev() to ignore the device number for
non-PMP attached devices with a link with at most 1 device, that is SATA
devices. For these, the device number 0 is always used to
return the correct pointer to the struct ata_device of the port link.
This change excludes IDE master/slave setups (maximum number of devices
per link is 2) and port-multiplier attached devices. Also, to be
consistant with the fact that SCSI device IDs and channel numbers used
as device numbers are both unsigned int, change the devno argument of
ata_find_dev() to unsigned int.

Reported-by: Xingui Yang <yangxingui@huawei.com>
Fixes: 41bda9c980 ("libata-link: update hotplug to handle PMP links")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
2023-05-30 08:08:18 +09:00
Mustafa Ismail 5842d1d9c1 RDMA/irdma: Fix Local Invalidate fencing
If the local invalidate fence is indicated in the WR, only the read fence
is currently being set in WQE. Fix this to set both the read and local
fence in the WQE.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20230522155654.1309-4-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-29 14:06:29 -03:00
Mustafa Ismail c8f304d75f RDMA/irdma: Prevent QP use after free
There is a window where the poll cq may use a QP that has been freed.
This can happen if a CQE is polled before irdma_clean_cqes() can clear the
CQE's related to the QP and the destroy QP races to free the QP memory.
then the QP structures are used in irdma_poll_cq.  Fix this by moving the
clearing of CQE's before the reference is removed and the QP is destroyed.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20230522155654.1309-3-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-29 14:06:29 -03:00
Michael Margolin ffe14de983 MAINTAINERS: Update maintainer of Amazon EFA driver
Change EFA driver maintainer from Gal Pressman to myself. Keep Gal as a
reviewer at his request.

Link: https://lore.kernel.org/r/20230525094444.12570-1-mrgolin@amazon.com
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
Acked-by: Gal Pressman <gal.pressman@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-29 14:03:35 -03:00
Ruihan Li 44d0fb387b mm: page_table_check: Ensure user pages are not slab pages
The current uses of PageAnon in page table check functions can lead to
type confusion bugs between struct page and slab [1], if slab pages are
accidentally mapped into the user space. This is because slab reuses the
bits in struct page to store its internal states, which renders PageAnon
ineffective on slab pages.

Since slab pages are not expected to be mapped into the user space, this
patch adds BUG_ON(PageSlab(page)) checks to make sure that slab pages
are not inadvertently mapped. Otherwise, there must be some bugs in the
kernel.

Reported-by: syzbot+fcf1a817ceb50935ce99@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000258e5e05fae79fc1@google.com/ [1]
Fixes: df4e817b71 ("mm: page table check")
Cc: <stable@vger.kernel.org> # 5.17
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20230515130958.32471-5-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 16:14:28 +01:00
Ruihan Li 81a31a860b mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM
Without EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary
physical memory regions into the userspace via /dev/mem. At the same
time, pages may change their properties (e.g., from anonymous pages to
named pages) while they are still being mapped in the userspace, leading
to "corruption" detected by the page table check.

To avoid these false positives, this patch makes PAGE_TABLE_CHECK
depends on EXCLUSIVE_SYSTEM_RAM. This dependency is understandable
because PAGE_TABLE_CHECK is a hardening technique but /dev/mem without
STRICT_DEVMEM (i.e., !EXCLUSIVE_SYSTEM_RAM) is itself a security
problem.

Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may be still allowed to be
mapped via /dev/mem. However, these pages are always considered as named
pages, so they won't break the logic used in the page table check.

Cc: <stable@vger.kernel.org> # 5.17
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20230515130958.32471-4-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 16:14:28 +01:00
Ruihan Li d0b861653f usb: usbfs: Use consistent mmap functions
When hcd->localmem_pool is non-null, localmem_pool is used to allocate
DMA memory. In this case, the dma address will be properly returned (in
dma_handle), and dma_mmap_coherent should be used to map this memory
into the user space. However, the current implementation uses
pfn_remap_range, which is supposed to map normal pages.

Instead of repeating the logic in the memory allocation function, this
patch introduces a more robust solution. Here, the type of allocated
memory is checked by testing whether dma_handle is properly set. If
dma_handle is properly returned, it means some DMA pages are allocated
and dma_mmap_coherent should be used to map them. Otherwise, normal
pages are allocated and pfn_remap_range should be called. This ensures
that the correct mmap functions are used consistently, independently
with logic details that determine which type of memory gets allocated.

Fixes: a0e710a7de ("USB: usbfs: fix mmap dma mismatch")
Cc: stable@vger.kernel.org
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Link: https://lore.kernel.org/r/20230515130958.32471-3-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 16:14:28 +01:00
Ruihan Li 0143d148d1 usb: usbfs: Enforce page requirements for mmap
The current implementation of usbdev_mmap uses usb_alloc_coherent to
allocate memory pages that will later be mapped into the user space.
Meanwhile, usb_alloc_coherent employs three different methods to
allocate memory, as outlined below:
 * If hcd->localmem_pool is non-null, it uses gen_pool_dma_alloc to
   allocate memory;
 * If DMA is not available, it uses kmalloc to allocate memory;
 * Otherwise, it uses dma_alloc_coherent.

However, it should be noted that gen_pool_dma_alloc does not guarantee
that the resulting memory will be page-aligned. Furthermore, trying to
map slab pages (i.e., memory allocated by kmalloc) into the user space
is not resonable and can lead to problems, such as a type confusion bug
when PAGE_TABLE_CHECK=y [1].

To address these issues, this patch introduces hcd_alloc_coherent_pages,
which addresses the above two problems. Specifically,
hcd_alloc_coherent_pages uses gen_pool_dma_alloc_align instead of
gen_pool_dma_alloc to ensure that the memory is page-aligned. To replace
kmalloc, hcd_alloc_coherent_pages directly allocates pages by calling
__get_free_pages.

Reported-by: syzbot+fcf1a817ceb50935ce99@syzkaller.appspotmail.comm
Closes: https://lore.kernel.org/lkml/000000000000258e5e05fae79fc1@google.com/ [1]
Fixes: f7d34b445a ("USB: Add support for usbfs zerocopy.")
Fixes: ff2437befd ("usb: host: Fix excessive alignment restriction for local memory allocations")
Cc: stable@vger.kernel.org
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20230515130958.32471-2-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 16:14:28 +01:00
Marek Vasut 7b32040f6d dt-bindings: usb: snps,dwc3: Fix "snps,hsphy_interface" type
The "snps,hsphy_interface" is string, not u8. Fix the type.

Fixes: 389d776588 ("dt-bindings: usb: Convert DWC USB3 bindings to DT schema")
Cc: stable <stable@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20230515172456.179049-1-marex@denx.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 15:57:22 +01:00
Damien Le Moal 47fe1c3064 block: fix revalidate performance regression
The scsi driver function sd_read_block_characteristics() always calls
disk_set_zoned() to a disk zoned model correctly, in case the device
model changed. This is done even for regular disks to set the zoned
model to BLK_ZONED_NONE and free any zone related resources if the drive
previously was zoned.

This behavior significantly impact the time it takes to revalidate disks
on a large system as the call to disk_clear_zone_settings() done from
disk_set_zoned() for the BLK_ZONED_NONE case results in the device
request queued to be frozen, even if there are no zone resources to
free.

Avoid this overhead for non-zoned devices by not calling
disk_clear_zone_settings() in disk_set_zoned() if the device model
was already set to BLK_ZONED_NONE, which is always the case for regular
devices.

Reported by: Brian Bunker <brian@purestorage.com>

Fixes: 508aebb805 ("block: introduce blk_queue_clear_zone_settings()")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230529073237.1339862-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-29 08:40:32 -06:00
Dan Carpenter 016da9c65f usb: gadget: udc: fix NULL dereference in remove()
The "udc" pointer was never set in the probe() function so it will
lead to a NULL dereference in udc_pci_remove() when we do:

	usb_del_gadget_udc(&udc->gadget);

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/ZG+A/dNpFWAlCChk@kili
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 15:24:24 +01:00
Uttkarsh Aggarwal efb6b53520 usb: gadget: f_fs: Add unbind event before functionfs_unbind
While exercising the unbind path, with the current implementation
the functionfs_unbind would be calling which waits for the ffs->mutex
to be available, however within the same time ffs_ep0_read is invoked
& if no setup packets are pending, it will invoke function
wait_event_interruptible_exclusive_locked_irq which by definition waits
for the ev.count to be increased inside the same mutex for which
functionfs_unbind is waiting.
This creates deadlock situation because the functionfs_unbind won't
get the lock until ev.count is increased which can only happen if
the caller ffs_func_unbind can proceed further.

Following is the illustration:

	CPU1				CPU2

ffs_func_unbind()		ffs_ep0_read()
				mutex_lock(ffs->mutex)
				wait_event(ffs->ev.count)
functionfs_unbind()
  mutex_lock(ffs->mutex)
  mutex_unlock(ffs->mutex)

ffs_event_add()

<deadlock>

Fix this by moving the event unbind before functionfs_unbind
to ensure the ev.count is incrased properly.

Fixes: 6a19da1110 ("usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait")
Cc: stable <stable@kernel.org>
Signed-off-by: Uttkarsh Aggarwal <quic_uaggarwa@quicinc.com>
Link: https://lore.kernel.org/r/20230525092854.7992-1-quic_uaggarwa@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 15:24:08 +01:00
Frank Li dbe678f619 usb: cdns3: fix NCM gadget RX speed 20x slow than expection at iMX8QM
At iMX8QM platform, enable NCM gadget and run 'iperf3 -s'.
At host, run 'iperf3 -V -c fe80::6863:98ff:feef:3e0%enxc6e147509498'

[  5]   0.00-1.00   sec  1.55 MBytes  13.0 Mbits/sec   90   4.18 KBytes
[  5]   1.00-2.00   sec  1.44 MBytes  12.0 Mbits/sec   75   4.18 KBytes
[  5]   2.00-3.00   sec  1.48 MBytes  12.4 Mbits/sec   75   4.18 KBytes

Expected speed should be bigger than 300Mbits/sec.

The root cause of this performance drop was found to be data corruption
happening at 4K borders in some Ethernet packets, leading to TCP
checksum errors. This corruption occurs from the position
(4K - (address & 0x7F)) to 4K. The u_ether function's allocation of
skb_buff reserves 64B, meaning all RX addresses resemble 0xXXXX0040.

Force trb_burst_size to 16 can fix this problem.

Cc: stable@vger.kernel.org
Fixes: 7733f6c32e ("usb: cdns3: Add Cadence USB3 DRD Driver")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20230518154946.3666662-1-Frank.Li@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 15:23:59 +01:00
Richard Acayan 46248400d8 misc: fastrpc: reject new invocations during device removal
The channel's rpmsg object allows new invocations to be made. After old
invocations are already interrupted, the driver shouldn't try to invoke
anymore. Invalidating the rpmsg at the end of the driver removal
function makes it easy to cause a race condition in userspace. Even
closing a file descriptor before the driver finishes its cleanup can
cause an invocation via fastrpc_release_current_dsp_process() and
subsequent timeout.

Invalidate the channel before the invocations are interrupted to make
sure that no invocations can be created to hang after the device closes.

Fixes: c68cfb718c ("misc: fastrpc: Add support for context Invoke method")
Cc: stable <stable@kernel.org>
Signed-off-by: Richard Acayan <mailingradian@gmail.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230523152550.438363-5-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 15:09:50 +01:00
Richard Acayan b6a062853d misc: fastrpc: return -EPIPE to invocations on device removal
The return value is initialized as -1, or -EPERM. The completion of an
invocation implies that the return value is set appropriately, but
"Permission denied" does not accurately describe the outcome of the
invocation. Set the invocation's return value to a more appropriate
"Broken pipe", as the cleanup breaks the driver's connection with rpmsg.

Fixes: c68cfb718c ("misc: fastrpc: Add support for context Invoke method")
Cc: stable <stable@kernel.org>
Signed-off-by: Richard Acayan <mailingradian@gmail.com>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230523152550.438363-4-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 15:09:50 +01:00
Ekansh Gupta 3c7d0079a1 misc: fastrpc: Reassign memory ownership only for remote heap
The userspace map request for remote heap allocates CMA memory.
The ownership of this memory needs to be reassigned to proper
owners to allow access from the protection domain running on
DSP. This reassigning of ownership is not correct if done for
any other supported flags.

When any other flag is requested from userspace, fastrpc is
trying to reassign the ownership of memory and this reassignment
is getting skipped for remote heap request which is incorrect.
Add proper flag check to reassign the memory only if remote heap
is requested.

Fixes: 532ad70c6d ("misc: fastrpc: Add mmap request assigning for static PD pool")
Cc: stable <stable@kernel.org>
Tested-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230523152550.438363-3-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 15:09:50 +01:00
Ekansh Gupta a6e766dea0 misc: fastrpc: Pass proper scm arguments for secure map request
If a map request is made with securemap attribute, the memory
ownership needs to be reassigned to new VMID to allow access
from protection domain. Currently only DSP VMID is passed to
the reassign call which is incorrect as only a combination of
HLOS and DSP VMID is allowed for memory ownership reassignment
and passing only DSP VMID will cause assign call failure.

Also pass proper restoring permissions to HLOS as the source
permission will now carry both HLOS and DSP VMID permission.

Change is also made to get valid physical address from
scatter/gather for this allocation request.

Fixes: e90d911906 ("misc: fastrpc: Add support to secure memory map")
Cc: stable <stable@kernel.org>
Tested-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230523152550.438363-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-29 15:09:50 +01:00
Conor Dooley ed309ce522
RISC-V: mark hibernation as nonportable
Hibernation support depends on firmware marking its reserved/PMP
protected regions as not accessible from Linux.
The latest versions of the de-facto SBI implementation (OpenSBI) do
not do this, having dropped the no-map property to enable 1 GiB huge
page mappings by the kernel.
This was exposed by commit 3335068f87 ("riscv: Use PUD/P4D/PGD pages
for the linear mapping"), which made the first 2 MiB of DRAM (where SBI
typically resides) accessible by the kernel.
Attempting to hibernate with either OpenSBI, or other implementations
following its lead, will lead to a kernel panic ([1], [2]) as the
hibernation process will attempt to save/restore any mapped regions,
including the PMP protected regions in use by the SBI implementation.

Mark hibernation as depending on "NONPORTABLE", as only a small subset
of systems are capable of supporting it, until such time that an SBI
implementation independent way to communicate what regions are in use
has been agreed on.

As hibernation support landed in v6.4-rc1, disabling it for most
platforms does not constitute a regression. The alternative would have
been reverting commit 3335068f87 ("riscv: Use PUD/P4D/PGD pages for
the linear mapping").
Doing so would permit hibernation on platforms with these SBI
implementations, but would limit the options we have to solve the
protection of the region without causing a regression in hibernation
support.

Reported-by: Song Shuai <suagrfillet@gmail.com>
Link: https://lore.kernel.org/all/CAAYs2=gQvkhTeioMmqRDVGjdtNF_vhB+vm_1dHJxPNi75YDQ_Q@mail.gmail.com/ [1]
Reported-by: JeeHeng Sia <jeeheng.sia@starfivetech.com>
Link: https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/ITXwaKfA6z8 [2]
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230526-astride-detonator-9ae120051159@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-05-29 06:38:04 -07:00
Linus Torvalds 8b817fded4 Tracing fixes:
User events:
 
  - Use long instead of int for storing the enable set/clear bit, as it was
    found that big endian machines could end up using the wrong bits.
 
  - Split allocating mm and attaching it. This keeps the allocation separate
    from the registration and avoids various races.
 
  - Remove RCU locking around pin_user_pages_remote() as that can schedule. The
    RCU protection is no longer needed with the above split of mm allocation and
    attaching.
 
  - Rename the "link" fields of the various structs to something more
    meaningful.
 
  - Add comments around user_event_mm struct usage and locking requirements.
 
 Timerlat tracer:
 
  - Fix missed wakeup of timerlat thread caused by the timerlat interrupt
    triggering when tracing is off. The timer interrupt handler needs to always
    wake up the timerlat thread regardless if tracing is enabled or not,
    otherwise, it will never wake up.
 
 Histograms:
 
  - Fix regression of breaking the "stacktrace" modifier for variables. That
    modifier cannot be used for values, but can be used for variables that are
    passed from one histogram to the next. This was broken when adding the
    restriction to values as the variable logic used the same code.
 
  - Rename the special field "stacktrace" to "common_stacktrace". Special fields
    (that are not actually part of the event, but can act just like event
    fields, like 'comm' and 'timestamp') should be prefixed with 'common_' for
    consistency. To keep backward compatibility, 'stacktrace' can still be used
    (as with the special field 'cpu'), but can be overridden if the event has a
    field called 'stacktrace'.
 
  - Update the synthetic event selftests to use the new name (synthetic events
    are created by histograms)
 
 Tracing bootup selftests:
 
  - Reorganize the code to keep artifacts of the selftests not compiled in when
    selftests are not configured.
 
  - Add various cond_resched() around the selftest code, as the softlock
    watchdog was triggering much more often. It appears that the kernel runs
    slower now with full debugging enabled.
 
  - While debugging ftrace with ftrace (using an instance ring buffer instead of
    the top level one), I found that the selftests were disabling prints to the
    debug instance. This should not happen, as the selftests only disable
    printing to the main buffer as the selftests examine the main buffer to see
    if it has what it expects, and prints can make the tests fail. Make the
    selftests only disable printing to the toplevel buffer, and leave the
    instance buffers alone.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZHQGJBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qu6hAQCJ1WebZUTJ/s7pFo36mXirLnrW4afB
 Ua6sALseqKNesgEAyhLmd2+sMeqmAbCCIUWtcWJb/Pod0jGOt0U8+cBxfw8=
 =PhaX
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:
 "User events:

   - Use long instead of int for storing the enable set/clear bit, as it
     was found that big endian machines could end up using the wrong
     bits.

   - Split allocating mm and attaching it. This keeps the allocation
     separate from the registration and avoids various races.

   - Remove RCU locking around pin_user_pages_remote() as that can
     schedule. The RCU protection is no longer needed with the above
     split of mm allocation and attaching.

   - Rename the "link" fields of the various structs to something more
     meaningful.

   - Add comments around user_event_mm struct usage and locking
     requirements.

  Timerlat tracer:

   - Fix missed wakeup of timerlat thread caused by the timerlat
     interrupt triggering when tracing is off. The timer interrupt
     handler needs to always wake up the timerlat thread regardless if
     tracing is enabled or not, otherwise, it will never wake up.

  Histograms:

   - Fix regression of breaking the "stacktrace" modifier for variables.
     That modifier cannot be used for values, but can be used for
     variables that are passed from one histogram to the next. This was
     broken when adding the restriction to values as the variable logic
     used the same code.

   - Rename the special field "stacktrace" to "common_stacktrace".

     Special fields (that are not actually part of the event, but can
     act just like event fields, like 'comm' and 'timestamp') should be
     prefixed with 'common_' for consistency. To keep backward
     compatibility, 'stacktrace' can still be used (as with the special
     field 'cpu'), but can be overridden if the event has a field called
     'stacktrace'.

   - Update the synthetic event selftests to use the new name (synthetic
     events are created by histograms)

  Tracing bootup selftests:

   - Reorganize the code to keep artifacts of the selftests not compiled
     in when selftests are not configured.

   - Add various cond_resched() around the selftest code, as the
     softlock watchdog was triggering much more often. It appears that
     the kernel runs slower now with full debugging enabled.

   - While debugging ftrace with ftrace (using an instance ring buffer
     instead of the top level one), I found that the selftests were
     disabling prints to the debug instance.

     This should not happen, as the selftests only disable printing to
     the main buffer as the selftests examine the main buffer to see if
     it has what it expects, and prints can make the tests fail.

     Make the selftests only disable printing to the toplevel buffer,
     and leave the instance buffers alone"

* tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Have function_graph selftest call cond_resched()
  tracing: Only make selftest conditionals affect the global_trace
  tracing: Make tracing_selftest_running/delete nops when not used
  tracing: Have tracer selftests call cond_resched() before running
  tracing: Move setting of tracing_selftest_running out of register_tracer()
  tracing/selftests: Update synthetic event selftest to use common_stacktrace
  tracing: Rename stacktrace field to common_stacktrace
  tracing/histograms: Allow variables to have some modifiers
  tracing/user_events: Document user_event_mm one-shot list usage
  tracing/user_events: Rename link fields for clarity
  tracing/user_events: Remove RCU lock while pinning pages
  tracing/user_events: Split up mm alloc and attach
  tracing/timerlat: Always wakeup the timerlat thread
  tracing/user_events: Use long vs int for atomic bit ops
2023-05-29 07:20:13 -04:00
Linus Torvalds 7a6c8e512f This push fixes an alignment crash in x86/aria.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmRt4tIACgkQxycdCkmx
 i6fKpw/6A52EyoKBOjfVE+FKc40Fs/1ecmMUumN0kqnpPFKMHvJehnFsb6U0/gCS
 l6BI3CsFVogeICpBYAo1tjvrt3B4FO+zhBWME+il3aOzYMpdryXf3oLMMORqWlCl
 a+TA0LZH+NmySfYKy6rg1fYTQwKICyv25dp0aFC2n/Nj5SQxVtSVN2TFZ8uXNwcT
 ubjMcVUdmGXVls0KO7/ZbhRf9t/2VELzrS/WovhxUJxxXXW4Qhu1tI7X4UJPbnqc
 m14/dj6aZuaElst8B82BT3OSqZdT2rEbomfcnKOo7ePTp5tkJcpXmtM8eDO40AXn
 lzwZ4d9TIPiXalZWZ4ZL0htvbaC6QAvioSlcFmPPZJudSXYm4n31eimMYBmYlgLq
 uAm5wG4Tl84USumoekTNusZJXinDi0oNDXR7RV3gACy1TQJ17+5vgXKr+1/kNH2u
 clHoBIuon9oFAvxZ44iNyBDc4770D+gU+dNsuFCreBUT/8fvAtFWeptpGY08iptY
 c1hu5Tv3kK68JfEgxTgVWs+ObX1pLWO0bEPeQummd5O9jNXRLLPrSB7wOZrkqrpq
 nl4AchdGE9u9OmB5h0VvNj0tUjJqxr8h04a+Hkhu7NqkREYajGWPJkTZQjNzrTSP
 lu52zuvflim1wyWMozMbFPg1O0J78JqI8EQxs/nuFAjF77u2xgs=
 =JBjy
 -----END PGP SIGNATURE-----

Merge tag 'v6.4-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "Fix an alignment crash in x86/aria"

* tag 'v6.4-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/aria - Use 16 byte alignment for GFNI constant vectors
2023-05-29 07:05:49 -04:00
Linus Torvalds ac2263b588 Revert "module: error out early on concurrent load of the same module file"
This reverts commit 9828ed3f69.

Sadly, it does seem to cause failures to load modules. Johan Hovold reports:

 "This change breaks module loading during boot on the Lenovo Thinkpad
  X13s (aarch64).

  Specifically it results in indefinite probe deferral of the display
  and USB (ethernet) which makes it a pain to debug. Typing in the dark
  to acquire some logs reveals that other modules are missing as well"

Since this was applied late as a "let's try this", I'm reverting it
asap, and we can try to figure out what goes wrong later.  The excessive
parallel module loading problem is annoying, but not noticeable in
normal situations, and this was only meant as an optimistic workaround
for a user-space bug.

One possible solution may be to do the optimistic exclusive open first,
and then use a lock to serialize loading if that fails.

Reported-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/lkml/ZHRpH-JXAxA6DnzR@hovoldconsulting.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-29 06:40:33 -04:00
Steven Rostedt (Google) a2d910f022 tracing: Have function_graph selftest call cond_resched()
When all kernel debugging is enabled (lockdep, KSAN, etc), the function
graph enabling and disabling can take several seconds to complete. The
function_graph selftest enables and disables function graph tracing
several times. With full debugging enabled, the soft lockup watchdog was
triggering because the selftest was running without ever scheduling.

Add cond_resched() throughout the test to make sure it does not trigger
the soft lockup detector.

Link: https://lkml.kernel.org/r/20230528051742.1325503-6-rostedt@goodmis.org

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-28 21:15:46 -04:00
Steven Rostedt (Google) ac9d2cb1d5 tracing: Only make selftest conditionals affect the global_trace
The tracing_selftest_running and tracing_selftest_disabled variables were
to keep trace_printk() and other writes from affecting the tracing
selftests, as the tracing selftests would examine the ring buffer to see
if it contained what it expected or not. trace_printk() and friends could
add to the ring buffer and cause the selftests to fail (and then disable
the tracer that was being tested). To keep that from happening, these
variables were added and would keep trace_printk() and friends from
writing to the ring buffer while the tests were going on.

But this was only the top level ring buffer (owned by the global_trace
instance). There is no reason to prevent writing into ring buffers of
other instances via the trace_array_printk() and friends. For the
functions that could be used by other instances, check if the global_trace
is the tracer instance that is being written to before deciding to not
allow the write.

Link: https://lkml.kernel.org/r/20230528051742.1325503-5-rostedt@goodmis.org

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-28 21:15:33 -04:00
Steven Rostedt (Google) a3ae76d7ff tracing: Make tracing_selftest_running/delete nops when not used
There's no reason to test the condition variables tracing_selftest_running
or tracing_selftest_delete when tracing selftests are not enabled. Make
them define 0s when not the selftests are not configured in.

Link: https://lkml.kernel.org/r/20230528051742.1325503-4-rostedt@goodmis.org

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-28 21:15:26 -04:00
Steven Rostedt (Google) 9da705d432 tracing: Have tracer selftests call cond_resched() before running
As there are more and more internal selftests being added to the Linux
kernel (KSAN, lockdep, etc) the selftests are taking longer to run when
these are enabled. Add a cond_resched() to the calling of
do_run_tracer_selftest() to force a schedule if NEED_RESCHED is set,
otherwise the soft lockup watchdog may trigger on boot up.

Link: https://lkml.kernel.org/r/20230528051742.1325503-3-rostedt@goodmis.org

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-28 21:15:17 -04:00
Steven Rostedt (Google) e8352cf577 tracing: Move setting of tracing_selftest_running out of register_tracer()
The variables tracing_selftest_running and tracing_selftest_disabled are
only used for when CONFIG_FTRACE_STARTUP_TEST is enabled. Make them only
visible within the selftest code. The setting of those variables are in
the register_tracer() call, and set in a location where they do not need
to be. Create a wrapper around run_tracer_selftest() called
do_run_tracer_selftest() which sets those variables, and have
register_tracer() call that instead.

Having those variables only set within the CONFIG_FTRACE_STARTUP_TEST
scope gets rid of them (and also the ability to remove testing against
them) when the startup tests are not enabled (most cases).

Link: https://lkml.kernel.org/r/20230528051742.1325503-2-rostedt@goodmis.org

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-28 21:15:04 -04:00
Linus Torvalds e338142b39 phy fixes for 6.4
- init count imbalance fix in qcom-qmp-pcie and combo drivers
  - kernel doc header fix for qcom-snps driver
  - mediatek floating point comparison fix
  - amlogic fix register value
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmRzk3EACgkQfBQHDyUj
 g0cU9g/+PeOboaED0BCiX+n6O4yxrD5bdT1hJSEd/lRdogL4Sp9pPYuVOnyKcZrs
 uQHq0v5BIqWa7dgI+OYB3QL6nsQwNwT5YvDaJlmXRLvmYqyB93gGRVLkTMppEtPU
 5zTE3pDj9BIotHH6i8LFhSyeFwxRRgNfYT6AqY1nx/I9KbA6VHBlmJqFAk1L+SZb
 SKjhK3DQ4ZViWNFUUeVvCcFy/66mm8MEkogGRA6mSkYRHQC5ZIU5QdZZeoXeMbN1
 LQcuFonEOqqlzELn4Xujz4JBe9kMy+5zSqY+OSK3c4MjSczesQVaqAyIY7/EmdXa
 vbfQMfZS2SRtt26ksY2MncLmwoVxDFP0+2FX3rrzpWsaeZ8CiLpkywkkteH2u+6a
 MYFSeKbqTvnkc2AA4kAcHD5Eodmfe7fBKMCnwQK2bU4bjl+pQwB+8kAMNoNOJLFw
 KGZIpOUkBO0Za6fXc+QSNiFfgWdLO/SvYgWAldV+BOQgcQhFyFnJlU6GctZ7HzL2
 GkHi1gzjCtPgn4lIfz/Nt6JqTFS24lrrreZTdVIybm2wieDOIgQgaYBOUsP9BGAn
 fPknh4WGayYm3lUBlgxXyxRiMQ3j/Cr9NYghEJPtN+9M5rpbQIu/dfXhtVVfsdSO
 Gatcrg3vKidbCW9B1xLcjc6iE5ZAfFmzh0ikZfU9O0BCgT9+iKU=
 =/mmj
 -----END PGP SIGNATURE-----

Merge tag 'phy-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Pull phy fixes from Vinod Koul:

 - init count imbalance fix in qcom-qmp-pcie and combo drivers

 - kernel doc header fix for qcom-snps driver

 - mediatek floating point comparison fix

 - amlogic fix register value

* tag 'phy-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: qcom-snps: correct struct qcom_snps_hsphy kerneldoc
  phy: amlogic: phy-meson-g12a-mipi-dphy-analog: fix CNTL2_DIF_TX_CTL0 value
  phy: mediatek: rework the floating point comparisons to fixed point
  phy: qcom-qmp-pcie-msm8996: fix init-count imbalance
  phy: qcom-qmp-combo: fix init-count imbalance
2023-05-28 20:05:07 -04:00
Linus Torvalds dca389eb95 dmaengine fixes for v6.4
- AT HDMAC driver fixes for Flow Controller bitfield, peripheral ID
   handling and potential NULL dereference check
 - PL330 function rename to avoid conflicts
 - build warning fix for pm function in TI driver
 - IDXD driver fix for passing freed memory
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmRzj7oACgkQfBQHDyUj
 g0fExxAAx8KJHqUDuV6xXzdL+O3KANgdEY5Qz1Pyv65g8AtMLXC7H+WC5+1puO5s
 w07wCQnGk2Vo9NtlBrhl2XyK1wvk9kUKG0VcV0sDBCyudCQyuvkF06jcfQHo7lU7
 7MrjzZlVbz+f1mqg7ib9lKC0WEzMxTlT2POZqWw2w7fq0/60sLVK4CQIrObxZOLi
 9EnKNwXj0yzpPK10GUs/0mKWIPaS0m+PL7vcHxNniBxljYGFVpS5zwZMO6N6BnEj
 FgwbllaQfXYioPFMf3Hf9P+BfJM7T7YQexkwGfeiFqowCN6kYX3ECAqQ9cI7OC/y
 Tx3hXSySWagvd514RwGICh5p2VLCmn5hLhjl+E+16Y5Gj+YxqBNG9Cwzuu5v/4N5
 5K2NM87J9KYuI7aPTqvvWhf+X0Ap6FhSPVyrQK4bhcnBGkLmQLlIi9Xo47s0cG2I
 rKrVZXcbxGtrGBYEdTWKLqBmkJRYhMWTSAQSZH/GTMhDjE3kn/OX0kSVFoDwy4dI
 ud1NpMYzMcEvi9IlDx37GWTF2QKjfERz3aXzI/CeQKKSg8TtevQsNsj1lc9GUH7N
 Y/yIrmixgCJXWDevohdfeSu/nnD1DtbRLAiTKWpzwDQE5J5EiZrautl2avXVWkIs
 otLhBECGQAWWBRf8KpKf/z8NEgYhRrrJkNw1Gx//ypBGqOZM/HI=
 =T3nh
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine fixes from Vinod Koul:
 "Driver fixes for the at-hdmac, pl330, TI and IDXD drivers:

   - AT HDMAC driver fixes for Flow Controller bitfield, peripheral ID
     handling and potential NULL dereference check

   - PL330 function rename to avoid conflicts

   - build warning fix for pm function in TI driver

   - IDXD driver fix for passing freed memory"

* tag 'dmaengine-fix-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: at_hdmac: Extend the Flow Controller bitfield to three bits
  dmaengine: at_hdmac: Repair bitfield macros for peripheral ID handling
  dmaengine: pl330: rename _start to prevent build error
  dmaengine: at_xdmac: fix potential Oops in at_xdmac_prep_interleaved()
  dmaengine: ti: k3-udma: annotate pm function with __maybe_unused
  dmaengine: idxd: Fix passing freed memory in idxd_cdev_open()
2023-05-28 19:59:08 -04:00
Greg Kroah-Hartman bca7a46336 1st set of IIO fixes for the 6.4 cycle.
Usual mixed bag of issues in new code for this cycle and old issues
 that have surfaced in the last few weeks.
 
 - adi,ad_sigma_delta
   * Ensure irq lazy disable handing is not used as it breaks completion
     detection.
 - adi,ad4130
   * Fix failure to remove clock provider.
 - adi,ad5758
   * Wrong CONFIG variable used to control driver build.
 - adi,ad7192
   * Fix repeated channel index by just expressing shorted channels
     as differential channel between a channel and itself.
 - adi,ad74413
   * Fix error handling for resistance input processing to not fail
     in case of success.
 - rohm,bu27034
   * Fix integration time in wrong units (should be seconds not usecs)
   * Ensure reset is actually written not detected as already set from
     regcache.
 - gts helper
   * Fix wrong parameter docs.
   * Fix integration time in wrong units (should be seconds not usecs)
 - fsl,imx8qxp-adc
   * Add missing vref-supply to binding doc (already used by driver)
 - fsl,imx93
   * Fix sign bug in read_raw() so that error check didn't work.
 - inv,icm42600
   * Fix reset of timestamp to work even if a particular sensor is off when
     the chip is first enabled.
 - kionix,kx022a
   * Fix irq get form fw node to not include the 0 value.
 - microchip,mcp4725
   * Fix return value from i2c_master_send() handling to nto assume 0 on
     success.
 - mediatek,mt6370
   * Fix incorrect scaling of a few currents on devices with particular
     vendor IDs.
 - fsl,mxs-lradc
   * Cleanup ordering issue fix.
 - renesas,rcar-adc bindings
   * Fix missing vendor prefix for adi,ad7476
 - st,st_accel
   * Fix handling when no ACPI _ONT method present.
 - st,stm32-adc
   * Handle no adc-diff-channel present case (all single ended)
   * Handle no adc-channels present case (all differential)
 - ti,palmas
   * Fix off by one bug that could allow out of bounds read if callers
     provided wrong value.
 - ti,tmag5273
   * Fix a runtime PM leak on measurement error
 - vishay,vcnl4035
   * Correctly mask chip ID so devices with different addresses
     don't fail the test.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAmRqYH4RHGppYzIzQGtl
 cm5lbC5vcmcACgkQVIU0mcT0FoiAsA/+LfGniNvgH0COL6sCGWYf0BYOr7tel36J
 ZcFsUcOP0cMVly67FjRBW0mbGNm2Ry0rjLCxmk/SKm1Hkm8tI2jgQ55LymrpsBXz
 umIuze6K0TYDQLUCCaav7GOwayQKnp28uXiqi60g+PeG/UlB/ax6P3b4y7/sUxF/
 8Z7CTVHvPjXs7WWaiOJzrhRRdqjv1WjnNCzOVatOFg2U50SQu8gCu8dOmYw22Ct2
 oGHUStGSqhkjlqYI/bslk+0gQ19FqH/X/V+RpyrlXfYDlW6kS5wtl2+6ms3i87oR
 EIT9mIwClX5L9gE5uKsoLfgPw4eHqBTlv3XHF3rfxNY4mQKQJCj2HHJodSdODxYa
 b42qnUomGJUJARdOVokN8Sy5majJdUnRR7GvnNr+z8d1Ml6Md35NJ9d+xhJGAJbN
 ZiWtZXzv7VNHCqvhQ8WxXbrTvnav//8s6+E9YmbjLN382NQT48EjfUt7ULc4GQOF
 CMFR1Pt0dT9l7EX6m7KovWBmB3jq/rVyYjeffeqJumm4Zyek/InNxmUrMaDYHHjn
 l7tjYQus5UBtphdc3mTIyVkD4WpcySP3JGKfQckWkewELso46hEmhOUYlGnDNflE
 +zfTSnf0jRGDeKpQq4pWL07dirxIMeRwlENfz3o6xRcdsUkqthGc81NMvt0p9Srq
 QK6MW7XTN/Y=
 =NhJU
 -----END PGP SIGNATURE-----

Merge tag 'iio-fixes-for-6.4a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-linus

Jonathan writes:

1st set of IIO fixes for the 6.4 cycle.

Usual mixed bag of issues in new code for this cycle and old issues
that have surfaced in the last few weeks.

- adi,ad_sigma_delta
  * Ensure irq lazy disable handing is not used as it breaks completion
    detection.
- adi,ad4130
  * Fix failure to remove clock provider.
- adi,ad5758
  * Wrong CONFIG variable used to control driver build.
- adi,ad7192
  * Fix repeated channel index by just expressing shorted channels
    as differential channel between a channel and itself.
- adi,ad74413
  * Fix error handling for resistance input processing to not fail
    in case of success.
- rohm,bu27034
  * Fix integration time in wrong units (should be seconds not usecs)
  * Ensure reset is actually written not detected as already set from
    regcache.
- gts helper
  * Fix wrong parameter docs.
  * Fix integration time in wrong units (should be seconds not usecs)
- fsl,imx8qxp-adc
  * Add missing vref-supply to binding doc (already used by driver)
- fsl,imx93
  * Fix sign bug in read_raw() so that error check didn't work.
- inv,icm42600
  * Fix reset of timestamp to work even if a particular sensor is off when
    the chip is first enabled.
- kionix,kx022a
  * Fix irq get form fw node to not include the 0 value.
- microchip,mcp4725
  * Fix return value from i2c_master_send() handling to nto assume 0 on
    success.
- mediatek,mt6370
  * Fix incorrect scaling of a few currents on devices with particular
    vendor IDs.
- fsl,mxs-lradc
  * Cleanup ordering issue fix.
- renesas,rcar-adc bindings
  * Fix missing vendor prefix for adi,ad7476
- st,st_accel
  * Fix handling when no ACPI _ONT method present.
- st,stm32-adc
  * Handle no adc-diff-channel present case (all single ended)
  * Handle no adc-channels present case (all differential)
- ti,palmas
  * Fix off by one bug that could allow out of bounds read if callers
    provided wrong value.
- ti,tmag5273
  * Fix a runtime PM leak on measurement error
- vishay,vcnl4035
  * Correctly mask chip ID so devices with different addresses
    don't fail the test.

* tag 'iio-fixes-for-6.4a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (23 commits)
  iio: imu: inv_icm42600: fix timestamp reset
  iio: adc: ad_sigma_delta: Fix IRQ issue by setting IRQ_DISABLE_UNLAZY flag
  dt-bindings: iio: adc: renesas,rcar-gyroadc: Fix adi,ad7476 compatible value
  iio: dac: mcp4725: Fix i2c_master_send() return value handling
  iio: accel: kx022a fix irq getting
  iio: bu27034: Ensure reset is written
  iio: dac: build ad5758 driver when AD5758 is selected
  iio: addac: ad74413: fix resistance input processing
  iio: light: vcnl4035: fixed chip ID check
  dt-bindings: iio: imx8qxp-adc: add missing vref-supply
  iio: adc: stm32-adc: skip adc-channels setup if none is present
  iio: adc: stm32-adc: skip adc-diff-channels setup if none is present
  iio: adc: ad7192: Change "shorted" channels to differential
  iio: accel: st_accel: Fix invalid mount_matrix on devices without ACPI _ONT method
  iio: gts-helpers: fix integration time units
  iio: bu27034: Fix integration time
  iio: fix doc for iio_gts_find_sel_by_int_time
  iio: adc: palmas: fix off by one bugs
  iio: adc: mxs-lradc: fix the order of two cleanup operations
  iio: ad4130: Make sure clock provider gets removed
  ...
2023-05-28 20:03:33 +01:00
Akihiro Suda 36e4fc57fc efi: Bump stub image version for macOS HVF compatibility
The macOS hypervisor framework includes a host-side VMM called
VZLinuxBootLoader [1] which implements native support for booting the
Linux kernel inside a guest directly (instead of, e.g., via GRUB
installed inside the guest). On x86, it incorporates a BIOS style loader
that does not implement or expose EFI to the loaded kernel. However,
this loader appears to fail when the 'image minor version' field in the
kernel image's PE/COFF header (which is generally only used by EFI based
bootloaders) is set to any value other than 0x0. [2]

Commit e346bebbd3 ("efi: libstub: Always enable initrd command
line loader and bump version") incremented the EFI stub image minor
version to convey that all EFI stub kernels now implement support for
the initrd= command line option, and do so in a way where it can load
initrd images from any filesystem known to the EFI firmware (as opposed
to prior implementations that could only load initrds from the same
volume that the kernel image was loaded from).

Unfortunately, bumping the version to v1.1 triggers this issue in
VZLinuxBootLoader, breaking the boot on x86. So let's keep the image
minor version at 0x0, and bump the image major version instead.

While at it, convert this field to a bit field, so that individual
features are discoverable from it, as suggested by Linus. So let's bump
the major version to v3, and document the initrd= command line loading
feature as being represented by bit 1 in the mask.

Note that, due to the prior interpretation as a monotonically increasing
version field, loaders are still permitted to assume that the LoadFile2
initrd loading feature is supported for any major version value >= 1,
even if bit 0 is not set.

[1] https://developer.apple.com/documentation/virtualization/vzlinuxbootloader
[2] https://lore.kernel.org/linux-efi/CAG8fp8Teu4G9JuenQrqGndFt2Gy+V4YgJ=hN1xX7AD940YKf3A@mail.gmail.com/

Fixes: e346bebbd3 ("efi: libstub: Always enable initrd command ...")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217485
Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com>
[ardb: rewrite comment and commit log]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-05-28 20:45:46 +02:00
Theodore Ts'o b3e6bcb945 ext4: add EA_INODE checking to ext4_iget()
Add a new flag, EXT4_IGET_EA_INODE which indicates whether the inode
is expected to have the EA_INODE flag or not.  If the flag is not
set/clear as expected, then fail the iget() operation and mark the
file system as corrupted.

This commit also makes the ext4_iget() always perform the
is_bad_inode() check even when the inode is already inode cache.  This
allows us to remove the is_bad_inode() check from the callers of
ext4_iget() in the ea_inode code.

Reported-by: syzbot+cbb68193bdb95af4340a@syzkaller.appspotmail.com
Reported-by: syzbot+62120febbd1ee3c3c860@syzkaller.appspotmail.com
Reported-by: syzbot+edce54daffee36421b4c@syzkaller.appspotmail.com
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230524034951.779531-2-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-28 14:18:03 -04:00
Linus Torvalds 7877cb91f1 Linux 6.4-rc4 2023-05-28 07:49:00 -04:00
Linus Torvalds f8b2507c26 A single fix for x86:
- Prevent a bogus setting for the number of HT siblings, which is caused
    by the CPUID evaluation trainwreck of X86. That recomputes the value
    for each CPU, so the last CPU "wins". That can cause completely bogus
    sibling values.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRzBxMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodfbEACp+oRBD2zIcmf8kpakMxIbH2CkDAxl
 AgqGHYVKAvsqKPQZhYmOFEsk+0isMzssuVaub9gMeLRmpMQxUqv7K20pluvWmuQm
 h7MTYEAvCZpBU2BGB1mixp57tQ/gpLSB4uaJU2Q5hh9po6uImvalIdqThf7ArxgA
 mm3BhtbFObLGwRaV2981zdsEs8Cjs9RusnAOrX5LbBBb6ibcnrqWy7DAhySMA0zr
 7RfaGyG/T9z6/BgX54FBQ7aDNU/RkZ8YYQoMj0kAFXdM3V+sa3oPfMRQpACPHKm4
 kwskJfWVdMq06kgOD7N9+WrOfBAgIsmzasT6VhTuGAGNo4CiEOthLqXDHwf4NLhN
 hj0XReHVyBiQ1KuI2lGNJ71c30KRkh7SCdGNiEFtnlpteXnS4bRnzWtfMf/+2SCC
 WOziRRYRzuAvvoTwoQS1BYJrDAB9f5jOX8FTYbC12rEk/RPOB6t4iFNLUzid7u1H
 yPYphoftXlQlli7EajVc5pSZCftMHRCzWernptiPFU5tMvyagYppcg25JY54rquC
 CTJQqoa/HPAEZhBjgZElwO9QBgu+VyJUq3R/Z+LbkQuz5z+PVDfqeDuDWhaqbcz1
 WGtdGdNo4bqVhwqgwyHjgKCFe05mF6tPuotnYZE/VSkXUCvLkjGEVg8HQ7WiPhJd
 IYDqxkIhX5h1Mg==
 =l43p
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu fix from Thomas Gleixner:
 "A single fix for x86:

   - Prevent a bogus setting for the number of HT siblings, which is
     caused by the CPUID evaluation trainwreck of X86. That recomputes
     the value for each CPU, so the last CPU "wins". That can cause
     completely bogus sibling values"

* tag 'x86-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
2023-05-28 07:42:05 -04:00