Commit Graph

873723 Commits

Author SHA1 Message Date
Rafael J. Wysocki 8362f4dbb9 intel_idle: Relocate definitions of cpuidle callbacks
commit 30a996fbb3 upstream.

Move the definitions of intel_idle() and intel_idle_s2idle() before
the definitions of cpuidle_state structures referring to them to
avoid having to use additional declarations of them (and drop those
declarations).

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Zhuo <sagazchen@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
2024-06-11 20:51:23 +08:00
Rafael J. Wysocki c26b45ea7d intel_idle: Clean up definitions of cpuidle callbacks
commit bc721c1e45 upstream.

Add proper kerneldoc descriptions to intel_idle() and
intel_idle_s2idle(), annotate the latter with __cpuidle and
reorder the declarations of local variables in both of them to
reflect the mwait_idle_with_hints() arguments order.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Zhuo <sagazchen@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
2024-06-11 20:51:23 +08:00
Rafael J. Wysocki 525c095aab intel_idle: Simplify LAPIC timer reliability checks
commit 40ab82e08d upstream.

The lapic_timer_reliable_states variable really takes only two values
and some arithmetic in intel_idle() related to comparing it with the
target C-state's MWAIT hint value is unnecessary.

Simplify the code by replacing lapic_timer_reliable_states with
a bool variable lapic_timer_always_reliable and dropping the
LAPIC_TIMER_ALWAYS_RELIABLE symbol along with the excess
computations in intel_idle().

While at it, add a comment explaining the branch taken in intel_idle()
if the LAPIC timer is only reliable in C1 and modify the related debug
message in intel_idle_init() accordingly (the modification of this
message is the only expected functional impact of the change made
here).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Zhuo <sagazchen@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
2024-06-11 20:51:23 +08:00
Rafael J. Wysocki 29626393d1 intel_idle: Introduce 'states_off' module parameter
commit 4dcb78ee57 upstream.

In certain system configurations it may not be desirable to use some
C-states assumed to be available by intel_idle and the driver needs
to be prevented from using them even before the cpuidle sysfs
interface becomes accessible to user space.  Currently, the only way
to achieve that is by setting the 'max_cstate' module parameter to a
value lower than the index of the shallowest of the C-states in
question, but that may be overly intrusive, because it effectively
makes all of the idle states deeper than the 'max_cstate' one go
away (and the C-state to avoid may be in the middle of the range
normally regarded as available).

To allow that limitation to be overcome, introduce a new module
parameter called 'states_off' to represent a list of idle states to
be disabled by default in the form of a bitmask and update the
documentation to cover it.
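The following C sketch contrasts the two knobs. It is a hypothetical model, not the driver's code. It assumes bit N of 'states_off' maps to the idle state with index N, and the struct and helper names are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical idle-state descriptor; only the field used here. */
struct idle_state {
	const char *name;
	bool disabled_by_default;
};

/*
 * 'states_off'-style mask: disable exactly the states whose index bit
 * is set (assumed mapping: bit N <-> state index N).
 */
static void apply_states_off(struct idle_state *states, size_t count,
			     unsigned int states_off)
{
	for (size_t i = 0; i < count && i < 32; i++)
		states[i].disabled_by_default = (states_off >> i) & 1u;
}

/* 'max_cstate'-style cut-off for comparison: every deeper state goes away. */
static void apply_max_cstate(struct idle_state *states, size_t count,
			     size_t max_cstate)
{
	for (size_t i = 0; i < count; i++)
		states[i].disabled_by_default = i > max_cstate;
}
```

Under that assumption, states_off=0x4 disables only the state at index 2 and leaves the deeper state 3 usable. Limiting max_cstate to 1 removes both.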

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Zhuo <sagazchen@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
2024-06-11 20:51:22 +08:00
Rafael J. Wysocki b84c30be8f intel_idle: Introduce 'use_acpi' module parameter
commit 3a5be9b8f4 upstream.

For diagnostics, it is generally useful to be able to make intel_idle
take the system's ACPI tables into consideration even if that is not
required for the processor model in there, so introduce a new module
parameter, 'use_acpi', to make that happen and update the documentation
to cover it.

While at it, fix the 'no_acpi' module parameter name in the
documentation.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Zhuo <sagazchen@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
2024-06-11 20:51:22 +08:00
Rafael J. Wysocki 25bca34f26 intel_idle: Clean up irtl_2_usec()
commit 86e9466ae6 upstream.

Move the irtl_ns_units[] definition into irtl_2_usec() which is the
only user of it, use div_u64() for the division in there (as the
divisor is small enough) and use the NSEC_PER_USEC symbol for the
divisor.  Also convert the irtl_2_usec() comment to a proper
kerneldoc one.
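For reference, the conversion can be approximated in userspace C as shown below. The sketch assumes an IRTL layout with a 10-bit time value in bits 9:0 and a 3-bit unit selector in bits 12:10, using the unit table shown. Plain 64-bit division stands in for the kernel-only div_u64(), and the __init/__initconst annotations are omitted.

```c
#define NSEC_PER_USEC 1000ULL

/* Translate an IRTL (Interrupt Response Time Limit) MSR value to microseconds. */
static unsigned long long irtl_2_usec(unsigned long long irtl)
{
	/* Nanoseconds per time unit, indexed by the unit selector (assumed table). */
	static const unsigned int irtl_ns_units[] = {
		1, 32, 1024, 32768, 1048576, 33554432, 0, 0
	};
	unsigned long long ns;

	if (!irtl)
		return 0;

	/* Bits 12:10 select the time unit. */
	ns = irtl_ns_units[(irtl >> 10) & 0x7];

	/* Bits 9:0 hold the time value: scale to ns, then divide down to us. */
	return (irtl & 0x3FF) * ns / NSEC_PER_USEC;
}
```

For example, a value of 100 with unit selector 1 (32 ns units) gives 3200 ns, which truncates to 3 us.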

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Zhuo <sagazchen@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
2024-06-11 20:51:21 +08:00
Rafael J. Wysocki 7d0d3657be intel_idle: Move 3 functions closer to their callers
commit 1aefbd7aeb upstream.

Move intel_idle_verify_cstate(), auto_demotion_disable() and
c1e_promotion_disable() closer to their callers.

While at it, annotate intel_idle_verify_cstate() with __init,
as it is only used during the initialization of the driver.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Zhuo <sagazchen@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
2024-06-11 20:51:21 +08:00
Rafael J. Wysocki 3796f4ee32 intel_idle: Annotate initialization code and data structures
commit 095928ae48 upstream.

Annotate the functions that are only used at the initialization time
with __init and the data structures used by them with __initdata or
__initconst.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Zhuo <sagazchen@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
2024-06-11 20:51:21 +08:00
Rafael J. Wysocki af099e57e6 intel_idle: Rearrange intel_idle_cpuidle_driver_init()
commit 3d3a1ae9b4 upstream.

Notice that intel_idle_state_table_update() only needs to be called
if icpu is not NULL, so fold it into intel_idle_init_cstates_icpu(),
and pass a pointer to the driver object to
intel_idle_cpuidle_driver_init() as an argument instead of
referencing it locally in there.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Zhuo <sagazchen@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
2024-06-11 20:51:20 +08:00
Rafael J. Wysocki c82844ca5f intel_idle: Fold intel_idle_probe() into intel_idle_init()
commit a6c86e3362 upstream.

There is no particular reason why intel_idle_probe() needs to be
a separate function and folding it into intel_idle_init() causes
the code to be somewhat easier to follow, so do just that.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Zhuo <sagazchen@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
2024-06-11 20:51:20 +08:00
Rafael J. Wysocki cd6efdfb7c intel_idle: Eliminate __setup_broadcast_timer()
commit cbd2c4c25d upstream.

The __setup_broadcast_timer() static function is only called in one
place and "true" is passed to it as the argument in there, so
effectively it is a wrapper around tick_broadcast_enable().

To simplify the code, call tick_broadcast_enable() directly instead
of __setup_broadcast_timer() and drop the latter.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Zhuo <sagazchen@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
2024-06-11 20:51:20 +08:00
Menglong Dong ed011e007b net: tcp: add sysctl_tcp_wnd_shrink
Add 'sysctl_tcp_wnd_shrink' to enable or disable TCP window
shrinking. It is disabled by default.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
2024-06-11 20:51:19 +08:00
mengensun e1bf1991a5 net/tcp: switch to GSO being always on
When GSO is enabled, TCP write queues have less overhead, which makes
some applications run faster.

Tested with redis-benchmark as follows:

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Mengen Sun <mengensun@tencent.com>
2024-06-11 20:51:19 +08:00
Menglong Dong f0d423d51c net: tcp: raise zero-window probe without check wnd_end
In the original logic, a zero-window probe can be raised not only on a
zero window but also in other cases, such as when an MTU probe fails.

Therefore, tcp_probe0_needed() needs to be modified to stay compatible
with the original logic.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
2024-06-11 20:51:19 +08:00
Linus Torvalds 980a335360 mm: make wait_on_page_writeback() wait for multiple pending writebacks
upstream commit: c2407cf7d2

Ever since commit 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common()
logic") we've had some very occasional reports of BUG_ON(PageWriteback)
in write_cache_pages(), which we thought we already fixed in commit
073861ed77 ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").

But syzbot just reported another one, even with that commit in place.

And it turns out that there's a simpler way to trigger the BUG_ON() than
the one Hugh found with page re-use.  It all boils down to the fact that
the page writeback is ostensibly serialized by the page lock, but that
isn't actually really true.

Yes, the people _setting_ writeback all do so under the page lock, but
the actual clearing of the bit - and waking up any waiters - happens
without any page lock.

This gives us this fairly simple race condition:

  CPU1 = end previous writeback
  CPU2 = start new writeback under page lock
  CPU3 = write_cache_pages()

  CPU1          CPU2            CPU3
  ----          ----            ----

  end_page_writeback()
    test_clear_page_writeback(page)
    ... delayed...

                lock_page();
                set_page_writeback()
                unlock_page()

                                lock_page()
                                wait_on_page_writeback();

    wake_up_page(page, PG_writeback);
    .. wakes up CPU3 ..

                                BUG_ON(PageWriteback(page));

where the BUG_ON() happens because we woke up the PG_writeback bit
because of the _previous_ writeback, but a new one had already been
started because the clearing of the bit wasn't actually atomic wrt the
actual wakeup or serialized by the page lock.

The reason this didn't use to happen was that the old logic in waiting
on a page bit would just loop if it ever saw the bit set again.

The nice proper fix would probably be to get rid of the whole "wait for
writeback to clear, and then set it" logic in the writeback path, and
replace it with an atomic "wait-to-set" (ie the same as we have for page
locking: we set the page lock bit with a single "lock_page()", not with
"wait for lock bit to clear and then set it").

However, our current model for writeback is that the waiting for the
writeback bit is done by the generic VFS code (ie write_cache_pages()),
but the actual setting of the writeback bit is done much later by the
filesystem ".writepages()" function.

IOW, to make the writeback bit have that same kind of "wait-to-set"
behavior as we have for page locking, we'd have to change our roughly
~50 different writeback functions.  Painful.

Instead, just make "wait_on_page_writeback()" loop on the very unlikely
situation that the PG_writeback bit is still set, basically re-instating
the old behavior.  This is very non-optimal in case of contention, but
since we only ever set the bit under the page lock, that situation is
controlled.
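The fix relies on a general rule for any sleep/wakeup protocol: a wakeup only means the condition may have changed, so the waiter must recheck it. The sketch below illustrates this with POSIX threads in userspace; it is an analogy, not the kernel's implementation. The fake_* names are hypothetical, and a mutex plus a condition variable stand in for the page wait queue.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page and its PG_writeback bit. */
struct fake_page {
	pthread_mutex_t lock;	/* protects 'writeback' */
	pthread_cond_t wq;	/* tasks waiting for writeback to end */
	bool writeback;
};

static void fake_set_writeback(struct fake_page *p)
{
	pthread_mutex_lock(&p->lock);
	p->writeback = true;
	pthread_mutex_unlock(&p->lock);
}

static void fake_end_writeback(struct fake_page *p)
{
	pthread_mutex_lock(&p->lock);
	p->writeback = false;
	pthread_cond_broadcast(&p->wq);	/* wake every waiter */
	pthread_mutex_unlock(&p->lock);
}

/*
 * Loop while the bit is set instead of trusting a single wakeup: by the
 * time the waiter runs, a new writeback may already have been started.
 */
static void fake_wait_on_writeback(struct fake_page *p)
{
	pthread_mutex_lock(&p->lock);
	while (p->writeback)
		pthread_cond_wait(&p->wq, &p->lock);
	pthread_mutex_unlock(&p->lock);
}

/* Helper thread body: finish the pending writeback. */
static void *finisher(void *arg)
{
	fake_end_writeback(arg);
	return NULL;
}
```

Suppose another thread calls fake_set_writeback() between the broadcast and the waiter reacquiring the mutex. The waiter then sees the bit set again and goes back to sleep, which is the looping behavior the patch restores.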

Reported-by: syzbot+2fc0712f8f8b8b8fa0ef@syzkaller.appspotmail.com
Fixes: 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common() logic")
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:51:18 +08:00
Paolo Bonzini e31375a771 KVM: Do not leak memory for duplicate debugfs directories
commit 85cd39af14 upstream.

KVM creates a debugfs directory for each VM in order to store statistics
about the virtual machine.  The directory name is built from the process
pid and a VM fd.  While generally unique, it is possible to keep a
file descriptor alive in a way that causes duplicate directories, which
manifests as these messages:

  [  471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present!

Even though this should not happen in practice, it is more or less
expected in the case of KVM for testcases that call KVM_CREATE_VM and
close the resulting file descriptor repeatedly and in parallel.

When this happens, debugfs_create_dir() returns an error but
kvm_create_vm_debugfs() goes on to allocate stat data structs which are
later leaked.  The slow memory leak was spotted by syzkaller, where it
caused OOM reports.

Since the issue only affects debugfs, do a lookup before calling
debugfs_create_dir, so that the message is downgraded and rate-limited.
While at it, ensure kvm->debugfs_dentry is NULL rather than an error
if it is not created.  This fixes kvm_destroy_vm_debugfs, which was not
checking IS_ERR_OR_NULL correctly.
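The fix has two parts: look up the name before creating the directory, and leave the dentry NULL instead of storing an error value. Both can be modelled in userspace C. Everything below is hypothetical. A small name table stands in for the 'kvm' debugfs directory, whereas the kernel uses a debugfs lookup and debugfs_create_dir() rather than these helpers.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DIRS 8

/* Hypothetical registry standing in for the 'kvm' debugfs directory. */
static char dirs[MAX_DIRS][32];
static int ndirs;

static bool dir_exists(const char *name)
{
	for (int i = 0; i < ndirs; i++)
		if (strcmp(dirs[i], name) == 0)
			return true;
	return false;
}

struct fake_vm {
	const char *debugfs_dentry;	/* NULL when no directory was created */
	int *stat_data;			/* allocated only together with a directory */
};

/* Returns 0 both when the directory is created and when it is a duplicate. */
static int create_vm_debugfs(struct fake_vm *vm, int pid, int fd)
{
	char name[32];

	vm->debugfs_dentry = NULL;
	vm->stat_data = NULL;

	/* "<pid>-<fd>" naming, e.g. "20245-4". */
	snprintf(name, sizeof(name), "%d-%d", pid, fd);

	/* Look up first: on a duplicate, allocate nothing, so nothing can leak. */
	if (dir_exists(name) || ndirs == MAX_DIRS)
		return 0;

	strcpy(dirs[ndirs], name);
	vm->debugfs_dentry = dirs[ndirs++];
	vm->stat_data = calloc(4, sizeof(int));
	return vm->stat_data ? 0 : -1;
}

static void destroy_vm_debugfs(struct fake_vm *vm)
{
	/* A plain NULL check suffices because the dentry is never an error value. */
	if (!vm->debugfs_dentry)
		return;
	free(vm->stat_data);
	vm->stat_data = NULL;
	vm->debugfs_dentry = NULL;	/* registry entry kept; not modelled here */
}
```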

Cc: stable@vger.kernel.org
Fixes: 536a6f88c4 ("KVM: Create debugfs dir and stat files for each VM")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:51:18 +08:00
Andrew Sy Kim f2f46e7af4 ipvs: queue delayed work to expire no destination connections if expire_nodest_conn=1
[upstream commit 35dfb01314]

When expire_nodest_conn=1 and a destination is deleted, IPVS does not
expire the existing connections until the next matching incoming packet.
If there are many connection entries from a single client to a single
destination, many packets may get dropped before all the connections are
expired (more likely with lots of UDP traffic). An optimization can be
made where upon deletion of a destination, IPVS queues up delayed work
to immediately expire any connections with a deleted destination. This
ensures any reused source ports from a client (within the IPVS timeouts)
are scheduled to new real servers instead of silently dropped.
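Below is a simplified model of what the queued work does. It assumes a flat array of connections and a per-destination 'available' flag. The real IPVS connection table, locking and timers are not modelled, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for IPVS destinations and connections. */
struct dest {
	bool available;		/* cleared when the destination is deleted */
};

struct conn {
	struct dest *dest;
	bool expired;
};

static int expire_nodest_conn = 1;	/* mirrors the sysctl */

/*
 * Walk the table once and expire every connection whose destination is
 * gone, instead of waiting for each connection's next packet to notice.
 * Returns the number of connections expired.
 */
static size_t expire_nodest_conns(struct conn *tab, size_t n)
{
	size_t expired = 0;

	for (size_t i = 0; i < n; i++) {
		if (!tab[i].expired && tab[i].dest && !tab[i].dest->available) {
			tab[i].expired = true;
			expired++;
		}
	}
	return expired;
}

/* Deleting a destination triggers the flush (delayed work in the real change). */
static size_t delete_dest(struct dest *d, struct conn *tab, size_t n)
{
	d->available = false;
	return expire_nodest_conn ? expire_nodest_conns(tab, n) : 0;
}
```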

Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-11 20:51:18 +08:00
Menglong Dong 0fe440f03b net: tcp: handle window shrink properly
Window shrinking is not allowed and also not handled for now, but it
is needed in some cases.

In the original logic, the zero-window probe is triggered only when
there is no data in the retransmit queue and the receive window cannot
hold the data of the first packet in the send queue.

Now, change it to trigger the zero-window probe in the following cases:

- the retransmit queue has data and its first packet is not within
  the receive window
- there is no data in the retransmit queue and the first packet in the
  send queue is beyond the end of the receive window
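The two cases can be expressed as a single predicate. This is a hypothetical sketch, not the patch's tcp_probe0_needed(). It assumes that "within the receive window" means the segment's end sequence does not go past snd_una + snd_wnd. It uses wraparound-safe comparisons in the style of the kernel's before()/after() helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe 32-bit sequence comparisons. */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

static bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return seq_before(seq2, seq1);
}

/* Hypothetical minimal view of a queued segment. */
struct seg {
	uint32_t seq;
	uint32_t end_seq;
};

/*
 * rtx_head:  first segment of the retransmit queue, NULL if empty
 * send_head: first unsent segment, NULL if none
 * wnd_end:   right edge of the peer's receive window (snd_una + snd_wnd)
 */
static bool probe0_needed(const struct seg *rtx_head,
			  const struct seg *send_head, uint32_t wnd_end)
{
	if (rtx_head)	/* case 1: the window shrank below data already sent */
		return seq_after(rtx_head->end_seq, wnd_end);
	if (send_head)	/* case 2: nothing in flight and the next segment does not fit */
		return seq_after(send_head->end_seq, wnd_end);
	return false;
}
```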

Signed-off-by: Menglong Dong <imagedong@tencent.com>
2024-06-11 20:51:17 +08:00
Menglong Dong bb00f4ca4c net: tcp: send zero-window when no memory
For now, an skb is dropped when there is no memory, which makes the
client keep retransmitting until timeout, and that is not friendly to
users.

Therefore, force the current socket to receive one packet when the
protocol memory exceeds its limit. This socket then stays in the
'no mem' state until protocol memory becomes available.

When a socket is in the 'no mem' state, its receive window becomes
0, which means window shrinking happens. The sender needs to handle
such window shrinking properly, which is done in the next commit.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
2024-06-11 20:51:17 +08:00
caelli 1ea94d5505 driver: update e1000e to 3.8.4
The e1000e driver is updated to 3.8.4 on x86; arm64 still
uses 3.2.6.

Signed-off-by: caelli <caelli@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:51:17 +08:00
Liu Chun 73b70ea3f0 kdump: the capture kernel can't use dma memory
On arm64 systems with only a little memory below 4G, the capture
kernel cannot use DMA memory. Therefore, enable CONFIG_EXEC_FILE and
fix the handling of reserved memory so that low memory is passed to
the kdump kernel.

Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Liu Chun <kaicliu@tencent.com>
2024-06-11 20:51:11 +08:00
Liu Chun d151d105a1 drm: Fixed system hang caused by memory failure
When DMA memory is insufficient, resources are released incorrectly,
which causes the system to hang.

[   35.975823] [TTM] Initializing pool allocator
[   35.980166] [TTM] Initializing DMA pool allocator
[   35.984864] [drm:hibmc_mm_init [hibmc_drm]] *ERROR* Error initializing VRAM MM; -12
[   35.992517] ------------[ cut here ]------------
[   35.997154] WARNING: CPU: 0 PID: 116 at drivers/gpu/drm/drm_modeset_lock.c:266 drm_modeset_lock+0xd8/0xf8 [drm]
[   36.007192] Modules linked in: hibmc_drm(+) drm_vram_helper ttm drm_kms_helper drm autofs4 overlay squashfs
[   36.016890] CPU: 0 PID: 116 Comm: kworker/0:2 Not tainted 5.4.119-0.20230227git9d7d3558a64d.19 #1
[   36.025719] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.05 09/18/2019
[   36.033173] Workqueue: events work_for_cpu_fn
[   36.037510] pstate: a0800009 (NzCv daif -PAN +UAO)
[   36.042297] pc : drm_modeset_lock+0xd8/0xf8 [drm]
[   36.046995] lr : drm_modeset_lock+0x44/0xf8 [drm]
[   36.051676] sp : ffff80005462fc30
[   36.054974] x29: ffff80005462fc30 x28: 0000000000000000
[   36.060260] x27: ffff2057ebe20000 x26: 0000000000000000
[   36.065546] x25: 0000000000000000 x24: ffff80004cf6f8e8
[   36.070833] x23: 0000000000000000 x22: ffff2057f4739800
[   36.076119] x21: ffff800049803908 x20: ffff2057f4739998
[   36.081405] x19: ffff80005462fcc0 x18: 0000000000000010
[   36.086690] x17: 0000000000000000 x16: ffff800048b61b88
[   36.091976] x15: ffffffffffffffff x14: 204d41525620676e
[   36.097261] x13: 697a696c61697469 x12: 6e6920726f727245
[   36.102547] x11: 202a524f5252452a x10: 205d5d6d72645f63
[   36.107832] x9 : ffff800048b61bcc x8 : ffff800048703a60
[   36.113118] x7 : 065448]  work_for_cpu_fn+0x20/0x30
[   36.169181]  process_one_work+0x1f8/0x488
[   36.173173]  worker_thread+0x248/0x528
[   36.176906]  kthread+0x124/0x128
[   36.180121]  ret_from_fork+0x10/0x18
[   36.183679] ---[ end trace aae0476f91651f5d ]---
[   36.188284] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
[   36.197028] Mem abort info:
[   36.199809]   ESR = 0x96000005
[   36.202851]   EC = 0x25: DABT (current EL), IL = 32 bits
[   36.208137]   SET = 0, FnV = 0
[   36.211176]   EA = 0, S1PTW = 0
[   36.214303] Data abort info:
[   36.217172]   ISV = 0, ISS = 0x00000005
[   36.220990]   CM = 0, WnR = 0
[   36.223946] user pgtable: 64k pages, 48-bit VAs, pgdp=00002057df5a0600
[   36.230444] [0000000000000018] pgd=0000000000000000, pud=0000000000000000
[   36.237200] Internal error: Oops: 96000005 [#1] SMP
[   36.242056] Modules linked in: hibmc_drm(+) drm_vram_helper ttm drm_kms_helper drm autofs4 overlay squashfs
[   36.251751] CPU: 0 PID: 116 Comm: kworker/0:2 Tainted: G        W         5.4.119-0.20230227git9d7d3558a64d.19 #1
[   36.261963] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.05 09/18/2019
[   36.269411] Workqueue: events work_for_cpu_fn
[   36.273747] pstate: a0800009 (NzCv daif -PAN +UAO)
[   36.278518] pc : ww_mutex_lock+0x2c/0x70
[   36.282439] lr : drm_modeset_lock+0x44/0xf8 [drm]
[   36.287120] sp : ffff80005462fc20
[   36.290418] x29: ffff80005462fc20 x28: 0000000000000000
[   36.295704] x27: ffff2057ebe20000 x26: 0000000000000000
[   36.300989] x25: 0000000000000000 x24: ffff80004cf6f8e8
[   36.306274] x23: 0000000000000000 x22: ffff2057f4739800
[   36.311560] x21: ffff2057f4739af8 x20: 0000000000000018
[   36.316846] x19: ffff80005462fcc0 x18: 0000000000000010
[   36.322131] x17: 0000000000000000 x16: ffff800048b61b88
[   36.327418] x15: ffffffffffffffff x14: 204d41525620676e
[   36.332703] x13: 697a696c61697469 x12: 6e6920726f727245
[   36.337989] x11: 202a524f5252452a x10: 205d5d6d72645f63
[   36.343274] x9 : ffff800008ce4594 x8 : ffff800048703a60
[   36.348560] x7 : 0000000000000469 x6 : ffff80004998e5e6
[   36.353845] x5 : 0000000000000000 x4 : ffff80005462fcc0
[   36.359131] x3 : 0000000000000018 x2 : ffff2057ebe30000
[   36.364417] x1 : 0000000000000000 x0 : 0000000000000018
[   36.369703] Call trace:
[   36.372138]  ww_mutex_lock+0x2c/0x70
[   36.375712]  drm_modeset_lock+0x44/0xf8 [drm]
[   36.380064]  drm_modeset_lock_all_ctx+0x68/0xf8 [drm]
[   36.385100]  drm_atomic_helper_shutdown+0x54/0xd0 [drm_kms_helper]
[   36.391251]  hibmc_unload+0x2c/0xa8 [hibmc_drm]
[   36.395762]  hibmc_pci_probe+0x318/0x430 [hibmc_drm]
[   36.400703]  local_pci_probe+0x44/0xa8
[   36.404435]  work_for_cpu_fn+0x20/0x30
[   36.408167]  process_one_work+0x1f8/0x488
[   36.412158]  worker_thread+0x248/0x528
[   36.415890]  kthread+0x124/0x128
[   36.419103]  ret_from_fork+0x10/0x18
[   36.422662] Code: d503201f d503201f d2800001 aa0103e5 (c8e57c02)
[   36.428727] ---[ end trace aae0476f91651f5e ]---
[   37.169300] systemd-udevd[307]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.

Signed-off-by: Chun Liu <kaicliu@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:51:02 +08:00
Kairui Song 1b7d9fa70f arm64: kexec_file: add crash dump support
Upstream: 3751e728ce
Link: 40e94ab32e

commit 3751e728ce
Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
Date:   Mon Dec 16 11:12:47 2019 +0900

    arm64: kexec_file: add crash dump support

    Enabling crash dump (kdump) includes
    * prepare contents of ELF header of a core dump file, /proc/vmcore,
      using crash_prepare_elf64_headers(), and
    * add two device tree properties, "linux,usable-memory-range" and
      "linux,elfcorehdr", which represent respectively a memory range
      to be used by crash dump kernel and the header's location

    Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Reviewed-by: James Morse <james.morse@arm.com>
    Tested-and-reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
    Signed-off-by: Will Deacon <will@kernel.org>

Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:51:02 +08:00
Kairui Song b7e9b568c2 libfdt: include fdt_addresses.c
Upstream: c273a2bd8a
Link: 887436bdb7

commit c273a2bd8a
Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
Date:   Mon Dec 9 12:03:44 2019 +0900

    libfdt: include fdt_addresses.c

    In the implementation of kexec_file_loaded-based kdump for arm64,
    fdt_appendprop_addrrange() will be needed.

    So include fdt_addresses.c in making libfdt.

    Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
    Cc: Rob Herring <robh+dt@kernel.org>
    Cc: Frank Rowand <frowand.list@gmail.com>
    Signed-off-by: Will Deacon <will@kernel.org>

Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:51:02 +08:00
Kairui Song 6fb78c4cc6 arm64: kdump: remove dependency on arm64_dma32_phys_limit
From: Yi Li <adamliyi@msn.com>
Link: 696027f109

The patch b2da6ad294 ("arm64: kdump: reimplement crashkernel=X")
depends on commit 1a8e1cef76 ("arm64: use both ZONE_DMA and ZONE_DMA32").

Commit 1a8e1cef76 has not been ported to the 5.4 kernel, so use
arm64_dma_phys_limit instead.

Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:51:01 +08:00
Kairui Song fd02a1b5bc kdump: update Documentation about crashkernel
From: Chen Zhou <chenzhou10@huawei.com>
Link: https://lkml.org/lkml/2021/1/30/53
Link: 023deaec32

For arm64, the behavior of crashkernel=X has been changed: it tries
low allocation in the DMA zone (or the DMA32 zone if CONFIG_ZONE_DMA
is disabled) and falls back to high allocation if that fails.

We can also use "crashkernel=X,high" to select a high region above the
DMA zone, which also tries to allocate at least 256M of low memory in
the DMA zone automatically (or the DMA32 zone if CONFIG_ZONE_DMA is disabled).

"crashkernel=Y,low" can be used to allocate low memory of a specified size.

So update the Documentation.
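The three forms of the parameter can be illustrated with a small parser. This is a hypothetical sketch of the syntax only: a size with an optional K/M/G suffix, followed by an optional ",high" or ",low". The kernel's own command-line parsing accepts more forms and is not reproduced here.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum ck_kind { CK_DEFAULT, CK_HIGH, CK_LOW };

/* Parse the value part of "crashkernel=" (hypothetical helper). */
static bool parse_crashkernel_arg(const char *arg, unsigned long long *size,
				  enum ck_kind *kind)
{
	char *end;
	unsigned long long val = strtoull(arg, &end, 0);

	if (end == arg)
		return false;	/* no number at all */

	/* Apply the size suffix: each step multiplies by 1024. */
	switch (*end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		end++;
		break;
	default:
		break;
	}

	if (*end == '\0')
		*kind = CK_DEFAULT;
	else if (strcmp(end, ",high") == 0)
		*kind = CK_HIGH;
	else if (strcmp(end, ",low") == 0)
		*kind = CK_LOW;
	else
		return false;

	*size = val;
	return true;
}
```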

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:51:01 +08:00
Kairui Song 3fd41ff677 arm64: kdump: add memory for devices by DT property linux,usable-memory-range
From: Chen Zhou <chenzhou10@huawei.com>
Link: https://lkml.org/lkml/2021/1/30/53
Link: 2012a3b392

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices and never mapped by the first kernel.
This memory range is advertised to crash dump kernel via DT property
under /chosen,
	linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>

We reused the DT property linux,usable-memory-range and made the low
memory region the second range "BASE2 SIZE2", which keeps compatibility
with existing user space and older kdump kernels.

The crash dump kernel reads this property at boot time and calls
memblock_add() to add the low memory region after
memblock_cap_memory_range() has been called.
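The property value is a sequence of big-endian 32-bit cells. The hypothetical helper below appends one <BASE SIZE> pair, assuming #address-cells = <2> and #size-cells = <2>. In the real code, libfdt's fdt_appendprop_addrrange() does this job and takes the cell counts from the tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value big-endian, as flattened device tree cells require. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Append one <base size> pair using two cells for each value.
 * 'buf' must have 16 free bytes at offset 'len'; returns the new length.
 */
static size_t append_range(uint8_t *buf, size_t len, uint64_t base,
			   uint64_t size)
{
	put_be32(buf + len + 0, (uint32_t)(base >> 32));
	put_be32(buf + len + 4, (uint32_t)base);
	put_be32(buf + len + 8, (uint32_t)(size >> 32));
	put_be32(buf + len + 12, (uint32_t)size);
	return len + 16;
}
```

Appending the main (high) range first and the low range second produces the "BASE1 SIZE1 BASE2 SIZE2" layout described above.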

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:51:01 +08:00
Kairui Song d001dccf2b x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config
From: Chen Zhou <chenzhou10@huawei.com>
Link: https://lkml.org/lkml/2021/1/30/53
Link: c8013ee6cd

We make the functions reserve_crashkernel[_low]() generic for
x86 and arm64. Since reserve_crashkernel[_low]() implementations
are quite similar on other architectures as well, there may be more
users of this later.

So add CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL to arch/Kconfig and
select it for X86 and ARM64.

Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:51:00 +08:00
Kairui Song bd482067c3 arm64: kdump: reimplement crashkernel=X
From: Chen Zhou <chenzhou10@huawei.com>
Link: https://lkml.org/lkml/2021/1/30/53
Link: 70e586365f

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve the crash kernel below 4G, which
will fail when there is not enough low memory.
2. If the crash kernel is reserved above 4G, the crash dump kernel
will fail to boot because there is no low memory available for
allocation.
3. Since commit 1a8e1cef76 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
allocations by devices in the crash dump kernel that need ZONE_DMA
will fail.

To solve these issues, change the behavior of crashkernel=X and
introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
in the DMA zone (or the DMA32 zone if CONFIG_ZONE_DMA is disabled), and
falls back to high allocation if it fails.
We can also use "crashkernel=X,high" to select a region above the DMA
zone, which also tries to allocate at least 256M in the DMA zone
automatically (or the DMA32 zone if CONFIG_ZONE_DMA is disabled).
"crashkernel=Y,low" can be used to allocate low memory of a specified size.
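The fallback policy can be modelled with a toy allocator that only tracks free bytes below and above the low-memory limit. This is a hypothetical sketch. It assumes a low region is reserved whenever the main region ends up high, including after the crashkernel=X fallback. It also performs no rollback on failure, which real code would need.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_LOW_SIZE (256ULL << 20)	/* "at least 256M" from the text */

/* Toy model: free bytes below and above the low-memory limit. */
struct toy_mem {
	uint64_t low_free;
	uint64_t high_free;
};

struct crash_res {
	uint64_t main_size;	/* main crash kernel region */
	bool main_is_high;	/* placed above the low limit? */
	uint64_t low_size;	/* extra "Crash kernel (low)" region, 0 if none */
};

static bool take(uint64_t *pool, uint64_t size)
{
	if (*pool < size)
		return false;
	*pool -= size;
	return true;
}

/*
 * want_high models crashkernel=X,high; low_req models crashkernel=Y,low
 * (0 means "not given").  Plain crashkernel=X tries low memory first.
 */
static bool reserve_crash(struct toy_mem *m, uint64_t size, bool want_high,
			  uint64_t low_req, struct crash_res *r)
{
	r->main_size = size;
	r->low_size = 0;
	r->main_is_high = want_high || !take(&m->low_free, size);

	if (!r->main_is_high)
		return true;		/* fitted in low memory */
	if (!take(&m->high_free, size))
		return false;

	/* High placement: also reserve low memory for devices. */
	r->low_size = low_req ? low_req : DEFAULT_LOW_SIZE;
	return take(&m->low_free, r->low_size);
}
```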

Another minor change: there may be two regions reserved for the crash
dump kernel, so, to distinguish the low region from the high one without
affecting existing kexec-tools, rename the low region
"Crash kernel (low)".

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:51:00 +08:00
Kairui Song f30355b620 arm64: kdump: introduce some macroes for crash kernel reservation
From: Chen Zhou <chenzhou10@huawei.com>
Link: https://lkml.org/lkml/2021/1/30/53
Link: 667118f8c1

Introduce the macro CRASH_ALIGN for alignment, CRASH_ADDR_LOW_MAX for
the upper bound of low crash memory and CRASH_ADDR_HIGH_MAX for the
upper bound of high crash memory, and use these macros instead.

Also, for consistency with x86, use CRASH_ALIGN as the lower bound
of the crash kernel reservation.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:51:00 +08:00
Kairui Song 21ff8ff8f3 x86/elf: Move vmcore_elf_check_arch_cross to arch/x86/include/asm/elf.h
From: Chen Zhou <chenzhou10@huawei.com>
Link: https://lkml.org/lkml/2021/1/30/53
Link: b332ab8970

Move macro vmcore_elf_check_arch_cross from arch/x86/include/asm/kexec.h
to arch/x86/include/asm/elf.h to fix the following compilation warning:

In file included from arch/x86/kernel/setup.c:39:0:
./arch/x86/include/asm/kexec.h:77:0: warning: "vmcore_elf_check_arch_cross" redefined
 # define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)

In file included from arch/x86/kernel/setup.c:9:0:
./include/linux/crash_dump.h:39:0: note: this is the location of the previous definition
 #define vmcore_elf_check_arch_cross(x) 0

The root cause is that vmcore_elf_check_arch_cross under CONFIG_CRASH_CORE
depends on CONFIG_KEXEC_CORE. Commit 532b66d2279d ("x86: kdump: move
reserve_crashkernel[_low]() into crash_core.c") triggered the issue.

As suggested by Mike, simply move vmcore_elf_check_arch_cross from
arch/x86/include/asm/kexec.h to arch/x86/include/asm/elf.h to fix
the warning.
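The header-ordering issue can be reproduced in a single C file. The sketch assumes the generic header guards its fallback definition with #ifndef, so the architecture's definition must be visible before that point. The C library's <elf.h> supplies Elf64_Ehdr and the EM_* constants; the wrapper function is hypothetical.

```c
#include <elf.h>
#include <stdbool.h>

/* Stand-in for arch/x86/include/asm/elf.h: the arch override comes first. */
#define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)

/*
 * Stand-in for the generic default in include/linux/crash_dump.h.  It is
 * used only if no arch header defined the macro earlier.  If the arch
 * #define instead came from a header included after this point (as
 * kexec.h was), it would redefine the macro and trigger the warning.
 */
#ifndef vmcore_elf_check_arch_cross
#define vmcore_elf_check_arch_cross(x) 0
#endif

/* Hypothetical caller: does the vmcore ELF header pass the cross-arch check? */
static bool accepts_cross_arch(const Elf64_Ehdr *ehdr)
{
	return vmcore_elf_check_arch_cross(ehdr);
}
```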

Fixes: 532b66d2279d ("x86: kdump: move reserve_crashkernel[_low]() into crash_core.c")
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:50:59 +08:00
Kairui Song 3177fa46ec x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
From: Chen Zhou <chenzhou10@huawei.com>
Link: https://lkml.org/lkml/2021/1/30/53
Link: 8cb8686864

Make the functions reserve_crashkernel[_low]() generic.
Arm64 will use these to reimplement crashkernel=X.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:50:59 +08:00
Kairui Song e5eac006f6 x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()
From: Chen Zhou <chenzhou10@huawei.com>
Link: https://lkml.org/lkml/2021/1/30/53
Link: 8ec4a816f2

We will make reserve_crashkernel() generic, but the xen_pv_domain()
check in reserve_crashkernel() is relevant only to x86, as is
insert_resource() in reserve_crashkernel[_low](). So move the
xen_pv_domain() check and insert_resource() to setup_arch() to keep
them in x86.

Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:50:59 +08:00
Kairui Song 2fbb10e99b x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()
From: Chen Zhou <chenzhou10@huawei.com>
Link: https://lkml.org/lkml/2021/1/30/53
Link: a2e0b4351d

To make reserve_crashkernel() generic, replace some hard-coded
numbers with the macro CRASH_ADDR_LOW_MAX.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:50:58 +08:00
Kairui Song cc6803d7d8 x86: kdump: make the lower bound of crash kernel reservation consistent
From: Chen Zhou <chenzhou10@huawei.com>
Link: https://lkml.org/lkml/2021/1/30/53
Link: 8882ba540e

The lower bounds of the crash kernel reservation and the crash kernel
low reservation are different; use CRASH_ALIGN for both so they are
consistent.

Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:50:58 +08:00
Kairui Song b3ab0276fe x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
From: Chen Zhou <chenzhou10@huawei.com>
Link: https://lkml.org/lkml/2021/1/30/53
Link: 873384fe79

Move CRASH_ALIGN to the header asm/kexec.h for later use. In addition,
the alignment of crash kernel regions on x86 is 16M (CRASH_ALIGN), but
reserve_crashkernel() also used a 1M alignment, so replace the
hard-coded 1M alignment with the macro CRASH_ALIGN.
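A minimal user-space sketch of why the two alignments can place a region differently, assuming the 16M value of CRASH_ALIGN stated above. align_up(), crash_base_old() and crash_base_new() are hypothetical helpers for illustration only; the real reserve_crashkernel() allocates through memblock rather than rounding a hint.

```c
#include <stdint.h>

/* Local definitions for this sketch; CRASH_ALIGN is 16M per the text above. */
#define SZ_1M       (1ULL << 20)
#define SZ_16M      (1ULL << 24)
#define CRASH_ALIGN SZ_16M

/* Round addr up to a multiple of align (a power of two), like ALIGN(). */
static inline uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical: placement with the old hard-coded 1M alignment. */
uint64_t crash_base_old(uint64_t hint)
{
	return align_up(hint, SZ_1M);
}

/* Hypothetical: placement once every path uses CRASH_ALIGN. */
uint64_t crash_base_new(uint64_t hint)
{
	return align_up(hint, CRASH_ALIGN);
}
```

With a 17M hint, crash_base_old() keeps 17M while crash_base_new() rounds up to 32M, which is the kind of inconsistency the patch removes.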

Suggested-by: Dave Young <dyoung@redhat.com>
Suggested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:50:57 +08:00
Kairui Song 47c1cc9217 arm64: remove the hard coded crashkernel address limit
This conflicts with upstream's kdump high reservation support, and we
already have CONFIG_ZONE_DMA32 set, so we have:

ARCH_LOW_ADDRESS_LIMIT = min(offset + (1ULL << 32), memblock_end_of_DRAM());

This limits the address to below 4G, so this hard-coded limit is redundant.
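The limit quoted above can be sketched as a plain function. arch_low_address_limit() is a hypothetical name, and the parameters offset and dram_end stand in for the DRAM base offset and memblock_end_of_DRAM(); this is an illustration of the formula, not the arm64 code.

```c
#include <stdint.h>

/* Sketch of: min(offset + (1ULL << 32), memblock_end_of_DRAM()).
 * 'offset' and 'dram_end' are plain inputs here; the kernel derives
 * them from memblock. */
uint64_t arch_low_address_limit(uint64_t offset, uint64_t dram_end)
{
	uint64_t window_end = offset + (1ULL << 32); /* 4G window from offset */

	return window_end < dram_end ? window_end : dram_end;
}
```

Because the result never exceeds offset + 4G, an additional hard-coded 4G cap on the crashkernel address adds nothing.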

Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:50:57 +08:00
Alex Shi 6bc9581ddd Revert "gup: document and work around "COW can break either way" issue"
This reverts commit 918f50807eccd63d482ef4cf778b1d2b416770a9.

The reverted commit forces COW into write mode, which forces COW
breaking and causes page usage to increase significantly. Upstream,
commit 376a34efa ("mm/gup: refactor and de-duplicate gup_fast() code")
provides another way to fix the fork security issue of COW, and the
buggy commit was then reverted by commit a308c71bf1 ("mm/gup: Remove
enfornced COW mechanism").

Signed-off-by: Alex Shi <alexsshi@tencent.com>
2024-06-11 20:50:57 +08:00
Peter Xu 2453865ed4 mm/ksm: Remove reuse_ksm_page()
Remove the function as the last reference has gone away with the do_wp_page()
changes.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1a0cf26323)
Signed-off-by: Alex Shi <alexsshi@tencent.com>
2024-06-11 20:50:56 +08:00
Linus Torvalds 0fb4d8fd75 mm: do_wp_page() simplification
commit 09854ba94c upstream.

How about we just make sure we're the only possible valid user of the
page before we bother to reuse it?

Simplify, simplify, simplify.

And get rid of the nasty serialization on the page lock at the same time.
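A simplified sketch of the reuse decision described above, assuming a toy page model: struct fake_page and can_reuse_page() are hypothetical. The upstream change also rechecks the counts under trylock_page() before reusing, which this sketch omits.

```c
#include <stdbool.h>

/* Hypothetical page model for illustration only. */
struct fake_page {
	int refcount;  /* total references held on the page */
	bool is_ksm;   /* KSM pages are never reused in place */
};

/* A write fault may reuse the page in place only when the faulting
 * mapping is its sole user; otherwise the page must be copied. */
bool can_reuse_page(const struct fake_page *page)
{
	if (page->is_ksm)
		return false;
	return page->refcount == 1;
}
```

Any extra reference, for example a GUP pin, makes the check fail and sends the fault down the copy path.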

[peterx: add subject prefix]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 09854ba94c)
Signed-off-by: Alex Shi <alexsshi@tencent.com>

Conflicts:
	mm/memory.c
2024-06-11 20:50:56 +08:00
Yuehong Wu 85ba10e6ef config: enable BFQ io scheduler
Enable CONFIG_IOSCHED_BFQ and CONFIG_BFQ_GROUP_IOSCHED for ARM to
support the BFQ I/O scheduler.

Signed-off-by: Yuehong Wu <yuehongwu@tencent.com>
Signed-off-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:50:53 +08:00
Ni Xun 3c048b0f89 config: change CONFIG_CONFIGFS_FS to Y for default conf
Change CONFIG_CONFIGFS_FS from M to Y in the ARM default config.

Signed-off-by: Ni Xun <richardni@tencent.com>
2024-06-11 20:50:40 +08:00
KP Singh 5e0977fd08 security: Fix hook iteration for secid_to_secctx
[upstream commit 0550cfe8c2]

secid_to_secctx is not stackable, and since the BPF LSM registers this
hook by default, the call_int_hook logic, which "bails-on-fail", is not
suitable: it causes issues when other LSMs register this hook and
eventually breaks Audit.

In order to fix this, directly iterate over the security hooks instead
of using call_int_hook as suggested in:

https://lore.kernel.org/bpf/9d0eb6c6-803a-ff3a-5603-9ad6d9edfc00@schaufler-ca.com/#t
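The difference between the two iteration styles can be sketched in user space, assuming each hook signals "not handled" with the -EOPNOTSUPP default. The hook type, SECCTX_DEFAULT and the two example hooks are hypothetical stand-ins for the entries on the secid_to_secctx hook list, and the kernel walks an hlist rather than an array.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered secid_to_secctx implementation. */
typedef int (*secctx_hook_t)(unsigned int secid, char **secdata,
			     unsigned int *seclen);

#define SECCTX_DEFAULT (-EOPNOTSUPP) /* assumed "not handled" value */

/* call_int_hook style: stop at the first non-zero return ("bail on fail"). */
int call_int_hook_style(secctx_hook_t *hooks, size_t n, unsigned int secid,
			char **secdata, unsigned int *seclen)
{
	int rc = 0;

	for (size_t i = 0; i < n; i++) {
		rc = hooks[i](secid, secdata, seclen);
		if (rc != 0)
			break;
	}
	return rc;
}

/* Direct iteration: skip hooks returning the default, use the first answer. */
int iterate_secid_to_secctx(secctx_hook_t *hooks, size_t n, unsigned int secid,
			    char **secdata, unsigned int *seclen)
{
	for (size_t i = 0; i < n; i++) {
		int rc = hooks[i](secid, secdata, seclen);

		if (rc != SECCTX_DEFAULT)
			return rc;
	}
	return SECCTX_DEFAULT;
}

/* Example: a BPF-LSM-like stub that always declines. */
int bpf_like_hook(unsigned int secid, char **secdata, unsigned int *seclen)
{
	(void)secid; (void)secdata; (void)seclen;
	return SECCTX_DEFAULT;
}

/* Example: an LSM that actually produces a context. */
int real_lsm_hook(unsigned int secid, char **secdata, unsigned int *seclen)
{
	static char ctx[] = "example_ctx";

	(void)secid;
	*secdata = ctx;
	*seclen = sizeof(ctx) - 1;
	return 0;
}
```

With the BPF-like stub registered first, call_int_hook_style() returns -EOPNOTSUPP and never reaches the real LSM, while iterate_secid_to_secctx() skips the stub and returns 0.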

Fixes: 98e828a065 ("security: Refactor declaration of LSM hooks")
Fixes: 625236ba38 ("security: Fix the default value of secid_to_secctx hook")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/bpf/20200520125616.193765-1-kpsingh@chromium.org
Signed-off-by: Menglong Dong <imagedong@tencent.com>
2024-06-11 20:50:15 +08:00
soonflywang f1f1da34d4 arm64: fix NEON/VFP reentrant in fast_copy_page
Add a fixup in fast_copy_page. This feature is disabled by default;
set vm.fast_copy_page_enabled to enable it.

Signed-off-by: soonflywang <soonflywang@tencent.com>
Signed-off-by: caelli <caelli@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:50:14 +08:00
soonflywang ccb2a062d5 arm64: implemented a fast copy_page version while NEON/VFP is met
Arm server CPUs usually provide the NEON/VFP extension. This patch
leverages SIMD instructions to speed up the current copy_page().

Signed-off-by: soonflywang <soonflywang@tencent.com>
Signed-off-by: Chengdong Li <chengdongli@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:50:14 +08:00
Anders Roxell c7ff1ae2e7 security: Fix the default value of secid_to_secctx hook
Upstream commit 625236ba38

security_secid_to_secctx is called by the bpf_lsm hook, and a successful
return value (i.e. 0) implies that the parameter will be consumed by the
LSM framework. The current behaviour returns success even though the
pointer isn't initialized when CONFIG_BPF_LSM is enabled, because of the
default return value in kernel/bpf/bpf_lsm.c.

This is the internal error:

[ 1229.341488][ T2659] usercopy: Kernel memory exposure attempt detected from null address (offset 0, size 280)!
[ 1229.374977][ T2659] ------------[ cut here ]------------
[ 1229.376813][ T2659] kernel BUG at mm/usercopy.c:99!
[ 1229.378398][ T2659] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 1229.380348][ T2659] Modules linked in:
[ 1229.381654][ T2659] CPU: 0 PID: 2659 Comm: systemd-journal Tainted: G    B   W         5.7.0-rc5-next-20200511-00019-g864e0c6319b8-dirty #13
[ 1229.385429][ T2659] Hardware name: linux,dummy-virt (DT)
[ 1229.387143][ T2659] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--)
[ 1229.389165][ T2659] pc : usercopy_abort+0xc8/0xcc
[ 1229.390705][ T2659] lr : usercopy_abort+0xc8/0xcc
[ 1229.392225][ T2659] sp : ffff000064247450
[ 1229.393533][ T2659] x29: ffff000064247460 x28: 0000000000000000
[ 1229.395449][ T2659] x27: 0000000000000118 x26: 0000000000000000
[ 1229.397384][ T2659] x25: ffffa000127049e0 x24: ffffa000127049e0
[ 1229.399306][ T2659] x23: ffffa000127048e0 x22: ffffa000127048a0
[ 1229.401241][ T2659] x21: ffffa00012704b80 x20: ffffa000127049e0
[ 1229.403163][ T2659] x19: ffffa00012704820 x18: 0000000000000000
[ 1229.405094][ T2659] x17: 0000000000000000 x16: 0000000000000000
[ 1229.407008][ T2659] x15: 0000000000000000 x14: 003d090000000000
[ 1229.408942][ T2659] x13: ffff80000d5b25b2 x12: 1fffe0000d5b25b1
[ 1229.410859][ T2659] x11: 1fffe0000d5b25b1 x10: ffff80000d5b25b1
[ 1229.412791][ T2659] x9 : ffffa0001034bee0 x8 : ffff00006ad92d8f
[ 1229.414707][ T2659] x7 : 0000000000000000 x6 : ffffa00015eacb20
[ 1229.416642][ T2659] x5 : ffff0000693c8040 x4 : 0000000000000000
[ 1229.418558][ T2659] x3 : ffffa0001034befc x2 : d57a7483a01c6300
[ 1229.420610][ T2659] x1 : 0000000000000000 x0 : 0000000000000059
[ 1229.422526][ T2659] Call trace:
[ 1229.423631][ T2659]  usercopy_abort+0xc8/0xcc
[ 1229.425091][ T2659]  __check_object_size+0xdc/0x7d4
[ 1229.426729][ T2659]  put_cmsg+0xa30/0xa90
[ 1229.428132][ T2659]  unix_dgram_recvmsg+0x80c/0x930
[ 1229.429731][ T2659]  sock_recvmsg+0x9c/0xc0
[ 1229.431123][ T2659]  ____sys_recvmsg+0x1cc/0x5f8
[ 1229.432663][ T2659]  ___sys_recvmsg+0x100/0x160
[ 1229.434151][ T2659]  __sys_recvmsg+0x110/0x1a8
[ 1229.435623][ T2659]  __arm64_sys_recvmsg+0x58/0x70
[ 1229.437218][ T2659]  el0_svc_common.constprop.1+0x29c/0x340
[ 1229.438994][ T2659]  do_el0_svc+0xe8/0x108
[ 1229.440587][ T2659]  el0_svc+0x74/0x88
[ 1229.441917][ T2659]  el0_sync_handler+0xe4/0x8b4
[ 1229.443464][ T2659]  el0_sync+0x17c/0x180
[ 1229.444920][ T2659] Code: aa1703e2 aa1603e1 910a8260 97ecc860 (d4210000)
[ 1229.447070][ T2659] ---[ end trace 400497d91baeaf51 ]---
[ 1229.448791][ T2659] Kernel panic - not syncing: Fatal exception
[ 1229.450692][ T2659] Kernel Offset: disabled
[ 1229.452061][ T2659] CPU features: 0x240002,20002004
[ 1229.453647][ T2659] Memory Limit: none
[ 1229.455015][ T2659] ---[ end Kernel panic - not syncing: Fatal exception ]---

Rework the code so that the default return value is -EOPNOTSUPP.

There are likely other callbacks, such as security_inode_getsecctx(),
that may have the same problem; someone who understands the code
better needs to audit them.

Thank you Arnd for helping me figure out what went wrong.

Fixes: 98e828a065 ("security: Refactor declaration of LSM hooks")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Chun Liu <kaicliu@tencent.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/bpf/20200512174607.9630-1-anders.roxell@linaro.org
2024-06-11 20:49:57 +08:00
Xinghui Li 5c899e5403 firmware: fix one UAF issue
There is a possible use-after-free in dmi_sysfs_register_handle().
While handling specializations, entry->child may be freed if an error
occurs, yet it is later passed to kobject_put() after being freed.
Set entry->child to NULL to avoid this.
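The NULL-reset pattern can be sketched as follows, assuming toy types: struct entry, struct child and put_child() are hypothetical stand-ins for the dmi_sysfs entry, its child kobject and kobject_put(), and the model frees directly instead of dropping a reference count.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the entry and its child object. */
struct child { int id; };
struct entry { struct child *child; };

static int releases; /* number of times a child was actually freed */

/* Models kobject_put(): NULL is accepted and ignored. */
static void put_child(struct child *c)
{
	if (!c)
		return;
	releases++;
	free(c);
}

/* Error path while handling specializations: drop the child and clear
 * the pointer so later cleanup cannot reach freed memory. */
static void handle_specialization_error(struct entry *e)
{
	put_child(e->child);
	e->child = NULL;
}

/* Later teardown that also puts the child. */
static void entry_teardown(struct entry *e)
{
	put_child(e->child);
	e->child = NULL;
}

/* Runs the failing path, then teardown; returns the number of real frees.
 * Without the NULL reset, the second put_child() would touch freed memory. */
int simulate_error_then_teardown(void)
{
	struct entry e = { .child = malloc(sizeof(struct child)) };

	releases = 0;
	if (!e.child)
		return -1;
	handle_specialization_error(&e);
	entry_teardown(&e);
	return releases;
}
```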

Reported-by: loydlv <loydlv@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
2024-06-11 20:49:56 +08:00
Xinghui Li 84daaf3511 media:cec:fix double free and uaf issue when cancel data during noblocking
If the operation is nonblocking, data may be freed while its transmit
has not yet completed. In this case, the regular free leads to a
double free. Add the return value '-EPERM' to mark this case.

Reported-by: loydlv <loydlv@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
2024-06-11 20:49:56 +08:00
Lv Yunlong ca589c18a1 gpu/xen: Fix a use after free in xen_drm_drv_init
commit 52762efa2b upstream.

The function displback_changed() has the call chain
displback_connect(front_info)->xen_drm_drv_init(front_info).
drm_info is assigned to front_info->drm_info, and drm_info is freed
in the failure branch of xen_drm_drv_init().

Later, displback_disconnect(front_info) is called, and it calls
xen_drm_drv_fini(front_info), which causes a use-after-free through
the drm_info = front_info->drm_info statement.

This patch does two things. First, it fixes the failure label so that
drm_info is not freed when drm_info = kzalloc() has failed. Second, it
sets front_info->drm_info to NULL to avoid the use-after-free.
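A simplified sketch of the two fixes, assuming toy types: struct drm_info, struct front_info, drv_init() and drv_fini() are hypothetical stand-ins for the Xen DRM front structures, xen_drm_drv_init() and xen_drm_drv_fini(). The -EIO failure code is arbitrary and only marks the failure branch.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; only the drm_info back-pointer matters here. */
struct drm_info { int unused; };
struct front_info { struct drm_info *drm_info; };

/* Models the init path; 'fail_later' forces the failure branch that runs
 * after drm_info has been published through front_info. */
int drv_init(struct front_info *front, int fail_later)
{
	struct drm_info *drm_info = calloc(1, sizeof(*drm_info));

	if (!drm_info)
		return -ENOMEM;          /* allocation failed: nothing to free */

	front->drm_info = drm_info;
	if (fail_later)
		goto fail;
	return 0;

fail:
	free(drm_info);
	front->drm_info = NULL;          /* do not leave a dangling pointer */
	return -EIO;
}

/* Models the fini path; returns 1 if it tore something down. */
int drv_fini(struct front_info *front)
{
	struct drm_info *drm_info = front->drm_info;

	if (!drm_info)
		return 0;                /* init failed earlier: nothing to do */
	free(drm_info);
	front->drm_info = NULL;
	return 1;
}

/* A connect that fails, followed by a disconnect. */
int simulate_failed_connect_then_disconnect(void)
{
	struct front_info front = { .drm_info = NULL };

	if (drv_init(&front, 1) != -EIO)
		return -1;
	return drv_fini(&front);         /* 0: the freed object is never used */
}
```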

Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323014656.10068-1-lyl2019@mail.ustc.edu.cn
Signed-off-by: Xinghui Li <korantli@tencent.com>
Reviewed-by: Robinlai <robinlai@tencent.com>
2024-06-11 20:49:56 +08:00
Vinicius Costa Gomes 627ec74d6a igc: Fix use-after-free error during reset
upstream commit: 56ea7ed103

Clean the next descriptor to watch (next_to_watch) when cleaning the
TX ring.

Failure to do so can cause invalid memory accesses. If igc_poll() runs
while the controller is being reset, this can lead to the driver trying
to free an skb that was already freed.
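The effect of clearing next_to_watch can be sketched with a toy ring, assuming hypothetical types: struct tx_ring, struct tx_buffer, clean_tx_ring() and poll_tx_completions() loosely model the igc TX ring, igc_clean_tx_ring() and the TX completion work in igc_poll(). They are not the driver's code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RING_SIZE 4

/* Hypothetical ring: each buffer owns an skb-like allocation, and
 * next_to_watch points at the descriptor whose completion frees it. */
struct tx_desc { int done; };
struct tx_buffer {
	void *skb;
	struct tx_desc *next_to_watch;
};
struct tx_ring {
	struct tx_desc desc[RING_SIZE];
	struct tx_buffer buf[RING_SIZE];
};

/* Reset path: free pending skbs and clear next_to_watch. */
void clean_tx_ring(struct tx_ring *ring)
{
	for (size_t i = 0; i < RING_SIZE; i++) {
		free(ring->buf[i].skb);
		ring->buf[i].skb = NULL;
		ring->buf[i].next_to_watch = NULL; /* the fix */
	}
}

/* Poll path: completes only buffers that still have a descriptor to
 * watch; returns how many skbs it freed. */
int poll_tx_completions(struct tx_ring *ring)
{
	int freed = 0;

	for (size_t i = 0; i < RING_SIZE; i++) {
		struct tx_buffer *b = &ring->buf[i];

		if (!b->next_to_watch || !b->next_to_watch->done)
			continue;
		free(b->skb);
		b->skb = NULL;
		b->next_to_watch = NULL;
		freed++;
	}
	return freed;
}

/* Arms every slot, optionally resets the ring, then polls. */
int simulate_poll(int reset_first)
{
	struct tx_ring ring;
	int n;

	memset(&ring, 0, sizeof(ring));
	for (size_t i = 0; i < RING_SIZE; i++) {
		ring.buf[i].skb = malloc(64);
		ring.desc[i].done = 1;
		ring.buf[i].next_to_watch = &ring.desc[i];
	}
	if (reset_first)
		clean_tx_ring(&ring);
	n = poll_tx_completions(&ring);
	clean_tx_ring(&ring); /* release anything still pending */
	return n;
}
```

After a reset, the poll finds no descriptor to watch and frees nothing; without the reset it completes all four buffers exactly once.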

Log message:

 [  101.525242] refcount_t: underflow; use-after-free.
 [  101.525251] WARNING: CPU: 1 PID: 646 at lib/refcount.c:28 refcount_warn_saturate+0xab/0xf0
 [  101.525259] Modules linked in: sch_etf(E) sch_mqprio(E) rfkill(E) intel_rapl_msr(E) intel_rapl_common(E)
 x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) binfmt_misc(E) kvm_intel(E) kvm(E) irqbypass(E) crc32_pclmul(E)
 ghash_clmulni_intel(E) aesni_intel(E) mei_wdt(E) libaes(E) crypto_simd(E) cryptd(E) glue_helper(E) snd_hda_codec_hdmi(E)
 rapl(E) intel_cstate(E) snd_hda_intel(E) snd_intel_dspcfg(E) sg(E) soundwire_intel(E) intel_uncore(E) at24(E)
 soundwire_generic_allocation(E) iTCO_wdt(E) soundwire_cadence(E) intel_pmc_bxt(E) serio_raw(E) snd_hda_codec(E)
 iTCO_vendor_support(E) watchdog(E) snd_hda_core(E) snd_hwdep(E) snd_soc_core(E) snd_compress(E) snd_pcsp(E)
 soundwire_bus(E) snd_pcm(E) evdev(E) snd_timer(E) mei_me(E) snd(E) soundcore(E) mei(E) configfs(E) ip_tables(E) x_tables(E)
 autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E)
 i915(E) ahci(E) libahci(E) ehci_pci(E) igb(E) xhci_pci(E) ehci_hcd(E)
 [  101.525303]  drm_kms_helper(E) dca(E) xhci_hcd(E) libata(E) crct10dif_pclmul(E) cec(E) crct10dif_common(E) tsn(E) igc(E)
 e1000e(E) ptp(E) i2c_i801(E) crc32c_intel(E) psmouse(E) i2c_algo_bit(E) i2c_smbus(E) scsi_mod(E) lpc_ich(E) pps_core(E)
 usbcore(E) drm(E) button(E) video(E)
 [  101.525318] CPU: 1 PID: 646 Comm: irq/37-enp7s0-T Tainted: G            E     5.10.30-rt37-tsn1-rt-ipipe #ipipe
 [  101.525320] Hardware name: SIEMENS AG SIMATIC IPC427D/A5E31233588, BIOS V17.02.09 03/31/2017
 [  101.525322] RIP: 0010:refcount_warn_saturate+0xab/0xf0
 [  101.525325] Code: 05 31 48 44 01 01 e8 f0 c6 42 00 0f 0b c3 80 3d 1f 48 44 01 00 75 90 48 c7 c7 78 a8 f3 a6 c6 05 0f 48
 44 01 01 e8 d1 c6 42 00 <0f> 0b c3 80 3d fe 47 44 01 00 0f 85 6d ff ff ff 48 c7 c7 d0 a8 f3
 [  101.525327] RSP: 0018:ffffbdedc0917cb8 EFLAGS: 00010286
 [  101.525329] RAX: 0000000000000000 RBX: ffff98fd6becbf40 RCX: 0000000000000001
 [  101.525330] RDX: 0000000000000001 RSI: ffffffffa6f2700c RDI: 00000000ffffffff
 [  101.525332] RBP: ffff98fd6becc14c R08: ffffffffa7463d00 R09: ffffbdedc0917c50
 [  101.525333] R10: ffffffffa74c3578 R11: 0000000000000034 R12: 00000000ffffff00
 [  101.525335] R13: ffff98fd6b0b1000 R14: 0000000000000039 R15: ffff98fd6be35c40
 [  101.525337] FS:  0000000000000000(0000) GS:ffff98fd6e240000(0000) knlGS:0000000000000000
 [  101.525339] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [  101.525341] CR2: 00007f34135a3a70 CR3: 0000000150210003 CR4: 00000000001706e0
 [  101.525343] Call Trace:
 [  101.525346]  sock_wfree+0x9c/0xa0
 [  101.525353]  unix_destruct_scm+0x7b/0xa0
 [  101.525358]  skb_release_head_state+0x40/0x90
 [  101.525362]  skb_release_all+0xe/0x30
 [  101.525364]  napi_consume_skb+0x57/0x160
 [  101.525367]  igc_poll+0xb7/0xc80 [igc]
 [  101.525376]  ? sched_clock+0x5/0x10
 [  101.525381]  ? sched_clock_cpu+0xe/0x100
 [  101.525385]  net_rx_action+0x14c/0x410
 [  101.525388]  __do_softirq+0xe9/0x2f4
 [  101.525391]  __local_bh_enable_ip+0xe3/0x110
 [  101.525395]  ? irq_finalize_oneshot.part.47+0xe0/0xe0
 [  101.525398]  irq_forced_thread_fn+0x6a/0x80
 [  101.525401]  irq_thread+0xe8/0x180
 [  101.525403]  ? wake_threads_waitq+0x30/0x30
 [  101.525406]  ? irq_thread_check_affinity+0xd0/0xd0
 [  101.525408]  kthread+0x183/0x1a0
 [  101.525412]  ? kthread_park+0x80/0x80
 [  101.525415]  ret_from_fork+0x22/0x30

Fixes: 13b5b7fd6a ("igc: Add support for Tx/Rx rings")
Reported-by: Erez Geva <erez.geva.ext@siemens.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: jackjunliu <jackjunliu@tencent.com>
2024-06-11 20:49:55 +08:00