1228244 Commits

Author SHA1 Message Date
Haisu Wang 495e0e311e rue/io: add support for recursive diskstats
Add recursive diskstats to blkcg.
Fix the original solution's issue with printing the last partition,
and remove the list.

Note:
This function exists only for backward compatibility with tkernel4,
since commit f733164829 ("blk-cgroup: reimplement basic IO stats
using cgroup rstat") implements blkg_iostat_set for cgroup stats
in blkcg_gq.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Signed-off-by: Lenny Chen <lennychen@tencent.com>
2024-09-27 11:31:26 +08:00
Haisu Wang f35df3f918 rue/io: blkcg export blkcg symbols to be used in bpf accounting
Make block cgroup I/O completion and done function dynamic
to account per cgroup I/O status in ebpf.

Fix the undefined blkcg_dkstats.alloc_node error: alloc_node is
only available when CONFIG_SMP is enabled, so move the INIT to the
right place.
Export blkcg symbols to be used in bpf accounting.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
2024-09-27 11:25:19 +08:00
Haojie Ning a1574c433d rue/mm: add sysctl_vm_use_priority_oom to enable priority oom for all cgroups
Add sysctl_vm_use_priority_oom as a global setting to enable the
priority_oom setting for all cgroups without the need to manually
set it for each cgroup. This global setting has no effect when it
is turned off.

Signed-off-by: Haojie Ning <paulning@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:32 +08:00
Honglin Li 7c45f9b01f rue/mm: compatible with mglru for pagecache limit
The pagecache limit for system and per-cgroup will
cause the process to get stuck when mglru is enabled.
Use lru_gen_enabled() to check whether mglru is
enabled in the system.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
2024-09-27 11:13:32 +08:00
Xin Hao 4e6f350b03 rue/mm: fix file page_counter 'memcg->pagecache' error when THP enabled
When the CONFIG_MEM_QPS feature is enabled, __mod_lruvec_state() is called to
increase the per-memcg page_counter 'pagecache' by 'NR_FILE_PAGES'. This works
when THP is not enabled. With THP enabled, however, THP page cache is accounted
as 'NR_FILE_THPS', and the CONFIG_MEM_QPS feature did not increase the
page_counter 'pagecache' for it. When these THP page cache pages are released,
the page_counter 'pagecache' value becomes negative, which results in the
following warning.

[55530.397796] ------------[ cut here ]------------
[55530.398854] page_counter underflow: -512 nr_pages=512
[55530.399864] WARNING: CPU: 1 PID: 3026157 at mm/page_counter.c:63 page_counter_cancel+0x55/0x60
[55530.412193] CPU: 1 PID: 3026157 Comm: bash Kdump: loaded Tainted: G
[55530.416075] RIP: 0010:page_counter_cancel+0x55/0x60
[55530.421353] RAX: 0000000000000000 RBX: ffff8888161a8270 RCX: 0000000000000006
[55530.422680] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff88881f85bb60
[55530.424008] RBP: ffffc90004ceba58 R08: 0000000000009617 R09: ffff88881584c820
[55530.425330] R10: 0000000000000000 R11: ffffffffa00d60b0 R12: 0000000000000200
[55530.426663] R13: ffff8888194f7000 R14: 0000000000000000 R15: 0000000000000000
[55530.427999] FS:  00007fe2932d1740(0000) GS:ffff88881f840000(0000) knlGS:0000000000000000
[55530.429447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[55530.430645] CR2: 00007f97c4e00000 CR3: 00000007e7256004 CR4: 00000000003706e0
[55530.432007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[55530.433360] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[55530.434711] Call Trace:
[55530.435541]  page_counter_uncharge+0x22/0x40
[55530.436571]  __mod_memcg_state.part.80+0x79/0xe0
[55530.437645]  __mod_memcg_lruvec_state+0x27/0x110
[55530.438712]  __mod_lruvec_state+0x39/0x40
[55530.439712]  unaccount_page_cache_page+0xd0/0x210
[55530.440803]  __delete_from_page_cache+0x3d/0x1d0
[55530.441877]  __remove_mapping+0xeb/0x220
[55530.442871]  remove_mapping+0x16/0x30
[55530.443836]  invalidate_inode_page+0x84/0x90
[55530.444869]  invalidate_mapping_pages+0x162/0x3e0
[55530.445957]  ? pick_next_task_fair+0x1f2/0x520
[55530.446996]  drop_pagecache_sb+0xac/0x130
[55530.447972]  iterate_supers+0xa2/0x110
[55530.448907]  ? do_coredump+0xb20/0xb20
[55530.449840]  drop_caches_sysctl_handler+0x5d/0x90
[55530.450893]  proc_sys_call_handler+0x1d0/0x290
[55530.451906]  proc_sys_write+0x14/0x20
[55530.452830]  __vfs_write+0x1b/0x40
[55530.453722]  vfs_write+0xab/0x1b0
[55530.454598]  ksys_write+0x61/0xe0
[55530.455471]  __x64_sys_write+0x1a/0x20
[55530.456392]  do_syscall_64+0x4d/0x120
[55530.457296]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[55530.458346] RIP: 0033:0x7fe292836bc8

Fixes: a0d7d9851512 ("rue/mm: pagecache limit per cgroup support")
Signed-off-by: Xin Hao <vernhao@tencent.com>
2024-09-27 11:13:31 +08:00
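Illustration for 4e6f350b03: the underflow described above can be modelled in user space. This is a minimal sketch, not the kernel code; `pagecache_counter`, `charge_buggy`, `charge_fixed` and `uncharge` are hypothetical names, and the model only assumes what the commit message states (THP page cache was never added to the counter but is subtracted on release).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-memcg 'pagecache' page_counter. */
struct pagecache_counter {
    long usage;       /* pages currently charged */
    bool underflowed; /* set once usage would drop below zero */
};

enum stat_item { STAT_FILE_PAGES, STAT_FILE_THPS };

#define THP_NR_PAGES 512 /* matches nr_pages=512 in the warning above */

/* Buggy accounting: only STAT_FILE_PAGES updates reach the counter. */
static void charge_buggy(struct pagecache_counter *c, enum stat_item item, long nr)
{
    if (item == STAT_FILE_PAGES)
        c->usage += nr;
}

/* Fixed accounting: THP page cache is charged as well. */
static void charge_fixed(struct pagecache_counter *c, enum stat_item item, long nr)
{
    (void)item; /* both item types are charged */
    c->usage += nr;
}

/* Release path: always subtracts, and reports an underflow like the trace. */
static void uncharge(struct pagecache_counter *c, long nr)
{
    c->usage -= nr;
    if (c->usage < 0) {
        fprintf(stderr, "page_counter underflow: %ld nr_pages=%ld\n", c->usage, nr);
        c->underflowed = true;
        c->usage = 0;
    }
}
```

Charging one THP with `charge_buggy` and then calling `uncharge` with 512 pages trips the underflow branch; the same sequence with `charge_fixed` returns the counter to zero.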
Honglin Li b82ababba6 rue/mm: introduce new feature to async clean dying memcgs
After a memcg is removed, page cache and slab pages may still
reference it, which can leave a very large number of dying
memcgs in the system. This feature cleans up dying memcgs
asynchronously.

1) sysctl -w vm.clean_dying_memcg_async=1
   #start a kthread to async clean dying memcgs, default
   #value is 0.

2) sysctl -w vm.clean_dying_memcg_threshold=10
   #Whenever 10 dying memcgs are generated in the system,
   #wake up a kthread to async clean dying memcgs, default
   #value is 100.

Signed-off-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
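Illustration for b82ababba6: the interaction of the two sysctls can be sketched as a simple threshold check. This is a user-space model under assumptions; the struct, its fields and `note_dying_memcg` are hypothetical, and resetting the pending count on wakeup is a guess rather than something the commit message states.

```c
#include <stdbool.h>

/* Hypothetical model of the two knobs and the wakeup bookkeeping. */
struct dying_memcg_cleaner {
    bool async_enabled;     /* models vm.clean_dying_memcg_async */
    unsigned int threshold; /* models vm.clean_dying_memcg_threshold */
    unsigned int pending;   /* dying memcgs seen since the last wakeup */
    unsigned int wakeups;   /* how often the cleaner thread was woken */
};

/* Record one newly dying memcg; return true if the cleaner should be woken. */
static bool note_dying_memcg(struct dying_memcg_cleaner *c)
{
    c->pending++;
    if (!c->async_enabled || c->pending < c->threshold)
        return false;
    c->pending = 0; /* assumption: the count restarts after each wakeup */
    c->wakeups++;
    return true;
}
```

With `threshold` set to 10, the tenth call returns true and earlier calls return false; with `async_enabled` off, no call ever requests a wakeup.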
Honglin Li 200560da23 rue/mm: introduce memcg page cache hit & miss ratio tool
A new memory.page_cache_hit control file is added
under each memory cgroup directory. Reading this file
(for example with cat) prints the page cache hit and
miss ratio at the memory cgroup level.

Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 8de07be077 rue/mm: introduce memory allocation latency for per-cgroup tool
A new memory.latency_histogram control file is added
under each memory cgroup directory. Reading this file
(for example with cat) prints the memory allocation
latency at the memory cgroup level.

Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 1824581599 rue/mm: async free memory while process exiting
Free memory asynchronously while a process is exiting
to shorten the exit time.

Signed-off-by: Xiaoguang Chen <xiaoggchen@tencent.com>
Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 75ad2bae3d rue/mm: pagecache limit per cgroup support
Functional test:
http://tapd.oa.com/TencentOS_QoS/prong/stories/view/
1020426664867405667?jump_count=1

Signed-off-by: Xiaoguang Chen <xiaoggchen@tencent.com>
Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Xuan Liu <benxliu@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 56d80c4ea2 rue/mm: add memory cgroup async page reclaim mechanism
Introduce a background page reclaim mechanism for memcg. It can
be configured according to cgroup priority to apply different
reclaim strategies.

Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
Signed-off-by: Mengmeng Chen <bauerchen@tencent.com>
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 0d35c4c639 rue/mm: introduce memcg priority oom
Under memory pressure, reclaim and OOM can happen. When
multiple cgroups exist in one system, we might want the
memory or tasks of some of them to survive reclaim and
OOM while there are other candidates.

When OOM happens, the victim is always chosen from a low
priority memcg. This works for both memcg OOM and global
OOM. It can be enabled or disabled through
@memory.use_priority_oom; for global OOM, use the root
memcg's @memory.use_priority_oom. It is disabled by default.

Signed-off-by: Haiwei Li <gerryhwli@tencent.com>
Signed-off-by: Mengmeng Chen <bauerchen@tencent.com>
Signed-off-by: Xiaoguang Chen <xiaoggchen@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
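Illustration for 0d35c4c639: choosing a victim from the lowest-priority memcg might look like the sketch below. The types and `pick_oom_victim` are hypothetical; treating a smaller number as lower priority and breaking ties by larger usage are assumptions made for the example, not behaviour stated in the commit message.

```c
#include <stddef.h>

/* Hypothetical summary of one memory cgroup. */
struct memcg_info {
    const char *name;
    int priority;              /* assumption: smaller value = lower priority */
    unsigned long usage_pages; /* current memory usage in pages */
};

/*
 * Return the memcg that would be picked as the OOM victim:
 * the lowest-priority one, with larger usage breaking ties.
 * Returns NULL when the array is empty.
 */
static const struct memcg_info *pick_oom_victim(const struct memcg_info *cgs, size_t n)
{
    const struct memcg_info *victim = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!victim ||
            cgs[i].priority < victim->priority ||
            (cgs[i].priority == victim->priority &&
             cgs[i].usage_pages > victim->usage_pages))
            victim = &cgs[i];
    }
    return victim;
}
```

Given cgroups with priorities 5, 1 and 1, the function returns whichever priority-1 cgroup uses more memory, so the priority-5 cgroup survives.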
Honglin Li db44c11cdd rue/mm: add priority reclaim support
Introduce the sync && async priority reclaim mechanism.

Signed-off-by: Yu Liu <allanyuliu@tencent.com>
Signed-off-by: Xiaoguang Chen <xiaoggchen@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li 04f49a445c pagecachelimit: set an initial value for may_deactivate in shrink page cache
The global pagecache limit function fails after backporting
an upstream commit. When the active file list needs to be
reclaimed, the LRU_ACTIVE_FILE list cannot be reclaimed,
which makes the pagecache limit inaccurate.

When shrinking page cache, we set an initial value for
may_deactivate in scan_control to DEACTIVATE_FILE, allowing
the active file list to be scanned in shrink_list.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Reviewed-by: Hongbo Li <herberthbli@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li 26941c0f5e rue/net: avoid wrong memory access to struct net_device
In the receive path, cls_tc_rx_hook() assigns the net_device
pointer of the network interface to sock->in_dev.
Using the sock->in_dev pointer can lead to a wrong memory
access if the memory of struct net_device is freed after the
network interface is unregistered, which may crash the
kernel.

The above use after free issue causes a crash as follows:

BUG: unable to handle page fault for address: ffffffed698999c8
CPU: 50 PID: 1290732 Comm: kubelet Kdump: loaded
Tainted: G O K 5.4.119-1-tlinux4-0009.1 #1
RIP: 0010:cls_cgroup_tx_accept+0x5e/0x120
Call Trace:
 <IRQ>
 cls_tc_tx_hook+0x10d/0x1a0
 nf_hook_slow+0x43/0xc0
 __ip_local_out+0xcb/0x130
 ? ip_forward_options+0x190/0x190
 ip_local_out+0x1c/0x40
 __ip_queue_xmit+0x162/0x3d0
 ? rx_cgroup_throttle.isra.4+0x2b0/0x2b0
 ip_queue_xmit+0x10/0x20
 __tcp_transmit_skb+0x57f/0xbe0
 __tcp_retransmit_skb+0x1b0/0x8a0
 tcp_retransmit_skb+0x19/0xd0
 tcp_retransmit_timer+0x367/0xa80
 ? kvm_clock_get_cycles+0x11/0x20
 ? ktime_get+0x34/0x90
 tcp_write_timer_handler+0x93/0x1f0
 tcp_write_timer+0x7c/0x80
 ? tcp_write_timer_handler+0x1f0/0x1f0
 call_timer_fn+0x35/0x130
 run_timer_softirq+0x1a8/0x420
 ? ktime_get+0x34/0x90
 ? clockevents_program_event+0x85/0xe0
 __do_softirq+0x8c/0x2d7
 ? hrtimer_interrupt+0x12a/0x210
 irq_exit+0xa3/0xb0
 smp_apic_timer_interrupt+0x77/0x130
 apic_timer_interrupt+0xf/0x20
 </IRQ>

We introduce indev_ifindex as a new struct field to record
the ifindex of net_device. indev_ifindex can then be used
as an index to look up the device, which avoids direct memory
access to struct members through the in_dev pointer.

Fixes: f8829546f3b3 ("rue/net: init netcls traffic controller")
Signed-off-by: Honglin Li <honglinli@tencent.com>
Reviewed-by: Ze Gao <zegao@tencent.com>
2024-09-27 11:13:30 +08:00
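Illustration for 26941c0f5e: the idea of storing an index instead of a raw pointer can be sketched in user space. Everything here (`fake_netdev`, `fake_sock`, the table and its helpers) is hypothetical; the kernel has its own lookup helpers such as dev_get_by_index_rcu(), and this sketch does not reproduce them.

```c
#include <stddef.h>

#define MAX_DEVS 8

/* Hypothetical stand-in for struct net_device. */
struct fake_netdev {
    int ifindex;
    unsigned long rx_limit;
};

/* Hypothetical socket that records only the index, never the pointer. */
struct fake_sock {
    int indev_ifindex;
};

/* Registry of live devices, indexed by ifindex. */
static struct fake_netdev *dev_table[MAX_DEVS];

static void fake_register(struct fake_netdev *dev)
{
    if (dev->ifindex >= 0 && dev->ifindex < MAX_DEVS)
        dev_table[dev->ifindex] = dev;
}

static void fake_unregister(int ifindex)
{
    if (ifindex >= 0 && ifindex < MAX_DEVS)
        dev_table[ifindex] = NULL;
}

/* Resolve the socket's device at use time; NULL if it has gone away. */
static struct fake_netdev *fake_sock_indev(const struct fake_sock *sk)
{
    if (sk->indev_ifindex < 0 || sk->indev_ifindex >= MAX_DEVS)
        return NULL;
    return dev_table[sk->indev_ifindex];
}
```

After `fake_unregister`, the lookup returns NULL and the caller can skip the device, whereas a cached pointer would still point at freed memory.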
Honglin Li 68a7910a16 rue/net: avoid wrong memory access to struct cgroup_cls_state
The memory of struct cgroup_cls_state may be freed
during the use of a pointer to the struct. This issue
can potentially lead to wrong memory access and thus
kernel crashes.

Increase the reference count of struct cgroup_cls_state
through css_tryget_online while the struct is in use.

The above causes a crash as follows:

CPU: 56 PID: 4161866 Comm: AppSourceDatapr Kdump: loaded
Tainted: G O 5.4.119-1-tlinux4-0008 #1
RIP: 0010:cls_cgroup_adjust_wnd+0x58/0x180
Call Trace:
 <IRQ>
 __tcp_transmit_skb+0x6a8/0xbe0
 __tcp_send_ack.part.50+0xc2/0x170
 tcp_send_ack+0x1c/0x20
 tcp_send_dupack+0x29/0x130
 ? kvm_clock_get_cycles+0x11/0x20
 tcp_validate_incoming+0x332/0x440
 tcp_rcv_established+0x1f6/0x670
 tcp_v4_do_rcv+0x18a/0x220
 tcp_v4_rcv+0xbfd/0xca0
 ip_protocol_deliver_rcu+0x1f/0x180
 ip_local_deliver_finish+0x51/0x60
 ip_local_deliver+0xcd/0xe0
 ? ip_protocol_deliver_rcu+0x180/0x180
 ip_rcv_finish+0x7b/0x90
 ip_rcv+0xb5/0xc0
 ? ip_rcv_finish_core.isra.18+0x380/0x380
 __netif_receive_skb_one_core+0x59/0x80
 __netif_receive_skb+0x26/0x70
 process_backlog+0xac/0x150
 net_rx_action+0x127/0x380
 ? ktime_get+0x34/0x90
 __do_softirq+0x8c/0x2d7
 irq_exit+0xa3/0xb0
 smp_call_function_single_interrupt+0x4c/0xd0
 call_function_single_interrupt+0xf/0x20
 </IRQ>

Signed-off-by: Honglin Li <honglinli@tencent.com>
Reviewed-by: Ze Gao <zegao@tencent.com>
Reviewed-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:13:30 +08:00
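Illustration for 68a7910a16: the "take a reference only if the object is still alive" pattern can be modelled with C11 atomics. This is a simplified sketch and not how css_tryget_online is implemented in the kernel; `fake_css`, `fake_tryget` and `fake_put` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; a count of 0 means teardown has begun. */
struct fake_css {
    atomic_int refcnt;
};

/*
 * Try to take a reference. Succeeds only while the count is still
 * positive, so a user can never revive an object that is being freed.
 */
static bool fake_tryget(struct fake_css *css)
{
    int old = atomic_load(&css->refcnt);

    while (old > 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&css->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference taken by fake_tryget() or held by the owner. */
static void fake_put(struct fake_css *css)
{
    atomic_fetch_sub(&css->refcnt, 1);
}
```

A caller brackets every use of the object with `fake_tryget` and `fake_put`, and backs off when the tryget fails instead of touching the object.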
Honglin Li 55f6748cd1 rue/net: adapt to the new rue modular framework
Register and unregister rue net ops through the
rue modular framework.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Reviewed-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li ca8edadc91 rue/net: add dynamic bandwidth allocation between online cgroups
Introduce netcls controller interface files, which can be
configured to enable/disable bandwidth allocation mechanism
among online net cgroups.

The mechanism realizes the migration of idle bandwidth resources
among online cgroups, while guaranteeing the minimum bandwidth
for per-cgroup, to improve resource utilization.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Reviewed-by: Haisu Wang <haisuwang@tencent.com>
Reviewed-by: Jason Xing <kernelxing@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li 3811ff7c02 rue/net: add netdev-based rate limit for per cgroup
Introduce netdev-based rate limit for rx && tx direction.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Reviewed-by: Zhiping Du <zhipingdu@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li 9a447c5cb9 rue/net: add total bandwidth limit for multiprio preemption
Introduce the total bandwidth limit mechanism for rx && tx direction.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Signed-off-by: Zhiping Du <zhipingdu@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li 703664bf47 rue/net: add support for cgroup whitelist ports
Introduce the cgroup whitelist ports mechanism.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Signed-off-by: Zhiping Du <zhipingdu@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li ca0f6ddd21 rue/net: add rx && tx rate limit for per cgroup
Introduce the bandwidth rate limit mechanism for per cgroup.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Signed-off-by: Zhiping Du <zhipingdu@tencent.com>
2024-09-27 11:13:29 +08:00
Honglin Li 669bbf19cd rue/net: init netcls traffic controller
Add multiprio dynamic bandwidth controller.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Signed-off-by: Hongbo Li <herberthbli@tencent.com>
Signed-off-by: Zhiping Du <zhipingdu@tencent.com>
2024-09-27 11:13:29 +08:00
Haisu Wang 0f93976785 rue: Revert "kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()"
Export the two functions again for modules like RUE.

This reverts commit 0bd476e6c6.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:29 +08:00
Ze Gao d5a175186d rue: Add support for rue modularization
Add framework support to enable rue to be installed as
a separate module.

To insmod/rmmod safely, we use a per-cpu counter to track
how many rue related functions are in flight. It is only
safe to insmod/rmmod when no tasks are using any of the
functions registered by the rue module.

Signed-off-by: Ze Gao <zegao@tencent.com>
2024-09-27 11:13:29 +08:00
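Illustration for d5a175186d: the in-flight tracking can be sketched with one counter per CPU that is summed before unloading. This is a user-space model with hypothetical names (`rue_enter`, `rue_exit`, `rue_safe_to_unload`, `NR_FAKE_CPUS`); the real framework uses kernel per-cpu primitives that are not shown in the commit message.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_FAKE_CPUS 4

/* One counter per CPU; static storage starts at zero. */
static atomic_long inflight[NR_FAKE_CPUS];

/* Called on entry to a function provided by the module. */
static void rue_enter(int cpu)
{
    atomic_fetch_add(&inflight[cpu], 1);
}

/* Called on exit; may run on a different CPU than the matching enter. */
static void rue_exit(int cpu)
{
    atomic_fetch_sub(&inflight[cpu], 1);
}

/*
 * Only the sum across CPUs is meaningful: a task that migrates between
 * enter and exit leaves +1 on one CPU and -1 on another.
 */
static bool rue_safe_to_unload(void)
{
    long sum = 0;

    for (int cpu = 0; cpu < NR_FAKE_CPUS; cpu++)
        sum += atomic_load(&inflight[cpu]);
    return sum == 0;
}
```

Keeping the counters per CPU avoids contention on a single shared counter in the hot path; the cost is moved to the rare unload check, which has to walk all CPUs.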
Hongbo Li 5dc70a633d rue: init rue module
Add the init code of rue module.
Support both built-in and module (default) builds.

Signed-off-by: Hongbo Li <herberthbli@tencent.com>
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Reviewed-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:29 +08:00
Hongbo Li fce3609ebf rue: cgroup priority
Add cgroup priority.

Signed-off-by: Hongbo Li <herberthbli@tencent.com>
Signed-off-by: Lei Chen  <lennychen@tencent.com>
Signed-off-by: Yu Liu    <allanyuliu@tencent.com>
2024-09-27 11:13:29 +08:00
Haisu Wang 61bf5b5b7e blkcg/diskstats: Fix the extra cpu parameter
Upstream: no

Commit 6dfa517032 unified blkcg_part_stat_add() so that the cpu
number is no longer passed. In addition, CONFIG_BLK_CGROUP_DISKSTATS
depends on CONFIG_BLK_CGROUP, so there is no need to define
blkcg_part_stat_add() when CONFIG_BLK_CGROUP is disabled.

Also fix the error "implicit declaration of function
‘blkcg_dkstats_show_comm’" when CONFIG_BLK_CGROUP_DISKSTATS is disabled.

Fixes: 6dfa517032 ("blkcg/diskstats: add per blkcg diskstats support")
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:13:29 +08:00
Haisu Wang 2fc4b0e9c0 mm: set default watermark_boost_factor value to 0
Upstream: no

Watermark boost factor controls the level of reclaim when memory is
being fragmented. The intent is that compaction has less work to do in the
future and to increase the success rate of future high-order allocations
such as SLUB allocations, THP and hugetlbfs pages.
However, it wakes up kswapd to do defragmentation, which causes
performance jitter in many cases without enough gain.

Some distributions, such as Debian, also set the default boost
factor to 0 to disable the feature.

WXG Story of compaction cause performance jitter:
https://doc.weixin.qq.com/doc/w3_AIAAcwacAAYudo6ERcUQMiNUbmvzb?scode=AJEAIQdfAAoeO7AbqSAYQATQaYAJg

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Signed-off-by: Zeng Jingxiang <linuszeng@tencent.com>
Reviewed-by:  Jianping Liu <frankjpliu@tencent.com>
2024-09-27 11:13:28 +08:00
Haisu Wang b03afc0d33 Revert "io/tqos: merge buffer io limit series patch from brookxu, and rework some function."
This reverts commit 538ec11bed.

Reverted because the buffer IO function is being refactored.
In TK5, there is no need to stay kABI compatible by using the
"nodeinfo" in "struct mem_cgroup {}".

Original tapd and MR:
  https://tapd.woa.com/tapd_fe/20422414/story/detail/1020422414117471502
  https://git.woa.com/tlinux/tkernel5/-/merge_requests/117

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:13:24 +08:00
Haisu Wang 3231efb956 Revert "io/tqos: add sysctl_buffer_io_limit switch for buffer io limit."
This reverts commit 4d87de6bb4.

Reverted because the buffer IO function is being refactored.
In TK5, there is no need to stay kABI compatible by using the
"nodeinfo" in "struct mem_cgroup {}".

Original tapd and MR:
  https://tapd.woa.com/tapd_fe/20422414/story/detail/1020422414117471502
  https://git.woa.com/tlinux/tkernel5/-/merge_requests/117

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:13:21 +08:00
Haisu Wang 24cfc0a666 Revert "cgroup: allow cgroup to split direct io and buffered io into different blkio cgroup"
This reverts commit 71aaa09350.

Reverted because the buffer IO function is being refactored.
In TK5, there is no need to stay kABI compatible by using the
"nodeinfo" in "struct mem_cgroup {}".

Original tapd and MR:
  https://tapd.woa.com/tapd_fe/20422414/story/detail/1020422414117471502
  https://git.woa.com/tlinux/tkernel5/-/merge_requests/117

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:12:41 +08:00
aurelianliu b1e1aed588 config,x86: open edr
Enable DPC and EDR, which turn on the PCIe eDPC function.
When an uncorrectable error (UCE) occurs, eDPC can reset the device
link and resume the device, similar to hot-plugging the device.

Signed-off-by: Aurelianliu <aurelianliu@tencent.com>
2024-09-11 02:06:12 +00:00
Daniel Maslowski e16058ed64 riscv/purgatory: align riscv_kernel_entry
Fix CVE: CVE-2024-43868

[ Upstream commit fb197c5d2fd24b9af3d4697d0cf778645846d6d5 ]

When alignment handling is delegated to the kernel, everything must be
word-aligned in purgatory, since the trap handler is then set to the
kexec one. Without the alignment, hitting the exception would
ultimately crash. On other occasions, the kernel's handler would take
care of exceptions.
This has been tested on a JH7110 SoC with oreboot and its SBI delegating
unaligned access exceptions and the kernel configured to handle them.

Fixes: 736e30af58 ("RISC-V: Add purgatory")
Signed-off-by: Daniel Maslowski <cyrevolt@gmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240719170437.247457-1-cyrevolt@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-09-10 19:36:42 +08:00
Jianping Liu 7a6899b55a config,x86: disable CONFIG_IOMMU_DEBUGFS
To avoid the log like below:
[    0.095948] *************************************************************
[    0.095948] **     NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE    **
[    0.096220] **                                                         **
[    0.096221] **  IOMMU DebugFS SUPPORT HAS BEEN ENABLED IN THIS KERNEL  **
[    0.096222] **                                                         **
[    0.096223] ** This means that this kernel is built to expose internal **
[    0.096224] ** IOMMU data structures, which may compromise security on **
[    0.096225] ** your system.                                            **
[    0.096227] **                                                         **
[    0.096227] ** If you see this message and you are not debugging the   **
[    0.096228] ** kernel, report this immediately to your vendor!         **
[    0.096229] **                                                         **
[    0.096230] **     NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE    **
[    0.096231] *************************************************************
disable CONFIG_IOMMU_DEBUGFS.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-09-06 15:03:58 +08:00
Jianping Liu 64a21c8a25 hung_task,watchdog: set thresh time to 600 seconds
When CONFIG_KASAN is enabled, the kernel runs much slower, so set the
hung_task and soft lockup threshold time to 600 seconds.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-09-05 15:24:07 +08:00
Jianping Liu 2748b6ef40 Merge OCK next branch to TK5 master branch 2024-09-03 11:26:15 +08:00
Jianping Liu 41d84212f5 dist: release 6.6.47-12
Upstream: no

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-09-03 11:21:49 +08:00
Jianping Liu 5b4374c873 config,x86: set CONFIG_HW_RANDOM_ZHAOXIN to m
Most x86 CPUs do not need CONFIG_HW_RANDOM_ZHAOXIN, so change it from y
to m, which reduces the size of vmlinux.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-09-03 11:17:32 +08:00
Kairui Song 1789bec3a9 emm: fix panic in kdump
Upstream: no
Tested: Tested on Devcloud

Kdump environment disables memory cgroup so related helpers will all
return NULL. Abort early in such case.

Signed-off-by: Kairui Song <kasong@tencent.com>
2024-09-03 03:06:14 +00:00
Huang Cun 0ba277671b config: trace: enable CONFIG_FUNCTION_GRAPH_RETVAL
Signed-off-by: Huang Cun <cunhuang@tencent.com>
2024-09-03 03:04:10 +00:00
Ze Gao 2e2ffe48c5 rue/scx: Fix cgroupv2 cpu controller regression
Due to the odd behavior of gcc designated initializers, we
have to carefully order the fields inside cpu_cftypes.
Otherwise some important interfaces like cpu.max could
be lost.

Check out the details in [1].

[1]: https://onlinegdb.com/T-AMLp4zw

Fixes: 8c320a09af ("rue/scx: Add cpu.offline to maintain SCHED_BT compatibility")
Fixes: 2b9d28baab ("rue/scx: Add cpu.scx to the cpu cgroup controller")
Reported-by: likexu <likexu@tencent.com>
Signed-off-by: Ze Gao <zegao@tencent.com>
2024-09-03 02:47:04 +00:00
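Illustration for 2e2ffe48c5: the commit message does not spell out the behavior it ran into, so the sketch below shows only one plausible way that entries in a sentinel-terminated table can be lost with designated initializers. It is an assumption, not a reproduction of the linked example; `fake_cftype`, the index names and `count_visible` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct cftype; a NULL name ends the table. */
struct fake_cftype {
    const char *name;
};

enum { IDX_WEIGHT = 0, IDX_OFFLINE = 1, IDX_MAX = 2 };

/*
 * Index 1 is never initialized (imagine its entry is compiled out),
 * so it is zero-filled and looks exactly like the terminator.
 */
static const struct fake_cftype table_with_hole[] = {
    [IDX_WEIGHT] = { "cpu.weight" },
    [IDX_MAX]    = { "cpu.max" },
    { NULL }, /* intended terminator, lands at index 3 */
};

/* Same entries listed positionally, with no gap before the terminator. */
static const struct fake_cftype table_packed[] = {
    { "cpu.weight" },
    { "cpu.max" },
    { NULL },
};

/* Walk the table the way a sentinel-terminated scan would. */
static size_t count_visible(const struct fake_cftype *cft)
{
    size_t n = 0;

    while (cft[n].name)
        n++;
    return n;
}
```

The scan stops at the zero-filled hole in `table_with_hole`, so "cpu.max" is never reached, while `table_packed` exposes both entries.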
leoliu-oc 95e99651a2 zhaoxin_rng: Remove redundant pr_err log after matching cpu_ids
On non-Zhaoxin platforms, log related to the zhaoxin rng driver should not
appear.

Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
2024-09-02 17:11:48 +08:00
Jianping Liu dbef74015d watchdog: increase watchdog_thresh max value to 300 in debug kernel
If CONFIG_KASAN or CONFIG_KCSAN is enabled, the system runs much
slower. Increase watchdog_thresh's max value to avoid soft lockups
or hung tasks when running heavy test suites.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-30 17:19:41 +08:00
Jianping Liu 42be2152a4 drivers,thirdparty: add backup url for mlnx driver
If fetching the mlnx driver from https://content.mellanox.com fails, use
the backup url for the mlnx driver.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-29 12:43:18 +08:00
Jianping Liu 198d728bcc dist: check sha256 if mlnx tgz is already exist
In dist/sources/download-and-copy-drivers.sh, if the mlnx tgz is larger
than 1024 bytes, it is assumed that the real mlnx tgz already exists and
the script returns 0 without checking sha256. Change it to always check
sha256.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-29 10:21:59 +08:00
Jianping Liu 7a56ca3fca drivers,thirdparty: keep copy other thirdparty drivers if with_ofed is 0
If with_ofed is 0, only the mlnx driver uses the kernel native driver. Other
drivers in drivers/thirdparty are commercial quality drivers, so they should
still be copied to override the kernel native drivers before the build.

The kernel native bnxt directory in drivers/thirdparty/copy-drivers.sh
is wrong. Fix it as well.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-28 21:37:46 +08:00
Jianping Liu c92c287ac7 drivers,mlnx: add sha256 check for MLNX tgz
To ensure the downloaded file is correct, add a sha256 check for the MLNX tgz.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-28 14:32:34 +08:00
Jianping Liu 3353ce662c dist: delete useless code in kernel.template.spec
release-drivers is now in tree, so there is no need to check whether
drivers/thirdparty/release-drivers/mlnx exists.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-28 14:07:21 +08:00
Jianping Liu c99409f7fe Merge OKC next branch to TK5 master branch 2024-08-27 19:48:02 +08:00