Commit Graph

1228257 Commits

Author SHA1 Message Date
Haisu Wang 701785a3f8 rue/io: skip throttle REQ_META/REQ_PRIO IO
Do not throttle REQ_META/REQ_PRIO and kswapd IO
when skip_throttle_prio_req is enabled.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-28 15:42:22 +08:00
Haisu Wang f630af7168 rue/io: buffered_write_bps hierarchy support
Support hierarchical setting of buffered_write_bps.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-28 15:42:22 +08:00
Haisu Wang 701147d7b1 rue/io: support readwrite unified configuration
Support a unified read/write configuration, so we can more
easily configure the bps/iops of a cgroup.

Add the readwrite_dynamic_ratio interface. Support anticipating
the read/write ratio from previous data to control read/write
block throttling dynamically.

Anticipate the read/write ratio based on dispatched bytes/iops in
the last slice. Since read and write slices are not aligned and
may be trimmed or extended, use the number of elapsed slices to
get an approximate rate.

Tencent-internal-TAPDID: 878345747
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: Hongbo Li <herberthbli@tencent.com>
2024-09-28 15:42:22 +08:00
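The approximation described above (read/write ratio from last-slice dispatched bytes, averaged over elapsed slices) can be sketched roughly as follows. This is an illustration only; all variable names and numbers are made up, not the kernel's interface:

```shell
# Illustrative sketch of the dynamic read/write ratio idea:
# estimate the read share from bytes dispatched in the last
# slice, and an approximate per-slice rate over elapsed slices.
read_bytes=600      # bytes dispatched as reads in the last slice
write_bytes=200     # bytes dispatched as writes in the last slice
elapsed_slices=4    # slices elapsed since the last trim/extend

read_ratio=$(( read_bytes * 100 / (read_bytes + write_bytes) ))
approx_rate=$(( (read_bytes + write_bytes) / elapsed_slices ))
echo "read_ratio=${read_ratio}% approx_rate=${approx_rate}B/slice"
```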
Haisu Wang 8b986f7bdc rue/io: Add iocost and iolatency entry for cgroup v1
Add entry of iocost and iolatency for cgroup v1

The effective weight of iocost may sometimes differ from the weight
that users configured. This patch displays useful information in each
cgroup's blk.cost.stat.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Signed-off-by: Lenny Chen <lennychen@tencent.com>
2024-09-28 15:42:22 +08:00
Haisu Wang fed4a7c8be rue/io: add io_cgv1_buff_wb to enable buffer IO counting in cgroup v1
Add a sysctl switch to control buffer IO accounting in
the memcg of cgroup v1. If this switch is turned on, removing
a memory cgroup may leave zombie slabs until writeback finishes.

Both io_qos and io_cgv1_buff_wb need to be turned on in cgroup v1.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-09-28 15:42:22 +08:00
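A usage sketch for the two switches named above; the sysctl names come from the commit message, but the `kernel.` namespace is an assumption and may differ on your build:

```shell
# Hypothetical sysctl paths (the "kernel." prefix is assumed).
sysctl -w kernel.io_qos=1           # enable the RUE IO QoS switch
sysctl -w kernel.io_cgv1_buff_wb=1  # enable buffer IO accounting in cgroup v1
```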
Haisu Wang 826a0366a1 rue/io: introduce per mem_cgroup sync interface
Introduce the per-memcg cgroup.sync interface, so that we can
ensure that the dirty pages of a cgroup are actually written to
disk without considering dirty pages generated elsewhere.
This avoids the large cgroup exit delays caused by system-level
sync, as well as the resulting IO jitter.

Note:
struct wb_writeback_work moved from fs/fs-writeback.c to
include/linux/writeback.h

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-28 15:42:22 +08:00
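The per-cgroup sync interface described above might be exercised like this; the mount point, cgroup name, and exact file name are assumptions based on the commit message, not verified paths:

```shell
# Hypothetical: flush only this cgroup's dirty pages before removing
# it, instead of a system-wide sync (cgroup v1 path assumed).
echo 1 > /sys/fs/cgroup/memory/mycg/memory.sync
rmdir /sys/fs/cgroup/memory/mycg
```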
Haisu Wang a12bb1a43d rue/io: add bufio isolation for cgroup v1
Add buffer IO isolation (bind_blkio) to v1 based on the v2
infrastructure, so we can unify the interface for dio and bufio.

Add a sysctl switch to allow migrating an already-bound cgroup.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Lenny Chen <lennychen@tencent.com>
2024-09-28 15:42:22 +08:00
Haisu Wang 1b1b938068 rue/io: Add bps information to blkio.throttle.stat
Bps information is missing in blkio.throttle.stat

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-28 15:42:22 +08:00
Haisu Wang a65dd3dd13 rue/io: Add blkio.throttle.stat
Add blkio.throttle.stat to show throttle stat

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-28 15:42:22 +08:00
Haisu Wang 1860b51781 rue/io: add buffer IO writeback throtl for cgroup v1
Add buffer IO throttling for cgroup v1 based on dirty throttling.
Since the actual IO speed is not considered, this solution
may cause continuous accumulation of dirty pages when IO
performance is the bottleneck, which degrades the isolation
effect.

Note:
struct blkcg moved from block/blk-cgroup.h to
include/linux/blk-cgroup.h

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Lenny Chen <lennychen@tencent.com>
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-28 15:41:58 +08:00
Haisu Wang 286c5f95c6 rue/io: add io_qos switch and throtl hierarchy
Use sysctl_io_qos as the RUE IO feature switch.
Also support blk throttle hierarchy and enable it by default.

Note:
the throttle hierarchy is not affected by kernel.io_qos since it
is linked during the initialization phase

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Signed-off-by: Lenny Chen <lennychen@tencent.com>
2024-09-28 15:41:58 +08:00
Haisu Wang 2497ec22c1 rue/io: Enable CONFIG_BLK_DEV_THROTTLING_CGROUP_V1 configuration
Make CONFIG_BLK_DEV_THROTTLING_CGROUP_V1 enabled by default.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:31:27 +08:00
Haisu Wang fd71891bb8 rue/io: Correct the alloc type to disk_stats
Upstream: no

For non-SMP builds, also allocate dkstats dynamically; however,
the wrong struct type was assigned.

Fixes: 6dfa517032 ("blkcg/diskstats: add per blkcg diskstats support")
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:31:26 +08:00
Haisu Wang 495e0e311e rue/io: add support for recursive diskstats
Add recursive diskstats to blkcg.
Fix the issue where only the last partition was printed in the
original solution, and remove the list.

Note:
This function is just for backward compatibility with tkernel4,
since commit f733164829 ("blk-cgroup: reimplement basic IO stats
using cgroup rstat") implements blkg_iostat_set for cgroup stats
in blkcg_gq.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Signed-off-by: Lenny Chen <lennychen@tencent.com>
2024-09-27 11:31:26 +08:00
Haisu Wang f35df3f918 rue/io: blkcg export blkcg symbols to be used in bpf accounting
Make the block cgroup I/O completion and done functions dynamic
to account per-cgroup I/O status in eBPF.

Fix blkcg_dkstats.alloc_node being undefined: alloc_node is only
available when CONFIG_SMP is enabled, so move the INIT to the
right place.
Export blkcg symbols to be used in bpf accounting.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
2024-09-27 11:25:19 +08:00
Haojie Ning a1574c433d rue/mm: add sysctl_vm_use_priority_oom to enable priority oom for all cgroups
Add sysctl_vm_use_priority_oom as a global setting to enable the
priority_oom setting for all cgroups without the need to manually
set it for each cgroup. This global setting has no effect when it
is turned off.

Signed-off-by: Haojie Ning <paulning@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:32 +08:00
Honglin Li 7c45f9b01f rue/mm: compatible with mglru for pagecache limit
The pagecache limit for the system and per-cgroup causes
processes to get stuck when mglru is enabled.
Use lru_gen_enabled() to check whether mglru is
enabled in the system.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
2024-09-27 11:13:32 +08:00
Xin Hao 4e6f350b03 rue/mm: fix file page_counter 'memcg->pagecache' error when THP enabled
When the CONFIG_MEM_QPS feature is enabled, the __mod_lruvec_state
function is called to increase the per-memcg page_counter 'pagecache'
value for 'NR_FILE_PAGES'. This is not a problem if THP is disabled,
but if THP is enabled, the CONFIG_MEM_QPS feature forgot to increase
the page_counter 'pagecache' value, because THP pagecache is accounted
as 'NR_FILE_THPS'. This leads the page_counter 'pagecache' value to
become negative when these THP pagecache pages are released, resulting
in the following warning:

[55530.397796] ------------[ cut here ]------------
[55530.398854] page_counter underflow: -512 nr_pages=512
[55530.399864] WARNING: CPU: 1 PID: 3026157 at mm/page_counter.c:63 page_counter_cancel+0x55/0x60
[55530.412193] CPU: 1 PID: 3026157 Comm: bash Kdump: loaded Tainted: G
[55530.416075] RIP: 0010:page_counter_cancel+0x55/0x60
[55530.421353] RAX: 0000000000000000 RBX: ffff8888161a8270 RCX: 0000000000000006
[55530.422680] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff88881f85bb60
[55530.424008] RBP: ffffc90004ceba58 R08: 0000000000009617 R09: ffff88881584c820
[55530.425330] R10: 0000000000000000 R11: ffffffffa00d60b0 R12: 0000000000000200
[55530.426663] R13: ffff8888194f7000 R14: 0000000000000000 R15: 0000000000000000
[55530.427999] FS:  00007fe2932d1740(0000) GS:ffff88881f840000(0000) knlGS:0000000000000000
[55530.429447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[55530.430645] CR2: 00007f97c4e00000 CR3: 00000007e7256004 CR4: 00000000003706e0
[55530.432007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[55530.433360] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[55530.434711] Call Trace:
[55530.435541]  page_counter_uncharge+0x22/0x40
[55530.436571]  __mod_memcg_state.part.80+0x79/0xe0
[55530.437645]  __mod_memcg_lruvec_state+0x27/0x110
[55530.438712]  __mod_lruvec_state+0x39/0x40
[55530.439712]  unaccount_page_cache_page+0xd0/0x210
[55530.440803]  __delete_from_page_cache+0x3d/0x1d0
[55530.441877]  __remove_mapping+0xeb/0x220
[55530.442871]  remove_mapping+0x16/0x30
[55530.443836]  invalidate_inode_page+0x84/0x90
[55530.444869]  invalidate_mapping_pages+0x162/0x3e0
[55530.445957]  ? pick_next_task_fair+0x1f2/0x520
[55530.446996]  drop_pagecache_sb+0xac/0x130
[55530.447972]  iterate_supers+0xa2/0x110
[55530.448907]  ? do_coredump+0xb20/0xb20
[55530.449840]  drop_caches_sysctl_handler+0x5d/0x90
[55530.450893]  proc_sys_call_handler+0x1d0/0x290
[55530.451906]  proc_sys_write+0x14/0x20
[55530.452830]  __vfs_write+0x1b/0x40
[55530.453722]  vfs_write+0xab/0x1b0
[55530.454598]  ksys_write+0x61/0xe0
[55530.455471]  __x64_sys_write+0x1a/0x20
[55530.456392]  do_syscall_64+0x4d/0x120
[55530.457296]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[55530.458346] RIP: 0033:0x7fe292836bc8

Fixes: a0d7d9851512 ("rue/mm: pagecache limit per cgroup support")
Signed-off-by: Xin Hao <vernhao@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li b82ababba6 rue/mm: introduce new feature to async clean dying memcgs
When a memcg is removed, page caches and slab pages still
reference it, which can leave a very large number of dying
memcgs in the system. This feature can asynchronously clean
dying memcgs in the system.

1) sysctl -w vm.clean_dying_memcg_async=1
   #start a kthread to async clean dying memcgs, default
   #value is 0.

2) sysctl -w vm.clean_dying_memcg_threshold=10
   #Whenever 10 dying memcgs are generated in the system,
   #wakeup a kthread to async clean dying memcgs, default
   #value is 100.

Signed-off-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 200560da23 rue/mm: introduce memcg page cache hit & miss ratio tool
A new memory.page_cache_hit control file is added
under each memory cgroup directory. Reading this file
prints the page cache hit and miss ratio at the memory
cgroup level.

Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
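Reading the new file might look like the sketch below; the cgroup name is hypothetical and the actual output format is not specified in the message, but the ratio itself is just hits over total lookups:

```shell
# Hypothetical cgroup name; reading the file prints the hit/miss ratio:
#   cat /sys/fs/cgroup/memory/mycg/memory.page_cache_hit
# The ratio is hits / (hits + misses); with made-up counters:
hits=900; misses=100
echo "hit ratio: $(( hits * 100 / (hits + misses) ))%"
```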
Honglin Li 8de07be077 rue/mm: introduce memory allocation latency for per-cgroup tool
A new memory.latency_histogram control file is added
under each memory cgroup directory. Reading this file prints
the memory access latency at the memory cgroup level.

Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 1824581599 rue/mm: async free memory while process exiting
Introduce async free memory while process exiting
to shorten exit time.

Signed-off-by: Xiaoguang Chen <xiaoggchen@tencent.com>
Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 75ad2bae3d rue/mm: pagecache limit per cgroup support
Functional test:
http://tapd.oa.com/TencentOS_QoS/prong/stories/view/
1020426664867405667?jump_count=1

Signed-off-by: Xiaoguang Chen <xiaoggchen@tencent.com>
Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Xuan Liu <benxliu@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 56d80c4ea2 rue/mm: add memory cgroup async page reclaim mechanism
Introduce background page reclaim mechanism for memcg, it can
be configured according to the cgroup priorities for different
reclaim strategies.

Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
Signed-off-by: Mengmeng Chen <bauerchen@tencent.com>
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 0d35c4c639 rue/mm: introduce memcg priority oom
Under memory pressure, reclaim and OOM can happen. With
multiple cgroups in one system, we might want some of their
memory or tasks to survive reclaim and OOM while there are
other candidates.

When OOM happens, a victim is always chosen from a low-priority
memcg. This works both for memcg OOM and global OOM. It can be
enabled/disabled through @memory.use_priority_oom (for global
OOM, through the root memcg's @memory.use_priority_oom) and is
disabled by default.

Signed-off-by: Haiwei Li <gerryhwli@tencent.com>
Signed-off-by: Mengmeng Chen <bauerchen@tencent.com>
Signed-off-by: Xiaoguang Chen <xiaoggchen@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
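A usage sketch for the knob described above; the cgroup v1 mount point and cgroup name are assumptions, only the `memory.use_priority_oom` file name comes from the message:

```shell
# Hypothetical paths: enable priority OOM for one memcg, and for
# global OOM via the root memcg (disabled by default).
echo 1 > /sys/fs/cgroup/memory/mycg/memory.use_priority_oom
echo 1 > /sys/fs/cgroup/memory/memory.use_priority_oom
```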
Honglin Li db44c11cdd rue/mm: add priority reclaim support
Introduce the sync && async priority reclaim mechanism.

Signed-off-by: Yu Liu <allanyuliu@tencent.com>
Signed-off-by: Xiaoguang Chen <xiaoggchen@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li 04f49a445c pagecachelimit: set an initial value for may_deactivate in shrink page cache
The global pagecache limit function fails due to a backported
upstream commit. In the scenario where the active file list
needs to be reclaimed, it cannot reclaim the LRU_ACTIVE_FILE
list, making the pagecache limit inaccurate.

When shrinking page cache, we set an initial value for
may_deactivate in scan_control to DEACTIVATE_FILE, allowing
the active file list to be scanned in shrink_list.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Reviewed-by: Hongbo Li <herberthbli@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li 26941c0f5e rue/net: avoid wrong memory access to struct net_device
The receiving process assigns the net_device pointer of the
network interface to sock->in_dev in cls_tc_rx_hook(). Using
the sock->in_dev pointer can lead to wrong memory access if
the memory of struct net_device is freed after the network
interface is unregistered, which may cause a kernel crash.

The above use after free issue causes a crash as follows:

BUG: unable to handle page fault for address: ffffffed698999c8
CPU: 50 PID: 1290732 Comm: kubelet Kdump: loaded
Tainted: G O K 5.4.119-1-tlinux4-0009.1 #1
RIP: 0010:cls_cgroup_tx_accept+0x5e/0x120
Call Trace:
 <IRQ>
 cls_tc_tx_hook+0x10d/0x1a0
 nf_hook_slow+0x43/0xc0
 __ip_local_out+0xcb/0x130
 ? ip_forward_options+0x190/0x190
 ip_local_out+0x1c/0x40
 __ip_queue_xmit+0x162/0x3d0
 ? rx_cgroup_throttle.isra.4+0x2b0/0x2b0
 ip_queue_xmit+0x10/0x20
 __tcp_transmit_skb+0x57f/0xbe0
 __tcp_retransmit_skb+0x1b0/0x8a0
 tcp_retransmit_skb+0x19/0xd0
 tcp_retransmit_timer+0x367/0xa80
 ? kvm_clock_get_cycles+0x11/0x20
 ? ktime_get+0x34/0x90
 tcp_write_timer_handler+0x93/0x1f0
 tcp_write_timer+0x7c/0x80
 ? tcp_write_timer_handler+0x1f0/0x1f0
 call_timer_fn+0x35/0x130
 run_timer_softirq+0x1a8/0x420
 ? ktime_get+0x34/0x90
 ? clockevents_program_event+0x85/0xe0
 __do_softirq+0x8c/0x2d7
 ? hrtimer_interrupt+0x12a/0x210
 irq_exit+0xa3/0xb0
 smp_apic_timer_interrupt+0x77/0x130
 apic_timer_interrupt+0xf/0x20
 </IRQ>

We introduce indev_ifindex as a new struct field to record
the ifindex of the net_device; indev_ifindex can then be
used to look the device up by index, avoiding direct memory
access to struct members through the in_dev pointer.

Fixes: f8829546f3b3 ("rue/net: init netcls traffic controller")
Signed-off-by: Honglin Li <honglinli@tencent.com>
Reviewed-by: Ze Gao <zegao@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li 68a7910a16 rue/net: avoid wrong memory access to struct cgroup_cls_state
The memory of struct cgroup_cls_state may be freed
during the use of a pointer to the struct. This issue
can potentially lead to wrong memory access and thus
kernel crashes.

Increase the reference count of struct cgroup_cls_state
through css_tryget_online while the struct is in use.

The above causes a crash as follows:

CPU: 56 PID: 4161866 Comm: AppSourceDatapr Kdump: loaded
Tainted: G O 5.4.119-1-tlinux4-0008 #1
RIP: 0010:cls_cgroup_adjust_wnd+0x58/0x180
Call Trace:
 <IRQ>
 __tcp_transmit_skb+0x6a8/0xbe0
 __tcp_send_ack.part.50+0xc2/0x170
 tcp_send_ack+0x1c/0x20
 tcp_send_dupack+0x29/0x130
 ? kvm_clock_get_cycles+0x11/0x20
 tcp_validate_incoming+0x332/0x440
 tcp_rcv_established+0x1f6/0x670
 tcp_v4_do_rcv+0x18a/0x220
 tcp_v4_rcv+0xbfd/0xca0
 ip_protocol_deliver_rcu+0x1f/0x180
 ip_local_deliver_finish+0x51/0x60
 ip_local_deliver+0xcd/0xe0
 ? ip_protocol_deliver_rcu+0x180/0x180
 ip_rcv_finish+0x7b/0x90
 ip_rcv+0xb5/0xc0
 ? ip_rcv_finish_core.isra.18+0x380/0x380
 __netif_receive_skb_one_core+0x59/0x80
 __netif_receive_skb+0x26/0x70
 process_backlog+0xac/0x150
 net_rx_action+0x127/0x380
 ? ktime_get+0x34/0x90
 __do_softirq+0x8c/0x2d7
 irq_exit+0xa3/0xb0
 smp_call_function_single_interrupt+0x4c/0xd0
 call_function_single_interrupt+0xf/0x20
 </IRQ>

Signed-off-by: Honglin Li <honglinli@tencent.com>
Reviewed-by: Ze Gao <zegao@tencent.com>
Reviewed-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li 55f6748cd1 rue/net: adapt to the new rue modular framework
Add to register and unregister rue net ops through
rue modular framework.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Reviewed-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li ca8edadc91 rue/net: add dynamic bandwidth allocation between online cgroups
Introduce netcls controller interface files, which can be
configured to enable/disable bandwidth allocation mechanism
among online net cgroups.

The mechanism realizes the migration of idle bandwidth resources
among online cgroups, while guaranteeing the minimum bandwidth
for per-cgroup, to improve resource utilization.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Reviewed-by: Haisu Wang <haisuwang@tencent.com>
Reviewed-by: Jason Xing <kernelxing@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li 3811ff7c02 rue/net: add netdev-based rate limit for per cgroup
Introduce netdev-based rate limit for rx && tx direction.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Reviewed-by: Zhiping Du <zhipingdu@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li 9a447c5cb9 rue/net: add total bandwidth limit for multiprio preemption
Introduce the total bandwidth limit mechanism for rx && tx direction.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Signed-off-by: Zhiping Du <zhipingdu@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li 703664bf47 rue/net: add support for cgroup whitelist ports
Introduce the cgroup whitelist ports mechanism.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Signed-off-by: Zhiping Du <zhipingdu@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li ca0f6ddd21 rue/net: add rx && tx rate limit for per cgroup
Introduce the bandwidth rate limit mechanism for per cgroup.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Signed-off-by: Zhiping Du <zhipingdu@tencent.com>
2024-09-27 11:13:29 +08:00
Honglin Li 669bbf19cd rue/net: init netcls traffic controller
Add multiprio dynamic bandwidth controller.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Signed-off-by: Hongbo Li <herberthbli@tencent.com>
Signed-off-by: Zhiping Du <zhipingdu@tencent.com>
2024-09-27 11:13:29 +08:00
Haisu Wang 0f93976785 rue: Revert "kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()"
Export the two functions again for modules like RUE.

This reverts commit 0bd476e6c6.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:29 +08:00
Ze Gao d5a175186d rue: Add support for rue modularization
Add framework support to enable rue to be installed as
a separate module.

In order to safely insmod/rmmod, we use per-cpu counter to
track how many rue related functions are on the fly, and
it's only safe to insmod/rmmod when there's no tasks using
any of these functions registered by rue module.

Signed-off-by: Ze Gao <zegao@tencent.com>
2024-09-27 11:13:29 +08:00
Hongbo Li 5dc70a633d rue: init rue module
Add the init code of rue module.
Support both the built-in and module (default) build.

Signed-off-by: Hongbo Li <herberthbli@tencent.com>
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Reviewed-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:29 +08:00
Hongbo Li fce3609ebf rue: cgroup priority
Add cgroup priority.

Signed-off-by: Hongbo Li <herberthbli@tencent.com>
Signed-off-by: Lei Chen  <lennychen@tencent.com>
Signed-off-by: Yu Liu    <allanyuliu@tencent.com>
2024-09-27 11:13:29 +08:00
Haisu Wang 61bf5b5b7e blkcg/diskstats: Fix the extra cpu parameter
Upstream: no

Commit 6dfa517032 unified blkcg_part_stat_add() without
implicitly passing the cpu number. Also, CONFIG_BLK_CGROUP_DISKSTATS
depends on CONFIG_BLK_CGROUP, so there is no need to define
blkcg_part_stat_add() when CONFIG_BLK_CGROUP is disabled.

Correct the error message "implicit declaration of function
‘blkcg_dkstats_show_comm’" when CONFIG_BLK_CGROUP_DISKSTATS is disabled.

Fixes: 6dfa517032 ("blkcg/diskstats: add per blkcg diskstats support")
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:13:29 +08:00
Haisu Wang 2fc4b0e9c0 mm: set default watermark_boost_factor value to 0
Upstream: no

Watermark boost factor controls the level of reclaim when memory is
being fragmented. The intent is that compaction has less work to do in the
future and to increase the success rate of future high-order allocations
such as SLUB allocations, THP and hugetlbfs pages.
However, it wakes up kswapd to do defragmentation, an action that
caused performance jitter in many cases without enough gain.

Some distributions, like Debian, also set the default boost
factor to 0 to disable the feature.

WXG Story of compaction cause performance jitter:
https://doc.weixin.qq.com/doc/w3_AIAAcwacAAYudo6ERcUQMiNUbmvzb?scode=AJEAIQdfAAoeO7AbqSAYQATQaYAJg

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Signed-off-by: Zeng Jingxiang <linuszeng@tencent.com>
Reviewed-by:  Jianping Liu <frankjpliu@tencent.com>
2024-09-27 11:13:28 +08:00
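The knob changed above is the upstream `vm.watermark_boost_factor` sysctl; the patch makes 0 the built-in default, and the same effect can be had at runtime:

```shell
# Disable watermark boosting (equivalent to the new default of 0).
sysctl -w vm.watermark_boost_factor=0
```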
Haisu Wang b03afc0d33 Revert "io/tqos: merge buffer io limit series patch from brookxu, and rework some function."
This reverts commit 538ec11bed.

Revert due to refactoring of the buffer IO function.
In TK5 it is unnecessary to keep KABI compatibility by using
the "nodeinfo" field in "struct mem_cgroup {}".

Original tapd and MR:
  https://tapd.woa.com/tapd_fe/20422414/story/detail/1020422414117471502
  https://git.woa.com/tlinux/tkernel5/-/merge_requests/117

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:13:24 +08:00
Haisu Wang 3231efb956 Revert "io/tqos: add sysctl_buffer_io_limit switch for buffer io limit."
This reverts commit 4d87de6bb4.

Revert due to refactoring of the buffer IO function.
In TK5 it is unnecessary to keep KABI compatibility by using
the "nodeinfo" field in "struct mem_cgroup {}".

Original tapd and MR:
  https://tapd.woa.com/tapd_fe/20422414/story/detail/1020422414117471502
  https://git.woa.com/tlinux/tkernel5/-/merge_requests/117

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:13:21 +08:00
Haisu Wang 24cfc0a666 Revert "cgroup: allow cgroup to split direct io and buffered io into different blkio cgroup"
This reverts commit 71aaa09350.

Revert due to refactoring of the buffer IO function.
In TK5 it is unnecessary to keep KABI compatibility by using
the "nodeinfo" field in "struct mem_cgroup {}".

Original tapd and MR:
  https://tapd.woa.com/tapd_fe/20422414/story/detail/1020422414117471502
  https://git.woa.com/tlinux/tkernel5/-/merge_requests/117

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:12:41 +08:00
aurelianliu b1e1aed588 config,x86: open edr
Enable DPC and EDR, which enable the PCIe eDPC function.
When a UCE occurs, eDPC can reset the device link and resume
the device, similar to hotplugging the device.

Signed-off-by: Aurelianliu <aurelianliu@tencent.com>
2024-09-11 02:06:12 +00:00
Daniel Maslowski e16058ed64 riscv/purgatory: align riscv_kernel_entry
Fix CVE: CVE-2024-43868

[ Upstream commit fb197c5d2fd24b9af3d4697d0cf778645846d6d5 ]

When alignment handling is delegated to the kernel, everything must be
word-aligned in purgatory, since the trap handler is then set to the
kexec one. Without the alignment, hitting the exception would
ultimately crash. On other occasions, the kernel's handler would take
care of exceptions.
This has been tested on a JH7110 SoC with oreboot and its SBI delegating
unaligned access exceptions and the kernel configured to handle them.

Fixes: 736e30af58 ("RISC-V: Add purgatory")
Signed-off-by: Daniel Maslowski <cyrevolt@gmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240719170437.247457-1-cyrevolt@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-09-10 19:36:42 +08:00
Jianping Liu 7a6899b55a config,x86: disable CONFIG_IOMMU_DEBUGFS
To avoid the log like below:
[    0.095948] *************************************************************
[    0.095948] **     NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE    **
[    0.096220] **                                                         **
[    0.096221] **  IOMMU DebugFS SUPPORT HAS BEEN ENABLED IN THIS KERNEL  **
[    0.096222] **                                                         **
[    0.096223] ** This means that this kernel is built to expose internal **
[    0.096224] ** IOMMU data structures, which may compromise security on **
[    0.096225] ** your system.                                            **
[    0.096227] **                                                         **
[    0.096227] ** If you see this message and you are not debugging the   **
[    0.096228] ** kernel, report this immediately to your vendor!         **
[    0.096229] **                                                         **
[    0.096230] **     NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE    **
[    0.096231] *************************************************************
disable CONFIG_IOMMU_DEBUGFS.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-09-06 15:03:58 +08:00
Jianping Liu 64a21c8a25 hung_task,watchdog: set thresh time to 600 seconds
When CONFIG_KASAN is enabled, the kernel runs much slower; set
the hung_task and soft lockup thresh times to 600 seconds.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-09-05 15:24:07 +08:00
Jianping Liu 2748b6ef40 Merge OCK next branch to TK5 master branch 2024-09-03 11:26:15 +08:00