anolis-cloud-kernel

Commit Graph

Author	SHA1	Message	Date
Duanqiang Wen	63c75247db	anolis: net: txgbe: fix mailbox error when echo vf ANBZ: #8072 When vf driver make modules_install, if echo vf of different ports successively, there will be problems with the mailbox lock. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	31a26be1ba	anolis: net: txgbe: fix ethtool set rss indir table ANBZ: #8072 ethtool -X ethx equal/weight, can't update ethx rss indir table. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	058e67ee39	anolis: net: txgbe: show max_combined wrong when enable sriov ANBZ: #8072 when enable sriov, ethtool -l ethx to get port pre-set max_combined always be 1, but in reality max_combined depends on num_vfs. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	4f08963061	anolis: net: txgbevf: support for vf rss ANBZ: #8072 support to virtual function rss features. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	2dda2f6f04	anolis: net: txgbevf: fix vf queues maximum ANBZ: #8072 fix vf only supports a maximum of 2 queues, get maximum of queues for pf msg[TXGBE_VF_RX_QUEUES] and msg[TXGBE_VF_TX_QUEUES]. set maximum of queues to 4. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	bbaa246502	anolis: net: txgbe: support to add ether type filter by ethtool ANBZ: #8072 support to add ether type filter by ethtool and fix vf rss function. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	05464c990e	anolis: net: txgbe: add led blink support for oem id 0x1ff9 ANBZ: #8072 ethtool -p ethx, add support for oem id 0x1ff9. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	65b7bbfdcd	anolis: net: txgbe: fix ethtool -t loopback test failed ANBZ: #8072 With firmware version 5042000f, ethtool -t ethx fails, because diag_test will clear driver load bit and lan reset, and then the firmware will configure pcs, which causes the loopback test to fail. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	48e24cb036	anolis: net: txgbe: support copper modules and DAC ANBZ: #8072 add support for detecting copper module link status and DAC cable. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	e8e8e274be	anolis: net: txgbe: fix cannot link to 10G ANBZ: #8072 Fix ethtool -s ethx autoneg on cannot link to 10G. Steps: 1. ethtool -s eth0 speed 1000 duplex full autoneg on 2. unplug eth0 fiber 3. ethtool -s eth0 autoneg on 4. plug fiber; link speed is 1G. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	1f2ec94aa0	anolis: net: txgbe: return error code for unsupported parameters ANBZ: #8072 When running ethtool -C, return -EINVAL if no supported coalesce parameter is changed. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	ccb245692a	anolis: net: txgbe: support for 802.1ad ANBZ: #8072 add support for 802.1ad vlan, offload setting follows 802.1q vlan setting. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	6867fd682f	anolis: net: txgbe: add support to show fw version on vf ANBZ: #8072 add support for show fw version on vf driver. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	f435e65a3b	anolis: net: txgbe:set i2c_speed to standard mode ANBZ: #8072 real i2c speed larger than standard mode, set i2c speed to standard mode. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	1c68adced0	anolis: net: txgbe: sriov mode can't enable lro ANBZ: #8072 after enabling sriov mode, ethtool -k to set ntuple on will return requested on, and setting is not effective. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	2410a678b3	anolis: txgbevf: fix make allyesconfig build failed ANBZ: #8072 make allyesconfig, build failed because multiple definition, change txgbevf module function names. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	26fab43533	anolis: net: txgbe: support for pf change ntuple setting ANBZ: #8072 ethtool -k ethx shows the ntuple setting as fixed; add support for changing the ntuple setting. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	544be4200a	anolis: net: txgbe: fix set vf ntuple rule cannot work ANBZ: #8072 pf ethtool operation for ntuple setting, is not supported for vf, add support for ntuple rules to flow packets to vf queue. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	93c4c80423	anolis: net: wangxun: change driver version ANBZ: #8072 append driver version with anolis, it is helpful to distinguish inbox driver and out of tree driver. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	b3718d5c2e	anolis: net: txgbe: fix qinq or double vlan tso is not work ANBZ: #8072 Add ndo_features_check for double vlan or qinq to fix the tso bug. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	2486231a5a	anolis: net: txgbe: fix different vlanid can send to virtual function ANBZ: #8072 pf ack vf vlan setting, didn't check vid in active_vlan. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Duanqiang Wen	c806ad788a	anolis: config: default to build txgbevf for module ANBZ: #8072 default to build txgbevf for module, only in x86 and arm64 arch. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
DuanqiangWen	ddb0a323cf	anolis: net: txgbevf: add support for power management ANBZ: #8072 add support for power management interface, for suspending and resuming nic. Signed-off-by: DuanqiangWen <duanqiangwen@net-swift.com> Reviewed-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
DuanqiangWen	34cc16beef	anolis: net: anolis: add support for ethtool ANBZ: #8072 add support for ethtool, use ethtool can get some virtual function information. Signed-off-by: DuanqiangWen <duanqiangwen@net-swift.com> Reviewed-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
DuanqiangWen	19411b102a	anolis: net: txgbevf: add support for tx/rx traffic ANBZ: #8072 add xmit and receive codes for virtual function. Signed-off-by: DuanqiangWen <duanqiangwen@net-swift.com> Reviewed-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
DuanqiangWen	5a6f19204f	anolis: net: txgbevf: add hardware initialization ANBZ: #8072 initialize hardware, including vf mac layer, mailbox interface. Signed-off-by: DuanqiangWen <duanqiangwen@net-swift.com> Reviewed-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
DuanqiangWen	59cb85c93e	anolis: net: txgbevf: Add build support for txgbevf ANBZ: #8072 Add doc build infrastructure for txgbevf driver. Initialize PCI memory space for WangXun 10 Gigabit virtual function Ethernet devices. Signed-off-by: DuanqiangWen <duanqiangwen@net-swift.com> Reviewed-by: Duanqiang Wen <duanqiangwen@net-swift.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3121	2024-05-27 07:30:09 +00:00
Ricardo B. Marliere	8d56d12294	media: pvrusb2: fix use after free on context disconnection ANBZ: #8555 commit `ded85b0c0e` upstream. Upon module load, a kthread is created targeting the pvr2_context_thread_func function, which may call pvr2_context_destroy and thus call kfree() on the context object. However, that might happen before the usb hub_event handler is able to notify the driver. This patch adds a sanity check before the invalid read reported by syzbot, within the context disconnection call stack. Reported-and-tested-by: syzbot+621409285c4156a009b3@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000a02a4205fff8eb92@google.com/ Fixes: `e5be15c638` ("V4L/DVB (7711): pvrusb2: Fix race on module unload") Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Acked-by: Mike Isely <isely@pobox.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Fixes: CVE-2023-52445 Signed-off-by: Xiao Long <xiaolong@openanolis.org> Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com> Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3021	2024-05-22 12:03:38 +00:00
Joseph Qi	266e564f30	anolis: check cgroup v1 for memcg_blkcg_tree operations ANBZ: #8973 Currently parameter 'cgwb_v1' can be setup unconditionally. Take the following abnormal case into consideration: System administrator configures both 'cgwb_v1' and 'systemd.unified_cgroup_hierarchy=1' in command line by mistake, so we use cgroup v2 after boot in fact. Though we'll check if current kernel is under cgroup v2 in inode_cgwb_enabled(), we still allocate, insert and delete links for memcg_blkcg_tree since we only check parameter 'cgwb_v1'. This seems no actual harm, but it is entirely unnecessary and wasty. So restrict these operations only under cgroup v1. Since bdi initialization is before enabling cgroup subsys, so we'll still create debug file bdi_wb_link but without any links in above abnormal case. Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Xu Yu <xuyu@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3147	2024-05-09 15:54:37 +08:00
Mao Wenan	a665acc125	af_packet: set defaule value for tmo ANBZ: #8733 [ Upstream commit `b43d1f9f70` ] There is softlockup when using TPACKET_V3: ... NMI watchdog: BUG: soft lockup - CPU#2 stuck for 60010ms! (__irq_svc) from [<c0558a0c>] (_raw_spin_unlock_irqrestore+0x44/0x54) (_raw_spin_unlock_irqrestore) from [<c027b7e8>] (mod_timer+0x210/0x25c) (mod_timer) from [<c0549c30>] (prb_retire_rx_blk_timer_expired+0x68/0x11c) (prb_retire_rx_blk_timer_expired) from [<c027a7ac>] (call_timer_fn+0x90/0x17c) (call_timer_fn) from [<c027ab6c>] (run_timer_softirq+0x2d4/0x2fc) (run_timer_softirq) from [<c021eaf4>] (__do_softirq+0x218/0x318) (__do_softirq) from [<c021eea0>] (irq_exit+0x88/0xac) (irq_exit) from [<c0240130>] (msa_irq_exit+0x11c/0x1d4) (msa_irq_exit) from [<c0209cf0>] (handle_IPI+0x650/0x7f4) (handle_IPI) from [<c02015bc>] (gic_handle_irq+0x108/0x118) (gic_handle_irq) from [<c0558ee4>] (__irq_usr+0x44/0x5c) ... If __ethtool_get_link_ksettings() is failed in prb_calc_retire_blk_tmo(), msec and tmo will be zero, so tov_in_jiffies is zero and the timer expire for retire_blk_timer is turn to mod_timer(&pkc->retire_blk_timer, jiffies + 0), which will trigger cpu usage of softirq is 100%. Fixes: `f6fb8f100b` ("af-packet: TPACKET_V3 flexible buffer implementation.") Tested-by: Xiao Jiangfeng <xiaojiangfeng@huawei.com> Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tom Yang <yangqixiao@inspur.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3036	2024-05-06 07:57:11 +00:00
Cruz Zhao	9a4dfd6c08	anolis: sched/fair: fix underclass unscheduled after ID_ABSOLUTE_EXPEL turned off ANBZ: #8821 In function sync_min_vruntime(), expel_start and expel_spread will not be cleared if CONFIG_SCHED_SMT is off, which results that the priority of underclass will be much lower than other once ID_ABSOLUTE_EXPEL is turned off, because min_vruntime << min_under_vruntime + expel_spread after sync_min_vruntime(). To fix this problem, we clear expel_start and expel_spread in sync_min_vruntime() regardless of whether CONFIG_SCHED_SMT is on. Fixes: 139aefab8eaa("anolis: sched/fair: introduce sched_feat ID_ABSOLUTE_EXPEL") Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com> Reviewed-by: Tianchen Ding <dtcccc@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3104	2024-04-29 02:31:42 +00:00
Philo Lu	18288b4ee8	anolis: Revert "anolis: virtio-net: open napi for tx" ANBZ: #8910 This reverts commit `8927cac904`. Few regressions are found in benchmarks, so we decide to disable napi_tx by default, which is in consistence with old versions before, to keep the performance stable. Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3106	2024-04-28 21:08:34 +08:00
Cruz Zhao	c805900600	anolis: sched/fair: fix invalid ID_ABSOLUTE_EXPEL without CONFIG_SCHED_SMT ANBZ: #8821 If CONFIG_SCHED_SMT is turned off, ID_ABSOLUTE_EXPEL will be invalid, because update_expel_start() is a NULL function, so there is no chance to adjust vruntime to make underclass lag, and underclass tasks will unexpectedly get a chance to run. To fix this problem, we just change the logic of id_vruntime_before(), letting the vruntime of highclass and normal sched_entity always be before underclass sched_entity. Fixes: 139aefab8eaa("anolis: sched/fair: introduce sched_feat ID_ABSOLUTE_EXPEL") Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com> Reviewed-by: Tianchen Ding <dtcccc@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3085	2024-04-24 02:08:55 +00:00
Johannes Weiner	d62bc059c9	mm: fix false-positive OVERCOMMIT_GUESS failures ANBZ: #8860 commit `8c7829b04c` upstream With the default overcommit==guess we occasionally run into mmap rejections despite plenty of memory that would get dropped under pressure but just isn't accounted reclaimable. One example of this is dying cgroups pinned by some page cache. A previous case was auxiliary path name memory associated with dentries; we have since annotated those allocations to avoid overcommit failures (see `d79f7aa496` ("mm: treat indirectly reclaimable memory as free in overcommit logic")). But trying to classify all allocated memory reliably as reclaimable and unreclaimable is a bit of a fool's errand. There could be a myriad of dependencies that constantly change with kernel versions. It becomes even more questionable of an effort when considering how this estimate of available memory is used: it's not compared to the system-wide allocated virtual memory in any way. It's not even compared to the allocating process's address space. It's compared to the single allocation request at hand! So we have an elaborate left-hand side of the equation that tries to assess the exact breathing room the system has available down to a page - and then compare it to an isolated allocation request with no additional context. We could fail an allocation of N bytes, but for two allocations of N/2 bytes we'd do this elaborate dance twice in a row and then still let N bytes of virtual memory through. This doesn't make a whole lot of sense. Let's take a step back and look at the actual goal of the heuristic. From the documentation: Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default. 
If all we want to do is catch clearly bogus allocation requests irrespective of the general virtual memory situation, the physical memory counter-part doesn't need to be that complicated, either. When in GUESS mode, catch wild allocations by comparing their request size to total amount of ram and swap in the system. Link: http://lkml.kernel.org/r/20190412191418.26333-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kaihao Bai <carlo.bai@linux.alibaba.com> Reviewed-by: Xu Yu <xuyu@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3088	2024-04-23 17:33:25 +08:00
Guixin Liu	35d297d6b4	anolis: net: directly copy page instead of map page ANBZ: #8749 If __skb_datagram_iter's cb parm is simple_copy_to_iter, we dont need to map page first, just use copy_page_to_iter to copy page directly. And also remove simple_copy_to_iter(). Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3050	2024-04-18 05:55:54 +00:00
Dust Li	a5de83dd72	anolis: mlx5: fix double rcu_read_lock() in mlx5_eq_cq_get() ANBZ: #8774 when backporting upstream commit 1fbf1252df0e42("mlx5: use RCU lock in mlx5_eq_cq_get()"), we miss used the rcu_read_lock() twice, without unlock. Fixes: 1335c7384274("mlx5: use RCU lock in mlx5_eq_cq_get()") Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Qinyun Tan <qinyuntan@linux.alibaba.com> Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3061	2024-04-17 10:14:14 +00:00
Cruz Zhao	83842b76dd	anolis: sched/fair: optimize ID_LOAD_BALANCE to rescue underclass ANBZ: #8758 ID_LOAD_BALANCE tends to migrate highclass and normal tasks first to prevent cpu competition among them, which will result that underclass tasks lose the migration opportunity with a high probability, even when they are expelled. To optimize ID_LOAD_BALANCE, we will redo load balance if there is still imbalance, and in the second loop we will allow migrating underclass tasks. Fixes: 9fa7c9d6eb14("anolis: sched/fair: introduce sched_feat ID_LOAD_BALANCE") Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com> Reviewed-by: Tianchen Ding <dtcccc@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3054	2024-04-17 07:16:33 +00:00
Liguang Zhang	56cf8b6f8b	PCI: pciehp: Clear cmd_busy bit in polling mode ANBZ: #8731 commit `92912b1751` upstream. Writes to a Downstream Port's Slot Control register are PCIe hotplug "commands." If the Port supports Command Completed events, software must wait for a command to complete before writing to Slot Control again. pcie_do_write_cmd() sets ctrl->cmd_busy when it writes to Slot Control. If software notification is enabled, i.e., PCI_EXP_SLTCTL_HPIE and PCI_EXP_SLTCTL_CCIE are set, ctrl->cmd_busy is cleared by pciehp_isr(). But when software notification is disabled, as it is when pcie_init() powers off an empty slot, pcie_wait_cmd() uses pcie_poll_cmd() to poll for command completion, and it neglects to clear ctrl->cmd_busy, which leads to spurious timeouts: pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x01c0 (issued 2264 msec ago) pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x05c0 (issued 2288 msec ago) Clear ctrl->cmd_busy in pcie_poll_cmd() when it detects a Command Completed event (PCI_EXP_SLTSTA_CC). [bhelgaas: commit log] Fixes: `a5dd4b4b05` ("PCI: pciehp: Wait for hotplug command completion where necessary") Link: https://lore.kernel.org/r/20211111054258.7309-1-zhangliguang@linux.alibaba.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=215143 Link: https://lore.kernel.org/r/20211126173309.GA12255@wunner.de Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: chuguangqing <chuguangqing@inspur.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3033	2024-04-10 03:10:56 +00:00
Lukas Wunner	34d0ea664c	PCI: pciehp: Fix infinite loop in IRQ handler upon power fault ANBZ: #8731 commit `23584c1ed3` upstream. The Power Fault Detected bit in the Slot Status register differs from all other hotplug events in that it is sticky: It can only be cleared after turning off slot power. Per PCIe r5.0, sec. 6.7.1.8: If a power controller detects a main power fault on the hot-plug slot, it must automatically set its internal main power fault latch [...]. The main power fault latch is cleared when software turns off power to the hot-plug slot. The stickiness used to cause interrupt storms and infinite loops which were fixed in 2009 by commits `5651c48cfa` ("PCI pciehp: fix power fault interrupt storm problem") and `99f0169c17` ("PCI: pciehp: enable software notification on empty slots"). Unfortunately in 2020 the infinite loop issue was inadvertently reintroduced by commit `8edf5332c3` ("PCI: pciehp: Fix MSI interrupt race"): The hardirq handler pciehp_isr() clears the PFD bit until pciehp's power_fault_detected flag is set. That happens in the IRQ thread pciehp_ist(), which never learns of the event because the hardirq handler is stuck in an infinite loop. Fix by setting the power_fault_detected flag already in the hardirq handler. 
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214989 Link: https://lore.kernel.org/linux-pci/DM8PR11MB5702255A6A92F735D90A4446868B9@DM8PR11MB5702.namprd11.prod.outlook.com Fixes: `8edf5332c3` ("PCI: pciehp: Fix MSI interrupt race") Link: https://lore.kernel.org/r/66eaeef31d4997ceea357ad93259f290ededecfd.1637187226.git.lukas@wunner.de Reported-by: Joseph Bao <joseph.bao@intel.com> Tested-by: Joseph Bao <joseph.bao@intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v4.19+ Cc: Stuart Hayes <stuart.w.hayes@gmail.com> [sudip: adjust context] Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: chuguangqing <chuguangqing@inspur.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3033	2024-04-10 03:10:56 +00:00
Stuart Hayes	2b963e6ed5	PCI: pciehp: Fix MSI interrupt race ANBZ: #8731 [ Upstream commit `8edf5332c3` ] Without this commit, a PCIe hotplug port can stop generating interrupts on hotplug events, so device adds and removals will not be seen: The pciehp interrupt handler pciehp_isr() reads the Slot Status register and then writes back to it to clear the bits that caused the interrupt. If a different interrupt event bit gets set between the read and the write, pciehp_isr() returns without having cleared all of the interrupt event bits. If this happens when the MSI isn't masked (which by default it isn't in handle_edge_irq(), and which it will never be when MSI per-vector masking is not supported), we won't get any more hotplug interrupts from that device. That is expected behavior, according to the PCIe Base Spec r5.0, section 6.7.3.4, "Software Notification of Hot-Plug Events". Because the Presence Detect Changed and Data Link Layer State Changed event bits can both get set at nearly the same time when a device is added or removed, this is more likely to happen than it might seem. The issue was found (and can be reproduced rather easily) by connecting and disconnecting an NVMe storage device on at least one system model where the NVMe devices were being connected to an AMD PCIe port (PCI device 0x1022/0x1483). Fix the issue by modifying pciehp_isr() to loop back and re-read the Slot Status register immediately after writing to it, until it sees that all of the event status bits have been cleared. 
[lukas: drop loop count limitation, write "events" instead of "status", don't loop back in INTx and poll modes, tweak code comment & commit msg] Link: https://lore.kernel.org/r/78b4ced5072bfe6e369d20e8b47c279b8c7af12e.1582121613.git.lukas@wunner.de Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: chuguangqing <chuguangqing@inspur.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3033	2024-04-10 03:10:56 +00:00
Coly Li	bece57dd9b	bcache: avoid NULL checking to c->root in run_cache_set() ANBZ: #8720 commit `3eba5e0b24` upstream. In run_cache_set() after c->root returned from bch_btree_node_get(), it is checked by IS_ERR_OR_NULL(). Indeed it is unncessary to check NULL because bch_btree_node_get() will not return NULL pointer to caller. This patch replaces IS_ERR_OR_NULL() by IS_ERR() for the above reason. Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20231120052503.6122-11-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3034	2024-04-10 02:34:36 +00:00
Coly Li	bbcf50d69f	bcache: add code comments for bch_btree_node_get() and __bch_btree_node_alloc() ANBZ: #8720 commit `31f5b956a1` upstream. This patch adds code comments to bch_btree_node_get() and __bch_btree_node_alloc() that NULL pointer will not be returned and it is unnecessary to check NULL pointer by the callers of these routines. Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20231120052503.6122-10-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3034	2024-04-10 02:34:36 +00:00
Coly Li	b8b4c2407f	bcache: avoid oversize memory allocation by small stripe_size ANBZ: #8720 commit `baf8fb7e0e` upstream. Arraies bcache->stripe_sectors_dirty and bcache->full_dirty_stripes are used for dirty data writeback, their sizes are decided by backing device capacity and stripe size. Larger backing device capacity or smaller stripe size make these two arraies occupies more dynamic memory space. Currently bcache->stripe_size is directly inherited from queue->limits.io_opt of underlying storage device. For normal hard drives, its limits.io_opt is 0, and bcache sets the corresponding stripe_size to 1TB (1<<31 sectors), it works fine 10+ years. But for devices do declare value for queue->limits.io_opt, small stripe_size (comparing to 1TB) becomes an issue for oversize memory allocations of bcache->stripe_sectors_dirty and bcache->full_dirty_stripes, while the capacity of hard drives gets much larger in recent decade. For example a raid5 array assembled by three 20TB hardrives, the raid device capacity is 40TB with typical 512KB limits.io_opt. After the math calculation in bcache code, these two arraies will occupy 400MB dynamic memory. Even worse Andrea Tomassetti reports that a 4KB limits.io_opt is declared on a new 2TB hard drive, then these two arraies request 2GB and 512MB dynamic memory from kzalloc(). The result is that bcache device always fails to initialize on his system. To avoid the oversize memory allocation, bcache->stripe_size should not directly inherited by queue->limits.io_opt from the underlying device. This patch defines BCH_MIN_STRIPE_SZ (4MB) as minimal bcache stripe size and set bcache device's stripe size against the declared limits.io_opt value from the underlying storage device, - If the declared limits.io_opt > BCH_MIN_STRIPE_SZ, bcache device will set its stripe size directly by this limits.io_opt value. 
- If the declared limits.io_opt < BCH_MIN_STRIPE_SZ, bcache device will set its stripe size by a value multiplying limits.io_opt and euqal or large than BCH_MIN_STRIPE_SZ. Then the minimal stripe size of a bcache device will always be >= 4MB. For a 40TB raid5 device with 512KB limits.io_opt, memory occupied by bcache->stripe_sectors_dirty and bcache->full_dirty_stripes will be 50MB in total. For a 2TB hard drive with 4KB limits.io_opt, memory occupied by these two arraies will be 2.5MB in total. Such mount of memory allocated for bcache->stripe_sectors_dirty and bcache->full_dirty_stripes is reasonable for most of storage devices. Reported-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Eric Wheeler <bcache@lists.ewheeler.net> Link: https://lore.kernel.org/r/20231120052503.6122-2-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3034	2024-04-10 02:34:36 +00:00
Markus Weippert	d871f7434d	bcache: revert replacing IS_ERR_OR_NULL with IS_ERR ANBZ: #8720 commit `bb6cc25386` upstream. Commit `028ddcac47` ("bcache: Remove unnecessary NULL point check in node allocations") replaced IS_ERR_OR_NULL by IS_ERR. This leads to a NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000080 Call Trace: ? __die_body.cold+0x1a/0x1f ? page_fault_oops+0xd2/0x2b0 ? exc_page_fault+0x70/0x170 ? asm_exc_page_fault+0x22/0x30 ? btree_node_free+0xf/0x160 [bcache] ? up_write+0x32/0x60 btree_gc_coalesce+0x2aa/0x890 [bcache] ? bch_extent_bad+0x70/0x170 [bcache] btree_gc_recurse+0x130/0x390 [bcache] ? btree_gc_mark_node+0x72/0x230 [bcache] bch_btree_gc+0x5da/0x600 [bcache] ? cpuusage_read+0x10/0x10 ? bch_btree_gc+0x600/0x600 [bcache] bch_gc_thread+0x135/0x180 [bcache] The relevant code starts with: new_nodes[0] = NULL; for (i = 0; i < nodes; i++) { if (__bch_keylist_realloc(&keylist, bkey_u64s(&r[i].b->key))) goto out_nocoalesce; // ... out_nocoalesce: // ... for (i = 0; i < nodes; i++) if (!IS_ERR(new_nodes[i])) { // IS_ERR_OR_NULL before `028ddcac47` btree_node_free(new_nodes[i]); // new_nodes[0] is NULL rw_unlock(true, new_nodes[i]); } This patch replaces IS_ERR() by IS_ERR_OR_NULL() to fix this. Fixes: `028ddcac47` ("bcache: Remove unnecessary NULL point check in node allocations") Link: https://lore.kernel.org/all/3DF4A87A-2AC1-4893-AE5F-E921478419A9@suse.de/ Cc: stable@vger.kernel.org Cc: Zheng Wang <zyytlz.wz@163.com> Cc: Coly Li <colyli@suse.de> Signed-off-by: Markus Weippert <markus@gekmihesg.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3034	2024-04-10 02:34:36 +00:00
Rand Deeb	4a7d815e44	bcache: prevent potential division by zero error ANBZ: #8720 commit `2c7f497ac2` upstream. In SHOW(), the variable 'n' is of type 'size_t.' While there is a conditional check to verify that 'n' is not equal to zero before executing the 'do_div' macro, concerns arise regarding potential division by zero error in 64-bit environments. The concern arises when 'n' is 64 bits in size, greater than zero, and the lower 32 bits of it are zeros. In such cases, the conditional check passes because 'n' is non-zero, but the 'do_div' macro casts 'n' to 'uint32_t,' effectively truncating it to its lower 32 bits. Consequently, the 'n' value becomes zero. To fix this potential division by zero error and ensure precise division handling, this commit replaces the 'do_div' macro with div64_u64(). div64_u64() is designed to work with 64-bit operands, guaranteeing that division is performed correctly. This change enhances the robustness of the code, ensuring that division operations yield accurate results in all scenarios, eliminating the possibility of division by zero, and improving compatibility across different 64-bit environments. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Rand Deeb <rand.sec96@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20231120052503.6122-5-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3034	2024-04-10 02:34:36 +00:00
Coly Li	6cdca0c05a	bcache: check return value from btree_node_alloc_replacement() ANBZ: #8720 commit `777967e7e9` upstream. In btree_gc_rewrite_node(), pointer 'n' is not checked after it returns from btree_node_alloc_replacement(). It is possible that 'n' is a non-NULL ERR_PTR(), and dereferencing such an error code is not permitted in the following code. Therefore the return value must be checked after 'n' is back from btree_node_alloc_replacement(). Signed-off-by: Coly Li <colyli@suse.de> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20231120052503.6122-3-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3034	2024-04-10 02:34:36 +00:00
Coly Li	82eb5f3436	bcache: replace a mistaken IS_ERR() by IS_ERR_OR_NULL() in btree_gc_coalesce() ANBZ: #8720 commit `f72f4312d4` upstream. Commit `028ddcac47` ("bcache: Remove unnecessary NULL point check in node allocations") made the following change inside btree_gc_coalesce(): 31 @@ -1340,7 +1340,7 @@ static int btree_gc_coalesce( 32 memset(new_nodes, 0, sizeof(new_nodes)); 33 closure_init_stack(&cl); 34 35 - while (nodes < GC_MERGE_NODES && !IS_ERR_OR_NULL(r[nodes].b)) 36 + while (nodes < GC_MERGE_NODES && !IS_ERR(r[nodes].b)) 37 keys += r[nodes++].keys; 38 39 blocks = btree_default_blocks(b->c) * 2 / 3; At line 35 the original r[nodes].b is not always allocated from __bch_btree_node_alloc(), and may be initialized as a NULL pointer by the caller of btree_gc_coalesce(). Therefore the change at line 36 is not correct. This patch replaces the mistaken IS_ERR() by IS_ERR_OR_NULL() to avoid a potential issue. Fixes: `028ddcac47` ("bcache: Remove unnecessary NULL point check in node allocations") Cc: <stable@vger.kernel.org> # 6.5+ Cc: Zheng Wang <zyytlz.wz@163.com> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20231120052503.6122-9-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3034	2024-04-10 02:34:36 +00:00
Zheng Wang	ba81754866	bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent ANBZ: #8720 commit `80fca8a10b` upstream. In some specific situations, the return value of __bch_btree_node_alloc may be NULL. This may lead to a NULL pointer dereference in a caller, for example along the call chain btree_split->bch_btree_node_alloc->__bch_btree_node_alloc. Fix it by initializing the return value in __bch_btree_node_alloc. Fixes: `cafe563591` ("bcache: A block layer cache") Cc: stable@vger.kernel.org Signed-off-by: Zheng Wang <zyytlz.wz@163.com> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20230615121223.22502-6-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3034	2024-04-10 02:34:36 +00:00
Coly Li	df5923bd45	bcache: remove 'int n' from parameter list of bch_bucket_alloc_set() ANBZ: #8720 commit `17e4aed830` upstream. The parameter 'int n' of bch_bucket_alloc_set() is not clearly defined. According to the code comments, n is the number of buckets to allocate, but in the code itself 'n' is the maximum number of caches to iterate over. Indeed, at all the locations where bch_bucket_alloc_set() is called, 'n' is always 1. This patch removes the confusing and unnecessary 'int n' from the parameter list of bch_bucket_alloc_set(), and explicitly allocates only 1 bucket for its caller. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: `80fca8a10b` ("bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3034	2024-04-10 02:34:36 +00:00
Shenghui Wang	d38337a3fb	bcache: use MAX_CACHES_PER_SET instead of magic number 8 in __bch_bucket_alloc_set ANBZ: #8720 commit `8792099f9a` upstream. The current cache_set has at most MAX_CACHES_PER_SET caches, and the macro is used for "struct cache *cache_by_alloc[MAX_CACHES_PER_SET];" in the definition of struct cache_set. Use MAX_CACHES_PER_SET instead of magic number 8 in __bch_bucket_alloc_set. Signed-off-by: Shenghui Wang <shhuiw@foxmail.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: `80fca8a10b` ("bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://gitee.com/anolis/cloud-kernel/pulls/3034	2024-04-10 02:34:36 +00:00
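
Several of the bcache commits listed above turn on the difference between IS_ERR() and IS_ERR_OR_NULL(). The following is a minimal user-space sketch of that difference, not the kernel's code: `err_ptr`, `is_err`, `is_err_or_null` and `count_freed` are hypothetical stand-ins written for illustration, loosely modelled on the helpers in `<linux/err.h>`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer, as ERR_PTR() does. */
static inline void *err_ptr(long error)
{
	return (void *)error;
}

/* True only for encoded error pointers; NULL is NOT an error here. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* True for encoded error pointers and for NULL. */
static inline bool is_err_or_null(const void *ptr)
{
	return ptr == NULL || is_err(ptr);
}

/*
 * Count how many slots a cleanup loop would try to free.
 * With null_safe == false the loop only skips error pointers, so a
 * NULL slot is counted (and would be dereferenced in real code).
 */
static int count_freed(void *const *nodes, size_t n, bool null_safe)
{
	int freed = 0;

	for (size_t i = 0; i < n; i++) {
		bool skip = null_safe ? is_err_or_null(nodes[i])
				      : is_err(nodes[i]);
		if (!skip)
			freed++;
	}
	return freed;
}
```

For an array holding a NULL slot, an error pointer and one valid pointer, the IS_ERR-style check lets two slots through (including the NULL one), while the IS_ERR_OR_NULL-style check lets through only the valid pointer.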

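The division-by-zero concern described in commit 4a7d815e44 can be illustrated with a short user-space sketch. It only models the truncation of a 64-bit divisor to 32 bits; the function names below are hypothetical and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the 32-bit divisor that a do_div()-style macro ends up using. */
static inline uint32_t truncated_divisor(uint64_t n)
{
	return (uint32_t)n;
}

/*
 * True when n passes an "n != 0" check yet truncates to zero,
 * i.e. n is non-zero but its lower 32 bits are all zero.
 */
static inline bool truncation_hides_zero(uint64_t n)
{
	return n != 0 && truncated_divisor(n) == 0;
}

/* Full 64-bit division, a user-space analogue of div64_u64(). */
static inline uint64_t div_u64_full(uint64_t dividend, uint64_t divisor)
{
	return dividend / divisor;
}
```

A divisor such as `1ULL << 32` is non-zero, so a plain zero check passes, but its truncated form is 0. Dividing with the full 64-bit divisor avoids the problem.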
801122 Commits All Branches Search