ANBZ: #8072
when vf driver make modules_install,
if echo vf of different ports successively,
there will be problems with the mailbox lock.
Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3121
ANBZ: #8072
when enable sriov, ethtool -l ethx to get port
pre-set max_combined always be 1, but in reality
max_combined depends on num_vfs.
Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3121
ANBZ: #8072
fix vf only supports a maximum of 2 queues,
get maximum of queues for pf msg[TXGBE_VF_RX_QUEUES]
and msg[TXGBE_VF_TX_QUEUES]. set maximum of queues to
4.
Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3121
ANBZ: #8072
in version 5042000f firmware, ethtool -t ethx will failed.
because diag_test will clear driver load bit and lan reset,
and then firmware will configuration pcs, it will cause loopback
test failed.
Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3121
ANBZ: #8072
When do ethtool -C, if not change any coalesce parameters supported
will return -EINVAL.
Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3121
ANBZ: #8072
after enabling sriov mode, ethtool -k to set ntuple on
will return requested on, and setting is not effective.
Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3121
ANBZ: #8072
pf ethtool operation for ntuple setting, is not
supported for vf, add support for ntuple rules to
flow packets to vf queue.
Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3121
ANBZ: #8072
append driver version with anolis,
it is helpful to distinguish inbox driver
and out of tree driver.
Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3121
ANBZ: #8072
add support for power management interface,
for suspending and resuming nic.
Signed-off-by: DuanqiangWen <duanqiangwen@net-swift.com>
Reviewed-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3121
ANBZ: #8072
add support for ethtool, use ethtool
can get some virtual function information.
Signed-off-by: DuanqiangWen <duanqiangwen@net-swift.com>
Reviewed-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3121
ANBZ: #8555
commit ded85b0c0e upstream.
Upon module load, a kthread is created targeting the
pvr2_context_thread_func function, which may call pvr2_context_destroy
and thus call kfree() on the context object. However, that might happen
before the usb hub_event handler is able to notify the driver. This
patch adds a sanity check before the invalid read reported by syzbot,
within the context disconnection call stack.
Reported-and-tested-by: syzbot+621409285c4156a009b3@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000a02a4205fff8eb92@google.com/
Fixes: e5be15c638 ("V4L/DVB (7711): pvrusb2: Fix race on module unload")
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Acked-by: Mike Isely <isely@pobox.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes: CVE-2023-52445
Signed-off-by: Xiao Long <xiaolong@openanolis.org>
Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3021
ANBZ: #8973
Currently parameter 'cgwb_v1' can be setup unconditionally.
Take the following abnormal case into consideration:
System administrator configures both 'cgwb_v1' and
'systemd.unified_cgroup_hierarchy=1' in command line by mistake, so we
use cgroup v2 after boot in fact. Though we'll check if current kernel
is under cgroup v2 in inode_cgwb_enabled(), we still allocate, insert
and delete links for memcg_blkcg_tree since we only check parameter
'cgwb_v1'.
This seems no actual harm, but it is entirely unnecessary and wasty. So
restrict these operations only under cgroup v1. Since bdi initialization
is before enabling cgroup subsys, so we'll still create debug file
bdi_wb_link but without any links in above abnormal case.
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Xu Yu <xuyu@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3147
ANBZ: #8733
[ Upstream commit b43d1f9f70 ]
There is softlockup when using TPACKET_V3:
...
NMI watchdog: BUG: soft lockup - CPU#2 stuck for 60010ms!
(__irq_svc) from [<c0558a0c>] (_raw_spin_unlock_irqrestore+0x44/0x54)
(_raw_spin_unlock_irqrestore) from [<c027b7e8>] (mod_timer+0x210/0x25c)
(mod_timer) from [<c0549c30>]
(prb_retire_rx_blk_timer_expired+0x68/0x11c)
(prb_retire_rx_blk_timer_expired) from [<c027a7ac>]
(call_timer_fn+0x90/0x17c)
(call_timer_fn) from [<c027ab6c>] (run_timer_softirq+0x2d4/0x2fc)
(run_timer_softirq) from [<c021eaf4>] (__do_softirq+0x218/0x318)
(__do_softirq) from [<c021eea0>] (irq_exit+0x88/0xac)
(irq_exit) from [<c0240130>] (msa_irq_exit+0x11c/0x1d4)
(msa_irq_exit) from [<c0209cf0>] (handle_IPI+0x650/0x7f4)
(handle_IPI) from [<c02015bc>] (gic_handle_irq+0x108/0x118)
(gic_handle_irq) from [<c0558ee4>] (__irq_usr+0x44/0x5c)
...
If __ethtool_get_link_ksettings() is failed in
prb_calc_retire_blk_tmo(), msec and tmo will be zero, so tov_in_jiffies
is zero and the timer expire for retire_blk_timer is turn to
mod_timer(&pkc->retire_blk_timer, jiffies + 0),
which will trigger cpu usage of softirq is 100%.
Fixes: f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer implementation.")
Tested-by: Xiao Jiangfeng <xiaojiangfeng@huawei.com>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tom Yang <yangqixiao@inspur.com>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3036
ANBZ: #8821
In function sync_min_vruntime(), expel_start and expel_spread will not
be cleared if CONFIG_SCHED_SMT is off, which results that the priority
of underclass will be much lower than other once ID_ABSOLUTE_EXPEL is
turned off, because min_vruntime << min_under_vruntime + expel_spread
after sync_min_vruntime().
To fix this problem, we clear expel_start and expel_spread in
sync_min_vruntime() regardless of whether CONFIG_SCHED_SMT is on.
Fixes: 139aefab8eaa("anolis: sched/fair: introduce sched_feat ID_ABSOLUTE_EXPEL")
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Reviewed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3104
ANBZ: #8910
This reverts commit 8927cac904.
Few regressions are found in benchmarks, so we decide to disable napi_tx
by default, which is in consistence with old versions before, to keep
the performance stable.
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3106
ANBZ: #8821
If CONFIG_SCHED_SMT is turned off, ID_ABSOLUTE_EXPELL will be invalid,
because update_expel_start() is a NULL function, resulting that there's no
chance to adjust vruntime to make underclass lag, and underclass tasks
will get a chance to run, unexpectedly.
To fix this problem, we just change the logic of id_vruntime_before(),
letting the vruntime of highclass and normal sched_entity always be
before underclass sched_entity.
Fixes: 139aefab8eaa("anolis: sched/fair: introduce sched_feat ID_ABSOLUTE_EXPEL")
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Reviewed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3085
ANBZ: #8860
commit 8c7829b04c upstream
With the default overcommit==guess we occasionally run into mmap
rejections despite plenty of memory that would get dropped under
pressure but just isn't accounted reclaimable. One example of this is
dying cgroups pinned by some page cache. A previous case was auxiliary
path name memory associated with dentries; we have since annotated
those allocations to avoid overcommit failures (see d79f7aa496 ("mm:
treat indirectly reclaimable memory as free in overcommit logic")).
But trying to classify all allocated memory reliably as reclaimable
and unreclaimable is a bit of a fool's errand. There could be a myriad
of dependencies that constantly change with kernel versions.
It becomes even more questionable of an effort when considering how
this estimate of available memory is used: it's not compared to the
system-wide allocated virtual memory in any way. It's not even
compared to the allocating process's address space. It's compared to
the single allocation request at hand!
So we have an elaborate left-hand side of the equation that tries to
assess the exact breathing room the system has available down to a
page - and then compare it to an isolated allocation request with no
additional context. We could fail an allocation of N bytes, but for
two allocations of N/2 bytes we'd do this elaborate dance twice in a
row and then still let N bytes of virtual memory through. This doesn't
make a whole lot of sense.
Let's take a step back and look at the actual goal of the
heuristic. From the documentation:
Heuristic overcommit handling. Obvious overcommits of address
space are refused. Used for a typical system. It ensures a
seriously wild allocation fails while allowing overcommit to
reduce swap usage. root is allowed to allocate slightly more
memory in this mode. This is the default.
If all we want to do is catch clearly bogus allocation requests
irrespective of the general virtual memory situation, the physical
memory counter-part doesn't need to be that complicated, either.
When in GUESS mode, catch wild allocations by comparing their request
size to total amount of ram and swap in the system.
Link: http://lkml.kernel.org/r/20190412191418.26333-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kaihao Bai <carlo.bai@linux.alibaba.com>
Reviewed-by: Xu Yu <xuyu@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3088
ANBZ: #8749
If __skb_datagram_iter's cb parm is simple_copy_to_iter, we dont need
to map page first, just use copy_page_to_iter to copy page directly.
And also remove simple_copy_to_iter().
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3050
ANBZ: #8774
when backporting upstream commit
1fbf1252df0e42("mlx5: use RCU lock in mlx5_eq_cq_get()"),
we miss used the rcu_read_lock() twice, without unlock.
Fixes: 1335c7384274("mlx5: use RCU lock in mlx5_eq_cq_get()")
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Qinyun Tan <qinyuntan@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3061
ANBZ: #8758
ID_LOAD_BALANCE tends to migrate highclass and normal tasks first to
prevent cpu competition among them, which will result that underclass
tasks lose the migration opportunity with a high probability, even
when they are expelled.
To optimize ID_LOAD_BALANCE, we will redo load balance if there is
still imbalance, and in the second loop we will allow migrating
underclass tasks.
Fixes: 9fa7c9d6eb14("anolis: sched/fair: introduce sched_feat ID_LOAD_BALANCE")
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Reviewed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3054
ANBZ: #8731
commit 92912b1751 upstream.
Writes to a Downstream Port's Slot Control register are PCIe hotplug
"commands." If the Port supports Command Completed events, software must
wait for a command to complete before writing to Slot Control again.
pcie_do_write_cmd() sets ctrl->cmd_busy when it writes to Slot Control. If
software notification is enabled, i.e., PCI_EXP_SLTCTL_HPIE and
PCI_EXP_SLTCTL_CCIE are set, ctrl->cmd_busy is cleared by pciehp_isr().
But when software notification is disabled, as it is when pcie_init()
powers off an empty slot, pcie_wait_cmd() uses pcie_poll_cmd() to poll for
command completion, and it neglects to clear ctrl->cmd_busy, which leads to
spurious timeouts:
pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x01c0 (issued 2264 msec ago)
pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x05c0 (issued 2288 msec ago)
Clear ctrl->cmd_busy in pcie_poll_cmd() when it detects a Command Completed
event (PCI_EXP_SLTSTA_CC).
[bhelgaas: commit log]
Fixes: a5dd4b4b05 ("PCI: pciehp: Wait for hotplug command completion where necessary")
Link: https://lore.kernel.org/r/20211111054258.7309-1-zhangliguang@linux.alibaba.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215143
Link: https://lore.kernel.org/r/20211126173309.GA12255@wunner.de
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: chuguangqing <chuguangqing@inspur.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3033
ANBZ: #8731
commit 23584c1ed3 upstream.
The Power Fault Detected bit in the Slot Status register differs from
all other hotplug events in that it is sticky: It can only be cleared
after turning off slot power. Per PCIe r5.0, sec. 6.7.1.8:
If a power controller detects a main power fault on the hot-plug slot,
it must automatically set its internal main power fault latch [...].
The main power fault latch is cleared when software turns off power to
the hot-plug slot.
The stickiness used to cause interrupt storms and infinite loops which
were fixed in 2009 by commits 5651c48cfa ("PCI pciehp: fix power fault
interrupt storm problem") and 99f0169c17 ("PCI: pciehp: enable
software notification on empty slots").
Unfortunately in 2020 the infinite loop issue was inadvertently
reintroduced by commit 8edf5332c3 ("PCI: pciehp: Fix MSI interrupt
race"): The hardirq handler pciehp_isr() clears the PFD bit until
pciehp's power_fault_detected flag is set. That happens in the IRQ
thread pciehp_ist(), which never learns of the event because the hardirq
handler is stuck in an infinite loop. Fix by setting the
power_fault_detected flag already in the hardirq handler.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214989
Link: https://lore.kernel.org/linux-pci/DM8PR11MB5702255A6A92F735D90A4446868B9@DM8PR11MB5702.namprd11.prod.outlook.com
Fixes: 8edf5332c3 ("PCI: pciehp: Fix MSI interrupt race")
Link: https://lore.kernel.org/r/66eaeef31d4997ceea357ad93259f290ededecfd.1637187226.git.lukas@wunner.de
Reported-by: Joseph Bao <joseph.bao@intel.com>
Tested-by: Joseph Bao <joseph.bao@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v4.19+
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: chuguangqing <chuguangqing@inspur.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3033
ANBZ: #8731
[ Upstream commit 8edf5332c3 ]
Without this commit, a PCIe hotplug port can stop generating interrupts on
hotplug events, so device adds and removals will not be seen:
The pciehp interrupt handler pciehp_isr() reads the Slot Status register
and then writes back to it to clear the bits that caused the interrupt. If
a different interrupt event bit gets set between the read and the write,
pciehp_isr() returns without having cleared all of the interrupt event
bits. If this happens when the MSI isn't masked (which by default it isn't
in handle_edge_irq(), and which it will never be when MSI per-vector
masking is not supported), we won't get any more hotplug interrupts from
that device.
That is expected behavior, according to the PCIe Base Spec r5.0, section
6.7.3.4, "Software Notification of Hot-Plug Events".
Because the Presence Detect Changed and Data Link Layer State Changed event
bits can both get set at nearly the same time when a device is added or
removed, this is more likely to happen than it might seem. The issue was
found (and can be reproduced rather easily) by connecting and disconnecting
an NVMe storage device on at least one system model where the NVMe devices
were being connected to an AMD PCIe port (PCI device 0x1022/0x1483).
Fix the issue by modifying pciehp_isr() to loop back and re-read the Slot
Status register immediately after writing to it, until it sees that all of
the event status bits have been cleared.
[lukas: drop loop count limitation, write "events" instead of "status",
don't loop back in INTx and poll modes, tweak code comment & commit msg]
Link: https://lore.kernel.org/r/78b4ced5072bfe6e369d20e8b47c279b8c7af12e.1582121613.git.lukas@wunner.de
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: chuguangqing <chuguangqing@inspur.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3033
ANBZ: #8720
commit 3eba5e0b24 upstream.
In run_cache_set() after c->root returned from bch_btree_node_get(), it
is checked by IS_ERR_OR_NULL(). Indeed it is unncessary to check NULL
because bch_btree_node_get() will not return NULL pointer to caller.
This patch replaces IS_ERR_OR_NULL() by IS_ERR() for the above reason.
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20231120052503.6122-11-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3034
ANBZ: #8720
commit 31f5b956a1 upstream.
This patch adds code comments to bch_btree_node_get() and
__bch_btree_node_alloc() that NULL pointer will not be returned and it
is unnecessary to check NULL pointer by the callers of these routines.
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20231120052503.6122-10-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3034
ANBZ: #8720
commit baf8fb7e0e upstream.
Arraies bcache->stripe_sectors_dirty and bcache->full_dirty_stripes are
used for dirty data writeback, their sizes are decided by backing device
capacity and stripe size. Larger backing device capacity or smaller
stripe size make these two arraies occupies more dynamic memory space.
Currently bcache->stripe_size is directly inherited from
queue->limits.io_opt of underlying storage device. For normal hard
drives, its limits.io_opt is 0, and bcache sets the corresponding
stripe_size to 1TB (1<<31 sectors), it works fine 10+ years. But for
devices do declare value for queue->limits.io_opt, small stripe_size
(comparing to 1TB) becomes an issue for oversize memory allocations of
bcache->stripe_sectors_dirty and bcache->full_dirty_stripes, while the
capacity of hard drives gets much larger in recent decade.
For example a raid5 array assembled by three 20TB hardrives, the raid
device capacity is 40TB with typical 512KB limits.io_opt. After the math
calculation in bcache code, these two arraies will occupy 400MB dynamic
memory. Even worse Andrea Tomassetti reports that a 4KB limits.io_opt is
declared on a new 2TB hard drive, then these two arraies request 2GB and
512MB dynamic memory from kzalloc(). The result is that bcache device
always fails to initialize on his system.
To avoid the oversize memory allocation, bcache->stripe_size should not
directly inherited by queue->limits.io_opt from the underlying device.
This patch defines BCH_MIN_STRIPE_SZ (4MB) as minimal bcache stripe size
and set bcache device's stripe size against the declared limits.io_opt
value from the underlying storage device,
- If the declared limits.io_opt > BCH_MIN_STRIPE_SZ, bcache device will
set its stripe size directly by this limits.io_opt value.
- If the declared limits.io_opt < BCH_MIN_STRIPE_SZ, bcache device will
set its stripe size by a value multiplying limits.io_opt and euqal or
large than BCH_MIN_STRIPE_SZ.
Then the minimal stripe size of a bcache device will always be >= 4MB.
For a 40TB raid5 device with 512KB limits.io_opt, memory occupied by
bcache->stripe_sectors_dirty and bcache->full_dirty_stripes will be 50MB
in total. For a 2TB hard drive with 4KB limits.io_opt, memory occupied
by these two arraies will be 2.5MB in total.
Such mount of memory allocated for bcache->stripe_sectors_dirty and
bcache->full_dirty_stripes is reasonable for most of storage devices.
Reported-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Eric Wheeler <bcache@lists.ewheeler.net>
Link: https://lore.kernel.org/r/20231120052503.6122-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3034
ANBZ: #8720
commit 2c7f497ac2 upstream.
In SHOW(), the variable 'n' is of type 'size_t.' While there is a
conditional check to verify that 'n' is not equal to zero before
executing the 'do_div' macro, concerns arise regarding potential
division by zero error in 64-bit environments.
The concern arises when 'n' is 64 bits in size, greater than zero, and
the lower 32 bits of it are zeros. In such cases, the conditional check
passes because 'n' is non-zero, but the 'do_div' macro casts 'n' to
'uint32_t,' effectively truncating it to its lower 32 bits.
Consequently, the 'n' value becomes zero.
To fix this potential division by zero error and ensure precise
division handling, this commit replaces the 'do_div' macro with
div64_u64(). div64_u64() is designed to work with 64-bit operands,
guaranteeing that division is performed correctly.
This change enhances the robustness of the code, ensuring that division
operations yield accurate results in all scenarios, eliminating the
possibility of division by zero, and improving compatibility across
different 64-bit environments.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Rand Deeb <rand.sec96@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20231120052503.6122-5-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3034
ANBZ: #8720
commit 777967e7e9 upstream.
In btree_gc_rewrite_node(), pointer 'n' is not checked after it returns
from btree_gc_rewrite_node(). There is potential possibility that 'n' is
a non NULL ERR_PTR(), referencing such error code is not permitted in
following code. Therefore a return value checking is necessary after 'n'
is back from btree_node_alloc_replacement().
Signed-off-by: Coly Li <colyli@suse.de>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20231120052503.6122-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3034
ANBZ: #8720
commit f72f4312d4 upstream.
Commit 028ddcac47 ("bcache: Remove unnecessary NULL point check in
node allocations") do the following change inside btree_gc_coalesce(),
31 @@ -1340,7 +1340,7 @@ static int btree_gc_coalesce(
32 memset(new_nodes, 0, sizeof(new_nodes));
33 closure_init_stack(&cl);
34
35 - while (nodes < GC_MERGE_NODES && !IS_ERR_OR_NULL(r[nodes].b))
36 + while (nodes < GC_MERGE_NODES && !IS_ERR(r[nodes].b))
37 keys += r[nodes++].keys;
38
39 blocks = btree_default_blocks(b->c) * 2 / 3;
At line 35 the original r[nodes].b is not always allocatored from
__bch_btree_node_alloc(), and possibly initialized as NULL pointer by
caller of btree_gc_coalesce(). Therefore the change at line 36 is not
correct.
This patch replaces the mistaken IS_ERR() by IS_ERR_OR_NULL() to avoid
potential issue.
Fixes: 028ddcac47 ("bcache: Remove unnecessary NULL point check in node allocations")
Cc: <stable@vger.kernel.org> # 6.5+
Cc: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20231120052503.6122-9-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3034
ANBZ: #8720
commit 80fca8a10b upstream.
In some specific situations, the return value of __bch_btree_node_alloc
may be NULL. This may lead to a potential NULL pointer dereference in
caller function like a calling chain :
btree_split->bch_btree_node_alloc->__bch_btree_node_alloc.
Fix it by initializing the return value in __bch_btree_node_alloc.
Fixes: cafe563591 ("bcache: A block layer cache")
Cc: stable@vger.kernel.org
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20230615121223.22502-6-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3034
ANBZ: #8720
commit 17e4aed830 upstream.
The parameter 'int n' from bch_bucket_alloc_set() is not cleared
defined. From the code comments n is the number of buckets to alloc, but
from the code itself 'n' is the maximum cache to iterate. Indeed all the
locations where bch_bucket_alloc_set() is called, 'n' is alwasy 1.
This patch removes the confused and unnecessary 'int n' from parameter
list of bch_bucket_alloc_set(), and explicitly allocates only 1 bucket
for its caller.
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 80fca8a10b ("bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3034
ANBZ: #8720
commit 8792099f9a upstream.
Current cache_set has MAX_CACHES_PER_SET caches most, and the macro
is used for
"
struct cache *cache_by_alloc[MAX_CACHES_PER_SET];
"
in the define of struct cache_set.
Use MAX_CACHES_PER_SET instead of magic number 8 in
__bch_bucket_alloc_set.
Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 80fca8a10b ("bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/3034