In many places we can use damon_sz_region() to instead of "r->ar.end -
r->ar.start".
Link: https://lkml.kernel.org/r/20220927001946.85375-2-xhao@linux.alibaba.com
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Suggested-by: SeongJae Park <sj@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rename sz_damon_region() to damon_sz_region(), and move it to
"include/linux/damon.h", because in many places, we can to use this func.
Link: https://lkml.kernel.org/r/20220927001946.85375-1-xhao@linux.alibaba.com
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Suggested-by: SeongJae Park <sj@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
alloc_pages(), kmalloc() and vmalloc() are all memory allocation functions
which can return NULL when some internal memory failures happen. So it is
better to check the return of them to catch the failure in time for better
test them.
Link: https://lkml.kernel.org/r/tencent_D44A49FFB420EDCCBFB9221C8D14DFE12908@qq.com
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This is an optimization to reduce stackdepot pressure.
struct mmu_gather contains 7 1-bit fields packed into a 32-bit unsigned
int value. The remaining 25 bits remain uninitialized and are never used,
but KMSAN updates the origin for them in zap_pXX_range() in mm/memory.c,
thus creating very long origin chains. This is technically correct, but
consumes too much memory.
Unpoisoning the whole structure will prevent creating such chains.
Link: https://lkml.kernel.org/r/20220905122452.2258262-20-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The recent change of page_cache_ra_unbounded() arguments was buggy in the
two callers, causing us to readahead the wrong pages. Move the definition
of ractl down to after the index is set correctly. This affected
performance on configurations that use fs-verity.
Link: https://lkml.kernel.org/r/20221012193419.1453558-1-willy@infradead.org
Fixes: 73bb49da50 ("mm/readahead: make page_cache_ra_unbounded take a readahead_control")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Jintao Yin <nicememory@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit c462ac288f ("mm: Introduce arch_validate_flags()") added a late
check in mmap_region() to let architectures validate vm_flags. The check
needs to happen after calling ->mmap() as the flags can potentially be
modified during this callback.
If arch_validate_flags() check fails we unmap and free the vma. However,
the error path fails to undo the ->mmap() call that previously succeeded
and depending on the specific ->mmap() implementation this translates to
reference increments, memory allocations and other operations what will
not be cleaned up.
There are several places (mainly device drivers) where this is an issue.
However, one specific example is bpf_map_mmap() which keeps count of the
mappings in map->writecnt. The count is incremented on ->mmap() and then
decremented on vm_ops->close(). When arch_validate_flags() fails this
count is off since bpf_map_mmap_close() is never called.
One can reproduce this issue in arm64 devices with MTE support. Here the
vm_flags are checked to only allow VM_MTE if VM_MTE_ALLOWED has been set
previously. From userspace then is enough to pass the PROT_MTE flag to
mmap() syscall to trigger the arch_validate_flags() failure.
The following program reproduces this issue:
#include <stdio.h>
#include <unistd.h>
#include <linux/unistd.h>
#include <linux/bpf.h>
#include <sys/mman.h>
int main(void)
{
union bpf_attr attr = {
.map_type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(long long),
.max_entries = 256,
.map_flags = BPF_F_MMAPABLE,
};
int fd;
fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
mmap(NULL, 4096, PROT_WRITE | PROT_MTE, MAP_SHARED, fd, 0);
return 0;
}
By manually adding some log statements to the vm_ops callbacks we can
confirm that when passing PROT_MTE to mmap() the map->writecnt is off upon
->release():
With PROT_MTE flag:
root@debian:~# ./bpf-test
[ 111.263874] bpf_map_write_active_inc: map=9 writecnt=1
[ 111.288763] bpf_map_release: map=9 writecnt=1
Without PROT_MTE flag:
root@debian:~# ./bpf-test
[ 157.816912] bpf_map_write_active_inc: map=10 writecnt=1
[ 157.830442] bpf_map_write_active_dec: map=10 writecnt=0
[ 157.832396] bpf_map_release: map=10 writecnt=0
This patch fixes the above issue by calling vm_ops->close() when the
arch_validate_flags() check fails, after this we can proceed to unmap and
free the vma on the error path.
Link: https://lkml.kernel.org/r/20220930003844.1210987-1-cmllamas@google.com
Fixes: c462ac288f ("mm: Introduce arch_validate_flags()")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Liam Howlett <liam.howlett@oracle.com>
Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
FIELD_GET() must only be used with a mask that is a compile-time
constant. Mark the functions as __always_inline to avoid the problem.
Fixes: 21b688dabe ("net: phy: micrel: Cable Diag feature for lan8814 phy")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Divya Koppera <Divya.Koppera@microchip.com>
Link: https://lore.kernel.org/r/20221011095437.12580-1-Divya.Koppera@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A WARN_ON call trace would be triggered when 'ct(commit, alg=helper)'
applies on a confirmed connection:
WARNING: CPU: 0 PID: 1251 at net/netfilter/nf_conntrack_extend.c:98
RIP: 0010:nf_ct_ext_add+0x12d/0x150 [nf_conntrack]
Call Trace:
<TASK>
nf_ct_helper_ext_add+0x12/0x60 [nf_conntrack]
__nf_ct_try_assign_helper+0xc4/0x160 [nf_conntrack]
__ovs_ct_lookup+0x72e/0x780 [openvswitch]
ovs_ct_execute+0x1d8/0x920 [openvswitch]
do_execute_actions+0x4e6/0xb60 [openvswitch]
ovs_execute_actions+0x60/0x140 [openvswitch]
ovs_packet_cmd_execute+0x2ad/0x310 [openvswitch]
genl_family_rcv_msg_doit.isra.15+0x113/0x150
genl_rcv_msg+0xef/0x1f0
which can be reproduced with these OVS flows:
table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk
actions=ct(commit, table=1)
table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new
actions=ct(commit, alg=ftp),normal
The issue was introduced by commit 248d45f1e1 ("openvswitch: Allow
attaching helper in later commit") where it somehow removed the check
of nf_ct_is_confirmed before asigning the helper. This patch is to fix
it by bringing it back.
Fixes: 248d45f1e1 ("openvswitch: Allow attaching helper in later commit")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/c5c9092a22a2194650222bffaf786902613deb16.1665085502.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
setsockopt(IPV6_ADDRFORM) and tcp_v6_connect() change icsk->icsk_af_ops
under lock_sock(), but tcp_(get|set)sockopt() read it locklessly. To
avoid load/store tearing, we need to add READ_ONCE() and WRITE_ONCE()
for the reads and writes.
Thanks to Eric Dumazet for providing the syzbot report:
BUG: KCSAN: data-race in tcp_setsockopt / tcp_v6_connect
write to 0xffff88813c624518 of 8 bytes by task 23936 on cpu 0:
tcp_v6_connect+0x5b3/0xce0 net/ipv6/tcp_ipv6.c:240
__inet_stream_connect+0x159/0x6d0 net/ipv4/af_inet.c:660
inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:724
__sys_connect_file net/socket.c:1976 [inline]
__sys_connect+0x197/0x1b0 net/socket.c:1993
__do_sys_connect net/socket.c:2003 [inline]
__se_sys_connect net/socket.c:2000 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:2000
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88813c624518 of 8 bytes by task 23937 on cpu 1:
tcp_setsockopt+0x147/0x1c80 net/ipv4/tcp.c:3789
sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3585
__sys_setsockopt+0x212/0x2b0 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0xffffffff8539af68 -> 0xffffffff8539aff8
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23937 Comm: syz-executor.5 Not tainted
6.0.0-rc4-syzkaller-00331-g4ed9c1e971b1-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 08/26/2022
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 086d49058c ("ipv6: annotate some data-races around sk->sk_prot")
fixed some data-races around sk->sk_prot but it was not enough.
Some functions in inet6_(stream|dgram)_ops still access sk->sk_prot
without lock_sock() or rtnl_lock(), so they need READ_ONCE() to avoid
load tearing.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were
able to clean them up by calling inet6_destroy_sock() during the IPv6 ->
IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2adc ("udpv6:
Add lockless sendmsg() support") added a lockless memory allocation path,
which could cause a memory leak:
setsockopt(IPV6_ADDRFORM) sendmsg()
+-----------------------+ +-------+
- do_ipv6_setsockopt(sk, ...) - udpv6_sendmsg(sk, ...)
- sockopt_lock_sock(sk) ^._ called via udpv6_prot
- lock_sock(sk) before WRITE_ONCE()
- WRITE_ONCE(sk->sk_prot, &tcp_prot)
- inet6_destroy_sock() - if (!corkreq)
- sockopt_release_sock(sk) - ip6_make_skb(sk, ...)
- release_sock(sk) ^._ lockless fast path for
the non-corking case
- __ip6_append_data(sk, ...)
- ipv6_local_rxpmtu(sk, ...)
- xchg(&np->rxpmtu, skb)
^._ rxpmtu is never freed.
- goto out_no_dst;
- lock_sock(sk)
For now, rxpmtu is only the case, but not to miss the future change
and a similar bug fixed in commit e27326009a ("net: ping6: Fix
memleak in ipv6_renew_options()."), let's set a new function to IPv6
sk->sk_destruct() and call inet6_cleanup_sock() there. Since the
conversion does not change sk->sk_destruct(), we can guarantee that
we can clean up IPv6 resources finally.
We can now remove all inet6_destroy_sock() calls from IPv6 protocol
specific ->destroy() functions, but such changes are invasive to
backport. So they can be posted as a follow-up later for net-next.
Fixes: 03485f2adc ("udpv6: Add lockless sendmsg() support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 4b340ae20d ("IPv6: Complete IPV6_DONTFRAG support") forgot
to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6
socket into IPv4 with IPV6_ADDRFORM. After conversion, sk_prot is
changed to udp_prot and ->destroy() never cleans it up, resulting in
a memory leak.
This is due to the discrepancy between inet6_destroy_sock() and
IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM
to remove the difference.
However, this is not enough for now because rxpmtu can be changed
without lock_sock() after commit 03485f2adc ("udpv6: Add lockless
sendmsg() support"). We will fix this case in the following patch.
Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and
remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy()
in the future.
Fixes: 4b340ae20d ("IPv6: Complete IPV6_DONTFRAG support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Conor Dooley <mail@conchuod.ie> says:
From: Conor Dooley <conor.dooley@microchip.com>
The device trees produced automatically for the virt and spike machines
fail dt-validate on several grounds. Some of these need to be fixed in
the linux kernel's dt-bindings, but others are caused by bugs in QEMU.
Patches been sent that fix the QEMU issues [0], but a couple of them
need to be fixed in the kernel's dt-bindings. The first patches add
compatibles for "riscv,{clint,plic}0" which are present in drivers and
the auto generated QEMU dtbs.
Thanks to Rob Herring for reporting these issues [1],
Conor.
To reproduce the errors:
./build/qemu-system-riscv64 -nographic -machine virt,dumpdtb=qemu.dtb
dt-validate -p /path/to/linux/kernel/Documentation/devicetree/bindings/processed-schema.json qemu.dtb
(The processed schema needs to be generated first)
0 - https://lore.kernel.org/linux-riscv/20220810184612.157317-1-mail@conchuod.ie/
1 - https://lore.kernel.org/linux-riscv/20220803170552.GA2250266-robh@kernel.org/
* fix-dt-validate:
dt-bindings: riscv: add new riscv,isa strings for emulators
dt-bindings: interrupt-controller: sifive,plic: add legacy riscv compatible
dt-bindings: timer: sifive,clint: add legacy riscv compatible
Link: https://lore.kernel.org/r/20220823183319.3314940-1-mail@conchuod.ie
[Palmer: some cover letter pruning, and dropped #4 as suggested.]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The QEMU virt and spike machines currently export a riscv,isa string of
"rv64imafdcsuh",
While the RISC-V foundation has been ratifying a bunch of extenstions
etc, the kernel has remained relatively static with what hardware is
supported - but the same is not true of QEMU. Using the virt machine
and running dt-validate on the dumped dtb fails, partly due to the
unexpected isa string.
Rather than enumerate the many many possbilities, change the pattern
to a regex, with the following assumptions:
- ima are required
- the single letter order is fixed & we don't care about things that
can't even do "ima"
- the standard multi letter extensions are all in a "_z<foo>" format
where the first letter of <foo> is a valid single letter extension
- _s & _h are used for supervisor and hyper visor extensions
- convention says that after the first two chars, a standard multi
letter extension name could be an english word (ifencei anyone?) so
it's not worth restricting the charset
- as the above is just convention, don't apply any charset restrictions
to reduce future churn
- vendor ISA extensions begind with _x and have no charset restrictions
- we don't care about an e extension from an OS pov
- that attempting to validate the contents of the multiletter extensions
with dt-validate beyond the formatting is a futile, massively verbose
or unwieldy exercise at best
The following limitations also apply:
- multi letter extension ordering is not enforced. dt-schema does not
appear to allow for named match groups, so the resulting regex would
be even more of a headache
- ditto for the numbered extensions
Finally, add me as a maintainer of the binding so that when it breaks
in the future, I can be held responsible!
Reported-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/linux-riscv/20220803170552.GA2250266-robh@kernel.org/
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220823183319.3314940-4-mail@conchuod.ie
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
While "real" hardware might not use the compatible string "riscv,plic0"
it is present in the driver & QEMU uses it for automatically generated
virt machine dtbs. To avoid dt-validate problems with QEMU produced
dtbs, such as the following, add it to the binding.
riscv-virt.dtb: plic@c000000: compatible: 'oneOf' conditional failed, one must be fixed:
'sifive,plic-1.0.0' is not one of ['sifive,fu540-c000-plic', 'starfive,jh7100-plic', 'canaan,k210-plic']
'sifive,plic-1.0.0' is not one of ['allwinner,sun20i-d1-plic']
'sifive,plic-1.0.0' was expected
'thead,c900-plic' was expected
riscv-virt.dtb: plic@c000000: '#address-cells' is a required property
Reported-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/linux-riscv/20220803170552.GA2250266-robh@kernel.org/
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20220823183319.3314940-3-mail@conchuod.ie
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
While "real" hardware might not use the compatible string "riscv,clint0"
it is present in the driver & QEMU uses it for automatically generated
virt machine dtbs. To avoid dt-validate problems with QEMU produced
dtbs, such as the following, add it to the binding.
riscv-virt.dtb: clint@2000000: compatible:0: 'sifive,clint0' is not one of ['sifive,fu540-c000-clint', 'starfive,jh7100-clint', 'canaan,k210-clint']
Reported-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/linux-riscv/20220803170552.GA2250266-robh@kernel.org/
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20220823183319.3314940-2-mail@conchuod.ie
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When PTE_MARKER_UFFD_WP not configured, it's still possible to reach pte
marker code and trigger an warning. Add a few CONFIG_PTE_MARKER_UFFD_WP
ifdefs to make sure the code won't be reached when not compiled in.
Link: https://lkml.kernel.org/r/YzeR+R6b4bwBlBHh@x1n
Fixes: b1f9e87686 ("mm/uffd: enable write protection for shmem & hugetlbfs")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: <syzbot+2b9b4f0895be09a6dec3@syzkaller.appspotmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Edward Liaw <edliaw@google.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If the brk VMA is the last vma in a maple node and meets the rare criteria
that it can be expanded, then preallocation is necessary to avoid a
potential fs_reclaim circular lock issue on low resources.
At the same time use the actual vma start address (unaligned) when calling
vma_adjust_trans_huge().
Link: https://lkml.kernel.org/r/20221011160624.1253454-1-Liam.Howlett@oracle.com
Fixes: 2e7ce7d354 (mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap())
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The anon vma was not unlinked and the file was not closed in the failure
path when the machine runs out of memory during the maple tree
modification. This caused a memory leak of the anon vma chain and vma
since neither would be freed.
Link: https://lkml.kernel.org/r/20221011203621.1446507-1-Liam.Howlett@oracle.com
Fixes: 524e00b36e ("mm: remove rb tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When we successfully find a pageblock in fast_find_migrateblock(), the
block will be set skip-flag through set_pageblock_skip(). However, when
entering isolate_migratepages_block(), the whole pageblock will be skipped
due to the branch 'if (!valid_page && IS_ALIGNED(low_pfn,
pageblock_nr_pages))'. Eventually we will goto isolate_abort and isolate
nothing. That makes fast_find_migrateblock useless.
In this patch, when we find a suitable pageblock in
fast_find_migrateblock, we do noting but let isolate_migratepages_block to
set skip flag to the pageblock after scan it. Normally, we would isolate
some pages from the fast-find block.
I use mmtest/thpscale-madvhugepage test it. Here is the result:
baseline patch
Amean fault-both-1 1331.66 ( 0.00%) 1261.04 * 5.30%*
Amean fault-both-3 1383.95 ( 0.00%) 1191.69 * 13.89%*
Amean fault-both-5 1568.13 ( 0.00%) 1445.20 * 7.84%*
Amean fault-both-7 1819.62 ( 0.00%) 1555.13 * 14.54%*
Amean fault-both-12 1106.96 ( 0.00%) 1149.43 * -3.84%*
Amean fault-both-18 2196.93 ( 0.00%) 1875.77 * 14.62%*
Amean fault-both-24 2642.69 ( 0.00%) 2671.21 * -1.08%*
Amean fault-both-30 2901.89 ( 0.00%) 2857.32 * 1.54%*
Amean fault-both-32 3747.00 ( 0.00%) 3479.23 * 7.15%*
Link: https://lkml.kernel.org/r/20220713062009.597255-1-zhouchuyi@bytedance.com
Fixes: 70b44595ea ("mm, compaction: use free lists to quickly locate a migration source")
Signed-off-by: zhouchuyi <zhouchuyi@bytedance.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The devm_clk_get_enabled() helper:
- calls devm_clk_get()
- calls clk_prepare_enable() and registers what is needed in order to
call clk_disable_unprepare() when needed, as a managed resource.
This simplifies the code, the error handling paths and avoid the need of
a dedicated function used with devm_add_action_or_reset().
Based on my test with allyesconfig, this reduces the .o size from:
text data bss dec hex filename
12843 4804 64 17711 452f drivers/rtc/rtc-ti-k3.o
down to:
12523 4804 64 17391 43ef drivers/rtc/rtc-ti-k3.o
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/601288834ab71c0fddde7eedd8cdb8001254ed7e.1661329498.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
The devm_clk_get_enabled() helper:
- calls devm_clk_get()
- calls clk_prepare_enable() and registers what is needed in order to
call clk_disable_unprepare() when needed, as a managed resource.
This simplifies the code, the error handling paths and avoid the need of
a dedicated function used with devm_add_action_or_reset().
As a side effect, some error messages are not logged anymore, so also use
dev_err_probe() instead of dev_err() in case of error.
At least the error code will be logged (and -EPROBE_DEFER will be filtered)
Based on my test with allyesconfig, this reduces the .o size from:
text data bss dec hex filename
9025 2488 128 11641 2d79 drivers/rtc/rtc-jz4740.o
down to:
8267 2080 128 10475 28eb drivers/rtc/rtc-jz4740.o
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Paul Cercueil <paul@crapouillou.net>
Link: https://lore.kernel.org/r/af10570000d7e103d70bbea590ce8df4f8902b67.1661330532.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
The devm_clk_get_enabled() helper:
- calls devm_clk_get()
- calls clk_prepare_enable() and registers what is needed in order to
call clk_disable_unprepare() when needed, as a managed resource.
This simplifies the code, the error handling paths and avoid the need of
a dedicated function used with devm_add_action_or_reset().
That said, mpfs_rtc_init_clk() is the same as devm_clk_get_enabled(), so
use this function directly instead.
This also fixes an (unlikely) unchecked devm_add_action_or_reset() error.
Based on my test with allyesconfig, this reduces the .o size from:
text data bss dec hex filename
5330 2208 0 7538 1d72 drivers/rtc/rtc-mpfs.o
down to:
5074 2208 0 7282 1c72 drivers/rtc/rtc-mpfs.o
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/e55c959f2821a2c367a4c5de529a638b1cc6b8cd.1661329086.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
A previous commit moved the notifications and end-write handling, but
it is now missing a few spots where we also want to call both of those.
Without that, we can potentially be missing file notifications, and
more importantly, have an imbalance in the super_block writers sem
accounting.
Fixes: b000145e99 ("io_uring/rw: defer fsnotify calls to task context")
Reported-by: Dave Chinner <david@fromorbit.com>
Link: https://lore.kernel.org/all/20221010050319.GC2703033@dread.disaster.area/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This fixes the shadowing of the outer variable rw in the function
io_write(). No issue is caused by this, but let's silence the shadowing
warning anyway.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Link: https://lore.kernel.org/r/20221010234330.244244-1-shr@devkernel.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The msg variants of sending aren't audited separately, so we should not
be setting audit_skip for the zerocopy sendmsg variant either.
Fixes: 493108d95f ("io_uring/net: zerocopy sendmsg")
Reported-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Running local task_work requires taking uring_lock, for submit + wait we
can try to run them right after submit while we still hold the lock and
save one lock/unlokc pair. The optimisation was implemented in the first
local tw patches but got dropped for simplicity.
Suggested-by: Dylan Yudaken <dylany@fb.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/281fc79d98b5d91fe4778c5137a17a2ab4693e5c.1665088876.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_cqring_wake() needs a barrier for the waitqueue_active() check.
However, in the case of io_req_local_work_add(), we call llist_add()
first, which implies an atomic. Hence we can replace smb_mb() with
smp_mb__after_atomic().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/43983bc8bc507172adda7a0f00cab1aff09fd238.1665018309.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We treat EINPROGRESS like EAGAIN, but if we're retrying post getting
EINPROGRESS, then we just need to check the socket for errors and
terminate the request.
This was exposed on a bluetooth connection request which ends up
taking a while and hitting EINPROGRESS, and yields a CQE result of
-EBADFD because we're retrying a connect on a socket that is now
connected.
Cc: stable@vger.kernel.org
Fixes: 87f80d623c ("io_uring: handle connect -EINPROGRESS like -EAGAIN")
Link: https://github.com/axboe/liburing/issues/671
Reported-by: Aidan Sun <aidansun05@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of putting io_uring's registered files in unix_gc() we want it
to be done by io_uring itself. The trick here is to consider io_uring
registered files for cycle detection but not actually putting them down.
Because io_uring can't register other ring instances, this will remove
all refs to the ring file triggering the ->release path and clean up
with io_ring_ctx_free().
Cc: stable@vger.kernel.org
Fixes: 6b06314c47 ("io_uring: add file set registration")
Reported-and-tested-by: David Bouman <dbouman03@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
[axboe: add kerneldoc comment to skb, fold in skb leak fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The function name is missing the letter 'd' in the comment block.
Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Joshua Kinard <kumba@gentoo.org>
Link: https://lore.kernel.org/r/20221003153711.271630-1-colin.i.king@gmail.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
The regmap abstraction allows us to avoid the private i2c transfer
helpers, and also offers some nice utility functions such as the
regmap_update_bits family.
While at it, simplify the code even more by not keeping track of
->write_enabled: rtc_set_time is not a hot path, so one extra i2c read
doesn't hurt (regmap_update_bits elides the write when the bits are
already as desired).
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20220921114624.3250848-9-linux@rasmusvillemoes.dk
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
There's nothing in the data sheet that says writing to one of the time
keeping registers is necessary to start the RTC. It does so at the
stop condition of the i2c transfer setting the WRTC bit:
Upon initialization or power-up, the WRTC must be set to "1" to
enable the RTC. Upon the completion of a valid write (STOP), the RTC
starts counting.
Moreover, even if such a write to one of the timekeeping registers was
necessary, that's exactly what we do anyway just below when we
actually write the given struct rtc_time to the device.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20220921114624.3250848-8-linux@rasmusvillemoes.dk
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
As another preparation for removing direct references to the
i2c_client in the helper functions, stash a pointer to the private
data via dev_set_drvdata() instead of i2c_set_clientdata().
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20220921114624.3250848-7-linux@rasmusvillemoes.dk
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Simplify the code and make the output format consistent with other RTC
drivers by standardizing on using the %ptR printf extension.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20220921114624.3250848-6-linux@rasmusvillemoes.dk
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
These instances of '&client->dev' might as well be spelled 'dev', since
'client' has been computed from 'dev' via 'client =
to_i2c_client(dev)'.
Later patches will get rid of that local variable 'client', so remove
these unnecessary references so those later patches become easier to
read.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20220921114624.3250848-5-linux@rasmusvillemoes.dk
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
This dev_info() seems to be a debug leftover, and it would only get
printed once (or, once per battery change).
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20220921114624.3250848-4-linux@rasmusvillemoes.dk
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
The isl12022 can (only) keep track of times in the range
2000-2099. The data sheet says
The calendar registers track date, month, year, and day of the week
and are accurate through 2099, with automatic leap year correction.
The lower bound of 2000 is obtained by simply observing that its YR
register only counts from 00 through 99.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20220921114624.3250848-3-linux@rasmusvillemoes.dk
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
The comments say that devm_rtc_device_register() is deprecated and
that one should instead use devm_rtc_allocate_device() and
[devm_]rtc_register_device. So do that.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20220921114624.3250848-2-linux@rasmusvillemoes.dk
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
This second KUnit update for Linux 6.1-rc1 consists of features and
fixes:
- simplifying resource use.
- make kunit_malloc() and kunit_free() allocations and frees consistent.
kunit_free() frees only the memory allocated by kunit_malloc().
- stop downloading risc-v opensbi binaries using wget.
- other fixes and improvements to tool and KUnit framework.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmNG/4EACgkQCwJExA0N
QxxznhAAqtXbCYxIxerdiAHwYifnsrLcCMm/Ol2yuFJhmTn6sZh7w4S8bRBt0RlX
+1IfqtzOi1K1fTpmWQqnq0/fH8gNZrhZHHqXxx3c353pG0BfrC3vODx1VzxuPCMi
nr/OHqAQ0VSTuxgWxsIr0SuhOM4LFDjhBcLDoCDoBF5aQSJricpa++ixiYsVgaUt
nG+E1i7I/hvEYwqqUqtJLp9fOD6LK2IeiOP4oH2PwYBIpFO+BXwk0Gbs/ISL+fRP
F8pph2Qm2jxCJ4kRDvs/N41mkIvG9PwC1h7fW4vDXix0zryJdh0TbilFQFFwiuW3
S8kFE1tarMBWyqEZU/2cln9MFdZpxXAWtJu1/B8dqOvLA06mBOaNbB4tOXzfyriE
QBOnEJNqgT0wqnwWONvrljz7L+YaFAkJAGxbub1cGIUa/t5HHs0WX5XncctGfsaE
Ec6bLOXMgemb3dm35fDpBHyN6np9K5BMmz8Ggv02+V8FH8nrXAzblOW/CN8KgXiG
R5+1vd3SxaLq7npal4S88LmNRoJCVCSWnNPItBTgWFXy6Ni2T5WEoi6rSdqJNX+/
bpPM4G47IO5BH0YEbl9IPvKLfDGczVB4TVLpIt61QST4rf+puUhysr76ZweqoU6f
sOyEenr3YZ7C3EpSbcAztzgyPomPAacR/lNbG5lezcEPRSo184I=
=FgDN
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull more KUnit updates from Shuah Khan:
"Features and fixes:
- simplify resource use
- make kunit_malloc() and kunit_free() allocations and frees
consistent. kunit_free() frees only the memory allocated by
kunit_malloc()
- stop downloading risc-v opensbi binaries using wget
- other fixes and improvements to tool and KUnit framework"
* tag 'linux-kselftest-kunit-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: kunit: Update description of --alltests option
kunit: declare kunit_assert structs as const
kunit: rename base KUNIT_ASSERTION macro to _KUNIT_FAILED
kunit: remove format func from struct kunit_assert, get it to 0 bytes
kunit: tool: Don't download risc-v opensbi firmware with wget
kunit: make kunit_kfree(NULL) a no-op to match kfree()
kunit: make kunit_kfree() not segfault on invalid inputs
kunit: make kunit_kfree() only work on pointers from kunit_malloc() and friends
kunit: drop test pointer in string_stream_fragment
kunit: string-stream: Simplify resource use
Fixups, reference design changes and new boards:
- The addition of QSPI support for mpfs had a corresponding change to
the devicetree node.
- The v2022.{09,10} reference designs brought with them several memory
map changes which are not backwards compatible. The old devicetrees
from the v2022.08 and earlier releases still work with current
kernels.
- Two new devicetrees for a first-party development kit and for the
Aries Embedded M100FPSEVP kit.
- Corresponding dt-bindings changes for the above.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRh246EGq/8RLhDjO14tDGHoIJi0gUCY0Qb5gAKCRB4tDGHoIJi
0kEeAQDBUZ3e/RDJlwPVKlZmgcUMbQ8wyaz3e1irlja0W5O+WgD/eQnHec2LrYPz
fSLBCdXpNmViswJBRfmmXDt4l4K9uQs=
=WlDi
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmNHONsTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiQJ0D/47XAIG7vzlFoi3+EVTllg33nDmLyxJ
Rc6uC5lBWZHyJSOEeggH3VeIm6nM7a9na8KdpOzvxlkfv+NpZ9xTiTi9Q5I9L9u3
u8nUJSnoeUFv2qOuhAYHUzgx0J59isTkT1cbKpYAF4zvrw4ajVNwYNaCm6y2gtHK
I4pFepbPFUwFD8EeGqG2xpKpQxd0Z6y9kLGWI5iF1ComdnKgJFpDGYXE+KdAKIjZ
ZlLYH4qW70rMb9XhiAmEOhMt91y/ZBXBHfUl+C3ixKG+9I9ce4le4gc5Q9A0VJAK
+Eg2FaZO6j3zwtulF+d9m+49rlfERsy9h/ob9K+1qRoasjP0GlupBu/sH+f7RhaJ
VX4InltR8DQj5Q6tVnyOBhIHdJAEQXlSyKC9KF+8WUZZSmTmGdbr/DLJtBICuao7
Yuojr54PrHx78jFW3csRajKGqIFoGTDzPd+/3/wxMhQu65Fo8zINjpWXBore3ihy
4ac9zqjj3PgRKVbYYZc3oXk68hnhg5nqnRNeKEZ+DYhDji1owmnmVf6FcG4cDlz8
ctvL8RcS44+ktjcEexbXv+9qdRLsXhk2wp7tY9+gWBzlv5EjOXnx77NhdRE3unW1
hVgjpgeuBZd6IBLphsyPMPNVpL3QnuOdDGgKbREy+8BWNKPjrKK6zl3AvoSLdwJY
htaWJ5XDrq+pYA==
=6hJA
-----END PGP SIGNATURE-----
Merge tag 'dt-for-palmer-v6.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into for-next
Microchip RISC-V devicetrees for v6.1
Fixups, reference design changes and new boards:
- The addition of QSPI support for mpfs had a corresponding change to
the devicetree node.
- The v2022.{09,10} reference designs brought with them several memory
map changes which are not backwards compatible. The old devicetrees
from the v2022.08 and earlier releases still work with current
kernels.
- Two new devicetrees for a first-party development kit and for the
Aries Embedded M100FPSEVP kit.
- Corresponding dt-bindings changes for the above.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'dt-for-palmer-v6.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: microchip: fix fabric i2c reg size
riscv: dts: microchip: update memory configuration for v2022.10
riscv: dts: microchip: add a devicetree for aries' m100pfsevp
riscv: dts: microchip: add sevkit device tree
riscv: dts: microchip: reduce the fic3 clock rate
riscv: dts: microchip: icicle: re-jig fabric peripheral addresses
riscv: dts: microchip: icicle: update pci address properties
riscv: dts: microchip: move the mpfs' pci node to -fabric.dtsi
riscv: dts: microchip: add pci dma ranges for the icicle kit
dt-bindings: riscv: microchip: document the sev kit
dt-bindings: riscv: microchip: document the aries m100pfsevp
dt-bindings: riscv: microchip: document icicle reference design
riscv: dts: microchip: add qspi compatible fallback
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This second Kselftest update for Linux 6.1-rc1 consists of fixes
and improvements to memory-hotplug test and a minor spelling fix
to ftrace test.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmNG4hgACgkQCwJExA0N
QxwkPBAAyPd0ZUHlF7JjzdV2obHDGxbjMzi0x8Di8md4B24gE0PvGY79E7eM/uKd
pBsop5cnvwGBZGuoBM0E/1J7UB/Lgedl2iYDUFXQe8JoPlOgvmBMbJCdZ3Zv8gxp
sk5yIrLakgyp2WZng0QyQwZQY4nvq8Lf/f50T8/3+g8OBqF+xTo60DyEpsaDNHS4
3SddH8/jJ6TkG/5lRoEOlfYFrhCDuxq1e8R0jts1vgnpdhpSD9JZPr26VNGVcygB
dkp4icsQFWAaZjNO6+7scgp1yfxBFJ2Fh/gDdfWqEAYvZtvnnr2XhwlYK+O7JZRp
DuglF4Lo/AN3betWuAz4rWyqAYoBZxrUTxrsIVyzb3FqpRAlR32YPFfMo6iWYYn4
638E6cYvkNbbbhCEEgHJJiFZzUB/xbLR/Y8gD4Que/Y+Ck7+zuvQMzZWHQNJfsGx
OhhfUcJlw/VzRpdZx1UToT++DqOqJLBL7DVMATbiXd2rDGKbnEw2pKkeuURXVged
1nis9odge5yY42Q5I3doyPHO7rENOAP2wmlKvJqFDKZFoD23MsGv/m6gpg/HaS1Q
T27L1hHFXPrAZ14MxGva1DTVTPU8D/ciHcqjCWWmnHp8M359JvjOsDL7PyMUcGlm
bVSqcciy71utp1XaFfaF7kT1bwnNeGgGqs0EXcmj4xE8CyOtat8=
=GXpd
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-next-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull more Kselftest updates from Shuah Khan:
"This consists of fixes and improvements to memory-hotplug test and a
minor spelling fix to ftrace test"
* tag 'linux-kselftest-next-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
docs: notifier-error-inject: Correct test's name
selftests/memory-hotplug: Adjust log info for maintainability
selftests/memory-hotplug: Restore memory before exit
selftests/memory-hotplug: Add checking after online or offline
selftests/ftrace: func_event_triggers: fix typo in user message