Similar to the multi-device mode, erofs could be mounted from one
primary data blob (mandatory) and multiple extra data blobs (optional).
Register fscache context for each extra data blob.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220425122143.56815-17-jefflexu@linux.alibaba.com
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Registers fscache context for primary data blob. Also move the
initialization of s_op and related fields forward, since anonymous
inode will be allocated under the super block when registering the
fscache context.
Something worth mentioning about the cleanup routine.
1. The fscache context will instantiate anonymous inodes under the super
block. Release these anonymous inodes when .put_super() is called, or
we'll get "VFS: Busy inodes after unmount." warning.
2. The fscache context is initialized prior to the root inode. If
.kill_sb() is called when mount failed, .put_super() won't be called
when root inode has not been initialized yet. Thus .kill_sb() shall
also contain the cleanup routine.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220425122143.56815-16-jefflexu@linux.alibaba.com
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Add erofs_fscache_read_folios() helper reading from fscache. It supports
on-demand read semantics. That is, it will make the backend prepare for
the data when cache miss. Once data ready, it will read from the cache.
This helper can then be used to implement .readpage()/.readahead() of
on-demand read semantics.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220425122143.56815-15-jefflexu@linux.alibaba.com
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Introduce one anonymous inode for data blobs so that erofs can cache
metadata directly within such anonymous inode.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220425122143.56815-14-jefflexu@linux.alibaba.com
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Introduce a context structure for managing data blobs, and helper
functions for initializing and cleaning up this context structure.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220425122143.56815-13-jefflexu@linux.alibaba.com
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
A new fscache based mode is going to be introduced for erofs, in which
case on-demand read semantics is implemented through fscache.
As the first step, register fscache volume for each erofs filesystem.
That means, data blobs can not be shared among erofs filesystems. In the
following iteration, we are going to introduce the domain semantics, in
which case several erofs filesystems can belong to one domain, and data
blobs can be shared among these erofs filesystems of one domain.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220425122143.56815-12-jefflexu@linux.alibaba.com
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Until then erofs is exactly blockdev based filesystem.
A new fscache-based mode is going to be introduced for erofs to support
scenarios where on-demand read semantics is needed, e.g. container
image distribution. In this case, erofs could be mounted from data blobs
through fscache.
Add a helper checking which mode erofs works in, and twist the code in
preparation for the upcoming fscache mode.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220425122143.56815-11-jefflexu@linux.alibaba.com
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
... so that it can be used in the following introduced fscache mode.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220425122143.56815-10-jefflexu@linux.alibaba.com
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Add tracepoints for on-demand read mode. Currently following tracepoints
are added:
OPEN request / COPEN reply
CLOSE request
READ request / CREAD reply
write through anonymous fd
release of anonymous fd
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220425122143.56815-8-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Enable on-demand read mode by adding an optional parameter to the "bind"
command.
On-demand mode will be turned on when this parameter is "ondemand", i.e.
"bind ondemand". Otherwise cachefiles will work in the original mode.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220509074028.74954-7-jefflexu@linux.alibaba.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Implement the data plane of on-demand read mode.
The early implementation [1] place the entry to
cachefiles_ondemand_read() in fscache_read(). However, fscache_read()
can only detect if the requested file range is fully cache miss, whilst
we need to notify the user daemon as long as there's a hole inside the
requested file range.
Thus the entry is now placed in cachefiles_prepare_read(). When working
in on-demand read mode, once a hole detected, the read routine will send
a READ request to the user daemon. The user daemon needs to fetch the
data and write it to the cache file. After sending the READ request, the
read routine will hang there, until the READ request is handled by the
user daemon. Then it will retry to read from the same file range. If no
progress encountered, the read routine will fail then.
A new NETFS_SREQ_ONDEMAND flag is introduced to indicate that on-demand
read should be done when a cache miss encountered.
[1] https://lore.kernel.org/all/20220406075612.60298-6-jefflexu@linux.alibaba.com/ #v8
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220425122143.56815-6-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Notify the user daemon that cookie is going to be withdrawn, providing a
hint that the associated anonymous fd can be closed.
Be noted that this is only a hint. The user daemon may close the
associated anonymous fd when receiving the CLOSE request, then it will
receive another anonymous fd when the cookie gets looked up. Or it may
ignore the CLOSE request, and keep writing data through the anonymous
fd. However the next time the cookie gets looked up, the user daemon
will still receive another new anonymous fd.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220425122143.56815-5-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Add a refcount to avoid the deadlock in on-demand read mode. The
on-demand read mode will pin the corresponding cachefiles object for
each anonymous fd. The cachefiles object is unpinned when the anonymous
fd gets closed. When the user daemon exits and the fd of
"/dev/cachefiles" device node gets closed, it will wait for all
cahcefiles objects getting withdrawn. Then if there's any anonymous fd
getting closed after the fd of the device node, the user daemon will
hang forever, waiting for all objects getting withdrawn.
To fix this, add a refcount indicating if there's any object pinned by
anonymous fds. The cachefiles cache gets unbound and withdrawn when the
refcount is decreased to 0. It won't change the behaviour of the
original mode, in which case the cachefiles cache gets unbound and
withdrawn as long as the fd of the device node gets closed.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220509074028.74954-4-jefflexu@linux.alibaba.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Fscache/CacheFiles used to serve as a local cache for a remote
networking fs. A new on-demand read mode will be introduced for
CacheFiles, which can boost the scenario where on-demand read semantics
are needed, e.g. container image distribution.
The essential difference between these two modes is seen when a cache
miss occurs: In the original mode, the netfs will fetch the data from
the remote server and then write it to the cache file; in on-demand
read mode, fetching the data and writing it into the cache is delegated
to a user daemon.
As the first step, notify the user daemon when looking up cookie. In
this case, an anonymous fd is sent to the user daemon, through which the
user daemon can write the fetched data to the cache file. Since the user
daemon may move the anonymous fd around, e.g. through dup(), an object
ID uniquely identifying the cache file is also attached.
Also add one advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) suggesting that
the cache file size shall be retrieved at runtime. This helps the
scenario where one cache file contains multiple netfs files, e.g. for
the purpose of deduplication. In this case, netfs itself has no idea the
size of the cache file, whilst the user daemon should give the hint on
it.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220509074028.74954-3-jefflexu@linux.alibaba.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Extract the generic routine of writing data to cache files, and make it
generally available.
This will be used by the following patch implementing on-demand read
mode. Since it's called inside CacheFiles module, make the interface
generic and unrelated to netfs_cache_resources.
It is worth noting that, ki->inval_counter is not initialized after
this cleanup. It shall not make any visible difference, since
inval_counter is no longer used in the write completion routine, i.e.
cachefiles_write_complete().
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220425122143.56815-2-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
This patch enables idmapped mounts for erofs, since all dedicated helpers
for this functionality existsm, so, in this patch we just pass down the
user_namespace argument from the VFS methods to the relevant helpers.
Simple idmap example on erofs image:
1. mkdir dir
2. touch dir/file
3. mkfs.erofs erofs.img dir
4. mount -t erofs -o loop erofs.img /mnt/erofs/
5. ls -ln /mnt/erofs/
total 0
-rw-rw-r-- 1 1000 1000 0 May 17 15:26 file
6. mount-idmapped --map-mount b:1000:1001:1 /mnt/erofs/ /mnt/scratch_erofs/
7. ls -ln /mnt/scratch_erofs/
total 0
-rw-rw-r-- 1 1001 1001 0 May 17 15:26 file
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Link: https://lore.kernel.org/r/20220517104103.3570721-1-chao@kernel.org
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Implement export operations in order to make EROFS support accessing
inodes with filehandles so that it can be exported via NFS and used
by overlayfs.
Without this patch, 'exportfs -rv' will report:
exportfs: /root/erofs_mp does not support NFS export
Also tested with unionmount-testsuite and the testcase below passes now:
./run --ov --erofs --verify hard-link
For more details about the testcase, see:
https://github.com/amir73il/unionmount-testsuite/pull/6
Signed-off-by: Hongnan Li <hongnan.li@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220425040712.91685-1-hongnan.li@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
I got some KASAN report as below:
[ 46.959738] ==================================================================
[ 46.960430] BUG: KASAN: use-after-free in z_erofs_shifted_transform+0x2bd/0x370
[ 46.960430] Read of size 4074 at addr ffff8880300c2f8e by task fssum/188
...
[ 46.960430] Call Trace:
[ 46.960430] <TASK>
[ 46.960430] dump_stack_lvl+0x41/0x5e
[ 46.960430] print_report.cold+0xb2/0x6b7
[ 46.960430] ? z_erofs_shifted_transform+0x2bd/0x370
[ 46.960430] kasan_report+0x8a/0x140
[ 46.960430] ? z_erofs_shifted_transform+0x2bd/0x370
[ 46.960430] kasan_check_range+0x14d/0x1d0
[ 46.960430] memcpy+0x20/0x60
[ 46.960430] z_erofs_shifted_transform+0x2bd/0x370
[ 46.960430] z_erofs_decompress_pcluster+0xaae/0x1080
The root cause is that the tail pcluster won't be a complete filesystem
block anymore. So if ztailpacking is used, the second part of an
uncompressed tail pcluster may not be ``rq->pageofs_out``.
Fixes: ab749badf9 ("erofs: support unaligned data decompression")
Fixes: cecf864d3d ("erofs: support inline data decompression")
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220512115833.24175-1-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Some comments haven't been useful anymore since the code updated.
Let's drop them instead.
Link: https://lore.kernel.org/r/20220506194612.117120-2-hsiangkao@linux.alibaba.com
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
The big pcluster feature has been merged for a year, it has been mostly
stable now.
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220407050505.12683-1-huyue2@coolpad.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmJlHhcACgkQEsHwGGHe
VUrPyQ/9FE7zLj8euC4HJ4HPJwf7vkwaeRHz1T2gS+izChBI+QSo5Ipe5zFOKz55
vYSfaYF0MIvVJtKSHMbnQf6f/2i+y5j0ozMjKEkHRZdYP26okPoj+M2effgbceiJ
pOIZUsdr8SdBQv313icuUfsXIGfMv/xIw20OtIhVpOFQPB4ZbLASn6AhusZL7U6Z
0BIcfGmmOwV6p4petOJVUXRcwkgfT812UOBLV71DEz9jzP8dXYGVvPV8ZnSYoVQW
tm6rcmnpzsOqb3xnp7hqFHevyoIzBT31KVo4xnB80CtCoWB/tbEIPIbNjUPaREp0
ezE8yXv6euob92+Uh5DH/+8oWuzlctKv1Pc98rFnrGGfW4ocDsr5ibsi9472Mkec
s+waTwemZMGN3bQHH5QvjWxPGdGuPsqrNvgHbZRFGYGJcMoC+2F9p+vKOXK00fMF
9ivhhuFqH8OVAFu3WUvvD8zO18tfnST7fQflQJNxZ/TqPumNc0+zLrpKDp+7ZE+r
qgdvxvXO3ZRnPttiEP1/J+uKxQGNMuDEU8NcfdA7nOzEv9yPyKLLcwo2qu3IYgP0
XuM3Gqt5/Cf38b+1ddR1LWai3KjxVTn7HV4G9YPdvYP296YcZlFjGOtzfNOfw905
djGEFTFyGQuS7BEHKhD3OoDbegT4FvB+69k2ddy4Dut99WkDk84=
=S7Ux
-----END PGP SIGNATURE-----
Merge tag 'sched_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
- Fix a corner case when calculating sched runqueue variables
That fix also removes a check for a zero divisor in the code, without
mentioning it. Vincent clarified that it's ok after I whined about it:
https://lore.kernel.org/all/CAKfTPtD2QEyZ6ADd5WrwETMOX0XOwJGnVddt7VHgfURdqgOS-Q@mail.gmail.com/
* tag 'sched_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/pelt: Fix attach_entity_load_avg() corner case
- Partly revert a change to our timer_interrupt() that caused lockups with high res
timers disabled.
- Fix a bug in KVM TCE handling that could corrupt kernel memory.
- Two commits fixing Power9/Power10 perf alternative event selection.
Thanks to: Alexey Kardashevskiy, Athira Rajeev, David Gibson, Frederic Barrat, Madhavan
Srinivasan, Miguel Ojeda, Nicholas Piggin.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmJlSCATHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgM7CD/9KH+mjtSwF3hSdun/WxMcWawdNY24g
f+eMI/vABVqN1RvmO3oC5Z1ruMUw4AxL7BMugAa/SlTjQXOyCuyHQP7vIe4ax3rZ
4TMfsRm8W4xlgI4k9d9q/unrIHko2k1OhY/wvfGMFhFdG0LWt4qJDL5vbccG5CKb
xikrutQ5+t8fNLtGH+fJVDeK9q2CU4inJRuyD4m3sfKnXygLI13l1GhcOebxN/p1
W8qBIac+YJqeezbqiwLl4BC+yXAEDixvFpTh9NuvWdoJaQHdvrltYSLQxCFMIE4B
dSp5EaBTXemalZ4F8fnGyKf4eTbtO9VIfWq3hicjfnJiFRodbFZOo7dnSpDrYlfJ
EysGdmI2HxpmWC8DgQQFv+xwZxKW/ExvPiPYb49n+j/4hKJ724wTi9Z8r3XP5fkg
lD/th40NDhe/sjCSPNWoK3l/UJb3gexd+Ict8iUp2fgNEq3FoJkTR4QlWGj6BeP3
3pOBoqmWjSXR8tWNShvyK6mLn6fclD0IA7cwTIsZZVmqI+nNR4nR0fC2Ah66Rj+R
EOof4kCBOcZ2getDyk+Hv97EFNbkDcIm6IE291Vp9hgilp0n2VnPbwwwEdexp6Jv
KpsYCHosCchnHcu7P1VFFt9w46JFSN7/euq8YZe6znFua2qhV6AGeI7H/uA2X7yL
KvuO+c+ORhrVKQ==
=xieK
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Partly revert a change to our timer_interrupt() that caused lockups
with high res timers disabled.
- Fix a bug in KVM TCE handling that could corrupt kernel memory.
- Two commits fixing Power9/Power10 perf alternative event selection.
Thanks to Alexey Kardashevskiy, Athira Rajeev, David Gibson, Frederic
Barrat, Madhavan Srinivasan, Miguel Ojeda, and Nicholas Piggin.
* tag 'powerpc-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/perf: Fix 32bit compile
powerpc/perf: Fix power10 event alternatives
powerpc/perf: Fix power9 event alternatives
KVM: PPC: Fix TCE handling for VFIO
powerpc/time: Always set decrementer in timer_interrupt()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmJlGhAACgkQEsHwGGHe
VUqDeA//Ud2Y6x/OlrYSb1uetOTYXpt8orCh8rFTkkB+6XXs8kyfxgrn1JOzzYtU
5whZxo1SvsqIDLToUtstcFhmK0uO5BtuK/pfy6qElTuVLI17yuyx359yKwAWcKIu
5xAM3S9vtoqUw9YKeXiq61sjLVEJo2qFEIq8BvGwjuZ9DBkemlM34sCPu+tu7o2F
DI5LqdkCGQGAMbXzljyHxcLgZS2bCSgYs7LzbYPe7KqtDzlwo+4ofT9r/E9r/6iD
PaWjR34cvy+KyyxcDhdPzuWYjvkvuAOZHhtvQmPVBsw/diCZD4NLodj02/TyNN5u
3P2ehe4KXLxAWFDdV2XrxjnQsWni5aJrti7HfFmKT1zadh1SBb1vun4sSe1+FXia
ej+68xG4tvk05zRCjZgy9teCLLT2bejSkYcdRGv+M/DDZZI840Uq+ub/jGmBGG8P
wpYqGixgWvCmD9lW/4jztHpWhpkn2sdyVk1iDrgjre+M3NE9pPO7yIh0MK2B7TBq
ORVr6z1bAXVNHm9fXwpptpQz1tcK87hKzVMX63kuEpQLf7+XBiffoJgNHscl26/h
gVGq24lFRotsGQlsZIXc6Bfu0u13mVxsNF7yYhlU30Tlgqk45cYA/x+btlSehfWB
6j9/nx+A21Ocjx3s2LOIDQTSZCdzTt3KVBkwPBafDBAtO55XhEE=
=gcLL
-----END PGP SIGNATURE-----
Merge tag 'edac_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
- Read the reported error count from the proper register on
synopsys_edac
* tag 'edac_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/synopsys: Read the error count from the correct register
Since commit 559089e0a9 ("vmalloc: replace VM_NO_HUGE_VMAP with
VM_ALLOW_HUGE_VMAP"), the use of hugepage mappings for vmalloc is an
opt-in strategy, because it caused a number of problems that weren't
noticed until x86 enabled it too.
One of the issues was fixed by Nick Piggin in commit 3b8000ae18
("mm/vmalloc: huge vmalloc backing pages should be split rather than
compound"), but I'm still worried about page protection issues, and
VM_FLUSH_RESET_PERMS in particular.
However, like the hash table allocation case (commit f2edd118d02d:
"page_alloc: use vmalloc_huge for large system hash"), the use of
kvmalloc() should be safe from any such games, since the returned
pointer might be a SLUB allocation, and as such no user should
reasonably be using it in any odd ways.
We also know that the allocations are fairly large, since it falls back
to the vmalloc case only when a kmalloc() fails. So using a hugepage
mapping seems both safe and relevant.
This patch does show a weakness in the opt-in strategy: since the opt-in
flag is in the 'vm_flags', not the usual gfp_t allocation flags, very
few of the usual interfaces actually expose it.
That's not much of an issue in this case that already used one of the
fairly specialized low-level vmalloc interfaces for the allocation, but
for a lot of other vmalloc() users that might want to opt in, it's going
to be very inconvenient.
We'll either have to fix any compatibility problems, or expose it in the
gfp flags (__GFP_COMP would have made a lot of sense) to allow normal
vmalloc() users to use hugepage mappings. That said, the cases that
really matter were probably already taken care of by the hash tabel
allocation.
Link: https://lore.kernel.org/all/20220415164413.2727220-1-song@kernel.org/
Link: https://lore.kernel.org/all/CAHk-=whao=iosX1s5Z4SF-ZGa-ebAukJoAdUJFk5SPwnofV+Vg@mail.gmail.com/
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Song Liu <songliubraving@fb.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use vmalloc_huge() in alloc_large_system_hash() so that large system
hash (>= PMD_SIZE) could benefit from huge pages.
Note that vmalloc_huge only allocates huge pages for systems with
HAVE_ARCH_HUGE_VMALLOC.
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmJkGaQACgkQiiy9cAdy
T1G7Pgv/SHmxsBGIT+YyW+ppWZGIMMpCld3YbovtN2EWdt3NXwvnGrTTFkXe0acJ
+63NjNnCoDTKy+XHN0HhMqLsJSep6J63e5rmiuvUxKQIR0n1WzZl7W1YpHKptZog
wMkObv0UzKgr6gLEwQeMMqvcyxOVmgssAXou4N8p6rDJLFijM2kcVjpB/B9uyUAR
JG0ss8lhX7+YTcRuI0QqyulHlUTiGwu4/XjO9oWs3bF2faADVIfZXOKGMFfRpjCK
YDGdh+HieW5y2SngvstdBuVxmZXLjWjWwe9mQCUaF7khZJ0acuGeQh9BCPNopjUD
0CBYe9JWM2NxTjXqmhspCGUi40+EedZgIMxukRyl7MrBX1wBF0ErSYpGmYSZQqoH
u2R9Pr8tUwuzVfQ6s7VhWACckKFNwI53lHOhY1ikc2M/NWeLy961Wi9JRykac8cZ
ElkJkJUdGttntjwilcKp7NuWmHssDzxNH103WAr3GZrhBDgntUHFBpsLqzQmQW3a
Gp3jkj1e
=Y9T8
-----END PGP SIGNATURE-----
Merge tag '5.18-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd server fixes from Steve French:
- cap maximum sector size reported to avoid mount problems
- reference count fix
- fix filename rename race
* tag '5.18-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: set fixed sector size to FS_SECTOR_SIZE_INFORMATION
ksmbd: increment reference count of parent fp
ksmbd: remove filename in ksmbd_file
One fix for an information leak caused by copying a buffer to
userspace without checking for error first in the sr driver.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYmRQ+SYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishVhPAP9a/cHI
0uZ8IC5LBjpXEyvvniGBIXz87i2IKR3XSme6BwEAxeH5AX54q4Pkhbr+izdEQM9/
A09GaOdzQXHBsgD92+I=
=MEyO
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"One fix for an information leak caused by copying a buffer to
userspace without checking for error first in the sr driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sr: Do not leak information in ioctl
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYmPu9QAKCRCAXGG7T9hj
vmvYAP4sD8jB17bSSUq9jGhNIr3vc6UZ6Oz+8R6G6CvSBhOa0gEAqWTU04RVHeYy
Sqs6qs9dvSdF2AvGR9DydrkCF5n6sgU=
=s5yA
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"A simple cleanup patch and a refcount fix for Xen on Arm"
* tag 'for-linus-5.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
arm/xen: Fix some refcount leaks
xen: Convert kmap() to kmap_local_page()
panel:
- revert of patch that broke panel/bridge issues
dma-buf:
- remove unused header file.
amdgpu:
- partial revert of locking change
radeon:
- fix dma_resv logic inversion
panel:
- pi touchscreen panel init fixes
vc4:
- build fix
- runtime pm refcount fix
vmwgfx:
- refcounting fix
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmJji5EACgkQDHTzWXnE
hr5JhRAAh8pL5zIEo2E2f9VQSjk8Z6+YMyyrEsbaENS++KuC/xCyhdG4NMA5er4A
4E1HEnRJq2+xfBOBLJESAL+fLx9vN0CRqy4Q0Lvq3OUDB7Rtbw5hDoHGbfiM/EBB
zPoxPl2LGuCmTVtLc4CoEQNqhtg65FNHB51/HJqpvzXGttQA4il3kNdG29IiRUEt
94XMkRkw9bUdi7YW8MV/zpsgP5384aZ87npY8qy8QYgP0EJ+paDJzPnaFypkZF9Q
P0W0aac1/MpHO8lEMWrhJPzzYNM6Vs01Av5CkLMtiauXIaD8lYCN83xeOBkeBQpk
vJOe4yCM74uPn/7yoGA2MknrfsI6yQBGvBJt/wjEgtExVDqzlf3cvsVVkyEYITsA
NoWzO2uqH/VbAfQfz5RHyvcJJGa+TI5u3wEmM4GPTYwIIc5+NkBc9MPaVLlfWTXJ
wksCJnoAm/fy29CdtI06/R0yPMLB73Mc1XbAGrC+j13fj0YqcecZ7/sIUbFlTUjw
gOm3JDANHgX7g/DYjDSO/26fmf1k8E62oXse5Usdn91PASSrc6J4iH2FS24MWoUg
vydNRBY+2Evzxkc65I9nwx4hrIz1kVEnvy4ApyBPQcWqbRFj6+ZVIpxWtnJDGwj1
16cIvkMdnOZqZmUvtzzgiFTsXtEMyOO9j9/Fv1qci+Uu4MOBMJ8=
=dkFQ
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2022-04-23' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
"Maarten was away, so Maxine stepped up and sent me the drm-fixes
merge, so no point leaving it for another week.
The big change is an OF revert around bridge/panels, it may have some
driver fallout, but hopefully this revert gets them shook out in the
next week easier.
Otherwise it's a bunch of locking/refcounts across drivers, a radeon
dma_resv logic fix and some raspberry pi panel fixes.
panel:
- revert of patch that broke panel/bridge issues
dma-buf:
- remove unused header file.
amdgpu:
- partial revert of locking change
radeon:
- fix dma_resv logic inversion
panel:
- pi touchscreen panel init fixes
vc4:
- build fix
- runtime pm refcount fix
vmwgfx:
- refcounting fix"
* tag 'drm-fixes-2022-04-23' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: partial revert "remove ctx->lock" v2
Revert "drm: of: Lookup if child node has panel or bridge"
Revert "drm: of: Properly try all possible cases for bridge/panel detection"
drm/vc4: Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage
drm/vmwgfx: Fix gem refcounting and memory evictions
drm/vc4: Fix build error when CONFIG_DRM_VC4=y && CONFIG_RASPBERRYPI_FIRMWARE=m
drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare
drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised
dma-buf-map: remove renamed header file
drm/radeon: fix logic inversion in radeon_sync_resv
- a new set of keycodes to be used by marine navigation systems
- minor fixes to omap4-keypad and cypress-sf drivers
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQST2eWILY88ieB2DOtAj56VGEWXnAUCYmODEQAKCRBAj56VGEWX
nBpOAP42d5FvJiPnVfOzbUb8KT3Btd5g3ze5m+qMgfqirsxatwEApoCZkx4hDGW8
/aEHF6LVzpbhz7Gc+4mGdVP8CcWAPAM=
=9csQ
-----END PGP SIGNATURE-----
Merge tag 'input-for-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a new set of keycodes to be used by marine navigation systems
- minor fixes to omap4-keypad and cypress-sf drivers
* tag 'input-for-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: add Marine Navigation Keycodes
Input: omap4-keypad - fix pm_runtime_get_sync() error checking
Input: cypress-sf - register a callback to disable the regulators
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmJjYKoQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpqOtEADHSqgTsfwwsYOq15ups32oEdoFtPJ0Fl1C
RUWLOzqskk2AV9Foj9TeGSqp448AW+sgV9P6NjrUR7nrefRtYwEdTx2ou1oEwaq0
wl/wsZGg9BiX2w6kBSDRxoYZL220n4B2iscg76VOb6+9VeLdD5s/qeL2P912Bre8
zAEfdGvmLK7gF/6Oxo0fqpzwWyQ7O8+wBTBbpqcwZPImPFsbuWLMG4fgFINb20fD
UsVIXWEyV1UlWOk2v4GJl3rToR1f6l7tVT9gUYrRIt9iPTDH86rHkAiLg4QCbfZA
556xHivv5Dw91JJfGDCeiMtq9kjeIRiwpOTNiP1Av5MTM0TC8BaXtkpfwhO5r3iZ
tbc2M/7Q8pOM0ati+HWBOp6Gi9hjzqwObGllKJnD1MZ/ESx88z2vlIcQYyBPkNOO
giNmP52sgN/5ToMqW25AFbSHJ5Tb+p0epEbbJOv3m+0sSR2TnQ9/DweDBbsOs3Yw
KyaHb1e7WxqtIn+4t1r2GgXdDJkQrMcdonkLu+i4Yoz0v4rfRGfg89F+LNylq1kI
hng/xPsbgSAjn1j/1EkCKKjX0+R6Blhc/I99EucD9zaOmHYnx6hO0KySJXQzIAo7
lalfsE5bx1LWR1yC/KJ0To6TXQo9Onf5IIUPs1duiHnTFYgpM9z5sOmcyRL+3OtF
Rvzw0yR37Q==
=hxoY
-----END PGP SIGNATURE-----
Merge tag 'block-5.18-2022-04-22' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Just two small regression fixes for bcache"
* tag 'block-5.18-2022-04-22' of git://git.kernel.dk/linux-block:
bcache: fix wrong bdev parameter when calling bio_alloc_clone() in do_bio_hook()
bcache: put bch_bio_map() back to correct location in journal_write_unlocked()
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmJjYJUQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpsZZEADS/dD7pZKxBLcHTCGJAik1/IIv/3ynTrOp
o86uV0AH+nL6lUyBU7uTTQVFn9Hjh6T10ZfRmcU+1Xb8G4obHTrQJkk5evwCNPng
9CUW2fwQa+6H4Ui8TU7f1rLcLlm+AUSVmab6h/20X5ldMwzF1JhcE11qtqZw0ti7
mDPwmxEsx7KMvMy59awA+5IpnXHxe5SvHXuzLsMNwux6dH7VauxE8R+Y+HHVzLWc
fM2dbEU2Hq5nL23DedMw3ZaHwhQTiWdOQA0386iDB6cJdFv19iw+ApD4KS/qAT2X
URQ3pmyNOXvOsBosVL4za7VVCoUlA23ZSMoU82p2K3NK4NGfV7S4oIeO5ZR2BK/C
bIC4c2gutIbYrtdSITBW4z2tj+26BBZS7LaT3Bek/3BL+GjQuM6vK8N4ZhRXPC+l
vWAwXUnWSyXR4+HWpvm3ewlrSY5CQjfsZgU1PIYybhTf/oo2BxX/HQfRk3XKfLIR
89gvITTUrC8B4dgPgLs/MF4Ercmoa2//2yL0onBEwdC2b1lRqD/bGM2FAYolzLzf
W1+BrFj3sRUjexdO7ChrtZvAWo59REAxXdP/3h+NbkIz8sunG2Vpf9NX3cXj30n3
bZL3SkEbtFxpnXspRSQRnmL6DMLPMa7UC+MxpNBAV0g6aMmuSdxqiIZR10+kIEyR
PJyRqbRefg==
=Huq7
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.18-2022-04-22' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Just two small fixes - one fixing a potential leak for the iovec for
larger requests added in this cycle, and one fixing a theoretical leak
with CQE_SKIP and IOPOLL"
* tag 'io_uring-5.18-2022-04-22' of git://git.kernel.dk/linux-block:
io_uring: fix leaks on IOPOLL and CQE_SKIP
io_uring: free iovec if file assignment fails
- Fix header include for LLVM >= 14 when building with libclang.
- Allow access to 'data_src' for auxtrace in 'perf script' with ARM SPE perf.data
files, fixing processing data with such attributes.
- Fix error message for test case 71 ("Convert perf time to TSC") on s390, where
it is not supported.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYmNPCAAKCRCyPKLppCJ+
JxW9AQCgzYxEw5CJ+zn58lGmYJdfV5Kc6C8MPD671oo39lC49AD/Qw8tyklKTok5
hJkZ3CqahjMdN1j+xNgskXBNcJW6Rww=
=Ayk0
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-fixes-for-v5.18-2022-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix header include for LLVM >= 14 when building with libclang.
- Allow access to 'data_src' for auxtrace in 'perf script' with ARM SPE
perf.data files, fixing processing data with such attributes.
- Fix error message for test case 71 ("Convert perf time to TSC") on
s390, where it is not supported.
* tag 'perf-tools-fixes-for-v5.18-2022-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf test: Fix error message for test case 71 on s390, where it is not supported
perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event
perf script: Always allow field 'data_src' for auxtrace
perf clang: Fix header include for LLVM >= 14
Add a struct page forward declaration to cacheflush_32.h.
Fixes this build warning:
CC drivers/crypto/xilinx/zynqmp-sha.o
In file included from arch/sparc/include/asm/cacheflush.h:11,
from include/linux/cacheflush.h:5,
from drivers/crypto/xilinx/zynqmp-sha.c:6:
arch/sparc/include/asm/cacheflush_32.h:38:37: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration
38 | void sparc_flush_page_to_ram(struct page *page);
Exposed by commit 0e03b8fd29 ("crypto: xilinx - Turn SHA into a
tristate and allow COMPILE_TEST") but not Fixes: that commit because the
underlying problem is older.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: sparclinux@vger.kernel.org
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
inversion in radeon, a build and pm refcounting fix for vc4, two reverts
for drm_of_get_bridge that caused a number of regression and a locking
regression for amdgpu.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCYmJqDwAKCRDj7w1vZxhR
xQlAAP9N78SStxmzZ3UjjU2h4fj7JXs3y97DddJZpzyu92+d5QD7BFP8i3mKLGhq
hmmYabGl58dWK+bXZRD85kOsIxv80A0=
=RgD5
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Two fixes for the raspberrypi panel initialisation, one fix for a logic
inversion in radeon, a build and pm refcounting fix for vc4, two reverts
for drm_of_get_bridge that caused a number of regression and a locking
regression for amdgpu.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220422084403.2xrhf3jusdej5yo4@houat
injection testing. Change ext4's fallocate to update consistently
drop set[ug]id bits when an fallocate operation might possibly change
the user-visible contents of a file. Also, improve handling of
potentially invalid values in the the s_overhead_cluster superblock
field to avoid ext4 returning a negative number of free blocks.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmJinf8ACgkQ8vlZVpUN
gaOHgQf+MKgUZgteYogLzoP3mF1kSycOGawk4wZ3QHOLz7AvsV2p9J8BWihbS/EK
dBydfXbTMvCUrjWmpqb5dHECRzxdfxOJ0SPJtibc8DZaJc9ImNFmgSp9kyJ3uRaN
cPGO6Lz2RXpdumVMPPLwzUJdVyrLi0K6I1NYSocxKgribePzd+xil8S9zRZj8Bpe
RaeH0EytcRj2CI5qs5mI/mOPBAMsZeczd3HInI3gyCgP2I4ZOfsADne3APx57mcI
IGKf77nvIwMHeKel3MGYfFPitEs5cZpHUhHplCMtgFsO8H0IR93tqnlaCvTM7VAZ
Slamgl7pfcXFcLZP+pm0QL/82ub7iw==
=FIds
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix some syzbot-detected bugs, as well as other bugs found by I/O
injection testing.
Change ext4's fallocate to consistently drop set[ug]id bits when an
fallocate operation might possibly change the user-visible contents of
a file.
Also, improve handling of potentially invalid values in the the
s_overhead_cluster superblock field to avoid ext4 returning a negative
number of free blocks"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: fix a potential race while discarding reserved buffers after an abort
ext4: update the cached overhead value in the superblock
ext4: force overhead calculation if the s_overhead_cluster makes no sense
ext4: fix overhead calculation to account for the reserved gdt blocks
ext4, doc: fix incorrect h_reserved size
ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
ext4: fix use-after-free in ext4_search_dir
ext4: fix bug_on in start_this_handle during umount filesystem
ext4: fix symlink file size not match to file content
ext4: fix fallocate to use file_modified to update permissions consistently
A single fix to avoid a NULL pointer dereference in the pata_marvell
driver with adapters not supporting DMA, from Zheyu.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYmMpKgAKCRDdoc3SxdoY
dnkYAP46Mc/ENx1CIE6PA8xM6pmItw++ddyf3nN71xtqBTjTHgD+LIz9BRq8nc0u
aTkagLmdRsrmdSJT6nnHGREI/TJIUAc=
=PSEF
-----END PGP SIGNATURE-----
Merge tag 'ata-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
"A single fix to avoid a NULL pointer dereference in the pata_marvell
driver with adapters not supporting DMA, from Zheyu"
* tag 'ata-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_marvell: Check the 'bmdma_addr' beforing reading
* Remove 's' & 'u' as valid ISA extension
* Do not allow disabling the base extensions 'i'/'m'/'a'/'c'
x86:
* Fix NMI watchdog in guests on AMD
* Fix for SEV cache incoherency issues
* Don't re-acquire SRCU lock in complete_emulated_io()
* Avoid NULL pointer deref if VM creation fails
* Fix race conditions between APICv disabling and vCPU creation
* Bugfixes for disabling of APICv
* Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
selftests:
* Do not use bitfields larger than 32-bits, they differ between GCC and clang
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmJi3KUUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMhvQf/Yncfg3MkOvKsVxnCe7diKDTI/E2n
wBGNIcL8r7L9oIltHL4Mh7JQTacHFQOZ9PQ30NO1p+pznZ03e8LR59IF1JpP7VOU
sWrLZ5a4bIAEjOpA7Jxcee6hUBwewBauDgFLbb+YAI2lAahiH7jVfywDRife/c3k
N2LjeA75K8UvMiDCfjxxxerFJK91zaqjWlUNF2OhtFp/5pnMfS+nli9Q8QS837pZ
oUf+0Beb2RpSHan+wbYVU7X3ZLwtpR0M3w3uXOG+X3as56wDf26znXS02aSwa45x
lfX+pqJfmb4vCJJDXt6avH27EVgTq0Vew+BhQHG3VLRO6uxZ+smX6qmsuw==
=kvbw
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"The main and larger change here is a workaround for AMD's lack of
cache coherency for encrypted-memory guests.
I have another patch pending, but it's waiting for review from the
architecture maintainers.
RISC-V:
- Remove 's' & 'u' as valid ISA extension
- Do not allow disabling the base extensions 'i'/'m'/'a'/'c'
x86:
- Fix NMI watchdog in guests on AMD
- Fix for SEV cache incoherency issues
- Don't re-acquire SRCU lock in complete_emulated_io()
- Avoid NULL pointer deref if VM creation fails
- Fix race conditions between APICv disabling and vCPU creation
- Bugfixes for disabling of APICv
- Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
selftests:
- Do not use bitfields larger than 32-bits, they differ between GCC
and clang"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: selftests: introduce and use more page size-related constants
kvm: selftests: do not use bitfields larger than 32-bits for PTEs
KVM: SEV: add cache flush to solve SEV cache incoherency issues
KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUs
KVM: SVM: Simplify and harden helper to flush SEV guest page(s)
KVM: selftests: Silence compiler warning in the kvm_page_table_test
KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog
x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
KVM: SPDX style and spelling fixes
KVM: x86: Skip KVM_GUESTDBG_BLOCKIRQ APICv update if APICv is disabled
KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race
KVM: nVMX: Defer APICv updates while L2 is active until L1 is active
KVM: x86: Tag APICv DISABLE inhibit, not ABSENT, if APICv is disabled
KVM: Initialize debugfs_dentry when a VM is created to avoid NULL deref
KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused
KVM: RISC-V: Use kvm_vcpu.srcu_idx, drop RISC-V's unnecessary copy
KVM: x86: Don't re-acquire SRCU lock in complete_emulated_io()
RISC-V: KVM: Restrict the extensions that can be disabled
RISC-V: KVM: Remove 's' & 'u' as valid ISA extension
Test case 71 'Convert perf time to TSC' is not supported on s390.
Subtest 71.1 is skipped with the correct message, but subtest 71.2 is
not skipped and fails.
The root cause is function evlist__open() called from
test__perf_time_to_tsc(). evlist__open() returns -ENOENT because the
event cycles:u is not supported by the selected PMU, for example
platform s390 on z/VM or an x86_64 virtual machine.
The PMU driver returns -ENOENT in this case. This error is leads to the
failure.
Fix this by returning TEST_SKIP on -ENOENT.
Output before:
71: Convert perf time to TSC:
71.1: TSC support: Skip (This architecture does not support)
71.2: Perf time to TSC: FAILED!
Output after:
71: Convert perf time to TSC:
71.1: TSC support: Skip (This architecture does not support)
71.2: Perf time to TSC: Skip (perf_read_tsc_conversion is not supported)
This also happens on an x86_64 virtual machine:
# uname -m
x86_64
$ ./perf test -F 71
71: Convert perf time to TSC :
71.1: TSC support : Ok
71.2: Perf time to TSC : FAILED!
$
Committer testing:
Continues to work on x86_64:
$ perf test 71
71: Convert perf time to TSC :
71.1: TSC support : Ok
71.2: Perf time to TSC : Ok
$
Fixes: 290fa68bdc ("perf test tsc: Fix error message when not supported")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chengdong Li <chengdongli@tencent.com>
Cc: chengdongli@tencent.com
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20220420062921.1211825-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since commit bb30acae4c ("perf report: Bail out --mem-mode if mem
info is not available") "perf mem report" and "perf report --mem-mode"
don't report result if the PERF_SAMPLE_DATA_SRC bit is missed in sample
type.
The commit ffab487052 ("perf: arm-spe: Fix perf report
--mem-mode") partially fixes the issue. It adds PERF_SAMPLE_DATA_SRC
bit for Arm SPE event, this allows the perf data file generated by
kernel v5.18-rc1 or later version can be reported properly.
On the other hand, perf tool still fails to be backward compatibility
for a data file recorded by an older version's perf which contains Arm
SPE trace data. This patch is a workaround in reporting phase, when
detects ARM SPE PMU event and without PERF_SAMPLE_DATA_SRC bit, it will
force to set the bit in the sample type and give a warning info.
Fixes: bb30acae4c ("perf report: Bail out --mem-mode if mem info is not available")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: German Gomez <german.gomez@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: https://lore.kernel.org/r/20220414123201.842754-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If use command 'perf script -F,+data_src' to dump memory samples with
Arm SPE trace data, it reports error:
# perf script -F,+data_src
Samples for 'dummy:u' event do not have DATA_SRC attribute set. Cannot print 'data_src' field.
This is because the 'dummy:u' event is absent DATA_SRC bit in its sample
type, so if a file contains AUX area tracing data then always allow
field 'data_src' to be selected as an option for perf script.
Fixes: e55ed3423c ("perf arm-spe: Synthesize memory event")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220417114837.839896-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The header TargetRegistry.h has moved in LLVM/clang 14.
Committer notes:
The problem as noticed when building in ubuntu:22.04:
90 98.61 ubuntu:22.04 : FAIL gcc version 11.2.0 (Ubuntu 11.2.0-19ubuntu1)
util/c++/clang.cpp:23:10: fatal error: llvm/Support/TargetRegistry.h: No such file or directory
23 | #include "llvm/Support/TargetRegistry.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Fixed after applying this patch.
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Guilherme Amadio <amadio@gentoo.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://twitter.com/GuilhermeAmadio/status/1514970524232921088
Link: http://lore.kernel.org/lkml/Ylp0M/VYgHOxtcnF@gentoo.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 5467801f1f ("gpio: Restrict usage of GPIO chip irq members
before initialization") attempted to fix a race condition that lead to a
NULL pointer, but in the process caused a regression for _AEI/_EVT
declared GPIOs.
This manifests in messages showing deferred probing while trying to
allocate IRQs like so:
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x0000 to IRQ, err -517
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x002C to IRQ, err -517
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x003D to IRQ, err -517
[ .. more of the same .. ]
The code for walking _AEI doesn't handle deferred probing and so this
leads to non-functional GPIO interrupts.
Fix this issue by moving the call to `acpi_gpiochip_request_interrupts`
to occur after gc->irc.initialized is set.
Fixes: 5467801f1f ("gpio: Restrict usage of GPIO chip irq members before initialization")
Link: https://lore.kernel.org/linux-gpio/BL1PR12MB51577A77F000A008AA694675E2EF9@BL1PR12MB5157.namprd12.prod.outlook.com/
Link: https://bugzilla.suse.com/show_bug.cgi?id=1198697
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215850
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1979
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1976
Reported-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Shreeya Patel <shreeya.patel@collabora.com>
Tested-By: Samuel Čavoj <samuel@cavoj.net>
Tested-By: lukeluk498@gmail.com Link:
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-and-tested-by: Takashi Iwai <tiwai@suse.de>
Cc: Shreeya Patel <shreeya.patel@collabora.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* A pair of build fixes for the recent cpuidle driver.
* A fix for systems without sv57 that manifests as a crash early in
boot.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmJizQQTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYidEWD/oC5VN4rPf1AetQeY9Zibg6sNf3X2e7
ta0vUmm1nOpAWWsxVI/JKrDCUpF6f1Vl6p6xW780SPU2pTUmAlu/3hMTaLA54Upy
J05cmLqJKRM6vaSE1frCGy6DVjGQ0iKvR9kjMkUINhs+XI/Qe5SA0OSx5e13417q
SFyo+PZ8SqzoUzKJHXvBuJpsHXM1D5T6R/XTwElYUF2QK3bA1tBScAsfB1bnWr1p
KkczI2YHrrQiIumeYal/sbtMi9MF+50Mm8tOq7RKh/dRcty1U+bDNSNemNO9BwEu
qmfz6pTwd2zUMWClt2AvZq2D5WzNVvR2sEJz6qiok+XqrYWslWhKuLjTZlRPrews
bkHCPVeBHP6wc1yHusyMgc/TumFJKxZV2PN4z/KABjrZSvta/ldHP2ykIROVidSC
J3dI5bJHA9A2EJoBeBMsH2Rbk6uYnjrUo+Ovj9Hhyjwp8JkyTqkj2nQiLaslx/yC
kOM48z1pZ0yAlF4J2meCQCVJDnbS/9PazZwVjiwg8UUpW2DcTnpI5iGFAUb3chGE
mjlyfsSx/xSYaGvrXSvbpsWBAjvgIAxSB1I4p8Ooie17w8F6WKiFzmtD9Ptr4PsF
VQP3NAJWFfXCNdMQ6MuLdw46/celhyoa2B/0VSaEFt/zSWeJamYyZ9lKiRA5tmzn
H31QFLOrWMNsZg==
=36Ir
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes Palmer Dabbelt:
- A pair of build fixes for the recent cpuidle driver
- A fix for systems without sv57 that manifests as a crash
early in boot
* tag 'riscv-for-linus-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: cpuidle: fix Kconfig select for RISCV_SBI_CPUIDLE
RISC-V: mm: Fix set_satp_mode() for platform not having Sv57
cpuidle: riscv: support non-SMP config
- Fix PMU event validation in the absence of any event counters
- Fix allmodconfig build using clang in conjunction with binutils
- Fix definitions of pXd_leaf() to handle PROT_NONE entries
- More typo fixes
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmJiimkQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNPVEB/9rTcl5GKh7rruMrPK2HnVCNCEzMvYJnWLz
UUd72TfsVWdEnwGRWKUSJRXEMH27Pac+yKcog8aEPzOOLB6mKOsQbLyC8X7mSpo0
hsZJLfjv2PjX0g/OnZi9Yuxqi0u+7HB5ThpKbMqbW+/tAfikqUTfuIVAC5WD2DZx
OG0IdyTdP2VL+ud0Vz/8zTyRh1kbFL82ER823dk8FytKEPGN8tIMRlv8r1YCeFQW
t1V4ZfzsfS0wZGoigOz8JDyMIzq7PNZ5cfW6Mk6wuhf32nxgJlbBWjEk0OdrMSHz
Ifv13TsIK376mMF8uFR/o8pE3UGV7y1tWoYfjq0XnSIx59bX4TFS
=J0Gy
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"There's no real pattern to the fixes, but the main one fixes our
pmd_leaf() definition to resolve a NULL dereference on the migration
path.
- Fix PMU event validation in the absence of any event counters
- Fix allmodconfig build using clang in conjunction with binutils
- Fix definitions of pXd_leaf() to handle PROT_NONE entries
- More typo fixes"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: fix p?d_leaf()
arm64: fix typos in comments
arm64: Improve HAVE_DYNAMIC_FTRACE_WITH_REGS selection for clang
arm_pmu: Validate single/group leader events