(not turned on by default), which also need STIBP enabled (if
available) to be '100% safe' on even the shortest speculation
windows.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmL3fqcRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gnuw/6AighFp+Gp4qXP1DIVU+acVnZsxbdt7GA
WGs/JJfKYsKpWvDGFxnwtF2V1Imq8XVRPVPyFKvLQiBs2h8vNcVkgIvJsdeTFsqQ
uUwUaYgDXuhLYaFpnMGouoeA3iw2zf/CY5ZJX79Nl/CwNwT7FxiLbu+JF/I2Yc0V
yddiQ8xgT0VJhaBcUTsD2qFl8wjpxer7gNBFR4ujiYWXHag3qKyZuaySmqCz4xhd
4nyhJCp34548MsTVXDys2gnYpgLWweB9zOPvH4+GgtiFF3UJxRMhkB9NzfZq1l5W
tCjgGupb3vVoXOVb/xnXyZlPbdFNqSAja7iOXYdmNUSURd7LC0PYHpVxN0rkbFcd
V6noyU3JCCp86ceGTC0u3Iu6LLER6RBGB0gatVlzomWLjTEiC806eo23CVE22cnk
poy7FO3RWa+q1AqWsEzc3wr14ZgSKCBZwwpn6ispT/kjx9fhAFyKtH2/Sznx26GH
yKOF7pPCIXjCpcMnNoUu8cVyzfk0g3kOWQtKjaL9WfeyMtBaHhctngR0s1eCxZNJ
rBlTs+YO7fO42unZEExgvYekBzI70aThIkvxahKEWW48owWph+i/sn5gzdVF+ynR
R4PGeylfd8ZXr21cG2rG9250JLwqzhsxnAGvNjYg1p/hdyrzLTGWHIc9r9BU9000
mmOP9uY6Cjc=
=Ac6x
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"Fix the 'IBPB mitigated RETBleed' mode of operation on AMD CPUs (not
turned on by default), which also need STIBP enabled (if available) to
be '100% safe' on even the shortest speculation windows"
* tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Enable STIBP for IBPB mitigated RETBleed
- hardware poisoning support for 1GB hugepages, from Naoya Horiguchi
- highmem documentation fixups from Fabio De Francesco
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYvMFaQAKCRDdBJ7gKXxA
jrM7AQCsaVsrRJgVMNqjDvLAsWFEm48s6/smQT4UZ9n/rPQUsAEA0iRpOfbjcxwK
npAml1mp5uP2AkimTVLB6Bokt6N+wAg=
=9Wta
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull remaining MM updates from Andrew Morton:
"Three patch series - two that perform cleanups and one feature:
- hugetlb_vmemmap cleanups from Muchun Song
- hardware poisoning support for 1GB hugepages, from Naoya Horiguchi
- highmem documentation fixups from Fabio De Francesco"
* tag 'mm-stable-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
Documentation/mm: add details about kmap_local_page() and preemption
highmem: delete a sentence from kmap_local_page() kdocs
Documentation/mm: rrefer kmap_local_page() and avoid kmap()
Documentation/mm: avoid invalid use of addresses from kmap_local_page()
Documentation/mm: don't kmap*() pages which can't come from HIGHMEM
highmem: specify that kmap_local_page() is callable from interrupts
highmem: remove unneeded spaces in kmap_local_page() kdocs
mm, hwpoison: enable memory error handling on 1GB hugepage
mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage
mm, hwpoison: make __page_handle_poison returns int
mm, hwpoison: set PG_hwpoison for busy hugetlb pages
mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
mm, hwpoison, hugetlb: support saving mechanism of raw error pages
mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
mm: hugetlb_vmemmap: use PTRS_PER_PTE instead of PMD_SIZE / PAGE_SIZE
mm: hugetlb_vmemmap: move code comments to vmemmap_dedup.rst
mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability
mm: hugetlb_vmemmap: replace early_param() with core_param()
mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c
...
Intel eIBRS machines do not sufficiently mitigate against RET
mispredictions when doing a VM Exit therefore an additional RSB,
one-entry stuffing is needed.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLqsGsACgkQEsHwGGHe
VUpXGg//ZEkxhf3Ri7X9PknAWNG6eIEqigKqWcdnOw+Oq/GMVb6q7JQsqowK7KBZ
AKcY5c/KkljTJNohditnfSOePyCG5nDTPgfkjzIawnaVdyJWMRCz/L4X2cv6ykDl
2l2EvQm4Ro8XAogYhE7GzDg/osaVfx93OkLCQj278VrEMWgM/dN2RZLpn+qiIkNt
DyFlQ7cr5UASh/svtKLko268oT4JwhQSbDHVFLMJ52VaLXX36yx4rValZHUKFdox
ZDyj+kiszFHYGsI94KAD0dYx76p6mHnwRc4y/HkVcO8vTacQ2b9yFYBGTiQatITf
0Nk1RIm9m3rzoJ82r/U0xSIDwbIhZlOVNm2QtCPkXqJZZFhopYsZUnq2TXhSWk4x
GQg/2dDY6gb/5MSdyLJmvrTUtzResVyb/hYL6SevOsIRnkwe35P6vDDyp15F3TYK
YvidZSfEyjtdLISBknqYRQD964dgNZu9ewrj+WuJNJr+A2fUvBzUebXjxHREsugN
jWp5GyuagEKTtneVCvjwnii+ptCm6yfzgZYLbHmmV+zhinyE9H1xiwVDvo5T7DDS
ZJCBgoioqMhp5qR59pkWz/S5SNGui2rzEHbAh4grANy8R/X5ASRv7UHT9uAo6ve1
xpw6qnE37CLzuLhj8IOdrnzWwLiq7qZ/lYN7m+mCMVlwRWobbOo=
=a8em
-----END PGP SIGNATURE-----
Merge tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 eIBRS fixes from Borislav Petkov:
"More from the CPU vulnerability nightmares front:
Intel eIBRS machines do not sufficiently mitigate against RET
mispredictions when doing a VM Exit therefore an additional RSB,
one-entry stuffing is needed"
* tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Add LFENCE to RSB fill sequence
x86/speculation: Add RSB VM Exit protections
It it inconvenient to mention the feature of optimizing vmemmap pages
associated with HugeTLB pages when communicating with others since there
is no specific or abbreviated name for it when it is first introduced.
Let us give it a name HVO (HugeTLB Vmemmap Optimization) from now.
This commit also updates the document about "hugetlb_free_vmemmap" by the
way discussed in thread [1].
Link: https://lore.kernel.org/all/21aae898-d54d-cc4b-a11f-1bb7fddcfffa@redhat.com/ [1]
Link: https://lkml.kernel.org/r/20220628092235.91270-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For the 6.0 merge window the modules code shifts to cleanup and minor fixes
effort. This is becomes much easier to do and review now due to the code
split to its own directory from effort on the last kernel release. I expect
to see more of this with time and as we expand on test coverage in the future.
The cleanups and fixes come from usual suspects such as Christophe Leroy and
Aaron Tomlin but there are also some other contributors.
One particular minor fix worth mentioning is from Helge Deller, where he spotted
a *forever* incorrect natural alignment on both ELF section header tables:
* .altinstructions
* __bug_table sections
A lot of back and forth went on in trying to determine the ill effects of this
misalignment being present for years and it has been determined there should
be no real ill effects unless you have a buggy exception handler. Helge actually
hit one of these buggy exception handlers on parisc which is how he ended up
spotting this issue. When implemented correctly these paths with incorrect
misalignment would just mean a performance penalty, but given that we are
dealing with alternatives on modules and with the __bug_table (where info
regardign BUG()/WARN() file/line information associated with it is stored)
this really shouldn't be a big deal.
The only other change with mentioning is the kmap() with kmap_local_page()
and my only concern with that was on what is done after preemption, but the
virtual addresses are restored after preemption. This is only used on module
decompression.
This all has sit on linux-next for a while except the kmap stuff which has
been there for 3 weeks.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmLxL4gSHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoin8AYP/iv/Oh/Zzh4UvZzkkOSzhf1qDgGhjFb0
aFIODZzpEfZ5ix5GcLapB8/QIwQgxiIRa3WkTMc0uyv+mddlbKuILFnI9A1I+TQe
N4gmKeYXwWRyxLa6y7/B3lVzuLxf4DpcxfS2c3A65MkYi09XPA9oXCy7JjzsmEiZ
z2Lu8lTe6hg8VarBTogHBxiEU7ybfDCnHWj7/Oe6zz8tS/R0i0ndNBu9xmaCqSh7
QC8++eqCaS+zfW0uTmnGDo1/zWLBblCZ5HAHG8bLlPHezUbekNz6G1D4CVwFyNQ8
wy1Gjy8nFWc+rwUl1CTgJ+A7wodGrMCyt5SmcNUVBOWdlSmli5vFJp61ET6UdrV+
+8owATwwIm8hbkIAI4037j7pMgrO27d130GRxFwgG9GNoqew2AM7y/9HrlmW49PE
IqJA4Pm3zg26IhLIRcH7jLg3oKGuFf0nkMTDoooI5a9DlcsCXPuGd0FBw2WbR71D
Px6dlVoAW0NrP2tm8YzkTKIT+aN+UId4Vdi2oFs1t8Sye/U+LCjvwrXPk13pZKdR
VxfM1oVxeRwiAUq0VuIrnj7windF5Mpy2hDLHeWjzQmLcEGAtCYEGyxKTBkNTtPt
gm9XBzT6Rbzi+Sc++ZoHYHe1g4T66sjYOp4N90sRRMD3FR97ZyW8eD01gwf6p1Uy
aCOrA+sRHK3F
=hPvl
-----END PGP SIGNATURE-----
Merge tag 'modules-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module updates from Luis Chamberlain:
"For the 6.0 merge window the modules code shifts to cleanup and minor
fixes effort. This becomes much easier to do and review now due to the
code split to its own directory from effort on the last kernel
release. I expect to see more of this with time and as we expand on
test coverage in the future. The cleanups and fixes come from usual
suspects such as Christophe Leroy and Aaron Tomlin but there are also
some other contributors.
One particular minor fix worth mentioning is from Helge Deller, where
he spotted a *forever* incorrect natural alignment on both ELF section
header tables:
* .altinstructions
* __bug_table sections
A lot of back and forth went on in trying to determine the ill effects
of this misalignment being present for years and it has been
determined there should be no real ill effects unless you have a buggy
exception handler. Helge actually hit one of these buggy exception
handlers on parisc which is how he ended up spotting this issue. When
implemented correctly these paths with incorrect misalignment would
just mean a performance penalty, but given that we are dealing with
alternatives on modules and with the __bug_table (where info regardign
BUG()/WARN() file/line information associated with it is stored) this
really shouldn't be a big deal.
The only other change with mentioning is the kmap() with
kmap_local_page() and my only concern with that was on what is done
after preemption, but the virtual addresses are restored after
preemption. This is only used on module decompression.
This all has sit on linux-next for a while except the kmap stuff which
has been there for 3 weeks"
* tag 'modules-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module: Replace kmap() with kmap_local_page()
module: Show the last unloaded module's taint flag(s)
module: Use strscpy() for last_unloaded_module
module: Modify module_flags() to accept show_state argument
module: Move module's Kconfig items in kernel/module/
MAINTAINERS: Update file list for module maintainers
module: Use vzalloc() instead of vmalloc()/memset(0)
modules: Ensure natural alignment for .altinstructions and __bug_table sections
module: Increase readability of module_kallsyms_lookup_name()
module: Fix ERRORs reported by checkpatch.pl
module: Add support for default value for module async_probe
AMD's "Technical Guidance for Mitigating Branch Type Confusion,
Rev. 1.0 2022-07-12" whitepaper, under section 6.1.2 "IBPB On
Privileged Mode Entry / SMT Safety" says:
Similar to the Jmp2Ret mitigation, if the code on the sibling thread
cannot be trusted, software should set STIBP to 1 or disable SMT to
ensure SMT safety when using this mitigation.
So, like already being done for retbleed=unret, and now also for
retbleed=ibpb, force STIBP on machines that have it, and report its SMT
vulnerability status accordingly.
[ bp: Remove the "we" and remove "[AMD]" applicability parameter which
doesn't work here. ]
Fixes: 3ebc170068 ("x86/bugs: Add retbleed=ibpb")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # 5.10, 5.15, 5.19
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Link: https://lore.kernel.org/r/20220804192201.439596-1-kim.phillips@amd.com
fatfs, autofs, squashfs, procfs, etc.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYu9BeQAKCRDdBJ7gKXxA
jp1DAP4mjCSvAwYzXklrIt+Knv3CEY5oVVdS+pWOAOGiJpldTAD9E5/0NV+VmlD9
kwS/13j38guulSlXRzDLmitbg81zAAI=
=Zfum
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
"Updates to various subsystems which I help look after. lib, ocfs2,
fatfs, autofs, squashfs, procfs, etc. A relatively small amount of
material this time"
* tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
scripts/gdb: ensure the absolute path is generated on initial source
MAINTAINERS: kunit: add David Gow as a maintainer of KUnit
mailmap: add linux.dev alias for Brendan Higgins
mailmap: update Kirill's email
profile: setup_profiling_timer() is moslty not implemented
ocfs2: fix a typo in a comment
ocfs2: use the bitmap API to simplify code
ocfs2: remove some useless functions
lib/mpi: fix typo 'the the' in comment
proc: add some (hopefully) insightful comments
bdi: remove enum wb_congested_state
kernel/hung_task: fix address space of proc_dohung_task_timeout_secs
lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t()
squashfs: support reading fragments in readahead call
squashfs: implement readahead
squashfs: always build "file direct" version of page actor
Revert "squashfs: provide backing_dev_info in order to disable read-ahead"
fs/ocfs2: Fix spelling typo in comment
ia64: old_rr4 added under CONFIG_HUGETLB_PAGE
proc: fix test for "vsyscall=xonly" boot option
...
- Add support for syscall stack randomization.
- Add support for atomic operations to the 32 & 64-bit BPF JIT.
- Full support for KASAN on 64-bit Book3E.
- Add a watchdog driver for the new PowerVM hypervisor watchdog.
- Add a number of new selftests for the Power10 PMU support.
- Add a driver for the PowerVM Platform KeyStore.
- Increase the NMI watchdog timeout during live partition migration, to avoid timeouts
due to increased memory access latency.
- Add support for using the 'linux,pci-domain' device tree property for PCI domain
assignment.
- Many other small features and fixes.
Thanks to: Alexey Kardashevskiy, Andy Shevchenko, Arnd Bergmann, Athira Rajeev, Bagas
Sanjaya, Christophe Leroy, Erhard Furtner, Fabiano Rosas, Greg Kroah-Hartman, Greg Kurz,
Haowen Bai, Hari Bathini, Jason A. Donenfeld, Jason Wang, Jiang Jian, Joel Stanley, Juerg
Haefliger, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Masahiro Yamada,
Maxime Bizon, Miaoqian Lin, Murilo Opsfelder Araújo, Nathan Lynch, Naveen N. Rao, Nayna
Jain, Nicholas Piggin, Ning Qiang, Pali Rohár, Petr Mladek, Rashmica Gupta, Sachin Sant,
Scott Cheloha, Segher Boessenkool, Stephen Rothwell, Uwe Kleine-König, Wolfram Sang, Xiu
Jianfeng, Zhouyi Zhou.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmLuAPgTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgBPpD/9kY/T0qlOXABxlZCgtqeAjPX+2xpnY
BF+TlsN1TS1auFcEZL2BapmVacsvOeGEFDVuZHZvZJc69Hx+gSjnjFCnZjp6n+Yz
wt6y9w9Pu0t/sjD5vNQ46O15/dXqm6RoVI7um12j/WLMN8Ko5+x3gKAyQONjQd2/
1kPcxVH6FUosAdnCuvIcqCX4e4IIHl2ZkitHOTXoQUvUy9oAK/mOBnwqZ6zLGUKC
E5M+Zyt4RFGxhPs48FkX6Nq6crDGU/P0VJpDKkR/t7GHnE67Bm70gZougAPrzrgP
nx8zoTWgDKpqDeuqK7pFcyKgNS3dKbxsN3sAfKHOWu/YnV4wMyy+7fmwagMauki7
lXccKN6F/r+8JcMNx80Jp/dAw3ZdLceP38M3Ryf8IL6lTfkNySumUvrKJn6r1Cu1
wvzhgyEuDawss9KHdEmXcA2i3+XVZvitaipO7JWUC8pblrP1SJMoPfIIe9zh3y3M
pyZj0TcGJ8XaK+badvI+PW/K/KeRgXEY8HpC3wDHSoIkli3OE4jDwXn6TiZgvm3n
k0sKL8YSmQZ8hP8QAkR+r8NQKYqLlfyPxdslK5omDPxfub5Uzk9ZV2Ep7svkaiQn
Wqjq27Dpz8+w0XPjsQ0Tkv+ByTkOhrawOH7x9SpFLHpv9g5otcYmS79NkO/htx8C
6LyPNx1VYn5IRA==
=tRkm
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add support for syscall stack randomization
- Add support for atomic operations to the 32 & 64-bit BPF JIT
- Full support for KASAN on 64-bit Book3E
- Add a watchdog driver for the new PowerVM hypervisor watchdog
- Add a number of new selftests for the Power10 PMU support
- Add a driver for the PowerVM Platform KeyStore
- Increase the NMI watchdog timeout during live partition migration, to
avoid timeouts due to increased memory access latency
- Add support for using the 'linux,pci-domain' device tree property for
PCI domain assignment
- Many other small features and fixes
Thanks to Alexey Kardashevskiy, Andy Shevchenko, Arnd Bergmann, Athira
Rajeev, Bagas Sanjaya, Christophe Leroy, Erhard Furtner, Fabiano Rosas,
Greg Kroah-Hartman, Greg Kurz, Haowen Bai, Hari Bathini, Jason A.
Donenfeld, Jason Wang, Jiang Jian, Joel Stanley, Juerg Haefliger, Kajol
Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Masahiro Yamada,
Maxime Bizon, Miaoqian Lin, Murilo Opsfelder Araújo, Nathan Lynch,
Naveen N. Rao, Nayna Jain, Nicholas Piggin, Ning Qiang, Pali Rohár,
Petr Mladek, Rashmica Gupta, Sachin Sant, Scott Cheloha, Segher
Boessenkool, Stephen Rothwell, Uwe Kleine-König, Wolfram Sang, Xiu
Jianfeng, and Zhouyi Zhou.
* tag 'powerpc-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (191 commits)
powerpc/64e: Fix kexec build error
EDAC/ppc_4xx: Include required of_irq header directly
powerpc/pci: Fix PHB numbering when using opal-phbid
powerpc/64: Init jump labels before parse_early_param()
selftests/powerpc: Avoid GCC 12 uninitialised variable warning
powerpc/cell/axon_msi: Fix refcount leak in setup_msi_msg_address
powerpc/xive: Fix refcount leak in xive_get_max_prio
powerpc/spufs: Fix refcount leak in spufs_init_isolated_loader
powerpc/perf: Include caps feature for power10 DD1 version
powerpc: add support for syscall stack randomization
powerpc: Move system_call_exception() to syscall.c
powerpc/powernv: rename remaining rng powernv_ functions to pnv_
powerpc/powernv/kvm: Use darn for H_RANDOM on Power9
powerpc/powernv: Avoid crashing if rng is NULL
selftests/powerpc: Fix matrix multiply assist test
powerpc/signal: Update comment for clarity
powerpc: make facility_unavailable_exception 64s
powerpc/platforms/83xx/suspend: Remove write-only global variable
powerpc/platforms/83xx/suspend: Prevent unloading the driver
powerpc/platforms/83xx/suspend: Reorder to get rid of a forward declaration
...
- convert arm32 to the common dma-direct code (Arnd Bergmann, Robin Murphy,
Christoph Hellwig)
- restructure the PCIe peer to peer mapping support (Logan Gunthorpe)
- allow the IOMMU code to communicate an optional DMA mapping length
and use that in scsi and libata (John Garry)
- split the global swiotlb lock (Tianyu Lan)
- various fixes and cleanup (Chao Gao, Dan Carpenter, Dongli Zhang,
Lukas Bulwahn, Robin Murphy)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmLuIYULHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYPS5A//Ty1ZNyXExmwZ6J6g7/oIvQlpAHilDr22mCd8tR8Y
Ne7TgLa/X+usFvJTxJfkvg/LNMDjD7qx0J/mhDGm4reOFcEL4/PBy0rDSOgnmntV
k/fPhgwnpuztiAQ+s+WkJ3pkrmG1HaEId7GGj2JaoYdas6RX2mGX7vL8uvUFepjw
lYPAqWMtJHkOfsDK0PqqyQsr7dcC6lyFLqnn/wqvHtTJeKCfGs6W/SIrlWme2SZY
3dNx84ZR1uPjaazAmtf2IWfjh/TBmd0ETRYycgUUKRP9iwsCkBQDBwsBGSIYXiWj
BUKQ5oMvjAlUGRF0jYz9e77KuedE6GxWiXNQstitBmid142M37DHA5tvZRf65MPS
THHcjTDmmoaO4YfFhhXOcFOrjG4/V8bF7fgHB6XkHDjhVVTcnIx8zuOAXIVBZvIV
VAALmamBqEfIZZrCqgr7hzFssK2bip+TIMkdoD46Wcr+D7bAlujhuzWxubn9+ulT
23v/pAvC80ut6LvKj6EA+GpRm/pejfOtEbjXPoO2hguNxvuUKvPQqNh9hy0q+v1e
8n2Y/4lhy5bv02S7wKooNkfCoV753jBY1TIru45UmEYc3EkTQPii6okYe0DvW4QX
VCnKgo156wSBfE+9eWdxCROv2SZqJFMV/wL3vw54dpJQMbDy7VkNsh4mGREdUkU1
uek=
=Bv19
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- convert arm32 to the common dma-direct code (Arnd Bergmann, Robin
Murphy, Christoph Hellwig)
- restructure the PCIe peer to peer mapping support (Logan Gunthorpe)
- allow the IOMMU code to communicate an optional DMA mapping length
and use that in scsi and libata (John Garry)
- split the global swiotlb lock (Tianyu Lan)
- various fixes and cleanup (Chao Gao, Dan Carpenter, Dongli Zhang,
Lukas Bulwahn, Robin Murphy)
* tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping: (45 commits)
swiotlb: fix passing local variable to debugfs_create_ulong()
dma-mapping: reformat comment to suppress htmldoc warning
PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()
RDMA/rw: drop pci_p2pdma_[un]map_sg()
RDMA/core: introduce ib_dma_pci_p2p_dma_supported()
nvme-pci: convert to using dma_map_sgtable()
nvme-pci: check DMA ops when indicating support for PCI P2PDMA
iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg
iommu: Explicitly skip bus address marked segments in __iommu_map_sg()
dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support
dma-direct: support PCI P2PDMA pages in dma-direct map_sg
dma-mapping: allow EREMOTEIO return code for P2PDMA transfers
PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
PCI/P2PDMA: Attempt to set map_type if it has not been set
lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
swiotlb: clean up some coding style and minor issues
dma-mapping: update comment after dmabounce removal
scsi: sd: Add a comment about limiting max_sectors to shost optimal limit
ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors
scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit
...
Including:
- Most intrusive patch is small and changes the default
allocation policy for DMA addresses. Before the change the
allocator tried its best to find an address in the first 4GB.
But that lead to performance problems when that space gets
exhaused, and since most devices are capable of 64-bit DMA
these days, we changed it to search in the full DMA-mask
range from the beginning. This change has the potential to
uncover bugs elsewhere, in the kernel or the hardware. There
is a Kconfig option and a command line option to restore the
old behavior, but none of them is enabled by default.
- Add Robin Murphy as reviewer of IOMMU code and maintainer for
the dma-iommu and iova code
- Chaning IOVA magazine size from 1032 to 1024 bytes to save
memory
- Some core code cleanups and dead-code removal
- Support for ACPI IORT RMR node
- Support for multiple PCI domains in the AMD-Vi driver
- ARM SMMU changes from Will Deacon:
- Add even more Qualcomm device-tree compatible strings
- Support dumping of IMP DEF Qualcomm registers on TLB sync
timeout
- Fix reference count leak on device tree node in Qualcomm
driver
- Intel VT-d driver updates from Lu Baolu:
- Make intel-iommu.h private
- Optimize the use of two locks
- Extend the driver to support large-scale platforms
- Cleanup some dead code
- MediaTek IOMMU refactoring and support for TTBR up to 35bit
- Basic support for Exynos SysMMU v7
- VirtIO IOMMU driver gets a map/unmap_pages() implementation
- Other smaller cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmLs3DIACgkQK/BELZcB
GuMizhAAguAnLLOkOLlR9/MhrTZfNXCUX+bfrEIevjFXMw4iPNfCCr4ydQ7EdVK6
ZA/3Z89huYl0d0x/FELolnQi+HOeqYrfTDe4rB7TgNgwZnWa+fdHcyYkgBGyfPaV
ilgjNcx8o//9o4NasyB6kU395jVmFxb735gMTTb+tcO9fr+/qIB6hxrHuCklxrNr
C7wK6kkoDPi5n0QuXCSjXEx2Hk245pAWKPLwqxsUYzHGlLfl7ULOxw65BUBGvn/H
uCsTfJFu7u+ErwQYf0qPuOwRBnRdsx9g5EAnfab8p074SoKWvbNnftIxgIRp8ZEM
YgCbhYa1GOFI4r+XzqRzEbc0/vPSttims4Jqz0KxYs7pr5EoVifrWLJFjJdCdc2h
Tio1gTvOq8HbH63kwYNKJhg4iSC6zVd37ihEhvfFO6LcgFl4iCfd2o9zK7oY40J4
XoOxofVnJ2e3tzdhZ/n5quCXiudHixm6WuVa7QYKscF7Ud0tY1wWKuibdlMQTeNM
68MvtlteKcfs1BrWzZyrFMrFeAfIY8LI82y6jdJuoNMU5LE9+5yelXBdJhnVygZ+
Jglv1TIt6W/z1H5JgXtNVZ1wWgBm7rurOqNyfN8XCd8eP1z321CLfX8ujkhKrIWP
ApG15cwvpnh1JX630+UFiEikTGU0fb2orMdPwYmwuu8DAsoLVHE=
=hI2K
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.20-or-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- The most intrusive patch is small and changes the default allocation
policy for DMA addresses.
Before the change the allocator tried its best to find an address in
the first 4GB. But that lead to performance problems when that space
gets exhaused, and since most devices are capable of 64-bit DMA these
days, we changed it to search in the full DMA-mask range from the
beginning.
This change has the potential to uncover bugs elsewhere, in the
kernel or the hardware. There is a Kconfig option and a command line
option to restore the old behavior, but none of them is enabled by
default.
- Add Robin Murphy as reviewer of IOMMU code and maintainer for the
dma-iommu and iova code
- Chaning IOVA magazine size from 1032 to 1024 bytes to save memory
- Some core code cleanups and dead-code removal
- Support for ACPI IORT RMR node
- Support for multiple PCI domains in the AMD-Vi driver
- ARM SMMU changes from Will Deacon:
- Add even more Qualcomm device-tree compatible strings
- Support dumping of IMP DEF Qualcomm registers on TLB sync
timeout
- Fix reference count leak on device tree node in Qualcomm driver
- Intel VT-d driver updates from Lu Baolu:
- Make intel-iommu.h private
- Optimize the use of two locks
- Extend the driver to support large-scale platforms
- Cleanup some dead code
- MediaTek IOMMU refactoring and support for TTBR up to 35bit
- Basic support for Exynos SysMMU v7
- VirtIO IOMMU driver gets a map/unmap_pages() implementation
- Other smaller cleanups and fixes
* tag 'iommu-updates-v5.20-or-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (116 commits)
iommu/amd: Fix compile warning in init code
iommu/amd: Add support for AVIC when SNP is enabled
iommu/amd: Simplify and Consolidate Virtual APIC (AVIC) Enablement
ACPI/IORT: Fix build error implicit-function-declaration
drivers: iommu: fix clang -wformat warning
iommu/arm-smmu: qcom_iommu: Add of_node_put() when breaking out of loop
iommu/arm-smmu-qcom: Add SM6375 SMMU compatible
dt-bindings: arm-smmu: Add compatible for Qualcomm SM6375
MAINTAINERS: Add Robin Murphy as IOMMU SUBSYTEM reviewer
iommu/amd: Do not support IOMMUv2 APIs when SNP is enabled
iommu/amd: Do not support IOMMU_DOMAIN_IDENTITY after SNP is enabled
iommu/amd: Set translation valid bit only when IO page tables are in use
iommu/amd: Introduce function to check and enable SNP
iommu/amd: Globally detect SNP support
iommu/amd: Process all IVHDs before enabling IOMMU features
iommu/amd: Introduce global variable for storing common EFR and EFR2
iommu/amd: Introduce Support for Extended Feature 2 Register
iommu/amd: Change macro for IOMMU control register bit shift to decimal value
iommu/exynos: Enable default VM instance on SysMMU v7
iommu/exynos: Add SysMMU v7 register set
...
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve latency
and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
=w/UH
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Most of the MM queue. A few things are still pending.
Liam's maple tree rework didn't make it. This has resulted in a few
other minor patch series being held over for next time.
Multi-gen LRU still isn't merged as we were waiting for mapletree to
stabilize. The current plan is to merge MGLRU into -mm soon and to
later reintroduce mapletree, with a view to hopefully getting both
into 6.1-rc1.
Summary:
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve
latency and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place"
[ XFS merge from hell as per Darrick Wong in
https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]
* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
tools/testing/selftests/vm/hmm-tests.c: fix build
mm: Kconfig: fix typo
mm: memory-failure: convert to pr_fmt()
mm: use is_zone_movable_page() helper
hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
hugetlbfs: cleanup some comments in inode.c
hugetlbfs: remove unneeded header file
hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
hugetlbfs: use helper macro SZ_1{K,M}
mm: cleanup is_highmem()
mm/hmm: add a test for cross device private faults
selftests: add soft-dirty into run_vmtests.sh
selftests: soft-dirty: add test for mprotect
mm/mprotect: fix soft-dirty check in can_change_pte_writable()
mm: memcontrol: fix potential oom_lock recursion deadlock
mm/gup.c: fix formatting in check_and_migrate_movable_page()
xfs: fail dax mount if reflink is enabled on a partition
mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
userfaultfd: don't fail on unrecognized features
hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
...
* Unwinder implementations for both nVHE modes (classic and
protected), complete with an overflow stack
* Rework of the sysreg access from userspace, with a complete
rewrite of the vgic-v3 view to allign with the rest of the
infrastructure
* Disagregation of the vcpu flags in separate sets to better track
their use model.
* A fix for the GICv2-on-v3 selftest
* A small set of cosmetic fixes
RISC-V:
* Track ISA extensions used by Guest using bitmap
* Added system instruction emulation framework
* Added CSR emulation framework
* Added gfp_custom flag in struct kvm_mmu_memory_cache
* Added G-stage ioremap() and iounmap() functions
* Added support for Svpbmt inside Guest
s390:
* add an interface to provide a hypervisor dump for secure guests
* improve selftests to use TAP interface
* enable interpretive execution of zPCI instructions (for PCI passthrough)
* First part of deferred teardown
* CPU Topology
* PV attestation
* Minor fixes
x86:
* Permit guests to ignore single-bit ECC errors
* Intel IPI virtualization
* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS
* PEBS virtualization
* Simplify PMU emulation by just using PERF_TYPE_RAW events
* More accurate event reinjection on SVM (avoid retrying instructions)
* Allow getting/setting the state of the speaker port data bit
* Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent
* "Notify" VM exit (detect microarchitectural hangs) for Intel
* Use try_cmpxchg64 instead of cmpxchg64
* Ignore benign host accesses to PMU MSRs when PMU is disabled
* Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior
* Allow NX huge page mitigation to be disabled on a per-vm basis
* Port eager page splitting to shadow MMU as well
* Enable CMCI capability by default and handle injected UCNA errors
* Expose pid of vcpu threads in debugfs
* x2AVIC support for AMD
* cleanup PIO emulation
* Fixes for LLDT/LTR emulation
* Don't require refcounted "struct page" to create huge SPTEs
* Miscellaneous cleanups:
** MCE MSR emulation
** Use separate namespaces for guest PTEs and shadow PTEs bitmasks
** PIO emulation
** Reorganize rmap API, mostly around rmap destruction
** Do not workaround very old KVM bugs for L0 that runs with nesting enabled
** new selftests API for CPUID
Generic:
* Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache
* new selftests API using struct kvm_vcpu instead of a (vm, id) tuple
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmLnyo4UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMtQQf/XjVWiRcWLPR9dqzRM/vvRXpiG+UL
jU93R7m6ma99aqTtrxV/AE+kHgamBlma3Cwo+AcWk9uCVNbIhFjv2YKg6HptKU0e
oJT3zRYp+XIjEo7Kfw+TwroZbTlG6gN83l1oBLFMqiFmHsMLnXSI2mm8MXyi3dNB
vR2uIcTAl58KIprqNNsYJ2dNn74ogOMiXYx9XzoA9/5Xb6c0h4rreHJa5t+0s9RO
Gz7Io3PxumgsbJngjyL1Ve5oxhlIAcZA8DU0PQmjxo3eS+k6BcmavGFd45gNL5zg
iLpCh4k86spmzh8CWkAAwWPQE4dZknK6jTctJc0OFVad3Z7+X7n0E8TFrA==
=PM8o
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"Quite a large pull request due to a selftest API overhaul and some
patches that had come in too late for 5.19.
ARM:
- Unwinder implementations for both nVHE modes (classic and
protected), complete with an overflow stack
- Rework of the sysreg access from userspace, with a complete rewrite
of the vgic-v3 view to allign with the rest of the infrastructure
- Disagregation of the vcpu flags in separate sets to better track
their use model.
- A fix for the GICv2-on-v3 selftest
- A small set of cosmetic fixes
RISC-V:
- Track ISA extensions used by Guest using bitmap
- Added system instruction emulation framework
- Added CSR emulation framework
- Added gfp_custom flag in struct kvm_mmu_memory_cache
- Added G-stage ioremap() and iounmap() functions
- Added support for Svpbmt inside Guest
s390:
- add an interface to provide a hypervisor dump for secure guests
- improve selftests to use TAP interface
- enable interpretive execution of zPCI instructions (for PCI
passthrough)
- First part of deferred teardown
- CPU Topology
- PV attestation
- Minor fixes
x86:
- Permit guests to ignore single-bit ECC errors
- Intel IPI virtualization
- Allow getting/setting pending triple fault with
KVM_GET/SET_VCPU_EVENTS
- PEBS virtualization
- Simplify PMU emulation by just using PERF_TYPE_RAW events
- More accurate event reinjection on SVM (avoid retrying
instructions)
- Allow getting/setting the state of the speaker port data bit
- Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls
are inconsistent
- "Notify" VM exit (detect microarchitectural hangs) for Intel
- Use try_cmpxchg64 instead of cmpxchg64
- Ignore benign host accesses to PMU MSRs when PMU is disabled
- Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior
- Allow NX huge page mitigation to be disabled on a per-vm basis
- Port eager page splitting to shadow MMU as well
- Enable CMCI capability by default and handle injected UCNA errors
- Expose pid of vcpu threads in debugfs
- x2AVIC support for AMD
- cleanup PIO emulation
- Fixes for LLDT/LTR emulation
- Don't require refcounted "struct page" to create huge SPTEs
- Miscellaneous cleanups:
- MCE MSR emulation
- Use separate namespaces for guest PTEs and shadow PTEs bitmasks
- PIO emulation
- Reorganize rmap API, mostly around rmap destruction
- Do not workaround very old KVM bugs for L0 that runs with nesting enabled
- new selftests API for CPUID
Generic:
- Fix races in gfn->pfn cache refresh; do not pin pages tracked by
the cache
- new selftests API using struct kvm_vcpu instead of a (vm, id)
tuple"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (606 commits)
selftests: kvm: set rax before vmcall
selftests: KVM: Add exponent check for boolean stats
selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test
selftests: KVM: Check stat name before other fields
KVM: x86/mmu: remove unused variable
RISC-V: KVM: Add support for Svpbmt inside Guest/VM
RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap()
RISC-V: KVM: Add G-stage ioremap() and iounmap() functions
KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache
RISC-V: KVM: Add extensible CSR emulation framework
RISC-V: KVM: Add extensible system instruction emulation framework
RISC-V: KVM: Factor-out instruction emulation into separate sources
RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run
RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function
RISC-V: KVM: Fix variable spelling mistake
RISC-V: KVM: Improve ISA extension by using a bitmap
KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs()
KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog
KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT
KVM: x86: Do not block APIC write for non ICR registers
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmLpNF4ACgkQCF8+vY7k
4RX6eQ//cYDxEbyeCbjfQH7ePDYCK8vW5EbjqBIcdFWV8rv4QdSwQbM7q4ySnxd/
nOWulm1lgpGrV3bPXE1nQ+RH7T2dQDeJobWcyZd2F2gGlnf515GYIOOnAGWnmM2Z
qgDhYKVVf3cleMS44n3T2kRVIFO39MmheuucboR8PCBPaT4BoRVIEee0xZIrbsBx
/ggFXSM/ywA456g09cQbhGLz48YBGshwvbN9vhrdMNcJ74cPQ+sBPaT6lRwxEBYT
soaDwohH7jLGqIHxccd9qljss31q1glQrK/Szwr/kXe2d4Es0L1ctDK3oP0B9Bcv
9P6xgXIbMM3cVK57ITXsTKfBFDrxLILy+8bMHWGgRTuZVjoiybRvoDKOKqckyTfu
CVsHgxkasSPr/M+WoRUX2zFTOB0Q4CTTS6A6Na+sZUGgX5dBe7Y0VDRCzskL3Q+8
KvlAzA6sF1qyI/Lm4+CprodJ8Ae4qbe0HzkWkaHzsSJfRsazzvzX391T4wCFpBOC
QW3Ac7r27ZVqrIraDxGuOrEf/NRLTLpDskjnAdSERlrxtU6oQqwgRajBXU46cEOx
GGV/fgwAbFXGHjN9tmRAgsJZlKBTd1wLGKWU3YUG5Emwo7ycHoLDj8+NilxyzC26
rnzK5oOaxBm9rnjoxhsz6HWN36mlY0h3bMwS8byclv6USF+ZQZk=
=AUYy
-----END PGP SIGNATURE-----
Merge tag 'media/v5.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- New driver for Semi AR0521 sensor
- rkisp1 CSI code was split into a separate file
- sun6i has gained support for the A31 MIPI CSI-2 controller
- sun8i has gained support for the A83T MIPI CSI-2 controller
- vimc driver got support for virtual lens
- HEVC uAPI has gained its final form and got added to public headers
- hantro and cedrus got updates on H-265 support
- stkwebcam was promoted from staging
- atomisp staging driver got cleanups on its hmm and kmap related logic
- ov5640 gained support for more modes and got some rework
- imx7-media-csi staging driver got several improvements related to mc
API support
- uvcvideo now handles better power line control
- mediatec vcodec gained support for new hardware and got some codec
updates
- Lots of other bug fixes, improvements and cleanups.
* tag 'media/v5.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (446 commits)
media: hantro: Remove dedicated control documentation
hantro: Remove incorrect HEVC SPS validation
media: cedrus: hevc: Add check for invalid timestamp
media: sunxi: sun6i_mipi_csi2.c/sun8i_a83t_mipi_csi2.c: clarify error handling
media: uvcvideo: Fix invalid pointer in uvc_ctrl_init_ctrl()
media: Documentation: mc-core: Fix typo
media: videodev2.h.rst.exceptions: add missing exceptions
media: vimc: wrong pointer is used with PTR_ERR
media: rkisp1: debug: Add dump file in debugfs for MI main path registers
media: rkisp1: Make the internal CSI-2 receiver optional
media: rkisp1: Add infrastructure to support ISP features
media: rkisp1: Support the ISP parallel input
media: dt-bindings: media: rkisp1: Add port for parallel interface
media: rkisp1: Use fwnode_graph_for_each_endpoint
media: rkisp1: csi: Plumb the CSI RX subdev
media: rkisp1: csi: Implement a V4L2 subdev for the CSI receiver
media: rkisp1: isp: Disallow multiple active sources
media: rkisp1: isp: Rename rkisp1_get_remote_source()
media: rkisp1: isp: Constify various local variables
media: rkisp1: isp: Fix whitespace issues
...
Core
----
- Refactor the forward memory allocation to better cope with memory
pressure with many open sockets, moving from a per socket cache to
a per-CPU one
- Replace rwlocks with RCU for better fairness in ping, raw sockets
and IP multicast router.
- Network-side support for IO uring zero-copy send.
- A few skb drop reason improvements, including codegen the source file
with string mapping instead of using macro magic.
- Rename reference tracking helpers to a more consistent
netdev_* schema.
- Adapt u64_stats_t type to address load/store tearing issues.
- Refine debug helper usage to reduce the log noise caused by bots.
BPF
---
- Improve socket map performance, avoiding skb cloning on read
operation.
- Add support for 64 bits enum, to match types exposed by kernel.
- Introduce support for sleepable uprobes program.
- Introduce support for enum textual representation in libbpf.
- New helpers to implement synproxy with eBPF/XDP.
- Improve loop performances, inlining indirect calls when
possible.
- Removed all the deprecated libbpf APIs.
- Implement new eBPF-based LSM flavor.
- Add type match support, which allow accurate queries to the
eBPF used types.
- A few TCP congetsion control framework usability improvements.
- Add new infrastructure to manipulate CT entries via eBPF programs.
- Allow for livepatch (KLP) and BPF trampolines to attach to the same
kernel function.
Protocols
---------
- Introduce per network namespace lookup tables for unix sockets,
increasing scalability and reducing contention.
- Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
- Add support to forciby close TIME_WAIT TCP sockets via user-space
tools.
- Significant performance improvement for the TLS 1.3 receive path,
both for zero-copy and not-zero-copy.
- Support for changing the initial MTPCP subflow priority/backup
status
- Introduce virtually contingus buffers for sockets over RDMA,
to cope better with memory pressure.
- Extend CAN ethtool support with timestamping capabilities
- Refactor CAN build infrastructure to allow building only the needed
features.
Driver API
----------
- Remove devlink mutex to allow parallel commands on multiple links.
- Add support for pause stats in distributed switch.
- Implement devlink helpers to query and flash line cards.
- New helper for phy mode to register conversion.
New hardware / drivers
----------------------
- Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
- Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
- Ethernet DSA driver for the Microchip LAN937x switch.
- Ethernet PHY driver for the Aquantia AQR113C EPHY.
- CAN driver for the OBD-II ELM327 interface.
- CAN driver for RZ/N1 SJA1000 CAN controller.
- Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
Drivers
-------
- Intel Ethernet NICs:
- i40e: add support for vlan pruning
- i40e: add support for XDP framented packets
- ice: improved vlan offload support
- ice: add support for PPPoE offload
- Mellanox Ethernet (mlx5)
- refactor packet steering offload for performance and scalability
- extend support for TC offload
- refactor devlink code to clean-up the locking schema
- support stacked vlans for bridge offloads
- use TLS objects pool to improve connection rate
- Netronome Ethernet NICs (nfp):
- extend support for IPv6 fields mangling offload
- add support for vepa mode in HW bridge
- better support for virtio data path acceleration (VDPA)
- enable TSO by default
- Microsoft vNIC driver (mana)
- add support for XDP redirect
- Others Ethernet drivers:
- bonding: add per-port priority support
- microchip lan743x: extend phy support
- Fungible funeth: support UDP segmentation offload and XDP xmit
- Solarflare EF100: add support for virtual function representors
- MediaTek SoC: add XDP support
- Mellanox Ethernet/IB switch (mlxsw):
- dropped support for unreleased H/W (XM router).
- improved stats accuracy
- unified bridge model coversion improving scalability
(parts 1-6)
- support for PTP in Spectrum-2 asics
- Broadcom PHYs
- add PTP support for BCM54210E
- add support for the BCM53128 internal PHY
- Marvell Ethernet switches (prestera):
- implement support for multicast forwarding offload
- Embedded Ethernet switches:
- refactor OcteonTx MAC filter for better scalability
- improve TC H/W offload for the Felix driver
- refactor the Microchip ksz8 and ksz9477 drivers to share
the probe code (parts 1, 2), add support for phylink
mac configuration
- Other WiFi:
- Microchip wilc1000: diable WEP support and enable WPA3
- Atheros ath10k: encapsulation offload support
Old code removal:
- Neterion vxge ethernet driver: this is untouched since more than
10 years.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmLqN+oSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkB9kQAI9VqW0c3SfiTJnkVBEIovZ6Tnh5stD2
UYFkh1BdchLsYxi7W4XMpVPSzRztiTP87mIx5c/KvIzj+QNeWL1XWRJSPdI9HhTD
pTAA/tM2OG7bqrbyQiKDNfpQdNl7+kk1RwnYd+f9RFl1QVuIJaYhmjVwrsN5xF/+
jUsotpROarM2dGFWiFwJbKhP2zMDT+6qEEahM8pEPggKhv8wRLYjany2cZVEe4e0
WGUpbINAS8gEKm0Ob922WaDfDrcK/N1Z0jNz/kMaENkK18Vvc7F6bCO0DzAawKX9
QZMMwm6mHp3EThflJAMAzCGIYiIcwLhykgdyj8rrjPhFrWbMD2Sdsbo21HOXU/8j
u4aAhVl+d+h7emmbgBoJ8sycVJ7BQlXz7lX20sTgADv9xI4/dPhQ17CMRuwX6fXX
JSrn6P6e1LTV5CEg6vrlSPnKPY6uhFn/cPw47FxCjRwJ9phVnp+8uZWQmf9Pz3yf
Ok/tcj+juFbsmuOshHy2cbRkuNZNS0oRWlSTBo5795ZwOLSakMonR3L+ev2aOvzz
DVrFp2Y/iIVwMSFdCbouYdYnhArPRhOAtCmZc2afY8aBN7aaMgrdTy3+mzUoHy3I
FG3K+VuKpfi0vY4zn6ZoLZDIpyXIoJJ93RcSGltD32t3Dp1RaQMVEI4s45k05PVm
1nYpXKHA8qML
=hxEG
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Paolo Abeni:
"Core:
- Refactor the forward memory allocation to better cope with memory
pressure with many open sockets, moving from a per socket cache to
a per-CPU one
- Replace rwlocks with RCU for better fairness in ping, raw sockets
and IP multicast router.
- Network-side support for IO uring zero-copy send.
- A few skb drop reason improvements, including codegen the source
file with string mapping instead of using macro magic.
- Rename reference tracking helpers to a more consistent netdev_*
schema.
- Adapt u64_stats_t type to address load/store tearing issues.
- Refine debug helper usage to reduce the log noise caused by bots.
BPF:
- Improve socket map performance, avoiding skb cloning on read
operation.
- Add support for 64 bits enum, to match types exposed by kernel.
- Introduce support for sleepable uprobes program.
- Introduce support for enum textual representation in libbpf.
- New helpers to implement synproxy with eBPF/XDP.
- Improve loop performances, inlining indirect calls when possible.
- Removed all the deprecated libbpf APIs.
- Implement new eBPF-based LSM flavor.
- Add type match support, which allow accurate queries to the eBPF
used types.
- A few TCP congetsion control framework usability improvements.
- Add new infrastructure to manipulate CT entries via eBPF programs.
- Allow for livepatch (KLP) and BPF trampolines to attach to the same
kernel function.
Protocols:
- Introduce per network namespace lookup tables for unix sockets,
increasing scalability and reducing contention.
- Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
- Add support to forciby close TIME_WAIT TCP sockets via user-space
tools.
- Significant performance improvement for the TLS 1.3 receive path,
both for zero-copy and not-zero-copy.
- Support for changing the initial MTPCP subflow priority/backup
status
- Introduce virtually contingus buffers for sockets over RDMA, to
cope better with memory pressure.
- Extend CAN ethtool support with timestamping capabilities
- Refactor CAN build infrastructure to allow building only the needed
features.
Driver API:
- Remove devlink mutex to allow parallel commands on multiple links.
- Add support for pause stats in distributed switch.
- Implement devlink helpers to query and flash line cards.
- New helper for phy mode to register conversion.
New hardware / drivers:
- Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
- Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
- Ethernet DSA driver for the Microchip LAN937x switch.
- Ethernet PHY driver for the Aquantia AQR113C EPHY.
- CAN driver for the OBD-II ELM327 interface.
- CAN driver for RZ/N1 SJA1000 CAN controller.
- Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
Drivers:
- Intel Ethernet NICs:
- i40e: add support for vlan pruning
- i40e: add support for XDP framented packets
- ice: improved vlan offload support
- ice: add support for PPPoE offload
- Mellanox Ethernet (mlx5)
- refactor packet steering offload for performance and scalability
- extend support for TC offload
- refactor devlink code to clean-up the locking schema
- support stacked vlans for bridge offloads
- use TLS objects pool to improve connection rate
- Netronome Ethernet NICs (nfp):
- extend support for IPv6 fields mangling offload
- add support for vepa mode in HW bridge
- better support for virtio data path acceleration (VDPA)
- enable TSO by default
- Microsoft vNIC driver (mana)
- add support for XDP redirect
- Others Ethernet drivers:
- bonding: add per-port priority support
- microchip lan743x: extend phy support
- Fungible funeth: support UDP segmentation offload and XDP xmit
- Solarflare EF100: add support for virtual function representors
- MediaTek SoC: add XDP support
- Mellanox Ethernet/IB switch (mlxsw):
- dropped support for unreleased H/W (XM router).
- improved stats accuracy
- unified bridge model coversion improving scalability (parts 1-6)
- support for PTP in Spectrum-2 asics
- Broadcom PHYs
- add PTP support for BCM54210E
- add support for the BCM53128 internal PHY
- Marvell Ethernet switches (prestera):
- implement support for multicast forwarding offload
- Embedded Ethernet switches:
- refactor OcteonTx MAC filter for better scalability
- improve TC H/W offload for the Felix driver
- refactor the Microchip ksz8 and ksz9477 drivers to share the
probe code (parts 1, 2), add support for phylink mac
configuration
- Other WiFi:
- Microchip wilc1000: diable WEP support and enable WPA3
- Atheros ath10k: encapsulation offload support
Old code removal:
- Neterion vxge ethernet driver: this is untouched since more than 10 years"
* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
doc: sfp-phylink: Fix a broken reference
wireguard: selftests: support UML
wireguard: allowedips: don't corrupt stack when detecting overflow
wireguard: selftests: update config fragments
wireguard: ratelimiter: use hrtimer in selftest
net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
net: usb: ax88179_178a: Bind only to vendor-specific interface
selftests: net: fix IOAM test skip return code
net: usb: make USB_RTL8153_ECM non user configurable
net: marvell: prestera: remove reduntant code
octeontx2-pf: Reduce minimum mtu size to 60
net: devlink: Fix missing mutex_unlock() call
net/tls: Remove redundant workqueue flush before destroy
net: txgbe: Fix an error handling path in txgbe_probe()
net: dsa: Fix spelling mistakes and cleanup code
Documentation: devlink: add add devlink-selftests to the table of contents
dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
net: ionic: fix error check for vlan flags in ionic_set_nic_features()
net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
nfp: flower: add support for tunnel offload without key ID
...
- Fix an accounting bug that made NR_FILE_DIRTY grow without limit
when running xfstests
- Convert more of mpage to use folios
- Remove add_to_page_cache() and add_to_page_cache_locked()
- Convert find_get_pages_range() to filemap_get_folios()
- Improvements to the read_cache_page() family of functions
- Remove a few unnecessary checks of PageError
- Some straightforward filesystem conversions to use folios
- Split PageMovable users out from address_space_operations into their
own movable_operations
- Convert aops->migratepage to aops->migrate_folio
- Remove nobh support (Christoph Hellwig)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmLpViQACgkQDpNsjXcp
gj5pBgf/f3+K7Hi3qw7aYQCYJQ7IA/bLyE/DLWI59kuiao6wDSve40B9YH9X++Ha
mRLp55bkQS+bwS2xa4jlqrIDJzAfNoWlXaXZHUXGL1C/52ChTF6jaH2cvO9PVlDS
7fLv1hy2LwiIdzpKJkUW7T+kcQGj3QLKqtQ4x8zD0LGMg055yvt/qndHSUi41nWT
/58+6W8Sk4vvRgkpeChFzF1lGLy00+FGT8y5V2kM9uRliFQ7XPCwqB2a3e5jbW6z
C1NXQmRnopCrnOT1TFIhK3DyX6MDIWV5qcikNAmCKFb9fQFPmjDLPt9iSoMGjw2M
Z+UVhJCaU3ISccd0DG5Ra/vzs9/O9Q==
=DgUi
-----END PGP SIGNATURE-----
Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache
Pull folio updates from Matthew Wilcox:
- Fix an accounting bug that made NR_FILE_DIRTY grow without limit
when running xfstests
- Convert more of mpage to use folios
- Remove add_to_page_cache() and add_to_page_cache_locked()
- Convert find_get_pages_range() to filemap_get_folios()
- Improvements to the read_cache_page() family of functions
- Remove a few unnecessary checks of PageError
- Some straightforward filesystem conversions to use folios
- Split PageMovable users out from address_space_operations into
their own movable_operations
- Convert aops->migratepage to aops->migrate_folio
- Remove nobh support (Christoph Hellwig)
* tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits)
fs: remove the NULL get_block case in mpage_writepages
fs: don't call ->writepage from __mpage_writepage
fs: remove the nobh helpers
jfs: stop using the nobh helper
ext2: remove nobh support
ntfs3: refactor ntfs_writepages
mm/folio-compat: Remove migration compatibility functions
fs: Remove aops->migratepage()
secretmem: Convert to migrate_folio
hugetlb: Convert to migrate_folio
aio: Convert to migrate_folio
f2fs: Convert to filemap_migrate_folio()
ubifs: Convert to filemap_migrate_folio()
btrfs: Convert btrfs_migratepage to migrate_folio
mm/migrate: Add filemap_migrate_folio()
mm/migrate: Convert migrate_page() to migrate_folio()
nfs: Convert to migrate_folio
btrfs: Convert btree_migratepage to migrate_folio
mm/migrate: Convert expected_page_refs() to folio_expected_refs()
mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()
...
* threadgroup_rwsem write locking is skipped when configuring controllers in
empty subtrees. Combined with CLONE_INTO_CGROUP, this allows the common
static usage pattern to not grab threadgroup_rwsem at all (glibc still
doesn't seem ready for CLONE_INTO_CGROUP unfortunately).
* threadgroup_rwsem used to be put into non-percpu mode by default due to
latency concerns in specific use cases. There's no reason for everyone
else to pay for it. Make the behavior optional.
* psi no longer allocates memory when disabled.
along with some code cleanups.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCYugHIQ4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGd+oAP9lfD3fTRdNo4qWV2VsZsYzoOxzNIuJSwN/dnYx
IEbQOwD/cd2YMfeo6zcb427U/VfTFqjJjFK04OeljYtJU8fFywo=
=sucy
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"Several core optimizations:
- threadgroup_rwsem write locking is skipped when configuring
controllers in empty subtrees.
Combined with CLONE_INTO_CGROUP, this allows the common static
usage pattern to not grab threadgroup_rwsem at all (glibc still
doesn't seem ready for CLONE_INTO_CGROUP unfortunately).
- threadgroup_rwsem used to be put into non-percpu mode by default
due to latency concerns in specific use cases. There's no reason
for everyone else to pay for it. Make the behavior optional.
- psi no longer allocates memory when disabled.
... along with some code cleanups"
* tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Skip subtree root in cgroup_update_dfl_csses()
cgroup: remove "no" prefixed mount options
cgroup: Make !percpu threadgroup_rwsem operations optional
cgroup: Add "no" prefixed mount options
cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree
cgroup.c: remove redundant check for mixable cgroup in cgroup_migrate_vet_dst
cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes
psi: dont alloc memory for psi by default
tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
documented for RET instructions after VM exits. Mitigate it with a new
one-entry RSB stuffing mechanism and a new LFENCE.
== Background ==
Indirect Branch Restricted Speculation (IBRS) was designed to help
mitigate Branch Target Injection and Speculative Store Bypass, i.e.
Spectre, attacks. IBRS prevents software run in less privileged modes
from affecting branch prediction in more privileged modes. IBRS requires
the MSR to be written on every privilege level change.
To overcome some of the performance issues of IBRS, Enhanced IBRS was
introduced. eIBRS is an "always on" IBRS, in other words, just turn
it on once instead of writing the MSR on every privilege level change.
When eIBRS is enabled, more privileged modes should be protected from
less privileged modes, including protecting VMMs from guests.
== Problem ==
Here's a simplification of how guests are run on Linux' KVM:
void run_kvm_guest(void)
{
// Prepare to run guest
VMRESUME();
// Clean up after guest runs
}
The execution flow for that would look something like this to the
processor:
1. Host-side: call run_kvm_guest()
2. Host-side: VMRESUME
3. Guest runs, does "CALL guest_function"
4. VM exit, host runs again
5. Host might make some "cleanup" function calls
6. Host-side: RET from run_kvm_guest()
Now, when back on the host, there are a couple of possible scenarios of
post-guest activity the host needs to do before executing host code:
* on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
touched and Linux has to do a 32-entry stuffing.
* on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
IBRS=1 shortly after VM exit, has a documented side effect of flushing
the RSB except in this PBRSB situation where the software needs to stuff
the last RSB entry "by hand".
IOW, with eIBRS supported, host RET instructions should no longer be
influenced by guest behavior after the host retires a single CALL
instruction.
However, if the RET instructions are "unbalanced" with CALLs after a VM
exit as is the RET in #6, it might speculatively use the address for the
instruction after the CALL in #3 as an RSB prediction. This is a problem
since the (untrusted) guest controls this address.
Balanced CALL/RET instruction pairs such as in step #5 are not affected.
== Solution ==
The PBRSB issue affects a wide variety of Intel processors which
support eIBRS. But not all of them need mitigation. Today,
X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
eIBRS systems which enable legacy IBRS explicitly.
However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
and most of them need a new mitigation.
Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.
The lighter-weight mitigation performs a CALL instruction which is
immediately followed by a speculative execution barrier (INT3). This
steers speculative execution to the barrier -- just like a retpoline
-- which ensures that speculation can never reach an unbalanced RET.
Then, ensure this CALL is retired before continuing execution with an
LFENCE.
In other words, the window of exposure is opened at VM exit where RET
behavior is troublesome. While the window is open, force RSB predictions
sampling for RET targets to a dead end at the INT3. Close the window
with the LFENCE.
There is a subset of eIBRS systems which are not vulnerable to PBRSB.
Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.
[ bp: Massage, incorporate review comments from Andy Cooper. ]
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
This KUnit update for Linux 5.20-rc1 consists of several fixes and an
important feature to discourage running KUnit tests on production
systems. Running tests on a production system could leave the system
in a bad state. This new feature adds:
- adds a new taint type, TAINT_TEST to signal that a test has been run.
This should discourage people from running these tests on production
systems, and to make it easier to tell if tests have been run
accidentally (by loading the wrong configuration, etc.)
- several documentation and tool enhancements and fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmLoOXcACgkQCwJExA0N
Qxy5HQ//QehcBsN0rvNM5enP0HyJjDFxoF9HI7RxhHbwAE3LEkMQTNnFJOViJ7cY
XZgvPipySkekPkvbm9uAnJw160hUSTCM3Oikf7JaxSTKS9Zvfaq9k78miQNrU2rT
C9ljhLBF9y2eXxj9348jwlIHmjBwV5iMn6ncSvUkdUpDAkll2qIvtmmdiSgl33Et
CRhdc07XBwhlz/hBDwj8oK2ZYGPsqjxf2CyrhRMJAOEJtY0wt971COzPj8cDGtmi
nmQXiUhGejXPlzL/7hPYNr83YmYa/xGjecgDPKR3hOf5dVEVRUE2lKQ00F4GrwdZ
KC6CWyXCzhhbtH7tfpWBU4ZoBdmyxhVOMDPFNJdHzuAHVAI3WbHmGjnptgV9jT7o
KqgPVDW2n0fggMMUjmxR4fV2VrKoVy8EvLfhsanx961KhnPmQ6MXxL1cWoMT5BwA
JtwPlNomwaee2lH9534Qgt1brybYZRGx1RDbWn2CW3kJabODptL80sZ62X5XxxRi
I/keCbSjDO1mL3eEeGg/n7AsAhWrZFsxCThxSXH6u6d6jrrvCF3X2Ki5m27D1eGD
Yh40Fy+FhwHSXNyVOav6XHYKhyRzJvPxM/mTGe5DtQ6YnP7G7SnfPchX4irZQOkv
T2soJdtAcshnpG6z38Yd3uWM/8ARtSMaBU891ZAkFD9foniIYWE=
=WzBX
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
"This consists of several fixes and an important feature to discourage
running KUnit tests on production systems. Running tests on a
production system could leave the system in a bad state.
Summary:
- Add a new taint type, TAINT_TEST to signal that a test has been
run.
This should discourage people from running these tests on
production systems, and to make it easier to tell if tests have
been run accidentally (by loading the wrong configuration, etc)
- Several documentation and tool enhancements and fixes"
* tag 'linux-kselftest-kunit-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (29 commits)
Documentation: KUnit: Fix example with compilation error
Documentation: kunit: Add CLI args for kunit_tool
kcsan: test: Add a .kunitconfig to run KCSAN tests
kunit: executor: Fix a memory leak on failure in kunit_filter_tests
clk: explicitly disable CONFIG_UML_PCI_OVER_VIRTIO in .kunitconfig
mmc: sdhci-of-aspeed: test: Use kunit_test_suite() macro
nitro_enclaves: test: Use kunit_test_suite() macro
thunderbolt: test: Use kunit_test_suite() macro
kunit: flatten kunit_suite*** to kunit_suite** in .kunit_test_suites
kunit: unify module and builtin suite definitions
selftest: Taint kernel when test module loaded
module: panic: Taint the kernel when selftest modules load
Documentation: kunit: fix example run_kunit func to allow spaces in args
Documentation: kunit: Cleanup run_wrapper, fix x-ref
kunit: test.h: fix a kernel-doc markup
kunit: tool: Enable virtio/PCI by default on UML
kunit: tool: make --kunitconfig repeatable, blindly concat
kunit: add coverage_uml.config to enable GCOV on UML
kunit: tool: refactor internal kconfig handling, allow overriding
kunit: tool: introduce --qemu_args
...
earth-shaking:
- More Chinese translations, and an update to the Italian translations.
The Japanese, Korean, and traditional Chinese translations are
more-or-less unmaintained at this point, instead.
- Some build-system performance improvements.
- The removal of the archaic submitting-drivers.rst document, with the
movement of what useful material that remained into other docs.
- Improvements to sphinx-pre-install to, hopefully, give more useful
suggestions.
- A number of build-warning fixes
Plus the usual collection of typo fixes, updates, and more.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmLn9OwPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YtrwIAJNZoDYJJIRuVHnFkAn5EJ4b/chnR1dSTBtn
WdE/1zdAlMBWVlEGO48VZybph9Sk0v+cUGf+yviDgASQrfOhRRTkg/0u6XaBAYO0
+C2D1QDd9DggGgajxsfJfTdD3IuB78mGmCQvP17XIJW+NK1CK9rXZBnj6WC5/HJw
PCHzeeVreBxOS3W9GelMYa6vjVl7dv81x4DPllnsgU2AMk0/Ce0MVjeIZ695sOeP
Ki6jZgC2GsgFSK5kBC35OiDe5q+fDzlLfek34EUCn4SIbMALSUYWO1db122w5Pme
Ej0+UTBhD19WH1uB/rcVKnVWugi7UEUJexZsao+nC7UrdIVtYq0=
=83BG
-----END PGP SIGNATURE-----
Merge tag 'docs-6.0' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"This was a moderately busy cycle for documentation, but nothing
all that earth-shaking:
- More Chinese translations, and an update to the Italian
translations.
The Japanese, Korean, and traditional Chinese translations
are more-or-less unmaintained at this point, instead.
- Some build-system performance improvements.
- The removal of the archaic submitting-drivers.rst document,
with the movement of what useful material that remained into
other docs.
- Improvements to sphinx-pre-install to, hopefully, give more
useful suggestions.
- A number of build-warning fixes
Plus the usual collection of typo fixes, updates, and more"
* tag 'docs-6.0' of git://git.lwn.net/linux: (92 commits)
docs: efi-stub: Fix paths for x86 / arm stubs
Docs/zh_CN: Update the translation of sched-stats to 5.19-rc8
Docs/zh_CN: Update the translation of pci to 5.19-rc8
Docs/zh_CN: Update the translation of pci-iov-howto to 5.19-rc8
Docs/zh_CN: Update the translation of usage to 5.19-rc8
Docs/zh_CN: Update the translation of testing-overview to 5.19-rc8
Docs/zh_CN: Update the translation of sparse to 5.19-rc8
Docs/zh_CN: Update the translation of kasan to 5.19-rc8
Docs/zh_CN: Update the translation of iio_configfs to 5.19-rc8
doc:it_IT: align Italian documentation
docs: Remove spurious tag from admin-guide/mm/overcommit-accounting.rst
Documentation: process: Update email client instructions for Thunderbird
docs: ABI: correct QEMU fw_cfg spec path
doc/zh_CN: remove submitting-driver reference from docs
docs: zh_TW: align to submitting-drivers removal
docs: zh_CN: align to submitting-drivers removal
docs: ko_KR: howto: remove reference to removed submitting-drivers
docs: ja_JP: howto: remove reference to removed submitting-drivers
docs: it_IT: align to submitting-drivers removal
docs: process: remove outdated submitting-drivers.rst
...
This pull request contains the following branches:
doc.2022.06.21a: Documentation updates.
fixes.2022.07.19a: Miscellaneous fixes.
nocb.2022.07.19a: Callback-offload updates, perhaps most notably a new
RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to
be offloaded at boot time, regardless of kernel boot parameters.
This is useful to battery-powered systems such as ChromeOS
and Android. In addition, a new RCU_NOCB_CPU_CB_BOOST kernel
boot parameter prevents offloaded callbacks from interfering
with real-time workloads and with energy-efficiency mechanisms.
poll.2022.07.21a: Polled grace-period updates, perhaps most notably
making these APIs account for both normal and expedited grace
periods.
rcu-tasks.2022.06.21a: Tasks RCU updates, perhaps most notably reducing
the CPU overhead of RCU tasks trace grace periods by more than
a factor of two on a system with 15,000 tasks. The reduction
is expected to increase with the number of tasks, so it seems
reasonable to hypothesize that a system with 150,000 tasks might
see a 20-fold reduction in CPU overhead.
torture.2022.06.21a: Torture-test updates.
ctxt.2022.07.05a: Updates that merge RCU's dyntick-idle tracking into
context tracking, thus reducing the overhead of transitioning to
kernel mode from either idle or nohz_full userspace execution
for kernels that track context independently of RCU. This is
expected to be helpful primarily for kernels built with
CONFIG_NO_HZ_FULL=y.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmLgMcgTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jArXD/0fjbCwqpRjHVTzjMY8jN4zDkqZZD6m
g8Fx27hZ4ToNFwRptyHwNezrNj14skjAJEXfdjaVw32W62ivXvf0HINvSzsTLCSq
k2kWyBdXLc9CwY5p5W4smnpn5VoAScjg5PoPL59INoZ/Zziji323C7Zepl/1DYJt
0T6bPCQjo1ZQoDUCyVpSjDmAqxnderWG0MeJVt74GkLqmnYLANg0GH8c7mH4+9LL
kVGlLp5nlPgNJ4FEoFdMwNU8T/ETmaVld/m2dkiawjkXjJzB2XKtBigU91DDmXz5
7DIdV4ABrxiy4kGNqtIe/jFgnKyVD7xiDpyfjd6KTeDr/rDS8u2ZH7+1iHsyz3g0
Np/tS3vcd0KR+gI/d0eXxPbgm5sKlCmKw/nU2eArpW/+4LmVXBUfHTG9Jg+LJmBc
JrUh6aEdIZJZHgv/nOQBNig7GJW43IG50rjuJxAuzcxiZNEG5lUSS23ysaA9CPCL
PxRWKSxIEfK3kdmvVO5IIbKTQmIBGWlcWMTcYictFSVfBgcCXpPAksGvqA5JiUkc
egW+xLFo/7K+E158vSKsVqlWZcEeUbsNJ88QOlpqnRgH++I2Yv/LhK41XfJfpH+Y
ALxVaDd+mAq6v+qSHNVq9wT3ozXIPy/zK1hDlMIqx40h2YvaEsH4je+521oSoN9r
vX60+QNxvUBLwA==
=vUNm
-----END PGP SIGNATURE-----
Merge tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates
- Miscellaneous fixes
- Callback-offload updates, perhaps most notably a new
RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to be
offloaded at boot time, regardless of kernel boot parameters.
This is useful to battery-powered systems such as ChromeOS and
Android. In addition, a new RCU_NOCB_CPU_CB_BOOST kernel boot
parameter prevents offloaded callbacks from interfering with
real-time workloads and with energy-efficiency mechanisms
- Polled grace-period updates, perhaps most notably making these APIs
account for both normal and expedited grace periods
- Tasks RCU updates, perhaps most notably reducing the CPU overhead of
RCU tasks trace grace periods by more than a factor of two on a
system with 15,000 tasks.
The reduction is expected to increase with the number of tasks, so it
seems reasonable to hypothesize that a system with 150,000 tasks
might see a 20-fold reduction in CPU overhead
- Torture-test updates
- Updates that merge RCU's dyntick-idle tracking into context tracking,
thus reducing the overhead of transitioning to kernel mode from
either idle or nohz_full userspace execution for kernels that track
context independently of RCU.
This is expected to be helpful primarily for kernels built with
CONFIG_NO_HZ_FULL=y
* tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (98 commits)
rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings
rcu: Diagnose extended sync_rcu_do_polled_gp() loops
rcu: Put panic_on_rcu_stall() after expedited RCU CPU stall warnings
rcutorture: Test polled expedited grace-period primitives
rcu: Add polled expedited grace-period primitives
rcutorture: Verify that polled GP API sees synchronous grace periods
rcu: Make Tiny RCU grace periods visible to polled APIs
rcu: Make polled grace-period API account for expedited grace periods
rcu: Switch polled grace-period APIs to ->gp_seq_polled
rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty
rcu/nocb: Add option to opt rcuo kthreads out of RT priority
rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
rcu/nocb: Add an option to offload all CPUs on boot
rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call
rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order
rcu/nocb: Add/del rdp to iterate from rcuog itself
rcu/tree: Add comment to describe GP-done condition in fqs loop
rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
rcu/kvfree: Remove useless monitor_todo flag
rcu: Cleanup RCU urgency state for offline CPU
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmLnDOwACgkQSfxwEqXe
A65Fiw//Z0YaPejSslQIGitQ1b0XzdWBhyJArYDieaaiQRXMqlaSKlIUqHz38xb7
+FykUY51/SJLjHV2riPxq1OK3/MPmk6VlTd0HHihcHVmg77oZcFcv2tPnDpZoqND
TsBOujLbXKwxP8tNFedRY/4+K7w+ue9BTfDjuH7aCtz7uWd+4cNJmPg3x9FCfkMA
+hbcRluwE9W3Pg4OCKwv+qxL0JF3qQtNKEOp1wpnjGAZZW/I9gFNgFBEkykvcAsj
TkIRDc3agPFj6QgDeRIgLdnf9KCsLubKAg5oJneeCvQztJJUCSkn8nQXxpx+4sLo
GsRgvCdfL/GyJqfSAzQJVYDHKtKMkJiCiWCC/oOALR8dzHJfSlULDAjbY1m/DAr9
at+vi4678Or7TNx2ZSaUlCXXKZ+UT7yWMlQWax9JuxGk1hGYP5/eT1AH5SGjqUwF
w1q8oyzxt1vUcnOzEddFXPFirnqqhAk4dQFtu83+xKM4ZssMVyeB4NZdEhAdW0ng
MX+RjrVj4l5gWWuoS0Cx3LUxDCgV6WT0dN+Vl9axAZkoJJbcXLEmXwQ6NbzTLPWg
1/MT7qFTxNcTCeAArMdZvvFbeh7pOBXO42pafrK/7vDRnTMUIw9tqXNLQUfvdFQp
F5flPgiVRHDU2vSzKIFtnPTyXU0RBBGvNb4n0ss2ehH2DSsCxYE=
=Zy3d
-----END PGP SIGNATURE-----
Merge tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"Though there's been a decent amount of RNG-related development during
this last cycle, not all of it is coming through this tree, as this
cycle saw a shift toward tackling early boot time seeding issues,
which took place in other trees as well.
Here's a summary of the various patches:
- The CONFIG_ARCH_RANDOM .config option and the "nordrand" boot
option have been removed, as they overlapped with the more widely
supported and more sensible options, CONFIG_RANDOM_TRUST_CPU and
"random.trust_cpu". This change allowed simplifying a bit of arch
code.
- x86's RDRAND boot time test has been made a bit more robust, with
RDRAND disabled if it's clearly producing bogus results. This would
be a tip.git commit, technically, but I took it through random.git
to avoid a large merge conflict.
- The RNG has long since mixed in a timestamp very early in boot, on
the premise that a computer that does the same things, but does so
starting at different points in wall time, could be made to still
produce a different RNG state. Unfortunately, the clock isn't set
early in boot on all systems, so now we mix in that timestamp when
the time is actually set.
- User Mode Linux now uses the host OS's getrandom() syscall to
generate a bootloader RNG seed and later on treats getrandom() as
the platform's RDRAND-like faculty.
- The arch_get_random_{seed_,}_long() family of functions is now
arch_get_random_{seed_,}_longs(), which enables certain platforms,
such as s390, to exploit considerable performance advantages from
requesting multiple CPU random numbers at once, while at the same
time compiling down to the same code as before on platforms like
x86.
- A small cleanup changing a cmpxchg() into a try_cmpxchg(), from
Uros.
- A comment spelling fix"
More info about other random number changes that come in through various
architecture trees in the full commentary in the pull request:
https://lore.kernel.org/all/20220731232428.2219258-1-Jason@zx2c4.com/
* tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: correct spelling of "overwrites"
random: handle archrandom with multiple longs
um: seed rng using host OS rng
random: use try_cmpxchg in _credit_init_bits
timekeeping: contribute wall clock to rng on time change
x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu"
random: remove CONFIG_ARCH_RANDOM
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmLoEeIUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNSOhAAwWwRcmcHnk+k2agT9QjKrLo26NCO
MQLE89o4y2ChEFHxC7F7SKoQRxtfYa323p1vmlGzKrlB+IZ6oqERVp4QNQQbXsfn
n9VvVpxjRNHAetcRhCM9ZOchWjUdw6AMaJ8e3fdRNRESadAUUFDxifw1wpjgG9+i
LmtDbfZ7vLs2grTf9OZy3JIl1VF3lVRUTI7ZBQggfJncMa+LXNWdVNmEe3yfyboA
1MwpSao7K2si0hBGAQo/UGQz4b19Tm4xMg8bSy7oTsP5Lae5ciPkeI3qazvs9usp
WScZYhQ8NugqLbDbjs7dm6QCpj4x3dUs6ei48LKe3GF2mcGesFfOPo9sNHao4kKv
C9t0f9qw+EhGvnNL7uQIDDf8OuTjuLWDvZSrMLID/IJKFF5NJ3y+XzaS9aPM3VEY
qyOsX+cEzheXGhD6xE1sCo+AyPUDYqNDMIKBj2wlIGCKlzDGa8RT6VsQuvgf3c3K
43CnRCQeWDWOHCq3MnRe/fmYtW+JB7tsXiKAq4OJADacwPP36bsP3bqU8AlWYwDt
tnuMa+LKusHnMEQpMPI8FW8qGdxwGSen+mymfLFIMgtwNGkV7WGRJ6Lbyn0SaR6v
HyXgZASIOQRnamK3yZCDpxo0K81IVxPWJIjHyg53znqT5TCpXccPyV4HwbJKI/KG
8PtHrXOdPOGCZ2g=
=WWq1
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"A relatively small set of patches for SELinux this time, eight patches
in total with really only one significant change.
The highlights are:
- Add support for proper labeling of memfd_secret anonymous inodes.
This will allow LSMs that implement the anonymous inode hooks to
apply security policy to memfd_secret() fds.
- Various small improvements to memory management: fixed leaks, freed
memory when needed, boundary checks.
- Hardened the selinux_audit_data struct with __randomize_layout.
- A minor documentation tweak to fix a formatting/style issue"
* tag 'selinux-pr-20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: selinux_add_opt() callers free memory
selinux: Add boundary check in put_entry()
selinux: fix memleak in security_read_state_kernel()
docs: selinux: add '=' signs to kernel boot options
mm: create security context for memfd_secret inodes
selinux: fix typos in comments
selinux: drop unnecessary NULL check
selinux: add __randomize_layout to selinux_audit_data
being split acorss files.
- Improve DM core's BLK_STS_DM_REQUEUE and BLK_STS_AGAIN handling.
- Optimize DM core's more common bio splitting by eliminating the use
of bio cloning with bio_split+bio_chain. Shift that cloning cost to
the relatively unlikely dm_io requeue case that only occurs during
error handling. Introduces dm_io_rewind() that will clone a bio that
reflects the subset of the original bio that must be requeued.
- Remove DM core's dm_table_get_num_targets() wrapper and audit all
dm_table_get_target() callers.
- Fix potential for OOM with DM writecache target by setting a default
MAX_WRITEBACK_JOBS (set to 256MiB or 1/16 of total system memory,
whichever is smaller).
- Fix DM writecache target's stats that are reported through
DM-specific table info.
- Fix use-after-free crash in dm_sm_register_threshold_callback().
- Refine DM core's Persistent Reservation handling in preparation for
broader work Mike Christie is doing to add compatibility with
Microsoft Windows Failover Cluster.
- Fix various KASAN reported bugs in the DM raid target.
- Fix DM raid target crash due to md_handle_request() bio splitting
that recurses to block core without properly initializing the bio's
bi_dev.
- Fix some code comment typos and fix some Documentation formatting.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmLoAAUACgkQxSPxCi2d
A1rFUAf/RnLkzNPS1QJ1uVbiiW64zQUD2o5U8kAliOZxoXQ45U+CgL22a5VaT2R+
rI9+YSg/VX9YkdJwB6I+y7VRWjUCO3NfYOfQUX5NS8GfcPQQn2Zp+jy3t8VnjjXN
5+8Ylu5tX1+QSQiBB9JvIgiK71rSorT6H+KF6bZypL7te5kj39hDaRbmh40VOMOY
QWfs8DmyJft9IHPG7Mku/rFJbdVJph3f31IoqUOoGTOKoDPDUDezhCVHi2vpFcV+
S+iCKkeXmP6eUdCnNzBreYPTcP9OtCvkZEPhFg4lh+jyhjyFq1o0B8G5cRo5jvgr
GcN1GUT040sSqNXFquA4UU6RGvaxlg==
=gBRJ
-----END PGP SIGNATURE-----
Merge tag 'for-6.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Refactor DM core's mempool allocation so that it clearer by not being
split acorss files.
- Improve DM core's BLK_STS_DM_REQUEUE and BLK_STS_AGAIN handling.
- Optimize DM core's more common bio splitting by eliminating the use
of bio cloning with bio_split+bio_chain. Shift that cloning cost to
the relatively unlikely dm_io requeue case that only occurs during
error handling. Introduces dm_io_rewind() that will clone a bio that
reflects the subset of the original bio that must be requeued.
- Remove DM core's dm_table_get_num_targets() wrapper and audit all
dm_table_get_target() callers.
- Fix potential for OOM with DM writecache target by setting a default
MAX_WRITEBACK_JOBS (set to 256MiB or 1/16 of total system memory,
whichever is smaller).
- Fix DM writecache target's stats that are reported through
DM-specific table info.
- Fix use-after-free crash in dm_sm_register_threshold_callback().
- Refine DM core's Persistent Reservation handling in preparation for
broader work Mike Christie is doing to add compatibility with
Microsoft Windows Failover Cluster.
- Fix various KASAN reported bugs in the DM raid target.
- Fix DM raid target crash due to md_handle_request() bio splitting
that recurses to block core without properly initializing the bio's
bi_dev.
- Fix some code comment typos and fix some Documentation formatting.
* tag 'for-6.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (29 commits)
dm: fix dm-raid crash if md_handle_request() splits bio
dm raid: fix address sanitizer warning in raid_resume
dm raid: fix address sanitizer warning in raid_status
dm: Start pr_preempt from the same starting path
dm: Fix PR release handling for non All Registrants
dm: Start pr_reserve from the same starting path
dm: Allow dm_call_pr to be used for path searches
dm: return early from dm_pr_call() if DM device is suspended
dm thin: fix use-after-free crash in dm_sm_register_threshold_callback
dm writecache: count number of blocks discarded, not number of discard bios
dm writecache: count number of blocks written, not number of write bios
dm writecache: count number of blocks read, not number of read bios
dm writecache: return void from functions
dm kcopyd: use __GFP_HIGHMEM when allocating pages
dm writecache: set a default MAX_WRITEBACK_JOBS
Documentation: dm writecache: Render status list as list
Documentation: dm writecache: add blank line before optional parameters
dm snapshot: fix typo in snapshot_map() comment
dm raid: remove redundant "the" in parse_raid_params() comment
dm cache: fix typo in 2 comment blocks
...
- Remove unused generic cpuidle support (replaced by PSCI version)
- Fix documentation describing the kernel virtual address space
- Handling of some new CPU errata in Arm implementations
- Rework of our exception table code in preparation for handling
machine checks (i.e. RAS errors) more gracefully
- Switch over to the generic implementation of ioremap()
- Fix lockdep tracking in NMI context
- Instrument our memory barrier macros for KCSAN
- Rework of the kPTI G->nG page-table repainting so that the MMU remains
enabled and the boot time is no longer slowed to a crawl for systems
which require the late remapping
- Enable support for direct swapping of 2MiB transparent huge-pages on
systems without MTE
- Fix handling of MTE tags with allocating new pages with HW KASAN
- Expose the SMIDR register to userspace via sysfs
- Continued rework of the stack unwinder, particularly improving the
behaviour under KASAN
- More repainting of our system register definitions to match the
architectural terminology
- Improvements to the layout of the vDSO objects
- Support for allocating additional bits of HWCAP2 and exposing
FEAT_EBF16 to userspace on CPUs that support it
- Considerable rework and optimisation of our early boot code to reduce
the need for cache maintenance and avoid jumping in and out of the
kernel when handling relocation under KASLR
- Support for disabling SVE and SME support on the kernel command-line
- Support for the Hisilicon HNS3 PMU
- Miscellanous cleanups, trivial updates and minor fixes
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmLeccUQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNCysB/4ml92RJLhVwRAofbtFfVgVz3JLTSsvob9x
Z7FhNDxfM/G32wKtOHU9tHkGJ+PMVWOPajukzxkMhxmilfTyHBbiisNWVRjKQxj4
wrd07DNXPIv3bi8SWzS1y2y8ZqujZWjNJlX8SUCzEoxCVtuNKwrh96kU1jUjrkFZ
kBo4E4wBWK/qW29nClGSCgIHRQNJaB/jvITlQhkqIb0pwNf3sAUzW7QoF1iTZWhs
UswcLh/zC4q79k9poegdCt8chV5OBDLtLPnMxkyQFvsLYRp3qhyCSQQY/BxvO5JS
jT9QR6d+1ewET9BFhqHlIIuOTYBCk3xn/PR9AucUl+ZBQd2tO4B1
=LVH0
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Highlights include a major rework of our kPTI page-table rewriting
code (which makes it both more maintainable and considerably faster in
the cases where it is required) as well as significant changes to our
early boot code to reduce the need for data cache maintenance and
greatly simplify the KASLR relocation dance.
Summary:
- Remove unused generic cpuidle support (replaced by PSCI version)
- Fix documentation describing the kernel virtual address space
- Handling of some new CPU errata in Arm implementations
- Rework of our exception table code in preparation for handling
machine checks (i.e. RAS errors) more gracefully
- Switch over to the generic implementation of ioremap()
- Fix lockdep tracking in NMI context
- Instrument our memory barrier macros for KCSAN
- Rework of the kPTI G->nG page-table repainting so that the MMU
remains enabled and the boot time is no longer slowed to a crawl
for systems which require the late remapping
- Enable support for direct swapping of 2MiB transparent huge-pages
on systems without MTE
- Fix handling of MTE tags with allocating new pages with HW KASAN
- Expose the SMIDR register to userspace via sysfs
- Continued rework of the stack unwinder, particularly improving the
behaviour under KASAN
- More repainting of our system register definitions to match the
architectural terminology
- Improvements to the layout of the vDSO objects
- Support for allocating additional bits of HWCAP2 and exposing
FEAT_EBF16 to userspace on CPUs that support it
- Considerable rework and optimisation of our early boot code to
reduce the need for cache maintenance and avoid jumping in and out
of the kernel when handling relocation under KASLR
- Support for disabling SVE and SME support on the kernel
command-line
- Support for the Hisilicon HNS3 PMU
- Miscellanous cleanups, trivial updates and minor fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (136 commits)
arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}
arm64: fix KASAN_INLINE
arm64/hwcap: Support FEAT_EBF16
arm64/cpufeature: Store elf_hwcaps as a bitmap rather than unsigned long
arm64/hwcap: Document allocation of upper bits of AT_HWCAP
arm64: enable THP_SWAP for arm64
arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52
arm64: errata: Remove AES hwcap for COMPAT tasks
arm64: numa: Don't check node against MAX_NUMNODES
drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX
perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node()
docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING
arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"
mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON
mm: kasan: Skip unpoisoning of user pages
mm: kasan: Ensure the tags are visible before the tag in page->flags
drivers/perf: hisi: add driver for HNS3 PMU
drivers/perf: hisi: Add description for HNS3 PMU driver
drivers/perf: riscv_pmu_sbi: perf format
perf/arm-cci: Use the bitmap API to allocate bitmaps
...
- Respect idle=nomwait when supplied on the kernel cmdline
- Two small cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLntx0ACgkQEsHwGGHe
VUqlRxAAkULobsk6Dx3wrQcYlpA8Mt/ctttTQXWiIQwhK1j7uP0zlGWBqImr5Wsk
T04g1s29azulnPs3PydCF2QlLqSyF4v2PyyUwnpKfTP6CPM+MLtz98Gm6Xcbkt+s
f28ISYgNP+15tskWdNqB5XIVGkuyBdNne9TiFwtnVrJYF47FSwqEWRyqMH+bIOGT
wSZUCfjcw7PtKwfIAmYq4beS2+wbY9bsfVyIz+H0ks2EVFQdjYWb/kH9PgUYEQFe
VEOBsPvTHDOJt0QXEXSJjmoSRUS77Wduw56Y3L2T4jWdXXQFWJ79rqNYDBvXGAdh
Y8BKM5IYFZpzrmfw2RB6jbDY/JWO5PPFvHTXogQf9+wttSerZEffVQdOeTwjT8VD
wc9/ZnNkT7915033VI90V+hdFkwarq8FXuFH8TkzcxP9DQNYG8CRTZBceq0UWBl0
5RpIDwNX9JxGrR+frJi0D24qxz//wLe56UqW9hLp73NP8QtEYEW1nb1q30Q2eM3N
iQblgmh63qQ/dy6JV1GFb3aePiWMUNQwcTrj1pd8YDfNlp4IsFsSswnsdAZWtr1A
l9qewHkBZbbzyTQkBjExUsaIdiaMywFwnUmcQNL+fHqznZIvMhJC/oCJeS0Pe/RH
alTUrYsk6Y87HFpxoXpd85a9+20m8yrA64uY8cSQguGZ9i5Lm8g=
=jkpj
-----END PGP SIGNATURE-----
Merge tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:
- Remove the vendor check when selecting MWAIT as the default idle
state
- Respect idle=nomwait when supplied on the kernel cmdline
- Two small cleanups
* tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Use MSR_IA32_MISC_ENABLE constants
x86: Fix comment for X86_FEATURE_ZEN
x86: Remove vendor checks from prefer_mwait_c1_over_halt
x86: Handle idle=nomwait cmdline properly for x86_idle
KVM/s390, KVM/x86 and common infrastructure changes for 5.20
x86:
* Permit guests to ignore single-bit ECC errors
* Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache
* Intel IPI virtualization
* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS
* PEBS virtualization
* Simplify PMU emulation by just using PERF_TYPE_RAW events
* More accurate event reinjection on SVM (avoid retrying instructions)
* Allow getting/setting the state of the speaker port data bit
* Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent
* "Notify" VM exit (detect microarchitectural hangs) for Intel
* Cleanups for MCE MSR emulation
s390:
* add an interface to provide a hypervisor dump for secure guests
* improve selftests to use TAP interface
* enable interpretive execution of zPCI instructions (for PCI passthrough)
* First part of deferred teardown
* CPU Topology
* PV attestation
* Minor fixes
Generic:
* new selftests API using struct kvm_vcpu instead of a (vm, id) tuple
x86:
* Use try_cmpxchg64 instead of cmpxchg64
* Bugfixes
* Ignore benign host accesses to PMU MSRs when PMU is disabled
* Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior
* x86/MMU: Allow NX huge pages to be disabled on a per-vm basis
* Port eager page splitting to shadow MMU as well
* Enable CMCI capability by default and handle injected UCNA errors
* Expose pid of vcpu threads in debugfs
* x2AVIC support for AMD
* cleanup PIO emulation
* Fixes for LLDT/LTR emulation
* Don't require refcounted "struct page" to create huge SPTEs
x86 cleanups:
* Use separate namespaces for guest PTEs and shadow PTEs bitmasks
* PIO emulation
* Reorganize rmap API, mostly around rmap destruction
* Do not workaround very old KVM bugs for L0 that runs with nesting enabled
* new selftests API for CPUID
memory.reclaim is a cgroup v2 interface that allows users to proactively
reclaim memory from a memcg, without real memory pressure. Reclaim
operations invoke vmpressure, which is used: (a) To notify userspace of
reclaim efficiency in cgroup v1, and (b) As a signal for a memcg being
under memory pressure for networking (see
mem_cgroup_under_socket_pressure()).
For (a), vmpressure notifications in v1 are not affected by this change
since memory.reclaim is a v2 feature.
For (b), the effects of the vmpressure signal (according to Shakeel [1])
are as follows:
1. Reducing send and receive buffers of the current socket.
2. May drop packets on the rx path.
3. May throttle current thread on the tx path.
Since proactive reclaim is invoked directly by userspace, not by memory
pressure, it makes sense not to throttle networking. Hence, this change
makes sure that proactive reclaim caused by memory.reclaim does not
trigger vmpressure.
[1] https://lore.kernel.org/lkml/CALvZod68WdrXEmBpOkadhB5GPYmCXaDZzXH=yyGOCAjFRn4NDQ@mail.gmail.com/
[yosryahmed@google.com: update documentation]
Link: https://lkml.kernel.org/r/20220721173015.2643248-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20220714064918.2576464-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This fixes the paths of x86 / arm efi-stub source files.
Signed-off-by: João Paulo Rechi Vita <jprvita@endlessos.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220727140539.10021-1-jprvita@endlessos.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
30312730bd ("cgroup: Add "no" prefixed mount options") added "no" prefixed
mount options to allow turning them off and 6a010a49b6 ("cgroup: Make
!percpu threadgroup_rwsem operations optional") added one more "no" prefixed
mount option. However, Michal pointed out that the "no" prefixed options
aren't necessary in allowing mount options to be turned off:
# grep group /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,relatime,nsdelegate,memory_recursiveprot 0 0
# mount -o remount,nsdelegate,memory_recursiveprot none /sys/fs/cgroup
# grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,relatime,nsdelegate,memory_recursiveprot 0 0
Note that this is different from the remount behavior when the mount(1) is
invoked without the device argument - "none":
# grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
# mount -o remount,nsdelegate,memory_recursiveprot /sys/fs/cgroup
# grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
While a bit confusing, given that there is a way to turn off the options,
there's no reason to have the explicit "no" prefixed options. Let's remove
them.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
During an LPM, while the memory transfer is in progress on the arrival
side, some latencies are generated when accessing not yet transferred
pages on the arrival side. Thus, the NMI watchdog may be triggered too
frequently, which increases the risk to hit an NMI interrupt in a bad
place in the kernel, leading to a kernel panic.
Disabling the Hard Lockup Watchdog until the memory transfer could be a
too strong work around, some users would want this timeout to be
eventually triggered if the system is hanging even during an LPM.
Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
a factor to the NMI watchdog timeout during an LPM. Just before the CPUs
are stopped for the switchover sequence, the NMI watchdog timer is set
to watchdog_thresh + factor%
A value of 0 has no effect. The default value is 200, meaning that the
NMI watchdog is set to 30s during LPM (based on a 10s watchdog_thresh
value). Once the memory transfer is achieved, the factor is reset to 0.
Setting this value to a high number is like disabling the NMI watchdog
during an LPM.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220713154729.80789-5-ldufour@linux.ibm.com
* for-next/boot: (34 commits)
arm64: fix KASAN_INLINE
arm64: Add an override for ID_AA64SMFR0_EL1.FA64
arm64: Add the arm64.nosve command line option
arm64: Add the arm64.nosme command line option
arm64: Expose a __check_override primitive for oddball features
arm64: Allow the idreg override to deal with variable field width
arm64: Factor out checking of a feature against the override into a macro
arm64: Allow sticky E2H when entering EL1
arm64: Save state of HCR_EL2.E2H before switch to EL1
arm64: Rename the VHE switch to "finalise_el2"
arm64: mm: fix booting with 52-bit address space
arm64: head: remove __PHYS_OFFSET
arm64: lds: use PROVIDE instead of conditional definitions
arm64: setup: drop early FDT pointer helpers
arm64: head: avoid relocating the kernel twice for KASLR
arm64: kaslr: defer initialization to initcall where permitted
arm64: head: record CPU boot mode after enabling the MMU
arm64: head: populate kernel page tables with MMU and caches on
arm64: head: factor out TTBR1 assignment into a macro
arm64: idreg-override: use early FDT mapping in ID map
...
* for-next/perf:
drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX
perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node()
docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING
drivers/perf: hisi: add driver for HNS3 PMU
drivers/perf: hisi: Add description for HNS3 PMU driver
drivers/perf: riscv_pmu_sbi: perf format
perf/arm-cci: Use the bitmap API to allocate bitmaps
drivers/perf: riscv_pmu: Add riscv pmu pm notifier
perf: hisi: Extract hisi_pmu_init
perf/marvell_cn10k: Fix TAD PMU register offset
perf/marvell_cn10k: Remove useless license text when SPDX-License-Identifier is already used
arm64: cpufeature: Allow different PMU versions in ID_DFR0_EL1
perf/arm-cci: fix typo in comment
drivers/perf:Directly use ida_alloc()/free()
drivers/perf: Directly use ida_alloc()/free()
3942a9bd7b ("locking, rcu, cgroup: Avoid synchronize_sched() in
__cgroup_procs_write()") disabled percpu operations on threadgroup_rwsem
because the impiled synchronize_rcu() on write locking was pushing up the
latencies too much for android which constantly moves processes between
cgroups.
This makes the hotter paths - fork and exit - slower as they're always
forced into the slow path. There is no reason to force this on everyone
especially given that more common static usage pattern can now completely
avoid write-locking the rwsem. Write-locking is elided when turning on and
off controllers on empty sub-trees and CLONE_INTO_CGROUP enables seeding a
cgroup without grabbing the rwsem.
Restore the default percpu operations and introduce the mount option
"favordynmods" and config option CGROUP_FAVOR_DYNMODS for users who need
lower latencies for the dynamic operations.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Michal Koutn� <mkoutny@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
We allow modifying these mount options via remount. Let's add "no" prefixed
variants so that they can be turned off too.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
This pull request contains a pair of commits that fix 282d8998e9 ("srcu:
Prevent expedited GPs and blocking readers from consuming CPU"), which
was itself a fix to an SRCU expedited grace-period problem that could
prevent kernel live patching (KLP) from completing. That SRCU fix for
KLP introduced large (as in minutes) boot-time delays to embedded Linux
kernels running on qemu/KVM. These delays were due to the emulation of
certain MMIO operations controlling memory layout, which were emulated
with one expedited grace period per access. Common configurations
required thousands of boot-time MMIO accesses, and thus thousands of
boot-time expedited SRCU grace periods.
In these configurations, the occasional sleeps that allowed KLP to proceed
caused excessive boot delays. These commits preserve enough sleeps to
permit KLP to proceed, but few enough that the virtual embedded kernels
still boot reasonably quickly.
This represents a regression introduced in the v5.19 merge window,
and the bug is causing significant inconvenience, hence this pull request.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmLZ6LoTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jNHgD/4tb8Un6vZlrEaYbyA/ztUITX/2DisS
kiqbQz1BH8V3B3PxSo4ldEiw+z3fC3SMyIPymuu9bhwm6SFdjEsarFkIqySxkYnX
jnuk0JbWxs4Kk64rIkHHzAxzvM2Iw1EjSzjP1M+DC7iymSJpsgp+0zFJJtcJ8Y87
67hbQRQYk+1T7ZT+vq77NiyAAFEzSd8UydgBVxlsOOdkXQ91NYTyB8D6ldUJAnLU
opwCEpgpu74Sp4Te5q6f9uAt8xZmXsyrm8zJgzTz0KSgivcpt4GmIoyEFYUQczj0
Hewr6+qM9AWfvfQxNvRCS25yeox18kbdp1qdp9rl0BZMtYN2Zsk1Ec4c79s7NBLc
G3TIvJkGLHuZO1dO4BhLkYczgRYlaPxOR/0GKNn4m69/TbVmseUL1WeZS0pswB0q
cH1AKKEg9KdPoaX0hTLoOrlv/vwbgjhKKuoqEv7yEUhJJdACy50rmnhWhSxeuQDb
aIITVKkjkwpDtRX5QTdG1f5uIMoGz9BbUDv7VeodB0mrYHluXEfyNTwlqcISKAgm
T9kLmsdfvMrQ4fLR5S3i3dwnL3b52OB8h5NyfW3YRkXEnA7//ef/XpPiW2HY8BMT
7QwPqOoUSr/IraAcI8j0QxRpioUk1oaNi+UJ3FSHni8re6rZ0kaxatRCT20h6Djq
C9RVLaevw3bGXQ==
=ndhB
-----END PGP SIGNATURE-----
Merge tag 'rcu-urgent.2022.07.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fix from Paul McKenney:
"This contains a pair of commits that fix 282d8998e9 ("srcu: Prevent
expedited GPs and blocking readers from consuming CPU"), which was
itself a fix to an SRCU expedited grace-period problem that could
prevent kernel live patching (KLP) from completing.
That SRCU fix for KLP introduced large (as in minutes) boot-time
delays to embedded Linux kernels running on qemu/KVM. These delays
were due to the emulation of certain MMIO operations controlling
memory layout, which were emulated with one expedited grace period per
access. Common configurations required thousands of boot-time MMIO
accesses, and thus thousands of boot-time expedited SRCU grace
periods.
In these configurations, the occasional sleeps that allowed KLP to
proceed caused excessive boot delays. These commits preserve enough
sleeps to permit KLP to proceed, but few enough that the virtual
embedded kernels still boot reasonably quickly.
This represents a regression introduced in the v5.19 merge window, and
the bug is causing significant inconvenience"
* tag 'rcu-urgent.2022.07.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
srcu: Make expedited RCU grace periods block even less frequently
srcu: Block less aggressively for expedited grace periods
- Fix the used field of struct io_tlb_area wasn't initialized
- Set area number to be 0 if input area number parameter is 0
- Use array_size() to calculate io_tlb_area array size
- Make parameters of swiotlb_do_find_slots() more reasonable
Fixes: 26ffb91fa5e0 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Tianyu Lan <tiala@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Systems built with CONFIG_RCU_NOCB_CPU=y but booted without either
the rcu_nocbs= or rcu_nohz_full= kernel-boot parameters will not have
callback offloading on any of the CPUs, nor can any of the CPUs be
switched to enable callback offloading at runtime. Although this is
intentional, it would be nice to have a way to offload all the CPUs
without having to make random bootloaders specify either the rcu_nocbs=
or the rcu_nohz_full= kernel-boot parameters.
This commit therefore provides a new CONFIG_RCU_NOCB_CPU_DEFAULT_ALL
Kconfig option that switches the default so as to offload callback
processing on all of the CPUs. This default can still be overridden
using the rcu_nocbs= and rcu_nohz_full= kernel-boot parameters.
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
(In v4.1, fixed issues with CONFIG maze reported by kernel test robot).
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
The purpose of commit 282d8998e9 ("srcu: Prevent expedited GPs
and blocking readers from consuming CPU") was to prevent a long
series of never-blocking expedited SRCU grace periods from blocking
kernel-live-patching (KLP) progress. Although it was successful, it also
resulted in excessive boot times on certain embedded workloads running
under qemu with the "-bios QEMU_EFI.fd" command line. Here "excessive"
means increasing the boot time up into the three-to-four minute range.
This increase in boot time was due to the more than 6000 back-to-back
invocations of synchronize_rcu_expedited() within the KVM host OS, which
in turn resulted from qemu's emulation of a long series of MMIO accesses.
Commit 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace
periods") did not significantly help this particular use case.
Zhangfei Gao and Shameerali Kolothum Thodi did experiments varying the
value of SRCU_MAX_NODELAY_PHASE with HZ=250 and with various values
of non-sleeping per phase counts on a system with preemption enabled,
and observed the following boot times:
+──────────────────────────+────────────────+
| SRCU_MAX_NODELAY_PHASE | Boot time (s) |
+──────────────────────────+────────────────+
| 100 | 30.053 |
| 150 | 25.151 |
| 200 | 20.704 |
| 250 | 15.748 |
| 500 | 11.401 |
| 1000 | 11.443 |
| 10000 | 11.258 |
| 1000000 | 11.154 |
+──────────────────────────+────────────────+
Analysis on the experiment results show additional improvements with
CPU-bound delays approaching one jiffy in duration. This improvement was
also seen when number of per-phase iterations were scaled to one jiffy.
This commit therefore scales per-grace-period phase number of non-sleeping
polls so that non-sleeping polls extend for about one jiffy. In addition,
the delay-calculation call to srcu_get_delay() in srcu_gp_end() is
replaced with a simple check for an expedited grace period. This change
schedules callback invocation immediately after expedited grace periods
complete, which results in greatly improved boot times. Testing done
by Marc and Zhangfei confirms that this change recovers most of the
performance degradation in boottime; for CONFIG_HZ_250 configuration,
specifically, boot times improve from 3m50s to 41s on Marc's setup;
and from 2m40s to ~9.7s on Zhangfei's setup.
In addition to the changes to default per phase delays, this
change adds 3 new kernel parameters - srcutree.srcu_max_nodelay,
srcutree.srcu_max_nodelay_phase, and srcutree.srcu_retry_check_delay.
This allows users to configure the srcu grace period scanning delays in
order to more quickly react to additional use cases.
Fixes: 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace periods")
Fixes: 282d8998e9 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reported-by: yueluck <yueluck@163.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The decision of whether or not to trust RDRAND is controlled by the
"random.trust_cpu" boot time parameter or the CONFIG_RANDOM_TRUST_CPU
compile time default. The "nordrand" flag was added during the early
days of RDRAND, when there were worries that merely using its values
could compromise the RNG. However, these days, RDRAND values are not
used directly but always go through the RNG's hash function, making
"nordrand" no longer useful.
Rather, the correct switch is "random.trust_cpu", which not only handles
the relevant trust issue directly, but also is general to multiple CPU
types, not just x86.
However, x86 RDRAND does have a history of being occasionally
problematic. Prior, when the kernel would notice something strange, it'd
warn in dmesg and suggest enabling "nordrand". We can improve on that by
making the test a little bit better and then taking the step of
automatically disabling RDRAND if we detect it's problematic.
Also disable RDSEED if the RDRAND test fails.
Cc: x86@kernel.org
Cc: Theodore Ts'o <tytso@mit.edu>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The gethostname system call returns the hostname for the current machine.
However, the kernel has no mechanism to initially set the current
machine's name in such a way as to guarantee that the first userspace
process to call gethostname will receive a meaningful result. It relies
on some unspecified userspace process to first call sethostname before
gethostname can produce a meaningful name.
Traditionally the machine's hostname is set from userspace by the init
system. The init system, in turn, often relies on a configuration file
(say, /etc/hostname) to provide the value that it will supply in the call
to sethostname. Consequently, the file system containing /etc/hostname
usually must be available before the hostname will be set. There may,
however, be earlier userspace processes that could call gethostname before
the file system containing /etc/hostname is mounted. Such a process will
get some other, likely meaningless, name from gethostname (such as
"(none)", "localhost", or "darkstar").
A real-world example where this can happen, and lead to undesirable
results, is with mdadm. When assembling arrays, mdadm distinguishes
between "local" arrays and "foreign" arrays. A local array is one that
properly belongs to the current machine, and a foreign array is one that
is (possibly temporarily) attached to the current machine, but properly
belongs to some other machine. To determine if an array is local or
foreign, mdadm may compare the "homehost" recorded on the array with the
current hostname. If mdadm is run before the root file system is mounted,
perhaps because the root file system itself resides on an md-raid array,
then /etc/hostname isn't yet available and the init system will not yet
have called sethostname, causing mdadm to incorrectly conclude that all of
the local arrays are foreign.
Solving this problem *could* be delegated to the init system. It could be
left up to the init system (including any init system that starts within
an initramfs, if one is in use) to ensure that sethostname is called
before any other userspace process could possibly call gethostname.
However, it may not always be obvious which processes could call
gethostname (for example, udev itself might not call gethostname, but it
could via udev rules invoke processes that do). Additionally, the init
system has to ensure that the hostname configuration value is stored in
some place where it will be readily accessible during early boot.
Unfortunately, every init system will attempt to (or has already attempted
to) solve this problem in a different, possibly incorrect, way. This
makes getting consistently working configurations harder for users.
I believe it is better for the kernel to provide the means by which the
hostname may be set early, rather than making this a problem for the init
system to solve. The option to set the hostname during early startup, via
a kernel parameter, provides a simple, reliable way to solve this problem.
It also could make system configuration easier for some embedded systems.
[dmoulding@me.com: v2]
Link: https://lkml.kernel.org/r/20220506060310.7495-2-dmoulding@me.com
Link: https://lkml.kernel.org/r/20220505180651.22849-2-dmoulding@me.com
Signed-off-by: Dan Moulding <dmoulding@me.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add documentation for vimc-lens.
Add a lens into the vimc topology graph.
Signed-off-by: Yunke Cao <yunkec@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
The statement that the Latex version of Linux Device List is unmaintained,
must go back to some historic time where there were actually two separate
document sources, one in a txt format and another in a tex format (maybe
even just on lanana.org).
Nowadays, html and tex are generated from the ReST document file and the
statement might be confused, believing the actual generated LaTeX document
from the maintained ReST document could be an unmaintained one.
Remove this statement on the LaTeX version and only keep pointing out that
the version on lanana.org is no longer maintained.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220704122537.3407-6-lukas.bulwahn@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Change dm-writecache, so that it counts the number of blocks discarded
instead of the number of discard bios. Make it consistent with the
read and write statistics counters that were changed to count the
number of blocks instead of bios.
Fixes: e3a35d0340 ("dm writecache: add event counters")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Change dm-writecache, so that it counts the number of blocks written
instead of the number of write bios. Bios can be split and requeued
using the dm_accept_partial_bio function, so counting bios caused
inaccurate results.
Fixes: e3a35d0340 ("dm writecache: add event counters")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Change dm-writecache, so that it counts the number of blocks read
instead of the number of read bios. Bios can be split and requeued
using the dm_accept_partial_bio function, so counting bios caused
inaccurate results.
Fixes: e3a35d0340 ("dm writecache: add event counters")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>