Some architectures define their own arch_test_bit and they also need
arch_test_bit_acquire, otherwise they won't compile. We also clean up
the code by using the generic test_bit if that is equivalent to the
arch-specific version.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 8238b45798 ("wait_on_bit: add an acquire memory barrier")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This branch consists of:
Qu Wenruo:
lib: bitmap: fix the duplicated comments on bitmap_to_arr64()
https://lore.kernel.org/lkml/0d85e1dbad52ad7fb5787c4432bdb36cbd24f632.1656063005.git.wqu@suse.com/
Alexander Lobakin:
bitops: let optimize out non-atomic bitops on compile-time constants
https://lore.kernel.org/lkml/20220624121313.2382500-1-alexandr.lobakin@intel.com/T/
Yury Norov:
lib: cleanup bitmap-related headers
https://lore.kernel.org/linux-arm-kernel/YtCVeOGLiQ4gNPSf@yury-laptop/T/#m305522194c4d38edfdaffa71fcaaf2e2ca00a961
Alexander Lobakin:
x86/olpc: fix 'logical not is only applied to the left hand side'
https://www.spinics.net/lists/kernel/msg4440064.html
Yury Norov:
lib/nodemask: inline wrappers around bitmap
https://lore.kernel.org/all/20220723214537.2054208-1-yury.norov@gmail.com/
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmLpVvwACgkQsUSA/Tof
vsiAHgwAwS9pl8GJ+fKYnue2CYo9349d2oT6BBUs/Rv8uqYEa4QkpYsR7NS733TG
pos0hhoRvSOzrUP4qppXUjfJ+NkzLgpnKFOeWfFoNAKlHuaaMRvF3Y0Q/P8g0/Kg
HPWcCQLHyCH9Wjs3e2TTgRjxTrHuruD2VJ401/PX/lw0DicUhmev5mUFa10uwFkP
ZJRprjoFn9HJ0Hk16pFZDi36d3YumhACOcWRiJdoBDrEPV3S6lm9EeOy/yHBNp5k
9bKj+RboeT2t70KaZcKv+M5j1nu0cAhl7kRkjcxcmGyimI0l82Vgq9yFxhGqvWg8
RnCrJ5EaO08FGCAKG9GEwzdiNa24Gdq5XZSpQA7JZHmhmchpnnlNenJicyv0gOQi
abChZeWSEsyA+78l2+kk9nezfVKUOnKDEZQxBVTOyWsmZYxHZV94oam340VjQDaY
4/fETdOy/qqPIxnpxAeFGWxZjcVaYiYPLj7KLPMsB0aAAF7pZrem465vSfgbrE81
+gCdqrWd
=4dTW
-----END PGP SIGNATURE-----
Merge tag 'bitmap-6.0-rc1' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- fix the duplicated comments on bitmap_to_arr64() (Qu Wenruo)
- optimize out non-atomic bitops on compile-time constants (Alexander
Lobakin)
- cleanup bitmap-related headers (Yury Norov)
- x86/olpc: fix 'logical not is only applied to the left hand side'
(Alexander Lobakin)
- lib/nodemask: inline wrappers around bitmap (Yury Norov)
* tag 'bitmap-6.0-rc1' of https://github.com/norov/linux: (26 commits)
lib/nodemask: inline next_node_in() and node_random()
powerpc: drop dependency on <asm/machdep.h> in archrandom.h
x86/olpc: fix 'logical not is only applied to the left hand side'
lib/cpumask: move some one-line wrappers to header file
headers/deps: mm: align MANITAINERS and Docs with new gfp.h structure
headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>
headers/deps: mm: Optimize <linux/gfp.h> header dependencies
lib/cpumask: move trivial wrappers around find_bit to the header
lib/cpumask: change return types to unsigned where appropriate
cpumask: change return types to bool where appropriate
lib/bitmap: change type of bitmap_weight to unsigned long
lib/bitmap: change return types to bool where appropriate
arm: align find_bit declarations with generic kernel
iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE)
lib/test_bitmap: test the tail after bitmap_to_arr64()
lib/bitmap: fix off-by-one in bitmap_to_arr64()
lib: test_bitmap: add compile-time optimization/evaluations assertions
bitmap: don't assume compiler evaluates small mem*() builtins calls
net/ice: fix initializing the bitmap in the switch code
bitops: let optimize out non-atomic bitops on compile-time constants
...
fatfs, autofs, squashfs, procfs, etc.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYu9BeQAKCRDdBJ7gKXxA
jp1DAP4mjCSvAwYzXklrIt+Knv3CEY5oVVdS+pWOAOGiJpldTAD9E5/0NV+VmlD9
kwS/13j38guulSlXRzDLmitbg81zAAI=
=Zfum
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
"Updates to various subsystems which I help look after. lib, ocfs2,
fatfs, autofs, squashfs, procfs, etc. A relatively small amount of
material this time"
* tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
scripts/gdb: ensure the absolute path is generated on initial source
MAINTAINERS: kunit: add David Gow as a maintainer of KUnit
mailmap: add linux.dev alias for Brendan Higgins
mailmap: update Kirill's email
profile: setup_profiling_timer() is moslty not implemented
ocfs2: fix a typo in a comment
ocfs2: use the bitmap API to simplify code
ocfs2: remove some useless functions
lib/mpi: fix typo 'the the' in comment
proc: add some (hopefully) insightful comments
bdi: remove enum wb_congested_state
kernel/hung_task: fix address space of proc_dohung_task_timeout_secs
lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t()
squashfs: support reading fragments in readahead call
squashfs: implement readahead
squashfs: always build "file direct" version of page actor
Revert "squashfs: provide backing_dev_info in order to disable read-ahead"
fs/ocfs2: Fix spelling typo in comment
ia64: old_rr4 added under CONFIG_HUGETLB_PAGE
proc: fix test for "vsyscall=xonly" boot option
...
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve latency
and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
=w/UH
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Most of the MM queue. A few things are still pending.
Liam's maple tree rework didn't make it. This has resulted in a few
other minor patch series being held over for next time.
Multi-gen LRU still isn't merged as we were waiting for mapletree to
stabilize. The current plan is to merge MGLRU into -mm soon and to
later reintroduce mapletree, with a view to hopefully getting both
into 6.1-rc1.
Summary:
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve
latency and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place"
[ XFS merge from hell as per Darrick Wong in
https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]
* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
tools/testing/selftests/vm/hmm-tests.c: fix build
mm: Kconfig: fix typo
mm: memory-failure: convert to pr_fmt()
mm: use is_zone_movable_page() helper
hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
hugetlbfs: cleanup some comments in inode.c
hugetlbfs: remove unneeded header file
hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
hugetlbfs: use helper macro SZ_1{K,M}
mm: cleanup is_highmem()
mm/hmm: add a test for cross device private faults
selftests: add soft-dirty into run_vmtests.sh
selftests: soft-dirty: add test for mprotect
mm/mprotect: fix soft-dirty check in can_change_pte_writable()
mm: memcontrol: fix potential oom_lock recursion deadlock
mm/gup.c: fix formatting in check_and_migrate_movable_page()
xfs: fail dax mount if reflink is enabled on a partition
mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
userfaultfd: don't fail on unrecognized features
hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
...
There are three independent sets of changes:
- Sai Prakash Ranjan adds tracing support to the asm-generic
version of the MMIO accessors, which is intended to help
understand problems with device drivers and has been part
of Qualcomm's vendor kernels for many years.
- A patch from Sebastian Siewior to rework the handling of
IRQ stacks in softirqs across architectures, which is
needed for enabling PREEMPT_RT.
- The last patch to remove the CONFIG_VIRT_TO_BUS option and
some of the code behind that, after the last users of this
old interface made it in through the netdev, scsi, media and
staging trees.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmLqPPEACgkQmmx57+YA
GNlUbQ/+NpIsiA0JUrCGtySt8KrLHdA2dH9lJOR5/iuxfphscPFfWtpcPvcXQWmt
a8u7wyI8SHW1ku4U0Y5sO0dBSldDnoIqJ5t4X5d7YNU9yVtEtucqQhZf+GkrPlVD
1HkRu05B7y0k2BMn7BLhSvkpafs3f1lNGXjs8oFBdOF1/zwp/GjcrfCK7KFzqjwU
dYrX0SOFlKFd4BZC75VfK+XcKg4LtwIOmJraRRl7alz2Q5Oop2hgjgZxXDPf//vn
SPOhXJN/97i1FUpY2TkfHVH1NxbPfjCV4pUnjmLG0Y4NSy9UQ/ZcXHcywIdeuhfa
0LySOIsAqBeccpYYYdg2ubiMDZOXkBfANu/sB9o/EhoHfB4svrbPRDhBIQZMFXJr
MJYu+IYce2rvydA/nydo4q++pxR8v1ES1ZIo8bDux+q1CI/zbpQV+f98kPVRA0M7
ajc+5GTIqNIsvHzzadq7eYxcj5Bi8Li2JA9sVkAQ+6iq1TVyeYayMc9eYwONlmqw
MD+PFYc651pKtXZCfkLXPIKSwS0uPqBndAibuVhpZ0hxWaCBBdKvY9mrWcPxt0kA
tMR8lrosbbrV2K48BFdWTOHvCs2FhHQxPGVPZ/iWuxTA0hHZ9tUlaEkSX+VM57IU
KCYQLdWzT8J9vrgqSbgYKlb6pSPz6FIjTfut6NZMmshIbavHV/Q=
=aTR0
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are three independent sets of changes:
- Sai Prakash Ranjan adds tracing support to the asm-generic version
of the MMIO accessors, which is intended to help understand
problems with device drivers and has been part of Qualcomm's vendor
kernels for many years
- A patch from Sebastian Siewior to rework the handling of IRQ stacks
in softirqs across architectures, which is needed for enabling
PREEMPT_RT
- The last patch to remove the CONFIG_VIRT_TO_BUS option and some of
the code behind that, after the last users of this old interface
made it in through the netdev, scsi, media and staging trees"
* tag 'asm-generic-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
uapi: asm-generic: fcntl: Fix typo 'the the' in comment
arch/*/: remove CONFIG_VIRT_TO_BUS
soc: qcom: geni: Disable MMIO tracing for GENI SE
serial: qcom_geni_serial: Disable MMIO tracing for geni serial
asm-generic/io: Add logging support for MMIO accessors
KVM: arm64: Add a flag to disable MMIO trace for nVHE KVM
lib: Add register read/write tracing support
drm/meson: Fix overflow implicit truncation warnings
irqchip/tegra: Fix overflow implicit truncation warnings
coresight: etm4x: Use asm-generic IO memory barriers
arm64: io: Use asm-generic high level MMIO accessors
arch/*: Disable softirq stacks on PREEMPT_RT.
The isa_dma_bridge_buggy symbol is only used for x86_32, and only x86_32
platforms or quirks ever set it.
Add a new linux/isa-dma.h header that #defines isa_dma_bridge_buggy to 0
except on x86_32, where we keep it as a variable, and remove all the arch-
specific definitions.
[bhelgaas: commit log]
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/r/20220722214944.831438-3-shorne@gmail.com
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
pci_get_legacy_ide_irq() is only used on platforms that support PNP, so
many architectures define it but never use it. Replace uses of it with
ATA_PRIMARY_IRQ() and ATA_SECONDARY_IRQ(), which provide the same
functionality.
Since pci_get_legacy_ide_irq() is no longer used, remove all the
architecture-specific definitions of it as well as asm-generic/pci.h, which
only provides pci_get_legacy_ide_irq()
[bhelgaas: commit log]
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20220722214944.831438-2-shorne@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
kernel test robot throws below warning ->
arch/ia64/include/asm/mmu_context.h: In function 'reload_context':
arch/ia64/include/asm/mmu_context.h:127:48: warning: variable
'old_rr4' set but not used [-Wunused-but-set-variable]
127 | unsigned long rr0, rr1, rr2, rr3, rr4, old_rr4;
Add it under CONFIG_HUGETLB_PAGE
Link: https://lkml.kernel.org/r/20220626022114.4020-1-jrdr.linux@gmail.com
Signed-off-by: Souptick Joarder (HPE) <jrdr.linux@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This enables ARCH_HAS_VM_GET_PAGE_PROT on the platform and exports
standard vm_get_page_prot() implementation via DECLARE_VM_GET_PAGE_PROT,
which looks up a private and static protection_map[] array. Subsequently
all __SXXX and __PXXX macros can be dropped which are no longer needed.
Link: https://lkml.kernel.org/r/20220711070600.2378316-20-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, there is a mess with the prototypes of the non-atomic
bitops across the different architectures:
ret bool, int, unsigned long
nr int, long, unsigned int, unsigned long
addr volatile unsigned long *, volatile void *
Thankfully, it doesn't provoke any bugs, but can sometimes make
the compiler angry when it's not handy at all.
Adjust all the prototypes to the following standard:
ret bool retval can be only 0 or 1
nr unsigned long native; signed makes no sense
addr volatile unsigned long * bitmaps are arrays of ulongs
Next, some architectures don't define 'arch_' versions as they don't
support instrumentation, others do. To make sure there is always the
same set of callables present and to ease any potential future
changes, make them all follow the rule:
* architecture-specific files define only 'arch_' versions;
* non-prefixed versions can be defined only in asm-generic files;
and place the non-prefixed definitions into a new file in
asm-generic to be included by non-instrumented architectures.
Finally, add some static assertions in order to prevent people from
making a mess in this room again.
I also used the %__always_inline attribute consistently, so that
they always get resolved to the actual operations.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
test_bit(), as any other bitmap op, takes `unsigned long *` as a
second argument (pointer to the actual bitmap), as any bitmap
itself is an array of unsigned longs. However, the ia64_get_irr()
code passes a ref to `u64` as a second argument.
This works with the ia64 bitops implementation due to that they
have `void *` as the second argument and then cast it later on.
This works with the bitmap API itself due to that `unsigned long`
has the same size on ia64 as `u64` (`unsigned long long`), but
from the compiler PoV those two are different.
Define @irr as `unsigned long` to fix that. That implies no
functional changes. Has been hidden for 16 years!
Fixes: a58786917c ("[IA64] avoid broken SAL_CACHE_FLUSH implementations")
Cc: stable@vger.kernel.org # 2.6.16+
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
All architecture-independent users of virt_to_bus() and bus_to_virt()
have been fixed to use the dma mapping interfaces or have been
removed now. This means the definitions on most architectures, and the
CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed.
The only exceptions to this are a few network and scsi drivers for m68k
Amiga and VME machines and ppc32 Macintosh. These drivers work correctly
with the old interfaces and are probably not worth changing.
On alpha and parisc, virt_to_bus() were still used in asm/floppy.h.
alpha can use isa_virt_to_bus() like x86 does, and parisc can just
open-code the virt_to_phys() here, as this is architecture specific
code.
I tried updating the bus-virt-phys-mapping.rst documentation, which
started as an email from Linus to explain some details of the Linux-2.0
driver interfaces. The bits about virt_to_bus() were declared obsolete
backin 2000, and the rest is not all that relevant any more, so in the
end I just decided to remove the file completely.
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
On IA64, new sparse's warnings where issued after fixing some __rcu
annotations in kernel/bpf/.
These new warnings are false positives and appear on IA64 because on this
architecture, the macros for cmpxchg() and xchg() make casts that ignore
sparse annotations.
This patch contains the minimal patch to fix this issue: adding a missing
cast and some missing '__force'.
Link: https://lore.kernel.org/r/20220601120013.bq5a3ynbkc3hngm5@mail
Link: https://lkml.kernel.org/r/20220605160738.79736-1-luc.vanoostenryck@gmail.com
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
of Peter Zijlstra was encountering with ptrace in his freezer rewrite
I identified some cleanups to ptrace_stop that make sense on their own
and move make resolving the other problems much simpler.
The biggest issue is the habbit of the ptrace code to change task->__state
from the tracer to suppress TASK_WAKEKILL from waking up the tracee. No
other code in the kernel does that and it is straight forward to update
signal_wake_up and friends to make that unnecessary.
Peter's task freezer sets frozen tasks to a new state TASK_FROZEN and
then it stores them by calling "wake_up_state(t, TASK_FROZEN)" relying
on the fact that all stopped states except the special stop states can
tolerate spurious wake up and recover their state.
The state of stopped and traced tasked is changed to be stored in
task->jobctl as well as in task->__state. This makes it possible for
the freezer to recover tasks in these special states, as well as
serving as a general cleanup. With a little more work in that
direction I believe TASK_STOPPED can learn to tolerate spurious wake
ups and become an ordinary stop state.
The TASK_TRACED state has to remain a special state as the registers for
a process are only reliably available when the process is stopped in
the scheduler. Fundamentally ptrace needs acess to the saved
register values of a task.
There are bunch of semi-random ptrace related cleanups that were found
while looking at these issues.
One cleanup that deserves to be called out is from commit 57b6de08b5
("ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs"). This
makes a change that is technically user space visible, in the handling
of what happens to a tracee when a tracer dies unexpectedly.
According to our testing and our understanding of userspace nothing
cares that spurious SIGTRAPs can be generated in that case.
The entire discussion can be found at:
https://lkml.kernel.org/r/87a6bv6dl6.fsf_-_@email.froward.int.ebiederm.org
Eric W. Biederman (11):
signal: Rename send_signal send_signal_locked
signal: Replace __group_send_sig_info with send_signal_locked
ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
ptrace: Remove arch_ptrace_attach
signal: Use lockdep_assert_held instead of assert_spin_locked
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
ptrace: Document that wait_task_inactive can't fail
ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
ptrace: Don't change __state
ptrace: Always take siglock in ptrace_resume
Peter Zijlstra (1):
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
arch/ia64/include/asm/ptrace.h | 4 --
arch/ia64/kernel/ptrace.c | 57 ----------------
arch/um/include/asm/thread_info.h | 2 +
arch/um/kernel/exec.c | 2 +-
arch/um/kernel/process.c | 2 +-
arch/um/kernel/ptrace.c | 8 +--
arch/um/kernel/signal.c | 4 +-
arch/x86/kernel/step.c | 3 +-
arch/xtensa/kernel/ptrace.c | 4 +-
arch/xtensa/kernel/signal.c | 4 +-
drivers/tty/tty_jobctrl.c | 4 +-
include/linux/ptrace.h | 7 --
include/linux/sched.h | 10 ++-
include/linux/sched/jobctl.h | 8 +++
include/linux/sched/signal.h | 20 ++++--
include/linux/signal.h | 3 +-
kernel/ptrace.c | 87 ++++++++---------------
kernel/sched/core.c | 5 +-
kernel/signal.c | 140 +++++++++++++++++---------------------
kernel/time/posix-cpu-timers.c | 6 +-
20 files changed, 140 insertions(+), 240 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmKaXaYACgkQC/v6Eiaj
j0CgoA/+JncSQ6PY2D5Jh1apvHzmnRsFXzr3DRvtv/CVx4oIebOXRQFyVDeD5tRn
TmMgB29HpBlHRDLojlmlZRGAld1HR/aPEW9j8W1D3Sy/ZFO5L8lQitv9aDHO9Ntw
4lZvlhS1M0KhATudVVBqSPixiG6CnV5SsGmixqdOyg7xcXSY6G1l2nB7Zk9I3Tat
ZlmhuZ6R5Z5qsm4MEq0vUSrnsHiGxYrpk6uQOaVz8Wkv8ZFmbutt6XgxF0tsyZNn
mHSmWSiZzIgBjTlaibEmxi8urYJTPj3vGBeJQVYHblFwLFi6+Oy7bDxQbWjQvaZh
DsgWPScfBF4Jm0+8hhCiSYpvPp8XnZuklb4LNCeok/VFr+KfSmpJTIhn00kagQ1u
vxQDqLws8YLW4qsfGydfx9uUIFCbQE/V2VDYk5J3Re3gkUNDOOR1A56hPniKv6VB
2aqGO2Fl0RdBbUa3JF+XI5Pwq5y1WrqR93EUvj+5+u5W9rZL/8WLBHBMEz6gbmfD
DhwFE0y8TG2WRlWJVEDRId+5zo3di/YvasH0vJZ5HbrxhS2RE/yIGAd+kKGx/lZO
qWDJC7IHvFJ7Mw5KugacyF0SHeNdloyBM7KZW6HeXmgKn9IMJBpmwib92uUkRZJx
D8j/bHHqD/zsgQ39nO+c4M0MmhO/DsPLG/dnGKrRCu7v1tmEnkY=
=ZUuO
-----END PGP SIGNATURE-----
Merge tag 'ptrace_stop-cleanup-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace_stop cleanups from Eric Biederman:
"While looking at the ptrace problems with PREEMPT_RT and the problems
Peter Zijlstra was encountering with ptrace in his freezer rewrite I
identified some cleanups to ptrace_stop that make sense on their own
and move make resolving the other problems much simpler.
The biggest issue is the habit of the ptrace code to change
task->__state from the tracer to suppress TASK_WAKEKILL from waking up
the tracee. No other code in the kernel does that and it is straight
forward to update signal_wake_up and friends to make that unnecessary.
Peter's task freezer sets frozen tasks to a new state TASK_FROZEN and
then it stores them by calling "wake_up_state(t, TASK_FROZEN)" relying
on the fact that all stopped states except the special stop states can
tolerate spurious wake up and recover their state.
The state of stopped and traced tasked is changed to be stored in
task->jobctl as well as in task->__state. This makes it possible for
the freezer to recover tasks in these special states, as well as
serving as a general cleanup. With a little more work in that
direction I believe TASK_STOPPED can learn to tolerate spurious wake
ups and become an ordinary stop state.
The TASK_TRACED state has to remain a special state as the registers
for a process are only reliably available when the process is stopped
in the scheduler. Fundamentally ptrace needs acess to the saved
register values of a task.
There are bunch of semi-random ptrace related cleanups that were found
while looking at these issues.
One cleanup that deserves to be called out is from commit 57b6de08b5
("ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs"). This
makes a change that is technically user space visible, in the handling
of what happens to a tracee when a tracer dies unexpectedly. According
to our testing and our understanding of userspace nothing cares that
spurious SIGTRAPs can be generated in that case"
* tag 'ptrace_stop-cleanup-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
ptrace: Always take siglock in ptrace_resume
ptrace: Don't change __state
ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
ptrace: Document that wait_task_inactive can't fail
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
signal: Use lockdep_assert_held instead of assert_spin_locked
ptrace: Remove arch_ptrace_attach
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
signal: Replace __group_send_sig_info with send_signal_locked
signal: Rename send_signal send_signal_locked
file-backed transparent hugepages.
Johannes Weiner has arranged for zswap memory use to be tracked and
managed on a per-cgroup basis.
Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for runtime
enablement of the recent huge page vmemmap optimization feature.
Baolin Wang contributes a series to fix some issues around hugetlb
pagetable invalidation.
Zhenwei Pi has fixed some interactions between hwpoisoned pages and
virtualization.
Tong Tiangen has enabled the use of the presently x86-only
page_table_check debugging feature on arm64 and riscv.
David Vernet has done some fixup work on the memcg selftests.
Peter Xu has taught userfaultfd to handle write protection faults against
shmem- and hugetlbfs-backed files.
More DAMON development from SeongJae Park - adding online tuning of the
feature and support for monitoring of fixed virtual address ranges. Also
easier discovery of which monitoring operations are available.
Nadav Amit has done some optimization of TLB flushing during mprotect().
Neil Brown continues to labor away at improving our swap-over-NFS support.
David Hildenbrand has some fixes to anon page COWing versus
get_user_pages().
Peng Liu fixed some errors in the core hugetlb code.
Joao Martins has reduced the amount of memory consumed by device-dax's
compound devmaps.
Some cleanups of the arch-specific pagemap code from Anshuman Khandual.
Muchun Song has found and fixed some errors in the TLB flushing of
transparent hugepages.
Roman Gushchin has done more work on the memcg selftests.
And, of course, many smaller fixes and cleanups. Notably, the customary
million cleanup serieses from Miaohe Lin.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo52xQAKCRDdBJ7gKXxA
jtJFAQD238KoeI9z5SkPMaeBRYSRQmNll85mxs25KapcEgWgGQD9FAb7DJkqsIVk
PzE+d9hEfirUGdL6cujatwJ6ejYR8Q8=
=nFe6
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Almost all of MM here. A few things are still getting finished off,
reviewed, etc.
- Yang Shi has improved the behaviour of khugepaged collapsing of
readonly file-backed transparent hugepages.
- Johannes Weiner has arranged for zswap memory use to be tracked and
managed on a per-cgroup basis.
- Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
runtime enablement of the recent huge page vmemmap optimization
feature.
- Baolin Wang contributes a series to fix some issues around hugetlb
pagetable invalidation.
- Zhenwei Pi has fixed some interactions between hwpoisoned pages and
virtualization.
- Tong Tiangen has enabled the use of the presently x86-only
page_table_check debugging feature on arm64 and riscv.
- David Vernet has done some fixup work on the memcg selftests.
- Peter Xu has taught userfaultfd to handle write protection faults
against shmem- and hugetlbfs-backed files.
- More DAMON development from SeongJae Park - adding online tuning of
the feature and support for monitoring of fixed virtual address
ranges. Also easier discovery of which monitoring operations are
available.
- Nadav Amit has done some optimization of TLB flushing during
mprotect().
- Neil Brown continues to labor away at improving our swap-over-NFS
support.
- David Hildenbrand has some fixes to anon page COWing versus
get_user_pages().
- Peng Liu fixed some errors in the core hugetlb code.
- Joao Martins has reduced the amount of memory consumed by
device-dax's compound devmaps.
- Some cleanups of the arch-specific pagemap code from Anshuman
Khandual.
- Muchun Song has found and fixed some errors in the TLB flushing of
transparent hugepages.
- Roman Gushchin has done more work on the memcg selftests.
... and, of course, many smaller fixes and cleanups. Notably, the
customary million cleanup serieses from Miaohe Lin"
* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
mm: kfence: use PAGE_ALIGNED helper
selftests: vm: add the "settings" file with timeout variable
selftests: vm: add "test_hmm.sh" to TEST_FILES
selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
selftests: vm: add migration to the .gitignore
selftests/vm/pkeys: fix typo in comment
ksm: fix typo in comment
selftests: vm: add process_mrelease tests
Revert "mm/vmscan: never demote for memcg reclaim"
mm/kfence: print disabling or re-enabling message
include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
mm: fix a potential infinite loop in start_isolate_page_range()
MAINTAINERS: add Muchun as co-maintainer for HugeTLB
zram: fix Kconfig dependency warning
mm/shmem: fix shmem folio swapoff hang
cgroup: fix an error handling path in alloc_pagecache_max_30M()
mm: damon: use HPAGE_PMD_SIZE
tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
nodemask.h: fix compilation error with GCC12
...
- don't over-decrypt memory (Robin Murphy)
- takes min align mask into account for the swiotlb max mapping size
(Tianyu Lan)
- use GFP_ATOMIC in dma-debug (Mikulas Patocka)
- fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me)
- don't fail on highmem CMA pages in dma_direct_alloc_pages (me)
- cleanup swiotlb initialization and share more code with swiotlb-xen
(me, Stefano Stabellini)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmKObTQLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYObmA//dIcDB/q4iFGD+WJh4MhM+asx0ZsdF2OJz42WEhgT
Z9duOrgcneEQundCamqJP9rNTs980LHDA8uWQC5rZEc9vxuRVOdS7bSgYRUwWh6B
r0ZjOsvQCn+ChoZML8uyk4rfmEINq+EvJuec3G5fgecZOhPuJS2i2uzzv5cHwqgP
ChC0fwyZlkfdECXgvZXbEoCJLfTgGNlziN6Ai8dirSoqgEQUoCsY89/M7OiEBvV2
R4XUWD7OvQERfB4t6xLuUHyzf9PAuWB+OiblRVNeAmK3lMjxVrc3k4kIowgklnzD
8hfmphAa9Zou3zdfi6Gd4fiQRHRVOwKVp1rtqUmJ+lPSiwyMzu64z9ld2+2qac0h
V4sSr/yJkhxnBT4/0MkTChvhnRobisackpUzNRpiM4ck7cNVb7eAvkISsbH+pWI9
aEexPhbyskjlV+GOyM4QL4ygG0dpXY0HSyoh6uaSVsaXMycnWIsJCPidXxV1HGV0
q2/RLHuHwYxia8cYCF01/DQvwOKSjwbU0zModxtRezGD5GYh2C0a+SrA1aX+qiTu
yGJCs2UHtSQstAt78tTVp499YeDeL/oGSQkPAu8zyRkSczzF+CncGTuXyoJbAWyK
otcgERWljgZ4scxjfu1uacfoVhKQ7nOu7hiJokL0U80FESAennLC3ZlocvB9h/ff
HNA=
=n2rk
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- don't over-decrypt memory (Robin Murphy)
- takes min align mask into account for the swiotlb max mapping size
(Tianyu Lan)
- use GFP_ATOMIC in dma-debug (Mikulas Patocka)
- fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me)
- don't fail on highmem CMA pages in dma_direct_alloc_pages (me)
- cleanup swiotlb initialization and share more code with swiotlb-xen
(me, Stefano Stabellini)
* tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping: (23 commits)
dma-direct: don't over-decrypt memory
swiotlb: max mapping size takes min align mask into account
swiotlb: use the right nslabs-derived sizes in swiotlb_init_late
swiotlb: use the right nslabs value in swiotlb_init_remap
swiotlb: don't panic when the swiotlb buffer can't be allocated
dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC
dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages
swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm
x86: remove cruft from <asm/dma-mapping.h>
swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl
swiotlb: merge swiotlb-xen initialization into swiotlb
swiotlb: provide swiotlb_init variants that remap the buffer
swiotlb: pass a gfp_mask argument to swiotlb_init_late
swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction
swiotlb: make the swiotlb_init interface more useful
x86: centralize setting SWIOTLB_FORCE when guest memory encryption is enabled
x86: remove the IOMMU table infrastructure
MIPS/octeon: use swiotlb_init instead of open coding it
arm/xen: don't check for xen_initial_domain() in xen_create_contiguous_region
swiotlb: rename swiotlb_late_init_with_default_size
...
Patch series "Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating", v4.
presently, migrating a hugetlb page or unmapping a poisoned hugetlb page,
we'll use ptep_clear_flush() and set_pte_at() to nuke the page table entry
and remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb
page, which will cause potential data consistent issue. This patch set
will change to use hugetlb related APIs to fix this issue.
Note: Mike pointed out the huge_ptep_get() will only return the one
specific value, and it would not take into account the dirty or young bits
of CONT-PTE/PMDs like the huge_ptep_get_and_clear() [1]. This
inconsistent issue is not introduced by this patch set, and this issue
will be addressed in another thread [2]. Meanwhile the uffd for hugetlb
case [3] pointed out by Gerald also needs another patch to address.
[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f387@oracle.com/
[2] https://lore.kernel.org/all/cover.1651998586.git.baolin.wang@linux.alibaba.com/
[3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/
This patch (of 3):
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table
when unmapping or migrating a hugetlb page, and will change to use
huge_ptep_clear_flush() instead in the following patches.
So this is a preparation patch, which changes the huge_ptep_clear_flush()
to return the original pte to help to nuke a hugetlb page table.
[baolin.wang@linux.alibaba.com: fix build in several more architectures]
Link: https://lkml.kernel.org/r/0009a4cd-2826-e8be-e671-f050d4f18d5d@linux.alibaba.com
[sfr@canb.auug.org.au: fixup]
Link: https://lkml.kernel.org/r/20220511181531.7f27a5c1@canb.auug.org.au
Link: https://lkml.kernel.org/r/cover.1652270205.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/20f77ddab90baa249bd24504c413189b82acde69.1652270205.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/cover.1652147571.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/dcf065868cce35bceaf138613ad27f17bb7c0c19.1652147571.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Cc: Rich Felker <dalias@libc.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Itanium defines a get_cycles() function, but it does not do the usual
`#define get_cycles get_cycles` dance, making it impossible for generic
code to see if an arch-specific function was defined. While the
get_cycles() ifdef is not currently used, the following timekeeping
patch in this series will depend on the macro existing (or not existing)
when defining random_get_entropy().
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The last remaining implementation of arch_ptrace_attach is ia64's
ptrace_attach_sync_user_rbs which was added at the end of 2007 in
commit aa91a2e900 ("[IA64] Synchronize RBS on PTRACE_ATTACH").
Reading the comments and examining the code ptrace_attach_sync_user_rbs
has the sole purpose of saving registers to the stack when ptrace_attach
changes TASK_STOPPED to TASK_TRACED. In all other cases arch_ptrace_stop
takes care of the register saving.
In commit d79fdd6d96 ("ptrace: Clean transitions between TASK_STOPPED and TRACED")
modified ptrace_attach to wake up the thread and enter ptrace_stop normally even
when the thread starts out stopped.
This makes ptrace_attach_sync_user_rbs completely unnecessary. So just
remove it.
I read through the code to verify that ptrace_attach_sync_user_rbs is
unnecessary. What I found is that the code is quite dead.
Reading ptrace_attach_sync_user_rbs it is easy to see that the it does
nothing unless __state == TASK_STOPPED.
Calling arch_ptrace_attach (aka ptrace_attach_sync_user_rbs) after
ptrace_traceme it is easy to see that because we are talking about the
current process the value of __state is TASK_RUNNING. Which means
ptrace_attach_sync_user_rbs does nothing.
The only other call of arch_ptrace_attach (aka
ptrace_attach_sync_user_rbs) is after ptrace_attach.
If the task is running (and PTRACE_SEIZE is not specified), a SIGSTOP
is sent which results in do_signal_stop setting JOBCTL_TRAP_STOP on
the target task (as it is ptraced) and the target task stopping
in ptrace_stop with __state == TASK_TRACED.
If the task was already stopped then ptrace_attach sets
JOBCTL_TRAPPING and JOBCTL_TRAP_STOP, wakes it out of __TASK_STOPPED,
and waits until the JOBCTL_TRAPPING_BIT is clear. At which point
the task stops in ptrace_stop.
In both cases there are a couple of funning excpetions such as if the
traced task receiveds a SIGCONT, or is set a fatal signal.
However in all of those cases the tracee never stops in __state
TASK_STOPPED. Which is a long way of saying that ptrace_attach_sync_user_rbs
is guaranteed never to do anything.
Cc: linux-ia64@vger.kernel.org
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-4-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The IOMMU table tries to separate the different IOMMUs into different
backends, but actually requires various cross calls.
Rewrite the code to do the generic swiotlb/swiotlb-xen setup directly
in pci-dma.c and then just call into the IOMMU drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Pull vfs updates from Al Viro:
"Assorted bits and pieces"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: drop needless assignment in aio_read()
clean overflow checks in count_mounts() a bit
seq_file: fix NULL pointer arithmetic warning
uml/x86: use x86 load_unaligned_zeropad()
asm/user.h: killed unused macros
constify struct path argument of finish_automount()/do_add_mount()
fs: Remove FIXME comment in generic_write_checks()
Here are the big set of tty and serial driver changes for 5.18-rc1.
Nothing major, some more good cleanups from Jiri and 2 new serial
drivers. Highlights include:
- termbits cleanups
- export symbol cleanups and other core cleanups from Jiri Slaby
- new sunplus and mvebu uart drivers (amazing that people are
still creating new uarts...)
- samsung serial driver cleanups
- ldisc 29 is now "reserved" for experimental/development line
disciplines
- lots of other tiny fixes and cleanups to serial drivers and
bindings
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYkGznQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymnFwCgwGD/syV+BH2krgY6cRixZz72vPsAn2RSnicd
2YUwSNCHoL+B7hvQMtDG
=A3X9
-----END PGP SIGNATURE-----
Merge tag 'tty-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here are the big set of tty and serial driver changes for 5.18-rc1.
Nothing major, some more good cleanups from Jiri and 2 new serial
drivers. Highlights include:
- termbits cleanups
- export symbol cleanups and other core cleanups from Jiri Slaby
- new sunplus and mvebu uart drivers (amazing that people are still
creating new uarts...)
- samsung serial driver cleanups
- ldisc 29 is now "reserved" for experimental/development line
disciplines
- lots of other tiny fixes and cleanups to serial drivers and
bindings
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (104 commits)
vt_ioctl: fix potential spectre v1 in VT_DISALLOCATE
serial: 8250: fix XOFF/XON sending when DMA is used
tty: serial: samsung: Add ARTPEC-8 support
dt-bindings: serial: samsung: Add ARTPEC-8 UART
serial: sc16is7xx: Clear RS485 bits in the shutdown
tty: serial: samsung: simplify getting OF match data
tty: serial: samsung: constify variables and pointers
tty: serial: samsung: constify s3c24xx_serial_drv_data members
tty: serial: samsung: constify UART name
tty: serial: samsung: constify s3c24xx_serial_drv_data
tty: serial: samsung: reduce number of casts
tty: serial: samsung: embed s3c2410_uartcfg in parent structure
tty: serial: samsung: embed s3c24xx_uart_info in parent structure
serial: 8250_tegra: mark acpi_device_id as unused with !ACPI
tty: serial: bcm63xx: use more precise Kconfig symbol
serial: SERIAL_SUNPLUS should depend on ARCH_SUNPLUS
tty: serial: jsm: fix two assignments in if conditions
tty: serial: jsm: remove redundant assignments to variable linestatus
serial: 8250_mtk: make two read-only arrays static const
serial: samsung_tty: do not unlock port->lock for uart_write_wakeup()
...
- Enforce kernel RO, and implement STRICT_MODULE_RWX for 603.
- Add support for livepatch to 32-bit.
- Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
- Merge vdso64 and vdso32 into a single directory.
- Fix build errors with newer binutils.
- Add support for UADDR64 relocations, which are emitted by some toolchains. This allows
powerpc to build with the latest lld.
- Fix (another) potential userspace r13 corruption in transactional memory handling.
- Cleanups of function descriptor handling & related fixes to LKDTM.
Thanks to: Abdul Haleem, Alexey Kardashevskiy, Anders Roxell, Aneesh Kumar K.V, Anton
Blanchard, Arnd Bergmann, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chen
Jingwen, Christophe JAILLET, Christophe Leroy, Corentin Labbe, Daniel Axtens, Daniel
Henrique Barboza, David Dai, Fabiano Rosas, Ganesh Goudar, Guo Zhengkui, Hangyu Hua, Haren
Myneni, Hari Bathini, Igor Zhbanov, Jakob Koschel, Jason Wang, Jeremy Kerr, Joachim
Wiberg, Jordan Niethe, Julia Lawall, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan
Srinivasan, Mamatha Inamdar, Maxime Bizon, Maxim Kiselev, Maxim Kochetkov, Michal
Suchanek, Nageswara R Sastry, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nour-eddine
Taleb, Paul Menzel, Ping Fang, Pratik R. Sampat, Randy Dunlap, Ritesh Harjani, Rohan
McLure, Russell Currey, Sachin Sant, Segher Boessenkool, Shivaprasad G Bhat, Sourabh Jain,
Thierry Reding, Tobias Waldekranz, Tyrel Datwyler, Vaibhav Jain, Vladimir Oltean, Wedson
Almeida Filho, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmI9TtQTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLp2D/0dwoliEJubRCfoawYUGhxTRZuo6ZYw
EQzprOiFA/MtrZyPfbrX/FwxeeetzQWysaw2r5JAuQwx5Jb7Od9dNIrVmueFEktC
hD4fkO8YT+QuOD3Xhp/rDQTImdw4fkeofIjnWIqEAtz0XGInmiRQKOnojVe/Po7f
72Yi1u80LxYBAnkN/Hhpmi/BsVmu0Nh3wELu+JZopQXjINj4RyD49ayCBSLbmiNc
uo7oYzJ0/WsZHNTpX9kAzzCq+XmI3dKZPyf2AOCvoRxJTmUPCRZF9QCwsmQFikiI
vZOdz4fI5e+C0aYJj8ODmWMrXiS+JUQdEShjGg9t9K6EN8idC8joKWpAuXjTA9KN
kRjzXX7AvjxaMEGbLe8gjU0PmEjY3eSzMOy15Oc/C0DRRswXRzrXdx2AF+/J6bQb
MWMM4aCKfcYs5/TENkEnV0xpbOCOy4ikHM1KZbxvVrShvjSlNIL9XTOnl/pNK5BJ
XSSI2mfnjKkbI1+l0KQ4NBXIRTo6HLpu5jwY3Xh97Tq7kaEfqDbO5p2P2HoOCiLa
ZjdzmpP99zM6wnqUSj+lyvjob7btyhoq6TKmPtxfKbR6OaSfRJ760BCJ5y15Y9Hc
rHey4Y/NL7LqsVYFZxi4/T6Ncq1hNeYr2Fiis4gH+/1zjr6Cd4othnvw3Slaxhst
AaHpN3pyx1QI6g==
=8r2c
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Livepatch support for 32-bit is probably the standout new feature,
otherwise mostly just lots of bits and pieces all over the board.
There's a series of commits cleaning up function descriptor handling,
which touches a few other arches as well as LKDTM. It has acks from
Arnd, Kees and Helge.
Summary:
- Enforce kernel RO, and implement STRICT_MODULE_RWX for 603.
- Add support for livepatch to 32-bit.
- Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
- Merge vdso64 and vdso32 into a single directory.
- Fix build errors with newer binutils.
- Add support for UADDR64 relocations, which are emitted by some
toolchains. This allows powerpc to build with the latest lld.
- Fix (another) potential userspace r13 corruption in transactional
memory handling.
- Cleanups of function descriptor handling & related fixes to LKDTM.
Thanks to Abdul Haleem, Alexey Kardashevskiy, Anders Roxell, Aneesh
Kumar K.V, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Bhaskar
Chowdhury, Cédric Le Goater, Chen Jingwen, Christophe JAILLET,
Christophe Leroy, Corentin Labbe, Daniel Axtens, Daniel Henrique
Barboza, David Dai, Fabiano Rosas, Ganesh Goudar, Guo Zhengkui, Hangyu
Hua, Haren Myneni, Hari Bathini, Igor Zhbanov, Jakob Koschel, Jason
Wang, Jeremy Kerr, Joachim Wiberg, Jordan Niethe, Julia Lawall, Kajol
Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mamatha Inamdar,
Maxime Bizon, Maxim Kiselev, Maxim Kochetkov, Michal Suchanek,
Nageswara R Sastry, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
Nour-eddine Taleb, Paul Menzel, Ping Fang, Pratik R. Sampat, Randy
Dunlap, Ritesh Harjani, Rohan McLure, Russell Currey, Sachin Sant,
Segher Boessenkool, Shivaprasad G Bhat, Sourabh Jain, Thierry Reding,
Tobias Waldekranz, Tyrel Datwyler, Vaibhav Jain, Vladimir Oltean,
Wedson Almeida Filho, and YueHaibing"
* tag 'powerpc-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits)
powerpc/pseries: Fix use after free in remove_phb_dynamic()
powerpc/time: improve decrementer clockevent processing
powerpc/time: Fix KVM host re-arming a timer beyond decrementer range
powerpc/tm: Fix more userspace r13 corruption
powerpc/xive: fix return value of __setup handler
powerpc/64: Add UADDR64 relocation support
powerpc: 8xx: fix a return value error in mpc8xx_pic_init
powerpc/ps3: remove unneeded semicolons
powerpc/64: Force inlining of prevent_user_access() and set_kuap()
powerpc/bitops: Force inlining of fls()
powerpc: declare unmodified attribute_group usages const
powerpc/spufs: Fix build warning when CONFIG_PROC_FS=n
powerpc/secvar: fix refcount leak in format_show()
powerpc/64e: Tie PPC_BOOK3E_64 to PPC_FSL_BOOK3E
powerpc: Move C prototypes out of asm-prototypes.h
powerpc/kexec: Declare kexec_paca static
powerpc/smp: Declare current_set static
powerpc: Cleanup asm-prototypes.c
powerpc/ftrace: Use STK_GOT in ftrace_mprofile.S
powerpc/ftrace: Regroup PPC64 specific operations in ftrace_mprofile.S
...
Hi Linus,
Please, pull the following treewide patch that replaces zero-length arrays with
flexible-array members. This patch has been baking in linux-next for a
whole development cycle.
Thanks
--
Gustavo
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmI6GIUACgkQRwW0y0cG
2zFLWw/+OB1gZeQD3boKpUMntWnn6wjhUxdrO8CYkpzG+B+8TFECXNjy8HV1CSiw
GKKRndYELOyYaD5o/F2vtPe10iPHbrdIlMFRPBRoht0/cvSZgzHlfT8EjWQwerYY
dieztUFKjeSj0MXivdNDnKOTm8o9cz8KmCrWFP+My37Fasn/9+nBX8iNVIvAX4xy
T+IVmjtDifQUsTs298UGnBvDeuZOiGHhXXU5rq6lIX0Rl554OsWZW94d6jUPj/h7
t1v6jdojNuyaMKn45/xnPj9VvmDiSu3K67m3fjRdzLPDOhISjr2fw4KEUOKdsebh
yJ9t5u8IufyPbm9kyI+rZt+T8ZlV2/qt2+mt6QgtDMnWrs+4nU15JY0SHImMSBZQ
rBEZcQlrIcGJ+CsNB8Y7jIGYO0SSkhodAvfl0LRA0AbTqLGqq0OkAQS5D52r3H2r
uz6xdYb7kG43XaRyaAIPqhZsp/jk2NrXvEvin2tSaXZFR1cxp+oxcV2UajmnOU6i
EIBS4PzJnYx2RZRa+h8YbBa/+D4N6+fj/tjmwBawiUBPjjaLAsGFNwUHqvBoD05S
bk6oXi654NBwVjsknZ0grVz0TtSvdZ3uJL5FZApTOHITqH8vlxlNefmHri4vZRZO
NN7NIQ0yaUCnorzMg+vP8ZtflhQwrMJbjwIS9YD0RHd7MBhYX8k=
=xZD2
-----END PGP SIGNATURE-----
Merge tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull flexible-array transformations from Gustavo Silva:
"Treewide patch that replaces zero-length arrays with flexible-array
members.
This has been baking in linux-next for a whole development cycle"
* tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
treewide: Replace zero-length arrays with flexible-array members
There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good. This
was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly
tricky and error-prone code.
There is a small merge conflict against a parisc cleanup, the
solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel. The
hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
There are some obvious conflicts against changes to the removed
files.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA
GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl
IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe
cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6
sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo
DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG
G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims
a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY
ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89
QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO
CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE
h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw=
=vtCN
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good.
This was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly tricky
and error-prone code. There is a small merge conflict against a
parisc cleanup, the solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel.
The hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks"
* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
nds32: Remove the architecture
uaccess: remove CONFIG_SET_FS
ia64: remove CONFIG_SET_FS support
sh: remove CONFIG_SET_FS support
sparc64: remove CONFIG_SET_FS support
lib/test_lockup: fix kernel pointer check for separate address spaces
uaccess: generalize access_ok()
uaccess: fix type mismatch warnings from access_ok()
arm64: simplify access_ok()
m68k: fix access_ok for coldfire
MIPS: use simpler access_ok()
MIPS: Handle address errors for accesses above CPU max virtual user address
uaccess: add generic __{get,put}_kernel_nofault
nios2: drop access_ok() check from __put_user()
x86: use more conventional access_ok() definition
x86: remove __range_not_ok()
sparc64: add __{get,put}_kernel_nofault()
nds32: fix access_ok() checks in get/put_user
uaccess: fix nios2 and microblaze get_user_8()
sparc64: fix building assembly files
...
- Rewrite how munlock works to massively reduce the contention
on i_mmap_rwsem (Hugh Dickins):
https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
- Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig):
https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
- Convert GUP to use folios and make pincount available for order-1
pages. (Matthew Wilcox)
- Convert a few more truncation functions to use folios (Matthew Wilcox)
- Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox)
- Convert rmap_walk to use folios (Matthew Wilcox)
- Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
- Add support for creating large folios in readahead (Matthew Wilcox)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4ucgACgkQDpNsjXcp
gj69Wgf6AwqwmO5Tmy+fLScDPqWxmXJofbocae1kyoGHf7Ui91OK4U2j6IpvAr+g
P/vLIK+JAAcTQcrSCjymuEkf4HkGZOR03QQn7maPIEe4eLrZRQDEsmHC1L9gpeJp
s/GMvDWiGE0Tnxu0EOzfVi/yT+qjIl/S8VvqtCoJv1HdzxitZ7+1RDuqImaMC5MM
Qi3uHag78vLmCltLXpIOdpgZhdZexCdL2Y/1npf+b6FVkAJRRNUnA0gRbS7YpoVp
CbxEJcmAl9cpJLuj5i5kIfS9trr+/QcvbUlzRxh4ggC58iqnmF2V09l2MJ7YU3XL
v1O/Elq4lRhXninZFQEm9zjrri7LDQ==
=n9Ad
-----END PGP SIGNATURE-----
Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache
Pull folio updates from Matthew Wilcox:
- Rewrite how munlock works to massively reduce the contention on
i_mmap_rwsem (Hugh Dickins):
https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
- Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
Hellwig):
https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
- Convert GUP to use folios and make pincount available for order-1
pages. (Matthew Wilcox)
- Convert a few more truncation functions to use folios (Matthew
Wilcox)
- Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
Wilcox)
- Convert rmap_walk to use folios (Matthew Wilcox)
- Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
- Add support for creating large folios in readahead (Matthew Wilcox)
* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
mm/damon: minor cleanup for damon_pa_young
selftests/vm/transhuge-stress: Support file-backed PMD folios
mm/filemap: Support VM_HUGEPAGE for file mappings
mm/readahead: Switch to page_cache_ra_order
mm/readahead: Align file mappings for non-DAX
mm/readahead: Add large folio readahead
mm: Support arbitrary THP sizes
mm: Make large folios depend on THP
mm: Fix READ_ONLY_THP warning
mm/filemap: Allow large folios to be added to the page cache
mm: Turn can_split_huge_page() into can_split_folio()
mm/vmscan: Convert pageout() to take a folio
mm/vmscan: Turn page_check_references() into folio_check_references()
mm/vmscan: Account large folios correctly
mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
mm/vmscan: Free non-shmem folios without splitting them
mm/rmap: Constify the rmap_walk_control argument
mm/rmap: Convert rmap_walk() to take a folio
mm: Turn page_anon_vma() into folio_anon_vma()
mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
...
We need to use this function in common code, so define it for
architectures and/or configrations that miss it. The result of
pmd_pfn() will only be used if TRANSPARENT_HUGEPAGE is enabled,
but a function or macro called pmd_pfn() must be defined, even
on machines with two level page tables.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Christoph Hellwig and a few others spent a huge effort on removing
set_fs() from most of the important architectures, but about half the
other architectures were never completed even though most of them don't
actually use set_fs() at all.
I did a patch for microblaze at some point, which turned out to be fairly
generic, and now ported it to most other architectures, using new generic
implementations of access_ok() and __{get,put}_kernel_nocheck().
Three architectures (sparc64, ia64, and sh) needed some extra work,
which I also completed.
* 'set_fs-4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
uaccess: remove CONFIG_SET_FS
ia64: remove CONFIG_SET_FS support
sh: remove CONFIG_SET_FS support
sparc64: remove CONFIG_SET_FS support
lib/test_lockup: fix kernel pointer check for separate address spaces
uaccess: generalize access_ok()
uaccess: fix type mismatch warnings from access_ok()
arm64: simplify access_ok()
m68k: fix access_ok for coldfire
MIPS: use simpler access_ok()
MIPS: Handle address errors for accesses above CPU max virtual user address
uaccess: add generic __{get,put}_kernel_nofault
nios2: drop access_ok() check from __put_user()
x86: use more conventional access_ok() definition
x86: remove __range_not_ok()
sparc64: add __{get,put}_kernel_nofault()
nds32: fix access_ok() checks in get/put_user
uaccess: fix nios2 and microblaze get_user_8()
uaccess: fix integer overflow on access_ok()
ia64 only uses set_fs() in one file to handle unaligned access for
both user space and kernel instructions. Rewrite this to explicitly
pass around a flag about which one it is and drop the feature from
the architecture.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
There are many different ways that access_ok() is defined across
architectures, but in the end, they all just compare against the
user_addr_max() value or they accept anything.
Provide one definition that works for most architectures, checking
against TASK_SIZE_MAX for user processes or skipping the check inside
of uaccess_kernel() sections.
For architectures without CONFIG_SET_FS(), this should be the fastest
check, as it comes down to a single comparison of a pointer against a
compile-time constant, while the architecture specific versions tend to
do something more complex for historic reasons or get something wrong.
Type checking for __user annotations is handled inconsistently across
architectures, but this is easily simplified as well by using an inline
function that takes a 'const void __user *' argument. A handful of
callers need an extra __user annotation for this.
Some architectures had trick to use 33-bit or 65-bit arithmetic on the
addresses to calculate the overflow, however this simpler version uses
fewer registers, which means it can produce better object code in the
end despite needing a second (statically predicted) branch.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64, asm-generic]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Provide a generic alloc_thread_stack_node() for IA64 and
CONFIG_ARCH_THREAD_STACK_ALLOCATOR which returns stack pointer and sets
task_struct::stack so it behaves exactly like the other implementations.
Rename IA64's alloc_thread_stack_node() and add the generic version to the
fork code so it is in one place _and_ to drastically lower the chances of
fat fingering the IA64 code. Do the same for free_thread_stack().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20220217102406.3697941-4-bigeasy@linutronix.de
There is a regular need in the kernel to provide a way to declare
having a dynamically sized set of trailing elements in a structure.
Kernel code should always use “flexible array members”[1] for these
cases. The older style of one-element or zero-length arrays should
no longer be used[2].
This code was transformed with the help of Coccinelle:
(next-20220214$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch)
@@
identifier S, member, array;
type T1, T2;
@@
struct S {
...
T1 member;
T2 array[
- 0
];
};
UAPI and wireless changes were intentionally excluded from this patch
and will be sent out separately.
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://github.com/KSPP/linux/issues/78
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
linux/signal.h and asm/signal.h are currently excluded from the UAPI
compile-test because of the errors like follows:
HDRTEST usr/include/asm/signal.h
In file included from <command-line>:
./usr/include/asm/signal.h:103:9: error: unknown type name ‘size_t’
103 | size_t ss_size;
| ^~~~~~
The errors can be fixed by replacing size_t with __kernel_size_t.
Then, remove the no-header-test entries from user/include/Makefile.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
dereference_function_descriptor() and
dereference_kernel_function_descriptor() are identical on the
three architectures implementing them.
Make them common and put them out-of-line in kernel/extable.c
which is one of the users and has similar type of functions.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/449db09b2eba57f4ab05f80102a67d8675bc8bcd.1644928018.git.christophe.leroy@csgroup.eu
We have three architectures using function descriptors, each with its
own type and name.
Add a common typedef that can be used in generic code.
Also add a stub typedef for architecture without function descriptors,
to avoid a forest of #ifdefs.
It replaces the similar 'func_desc_t' previously defined in
arch/powerpc/kernel/module_64.c
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f1f91b142b3c1082bdc1586ce71c9bac1e75213c.1644928018.git.christophe.leroy@csgroup.eu
Replace HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR by a config option
named CONFIG_HAVE_FUNCTION_DESCRIPTORS and use it instead of
'dereference_function_descriptor' macro to know whether an
arch has function descriptors.
To limit churn in one of the following patches, use
an #ifdef/#else construct with empty first part
instead of an #ifndef in asm-generic/sections.h
On powerpc, make sure the config option matches the ABI used
by the compiler with a BUILD_BUG_ON() and add missing _CALL_ELF=2
when calling 'sparse' so that sparse sees the same piece of
code as GCC.
And include a helper to check whether an arch has function
descriptors or not : have_function_descriptors()
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4a0f11fb0ea74a3197bc44dd7ba25e53a24fd03d.1644928018.git.christophe.leroy@csgroup.eu
There are three architectures with function descriptors, try to
have common names for the address they contain in order to
refactor some functions into generic functions later.
powerpc has 'entry'
ia64 has 'ip'
parisc has 'addr'
Vote for 'addr' and update 'struct fdesc' accordingly.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/65b73ac614e4c002c5819d40b42f6f426d2ee52b.1644928018.git.christophe.leroy@csgroup.eu
Modern compilers are perfectly capable of extracting parallelism from
the XOR routines, provided that the prototypes reflect the nature of the
input accurately, in particular, the fact that the input vectors are
expected not to overlap. This is not documented explicitly, but is
implied by the interchangeability of the various C routines, some of
which use temporary variables while others don't: this means that these
routines only behave identically for non-overlapping inputs.
So let's decorate these input vectors with the __restrict modifier,
which informs the compiler that there is no overlap. While at it, make
the input-only vectors pointer-to-const as well.
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/563
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Some of them used to be used by libbfd for a.out coredump handling.
Seeing that
* libbfd has their copies anyway
* we don't export them into userland headers
* we don't support a.out coredumps anymore
let's bury the definitions. They never had in-kernel
users anyway...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iQHJBAABCgAzFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmHi+xgVHHl1cnkubm9y
b3ZAZ21haWwuY29tAAoJELFEgP06H77IxdoMAMf3E+L51Ys/4iAiyJQNVoT3aIBC
A8ZVOB9he1OA3o3wBNIRKmICHk+ovnfCWcXTr9fG/Ade2wJz88NAsGPQ1Phywb+s
iGlpySllFN72RT9ZqtJhLEzgoHHOL0CzTW07TN9GJy4gQA2h2G9CTP+OmsQdnVqE
m9Fn3PSlJ5lhzePlKfnln8rGZFgrriJakfEFPC79n/7an4+2Hvkb5rWigo7KQc4Z
9YNqYUcHWZFUgq80adxEb9LlbMXdD+Z/8fCjOrAatuwVkD4RDt6iKD0mFGjHXGL7
MZ9KRS8AfZXawmetk3jjtsV+/QkeS+Deuu7k0FoO0Th2QV7BGSDhsLXAS5By/MOC
nfSyHhnXHzCsBMyVNrJHmNhEZoN29+tRwI84JX9lWcf/OLANcCofnP6f2UIX7tZY
CAZAgVELp+0YQXdybrfzTQ8BT3TinjS/aZtCrYijRendI1GwUXcyl69vdOKqAHuk
5jy8k/xHyp+ZWu6v+PyAAAEGowY++qhL0fmszA==
=RKW4
-----END PGP SIGNATURE-----
Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- introduce for_each_set_bitrange()
- use find_first_*_bit() instead of find_next_*_bit() where possible
- unify for_each_bit() macros
* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
vsprintf: rework bitmap_list_string
lib: bitmap: add performance test for bitmap_print_to_pagebuf
bitmap: unify find_bit operations
mm/percpu: micro-optimize pcpu_is_populated()
Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
find: micro-optimize for_each_{set,clear}_bit()
include/linux: move for_each_bit() macros from bitops.h to find.h
cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
tools: sync tools/bitmap with mother linux
all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
cpumask: use find_first_and_bit()
lib: add find_first_and_bit()
arch: remove GENERIC_FIND_FIRST_BIT entirely
include: move find.h from asm_generic to linux
bitops: move find_bit_*_le functions from le.h to find.h
bitops: protect find_first_{,zero}_bit properly
find_bit API and bitmap API are closely related, but inclusion paths
are different - include/asm-generic and include/linux, correspondingly.
In the past it made a lot of troubles due to circular dependencies
and/or undefined symbols. Fix this by moving find.h under include/linux.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
The printk header file includes ratelimit_types.h for its __ratelimit()
based usage. It is required for the static initializer used in
printk_ratelimited(). It uses a raw_spinlock_t and includes the
spinlock_types.h.
PREEMPT_RT substitutes spinlock_t with a rtmutex based implementation and so
its spinlock_t implmentation (provided by spinlock_rt.h) includes rtmutex.h and
atomic.h which leads to recursive includes where defines are missing.
By including only the raw_spinlock_t defines it avoids the atomic.h
related includes at this stage.
An example on powerpc:
| CALL scripts/atomic/check-atomics.sh
|In file included from include/linux/bug.h:5,
| from include/linux/page-flags.h:10,
| from kernel/bounds.c:10:
|arch/powerpc/include/asm/page_32.h: In function âclear_pageâ:
|arch/powerpc/include/asm/bug.h:87:4: error: implicit declaration of function â=80=98__WARNâ=80=99 [-Werror=3Dimplicit-function-declaration]
| 87 | __WARN(); \
| | ^~~~~~
|arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro âWARN_ONâ=99
| 48 | WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1));
| | ^~~~~~~
|arch/powerpc/include/asm/bug.h:58:17: error: invalid application of âsizeofâ=99 to incomplete type âstruct bug_entryâ=99
| 58 | "i" (sizeof(struct bug_entry)), \
| | ^~~~~~
|arch/powerpc/include/asm/bug.h:89:3: note: in expansion of macro âBUG_ENTRYâ=99
| 89 | BUG_ENTRY(PPC_TLNEI " %4, 0", \
| | ^~~~~~~~~
|arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro âWARN_ONâ=99
| 48 | WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1));
| | ^~~~~~~
|In file included from arch/powerpc/include/asm/ptrace.h:298,
| from arch/powerpc/include/asm/hw_irq.h:12,
| from arch/powerpc/include/asm/irqflags.h:12,
| from include/linux/irqflags.h:16,
| from include/asm-generic/cmpxchg-local.h:6,
| from arch/powerpc/include/asm/cmpxchg.h:526,
| from arch/powerpc/include/asm/atomic.h:11,
| from include/linux/atomic.h:7,
| from include/linux/rwbase_rt.h:6,
| from include/linux/rwlock_types.h:55,
| from include/linux/spinlock_types.h:74,
| from include/linux/ratelimit_types.h:7,
| from include/linux/printk.h:10,
| from include/asm-generic/bug.h:22,
| from arch/powerpc/include/asm/bug.h:109,
| from include/linux/bug.h:5,
| from include/linux/page-flags.h:10,
| from kernel/bounds.c:10:
|include/linux/thread_info.h: In function â=80=98copy_overflowâ=80=99:
|include/linux/thread_info.h:210:2: error: implicit declaration of function â=80=98WARNâ=80=99 [-Werror=3Dimplicit-function-declaration]
| 210 | WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
| | ^~~~
The WARN / BUG include pulls in printk.h and then ptrace.h expects WARN
(from bug.h) which is not yet complete. Even hw_irq.h has WARN_ON()
statements.
On POWERPC64 there are missing atomic64 defines while building 32bit
VDSO:
| VDSO32C arch/powerpc/kernel/vdso32/vgettimeofday.o
|In file included from include/linux/atomic.h:80,
| from include/linux/rwbase_rt.h:6,
| from include/linux/rwlock_types.h:55,
| from include/linux/spinlock_types.h:74,
| from include/linux/ratelimit_types.h:7,
| from include/linux/printk.h:10,
| from include/linux/kernel.h:19,
| from arch/powerpc/include/asm/page.h:11,
| from arch/powerpc/include/asm/vdso/gettimeofday.h:5,
| from include/vdso/datapage.h:137,
| from lib/vdso/gettimeofday.c:5,
| from <command-line>:
|include/linux/atomic-arch-fallback.h: In function âarch_atomic64_incâ=99:
|include/linux/atomic-arch-fallback.h:1447:2: error: implicit declaration of function âarch_atomic64_addâ; did you mean âarch_atomic_addâ? [-Werror=3Dimpl
|icit-function-declaration]
| 1447 | arch_atomic64_add(1, v);
| | ^~~~~~~~~~~~~~~~~
| | arch_atomic_add
The generic fallback is not included, atomics itself are not used. If
kernel.h does not include printk.h then it comes later from the bug.h
include.
Allow asm/spinlock_types.h to be included from
linux/spinlock_types_raw.h.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211129174654.668506-12-bigeasy@linutronix.de
This is a single cleanup from Peter Collingbourne, removing
some dead code.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmGKm+kACgkQmmx57+YA
GNkksg/7BUxJrWFrQmLA3fhzh4wG3KswdrKTGQMf0jRVI1n77vmdfig3hEJmMekH
0SZoIYmPztOSj34+6p4NxuqY/Sk62oYLr8Awo8ZLDhIBWNUJE8UWC07Qb/3DpGbp
JfD7/8mXu10htgM85aGlhaVLnHvvqUBR8PlkJUGNxuY5Gy2L+eCkpwCAYlZpSOqr
vs0BJWFY1LzO1POcjIq4/IM0PvcU2ncLB5XoxDjjIIWnWKyHzWY21ZvoeaNumvou
xqQ/Hj8Sc+ufS0yNlSgIC+bJP0bp1bSw/dALKr8oYxLt7X9LELVY3WXyRH+It4nS
b6HPYmga26NVq/u7RrylBA+2fRCDB8E6z73gHt4SeHrRDEvFjhNzIyV+aXPae/dY
XI/pjiwpG6k6FpSnF69YZJ/Y+GmUA90V/Jq8aLFZhGz8SgpjRl+2foEUSDhRVXCA
jGB1Y388m0e6jPlVJROB7ORzXMd8K5iciyUGqtAI87QCOtPozn10ruh5RdziguKm
kDW5IKy9E4l1ch8WRprVbgV5Ew+QWKS1JIbyjDaX3jN0lUPCqgwkwvRxxtgdFnVA
Lq5BiUdraSWFUr84rLU3gCRU0+VoEdyZYI+bQGGNlQ4ovmLYU1nLmCU/azMvSEgz
ZsoM5YdffShxUtfwMg+W67RpYOjSnV55ZzwN0+d1C2oqz9xECXc=
=8EFP
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanup from Arnd Bergmann:
"This is a single cleanup from Peter Collingbourne, removing some dead
code"
* tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
arch: remove unused function syscall_set_arguments()
Pull per signal_struct coredumps from Eric Biederman:
"Current coredumps are mixed up with the exit code, the signal handling
code, and the ptrace code making coredumps much more complicated than
necessary and difficult to follow.
This series of changes starts with ptrace_stop and cleans it up,
making it easier to follow what is happening in ptrace_stop. Then
cleans up the exec interactions with coredumps. Then cleans up the
coredump interactions with exit. Finally the coredump interactions
with the signal handling code is cleaned up.
The first and last changes are bug fixes for minor bugs.
I believe the fact that vfork followed by execve can kill the process
the called vfork if exec fails is sufficient justification to change
the userspace visible behavior.
In previous discussions some of these changes were organized
differently and individually appeared to make the code base worse. As
currently written I believe they all stand on their own as cleanups
and bug fixes.
Which means that even if the worst should happen and the last change
needs to be reverted for some unimaginable reason, the code base will
still be improved.
If the worst does not happen there are a more cleanups that can be
made. Signals that generate coredumps can easily become eligible for
short circuit delivery in complete_signal. The entire rendezvous for
generating a coredump can move into get_signal. The function
force_sig_info_to_task be written in a way that does not modify the
signal handling state of the target task (because coredumps are
eligible for short circuit delivery). Many of these future cleanups
can be done another way but nothing so cleanly as if coredumps become
per signal_struct"
* 'per_signal_struct_coredumps-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
coredump: Limit coredumps to a single thread group
coredump: Don't perform any cleanups before dumping core
exit: Factor coredump_exit_mm out of exit_mm
exec: Check for a pending fatal signal instead of core_state
ptrace: Remove the unnecessary arguments from arch_ptrace_stop
signal: Remove the bogus sigkill_pending in ptrace_stop
- kprobes: Restructured stack unwinder to show properly on x86 when a stack
dump happens from a kretprobe callback.
- Fix to bootconfig parsing
- Have tracefs allow owner and group permissions by default (only denying
others). There's been pressure to allow non root to tracefs in a
controlled fashion, and using groups is probably the safest.
- Bootconfig memory managament updates.
- Bootconfig clean up to have the tools directory be less dependent on
changes in the kernel tree.
- Allow perf to be traced by function tracer.
- Rewrite of function graph tracer to be a callback from the function tracer
instead of having its own trampoline (this change will happen on an arch
by arch basis, and currently only x86_64 implements it).
- Allow multiple direct trampolines (bpf hooks to functions) be batched
together in one synchronization.
- Allow histogram triggers to add variables that can perform calculations
against the event's fields.
- Use the linker to determine architecture callbacks from the ftrace
trampoline to allow for proper parameter prototypes and prevent warnings
from the compiler.
- Extend histogram triggers to key off of variables.
- Have trace recursion use bit magic to determine preempt context over if
branches.
- Have trace recursion disable preemption as all use cases do anyway.
- Added testing for verification of tracing utilities.
- Various small clean ups and fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYYBdxhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qp1sAQD2oYFwaG3sx872gj/myBcHIBSKdiki
Hry5csd8zYDBpgD+Poylopt5JIbeDuoYw/BedgEXmscZ8Qr7VzjAXdnv/Q4=
=Loz8
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- kprobes: Restructured stack unwinder to show properly on x86 when a
stack dump happens from a kretprobe callback.
- Fix to bootconfig parsing
- Have tracefs allow owner and group permissions by default (only
denying others). There's been pressure to allow non root to tracefs
in a controlled fashion, and using groups is probably the safest.
- Bootconfig memory managament updates.
- Bootconfig clean up to have the tools directory be less dependent on
changes in the kernel tree.
- Allow perf to be traced by function tracer.
- Rewrite of function graph tracer to be a callback from the function
tracer instead of having its own trampoline (this change will happen
on an arch by arch basis, and currently only x86_64 implements it).
- Allow multiple direct trampolines (bpf hooks to functions) be batched
together in one synchronization.
- Allow histogram triggers to add variables that can perform
calculations against the event's fields.
- Use the linker to determine architecture callbacks from the ftrace
trampoline to allow for proper parameter prototypes and prevent
warnings from the compiler.
- Extend histogram triggers to key off of variables.
- Have trace recursion use bit magic to determine preempt context over
if branches.
- Have trace recursion disable preemption as all use cases do anyway.
- Added testing for verification of tracing utilities.
- Various small clean ups and fixes.
* tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits)
tracing/histogram: Fix semicolon.cocci warnings
tracing/histogram: Fix documentation inline emphasis warning
tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together
tracing: Show size of requested perf buffer
bootconfig: Initialize ret in xbc_parse_tree()
ftrace: do CPU checking after preemption disabled
ftrace: disable preemption when recursion locked
tracing/histogram: Document expression arithmetic and constants
tracing/histogram: Optimize division by a power of 2
tracing/histogram: Covert expr to const if both operands are constants
tracing/histogram: Simplify handling of .sym-offset in expressions
tracing: Fix operator precedence for hist triggers expression
tracing: Add division and multiplication support for hist triggers
tracing: Add support for creating hist trigger variables from literal
selftests/ftrace: Stop tracing while reading the trace file by default
MAINTAINERS: Update KPROBES and TRACING entries
test_kprobes: Move it from kernel/ to lib/
docs, kprobes: Remove invalid URL and add new reference
samples/kretprobes: Fix return value if register_kretprobe() failed
lib/bootconfig: Fix the xbc_get_info kerneldoc
...
- Revert the printk format based wchan() symbol resolution as it can leak
the raw value in case that the symbol is not resolvable.
- Make wchan() more robust and work with all kind of unwinders by
enforcing that the task stays blocked while unwinding is in progress.
- Prevent sched_fork() from accessing an invalid sched_task_group
- Improve asymmetric packing logic
- Extend scheduler statistics to RT and DL scheduling classes and add
statistics for bandwith burst to the SCHED_FAIR class.
- Properly account SCHED_IDLE entities
- Prevent a potential deadlock when initial priority is assigned to a
newly created kthread. A recent change to plug a race between cpuset and
__sched_setscheduler() introduced a new lock dependency which is now
triggered. Break the lock dependency chain by moving the priority
assignment to the thread function.
- Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
- Improve idle balancing in general and especially for NOHZ enabled
systems.
- Provide proper interfaces for live patching so it does not have to
fiddle with scheduler internals.
- Add cluster aware scheduling support.
- A small set of tweaks for RT (irqwork, wait_task_inactive(), various
scheduler options and delaying mmdrop)
- The usual small tweaks and improvements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/OUkTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoR/5D/9ikdGNpKg9osNqJ3GjAmxsK6kVkB29
iFe2k8pIpWDToWQf/wQRGih4Yj3Cl49QSnZcPIibh2/12EB1qrrW6iSPJkInz8Ec
/1LS5/Vewn2OyoxyXZjdvGC5gTXEodSbIazASvX7nvdMeI4gsAsL5etzrMJirT/t
aymqvr7zovvywrwMTQJrGjUMo9l4ewE8tafMNNhRu1BHU1U4ojM9yvThyRAAcmp7
3Xy49A+Yq3IgrvYI4u8FMK5Zh08KaxSFjiLhePGm/bF+wSfYmWop2TP1jY05W2Uo
ti8hfbJMUoFRYuMxAiEldkItnc0wV4M9PtWZZ/x+B71bs65Y4Zjt9cW+rxJv2+m1
vzV31EsQwGnOti072dzWN4c/cZqngVXAjaNtErvDwJUr+Tw1ayv9KUvuodMQqZY6
mu68bFUO2kV9EMe1CBOv51Uy1RGHyLj3rlNqrkw+Xp5ISE9Ad2vhUEiRp5bQx5Ci
V/XFhGZkGUluh0vccrdFlNYZwhj8cZEzkOPCnPSeZ+bq8SyZE6xuHH/lTP1CJCOy
s800rW1huM+kgV+zRN8adDkGXibAk9N3RtVGnQXmuEy8gB9LZmQg+JeM2wsc9B+6
i0gdqZnsjNAfoK+BBAG4holxptSL8/eOJsFH8ZNIoxQ+iqooyPx9tFX7yXnRTBQj
d2qWG7UvoseT+g==
=fgtS
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
- Revert the printk format based wchan() symbol resolution as it can
leak the raw value in case that the symbol is not resolvable.
- Make wchan() more robust and work with all kind of unwinders by
enforcing that the task stays blocked while unwinding is in progress.
- Prevent sched_fork() from accessing an invalid sched_task_group
- Improve asymmetric packing logic
- Extend scheduler statistics to RT and DL scheduling classes and add
statistics for bandwith burst to the SCHED_FAIR class.
- Properly account SCHED_IDLE entities
- Prevent a potential deadlock when initial priority is assigned to a
newly created kthread. A recent change to plug a race between cpuset
and __sched_setscheduler() introduced a new lock dependency which is
now triggered. Break the lock dependency chain by moving the priority
assignment to the thread function.
- Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
- Improve idle balancing in general and especially for NOHZ enabled
systems.
- Provide proper interfaces for live patching so it does not have to
fiddle with scheduler internals.
- Add cluster aware scheduling support.
- A small set of tweaks for RT (irqwork, wait_task_inactive(), various
scheduler options and delaying mmdrop)
- The usual small tweaks and improvements all over the place
* tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
sched/fair: Cleanup newidle_balance
sched/fair: Remove sysctl_sched_migration_cost condition
sched/fair: Wait before decaying max_newidle_lb_cost
sched/fair: Skip update_blocked_averages if we are defering load balance
sched/fair: Account update_blocked_averages in newidle_balance cost
x86: Fix __get_wchan() for !STACKTRACE
sched,x86: Fix L2 cache mask
sched/core: Remove rq_relock()
sched: Improve wake_up_all_idle_cpus() take #2
irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ
sched: Add cluster scheduler level for x86
sched: Add cluster scheduler level in core and related Kconfig for ARM64
topology: Represent clusters of CPUs within a die
sched: Disable -Wunused-but-set-variable
sched: Add wrapper for get_wchan() to keep task blocked
x86: Fix get_wchan() to support the ORC unwinder
proc: Use task_is_running() for wchan in /proc/$pid/stat
...
parisc, ia64 and powerpc32 are the only remaining architectures that
provide custom arch_{spin,read,write}_lock_flags() functions, which are
meant to re-enable interrupts while waiting for a spinlock.
However, none of these can actually run into this codepath, because
it is only called on architectures without CONFIG_GENERIC_LOCKBREAK,
or when CONFIG_DEBUG_LOCK_ALLOC is set without CONFIG_LOCKDEP, and none
of those combinations are possible on the three architectures.
Going back in the git history, it appears that arch/mn10300 may have
been able to run into this code path, but there is a good chance that
it never worked. On the architectures that still exist, it was
already impossible to hit back in 2008 after the introduction of
CONFIG_GENERIC_LOCKBREAK, and possibly earlier.
As this is all dead code, just remove it and the helper functions built
around it. For arch/ia64, the inline asm could be cleaned up, but
it seems safer to leave it untouched.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Link: https://lore.kernel.org/r/20211022120058.1031690-1-arnd@kernel.org
Having a stable wchan means the process must be blocked and for it to
stay that way while performing stack unwinding.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm]
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org