Merge updates from Andrew Morton:
"Incoming:
- a small number of updates to scripts/, ocfs2 and fs/buffer.c
- most of MM
I still have quite a lot of material (mostly not MM) staged after
linux-next due to -next dependencies. I'll send those across next week
as the preprequisites get merged up"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits)
mm/page_io.c: annotate refault stalls from swap_readpage
mm/Kconfig: fix trivial help text punctuation
mm/Kconfig: fix indentation
mm/memory_hotplug.c: remove __online_page_set_limits()
mm: fix typos in comments when calling __SetPageUptodate()
mm: fix struct member name in function comments
mm/shmem.c: cast the type of unmap_start to u64
mm: shmem: use proper gfp flags for shmem_writepage()
mm/shmem.c: make array 'values' static const, makes object smaller
userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
userfaultfd: wrap the common dst_vma check into an inlined function
userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb()
userfaultfd: use vma_pagesize for all huge page size calculation
mm/madvise.c: use PAGE_ALIGN[ED] for range checking
mm/madvise.c: replace with page_size() in madvise_inject_error()
mm/mmap.c: make vma_merge() comment more easy to understand
mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
autonuma: reduce cache footprint when scanning page tables
autonuma: fix watermark checking in migrate_balanced_pgdat()
...
A huge pud page can theoretically be faulted in racing with pmd_alloc()
in __handle_mm_fault(). That will lead to pmd_alloc() returning an
invalid pmd pointer.
Fix this by adding a pud_trans_unstable() function similar to
pmd_trans_unstable() and check whether the pud is really stable before
using the pmd pointer.
Race:
Thread 1: Thread 2: Comment
create_huge_pud() Fallback - not taken.
create_huge_pud() Taken.
pmd_alloc() Returns an invalid pointer.
This will result in user-visible huge page data corruption.
Note that this was caught during a code audit rather than a real
experienced problem. It looks to me like the only implementation that
currently creates huge pud pagetable entries is dev_dax_huge_fault()
which doesn't appear to care much about private (COW) mappings or
write-tracking which is, I believe, a prerequisite for create_huge_pud()
falling back on thread 1, but not in thread 2.
Link: http://lkml.kernel.org/r/20191115115808.21181-2-thomas_os@shipmail.org
Fixes: a00cc7d9dd ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The asm-generic/pgtable.h include file appears to be the correct place for
the backup x_devmap() inline functions. Moving them here is also
necessary if we want to include x_devmap() in the [pmd|pud]_unstable
functions. So move the x_devmap() functions to asm-generic/pgtable.h
Link: http://lkml.kernel.org/r/20191115115808.21181-1-thomas_os@shipmail.org
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This came up when removing __ARCH_HAS_5LEVEL_HACK for ARC as code bloat.
With this patch we see the following code reduction.
| bloat-o-meter2 vmlinux-D-elide-p4d_free_tlb vmlinux-E-elide-p?d_clear_bad
| add/remove: 0/2 grow/shrink: 0/0 up/down: 0/-40 (-40)
| function old new delta
| pud_clear_bad 20 - -20
| p4d_clear_bad 20 - -20
| Total: Before=4136930, After=4136890, chg -1.000000%
Link: http://lkml.kernel.org/r/20191016162400.14796-6-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This came up when removing __ARCH_HAS_5LEVEL_HACK for ARC as code bloat.
With this patch we see the following code reduction.
| bloat-o-meter2 vmlinux-E-elide-p?d_clear_bad vmlinux-F-elide-pmd_free_tlb
| add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-112 (-112)
| function old new delta
| free_pgd_range 422 310 -112
| Total: Before=4137042, After=4136930, chg -1.000000%
Note that pmd folding can be tricky: In 2-level setup (where pmd is
conceptually folded) most pmd routines are valid and refer to upper levels.
In this patch we can, but see next patch for example where we can't
Link: http://lkml.kernel.org/r/20191016162400.14796-5-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... independent of __ARCH_HAS_5LEVEL_HACK
This came up when removing __ARCH_HAS_5LEVEL_HACK for ARC as code bloat.
With this patch we see the following code reduction
| bloat-o-meter2 vmlinux-C-elide-pud_free_tlb vmlinux-D-elide-p4d_free_tlb
| add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-104 (-104)
| function old new delta
| free_pgd_range 552 422 -130
| Total: Before=4137172, After=4137042, chg -1.000000%
Link: http://lkml.kernel.org/r/20191016162400.14796-4-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... independent of __ARCH_HAS_4LEVEL_HACK
This came up when removing __ARCH_HAS_5LEVEL_HACK for ARC as code bloat.
With this patch we see the following code reduction
| bloat-o-meter2 vmlinux-B-elide-ARCH_USE_5LEVEL_HACK vmlinux-C-elide-pud_free_tlb
| add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-104 (-104)
| function old new delta
| free_pgd_range 656 552 -104
| Total: Before=4137276, After=4137172, chg -1.000000%
Note: The primary change is alternate defintion for pud_free_tlb() but
while there also removed empty stubs for __pud_free_tlb, which is anyhow
called only from pud_free_tlb()
Link: http://lkml.kernel.org/r/20191016162400.14796-3-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Hibernation support (Dexuan Cui).
- Latency testing framework (Branden Bonaby).
- Decoupling Hyper-V page size from guest page size (Himadri Pandya).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAl3f5YIACgkQ3qZv95d3
LNzBww/8Cpv/BnOs2cp56OhC+2++3YlWfmxGnvQb9h52weElgr1AZF33lAynp8BZ
YssOcDnS/G2iAkNDffbQA7s3WTwIjP1weJibOeKbtcXp4SuhNR3gnJafufNddNDv
bw8ZReLQV7hy3sHb3OUx0aJk5Mssp0N9ZpxRilyIpLELPfVp63gFebq6s1MQYljk
BAiNO4SKqsGQGZApt2F4Cc3hX2wU2ZfiDm6SifXiLYITGnvilIn7XFIht+2jJBWS
CdzRoGXcwhQhlj68XWlc89SOzJb7vVUMO1sr84psfbQ2LbhJU8lfJKRJ4b4lR07Z
Uv5FYxjr14S65fv7DkzCfWU+uPN/sObG4pPXihlfqcTraOvYLQ6/x8cw+9tGZg4H
aTtnF40hnO81aKsvPAeIsSzVkoyPaSrt7KKhk+Bw/5EUDTTNp6EbIuL4xwnKt6Rt
2UpA5HM9guQqNb6OZrjlpZfJgd9bNP4CZLBTfOukmnZpONKr2Wv3wubcwQJ8ibQc
1WZ5SfN2Wmg999Ski7j9qzHk0tWJxa6SX+2NLEHRKxy2nJSJ1zlAr//bznMyMgH/
yKPDaSkOFoy0aqiTKV2WzuOY6FGXTrSo5vq8YAgYRgp3xB+5+7zLeqlj3ipXhLYE
HH/eqB27eSnvi0jpub4TbszGJG0o4Z1aYx3aHYYqrOfWX/A5Vls=
=oJGE
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V updates from Sasha Levin:
- support for new VMBus protocols (Andrea Parri)
- hibernation support (Dexuan Cui)
- latency testing framework (Branden Bonaby)
- decoupling Hyper-V page size from guest page size (Himadri Pandya)
* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits)
Drivers: hv: vmbus: Fix crash handler reset of Hyper-V synic
drivers/hv: Replace binary semaphore with mutex
drivers: iommu: hyperv: Make HYPERV_IOMMU only available on x86
HID: hyperv: Add the support of hibernation
hv_balloon: Add the support of hibernation
x86/hyperv: Implement hv_is_hibernation_supported()
Drivers: hv: balloon: Remove dependencies on guest page size
Drivers: hv: vmbus: Remove dependencies on guest page size
x86: hv: Add function to allocate zeroed page for Hyper-V
Drivers: hv: util: Specify ring buffer size using Hyper-V page size
Drivers: hv: Specify receive buffer size using Hyper-V page size
tools: hv: add vmbus testing tool
drivers: hv: vmbus: Introduce latency testing
video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
hv_netvsc: Add the support of hibernation
hv_sock: Add the support of hibernation
video: hyperv_fb: Add the support of hibernation
scsi: storvsc: Add the support of hibernation
Drivers: hv: vmbus: Add module parameter to cap the VMBus version
...
Highlights:
- Infrastructure for secure boot on some bare metal Power9 machines. The
firmware support is still in development, so the code here won't actually
activate secure boot on any existing systems.
- A change to xmon (our crash handler / pseudo-debugger) to restrict it to
read-only mode when the kernel is lockdown'ed, otherwise it's trivial to drop
into xmon and modify kernel data, such as the lockdown state.
- Support for KASLR on 32-bit BookE machines (Freescale / NXP).
- Fixes for our flush_icache_range() and __kernel_sync_dicache() (VDSO) to work
with memory ranges >4GB.
- Some reworks of the pseries CMM (Cooperative Memory Management) driver to
make it behave more like other balloon drivers and enable some cleanups of
generic mm code.
- A series of fixes to our hardware breakpoint support to properly handle
unaligned watchpoint addresses.
Plus a bunch of other smaller improvements, fixes and cleanups.
Thanks to:
Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V, Anthony Steinhauser,
Cédric Le Goater, Chris Packham, Chris Smart, Christophe Leroy, Christopher M.
Riedl, Christoph Hellwig, Claudio Carvalho, Daniel Axtens, David Hildenbrand,
Deb McLemore, Diana Craciun, Eric Richter, Geert Uytterhoeven, Greg
Kroah-Hartman, Greg Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason
Yan, Krzysztof Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M.
Rodrigues, Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna
Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes, Ravi
Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth, Tyrel
Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl3hBycTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgApBEACk91MEQDYJ9MF9I6uN+85qb5p4pMsp
rGzqnpt+XFidbDAc3eP63pYfIDSo3jtkQ2YL7shAnDOTvkO0md+Vqkl9Aq/G6FIf
lDBlwbgkXMSxS/O2Lpvfn4NZAoK6dKmiV55LSgfliM62X3e2Saeg6TR55wBTgJ6/
SlYPDwZfcVHOAiFS3UmfB+hkiIZk+AI5Zr5VAZvT2ZmeH36yAWkq4JgJI1uAk6m1
/7iCnlfUjx/nl/BhnA3kjjmAgGCJ5s/WuVgwFMz47XpMBWGBhLWpMh/NqDTFb8ca
kpiVQoVPQe2xyO3pL/kOwBy6sii26ftfHDhLKMy1hJdEhVQzS5LerPIMeh1qsU8Q
hV/Cj+jfsrS/vBDOehj3jwx93t+861PmTOqgLnpYQ6Ltrt+2B/74+fufGMHE1kI3
Ffo7xvNw4sw6bSziDxDFqUx2P1dFN5D5EJsJsYM98ekkVAAkzNqCDRvfD2QI8Pif
VXWPYXqtNJTrVPJA0D7Yfo9FDNwhANd0f1zi7r/U5mVXBFUyKOlGqTQSkXgMrVeK
3I7wHPOVGgdA5UUkfcd3pcuqsY081U9E//o5PUfj8ybO5JCwly8NoatbG+xHmKia
a72uJT8MjCo9mGCHKDrwi9l/kqms6ZSv8RP+yMhGuB52YoiGc6PpVyab5jXIUd1N
yTtBlC0YGW1JYw==
=JHzg
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- Infrastructure for secure boot on some bare metal Power9 machines.
The firmware support is still in development, so the code here
won't actually activate secure boot on any existing systems.
- A change to xmon (our crash handler / pseudo-debugger) to restrict
it to read-only mode when the kernel is lockdown'ed, otherwise it's
trivial to drop into xmon and modify kernel data, such as the
lockdown state.
- Support for KASLR on 32-bit BookE machines (Freescale / NXP).
- Fixes for our flush_icache_range() and __kernel_sync_dicache()
(VDSO) to work with memory ranges >4GB.
- Some reworks of the pseries CMM (Cooperative Memory Management)
driver to make it behave more like other balloon drivers and enable
some cleanups of generic mm code.
- A series of fixes to our hardware breakpoint support to properly
handle unaligned watchpoint addresses.
Plus a bunch of other smaller improvements, fixes and cleanups.
Thanks to: Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V,
Anthony Steinhauser, Cédric Le Goater, Chris Packham, Chris Smart,
Christophe Leroy, Christopher M. Riedl, Christoph Hellwig, Claudio
Carvalho, Daniel Axtens, David Hildenbrand, Deb McLemore, Diana
Craciun, Eric Richter, Geert Uytterhoeven, Greg Kroah-Hartman, Greg
Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason Yan, Krzysztof
Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M. Rodrigues,
Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna
Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes,
Ravi Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth,
Tyrel Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing"
* tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (144 commits)
powerpc/fixmap: fix crash with HIGHMEM
x86/efi: remove unused variables
powerpc: Define arch_is_kernel_initmem_freed() for lockdep
powerpc/prom_init: Use -ffreestanding to avoid a reference to bcmp
powerpc: Avoid clang warnings around setjmp and longjmp
powerpc: Don't add -mabi= flags when building with Clang
powerpc: Fix Kconfig indentation
powerpc/fixmap: don't clear fixmap area in paging_init()
selftests/powerpc: spectre_v2 test must be built 64-bit
powerpc/powernv: Disable native PCIe port management
powerpc/kexec: Move kexec files into a dedicated subdir.
powerpc/32: Split kexec low level code out of misc_32.S
powerpc/sysdev: drop simple gpio
powerpc/83xx: map IMMR with a BAT.
powerpc/32s: automatically allocate BAT in setbat()
powerpc/ioremap: warn on early use of ioremap()
powerpc: Add support for GENERIC_EARLY_IOREMAP
powerpc/fixmap: Use __fix_to_virt() instead of fix_to_virt()
powerpc/8xx: use the fixmapped IMMR in cpm_reset()
powerpc/8xx: add __init to cpm1 init functions
...
- clean up various obsolete ioremap and iounmap variants
- add a new generic ioremap implementation and switch csky, nds32 and
riscv over to it
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl3cKcsLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYO1CRAAwFQigsbi0CqqshPWnP0owKV+HA4Xfz/lQZsd7SM/
BVXhKyDJQum6gp73dW025HCfjidTknsbdCUIP/LNUgAnop3lOlnB31/munDnJJ1H
6hB1pc+zB9VgbOe0A6TxtxPRm5aE33k1hZIZS99lOh7mY3FvF7mbkkbVoCjdS3Cq
a9bTX+X+esfUQ5GgaIc2zmz2GLkyFXIeVGs8/CoOX58ESCWQcVZrsQRompo4SgrI
jqwf47NzdmK8hW4mZ+jdQUiWiAmNs5+2om7Bvi/deFAIFUo1/hLHvQzqEGramq/j
5SPHax2gWAN3uWYP91QISkUAJWFydwgmUDoTO1M04ov4xLuBrqIQmc43tLjHo2UT
RwMozWJWN+gkB9zTIboqMPi2qcuDaWcCij7LwHl5zLxPTcOKsrALarL55BQ8MipQ
x6fpvskrQQvlArNTsRWFRUq0mCtkzE3wMZ9RR3AIETQL2hlAzB1S4gzhD+Z6WTYY
pXNgkunonVGxwyN/7iJTEl/mvF/+MynGcWqhrwHZLqncyhn/WJJ2USH3nAD1+yjp
v8v6UUeMXIjUsGAyfTjXy/WXAfwRuSC038AAFcmWKDdh08h4XvPHRficT4U8wr34
7WzGizHP9f1CqrhYL/4exhPY9X2Yb7HhsFd0bZGG0rRvSillPUp0b8s++m12QuQU
+VY=
=ooiA
-----END PGP SIGNATURE-----
Merge tag 'ioremap-5.5' of git://git.infradead.org/users/hch/ioremap
Pull generic ioremap support from Christoph Hellwig:
"This adds the remaining bits for an entirely generic ioremap and
iounmap to lib/ioremap.c. To facilitate that, it cleans up the giant
mess of weird ioremap variants we had with no users outside the arch
code.
For now just the three newest ports use the code, but there is more
than a handful others that can be converted without too much work.
Summary:
- clean up various obsolete ioremap and iounmap variants
- add a new generic ioremap implementation and switch csky, nds32 and
riscv over to it"
* tag 'ioremap-5.5' of git://git.infradead.org/users/hch/ioremap: (21 commits)
nds32: use generic ioremap
csky: use generic ioremap
csky: remove ioremap_cache
riscv: use the generic ioremap code
lib: provide a simple generic ioremap implementation
sh: remove __iounmap
nios2: remove __iounmap
hexagon: remove __iounmap
m68k: rename __iounmap and mark it static
arch: rely on asm-generic/io.h for default ioremap_* definitions
asm-generic: don't provide ioremap for CONFIG_MMU
asm-generic: ioremap_uc should behave the same with and without MMU
xtensa: clean up ioremap
x86: Clean up ioremap()
parisc: remove __ioremap
nios2: remove __ioremap
alpha: remove the unused __ioremap wrapper
hexagon: clean up ioremap
ia64: rename ioremap_nocache to ioremap_uc
unicore32: remove ioremap_cached
...
- PERAMAENT flag to ftrace_ops when attaching a callback to a function
As /proc/sys/kernel/ftrace_enabled when set to zero will disable all
attached callbacks in ftrace, this has a detrimental impact on live
kernel tracing, as it disables all that it patched. If a ftrace_ops
is registered to ftrace with the PERMANENT flag set, it will prevent
ftrace_enabled from being disabled, and if ftrace_enabled is already
disabled, it will prevent a ftrace_ops with PREMANENT flag set from
being registered.
- New register_ftrace_direct(). As eBPF would like to register its own
trampolines to be called by the ftrace nop locations directly,
without going through the ftrace trampoline, this function has been
added. This allows for eBPF trampolines to live along side of
ftrace, perf, kprobe and live patching. It also utilizes the ftrace
enabled_functions file that keeps track of functions that have been
modified in the kernel, to allow for security auditing.
- Allow for kernel internal use of ftrace instances. Subsystems in
the kernel can now create and destroy their own tracing instances
which allows them to have their own tracing buffer, and be able
to record events without worrying about other users from writing over
their data.
- New seq_buf_hex_dump() that lets users use the hex_dump() in their
seq_buf usage.
- Notifications now added to tracing_max_latency to allow user space
to know when a new max latency is hit by one of the latency tracers.
- Wider spread use of generic compare operations for use of bsearch and
friends.
- More synthetic event fields may be defined (32 up from 16)
- Use of xarray for architectures with sparse system calls, for the
system call trace events.
This along with small clean ups and fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXdwv4BQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qnB5AP91vsdHQjwE1+/UWG/cO+qFtKvn2QJK
QmBRIJNH/s+1TAD/fAOhgw+ojSK3o/qc+NpvPTEW9AEwcJL1wacJUn+XbQc=
=ztql
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"New tracing features:
- New PERMANENT flag to ftrace_ops when attaching a callback to a
function.
As /proc/sys/kernel/ftrace_enabled when set to zero will disable
all attached callbacks in ftrace, this has a detrimental impact on
live kernel tracing, as it disables all that it patched. If a
ftrace_ops is registered to ftrace with the PERMANENT flag set, it
will prevent ftrace_enabled from being disabled, and if
ftrace_enabled is already disabled, it will prevent a ftrace_ops
with PREMANENT flag set from being registered.
- New register_ftrace_direct().
As eBPF would like to register its own trampolines to be called by
the ftrace nop locations directly, without going through the ftrace
trampoline, this function has been added. This allows for eBPF
trampolines to live along side of ftrace, perf, kprobe and live
patching. It also utilizes the ftrace enabled_functions file that
keeps track of functions that have been modified in the kernel, to
allow for security auditing.
- Allow for kernel internal use of ftrace instances.
Subsystems in the kernel can now create and destroy their own
tracing instances which allows them to have their own tracing
buffer, and be able to record events without worrying about other
users from writing over their data.
- New seq_buf_hex_dump() that lets users use the hex_dump() in their
seq_buf usage.
- Notifications now added to tracing_max_latency to allow user space
to know when a new max latency is hit by one of the latency
tracers.
- Wider spread use of generic compare operations for use of bsearch
and friends.
- More synthetic event fields may be defined (32 up from 16)
- Use of xarray for architectures with sparse system calls, for the
system call trace events.
This along with small clean ups and fixes"
* tag 'trace-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (51 commits)
tracing: Enable syscall optimization for MIPS
tracing: Use xarray for syscall trace events
tracing: Sample module to demonstrate kernel access to Ftrace instances.
tracing: Adding new functions for kernel access to Ftrace instances
tracing: Fix Kconfig indentation
ring-buffer: Fix typos in function ring_buffer_producer
ftrace: Use BIT() macro
ftrace: Return ENOTSUPP when DYNAMIC_FTRACE_WITH_DIRECT_CALLS is not configured
ftrace: Rename ftrace_graph_stub to ftrace_stub_graph
ftrace: Add a helper function to modify_ftrace_direct() to allow arch optimization
ftrace: Add helper find_direct_entry() to consolidate code
ftrace: Add another check for match in register_ftrace_direct()
ftrace: Fix accounting bug with direct->count in register_ftrace_direct()
ftrace/selftests: Fix spelling mistake "wakeing" -> "waking"
tracing: Increase SYNTH_FIELDS_MAX for synthetic_events
ftrace/samples: Add a sample module that implements modify_ftrace_direct()
ftrace: Add modify_ftrace_direct()
tracing: Add missing "inline" in stub function of latency_fsnotify()
tracing: Remove stray tab in TRACE_EVAL_MAP_FILE's help text
tracing: Use seq_buf_hex_dump() to dump buffers
...
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- Cross-arch changes to move the linker sections for NOTES and
EXCEPTION_TABLE into the RO_DATA area, where they belong on most
architectures. (Kees Cook)
- Switch the x86 linker fill byte from x90 (NOP) to 0xcc (INT3), to
trap jumps into the middle of those padding areas instead of
sliding execution. (Kees Cook)
- A thorough cleanup of symbol definitions within x86 assembler code.
The rather randomly named macros got streamlined around a
(hopefully) straightforward naming scheme:
SYM_START(name, linkage, align...)
SYM_END(name, sym_type)
SYM_FUNC_START(name)
SYM_FUNC_END(name)
SYM_CODE_START(name)
SYM_CODE_END(name)
SYM_DATA_START(name)
SYM_DATA_END(name)
etc - with about three times of these basic primitives with some
label, local symbol or attribute variant, expressed via postfixes.
No change in functionality intended. (Jiri Slaby)
- Misc other changes, cleanups and smaller fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
x86/entry/64: Remove pointless jump in paranoid_exit
x86/entry/32: Remove unused resume_userspace label
x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o
m68k: Convert missed RODATA to RO_DATA
x86/vmlinux: Use INT3 instead of NOP for linker fill bytes
x86/mm: Report actual image regions in /proc/iomem
x86/mm: Report which part of kernel image is freed
x86/mm: Remove redundant address-of operators on addresses
xtensa: Move EXCEPTION_TABLE to RO_DATA segment
powerpc: Move EXCEPTION_TABLE to RO_DATA segment
parisc: Move EXCEPTION_TABLE to RO_DATA segment
microblaze: Move EXCEPTION_TABLE to RO_DATA segment
ia64: Move EXCEPTION_TABLE to RO_DATA segment
h8300: Move EXCEPTION_TABLE to RO_DATA segment
c6x: Move EXCEPTION_TABLE to RO_DATA segment
arm64: Move EXCEPTION_TABLE to RO_DATA segment
alpha: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Actually use _etext for the end of the text segment
vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA
...
- On ARMv8 CPUs without hardware updates of the access flag, avoid
failing cow_user_page() on PFN mappings if the pte is old. The patches
introduce an arch_faults_on_old_pte() macro, defined as false on x86.
When true, cow_user_page() makes the pte young before attempting
__copy_from_user_inatomic().
- Covert the synchronous exception handling paths in
arch/arm64/kernel/entry.S to C.
- FTRACE_WITH_REGS support for arm64.
- ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4
- Several kselftest cases specific to arm64, together with a MAINTAINERS
update for these files (moved to the ARM64 PORT entry).
- Workaround for a Neoverse-N1 erratum where the CPU may fetch stale
instructions under certain conditions.
- Workaround for Cortex-A57 and A72 errata where the CPU may
speculatively execute an AT instruction and associate a VMID with the
wrong guest page tables (corrupting the TLB).
- Perf updates for arm64: additional PMU topologies on HiSilicon
platforms, support for CCN-512 interconnect, AXI ID filtering in the
IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2.
- GICv3 optimisation to avoid a heavy barrier when accessing the
ICC_PMR_EL1 register.
- ELF HWCAP documentation updates and clean-up.
- SMC calling convention conduit code clean-up.
- KASLR diagnostics printed during boot
- NVIDIA Carmel CPU added to the KPTI whitelist
- Some arm64 mm clean-ups: use generic free_initrd_mem(), remove stale
macro, simplify calculation in __create_pgd_mapping(), typos.
- Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for
endinanness to help with allmodconfig.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl3YJswACgkQa9axLQDI
XvFwYg//aTGhNLew3ADgW2TYal7LyqetRROixPBrzqHLu2A8No1+QxHMaKxpZVyf
pt25tABuLtPHql3qBzE0ltmfbLVsPj/3hULo404EJb9HLRfUnVGn7gcPkc+p4YAr
IYkYPXJbk6OlJ84vI+4vXmDEF12bWCqamC9qZ+h99qTpMjFXFO17DSJ7xQ8Xic3A
HHgCh4uA7gpTVOhLxaS6KIw+AZNYwvQxLXch2+wj6agbGX79uw9BeMhqVXdkPq8B
RTDJpOdS970WOT4cHWOkmXwsqqGRqgsgyu+bRUJ0U72+0y6MX0qSHIUnVYGmNc5q
Dtox4rryYLvkv/hbpkvjgVhv98q3J1mXt/CalChWB5dG4YwhJKN2jMiYuoAvB3WS
6dR7Dfupgai9gq1uoKgBayS2O6iFLSa4g58vt3EqUBqmM7W7viGFPdLbuVio4ycn
CNF2xZ8MZR6Wrh1JfggO7Hc11EJdSqESYfHO6V/pYB4pdpnqJLDoriYHXU7RsZrc
HvnrIvQWKMwNbqBvpNbWvK5mpBMMX2pEienA3wOqKNH7MbepVsG+npOZTVTtl9tN
FL0ePb/mKJu/2+gW8ntiqYn7EzjKprRmknOiT2FjWWo0PxgJ8lumefuhGZZbaOWt
/aTAeD7qKd/UXLKGHF/9v3q4GEYUdCFOXP94szWVPyLv+D9h8L8=
=TPL9
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Apart from the arm64-specific bits (core arch and perf, new arm64
selftests), it touches the generic cow_user_page() (reviewed by
Kirill) together with a macro for x86 to preserve the existing
behaviour on this architecture.
Summary:
- On ARMv8 CPUs without hardware updates of the access flag, avoid
failing cow_user_page() on PFN mappings if the pte is old. The
patches introduce an arch_faults_on_old_pte() macro, defined as
false on x86. When true, cow_user_page() makes the pte young before
attempting __copy_from_user_inatomic().
- Covert the synchronous exception handling paths in
arch/arm64/kernel/entry.S to C.
- FTRACE_WITH_REGS support for arm64.
- ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4
- Several kselftest cases specific to arm64, together with a
MAINTAINERS update for these files (moved to the ARM64 PORT entry).
- Workaround for a Neoverse-N1 erratum where the CPU may fetch stale
instructions under certain conditions.
- Workaround for Cortex-A57 and A72 errata where the CPU may
speculatively execute an AT instruction and associate a VMID with
the wrong guest page tables (corrupting the TLB).
- Perf updates for arm64: additional PMU topologies on HiSilicon
platforms, support for CCN-512 interconnect, AXI ID filtering in
the IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2.
- GICv3 optimisation to avoid a heavy barrier when accessing the
ICC_PMR_EL1 register.
- ELF HWCAP documentation updates and clean-up.
- SMC calling convention conduit code clean-up.
- KASLR diagnostics printed during boot
- NVIDIA Carmel CPU added to the KPTI whitelist
- Some arm64 mm clean-ups: use generic free_initrd_mem(), remove
stale macro, simplify calculation in __create_pgd_mapping(), typos.
- Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for
endinanness to help with allmodconfig"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
arm64: Kconfig: add a choice for endianness
kselftest: arm64: fix spelling mistake "contiguos" -> "contiguous"
arm64: Kconfig: make CMDLINE_FORCE depend on CMDLINE
MAINTAINERS: Add arm64 selftests to the ARM64 PORT entry
arm64: kaslr: Check command line before looking for a seed
arm64: kaslr: Announce KASLR status on boot
kselftest: arm64: fake_sigreturn_misaligned_sp
kselftest: arm64: fake_sigreturn_bad_size
kselftest: arm64: fake_sigreturn_duplicated_fpsimd
kselftest: arm64: fake_sigreturn_missing_fpsimd
kselftest: arm64: fake_sigreturn_bad_size_for_magic0
kselftest: arm64: fake_sigreturn_bad_magic
kselftest: arm64: add helper get_current_context
kselftest: arm64: extend test_init functionalities
kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht]
kselftest: arm64: mangle_pstate_invalid_daif_bits
kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils
kselftest: arm64: extend toplevel skeleton Makefile
drivers/perf: hisi: update the sccl_id/ccl_id for certain HiSilicon platform
arm64: mm: reserve CMA and crashkernel in ZONE_DMA32
...
The API will be used by the hv_balloon and hv_vmbus drivers.
Balloon up/down and hot-add of memory must not be active if the user
wants the Linux VM to support hibernation, because they are incompatible
with hibernation according to Hyper-V team, e.g. upon suspend the
balloon VSP doesn't save any info about the ballooned-out pages (if any);
so, after Linux resumes, Linux balloon VSC expects that the VSP will
return the pages if Linux is under memory pressure, but the VSP will
never do that, since the VSP thinks it never stole the pages from the VM.
So, if the user wants Linux VM to support hibernation, Linux must forbid
balloon up/down and hot-add, and the only functionality of the balloon VSC
driver is reporting the VM's memory pressure to the host.
Ideally, when Linux detects that the user wants it to support hibernation,
the balloon VSC should tell the VSP that it does not support ballooning
and hot-add. However, the current version of the VSP requires the VSC
should support these capabilities, otherwise the capability negotiation
fails and the VSC can not load at all, so with the later changes to the
VSC driver, Linux VM still reports to the VSP that the VSC supports these
capabilities, but the VSC ignores the VSP's requests of balloon up/down
and hot add, and reports an error to the VSP, when applicable. BTW, in
the future the balloon VSP driver will allow the VSC to not support the
capabilities of balloon up/down and hot add.
The ACPI S4 state is not a must for hibernation to work, because Linux is
able to hibernate as long as the system can shut down. However in practice
we decide to artificially use the presence of the virtual ACPI S4 state as
an indicator of the user's intent of using hibernation, because Linux VM
must find a way to know if the user wants to use the hibernation feature
or not.
By default, Hyper-V does not enable the virtual ACPI S4 state; on recent
Hyper-V hosts (e.g. RS5, 19H1), the administrator is able to enable the
state for a VM by WMI commands.
Once all the vmbus and VSC patches for the hibernation feature are
accepted, an extra patch will be submitted to forbid hibernation if the
virtual ACPI S4 state is absent, i.e. hv_is_hibernation_supported() is
false.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
The ftrace_graph_stub was created and points to ftrace_stub as a way to
assign the functon graph tracer function pointer to a stub function with a
different prototype than what ftrace_stub has and not trigger the C
verifier. The ftrace_graph_stub was created via the linker script
vmlinux.lds.h. Unfortunately, powerpc already uses the name
ftrace_graph_stub for its internal implementation of the function graph
tracer, and even though powerpc would still build, the change via the linker
script broke function tracer on powerpc from working.
By using the name ftrace_stub_graph, which does not exist anywhere else in
the kernel, this should not be a problem.
Link: https://lore.kernel.org/r/1573849732.5937.136.camel@lca.pw
Fixes: b83b43ffc6 ("fgraph: Fix function type mismatches of ftrace_graph_return using ftrace_stub")
Reorted-by: Qian Cai <cai@lca.pw>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The C compiler is allowing more checks to make sure that function pointers
are assigned to the correct prototype function. Unfortunately, the function
graph tracer uses a special name with its assigned ftrace_graph_return
function pointer that maps to a stub function used by the function tracer
(ftrace_stub). The ftrace_graph_return variable is compared to the
ftrace_stub in some archs to know if the function graph tracer is enabled or
not. This means we can not just simply create a new function stub that
compares it without modifying all the archs.
Instead, have the linker script create a function_graph_stub that maps to
ftrace_stub, and this way we can define the prototype for it to match the
prototype of ftrace_graph_return, and make the compiler checks all happy!
Link: http://lkml.kernel.org/r/20191015090055.789a0aed@gandalf.local.home
Cc: linux-sh@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
A lot of architectures reuse the same simple ioremap implementation, so
start lifting the most simple variant to lib/ioremap.c. It provides
ioremap_prot and iounmap, plus a default ioremap that uses prot_noncached,
although that can be overridden by asm/io.h.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com>
Various architectures that use asm-generic/io.h still defined their
own default versions of ioremap_nocache, ioremap_wt and ioremap_wc
that point back to plain ioremap directly or indirectly. Remove these
definitions and rely on asm-generic/io.h instead. For this to work
the backup ioremap_* defintions needs to be changed to purely cpp
macros instea of inlines to cover for architectures like openrisc
that only define ioremap after including <asm-generic/io.h>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com>
All MMU-enabled ports have a non-trivial ioremap and should thus provide
the prototype for their implementation instead of providing a generic
one unless a different symbol is not defined. Note that this only
affects sparc32 nds32 as all others do provide their own version.
Also update the kerneldoc comments in asm-generic/io.h to explain the
situation around the default ioremap* implementations correctly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Whatever reason there is for the existence of ioremap_uc, and the fact
that it returns NULL by default on architectures with an MMU applies
equally to nommu architectures, so don't provide different defaults.
In practice the difference is meaningless as the only portable driver
that uses ioremap_uc is atyfb which probably doesn't show up on nommu
devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com>
When using patchable-function-entry, the compiler will record the
callsites into a section named "__patchable_function_entries" rather
than "__mcount_loc". Let's abstract this difference behind a new
FTRACE_CALLSITE_SECTION, so that architectures don't have to handle this
explicitly (e.g. with custom module linker scripts).
As parisc currently handles this explicitly, it is fixed up accordingly,
with its custom linker script removed. Since FTRACE_CALLSITE_SECTION is
only defined when DYNAMIC_FTRACE is selected, the parisc module loading
code is updated to only use the definition in that case. When
DYNAMIC_FTRACE is not selected, modules shouldn't have this section, so
this removes some redundant work in that case.
To make sure that this is keep up-to-date for modules and the main
kernel, a comment is added to vmlinux.lds.h, with the existing ifdeffery
simplified for legibility.
I built parisc generic-{32,64}bit_defconfig with DYNAMIC_FTRACE enabled,
and verified that the section made it into the .ko files for modules.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Helge Deller <deller@gmx.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Torsten Duwe <duwe@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: linux-parisc@vger.kernel.org
With the previous patch, we should now not be using need_flush_all for
powerpc. But then make sure we force a PID tlbie flush with RIC=2 if
we ever find need_flush_all set. Also don't reset it after a mmu
gather flush.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191024075801.22434-3-aneesh.kumar@linux.ibm.com
The update of the VDSO data is depending on __arch_use_vsyscall() returning
True. This is a leftover from the attempt to map the features of various
architectures 1:1 into generic code.
The usage of __arch_use_vsyscall() in the actual vsyscall implementations
got dropped and replaced by the requirement for the architecture code to
return U64_MAX if the global clocksource is not usable in the VDSO.
But the __arch_use_vsyscall() check in the update code stayed which causes
the VDSO data to be stale or invalid when an architecture actually
implements that function and returns False when the current clocksource is
not usable in the VDSO.
As a consequence the VDSO implementations of clock_getres(), time(),
clock_gettime(CLOCK_.*_COARSE) operate on invalid data and return bogus
information.
Remove the __arch_use_vsyscall() check from the VDSO update function and
update the VDSO data unconditionally.
[ tglx: Massaged changelog and removed the now useless implementations in
asm-generic/ARM64/MIPS ]
Fixes: 44f57d788e ("timekeeping: Provide a generic update_vsyscall() implementation")
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1571887709-11447-1-git-send-email-chenhc@lemote.com
Pull kernel lockdown mode from James Morris:
"This is the latest iteration of the kernel lockdown patchset, from
Matthew Garrett, David Howells and others.
From the original description:
This patchset introduces an optional kernel lockdown feature,
intended to strengthen the boundary between UID 0 and the kernel.
When enabled, various pieces of kernel functionality are restricted.
Applications that rely on low-level access to either hardware or the
kernel may cease working as a result - therefore this should not be
enabled without appropriate evaluation beforehand.
The majority of mainstream distributions have been carrying variants
of this patchset for many years now, so there's value in providing a
doesn't meet every distribution requirement, but gets us much closer
to not requiring external patches.
There are two major changes since this was last proposed for mainline:
- Separating lockdown from EFI secure boot. Background discussion is
covered here: https://lwn.net/Articles/751061/
- Implementation as an LSM, with a default stackable lockdown LSM
module. This allows the lockdown feature to be policy-driven,
rather than encoding an implicit policy within the mechanism.
The new locked_down LSM hook is provided to allow LSMs to make a
policy decision around whether kernel functionality that would allow
tampering with or examining the runtime state of the kernel should be
permitted.
The included lockdown LSM provides an implementation with a simple
policy intended for general purpose use. This policy provides a coarse
level of granularity, controllable via the kernel command line:
lockdown={integrity|confidentiality}
Enable the kernel lockdown feature. If set to integrity, kernel features
that allow userland to modify the running kernel are disabled. If set to
confidentiality, kernel features that allow userland to extract
confidential information from the kernel are also disabled.
This may also be controlled via /sys/kernel/security/lockdown and
overriden by kernel configuration.
New or existing LSMs may implement finer-grained controls of the
lockdown features. Refer to the lockdown_reason documentation in
include/linux/security.h for details.
The lockdown feature has had signficant design feedback and review
across many subsystems. This code has been in linux-next for some
weeks, with a few fixes applied along the way.
Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
when kernel lockdown is in confidentiality mode") is missing a
Signed-off-by from its author. Matthew responded that he is providing
this under category (c) of the DCO"
* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
kexec: Fix file verification on S390
security: constify some arrays in lockdown LSM
lockdown: Print current->comm in restriction messages
efi: Restrict efivar_ssdt_load when the kernel is locked down
tracefs: Restrict tracefs when the kernel is locked down
debugfs: Restrict debugfs when the kernel is locked down
kexec: Allow kexec_file() with appropriate IMA policy when locked down
lockdown: Lock down perf when in confidentiality mode
bpf: Restrict bpf when kernel lockdown is in confidentiality mode
lockdown: Lock down tracing and perf kprobes when in confidentiality mode
lockdown: Lock down /proc/kcore
x86/mmiotrace: Lock down the testmmiotrace module
lockdown: Lock down module params that specify hardware parameters (eg. ioport)
lockdown: Lock down TIOCSSERIAL
lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
acpi: Disable ACPI table override if the kernel is locked down
acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
ACPI: Limit access to custom_method when the kernel is locked down
x86/msr: Restrict MSR access when the kernel is locked down
x86: Lock down IO port access when the kernel is locked down
...
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few
people, and until recently arm64 used these erroneously/pointlessly for
other levels of page table.
To make it incredibly clear that these only apply to the PTE level, and to
align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them
to pgtable_pte_page_{ctor,dtor}().
These changes were generated with the following shell script:
----
git grep -lw 'pgtable_page_.tor' | while read FILE; do
sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE;
sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE;
done
----
... with the documentation re-flowed to remain under 80 columns, and
whitespace fixed up in macros to keep backslashes aligned.
There should be no functional change as a result of this patch.
Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The original clean up of "cut here" missed the WARN_ON() case (that does
not have a printk message), which was fixed recently by adding an explicit
printk of "cut here". This had the downside of adding a printk() to every
WARN_ON() caller, which reduces the utility of using an instruction
exception to streamline the resulting code. By making this a new BUGFLAG,
all of these can be removed and "cut here" can be handled by the exception
handler.
This was very pronounced on PowerPC, but the effect can be seen on x86 as
well. The resulting text size of a defconfig build shows some small
savings from this patch:
text data bss dec hex filename
19691167 5134320 1646664 26472151 193eed7 vmlinux.before
19676362 5134260 1663048 26473670 193f4c6 vmlinux.after
This change also opens the door for creating something like BUG_MSG(),
where a custom printk() before issuing BUG(), without confusing the "cut
here" line.
Link: http://lkml.kernel.org/r/201908200943.601DD59DCE@keescook
Fixes: 6b15f678fb ("include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In preparation for cleaning up "cut here" even more, this removes the
__WARN_*TAINT() helpers, as they limit the ability to add new BUGFLAG_*
flags to call sites. They are removed by expanding them into full
__WARN_FLAGS() calls.
Link: http://lkml.kernel.org/r/20190819234111.9019-6-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Clean up WARN() "cut here" handling", v2.
Christophe Leroy noticed that the fix for missing "cut here" in the WARN()
case was adding explicit printk() calls instead of teaching the exception
handler to add it. This refactors the bug/warn infrastructure to pass
this information as a new BUGFLAG.
Longer details repeated from the last patch in the series:
bug: move WARN_ON() "cut here" into exception handler
The original cleanup of "cut here" missed the WARN_ON() case (that does
not have a printk message), which was fixed recently by adding an explicit
printk of "cut here". This had the downside of adding a printk() to every
WARN_ON() caller, which reduces the utility of using an instruction
exception to streamline the resulting code. By making this a new BUGFLAG,
all of these can be removed and "cut here" can be handled by the exception
handler.
This was very pronounced on PowerPC, but the effect can be seen on x86 as
well. The resulting text size of a defconfig build shows some small
savings from this patch:
text data bss dec hex filename
19691167 5134320 1646664 26472151 193eed7 vmlinux.before
19676362 5134260 1663048 26473670 193f4c6 vmlinux.after
This change also opens the door for creating something like BUG_MSG(),
where a custom printk() before issuing BUG(), without confusing the "cut
here" line.
This patch (of 7):
There's no reason to have specialized helpers for passing the warn taint
down to __warn(). Consolidate and refactor helper macros, removing
__WARN_printf() and warn_slowpath_fmt_taint().
Link: http://lkml.kernel.org/r/20190819234111.9019-2-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to 78ddc53473 ("thp: rename split_huge_page_pmd() to
split_huge_pmd()"), update related comment.
Link: http://lkml.kernel.org/r/20190731033406.185285-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem
cache for page table allocations on several architectures that do not use
PAGE_SIZE tables for one or more levels of the page table hierarchy.
Most architectures do not implement these functions and use __weak default
NOP implementation of pgd_cache_init(). Since there is no such default
for pgtable_cache_init(), its empty stub is duplicated among most
architectures.
Rename the definitions of pgd_cache_init() to pgtable_cache_init() and
drop empty stubs of pgtable_cache_init().
Link: http://lkml.kernel.org/r/1566457046-22637-1-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Will Deacon <will@kernel.org> [arm64]
Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: remove quicklist page table caches".
A while ago Nicholas proposed to remove quicklist page table caches [1].
I've rebased his patch on the curren upstream and switched ia64 and sh to
use generic versions of PTE allocation.
[1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com
This patch (of 3):
Remove page table allocator "quicklists". These have been around for a
long time, but have not got much traction in the last decade and are only
used on ia64 and sh architectures.
The numbers in the initial commit look interesting but probably don't
apply anymore. If anybody wants to resurrect this it's in the git
history, but it's unhelpful to have this code and divergent allocator
behaviour for minor archs.
Also it might be better to instead make more general improvements to page
allocator if this is still so slow.
Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Summary of modules changes for the 5.4 merge window:
- Introduce exported symbol namespaces.
This new feature allows subsystem maintainers to partition and
categorize their exported symbols into explicit namespaces. Module
authors are now required to import the namespaces they need.
Some of the main motivations of this feature include: allowing kernel
developers to better manage the export surface, allow subsystem
maintainers to explicitly state that usage of some exported symbols
should only be limited to certain users (think: inter-module or
inter-driver symbols, debugging symbols, etc), as well as more easily
limiting the availability of namespaced symbols to other parts of the
kernel. With the module import requirement, it is also easier to spot
the misuse of exported symbols during patch review. Two new macros are
introduced: EXPORT_SYMBOL_NS() and EXPORT_SYMBOL_NS_GPL(). The API is
thoroughly documented in Documentation/kbuild/namespaces.rst.
- Some small code and kbuild cleanups here and there.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJdh3n8AAoJEMBFfjjOO8Fy94kP+QHZF39QDvLbxAzEYAETAS+o
CFu6wix/DrAwFkTU/kX1eAsAwDBEz0xkMciR4BsLX3sIafUVERxtDXVAui/dA1+6
zfw2c3ObyVwPEk6aUPFprgkj+08gxujsJFlYTsQQUhtRbmxg6R7hD6t6ANxiHaY2
AQe5TzOWXoIa2hHO+7rPMqf8l6qiFCaL0s3v5jrmBXa5mHmc4PVy95h1J6xQVw2u
b+SlvKeylHv+OtCtvthkAJS3hfS35J/1TNb/RNYIvh60IfEguEuFsGuQ9JiSSAZS
pv1cJ+I5d4v8Y/md1rZpdjTJL9gCrq/UUC67+UkejCOn0C+7XM2eR4Bu/jWvdMSn
ZQDHcPhFSIfmP7FaKomPogaBbw1sI1FvM5930pPJzHnyO9+cefBXe7rWaaB+y0At
GAxOtmk1dKh01BT7YO/C0oVuX87csWd74NHypVsbs0TgQo5jBFdZRheyDrq5YB+s
tVK+5H0nqQrCcfo/TvhcsZlgITTGtgTPenaW99/i7qNa9mRUtxC/VkE+aob6HNRF
1iBxxopOTxGN8akyKOVumtkuTQH3EJfouZee//pWbXLzyDmScg/k67vuao8kxbyq
NA1piFAGJAHFsHATxrbvNOq6jZ5bfUT8pwSTs83JppuR++8Hxk7zaShS3/LvsvHt
6ist/epOwTZ7oiNQ04nj
=72Uy
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules updates from Jessica Yu:
"The main bulk of this pull request introduces a new exported symbol
namespaces feature. The number of exported symbols is increasingly
growing with each release (we're at about 31k exports as of 5.3-rc7)
and we currently have no way of visualizing how these symbols are
"clustered" or making sense of this huge export surface.
Namespacing exported symbols allows kernel developers to more
explicitly partition and categorize exported symbols, as well as more
easily limiting the availability of namespaced symbols to other parts
of the kernel. For starters, we have introduced the USB_STORAGE
namespace to demonstrate the API's usage. I have briefly summarized
the feature and its main motivations in the tag below.
Summary:
- Introduce exported symbol namespaces.
This new feature allows subsystem maintainers to partition and
categorize their exported symbols into explicit namespaces. Module
authors are now required to import the namespaces they need.
Some of the main motivations of this feature include: allowing
kernel developers to better manage the export surface, allow
subsystem maintainers to explicitly state that usage of some
exported symbols should only be limited to certain users (think:
inter-module or inter-driver symbols, debugging symbols, etc), as
well as more easily limiting the availability of namespaced symbols
to other parts of the kernel.
With the module import requirement, it is also easier to spot the
misuse of exported symbols during patch review.
Two new macros are introduced: EXPORT_SYMBOL_NS() and
EXPORT_SYMBOL_NS_GPL(). The API is thoroughly documented in
Documentation/kbuild/namespaces.rst.
- Some small code and kbuild cleanups here and there"
* tag 'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Remove leftover '#undef' from export header
module: remove unneeded casts in cmp_name()
module: move CONFIG_UNUSED_SYMBOLS to the sub-menu of MODULES
module: remove redundant 'depends on MODULES'
module: Fix link failure due to invalid relocation on namespace offset
usb-storage: export symbols in USB_STORAGE namespace
usb-storage: remove single-use define for debugging
docs: Add documentation for Symbol Namespaces
scripts: Coccinelle script for namespace dependencies.
modpost: add support for generating namespace dependencies
export: allow definition default namespaces in Makefiles or sources
module: add config option MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS
modpost: add support for symbol namespaces
module: add support for symbol namespaces.
export: explicitly align struct kernel_symbol
module: support reading multiple values per modinfo tag
Pull crypto updates from Herbert Xu:
"API:
- Add the ability to abort a skcipher walk.
Algorithms:
- Fix XTS to actually do the stealing.
- Add library helpers for AES and DES for single-block users.
- Add library helpers for SHA256.
- Add new DES key verification helper.
- Add surrounding bits for ESSIV generator.
- Add accelerations for aegis128.
- Add test vectors for lzo-rle.
Drivers:
- Add i.MX8MQ support to caam.
- Add gcm/ccm/cfb/ofb aes support in inside-secure.
- Add ofb/cfb aes support in media-tek.
- Add HiSilicon ZIP accelerator support.
Others:
- Fix potential race condition in padata.
- Use unbound workqueues in padata"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (311 commits)
crypto: caam - Cast to long first before pointer conversion
crypto: ccree - enable CTS support in AES-XTS
crypto: inside-secure - Probe transform record cache RAM sizes
crypto: inside-secure - Base RD fetchcount on actual RD FIFO size
crypto: inside-secure - Base CD fetchcount on actual CD FIFO size
crypto: inside-secure - Enable extended algorithms on newer HW
crypto: inside-secure: Corrected configuration of EIP96_TOKEN_CTRL
crypto: inside-secure - Add EIP97/EIP197 and endianness detection
padata: remove cpu_index from the parallel_queue
padata: unbind parallel jobs from specific CPUs
padata: use separate workqueues for parallel and serial work
padata, pcrypt: take CPU hotplug lock internally in padata_alloc_possible
crypto: pcrypt - remove padata cpumask notifier
padata: make padata_do_parallel find alternate callback CPU
workqueue: require CPU hotplug read exclusion for apply_workqueue_attrs
workqueue: unconfine alloc/apply/free_workqueue_attrs()
padata: allocate workqueue internally
arm64: dts: imx8mq: Add CAAM node
random: Use wait_event_freezable() in add_hwgenerator_randomness()
crypto: ux500 - Fix COMPILE_TEST warnings
...
Here are three small cleanup patches for the include/asm-generic
directory. Christoph removes the __ioremap as part of a cleanup,
Nico improves the constant do_div() optimization, and Denis
changes BUG_ON() to be consistent with other implementations.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJdf97EAAoJEJpsee/mABjZwTAP+wXle7qDBAkT+ddDwJ9IM+5x
j3oJ/eo4YN3II20xTbjmPfXBByb4sBnyZ1EWMtkwcMTjTbyuEwRLOHy810NWn0Yc
zKfrUGuPN2J54WGbEtzlCSf5hWOPdYij2JbkqZmwbCgnn4vHWOtOlL0m7RZ4sKWt
PjjdjZ+7B7ZdVIqkwCmyv7kY+0IQtdekjQDeqXSmoNeT2zeLS/5Di3inD53mi3Q0
1B6Eh1VEETqnnFke3GtHr5euo2EzglNnOpIP5V0VyN/t0HSz4mD9ZwGdANKPUij7
NSNUpCkbyvU6TNY82TMDP81mGH/QBF9Dy/1SKHFc258VJvLcB3Dtdp47fS9rYZ/N
XqxKZ+BVhZSdDV9JKJXX1Bl4vwP6IOF8LalQM50qOIw2kdxA8ZDJ17CPQ3a4kM6Y
QBNepCTIPZdjXpT5OaJdQiX8prsRMEBiD7I+Z9kO54ImHt94IB2MeZqMuuFG0GmL
/o3Ia1FYd2TF3SWHCxe2n2kFsLxQFvtwGP7zPvQpjl0Bh1UGEo0XhBERVe5sKkGm
xYfzXAbi38R6mY25lsqie7ibUQTY6Z2ZFYK5km9oGiAyNO0r9ioy56ROPWxZG2Kv
f74cQiEIORfc5ePz17NRsD8Mppe0g9lSwZ1RV29GtnaXNLET6gnxGQDLdzvitgF4
ZWPIGg+qOM9fmrxdNdCr
=9VXy
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"Here are three small cleanup patches for the include/asm-generic
directory.
Christoph removes the __ioremap as part of a cleanup, Nico improves
the constant do_div() optimization, and Denis changes BUG_ON() to be
consistent with other implementations"
* tag 'asm-generic-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: add unlikely to default BUG_ON(x)
__div64_const32(): improve the generic C version
asm-generic: don't provide __ioremap
Pull core timer updates from Thomas Gleixner:
"Timers and timekeeping updates:
- A large overhaul of the posix CPU timer code which is a preparation
for moving the CPU timer expiry out into task work so it can be
properly accounted on the task/process.
An update to the bogus permission checks will come later during the
merge window as feedback was not complete before heading of for
travel.
- Switch the timerqueue code to use cached rbtrees and get rid of the
homebrewn caching of the leftmost node.
- Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
single function
- Implement the separation of hrtimers to be forced to expire in hard
interrupt context even when PREEMPT_RT is enabled and mark the
affected timers accordingly.
- Implement a mechanism for hrtimers and the timer wheel to protect
RT against priority inversion and live lock issues when a (hr)timer
which should be canceled is currently executing the callback.
Instead of infinitely spinning, the task which tries to cancel the
timer blocks on a per cpu base expiry lock which is held and
released by the (hr)timer expiry code.
- Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
resulting in faster access to timekeeping functions.
- Updates to various clocksource/clockevent drivers and their device
tree bindings.
- The usual small improvements all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
posix-cpu-timers: Fix permission check regression
posix-cpu-timers: Always clear head pointer on dequeue
hrtimer: Add a missing bracket and hide `migration_base' on !SMP
posix-cpu-timers: Make expiry_active check actually work correctly
posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
tick: Mark sched_timer to expire in hard interrupt context
hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
posix-cpu-timers: Utilize timerqueue for storage
posix-cpu-timers: Move state tracking to struct posix_cputimers
posix-cpu-timers: Deduplicate rlimit handling
posix-cpu-timers: Remove pointless comparisons
posix-cpu-timers: Get rid of 64bit divisions
posix-cpu-timers: Consolidate timer expiry further
posix-cpu-timers: Get rid of zero checks
rlimit: Rewrite non-sensical RLIMIT_CPU comment
posix-cpu-timers: Respect INFINITY for hard RTTIME limit
posix-cpu-timers: Switch thread group sampling to array
posix-cpu-timers: Restructure expiry array
posix-cpu-timers: Remove cputime_expires
...
Pull scheduler updates from Ingo Molnar:
- MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and
Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann,
Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers.
As perf and the scheduler is getting bigger and more complex,
document the status quo of current responsibilities and interests,
and spread the review pain^H^H^H^H fun via an increase in the Cc:
linecount generated by scripts/get_maintainer.pl. :-)
- Add another series of patches that brings the -rt (PREEMPT_RT) tree
closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
into a new CONFIG_PREEMPTION category that will allow the eventual
introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
to go though.
- Extend the CPU cgroup controller with uclamp.min and uclamp.max to
allow the finer shaping of CPU bandwidth usage.
- Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).
- Improve the behavior of high CPU count, high thread count
applications running under cpu.cfs_quota_us constraints.
- Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.
- Improve CPU isolation housekeeping CPU allocation NUMA locality.
- Fix deadline scheduler bandwidth calculations and logic when cpusets
rebuilds the topology, or when it gets deadline-throttled while it's
being offlined.
- Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from
setscheduler() system calls without creating global serialization.
Add new synchronization between cpuset topology-changing events and
the deadline acceptance tests in setscheduler(), which were broken
before.
- Rework the active_mm state machine to be less confusing and more
optimal.
- Rework (simplify) the pick_next_task() slowpath.
- Improve load-balancing on AMD EPYC systems.
- ... and misc cleanups, smaller fixes and improvements - please see
the Git log for more details.
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
sched/psi: Correct overly pessimistic size calculation
sched/fair: Speed-up energy-aware wake-ups
sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
sched/uclamp: Update CPU's refcount on TG's clamp changes
sched/uclamp: Use TG's clamps to restrict TASK's clamps
sched/uclamp: Propagate system defaults to the root group
sched/uclamp: Propagate parent clamps
sched/uclamp: Extend CPU's cgroup controller
sched/topology: Improve load balancing on AMD EPYC systems
arch, ia64: Make NUMA select SMP
sched, perf: MAINTAINERS update, add submaintainers and reviewers
sched/fair: Use rq_lock/unlock in online_fair_sched_group
cpufreq: schedutil: fix equation in comment
sched: Rework pick_next_task() slow-path
sched: Allow put_prev_task() to drop rq->lock
sched/fair: Expose newidle_balance()
sched: Add task_struct pointer to sched_class::set_curr_task
sched: Rework CPU hotplug task selection
sched/{rt,deadline}: Fix set_next_task vs pick_next_task
sched: Fix kerneldoc comment for ia64_set_curr_task
...
- 52-bit virtual addressing in the kernel
- New ABI to allow tagged user pointers to be dereferenced by syscalls
- Early RNG seeding by the bootloader
- Improve robustness of SMP boot
- Fix TLB invalidation in light of recent architectural clarifications
- Support for i.MX8 DDR PMU
- Remove direct LSE instruction patching in favour of static keys
- Function error injection using kprobes
- Support for the PPTT "thread" flag introduced by ACPI 6.3
- Move PSCI idle code into proper cpuidle driver
- Relaxation of implicit I/O memory barriers
- Build with RELR relocations when toolchain supports them
- Numerous cleanups and non-critical fixes
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl1yYREQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNAM3CAChqDFQkryXoHwdeEcaukMRVNxtxOi4pM4g
5xqkb7PoqRJssIblsuhaXjrSD97yWCgaqCmFe6rKoes++lP4bFcTe22KXPPyPBED
A+tK4nTuKKcZfVbEanUjI+ihXaHJmKZ/kwAxWsEBYZ4WCOe3voCiJVNO2fHxqg1M
8TskZ2BoayTbWMXih0eJg2MCy/xApBq4b3nZG4bKI7Z9UpXiKN1NYtDh98ZEBK4V
d/oNoHsJ2ZvIQsztoBJMsvr09DTCazCijWZiECadm6l41WEPFizngrACiSJLLtYo
0qu4qxgg9zgFlvBCRQmIYSggTuv35RgXSfcOwChmW5DUjHG+f9GK
=Ru4B
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Although there isn't tonnes of code in terms of line count, there are
a fair few headline features which I've noted both in the tag and also
in the merge commits when I pulled everything together.
The part I'm most pleased with is that we had 35 contributors this
time around, which feels like a big jump from the usual small group of
core arm64 arch developers. Hopefully they all enjoyed it so much that
they'll continue to contribute, but we'll see.
It's probably worth highlighting that we've pulled in a branch from
the risc-v folks which moves our CPU topology code out to where it can
be shared with others.
Summary:
- 52-bit virtual addressing in the kernel
- New ABI to allow tagged user pointers to be dereferenced by
syscalls
- Early RNG seeding by the bootloader
- Improve robustness of SMP boot
- Fix TLB invalidation in light of recent architectural
clarifications
- Support for i.MX8 DDR PMU
- Remove direct LSE instruction patching in favour of static keys
- Function error injection using kprobes
- Support for the PPTT "thread" flag introduced by ACPI 6.3
- Move PSCI idle code into proper cpuidle driver
- Relaxation of implicit I/O memory barriers
- Build with RELR relocations when toolchain supports them
- Numerous cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (114 commits)
arm64: remove __iounmap
arm64: atomics: Use K constraint when toolchain appears to support it
arm64: atomics: Undefine internal macros after use
arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL
arm64: asm: Kill 'asm/atomic_arch.h'
arm64: lse: Remove unused 'alt_lse' assembly macro
arm64: atomics: Remove atomic_ll_sc compilation unit
arm64: avoid using hard-coded registers for LSE atomics
arm64: atomics: avoid out-of-line ll/sc atomics
arm64: Use correct ll/sc atomic constraints
jump_label: Don't warn on __exit jump entries
docs/perf: Add documentation for the i.MX8 DDR PMU
perf/imx_ddr: Add support for AXI ID filtering
arm64: kpti: ensure patched kernel text is fetched from PoU
arm64: fix fixmap copy for 16K pages and 48-bit VA
perf/smmuv3: Validate groups for global filtering
perf/smmuv3: Validate group size
arm64: Relax Documentation/arm64/tagged-pointers.rst
arm64: kvm: Replace hardcoded '1' with SYS_PAR_EL1_F
arm64: mm: Ignore spurious translation faults taken from the kernel
...
Commit 7290d58095 ("module: use relative references for __ksymtab
entries") converted the '__put' #define into an assembly macro in
asm-generic/export.h but forgot to remove the corresponding '#undef'.
Remove the leftover '#undef'.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Commit 8651ec01da ("module: add support for symbol namespaces.")
broke linking for arm64 defconfig:
| lib/crypto/arc4.o: In function `__ksymtab_arc4_setkey':
| arc4.c:(___ksymtab+arc4_setkey+0x8): undefined reference to `no symbol'
| lib/crypto/arc4.o: In function `__ksymtab_arc4_crypt':
| arc4.c:(___ksymtab+arc4_crypt+0x8): undefined reference to `no symbol'
This is because the dummy initialisation of the 'namespace_offset' field
in 'struct kernel_symbol' when using EXPORT_SYMBOL on architectures with
support for PREL32 locations uses an offset from an absolute address (0)
in an effort to trick 'offset_to_pointer' into behaving as a NOP,
allowing non-namespaced symbols to be treated in the same way as those
belonging to a namespace.
Unfortunately, place-relative relocations require a symbol reference
rather than an absolute value and, although x86 appears to get away with
this due to placing the kernel text at the top of the address space, it
almost certainly results in a runtime failure if the kernel is relocated
dynamically as a result of KASLR.
Rework 'namespace_offset' so that a value of 0, which cannot occur for a
valid namespaced symbol, indicates that the corresponding symbol does
not belong to a namespace.
Cc: Matthias Maennich <maennich@google.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 8651ec01da ("module: add support for symbol namespaces.")
Reported-by: kbuild test robot <lkp@intel.com>
Tested-by: Matthias Maennich <maennich@google.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Matthias Maennich <maennich@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>