Commit Graph

1007 Commits

Author SHA1 Message Date
Mike Rapoport 7bb7f2ac24 arch, mm: wire up memfd_secret system call where relevant
Wire up memfd_secret system call on architectures that define
ARCH_HAS_SET_DIRECT_MAP, namely arm64, risc-v and x86.

Link: https://lkml.kernel.org/r/20210518072034.31572-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:21 -07:00
Linus Torvalds 911a2997a5 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmDcl7AACgkQnJ2qBz9k
 QNnsBQf+LBAPsfykQ/f8EdHErO1lfbVTmwf2g/JzTkjrIVZTZ6Ic47aCIiFxgHU2
 Js9ufaPxpsbbopzpn2PAoCUzxNsZDqgXtnC03MOUAqoSFbAvgLHz2sQwjqeYJUGQ
 P6n7VipEA/qBVpQI5zeCUhHYcahoNrRjSLzaFnE2Z8CrQYQ6Ry9gVEhduvu2OTru
 62cWlAWlTJfx/FcR1Y0F/ZznnNSKMiAHcEe3F6Beztplg2ooq+z6FclJYrkmnxMq
 SXSOsqTCdi1/oFx36NpvLkykrIS9I7N/iqCnKwbm6X+nyZZKyAwYZhWVqkbozPPu
 +u1Ppq8o0IuWwEA6/UAmxgAO3m/Gkw==
 =tn0h
 -----END PGP SIGNATURE-----

Merge tag 'fs_for_v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull misc fs updates from Jan Kara:
 "The new quotactl_fd() syscall (remake of quotactl_path() syscall that
  got introduced & disabled in 5.13 cycle), and couple of udf, reiserfs,
  isofs, and writeback fixes and cleanups"

* tag 'fs_for_v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  writeback: fix obtain a reference to a freeing memcg css
  quota: remove unnecessary oom message
  isofs: remove redundant continue statement
  quota: Wire up quotactl_fd syscall
  quota: Change quotactl_path() systcall to an fd-based one
  reiserfs: Remove unneed check in reiserfs_write_full_page()
  udf: Fix NULL pointer dereference in udf_symlink function
  reiserfs: add check for invalid 1st journal block
2021-07-01 12:06:39 -07:00
Linus Torvalds 1dfb0f47ac X86 entry code related updates:
- Consolidate the macros for .byte ... opcode sequences
 
  - Deduplicate register offset defines in include files
 
  - Simplify the ia32,x32 compat handling of the related syscall tables to
    get rid of #ifdeffery.
 
  - Clear all EFLAGS which are not required for syscall handling
 
  - Consolidate the syscall tables and switch the generation over to the
    generic shell script and remove the CFLAGS tweaks which are not longer
    required.
 
  - Use 'int' type for system call numbers to match the generic code.
 
  - Add more selftests for syscalls
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmDbKzMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoae8D/9+pksdf8lE5dRLtngSeTDLiyIV+qq4
 vSks7XfrTTAhOV2nRwtIulc2CO6H7jcvn6ehmiC/X0Tn9JK5brwSJJYryNEjA3cp
 3p9jPrB1w1SDhx35JzILN4DDaJfI3jobLSLDq0KQzuEL0+c0R4l3WBplpCzbLjqj
 NaFQgslf8RSnjha9NLTKzlzSaNNNo9Ioo6DyrsBDEdcRBtAPlFfdVtT3oJE73ANH
 dK5POoVWysmAnDAwEW17j9bBJLtxeWsrhM9CrtqvcKr3HhK9WjWUFAr+diQf5GKf
 BAD2A+5y8wZQXvFOuC9WZxfQwUFSLExt8BfcXblOUbf2CdlvoYVzOlvI141kA++4
 q4wQ1vl6MbLCp6wLysc3bnwKUEmnf2E4Iyj5JR2aFrw096pAoZ3ZbAQi7s3Vhb16
 aSbGxIw3rHRuB0f8VmOA0iEHiXlkRmE/K+nH1/uDTUZLaDpktPvpKQJsp0+9qXFk
 eVtEw4bVKJ7q5ozjMzpm9aPxPp1v8MGxUOJOy80W7Ti+vBp2KmMKc1gy8QsYrTvW
 Vzvpp3U+/WFh2X7AG0zlP/JEnOuJmMwMK5QhzMC2rEbaHJ66ht7SABvtSbOHHw5Z
 zugxTE0lx3n7izCxW1RLEu//xtWY0FbU2L5oE2Ace27myUPeBQCDJzynUn93dMM9
 9nq2TtgTCF6XvA==
 =+sb9
 -----END PGP SIGNATURE-----

Merge tag 'x86-entry-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 entry code related updates from Thomas Gleixner:

 - Consolidate the macros for .byte ... opcode sequences

 - Deduplicate register offset defines in include files

 - Simplify the ia32,x32 compat handling of the related syscall tables
   to get rid of #ifdeffery.

 - Clear all EFLAGS which are not required for syscall handling

 - Consolidate the syscall tables and switch the generation over to the
   generic shell script and remove the CFLAGS tweaks which are not
   longer required.

 - Use 'int' type for system call numbers to match the generic code.

 - Add more selftests for syscalls

* tag 'x86-entry-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/syscalls: Don't adjust CFLAGS for syscall tables
  x86/syscalls: Remove -Wno-override-init for syscall tables
  x86/uml/syscalls: Remove array index from syscall initializers
  x86/syscalls: Clear 'offset' and 'prefix' in case they are set in env
  x86/entry: Use int everywhere for system call numbers
  x86/entry: Treat out of range and gap system calls the same
  x86/entry/64: Sign-extend system calls on entry to int
  selftests/x86/syscall: Add tests under ptrace to syscall_numbering_64
  selftests/x86/syscall: Simplify message reporting in syscall_numbering
  selftests/x86/syscall: Update and extend syscall_numbering_64
  x86/syscalls: Switch to generic syscallhdr.sh
  x86/syscalls: Use __NR_syscalls instead of __NR_syscall_max
  x86/unistd: Define X32_NR_syscalls only for 64-bit kernel
  x86/syscalls: Stop filling syscall arrays with *_sys_ni_syscall
  x86/syscalls: Switch to generic syscalltbl.sh
  x86/entry/x32: Rename __x32_compat_sys_* to __x64_compat_sys_*
2021-06-29 12:44:51 -07:00
Linus Torvalds 909489bf9f Changes for this cycle:
- Micro-optimize and standardize the do_syscall_64() calling convention
  - Make syscall entry flags clearing more conservative
  - Clean up syscall table handling
  - Clean up & standardize assembly macros, in preparation of FRED
  - Misc cleanups and fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZeG8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gHQw//fI9MAIVQbB6tVMH6GtFkQZIJLMt/bik5
 AWelEXoBUbbLFGKpugC+oWGJjsvZ026f65hfQEswuqD4n0Xx8FFPRi51LP88lLya
 XQV8nssJYUKYZAVA0EJd7NmnJchbnRc4KQmu6ekEQdP6+Nht8k7U9O2QetgQgcE5
 IYhXctoYpr/FnBpV5PmVNAakOt0cZh6mXAtpzjHfdU8lUHZ13zPIpniSXCPd4vUB
 u/a3x3l1fP+Gg8d1vpfGCBvNKRBEh5pJsjaObMlLM/qhHupsDi5Ji6y6pcJSgkcv
 2nBtRGYDjYIQ0qXx6ILhNuqGFT76i/j2p8YfwMnH4NmYk908RlT0quu7fI8wBO9E
 cKd3m9BG8wP67xbOrG/0ckdl3+y/1iW8kPY6SeO03Vvfm6ryqHdZs4oi4CmcX9lP
 bFXi5AiYdHm0vqbwQG8P9LerWotgz4yFC9z7yC1KXJDXJxSwVxDFiXvyvxepRi6E
 NZxe4RSnDp7sijEvZJa/2EA+rDVDIokfzTLgnRSMkaUuxwNsVjeNsV0b5727kiVC
 DwVkxC7NZKG9UBr6WFs9hxRPE0g6xz3EJEBXaWpk2ggBmQxTfBRTjV0Pe3ii7dqQ
 z7O3Gv8pojki3ttG4wExLepPHRxTBzjdsoV6/BHZpraYTP11bpQlgx/K7IYJZYa5
 Tt9IZ4vNd10=
 =mbmH
 -----END PGP SIGNATURE-----

Merge tag 'x86-asm-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 asm updates from Ingo Molnar:

 - Micro-optimize and standardize the do_syscall_64() calling convention

 - Make syscall entry flags clearing more conservative

 - Clean up syscall table handling

 - Clean up & standardize assembly macros, in preparation of FRED

 - Misc cleanups and fixes

* tag 'x86-asm-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Make <asm/asm.h> valid on cross-builds as well
  x86/regs: Syscall_get_nr() returns -1 for a non-system call
  x86/entry: Split PUSH_AND_CLEAR_REGS into two submacros
  x86/syscall: Maximize MSR_SYSCALL_MASK
  x86/syscall: Unconditionally prototype {ia32,x32}_sys_call_table[]
  x86/entry: Reverse arguments to do_syscall_64()
  x86/entry: Unify definitions from <asm/calling.h> and <asm/ptrace-abi.h>
  x86/asm: Use _ASM_BYTES() in <asm/nops.h>
  x86/asm: Add _ASM_BYTES() macro for a .byte ... opcode sequence
  x86/asm: Have the __ASM_FORM macros handle commas in arguments
2021-06-28 12:57:11 -07:00
Linus Torvalds d04f7de0a5 - Differentiate the type of exception the #VC handler raises depending
on code executed in the guest and handle the case where failure to
 get the RIP would result in a #GP, as it should, instead of in a #PF
 
 - Disable interrupts while the per-CPU GHCB is held
 
 - Split the #VC handler depending on where the #VC exception has
 happened and therefore provide for precise context tracking like the
 rest of the exception handlers deal with noinstr regions now
 
 - Add defines for the GHCB version 2 protocol so that further shared
 development with KVM can happen without merge conflicts
 
 - The usual small cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmDZij8ACgkQEsHwGGHe
 VUpwIQ/8CzFbGm2k2RdmO0H/VPwfF1HFSWpM9YFGSs++yOqfiyCFbyIcTcRbK4IO
 +BUIRoHSgCWPb+5pJli1Wf0J/sIdYr9D4MDWt1oRQG6e/4NE2SL3EOnYJWW5VtOT
 u1AVk01ooPOFDKIoh4OIZ7tCKAeNWBv+oe5dmP46spiEZbHHCzHIEaBuOQRzvX9C
 jSKulDHjA4iaNl/BQMF7dJL1+aPWj2NXjSj86fhMAa+m5MspDXbIaM5wMZfPzc1k
 Rj/m89JThp+mFwik46o/7g/5Q8SYtTE+Hqi1TX/65/dbyizLqbH5W3g0zwrD8TYf
 B7kHguqkoE1j1avLwOYK1yJB8ZTjtf+OXjUAR4UPzxkG7Xhelu5Qb7RD/WCJ3YqO
 KEFIFq+hsiAqvb6RkmX0aVecIJ49aqGX+onsMpLWq9pz2R4BRcH7jo81TIBcosg5
 2Kfx2aPcMec7u7RMBHqwiaC4Adp7/vmHhukawfI8xCWLd7wEjvAMP3eeePxR+C0l
 SSnn0O9COj8pctvq4eOGJAUXzPa4YtsaX+kILBs+hUdQXmQGVSxyTpakyhhUpGQ8
 YyblbHybS8JeYdGqPVS/tn0Rc2DqOSQJetjmXAGhlkEkkGY8i1Ddwe0MaamJozol
 g/wHNYcok/OQWglvVThv6EAY2pTSeWelmjUkZi1dnkYNH1VUxxE=
 =iyX+
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

 - Differentiate the type of exception the #VC handler raises depending
   on code executed in the guest and handle the case where failure to
   get the RIP would result in a #GP, as it should, instead of in a #PF

 - Disable interrupts while the per-CPU GHCB is held

 - Split the #VC handler depending on where the #VC exception has
   happened and therefore provide for precise context tracking like the
   rest of the exception handlers deal with noinstr regions now

 - Add defines for the GHCB version 2 protocol so that further shared
   development with KVM can happen without merge conflicts

 - The usual small cleanups

* tag 'x86_sev_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Use "SEV: " prefix for messages from sev.c
  x86/sev: Add defines for GHCB version 2 MSR protocol requests
  x86/sev: Split up runtime #VC handler for correct state tracking
  x86/sev: Make sure IRQs are disabled while GHCB is active
  x86/sev: Propagate #GP if getting linear instruction address failed
  x86/insn: Extend error reporting from insn_fetch_from_user[_inatomic]()
  x86/insn-eval: Make 0 a valid RIP for insn_get_effective_ip()
  x86/sev: Fix error message in runtime #VC handler
2021-06-28 11:29:12 -07:00
Peter Zijlstra 84e60065df x86/xen: Fix noinstr fail in xen_pv_evtchn_do_upcall()
Fix:

  vmlinux.o: warning: objtool: xen_pv_evtchn_do_upcall()+0x23: call to irq_enter_rcu() leaves .noinstr.text section

Fixes: 359f01d181 ("x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210621120120.532960208@infradead.org
2021-06-22 13:56:42 +02:00
Peter Zijlstra 240001d4e3 x86/entry: Fix noinstr fail in __do_fast_syscall_32()
Fix:

  vmlinux.o: warning: objtool: __do_fast_syscall_32()+0xf5: call to trace_hardirqs_off() leaves .noinstr.text section

Fixes: 5d5675df79 ("x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210621120120.467898710@infradead.org
2021-06-22 13:56:42 +02:00
Joerg Roedel be1a540886 x86/sev: Split up runtime #VC handler for correct state tracking
Split up the #VC handler code into a from-user and a from-kernel part.
This allows clean and correct state tracking, as the #VC handler needs
to enter NMI-state when raised from kernel mode and plain IRQ state when
raised from user-mode.

Fixes: 62441a1fb5 ("x86/sev-es: Correctly track IRQ states in runtime #VC handler")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210618115409.22735-3-joro@8bytes.org
2021-06-21 16:01:05 +02:00
Jan Kara 65ffb3d69e quota: Wire up quotactl_fd syscall
Wire up the quotactl_fd syscall.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-06-07 12:11:24 +02:00
Brian Gerst 48f7eee81c x86/syscalls: Don't adjust CFLAGS for syscall tables
The syscall_*.c files only contain data (the syscall tables).  There
is no need to adjust CFLAGS for tracing and stack protector since they
contain no code.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20210524181707.132844-4-brgerst@gmail.com
2021-05-25 16:59:23 +02:00
Brian Gerst fd9e8691f3 x86/syscalls: Remove -Wno-override-init for syscall tables
Commit 44fe4895f4 ("Stop filling syscall arrays with *_sys_ni_syscall")
removes the need for -Wno-override-init, since the table is now filled
sequentially instead of overriding a default value.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20210524181707.132844-3-brgerst@gmail.com
2021-05-25 16:59:23 +02:00
Masahiro Yamada 1eb8a49836 x86/syscalls: Clear 'offset' and 'prefix' in case they are set in env
If the environment variable 'prefix' is set on the build host, it is
wrongly used as syscall macro prefixes.

  $ export prefix=/usr
  $ make -s defconfig all
  In file included from ./arch/x86/include/asm/unistd.h:20,
                   from <stdin>:2:
  ./arch/x86/include/generated/uapi/asm/unistd_64.h:4:9: warning: missing whitespace after the macro name
      4 | #define __NR_/usrread 0
        |         ^~~~~

arch/x86/entry/syscalls/Makefile should clear 'offset' and 'prefix'.

Fixes: 3cba325b35 ("x86/syscalls: Switch to generic syscallhdr.sh")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210525115420.679416-1-masahiroy@kernel.org
2021-05-25 16:59:23 +02:00
H. Peter Anvin (Intel) 2978996f62 x86/entry: Use int everywhere for system call numbers
System call numbers are defined as int, so use int everywhere for system
call numbers. This is strictly a cleanup; it should not change anything
user visible; all ABI changes have been done in the preceeding patches.

[ tglx: Replaced the unsigned long cast ]

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210518191303.4135296-7-hpa@zytor.com
2021-05-25 10:07:00 +02:00
H. Peter Anvin (Intel) b337b4965e x86/entry: Treat out of range and gap system calls the same
The current 64-bit system call entry code treats out-of-range system
calls differently than system calls that map to a hole in the system
call table.

This is visible to the user if system calls are intercepted via ptrace or
seccomp and the return value (regs->ax) is modified: in the former case,
the return value is preserved, and in the latter case, sys_ni_syscall() is
called and the return value is forced to -ENOSYS.

The API spec in <asm-generic/syscalls.h> is very clear that only
(int)-1 is the non-system-call sentinel value, so make the system call
behavior consistent by calling sys_ni_syscall() for all invalid system
call numbers except for -1.

Although currently sys_ni_syscall() simply returns -ENOSYS, calling it
explicitly is friendly for tracing and future possible extensions, and
as this is an error path there is no reason to optimize it.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210518191303.4135296-6-hpa@zytor.com
2021-05-20 15:19:49 +02:00
H. Peter Anvin (Intel) 0595494891 x86/entry/64: Sign-extend system calls on entry to int
Right now, *some* code will treat e.g. 0x0000000100000001 as a system
call and some will not. Some of the code, notably in ptrace, will
treat 0x000000018000000 as a system call and some will not. Finally,
right now, e.g. 335 for x86-64 will force the exit code to be set to
-ENOSYS even if poked by ptrace, but 548 will not, because there is an
observable difference between an out of range system call and a system
call number that falls outside the range of the table.

This is visible to the user: for example, the syscall_numbering_64
test fails if run under strace, because as strace uses ptrace, it ends
up clobbering the upper half of the 64-bit system call number.

The architecture independent code all assumes that a system call is "int"
that the value -1 specifically and not just any negative value is used for
a non-system call. This is the case on x86 as well when arch-independent
code is involved. The arch-independent API is defined/documented (but not
*implemented*!) in <asm-generic/syscall.h>.

This is an ABI change, but is in fact a revert to the original x86-64
ABI. The original assembly entry code would zero-extend the system call
number;

Use sign extend to be explicit that this is treated as a signed number
(although in practice it makes no difference, of course) and to avoid
people getting the idea of "optimizing" it, as has happened on at least
two(!) separate occasions.

Do not store the extended value into regs->orig_ax, however: on x86-64, the
ABI is that the callee is responsible for extending parameters, so only
examining the lower 32 bits is fully consistent with any "int" argument to
any system call, e.g. regs->di for write(2). The full value of %rax on
entry to the kernel is thus still available.

[ tglx: Add a comment to the ASM code ]

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210518191303.4135296-5-hpa@zytor.com
2021-05-20 15:19:49 +02:00
Masahiro Yamada 3cba325b35 x86/syscalls: Switch to generic syscallhdr.sh
Many architectures duplicate similar shell scripts.

Converts x86 to use scripts/syscallhdr.sh.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210517073815.97426-7-masahiroy@kernel.org
2021-05-20 15:03:59 +02:00
Masahiro Yamada 49f731f197 x86/syscalls: Use __NR_syscalls instead of __NR_syscall_max
__NR_syscall_max is only used by x86 and UML. In contrast, __NR_syscalls is
widely used by all the architectures.

Convert __NR_syscall_max to __NR_syscalls and adjust the usage sites.

This prepares x86 to switch to the generic syscallhdr.sh script.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210517073815.97426-6-masahiroy@kernel.org
2021-05-20 15:03:59 +02:00
Masahiro Yamada 44fe4895f4 x86/syscalls: Stop filling syscall arrays with *_sys_ni_syscall
This is a follow-up cleanup after switching to the generic syscalltbl.sh.

The old x86 specific script skipped non-existing syscalls. So, the
generated syscalls_64.h, for example, had a big hole in the syscall numbers
335-423 range. That is why there exists [0 ... __NR_*_syscall_max] =
&__*_sys_ni_cyscall.

The new script, scripts/syscalltbl.sh automatically fills holes
with __SYSCALL(<nr>, sys_ni_syscall), hence such ugly code can
go away. The designated initializers, '[nr] =' are also unneeded.

Also, there is no need to give __NR_*_syscall_max+1 because the array
size is implied by the number of syscalls in the generated headers.
Hence, there is no need to include <asm/unistd.h>, either.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210517073815.97426-4-masahiroy@kernel.org
2021-05-20 15:03:59 +02:00
Masahiro Yamada 6218d0f6b8 x86/syscalls: Switch to generic syscalltbl.sh
Many architectures duplicate similar shell scripts.

Convert x86 and UML to use scripts/syscalltbl.sh. The generic script
generates seperate headers for x86/64 and x86/x32 syscalls, while the x86
specific script coalesced them into one. Adjust the code accordingly.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210517073815.97426-3-masahiroy@kernel.org
2021-05-20 15:03:58 +02:00
Masahiro Yamada 2e958a8a51 x86/entry/x32: Rename __x32_compat_sys_* to __x64_compat_sys_*
The SYSCALL macros are mapped to symbols as follows:

  __SYSCALL_COMMON(nr, sym)  -->  __x64_<sym>
  __SYSCALL_X32(nr, sym)     -->  __x32_<sym>

Originally, the syscalls in the x32 special range (512-547) were all
compat.

This assumption is now broken after the following commits:

  55db9c0e85 ("net: remove compat_sys_{get,set}sockopt")
  5f764d624a ("fs: remove the compat readv/writev syscalls")
  598b3cec83 ("fs: remove compat_sys_vmsplice")
  c3973b401e ("mm: remove compat_process_vm_{readv,writev}")

Those commits redefined __x32_sys_* to __x64_sys_* because there is no stub
like __x32_sys_*.

Defining them as follows is more sensible and cleaner.

  __SYSCALL_COMMON(nr, sym)  -->  __x64_<sym>
  __SYSCALL_X32(nr, sym)     -->  __x64_<sym>

This works because both x86_64 and x32 use the same ABI (RDI, RSI, RDX,
R10, R8, R9)

The ugly #define __x32_sys_* will go away.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210517073815.97426-2-masahiroy@kernel.org
2021-05-20 15:03:58 +02:00
Jan Kara 5b9fedb31e quota: Disable quotactl_path syscall
In commit fa8b90070a ("quota: wire up quotactl_path") we have wired up
new quotactl_path syscall. However some people in LWN discussion have
objected that the path based syscall is missing dirfd and flags argument
which is mostly standard for contemporary path based syscalls. Indeed
they have a point and after a discussion with Christian Brauner and
Sascha Hauer I've decided to disable the syscall for now and update its
API. Since there is no userspace currently using that syscall and it
hasn't been released in any major release, we should be fine.

CC: Christian Brauner <christian.brauner@ubuntu.com>
CC: Sascha Hauer <s.hauer@pengutronix.de>
Link: https://lore.kernel.org/lkml/20210512153621.n5u43jsytbik4yze@wittgenstein
Signed-off-by: Jan Kara <jack@suse.cz>
2021-05-17 14:39:56 +02:00
H. Peter Anvin (Intel) 29e9758966 x86/entry: Split PUSH_AND_CLEAR_REGS into two submacros
PUSH_AND_CLEAR_REGS, as the name implies, performs two functions:
pushing registers and clearing registers. They don't necessarily have
to be performed in immediate sequence, although all current users
do. Split it into two macros for the case where that isn't desired;
the FRED enabling patchset will eventually make use of this.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210510185316.3307264-6-hpa@zytor.com
2021-05-12 10:49:15 +02:00
H. Peter Anvin (Intel) 3e5e7f7736 x86/entry: Reverse arguments to do_syscall_64()
Reverse the order of arguments to do_syscall_64() so that the first
argument is the pt_regs pointer. This is not only consistent with
*all* other entry points from assembly, but it actually makes the
compiled code slightly better.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210510185316.3307264-3-hpa@zytor.com
2021-05-12 10:49:14 +02:00
H. Peter Anvin (Intel) 6627eb25e4 x86/entry: Unify definitions from <asm/calling.h> and <asm/ptrace-abi.h>
The register offsets in <asm/ptrace-abi.h> are duplicated in
entry/calling.h, but are formatted differently and therefore not
compatible. Use the version from <asm/ptrace-abi.h> consistently.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210510185316.3307264-2-hpa@zytor.com
2021-05-12 10:49:13 +02:00
Linus Torvalds 17ae69aba8 Add Landlock, a new LSM from Mickaël Salaün <mic@linux.microsoft.com>
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEgycj0O+d1G2aycA8rZhLv9lQBTwFAmCInP4ACgkQrZhLv9lQ
 BTza0g//dTeb9woC9H7qlEhK4l9yk62lTss60Q8X7m7ZSNfdL4tiEbi64SgK+iOW
 OOegbrOEb8Kzh4KJJYmVlVZ5YUWyH4szgmee1wnylBdsWiWaPLPF3Cflz77apy6T
 TiiBsJd7rRE29FKheaMt34B41BMh8QHESN+DzjzJWsFoi/uNxjgSs2W16XuSupKu
 bpRmB1pYNXMlrkzz7taL05jndZYE5arVriqlxgAsuLOFOp/ER7zecrjImdCM/4kL
 W6ej0R1fz2Geh6CsLBJVE+bKWSQ82q5a4xZEkSYuQHXgZV5eywE5UKu8ssQcRgQA
 VmGUY5k73rfY9Ofupf2gCaf/JSJNXKO/8Xjg0zAdklKtmgFjtna5Tyg9I90j7zn+
 5swSpKuRpilN8MQH+6GWAnfqQlNoviTOpFeq3LwBtNVVOh08cOg6lko/bmebBC+R
 TeQPACKS0Q0gCDPm9RYoU1pMUuYgfOwVfVRZK1prgi2Co7ZBUMOvYbNoKYoPIydr
 ENBYljlU1OYwbzgR2nE+24fvhU8xdNOVG1xXYPAEHShu+p7dLIWRLhl8UCtRQpSR
 1ofeVaJjgjrp29O+1OIQjB2kwCaRdfv/Gq1mztE/VlMU/r++E62OEzcH0aS+mnrg
 yzfyUdI8IFv1q6FGT9yNSifWUWxQPmOKuC8kXsKYfqfJsFwKmHM=
 =uCN4
 -----END PGP SIGNATURE-----

Merge tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull Landlock LSM from James Morris:
 "Add Landlock, a new LSM from Mickaël Salaün.

  Briefly, Landlock provides for unprivileged application sandboxing.

  From Mickaël's cover letter:
    "The goal of Landlock is to enable to restrict ambient rights (e.g.
     global filesystem access) for a set of processes. Because Landlock
     is a stackable LSM [1], it makes possible to create safe security
     sandboxes as new security layers in addition to the existing
     system-wide access-controls. This kind of sandbox is expected to
     help mitigate the security impact of bugs or unexpected/malicious
     behaviors in user-space applications. Landlock empowers any
     process, including unprivileged ones, to securely restrict
     themselves.

     Landlock is inspired by seccomp-bpf but instead of filtering
     syscalls and their raw arguments, a Landlock rule can restrict the
     use of kernel objects like file hierarchies, according to the
     kernel semantic. Landlock also takes inspiration from other OS
     sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD
     Pledge/Unveil.

     In this current form, Landlock misses some access-control features.
     This enables to minimize this patch series and ease review. This
     series still addresses multiple use cases, especially with the
     combined use of seccomp-bpf: applications with built-in sandboxing,
     init systems, security sandbox tools and security-oriented APIs [2]"

  The cover letter and v34 posting is here:

      https://lore.kernel.org/linux-security-module/20210422154123.13086-1-mic@digikod.net/

  See also:

      https://landlock.io/

  This code has had extensive design discussion and review over several
  years"

Link: https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-ca.com/ [1]
Link: https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.net/ [2]

* tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  landlock: Enable user space to infer supported features
  landlock: Add user and kernel documentation
  samples/landlock: Add a sandbox manager example
  selftests/landlock: Add user space tests
  landlock: Add syscall implementations
  arch: Wire up Landlock syscalls
  fs,security: Add sb_delete hook
  landlock: Support filesystem access-control
  LSM: Infrastructure management of the superblock
  landlock: Add ptrace restrictions
  landlock: Set up the security framework and manage credentials
  landlock: Add ruleset and domain management
  landlock: Add object management
2021-05-01 18:50:44 -07:00
Linus Torvalds 767fcbc80f \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmCJU1UACgkQnJ2qBz9k
 QNk62AgAgp05OIXU/AgObb7DvSyI3ycwCV8PeWBpwD8yoDAh5x0tmT7vnJu974p6
 yHdnF7rr69ZzvbNCHLJ5kRykRlUao9W7cO5fdOW1uTpL7Ic60QuJMks/NfgVTHp1
 2zIQmBDerfn1/LTK8r2pPGcvtcjRcr7Ep4beN0Duw57lfVMJhjsNRPnBbXGBcp0r
 QzKk4/8V3DCZvOw+XNC3nto7avjvf+nU9sJmuh83546eqh0atjWivvO5aAlDOe6W
 rhBiLlmP0in5u2n1fYqzI1OQvtgtleyEZT2G0CrbAZn0xjmV/if9wl+3K6TOwDvR
 778xDEX7sZCaO/xkB+WK3hrd15ftKg==
 =0kYE
 -----END PGP SIGNATURE-----

Merge tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull quota, ext2, reiserfs updates from Jan Kara:

 - support for path (instead of device) based quotactl syscall
   (quotactl_path(2))

 - ext2 conversion to kmap_local()

 - other minor cleanups & fixes

* tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fs/reiserfs/journal.c: delete useless variables
  fs/ext2: Replace kmap() with kmap_local_page()
  ext2: Match up ext2_put_page() with ext2_dotdot() and ext2_find_entry()
  fs/ext2/: fix misspellings using codespell tool
  quota: report warning limits for realtime space quotas
  quota: wire up quotactl_path
  quota: Add mountpath based quota support
2021-04-29 10:51:29 -07:00
Linus Torvalds c6536676c7 - turn the stack canary into a normal __percpu variable on 32-bit which
gets rid of the LAZY_GS stuff and a lot of code.
 
 - Add an insn_decode() API which all users of the instruction decoder
 should preferrably use. Its goal is to keep the details of the
 instruction decoder away from its users and simplify and streamline how
 one decodes insns in the kernel. Convert its users to it.
 
 - kprobes improvements and fixes
 
 - Set the maximum DIE per package variable on Hygon
 
 - Rip out the dynamic NOP selection and simplify all the machinery around
 selecting NOPs. Use the simplified NOPs in objtool now too.
 
 - Add Xeon Sapphire Rapids to list of CPUs that support PPIN
 
 - Simplify the retpolines by folding the entire thing into an
 alternative now that objtool can handle alternatives with stack
 ops. Then, have objtool rewrite the call to the retpoline with the
 alternative which then will get patched at boot time.
 
 - Document Intel uarch per models in intel-family.h
 
 - Make Sub-NUMA Clustering topology the default and Cluster-on-Die the
 exception on Intel.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCHyJQACgkQEsHwGGHe
 VUpjiRAAwPZdwwp08ypZuMHR4EhLNru6gYhbAoALGgtYnQjLtn5onQhIeieK+R4L
 cmZpxHT9OFp5dXHk4kwygaQBsD4pPOiIpm60kye1dN3cSbOORRdkwEoQMpKMZ+5Y
 kvVsmn7lrwRbp600KdE4G6L5+N6gEgr0r6fMFWWGK3mgVAyCzPexVHgydcp131ch
 iYMo6/pPDcNkcV/hboVKgx7GISdQ7L356L1MAIW/Sxtw6uD/X4qGYW+kV2OQg9+t
 nQDaAo7a8Jqlop5W5TQUdMLKQZ1xK8SFOSX/nTS15DZIOBQOGgXR7Xjywn1chBH/
 PHLwM5s4XF6NT5VlIA8tXNZjWIZTiBdldr1kJAmdDYacrtZVs2LWSOC0ilXsd08Z
 EWtvcpHfHEqcuYJlcdALuXY8xDWqf6Q2F7BeadEBAxwnnBg+pAEoLXI/1UwWcmsj
 wpaZTCorhJpYo2pxXckVdHz2z0LldDCNOXOjjaWU8tyaOBKEK6MgAaYU7e0yyENv
 mVc9n5+WuvXuivC6EdZ94Pcr/KQsd09ezpJYcVfMDGv58YZrb6XIEELAJIBTu2/B
 Ua8QApgRgetx+1FKb8X6eGjPl0p40qjD381TADb4rgETPb1AgKaQflmrSTIik+7p
 O+Eo/4x/GdIi9jFk3K+j4mIznRbUX0cheTJgXoiI4zXML9Jv94w=
 =bm4S
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 updates from Borislav Petkov:

 - Turn the stack canary into a normal __percpu variable on 32-bit which
   gets rid of the LAZY_GS stuff and a lot of code.

 - Add an insn_decode() API which all users of the instruction decoder
   should preferrably use. Its goal is to keep the details of the
   instruction decoder away from its users and simplify and streamline
   how one decodes insns in the kernel. Convert its users to it.

 - kprobes improvements and fixes

 - Set the maximum DIE per package variable on Hygon

 - Rip out the dynamic NOP selection and simplify all the machinery
   around selecting NOPs. Use the simplified NOPs in objtool now too.

 - Add Xeon Sapphire Rapids to list of CPUs that support PPIN

 - Simplify the retpolines by folding the entire thing into an
   alternative now that objtool can handle alternatives with stack ops.
   Then, have objtool rewrite the call to the retpoline with the
   alternative which then will get patched at boot time.

 - Document Intel uarch per models in intel-family.h

 - Make Sub-NUMA Clustering topology the default and Cluster-on-Die the
   exception on Intel.

* tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  x86, sched: Treat Intel SNC topology as default, COD as exception
  x86/cpu: Comment Skylake server stepping too
  x86/cpu: Resort and comment Intel models
  objtool/x86: Rewrite retpoline thunk calls
  objtool: Skip magical retpoline .altinstr_replacement
  objtool: Cache instruction relocs
  objtool: Keep track of retpoline call sites
  objtool: Add elf_create_undef_symbol()
  objtool: Extract elf_symbol_add()
  objtool: Extract elf_strtab_concat()
  objtool: Create reloc sections implicitly
  objtool: Add elf_create_reloc() helper
  objtool: Rework the elf_rebuild_reloc_section() logic
  objtool: Fix static_call list generation
  objtool: Handle per arch retpoline naming
  objtool: Correctly handle retpoline thunk calls
  x86/retpoline: Simplify retpolines
  x86/alternatives: Optimize optimize_nops()
  x86: Add insn_decode_kernel()
  x86/kprobes: Move 'inline' to the beginning of the kprobe_is_ss() declaration
  ...
2021-04-27 17:45:09 -07:00
Linus Torvalds 69f737ed3a A single fix for the x86 VDSO build infrastructure to address a compiler
warning on 32bit hosts due to a fprintf() modifier/argument mismatch.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmCGrz4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWggD/4q8f3L5UkM5wuyNb9BOoBBZI8tBFsm
 Pil8K3WUmc9VF8XrHMjHrFOjJPFrBQUqW6iE5UL2f2z7jb5L4t0d66KeKjzfmfuk
 N9thWuJKvUR4pOpg4y0lgFuwK/P94bRypIpvxTwtuEnaosy9JhWt+WKuWVRSqRNP
 gFABwIN9Aw904fQjXwPPsZa1/Yt9mtHrt9i4+fPkc4APRBjoANaGhPz8H3HcgOzM
 hJIV/T1hiCEni4kAr9mAOfBCMARo1aApkhWaKtV10vaieXT+db7JNYx6C6DGob/U
 bWJABQoBhX7IY+SvW1SAyoU5Z104X+CmZXG2GIPqISuL+6Fk3fZQ/6EmUBt+efoJ
 lCKv7OsEW27qrN9B5yoAxTnzSPJq5utuEXvcRbkUFMkv+pT8/zucFu1xHcyd2qHG
 fBr/urbrxSCjya4GlIhYIKwYo/LX5c61iZR/Vv/K/swcgV58G8uQAINmcUDTLi57
 eNeUd0sp4SVet6HBTlAvKADCJOOAhmKMNWtuOTepQcXjmK6HXog75DDm82Cxzgdx
 fILvVZ5acw6+rK0OYa9Wgwd2llkZjQ7JiyOZH44UJ1eTai3tF7tCem2l3mIn2otI
 QZtuAbwJ6tXVljU+0LPHefRpsiCf37CGUY+JIBkdp1cA9tYQVratZpSZ1QV1LjP1
 b53RhxXb7PCG2Q==
 =ch7x
 -----END PGP SIGNATURE-----

Merge tag 'x86-vdso-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 vdso update from Thomas Gleixner:
 "A single fix for the x86 VDSO build infrastructure to address a
  compiler warning on 32bit hosts due to a fprintf() modifier/argument
  mismatch."

* tag 'x86-vdso-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Use proper modifier for len's format specifier in extract()
2021-04-26 10:17:34 -07:00
Linus Torvalds eea2647e74 Entry code update:
Provide support for randomized stack offsets per syscall to make
  stack-based attacks harder which rely on the deterministic stack layout.
 
  The feature is based on the original idea of PaX's RANDSTACK feature, but
  uses a significantly different implementation.
 
  The offset does not affect the pt_regs location on the task stack as this
  was agreed on to be of dubious value. The offset is applied before the
  actual syscall is invoked.
 
  The offset is stored per cpu and the randomization happens at the end of
  the syscall which is less predictable than on syscall entry.
 
  The mechanism to apply the offset is via alloca(), i.e. abusing the
  dispised VLAs. This comes with the drawback that stack-clash-protection
  has to be disabled for the affected compilation units and there is also
  a negative interaction with stack-protector.
 
  Those downsides are traded with the advantage that this approach does not
  require any intrusive changes to the low level assembly entry code, does
  not affect the unwinder and the correct stack alignment is handled
  automatically by the compiler.
 
  The feature is guarded with a static branch which avoids the overhead when
  disabled.
 
  Currently this is supported for X86 and ARM64.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmCGjz8THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWsvD/4tGnPAurd6lbzxWzRjW7jOOVyzkODM
 UXtIxxICaj7o6MNcloaGe1QtJ8+QOCw3yPQfLG/SoWHse5+oUKQRL9dmWVeJyRSt
 JZ1pirkKqWrB+OmPbJKUiO3/TsZ2Z/vO41JVgVTL5/HWhOECSDzZsJkuvF/H+qYD
 ReDzd7FUNd76pwVOsXq/cxXclRa81/wMNZRVwmyAwFYE2XoPtQyTERTLrfj6aQKF
 P0txr9fEjYlPPwYOk1kjBAoJfDltNm48BBL7CGZtRlsqpNpdsJ1MkeGffhodb6F0
 pJYQMlQJHXABZb5GF+v93+iASDpRFn0EvPmLkCxQUfZYLOkRsnuEF2S/fsYX/WPo
 uin/wQKwLVdeQq9d9BwlZUKEgsQuV7Q0GVN+JnEQerwD6cWTxv4a1RIUH+K/4Wo5
 nTeJVRKcs6m7UkGQRm8JbqnUP0vCV+PSiWWB8J9CmjYeCPbkGjt6mBIsmPaDZ9VL
 4i+UX5DJayoREF/rspOBcJftUmExize49p9860UI9N6fd7DsDt7Dq9Ai+ADtZa4C
 9BPbF4NWzJq8IWLqBi+PpKBAT3JMX9qQi7s9sbrRxpxtew9Keu5qggKZJYumX71V
 qgUMk+xB86HZOrtF6F3oY0zxYv3haPvDydsDgqojtqNGk4PdAdgDYJQwMlb8QSly
 SwIWPHIfvP4R9w==
 =GMlJ
 -----END PGP SIGNATURE-----

Merge tag 'x86-entry-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull entry code update from Thomas Gleixner:
 "Provide support for randomized stack offsets per syscall to make
  stack-based attacks harder which rely on the deterministic stack
  layout.

  The feature is based on the original idea of PaX's RANDSTACK feature,
  but uses a significantly different implementation.

  The offset does not affect the pt_regs location on the task stack as
  this was agreed on to be of dubious value. The offset is applied
  before the actual syscall is invoked.

  The offset is stored per cpu and the randomization happens at the end
  of the syscall which is less predictable than on syscall entry.

  The mechanism to apply the offset is via alloca(), i.e. abusing the
  dispised VLAs. This comes with the drawback that
  stack-clash-protection has to be disabled for the affected compilation
  units and there is also a negative interaction with stack-protector.

  Those downsides are traded with the advantage that this approach does
  not require any intrusive changes to the low level assembly entry
  code, does not affect the unwinder and the correct stack alignment is
  handled automatically by the compiler.

  The feature is guarded with a static branch which avoids the overhead
  when disabled.

  Currently this is supported for X86 and ARM64"

* tag 'x86-entry-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arm64: entry: Enable random_kstack_offset support
  lkdtm: Add REPORT_STACK for checking stack offsets
  x86/entry: Enable random_kstack_offset support
  stack: Optionally randomize kernel stack offset each syscall
  init_on_alloc: Optimize static branches
  jump_label: Provide CONFIG-driven build state defaults
2021-04-26 10:02:09 -07:00
Linus Torvalds ea5bc7b977 Trivial cleanups and fixes all over the place.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCGmYIACgkQEsHwGGHe
 VUr45w/8CSXr7MXaFBj4To0hTWJXSZyF6YGqlZOSJXFcFh4cWTNwfVOoFaV47aDo
 +HsCNTkGENcKhLrDUWDRiG/Uo46jxtOtl1vhq7U4pGemSYH871XWOKfb5k5XNMwn
 /uhaHMI4aEfd6bUFnF518NeyRIsD0BdqFj4tB7RbAiyFwdETDX9Tkj/uBKnQ4zon
 4tEDoXgThuK5YKK9zVQg5pa7aFp2zg1CAdX/WzBkS8BHVBPXSV0CF97AJYQOM/V+
 lUHv+BN3wp97GYHPQMPsbkNr8IuFoe2mIvikwjxg8iOFpzEU1G1u09XV9R+PXByX
 LclFTRqK/2uU5hJlcsBiKfUuidyErYMRYImbMAOREt2w0ogWVu2zQ7HkjVve25h1
 sQPwPudbAt6STbqRxvpmB3yoV4TCYwnF91FcWgEy+rcEK2BDsHCnScA45TsK5I1C
 kGR1K17pHXprgMZFPveH+LgxewB6smDv+HllxQdSG67LhMJXcs2Epz0TsN8VsXw8
 dlD3lGReK+5qy9FTgO7mY0xhiXGz1IbEdAPU4eRBgih13puu03+jqgMaMabvBWKD
 wax+BWJUrPtetwD5fBPhlS/XdJDnd8Mkv2xsf//+wT0s4p+g++l1APYxeB8QEehm
 Pd7Mvxm4GvQkfE13QEVIPYQRIXCMH/e9qixtY5SHUZDBVkUyFM0=
 =bO1i
 -----END PGP SIGNATURE-----

Merge tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 cleanups from Borislav Petkov:
 "Trivial cleanups and fixes all over the place"

* tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Remove me from IDE/ATAPI section
  x86/pat: Do not compile stubbed functions when X86_PAT is off
  x86/asm: Ensure asm/proto.h can be included stand-alone
  x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files
  x86/msr: Make locally used functions static
  x86/cacheinfo: Remove unneeded dead-store initialization
  x86/process/64: Move cpu_current_top_of_stack out of TSS
  tools/turbostat: Unmark non-kernel-doc comment
  x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL()
  x86/fpu/math-emu: Fix function cast warning
  x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes
  x86: Fix various typos in comments, take #2
  x86: Remove unusual Unicode characters from comments
  x86/kaslr: Return boolean values from a function returning bool
  x86: Fix various typos in comments
  x86/setup: Remove unused RESERVE_BRK_ARRAY()
  stacktrace: Move documentation for arch_stack_walk_reliable() to header
  x86: Remove duplicate TSC DEADLINE MSR definitions
2021-04-26 09:25:47 -07:00
Linus Torvalds 2c5ce2dba2 First big cleanup to the paravirt infra to use alternatives and thus
eliminate custom code patching. For that, the alternatives infra is
 extended to accomodate paravirt's needs and, as a result, a lot of
 paravirt patching code goes away, leading to a sizeable cleanup and
 simplification. Work by Juergen Gross.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCGiXQACgkQEsHwGGHe
 VUocbw/+OkFzphK6zlNA8O3RJ24u2csXUWWUtpGlZ2220Nn/Bgyso2+fyg/NEeQg
 EmEttaY3JG/riCDfHk5Xm2saeVtsbPXN4f0sJm/Io/djF7Cm03WS0eS0aA2Rnuca
 MhmvvkrzYqZXAYVaxKkIH6sNlPgyXX7vDNPbTd/0ZCOb3ZKIyXwL+SaLatMCtE5o
 ou7e8Bj8xPSwcaCyK6sqjrT6jdpPjoTrxxrwENW8AlRu5lCU1pIY03GGhARPVoEm
 fWkZsIPn7DxhpyIqzJtEMX8EK1xN96E+NGkNuSAtJGP9HRb+3j5f4s3IUAfXiLXq
 r7NecFw8zHhPKl9J0pPCiW7JvMrCMU5xGwyeUmmhKyK2BxwvvAC173ohgMlCfB2Q
 FPIsQWemat17tSue8LIA8SmlSDQz6R+tTdUFT+vqmNV34PxOIEeSdV7HG8rs87Ec
 dYB9ENUgXqI+h2t7atE68CpTLpWXzNDcq2olEsaEUXenky2hvsi+VxNkWpmlKQ3I
 NOMU/AyH8oUzn5O0o3oxdPhDLmK5ItEFxjYjwrgLfKFQ+Y8vIMMq3LrKQGwOj+ZU
 n9qC7JjOwDKZGjd3YqNNRhnXp+w0IJvUHbyr3vIAcp8ohQwEKgpUvpZzf/BKUvHh
 nJgJSJ53GFJBbVOJMfgVq+JcFr+WO8MDKHaw6zWeCkivFZdSs4g=
 =h+km
 -----END PGP SIGNATURE-----

Merge tag 'x86_alternatives_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 alternatives/paravirt updates from Borislav Petkov:
 "First big cleanup to the paravirt infra to use alternatives and thus
  eliminate custom code patching.

  For that, the alternatives infrastructure is extended to accomodate
  paravirt's needs and, as a result, a lot of paravirt patching code
  goes away, leading to a sizeable cleanup and simplification.

  Work by Juergen Gross"

* tag 'x86_alternatives_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/paravirt: Have only one paravirt patch function
  x86/paravirt: Switch functions with custom code to ALTERNATIVE
  x86/paravirt: Add new PVOP_ALT* macros to support pvops in ALTERNATIVEs
  x86/paravirt: Switch iret pvops to ALTERNATIVE
  x86/paravirt: Simplify paravirt macros
  x86/paravirt: Remove no longer needed 32-bit pvops cruft
  x86/paravirt: Add new features for paravirt patching
  x86/alternative: Use ALTERNATIVE_TERNARY() in _static_cpu_has()
  x86/alternative: Support ALTERNATIVE_TERNARY
  x86/alternative: Support not-feature
  x86/paravirt: Switch time pvops functions to use static_call()
  static_call: Add function to query current function
  static_call: Move struct static_call_key definition to static_call_types.h
  x86/alternative: Merge include files
  x86/alternative: Drop unused feature parameter from ALTINSTR_REPLACEMENT()
2021-04-26 09:01:29 -07:00
Mickaël Salaün a49f4f81cb arch: Wire up Landlock syscalls
Wire up the following system calls for all architectures:
* landlock_create_ruleset(2)
* landlock_add_rule(2)
* landlock_restrict_self(2)

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210422154123.13086-10-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
2021-04-22 12:22:11 -07:00
Kees Cook fe950f6020 x86/entry: Enable random_kstack_offset support
Allow for a randomized stack offset on a per-syscall basis, with roughly
5-6 bits of entropy, depending on compiler and word size. Since the
method of offsetting uses macros, this cannot live in the common entry
code (the stack offset needs to be retained for the life of the syscall,
which means it needs to happen at the actual entry point).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210401232347.2791257-5-keescook@chromium.org
2021-04-08 14:05:20 +02:00
Borislav Petkov f2ac256b9a Merge 'x86/alternatives'
Pick up dependent changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
2021-03-31 18:04:19 +02:00
Ingo Molnar 163b099146 x86: Fix various typos in comments, take #2
Fix another ~42 single-word typos in arch/x86/ code comments,
missed a few in the first pass, in particular in .S files.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
2021-03-21 23:50:28 +01:00
Sascha Hauer fa8b90070a quota: wire up quotactl_path
Wire up the quotactl_path syscall added in the previous patch.

Link: https://lore.kernel.org/r/20210304123541.30749-3-s.hauer@pengutronix.de
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-03-17 15:51:17 +01:00
Borislav Petkov aa7680f6fe Linux 5.12-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmBOgu4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGUd0H/3Ey8aWjVAig9Pe+
 VQVZKwG+LXWH6UmUx5qyaTxophhmGnWLvkigJMn63qIg4eQtfp2gNFHK+T4OJNIP
 ybnkjFZ337x4J9zD6m8mt4Wmelq9iW2wNOS+3YZAyYiGlXfMGM7SlYRCQRQznTED
 2O/JCMsOoP+Z8tr5ah/bzs0dANsXmTZ3QqRP2uzb6irKTgFR3/weOhj+Ht1oJ4Aq
 V+bgdcwhtk20hJhlvVeqws+o74LR789tTDCknlz/YNMv9e6VPfyIQ5vJAcFmZATE
 Ezj9yzkZ4IU+Ux6ikAyaFyBU8d1a4Wqye3eHCZBsEo6tcSAhbTZ90eoU86vh6ajS
 LZjwkNw=
 =6y1u
 -----END PGP SIGNATURE-----

Merge tag 'v5.12-rc3' into x86/core

Pick up dependent SEV-ES urgent changes to base new work ontop.

Signed-off-by: Borislav Petkov <bp@suse.de>
2021-03-15 10:49:00 +01:00
Juergen Gross fafe5e7422 x86/paravirt: Switch functions with custom code to ALTERNATIVE
Instead of using paravirt patching for custom code sequences use
ALTERNATIVE for the functions with custom code replacements.

Instead of patching an ud2 instruction for unpopulated vector entries
into the caller site, use a simple function just calling BUG() as a
replacement.

Simplify the register defines for assembler paravirt calling, as there
isn't much usage left.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-14-jgross@suse.com
2021-03-11 20:07:01 +01:00
Juergen Gross 33634e42e3 x86/paravirt: Remove no longer needed 32-bit pvops cruft
PVOP_VCALL4() is only used for Xen PV, while PVOP_CALL4() isn't used
at all. Keep PVOP_CALL4() for 64 bits due to symmetry reasons.

This allows to remove the 32-bit definitions of those macros leading
to a substantial simplification of the paravirt macros, as those were
the only ones needing non-empty "pre" and "post" parameters.

PVOP_CALLEE2() and PVOP_VCALLEE2() are used nowhere, so remove them.

Another no longer needed case is special handling of return types
larger than unsigned long. Replace that with a BUILD_BUG_ON().

DISABLE_INTERRUPTS() is used in 32-bit code only, so it can just be
replaced by cli.

INTERRUPT_RETURN in 32-bit code can be replaced by iret.

ENABLE_INTERRUPTS is used nowhere, so it can be removed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-10-jgross@suse.com
2021-03-11 19:51:55 +01:00
Juergen Gross 5e21a3ecad x86/alternative: Merge include files
Merge arch/x86/include/asm/alternative-asm.h into
arch/x86/include/asm/alternative.h in order to make it easier to use
common definitions later.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210311142319.4723-2-jgross@suse.com
2021-03-11 15:58:02 +01:00
Joerg Roedel 78a81d88f6 x86/sev-es: Introduce ip_within_syscall_gap() helper
Introduce a helper to check whether an exception came from the syscall
gap and use it in the SEV-ES code. Extend the check to also cover the
compatibility SYSCALL entry path.

Fixes: 315562c9af ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # 5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-2-joro@8bytes.org
2021-03-08 14:22:17 +01:00
Andy Lutomirski d0962f2b24 x86/entry/32: Remove leftover macros after stackprotector cleanups
Now that nonlazy-GS mode is gone, remove the macros from entry_32.S
that obfuscated^Wabstracted GS handling.  The assembled output is
identical before and after this patch.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/b1543116f0f0e68f1763d90d5f7fcec27885dff5.1613243844.git.luto@kernel.org
2021-03-08 13:27:31 +01:00
Andy Lutomirski 3fb0fdb3bb x86/stackprotector/32: Make the canary into a regular percpu variable
On 32-bit kernels, the stackprotector canary is quite nasty -- it is
stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
percpu storage.  It's even nastier because it means that whether %gs
contains userspace state or kernel state while running kernel code
depends on whether stackprotector is enabled (this is
CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
that segment selectors work.  Supporting both variants is a
maintenance and testing mess.

Merely rearranging so that percpu and the stack canary
share the same segment would be messy as the 32-bit percpu address
layout isn't currently compatible with putting a variable at a fixed
offset.

Fortunately, GCC 8.1 added options that allow the stack canary to be
accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
percpu variable.  This lets us get rid of all of the code to manage the
stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.

(That name is special.  We could use any symbol we want for the
 %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
 name other than __stack_chk_guard.)

Forcibly disable stackprotector on older compilers that don't support
the new options and turn the stack canary into a percpu variable. The
"lazy GS" approach is now used for all 32-bit configurations.

Also makes load_gs_index() work on 32-bit kernels. On 64-bit kernels,
it loads the GS selector and updates the user GSBASE accordingly. (This
is unchanged.) On 32-bit kernels, it loads the GS selector and updates
GSBASE, which is now always the user base. This means that the overall
effect is the same on 32-bit and 64-bit, which avoids some ifdeffery.

 [ bp: Massage commit message. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/c0ff7dba14041c7e5d1cae5d4df052f03759bef3.1613243844.git.luto@kernel.org
2021-03-08 13:19:05 +01:00
Andy Lutomirski 5d5675df79 x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
On a 32-bit fast syscall that fails to read its arguments from user
memory, the kernel currently does syscall exit work but not
syscall entry work.  This confuses audit and ptrace.  For example:

    $ ./tools/testing/selftests/x86/syscall_arg_fault_32
    ...
    strace: pid 264258: entering, ptrace_syscall_info.op == 2
    ...

This is a minimal fix intended for ease of backporting.  A more
complete cleanup is coming.

Fixes: 0b085e68f4 ("x86/entry: Consolidate 32/64 bit syscall entry")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/8c82296ddf803b91f8d1e5eac89e5803ba54ab0e.1614884673.git.luto@kernel.org
2021-03-06 13:10:06 +01:00
Jiri Slaby 70c9d95922 x86/vdso: Use proper modifier for len's format specifier in extract()
Commit

  8382c668ce ("x86/vdso: Add support for exception fixup in vDSO functions")

prints length "len" which is size_t.

Compilers now complain when building on a 32-bit host:

  HOSTCC  arch/x86/entry/vdso/vdso2c
  ...
  In file included from arch/x86/entry/vdso/vdso2c.c:162:
  arch/x86/entry/vdso/vdso2c.h: In function 'extract64':
  arch/x86/entry/vdso/vdso2c.h:38:52: warning: format '%lu' expects argument of \
	type 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'}

So use proper modifier (%zu) for size_t.

 [ bp: Massage commit message. ]

Fixes: 8382c668ce ("x86/vdso: Add support for exception fixup in vDSO functions")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20210303064357.17056-1-jslaby@suse.cz
2021-03-06 11:34:07 +01:00
Linus Torvalds 6fbd6cf85a Kbuild updates for v5.12
- Fix false-positive build warnings for ARCH=ia64 builds
 
  - Optimize dictionary size for module compression with xz
 
  - Check the compiler and linker versions in Kconfig
 
  - Fix misuse of extra-y
 
  - Support DWARF v5 debug info
 
  - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x
    exceeded the limit
 
  - Add generic syscall{tbl,hdr}.sh for cleanups across arches
 
  - Minor cleanups of genksyms
 
  - Minor cleanups of Kconfig
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmA3zhgVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsG0C4P/A5hUNFdkYI+EffAWZiHn69t0S8j
 M1GQkZildKu/yOfm6hp3mNwgHmYgw0aAuch1htkJuv+5rXRtoK77yw0xKbUqNHyO
 VqkJWQPVUXJbWIDiu332NaETHbFTWCnPZKGmzcbVOBHbYsXUJPp17gROQ9ke0fQN
 Ae6OV5WINhoS8UnjESWb3qOO87MdQTZ+9mP+NMnVh4kV1SUeMAXLFwFll66KZTkj
 GXB330N3p9L0wQVljhXpQ/YPOd76wJNPhJWJ9+hKLFbWsedovzlHb+duprh1z1xe
 7LLaq9dEbXxe1Uz0qmK76lupXxilYMyUupTW9HIYtIsY8br8DIoBOG0bn46LVnuL
 /m+UQNfUFCYYePT7iZQNNc1DISQJrxme3bjq0PJzZTDukNnHJVahnj9x4RoNaF8j
 Dc+JME0r2i8Ccp28vgmaRgzvSsb8Xtw5icwRdwzIpyt1ubs/+tkd/GSaGzQo30Q8
 m8y1WOjovHNX7OGnOaOWBGoQAX/2k/VHeAediMsPqWUoOxwsLHYxG/4KtgwbJ5vc
 gu/Fyk1GRDklZPpLdYFVvz8TGnqSDogJgF+7WolJ6YvPGAUIDAfd5Ky2sWayddlm
 wchc3sKDVyh3lov23h0WQVTvLO9xl+NZ6THxoAGdYeQ0DUu5OxwH8qje/UpWuo1a
 DchhNN+g5pa6n56Z
 =sLxb
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Fix false-positive build warnings for ARCH=ia64 builds

 - Optimize dictionary size for module compression with xz

 - Check the compiler and linker versions in Kconfig

 - Fix misuse of extra-y

 - Support DWARF v5 debug info

 - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x
   exceeded the limit

 - Add generic syscall{tbl,hdr}.sh for cleanups across arches

 - Minor cleanups of genksyms

 - Minor cleanups of Kconfig

* tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits)
  initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD
  kbuild: remove deprecated 'always' and 'hostprogs-y/m'
  kbuild: parse C= and M= before changing the working directory
  kbuild: reuse this-makefile to define abs_srctree
  kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig
  kconfig: omit --oldaskconfig option for 'make config'
  kconfig: fix 'invalid option' for help option
  kconfig: remove dead code in conf_askvalue()
  kconfig: clean up nested if-conditionals in check_conf()
  kconfig: Remove duplicate call to sym_get_string_value()
  Makefile: Remove # characters from compiler string
  Makefile: reuse CC_VERSION_TEXT
  kbuild: check the minimum linker version in Kconfig
  kbuild: remove ld-version macro
  scripts: add generic syscallhdr.sh
  scripts: add generic syscalltbl.sh
  arch: syscalls: remove $(srctree)/ prefix from syscall tables
  arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work
  gen_compile_commands: prune some directories
  kbuild: simplify access to the kernel's version
  ...
2021-02-25 10:17:31 -08:00
Linus Torvalds 29c395c77a Rework of the X86 irq stack handling:
The irq stack switching was moved out of the ASM entry code in course of
   the entry code consolidation. It ended up being suboptimal in various
   ways.
 
   - Make the stack switching inline so the stackpointer manipulation is not
     longer at an easy to find place.
 
   - Get rid of the unnecessary indirect call.
 
   - Avoid the double stack switching in interrupt return and reuse the
     interrupt stack for softirq handling.
 
   - A objtool fix for CONFIG_FRAME_POINTER=y builds where it got confused
     about the stack pointer manipulation.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmA21OcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaX0D/9S0ud6oqbsIvI8LwhvYub63a2cjKP9
 liHAJ7xwMYYVwzf0skwsPb/QE6+onCzdq0upJkgG/gEYm2KbiaMWZ4GgHdj0O7ER
 qXKJONDd36AGxSEdaVzLY5kPuD/mkomGk5QdaZaTmjruthkNzg4y/N2wXUBIMZR0
 FdpSpp5fGspSZCn/DXDx6FjClwpLI53VclvDs6DcZ2DIBA0K+F/cSLb1UQoDLE1U
 hxGeuNa+GhKeeZ5C+q5giho1+ukbwtjMW9WnKHAVNiStjm0uzdqq7ERGi/REvkcB
 LY62u5uOSW1zIBMmzUjDDQEqvypB0iFxFCpN8g9sieZjA0zkaUioRTQyR+YIQ8Cp
 l8LLir0dVQivR1bHghHDKQJUpdw/4zvDj4mMH10XHqbcOtIxJDOJHC5D00ridsAz
 OK0RlbAJBl9FTdLNfdVReBCoehYAO8oefeyMAG12nZeSh5XVUWl238rvzmzIYNhG
 cEtkSx2wIUNEA+uSuI+xvfmwpxL7voTGvqmiRDCAFxyO7Bl/GBu9OEBFA1eOvHB+
 +wTmPDMswRetQNh4QCRXzk1JzP1Wk5CobUL9iinCWFoTJmnsPPSOWlosN6ewaNXt
 kYFpRLy5xt9EP7dlfgBSjiRlthDhTdMrFjD5bsy1vdm1w7HKUo82lHa4O8Hq3PHS
 tinKICUqRsbjig==
 =Sqr1
 -----END PGP SIGNATURE-----

Merge tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 irq entry updates from Thomas Gleixner:
 "The irq stack switching was moved out of the ASM entry code in course
  of the entry code consolidation. It ended up being suboptimal in
  various ways.

  This reworks the X86 irq stack handling:

   - Make the stack switching inline so the stackpointer manipulation is
     not longer at an easy to find place.

   - Get rid of the unnecessary indirect call.

   - Avoid the double stack switching in interrupt return and reuse the
     interrupt stack for softirq handling.

   - A objtool fix for CONFIG_FRAME_POINTER=y builds where it got
     confused about the stack pointer manipulation"

* tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix stack-swizzle for FRAME_POINTER=y
  um: Enforce the usage of asm-generic/softirq_stack.h
  x86/softirq/64: Inline do_softirq_own_stack()
  softirq: Move do_softirq_own_stack() to generic asm header
  softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig
  x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
  x86/softirq: Remove indirection in do_softirq_own_stack()
  x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall
  x86/entry: Convert device interrupts to inline stack switching
  x86/entry: Convert system vectors to irq stack macro
  x86/irq: Provide macro for inlining irq stack switching
  x86/apic: Split out spurious handling code
  x86/irq/64: Adjust the per CPU irq stack pointer by 8
  x86/irq: Sanitize irq stack tracking
  x86/entry: Fix instrumentation annotation
2021-02-24 16:32:23 -08:00
Linus Torvalds 414eece95b clang-lto for v5.12-rc1 (part2)
- Generate __mcount_loc in objtool (Peter Zijlstra)
 - Support running objtool against vmlinux.o (Sami Tolvanen)
 - Clang LTO enablement for x86 (Sami Tolvanen)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmA1fn8ACgkQiXL039xt
 wCbswQ//Zmnq912Ubyn5uPe9SOS/kumGDoqtxGzlZwo/pSB3qFArhD6G07sJ49XD
 nu/05ZcOda760wubnhcuK91n2fY5i/eGLXMSjfgtdVcco4Q67nPQydc+LGdhuDco
 FlhL8TAIwqYN1f2nJK1IggZpZFxz5r/r1Pq8q1S0oQRqDenxDBQwNtBba4B1OIxw
 /FE/1Hp3xwRnuJEP2jREBeY1yQ+Y1n859pZcDgSOWlTArcp8EVUi5hIWJ9DwIe73
 mqnx6PcFWEYB0zLNZmZz2gpEac+ncGyme6ChayeuQfInbL5dhx97jFGt3S6/+NSY
 mF2zyaR/+JsGGuM8dVqH3izKCJXCEAGirrdMO1ndb9HdwS3KnYEiag2ciNWL0wm3
 UEM4r0i2B14sU3pkyotKgsJdOSgorMKkQUPb2wW+OUfnkZNEWKLqylMgNXBD80l4
 WG5vYQRwwFN9jRBik6Z5YFGnwGsNIoGg1F1GRNMjh6h51adYQeBN/1QJE1FJ5L4D
 iKzmZYqimKUINXWfI6TNyqiv9TctOt65pxnRyq+MHxfTDzHGyc3MUeCeCiR1a1yI
 S5QhcgfSnC/NjDA0+oYC6yRlcBtfhjtUqFTGoZ4q4q/LF1BVU1bPyIXZrROLc05s
 LNMMBcWbJetJxFtm/gYfiVFuNitYtxbBV1krVtsWznCA2nKGJ9w=
 =htKJ
 -----END PGP SIGNATURE-----

Merge tag 'clang-lto-v5.12-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull more clang LTO updates from Kees Cook:
 "Clang LTO x86 enablement.

  Full disclosure: while this has _not_ been in linux-next (since it
  initially looked like the objtool dependencies weren't going to make
  v5.12), it has been under daily build and runtime testing by Sami for
  quite some time. These x86 portions have been discussed on lkml, with
  Peter, Josh, and others helping nail things down.

  The bulk of the changes are to get objtool working happily. The rest
  of the x86 enablement is very small.

  Summary:

   - Generate __mcount_loc in objtool (Peter Zijlstra)

   - Support running objtool against vmlinux.o (Sami Tolvanen)

   - Clang LTO enablement for x86 (Sami Tolvanen)"

Link: https://lore.kernel.org/lkml/20201013003203.4168817-26-samitolvanen@google.com/
Link: https://lore.kernel.org/lkml/cover.1611263461.git.jpoimboe@redhat.com/

* tag 'clang-lto-v5.12-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kbuild: lto: force rebuilds when switching CONFIG_LTO
  x86, build: allow LTO to be selected
  x86, cpu: disable LTO for cpu.c
  x86, vdso: disable LTO only for vDSO
  kbuild: lto: postpone objtool
  objtool: Split noinstr validation from --vmlinux
  x86, build: use objtool mcount
  tracing: add support for objtool mcount
  objtool: Don't autodetect vmlinux.o
  objtool: Fix __mcount_loc generation with Clang's assembler
  objtool: Add a pass for generating __mcount_loc
2021-02-23 15:13:45 -08:00
Linus Torvalds 7d6beb71da idmapped-mounts-v5.12
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc
 ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f
 4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg=
 =yPaw
 -----END PGP SIGNATURE-----

Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull idmapped mounts from Christian Brauner:
 "This introduces idmapped mounts which has been in the making for some
  time. Simply put, different mounts can expose the same file or
  directory with different ownership. This initial implementation comes
  with ports for fat, ext4 and with Christoph's port for xfs with more
  filesystems being actively worked on by independent people and
  maintainers.

  Idmapping mounts handle a wide range of long standing use-cases. Here
  are just a few:

   - Idmapped mounts make it possible to easily share files between
     multiple users or multiple machines especially in complex
     scenarios. For example, idmapped mounts will be used in the
     implementation of portable home directories in
     systemd-homed.service(8) where they allow users to move their home
     directory to an external storage device and use it on multiple
     computers where they are assigned different uids and gids. This
     effectively makes it possible to assign random uids and gids at
     login time.

   - It is possible to share files from the host with unprivileged
     containers without having to change ownership permanently through
     chown(2).

   - It is possible to idmap a container's rootfs and without having to
     mangle every file. For example, Chromebooks use it to share the
     user's Download folder with their unprivileged containers in their
     Linux subsystem.

   - It is possible to share files between containers with
     non-overlapping idmappings.

   - Filesystem that lack a proper concept of ownership such as fat can
     use idmapped mounts to implement discretionary access (DAC)
     permission checking.

   - They allow users to efficiently changing ownership on a per-mount
     basis without having to (recursively) chown(2) all files. In
     contrast to chown (2) changing ownership of large sets of files is
     instantenous with idmapped mounts. This is especially useful when
     ownership of a whole root filesystem of a virtual machine or
     container is changed. With idmapped mounts a single syscall
     mount_setattr syscall will be sufficient to change the ownership of
     all files.

   - Idmapped mounts always take the current ownership into account as
     idmappings specify what a given uid or gid is supposed to be mapped
     to. This contrasts with the chown(2) syscall which cannot by itself
     take the current ownership of the files it changes into account. It
     simply changes the ownership to the specified uid and gid. This is
     especially problematic when recursively chown(2)ing a large set of
     files which is commong with the aforementioned portable home
     directory and container and vm scenario.

   - Idmapped mounts allow to change ownership locally, restricting it
     to specific mounts, and temporarily as the ownership changes only
     apply as long as the mount exists.

  Several userspace projects have either already put up patches and
  pull-requests for this feature or will do so should you decide to pull
  this:

   - systemd: In a wide variety of scenarios but especially right away
     in their implementation of portable home directories.

         https://systemd.io/HOME_DIRECTORY/

   - container runtimes: containerd, runC, LXD:To share data between
     host and unprivileged containers, unprivileged and privileged
     containers, etc. The pull request for idmapped mounts support in
     containerd, the default Kubernetes runtime is already up for quite
     a while now: https://github.com/containerd/containerd/pull/4734

   - The virtio-fs developers and several users have expressed interest
     in using this feature with virtual machines once virtio-fs is
     ported.

   - ChromeOS: Sharing host-directories with unprivileged containers.

  I've tightly synced with all those projects and all of those listed
  here have also expressed their need/desire for this feature on the
  mailing list. For more info on how people use this there's a bunch of
  talks about this too. Here's just two recent ones:

      https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
      https://fosdem.org/2021/schedule/event/containers_idmap/

  This comes with an extensive xfstests suite covering both ext4 and
  xfs:

      https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts

  It covers truncation, creation, opening, xattrs, vfscaps, setid
  execution, setgid inheritance and more both with idmapped and
  non-idmapped mounts. It already helped to discover an unrelated xfs
  setgid inheritance bug which has since been fixed in mainline. It will
  be sent for inclusion with the xfstests project should you decide to
  merge this.

  In order to support per-mount idmappings vfsmounts are marked with
  user namespaces. The idmapping of the user namespace will be used to
  map the ids of vfs objects when they are accessed through that mount.
  By default all vfsmounts are marked with the initial user namespace.
  The initial user namespace is used to indicate that a mount is not
  idmapped. All operations behave as before and this is verified in the
  testsuite.

  Based on prior discussions we want to attach the whole user namespace
  and not just a dedicated idmapping struct. This allows us to reuse all
  the helpers that already exist for dealing with idmappings instead of
  introducing a whole new range of helpers. In addition, if we decide in
  the future that we are confident enough to enable unprivileged users
  to setup idmapped mounts the permission checking can take into account
  whether the caller is privileged in the user namespace the mount is
  currently marked with.

  The user namespace the mount will be marked with can be specified by
  passing a file descriptor refering to the user namespace as an
  argument to the new mount_setattr() syscall together with the new
  MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
  of extensibility.

  The following conditions must be met in order to create an idmapped
  mount:

   - The caller must currently have the CAP_SYS_ADMIN capability in the
     user namespace the underlying filesystem has been mounted in.

   - The underlying filesystem must support idmapped mounts.

   - The mount must not already be idmapped. This also implies that the
     idmapping of a mount cannot be altered once it has been idmapped.

   - The mount must be a detached/anonymous mount, i.e. it must have
     been created by calling open_tree() with the OPEN_TREE_CLONE flag
     and it must not already have been visible in the filesystem.

  The last two points guarantee easier semantics for userspace and the
  kernel and make the implementation significantly simpler.

  By default vfsmounts are marked with the initial user namespace and no
  behavioral or performance changes are observed.

  The manpage with a detailed description can be found here:

      1d7b902e28

  In order to support idmapped mounts, filesystems need to be changed
  and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
  patches to convert individual filesystem are not very large or
  complicated overall as can be seen from the included fat, ext4, and
  xfs ports. Patches for other filesystems are actively worked on and
  will be sent out separately. The xfstestsuite can be used to verify
  that port has been done correctly.

  The mount_setattr() syscall is motivated independent of the idmapped
  mounts patches and it's been around since July 2019. One of the most
  valuable features of the new mount api is the ability to perform
  mounts based on file descriptors only.

  Together with the lookup restrictions available in the openat2()
  RESOLVE_* flag namespace which we added in v5.6 this is the first time
  we are close to hardened and race-free (e.g. symlinks) mounting and
  path resolution.

  While userspace has started porting to the new mount api to mount
  proper filesystems and create new bind-mounts it is currently not
  possible to change mount options of an already existing bind mount in
  the new mount api since the mount_setattr() syscall is missing.

  With the addition of the mount_setattr() syscall we remove this last
  restriction and userspace can now fully port to the new mount api,
  covering every use-case the old mount api could. We also add the
  crucial ability to recursively change mount options for a whole mount
  tree, both removing and adding mount options at the same time. This
  syscall has been requested multiple times by various people and
  projects.

  There is a simple tool available at

      https://github.com/brauner/mount-idmapped

  that allows to create idmapped mounts so people can play with this
  patch series. I'll add support for the regular mount binary should you
  decide to pull this in the following weeks:

  Here's an example to a simple idmapped mount of another user's home
  directory:

	u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt

	u1001@f2-vm:/$ ls -al /home/ubuntu/
	total 28
	drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
	drwxr-xr-x 4 root   root   4096 Oct 28 04:00 ..
	-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
	-rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
	-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
	-rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
	-rw-r--r-- 1 ubuntu ubuntu    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ ls -al /mnt/
	total 28
	drwxr-xr-x  2 u1001 u1001 4096 Oct 28 22:07 .
	drwxr-xr-x 29 root  root  4096 Oct 28 22:01 ..
	-rw-------  1 u1001 u1001 3154 Oct 28 22:12 .bash_history
	-rw-r--r--  1 u1001 u1001  220 Feb 25  2020 .bash_logout
	-rw-r--r--  1 u1001 u1001 3771 Feb 25  2020 .bashrc
	-rw-r--r--  1 u1001 u1001  807 Feb 25  2020 .profile
	-rw-r--r--  1 u1001 u1001    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw-------  1 u1001 u1001 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ touch /mnt/my-file

	u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file

	u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file

	u1001@f2-vm:/$ ls -al /mnt/my-file
	-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file

	u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
	-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file

	u1001@f2-vm:/$ getfacl /mnt/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: mnt/my-file
	# owner: u1001
	# group: u1001
	user::rw-
	user:u1001:rwx
	group::rw-
	mask::rwx
	other::r--

	u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: home/ubuntu/my-file
	# owner: ubuntu
	# group: ubuntu
	user::rw-
	user:ubuntu:rwx
	group::rw-
	mask::rwx
	other::r--"

* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
  xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
  xfs: support idmapped mounts
  ext4: support idmapped mounts
  fat: handle idmapped mounts
  tests: add mount_setattr() selftests
  fs: introduce MOUNT_ATTR_IDMAP
  fs: add mount_setattr()
  fs: add attr_flags_to_mnt_flags helper
  fs: split out functions to hold writers
  namespace: only take read lock in do_reconfigure_mnt()
  mount: make {lock,unlock}_mount_hash() static
  namespace: take lock_mount_hash() directly when changing flags
  nfs: do not export idmapped mounts
  overlayfs: do not mount on top of idmapped mounts
  ecryptfs: do not mount on top of idmapped mounts
  ima: handle idmapped mounts
  apparmor: handle idmapped mounts
  fs: make helpers idmap mount aware
  exec: handle idmapped mounts
  would_dump: handle idmapped mounts
  ...
2021-02-23 13:39:45 -08:00
Sami Tolvanen e242db40be x86, vdso: disable LTO only for vDSO
Disable LTO for the vDSO. Note that while we could use Clang's LTO
for the 64-bit vDSO, it won't add noticeable benefit for the small
amount of C code.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
2021-02-23 12:46:58 -08:00