Pull EFI updates from Ingo Molnar:
"The main changes are:
- Add support for enlisting the help of the EFI firmware to create
memory reservations that persist across kexec.
- Add page fault handling to the runtime services support code on x86
so we can more gracefully recover from buggy EFI firmware.
- Fix command line handling on x86 for the boot path that omits the
stub's PE/COFF entry point.
- Other assorted fixes and updates"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: boot: Fix EFI stub alignment
efi/x86: Call efi_parse_options() from efi_main()
efi/x86: earlyprintk - Add 64bit efi fb address support
efi/x86: drop task_lock() from efi_switch_mm()
efi/x86: Handle page faults occurring while running EFI runtime services
efi: Make efi_rts_work accessible to efi page fault handler
efi/efi_test: add exporting ResetSystem runtime service
efi/libstub: arm: support building with clang
efi: add API to reserve memory persistently across kexec reboot
efi/arm: libstub: add a root memreserve config table
efi: honour memory reservations passed via a linux specific config table
When compiling the kernel with Clang, this warning appears even though
it is disabled for the whole kernel because this folder has its own set
of KBUILD_CFLAGS. It was disabled before the beginning of git history.
In file included from arch/x86/boot/compressed/kaslr.c:29:
In file included from arch/x86/boot/compressed/misc.h:21:
In file included from ./include/linux/elf.h:5:
In file included from ./arch/x86/include/asm/elf.h:77:
In file included from ./arch/x86/include/asm/vdso.h:11:
In file included from ./include/linux/mm_types.h:9:
In file included from ./include/linux/spinlock.h:88:
In file included from ./arch/x86/include/asm/spinlock.h:43:
In file included from ./arch/x86/include/asm/qrwlock.h:6:
./include/asm-generic/qrwlock.h:101:53: warning: passing 'u32 *' (aka
'unsigned int *') to parameter of type 'int *' converts between pointers
to integer types with different sign [-Wpointer-sign]
if (likely(atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED)))
^~~~~
./include/linux/compiler.h:76:40: note: expanded from macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
./include/asm-generic/atomic-instrumented.h:69:66: note: passing
argument to parameter 'old' here
static __always_inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new)
^
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lkml.kernel.org/r/20181013010713.6999-1-natechancellor@gmail.com
Xen PVH guests receive the address of the RSDP table from Xen. In order
to support booting a Xen PVH guest via Grub2 using the standard x86
boot entry we need a way for Grub2 to pass the RSDP address to the
kernel.
For this purpose expand the struct setup_header to hold the physical
address of the RSDP address. Being zero means it isn't specified and
has to be located the legacy way (searching through low memory or
EBDA).
While documenting the new setup_header layout and protocol version
2.14 add the missing documentation of protocol version 2.13.
There are Grub2 versions in several distros with a downstream patch
violating the boot protocol by writing past the end of setup_header.
This requires another update of the boot protocol to enable the kernel
to distinguish between a specified RSDP address and one filled with
garbage by such a broken Grub2.
From protocol 2.14 on Grub2 will write the version it is supporting
(but never a higher value than found to be supported by the kernel)
ored with 0x8000 to the version field of setup_header. This enables
the kernel to know up to which field Grub2 has written information
to. All fields after that are supposed to be clobbered.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: bp@alien8.de
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20181010061456.22238-3-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit
1958b5fc40 ("x86/boot: Add early boot support when running with SEV active")
can occasionally cause system resets when kexec-ing a second kernel even
if SEV is not active.
That's because get_sev_encryption_bit() uses 32-bit rIP-relative
addressing to read the value of enc_bit - a variable which caches a
previously detected encryption bit position - but kexec may allocate
the early boot code to a higher location, beyond the 32-bit addressing
limit.
In this case, garbage will be read and get_sev_encryption_bit() will
return the wrong value, leading to accessing memory with the wrong
encryption setting.
Therefore, remove enc_bit, and thus get rid of the need to do 32-bit
rIP-relative addressing in the first place.
[ bp: massage commit message heavily. ]
Fixes: 1958b5fc40 ("x86/boot: Add early boot support when running with SEV active")
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: brijesh.singh@amd.com
Cc: kexec@lists.infradead.org
Cc: dyoung@redhat.com
Cc: bhe@redhat.com
Cc: ghook@redhat.com
Link: https://lkml.kernel.org/r/20180927123845.32052-1-kasong@redhat.com
We currently align the end of the compressed image to a multiple of
16. However, the PE-COFF header included in the EFI stub says that
the file alignment is 32 bytes, and when adding an EFI signature to
the file it must first be padded to this alignment.
sbsigntool commands warn about this:
warning: file-aligned section .text extends beyond end of file
warning: checksum areas are greater than image size. Invalid section table?
Worse, pesign -at least when creating a detached signature- uses the
hash of the unpadded file, resulting in an invalid signature if
padding is required.
Avoid both these problems by increasing alignment to 32 bytes when
CONFIG_EFI_STUB is enabled.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Before this commit we were only calling efi_parse_options() from
make_boot_params(), but make_boot_params() only gets called if the
kernel gets booted directly as an EFI executable. So when booted through
e.g. grub we ended up not parsing the commandline in the boot code.
This makes the drivers/firmware/efi/libstub code ignore the "quiet"
commandline argument resulting in the following message being printed:
"EFI stub: UEFI Secure Boot is enabled."
Despite the quiet request. This commits adds an extra call to
efi_parse_options() to efi_main() to make sure that the options are
always processed. This fixes quiet not working.
This also fixes the libstub code ignoring nokaslr and efi=nochunk.
Reported-by: Peter Robinson <pbrobinson@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
It's not used by its sole user, so remove this unused functionality.
Also remove a stray unused variable that GCC didn't warn about for some reason.
Suggested-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kirill.shutemov@linux.intel.com
Link: http://lkml.kernel.org/r/20180807015705.21697-1-fanc.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- add build_{menu,n,g,x}config targets for compile-testing Kconfig
- fix and improve recursive dependency detection in Kconfig
- fix parallel building of menuconfig/nconfig
- fix syntax error in clang-version.sh
- suppress distracting log from syncconfig
- remove obsolete "rpm" target
- remove VMLINUX_SYMBOL(_STR) macro entirely
- fix microblaze build with CONFIG_DYNAMIC_FTRACE
- move compiler test for dead code/data elimination to Kconfig
- rename well-known LDFLAGS variable to KBUILD_LDFLAGS
- misc fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbgYhCAAoJED2LAQed4NsGErAP/jt7gt76+N0PZmADBZqyVR/H
4k286g3OiT7DIcdvwqE5BRvu+zNOamDujnnXw63/jwu2RjrkLX/JnhzTbC0IZleZ
KeO4bU4ZH0WFa0Ny9pp0LAnzbXGMnQjDXygcUd5BFoEd5JSLKW2PISEEjRh6b5B7
swJRdgySFaMrUBRNf13FwH5EvX/D0xZQe/wFhFCOv6L4gJZFMmpGUIepgTjTUmxZ
wcNN6xxXg+ulLHVcPdPQ9EYssNHN5xNys02+IdIrhhXuNHji/TFm4dGYuU+dDGeE
Eu4O6Qs7pg0PFGrZ5gLxXDJEp75W+uaTNOqV+jcjq8MRxJuWxyy2biUeelKRT/KH
0iv4ZQJVOMOhl8fZgLtQaXHyQ++5uwd6kvPPf+XFdkogGAIXK0wKWLoALFEOXwb6
z1BBnFx09LrKPGt0ZlKX624OEczedv/UAFiSh3Ic2S3PFEpq4oHrEGhTnyKRobPv
OEcF3RqKjmAdK7PLy4kVpTLhkutkWWhw6Giy9qXUkXYJWonJR7NTQ1mIan2LoGZC
sGi+qKae/8xgO2Nerx59tZpkiHYTMfYeAo8frzWurOxm3YzEfaxNNGPl+IMW7VKz
cNPzQZ5tMUy4i4PAhk/gIWibnUTPfjDbWsZSMtIbO0GFcao56EvllwD8/awuy7lO
QkaAeZHFcF+qgU3muaYK
=Vsb2
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- add build_{menu,n,g,x}config targets for compile-testing Kconfig
- fix and improve recursive dependency detection in Kconfig
- fix parallel building of menuconfig/nconfig
- fix syntax error in clang-version.sh
- suppress distracting log from syncconfig
- remove obsolete "rpm" target
- remove VMLINUX_SYMBOL(_STR) macro entirely
- fix microblaze build with CONFIG_DYNAMIC_FTRACE
- move compiler test for dead code/data elimination to Kconfig
- rename well-known LDFLAGS variable to KBUILD_LDFLAGS
- misc fixes and cleanups
* tag 'kbuild-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: rename LDFLAGS to KBUILD_LDFLAGS
kbuild: pass LDFLAGS to recordmcount.pl
kbuild: test dead code/data elimination support in Kconfig
initramfs: move gen_initramfs_list.sh from scripts/ to usr/
vmlinux.lds.h: remove stale <linux/export.h> include
export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR()
Coccinelle: remove pci_alloc_consistent semantic to detect in zalloc-simple.cocci
kbuild: make sorting initramfs contents independent of locale
kbuild: remove "rpm" target, which is alias of "rpm-pkg"
kbuild: Fix LOADLIBES rename in Documentation/kbuild/makefiles.txt
kconfig: suppress "configuration written to .config" for syncconfig
kconfig: fix "Can't open ..." in parallel build
kbuild: Add a space after `!` to prevent parsing as file pattern
scripts: modpost: check memory allocation results
kconfig: improve the recursive dependency report
kconfig: report recursive dependency involving 'imply'
kconfig: error out when seeing recursive dependency
kconfig: add build-only configurator targets
scripts/dtc: consolidate include path options in Makefile
Commit a0f97e06a4 ("kbuild: enable 'make CFLAGS=...' to add
additional options to CC") renamed CFLAGS to KBUILD_CFLAGS.
Commit 222d394d30 ("kbuild: enable 'make AFLAGS=...' to add
additional options to AS") renamed AFLAGS to KBUILD_AFLAGS.
Commit 06c5040cdb ("kbuild: enable 'make CPPFLAGS=...' to add
additional options to CPP") renamed CPPFLAGS to KBUILD_CPPFLAGS.
For some reason, LDFLAGS was not renamed.
Using a well-known variable like LDFLAGS may result in accidental
override of the variable.
Kbuild generally uses KBUILD_ prefixed variables for the internally
appended options, so here is one more conversion to sanitize the
naming convention.
I did not touch Makefiles under tools/ since the tools build system
is a different world.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
To allow existing C code to be incorporated into the decompressor or the
UEFI stub, introduce a CPP macro that turns all EXPORT_SYMBOL_xxx
declarations into nops, and #define it in places where such exports are
undesirable. Note that this gets rid of a rather dodgy redefine of
linux/export.h's header guard.
Link: http://lkml.kernel.org/r/20180704083651.24360-3-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 boot updates from Thomas Gleixner:
"Boot code updates for x86:
- Allow to skip a given amount of huge pages for address layout
randomization on the kernel command line to prevent regressions in
the huge page allocation with small memory sizes
- Various cleanups"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Use CC_SET()/CC_OUT() instead of open coding it
x86/boot/KASLR: Make local variable mem_limit static
x86/boot/KASLR: Skip specified number of 1GB huge pages when doing physical randomization (KASLR)
x86/boot/KASLR: Add two new functions for 1GB huge pages handling
Pull EFI updates from Thomas Gleixner:
"The EFI pile:
- Make mixed mode UEFI runtime service invocations mutually
exclusive, as mandated by the UEFI spec
- Perform UEFI runtime services calls from a work queue so the calls
into the firmware occur from a kernel thread
- Honor the UEFI memory map attributes for live memory regions
configured by UEFI as a framebuffer. This works around a coherency
problem with KVM guests running on ARM.
- Cleanups, improvements and fixes all over the place"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efivars: Call guid_parse() against guid_t type of variable
efi/cper: Use consistent types for UUIDs
efi/x86: Replace references to efi_early->is64 with efi_is_64bit()
efi: Deduplicate efi_open_volume()
efi/x86: Add missing NULL initialization in UGA draw protocol discovery
efi/x86: Merge 32-bit and 64-bit UGA draw protocol setup routines
efi/x86: Align efi_uga_draw_protocol typedef names to convention
efi/x86: Merge the setup_efi_pci32() and setup_efi_pci64() routines
efi/x86: Prevent reentrant firmware calls in mixed mode
efi/esrt: Only call efi_mem_reserve() for boot services memory
fbdev/efifb: Honour UEFI memory map attributes when mapping the FB
efi: Drop type and attribute checks in efi_mem_desc_lookup()
efi/libstub/arm: Add opt-in Kconfig option for the DTB loader
efi: Remove the declaration of efi_late_init() as the function is unused
efi/cper: Avoid using get_seconds()
efi: Use a work queue to invoke EFI Runtime Services
efi/x86: Use non-blocking SetVariable() for efi_delete_dummy_variable()
efi/x86: Clean up the eboot code
There were two report of boot failure cased by trampoline placed into
a reserved memory region. It can happen on machines that don't report
EBDA correctly.
Fix the problem by re-validating the found address against the E820 table.
If the address is in a reserved area, find the next usable region below the
initial address.
Fixes: 3548e131ec ("x86/boot/compressed/64: Find a place for 32-bit trampoline")
Reported-by: Dmitry Malkin <d.malkin@real-time-systems.com>
Reported-by: youling 257 <youling257@gmail.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180801133225.38121-1-kirill.shutemov@linux.intel.com
Fix the following sparse warning:
arch/x86/boot/compressed/kaslr.c:102:20: warning: symbol 'mem_limit' was not declared. Should it be static?
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/1532958273-47725-1-git-send-email-zhongjiang@huawei.com
Dirk Gouders reported that two consecutive "make" invocations on an
already compiled tree will show alternating behaviors:
$ make
CALL scripts/checksyscalls.sh
DESCEND objtool
CHK include/generated/compile.h
DATAREL arch/x86/boot/compressed/vmlinux
Kernel: arch/x86/boot/bzImage is ready (#48)
Building modules, stage 2.
MODPOST 165 modules
$ make
CALL scripts/checksyscalls.sh
DESCEND objtool
CHK include/generated/compile.h
LD arch/x86/boot/compressed/vmlinux
ZOFFSET arch/x86/boot/zoffset.h
AS arch/x86/boot/header.o
LD arch/x86/boot/setup.elf
OBJCOPY arch/x86/boot/setup.bin
OBJCOPY arch/x86/boot/vmlinux.bin
BUILD arch/x86/boot/bzImage
Setup is 15644 bytes (padded to 15872 bytes).
System is 6663 kB
CRC 3eb90f40
Kernel: arch/x86/boot/bzImage is ready (#48)
Building modules, stage 2.
MODPOST 165 modules
He bisected it back to:
commit 98f7852537 ("x86/boot: Refuse to build with data relocations")
The root cause was the use of the "if_changed" kbuild function multiple
times for the same target. It was designed to only be used once per
target, otherwise it will effectively always trigger, flipping back and
forth between the two commands getting recorded by "if_changed". Instead,
this patch merges the two commands into a single function to get stable
build artifacts (i.e. .vmlinux.cmd), and a single build behavior.
Bisected-and-Reported-by: Dirk Gouders <dirk@gouders.net>
Fix-Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180724230827.GA37823@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are a couple of places in the x86 EFI stub code where we select
between 32-bit and 64-bit versions of the support routines based on
the value of efi_early->is64. Referencing that field directly is a
bad idea, since it prevents the compiler from inferring that this
field can never be true on a 32-bit build, and can only become false
on a 64-bit build if support for mixed mode is compiled in. This
results in dead code to be retained in the uncompressed part of the
kernel image, which is wasteful.
So switch to the efi_is_64bit() helper, which will resolve to a
constant boolean unless building for 64-bit with mixed mode support.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180720014726.24031-8-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's one ARM, one x86_32 and one x86_64 version of efi_open_volume()
which can be folded into a single shared version by masking their
differences with the efi_call_proto() macro introduced by commit:
3552fdf29f ("efi: Allow bitness-agnostic protocol calls").
To be able to dereference the device_handle attribute from the
efi_loaded_image_t table in an arch- and bitness-agnostic manner,
introduce the efi_table_attr() macro (which already exists for x86)
to arm and arm64.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180720014726.24031-7-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The UGA draw protocol discovery routine looks for a EFI handle that has
both the UGA draw protocol and the PCI I/O protocol installed. It checks
for the latter by calling handle_protocol() and pass it a PCI I/O
protocol pointer variable by reference, but fails to initialize it to
NULL, which means the non-NULL check later on in the code could produce
false positives, given that the return code of the handle_protocol() call
is ignored entirely. So add the missing initialization.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180720014726.24031-6-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The two versions of setup_uga##() are mostly identical, with the
exception of the size of EFI_HANDLE. So let's merge the two, and
pull the implementation into the calling function setup_uga().
Note that the 32-bit version was only mixed-mode safe by accident:
it only calls the get_mode() method of the UGA draw protocol, which
happens to be the first member, and so truncating the 64-bit void* at
offset 0 to 32 bits happens to produce the correct value. But let's
not rely on that, and use the proper API instead.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180720014726.24031-5-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The linux-efi subsystem uses typedefs with the _t suffix to declare
data structures that originate in the UEFI spec. Our type mangling
for mixed mode depends on this convention, so rename the UGA drawing
protocols to allow efi_call_proto() to be used with them in a
subsequent patch.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180720014726.24031-4-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After merging the 32-bit and 64-bit versions of the code that invokes
the PCI I/O protocol methods to preserve PCI ROM images in commit:
2c3625cb9f ("efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() ...")
there are still separate code paths for 32-bit and 64-bit, where the only
difference is the size of a EFI_HANDLE. So let's parameterize a single
implementation for that difference only, and get rid of the two copies of
the code.
While at it, rename __setup_efi_pci() to preserve_pci_rom_image() to
better reflect its purpose.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180720014726.24031-3-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Various small cleanups:
- Standardize printk messages:
'alloc' => 'allocate'
'mem' => 'memory'
also put variable names in printk messages between quotes.
- Align mass-assignments vertically for better readability
- Break multi-line function prototypes at the name where possible,
not in the middle of the parameter list
- Use a newline before return statements consistently.
- Use curly braces in a balanced fashion.
- Remove stray newlines.
No change in functionality.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180711094040.12506-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Hans de Goede reported that his mixed EFI mode Bay Trail tablet
would not boot at all any more, but enter a reboot loop without
any logs printed by the kernel.
Unbreak 64-bit Linux/x86 on 32-bit UEFI:
When it was first introduced, the EFI stub code that copies the
contents of PCI option ROMs originally only intended to do so if
the EFI_PCI_IO_ATTRIBUTE_EMBEDDED_ROM attribute was *not* set.
The reason was that the UEFI spec permits PCI option ROM images
to be provided by the platform directly, rather than via the ROM
BAR, and in this case, the OS can only access them at runtime if
they are preserved at boot time by copying them from the areas
described by PciIo->RomImage and PciIo->RomSize.
However, it implemented this check erroneously, as can be seen in
commit:
dd5fc854de ("EFI: Stash ROMs if they're not in the PCI BAR")
which introduced:
if (!attributes & EFI_PCI_IO_ATTRIBUTE_EMBEDDED_ROM)
continue;
and given that the numeric value of EFI_PCI_IO_ATTRIBUTE_EMBEDDED_ROM
is 0x4000, this condition never becomes true, and so the option ROMs
were copied unconditionally.
This was spotted and 'fixed' by commit:
886d751a2e ("x86, efi: correct precedence of operators in setup_efi_pci")
but inadvertently inverted the logic at the same time, defeating
the purpose of the code, since it now only preserves option ROM
images that can be read from the ROM BAR as well.
Unsurprisingly, this broke some systems, and so the check was removed
entirely in the following commit:
739701888f ("x86, efi: remove attribute check from setup_efi_pci")
It is debatable whether this check should have been included in the
first place, since the option ROM image provided to the UEFI driver by
the firmware may be different from the one that is actually present in
the card's flash ROM, and so whatever PciIo->RomImage points at should
be preferred regardless of whether the attribute is set.
As this was the only use of the attributes field, we can remove
the call to PciIo->Attributes() entirely, which is especially
nice because its prototype involves uint64_t type by-value
arguments which the EFI mixed mode has trouble dealing with.
Any mixed mode system with PCI is likely to be affected.
Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180711090235.9327-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When KASLR is enabled then 1GB huge pages allocations might regress
sporadically.
To reproduce on a KVM guest with 4GB RAM:
- add the following options to the kernel command-line:
'default_hugepagesz=1G hugepagesz=1G hugepages=1'
- boot the guest and check number of 1GB pages reserved:
# grep HugePages_Total /proc/meminfo
- sporadically, every couple of bootups the output of this
command shows that when booting with "nokaslr" HugePages_Total is always 1,
while booting without "nokaslr" sometimes HugePages_Total is set as 0
(that is, reserving the 1GB page failed).
Note that you may need to boot a few times to trigger the issue,
because it's somewhat non-deterministic.
The root cause is that kernel may be put into the only good 1GB huge page
in the [0x40000000, 0x7fffffff] physical range randomly.
Below is the dmesg output snippet from the KVM guest. We can see that only
[0x40000000, 0x7fffffff] region is good 1GB huge page,
[0x100000000, 0x13fffffff] will be touched by the memblock top-down allocation:
[...] e820: BIOS-provided physical RAM map:
[...] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[...] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[...] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[...] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[...] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[...] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[...] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[...] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
Besides, on bare-metal machines with larger memory, one less 1GB huge page
might be available with KASLR enabled. That too is because the kernel
image might be randomized into those "good" 1GB huge pages.
To fix this, firstly parse the kernel command-line to get how many 1GB huge
pages are specified. Then try to skip the specified number of 1GB huge
pages when decide which memory region kernel can be randomized into.
Also change the name of handle_mem_memmap() as handle_mem_options()
since it handles not only 'mem=' and 'memmap=', but also 'hugepagesxxx' now.
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: douly.fnst@cn.fujitsu.com
Cc: fanc.fnst@cn.fujitsu.com
Cc: indou.takao@jp.fujitsu.com
Cc: keescook@chromium.org
Cc: lcapitulino@redhat.com
Cc: yasu.isimatu@gmail.com
Link: http://lkml.kernel.org/r/20180625031656.12443-3-bhe@redhat.com
[ Rewrote the changelog, fixed style problems in the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce two new functions: parse_gb_huge_pages() and process_gb_huge_pages(),
which handle a conflict between KASLR and huge pages of 1GB.
These two functions will be used in the next patch:
- parse_gb_huge_pages() is used to parse kernel command-line to get
how many 1GB huge pages have been specified. A static global
variable 'max_gb_huge_pages' is added to store the number.
- process_gb_huge_pages() is used to skip as many 1GB huge pages
as possible from the passed in memory region according to the
specified number.
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: douly.fnst@cn.fujitsu.com
Cc: fanc.fnst@cn.fujitsu.com
Cc: indou.takao@jp.fujitsu.com
Cc: keescook@chromium.org
Cc: lcapitulino@redhat.com
Cc: yasu.isimatu@gmail.com
Link: http://lkml.kernel.org/r/20180625031656.12443-2-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following commit:
2c3625cb9f ("efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() into one function")
... merged the two versions of __setup_efi_pciXX(), without taking into
account that the 32-bit version used a rather dodgy trick to pass an
immediate 0 constant as argument for a uint64_t parameter.
The issue is caused by the fact that on x86, UEFI protocol method calls
are redirected via struct efi_config::call(), which is a variadic function,
and so the compiler has to infer the types of the parameters from the
arguments rather than from the prototype.
As the 32-bit x86 calling convention passes arguments via the stack,
passing the unqualified constant 0 twice is the same as passing 0ULL,
which is why the 32-bit code in __setup_efi_pci32() contained the
following call:
status = efi_early->call(pci->attributes, pci,
EfiPciIoAttributeOperationGet, 0, 0,
&attributes);
to invoke this UEFI protocol method:
typedef
EFI_STATUS
(EFIAPI *EFI_PCI_IO_PROTOCOL_ATTRIBUTES) (
IN EFI_PCI_IO_PROTOCOL *This,
IN EFI_PCI_IO_PROTOCOL_ATTRIBUTE_OPERATION Operation,
IN UINT64 Attributes,
OUT UINT64 *Result OPTIONAL
);
After the merge, we inadvertently ended up with this version for both
32-bit and 64-bit builds, breaking the latter.
So replace the two zeroes with the explicitly typed constant 0ULL,
which works as expected on both 32-bit and 64-bit builds.
Wilfried tested the 64-bit build, and I checked the generated assembly
of a 32-bit build with and without this patch, and they are identical.
Reported-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hdegoede@redhat.com
Cc: linux-efi@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 updates and fixes from Thomas Gleixner:
- Fix the (late) fallout from the vector management rework causing
hlist corruption and irq descriptor reference leaks caused by a
missing sanity check.
The straight forward fix triggered another long standing issue to
surface. The pre rework code hid the issue due to being way slower,
but now the chance that user space sees an EBUSY error return when
updating irq affinities is way higher, though quite a bunch of
userspace tools do not handle it properly despite the fact that EBUSY
could be returned for at least 10 years.
It turned out that the EBUSY return can be avoided completely by
utilizing the existing delayed affinity update mechanism for irq
remapped scenarios as well. That's a bit more error handling in the
kernel, but avoids fruitless fingerpointing discussions with tool
developers.
- Decouple PHYSICAL_MASK from AMD SME as its going to be required for
the upcoming Intel memory encryption support as well.
- Handle legacy device ACPI detection properly for newer platforms
- Fix the wrong argument ordering in the vector allocation tracepoint
- Simplify the IDT setup code for the APIC=n case
- Use the proper string helpers in the MTRR code
- Remove a stale unused VDSO source file
- Convert the microcode update lock to a raw spinlock as its used in
atomic context.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Enable CMT and MBM on new Skylake stepping
x86/apic/vector: Print APIC control bits in debugfs
genirq/affinity: Defer affinity setting if irq chip is busy
x86/platform/uv: Use apic_ack_irq()
x86/ioapic: Use apic_ack_irq()
irq_remapping: Use apic_ack_irq()
x86/apic: Provide apic_ack_irq()
genirq/migration: Avoid out of line call if pending is not set
genirq/generic_pending: Do not lose pending affinity update
x86/apic/vector: Prevent hlist corruption and leaks
x86/vector: Fix the args of vector_alloc tracepoint
x86/idt: Simplify the idt_setup_apic_and_irq_gates()
x86/platform/uv: Remove extra parentheses
x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME
x86: Mark native_set_p4d() as __always_inline
x86/microcode: Make the late update update_lock a raw lock for RT
x86/mtrr: Convert to use strncpy_from_user() helper
x86/mtrr: Convert to use match_string() helper
x86/vdso: Remove unused file
x86/i8237: Register device based on FADT legacy boot flag
AMD SME claims one bit from physical address to indicate whether the
page is encrypted or not. To achieve that we clear out the bit from
__PHYSICAL_MASK.
The capability to adjust __PHYSICAL_MASK is required beyond AMD SME.
For instance for upcoming Intel Multi-Key Total Memory Encryption.
Factor it out into a separate feature with own Kconfig handle.
It also helps with overhead of AMD SME. It saves more than 3k in .text
on defconfig + AMD_MEM_ENCRYPT:
add/remove: 3/2 grow/shrink: 5/110 up/down: 189/-3753 (-3564)
We would need to return to this once we have infrastructure to patch
constants in code. That's good candidate for it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-mm@kvack.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180518113028.79825-1-kirill.shutemov@linux.intel.com
Pull x86 boot updates from Ingo Molnar:
- Centaur CPU updates (David Wang)
- AMD and other CPU topology enumeration improvements and fixes
(Borislav Petkov, Thomas Gleixner, Suravee Suthikulpanit)
- Continued 5-level paging work (Kirill A. Shutemov)
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Mark __pgtable_l5_enabled __initdata
x86/mm: Mark p4d_offset() __always_inline
x86/mm: Introduce the 'no5lvl' kernel parameter
x86/mm: Stop pretending pgtable_l5_enabled is a variable
x86/mm: Unify pgtable_l5_enabled usage in early boot code
x86/boot/compressed/64: Fix trampoline page table address calculation
x86/CPU: Move x86_cpuinfo::x86_max_cores assignment to detect_num_cpu_cores()
x86/Centaur: Report correct CPU/cache topology
x86/CPU: Move cpu_detect_cache_sizes() into init_intel_cacheinfo()
x86/CPU: Make intel_num_cpu_cores() generic
x86/CPU: Move cpu local function declarations to local header
x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available
x86/CPU: Modify detect_extended_topology() to return result
x86/CPU/AMD: Calculate last level cache ID from number of sharing threads
x86/CPU: Rename intel_cacheinfo.c to cacheinfo.c
perf/events/amd/uncore: Fix amd_uncore_llc ID to use pre-defined cpu_llc_id
x86/CPU/AMD: Have smp_num_siblings and cpu_llc_id always be present
x86/Centaur: Initialize supported CPU features properly
Pull EFI updates from Ingo Molnar:
- decode x86 CPER data (Yazen Ghannam)
- ignore unrealistically large option ROMs (Hans de Goede)
- initialize UEFI secure boot state during Xen dom0 boot (Daniel Kiper)
- additional minor tweaks and fixes.
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/capsule-loader: Don't output reset log when reset flags are not set
efi/x86: Ignore unrealistically large option ROMs
efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() into one function
efi: Align efi_pci_io_protocol typedefs to type naming convention
efi/libstub/tpm: Make function efi_retrieve_tpm2_eventlog_1_2() static
efi: Decode IA32/X64 Context Info structure
efi: Decode IA32/X64 MS Check structure
efi: Decode additional IA32/X64 Bus Check fields
efi: Decode IA32/X64 Cache, TLB, and Bus Check structures
efi: Decode UEFI-defined IA32/X64 Error Structure GUIDs
efi: Decode IA32/X64 Processor Error Info Structure
efi: Decode IA32/X64 Processor Error Section
efi: Fix IA32/X64 Processor Error Record definition
efi/cper: Remove the INDENT_SP silliness
x86/xen/efi: Initialize UEFI secure boot state during dom0 boot
Pull x86 fixes from Thomas Gleixner:
"An unfortunately larger set of fixes, but a large portion is
selftests:
- Fix the missing clusterid initializaiton for x2apic cluster
management which caused boot failures due to IPIs being sent to the
wrong cluster
- Drop TX_COMPAT when a 64bit executable is exec()'ed from a compat
task
- Wrap access to __supported_pte_mask in __startup_64() where clang
compile fails due to a non PC relative access being generated.
- Two fixes for 5 level paging fallout in the decompressor:
- Handle GOT correctly for paging_prepare() and
cleanup_trampoline()
- Fix the page table handling in cleanup_trampoline() to avoid
page table corruption.
- Stop special casing protection key 0 as this is inconsistent with
the manpage and also inconsistent with the allocation map handling.
- Override the protection key wen moving away from PROT_EXEC to
prevent inaccessible memory.
- Fix and update the protection key selftests to address breakage and
to cover the above issue
- Add a MOV SS self test"
[ Part of the x86 fixes were in the earlier core pull due to dependencies ]
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
x86/apic/x2apic: Initialize cluster ID properly
x86/boot/compressed/64: Fix moving page table out of trampoline memory
x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline()
x86/pkeys: Do not special case protection key 0
x86/pkeys/selftests: Add a test for pkey 0
x86/pkeys/selftests: Save off 'prot' for allocations
x86/pkeys/selftests: Fix pointer math
x86/pkeys: Override pkey when moving away from PROT_EXEC
x86/pkeys/selftests: Fix pkey exhaustion test off-by-one
x86/pkeys/selftests: Add PROT_EXEC test
x86/pkeys/selftests: Factor out "instruction page"
x86/pkeys/selftests: Allow faults on unknown keys
x86/pkeys/selftests: Avoid printf-in-signal deadlocks
x86/pkeys/selftests: Remove dead debugging code, fix dprint_in_signal
x86/pkeys/selftests: Stop using assert()
x86/pkeys/selftests: Give better unexpected fault error messages
x86/selftests: Add mov_to_ss test
x86/mpx/selftests: Adjust the self-test to fresh distros that export the MPX ABI
x86/pkeys/selftests: Adjust the self-test to fresh distros that export the pkeys ABI
...
This kernel parameter allows to force kernel to use 4-level paging even
if hardware and kernel support 5-level paging.
The option may be useful to work around regressions related to 5-level
paging.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Usually pgtable_l5_enabled is defined using cpu_feature_enabled().
cpu_feature_enabled() is not available in early boot code. We use
several different preprocessor tricks to get around it. It's messy.
Unify them all.
If cpu_feature_enabled() is not yet available, USE_EARLY_PGTABLE_L5 can
be defined before all includes. It makes pgtable_l5_enabled rely on
__pgtable_l5_enabled variable instead. This approach fits all early
users.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Hugh noticied that we calculate the address of the trampoline page table
incorrectly in cleanup_trampoline().
TRAMPOLINE_32BIT_PGTABLE_OFFSET has to be divided by sizeof(unsigned long),
since trampoline_32bit is an 'unsigned long' pointer.
TRAMPOLINE_32BIT_PGTABLE_OFFSET is zero so the bug doesn't have a
visible effect.
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: e9d0e6330e ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Link: http://lkml.kernel.org/r/20180518103528.59260-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
cleanup_trampoline() relocates the top-level page table out of
trampoline memory. We use 'top_pgtable' as our new top-level page table.
But if the 'top_pgtable' would be referenced from C in a usual way,
the address of the table will be calculated relative to RIP.
After kernel gets relocated, the address will be in the middle of
decompression buffer and the page table may get overwritten.
This leads to a crash.
We calculate the address of other page tables relative to the relocation
address. It makes them safe. We should do the same for 'top_pgtable'.
Calculate the address of 'top_pgtable' in assembly and pass down to
cleanup_trampoline().
Move the page table to .pgtable section where the rest of page tables
are. The section is @nobits so we save 4k in kernel image.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: e9d0e6330e ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Link: http://lkml.kernel.org/r/20180516080131.27913-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric and Hugh have reported instant reboot due to my recent changes in
decompression code.
The root cause is that I didn't realize that we need to adjust GOT to be
able to run C code that early.
The problem is only visible with an older toolchain. Binutils >= 2.24 is
able to eliminate GOT references by replacing them with RIP-relative
address loads:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=80d873266dec
We need to adjust GOT two times:
- before calling paging_prepare() using the initial load address
- before calling C code from the relocated kernel
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 194a9749c7 ("x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G")
Link: http://lkml.kernel.org/r/20180516080131.27913-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
setup_efi_pci() tries to save a copy of each PCI option ROM as this may
be necessary for the device driver for the PCI device to have access too.
On some systems the efi_pci_io_protocol's romimage and romsize fields
contain invalid data, which looks a bit like pointers pointing back into
other EFI code or data. Interpreting these pointers as romsize leads to
a very large value and if we then try to alloc this amount of memory to
save a copy the alloc call fails.
This leads to a "Failed to alloc mem for rom" error being printed on the
EFI console for each PCI device.
This commit avoids the printing of these errors, by checking romsize before
doing the alloc and if it is larger then the EFI spec limit of 16 MiB
silently ignore the ROM fields instead of trying to alloc mem and fail.
Tested-by: Hans de Goede <hdegoede@redhat.com>
[ardb: deduplicate 32/64 bit changes, use SZ_16M symbolic constant]
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-16-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As suggested by Lukas, use his efi_call_proto() and efi_table_attr()
macros to merge __setup_efi_pci32() and __setup_efi_pci64() into a
single function, removing the need to duplicate changes made in
subsequent patches across both.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-15-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In order to use the helper macros that perform type mangling with the
EFI PCI I/O protocol struct typedefs, align their Linux typenames with
the convention we use for definitionns that originate in the UEFI spec,
and add the trailing _t to each.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-14-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mixed mode allows a kernel built for x86_64 to interact with 32-bit
EFI firmware, but requires us to define all struct definitions carefully
when it comes to pointer sizes.
'struct efi_pci_io_protocol_32' currently uses a 'void *' for the
'romimage' field, which will be interpreted as a 64-bit field
on such kernels, potentially resulting in bogus memory references
and subsequent crashes.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-13-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A PTE is constructed from a physical address and a pgprotval_t.
__PAGE_KERNEL, for instance, is a pgprot_t and must be converted
into a pgprotval_t before it can be used to create a PTE. This is
done implicitly within functions like pfn_pte() by massage_pgprot().
However, this makes it very challenging to set bits (and keep them
set) if your bit is being filtered out by massage_pgprot().
This moves the bit filtering out of pfn_pte() and friends. For
users of PAGE_KERNEL*, filtering will be done automatically inside
those macros but for users of __PAGE_KERNEL*, they need to do their
own filtering now.
Note that we also just move pfn_pte/pmd/pud() over to check_pgprot()
instead of massage_pgprot(). This way, we still *look* for
unsupported bits and properly warn about them if we find them. This
might happen if an unfiltered __PAGE_KERNEL* value was passed in,
for instance.
- printk format warning fix from: Arnd Bergmann <arnd@arndb.de>
- boot crash fix from: Tom Lendacky <thomas.lendacky@amd.com>
- crash bisected by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-and-fixed-by: Arnd Bergmann <arnd@arndb.de>
Fixed-by: Tom Lendacky <thomas.lendacky@amd.com>
Bisected-by: Mike Galbraith <efault@gmx.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull EFI updates from Ingo Molnar:
"The main EFI changes in this cycle were:
- Fix the apple-properties code (Andy Shevchenko)
- Add WARN() on arm64 if UEFI Runtime Services corrupt the reserved
x18 register (Ard Biesheuvel)
- Use efi_switch_mm() on x86 instead of manipulating %cr3 directly
(Sai Praneeth)
- Fix early memremap leak in ESRT code (Ard Biesheuvel)
- Switch to L"xxx" notation for wide string literals (Ard Biesheuvel)
- ... plus misc other cleanups and bugfixes"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
x86/efi: Replace efi_pgd with efi_mm.pgd
efi: Use string literals for efi_char16_t variable initializers
efi/esrt: Fix handling of early ESRT table mapping
efi: Use efi_mm in x86 as well as ARM
efi: Make const array 'apple' static
efi/apple-properties: Use memremap() instead of ioremap()
efi: Reorder pr_notice() with add_device_randomness() call
x86/efi: Replace GFP_ATOMIC with GFP_KERNEL in efi_query_variable_store()
efi/arm64: Check whether x18 is preserved by runtime services calls
efi/arm*: Stop printing addresses of virtual mappings
efi/apple-properties: Remove redundant attribute initialization from unmarshal_key_value_pairs()
efi/arm*: Only register page tables when they exist
Pull x86 mm updates from Ingo Molnar:
- Extend the memmap= boot parameter syntax to allow the redeclaration
and dropping of existing ranges, and to support all e820 range types
(Jan H. Schönherr)
- Improve the W+X boot time security checks to remove false positive
warnings on Xen (Jan Beulich)
- Support booting as Xen PVH guest (Juergen Gross)
- Improved 5-level paging (LA57) support, in particular it's possible
now to have a single kernel image for both 4-level and 5-level
hardware (Kirill A. Shutemov)
- AMD hardware RAM encryption support (SME/SEV) fixes (Tom Lendacky)
- Preparatory commits for hardware-encrypted RAM support on Intel CPUs.
(Kirill A. Shutemov)
- Improved Intel-MID support (Andy Shevchenko)
- Show EFI page tables in page_tables debug files (Andy Lutomirski)
- ... plus misc fixes and smaller cleanups
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
x86/cpu/tme: Fix spelling: "configuation" -> "configuration"
x86/boot: Fix SEV boot failure from change to __PHYSICAL_MASK_SHIFT
x86/mm: Update comment in detect_tme() regarding x86_phys_bits
x86/mm/32: Remove unused node_memmap_size_bytes() & CONFIG_NEED_NODE_MEMMAP_SIZE logic
x86/mm: Remove pointless checks in vmalloc_fault
x86/platform/intel-mid: Add special handling for ACPI HW reduced platforms
ACPI, x86/boot: Introduce the ->reduced_hw_early_init() ACPI callback
ACPI, x86/boot: Split out acpi_generic_reduce_hw_init() and export
x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leaf
x86/pconfig: Detect PCONFIG targets
x86/tme: Detect if TME and MKTME is activated by BIOS
x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
x86/boot/compressed/64: Use page table in trampoline memory
x86/boot/compressed/64: Use stack from trampoline memory
x86/boot/compressed/64: Make sure we have a 32-bit code segment
x86/mm: Do not use paravirtualized calls in native_set_p4d()
kdump, vmcoreinfo: Export pgtable_l5_enabled value
x86/boot/compressed/64: Prepare new top-level page table for trampoline
x86/boot/compressed/64: Set up trampoline memory
x86/boot/compressed/64: Save and restore trampoline memory
...
Pull x86 build updates from Ingo Molnar:
"The biggest change is the forcing of asm-goto support on x86, which
effectively increases the GCC minimum supported version to gcc-4.5 (on
x86)"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Don't pass in -D__KERNEL__ multiple times
x86: Remove FAST_FEATURE_TESTS
x86: Force asm-goto
x86/build: Drop superfluous ALIGN from the linker script
Some .<target>.cmd files under arch/x86 are showing two instances of
-D__KERNEL__, like arch/x86/boot/ and arch/x86/realmode/rm/.
__KERNEL__ is already defined in KBUILD_CPPFLAGS in the top Makefile,
so it can be dropped safely.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kbuild@vger.kernel.org
Link: http://lkml.kernel.org/r/20180316084944.3997-1-caoj.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In arch/x86/boot/compressed/kaslr_64.c, CONFIG_AMD_MEM_ENCRYPT support was
initially #undef'd to support SME with minimal effort. When support for
SEV was added, the #undef remained and some minimal support for setting the
encryption bit was added for building identity mapped pagetable entries.
Commit b83ce5ee91 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
changed __PHYSICAL_MASK_SHIFT from 46 to 52 in support of 5-level paging.
This change resulted in SEV guests failing to boot because the encryption
bit was no longer being automatically masked out. The compressed boot
path now requires sme_me_mask to be defined in order for the pagetable
functions, such as pud_present(), to properly mask out the encryption bit
(currently bit 47) when evaluating pagetable entries.
Add an sme_me_mask variable in arch/x86/boot/compressed/mem_encrypt.S,
which is set when SEV is active, delete the #undef CONFIG_AMD_MEM_ENCRYPT
from arch/x86/boot/compressed/kaslr_64.c and use sme_me_mask when building
the identify mapped pagetable entries.
Fixes: b83ce5ee91 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180327220711.8702.55842.stgit@tlendack-t1.amdoffice.net
Pull x86 and PTI fixes from Ingo Molnar:
"Misc fixes:
- fix EFI pagetables freeing
- fix vsyscall pagetable setting on Xen PV guests
- remove ancient CONFIG_X86_PPRO_FENCE=y - x86 is TSO again
- fix two binutils (ld) development version related incompatibilities
- clean up breakpoint handling
- fix an x86 self-test"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/64: Don't use IST entry for #BP stack
x86/efi: Free efi_pgd with free_pages()
x86/vsyscall/64: Use proper accessor to update P4D entry
x86/cpu: Remove the CONFIG_X86_PPRO_FENCE=y quirk
x86/boot/64: Verify alignment of the LOAD segment
x86/build/64: Force the linker to use 2MB page size
selftests/x86/ptrace_syscall: Fix for yet more glibc interference
Since the x86-64 kernel must be aligned to 2MB, refuse to boot the
kernel if the alignment of the LOAD segment isn't a multiple of 2MB.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/CAMe9rOrR7xSJgUfiCoZLuqWUwymRxXPoGBW38%2BpN%3D9g%2ByKNhZw@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch addresses a shortcoming in current boot process on machines
that supports 5-level paging.
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. The switching requires the disabling
paging. It works fine if kernel itself is loaded below 4G.
But if the bootloader put the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.
This patch implements a trampoline in lower memory to handle this
situation.
We only need the memory for a very short time, until the main kernel
image sets up own page tables.
We go through the trampoline even if we don't have to: if we're already
in 5-level paging mode or if we don't need to switch to it. This way the
trampoline gets tested on every boot.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180312100246.89175-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. The switching requires the disabling
paging. It works fine if kernel itself is loaded below 4G.
But if the bootloader put the kernel above 4G (i.e. in kexec() case),
we would lose control as soon as paging is disabled, because the code
becomes unreachable to the CPU.
To handle the situation, we need a trampoline in lower memory that would
take care of switching on 5-level paging.
Apart from the trampoline code itself we also need a place to store
top-level page table in lower memory as we don't have a way to load
64-bit values into CR3 in 32-bit mode. We only really need 8 bytes there
as we only use the very first entry of the page table. But we allocate a
whole page anyway.
This patch switches 32-bit code to use page table in trampoline memory.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180312100246.89175-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As the first step on using trampoline memory, let's make 32-bit code use
stack there.
Separate stack is required to return back from trampoline and we cannot
user stack from 64-bit mode as it may be above 4G.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180312100246.89175-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When kernel starts in 64-bit mode we inherit the GDT from the bootloader.
It may cause a problem if the GDT doesn't have a 32-bit code segment
where we expect it to be.
Load our own GDT with known segments.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180312100246.89175-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that we unambiguously build the entire kernel with -fshort-wchar,
it is no longer necessary to open code efi_char16_t[] initializers as
arrays of characters, and we can move to the L"xxx" notation instead.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180312084500.10764-6-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If trampoline code would need to switch between 4- and 5-level paging
modes, we have to use a page table in trampoline memory.
Having it in trampoline memory guarantees that it's below 4G and we can
point CR3 to it from 32-bit trampoline code.
We only use the page table if the desired paging mode doesn't match the
mode we are in. Otherwise the page table is unused and trampoline code
wouldn't touch CR3.
For 4- to 5-level paging transition, we set up current (4-level paging)
CR3 as the first and the only entry in a new top-level page table.
For 5- to 4-level paging transition, copy page table pointed by first
entry in the current top-level page table as our new top-level page
table.
If the page table is used by trampoline we would need to copy it to new
page table outside trampoline and update CR3 before restoring trampoline
memory.
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180226180451.86788-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The memory area we found for trampoline shouldn't contain anything
useful. But let's preserve the data anyway. Just to be on safe side.
paging_prepare() would save the data into a buffer.
cleanup_trampoline() would restore it back once we are done with the
trampoline.
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180226180451.86788-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. The switching requires the disabling of
paging, which works fine if kernel itself is loaded below 4G.
But if the bootloader puts the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.
To handle the situation, we need a trampoline in lower memory that would
take care of switching on 5-level paging.
This patch finds a spot in low memory for a trampoline.
The heuristic is based on code in reserve_bios_regions().
We find the end of low memory based on BIOS and EBDA start addresses.
The trampoline is put just before end of low memory. It's mimic approach
taken to allocate memory for realtime trampoline.
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180226180451.86788-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Don't populate the const read-only array 'buf' on the stack but instead
make it static. Makes the object code smaller by 64 bytes:
Before:
text data bss dec hex filename
9264 1 16 9281 2441 arch/x86/boot/compressed/eboot.o
After:
text data bss dec hex filename
9200 1 16 9217 2401 arch/x86/boot/compressed/eboot.o
(GCC version 7.2.0 x86_64)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180308080020.22828-13-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Both x86/mm and x86/boot contain 5-level paging related patches,
unify them to have a single tree to work against.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On lkml suggestions were made to split up such trivial typo fixes into per subsystem
patches:
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
struct efi_uga_draw_protocol *uga = NULL, *first_uga;
efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
unsigned long nr_ugas;
- u32 *handles = (u32 *)uga_handle;;
+ u32 *handles = (u32 *)uga_handle;
efi_status_t status = EFI_INVALID_PARAMETER;
int i;
This patch is the result of the following script:
$ sed -i 's/;;$/;/g' $(git grep -E ';;$' | grep "\.[ch]:" | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)
... followed by manual review to make sure it's all good.
Splitting this up is just crazy talk, let's get over with this and just do it.
Reported-by: Pavel Machek <pavel@ucw.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
All pieces of the puzzle are in place and we can now allow to boot with
CONFIG_X86_5LEVEL=y on a machine without LA57 support.
Kernel will detect that LA57 is missing and fold p4d at runtime.
Update the documentation and the Kconfig option description to reflect the
change.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-10-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Switching between paging modes requires the folding of the p4d page table level
when we only have 4 paging levels, which means we need to adjust 'pgdir_shift'
and 'ptrs_per_p4d' during early boot according to paging mode.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'pgtable_l5_enabled' indicates which paging mode we are using. We need to
initialize it at boot-time according to machine capability.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For boot-time switching between 4- and 5-level paging we need to be able
to fold p4d page table level at runtime. It requires variable
PGDIR_SHIFT and PTRS_PER_P4D.
The change doesn't affect the kernel image size much:
text data bss dec hex filename
8628091 4734304 1368064 14730459 e0c4db vmlinux.before
8628393 4734340 1368064 14730797 e0c62d vmlinux.after
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The new flag would indicate what paging mode we are in.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rename l5_paging_required() to paging_prepare() and change the
interface of the function.
This is a preparation for the next patch, which would make the function
also allocate memory for the 32-bit trampoline.
The function now returns a 128-bit structure. RAX would return
trampoline memory address (zero for now) and RDX would indicate if we
need to enable 5-level paging.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[ Typo fixes and general clarification. ]
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180209142228.21231-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The name of the file -- pagetable.c -- is misleading: it only contains
helpers used for KASLR in 64-bit mode.
Let's rename the file to reflect its content.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180209142228.21231-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull tpm updates from James Morris:
- reduce polling delays in tpm_tis
- support retrieving TPM 2.0 Event Log through EFI before
ExitBootServices
- replace tpm-rng.c with a hwrng device managed by the driver for each
TPM device
- TPM resource manager synthesizes TPM_RC_COMMAND_CODE response instead
of returning -EINVAL for unknown TPM commands. This makes user space
more sound.
- CLKRUN fixes:
* Keep #CLKRUN disable through the entier TPM command/response flow
* Check whether #CLKRUN is enabled before disabling and enabling it
again because enabling it breaks PS/2 devices on a system where it
is disabled
* 'next-tpm' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
tpm: remove unused variables
tpm: remove unused data fields from I2C and OF device ID tables
tpm: only attempt to disable the LPC CLKRUN if is already enabled
tpm: follow coding style for variable declaration in tpm_tis_core_init()
tpm: delete the TPM_TIS_CLK_ENABLE flag
tpm: Update MAINTAINERS for Jason Gunthorpe
tpm: Keep CLKRUN enabled throughout the duration of transmit_cmd()
tpm_tis: Move ilb_base_addr to tpm_tis_data
tpm2-cmd: allow more attempts for selftest execution
tpm: return a TPM_RC_COMMAND_CODE response if command is not implemented
tpm: Move Linux RNG connection to hwrng
tpm: use struct tpm_chip for tpm_chip_find_get()
tpm: parse TPM event logs based on EFI table
efi: call get_event_log before ExitBootServices
tpm: add event log format version
tpm: rename event log provider files
tpm: move tpm_eventlog.h outside of drivers folder
tpm: use tpm_msleep() value as max delay
tpm: reduce tpm polling delay in tpm_tis_core
tpm: move wait_for_tpm_stat() to respective driver files
With TPM 2.0 specification, the event logs may only be accessible by
calling an EFI Boot Service. Modify the EFI stub to copy the log area to
a new Linux-specific EFI configuration table so it remains accessible
once booted.
When calling this service, it is possible to specify the expected format
of the logs: TPM 1.2 (SHA1) or TPM 2.0 ("Crypto Agile"). For now, only the
first format is retrieved.
Signed-off-by: Thiebaud Weksteen <tweek@google.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Pull x86 fixes from Thomas Gleixner:
"A couple of fixlets for x86:
- Fix the ESPFIX double fault handling for 5-level pagetables
- Fix the commandline parsing for 'apic=' on 32bit systems and update
documentation
- Make zombie stack traces reliable
- Fix kexec with stack canary
- Fix the delivery mode for APICs which was missed when the x86
vector management was converted to single target delivery. Caused a
regression due to the broken hardware which ignores affinity
settings in lowest prio delivery mode.
- Unbreak modules when AMD memory encryption is enabled
- Remove an unused parameter of prepare_switch_to"
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Switch all APICs to Fixed delivery mode
x86/apic: Update the 'apic=' description of setting APIC driver
x86/apic: Avoid wrong warning when parsing 'apic=' in X86-32 case
x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)
x86: Remove unused parameter of prepare_switch_to
x86/stacktrace: Make zombie stack traces reliable
x86/mm: Unbreak modules that use the DMA API
x86/build: Make isoimage work on Debian
x86/espfix/64: Fix espfix double-fault handling on 5-level systems
Pull x86 page table isolation updates from Thomas Gleixner:
"This is the final set of enabling page table isolation on x86:
- Infrastructure patches for handling the extra page tables.
- Patches which map the various bits and pieces which are required to
get in and out of user space into the user space visible page
tables.
- The required changes to have CR3 switching in the entry/exit code.
- Optimizations for the CR3 switching along with documentation how
the ASID/PCID mechanism works.
- Updates to dump pagetables to cover the user space page tables for
W+X scans and extra debugfs files to analyze both the kernel and
the user space visible page tables
The whole functionality is compile time controlled via a config switch
and can be turned on/off on the command line as well"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
x86/ldt: Make the LDT mapping RO
x86/mm/dump_pagetables: Allow dumping current pagetables
x86/mm/dump_pagetables: Check user space page table for WX pages
x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy
x86/mm/pti: Add Kconfig
x86/dumpstack: Indicate in Oops whether PTI is configured and enabled
x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming
x86/mm: Use INVPCID for __native_flush_tlb_single()
x86/mm: Optimize RESTORE_CR3
x86/mm: Use/Fix PCID to optimize user/kernel switches
x86/mm: Abstract switching CR3
x86/mm: Allow flushing for future ASID switches
x86/pti: Map the vsyscall page if needed
x86/pti: Put the LDT in its own PGD if PTI is on
x86/mm/64: Make a full PGD-entry size hole in the memory map
x86/events/intel/ds: Map debug buffers in cpu_entry_area
x86/cpu_entry_area: Add debugstore entries to cpu_entry_area
x86/mm/pti: Map ESPFIX into user space
x86/mm/pti: Share entry text PMD
x86/entry: Align entry text section to PMD boundary
...
Debian does not ship a 'mkisofs' symlink to genisoimage. All modern
distros ship genisoimage, so just use that directly. That requires
renaming the 'genisoimage' function. Also neaten up the 'for' loop
while I'm in here.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If mtools.conf is not generated before, 'make isoimage' could complain:
Kernel: arch/x86/boot/bzImage is ready (#597)
GENIMAGE arch/x86/boot/image.iso
*** Missing file: arch/x86/boot/mtools.conf
arch/x86/boot/Makefile:144: recipe for target 'isoimage' failed
mtools.conf is not used for isoimage generation, so do not check it.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 4366d57af1 ("x86/build: Factor out fdimage/isoimage generation commands to standalone script")
Link: http://lkml.kernel.org/r/1512053480-8083-1-git-send-email-changbin.du@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If the machine does not support the paging mode for which the kernel was
compiled, the boot process cannot continue.
It's not possible to let the kernel detect the mismatch as it does not even
reach the point where cpu features can be evaluted due to a triple fault in
the KASLR setup.
Instead of instantaneous silent reboot, emit an error message which gives
the user the information why the boot fails.
Fixes: 77ef56e4f0 ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y")
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-mm@kvack.org
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171204124059.63515-3-kirill.shutemov@linux.intel.com
Prerequisite for fixing the current problem of instantaneous reboots when a
5-level paging kernel is booted on 4-level paging hardware.
At the same time this change prepares the decompression code to boot-time
switching between 4- and 5-level paging.
[ tglx: Folded the GCC < 5 fix. ]
Fixes: 77ef56e4f0 ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-mm@kvack.org
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171204124059.63515-2-kirill.shutemov@linux.intel.com
Pull x86 cleanups from Ingo Molnar:
"Two changes: Propagate const/__initconst, and use ARRAY_SIZE() some
more"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/events/amd/iommu: Make iommu_pmu const and __initconst
x86: Use ARRAY_SIZE
This change suppresses the 'dd' output and adds the '-quiet' parameter
to mkisofs tool. It also removes the 'Using ...' messages, as none of the
messages matter to the user normally.
"make V=1" can still be used for a more verbose build.
The new build messages are now a streamlined set of:
$ make isoimage
...
Kernel: arch/x86/boot/bzImage is ready (#75)
GENIMAGE arch/x86/boot/image.iso
Kernel: arch/x86/boot/image.iso is ready
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1510207751-22166-1-git-send-email-changbin.du@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Early in the boot process, add checks to determine if the kernel is
running with Secure Encrypted Virtualization (SEV) active.
Checking for SEV requires checking that the kernel is running under a
hypervisor (CPUID 0x00000001, bit 31), that the SEV feature is available
(CPUID 0x8000001f, bit 1) and then checking a non-interceptable SEV MSR
(0xc0010131, bit 0).
This check is required so that during early compressed kernel booting the
pagetables (both the boot pagetables and KASLR pagetables (if enabled) are
updated to include the encryption mask so that when the kernel is
decompressed into encrypted memory, it can boot properly.
After the kernel is decompressed and continues booting the same logic is
used to check if SEV is active and set a flag indicating so. This allows
to distinguish between SME and SEV, each of which have unique differences
in how certain things are handled: e.g. DMA (always bounce buffered with
SEV) or EFI tables (always access decrypted with SME).
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: kvm@vger.kernel.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20171020143059.3291-13-brijesh.singh@amd.com
It avoids the following warning triggered by newer versions of mkisofs:
-input-charset not specified, using utf-8 (detected in locale settings)
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yamada.masahiro@socionext.com
Link: http://lkml.kernel.org/r/1509939179-7556-4-git-send-email-changbin.du@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Recently I failed to build isoimage target, because the path of isolinux.bin
changed to /usr/xxx/ISOLINUX/isolinux.bin, as well as ldlinux.c32 which
changed to /usr/xxx/syslinux/modules/bios/ldlinux.c32.
This patch improves the file search logic:
- Show a error message instead of silent fail.
- Add above new paths.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yamada.masahiro@socionext.com
Link: http://lkml.kernel.org/r/1509939179-7556-3-git-send-email-changbin.du@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The build messages for fdimage/isoimage generation are pretty unstructured,
just the raw shell command blocks are printed.
Emit shortened messages similar to existing kbuild messages, and move
the Makefile commands into a separate shell script - which is much
easier to handle.
This patch factors out the commands used for fdimage/isoimage generation
from arch/x86/boot/Makefile to a new script arch/x86/boot/genimage.sh.
Then it adds the new kbuild command 'genimage' which invokes the new script.
All fdimages/isoimage files are now generated by a call to 'genimage' with
different parameters.
Now 'make isoimage' becomes:
...
Kernel: arch/x86/boot/bzImage is ready (#30)
GENIMAGE arch/x86/boot/image.iso
Size of boot image is 4 sectors -> No emulation
15.37% done, estimate finish Sun Nov 5 23:36:57 2017
30.68% done, estimate finish Sun Nov 5 23:36:57 2017
46.04% done, estimate finish Sun Nov 5 23:36:57 2017
61.35% done, estimate finish Sun Nov 5 23:36:57 2017
76.69% done, estimate finish Sun Nov 5 23:36:57 2017
92.00% done, estimate finish Sun Nov 5 23:36:57 2017
Total translation table size: 2048
Total rockridge attributes bytes: 659
Total directory bytes: 0
Path table size(bytes): 10
Max brk space used 0
32608 extents written (63 MB)
Kernel: arch/x86/boot/image.iso is ready
Before:
Kernel: arch/x86/boot/bzImage is ready (#63)
rm -rf arch/x86/boot/isoimage
mkdir arch/x86/boot/isoimage
for i in lib lib64 share end ; do \
if [ -f /usr/$i/syslinux/isolinux.bin ] ; then \
cp /usr/$i/syslinux/isolinux.bin arch/x86/boot/isoimage ; \
if [ -f /usr/$i/syslinux/ldlinux.c32 ]; then \
cp /usr/$i/syslinux/ldlinux.c32 arch/x86/boot/isoimage ; \
fi ; \
break ; \
fi ; \
if [ $i = end ] ; then exit 1 ; fi ; \
done
...
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1509939179-7556-2-git-send-email-changbin.du@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kernel makes use of several GCC extensions, disable Clang warnings
about that in the boot code, as we already do for the rest of the kernel.
This suppresses the following warning when building with clang:
./include/linux/cgroup-defs.h:391:16: warning: field 'cgrp' with variable sized type 'struct cgroup' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct cgroup cgrp;
Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171030194351.122090-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Using the ARRAY_SIZE macro improves the readability of the code.
Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
(sizeof(E)@p /sizeof(*E))
|
(sizeof(E)@p /sizeof(E[...]))
|
(sizeof(E)@p /sizeof(T))
)
Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-video@atrey.karlin.mff.cuni.cz
Cc: Martin Mares <mj@ucw.cz>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/20171001193101.8898-13-jeremy.lefaure@lse.epita.fr
The <generated/utsrelease.h> defines UTS_RELEASE, but I do not
see any reference to it in arch/x86/boot/header.S.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1505921232-8960-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- Transparently fall back to other poweroff method(s) if EFI poweroff
fails (and returns)
- Use separate PE/COFF section headers for the RX and RW parts of the
ARM stub loader so that the firmware can use strict mapping
permissions
- Add support for requesting the firmware to wipe RAM at warm reboot
- Increase the size of the random seed obtained from UEFI so CRNG
fast init can complete earlier
- Update the EFI framebuffer address if it points to a BAR that gets
moved by the PCI resource allocation code
- Enable "reset attack mitigation" of TPM environments: this is
enabled if the kernel is configured with
CONFIG_RESET_ATTACK_MITIGATION=y.
- Clang related fixes
- Misc cleanups, constification, refactoring, etc"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/bgrt: Use efi_mem_type()
efi: Move efi_mem_type() to common code
efi/reboot: Make function pointer orig_pm_power_off static
efi/random: Increase size of firmware supplied randomness
efi/libstub: Enable reset attack mitigation
firmware/efi/esrt: Constify attribute_group structures
firmware/efi: Constify attribute_group structures
firmware/dcdbas: Constify attribute_group structures
arm/efi: Split zImage code and data into separate PE/COFF sections
arm/efi: Replace open coded constants with symbolic ones
arm/efi: Remove pointless dummy .reloc section
arm/efi: Remove forbidden values from the PE/COFF header
drivers/fbdev/efifb: Allow BAR to be moved instead of claiming it
efi/reboot: Fall back to original power-off method if EFI_RESET_SHUTDOWN returns
efi/arm/arm64: Add missing assignment of efi.config_table
efi/libstub/arm64: Set -fpie when building the EFI stub
efi/libstub/arm64: Force 'hidden' visibility for section markers
efi/libstub/arm64: Use hidden attribute for struct screen_info reference
efi/arm: Don't mark ACPI reclaim memory as MEMBLOCK_NOMAP
Pull x86 apic updates from Thomas Gleixner:
"This update provides:
- Cleanup of the IDT management including the removal of the extra
tracing IDT. A first step to cleanup the vector management code.
- The removal of the paravirt op adjust_exception_frame. This is a
XEN specific issue, but merged through this branch to avoid nasty
merge collisions
- Prevent dmesg spam about the TSC DEADLINE bug, when the CPU has
disabled the TSC DEADLINE timer in CPUID.
- Adjust a debug message in the ioapic code to print out the
information correctly"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
x86/idt: Fix the X86_TRAP_BP gate
x86/xen: Get rid of paravirt op adjust_exception_frame
x86/eisa: Add missing include
x86/idt: Remove superfluous ALIGNment
x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature
x86/idt: Remove the tracing IDT leftovers
x86/idt: Hide set_intr_gate()
x86/idt: Simplify alloc_intr_gate()
x86/idt: Deinline setup functions
x86/idt: Remove unused functions/inlines
x86/idt: Move interrupt gate initialization to IDT code
x86/idt: Move APIC gate initialization to tables
x86/idt: Move regular trap init to tables
x86/idt: Move IST stack based traps to table init
x86/idt: Move debug stack init to table based
x86/idt: Switch early trap init to IDT tables
x86/idt: Prepare for table based init
x86/idt: Move early IDT setup out of 32-bit asm
x86/idt: Move early IDT handler setup to IDT code
x86/idt: Consolidate IDT invalidation
...
Pull x86 mm changes from Ingo Molnar:
"PCID support, 5-level paging support, Secure Memory Encryption support
The main changes in this cycle are support for three new, complex
hardware features of x86 CPUs:
- Add 5-level paging support, which is a new hardware feature on
upcoming Intel CPUs allowing up to 128 PB of virtual address space
and 4 PB of physical RAM space - a 512-fold increase over the old
limits. (Supercomputers of the future forecasting hurricanes on an
ever warming planet can certainly make good use of more RAM.)
Many of the necessary changes went upstream in previous cycles,
v4.14 is the first kernel that can enable 5-level paging.
This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
default.
(By Kirill A. Shutemov)
- Add 'encrypted memory' support, which is a new hardware feature on
upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
RAM to be encrypted and decrypted (mostly) transparently by the
CPU, with a little help from the kernel to transition to/from
encrypted RAM. Such RAM should be more secure against various
attacks like RAM access via the memory bus and should make the
radio signature of memory bus traffic harder to intercept (and
decrypt) as well.
This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
by default.
(By Tom Lendacky)
- Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
hardware feature that attaches an address space tag to TLB entries
and thus allows to skip TLB flushing in many cases, even if we
switch mm's.
(By Andy Lutomirski)
All three of these features were in the works for a long time, and
it's coincidence of the three independent development paths that they
are all enabled in v4.14 at once"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
x86/mm: Use pr_cont() in dump_pagetable()
x86/mm: Fix SME encryption stack ptr handling
kvm/x86: Avoid clearing the C-bit in rsvd_bits()
x86/CPU: Align CR3 defines
x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
acpi, x86/mm: Remove encryption mask from ACPI page protection type
x86/mm, kexec: Fix memory corruption with SME on successive kexecs
x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
x86/mm: Allow userspace have mappings above 47-bit
x86/mm: Prepare to expose larger address space to userspace
x86/mpx: Do not allow MPX if we have mappings above 47-bit
x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
x86/mm/dump_pagetables: Fix printout of p4d level
x86/mm/dump_pagetables: Generalize address normalization
x86/boot: Fix memremap() related build failure
...
Pull x86 boot updates from Ingo Molnar:
"The main changes are KASL related fixes and cleanups: in particular we
now exclude certain physical memory ranges as KASLR randomization
targets that have proven to be unreliable (early-)RAM on some firmware
versions"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/KASLR: Work around firmware bugs by excluding EFI_BOOT_SERVICES_* and EFI_LOADER_* from KASLR's choice
x86/boot/KASLR: Prefer mirrored memory regions for the kernel physical address
efi: Introduce efi_early_memdesc_ptr to get pointer to memmap descriptor
x86/boot/KASLR: Rename process_e820_entry() into process_mem_region()
x86/boot/KASLR: Switch to pass struct mem_vector to process_e820_entry()
x86/boot/KASLR: Wrap e820 entries walking code into new function process_e820_entries()
Pull x86 asm updates from Ingo Molnar:
- Introduce the ORC unwinder, which can be enabled via
CONFIG_ORC_UNWINDER=y.
The ORC unwinder is a lightweight, Linux kernel specific debuginfo
implementation, which aims to be DWARF done right for unwinding.
Objtool is used to generate the ORC unwinder tables during build, so
the data format is flexible and kernel internal: there's no
dependency on debuginfo created by an external toolchain.
The ORC unwinder is almost two orders of magnitude faster than the
(out of tree) DWARF unwinder - which is important for perf call graph
profiling. It is also significantly simpler and is coded defensively:
there has not been a single ORC related kernel crash so far, even
with early versions. (knock on wood!)
But the main advantage is that enabling the ORC unwinder allows
CONFIG_FRAME_POINTERS to be turned off - which speeds up the kernel
measurably:
With frame pointers disabled, GCC does not have to add frame pointer
instrumentation code to every function in the kernel. The kernel's
.text size decreases by about 3.2%, resulting in better cache
utilization and fewer instructions executed, resulting in a broad
kernel-wide speedup. Average speedup of system calls should be
roughly in the 1-3% range - measurements by Mel Gorman [1] have shown
a speedup of 5-10% for some function execution intense workloads.
The main cost of the unwinder is that the unwinder data has to be
stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel
config - which is a modest cost on modern x86 systems.
Given how young the ORC unwinder code is it's not enabled by default
- but given the performance advantages the plan is to eventually make
it the default unwinder on x86.
See Documentation/x86/orc-unwinder.txt for more details.
- Remove lguest support: its intended role was that of a temporary
proof of concept for virtualization, plus its removal will enable the
reduction (removal) of the paravirt API as well, so Rusty agreed to
its removal. (Juergen Gross)
- Clean up and fix FSGS related functionality (Andy Lutomirski)
- Clean up IO access APIs (Andy Shevchenko)
- Enhance the symbol namespace (Jiri Slaby)
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
objtool: Handle GCC stack pointer adjustment bug
x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()
x86/fpu/math-emu: Add ENDPROC to functions
x86/boot/64: Extract efi_pe_entry() from startup_64()
x86/boot/32: Extract efi_pe_entry() from startup_32()
x86/lguest: Remove lguest support
x86/paravirt/xen: Remove xen_patch()
objtool: Fix objtool fallthrough detection with function padding
x86/xen/64: Fix the reported SS and CS in SYSCALL
objtool: Track DRAP separately from callee-saved registers
objtool: Fix validate_branch() return codes
x86: Clarify/fix no-op barriers for text_poke_bp()
x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
selftests/x86/fsgsbase: Test selectors 1, 2, and 3
x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
x86/asm: Fix UNWIND_HINT_REGS macro for older binutils
x86/asm/32: Fix regs_get_register() on segment registers
x86/xen/64: Rearrange the SYSCALL entries
x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads
...
There's a potential bug in how we select the KASLR kernel address n
the early boot code.
The KASLR boot code currently chooses the kernel image's physical memory
location from E820_TYPE_RAM regions by walking over all e820 entries.
E820_TYPE_RAM includes EFI_BOOT_SERVICES_CODE and EFI_BOOT_SERVICES_DATA
as well, so those regions can end up hosting the kernel image. According to
the UEFI spec, all memory regions marked as EfiBootServicesCode and
EfiBootServicesData are available as free memory after the first call
to ExitBootServices(). I.e. so such regions should be usable for the
kernel, per spec.
In real life however, we have workarounds for broken x86 firmware,
where we keep such regions reserved until SetVirtualAddressMap() is done.
See the following code in should_map_region():
static bool should_map_region(efi_memory_desc_t *md)
{
...
/*
* Map boot services regions as a workaround for buggy
* firmware that accesses them even when they shouldn't.
*
* See efi_{reserve,free}_boot_services().
*/
if (md->type =3D=3D EFI_BOOT_SERVICES_CODE ||
md->type =3D=3D EFI_BOOT_SERVICES_DATA)
return false;
This workaround suppressed a boot crash, but potential issues still
remain because no one prevents the regions from overlapping with kernel
image by KASLR.
So let's make sure that EFI_BOOT_SERVICES_{CODE|DATA} regions are never
chosen as kernel memory for the workaround to work fine.
Furthermore, EFI_LOADER_{CODE|DATA} regions are also excluded because
they can be used after ExitBootServices() as defined in EFI spec.
As a result, we choose kernel address only from EFI_CONVENTIONAL_MEMORY
which is the only memory type we know to be safely free.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Junichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: fanc.fnst@cn.fujitsu.com
Cc: izumi.taku@jp.fujitsu.com
Link: http://lkml.kernel.org/r/20170828074444.GC23181@hori1.linux.bs1.fc.nec.co.jp
[ Rewrote/fixed/clarified the changelog and the in code comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If a zero for the number of lines manages to slip through, scroll()
may underflow some offset calculations, causing accesses outside the
video memory.
Make the check in __putstr() more pessimistic to prevent that.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1503858223-14983-1-git-send-email-jschoenh@amazon.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current slack space is not enough for LZ4, which has a worst case
overhead of 0.4% for data that cannot be further compressed. With
an LZ4 compressed kernel with an embedded initrd, the output is likely
to overwrite the input.
Increase the slack space to avoid that.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1503842124-29718-1-git-send-email-jschoenh@amazon.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Similarly to the 32-bit code, efi_pe_entry body() is somehow squashed into
startup_64().
In the old days, we forced startup_64() to start at offset 0x200 and efi_pe_entry()
to start at 0x210. But this requirement was removed long time ago, in:
99f857db88 ("x86, build: Dynamically find entry points in compressed startup code")
The way it is now makes the code less readable and illogical. Given
we can now safely extract the inlined efi_pe_entry() body from
startup_64() into a separate function, we do so.
We also annotate the function appropriatelly by ENTRY+ENDPROC.
ABI offsets are preserved:
0000000000000000 T startup_32
0000000000000200 T startup_64
0000000000000390 T efi64_stub_entry
On the top-level, it looked like:
.org 0x200
ENTRY(startup_64)
#ifdef CONFIG_EFI_STUB ; start of inlined
jmp preferred_addr
GLOBAL(efi_pe_entry)
... ; a lot of assembly (efi_pe_entry)
leaq preferred_addr(%rax), %rax
jmp *%rax
preferred_addr:
#endif ; end of inlined
... ; a lot of assembly (startup_64)
ENDPROC(startup_64)
And it is now converted into:
.org 0x200
ENTRY(startup_64)
... ; a lot of assembly (startup_64)
ENDPROC(startup_64)
#ifdef CONFIG_EFI_STUB
ENTRY(efi_pe_entry)
... ; a lot of assembly (efi_pe_entry)
leaq startup_64(%rax), %rax
jmp *%rax
ENDPROC(efi_pe_entry)
#endif
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ard.biesheuvel@linaro.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170824073327.4129-2-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The efi_pe_entry() body is somehow squashed into startup_32(). In the old days,
we forced startup_32() to start at offset 0x00 and efi_pe_entry() to start
at 0x10.
But this requirement was removed long time ago, in:
99f857db88 ("x86, build: Dynamically find entry points in compressed startup code")
The way it is now makes the code less readable and illogical. Given
we can now safely extract the inlined efi_pe_entry() body from
startup_32() into a separate function, we do so and we separate it to two
functions as they are marked already: efi_pe_entry() + efi32_stub_entry().
We also annotate the functions appropriatelly by ENTRY+ENDPROC.
ABI offset is preserved:
0000 128 FUNC GLOBAL DEFAULT 6 startup_32
0080 60 FUNC GLOBAL DEFAULT 6 efi_pe_entry
00bc 68 FUNC GLOBAL DEFAULT 6 efi32_stub_entry
On the top-level, it looked like this:
ENTRY(startup_32)
#ifdef CONFIG_EFI_STUB ; start of inlined
jmp preferred_addr
ENTRY(efi_pe_entry)
... ; a lot of assembly (efi_pe_entry)
ENTRY(efi32_stub_entry)
... ; a lot of assembly (efi32_stub_entry)
leal preferred_addr(%eax), %eax
jmp *%eax
preferred_addr:
#endif ; end of inlined
... ; a lot of assembly (startup_32)
ENDPROC(startup_32)
And it is now converted into:
ENTRY(startup_32)
... ; a lot of assembly (startup_32)
ENDPROC(startup_32)
#ifdef CONFIG_EFI_STUB
ENTRY(efi_pe_entry)
... ; a lot of assembly (efi_pe_entry)
ENDPROC(efi_pe_entry)
ENTRY(efi32_stub_entry)
... ; a lot of assembly (efi32_stub_entry)
leal startup_32(%eax), %eax
jmp *%eax
ENDPROC(efi32_stub_entry)
#endif
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ard.biesheuvel@linaro.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170824073327.4129-1-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The first 32 bits of gate struct are the same for 32 and 64 bit kernels.
The 32-bit version uses desc_struct and no designated data structure,
so we need different accessors for 32 and 64 bit kernels.
Aside of that the macros which are necessary to build the 32-bit
gate descriptor are horrible to read.
Unify the gate structs and switch all code fiddling with it over.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.861974317@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If a machine is reset while secrets are present in RAM, it may be
possible for code executed after the reboot to extract those secrets
from untouched memory. The Trusted Computing Group specified a mechanism
for requesting that the firmware clear all RAM on reset before booting
another OS. This is done by setting the MemoryOverwriteRequestControl
variable at startup. If userspace can ensure that all secrets are
removed as part of a controlled shutdown, it can reset this variable to
0 before triggering a hardware reboot.
Signed-off-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170825155019.6740-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently KASLR will parse all e820 entries of RAM type and add all
candidate positions into the slots array. After that we choose one slot
randomly as the new position which the kernel will be decompressed into
and run at.
On systems with EFI enabled, e820 memory regions are coming from EFI
memory regions by combining adjacent regions.
These EFI memory regions have various attributes, and the "mirrored"
attribute is one of them. The physical memory region whose descriptors
in EFI memory map has EFI_MEMORY_MORE_RELIABLE attribute (bit: 16) are
mirrored. The address range mirroring feature of the kernel arranges such
mirrored regions into normal zones and other regions into movable zones.
With the mirroring feature enabled, the code and data of the kernel can only
be located in the more reliable mirrored regions. However, the current KASLR
code doesn't check EFI memory entries, and could choose a new kernel position
in non-mirrored regions. This will break the intended functionality of the
address range mirroring feature.
To fix this, if EFI is detected, iterate EFI memory map and pick the mirrored
region to process for adding candidate of randomization slot. If EFI is disabled
or no mirrored region found, still process the e820 memory map.
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ard.biesheuvel@linaro.org
Cc: fanc.fnst@cn.fujitsu.com
Cc: izumi.taku@jp.fujitsu.com
Cc: keescook@chromium.org
Cc: linux-efi@vger.kernel.org
Cc: matt@codeblueprint.co.uk
Cc: n-horiguchi@ah.jp.nec.com
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/1502722464-20614-3-git-send-email-bhe@redhat.com
[ Rewrote most of the text. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The existing map iteration helper for_each_efi_memory_desc_in_map can
only be used after the kernel initializes the EFI subsystem to set up
struct efi_memory_map.
Before that we also need iterate map descriptors which are stored in several
intermediate structures, like struct efi_boot_memmap for arch independent
usage and struct efi_info for x86 arch only.
Introduce efi_early_memdesc_ptr() to get pointer to a map descriptor, and
replace several places where that primitive is open coded.
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Various improvements to the text. ]
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ard.biesheuvel@linaro.org
Cc: fanc.fnst@cn.fujitsu.com
Cc: izumi.taku@jp.fujitsu.com
Cc: keescook@chromium.org
Cc: linux-efi@vger.kernel.org
Cc: n-horiguchi@ah.jp.nec.com
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/20170816134651.GF21273@x1
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The clang warning 'address-of-packed-member' is disabled for the general
kernel code, also disable it for the x86 boot code.
This suppresses a bunch of warnings like this when building with clang:
./arch/x86/include/asm/processor.h:535:30: warning: taking address of
packed member 'sp0' of class or structure 'x86_hw_tss' may result in an
unaligned pointer value [-Waddress-of-packed-member]
return this_cpu_read_stable(cpu_tss.x86_tss.sp0);
^~~~~~~~~~~~~~~~~~~
./arch/x86/include/asm/percpu.h:391:59: note: expanded from macro
'this_cpu_read_stable'
#define this_cpu_read_stable(var) percpu_stable_op("mov", var)
^~~
./arch/x86/include/asm/percpu.h:228:16: note: expanded from macro
'percpu_stable_op'
: "p" (&(var)));
^~~
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170725215053.135586-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
undef memcpy() and friends in boot/string.c so that the functions
defined here will have the correct names, otherwise we end up
up trying to redefine __builtin_memcpy() etc.
Surprisingly, GCC allows this (and, helpfully, discards the
__builtin_ prefix from the function name when compiling it),
but clang does not.
Adding these #undef's appears to preserve what I assume was
the original intent of the code.
Signed-off-by: Michael Davidson <md@google.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bernhard.Rosenkranzer@linaro.org
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170724235155.79255-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Every kernel build on x86 will result in some output:
Setup is 13084 bytes (padded to 13312 bytes).
System is 4833 kB
CRC 6d35fa35
Kernel: arch/x86/boot/bzImage is ready (#2)
This shuts it up, so that 'make -s' is truely silent as long as
everything works. Building without '-s' should produce unchanged
output.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170719125310.2487451-6-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Changes to the existing page table macros will allow the SME support to
be enabled in a simple fashion with minimal changes to files that use these
macros. Since the memory encryption mask will now be part of the regular
pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
_KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
without the encryption mask before SME becomes active. Two new pgprot()
macros are defined to allow setting or clearing the page encryption mask.
The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO. SME does
not support encryption for MMIO areas so this define removes the encryption
mask from the page attribute.
Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow
creating a physical address with the encryption mask. These are used when
working with the cr3 register so that the PGD can be encrypted. The current
__va() macro is updated so that the virtual address is generated based off
of the physical address without the encryption mask thus allowing the same
virtual address to be generated regardless of whether encryption is enabled
for that physical location or not.
Also, an early initialization function is added for SME. If SME is active,
this function:
- Updates the early_pmd_flags so that early page faults create mappings
with the encryption mask.
- Updates the __supported_pte_mask to include the encryption mask.
- Updates the protection_map entries to include the encryption mask so
that user-space allocations will automatically have the encryption mask
applied.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/b36e952c4c39767ae7f0a41cf5345adf27438480.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The original function process_e820_entry() only takes care of each
e820 entry passed.
And move the E820_TYPE_RAM checking logic into process_e820_entries().
And remove the redundent local variable 'addr' definition in
find_random_phys_addr().
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: fanc.fnst@cn.fujitsu.com
Cc: izumi.taku@jp.fujitsu.com
Cc: matt@codeblueprint.co.uk
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/1499603862-11516-2-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This adds support for compiling with a rough equivalent to the glibc
_FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
overflow checks for string.h functions when the compiler determines the
size of the source or destination buffer at compile-time. Unlike glibc,
it covers buffer reads in addition to writes.
GNU C __builtin_*_chk intrinsics are avoided because they would force a
much more complex implementation. They aren't designed to detect read
overflows and offer no real benefit when using an implementation based
on inline checks. Inline checks don't add up to much code size and
allow full use of the regular string intrinsics while avoiding the need
for a bunch of _chk functions and per-arch assembly to avoid wrapper
overhead.
This detects various overflows at compile-time in various drivers and
some non-x86 core kernel code. There will likely be issues caught in
regular use at runtime too.
Future improvements left out of initial implementation for simplicity,
as it's all quite optional and can be done incrementally:
* Some of the fortified string functions (strncpy, strcat), don't yet
place a limit on reads from the source based on __builtin_object_size of
the source buffer.
* Extending coverage to more string functions like strlcat.
* It should be possible to optionally use __builtin_object_size(x, 1) for
some functions (C strings) to detect intra-object overflows (like
glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
approach to avoid likely compatibility issues.
* The compile-time checks should be made available via a separate config
option which can be enabled by default (or always enabled) once enough
time has passed to get the issues it catches fixed.
Kees said:
"This is great to have. While it was out-of-tree code, it would have
blocked at least CVE-2016-3858 from being exploitable (improper size
argument to strlcpy()). I've sent a number of fixes for
out-of-bounds-reads that this detected upstream already"
[arnd@arndb.de: x86: fix fortified memcpy]
Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de
[keescook@chromium.org: avoid panic() in favor of BUG()]
Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast
[keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help]
Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com
Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.org
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 mm updates from Ingo Molnar:
"The main changes in this cycle were:
- Continued work to add support for 5-level paging provided by future
Intel CPUs. In particular we switch the x86 GUP code to the generic
implementation. (Kirill A. Shutemov)
- Continued work to add PCID CPU support to native kernels as well.
In this round most of the focus is on reworking/refreshing the TLB
flush infrastructure for the upcoming PCID changes. (Andy
Lutomirski)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
x86/mm: Delete a big outdated comment about TLB flushing
x86/mm: Don't reenter flush_tlb_func_common()
x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging
x86/ftrace: Exclude functions in head64.c from function-tracing
x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap
x86/mm: Remove reset_lazy_tlbstate()
x86/ldt: Simplify the LDT switching logic
x86/boot/64: Put __startup_64() into .head.text
x86/mm: Add support for 5-level paging for KASLR
x86/mm: Make kernel_physical_mapping_init() support 5-level paging
x86/mm: Add sync_global_pgds() for configuration with 5-level paging
x86/boot/64: Add support of additional page table level during early boot
x86/boot/64: Rename init_level4_pgt and early_level4_pgt
x86/boot/64: Rewrite startup_64() in C
x86/boot/compressed: Enable 5-level paging during decompression stage
x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations
x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations
x86/boot/efi: Cleanup initialization of GDT entries
x86/asm: Fix comment in return_from_SYSCALL_64()
x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
...
Pull x86 cleanups from Ingo Molnar:
"Two small cleanups"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/paravirt: Remove unnecessary return from void function
x86/boot: Add missing strchr() declaration
Pull x86 boot updates from Ingo Molnar:
"The main changes in this cycle were KASLR improvements for rare
environments with special boot options, by Baoquan He. Also misc
smaller changes/cleanups"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/debug: Extend the lower bound of crash kernel low reservations
x86/boot: Remove unused copy_*_gs() functions
x86/KASLR: Use the right memcpy() implementation
Documentation/kernel-parameters.txt: Update 'memmap=' boot option description
x86/KASLR: Handle the memory limit specified by the 'memmap=' and 'mem=' boot options
x86/KASLR: Parse all 'memmap=' boot option entries
KASLR uses hack to detect whether we booted via startup_32() or
startup_64(): it checks what is loaded into cr3 and compares it to
_pgtables. _pgtables is the array of page tables where early code
allocates page table from.
KASLR expects cr3 to point to _pgtables if we booted via startup_32(), but
that's not true if we booted with 5-level paging enabled. In this case top
level page table is allocated separately and only the first p4d page table
is allocated from the array.
Let's modify the check to cover both 4- and 5-level paging cases.
The patch also renames 'level4p' to 'top_level_pgt' as it now can hold
page table for 4th or 5th level, depending on configuration.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170628121730.43079-1-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kernel text KASLR is separated into physical address and virtual
address randomization. And for virtual address randomization, we
only randomiza to get an offset between 16M and KERNEL_IMAGE_SIZE.
So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR,
but not the original kernel loading address 'output'.
The bug will cause kernel boot failure if kernel is loaded at a different
position than the address, 16M, which is decided at compiled time.
Kexec/kdump is such practical case.
To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as initial
value.
Tested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8391c73 ("x86/KASLR: Randomize virtual address separately")
Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For kernel text KASLR, the virtual address is confined to area of 1G,
[0xffffffff80000000, 0xffffffffc0000000). For the implemenataion of
virtual address randomization, we only randomize to get an offset
between 16M and 1G, then add this offset to the starting address,
0xffffffff80000000. Here 16M is the offset which is decided at linking
stage. So the amount of the local variable 'virt_addr' which respresents
the offset plus the kernel output size can not exceed KERNEL_IMAGE_SIZE.
Add a debug check for the offset. If out of bounds, print error
message and hang there.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1498567146-11990-2-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The Sparse static analyzer emits this warning:
symbol 'strchr' was not declared. Should it be static?
This patch adds the appropriate extern declaration to string.h
to fix the warning.
Signed-off-by: Tommy Nguyen <remyabel@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170623143601.GA20743@NoChina
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We need to cover two basic cases: when bootloader left us in 32-bit mode
and when bootloader enabled long mode.
The patch implements unified codepath to enabled 5-level paging for both
cases. It means case when we start in 32-bit mode, we first enable long
mode with 4-level and then switch over to 5-level paging.
Switching from 4-level to 5-level paging is not trivial. We cannot do it
directly. Setting LA57 in long mode would trigger #GP. So we need to
switch off long mode first and the then re-enable with 5-level paging.
NOTE: The need of switching off long mode means we are in trouble if
bootloader put us above 4G boundary. If bootloader wants to boot 5-level
paging kernel, it has to put kernel below 4G or enable 5-level paging on
it's own, so we could avoid the step.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We would need to switch temporarily to compatibility mode during booting
with 5-level paging enabled. It would require 32-bit code segment
descriptor.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is preparation for following patches without changing semantics of the
code.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The kernel has several code paths that read CR3. Most of them assume that
CR3 contains the PGD's physical address, whereas some of them awkwardly
use PHYSICAL_PAGE_MASK to mask off low bits.
Add explicit mask macros for CR3 and convert all of the CR3 readers.
This will keep them from breaking when PCID is enabled.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
copy_from_gs() and copy_to_gs() are unused in the boot code. They have
actually never been used -- they were always commented out since their
addition in 2007:
5be8656615 ("String-handling functions for the new x86 setup code.")
So remove them -- they can be restored from history if needed.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170531081243.5709-1-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The decompressor has its own implementation of the string functions,
but has to include the right header to get those, while implicitly
including linux/string.h may result in a link error:
arch/x86/boot/compressed/kaslr.o: In function `choose_random_location':
kaslr.c:(.text+0xf51): undefined reference to `_mmx_memcpy'
This has appeared now as KASLR started using memcpy(), via:
d52e7d5a95 ("x86/KASLR: Parse all 'memmap=' boot option entries")
Other files in the decompressor already do the same thing.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170530091446.1000183-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 'mem=' boot option limits the max address a system can use - any memory
region above the limit will be removed.
Furthermore, the 'memmap=nn[KMG]' variant (with no offset specified) has the same
behaviour as 'mem='.
KASLR needs to consider this when choosing the random position for
decompressing the kernel. Do it.
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Cc: douly.fnst@cn.fujitsu.com
Cc: dyoung@redhat.com
Link: http://lkml.kernel.org/r/1494654390-23861-3-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In commit:
f28442497b ("x86/boot: Fix KASLR and memmap= collision")
... the memmap= option is parsed so that KASLR can avoid those reserved
regions. It uses cmdline_find_option() to get the value if memmap=
is specified, however the problem is that cmdline_find_option() can only
find the last entry if multiple memmap entries are provided. This
is not correct.
Address this by checking each command line token for a "memmap=" match
and parse each instance instead of using cmdline_find_option().
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Cc: douly.fnst@cn.fujitsu.com
Cc: dyoung@redhat.com
Cc: m.mizuma@jp.fujitsu.com
Link: http://lkml.kernel.org/r/1494654390-23861-2-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The boot code Makefile contains a straight 'readelf' invocation. This
causes build warnings in cross compile environments, when there is no
unprefixed readelf accessible via $PATH.
Add the missing $(CROSS_COMPILE) prefix.
[ tglx: Rewrote changelog ]
Fixes: 98f7852537 ("x86/boot: Refuse to build with data relocations")
Signed-off-by: Rob Landley <rob@landley.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: "H.J. Lu" <hjl.tools@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/ced18878-693a-9576-a024-113ef39a22c0@landley.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Kernel identity mappings on x86-64 kernels are created in two
ways: by the early x86 boot code, or by kernel_ident_mapping_init().
Native kernels (which is the dominant usecase) use the former,
but the kexec and the hibernation code uses kernel_ident_mapping_init().
There's a subtle difference between these two ways of how identity
mappings are created, the current kernel_ident_mapping_init() code
creates identity mappings always using 2MB page(PMD level) - while
the native kernel boot path also utilizes gbpages where available.
This difference is suboptimal both for performance and for memory
usage: kernel_ident_mapping_init() needs to allocate pages for the
page tables when creating the new identity mappings.
This patch adds 1GB page(PUD level) support to kernel_ident_mapping_init()
to address these concerns.
The primary advantage would be better TLB coverage/performance,
because we'd utilize 1GB TLBs instead of 2MB ones.
It is also useful for machines with large number of memory to
save paging structure allocations(around 4MB/TB using 2MB page)
when setting identity mappings for all the memory, after using
1GB page it will consume only 8KB/TB.
( Note that this change alone does not activate gbpages in kexec,
we are doing that in a separate patch. )
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: akpm@linux-foundation.org
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The compressed boot function error() is used to halt execution, but it
wasn't marked with "noreturn". This fixes that in preparation for
supporting kernel FORTIFY_SOURCE, which uses the noreturn annotation
on panic, and calls error(). GCC would warn about a noreturn function
calling a non-noreturn function:
arch/x86/boot/compressed/misc.c: In function ‘fortify_panic’:
arch/x86/boot/compressed/misc.c:416:1: warning: ‘noreturn’ function does return
}
^
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/20170506045116.GA2879@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 mm updates from Ingo Molnar:
"The main x86 MM changes in this cycle were:
- continued native kernel PCID support preparation patches to the TLB
flushing code (Andy Lutomirski)
- various fixes related to 32-bit compat syscall returning address
over 4Gb in applications, launched from 64-bit binaries - motivated
by C/R frameworks such as Virtuozzo. (Dmitry Safonov)
- continued Intel 5-level paging enablement: in particular the
conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov)
- x86/mpx ABI corner case fixes/enhancements (Joerg Roedel)
- ... plus misc updates, fixes and cleanups"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash
x86/mm: Fix flush_tlb_page() on Xen
x86/mm: Make flush_tlb_mm_range() more predictable
x86/mm: Remove flush_tlb() and flush_tlb_current_task()
x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
x86/mm/64: Fix crash in remove_pagetable()
Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation"
x86/boot/e820: Remove a redundant self assignment
x86/mm: Fix dump pagetables for 4 levels of page tables
x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow
x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()"
x86/espfix: Add support for 5-level paging
x86/kasan: Extend KASAN to support 5-level paging
x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y
x86/paravirt: Add 5-level support to the paravirt code
x86/mm: Define virtual memory map for 5-level paging
x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert
x86/boot: Detect 5-level paging support
x86/mm/numa: Remove numa_nodemask_from_meminfo()
...
Dave found that a kdump kernel with KASLR enabled will reset to the BIOS
immediately if physical randomization failed to find a new position for
the kernel. A kernel with the 'nokaslr' option works in this case.
The reason is that KASLR will install a new page table for the identity
mapping, while it missed building it for the original kernel location
if KASLR physical randomization fails.
This only happens in the kexec/kdump kernel, because the identity mapping
has been built for kexec/kdump in the 1st kernel for the whole memory by
calling init_pgtable(). Here if physical randomizaiton fails, it won't build
the identity mapping for the original area of the kernel but change to a
new page table '_pgtable'. Then the kernel will triple fault immediately
caused by no identity mappings.
The normal kernel won't see this bug, because it comes here via startup_32()
and CR3 will be set to _pgtable already. In startup_32() the identity
mapping is built for the 0~4G area. In KASLR we just append to the existing
area instead of entirely overwriting it for on-demand identity mapping
building. So the identity mapping for the original area of kernel is still
there.
To fix it we just switch to the new identity mapping page table when physical
KASLR succeeds. Otherwise we keep the old page table unchanged just like
"nokaslr" does.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1493278940-5885-1-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The minimum size for a new stack (512 bytes) setup for arch/x86/boot components
when the bootloader does not setup/provide a stack for the early boot components
is not "enough".
The setup code executing as part of early kernel startup code, uses the stack
beyond 512 bytes and accidentally overwrites and corrupts part of the BSS
section. This is exposed mostly in the early video setup code, where
it was corrupting BSS variables like force_x, force_y, which in-turn affected
kernel parameters such as screen_info (screen_info.orig_video_cols) and
later caused an exception/panic in console_init().
Most recent boot loaders setup the stack for early boot components, so this
stack overwriting into BSS section issue has not been exposed.
Signed-off-by: Ashish Kalra <ashish@bluestacks.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170419152015.10011-1-ashishkalra@Ashishs-MacBook-Pro.local
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's a conflict between ongoing level-5 paging support and
the E820 rewrite. Since the E820 rewrite is essentially ready,
merge it into x86/mm to reduce tree conflicts.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The E820 rework in WIP.x86/boot has gone through a couple of weeks
of exposure in -tip, merge it in a wider fashion.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In this initial implementation we force-require 5-level paging support
from the hardware, when compiled with CONFIG_X86_5LEVEL=y. (The kernel
will panic during boot on CPUs that don't support 5-level paging.)
We will implement boot-time switch between 4- and 5-level paging later.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Sparse complains about missing forward declarations:
arch/x86/boot/compressed/error.c:8:6:
warning: symbol 'warn' was not declared. Should it be static?
arch/x86/boot/compressed/error.c:15:6:
warning: symbol 'error' was not declared. Should it be static?
Include the missing header file.
Signed-off-by: Zhengyi Shen <shenzhengyi@gmail.com>
Acked-by: Kess Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1490770820-24472-1-git-send-email-shenzhengyi@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Include declarations for various symbols defined in the error.h header file
to fix the following Sparse warnings:
arch/x86/boot/compressed/error.c:8:6:
warning: symbol 'warn' was not declared. Should it be static?
arch/x86/boot/compressed/error.c:15:6:
warning: symbol 'error' was not declared. Should it be static?
Signed-off-by: Zhengyi Shen <shenzhengyi@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1490770820-24472-1-git-send-email-shenzhengyi@gmail.com
[ Fixed/enhanced the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 boot updates from Ingo Molnar:
"Misc updates:
- fix e820 error handling
- convert page table setup code from assembly to C
- fix kexec environment bug
- ... plus small cleanups"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kconfig: Remove misleading note regarding hibernation and KASLR
x86/boot: Fix KASLR and memmap= collision
x86/e820/32: Fix e820_search_gap() error handling on x86-32
x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C
x86/e820: Make e820_search_gap() static and remove unused variables
Get the firmware's secure-boot status in the kernel boot wrapper and stash
it somewhere that the main kernel image can find.
The efi_get_secureboot() function is extracted from the ARM stub and (a)
generalised so that it can be called from x86 and (b) made to use
efi_call_runtime() so that it can be run in mixed-mode.
For x86, it is stored in boot_params and can be overridden by the boot
loader or kexec. This allows secure-boot mode to be passed on to a new
kernel.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1486380166-31868-5-git-send-email-ard.biesheuvel@linaro.org
[ Small readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Provide the ability to perform mixed-mode runtime service calls for x86 in
the same way the following commit provided the ability to invoke for boot
services:
0a637ee612 ("x86/efi: Allow invocation of arbitrary boot services")
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1486380166-31868-2-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eliminate the separate 32-bit and 64x- bit code paths by way of the shiny
new efi_call_proto() macro.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1485868902-20401-3-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's one ARM, one x86_32 and one x86_64 version which can be folded
into a single shared version by masking their differences with the shiny
new efi_call_proto() macro.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1485868902-20401-2-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus pointed out that relying on the compiler to pack structures with
enums is fragile not just for the kernel, but for external tooling as
well which might rely on our UAPI headers.
So separate the two from each other: introduce 'struct boot_e820_entry',
which is the boot protocol entry format.
This actually simplifies the code, as e820__update_table() is now never
called directly with boot protocol table entries - we can rely on
append_e820_table() and do a e820__update_table() call afterwards.
( This will allow further simplifications of __e820__update_table(),
but that will be done in a separate patch. )
This change also has the side effect of not modifying the bootparams structure
anymore - which might be useful for debugging. In theory we could even constify
the boot_params structure - at least from the E820 code's point of view.
Remove the uapi/asm/e820/types.h file, as it's not used anymore - all
kernel side E820 types are defined in asm/e820/types.h.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So there's a number of constants that start with "E820" but which
are not types - these create a confusing mixture when seen together
with 'enum e820_type' values:
E820MAP
E820NR
E820_X_MAX
E820MAX
To better differentiate the 'enum e820_type' values prefix them
with E820_TYPE_.
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In line with the rename to 'struct e820_array', harmonize the naming of common e820
table variable names as well:
e820 => e820_array
e820_saved => e820_array_saved
e820_map => e820_array
initial_e820 => e820_array_init
This makes the variable names more consistent and easier to grep for.
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 'e820entry' and 'e820map' names have various annoyances:
- the missing underscore departs from the usual kernel style
and makes the code look weird,
- in the past I kept confusing the 'map' with the 'entry', because
a 'map' is ambiguous in that regard,
- it's not really clear from the 'e820map' that this is a regular
C array.
Rename them to 'struct e820_entry' and 'struct e820_array' accordingly.
( Leave the legacy UAPI header alone but do the rename in the bootparam.h
and e820/types.h file - outside tools relying on these defines should
either adjust their code, or should use the legacy header, or should
create their private copies for the definitions. )
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's an assembly guard in asm/e820/types.h, and only
a single .S file includes this header: arch/x86/boot/header.S,
but it does not actually make use of any of the E820 defines.
Remove the inclusion and remove the assembly guard as well.
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A commonly used lowlevel x86 header, asm/pgtable.h, includes asm/e820/api.h
spuriously, without making direct use of it.
Removing it is not simple: over the years various .c code learned to rely
on this indirect inclusion.
Remove the unnecessary include - this should speed up the kernel build a bit,
as a large header is not included anymore in totally unrelated code.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In line with asm/e820/types.h, move the e820 API declarations to
asm/e820/api.h and update all usage sites.
This is just a mechanical, obviously correct move & replace patch,
there will be subsequent changes to clean up the code and to make
better use of the new header organization.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
CONFIG_RANDOMIZE_BASE=y relocates the kernel to a random base address.
However it does not take into account the memmap= parameter passed in from
the kernel command line. This results in the kernel sometimes being put in
the middle of memmap.
Teach KASLR to not insert the kernel in memmap defined regions. We support
up to 4 memmap regions: any additional regions will cause KASLR to disable.
The mem_avoid set has been augmented to add up to 4 unusable regions of
memmaps provided by the user to exclude those regions from the set of valid
address range to insert the uncompressed kernel image.
The nn@ss ranges will be skipped by the mem_avoid set since it indicates
that memory is useable.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: dan.j.williams@intel.com
Cc: david@fromorbit.com
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/148417664156.131935.2248592164852799738.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add the missing declarations of basic string functions to string.h to allow
a clean build.
Fixes: 5be8656615 ("String-handling functions for the new x86 setup code.")
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Link: http://lkml.kernel.org/r/1483781911-21399-1-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This reverts commit ed68d7e9b9.
The patch wasn't quite correct -- there are non-Intel (and hence
non-486) CPUs that we support that don't have CPUID. Since we no
longer require CPUID for sync_core(), just revert the patch.
I think the relevant CPUs are Geode and Elan, but I'm not sure.
In principle, we should try to do better at identifying CPUID-less
CPUs in early boot, but that's more complicated.
Reported-by: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: xen-devel <Xen-devel@lists.xen.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/82acde18a108b8e353180dd6febcc2876df33f24.1481307769.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit 4fd06960f1 ("Use the new x86 setup code for i386") introduced a
reference to the make variable LINUX_INCLUDE. That reference got moved
around a bit and copied twice and now there are three references to it.
There has never been a definition of that variable. (Presumably that is
because it started out as a mistyped reference to LINUXINCLUDE.) So this
reference has always been an empty string. Let's remove it before it
spreads any further.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull x86 boot updates from Ingo Molnar:
"Misc cleanups/simplifications by Borislav Petkov, Paul Bolle and Wei
Yang"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/64: Optimize fixmap page fixup
x86/boot: Simplify the GDTR calculation assembly code a bit
x86/boot/build: Remove always empty $(USERINCLUDE)
Pull EFI updates from Ingo Molnar:
"The main changes in this development cycle were:
- Implement EFI dev path parser and other changes to fully support
thunderbolt devices on Apple Macbooks (Lukas Wunner)
- Add RNG seeding via the EFI stub, on ARM/arm64 (Ard Biesheuvel)
- Expose EFI framebuffer configuration to user-space, to improve
tooling (Peter Jones)
- Misc fixes and cleanups (Ivan Hu, Wei Yongjun, Yisheng Xie, Dan
Carpenter, Roy Franz)"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit
thunderbolt: Compile on x86 only
thunderbolt, efi: Fix Kconfig dependencies harder
thunderbolt, efi: Fix Kconfig dependencies
thunderbolt: Use Device ROM retrieved from EFI
x86/efi: Retrieve and assign Apple device properties
efi: Allow bitness-agnostic protocol calls
efi: Add device path parser
efi/arm*/libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
efi/libstub: Add random.c to ARM build
efi: Add support for seeding the RNG from a UEFI config table
MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem
efi/libstub: Fix allocation size calculations
efi/efivar_ssdt_load: Don't return success on allocation failure
efifb: Show framebuffer layout as device attributes
efi/efi_test: Use memdup_user() as a cleanup
efi/efi_test: Fix uninitialized variable 'rv'
efi/efi_test: Fix uninitialized variable 'datasize'
efi/arm*: Fix efi_init() error handling
efi: Remove unused include of <linux/version.h>
Since the bootloader may load the compressed x86 kernel at any address,
it should always be built as PIE, not just when CONFIG_RELOCATABLE=y.
Otherwise, linker in binutils 2.27 will optimize GOT load into the
absolute address when building the compressed x86 kernel as a non-PIE
executable.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Small wording changes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linux will have all kinds of sporadic problems on systems that don't
have the CPUID instruction unless CONFIG_M486=y. In particular,
sync_core() will explode.
I believe that these kernels had a better chance of working before
commit 05fb3c199b ("x86/boot: Initialize FPU and X86_FEATURE_ALWAYS
even if we don't have CPUID"). That commit inadvertently fixed a
serious bug: we used to fail to detect the FPU if CPUID wasn't
present. Because we also used to forget to set X86_FEATURE_ALWAYS, we
end up with no cpu feature bits set at all. This meant that
alternative patching didn't do anything and, if paravirt was disabled,
we could plausibly finish the entire boot process without calling
sync_core().
Rather than trying to work around these issues, just have the kernel
fail loudly if it's running on a CPUID-less 486, doesn't have CPUID,
and doesn't have CONFIG_M486 set.
Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/70eac6639f23df8be5fe03fa1984aedd5d40077a.1479598603.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Apple's EFI drivers supply device properties which are needed to support
Macs optimally. They contain vital information which cannot be obtained
any other way (e.g. Thunderbolt Device ROM). They're also used to convey
the current device state so that OS drivers can pick up where EFI
drivers left (e.g. GPU mode setting).
There's an EFI driver dubbed "AAPL,PathProperties" which implements a
per-device key/value store. Other EFI drivers populate it using a custom
protocol. The macOS bootloader /System/Library/CoreServices/boot.efi
retrieves the properties with the same protocol. The kernel extension
AppleACPIPlatform.kext subsequently merges them into the I/O Kit
registry (see ioreg(8)) where they can be queried by other kernel
extensions and user space.
This commit extends the efistub to retrieve the device properties before
ExitBootServices is called. It assigns them to devices in an fs_initcall
so that they can be queried with the API in <linux/property.h>.
Note that the device properties will only be available if the kernel is
booted with the efistub. Distros should adjust their installers to
always use the efistub on Macs. grub with the "linux" directive will not
work unless the functionality of this commit is duplicated in grub.
(The "linuxefi" directive should work but is not included upstream as of
this writing.)
The custom protocol has GUID 91BD12FE-F6C3-44FB-A5B7-5122AB303AE0 and
looks like this:
typedef struct {
unsigned long version; /* 0x10000 */
efi_status_t (*get) (
IN struct apple_properties_protocol *this,
IN struct efi_dev_path *device,
IN efi_char16_t *property_name,
OUT void *buffer,
IN OUT u32 *buffer_len);
/* EFI_SUCCESS, EFI_NOT_FOUND, EFI_BUFFER_TOO_SMALL */
efi_status_t (*set) (
IN struct apple_properties_protocol *this,
IN struct efi_dev_path *device,
IN efi_char16_t *property_name,
IN void *property_value,
IN u32 property_value_len);
/* allocates copies of property name and value */
/* EFI_SUCCESS, EFI_OUT_OF_RESOURCES */
efi_status_t (*del) (
IN struct apple_properties_protocol *this,
IN struct efi_dev_path *device,
IN efi_char16_t *property_name);
/* EFI_SUCCESS, EFI_NOT_FOUND */
efi_status_t (*get_all) (
IN struct apple_properties_protocol *this,
OUT void *buffer,
IN OUT u32 *buffer_len);
/* EFI_SUCCESS, EFI_BUFFER_TOO_SMALL */
} apple_properties_protocol;
Thanks to Pedro Vilaça for this blog post which was helpful in reverse
engineering Apple's EFI drivers and bootloader:
https://reverse.put.as/2016/06/25/apple-efi-firmware-passwords-and-the-scbo-myth/
If someone at Apple is reading this, please note there's a memory leak
in your implementation of the del() function as the property struct is
freed but the name and value allocations are not.
Neither the macOS bootloader nor Apple's EFI drivers check the protocol
version, but we do to avoid breakage if it's ever changed. It's been the
same since at least OS X 10.6 (2009).
The get_all() function conveniently fills a buffer with all properties
in marshalled form which can be passed to the kernel as a setup_data
payload. The number of device properties is dynamic and can change
between a first invocation of get_all() (to determine the buffer size)
and a second invocation (to retrieve the actual buffer), hence the
peculiar loop which does not finish until the buffer size settles.
The macOS bootloader does the same.
The setup_data payload is later on unmarshalled in an fs_initcall. The
idea is that most buses instantiate devices in "subsys" initcall level
and drivers are usually bound to these devices in "device" initcall
level, so we assign the properties in-between, i.e. in "fs" initcall
level.
This assumes that devices to which properties pertain are instantiated
from a "subsys" initcall or earlier. That should always be the case
since on macOS, AppleACPIPlatformExpert::matchEFIDevicePath() only
supports ACPI and PCI nodes and we've fully scanned those buses during
"subsys" initcall level.
The second assumption is that properties are only needed from a "device"
initcall or later. Seems reasonable to me, but should this ever not work
out, an alternative approach would be to store the property sets e.g. in
a btree early during boot. Then whenever device_add() is called, an EFI
Device Path would have to be constructed for the newly added device,
and looked up in the btree. That way, the property set could be assigned
to the device immediately on instantiation. And this would also work for
devices instantiated in a deferred fashion. It seems like this approach
would be more complicated and require more code. That doesn't seem
justified without a specific use case.
For comparison, the strategy on macOS is to assign properties to objects
in the ACPI namespace (AppleACPIPlatformExpert::mergeEFIProperties()).
That approach is definitely wrong as it fails for devices not present in
the namespace: The NHI EFI driver supplies properties for attached
Thunderbolt devices, yet on Macs with Thunderbolt 1 only one device
level behind the host controller is described in the namespace.
Consequently macOS cannot assign properties for chained devices. With
Thunderbolt 2 they started to describe three device levels behind host
controllers in the namespace but this grossly inflates the SSDT and
still fails if the user daisy-chained more than three devices.
We copy the property names and values from the setup_data payload to
swappable virtual memory and afterwards make the payload available to
the page allocator. This is just for the sake of good housekeeping, it
wouldn't occupy a meaningful amount of physical memory (4444 bytes on my
machine). Only the payload is freed, not the setup_data header since
otherwise we'd break the list linkage and we cannot safely update the
predecessor's ->next link because there's no locking for the list.
The payload is currently not passed on to kexec'ed kernels, same for PCI
ROMs retrieved by setup_efi_pci(). This can be added later if there is
demand by amending setup_efi_state(). The payload can then no longer be
made available to the page allocator of course.
Tested-by: Lukas Wunner <lukas@wunner.de> [MacBookPro9,1]
Tested-by: Pierre Moreau <pierre.morrow@free.fr> [MacBookPro11,3]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andreas Noever <andreas.noever@gmail.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pedro Vilaça <reverser@put.as>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: grub-devel@gnu.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20161112213237.8804-9-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch calculates the GDTR's base address via a single instruction.
( EBP contains the address where it is loaded and GDTR's base address is
already set to "gdt" in compilation. It is fine to get the correct base
address by adding the delta to GDTR's base address. )
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1478015364-5547-1-git-send-email-richard.weiyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commmit b6eea87fc685:
("x86, boot: Explicitly include autoconf.h for hostprogs")
correctly noted:
[...] that because $(USERINCLUDE) isn't exported by
the top-level Makefile it's actually empty in arch/x86/boot/Makefile.
So let's do the sane thing and remove the reference to that make variable.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1478166468-9760-1-git-send-email-pebolle@tiscali.nl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
and allow drivers to permanently reserve EFI boot services regions
on x86, as well as ARM/arm64 - Matt Fleming
* Add ARM support for the EFI esrt driver - Ard Biesheuvel
* Make the EFI runtime services and efivar API interruptible by
swapping spinlocks for semaphores - Sylvain Chouleur
* Provide the EFI identity mapping for kexec which allows kexec to
work on SGI/UV platforms with requiring the "noefi" kernel command
line parameter - Alex Thorlton
* Add debugfs node to dump EFI page tables on arm64 - Ard Biesheuvel
* Merge the EFI test driver being carried out of tree until now in
the FWTS project - Ivan Hu
* Expand the list of flags for classifying EFI regions as "RAM" on
arm64 so we align with the UEFI spec - Ard Biesheuvel
* Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32)
or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot
services function table for direct calls, alleviating us from
having to maintain the custom function table - Lukas Wunner
* Miscellaneous cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQI2BAABCAAgBQJX0tCTGRxtYXR0QGNvZGVibHVlcHJpbnQuY28udWsACgkQLzhZ
wI0jPVWLVBAAn/iM91Vmhggdk3t0wCMJzrNGonw61VJ9TZJVbCUJyiH0qdDUThhj
R4rO+6Vf6yOuyswu+mGmae61tfsjwJHH+IPpB8nRLIGQRwzoxk+aGC7FzmQ0ISVO
wIdv5shsmeWhFAyNB1D4hzlp1NxOZaqcU/0cfUVGe4HmK0Js3tUpWWx8VgJ7yvW+
X1PBbfyChArGqiwV6FJz/mJxRAgByUfhvYMcX9HhQkou6F4U5Y8l3vlhUMbuAZAi
ZfG2LWGYCQ+F4XKxMq2QDAtdUgBzlYWw6W60o55x9WO4cEVSzNVRgedto5o1Zea9
2QGEr94gim+e5cJ/HeDIEmbWZhAqIdcNDqXSSBd1CDVQytp4PNAn6rxk+2S9kxoe
T9Mk523HEabo+AZvDAPPJlzcsnIe83JYy69M1xFvcP25ebk7y2BwQtd1jwWPrPDQ
Q/llzF93aezUFR/guvIw0oHckhQl0ZkNedL9Tq4+UKL0ibp2X4gSX636/x4PkBSP
5+pyfmO1SAqTiiMQGQMnp4+ngPQeQrxkmVnh1P7cKlTNXg1IoS03t46Xn2Pj10cd
3KneVDeN9DKIAOn7wPKuPnjTho+9FH36xbwTaIgbt0cWuFFfu090rmqOQfjAJEDN
foHzsMZ7S6CmeOJnj97NNR8sMQDcc+p9bh1KXpJIHaZAgrKmvqPZpMk=
=G7L6
-----END PGP SIGNATURE-----
Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into efi/core
Pull EFI updates from Matt Fleming:
"* Refactor the EFI memory map code into architecture neutral files
and allow drivers to permanently reserve EFI boot services regions
on x86, as well as ARM/arm64 - Matt Fleming
* Add ARM support for the EFI esrt driver - Ard Biesheuvel
* Make the EFI runtime services and efivar API interruptible by
swapping spinlocks for semaphores - Sylvain Chouleur
* Provide the EFI identity mapping for kexec which allows kexec to
work on SGI/UV platforms with requiring the "noefi" kernel command
line parameter - Alex Thorlton
* Add debugfs node to dump EFI page tables on arm64 - Ard Biesheuvel
* Merge the EFI test driver being carried out of tree until now in
the FWTS project - Ivan Hu
* Expand the list of flags for classifying EFI regions as "RAM" on
arm64 so we align with the UEFI spec - Ard Biesheuvel
* Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32)
or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot
services function table for direct calls, alleviating us from
having to maintain the custom function table - Lukas Wunner
* Miscellaneous cleanups and fixes"
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We currently allow invocation of 8 boot services with efi_call_early().
Not included are LocateHandleBuffer and LocateProtocol in particular.
For graphics output or to retrieve PCI ROMs and Apple device properties,
we're thus forced to use the LocateHandle + AllocatePool + LocateHandle
combo, which is cumbersome and needs more code.
The ARM folks allow invocation of the full set of boot services but are
restricted to our 8 boot services in functions shared across arches.
Thus, rather than adding just LocateHandleBuffer and LocateProtocol to
struct efi_config, let's rework efi_call_early() to allow invocation of
arbitrary boot services by selecting the 64 bit vs 32 bit code path in
the macro itself.
When compiling for 32 bit or for 64 bit without mixed mode, the unused
code path is optimized away and the binary code is the same as before.
But on 64 bit with mixed mode enabled, this commit adds one compare
instruction to each invocation of a boot service and, depending on the
code path selected, two jump instructions. (Most of the time gcc
arranges the jumps in the 32 bit code path.) The result is a minuscule
performance penalty and the binary code becomes slightly larger and more
difficult to read when disassembled. This isn't a hot path, so these
drawbacks are arguably outweighed by the attainable simplification of
the C code. We have some overhead anyway for thunking or conversion
between calling conventions.
The 8 boot services can consequently be removed from struct efi_config.
No functional change intended (for now).
Example -- invocation of free_pool before (64 bit code path):
0x2d4 movq %ds:efi_early, %rdx ; efi_early
0x2db movq %ss:arg_0-0x20(%rsp), %rsi
0x2e0 xorl %eax, %eax
0x2e2 movq %ds:0x28(%rdx), %rdi ; efi_early->free_pool
0x2e6 callq *%ds:0x58(%rdx) ; efi_early->call()
Example -- invocation of free_pool after (64 / 32 bit mixed code path):
0x0dc movq %ds:efi_early, %rax ; efi_early
0x0e3 cmpb $0, %ds:0x28(%rax) ; !efi_early->is64 ?
0x0e7 movq %ds:0x20(%rax), %rdx ; efi_early->call()
0x0eb movq %ds:0x10(%rax), %rax ; efi_early->boot_services
0x0ef je $0x150
0x0f1 movq %ds:0x48(%rax), %rdi ; free_pool (64 bit)
0x0f5 xorl %eax, %eax
0x0f7 callq *%rdx
...
0x150 movl %ds:0x30(%rax), %edi ; free_pool (32 bit)
0x153 jmp $0x0f5
Size of eboot.o text section:
CONFIG_X86_32: 6464 before, 6318 after
CONFIG_X86_64 && !CONFIG_EFI_MIXED: 7670 before, 7573 after
CONFIG_X86_64 && CONFIG_EFI_MIXED: 7670 before, 8319 after
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Although very unlikey, if size is too small or zero, then we end up with
status not being set and returning garbage. Instead, initializing status to
EFI_INVALID_PARAMETER to indicate that size is invalid in the calls to
setup_uga32 and setup_uga64.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
The eboot code directly calls ExitBootServices. This is inadvisable as the
UEFI spec details a complex set of errors, race conditions, and API
interactions that the caller of ExitBootServices must get correct. The
eboot code attempts allocations after calling ExitBootSerives which is
not permitted per the spec. Call the efi_exit_boot_services() helper
intead, which handles the allocation scenario properly.
Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
efi_get_memory_map() allocates a buffer to store the memory map that it
retrieves. This buffer may need to be reused by the client after
ExitBootServices() is called, at which point allocations are not longer
permitted. To support this usecase, provide the allocated buffer size back
to the client, and allocate some additional headroom to account for any
reasonable growth in the map that is likely to happen between the call to
efi_get_memory_map() and the client reusing the buffer.
Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Pull kbuild updates from Michal Marek:
- GCC plugin support by Emese Revfy from grsecurity, with a fixup from
Kees Cook. The plugins are meant to be used for static analysis of
the kernel code. Two plugins are provided already.
- reduction of the gcc commandline by Arnd Bergmann.
- IS_ENABLED / IS_REACHABLE macro enhancements by Masahiro Yamada
- bin2c fix by Michael Tautschnig
- setlocalversion fix by Wolfram Sang
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
gcc-plugins: disable under COMPILE_TEST
kbuild: Abort build on bad stack protector flag
scripts: Fix size mismatch of kexec_purgatory_size
kbuild: make samples depend on headers_install
Kbuild: don't add obj tree in additional includes
Kbuild: arch: look for generated headers in obtree
Kbuild: always prefix objtree in LINUXINCLUDE
Kbuild: avoid duplicate include path
Kbuild: don't add ../../ to include path
vmlinux.lds.h: replace config_enabled() with IS_ENABLED()
kconfig.h: allow to use IS_{ENABLE,REACHABLE} in macro expansion
kconfig.h: use already defined macros for IS_REACHABLE() define
export.h: use __is_defined() to check if __KSYM_* is defined
kconfig.h: use __is_defined() to check if MODULE is defined
kbuild: setlocalversion: print error to STDERR
Add sancov plugin
Add Cyclomatic complexity GCC plugin
GCC plugin infrastructure
Shared library support
Pull x86 boot updates from Ingo Molnar:
"The main changes:
- add initial commits to randomize kernel memory section virtual
addresses, enabled via a new kernel option: RANDOMIZE_MEMORY
(Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu)
- enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees
Cook)
- EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar)
- misc cleanups/fixes"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Simplify EBDA-vs-BIOS reservation logic
x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
x86/boot: Reorganize and clean up the BIOS area reservation code
x86/mm: Do not reference phys addr beyond kernel
x86/mm: Add memory hotplug support for KASLR memory randomization
x86/mm: Enable KASLR for vmalloc memory regions
x86/mm: Enable KASLR for physical mapping memory regions
x86/mm: Implement ASLR for kernel memory regions
x86/mm: Separate variable for trampoline PGD
x86/mm: Add PUD VA support for physical mapping
x86/mm: Update physical mapping variable names
x86/mm: Refactor KASLR entropy functions
x86/KASLR: Fix boot crash with certain memory configurations
x86/boot/64: Add forgotten end of function marker
x86/KASLR: Allow randomization below the load address
x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
x86/KASLR: Randomize virtual address separately
x86/KASLR: Clarify identity map interface
x86/boot: Refuse to build with data relocations
x86/KASLR, x86/power: Remove x86 hibernation restrictions
Pull x86 mm updates from Ingo Molnar:
"Various x86 low level modifications:
- preparatory work to support virtually mapped kernel stacks (Andy
Lutomirski)
- support for 64-bit __get_user() on 32-bit kernels (Benjamin
LaHaise)
- (involved) workaround for Knights Landing CPU erratum (Dave Hansen)
- MPX enhancements (Dave Hansen)
- mremap() extension to allow remapping of the special VDSO vma, for
purposes of user level context save/restore (Dmitry Safonov)
- hweight and entry code cleanups (Borislav Petkov)
- bitops code generation optimizations and cleanups with modern GCC
(H. Peter Anvin)
- syscall entry code optimizations (Paolo Bonzini)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
x86/mm/cpa: Add missing comment in populate_pdg()
x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
x86/smp: Remove unnecessary initialization of thread_info::cpu
x86/smp: Remove stack_smp_processor_id()
x86/uaccess: Move thread_info::addr_limit to thread_struct
x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
x86/dumpstack: When OOPSing, rewind the stack before do_exit()
x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
x86/dumpstack: Try harder to get a call trace on stack overflow
x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
x86/mm: Use pte_none() to test for empty PTE
x86/mm: Disallow running with 32-bit PTEs to work around erratum
x86/mm: Ignore A/D bits in pte/pmd/pud_none()
x86/mm: Move swap offset/type up in PTE to work around erratum
x86/entry: Inline enter_from_user_mode()
...
There are very few files that need add an -I$(obj) gcc for the preprocessor
or the assembler. For C files, we add always these for both the objtree and
srctree, but for the other ones we require the Makefile to add them, and
Kbuild then adds it for both trees.
As a preparation for changing the meaning of the -I$(obj) directive to
only refer to the srctree, this changes the two instances in arch/x86 to use
an explictit $(objtree) prefix where needed, otherwise we won't find the
headers any more, as reported by the kbuild 0day builder.
arch/x86/realmode/rm/realmode.lds.S:75:20: fatal error: pasyms.h: No such file or directory
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michal Marek <mmarek@suse.com>
The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
Landing) has an erratum where a processor thread setting the Accessed
or Dirty bits may not do so atomically against its checks for the
Present bit. This may cause a thread (which is about to page fault)
to set A and/or D, even though the Present bit had already been
atomically cleared.
These bits are truly "stray". In the case of the Dirty bit, the
thread associated with the stray set was *not* allowed to write to
the page. This means that we do not have to launder the bit(s); we
can simply ignore them.
If the PTE is used for storing a swap index or a NUMA migration index,
the A bit could be misinterpreted as part of the swap type. The stray
bits being set cause a software-cleared PTE to be interpreted as a
swap entry. In some cases (like when the swap index ends up being
for a non-existent swapfile), the kernel detects the stray value
and WARN()s about it, but there is no guarantee that the kernel can
always detect it.
When we have 64-bit PTEs (64-bit mode or 32-bit PAE), we were able
to move the swap PTE format around to avoid these troublesome bits.
But, 32-bit non-PAE is tight on bits. So, disallow it from running
on this hardware. I can't imagine anyone wanting to run 32-bit
non-highmem kernels on this hardware, but disallowing them from
running entirely is surely the safe thing to do.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: linux-mm@kvack.org
Cc: mhocko@suse.com
Link: http://lkml.kernel.org/r/20160708001914.D0B50110@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add the physical mapping in the list of randomized memory regions.
The physical memory mapping holds most allocations from boot and heap
allocators. Knowing the base address and physical memory size, an attacker
can deduce the PDE virtual address for the vDSO memory page. This attack
was demonstrated at CanSecWest 2016, in the following presentation:
"Getting Physical: Extreme Abuse of Intel Based Paged Systems":
https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/blob/master/Presentation/CanSec2016_Presentation.pdf
(See second part of the presentation).
The exploits used against Linux worked successfully against 4.6+ but
fail with KASLR memory enabled:
https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/tree/master/Demos/Linux/exploits
Similar research was done at Google leading to this patch proposal.
Variants exists to overwrite /proc or /sys objects ACLs leading to
elevation of privileges. These variants were tested against 4.6+.
The page offset used by the compressed kernel retains the static value
since it is not yet randomized during this boot stage.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ye Xiaolong reported this boot crash:
|
| XZ-compressed data is corrupt
|
| -- System halted
|
Fix the bug in mem_avoid_overlap() of finding the earliest overlap.
Reported-and-tested-by: Ye Xiaolong <xiaolong.ye@intel.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Remove unused variable 'efi', it is never used. This fixes the following
clang build warning:
arch/x86/boot/compressed/eboot.c:803:2: warning: Value stored to 'efi' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-4-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently the kernel image physical address randomization's lower
boundary is the original kernel load address.
For bootloaders that load kernels into very high memory (e.g. kexec),
this means randomization takes place in a very small window at the
top of memory, ignoring the large region of physical memory below
the load address.
Since mem_avoid[] is already correctly tracking the regions that must be
avoided, this patch changes the minimum address to whatever is less:
512M (to conservatively avoid unknown things in lower memory) or the
load address. Now, for example, if the kernel is loaded at 8G, [512M,
8G) will be added to the list of possible physical memory positions.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote the changelog, refactored the code to use min(). ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1464216334-17200-6-git-send-email-keescook@chromium.org
[ Edited the changelog some more, plus the code comment as well. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We want the physical address to be randomized anywhere between
16MB and the top of physical memory (up to 64TB).
This patch exchanges the prior slots[] array for the new slot_areas[]
array, and lifts the limitation of KERNEL_IMAGE_SIZE on the physical
address offset for 64-bit. As before, process_e820_entry() walks
memory and populates slot_areas[], splitting on any detected mem_avoid
collisions.
Finally, since the slots[] array and its associated functions are not
needed any more, so they are removed.
Based on earlier patches by Baoquan He.
Originally-from: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1464216334-17200-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current KASLR implementation randomizes the physical and virtual
addresses of the kernel together (both are offset by the same amount). It
calculates the delta of the physical address where vmlinux was linked
to load and where it is finally loaded. If the delta is not equal to 0
(i.e. the kernel was relocated), relocation handling needs be done.
On 64-bit, this patch randomizes both the physical address where kernel
is decompressed and the virtual address where kernel text is mapped and
will execute from. We now have two values being chosen, so the function
arguments are reorganized to pass by pointer so they can be directly
updated. Since relocation handling only depends on the virtual address,
we must check the virtual delta, not the physical delta for processing
kernel relocations. This also populates the page table for the new
virtual address range. 32-bit does not support a separate virtual address,
so it continues to use the physical offset for its virtual offset.
Additionally updates the sanity checks done on the resulting kernel
addresses since they are potentially separate now.
[kees: rewrote changelog, limited virtual split to 64-bit only, update checks]
[kees: fix CONFIG_RANDOMIZE_BASE=n boot failure]
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1464216334-17200-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This extracts the call to prepare_level4() into a top-level function
that the user of the pagetable.c interface must call to initialize
the new page tables. For clarity and to match the "finalize" function,
it has been renamed to initialize_identity_maps(). This function also
gains the initialization of mapping_info so we don't have to do it each
time in add_identity_map().
Additionally add copyright notice to the top, to make it clear that the
bulk of the pagetable.c code was written by Yinghai, and that I just
added bugs later. :)
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1464216334-17200-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The compressed kernel is built with -fPIC/-fPIE so that it can run in any
location a bootloader happens to put it. However, since ELF relocation
processing is not happening (and all the relocation information has
already been stripped at link time), none of the code can use data
relocations (e.g. static assignments of pointers). This is already noted
in a warning comment at the top of misc.c, but this adds an explicit
check for the condition during the linking stage to block any such bugs
from appearing.
If this was in place with the earlier bug in pagetable.c, the build
would fail like this:
...
CC arch/x86/boot/compressed/pagetable.o
DATAREL arch/x86/boot/compressed/vmlinux
error: arch/x86/boot/compressed/pagetable.o has data relocations!
make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1
...
A clean build shows:
...
CC arch/x86/boot/compressed/pagetable.o
DATAREL arch/x86/boot/compressed/vmlinux
LD arch/x86/boot/compressed/vmlinux
...
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1464216334-17200-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the following fix:
70595b479ce1 ("x86/power/64: Fix crash whan the hibernation code passes control to the image kernel")
... there is no longer a problem with hibernation resuming a
KASLR-booted kernel image, so remove the restriction.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux PM list <linux-pm@vger.kernel.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20160613221002.GA29719@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Remove open-coded uses of set instructions to use CC_SET()/CC_OUT() in
arch/x86/boot/boot.h.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1465414726-197858-10-git-send-email-hpa@linux.intel.com
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
The gcc people have confirmed that using "bool" when combined with
inline assembly always is treated as a byte-sized operand that can be
assumed to be 0 or 1, which is exactly what the SET instruction
emits. Change the output types and intermediate variables of as many
operations as practical to "bool".
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1465414726-197858-3-git-send-email-hpa@linux.intel.com
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
For newer versions of Syslinux, we need ldlinux.c32 in addition to
isolinux.bin to reside on the boot disk, so if the latter is found,
copy it, too, to the isoimage tree.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linux Stable Tree <stable@vger.kernel.org>
Pull kbuild updates from Michal Marek:
- new option CONFIG_TRIM_UNUSED_KSYMS which does a two-pass build and
unexports symbols which are not used in the current config [Nicolas
Pitre]
- several kbuild rule cleanups [Masahiro Yamada]
- warning option adjustments for gcov etc [Arnd Bergmann]
- a few more small fixes
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (31 commits)
kbuild: move -Wunused-const-variable to W=1 warning level
kbuild: fix if_change and friends to consider argument order
kbuild: fix adjust_autoksyms.sh for modules that need only one symbol
kbuild: fix ksym_dep_filter when multiple EXPORT_SYMBOL() on the same line
gcov: disable -Wmaybe-uninitialized warning
gcov: disable tree-loop-im to reduce stack usage
gcov: disable for COMPILE_TEST
Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES
Kbuild: change CC_OPTIMIZE_FOR_SIZE definition
kbuild: forbid kernel directory to contain spaces and colons
kbuild: adjust ksym_dep_filter for some cmd_* renames
kbuild: Fix dependencies for final vmlinux link
kbuild: better abstract vmlinux sequential prerequisites
kbuild: fix call to adjust_autoksyms.sh when output directory specified
kbuild: Get rid of KBUILD_STR
kbuild: rename cmd_as_s_S to cmd_cpp_s_S
kbuild: rename cmd_cc_i_c to cmd_cpp_i_c
kbuild: drop redundant "PHONY += FORCE"
kbuild: delete unnecessary "@:"
kbuild: mark help target as PHONY
...
Pull x86 boot updates from Ingo Molnar:
"The biggest changes in this cycle were:
- prepare for more KASLR related changes, by restructuring, cleaning
up and fixing the existing boot code. (Kees Cook, Baoquan He,
Yinghai Lu)
- simplifly/concentrate subarch handling code, eliminate
paravirt_enabled() usage. (Luis R Rodriguez)"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
x86/KASLR: Clarify purpose of each get_random_long()
x86/KASLR: Add virtual address choosing function
x86/KASLR: Return earliest overlap when avoiding regions
x86/KASLR: Add 'struct slot_area' to manage random_addr slots
x86/boot: Add missing file header comments
x86/KASLR: Initialize mapping_info every time
x86/boot: Comment what finalize_identity_maps() does
x86/KASLR: Build identity mappings on demand
x86/boot: Split out kernel_ident_mapping_init()
x86/boot: Clean up indenting for asm/boot.h
x86/KASLR: Improve comments around the mem_avoid[] logic
x86/boot: Simplify pointer casting in choose_random_location()
x86/KASLR: Consolidate mem_avoid[] entries
x86/boot: Clean up pointer casting
x86/boot: Warn on future overlapping memcpy() use
x86/boot: Extract error reporting functions
x86/boot: Correctly bounds-check relocations
x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size'
x86/boot: Fix "run_size" calculation
x86/boot: Calculate decompression size during boot not build
...
KASLR will be calling get_random_long() twice, but the debug output
won't distinguishing between them. This patch adds a report on when it
is fetching the physical vs virtual address. With this, once the virtual
offset is separate, the report changes from:
KASLR using RDTSC...
KASLR using RDTSC...
into:
Physical KASLR using RDTSC...
Virtual KASLR using RDTSC...
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
To support randomizing the kernel virtual address separately from the
physical address, this patch adds find_random_virt_addr() to choose
a slot anywhere between LOAD_PHYSICAL_ADDR and KERNEL_IMAGE_SIZE.
Since this address is virtual, not physical, we can place the kernel
anywhere in this region, as long as it is aligned and (in the case of
kernel being larger than the slot size) placed with enough room to load
the entire kernel image.
For clarity and readability, find_random_addr() is renamed to
find_random_phys_addr() and has "size" renamed to "image_size" to match
find_random_virt_addr().
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote changelog, refactored slot calculation for readability. ]
[ Renamed find_random_phys_addr() and size argument. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for being able to detect where to split up contiguous
memory regions that overlap with memory regions to avoid, we need to
pass back what the earliest overlapping region was. This modifies the
overlap checker to return that information.
Based on a separate mem_min_overlap() implementation by Baoquan He.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In order to support KASLR moving the kernel anywhere in physical memory
(which could be up to 64TB), we need to handle counting the potential
randomization locations in a more efficient manner.
In the worst case with 64TB, there could be roughly 32 * 1024 * 1024
randomization slots if CONFIG_PHYSICAL_ALIGN is 0x1000000. Currently
the starting address of candidate positions is stored into the slots[]
array, one at a time. This method would cost too much memory and it's
also very inefficient to get and save the slot information into the slot
array one by one.
This patch introduces 'struct slot_area' to manage each contiguous region
of randomization slots. Each slot_area will contain the starting address
and how many available slots are in this area. As with the original code,
the slot_areas[] will avoid the mem_avoid[] regions.
Since setup_data is a linked list, it could contain an unknown number
of memory regions to be avoided, which could cause us to fragment
the contiguous memory that the slot_area array is tracking. In normal
operation this level of fragmentation will be extremely rare, but we
choose a suitably large value (100) for the array. If setup_data forces
the slot_area array to become highly fragmented and there are more
slots available beyond the first 100 found, the rest will be ignored
for KASLR selection.
The function store_slot_info() is used to calculate the number of slots
available in the passed-in memory region and stores it into slot_areas[]
after adjusting for alignment and size requirements.
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote changelog, squashed with new functions. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There were some files with missing header comments. Since they are
included from both compressed and regular kernels, make note of that.
Also corrects a typo in the mem_avoid comments.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As it turns out, mapping_info DOES need to be initialized every
time, because pgt_data address could be changed during kernel
relocation. So it can not be build time assigned.
Without this, page tables were not being corrected updated, which
could cause reboots when a physical address beyond 2G was chosen.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So it is not really obvious that finalize_identity_maps() doesn't do any
finalization but it *actually* writes CR3 with the ident PGD. Comment
that at the call site.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: bhe@redhat.com
Cc: dyoung@redhat.com
Cc: jkosina@suse.cz
Cc: linux-tip-commits@vger.kernel.org
Cc: luto@kernel.org
Cc: vgoyal@redhat.com
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/20160507100541.GA24613@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently KASLR only supports relocation in a small physical range (from
16M to 1G), due to using the initial kernel page table identity mapping.
To support ranges above this, we need to have an identity mapping for the
desired memory range before we can decompress (and later run) the kernel.
32-bit kernels already have the needed identity mapping. This patch adds
identity mappings for the needed memory ranges on 64-bit kernels. This
happens in two possible boot paths:
If loaded via startup_32(), we need to set up the needed identity map.
If loaded from a 64-bit bootloader, the bootloader will have already
set up an identity mapping, and we'll start via the compressed kernel's
startup_64(). In this case, the bootloader's page tables need to be
avoided while selecting the new uncompressed kernel location. If not,
the decompressor could overwrite them during decompression.
To accomplish this, we could walk the pagetable and find every page
that is used, and add them to mem_avoid, but this needs extra code and
will require increasing the size of the mem_avoid array.
Instead, we can create a new set of page tables for our own identity
mapping instead. The pages for the new page table will come from the
_pagetable section of the compressed kernel, which means they are
already contained by in mem_avoid array. To do this, we reuse the code
from the uncompressed kernel's identity mapping routines.
The _pgtable will be shared by both the 32-bit and 64-bit paths to reduce
init_size, as now the compressed kernel's _rodata to _end will contribute
to init_size.
To handle the possible mappings, we need to increase the existing page
table buffer size:
When booting via startup_64(), we need to cover the old VO, params,
cmdline and uncompressed kernel. In an extreme case we could have them
all beyond the 512G boundary, which needs (2+2)*4 pages with 2M mappings.
And we'll need 2 for first 2M for VGA RAM. One more is needed for level4.
This gets us to 19 pages total.
When booting via startup_32(), KASLR could move the uncompressed kernel
above 4G, so we need to create extra identity mappings, which should only
need (2+2) pages at most when it is beyond the 512G boundary. So 19
pages is sufficient for this case as well.
The resulting BOOT_*PGT_SIZE defines use the "_SIZE" suffix on their
names to maintain logical consistency with the existing BOOT_HEAP_SIZE
and BOOT_STACK_SIZE defines.
This patch is based on earlier patches from Yinghai Lu and Baoquan He.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462572095-11754-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This attempts to improve the comments that describe how the memory
range used for decompression is avoided. Additionally uses an enum
instead of raw numbers for the mem_avoid[] indexing.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20160506194459.GA16480@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The mem_avoid[] array is used to track positions that should be avoided (like
the compressed kernel, decompression code, etc) when selecting a memory
position for the randomly relocated kernel. Since ZO is now at the end of
the decompression buffer and the decompression code (and its heap and
stack) are at the front, we can safely consolidate the decompression entry,
the heap entry, and the stack entry. The boot_params memory, however, could
be elsewhere, so it should be explicitly included.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rwrote changelog, cleaned up code comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462486436-3707-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently extract_kernel() defines the input and output buffer pointers
as "unsigned char *" since that's effectively what they are. It passes
these to the decompressor routine and to the ELF parser, which both
logically deal with buffer pointers too. There is some casting ("unsigned
long") done to validate the numerical value of the pointers, but it is
relatively limited.
However, choose_random_location() operates almost exclusively on the
numerical representation of these pointers, so it ended up carrying
a lot of "unsigned long" casts. With the future physical/virtual split
these casts were going to multiply, so this attempts to solve the
problem by doing all the casting in choose_random_location()'s entry
and return instead of through-out the code. Adjusts argument names to
be more meaningful, and changes one us of "choice" to "output" to make
the future physical/virtual split more clear (i.e. "choice" should be
strictly a function return value and not used as an intermediate).
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462486436-3707-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If an overlapping memcpy() is ever attempted, we should at least report
it, in case it might lead to problems, so it could be changed to a
memmove() call instead.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Lasse Collin <lasse.collin@tukaani.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1462229461-3370-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently to use warn(), a caller would need to include misc.h. However,
this means they would get the (unavailable during compressed boot)
gcc built-in memcpy family of functions. But since string.c is defining
these memcpy functions for use by misc.c, we end up in a weird circular
dependency.
To break this loop, move the error reporting functions outside of misc.c
with their own header so that they can be independently included by
other sources. Since the screen-writing routines use memmove(), keep the
low-level *_putstr() functions in misc.c.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Lasse Collin <lasse.collin@tukaani.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1462229461-3370-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Relocation handling performs bounds checking on the resulting calculated
addresses. The existing code uses output_len (VO size plus relocs size) as
the max address. This is not right since the max_addr check should stop at
the end of VO and exclude bss, brk, etc, which follows. The valid range
should be VO [_text, __bss_start] in the loaded physical address space.
This patch adds an export for __bss_start in voffset.h and uses it to
set the correct limit for max_addr.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since 'run_size' is now calculated in misc.c, the old script and associated
argument passing is no longer needed. This patch removes them, and renames
'run_size' to the more descriptive 'kernel_total_size'.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog, renamed 'run_size' to 'kernel_total_size' ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, the "run_size" variable holds the total kernel size
(size of code plus brk and bss) and is calculated via the shell script
arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size
of the .bss and .brk sections from the vmlinux, and adds them as follows:
run_size = $(( $offsetA + $sizeA + $sizeB ))
However, this is not correct (it is too large). To illustrate, here's
a walk-through of the script's calculation, compared to the correct way
to find it.
First, offsetA is found as the starting address of the first .bss or
.brk section seen in the ELF file. The sizeA and sizeB values are the
respective section sizes.
[bhe@x1 linux]$ objdump -h vmlinux
vmlinux: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
27 .bss 00170000 ffffffff81ec8000 0000000001ec8000 012c8000 2**12
ALLOC
28 .brk 00027000 ffffffff82038000 0000000002038000 012c8000 2**0
ALLOC
Here, offsetA is 0x012c8000, with sizeA at 0x00170000 and sizeB at
0x00027000. The resulting run_size is 0x145f000:
0x012c8000 + 0x00170000 + 0x00027000 = 0x145f000
However, if we instead examine the ELF LOAD program headers, we see a
different picture.
[bhe@x1 linux]$ readelf -l vmlinux
Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000200000 0xffffffff81000000 0x0000000001000000
0x0000000000b5e000 0x0000000000b5e000 R E 200000
LOAD 0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
0x0000000000145000 0x0000000000145000 RW 200000
LOAD 0x0000000001000000 0x0000000000000000 0x0000000001d45000
0x0000000000018158 0x0000000000018158 RW 200000
LOAD 0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
0x000000000016a000 0x0000000000301000 RWE 200000
NOTE 0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
0x00000000000001bc 0x00000000000001bc 4
Section to Segment mapping:
Segment Sections...
00 .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata
__ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
__modver
01 .data .vvar
02 .data..percpu
03 .init.text .init.data .x86_cpu_dev.init .parainstructions
.altinstructions .altinstr_replacement .iommu_table .apicdrivers
.exit.text .smp_locks .bss .brk
04 .notes
As mentioned, run_size needs to be the size of the running kernel
including .bss and .brk. We can see from the Section/Segment mapping
above that .bss and .brk are included in segment 03 (which corresponds
to the final LOAD program header). To find the run_size, we calculate
the end of the LOAD segment from its PhysAddr start (0x0000000001d5e000)
and its MemSiz (0x0000000000301000), minus the physical load address of
the kernel (the first LOAD segment's PhysAddr: 0x0000000001000000). The
resulting run_size is 0x105f000:
0x0000000001d5e000 + 0x0000000000301000 - 0x0000000001000000 = 0x105f000
So, from this we can see that the existing run_size calculation is
0x400000 too high. And, as it turns out, the correct run_size is
actually equal to VO_end - VO_text, which is certainly easier to calculate.
_end: 0xffffffff8205f000
_text:0xffffffff81000000
0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000
As a result, run_size is a simple constant, so we don't need to pass it
around; we already have voffset.h for such things. We can share voffset.h
between misc.c and header.S instead of getting run_size in other ways.
This patch moves voffset.h creation code to boot/compressed/Makefile,
and switches misc.c to use the VO_end - VO_text calculation for run_size.
Dependence before:
boot/header.S ==> boot/voffset.h ==> vmlinux
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c
Dependence after:
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Fixes: e6023367d7 ("x86, kaslr: Prevent .bss from overlaping initrd")
Link: http://lkml.kernel.org/r/1461888548-32439-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c.
This doesn't work well because mkpiggy.c doesn't know the details of the
decompressor in use. As a result, it can only make an estimation, which
has risks:
- output + output_len (VO) could be much bigger than input + input_len
(ZO). In this case, the decompressed kernel plus relocs could overwrite
the decompression code while it is running.
- The head code of ZO could be bigger than z_extract_offset. In this case
an overwrite could happen when the head code is running to move ZO to
the end of buffer. Though currently the size of the head code is very
small it's still a potential risk. Since there is no rule to limit the
size of the head code of ZO, it runs the risk of suddenly becoming a
(hard to find) bug.
Instead, this moves the z_extract_offset calculation into header.S, and
makes adjustments to be sure that the above two cases can never happen,
and further corrects the comments describing the calculations.
Since we have (in the previous patch) made ZO always be located against
the end of decompression buffer, z_extract_offset is only used here to
calculate an appropriate buffer size (INIT_SIZE), and is not longer used
elsewhere. As such, it can be removed from voffset.h.
Additionally clean up #if/#else #define to improve readability.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog and comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This change makes later calculations about where the kernel is located
easier to reason about. To better understand this change, we must first
clarify what 'VO' and 'ZO' are. These values were introduced in commits
by hpa:
77d1a49995 ("x86, boot: make symbols from the main vmlinux available")
37ba7ab5e3 ("x86, boot: make kernel_alignment adjustable; new bzImage fields")
Specifically:
All names prefixed with 'VO_':
- relate to the uncompressed kernel image
- the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)
All names prefixed with 'ZO_':
- relate to the bootable compressed kernel image (boot/compressed/vmlinux),
which is composed of the following memory areas:
- head text
- compressed kernel (VO image and relocs table)
- decompressor code
- the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
The 'INIT_SIZE' value is used to find the larger of the two image sizes:
#define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
#define VO_INIT_SIZE (VO__end - VO__text)
#if ZO_INIT_SIZE > VO_INIT_SIZE
# define INIT_SIZE ZO_INIT_SIZE
#else
# define INIT_SIZE VO_INIT_SIZE
#endif
The current code uses extract_offset to decide where to position the
copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
currently includes the extract_offset.)
Why does z_extract_offset exist? It's needed because we are trying to minimize
the amount of RAM used for the whole act of creating an uncompressed, executable,
properly relocation-linked kernel image in system memory. We do this so that
kernels can be booted on even very small systems.
To achieve the goal of minimal memory consumption we have implemented an in-place
decompression strategy: instead of cleanly separating the VO and ZO images and
also allocating some memory for the decompression code's runtime needs, we instead
create this elaborate layout of memory buffers where the output (decompressed)
stream, as it progresses, overlaps with and destroys the input (compressed)
stream. This can only be done safely if the ZO image is placed to the end of the
VO range, plus a certain amount of safety distance to make sure that when the last
bytes of the VO range are decompressed, the compressed stream pointer is safely
beyond the end of the VO range.
z_extract_offset is calculated in arch/x86/boot/compressed/mkpiggy.c during
the build process, at a point when we know the exact compressed and
uncompressed size of the kernel images and can calculate this safe minimum
offset value. (Note that the mkpiggy.c calculation is not perfect, because
we don't know the decompressor used at that stage, so the z_extract_offset
calculation is necessarily imprecise and is mostly based on gzip internals -
we'll improve that in the next patch.)
When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible),
the copied ZO occupies the memory from extract_offset to the end of
decompression buffer. It overlaps with the soon-to-be-uncompressed kernel
like this:
|-----compressed kernel image------|
V V
0 extract_offset +INIT_SIZE
|-----------|---------------|-------------------------|--------|
| | | |
VO__text startup_32 of ZO VO__end ZO__end
^ ^
|-------uncompressed kernel image---------|
When INIT_SIZE is equal to VO_INIT_SIZE (likely) there's still space
left from end of ZO to the end of decompressing buffer, like below.
|-compressed kernel image-|
V V
0 extract_offset +INIT_SIZE
|-----------|---------------|-------------------------|--------|
| | | |
VO__text startup_32 of ZO ZO__end VO__end
^ ^
|------------uncompressed kernel image-------------|
To simplify calculations and avoid special cases, it is cleaner to
always place the compressed kernel image in memory so that ZO__end
is at the end of the decompression buffer, instead of placing t at
the start of extract_offset as is currently done.
This patch adds BP_init_size (which is the INIT_SIZE as passed in from
the boot_params) into asm-offsets.c to make it visible to the assembly
code.
Then when moving the ZO, it calculates the starting position of
the copied ZO (via BP_init_size and the ZO run size) so that the VO__end
will be at the end of the decompression buffer. To make the position
calculation safe, the end of ZO is page aligned (and a comment is added
to the existing VO alignment for good measure).
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote changelog and comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-3-git-send-email-keescook@chromium.org
[ Rewrote the changelog some more. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When processing the relocation table, the offset used to calculate the
relocation is an 'int'. This is sufficient for calculating the physical
address of the relocs entry on 32-bit systems and on 64-bit systems when
the relocation is under 2G.
To handle relocations above 2G (seen in situations like kexec, netboot, etc),
this offset needs to be calculated using a 'long' to avoid wrapping and
miscalculating the relocation.
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The Graphics Output Protocol code executes in the stub, so create a generic
version based on the x86 version in libstub so that we can move other archs
to it in subsequent patches. The new source file gop.c is added to the
libstub build for all architectures, but only wired up for x86.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-18-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation of moving this code to drivers/firmware/efi and reusing
it on ARM and arm64, apply any changes that will be required to make this
code build for other architectures. This should make it easier to track
down problems that this move may cause to its operation on x86.
Note that the generic version uses slightly different ways of casting the
protocol methods and some other variables to the correct types, since such
method calls are not loosely typed on ARM and arm64 as they are on x86.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-17-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of having non-standard memcpy() behavior, explicitly call the new
function memmove(), make it available to the decompressors, and switch
the two overlap cases (screen scrolling and ELF parsing) to use memmove().
Additionally documents the purpose of compressed/string.c.
Suggested-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20160426214606.GA5758@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If KASLR is built in but not available at run-time (either due to the
current conflict with hibernation, command-line request, or e820 parsing
failures), announce the state explicitly. To support this, a new "warn"
function is created, based on the existing "error" function.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Two uses of memcpy() (screen scrolling and ELF parsing) were handling
overlapping memory areas. While there were no explicitly noticed bugs
here (yet), it is best to fix this so that the copying will always be
safe.
Instead of making a new memmove() function that might collide with other
memmove() definitions in the decompressors, this just makes the compressed
boot code's copy of memcpy() overlap-safe.
Suggested-by: Lasse Collin <lasse.collin@tukaani.org>
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1461185746-8017-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This rearranges the pieces needed to include the decompressor code
in misc.c. It wasn't obvious why things were there, so a comment was
added and definitions consolidated.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently CONFIG_RANDOMIZE_BASE_MAX_OFFSET is used to limit the maximum
offset for kernel randomization. This limit doesn't need to be a CONFIG
since it is tied completely to KERNEL_IMAGE_SIZE, and will make no sense
once physical and virtual offsets are randomized separately. This patch
removes CONFIG_RANDOMIZE_BASE_MAX_OFFSET and consolidates the Kconfig
help text.
[kees: rewrote changelog, dropped KERNEL_IMAGE_SIZE_DEFAULT, rewrote help]
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The comment that describes the analysis for the size of the decompressor
code only took gzip into account (there are currently 6 other decompressors
that could be used). The actual z_extract_offset calculation in code was
already handling the correct maximum size, but this documentation hadn't
been updated. This updates the documentation, fixes several typos, moves
the comment to header.S, updates references, and adds a note at the end
of the decompressor include list to remind us about updating the comment
in the future.
(Instead of moving the comment to mkpiggy.c, where the calculation
is currently happening, it is being moved to header.S because
the calculations in mkpiggy.c will be removed in favor of header.S
calculations in a following patch, and it seemed like overkill to move
the giant comment twice, especially when there's already reference to
z_extract_offset in header.S.)
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote changelog, cleaned up comment style, moved comments around. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since commit 2aedcd098a ('kbuild: suppress annoying "... is up to
date." message'), $(call if_changed,...) is evaluated to "@:"
when there is nothing to do.
We no longer need to add "@:" after $(call if_changed,...) to
suppress "... is up to date." message.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
The variable "random" is also the name of a libc function. It's better
coding style to avoid overloading such things, so rename it to the more
accurate "random_addr".
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The name "choose_kernel_location" isn't specific enough, and doesn't
describe the primary thing it does: choosing a random location. This
patch renames it to "choose_random_location", and clarifies the what
routines are contained in the kaslr.c source file.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The function "decompress_kernel" now performs many more duties, so this
patch renames it to "extract_kernel" and updates callers and comments.
Additionally the file header comment for misc.c is improved to actually
describe what is contained.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The non-compressed boot code uses the (much more obvious) name
"boot_params" for the global pointer to the x86 boot parameters. The
compressed kernel loader code, though, was using the legacy name
"real_mode". There is no need to have a different name, and changing it
improves readability.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since the boot_params can be found using the real_mode global variable,
there is no need to pass around a pointer to it. This slightly simplifies
the choose_kernel_location function and its callers.
[kees: rewrote changelog, tracked file rename]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460997735-24785-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In order to avoid confusion over what this file provides, rename it to
kaslr.c since it is used exclusively for the kernel ASLR, not userspace
ASLR.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 32-bit x86 assembler in binutils 2.26 will generate R_386_GOT32X
relocation to get the symbol address in PIC. When the compressed x86
kernel isn't built as PIC, the linker optimizes R_386_GOT32X relocations
to their fixed symbol addresses. However, when the compressed x86
kernel is loaded at a different address, it leads to the following
load failure:
Failed to allocate space for phdrs
during the decompression stage.
If the compressed x86 kernel is relocatable at run-time, it should be
compiled with -fPIE, instead of -fPIC, if possible and should be built as
Position Independent Executable (PIE) so that linker won't optimize
R_386_GOT32X relocation to its fixed symbol address.
Older linkers generate R_386_32 relocations against locally defined
symbols, _bss, _ebss, _got and _egot, in PIE. It isn't wrong, just less
optimal than R_386_RELATIVE. But the x86 kernel fails to properly handle
R_386_32 relocations when relocating the kernel. To generate
R_386_RELATIVE relocations, we mark _bss, _ebss, _got and _egot as
hidden in both 32-bit and 64-bit x86 kernels.
To build a 64-bit compressed x86 kernel as PIE, we need to disable the
relocation overflow check to avoid relocation overflow errors. We do
this with a new linker command-line option, -z noreloc-overflow, which
got added recently:
commit 4c10bbaa0912742322f10d9d5bb630ba4e15dfa7
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Tue Mar 15 11:07:06 2016 -0700
Add -z noreloc-overflow option to x86-64 ld
Add -z noreloc-overflow command-line option to the x86-64 ELF linker to
disable relocation overflow check. This can be used to avoid relocation
overflow check if there will be no dynamic relocation overflow at
run-time.
The 64-bit compressed x86 kernel is built as PIE only if the linker supports
-z noreloc-overflow. So far 64-bit relocatable compressed x86 kernel
boots fine even when it is built as a normal executable.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Edited the changelog and comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing). Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system. A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/). However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.
kcov does not aim to collect as much coverage as possible. It aims to
collect more or less stable coverage that is function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts and instrumentation of some inherently non-deterministic or
non-interesting parts of kernel is disbled (e.g. scheduler, locking).
Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes. Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch). I've
dropped the second mode for simplicity.
This patch adds the necessary support on kernel side. The complimentary
compiler support was added in gcc revision 231296.
We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:
https://github.com/google/syzkaller/wiki/Found-Bugs
We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation". For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.
Why not gcov. Typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
typical coverage can be just a dozen of basic blocks (e.g. an invalid
input). In such context gcov becomes prohibitively expensive as
reset/collect coverage steps depend on total number of basic
blocks/edges in program (in case of kernel it is about 2M). Cost of
kcov depends only on number of executed basic blocks/edges. On top of
that, kernel requires per-thread coverage because there are always
background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.
kcov exposes kernel PCs and control flow to user-space which is
insecure. But debugfs should not be mapped as user accessible.
Based on a patch by Quentin Casasnovas.
[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Drysdale <drysdale@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull 'objtool' stack frame validation from Ingo Molnar:
"This tree adds a new kernel build-time object file validation feature
(ONFIG_STACK_VALIDATION=y): kernel stack frame correctness validation.
It was written by and is maintained by Josh Poimboeuf.
The motivation: there's a category of hard to find kernel bugs, most
of them in assembly code (but also occasionally in C code), that
degrades the quality of kernel stack dumps/backtraces. These bugs are
hard to detect at the source code level. Such bugs result in
incorrect/incomplete backtraces most of time - but can also in some
rare cases result in crashes or other undefined behavior.
The build time correctness checking is done via the new 'objtool'
user-space utility that was written for this purpose and which is
hosted in the kernel repository in tools/objtool/. The tool's (very
simple) UI and source code design is shaped after Git and perf and
shares quite a bit of infrastructure with tools/perf (which tooling
infrastructure sharing effort got merged via perf and is already
upstream). Objtool follows the well-known kernel coding style.
Objtool does not try to check .c or .S files, it instead analyzes the
resulting .o generated machine code from first principles: it decodes
the instruction stream and interprets it. (Right now objtool supports
the x86-64 architecture.)
From tools/objtool/Documentation/stack-validation.txt:
"The kernel CONFIG_STACK_VALIDATION option enables a host tool named
objtool which runs at compile time. It has a "check" subcommand
which analyzes every .o file and ensures the validity of its stack
metadata. It enforces a set of rules on asm code and C inline
assembly code so that stack traces can be reliable.
Currently it only checks frame pointer usage, but there are plans to
add CFI validation for C files and CFI generation for asm files.
For each function, it recursively follows all possible code paths
and validates the correct frame pointer state at each instruction.
It also follows code paths involving special sections, like
.altinstructions, __jump_table, and __ex_table, which can add
alternative execution paths to a given instruction (or set of
instructions). Similarly, it knows how to follow switch statements,
for which gcc sometimes uses jump tables."
When this new kernel option is enabled (it's disabled by default), the
tool, if it finds any suspicious assembly code pattern, outputs
warnings in compiler warning format:
warning: objtool: rtlwifi_rate_mapping()+0x2e7: frame pointer state mismatch
warning: objtool: cik_tiling_mode_table_init()+0x6ce: call without frame pointer save/setup
warning: objtool:__schedule()+0x3c0: duplicate frame pointer save
warning: objtool:__schedule()+0x3fd: sibling call from callable instruction with changed frame pointer
... so that scripts that pick up compiler warnings will notice them.
All known warnings triggered by the tool are fixed by the tree, most
of the commits in fact prepare the kernel to be warning-free. Most of
them are bugfixes or cleanups that stand on their own, but there are
also some annotations of 'special' stack frames for justified cases
such entries to JIT-ed code (BPF) or really special boot time code.
There are two other long-term motivations behind this tool as well:
- To improve the quality and reliability of kernel stack frames, so
that they can be used for optimized live patching.
- To create independent infrastructure to check the correctness of
CFI stack frames at build time. CFI debuginfo is notoriously
unreliable and we cannot use it in the kernel as-is without extra
checking done both on the kernel side and on the build side.
The quality of kernel stack frames matters to debuggability as well,
so IMO we can merge this without having to consider the live patching
or CFI debuginfo angle"
* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
objtool: Only print one warning per function
objtool: Add several performance improvements
tools: Copy hashtable.h into tools directory
objtool: Fix false positive warnings for functions with multiple switch statements
objtool: Rename some variables and functions
objtool: Remove superflous INIT_LIST_HEAD
objtool: Add helper macros for traversing instructions
objtool: Fix false positive warnings related to sibling calls
objtool: Compile with debugging symbols
objtool: Detect infinite recursion
objtool: Prevent infinite recursion in noreturn detection
objtool: Detect and warn if libelf is missing and don't break the build
tools: Support relative directory path for 'O='
objtool: Support CROSS_COMPILE
x86/asm/decoder: Use explicitly signed chars
objtool: Enable stack metadata validation on 64-bit x86
objtool: Add CONFIG_STACK_VALIDATION option
objtool: Add tool to perform compile-time stack metadata validation
x86/kprobes: Mark kretprobe_trampoline() stack frame as non-standard
sched: Always inline context_switch()
...
Pull x86 boot updates from Ingo Molnar:
"Early command line options parsing enhancements from Dave Hansen, plus
minor cleanups and enhancements"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Remove unused 'is_big_kernel' variable
x86/boot: Use proper array element type in memset() size calculation
x86/boot: Pass in size to early cmdline parsing
x86/boot: Simplify early command line parsing
x86/boot: Fix early command-line parsing when partial word matches
x86/boot: Fix early command-line parsing when matching at end
x86/boot: Simplify kernel load address alignment check
x86/boot: Micro-optimize reset_early_page_tables()
Code which runs outside the kernel's normal mode of operation often does
unusual things which can cause a static analysis tool like objtool to
emit false positive warnings:
- boot image
- vdso image
- relocation
- realmode
- efi
- head
- purgatory
- modpost
Set OBJECT_FILES_NON_STANDARD for their related files and directories,
which will tell objtool to skip checking them. It's ok to skip them
because they don't affect runtime stack traces.
Also skip the following code which does the right thing with respect to
frame pointers, but is too "special" to be validated by a tool:
- entry
- mcount
Also skip the test_nx module because it modifies its exception handling
table at runtime, which objtool can't understand. Fortunately it's
just a test module so it doesn't matter much.
Currently objtool is the only user of OBJECT_FILES_NON_STANDARD, but it
might eventually be useful for other tools.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/366c080e3844e8a5b6a0327dc7e8c2b90ca3baeb.1456719558.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Variable is_big_kernel is defined in arch/x86/boot/tools/build.c but
never used anywhere.
Boris noted that its usage went away 7 years ago, as of:
5e47c478b0 ("x86: remove zImage support")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1455453358-4088-1-git-send-email-nicolas.iooss_linux@m4x.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move them to a separate header and have the following
dependency:
x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h
This makes it easier to use the header in asm code and not
include the whole cpufeature.h and add guards for asm.
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
UBSAN uses compile-time instrumentation to catch undefined behavior
(UB). Compiler inserts code that perform certain kinds of checks before
operations that could cause UB. If check fails (i.e. UB detected)
__ubsan_handle_* function called to print error message.
So the most of the work is done by compiler. This patch just implements
ubsan handlers printing errors.
GCC has this capability since 4.9.x [1] (see -fsanitize=undefined
option and its suboptions).
However GCC 5.x has more checkers implemented [2].
Article [3] has a bit more details about UBSAN in the GCC.
[1] - https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html
[2] - https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
[3] - http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
Issues which UBSAN has found thus far are:
Found bugs:
* out-of-bounds access - 97840cb67f ("netfilter: nfnetlink: fix
insufficient validation in nfnetlink_bind")
undefined shifts:
* d48458d4a7 ("jbd2: use a better hash function for the revoke
table")
* 10632008b9 ("clockevents: Prevent shift out of bounds")
* 'x << -1' shift in ext4 -
http://lkml.kernel.org/r/<5444EF21.8020501@samsung.com>
* undefined rol32(0) -
http://lkml.kernel.org/r/<1449198241-20654-1-git-send-email-sasha.levin@oracle.com>
* undefined dirty_ratelimit calculation -
http://lkml.kernel.org/r/<566594E2.3050306@odin.com>
* undefined roundown_pow_of_two(0) -
http://lkml.kernel.org/r/<1449156616-11474-1-git-send-email-sasha.levin@oracle.com>
* [WONTFIX] undefined shift in __bpf_prog_run -
http://lkml.kernel.org/r/<CACT4Y+ZxoR3UjLgcNdUm4fECLMx2VdtfrENMtRRCdgHB2n0bJA@mail.gmail.com>
WONTFIX here because it should be fixed in bpf program, not in kernel.
signed overflows:
* 32a8df4e0b ("sched: Fix odd values in effective_load()
calculations")
* mul overflow in ntp -
http://lkml.kernel.org/r/<1449175608-1146-1-git-send-email-sasha.levin@oracle.com>
* incorrect conversion into rtc_time in rtc_time64_to_tm() -
http://lkml.kernel.org/r/<1449187944-11730-1-git-send-email-sasha.levin@oracle.com>
* unvalidated timespec in io_getevents() -
http://lkml.kernel.org/r/<CACT4Y+bBxVYLQ6LtOKrKtnLthqLHcw-BMp3aqP3mjdAvr9FULQ@mail.gmail.com>
* [NOTABUG] signed overflow in ktime_add_safe() -
http://lkml.kernel.org/r/<CACT4Y+aJ4muRnWxsUe1CMnA6P8nooO33kwG-c8YZg=0Xc8rJqw@mail.gmail.com>
[akpm@linux-foundation.org: fix unused local warning]
[akpm@linux-foundation.org: fix __int128 build woes]
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yury Gribov <y.gribov@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent PAT patchset has caused issue on 32-bit PAE machines:
page:eea45000 count:0 mapcount:-128 mapping: (null) index:0x0 flags: 0x40000000()
page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) < 0)
------------[ cut here ]------------
kernel BUG at /home/build/linux-boris/mm/huge_memory.c:1485!
invalid opcode: 0000 [#1] SMP
[...]
Call Trace:
unmap_single_vma
? __wake_up
unmap_vmas
unmap_region
do_munmap
vm_munmap
SyS_munmap
do_fast_syscall_32
? __do_page_fault
sysenter_past_esp
Code: ...
EIP: [<c11bde80>] zap_huge_pmd+0x240/0x260 SS:ESP 0068:f6459d98
The problem is in pmd_pfn_mask() and pmd_flags_mask(). These
helpers use PMD_PAGE_MASK to calculate resulting mask.
PMD_PAGE_MASK is 'unsigned long', not 'unsigned long long' as
phys_addr_t is on 32-bit PAE (ARCH_PHYS_ADDR_T_64BIT). As a
result, the upper bits of resulting mask get truncated.
pud_pfn_mask() and pud_flags_mask() aren't problematic since we
don't have PUD page table level on 32-bit systems, but it's
reasonable to keep them consistent with PMD counterpart.
Introduce PHYSICAL_PMD_PAGE_MASK and PHYSICAL_PUD_PAGE_MASK in
addition to existing PHYSICAL_PAGE_MASK and reworks helpers to
use them.
Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[ Fix -Woverflow warnings from the realmode code. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jürgen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: linux-mm <linux-mm@kvack.org>
Fixes: f70abb0fc3 ("x86/asm: Fix pud/pmd interfaces to handle large PAT bit")
Link: http://lkml.kernel.org/r/1448878233-11390-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move KASAN_SANITIZE in arch/x86/boot/Makefile above the comment
related to SVGA_MODE, since the comment refers to 'the next line'.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
non-modular by ripping out the module_* code since Kconfig doesn't
allow it to be built as a module anyway - Paul Gortmaker
* Make the x86 efi=debug kernel parameter, which enables EFI debug
code and output, generic and usable by arm64 - Leif Lindholm
* Add support to the x86 EFI boot stub for 64-bit Graphics Output
Protocol frame buffer addresses - Matt Fleming
* Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
in the firmware and set an efi.flags bit so the kernel knows when
it can apply more strict runtime mapping attributes - Ard Biesheuvel
* Auto-load the efi-pstore module on EFI systems, just like we
currently do for the efivars module - Ben Hutchings
* Add "efi_fake_mem" kernel parameter which allows the system's EFI
memory map to be updated with additional attributes for specific
memory ranges. This is useful for testing the kernel code that handles
the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
doesn't include support - Taku Izumi
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWG7OwAAoJEC84WcCNIz1VEEEP/0SsdrwJ66B4MfP5YNjqHYWm
+OTHR6Ovv2i10kc+NjOV/GN8sWPndnkLfIfJ4EqJ9BoQ9PDEYZilV2aleSQ4DrPm
H7uGwBXQkfd76tZKX9pMToK76mkhg6M7M2LR3Suv3OGfOEzuozAOt3Ez37lpksTN
2ByhHr/oGbhu99jC2ki5+k0ySH8PMqDBRxqrPbBzTD+FfB7bM11vAJbSNbSMQ21R
ZwX0acZBLqb9J2Vf7tDsW+fCfz0TFo8JHW8jdLRFm/y2dpquzxswkkBpODgA8+VM
0F5UbiUdkaIRug75I6N/OJ8+yLwdzuxm7ul+tbS3JrXGLAlK3850+dP2Pr5zQ2Ce
zaYGRUy+tD5xMXqOKgzpu+Ia8XnDRLhOlHabiRd5fG6ZC9nR8E9uK52g79voSN07
pADAJnVB03CGV/HdduDOI4C4UykUKubuArbQVkqWJcecV1Jic/tYI0gjeACmU1VF
v8FzXpBUe3U3A0jauOz8PBz8M+k5qky/GbIrnEvXreBtKdt999LN9fykTN7rBOpo
dk/6vTR1Jyv3aYc9EXHmRluktI6KmfWCqmRBOIgQveX1VhdRM+1w2LKC0+8co3dF
v/DBh19KDyfPI8eOvxKykhn164UeAt03EXqDa46wFGr2nVOm/JiShL/d+QuyYU4G
8xb/rET4JrhCG4gFMUZ7
=1Oee
-----END PGP SIGNATURE-----
Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi
Pull v4.4 EFI updates from Matt Fleming:
- Make the EFI System Resource Table (ESRT) driver explicitly
non-modular by ripping out the module_* code since Kconfig doesn't
allow it to be built as a module anyway. (Paul Gortmaker)
- Make the x86 efi=debug kernel parameter, which enables EFI debug
code and output, generic and usable by arm64. (Leif Lindholm)
- Add support to the x86 EFI boot stub for 64-bit Graphics Output
Protocol frame buffer addresses. (Matt Fleming)
- Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
in the firmware and set an efi.flags bit so the kernel knows when
it can apply more strict runtime mapping attributes - Ard Biesheuvel
- Auto-load the efi-pstore module on EFI systems, just like we
currently do for the efivars module. (Ben Hutchings)
- Add "efi_fake_mem" kernel parameter which allows the system's EFI
memory map to be updated with additional attributes for specific
memory ranges. This is useful for testing the kernel code that handles
the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
doesn't include support. (Taku Izumi)
Note: there is a semantic conflict between the following two commits:
8a53554e12 ("x86/efi: Fix multiple GOP device support")
ae2ee627dc ("efifb: Add support for 64-bit frame buffer addresses")
I fixed up the interaction in the merge commit, changing the type of
current_fb_base from u32 to u64.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When multiple GOP devices exists, but none of them implements
ConOut, the code should just choose the first GOP (according to
the comments). But currently 'fb_base' will refer to the last GOP,
while other parameters to the first GOP, which will likely
result in a garbled display.
I can reliably reproduce this bug using my ASRock Z87M Extreme4
motherboard with CSM and integrated GPU disabled, and two PCIe
video cards (NVidia GT640 and GTX980), booting from efi-stub
(booting from grub works fine). On the primary display the
ASRock logo remains and on the secondary screen it is garbled
up completely.
Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1444659236-24837-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The EFI Graphics Output Protocol uses 64-bit frame buffer addresses
but these get truncated to 32-bit by the EFI boot stub when storing
the address in the 'lfb_base' field of 'struct screen_info'.
Add a 'ext_lfb_base' field for the upper 32-bits of the frame buffer
address and set VIDEO_TYPE_CAPABILITY_64BIT_BASE when the field is
useable.
It turns out that the reason no one has required this support so far
is that there's actually code in tianocore to "downgrade" PCI
resources that have option ROMs and 64-bit BARS from 64-bit to 32-bit
to cope with legacy option ROMs that can't handle 64-bit addresses.
The upshot is that basically all GOP devices in the wild use a 32-bit
frame buffer address.
Still, it is possible to build firmware that uses a full 64-bit GOP
frame buffer address. Chad did, which led to him reporting this issue.
Add support in anticipation of GOP devices using 64-bit addresses more
widely, and so that efifb works out of the box when that happens.
Reported-by: Chad Page <chad.page@znyx.com>
Cc: Pete Hawkins <pete.hawkins@znyx.com>
Acked-by: Peter Jones <pjones@redhat.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
When loading x86 64bit kernel above 4GiB with patched grub2, got kernel
gunzip error.
| early console in decompress_kernel
| decompress_kernel:
| input: [0x807f2143b4-0x807ff61aee]
| output: [0x807cc00000-0x807f3ea29b] 0x027ea29c: output_len
| boot via startup_64
| KASLR using RDTSC...
| new output: [0x46fe000000-0x470138cfff] 0x0338d000: output_run_size
| decompress: [0x46fe000000-0x47007ea29b] <=== [0x807f2143b4-0x807ff61aee]
|
| Decompressing Linux... gz...
|
| uncompression error
|
| -- System halted
the new buffer is at 0x46fe000000ULL, decompressor_gzip is using
0xffffffb901ffffff as out_len. gunzip in lib/zlib_inflate/inflate.c cap
that len to 0x01ffffff and decompress fails later.
We could hit this problem with crashkernel booting that uses kexec loading
kernel above 4GiB.
We have decompress_* support:
1. inbuf[]/outbuf[] for kernel preboot.
2. inbuf[]/flush() for initramfs
3. fill()/flush() for initrd.
This bug only affect kernel preboot path that use outbuf[].
Add __decompress and take real out_buf_len for gunzip instead of guessing
wrong buf size.
Fixes: 1431574a1c (lib/decompressors: fix "no limit" output buffer length)
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Jon Medhurst <tixy@linaro.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are two kexec load syscalls, kexec_load another and kexec_file_load.
kexec_file_load has been splited as kernel/kexec_file.c. In this patch I
split kexec_load syscall code to kernel/kexec.c.
And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.
The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use
kexec_load syscall can bypass the checking.
Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.
Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig. Also updated general kernel code with to
kexec_load syscall.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 boot updates from Ingo Molnar:
"The main x86 bootup related changes in this cycle were:
- more boot time optimizations. (Len Brown)
- implement hex output to allow the debugging of early bootup
parameters. (Kees Cook)
- remove obsolete MCA leftovers. (Paolo Pisati)"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted
x86/smpboot: Remove SIPI delays from cpu_up()
x86/smpboot: Remove udelay(100) when polling cpu_callin_map
x86/smpboot: Remove udelay(100) when polling cpu_initialized_map
x86/boot: Obsolete the MCA sys_desc_table
x86/boot: Add hex output for debugging
This reverts commit:
aeffc4928e ("x86/efi: Request desired alignment via the PE/COFF headers")
Linn reports that Signtool complains that kernels built with
CONFIG_EFI_STUB=y are violating the PE/COFF specification because
the 'SizeOfImage' field is not a multiple of 'SectionAlignment'.
This violation was introduced as an optimisation to skip having
the kernel relocate itself during boot and instead have the
firmware place it at a correctly aligned address.
No one else has complained and I'm not aware of any firmware
implementations that refuse to boot with commit aeffc4928e,
but it's a real bug, so revert the offending commit.
Reported-by: Linn Crosetto <linn@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Brown <mbrown@fensystems.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438936621-5215-3-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The efi_info structure stores low 32 bits of memory map
in efi_memmap and high 32 bits in efi_memmap_hi.
While constructing pointer in the setup_e820(), need
to take into account all 64 bit of the pointer.
It is because on 64bit machine the function
efi_get_memory_map() may return full 64bit pointer and before
the patch that pointer was truncated.
The issue is triggered on Parallles virtual machine and
fixed with this patch.
Signed-off-by: Dmitry Skorodumov <sdmitry@parallels.com>
Cc: Denis V. Lunev <den@openvz.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
The kernel does not support the MCA bus anymroe, so mark sys_desc_table
as obsolete: remove any reference from the code together with the remaining
of MCA logic.
bloat-o-meter output:
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-55 (-55)
function old new delta
i386_start_kernel 128 119 -9
setup_arch 1421 1375 -46
Signed-off-by: Paolo Pisati <p.pisati@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437409430-8491-1-git-send-email-p.pisati@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is useful for reporting various addresses or other values
while debugging early boot, for example, the recent kernel image
size vs kernel run size. For example, when
CONFIG_X86_VERBOSE_BOOTUP is set, this is now visible at boot
time:
early console in setup code
early console in decompress_kernel
input_data: 0x0000000001e1526e
input_len: 0x0000000000732236
output: 0x0000000001000000
output_len: 0x0000000001535640
run_size: 0x00000000021fb000
KASLR using RDTSC...
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20150706230620.GA17501@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that there is no paravirt TSC, the "native" is
inappropriate. The function does RDTSC, so give it the obvious
name: rdtsc().
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org
[ Ported it to v4.2-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that the ->read_tsc() paravirt hook is gone, rdtscll() is
just a wrapper around native_read_tsc(). Unwrap it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/d2449ae62c1b1fb90195bcfb19ef4a35883a04dc.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 drivers / enabling modules:
NFIT:
Instantiates an "nvdimm bus" with the core and registers memory devices
(NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware Interface
table). After registering NVDIMMs the NFIT driver then registers
"region" devices. A libnvdimm-region defines an access mode and the
boundaries of persistent memory media. A region may span multiple
NVDIMMs that are interleaved by the hardware memory controller. In
turn, a libnvdimm-region can be carved into a "namespace" device and
bound to the PMEM or BLK driver which will attach a Linux block device
(disk) interface to the memory.
PMEM:
Initially merged in v4.1 this driver for contiguous spans of persistent
memory address ranges is re-worked to drive PMEM-namespaces emitted by
the libnvdimm-core. In this update the PMEM driver, on x86, gains the
ability to assert that writes to persistent memory have been flushed all
the way through the caches and buffers in the platform to persistent
media. See memcpy_to_pmem() and wmb_pmem().
BLK:
This new driver enables access to persistent memory media through "Block
Data Windows" as defined by the NFIT. The primary difference of this
driver to PMEM is that only a small window of persistent memory is
mapped into system address space at any given point in time. Per-NVDIMM
windows are reprogrammed at run time, per-I/O, to access different
portions of the media. BLK-mode, by definition, does not support DAX.
BTT:
This is a library, optionally consumed by either PMEM or BLK, that
converts a byte-accessible namespace into a disk with atomic sector
update semantics (prevents sector tearing on crash or power loss). The
sinister aspect of sector tearing is that most applications do not know
they have a atomic sector dependency. At least today's disk's rarely
ever tear sectors and if they do one almost certainly gets a CRC error
on access. NVDIMMs will always tear and always silently. Until an
application is audited to be robust in the presence of sector-tearing
the usage of BTT is recommended.
Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
Wysocki, and Bob Moore.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVjZGBAAoJEB7SkWpmfYgC4fkP/j+k6HmSRNU/yRYPyo7CAWvj
3P5P1i6R6nMZZbjQrQArAXaIyLlFk4sEQDYsciR6dmslhhFZAkR2eFwVO5rBOyx3
QN0yxEpyjJbroRFUrV/BLaFK4cq2oyJAFFHs0u7/pLHBJ4MDMqfRKAMtlnBxEkTE
LFcqXapSlvWitSbjMdIBWKFEvncaiJ2mdsFqT4aZqclBBTj00eWQvEG9WxleJLdv
+tj7qR/vGcwOb12X5UrbQXgwtMYos7A6IzhHbqwQL8IrOcJ6YB8NopJUpLDd7ZVq
KAzX6ZYMzNueN4uvv6aDfqDRLyVL7qoxM9XIjGF5R8SV9sF2LMspm1FBpfowo1GT
h2QMr0ky1nHVT32yspBCpE9zW/mubRIDtXxEmZZ53DIc4N6Dy9jFaNVmhoWtTAqG
b9pndFnjUzzieCjX5pCvo2M5U6N0AQwsnq76/CasiWyhSa9DNKOg8MVDRg0rbxb0
UvK0v8JwOCIRcfO3qiKcx+02nKPtjCtHSPqGkFKPySRvAdb+3g6YR26CxTb3VmnF
etowLiKU7HHalLvqGFOlDoQG6viWes9Zl+ZeANBOCVa6rL2O7ZnXJtYgXf1wDQee
fzgKB78BcDjXH4jHobbp/WBANQGN/GF34lse8yHa7Ym+28uEihDvSD1wyNLnefmo
7PJBbN5M5qP5tD0aO7SZ
=VtWG
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm
Pull libnvdimm subsystem from Dan Williams:
"The libnvdimm sub-system introduces, in addition to the
libnvdimm-core, 4 drivers / enabling modules:
NFIT:
Instantiates an "nvdimm bus" with the core and registers memory
devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware
Interface table).
After registering NVDIMMs the NFIT driver then registers "region"
devices. A libnvdimm-region defines an access mode and the
boundaries of persistent memory media. A region may span multiple
NVDIMMs that are interleaved by the hardware memory controller. In
turn, a libnvdimm-region can be carved into a "namespace" device and
bound to the PMEM or BLK driver which will attach a Linux block
device (disk) interface to the memory.
PMEM:
Initially merged in v4.1 this driver for contiguous spans of
persistent memory address ranges is re-worked to drive
PMEM-namespaces emitted by the libnvdimm-core.
In this update the PMEM driver, on x86, gains the ability to assert
that writes to persistent memory have been flushed all the way
through the caches and buffers in the platform to persistent media.
See memcpy_to_pmem() and wmb_pmem().
BLK:
This new driver enables access to persistent memory media through
"Block Data Windows" as defined by the NFIT. The primary difference
of this driver to PMEM is that only a small window of persistent
memory is mapped into system address space at any given point in
time.
Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access
different portions of the media. BLK-mode, by definition, does not
support DAX.
BTT:
This is a library, optionally consumed by either PMEM or BLK, that
converts a byte-accessible namespace into a disk with atomic sector
update semantics (prevents sector tearing on crash or power loss).
The sinister aspect of sector tearing is that most applications do
not know they have a atomic sector dependency. At least today's
disk's rarely ever tear sectors and if they do one almost certainly
gets a CRC error on access. NVDIMMs will always tear and always
silently. Until an application is audited to be robust in the
presence of sector-tearing the usage of BTT is recommended.
Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
Wysocki, and Bob Moore"
* tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits)
arch, x86: pmem api for ensuring durability of persistent memory updates
libnvdimm: Add sysfs numa_node to NVDIMM devices
libnvdimm: Set numa_node to NVDIMM devices
acpi: Add acpi_map_pxm_to_online_node()
libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only
pmem: flag pmem block devices as non-rotational
libnvdimm: enable iostat
pmem: make_request cleanups
libnvdimm, pmem: fix up max_hw_sectors
libnvdimm, blk: add support for blk integrity
libnvdimm, btt: add support for blk integrity
fs/block_dev.c: skip rw_page if bdev has integrity
libnvdimm: Non-Volatile Devices
tools/testing/nvdimm: libnvdimm unit test infrastructure
libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory
nd_btt: atomic sector updates
libnvdimm: infrastructure for btt devices
libnvdimm: write blk label set
libnvdimm: write pmem label set
libnvdimm: blk labels and namespace instantiation
...
Linus reported the following new warning on x86 allmodconfig with GCC 5.1:
> ./arch/x86/include/asm/spinlock.h: In function ‘arch_spin_lock’:
> ./arch/x86/include/asm/spinlock.h:119:3: warning: implicit declaration
> of function ‘__ticket_lock_spinning’ [-Wimplicit-function-declaration]
> __ticket_lock_spinning(lock, inc.tail);
> ^
This warning triggers because of these hacks in misc.h:
/*
* we have to be careful, because no indirections are allowed here, and
* paravirt_ops is a kind of one. As it will only run in baremetal anyway,
* we just keep it from happening
*/
#undef CONFIG_PARAVIRT
#undef CONFIG_KASAN
But these hacks were not updated when CONFIG_PARAVIRT_SPINLOCKS was added,
and eventually (with the introduction of queued paravirt spinlocks in
recent kernels) this created an invalid Kconfig combination and broke
the build.
So add a CONFIG_PARAVIRT_SPINLOCKS #undef line as well.
Also remove the _ASM_X86_DESC_H quirk: that undocumented quirk
was originally added ages ago, in:
099e137726 ("x86: use ELF format in compressed images.")
and I went back to that kernel (and fixed up the main Makefile
which didn't build anymore) and checked what failure it
avoided: it avoided an include file dependencies related
build failure related to our old x86-platforms code.
That old code is long gone, the header dependencies got cleaned
up, and the build does not fail anymore with the totality of
asm/desc.h included - so remove the quirk.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ACPI 6.0 formalizes e820-type-7 and efi-type-14 as persistent memory.
Mark it "reserved" and allow it to be claimed by a persistent memory
device driver.
This definition is in addition to the Linux kernel's existing type-12
definition that was recently added in support of shipping platforms with
NVDIMM support that predate ACPI 6.0 (which now classifies type-12 as
OEM reserved).
Note, /proc/iomem can be consulted for differentiating legacy
"Persistent Memory (legacy)" E820_PRAM vs standard "Persistent Memory"
E820_PMEM.
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
EFI variable name - Ross Lagerwall
* Stop erroneously dropping upper 32-bits of boot command line pointer
in EFI boot stub and stash them in ext_cmd_line_ptr - Roy Franz
* Fix double-free bug in error handling code path of EFI runtime map
code - Dan Carpenter
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVSOSjAAoJEC84WcCNIz1VXk4P/R4GwmmzZBdYAseiwv6u/NRm
bTXnK7SN1ZyY8WibEm8ptXJuTIyXZxmQYr4lY97canJy8P7umtoCP7P3tS0Ier8U
N1AMFGes7xlwBhjIRz2Cr9e5plr5H3qk65JNMuUDp0/MVuPEiNEzi6efbL82dh9S
RCLxQ94paX+wV6ltQMKWGD3v0WnHkzouuCdETCGaozqQmJx6PGzDmJ51kXYRWDyP
esTCZpRHlIzKN0u3XEFgswlIev2wab0BtjXYOzUqb0AH1Q13OgQfiswX3WIG6k+c
3xuMH4JByBIDwOLudgu0D6Sst2QwVJZnw6JavoEgGCFao0n6IPzUGolAWLFMdDhL
Kparzc6ObHpiqYtqBjJXW+awOENVS4qIrn9MHc9wwsJxXOy++0YnyYCgge0iia47
F2/pOHvkd52QiQ0gC442W0EdX1VlPCUR04G0s4d3UX3O875yl80QTyLQ4n7ZK074
3wfi/9+Fuv8wWMJ4HI8FJgaTl57KzAP4ZPh2cy8oPs6bkiiwlnMWH24bEhlxKBK4
mEIze045kyswz3rV7j1WX3MSXrPA2cM95L5WlvVTxckMn40QwLPBWSDCOJIj3K5K
yhXNHHfHzG/GRm3SfD2i1EcK4gUW82awl72jJn0F69YMI5a+T1BIppEMP2pzsWE4
FcwvWDxzWwKxYKJosfkk
=f7a2
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:
* Avoid garbage names in efivarfs due to buggy firmware by zeroing
EFI variable name. (Ross Lagerwall)
* Stop erroneously dropping upper 32 bits of boot command line pointer
in EFI boot stub and stash them in ext_cmd_line_ptr. (Roy Franz)
* Fix double-free bug in error handling code path of EFI runtime map
code. (Dan Carpenter)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Until now, the EFI stub was only setting the 32 bit cmd_line_ptr in
the setup_header structure, so on 64 bit platforms this could be truncated.
This patch adds setting the upper bits of the buffer address in
ext_cmd_line_ptr. This case was likely never hit, as the allocation
for this buffer is done at the lowest available address. Only
x86_64 kernels have this problem, as the 1-1 mapping mandated
by EFI ensures that all memory is 32 bit addressable on 32 bit
platforms. The EFI stub does not support mixed mode, so the
32 bit kernel on 64 bit firmware case does not need to be handled.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Pull x86 boot changes from Ingo Molnar:
"A number of cleanups"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Standardize strcmp()
x86/boot/64: Remove pointless early_printk() message
x86/boot/video: Move the 'video_segment' variable to video.c
Commit:
e2b32e6785 ("x86, kaslr: randomize module base load address")
made module base address randomization unconditional and didn't regard
disabled KKASLR due to CONFIG_HIBERNATION and command line option
"nokaslr". For more info see (now reverted) commit:
f47233c2d3 ("x86/mm/ASLR: Propagate base load address calculation")
In order to propagate KASLR status to kernel proper, we need a single bit
in boot_params.hdr.loadflags and we've chosen bit 1 thus leaving the
top-down allocated bits for bits supposed to be used by the bootloader.
Originally-From: Jiri Kosina <jkosina@suse.cz>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
strcmp() is always expected to return 0 when arguments are equal,
negative when its first argument @str1 is less than its second argument
@str2 and a positive value otherwise. Previously strcmp("a", "b")
returned 1. Now it gives -1, as it is supposed to.
Until now this bug never triggered, because all uses for strcmp() in the
boot code tested for nonzero:
triton:~/tip> git grep strcmp arch/x86/boot/
arch/x86/boot/boot.h:int strcmp(const char *str1, const char *str2);
arch/x86/boot/edd.c: if (!strcmp(eddarg, "skipmbr") || !strcmp(eddarg, "skip")) {
arch/x86/boot/edd.c: else if (!strcmp(eddarg, "off"))
arch/x86/boot/edd.c: else if (!strcmp(eddarg, "on"))
should in the future strcmp() be used in a comparative way in the boot
code, it might have led to (not so subtle) bugs.
Signed-off-by: Arjun Sreedharan <arjun024@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426520267-1803-1-git-send-email-arjun024@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This reverts commit:
f47233c2d3 ("x86/mm/ASLR: Propagate base load address calculation")
The main reason for the revert is that the new boot flag does not work
at all currently, and in order to make this work, we need non-trivial
changes to the x86 boot code which we didn't manage to get done in
time for merging.
And even if we did, they would've been too risky so instead of
rushing things and break booting 4.1 on boxes left and right, we
will be very strict and conservative and will take our time with
this to fix and test it properly.
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/20150316100628.GD22995@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull misc x86 fixes from Ingo Molnar:
"This contains:
- EFI fixes
- a boot printout fix
- ASLR/kASLR fixes
- intel microcode driver fixes
- other misc fixes
Most of the linecount comes from an EFI revert"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/ASLR: Avoid PAGE_SIZE redefinition for UML subarch
x86/microcode/intel: Handle truncated microcode images more robustly
x86/microcode/intel: Guard against stack overflow in the loader
x86, mm/ASLR: Fix stack randomization on 64-bit systems
x86/mm/init: Fix incorrect page size in init_memory_mapping() printks
x86/mm/ASLR: Propagate base load address calculation
Documentation/x86: Fix path in zero-page.txt
x86/apic: Fix the devicetree build in certain configs
Revert "efi/libstub: Call get_memory_map() to obtain map and desc sizes"
x86/efi: Avoid triple faults during EFI mixed mode calls
Pull ASLR and kASLR fixes from Borislav Petkov:
- Add a global flag announcing KASLR state so that relevant code can do
informed decisions based on its setting. (Jiri Kosina)
- Fix a stack randomization entropy decrease bug. (Hector Marco-Gisbert)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
e2b32e6785 ("x86, kaslr: randomize module base load address")
makes the base address for module to be unconditionally randomized in
case when CONFIG_RANDOMIZE_BASE is defined and "nokaslr" option isn't
present on the commandline.
This is not consistent with how choose_kernel_location() decides whether
it will randomize kernel load base.
Namely, CONFIG_HIBERNATION disables kASLR (unless "kaslr" option is
explicitly specified on kernel commandline), which makes the state space
larger than what module loader is looking at. IOW CONFIG_HIBERNATION &&
CONFIG_RANDOMIZE_BASE is a valid config option, kASLR wouldn't be applied
by default in that case, but module loader is not aware of that.
Instead of fixing the logic in module.c, this patch takes more generic
aproach. It introduces a new bootparam setup data_type SETUP_KASLR and
uses that to pass the information whether kaslr has been applied during
kernel decompression, and sets a global 'kaslr_enabled' variable
accordingly, so that any kernel code (module loading, livepatching, ...)
can make decisions based on its value.
x86 module loader is converted to make use of this flag.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Link: https://lkml.kernel.org/r/alpine.LNX.2.00.1502101411280.10719@pobox.suse.cz
[ Always dump correct kaslr status when panicking ]
Signed-off-by: Borislav Petkov <bp@suse.de>
There is already defined macro KEEP_SEGMENTS in
<asm/bootparam.h>, let's use it instead of hardcoded
constants.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1424331298-7456-1-git-send-email-kuleshovmail@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
video.c is the only real user of the 'video_segment' variable,
so move it to video.c and make it static.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Martin Mares <mj@ucw.cz>
Link: http://lkml.kernel.org/r/1422123092-28750-1-git-send-email-kuleshovmail@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
calls to avoid triple faults if an NMI/MCE is received.
* Revert Ard's change to the libstub get_memory_map() that went into
the v3.20 merge window because it causes boot regressions on Qemu and
Xen.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU5HnBAAoJEC84WcCNIz1Vln4P/08oA3V/GziY2cJGn1TBtBkA
M7GzElF3ceojRVlfp7tVQpmT9oXqXsUB9ccqlOMKF8EO1s7EKnguVV/nkIah/5wu
HIPYFgDC2s57cf7MGA53xOjTJNSu6PzTkUsJF7KhtZmb0c/GyWMN38Ggsdi7zuA3
5WD3O1CSLn77IqRXYtCr8aimQI3QJeTnN8IXQxyoDmQ8XK2uV1qRwFI96+2AjFJM
EG+xw+p6DCpoJXbci1kZxPT/iV5P8fLwzvcRT2G/qMa5RqWALLN5VeP3yBe+VqPU
s9PQK8jLablQcsglIemPHRnfLcOWz13yEx5Z6S1lyzOJrQsk7rp9dLO7GKiK22ex
1CPu+Cudk1ETn+pDyjADl6wcvZJfh1krnD4Gzm6VsSUWC924/sovYvH67sPSWc5a
RxylE4pSuHYADnQZh1YqH719KMWpMKb+9UeYstq3PfebeyKJkAqXqPBTWRldI1N7
YWLweED35dg3mN8g8mYEmiIOXe/dYNoaWJw0m2FrEMxJ5x4Z1ukDSccFN5+pAkn/
Nn1wXxGzt0sU40+t5bYA6CnJAEsU1pP/kPcZeqNzB3AGYIiA+rtH45jQMVO4xorM
BA2COqRrC+UdUHbiy5IgF6EGxIKy4Yb+aE/EvEd8e4GborI4in6mK8xKAllvr4+M
hD5nwJviAXQkviZZeOq5
=I/nB
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:
" - Leave a valid 64-bit IDT installed during runtime EFI mixed mode
calls to avoid triple faults if an NMI/MCE is received.
- Revert Ard's change to the libstub get_memory_map() that went into
the v3.20 merge window because it causes boot regressions on Qemu and
Xen. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Recently instrumentation of builtin functions calls was removed from GCC
5.0. To check the memory accessed by such functions, userspace asan
always uses interceptors for them.
So now we should do this as well. This patch declares
memset/memmove/memcpy as weak symbols. In mm/kasan/kasan.c we have our
own implementation of those functions which checks memory before accessing
it.
Default memset/memmove/memcpy now now always have aliases with '__'
prefix. For files that built without kasan instrumentation (e.g.
mm/slub.c) original mem* replaced (via #define) with prefixed variants,
cause we don't want to check memory accesses there.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds arch specific code for kernel address sanitizer.
16TB of virtual addressed used for shadow memory. It's located in range
[ffffec0000000000 - fffffc0000000000] between vmemmap and %esp fixup
stacks.
At early stage we map whole shadow region with zero page. Latter, after
pages mapped to direct mapping address range we unmap zero pages from
corresponding shadow (see kasan_map_shadow()) and allocate and map a real
shadow memory reusing vmemmap_populate() function.
Also replace __pa with __pa_nodebug before shadow initialized. __pa with
CONFIG_DEBUG_VIRTUAL=y make external function call (__phys_addr)
__phys_addr is instrumented, so __asan_load could be called before shadow
area initialized.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy pointed out that if an NMI or MCE is received while we're in the
middle of an EFI mixed mode call a triple fault will occur. This can
happen, for example, when issuing an EFI mixed mode call while running
perf.
The reason for the triple fault is that we execute the mixed mode call
in 32-bit mode with paging disabled but with 64-bit kernel IDT handlers
installed throughout the call.
At Andy's suggestion, stop playing the games we currently do at runtime,
such as disabling paging and installing a 32-bit GDT for __KERNEL_CS. We
can simply switch to the __KERNEL32_CS descriptor before invoking
firmware services, and run in compatibility mode. This way, if an
NMI/MCE does occur the kernel IDT handler will execute correctly, since
it'll jump to __KERNEL_CS automatically.
However, this change is only possible post-ExitBootServices(). Before
then the firmware "owns" the machine and expects for its 32-bit IDT
handlers to be left intact to service interrupts, etc.
So, we now need to distinguish between early boot and runtime
invocations of EFI services. During early boot, we need to restore the
GDT that the firmware expects to be present. We can only jump to the
__KERNEL32_CS code segment for mixed mode calls after ExitBootServices()
has been invoked.
A liberal sprinkling of comments in the thunking code should make the
differences in early and late environments more apparent.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Commit e6023367d7 ("x86, kaslr: Prevent .bss from overlaping initrd")
added Perl to the required build environment. This reimplements in
shell the Perl script used to find the size of the kernel with bss and
brk added.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Rob Landley <rob@landley.net>
Acked-by: Rob Landley <rob@landley.net>
Cc: Anca Emanuel <anca.emanuel@gmail.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On 64-bit, relocation is not required unless the load address gets
changed. Without this, relocations do unexpected things when the kernel
is above 4G.
Reported-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Thomas D. <whissi@whissi.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20150116005146.GA4212@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit 9def39be4e ("x86: Support compiling out human-friendly
processor feature names") made two source file targets
conditional. Such conditional targets will not be cleaned
automatically by make mrproper.
Fix by adding explicit clean-files targets for the two files.
Fixes: 9def39be4e ("x86: Support compiling out human-friendly processor feature names")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/1419335863-10608-1-git-send-email-bjorn@mork.no
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull EFI updates from Ingo Molnar:
"Changes in this cycle are:
- support module unload for efivarfs (Mathias Krause)
- another attempt at moving x86 to libstub taking advantage of the
__pure attribute (Ard Biesheuvel)
- add EFI runtime services section to ptdump (Mathias Krause)"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, ptdump: Add section for EFI runtime services
efi/x86: Move x86 back to libstub
efivarfs: Allow unloading when build as module
Pull x86 boot and percpu updates from Ingo Molnar:
"This tree contains a bootable images documentation update plus three
slightly misplaced x86/asm percpu changes/optimizations"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-64: Use RIP-relative addressing for most per-CPU accesses
x86-64: Handle PC-relative relocations on per-CPU data
x86: Convert a few more per-CPU items to read-mostly ones
x86, boot: Document intermediates more clearly
commit e6023367d7 'x86, kaslr: Prevent .bss from overlaping initrd'
broke the cross compile of x86. It added a objdump invocation, which
invokes the host native objdump and ignores an active cross tool
chain.
Use $(OBJDUMP) instead which takes the CROSS_COMPILE prefix into
account.
[ tglx: Massage changelog and use $(OBJDUMP) ]
Fixes: e6023367d7 'x86, kaslr: Prevent .bss from overlaping initrd'
Signed-off-by: Chris Clayton <chris2553@googlemail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/54705C8E.1080400@googlemail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This reverts commit 84be880560, which itself reverted my original
attempt to move x86 from #include'ing .c files from across the tree
to using the EFI stub built as a static library.
The issue that affected the original approach was that splitting
the implementation into several .o files resulted in the variable
'efi_early' becoming a global with external linkage, which under
-fPIC implies that references to it must go through the GOT. However,
dealing with this additional GOT entry turned out to be troublesome
on some EFI implementations. (GCC's visibility=hidden attribute is
supposed to lift this requirement, but it turned out not to work on
the 32-bit build.)
Instead, use a pure getter function to get a reference to efi_early.
This approach results in no additional GOT entries being generated,
so there is no need for any changes in the early GOT handling.
Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This adds a comment detailing the various intermediate files used to build
the bootable decompression image for the x86 kernel.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Link: http://lkml.kernel.org/r/20141031162204.GA26268@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When choosing a random address, the current implementation does not take into
account the reversed space for .bss and .brk sections. Thus the relocated kernel
may overlap other components in memory. Here is an example of the overlap from a
x86_64 kernel in qemu (the ranges of physical addresses are presented):
Physical Address
0x0fe00000 --+--------------------+ <-- randomized base
/ | relocated kernel |
vmlinux.bin | (from vmlinux.bin) |
0x1336d000 (an ELF file) +--------------------+--
\ | | \
0x1376d870 --+--------------------+ |
| relocs table | |
0x13c1c2a8 +--------------------+ .bss and .brk
| | |
0x13ce6000 +--------------------+ |
| | /
0x13f77000 | initrd |--
| |
0x13fef374 +--------------------+
The initrd image will then be overwritten by the memset during early
initialization:
[ 1.655204] Unpacking initramfs...
[ 1.662831] Initramfs unpacking failed: junk in compressed archive
This patch prevents the above situation by requiring a larger space when looking
for a random kernel base, so that existing logic can effectively avoids the
overlap.
[kees: switched to perl to avoid hex translation pain in mawk vs gawk]
[kees: calculated overlap without relocs table]
Fixes: 82fa9637a2 ("x86, kaslr: Select random position from e820 maps")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Junjie Mao <eternal.n08@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1414762838-13067-1-git-send-email-eternal.n08@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 EFI updates from Peter Anvin:
"This patchset falls under the "maintainers that grovel" clause in the
v3.18-rc1 announcement. We had intended to push it late in the merge
window since we got it into the -tip tree relatively late.
Many of these are relatively simple things, but there are a couple of
key bits, especially Ard's and Matt's patches"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
rtc: Disable EFI rtc for x86
efi: rtc-efi: Export platform:rtc-efi as module alias
efi: Delete the in_nmi() conditional runtime locking
efi: Provide a non-blocking SetVariable() operation
x86/efi: Adding efi_printks on memory allocationa and pci.reads
x86/efi: Mark initialization code as such
x86/efi: Update comment regarding required phys mapped EFI services
x86/efi: Unexport add_efi_memmap variable
x86/efi: Remove unused efi_call* macros
efi: Resolve some shadow warnings
arm64: efi: Format EFI memory type & attrs with efi_md_typeattr_format()
ia64: efi: Format EFI memory type & attrs with efi_md_typeattr_format()
x86: efi: Format EFI memory type & attrs with efi_md_typeattr_format()
efi: Introduce efi_md_typeattr_format()
efi: Add macro for EFI_MEMORY_UCE memory attribute
x86/efi: Clear EFI_RUNTIME_SERVICES if failing to enter virtual mode
arm64/efi: Do not enter virtual mode if booting with efi=noruntime or noefi
arm64/efi: uefi_init error handling fix
efi: Add kernel param efi=noruntime
lib: Add a generic cmdline parse function parse_option_str
...
Pull x86 fixes from Ingo Molnar:
"Misc smaller fixes that missed the v3.17 cycle"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Add arch/x86/purgatory/ make generated files to gitignore
x86: Fix section conflict for numachip
x86: Reject x32 executables if x32 ABI not supported
x86_64, entry: Filter RFLAGS.NT on entry from userspace
x86, boot, kaslr: Fix nuisance warning on 32-bit builds
Pull x86 cpufeature updates from Ingo Molnar:
"This tree includes the following changes:
- Introduce DISABLED_MASK to list disabled CPU features, to simplify
CPU feature handling and avoid excessive #ifdefs
- Remove the lightly used cpu_has_pae() primitive"
* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Add more disabled features
x86: Introduce disabled-features
x86: Axe the lightly-used cpu_has_pae
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJUL0J0AAoJEA7Zo9+K/4c9w40P/iMFPfCethdBtPz5rI88CVr2
7yU99TdbEPoRJm+rU4ohvHdB73p2KWINIKvpSThvegvjXbEcKxQkdpVWHsFJZeHS
bZiYmhjxdCBvJGLrYo5IwqH0PrSjokTPzMUekUCk7BkUKNJRaDjfUBHvUmKsinUR
dQL+3KE3edy6W3DL+FOd0QZwSOgmOfEibTWpfmg+n16kFNa75Kg/QLwjYRvtQplP
eElywDZN07IhAeBFqKhKvlKmDSAeqMd8RfoPPo9Ts+reeIrWYjVNbl9ISOqXqy2x
JoLeZQmwSXj/C9Ehr5e+aId2eO8In5xueQfXP8SS8dCC7VLwRbnNgyAQQZEslEBk
QH0GhT6GqTamBdiNI3I+usfs65cEaialXh2afcoLwGS/iGD8MhZ8Dt+m4iyXNxEZ
kT9VA4974mPjJ1g0mDDnYIxNjxF43m+SD5K1sR/XGpMcA8NdqMUmvKNcbePCobVa
WTutIemQqGipNeWE94XwZEbc0B+aWwH7eiZOBMVGhWsHInd7QeTBTbfZlctyBkzf
AswgsFjC5FW05CWK6J1Lf/UI1FD9PmHMKpmQUPED1+7okDTfqGjKjdREWgZSixUt
LIRfWqWEaNpRRBFbDyt0C+F4pBRPLiRDaOyNhwEdtXuVGKRXb1G3qX7nFOJAZo6G
GDTZo9iIRNSfm/M4tJ+n
=2VyW
-----END PGP SIGNATURE-----
Merge tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux
Pull "tinification" patches from Josh Triplett.
Work on making smaller kernels.
* tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux:
bloat-o-meter: Ignore syscall aliases SyS_ and compat_SyS_
mm: Support compiling out madvise and fadvise
x86: Support compiling out human-friendly processor feature names
x86: Drop support for /proc files when !CONFIG_PROC_FS
x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK
x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE
x86, boot: Use the usual -y -n mechanism for objects in vmlinux
x86: Add "make tinyconfig" to configure the tiniest possible kernel
x86, platform, kconfig: move kvmconfig functionality to a helper
All other calls to allocate memory seem to make some noise already, with the
exception of two calls (for gop, uga) in the setup_graphics path.
The purpose is to be noisy on worrysome errors immediately.
commit fb86b2440d ("x86/efi: Add better error logging to EFI boot
stub") introduces printing false alarms for lots of hardware. Rather
than playing Whack a Mole with non-fatal exit conditions, try the other
way round.
This is per Matt Fleming's suggestion:
> Where I think we could improve things
> is by adding efi_printk() message in certain error paths. Clearly, not
> all error paths need such messages, e.g. the EFI_INVALID_PARAMETER path
> you highlighted above, but it makes sense for memory allocation and PCI
> read failures.
Link: http://article.gmane.org/gmane.linux.kernel.efi/4628
Signed-off-by: Andre Müller <andre.muller@web.de>
Cc: Ulf Winkelvos <ulf@winkelvos.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
We need a way to customize the behaviour of the EFI boot stub, in
particular, we need a way to disable the "chunking" workaround, used
when reading files from the EFI System Partition.
One of my machines doesn't cope well when reading files in 1MB chunks to
a buffer above the 4GB mark - it appears that the "chunking" bug
workaround triggers another firmware bug. This was only discovered with
commit 4bf7111f50 ("x86/efi: Support initrd loaded above 4G"), and
that commit is perfectly valid. The symptom I observed was a corrupt
initrd rather than any kind of crash.
efi= is now used to specify EFI parameters in two very different
execution environments, the EFI boot stub and during kernel boot.
There is also a slight performance optimization by enabling efi=nochunk,
but that's offset by the fact that you're more likely to run into
firmware issues, at least on x86. This is the rationale behind leaving
the workaround enabled by default.
Also provide some documentation for EFI_READ_CHUNK_SIZE and why we're
using the current value of 1MB.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Roy Franz <roy.franz@linaro.org>
Cc: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Building 32-bit threw a warning on kASLR enabled builds:
arch/x86/boot/compressed/aslr.c: In function ‘mem_avoid_overlap’:
arch/x86/boot/compressed/aslr.c:198:17: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
avoid.start = (u64)ptr;
^
This fixes the warning; unsigned long should have been used here.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20141001183632.GA11431@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pull x86 fixes from Ingo Molnar:
"This has:
- EFI revert to fix a boot regression
- early_ioremap() fix for boot failure
- KASLR fix for possible boot failures
- EFI fix for corrupted string printing
- remove a misleading EFI bootup 'failed!' error message
Unfortunately it's all rather close to the merge window"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Truncate 64-bit values when calling 32-bit OutputString()
x86/efi: Delete misleading efi_printk() error message
Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>"
x86/kaslr: Avoid the setup_data area when picking location
x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
causing issues for Macbooks and Fedora + Grub2 - Matt Fleming
* Delete the misleading "setup_efi_pci() failed!" message which some
people are seeing when booting EFI - Matt Fleming
* Fix printing strings from the 32-bit EFI boot stub by only passing
32-bit addresses to the firmware - Matt Fleming
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUIzOFAAoJEC84WcCNIz1VpSwQAJhp9Yu60dWXyqRpV+ER1yau
MWHXunkeaccbBXNABkzuDUqb2a6DRIJow+/n+dYjIGY6Nf9zzLQFQ+s/EsS+IiyY
s4rRvAqzfGYk1d6xzgvEccrdD4fRP32kFKnSlZpfTRuJZHZieD+f2y6TP7D33Ja3
HV/ivPQHZNjxgsExpcE8Rz/QyOZpqRacTr9Gr7IusBRL6IMCyycfCTmt6d5pf1iD
kQGSGwIBvgN4xMqPUdxTNo31bQZA5ZeywNOh9WhdSCL7FAIDfG9TmXt/J7ckq5ax
0f6X92qgCs3peLY+/szgSZ2LsZI6I/FM1udc2SFiIOPvCwwytcJ94Wro5pLYzZ4i
SnqB2xLLEmsR2J3MXIeY0aVy2VtHT4bYRnXYNd9G0eaVfrlJ+4lgwqJavAJmtDZx
88ey1R8LKRQr+ueSv/BnOvE6T2+38HrjrMooFQsPvolRR0S6MITBr8I2hoRASkUt
YsA+7s6+tO2QBmQYrKCYSAi9A7onMA9Fh93dmv7XLqFw/SsfVm3RnrNhOVsO9kPC
zIsWZoS+PGwb4RRvM2i7JAEqUCbuLpIAHYEU6gqprWm1ERHsX9mfFYfsJQHzHuOY
rg6+wtWQ9MGxek8POac4d2mC+PwC4DA0AkaTTZQBcdAJu+h/gZNt4w7mpk5v4Th1
QYr/otShMvc84Zd+RMeV
=5mVJ
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:
* Revert the static library changes from the merge window since they're
causing issues for Macbooks and Fedora + Grub2 (Matt Fleming)
* Delete the misleading "setup_efi_pci() failed!" message which some
people are seeing when booting EFI (Matt Fleming)
* Fix printing strings from the 32-bit EFI boot stub by only passing
32-bit addresses to the firmware (Matt Fleming)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If we're executing the 32-bit efi_char16_printk() code path (i.e.
running on top of 32-bit firmware) we know that efi_early->text_output
will be a 32-bit value, even though ->text_output has type u64.
Unfortunately, we currently pass ->text_output directly to
efi_early->call() so for CONFIG_X86_32 the compiler will push a 64-bit
value onto the stack, causing the other parameters to be misaligned.
The way we handle this in the rest of the EFI boot stub is to pass
pointers as arguments to efi_early->call(), which automatically do the
right thing (pointers are 32-bit on CONFIG_X86_32, and we simply ignore
the upper 32-bits of the argument register if running in 64-bit mode
with 32-bit firmware).
This fixes a corruption bug when printing strings from the 32-bit EFI
boot stub.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=84241
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
A number of people are reporting seeing the "setup_efi_pci() failed!"
error message in what used to be a quiet boot,
https://bugzilla.kernel.org/show_bug.cgi?id=81891
The message isn't all that helpful because setup_efi_pci() can return a
non-success error code for a variety of reasons, not all of them fatal.
Let's drop the return code from setup_efi_pci*() altogether, since
there's no way to process it in any meaningful way outside of the inner
__setup_efi_pci*() functions.
Reported-by: Darren Hart <dvhart@linux.intel.com>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Ulf Winkelvos <ulf@winkelvos.de>
Cc: Andre Müller <andre.muller@web.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This reverts commit f23cf8bd5c ("efi/x86: efistub: Move shared
dependencies to <asm/efi.h>") as well as the x86 parts of commit
f4f75ad574 ("efi: efistub: Convert into static library").
The road leading to these two reverts is long and winding.
The above two commits were merged during the v3.17 merge window and
turned the common EFI boot stub code into a static library. This
necessitated making some symbols global in the x86 boot stub which
introduced new entries into the early boot GOT.
The problem was that we weren't fixing up the newly created GOT entries
before invoking the EFI boot stub, which sometimes resulted in hangs or
resets. This failure was reported by Maarten on his Macbook pro.
The proposed fix was commit 9cb0e39423 ("x86/efi: Fixup GOT in all
boot code paths"). However, that caused issues for Linus when booting
his Sony Vaio Pro 11. It was subsequently reverted in commit
f3670394c2.
So that leaves us back with Maarten's Macbook pro not booting.
At this stage in the release cycle the least risky option is to revert
the x86 EFI boot stub to the pre-merge window code structure where we
explicitly #include efi-stub-helper.c instead of linking with the static
library. The arm64 code remains unaffected.
We can take another swing at the x86 parts for v3.18.
Conflicts:
arch/x86/include/asm/efi.h
Tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Tested-by: Leif Lindholm <leif.lindholm@linaro.org> [arm64]
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This reverts commit 9cb0e39423.
It causes my Sony Vaio Pro 11 to immediately reboot at startup.
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The KASLR location-choosing logic needs to avoid the setup_data
list memory areas as well. Without this, it would be possible to
have the ASLR position stomp on the memory, ultimately causing
the boot to fail.
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Baoquan He <bhe@redhat.com>
Cc: stable@vger.kernel.org
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140911161931.GA12001@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I believe the REQUIRED_MASK aproach was taken so that it was
easier to consult in assembly (arch/x86/kernel/verify_cpu.S).
DISABLED_MASK does not have the same restriction, but I
implemented it the same way for consistency.
We have a REQUIRED_MASK... which does two things:
1. Keeps a list of cpuid bits to check in very early boot and
refuse to boot if those are not present.
2. Consulted during cpu_has() checks, which allows us to
optimize out things at compile-time. In other words, if we
*KNOW* we will not boot with the feature off, then we can
safely assume that it will be present forever.
But, we don't have a similar mechanism for CPU features which
may be present but that we know we will not use. We simply
use our existing mechanisms to repeatedly check the status of
the bit at runtime (well, the alternatives patching helps here
but it does not provide compile-time optimization).
Adding a feature to disabled-features.h allows the bit to be
checked via a new macro: cpu_feature_enabled(). Note that
for features in DISABLED_MASK, checks with this macro have
all of the benefits of an #ifdef. Before, we would have done
this in a header:
#ifdef CONFIG_X86_INTEL_MPX
#define cpu_has_mpx cpu_has(X86_FEATURE_MPX)
#else
#define cpu_has_mpx 0
#endif
and this in the code:
if (cpu_has_mpx)
do_some_mpx_thing();
Now, just add your feature to DISABLED_MASK and you can do this
everywhere, and get the same benefits you would have from
#ifdefs:
if (cpu_feature_enabled(X86_FEATURE_MPX))
do_some_mpx_thing();
We need a new function and *not* a modification to cpu_has()
because there are cases where we actually need to check the CPU
itself, despite what features the kernel supports. The best
example of this is a hypervisor which has no control over what
features its guests are using and where the guest does not depend
on the host for support.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140911211513.9E35E931@viggo.jf.intel.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Maarten reported that his Macbook pro 8.2 stopped booting after commit
f23cf8bd5c ("efi/x86: efistub: Move shared dependencies to
<asm/efi.h>"), the main feature of which is changing the visibility of
symbol 'efi_early' from local to global.
By making 'efi_early' global we end up requiring an entry in the Global
Offset Table. Unfortunately, while we do include code to fixup GOT
entries in the early boot code, it's only called after we've executed
the EFI boot stub.
What this amounts to is that references to 'efi_early' in the EFI boot
stub don't point to the correct place.
Since we've got multiple boot entry points we need to be prepared to
fixup the GOT in multiple places, while ensuring that we never do it
more than once, otherwise the GOT entries will still point to the wrong
place.
Reported-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Mantas found that after commit 4bf7111f50 ("x86/efi: Support initrd
loaded above 4G"), the kernel freezes at the earliest possible moment
when trying to boot via UEFI on Asus laptop.
Revert to old way to load initrd under 4G on first try, second try will
use above 4G buffer when initrd is too big and does not fit under 4G.
[ The cause of the freeze appears to be a firmware bug when reading
file data into buffers above 4GB, though the exact reason is unknown.
Mantas reports that the hang can be avoid if the file size is a
multiple of 512 bytes, but I've seen some ASUS firmware simply
corrupting the file data rather than freezing.
Laszlo fixed an issue in the upstream EDK2 DiskIO code in Aug 2013
which may possibly be related, commit 4e39b75e ("MdeModulePkg/DiskIoDxe:
fix source/destination pointer of overrun transfer").
Whatever the cause, it's unlikely that a fix will be forthcoming
from the vendor, hence the workaround - Matt ]
Cc: Laszlo Ersek <lersek@redhat.com>
Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Reported-by: Harald Hoyer <harald@redhat.com>
Tested-by: Anders Darander <anders@chargestorm.se>
Tested-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
The table mapping CPUID bits to human-readable strings takes up a
non-trivial amount of space, and only exists to support /proc/cpuinfo
and a couple of kernel messages. Since programs depend on the format of
/proc/cpuinfo, force inclusion of the table when building with /proc
support; otherwise, support omitting that table to save space, in which
case the kernel messages will print features numerically instead.
In addition to saving 1408 bytes out of vmlinux, this also saves 1373
bytes out of the uncompressed setup code, which contributes directly to
the size of bzImage.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
All the code in early_serial_console.c gets compiled out if
!CONFIG_EARLY_PRINTK, but early_serial_console.o itself still gets
compiled in. Eliminate it from the compile entirely in that case.
This does not change the generated code at all, in either case.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
All the code in aslr.c gets compiled out if !CONFIG_RANDOMIZE_BASE, but
aslr.o itself still gets compiled in. Eliminate it from the compile
entirely in that case.
This does not change the generated code at all, in either case.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Switch VMLINUX_OBJS to vmlinux-objs-y, to eliminate Makefile
conditionals in favor of vmlinux-objs-$(CONFIG_*) constructs.
This does not change the generated code at all.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Pull EFI changes from Ingo Molnar:
"Main changes in this cycle are:
- arm64 efi stub fixes, preservation of FP/SIMD registers across
firmware calls, and conversion of the EFI stub code into a static
library - Ard Biesheuvel
- Xen EFI support - Daniel Kiper
- Support for autoloading the efivars driver - Lee, Chun-Yi
- Use the PE/COFF headers in the x86 EFI boot stub to request that
the stub be loaded with CONFIG_PHYSICAL_ALIGN alignment - Michael
Brown
- Consolidate all the x86 EFI quirks into one file - Saurabh Tangri
- Additional error logging in x86 EFI boot stub - Ulf Winkelvos
- Support loading initrd above 4G in EFI boot stub - Yinghai Lu
- EFI reboot patches for ACPI hardware reduced platforms"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
efi/arm64: Handle missing virtual mapping for UEFI System Table
arch/x86/xen: Silence compiler warnings
xen: Silence compiler warnings
x86/efi: Request desired alignment via the PE/COFF headers
x86/efi: Add better error logging to EFI boot stub
efi: Autoload efivars
efi: Update stale locking comment for struct efivars
arch/x86: Remove efi_set_rtc_mmss()
arch/x86: Replace plain strings with constants
xen: Put EFI machinery in place
xen: Define EFI related stuff
arch/x86: Remove redundant set_bit(EFI_MEMMAP) call
arch/x86: Remove redundant set_bit(EFI_SYSTEM_TABLES) call
efi: Introduce EFI_PARAVIRT flag
arch/x86: Do not access EFI memory map if it is not available
efi: Use early_mem*() instead of early_io*()
arch/ia64: Define early_memunmap()
x86/reboot: Add EFI reboot quirk for ACPI Hardware Reduced flag
efi/reboot: Allow powering off machines using EFI
efi/reboot: Add generic wrapper around EfiResetSystem()
...
Pull x86 build/cleanup/debug updates from Ingo Molnar:
"Robustify the build process with a quirk to avoid GCC reordering
related bugs.
Two code cleanups.
Simplify entry_64.S CFI annotations, by Jan Beulich"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, build: Change code16gcc.h from a C header to an assembly header
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Simplify __HAVE_ARCH_CMPXCHG tests
x86/tsc: Get rid of custom DIV_ROUND() macro
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/debug: Drop several unnecessary CFI annotations
The EFI boot stub goes to great pains to relocate the kernel image to
an appropriately aligned address, as indicated by the ->kernel_alignment
field in the bzImage header. However, for the PE stub entry case, we
can request that the EFI PE/COFF loader do the work for us.
Fix by exposing the desired alignment via the SectionAlignment field
in the PE/COFF headers. Despite its name, this field provides an
overall alignment requirement for the loaded file. (Naturally, the
FileAlignment field describes the alignment for individual sections.)
There is no way in the PE/COFF headers to express the concept of
min_alignment; we therefore do not expose the minimum (as opposed to
preferred) alignment.
Signed-off-by: Michael Brown <mbrown@fensystems.co.uk>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Hopefully this will enable us to better debug:
https://bugzilla.kernel.org/show_bug.cgi?id=68761
Signed-off-by: Ulf Winkelvos <ulf@winkelvos.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This patch changes both x86 and arm64 efistub implementations
from #including shared .c files under drivers/firmware/efi to
building shared code as a static library.
The x86 code uses a stub built into the boot executable which
uncompresses the kernel at boot time. In this case, the library is
linked into the decompressor.
In the arm64 case, the stub is part of the kernel proper so the library
is linked into the kernel proper as well.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
which, apart from reducing code duplication also stops the arm64 stub
being rebuilt every time make is invoked - Ard Biesheuvel
* Fix the EFI fdt code to not report a boot error if UEFI is
unavailable since booting without UEFI parameters is a valid use case
for non-UEFI platforms - Catalin Marinas
* Include a .bss section in the EFI boot stub PE/COFF headers to fix a
memory corruption bug - Michael Brown
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTw9I4AAoJEC84WcCNIz1VJcwQAKBZXaIjsxWqtMsqvB7RmCtw
5wi44hf2sdrwKCBUdxRBHBoiISps3f+5VJZN25eJ5QG+eqcqoQkGoQsAVlMbGrOJ
iyYF79REUj/ujovtodKQPOci4ueYhiRemBc8o6abOjSwdAe3fj5vpQgKQuS9RwtG
SIy4MBucE58Zle6vGHXLxHKdJVCB0/ALESwjg9fXWluopHdWkmmN0kKhYlysElHx
7zn2C2RoVDNQQwknueUznktKUH9pxLBEW74qE98CBZ9Spt8QpjvT22UkxoOU8grU
RnogYgJhwiy1+GNQ5524rZwcM2XP1qFhCcWoKxP3I0UhTOe2Za7Ogi8ggUAXbvEh
+hCsCIM3DqzAROOQMsmUZHkMJ0Gi2HSL12tt1KHNZ2zh74hiMPqDAmzkjBuf2KE0
5Lxdnc47UmHE9qVduj8fg2A+cMLV/K4NBlitOXq6lJp9+n8Wa93MLF0WENd9akE+
c9u7sAoTwG/5DUDy3rB1H5P65/WKyXNe9Db8JdZp2+62dxmsNyF++zkApDJT3Op7
MyFTpVkeZrWZbZVwZJXZuKNdh19Jv/adZrvC7OKvcDBfjow8AGzbDDnr4ez42QT8
OkM3uGyBuVTgtUdmPYNREXP5zjr1NfZV/w1fbslaJl/UbFZrNUILTP5YyqTphv2M
W9eO6CyrmX10gd5v47A7
=Cvne
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' into x86/urgent
* Remove a duplicate copy of linux_banner from the arm64 EFI stub
which, apart from reducing code duplication also stops the arm64 stub
being rebuilt every time make is invoked - Ard Biesheuvel
* Fix the EFI fdt code to not report a boot error if UEFI is
unavailable since booting without UEFI parameters is a valid use case
for non-UEFI platforms - Catalin Marinas
* Include a .bss section in the EFI boot stub PE/COFF headers to fix a
memory corruption bug - Michael Brown
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The PE/COFF headers currently describe only the initialised-data
portions of the image, and result in no space being allocated for the
uninitialised-data portions. Consequently, the EFI boot stub will end
up overwriting unexpected areas of memory, with unpredictable results.
Fix by including a .bss section in the PE/COFF headers (functionally
equivalent to the init_size field in the bzImage header).
Signed-off-by: Michael Brown <mbrown@fensystems.co.uk>
Cc: Thomas Bächler <thomas@archlinux.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
In order to move from the #include "../../../xxxxx.c" anti-pattern used
by both the x86 and arm64 versions of the stub to a static library
linked into either the kernel proper (arm64) or a separate boot
executable (x86), there is some prepatory work required.
This patch does the following:
- move forward declarations of functions shared between the arch
specific and the generic parts of the stub to include/linux/efi.h
- move forward declarations of functions shared between various .c files
of the generic stub code to a new local header file called "efistub.h"
- add #includes to all .c files which were formerly relying on the
#includor to include the correct header files
- remove all static modifiers from functions which will need to be
externally visible once we move to a static library
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This moves definitions depended upon both by code under arch/x86/boot
and under drivers/firmware/efi to <asm/efi.h>. This is in preparation of
turning the stub code under drivers/firmware/efi into a static library.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
For boot efi kernel directly without bootloader.
If the kernel support XLF_CAN_BE_LOADED_ABOVE_4G, we should
not limit initrd under hdr->initrd_add_max.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Changes kASLR from being compile-time selectable (blocked by
CONFIG_HIBERNATION), to being boot-time selectable (with hibernation
available by default) via the "kaslr" kernel command line.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that 3.15 is released, this merges the 'next' branch into 'master',
bringing us to the normal situation where my 'master' branch is the
merge window.
* accumulated work in next: (6809 commits)
ufs: sb mutex merge + mutex_destroy
powerpc: update comments for generic idle conversion
cris: update comments for generic idle conversion
idle: remove cpu_idle() forward declarations
nbd: zero from and len fields in NBD_CMD_DISCONNECT.
mm: convert some level-less printks to pr_*
MAINTAINERS: adi-buildroot-devel is moderated
MAINTAINERS: add linux-api for review of API/ABI changes
mm/kmemleak-test.c: use pr_fmt for logging
fs/dlm/debug_fs.c: replace seq_printf by seq_puts
fs/dlm/lockspace.c: convert simple_str to kstr
fs/dlm/config.c: convert simple_str to kstr
mm: mark remap_file_pages() syscall as deprecated
mm: memcontrol: remove unnecessary memcg argument from soft limit functions
mm: memcontrol: clean up memcg zoneinfo lookup
mm/memblock.c: call kmemleak directly from memblock_(alloc|free)
mm/mempool.c: update the kmemleak stack trace for mempool allocations
lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations
mm: introduce kmemleak_update_trace()
mm/kmemleak.c: use %u to print ->checksum
...
commit 7d453eee36 ("x86/efi: Wire up CONFIG_EFI_MIXED") introduced a
regression for the functionality to load kernels above 4G. The relevant
(incorrect) reasoning behind this change can be seen in the commit
message,
"The xloadflags field in the bzImage header is also updated to reflect
that the kernel supports both entry points by setting both of
XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
guaranteed to be addressable with 32-bits."
This is obviously bogus since 32-bit EFI loaders will never place the
kernel above the 4G mark. So this restriction is entirely unnecessary.
But things are worse than that - since we want to encourage people to
always compile with CONFIG_EFI_MIXED=y so that their kernels work out of
the box for both 32-bit and 64-bit firmware, commit 7d453eee36
effectively disables XLF_CAN_BE_LOADED_ABOVE_4G completely.
Remove the overzealous and superfluous restriction and restore the
XLF_CAN_BE_LOADED_ABOVE_4G functionality.
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1402140380-15377-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Pull x86 EFI updates from Peter Anvin:
"A collection of EFI changes. The perhaps most important one is to
fully save and restore the FPU state around each invocation of EFI
runtime, and to not choke on non-ASCII characters in the boot stub"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efivars: Add compatibility code for compat tasks
efivars: Refactor sanity checking code into separate function
efivars: Stop passing a struct argument to efivar_validate()
efivars: Check size of user object
efivars: Use local variables instead of a pointer dereference
x86/efi: Save and restore FPU context around efi_calls (i386)
x86/efi: Save and restore FPU context around efi_calls (x86_64)
x86/efi: Implement a __efi_call_virt macro
x86, fpu: Extend the use of static_cpu_has_safe
x86/efi: Delete most of the efi_call* macros
efi: x86: Handle arbitrary Unicode characters
efi: Add get_dram_base() helper function
efi: Add shared printk wrapper for consistent prefixing
efi: create memory map iteration helper
efi: efi-stub-helper cleanup
By changing code16gcc.h from a C header to an assembly header and use
the -Wa,... option to gcc to force it to be added to the assembly
input, we can avoid the problems with gcc reordering code bits on us.
If we have -m16, we still use it, of course.
Suggested-by: Kevin O'Connor <kevin@koconnor.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-xw8ibgdemucl9fz3i1bymu6w@git.kernel.org
Pull x86 boot changes from Ingo Molnar:
"Two small cleanups"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, boot: Remove misc.h inclusion from compressed/string.c
x86, boot: Do not include boot.h in string.c
Given the fact that we removed inclusion of boot.h from boot/string.c
does not look like we need misc.h inclusion in compressed/string.c. So
remove it.
misc.h was also pulling in string_32.h which in turn had macros for
memcmp and memcpy. So we don't need to #undef memcmp and memcpy anymore.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1398447972-27896-3-git-send-email-vgoyal@redhat.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
string.c does not require whole of boot.h. Just inclusion of linux/types.h
and ctypes.h seems to be sufficient.
Keep list of stuff being included in string.c to bare minimal so that
string.c can be included in other places easily.
For example, Currently boot/compressed/string.c includes boot/string.c
but looks like it does not want boot/boot.h. Hence there is a define
in boot/compressed/misc.h "define BOOT_BOOT_H" which prevents inclusion
of boot.h in compressed/string.c. And compressed/string.c is forced to
include misc.h just for that reason.
So by removing inclusion of boot.h, we can also get rid of inclusion of
misch.h in compressed/misc.c.
This also enables including of boot/string.c in purgatory/ code relatively
easily.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1398447972-27896-2-git-send-email-vgoyal@redhat.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
As requested by Linus add explicit __visible to the asmlinkage users.
This marks all functions visible to assembler.
Tree sweep for arch/x86/*
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
arch/x86/crypto/sha1_avx2_x86_64_asm.S introduced _end as a local
symbol, which broke the build under certain circumstances. Although
the wisdom of _end as a local symbol can definitely be questioned, the
build should not break for that reason.
Thus, filter the output of nm to only get global symbols of
appropriate type.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-uxm3j3w3odglcwhafwq5tjqu@git.kernel.org
We really only need one phys and one virt function call, and then only
one assembly function to make firmware calls.
Since we are not using the C type system anyway, we're not really losing
much by deleting the macros apart from no longer having a check that
we are passing the correct number of parameters. The lack of duplicated
code seems like a worthwhile trade-off.
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Instead of truncating UTF-16 assuming all characters is ASCII,
properly convert it to UTF-8.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[ Bug and style fixes. ]
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Pull x86 fixes from Peter Anvin:
"This is a collection of minor fixes for x86, plus the IRET information
leak fix (forbid the use of 16-bit segments in 64-bit mode)"
NOTE! We may have to relax the "forbid the use of 16-bit segments in
64-bit mode" part, since there may be people who still run and depend on
16-bit Windows binaries under Wine.
But I'm taking this in the current unconditional form for now to see who
(if anybody) screams bloody murder. Maybe nobody cares. And maybe
we'll have to update it with some kind of runtime enablement (like our
vm.mmap_min_addr tunable that people who run dosemu/qemu/wine already
need to tweak).
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
efi: Pass correct file handle to efi_file_{read,close}
x86/efi: Correct EFI boot stub use of code32_start
x86/efi: Fix boot failure with EFI stub
x86/platform/hyperv: Handle VMBUS driver being a module
x86/apic: Reinstate error IRQ Pentium erratum 3AP workaround
x86, CMCI: Add proper detection of end of CMCI storms
firmware was reading random values from the stack because we were
passing a pointer to the wrong object type.
* Kernel corruption has been reported when booting with the EFI boot
stub which was tracked down to setting a bogus value for
bp->hdr.code32_start, resulting in corruption during relocation.
* Olivier Martin reported that the wrong file handles were being passed
to efi_file_(read|close), which works for x86 by luck due to the way
that the FAT driver is implemented, but doesn't work on ARM.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTR5QIAAoJEC84WcCNIz1VMn0P/01GF8A2frSK+NuCJCkmZoAa
fcOvcHmQajNwG3WAVsVWlS/i2QsYwK1jAgameEusn+FFrnWIwaZ9qb1TjEMbJylu
4odaRc1YYiLOJi9UD2jRB644374jJwgwteKGs0Vt99g4pa8HsgSbXTR6oF8PUDWr
1HZUV9tq8O1eAzpQdMADEgWYieylnldfvHk+ArPTJyR5fTNx8xCYALlCthc6Tv+A
cpi0rQj/YzNh+vqZF1YYZ8xqktvV1di2Hvmy3UVt05y1kwkaTquNY9478ZRF5UHm
oUk3nAYyA9M/1gxVnvUfyLgUtrWtyF02N+iDTxLoz05KxeK5wVdKaIPZfSAUrglt
hOvnL+5EOss6w9gG19zpPD4FVHCd696W+iCIBoqooWJqX8AqOVRr81GTYb3q3YDr
EIH0wLipuV4XI4sdN8JMH9fIbfkRdAvaGUR2lPSYFq2Cm7nn2hs820UdKFYeH0wT
fdgtGpWAdXhEq/SUW4KRZMCXLDz4XuNF3d/JREcC28CyiRgdjKFD/PMbZEShpisF
fYE16+IiAq8UMgfgUDqlrSP2UMqkyZ2kp5itvJBrLbTD6rWzEcpK+CMXqykWTOwV
ONzPAfZEbUmFuU3JhKOTFO5uf7dM9EG5BDKduWR6Wjl8VIVTQlD8R1OB5o1lbZPN
ecFWo1eIQGZjeoMm36EM
=rovT
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:
"* Fix EFI boot regression introduced during the merge window where the
firmware was reading random values from the stack because we were
passing a pointer to the wrong object type.
* Kernel corruption has been reported when booting with the EFI boot
stub which was tracked down to setting a bogus value for
bp->hdr.code32_start, resulting in corruption during relocation.
* Olivier Martin reported that the wrong file handles were being passed
to efi_file_(read|close), which works for x86 by luck due to the way
that the FAT driver is implemented, but doesn't work on ARM."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We're currently passing the file handle for the root file system to
efi_file_read() and efi_file_close(), instead of the file handle for the
file we wish to read/close.
While this has worked up until now, it seems that it has only been by
pure luck. Olivier explains,
"The issue is the UEFI Fat driver might return the same function for
'fh->read()' and 'h->read()'. While in our case it does not work with
a different implementation of EFI_SIMPLE_FILE_SYSTEM_PROTOCOL. In our
case, we return a different pointer when reading a directory and
reading a file."
Fixing this actually clears up the two functions because we can drop one
of the arguments, and instead only pass a file 'handle' argument.
Reported-by: Olivier Martin <olivier.martin@arm.com>
Reviewed-by: Olivier Martin <olivier.martin@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
code32_start should point at the start of the protected mode code, and
*not* at the beginning of the bzImage. This is much easier to do in
assembly so document that callers of make_boot_params() need to fill out
code32_start.
The fallout from this bug is that we would end up relocating the image
but copying the image at some offset, resulting in what appeared to be
memory corruption.
Reported-by: Thomas Bächler <thomas@archlinux.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
commit 54b52d8726 ("x86/efi: Build our own EFI services pointer
table") introduced a regression because the 64-bit file_size()
implementation passed a pointer to a 32-bit data object, instead of a
pointer to a 64-bit object.
Because the firmware treats the object as 64-bits regardless it was
reading random values from the stack for the upper 32-bits.
This resulted in people being unable to boot their machines, after
seeing the following error messages,
Failed to get file info size
Failed to alloc highmem for files
Reported-by: Dzmitry Sledneu <dzmitry.sledneu@gmail.com>
Reported-by: Koen Kooi <koen@dominion.thruhere.net>
Tested-by: Koen Kooi <koen@dominion.thruhere.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Pull x86 boot changes from Peter Anvin:
"This patchset is a set of cleanups aiming at librarize some of the
common code from the boot environments. We currently have three
different "little environments" (boot, boot/compressed, and
realmode/rm) in x86, and we are likely to soon get a fourth one
(kexec/purgatory, which will have to be integrated in the kernel to
support secure kexec). This is primarily a cleanup in the
anticipation of the latter.
While Vivek implemented this, he ran into some bugs, in particular the
memcmp implementation for when gcc punts from using the builtin would
have a misnamed symbol, causing compilation errors if we were ever
unlucky enough that gcc didn't want to inline the test"
* 'x86/boot' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, boot: Move memset() definition in compressed/string.c
x86, boot: Move memcmp() into string.h and string.c
x86, boot: Move optimized memcpy() 32/64 bit versions to compressed/string.c
x86, boot: Create a separate string.h file to provide standard string functions
x86, boot: Undef memcmp before providing a new definition
Pull x86 EFI changes from Ingo Molnar:
"The main changes:
- Add debug code to the dump EFI pagetable - Borislav Petkov
- Make 1:1 runtime mapping robust when booting on machines with lots
of memory - Borislav Petkov
- Move the EFI facilities bits out of 'x86_efi_facility' and into
efi.flags which is the standard architecture independent place to
keep EFI state, by Matt Fleming.
- Add 'EFI mixed mode' support: this allows 64-bit kernels to be
booted from 32-bit firmware. This needs a bootloader that supports
the 'EFI handover protocol'. By Matt Fleming"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
x86, efi: Abstract x86 efi_early calls
x86/efi: Restore 'attr' argument to query_variable_info()
x86/efi: Rip out phys_efi_get_time()
x86/efi: Preserve segment registers in mixed mode
x86/boot: Fix non-EFI build
x86, tools: Fix up compiler warnings
x86/efi: Re-disable interrupts after calling firmware services
x86/boot: Don't overwrite cr4 when enabling PAE
x86/efi: Wire up CONFIG_EFI_MIXED
x86/efi: Add mixed runtime services support
x86/efi: Firmware agnostic handover entry points
x86/efi: Split the boot stub into 32/64 code paths
x86/efi: Add early thunk code to go from 64-bit to 32-bit
x86/efi: Build our own EFI services pointer table
efi: Add separate 32-bit/64-bit definitions
x86/efi: Delete dead code when checking for non-native
x86/mm/pageattr: Always dump the right page table in an oops
x86, tools: Consolidate #ifdef code
x86/boot: Cleanup header.S by removing some #ifdefs
efi: Use NULL instead of 0 for pointer
...
Pull x86 cpu handling changes from Ingo Molnar:
"Bigger changes:
- Intel CPU hardware-enablement: new vector instructions support
(AVX-512), by Fenghua Yu.
- Support the clflushopt instruction and use it in appropriate
places. clflushopt is similar to clflush but with more relaxed
ordering, by Ross Zwisler.
- MSR accessor cleanups, by Borislav Petkov.
- 'forcepae' boot flag for those who have way too much time to spend
on way too old Pentium-M systems and want to live way too
dangerously, by Chris Bainbridge"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M
Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC
x86, intel: Make MSR_IA32_MISC_ENABLE bit constants systematic
x86, Intel: Convert to the new bit access MSR accessors
x86, AMD: Convert to the new bit access MSR accessors
x86: Add another set of MSR accessor functions
x86: Use clflushopt in drm_clflush_virt_range
x86: Use clflushopt in drm_clflush_page
x86: Use clflushopt in clflush_cache_range
x86: Add support for the clflushopt instruction
x86, AVX-512: Enable AVX-512 States Context Switch
x86, AVX-512: AVX-512 Feature Detection