// SPDX-License-Identifier: GPL-2.0+
/*
 *  Kernel module help for s390.
 *
 *  S390 version
 *    Copyright IBM Corp. 2002, 2003
 *    Author(s): Arnd Bergmann (arndb@de.ibm.com)
 *		 Martin Schwidefsky (schwidefsky@de.ibm.com)
 *
 *  based on i386 version
 *    Copyright (C) 2001 Rusty Russell.
 */

#include <linux/module.h>
#include <linux/elf.h>
#include <linux/vmalloc.h>
#include <linux/fs.h>
#include <linux/ftrace.h>
#include <linux/string.h>
#include <linux/kernel.h>
#include <linux/kasan.h>
#include <linux/moduleloader.h>
#include <linux/bug.h>
#include <linux/memory.h>
#include <asm/alternative.h>
#include <asm/nospec-branch.h>
#include <asm/facility.h>
#include <asm/ftrace.lds.h>
#include <asm/set_memory.h>
#include <asm/setup.h>

#if 0
#define DEBUGP printk
#else
#define DEBUGP(fmt , ...)
#endif

#define PLT_ENTRY_SIZE 22

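/*
 * KASLR for modules: the start of the module area is shifted by a random,
 * page-aligned offset of 1..1024 pages. The offset is chosen once, on the
 * first allocation after boot, and then reused for every subsequent module.
 */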
static unsigned long get_module_load_offset(void)
{
	static DEFINE_MUTEX(module_kaslr_mutex);
	static unsigned long module_load_offset;

	if (!kaslr_enabled())
		return 0;
	/*
	 * Calculate the module_load_offset the first time this code
	 * is called. Once calculated it stays the same until reboot.
	 */
	mutex_lock(&module_kaslr_mutex);
	if (!module_load_offset)
		module_load_offset = get_random_u32_inclusive(1, 1024) * PAGE_SIZE;
	mutex_unlock(&module_kaslr_mutex);
	return module_load_offset;
}

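/*
 * Module memory is taken from the dedicated MODULES_VADDR..MODULES_END
 * range, shifted by the KASLR offset above. kasan_alloc_module_shadow()
 * populates the KASAN shadow for the new range where that is required;
 * on failure the mapping is released again.
 */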
void *module_alloc(unsigned long size)
{
	gfp_t gfp_mask = GFP_KERNEL;
	void *p;

	if (PAGE_ALIGN(size) > MODULES_LEN)
		return NULL;
	p = __vmalloc_node_range(size, MODULE_ALIGN,
				 MODULES_VADDR + get_module_load_offset(),
				 MODULES_END, gfp_mask, PAGE_KERNEL,
				 VM_FLUSH_RESET_PERMS | VM_DEFER_KMEMLEAK,
				 NUMA_NO_NODE, __builtin_return_address(0));
	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
		vfree(p);
		return NULL;
	}
	return p;
}

#ifdef CONFIG_FUNCTION_TRACER
void module_arch_cleanup(struct module *mod)
{
	module_memfree(mod->arch.trampolines_start);
}
#endif

void module_arch_freeing_init(struct module *mod)
{
	if (is_livepatch_module(mod) &&
	    mod->state == MODULE_STATE_LIVE)
		return;

	vfree(mod->arch.syminfo);
	mod->arch.syminfo = NULL;
}

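/*
 * First pass over the relocations: count how many GOT slots and PLT
 * entries the module will need so that module_frob_arch_sections() can
 * grow the text allocation accordingly. Each symbol gets at most one GOT
 * slot (sizeof(void *)) and at most one PLT entry (PLT_ENTRY_SIZE bytes).
 */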
static void check_rela(Elf_Rela *rela, struct module *me)
{
	struct mod_arch_syminfo *info;

	info = me->arch.syminfo + ELF_R_SYM(rela->r_info);
	switch (ELF_R_TYPE(rela->r_info)) {
	case R_390_GOT12:	/* 12 bit GOT offset.  */
	case R_390_GOT16:	/* 16 bit GOT offset.  */
	case R_390_GOT20:	/* 20 bit GOT offset.  */
	case R_390_GOT32:	/* 32 bit GOT offset.  */
	case R_390_GOT64:	/* 64 bit GOT offset.  */
	case R_390_GOTENT:	/* 32 bit PC rel. to GOT entry shifted by 1. */
	case R_390_GOTPLT12:	/* 12 bit offset to jump slot.  */
	case R_390_GOTPLT16:	/* 16 bit offset to jump slot.  */
	case R_390_GOTPLT20:	/* 20 bit offset to jump slot.  */
	case R_390_GOTPLT32:	/* 32 bit offset to jump slot.  */
	case R_390_GOTPLT64:	/* 64 bit offset to jump slot.  */
	case R_390_GOTPLTENT:	/* 32 bit rel. offset to jump slot >> 1. */
		if (info->got_offset == -1UL) {
			info->got_offset = me->arch.got_size;
			me->arch.got_size += sizeof(void *);
		}
		break;
	case R_390_PLT16DBL:	/* 16 bit PC rel. PLT shifted by 1.  */
	case R_390_PLT32DBL:	/* 32 bit PC rel. PLT shifted by 1.  */
	case R_390_PLT32:	/* 32 bit PC relative PLT address.  */
	case R_390_PLT64:	/* 64 bit PC relative PLT address.  */
	case R_390_PLTOFF16:	/* 16 bit offset from GOT to PLT. */
	case R_390_PLTOFF32:	/* 32 bit offset from GOT to PLT. */
	case R_390_PLTOFF64:	/* 64 bit offset from GOT to PLT. */
		if (info->plt_offset == -1UL) {
			info->plt_offset = me->arch.plt_size;
			me->arch.plt_size += PLT_ENTRY_SIZE;
		}
		break;
	case R_390_COPY:
	case R_390_GLOB_DAT:
	case R_390_JMP_SLOT:
	case R_390_RELATIVE:
		/* Only needed if we want to support loading of
		   modules linked with -shared. */
		break;
	}
}

/*
 * Account for GOT and PLT relocations. We can't add sections for
 * got and plt but we can increase the core module size.
 */
int module_frob_arch_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
			      char *secstrings, struct module *me)
{
	Elf_Shdr *symtab;
	Elf_Sym *symbols;
	Elf_Rela *rela;
	char *strings;
	int nrela, i, j;
	struct module_memory *mod_mem;

	/* Find symbol table and string table. */
	symtab = NULL;
	for (i = 0; i < hdr->e_shnum; i++)
		switch (sechdrs[i].sh_type) {
		case SHT_SYMTAB:
			symtab = sechdrs + i;
			break;
		}
	if (!symtab) {
		printk(KERN_ERR "module %s: no symbol table\n", me->name);
		return -ENOEXEC;
	}

	/* Allocate one syminfo structure per symbol. */
	me->arch.nsyms = symtab->sh_size / sizeof(Elf_Sym);
	me->arch.syminfo = vmalloc(array_size(sizeof(struct mod_arch_syminfo),
					      me->arch.nsyms));
	if (!me->arch.syminfo)
		return -ENOMEM;
	symbols = (void *) hdr + symtab->sh_offset;
	strings = (void *) hdr + sechdrs[symtab->sh_link].sh_offset;
	for (i = 0; i < me->arch.nsyms; i++) {
		if (symbols[i].st_shndx == SHN_UNDEF &&
		    strcmp(strings + symbols[i].st_name,
			   "_GLOBAL_OFFSET_TABLE_") == 0)
			/* "Define" it as absolute. */
			symbols[i].st_shndx = SHN_ABS;
		me->arch.syminfo[i].got_offset = -1UL;
		me->arch.syminfo[i].plt_offset = -1UL;
		me->arch.syminfo[i].got_initialized = 0;
		me->arch.syminfo[i].plt_initialized = 0;
	}

	/* Search for got/plt relocations. */
	me->arch.got_size = me->arch.plt_size = 0;
	for (i = 0; i < hdr->e_shnum; i++) {
		if (sechdrs[i].sh_type != SHT_RELA)
			continue;
		nrela = sechdrs[i].sh_size / sizeof(Elf_Rela);
		rela = (void *) hdr + sechdrs[i].sh_offset;
		for (j = 0; j < nrela; j++)
			check_rela(rela + j, me);
	}

	/* Increase core size by size of got & plt and set start
	   offsets for got and plt. */
	mod_mem = &me->mem[MOD_TEXT];
	mod_mem->size = ALIGN(mod_mem->size, 4);
	me->arch.got_offset = mod_mem->size;
	mod_mem->size += me->arch.got_size;
	me->arch.plt_offset = mod_mem->size;
	if (me->arch.plt_size) {
		if (IS_ENABLED(CONFIG_EXPOLINE) && !nospec_disable)
			me->arch.plt_size += PLT_ENTRY_SIZE;
		mod_mem->size += me->arch.plt_size;
	}
	return 0;
}

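/*
 * Write a relocated value into the instruction stream. @sign selects a
 * signed vs. unsigned range check for the @bits wide field, and @shift is
 * the number of low bits that must be zero and are dropped before the
 * store (e.g. the *DBL relocations encode halfword offsets, so they use
 * shift = 1). The 12 and 20 bit cases merge the value into the existing
 * instruction word instead of overwriting it.
 */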
static int apply_rela_bits(Elf_Addr loc, Elf_Addr val,
			   int sign, int bits, int shift,
			   void *(*write)(void *dest, const void *src, size_t len))
{
	unsigned long umax;
	long min, max;
	void *dest = (void *)loc;

	if (val & ((1UL << shift) - 1))
		return -ENOEXEC;
	if (sign) {
		val = (Elf_Addr)(((long) val) >> shift);
		min = -(1L << (bits - 1));
		max = (1L << (bits - 1)) - 1;
		if ((long) val < min || (long) val > max)
			return -ENOEXEC;
	} else {
		val >>= shift;
		umax = ((1UL << (bits - 1)) << 1) - 1;
		if ((unsigned long) val > umax)
			return -ENOEXEC;
	}

	if (bits == 8) {
		unsigned char tmp = val;
		write(dest, &tmp, 1);
	} else if (bits == 12) {
		unsigned short tmp = (val & 0xfff) |
			(*(unsigned short *) loc & 0xf000);
		write(dest, &tmp, 2);
	} else if (bits == 16) {
		unsigned short tmp = val;
		write(dest, &tmp, 2);
	} else if (bits == 20) {
		unsigned int tmp = (val & 0xfff) << 16 |
			(val & 0xff000) >> 4 | (*(unsigned int *) loc & 0xf00000ff);
		write(dest, &tmp, 4);
	} else if (bits == 32) {
		unsigned int tmp = val;
		write(dest, &tmp, 4);
	} else if (bits == 64) {
		unsigned long tmp = val;
		write(dest, &tmp, 8);
	}
	return 0;
}

static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
		      const char *strtab, struct module *me,
		      void *(*write)(void *dest, const void *src, size_t len))
{
	struct mod_arch_syminfo *info;
	Elf_Addr loc, val;
	int r_type, r_sym;
	int rc = -ENOEXEC;

	/* This is where to make the change */
	loc = base + rela->r_offset;
	/* This is the symbol it is referring to.  Note that all
	   undefined symbols have been resolved.  */
	r_sym = ELF_R_SYM(rela->r_info);
	r_type = ELF_R_TYPE(rela->r_info);
	info = me->arch.syminfo + r_sym;
	val = symtab[r_sym].st_value;

	switch (r_type) {
	case R_390_NONE:	/* No relocation.  */
		rc = 0;
		break;
	case R_390_8:		/* Direct 8 bit.   */
	case R_390_12:		/* Direct 12 bit.  */
	case R_390_16:		/* Direct 16 bit.  */
	case R_390_20:		/* Direct 20 bit.  */
	case R_390_32:		/* Direct 32 bit.  */
	case R_390_64:		/* Direct 64 bit.  */
		val += rela->r_addend;
		if (r_type == R_390_8)
			rc = apply_rela_bits(loc, val, 0, 8, 0, write);
		else if (r_type == R_390_12)
			rc = apply_rela_bits(loc, val, 0, 12, 0, write);
		else if (r_type == R_390_16)
			rc = apply_rela_bits(loc, val, 0, 16, 0, write);
		else if (r_type == R_390_20)
			rc = apply_rela_bits(loc, val, 1, 20, 0, write);
		else if (r_type == R_390_32)
			rc = apply_rela_bits(loc, val, 0, 32, 0, write);
		else if (r_type == R_390_64)
			rc = apply_rela_bits(loc, val, 0, 64, 0, write);
		break;
	case R_390_PC16:	/* PC relative 16 bit.  */
	case R_390_PC16DBL:	/* PC relative 16 bit shifted by 1.  */
	case R_390_PC32DBL:	/* PC relative 32 bit shifted by 1.  */
	case R_390_PC32:	/* PC relative 32 bit.  */
	case R_390_PC64:	/* PC relative 64 bit.  */
		val += rela->r_addend - loc;
		if (r_type == R_390_PC16)
			rc = apply_rela_bits(loc, val, 1, 16, 0, write);
		else if (r_type == R_390_PC16DBL)
			rc = apply_rela_bits(loc, val, 1, 16, 1, write);
		else if (r_type == R_390_PC32DBL)
			rc = apply_rela_bits(loc, val, 1, 32, 1, write);
		else if (r_type == R_390_PC32)
			rc = apply_rela_bits(loc, val, 1, 32, 0, write);
		else if (r_type == R_390_PC64)
			rc = apply_rela_bits(loc, val, 1, 64, 0, write);
		break;
	case R_390_GOT12:	/* 12 bit GOT offset.  */
	case R_390_GOT16:	/* 16 bit GOT offset.  */
	case R_390_GOT20:	/* 20 bit GOT offset.  */
	case R_390_GOT32:	/* 32 bit GOT offset.  */
	case R_390_GOT64:	/* 64 bit GOT offset.  */
	case R_390_GOTENT:	/* 32 bit PC rel. to GOT entry shifted by 1. */
	case R_390_GOTPLT12:	/* 12 bit offset to jump slot.  */
	case R_390_GOTPLT20:	/* 20 bit offset to jump slot.  */
	case R_390_GOTPLT16:	/* 16 bit offset to jump slot.  */
	case R_390_GOTPLT32:	/* 32 bit offset to jump slot.  */
	case R_390_GOTPLT64:	/* 64 bit offset to jump slot.  */
	case R_390_GOTPLTENT:	/* 32 bit rel. offset to jump slot >> 1. */
		if (info->got_initialized == 0) {
			Elf_Addr *gotent = me->mem[MOD_TEXT].base +
					   me->arch.got_offset +
					   info->got_offset;

			write(gotent, &val, sizeof(*gotent));
			info->got_initialized = 1;
		}
		val = info->got_offset + rela->r_addend;
		if (r_type == R_390_GOT12 ||
		    r_type == R_390_GOTPLT12)
			rc = apply_rela_bits(loc, val, 0, 12, 0, write);
		else if (r_type == R_390_GOT16 ||
			 r_type == R_390_GOTPLT16)
			rc = apply_rela_bits(loc, val, 0, 16, 0, write);
		else if (r_type == R_390_GOT20 ||
			 r_type == R_390_GOTPLT20)
			rc = apply_rela_bits(loc, val, 1, 20, 0, write);
		else if (r_type == R_390_GOT32 ||
			 r_type == R_390_GOTPLT32)
			rc = apply_rela_bits(loc, val, 0, 32, 0, write);
		else if (r_type == R_390_GOT64 ||
			 r_type == R_390_GOTPLT64)
			rc = apply_rela_bits(loc, val, 0, 64, 0, write);
		else if (r_type == R_390_GOTENT ||
			 r_type == R_390_GOTPLTENT) {
			val += (Elf_Addr)me->mem[MOD_TEXT].base +
				me->arch.got_offset - loc;
			rc = apply_rela_bits(loc, val, 1, 32, 1, write);
		}
		break;
	case R_390_PLT16DBL:	/* 16 bit PC rel. PLT shifted by 1.  */
	case R_390_PLT32DBL:	/* 32 bit PC rel. PLT shifted by 1.  */
	case R_390_PLT32:	/* 32 bit PC relative PLT address.  */
	case R_390_PLT64:	/* 64 bit PC relative PLT address.  */
	case R_390_PLTOFF16:	/* 16 bit offset from GOT to PLT. */
	case R_390_PLTOFF32:	/* 32 bit offset from GOT to PLT. */
	case R_390_PLTOFF64:	/* 64 bit offset from GOT to PLT. */
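		/*
		 * If no PLT entry has been written for this symbol yet, emit
		 * one. The 22 byte stub loads the target address stored at
		 * entry offset 14 into %r1 (basr 1,0; lg 1,12(1)) and then
		 * branches to it, either directly (br %r1) or via the shared
		 * __jump_r1 expoline thunk at the end of the PLT when the
		 * CONFIG_EXPOLINE mitigation is active.
		 */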
		if (info->plt_initialized == 0) {
			unsigned char insn[PLT_ENTRY_SIZE];
			char *plt_base;
			char *ip;

			plt_base = me->mem[MOD_TEXT].base + me->arch.plt_offset;
			ip = plt_base + info->plt_offset;
			*(int *)insn = 0x0d10e310;	/* basr 1,0  */
			*(int *)&insn[4] = 0x100c0004;	/* lg 1,12(1) */
			if (IS_ENABLED(CONFIG_EXPOLINE) && !nospec_disable) {
				char *jump_r1;

				jump_r1 = plt_base + me->arch.plt_size -
					PLT_ENTRY_SIZE;
				/* brcl 0xf,__jump_r1 */
				*(short *)&insn[8] = 0xc0f4;
				*(int *)&insn[10] = (jump_r1 - (ip + 8)) / 2;
			} else {
				*(int *)&insn[8] = 0x07f10000;	/* br %r1 */
			}
			*(long *)&insn[14] = val;

			write(ip, insn, sizeof(insn));
			info->plt_initialized = 1;
		}
		if (r_type == R_390_PLTOFF16 ||
		    r_type == R_390_PLTOFF32 ||
		    r_type == R_390_PLTOFF64)
			val = me->arch.plt_offset - me->arch.got_offset +
				info->plt_offset + rela->r_addend;
		else {
			if (!((r_type == R_390_PLT16DBL &&
			       val - loc + 0xffffUL < 0x1ffffeUL) ||
			      (r_type == R_390_PLT32DBL &&
			       val - loc + 0xffffffffULL < 0x1fffffffeULL)))
				val = (Elf_Addr) me->mem[MOD_TEXT].base +
					me->arch.plt_offset +
					info->plt_offset;
			val += rela->r_addend - loc;
		}
		if (r_type == R_390_PLT16DBL)
			rc = apply_rela_bits(loc, val, 1, 16, 1, write);
		else if (r_type == R_390_PLTOFF16)
			rc = apply_rela_bits(loc, val, 0, 16, 0, write);
		else if (r_type == R_390_PLT32DBL)
			rc = apply_rela_bits(loc, val, 1, 32, 1, write);
		else if (r_type == R_390_PLT32 ||
			 r_type == R_390_PLTOFF32)
			rc = apply_rela_bits(loc, val, 0, 32, 0, write);
		else if (r_type == R_390_PLT64 ||
			 r_type == R_390_PLTOFF64)
			rc = apply_rela_bits(loc, val, 0, 64, 0, write);
		break;
	case R_390_GOTOFF16:	/* 16 bit offset to GOT.  */
	case R_390_GOTOFF32:	/* 32 bit offset to GOT.  */
	case R_390_GOTOFF64:	/* 64 bit offset to GOT.  */
		val = val + rela->r_addend -
			((Elf_Addr) me->mem[MOD_TEXT].base + me->arch.got_offset);
		if (r_type == R_390_GOTOFF16)
			rc = apply_rela_bits(loc, val, 0, 16, 0, write);
		else if (r_type == R_390_GOTOFF32)
			rc = apply_rela_bits(loc, val, 0, 32, 0, write);
		else if (r_type == R_390_GOTOFF64)
			rc = apply_rela_bits(loc, val, 0, 64, 0, write);
		break;
	case R_390_GOTPC:	/* 32 bit PC relative offset to GOT. */
	case R_390_GOTPCDBL:	/* 32 bit PC rel. off. to GOT shifted by 1. */
		val = (Elf_Addr) me->mem[MOD_TEXT].base + me->arch.got_offset +
			rela->r_addend - loc;
		if (r_type == R_390_GOTPC)
			rc = apply_rela_bits(loc, val, 1, 32, 0, write);
		else if (r_type == R_390_GOTPCDBL)
			rc = apply_rela_bits(loc, val, 1, 32, 1, write);
		break;
	case R_390_COPY:
	case R_390_GLOB_DAT:	/* Create GOT entry.  */
	case R_390_JMP_SLOT:	/* Create PLT entry.  */
	case R_390_RELATIVE:	/* Adjust by program base.  */
		/* Only needed if we want to support loading of
		   modules linked with -shared. */
		return -ENOEXEC;
	default:
		printk(KERN_ERR "module %s: unknown relocation: %u\n",
		       me->name, r_type);
		return -ENOEXEC;
	}
	if (rc) {
		printk(KERN_ERR "module %s: relocation error for symbol %s "
		       "(r_type %i, value 0x%lx)\n",
		       me->name, strtab + symtab[r_sym].st_name,
		       r_type, (unsigned long) val);
		return rc;
	}
	return 0;
}

static int __apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
				unsigned int symindex, unsigned int relsec,
				struct module *me,
				void *(*write)(void *dest, const void *src, size_t len))
{
	Elf_Addr base;
	Elf_Sym *symtab;
	Elf_Rela *rela;
	unsigned long i, n;
	int rc;

	DEBUGP("Applying relocate section %u to %u\n",
	       relsec, sechdrs[relsec].sh_info);
	base = sechdrs[sechdrs[relsec].sh_info].sh_addr;
	symtab = (Elf_Sym *) sechdrs[symindex].sh_addr;
	rela = (Elf_Rela *) sechdrs[relsec].sh_addr;
	n = sechdrs[relsec].sh_size / sizeof(Elf_Rela);

	for (i = 0; i < n; i++, rela++) {
		rc = apply_rela(rela, base, symtab, strtab, me, write);
		if (rc)
			return rc;
	}
	return 0;
}

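/*
 * Entry point used by the module loader. While the module is still in the
 * MODULE_STATE_UNFORMED state its text is writable and a plain memcpy is
 * sufficient; otherwise (e.g. relocations applied after the module went
 * live) the text may already be mapped read-only and has to be patched
 * through s390_kernel_write().
 */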
int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
		       unsigned int symindex, unsigned int relsec,
		       struct module *me)
{
	bool early = me->state == MODULE_STATE_UNFORMED;
	void *(*write)(void *, const void *, size_t) = memcpy;

	if (!early)
		write = s390_kernel_write;

	return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me,
				    write);
}

#ifdef CONFIG_FUNCTION_TRACER
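/*
 * Reserve room for the per-module ftrace hotpatch trampolines in module
 * space and map it read-only and executable; me->arch.next_trampoline
 * tracks the next free slot.
 */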
static int module_alloc_ftrace_hotpatch_trampolines(struct module *me,
						    const Elf_Shdr *s)
{
	char *start, *end;
	int numpages;
	size_t size;

	size = FTRACE_HOTPATCH_TRAMPOLINES_SIZE(s->sh_size);
	numpages = DIV_ROUND_UP(size, PAGE_SIZE);
	start = module_alloc(numpages * PAGE_SIZE);
	if (!start)
		return -ENOMEM;
	set_memory_rox((unsigned long)start, numpages);
end = start + size;
|
|
|
|
|
|
|
|
me->arch.trampolines_start = (struct ftrace_hotpatch_trampoline *)start;
|
|
|
|
me->arch.trampolines_end = (struct ftrace_hotpatch_trampoline *)end;
|
|
|
|
me->arch.next_trampoline = me->arch.trampolines_start;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
#endif /* CONFIG_FUNCTION_TRACER */
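
#ifdef CONFIG_FUNCTION_TRACER
/*
 * Minimal sketch, not part of this file: one plausible way a consumer could
 * hand out the trampoline slots reserved above, assuming they are used
 * sequentially as the next_trampoline cursor suggests. The real consumer
 * lives in the s390 ftrace code; the helper name below is hypothetical.
 */
static inline struct ftrace_hotpatch_trampoline *
example_next_trampoline(struct module *me)
{
	/* All reserved second-stage slots are already in use. */
	if (me->arch.next_trampoline >= me->arch.trampolines_end)
		return NULL;
	/* Hand out the next slot and advance the cursor. */
	return me->arch.next_trampoline++;
}
#endif /* CONFIG_FUNCTION_TRACER */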
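
/*
 * Post-process a module after all sections have been loaded and relocated:
 * emit the expoline stub into the module PLT when expolines are active,
 * apply CPU alternatives, hand the recorded .s390_indirect/.s390_return
 * branch sites to nospec_revert(), and reserve ftrace hotpatch trampolines
 * for the module's traced functions.
 */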
int module_finalize(const Elf_Ehdr *hdr,
		    const Elf_Shdr *sechdrs,
		    struct module *me)
{
	const Elf_Shdr *s;
	char *secstrings, *secname;
	void *aseg;
#ifdef CONFIG_FUNCTION_TRACER
	int ret;
#endif

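	/*
	 * Expolines are compiled in and not disabled at runtime: fill the
	 * last PLT slot with the expoline thunk (an execute-type trampoline
	 * ending in "br %r1"), which the module's PLT entries are presumed
	 * to branch through for indirect calls.
	 */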
	if (IS_ENABLED(CONFIG_EXPOLINE) &&
	    !nospec_disable && me->arch.plt_size) {
		unsigned int *ij;

		ij = me->mem[MOD_TEXT].base + me->arch.plt_offset +
			me->arch.plt_size - PLT_ENTRY_SIZE;
		ij[0] = 0xc6000000;	/* exrl	%r0,.+10	*/
		ij[1] = 0x0005a7f4;	/* j	.		*/
		ij[2] = 0x000007f1;	/* br	%r1		*/
	}

	secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
		aseg = (void *) s->sh_addr;
		secname = secstrings + s->sh_name;

		if (!strcmp(".altinstructions", secname))
			/* patch .altinstructions */
			apply_alternatives(aseg, aseg + s->sh_size);

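		/*
		 * The .s390_indirect and .s390_return sections record the
		 * module's expoline branch sites; nospec_revert() can patch
		 * them back to plain branches when the mitigation is
		 * disabled at runtime.
		 */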
		if (IS_ENABLED(CONFIG_EXPOLINE) &&
		    (str_has_prefix(secname, ".s390_indirect")))
			nospec_revert(aseg, aseg + s->sh_size);

		if (IS_ENABLED(CONFIG_EXPOLINE) &&
		    (str_has_prefix(secname, ".s390_return")))
			nospec_revert(aseg, aseg + s->sh_size);

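		/*
		 * FTRACE_CALLSITE_SECTION lists the module's traced call
		 * sites; its size determines how much trampoline memory
		 * module_alloc_ftrace_hotpatch_trampolines() reserves.
		 */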
#ifdef CONFIG_FUNCTION_TRACER
		if (!strcmp(FTRACE_CALLSITE_SECTION, secname)) {
			ret = module_alloc_ftrace_hotpatch_trampolines(me, s);
			if (ret < 0)
				return ret;
		}
#endif /* CONFIG_FUNCTION_TRACER */
	}

	return 0;
}