When KASLR is enabled (CONFIG_RANDOMIZE_BASE=y), and entropy has been
provided by the bootloader, randomize the placement of RAM inside the
linear region if sufficient space is available. For instance, on a 4KB
granule/3 levels kernel, the linear region is 256 GB in size, and we can
choose any 1 GB aligned offset that is far enough from the top of the
address space to fit the distance between the start of the lowest memblock
and the top of the highest memblock.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This adds support for KASLR is implemented, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This implements CONFIG_RELOCATABLE, which links the final vmlinux
image with a dynamic relocation section, allowing the early boot code
to perform a relocation to a different virtual address at runtime.
This is a prerequisite for KASLR (CONFIG_RANDOMIZE_BASE).
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Instead of using absolute addresses for both the exception location
and the fixup, use offsets relative to the exception table entry values.
Not only does this cut the size of the exception table in half, it is
also a prerequisite for KASLR, since absolute exception table entries
are subject to dynamic relocation, which is incompatible with the sorting
of the exception table that occurs at build time.
This patch also introduces the _ASM_EXTABLE preprocessor macro (which
exists on x86 as well) and its _asm_extable assembly counterpart, as
shorthands to emit exception table entries.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This adds support to the generic search_extable() and sort_extable()
implementations for dealing with exception table entries whose fields
contain relative offsets rather than absolute addresses.
Acked-by: Helge Deller <deller@gmx.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add support to scripts/sortextable for handling relocatable (PIE)
executables, whose ELF type is ET_DYN, not ET_EXEC. Other than adding
support for the new type, no changes are needed.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This reshuffles some code in asm/elf.h and puts a #ifndef __ASSEMBLY__
around its C definitions so that the CPP defines can be used in asm
source files as well.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Before implementing KASLR for arm64 by building a self-relocating PIE
executable, we have to ensure that values we use before the relocation
routine is executed are not subject to dynamic relocation themselves.
This applies not only to virtual addresses, but also to values that are
supplied by the linker at build time and relocated using R_AARCH64_ABS64
relocations.
So instead, use assemble time constants, or force the use of static
relocations by folding the constants into the instructions.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Unfortunately, the current way of using the linker to emit build time
constants into the Image header will no longer work once we switch to
the use of PIE executables. The reason is that such constants are emitted
into the binary using R_AARCH64_ABS64 relocations, which are resolved at
runtime, not at build time, and the places targeted by those relocations
will contain zeroes before that.
So refactor the endian swapping linker script constant generation code so
that it emits the upper and lower 32-bit words separately.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This adds support for emitting PLTs at module load time for relative
branches that are out of range. This is a prerequisite for KASLR, which
may place the kernel and the modules anywhere in the vmalloc area,
making it more likely that branch target offsets exceed the maximum
range of +/- 128 MB.
In this version, I removed the distinction between relocations against
.init executable sections and ordinary executable sections. The reason
is that it is hardly worth the trouble, given that .init.text usually
does not contain that many far branches, and this version now only
reserves PLT entry space for jump and call relocations against undefined
symbols (since symbols defined in the same module can be assumed to be
within +/- 128 MB)
For example, the mac80211.ko module (which is fairly sizable at ~400 KB)
built with -mcmodel=large gives the following relocation counts:
relocs branches unique !local
.text 3925 3347 518 219
.init.text 11 8 7 1
.exit.text 4 4 4 1
.text.unlikely 81 67 36 17
('unique' means branches to unique type/symbol/addend combos, of which
!local is the subset referring to undefined symbols)
IOW, we are only emitting a single PLT entry for the .init sections, and
we are better off just adding it to the core PLT section instead.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Instead of reversing the header dependency between asm/bug.h and
asm/debug-monitors.h, split off the brk instruction immediate value
defines into a new header asm/brk-imm.h, and include it from both.
This solves the circular dependency issue that prevents BUG() from
being used in some header files, and keeps the definitions together.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since PAGE_OFFSET is chosen such that it cuts the kernel VA space right
in half, and since the size of the kernel VA space itself is always a
power of 2, we can treat PAGE_OFFSET as a bitmask and replace the
additions/subtractions with 'or' and 'and-not' operations.
For the comparison against PAGE_OFFSET, a mov/cmp/branch sequence ends
up getting replaced with a single tbz instruction. For the additions and
subtractions, we save a mov instruction since the mask is folded into the
instruction's immediate field.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Checking whether memstart_addr has been assigned every time it is
referenced adds a branch instruction that may hurt performance if
the reference in question occurs on a hot path. So only perform the
check if CONFIG_DEBUG_VM=y.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[catalin.marinas@arm.com: replaced #ifdef with VM_BUG_ON]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The former gives better error reporting on unhandled permission faults
(introduced by the UAO patches).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This relaxes the kernel Image placement requirements, so that it
may be placed at any 2 MB aligned offset in physical memory.
This is accomplished by ignoring PHYS_OFFSET when installing
memblocks, and accounting for the apparent virtual offset of
the kernel Image. As a result, virtual address references
below PAGE_OFFSET are correctly mapped onto physical references
into the kernel Image regardless of where it sits in memory.
Special care needs to be taken for dealing with memory limits passed
via mem=, since the generic implementation clips memory top down, which
may clip the kernel image itself if it is loaded high up in memory. To
deal with this case, we simply add back the memory covering the kernel
image, which may result in more memory to be retained than was passed
as a mem= parameter.
Since mem= should not be considered a production feature, a panic notifier
handler is installed that dumps the memory limit at panic time if one was
set.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Before deferring the assignment of memstart_addr in a subsequent patch, to
the moment where all memory has been discovered and possibly clipped based
on the size of the linear region and the presence of a mem= command line
parameter, we need to ensure that memstart_addr is not used to perform __va
translations before it is assigned.
One such use is in the generic early DT discovery of the initrd location,
which is recorded as a virtual address in the globals initrd_start and
initrd_end. So wire up the generic support to declare the initrd addresses,
and implement it without __va() translations, and perform the translation
after memstart_addr has been assigned.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This moves the module area to right before the vmalloc area, and moves
the kernel image to the base of the vmalloc area. This is an intermediate
step towards implementing KASLR, which allows the kernel image to be
located anywhere in the vmalloc area.
Since other subsystems such as hibernate may still need to refer to the
kernel text or data segments via their linears addresses, both are mapped
in the linear region as well. The linear alias of the text region is
mapped read-only/non-executable to prevent inadvertent modification or
execution.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
KVM on arm64 uses a fixed offset between the linear mapping at EL1 and
the HYP mapping at EL2. Before we can move the kernel virtual mapping
out of the linear mapping, we have to make sure that references to kernel
symbols that are accessed via the HYP mapping are translated to their
linear equivalent.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since the early fixmap page tables are populated using pages that are
part of the static footprint of the kernel, they are covered by the
initial kernel mapping, and we can refer to them without using __va/__pa
translations, which are tied to the linear mapping.
Since the fixmap page tables are disjoint from the kernel mapping up
to the top level pgd entry, we can refer to bm_pte[] directly, and there
is no need to walk the page tables and perform __pa()/__va() translations
at each step.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The page table accessors pte_offset(), pud_offset() and pmd_offset()
rely on __va translations, so they can only be used after the linear
mapping has been installed. For the early fixmap and kasan init routines,
whose page tables are allocated statically in the kernel image, these
functions will return bogus values. So implement pte_offset_kimg(),
pmd_offset_kimg() and pud_offset_kimg(), which can be used instead
before any page tables have been allocated dynamically.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This introduces the preprocessor symbol KIMAGE_VADDR which will serve as
the symbolic virtual base of the kernel region, i.e., the kernel's virtual
offset will be KIMAGE_VADDR + TEXT_OFFSET. For now, we define it as being
equal to PAGE_OFFSET, but in the future, it will be moved below it once
we move the kernel virtual mapping out of the linear mapping.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This wires up the existing generic huge-vmap feature, which allows
ioremap() to use PMD or PUD sized block mappings. It also adds support
to the unmap path for dealing with block mappings, which will allow us
to unmap the __init region using unmap_kernel_range() in a subsequent
patch.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, using BUG_ON() in header files is cumbersome, due to the fact
that asm/bug.h transitively includes a lot of other header files, resulting
in the actual BUG_ON() invocation appearing before its definition in the
preprocessor input. So let's reverse the #include dependency between
asm/bug.h and asm/debug-monitors.h, by moving the definition of BUG_BRK_IMM
from the latter to the former. Also fix up one user of asm/debug-monitors.h
which relied on a transitive include.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since architectures may not yet have their linear mapping up and running
when the initrd address is discovered from the DT, factor out the
assignment of initrd_start and initrd_end, so that an architecture can
override it and use the translation it needs.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
By default, early_init_dt_add_memory_arch() ignores memory below
the base of the kernel image since it won't be addressable via the
linear mapping. However, this is not appropriate anymore once we
decouple the kernel text mapping from the linear mapping, so archs
may want to drop the low limit entirely. So allow the minimum to be
overridden by setting MIN_MEMBLOCK_ADDR.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This function was introduced by previous commits implementing UAO.
However, it can be replaced with task_thread_info() in
uao_thread_switch() or get_fs() in do_page_fault() (the latter being
called only on the current context, so no need for using the saved
pt_regs).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
If a CPU supports both Privileged Access Never (PAN) and User Access
Override (UAO), we don't need to disable/re-enable PAN round all
copy_to_user() like calls.
UAO alternatives cause these calls to use the 'unprivileged' load/store
instructions, which are overridden to be the privileged kind when
fs==KERNEL_DS.
This patch changes the copy_to_user() calls to have their PAN toggling
depend on a new composite 'feature' ARM64_ALT_PAN_NOT_UAO.
If both features are detected, PAN will be enabled, but the copy_to_user()
alternatives will not be applied. This means PAN will be enabled all the
time for these functions. If only PAN is detected, the toggling will be
enabled as normal.
This will save the time taken to disable/re-enable PAN, and allow us to
catch copy_to_user() accesses that occur with fs==KERNEL_DS.
Futex and swp-emulation code continue to hang their PAN toggling code on
ARM64_HAS_PAN.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
CPU feature code uses the desc field as a test to find the end of the list,
this means every entry must have a description. This generates noise for
entries in the list that aren't really features, but combinations of them.
e.g.
> CPU features: detected feature: Privileged Access Never
> CPU features: detected feature: PAN and not UAO
These combination features are needed for corner cases with alternatives,
where cpu features interact.
Change all walkers of the arm64_features[] and arm64_hwcaps[] lists to test
'matches' not 'desc', and only print 'desc' if it is non-NULL.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by : Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
'User Access Override' is a new ARMv8.2 feature which allows the
unprivileged load and store instructions to be overridden to behave in
the normal way.
This patch converts {get,put}_user() and friends to use ldtr*/sttr*
instructions - so that they can only access EL0 memory, then enables
UAO when fs==KERNEL_DS so that these functions can access kernel memory.
This allows user space's read/write permissions to be checked against the
page tables, instead of testing addr<USER_DS, then using the kernel's
read/write permissions.
Signed-off-by: James Morse <james.morse@arm.com>
[catalin.marinas@arm.com: move uao_thread_switch() above dsb()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ARMv8.2 adds a new feature register id_aa64mmfr2. This patch adds the
cpu feature boiler plate used by the actual features in later patches.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Older assemblers may not have support for newer feature registers. To get
round this, sysreg.h provides a 'mrs_s' macro that takes a register
encoding and generates the raw instruction.
Change read_cpuid() to use mrs_s in all cases so that new registers
don't have to be a special case. Including sysreg.h means we need to move
the include and definition of read_cpuid() after the #ifndef __ASSEMBLY__
to avoid syntax errors in vmlinux.lds.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The __reg_num_xNN symbols that are used to implement the msr_s and
mrs_s macros are recorded in the ELF metadata of each object file.
This does not affect the size of the final binary, but it does clutter
the output of tools like readelf, i.e.,
$ readelf -a vmlinux |grep -c __reg_num_x
50976
So let's use symbols with the .L prefix, these are strictly local,
and don't end up in the object files.
$ readelf -a vmlinux |grep -c __reg_num_x
0
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Although the arm64 vDSO is cleanly separated by code/data with the
code being read-only in userspace mappings, the code page is still
writable from the kernel. There have been exploits (such as
http://itszn.com/blog/?p=21) that take advantage of this on x86 to go
from a bad kernel write to full root.
Prevent this specific exploit on arm64 by putting the vDSO code page
in read-only memory as well.
Before the change:
[ 3.138366] vdso: 2 pages (1 code @ ffffffc000a71000, 1 data @ ffffffc000a70000)
---[ Kernel Mapping ]---
0xffffffc000000000-0xffffffc000082000 520K RW NX SHD AF UXN MEM/NORMAL
0xffffffc000082000-0xffffffc000200000 1528K ro x SHD AF UXN MEM/NORMAL
0xffffffc000200000-0xffffffc000800000 6M ro x SHD AF BLK UXN MEM/NORMAL
0xffffffc000800000-0xffffffc0009b6000 1752K ro x SHD AF UXN MEM/NORMAL
0xffffffc0009b6000-0xffffffc000c00000 2344K RW NX SHD AF UXN MEM/NORMAL
0xffffffc000c00000-0xffffffc008000000 116M RW NX SHD AF BLK UXN MEM/NORMAL
0xffffffc00c000000-0xffffffc07f000000 1840M RW NX SHD AF BLK UXN MEM/NORMAL
0xffffffc800000000-0xffffffc840000000 1G RW NX SHD AF BLK UXN MEM/NORMAL
0xffffffc840000000-0xffffffc87ae00000 942M RW NX SHD AF BLK UXN MEM/NORMAL
0xffffffc87ae00000-0xffffffc87ae70000 448K RW NX SHD AF UXN MEM/NORMAL
0xffffffc87af80000-0xffffffc87af8a000 40K RW NX SHD AF UXN MEM/NORMAL
0xffffffc87af8b000-0xffffffc87b000000 468K RW NX SHD AF UXN MEM/NORMAL
0xffffffc87b000000-0xffffffc87fe00000 78M RW NX SHD AF BLK UXN MEM/NORMAL
0xffffffc87fe00000-0xffffffc87ff50000 1344K RW NX SHD AF UXN MEM/NORMAL
0xffffffc87ff90000-0xffffffc87ffa0000 64K RW NX SHD AF UXN MEM/NORMAL
0xffffffc87fff0000-0xffffffc880000000 64K RW NX SHD AF UXN MEM/NORMAL
After:
[ 3.138368] vdso: 2 pages (1 code @ ffffffc0006de000, 1 data @ ffffffc000a74000)
---[ Kernel Mapping ]---
0xffffffc000000000-0xffffffc000082000 520K RW NX SHD AF UXN MEM/NORMAL
0xffffffc000082000-0xffffffc000200000 1528K ro x SHD AF UXN MEM/NORMAL
0xffffffc000200000-0xffffffc000800000 6M ro x SHD AF BLK UXN MEM/NORMAL
0xffffffc000800000-0xffffffc0009b8000 1760K ro x SHD AF UXN MEM/NORMAL
0xffffffc0009b8000-0xffffffc000c00000 2336K RW NX SHD AF UXN MEM/NORMAL
0xffffffc000c00000-0xffffffc008000000 116M RW NX SHD AF BLK UXN MEM/NORMAL
0xffffffc00c000000-0xffffffc07f000000 1840M RW NX SHD AF BLK UXN MEM/NORMAL
0xffffffc800000000-0xffffffc840000000 1G RW NX SHD AF BLK UXN MEM/NORMAL
0xffffffc840000000-0xffffffc87ae00000 942M RW NX SHD AF BLK UXN MEM/NORMAL
0xffffffc87ae00000-0xffffffc87ae70000 448K RW NX SHD AF UXN MEM/NORMAL
0xffffffc87af80000-0xffffffc87af8a000 40K RW NX SHD AF UXN MEM/NORMAL
0xffffffc87af8b000-0xffffffc87b000000 468K RW NX SHD AF UXN MEM/NORMAL
0xffffffc87b000000-0xffffffc87fe00000 78M RW NX SHD AF BLK UXN MEM/NORMAL
0xffffffc87fe00000-0xffffffc87ff50000 1344K RW NX SHD AF UXN MEM/NORMAL
0xffffffc87ff90000-0xffffffc87ffa0000 64K RW NX SHD AF UXN MEM/NORMAL
0xffffffc87fff0000-0xffffffc880000000 64K RW NX SHD AF UXN MEM/NORMAL
Inspired by https://lkml.org/lkml/2016/1/19/494 based on work by the
PaX Team, Brad Spengler, and Kees Cook.
Signed-off-by: David Brown <david.brown@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[catalin.marinas@arm.com: removed superfluous __PAGE_ALIGNED_DATA]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
To enable UBSAN on arm64, ARCH_HAS_UBSAN_SANITIZE_ALL need to be selected.
Basic kernel bootup test is passed on arm64 with CONFIG_UBSAN_SANITIZE_ALL
enabled.
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
With CONFIG_DEBUG_PAGEALLOC, pages do not have the valid bit
set when free in the buddy allocator. Add an indiciation to
the page table dumping code that the valid bit is not set,
'F' for fault, to make this easier to understand.
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ARCH_SUPPORTS_DEBUG_PAGEALLOC provides a hook to map and unmap
pages for debugging purposes. This requires memory be mapped
with PAGE_SIZE mappings since breaking down larger mappings
at runtime will lead to TLB conflicts. Check if debug_pagealloc
is enabled at runtime and if so, map everyting with PAGE_SIZE
pages. Implement the functions to actually map/unmap the
pages at runtime.
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
[catalin.marinas@arm.com: static annotation block_mappings_allowed() and #ifdef]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
create_mapping is only used in fixmap_remap_fdt. All the create_mapping
calls need to happen on existing translation table pages without
additional allocations. Rather than have an alloc function be called
and fail, just set it to NULL and catch its use. Also change
the name to create_mapping_noalloc to better capture what exactly is
going on.
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As of 52e662326e1e ("arm64: prefetch: don't provide spin_lock_prefetch
with LSE"), spin_lock_prefetch is patched at runtime when the LSE atomics
are in use. This relies on the ARM64_LSE_ATOMIC_INSN macro to drive
the alternatives framework, but that macro is only available via
asm/lse.h, which isn't explicitly included in processor.h. Consequently,
drivers can run into build failures such as:
In file included from include/linux/prefetch.h:14:0,
from drivers/net/ethernet/intel/i40e/i40e_txrx.c:27:
arch/arm64/include/asm/processor.h: In function 'spin_lock_prefetch':
arch/arm64/include/asm/processor.h:183:15: error: expected string literal before 'ARM64_LSE_ATOMIC_INSN'
asm volatile(ARM64_LSE_ATOMIC_INSN(
This patch add the missing include and gets things building again.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
On ThunderX T88 pass 1 and pass 2, there is no hardware prefetching so
we need to patch in explicit software prefetching instructions
Prefetching improves this code by 60% over the original code and 2x
over the code without prefetching for the affected hardware using the
benchmark code at https://github.com/apinski-cavium/copy_page_benchmark
Signed-off-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We want to avoid lots of different copy_page implementations, settling
for something that is "good enough" everywhere and hopefully easy to
understand and maintain whilst we're at it.
This patch reworks our copy_page implementation based on discussions
with Cavium on the list and benchmarking on Cortex-A processors so that:
- The loop is unrolled to copy 128 bytes per iteration
- The reads are offset so that we read from the next 128-byte block
in the same iteration that we store the previous block
- Explicit prefetch instructions are removed for now, since they hurt
performance on CPUs with hardware prefetching
- The loop exit condition is calculated at the start of the loop
Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Most CPUs have a hardware prefetcher which generally performs better
without explicit prefetch instructions issued by software, however
some CPUs (e.g. Cavium ThunderX) rely solely on explicit prefetch
instructions.
This patch adds an alternative pattern (ARM64_HAS_NO_HW_PREFETCH) to
allow our library code to make use of explicit prefetch instructions
during things like copy routines only when the CPU does not have the
capability to perform the prefetching itself.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The LSE atomics rely on us not dirtying data at L1 if we can avoid it,
otherwise many of the potential scalability benefits are lost.
This patch replaces spin_lock_prefetch with a nop when the LSE atomics
are in use, so that users don't shoot themselves in the foot by causing
needless coherence traffic at L1.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The SBBR and ACPI specifications allow ACPI based systems that do not
implement PSCI (eg systems with no EL3) to boot through the ACPI parking
protocol specification[1].
This patch implements the ACPI parking protocol CPU operations, and adds
code that eases parsing the parking protocol data structures to the
ARM64 SMP initializion carried out at the same time as cpus enumeration.
To wake-up the CPUs from the parked state, this patch implements a
wakeup IPI for ARM64 (ie arch_send_wakeup_ipi_mask()) that mirrors the
ARM one, so that a specific IPI is sent for wake-up purpose in order
to distinguish it from other IPI sources.
Given the current ACPI MADT parsing API, the patch implements a glue
layer that helps passing MADT GICC data structure from SMP initialization
code to the parking protocol implementation somewhat overriding the CPU
operations interfaces. This to avoid creating a completely trasparent
DT/ACPI CPU operations layer that would require creating opaque
structure handling for CPUs data (DT represents CPU through DT nodes, ACPI
through static MADT table entries), which seems overkill given that ACPI
on ARM64 mandates only two booting protocols (PSCI and parking protocol),
so there is no need for further protocol additions.
Based on the original work by Mark Salter <msalter@redhat.com>
[1] https://acpica.org/sites/acpica/files/MP%20Startup%20for%20ARM%20platforms.docx
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Loc Ho <lho@apm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Al Stone <ahs3@redhat.com>
[catalin.marinas@arm.com: Added WARN_ONCE(!acpi_parking_protocol_valid() on the IPI]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
At boot we may change the granularity of the tables mapping the kernel
(by splitting or making sections). This may happen when we create the
linear mapping (in __map_memblock), or at any point we try to apply
fine-grained permissions to the kernel (e.g. fixup_executable,
mark_rodata_ro, fixup_init).
Changing the active page tables in this manner may result in multiple
entries for the same address being allocated into TLBs, risking problems
such as TLB conflict aborts or issues derived from the amalgamation of
TLB entries. Generally, a break-before-make (BBM) approach is necessary
to avoid conflicts, but we cannot do this for the kernel tables as it
risks unmapping text or data being used to do so.
Instead, we can create a new set of tables from scratch in the safety of
the existing mappings, and subsequently migrate over to these using the
new cpu_replace_ttbr1 helper, which avoids the two sets of tables being
active simultaneously.
To avoid issues when we later modify permissions of the page tables
(e.g. in fixup_init), we must create the page tables at a granularity
such that later modification does not result in splitting of tables.
This patch applies this strategy, creating a new set of fine-grained
page tables from scratch, and safely migrating to them. The existing
fixmap and kasan shadow page tables are reused in the new fine-grained
tables.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently we have separate ALIGN_DEBUG_RO{,_MIN} directives to align
_etext and __init_begin. While we ensure that __init_begin is
page-aligned, we do not provide the same guarantee for _etext. This is
not problematic currently as the alignment of __init_begin is sufficient
to prevent issues when we modify permissions.
Subsequent patches will assume page alignment of segments of the kernel
we wish to map with different permissions. To ensure this, move _etext
after the ALIGN_DEBUG_RO_MIN for the init section. This renders the
prior ALIGN_DEBUG_RO irrelevant, and hence it is removed. Likewise,
upgrade to ALIGN_DEBUG_RO_MIN(PAGE_SIZE) for _stext.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
To allow us to initialise pgdirs which are fixmapped, allow explicitly
passing a pgdir rather than an mm. A new __create_pgd_mapping function
is added for this, with existing __create_mapping callers migrated to
this.
The mm argument was previously only used at the top level. Now that it
is redundant at all levels, it is removed. To indicate its new found
similarity to alloc_init_{pud,pmd,pte}, __create_mapping is renamed to
init_pgd.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that create_mapping uses fixmap slots to modify pte, pmd, and pud
entries, we can access page tables anywhere in physical memory,
regardless of the extent of the linear mapping.
Given that, we no longer need to limit memblock allocations during page
table creation, and can leave the limit as its default
MEMBLOCK_ALLOC_ANYWHERE.
We never add memory which will fall outside of the linear map range
given phys_offset and MAX_MEMBLOCK_ADDR are configured appropriately, so
any tables we create will fall in the linear map of the final tables.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As a preparatory step to allow us to allocate early page tables from
unmapped memory using memblock_alloc, modify the __create_mapping
callees to map and unmap the tables they modify using fixmap entries.
All but the top-level pgd initialisation is performed via the fixmap.
Subsequent patches will inject the pgd physical address, and migrate to
using the FIX_PGD slot.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As a preparatory step to allow us to allocate early page tables from
unmapped memory using memblock_alloc, add new p??_{set,clear}_fixmap*
functions which can be used to walk page tables outside of the linear
mapping by using fixmap slots.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>