[ Upstream commit 10541b374aa05c8118cc6a529a615882e53f261b ]
When __bpf_prog_enter() returns zero, the s1 register is not set to zero,
resulting in incorrect runtime stats. Fix it by setting s1 immediately upon
the return of __bpf_prog_enter().
Fixes: 49b5e77ae3 ("riscv, bpf: Add bpf trampoline support for RV64")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240416064208.2919073-3-xukuohai@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit 1d6cd2146c which was
mistakenly added into v6.6.y and the commit corresponding to the 'Fixes:'
tag is invalid. For more information, see link [1].
This will result in the loss of Crashkernel data in /proc/iomem, and kdump
failed:
```
Memory for crashkernel is not reserved
Please reserve memory by passing"crashkernel=Y@X" parameter to kernel
Then try to loading kdump kernel
```
After revert, kdump works fine. Tested on QEMU riscv.
Link: https://lore.kernel.org/linux-riscv/ZSiQRDGLZk7lpakE@MiWiFi-R3L-srv [1]
Cc: Baoquan He <bhe@redhat.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit aea702dde7e9876fb00571a2602f25130847bf0f ]
commit 3335068f87 ("riscv: Use PUD/P4D/PGD pages for the linear
mapping") added logic to allow using RAM below the kernel load address.
However, this does not work for NOMMU, where PAGE_OFFSET is fixed to the
kernel load address. Since that range of memory corresponds to PFNs
below ARCH_PFN_OFFSET, mm initialization runs off the beginning of
mem_map and corrupts adjacent kernel memory. Fix this by restoring the
previous behavior for NOMMU kernels.
Fixes: 3335068f87 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240227003630.3634533-3-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6065e736f82c817c9a597a31ee67f0ce4628e948 ]
On NOMMU, userspace memory can come from anywhere in physical RAM. The
current definition of TASK_SIZE is wrong if any RAM exists above 4G,
causing spurious failures in the userspace access routines.
Fixes: 6bd33e1ece ("riscv: add nommu support")
Fixes: c3f896dcf1 ("mm: switch the test_vmalloc module to use __vmalloc_node")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Bo Gan <ganboing@gmail.com>
Link: https://lore.kernel.org/r/20240227003630.3634533-2-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac88ff6b9d7dea9f0907c86bdae204dde7d5c0e6 ]
When below config items are set, compiler complained:
--------------------
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_CRASH_DUMP=y
......
-----------------------
-------------------------------------------------------------------
arch/riscv/kernel/crash_core.c: In function 'arch_crash_save_vmcoreinfo':
arch/riscv/kernel/crash_core.c:11:58: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'int' [-Wformat=]
11 | vmcoreinfo_append_str("NUMBER(VMALLOC_START)=0x%lx\n", VMALLOC_START);
| ~~^
| |
| long unsigned int
| %x
----------------------------------------------------------------------
This is because on riscv macro VMALLOC_START has different type when
CONFIG_MMU is set or unset.
arch/riscv/include/asm/pgtable.h:
--------------------------------------------------
Changing it to _AC(0, UL) in case CONFIG_MMU=n can fix the warning.
Link: https://lkml.kernel.org/r/ZW7OsX4zQRA3mO4+@MiWiFi-R3L-srv
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Cc: Eric DeVolder <eric_devolder@yahoo.com>
Cc: Ignat Korchagin <ignat@cloudflare.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 6065e736f82c ("riscv: Fix TASK_SIZE on 64-bit NOMMU")
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit d14fa1fcf69db9d070e75f1c4425211fa619dfc8 upstream.
childregs represents the registers which are active for the new thread
in user context. For a kernel thread, childregs->gp is never used since
the kernel gp is not touched by switch_to. For a user mode helper, the
gp value can be observed in user space after execve or possibly by other
means.
[From the email thread]
The /* Kernel thread */ comment is somewhat inaccurate in that it is also used
for user_mode_helper threads, which exec a user process, e.g. /sbin/init or
when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have
PF_KTHREAD set and are valid targets for ptrace etc. even before they exec.
childregs is the *user* context during syscall execution and it is observable
from userspace in at least five ways:
1. kernel_execve does not currently clear integer registers, so the starting
register state for PID 1 and other user processes started by the kernel has
sp = user stack, gp = kernel __global_pointer$, all other integer registers
zeroed by the memset in the patch comment.
This is a bug in its own right, but I'm unwilling to bet that it is the only
way to exploit the issue addressed by this patch.
2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread
before it execs, but ptrace requires SIGSTOP to be delivered which can only
happen at user/kernel boundaries.
3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for
user_mode_helpers before the exec completes, but gp is not one of the
registers it returns.
4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel
addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses
are also exposed via PERF_SAMPLE_REGS_USER which is permitted under
LOCKDOWN_PERF. I have not attempted to write exploit code.
5. Much of the tracing infrastructure allows access to user registers. I have
not attempted to determine which forms of tracing allow access to user
registers without already allowing access to kernel registers.
Fixes: 7db91e57a0 ("RISC-V: Task implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Stefan O'Rear <sorear@fastmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240327061258.2370291-1-sorear@fastmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d080a08b06b6266cc3e0e86c5acfd80db937cb6b upstream.
These macros did not initialize __kr_err, so they could fail even if
the access did not fault.
Cc: stable@vger.kernel.org
Fixes: d464118cdc ("riscv: implement __get_kernel_nofault and __put_user_nofault")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20240312022030.320789-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a370c2419e4680a27382d9231edcf739d5d74efc ]
patch_map() uses fixmap mappings to circumvent the non-writability of
the kernel text mapping.
The __set_fixmap() function only flushes the current cpu tlb, it does
not emit an IPI so we must make sure that while we use a fixmap mapping,
the current task is not migrated on another cpu which could miss the
newly introduced fixmap mapping.
So in order to avoid any task migration, disable the preemption.
Reported-by: Andrea Parri <andrea@rivosinc.com>
Closes: https://lore.kernel.org/all/ZcS+GAaM25LXsBOl@andrea/
Reported-by: Andy Chiu <andy.chiu@sifive.com>
Closes: https://lore.kernel.org/linux-riscv/CABgGipUMz3Sffu-CkmeUB1dKVwVQ73+7=sgC45-m0AE9RCjOZg@mail.gmail.com/
Fixes: cad539baa4 ("riscv: implement a memset like function for text")
Fixes: 0ff7c3b331 ("riscv: Use text_mutex instead of patch_lock")
Co-developed-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Puranjay Mohan <puranjay12@gmail.com>
Link: https://lore.kernel.org/r/20240326203017.310422-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 13dddf9319808badd2c1f5d7007b4e82838a648e ]
"riscv: signal: Report signal frame size to userspace via auxv" (e92f469)
has added new constant AT_MINSIGSTKSZ but failed to increment the size of
auxv, keeping AT_VECTOR_SIZE_ARCH at 9.
This fix correctly increments AT_VECTOR_SIZE_ARCH to 10, following the
approach in the commit 94b07c1 ("arm64: signal: Report signal frame size
to userspace via auxv").
Link: https://lore.kernel.org/r/73883406.20231215232720@torrio.net
Link: https://lore.kernel.org/all/20240102133617.3649-1-victor@torrio.net/
Reported-by: Ivan Komarov <ivan.komarov@dfyz.info>
Closes: https://lore.kernel.org/linux-riscv/CY3Z02NYV1C4.11BLB9PLVW9G1@fedora/
Fixes: e92f469b07 ("riscv: signal: Report signal frame size to userspace via auxv")
Signed-off-by: Victor Isaev <isv@google.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 8e936e98718f005c986be0bfa1ee6b355acf96be upstream.
The reads to APLIC in_clrip[x] registers returns rectified input values
of the interrupt sources.
A rectified input value of an interrupt source is defined by the section
"4.5.2 Source configurations (sourcecfg[1]–sourcecfg[1023])" of the
RISC-V AIA specification as:
rectified input value = (incoming wire value) XOR (source is inverted)
Update the riscv_aplic_input() implementation to match the above.
Cc: stable@vger.kernel.org
Fixes: 74967aa208 ("RISC-V: KVM: Add in-kernel emulation of AIA APLIC")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20240321085041.1955293-3-apatel@ventanamicro.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8dd9f113e16bef3b29c9dcceb584a6f144f55e4 upstream.
The writes to setipnum_le/be register for APLIC in MSI-mode have special
consideration for level-triggered interrupts as-per the section "4.9.2
Special consideration for level-sensitive interrupt sources" of the RISC-V
AIA specification.
Particularly, the below text from the RISC-V AIA specification defines
the behaviour of writes to setipnum_le/be register for level-triggered
interrupts:
"A second option is for the interrupt service routine to write the
APLIC’s source identity number for the interrupt to the domain’s
setipnum register just before exiting. This will cause the interrupt’s
pending bit to be set to one again if the source is still asserting
an interrupt, but not if the source is not asserting an interrupt."
Fix setipnum_le/be write emulation for in-kernel APLIC by implementing
the above behaviour in aplic_write_pending() function.
Cc: stable@vger.kernel.org
Fixes: 74967aa208 ("RISC-V: KVM: Add in-kernel emulation of AIA APLIC")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20240321085041.1955293-2-apatel@ventanamicro.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2bb7e0c49302feec1c2f777bbfe8726169986ed8 ]
By surrounding the definition of pte_leaf_size() with a ifdef napot as
it should have been.
Fixes: e0fe5ab4192c ("riscv: Fix pte_leaf_size() for NAPOT")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20240304080247.387710-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 680341382da56bd192ebfa4e58eaf4fec2e5bca7 upstream.
CALLER_ADDRx returns caller's address at specified level, they are used
for several tracers. These macros eventually use
__builtin_return_address(n) to get the caller's address if arch doesn't
define their own implementation.
In RISC-V, __builtin_return_address(n) only works when n == 0, we need
to walk the stack frame to get the caller's address at specified level.
data.level started from 'level + 3' due to the call flow of getting
caller's address in RISC-V implementation. If we don't have additional
three iteration, the level is corresponding to follows:
callsite -> return_address -> arch_stack_walk -> walk_stackframe
| | | |
level 3 level 2 level 1 level 0
Fixes: 10626c32e3 ("riscv/ftrace: Add basic support")
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Zong Li <zong.li@sifive.com>
Link: https://lore.kernel.org/r/20240202015102.26251-1-zong.li@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3aff0c459e77ac0fb1c4d6884433467f797f7357 upstream.
Commit e4bb020f3d ("riscv: detect assembler support for .option arch")
added two tests, one for a valid value to '.option arch' that should
succeed and one for an invalid value that is expected to fail to make
sure that support for '.option arch' is properly detected because Clang
does not error when '.option arch' is not supported:
$ clang --target=riscv64-linux-gnu -Werror -x assembler -c -o /dev/null <(echo '.option arch, +m')
/dev/fd/63:1:9: warning: unknown option, expected 'push', 'pop', 'rvc', 'norvc', 'relax' or 'norelax'
.option arch, +m
^
$ echo $?
0
Unfortunately, the invalid test started being accepted by Clang after
the linked llvm-project change, which causes CONFIG_AS_HAS_OPTION_ARCH
and configurations that depend on it to be silently disabled, even
though those versions do support '.option arch'.
The invalid test can be avoided altogether by using
'-Wa,--fatal-warnings', which will turn all assembler warnings into
errors, like '-Werror' does for the compiler:
$ clang --target=riscv64-linux-gnu -Werror -Wa,--fatal-warnings -x assembler -c -o /dev/null <(echo '.option arch, +m')
/dev/fd/63:1:9: error: unknown option, expected 'push', 'pop', 'rvc', 'norvc', 'relax' or 'norelax'
.option arch, +m
^
$ echo $?
1
The as-instr macros have been updated to make use of this flag, so
remove the invalid test, which allows CONFIG_AS_HAS_OPTION_ARCH to work
for all compiler versions.
Cc: stable@vger.kernel.org
Fixes: e4bb020f3d ("riscv: detect assembler support for .option arch")
Link: 3ac9fe69f7
Reported-by: Eric Biggers <ebiggers@kernel.org>
Closes: https://lore.kernel.org/r/20240121011341.GA97368@sol.localdomain/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20240125-fix-riscv-option-arch-llvm-18-v1-2-390ac9cc3cd0@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a11dd49dcb9376776193e15641f84fcc1e5980c9 ]
Offset vmemmap so that the first page of vmemmap will be mapped
to the first page of physical memory in order to ensure that
vmemmap’s bounds will be respected during
pfn_to_page()/page_to_pfn() operations.
The conversion macros will produce correct SV39/48/57 addresses
for every possible/valid DRAM_BASE inside the physical memory limits.
v2:Address Alex's comments
Suggested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Dimitris Vlachos <dvlachos@ics.forth.gr>
Reported-by: Dimitris Vlachos <dvlachos@ics.forth.gr>
Closes: https://lore.kernel.org/linux-riscv/20240202135030.42265-1-csd4492@csd.uoc.gr
Fixes: d95f1a542c ("RISC-V: Implement sparsemem")
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240229191723.32779-1-dvlachos@ics.forth.gr
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 16ab4646c9057e0528b985ad772e3cb88c613db2 ]
This reverts commit ce173474cf.
We cannot correctly deal with NAPOT mappings in vmalloc/vmap because if
some part of a NAPOT mapping is unmapped, the remaining mapping is not
updated accordingly. For example:
ptr = vmalloc_huge(64 * 1024, GFP_KERNEL);
vunmap_range((unsigned long)(ptr + PAGE_SIZE),
(unsigned long)(ptr + 64 * 1024));
leads to the following kernel page table dump:
0xffff8f8000ef0000-0xffff8f8000ef1000 0x00000001033c0000 4K PTE N .. .. D A G . . W R V
Meaning the first entry which was not unmapped still has the N bit set,
which, if accessed first and cached in the TLB, could allow access to the
unmapped range.
That's because the logic to break the NAPOT mapping does not exist and
likely won't. Indeed, to break a NAPOT mapping, we first have to clear
the whole mapping, flush the TLB and then set the new mapping ("break-
before-make" equivalent). That works fine in userspace since we can handle
any pagefault occurring on the remaining mapping but we can't handle a kernel
pagefault on such mapping.
So fix this by reverting the commit that introduced the vmap/vmalloc
support.
Fixes: ce173474cf ("riscv: mm: support Svnapot in huge vmap")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240227205016.121901-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d82f32202e0df7bf40d4b67c8a4ff9cea32df4d9 ]
Before attempting to support the pre-ratification version of vector
found on older T-Head CPUs, disallow "v" in riscv,isa on these
platforms. The deprecated property has no clear way to communicate
the specific version of vector that is supported and much of the vendor
provided software puts "v" in the isa string. riscv,isa-extensions
should be used instead. This should not be too much of a burden for
these systems, as the vendor shipped devicetrees and firmware do not
work with a mainline kernel and will require updating.
We can limit this restriction to only ignore v in riscv,isa on CPUs
that report T-Head's vendor ID and a zero marchid. Newer T-Head CPUs
that support the ratified version of vector should report non-zero
marchid, according to Guo Ren [1].
Link: https://lore.kernel.org/linux-riscv/CAJF2gTRy5eK73=d6s7CVy9m9pB8p4rAoMHM3cZFwzg=AuF7TDA@mail.gmail.com/ [1]
Fixes: dc6667a4e7 ("riscv: Extending cpufeature.c to detect V-extension")
Co-developed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20240223-tidings-shabby-607f086cb4d7@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 4356e9f841f7fbb945521cef3577ba394c65f3fc upstream.
We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c323 ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").
Then, much later, we ended up removing the workaround in commit
43c249ea0b ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.
Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround. But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.
It looks like there are at least two separate issues that all hit in
this area:
(a) some versions of gcc don't mark the asm goto as 'volatile' when it
has outputs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
which is easy to work around by just adding the 'volatile' by hand.
(b) Internal compiler errors:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
which are worked around by adding the extra empty 'asm' as a
barrier, as in the original workaround.
but the problem Sean sees may be a third thing since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.
but the same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.
Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2cf963787529f615f7c93bdcf13a5e82029e7f38 ]
The percpu area overflow_stacks is exported from arch/riscv/kernel/traps.c
for use in the entry code, but is not declared anywhere. Add the relevant
declaration to arch/riscv/include/asm/stacktrace.h to silence the following
sparse warning:
arch/riscv/kernel/traps.c:395:1: warning: symbol '__pcpu_scope_overflow_stack' was not declared. Should it be static?
We don't add the stackinfo_get_overflow() call as for some of the other
architectures as this doesn't seem to be used yet, so just silence the
warning.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Fixes: be97d0db5f44 ("riscv: VMAP_STACK overflow detection thread-safe")
Link: https://lore.kernel.org/r/20231123134214.81481-1-ben.dooks@codethink.co.uk
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce68c035457bdd025a9961e0ba2157323090c581 ]
arch_hugetlb_migration_supported() must be reimplemented to add support
for NAPOT hugepages, which is done here.
Fixes: 82a1a1f3bf ("riscv: mm: support Svnapot in hugetlb page")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240130120114.106003-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a179a4bfb694f80f2709a1d0398469e787acb974 ]
When NAPOT is enabled, a new hugepage size is available and then we need
to make hugetlb_mask_last_page() aware of that.
Fixes: 82a1a1f3bf ("riscv: mm: support Svnapot in hugetlb page")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240117195741.1926459-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1458eb2c9d88ad4b35eb6d6a4aa1d43d8fbf7f62 ]
As stated by the privileged specification, we must clear a NAPOT
mapping and emit a sfence.vma before setting a new translation.
Fixes: 82a1a1f3bf ("riscv: mm: support Svnapot in hugetlb page")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240117195741.1926459-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d9807d60c145836043ffa602328ea1d66dc458b1 ]
The spare_init() calls memmap_populate() many times to create VA to PA
mapping for the VMEMMAP area, where all "struct page" are located once
CONFIG_SPARSEMEM_VMEMMAP is defined. These "struct page" are later
initialized in the zone_sizes_init() function. However, during this
process, no sfence.vma instruction is executed for this VMEMMAP area.
This omission may cause the hart to fail to perform page table walk
because some data related to the address translation is invisible to the
hart. To solve this issue, the local_flush_tlb_kernel_range() is called
right after the sparse_init() to execute a sfence.vma instruction for this
VMEMMAP area, ensuring that all data related to the address translation
is visible to the hart.
Fixes: d95f1a542c ("RISC-V: Implement sparsemem")
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240117140333.2479667-1-vincent.chen@sifive.com
Fixes: 7a92fc8b4d20 ("mm: Introduce flush_cache_vmap_early()")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a92fc8b4d20680e4c20289a670d8fca2d1f2c1b ]
The pcpu setup when using the page allocator sets up a new vmalloc
mapping very early in the boot process, so early that it cannot use the
flush_cache_vmap() function which may depend on structures not yet
initialized (for example in riscv, we currently send an IPI to flush
other cpus TLB).
But on some architectures, we must call flush_cache_vmap(): for example,
in riscv, some uarchs can cache invalid TLB entries so we need to flush
the new established mapping to avoid taking an exception.
So fix this by introducing a new function flush_cache_vmap_early() which
is called right after setting the new page table entry and before
accessing this new mapping. This new function implements a local flush
tlb on riscv and is no-op for other architectures (same as today).
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Stable-dep-of: d9807d60c145 ("riscv: mm: execute local TLB flush after populating vmemmap")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e22bfd520ea8740e9a20314d2a890baf304c9d2 ]
This function used to simply flush the whole tlb of all harts, be more
subtile and try to only flush the range.
The problem is that we can only use PAGE_SIZE as stride since we don't know
the size of the underlying mapping and then this function will be improved
only if the size of the region to flush is < threshold * PAGE_SIZE.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20231030133027.19542-5-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Stable-dep-of: d9807d60c145 ("riscv: mm: execute local TLB flush after populating vmemmap")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9d4e8d5fa7dbbb606b355f40d918a1feef821bc5 ]
Currently, when the range to flush covers more than one page (a 4K page or
a hugepage), __flush_tlb_range() flushes the whole tlb. Flushing the whole
tlb comes with a greater cost than flushing a single entry so we should
flush single entries up to a certain threshold so that:
threshold * cost of flushing a single entry < cost of flushing the whole
tlb.
Co-developed-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20231030133027.19542-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Stable-dep-of: d9807d60c145 ("riscv: mm: execute local TLB flush after populating vmemmap")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c5e9b2c2ae82231d85d9650854e7b3e97dde33da ]
For now, tlb_flush() simply calls flush_tlb_mm() which results in a
flush of the whole TLB. So let's use mmu_gather fields to provide a more
fine-grained flush of the TLB.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC
Link: https://lore.kernel.org/r/20231030133027.19542-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Stable-dep-of: d9807d60c145 ("riscv: mm: execute local TLB flush after populating vmemmap")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 66f962d8939fd2ac74de901d30d30310c8ddca79 ]
commit 66f1e6809397 ("riscv: Make XIP bootable again") restricted page
offset to the sv39 page offset instead of the default sv57, which makes
sense since probably the platforms that target XIP kernels do not
support anything else than sv39 and we do not try to find out the
largest address space supported on XIP kernels (ie set_satp_mode()).
But PAGE_OFFSET_L3 is not defined for rv32, so fix the build error by
restoring the previous behaviour which picks CONFIG_PAGE_OFFSET for rv32.
Fixes: 66f1e6809397 ("riscv: Make XIP bootable again")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/linux-riscv/344dca85-5c48-44e1-bc64-4fa7973edd12@infradead.org/T/#u
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20240118212120.2087803-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 66f1e68093979816a23412a3fad066f5bcbc0360 ]
Currently, the XIP kernel seems to fail to boot due to missing
XIP_FIXUP and a wrong page_offset value. A superfluous XIP_FIXUP
has also been removed.
Signed-off-by: Frederik Haxel <haxel@fzi.de>
Link: https://lore.kernel.org/r/20231212130116.848530-2-haxel@fzi.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 64bac5ea17d527872121adddfee869c7a0618f8f ]
The prototype was hidden in an #ifdef on x86, which causes a warning:
kernel/irq_work.c:72:13: error: no previous prototype for 'arch_irq_work_raise' [-Werror=missing-prototypes]
Some architectures have a working prototype, while others don't.
Fix this by providing it in only one place that is always visible.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 5f449e245e5b0d9d63eef6c8968fbdc3a8594407 upstream.
In COMPAT mode, the STACK_TOP is DEFAULT_MAP_WINDOW (0x80000000), but
the TASK_SIZE is 0x7fff000. When the user stack is upon 0x7fff000, it
will cause a user segment fault. Sometimes, it would cause boot
failure when the whole rootfs is rv32.
Freeing unused kernel image (initmem) memory: 2236K
Run /sbin/init as init process
Starting init: /sbin/init exists but couldn't execute it (error -14)
Run /etc/init as init process
...
Increase the TASK_SIZE to cover STACK_TOP.
Cc: stable@vger.kernel.org
Fixes: add2cc6b65 ("RISC-V: mm: Restrict address space for sv39,sv48,sv57")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20231222115703.2404036-2-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 97b7ac69be2e5a683e898f5267f659fde52efdd5 upstream.
When the task is in COMPAT mode, the arch_get_mmap_end should be 2GB,
not TASK_SIZE_64. The TASK_SIZE has contained is_compat_mode()
detection, so change the definition of STACK_TOP_MAX to TASK_SIZE
directly.
Cc: stable@vger.kernel.org
Fixes: add2cc6b65 ("RISC-V: mm: Restrict address space for sv39,sv48,sv57")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20231222115703.2404036-3-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit adb1f95d388a43c4c564ef3e436f18900dde978e ]
The ending NULL is not taken into account by strncat(), so switch to
strlcat() to correctly compute the size of the available memory when
appending CONFIG_CMDLINE to 'early_cmdline'.
Fixes: 26e7aacb83 ("riscv: Allow to downgrade paging mode from the command line")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/9f66d2b58c8052d4055e90b8477ee55d9a0914f9.1698564026.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5daa3726410288075ba73c336bb2e80d6b06aa4d ]
During the refactoring, a bug was introduced in the rarly used
XIP_FIXUP_FLASH_OFFSET macro.
Fixes: bee7fbc385 ("RISC-V CPU Idle Support")
Fixes: e7681beba9 ("RISC-V: Split out the XIP fixups into their own file")
Signed-off-by: Frederik Haxel <haxel@fzi.de>
Link: https://lore.kernel.org/r/20231212130116.848530-3-haxel@fzi.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b8b2711336f03ece539de61479d6ffc44fb603d3 ]
When resetting the linear mapping permissions, we must make sure that we
clear the X bit so that do not end up with WX mappings (since we set
PAGE_KERNEL).
Fixes: 395a21ff85 ("riscv: add ARCH_HAS_SET_DIRECT_MAP support")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20231213134027.155327-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 311cd2f6e25380cff0abc2884dc6a3d33bc9b5c3 ]
When STRICT_KERNEL_RWX is set, any change of permissions on any kernel
mapping (vmalloc/modules/kernel text...etc) should be applied on its
linear mapping alias. The problem is that the riscv kernel uses huge
mappings for the linear mapping and walk_page_range_novma() does not
split those huge mappings.
So this patchset implements such split in order to apply fine-grained
permissions on the linear mapping.
Below is the difference before and after (the first PUD mapping is split
into PTE/PMD mappings):
Before:
---[ Linear mapping ]---
0xffffaf8000080000-0xffffaf8000200000 0x0000000080080000 1536K PTE D A G . . W R V
0xffffaf8000200000-0xffffaf8077c00000 0x0000000080200000 1914M PMD D A G . . W R V
0xffffaf8077c00000-0xffffaf8078800000 0x00000000f7c00000 12M PMD D A G . . . R V
0xffffaf8078800000-0xffffaf8078c00000 0x00000000f8800000 4M PMD D A G . . W R V
0xffffaf8078c00000-0xffffaf8079200000 0x00000000f8c00000 6M PMD D A G . . . R V
0xffffaf8079200000-0xffffaf807e600000 0x00000000f9200000 84M PMD D A G . . W R V
0xffffaf807e600000-0xffffaf807e716000 0x00000000fe600000 1112K PTE D A G . . W R V
0xffffaf807e717000-0xffffaf807e71a000 0x00000000fe717000 12K PTE D A G . . W R V
0xffffaf807e71d000-0xffffaf807e71e000 0x00000000fe71d000 4K PTE D A G . . W R V
0xffffaf807e722000-0xffffaf807e800000 0x00000000fe722000 888K PTE D A G . . W R V
0xffffaf807e800000-0xffffaf807fe00000 0x00000000fe800000 22M PMD D A G . . W R V
0xffffaf807fe00000-0xffffaf807ff54000 0x00000000ffe00000 1360K PTE D A G . . W R V
0xffffaf807ff55000-0xffffaf8080000000 0x00000000fff55000 684K PTE D A G . . W R V
0xffffaf8080000000-0xffffaf8400000000 0x0000000100000000 14G PUD D A G . . W R V
After:
---[ Linear mapping ]---
0xffffaf8000080000-0xffffaf8000200000 0x0000000080080000 1536K PTE D A G . . W R V
0xffffaf8000200000-0xffffaf8077c00000 0x0000000080200000 1914M PMD D A G . . W R V
0xffffaf8077c00000-0xffffaf8078800000 0x00000000f7c00000 12M PMD D A G . . . R V
0xffffaf8078800000-0xffffaf8078a00000 0x00000000f8800000 2M PMD D A G . . W R V
0xffffaf8078a00000-0xffffaf8078c00000 0x00000000f8a00000 2M PTE D A G . . W R V
0xffffaf8078c00000-0xffffaf8079200000 0x00000000f8c00000 6M PMD D A G . . . R V
0xffffaf8079200000-0xffffaf807e600000 0x00000000f9200000 84M PMD D A G . . W R V
0xffffaf807e600000-0xffffaf807e716000 0x00000000fe600000 1112K PTE D A G . . W R V
0xffffaf807e717000-0xffffaf807e71a000 0x00000000fe717000 12K PTE D A G . . W R V
0xffffaf807e71d000-0xffffaf807e71e000 0x00000000fe71d000 4K PTE D A G . . W R V
0xffffaf807e722000-0xffffaf807e800000 0x00000000fe722000 888K PTE D A G . . W R V
0xffffaf807e800000-0xffffaf807fe00000 0x00000000fe800000 22M PMD D A G . . W R V
0xffffaf807fe00000-0xffffaf807ff54000 0x00000000ffe00000 1360K PTE D A G . . W R V
0xffffaf807ff55000-0xffffaf8080000000 0x00000000fff55000 684K PTE D A G . . W R V
0xffffaf8080000000-0xffffaf8080800000 0x0000000100000000 8M PMD D A G . . W R V
0xffffaf8080800000-0xffffaf8080af6000 0x0000000100800000 3032K PTE D A G . . W R V
0xffffaf8080af6000-0xffffaf8080af8000 0x0000000100af6000 8K PTE D A G . X . R V
0xffffaf8080af8000-0xffffaf8080c00000 0x0000000100af8000 1056K PTE D A G . . W R V
0xffffaf8080c00000-0xffffaf8081a00000 0x0000000100c00000 14M PMD D A G . . W R V
0xffffaf8081a00000-0xffffaf8081a40000 0x0000000101a00000 256K PTE D A G . . W R V
0xffffaf8081a40000-0xffffaf8081a44000 0x0000000101a40000 16K PTE D A G . X . R V
0xffffaf8081a44000-0xffffaf8081a52000 0x0000000101a44000 56K PTE D A G . . W R V
0xffffaf8081a52000-0xffffaf8081a54000 0x0000000101a52000 8K PTE D A G . X . R V
...
0xffffaf809e800000-0xffffaf80c0000000 0x000000011e800000 536M PMD D A G . . W R V
0xffffaf80c0000000-0xffffaf8400000000 0x0000000140000000 13G PUD D A G . . W R V
Note that this also fixes memfd_secret() syscall which uses
set_direct_map_invalid_noflush() and set_direct_map_default_noflush() to
remove the pages from the linear mapping. Below is the kernel page table
while a memfd_secret() syscall is running, you can see all the !valid
page table entries in the linear mapping:
...
0xffffaf8082240000-0xffffaf8082241000 0x0000000102240000 4K PTE D A G . . W R .
0xffffaf8082241000-0xffffaf8082250000 0x0000000102241000 60K PTE D A G . . W R V
0xffffaf8082250000-0xffffaf8082252000 0x0000000102250000 8K PTE D A G . . W R .
0xffffaf8082252000-0xffffaf8082256000 0x0000000102252000 16K PTE D A G . . W R V
0xffffaf8082256000-0xffffaf8082257000 0x0000000102256000 4K PTE D A G . . W R .
0xffffaf8082257000-0xffffaf8082258000 0x0000000102257000 4K PTE D A G . . W R V
0xffffaf8082258000-0xffffaf8082259000 0x0000000102258000 4K PTE D A G . . W R .
0xffffaf8082259000-0xffffaf808225a000 0x0000000102259000 4K PTE D A G . . W R V
0xffffaf808225a000-0xffffaf808225c000 0x000000010225a000 8K PTE D A G . . W R .
0xffffaf808225c000-0xffffaf8082266000 0x000000010225c000 40K PTE D A G . . W R V
0xffffaf8082266000-0xffffaf8082268000 0x0000000102266000 8K PTE D A G . . W R .
0xffffaf8082268000-0xffffaf8082284000 0x0000000102268000 112K PTE D A G . . W R V
0xffffaf8082284000-0xffffaf8082288000 0x0000000102284000 16K PTE D A G . . W R .
0xffffaf8082288000-0xffffaf808229c000 0x0000000102288000 80K PTE D A G . . W R V
0xffffaf808229c000-0xffffaf80822a0000 0x000000010229c000 16K PTE D A G . . W R .
0xffffaf80822a0000-0xffffaf80822a5000 0x00000001022a0000 20K PTE D A G . . W R V
0xffffaf80822a5000-0xffffaf80822a6000 0x00000001022a5000 4K PTE D A G . . . R V
0xffffaf80822a6000-0xffffaf80822ab000 0x00000001022a6000 20K PTE D A G . . W R V
...
And when the memfd_secret() fd is released, the linear mapping is
correctly reset:
...
0xffffaf8082240000-0xffffaf80822a5000 0x0000000102240000 404K PTE D A G . . W R V
0xffffaf80822a5000-0xffffaf80822a6000 0x00000001022a5000 4K PTE D A G . . . R V
0xffffaf80822a6000-0xffffaf80822af000 0x00000001022a6000 36K PTE D A G . . W R V
...
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20231108075930.7157-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Stable-dep-of: b8b2711336f0 ("riscv: Fix set_direct_map_default_noflush() to reset _PAGE_EXEC")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 749b94b08005929bbc636df21a23322733166e35 ]
After unloading a module, we must reset the linear mapping permissions,
see the example below:
Before unloading a module:
0xffffaf809d65d000-0xffffaf809d6dc000 0x000000011d65d000 508K PTE . .. .. D A G . . W R V
0xffffaf809d6dc000-0xffffaf809d6dd000 0x000000011d6dc000 4K PTE . .. .. D A G . . . R V
0xffffaf809d6dd000-0xffffaf809d6e1000 0x000000011d6dd000 16K PTE . .. .. D A G . . W R V
0xffffaf809d6e1000-0xffffaf809d6e7000 0x000000011d6e1000 24K PTE . .. .. D A G . X . R V
After unloading a module:
0xffffaf809d65d000-0xffffaf809d6e1000 0x000000011d65d000 528K PTE . .. .. D A G . . W R V
0xffffaf809d6e1000-0xffffaf809d6e7000 0x000000011d6e1000 24K PTE . .. .. D A G . X W R V
The last mapping is not reset and we end up with WX mappings in the linear
mapping.
So add VM_FLUSH_RESET_PERMS to our module_alloc() definition.
Fixes: 0cff8bff7a ("riscv: avoid the PIC offset of static percpu data in module beyond 2G limits")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20231213134027.155327-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 420370f3ae3d3b883813fd3051a38805160b2b9f ]
Otherwise we fall through to vmalloc_to_page() which panics since the
address does not lie in the vmalloc region.
Fixes: 043cb41a85 ("riscv: introduce interfaces to patch kernel code")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20231214091926.203439-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a4aebe936554dac6a91e5d091179c934f8325708 ]
Only the posix timer system calls use this (when the posix timer support
is disabled, which does not actually happen in any normal case), because
they had debug code to print out a warning about missing system calls.
Get rid of that special case, and just use the standard COND_SYSCALL
interface that creates weak system call stubs that return -ENOSYS for
when the system call does not exist.
This fixes a kCFI issue with the SYS_NI() hackery:
CFI failure at int80_emulation+0x67/0xb0 (target: sys_ni_posix_timers+0x0/0x70; expected type: 0xb02b34d9)
WARNING: CPU: 0 PID: 48 at int80_emulation+0x67/0xb0
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ad9843e1ea088bd2529290234c6c4c6374836a7 ]
The emulated IMSIC update the external interrupt pending depending on
the value of eidelivery and topei. It might lose an interrupt when it
is interrupted before setting the new value to the pending status.
For example, when VCPU0 sends an IPI to VCPU1 via IMSIC:
VCPU0 VCPU1
CSRSWAP topei = 0
The VCPU1 has claimed all the external
interrupt in its interrupt handler.
topei of VCPU1's IMSIC = 0
set pending in VCPU1's IMSIC
topei of VCPU1' IMSIC = 1
set the external interrupt
pending of VCPU1
clear the external interrupt pending
of VCPU1
When the VCPU1 switches back to VS mode, it exits the interrupt handler
because the result of CSRSWAP topei is 0. If there are no other external
interrupts injected into the VCPU1's IMSIC, VCPU1 will never know this
pending interrupt unless it initiative read the topei.
If the interruption occurs between updating interrupt pending in IMSIC
and updating external interrupt pending of VCPU, it will not cause a
problem. Suppose that the VCPU1 clears the IPI pending in IMSIC right
after VCPU0 sets the pending, the external interrupt pending of VCPU1
will not be set because the topei is 0. But when the VCPU1 goes back to
VS mode, the pending IPI will be reported by the CSRSWAP topei, it will
not lose this interrupt.
So we only need to make the external interrupt updating procedure as a
critical section to avoid the problem.
Fixes: db8b7e97d6 ("RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC")
Tested-by: Roy Lin <roy.lin@sifive.com>
Tested-by: Wayling Chen <wayling.chen@sifive.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c20d36cc2a2073d4cdcda92bd7a1bb9b3b3b7c79 ]
If misaligned_access_speed percpu var isn't so called "HWPROBE
MISALIGNED UNKNOWN", it means the probe has happened(this is possible
for example, hotplug off then hotplug on one cpu), and the percpu var
has been set, don't probe again in this case.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: 584ea6564b ("RISC-V: Probe for unaligned access speed")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230912154040.3306-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c1ad12ee0efc07244be37f69311e6f7c4ac98e62 ]
The cleanup for the CONFIG_KEXEC Kconfig logic accidentally changed the
'depends on CRYPTO=y' dependency to a plain 'depends on CRYPTO', which
causes a link failure when all the crypto support is in a loadable module
and kexec_file support is built-in:
x86_64-linux-ld: vmlinux.o: in function `__x64_sys_kexec_file_load':
(.text+0x32e30a): undefined reference to `crypto_alloc_shash'
x86_64-linux-ld: (.text+0x32e58e): undefined reference to `crypto_shash_update'
x86_64-linux-ld: (.text+0x32e6ee): undefined reference to `crypto_shash_final'
Both s390 and x86 have this problem, while ppc64 and riscv have the
correct dependency already. On riscv, the dependency is only used for the
purgatory, not for the kexec_file code itself, which may be a bit
surprising as it means that with CONFIG_CRYPTO=m, it is possible to enable
KEXEC_FILE but then the purgatory code is silently left out.
Move this into the common Kconfig.kexec file in a way that is correct
everywhere, using the dependency on CRYPTO_SHA256=y only when the
purgatory code is available. This requires reversing the dependency
between ARCH_SUPPORTS_KEXEC_PURGATORY and KEXEC_FILE, but the effect
remains the same, other than making riscv behave like the other ones.
On s390, there is an additional dependency on CRYPTO_SHA256_S390, which
should technically not be required but gives better performance. Remove
this dependency here, noting that it was not present in the initial
Kconfig code but was brought in without an explanation in commit
71406883fd ("s390/kexec_file: Add kexec_file_load system call").
[arnd@arndb.de: fix riscv build]
Link: https://lkml.kernel.org/r/67ddd260-d424-4229-a815-e3fcfb864a77@app.fastmail.com
Link: https://lkml.kernel.org/r/20231023110308.1202042-1-arnd@kernel.org
Fixes: 6af5138083 ("x86/kexec: refactor for kernel/Kconfig.kexec")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com>
Tested-by: Eric DeVolder <eric_devolder@yahoo.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Conor Dooley <conor@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit c41bd2514184d75db087fe4c1221237fb7922875 upstream.
In commit f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for
CRASH_DUMP") we tried to fix a config regression, where CONFIG_CRASH_DUMP
required CONFIG_KEXEC.
However, it was not enough at least for arm64 platforms. While further
testing the patch with our arm64 config I noticed that CONFIG_CRASH_DUMP
is unavailable in menuconfig. This is because CONFIG_CRASH_DUMP still
depends on the new CONFIG_ARCH_SUPPORTS_KEXEC introduced in commit
91506f7e5d ("arm64/kexec: refactor for kernel/Kconfig.kexec") and on
arm64 CONFIG_ARCH_SUPPORTS_KEXEC requires CONFIG_PM_SLEEP_SMP=y, which in
turn requires either CONFIG_SUSPEND=y or CONFIG_HIBERNATION=y neither of
which are set in our config.
Given that we already established that CONFIG_KEXEC (which is a switch for
kexec system call itself) is not required for CONFIG_CRASH_DUMP drop
CONFIG_ARCH_SUPPORTS_KEXEC dependency as well. The arm64 kernel builds
just fine with CONFIG_CRASH_DUMP=y and with both CONFIG_KEXEC=n and
CONFIG_KEXEC_FILE=n after f8ff23429c62 ("kernel/Kconfig.kexec: drop select
of KEXEC for CRASH_DUMP") and this patch are applied given that the
necessary shared bits are included via CONFIG_KEXEC_CORE dependency.
[bhe@redhat.com: don't export some symbols when CONFIG_MMU=n]
Link: https://lkml.kernel.org/r/ZW03ODUKGGhP1ZGU@MiWiFi-R3L-srv
[bhe@redhat.com: riscv, kexec: fix dependency of two items]
Link: https://lkml.kernel.org/r/ZW04G/SKnhbE5mnX@MiWiFi-R3L-srv
Link: https://lkml.kernel.org/r/20231129220409.55006-1-ignat@cloudflare.com
Fixes: 91506f7e5d ("arm64/kexec: refactor for kernel/Kconfig.kexec")
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: <stable@vger.kernel.org> # 6.6+: f8ff234: kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
Cc: <stable@vger.kernel.org> # 6.6+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 78a03b9f8e6b317f7c65738a3fc60e1e85106a64 upstream.
Selects ARM_AMBA platform support for StarFive SoCs required by spi and
crypto dma engine.
Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Cc: Nam Cao <namcao@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ed5b7cfd7839f9280a63365c1133482b42d0981f ]
We need to probe for IOCP only once during boot stage, as we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,
[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ #57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558] ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607] gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795] t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984] s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360] a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548] s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736] s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924] s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113] s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301] t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---
This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing for IOCP only once in boot stage by having a boolean variable
"done" which will be set to true upon IOCP probe in errata_probe_iocp()
and we bail out early if "done" is set to true.
While at it make return type of errata_probe_iocp() to void as we were
not checking the return value in andes_errata_patch_func().
Fixes: e021ae7f51 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Yu Chien Peter Lin <peterlin@andestech.com>
Link: https://lore.kernel.org/r/20231130212647.108746-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>